
Milestones


  • Make built-in support more complete and extend it (beyond the current support for Oracle, CSV files, and raw files) to:
      - Oracle Exadata (check: already supported?)
      - Hive and/or Impala
      - Widespread DBMSs like MySQL et al.; look at https://github.com/kennethreitz/records ("Database support includes RedShift, Postgres, MySQL, SQLite, Oracle, and MS-SQL (drivers not included).")
      - (NoSQL is out of scope for now; the focus is on dataframe-compatible data.)
    Maybe unify the DBMS config attributes a bit (succumb to using a plain connect string?).

    No due date
    4/5 issues closed
  • Custom-programmed DataSources/Sinks can be added by users in a way that is easy (which rules out setuptools) and also clean (so no "magic folders" or other opaque conventions; use config instead). Also allow extending the supported types of user models in the same, or a similar, unified way:
      - R via rpy2: currently, R support uses a magic "special" user model called r-model, plus some extra config attributes that reference the R script and its dependencies. Maybe this can be made cleaner.
      - pyspark: possibly very little user-model-related work is necessary, except maybe a convenience base class that deals with the Spark context.
      - Possibly also: running SAS programs (which would include handling SAS datasets from Python, maybe via R/haven?).
    Stretch goal: possibly unify this with a plugin system where plugins can be installed with pip (those plugin packages would, of course, need to use setuptools).

    No due date
    3/4 issues closed
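For the first milestone's idea of unifying DBMS config attributes into a plain connect string, a minimal sketch could dispatch on the scheme of an SQLAlchemy-style URL (e.g. `sqlite:///:memory:`). The names `connect`, `CONNECTORS`, and `_connect_sqlite` are hypothetical, and only SQLite (from the standard library) is wired up; Oracle, Hive/Impala, MySQL, etc. would each need their own driver registered under their own scheme.

```python
import sqlite3
from urllib.parse import urlparse


def _connect_sqlite(parsed):
    # Hypothetical convention borrowed from SQLAlchemy URLs: everything after
    # "sqlite:///" is the database path, so strip exactly one leading slash.
    # "sqlite:///:memory:" therefore yields an in-memory database.
    return sqlite3.connect(parsed.path[1:])


# Hypothetical registry: URL scheme -> function returning a DB-API connection.
CONNECTORS = {"sqlite": _connect_sqlite}


def connect(connect_string):
    """Open a DB-API connection from a single URL-style connect string."""
    parsed = urlparse(connect_string)
    try:
        factory = CONNECTORS[parsed.scheme]
    except KeyError:
        raise ValueError(f"unsupported DBMS scheme: {parsed.scheme!r}") from None
    return factory(parsed)


if __name__ == "__main__":
    conn = connect("sqlite:///:memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (42)")
    print(conn.execute("SELECT x FROM t").fetchone()[0])  # prints 42
```

The design choice here is that a single string carries driver, host, credentials, and database, so each DBMS needs only one config attribute; the per-DBMS differences move into the registered connector functions.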
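For the second milestone, one way to let users plug in custom DataSources/Sinks through config alone (no setuptools, no magic folders) is to reference the class by an importable `"module:ClassName"` string and instantiate it with options from the same config entry. This is a minimal sketch: the config shape (`"class"` and `"options"` keys) and the functions `load_class` and `build_source` are hypothetical, and the user's module only has to be importable (e.g. on `sys.path`).

```python
import importlib
from typing import Any, Mapping


def load_class(spec: str) -> Any:
    """Resolve a "package.module:ClassName" string to the named object."""
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:ClassName', got {spec!r}")
    obj: Any = importlib.import_module(module_name)
    # Allow nested attributes such as "module:Outer.Inner".
    for attr in attr_path.split("."):
        obj = getattr(obj, attr)
    return obj


def build_source(config: Mapping[str, Any]) -> Any:
    """Instantiate a data source from a hypothetical config entry of the form
    {"class": "module:ClassName", "options": {...keyword arguments...}}."""
    cls = load_class(config["class"])
    return cls(**config.get("options", {}))


if __name__ == "__main__":
    # The standard library's Fraction stands in for a user-written DataSource.
    src = build_source(
        {"class": "fractions:Fraction", "options": {"numerator": 3, "denominator": 4}}
    )
    print(src)  # prints 3/4
```

The `module:attr` reference format is the same one setuptools uses for entry-point object references, so the stretch goal of pip-installable plugins could reuse these strings without changing how user-supplied classes are resolved.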