@jaki729 jaki729 commented Oct 1, 2025

Fixes Issue #280: Dataset.get_data(fields=[...]) returned all dataset columns with non-selected ones filled with NaN.

Changes:

  • Add Dataset._filter_df_by_fields, which sanitizes the requested field names (function-style and camelCase
    names are mapped to snake_case) and filters the DataFrame returned by the provider down to the requested
    columns that actually exist.
  • Apply the filter in the DataFrame construction paths so that, when fields is provided, users get only
    the selected columns (no NaN-filled columns).
  • Add unit test (gs_quant/test/data/test_dataset.py::test_get_data_filters_to_requested_fields) that uses a
    DummyProvider to validate the behavior without network calls.
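
For reviewers, the sanitize-then-filter step described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the helper names (`to_snake_case`, `sanitize_field`, `columns_to_keep`) are hypothetical, and it assumes that "function-style" means a field written as `fn(fieldName)` whose inner name is the column to keep. It works on plain lists of column names so it runs without any dependencies.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a camelCase identifier to snake_case."""
    # Insert "_" before an uppercase letter that follows a lowercase letter or digit.
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


def sanitize_field(field: str) -> str:
    """Normalize one requested field name.

    Assumption (not confirmed by the PR): a function-style field looks like
    "fn(fieldName)" and refers to the column named by its inner argument.
    """
    match = re.fullmatch(r"\s*\w+\(\s*(\w+)\s*\)\s*", field)
    inner = match.group(1) if match else field.strip()
    return to_snake_case(inner)


def columns_to_keep(columns, fields):
    """Return the requested columns that exist, in request order, without duplicates."""
    present = set(columns)
    kept = []
    for name in (sanitize_field(f) for f in fields):
        if name in present and name not in kept:
            kept.append(name)
    return kept


# Hypothetical provider columns and request, for illustration only.
provider_columns = ["date", "trade_price", "bid_price", "volume"]
requested = ["tradePrice", "max(bidPrice)", "missingField"]
selected = columns_to_keep(provider_columns, requested)
```

With a pandas DataFrame, the resulting list would then be used to select columns, for example `df[selected]`.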

Notes & rationale:

  • If none of the requested fields match provider columns, the original DataFrame is returned unchanged to avoid
    unexpectedly dropping all data (non-breaking default). If maintainers prefer stricter behavior, I'm happy to adjust.
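
To make the trade-off concrete, here is a minimal sketch of the lenient default next to one possible stricter variant. The function name `select_columns` and the `strict` flag are hypothetical and not part of this PR; raising `KeyError` is just one way a stricter mode could behave.

```python
def select_columns(columns, wanted, strict=False):
    """Pick the columns to return, given already-sanitized requested names.

    Lenient (default): if nothing matches, keep every column.
    Strict (hypothetical): if nothing matches, raise instead.
    """
    wanted_set = set(wanted)
    kept = [c for c in columns if c in wanted_set]
    if kept:
        return kept
    if strict:
        raise KeyError(f"none of the requested fields exist: {sorted(wanted_set)}")
    # Non-breaking default: fall back to the full column list.
    return list(columns)
```

The lenient mode never returns an empty frame because of a misspelled field, at the cost of silently ignoring the request; the strict mode surfaces the mistake immediately.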

covered by: Jakiur Rahman.dco

Fixes Issue goldmansachs#280: when callers passed fields=[...], Dataset.get_data previously returned the full
dataset schema with non-selected columns filled with NaN. Now requested fields are sanitized &
mapped to snake_case and the returned DataFrame is filtered to only those columns when present.
