feat!: add allow_large_results option to read_gbq_query, aligning with bpd.options.compute.allow_large_results option #1935

Open · wants to merge 24 commits into main
Conversation

@tswast tswast commented Jul 24, 2025

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Jul 24, 2025
@tswast tswast marked this pull request as ready for review July 24, 2025 18:27
@tswast tswast requested review from a team as code owners July 24, 2025 18:27
@tswast tswast requested a review from chelsea-lin July 24, 2025 18:27
@tswast tswast changed the title feat: add allow_large_results option to read_gbq_query feat: add allow_large_results option to read_gbq_query. Set to False to enable faster queries Jul 24, 2025
@@ -215,6 +217,7 @@ def read_gbq(
use_cache: Optional[bool] = None,
col_order: Iterable[str] = (),
dry_run: bool = False,
allow_large_results: bool = True,
Contributor commented:
Should allow_large_results also default to None? This would allow it to inherit its value from ComputeOptions.allow_large_results.

@tswast (Collaborator, Author) replied Aug 20, 2025:

If we do so, it'd technically be a breaking change. Users with query results > 10 GB would have to set this option to True.

Might be worth it for consistency with other places, though?

@tswast (Collaborator, Author) replied:

done.
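The compromise above (a `None` default that inherits from `ComputeOptions.allow_large_results`) can be sketched as a small resolution helper. This is an illustrative stand-in, not the actual bigframes internals; `ComputeOptions` and `resolve_allow_large_results` here are hypothetical names.

```python
# Sketch of per-call-argument-inherits-from-global-option resolution.
# Names are illustrative, not the real bigframes implementation.
from typing import Optional


class ComputeOptions:
    """Stand-in for bpd.options.compute."""

    def __init__(self, allow_large_results: bool = True):
        self.allow_large_results = allow_large_results


def resolve_allow_large_results(
    explicit: Optional[bool], options: ComputeOptions
) -> bool:
    # An explicit True/False at the call site wins; None falls back to
    # the session-wide option, keeping the two knobs consistent.
    if explicit is not None:
        return explicit
    return options.allow_large_results


opts = ComputeOptions(allow_large_results=False)
print(resolve_allow_large_results(None, opts))  # inherits -> False
print(resolve_allow_large_results(True, opts))  # explicit wins -> True
```

The breaking-change concern follows directly: once the effective default can be `False`, queries with results over the 10 GB limit fail unless the user opts back in.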


tswast commented Aug 21, 2025

Doctest looks like a real failure:

_______________ [doctest] bigframes.pandas.io.api.read_gbq_query _______________
[gw3] linux -- Python 3.12.7 /tmpfs/src/github/python-bigquery-dataframes/.nox/doctest/bin/python
EXAMPLE LOCATION UNKNOWN, not showing all tests of that example
??? >>> df.head(2)
Differences (unified diff with -expected +actual):
    @@ -1,6 +1,5 @@
    -         pitcherFirstName pitcherLastName  averagePitchSpeed
    -rowindex
    -1                Albertin         Chapman          96.514113
    -2                 Zachary         Britton          94.591039
    +   rowindex pitcherFirstName pitcherLastName  averagePitchSpeed
    +0         1         Albertin         Chapman          96.514113
    +1         2          Zachary         Britton          94.591039
     <BLANKLINE>
    -[2 rows x 3 columns]
    +[2 rows x 4 columns]

It seems I'm not setting the index correctly.

@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Aug 21, 2025

tswast commented Aug 21, 2025

It seems I'm not setting the index correctly.

This has been fixed. I've confirmed the doctest passes locally and have added two tests for the columns and index_col arguments.
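The doctest diff above shows the symptom in plain pandas terms: when `rowindex` is not promoted to the index, it stays as a fourth column and the frame keeps its default `RangeIndex`. A minimal pandas analogue of the fix (the data values are taken from the doctest output above):

```python
import pandas as pd

# Without set_index, "rowindex" remains an ordinary column
# (4 columns, RangeIndex) -- the "actual" side of the doctest diff.
raw = pd.DataFrame(
    {
        "rowindex": [1, 2],
        "pitcherFirstName": ["Albertin", "Zachary"],
        "pitcherLastName": ["Chapman", "Britton"],
        "averagePitchSpeed": [96.514113, 94.591039],
    }
)

# Promoting it to the index reproduces the expected 3-column frame.
fixed = raw.set_index("rowindex")
print(fixed.shape)  # (2, 3) -- matches the expected doctest output
```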

@tswast tswast requested a review from chelsea-lin August 21, 2025 16:07

tswast commented Aug 21, 2025

Looks like I have some more failing tests to address:

FAILED tests/system/small/test_session.py::test_read_gbq_wildcard[all-read_gbq]
FAILED tests/system/small/test_session.py::test_read_gbq_wildcard[all-read_gbq_table]
FAILED tests/system/small/test_session.py::test_read_gbq_table_dry_run_with_max_results
FAILED tests/system/small/test_session.py::test_read_gbq_wildcard[max_results-read_gbq]
FAILED tests/system/small/test_session.py::test_read_gbq_wildcard[max_results-read_gbq_table]
FAILED tests/system/small/test_session.py::test_read_gbq_with_configuration[config2]
FAILED tests/system/small/test_session.py::test_read_gbq_w_max_results[two_rows_in_table]
FAILED tests/system/small/test_session.py::test_read_gbq_w_script_no_select

Getting started on that now.

chelsea-lin
chelsea-lin previously approved these changes Aug 21, 2025
@tswast tswast enabled auto-merge (squash) August 21, 2025 19:13
@tswast tswast changed the title feat: add allow_large_results option to read_gbq_query. Set to False to enable faster queries feat!: add allow_large_results option to read_gbq_query. Set to False to enable faster queries Aug 21, 2025
@tswast tswast disabled auto-merge August 21, 2025 20:00

tswast commented Aug 21, 2025

A lot more failures now.

FAILED tests/system/small/test_session.py::test_read_gbq_duplicate_columns_xfail[query_input_columns_dup]
FAILED tests/system/small/test_unordered.py::test_unordered_mode_read_gbq - A...
FAILED tests/system/small/test_dataframe_io.py::test_to_sql_query_unnamed_index_included
FAILED tests/system/small/test_dataframe_io.py::test_to_sql_query_named_index_included
FAILED tests/system/small/bigquery/test_vector_search.py::test_vector_search_different_params_with_query
FAILED tests/system/small/bigquery/test_vector_search.py::test_vector_search_df_with_query_column_to_search
FAILED tests/system/small/ml/test_core.py::test_model_centroids - AssertionEr...
FAILED tests/system/small/ml/test_decomposition.py::test_pca_components_ - As...
FAILED tests/system/small/ml/test_core.py::test_pca_model_principal_components
FAILED tests/system/small/ml/test_core.py::test_model_forecast[id] - Assertio...
FAILED tests/system/small/ml/test_forecasting.py::test_arima_plus_predict_params[id]
FAILED tests/system/small/ml/test_decomposition.py::test_pca_predict - Assert...
FAILED tests/system/small/ml/test_forecasting.py::test_arima_plus_score_series[id]
FAILED tests/system/small/ml/test_forecasting.py::test_arima_plus_score[id]
FAILED tests/system/small/ml/test_forecasting.py::test_arima_plus_predict_default[id]
FAILED tests/system/small/ml/test_forecasting.py::test_arima_plus_predict_explain_default[id]

I'll take a look at this when I get back I guess.


tswast commented Aug 21, 2025

___________________ test_to_sql_query_unnamed_index_included ___________________
[gw19] linux -- Python 3.11.10 /tmpfs/src/github/python-bigquery-dataframes/.nox/system-3-11/bin/python

session = <bigframes.session.Session object at 0x15486703c150>
scalars_df_default_index =    bool_col                                          bytes_col    date_col  \
0      True                             ...99+00:00  0 days 00:00:00.000004  
8                              <NA>         5 days 00:00:00  

[9 rows x 15 columns]
scalars_pandas_df_default_index =    bool_col  ...            duration_col
0      True  ...  0 days 00:00:00.000004
1     False  ...       -1 days +23:5...          <NA>
7      True  ...  0 days 00:00:00.000004
8     False  ...         5 days 00:00:00

[9 rows x 15 columns]

    def test_to_sql_query_unnamed_index_included(
        session: bigframes.Session,
        scalars_df_default_index: bpd.DataFrame,
        scalars_pandas_df_default_index: pd.DataFrame,
    ):
        bf_df = scalars_df_default_index.reset_index(drop=True).drop(columns="duration_col")
        sql, idx_ids, idx_labels = bf_df._to_sql_query(include_index=True)
        assert len(idx_labels) == 1
        assert len(idx_ids) == 1
        assert idx_labels[0] is None
        assert idx_ids[0].startswith("bigframes")
    
        pd_df = scalars_pandas_df_default_index.reset_index(drop=True).drop(
            columns="duration_col"
        )
        roundtrip = session.read_gbq(sql, index_col=idx_ids)
        roundtrip.index.names = [None]
>       utils.assert_pandas_df_equal(roundtrip.to_pandas(), pd_df, check_index_type=False)

tests/system/small/test_dataframe_io.py:1036: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
bigframes/testing/utils.py:99: in assert_pandas_df_equal
    pd.testing.assert_frame_equal(df0, df1, **kwargs)
testing.pyx:55: in pandas._libs.testing.assert_almost_equal
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   AssertionError: DataFrame.index are different
E   
E   DataFrame.index values are different (88.88889 %)
E   [left]:  Index([8, 4, 0, 3, 5, 1, 2, 6, 7], dtype='Int64')
E   [right]: RangeIndex(start=0, stop=9, step=1)
E   At positional index 0, first diff: 8 != 0

testing.pyx:173: AssertionError

I think this is a real failure. We aren't sorting by the index_col.
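The assertion error above (left index `[8, 4, 0, ...]` vs. an expected `RangeIndex`) is the classic symptom of rows coming back from the query in arbitrary order. A pandas stand-in for the missing step, using a `bigframes_index_0`-style generated index column as in the failing test:

```python
import pandas as pd

# Rows arrive from BigQuery in arbitrary order; after promoting the
# generated index column to the index, it must be sorted to line up
# with the original RangeIndex-based frame.
unordered = pd.DataFrame(
    {"bigframes_index_0": [8, 4, 0], "bool_col": [False, True, True]}
).set_index("bigframes_index_0")

restored = unordered.sort_index()
print(list(restored.index))  # [0, 4, 8]
```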

@tswast tswast enabled auto-merge (squash) August 21, 2025 22:42
chelsea-lin
chelsea-lin previously approved these changes Aug 21, 2025
@tswast tswast disabled auto-merge August 21, 2025 23:13
@tswast tswast changed the title feat!: add allow_large_results option to read_gbq_query. Set to False to enable faster queries feat!: add allow_large_results option to read_gbq_query, aligning with bpd.options.compute.allow_large_results option Aug 21, 2025
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: l Pull request size is large.
3 participants