chore: `_read_gbq_colab` supports querying a pandas DataFrame #1801

tswast · 2025-06-09T21:31:50Z

Work-in-progress:

Finish unit tests
Implement "happy path" (query after load job)
Implement dry run

Fixes internal issue b/406027008, b/409317722 🦕


bigframes/core/pyformat.py

tswast · 2025-06-10T20:26:16Z

bigframes/core/blocks.py

        # Note: this uses the sql from the executor, so is coupled tightly to execution
        # implementation. It will reference cached tables instead of original data sources.
        # Maybe should just compile raw BFET? Depends on user intent.
-        sql = self.session._executor.to_sql(
-            array_value.rename_columns(substitutions), enable_cache=enable_cache


The `rename_columns` call moved to `_array_value_for_output`.

tests/system/small/session/test_read_gbq_colab.py


tests/unit/session/test_read_gbq_colab.py

bigframes/core/pyformat.py


bigframes/pandas/io/api.py

tswast · 2025-06-13T14:38:32Z

FAILED tests/unit/pandas/io/test_api.py::test_read_gbq_colab_dry_run_doesnt_call_set_location - AssertionError: Expected

I suspect this is due to the dependency on global state for "is_started". I'll see if I can mock something more out to make this more independent.

Edit: I think this has been fixed. I added an "else" block to avoid setting the default location if it's a dry run.
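A minimal sketch of the "else" fix described above, not the actual `bigframes` code. `FakeOptions`, `set_default_location`, `read_gbq_colab_sketch`, and the `"US"` default are all hypothetical stand-ins. The only point is that the dry-run branch returns without binding any global state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeOptions:
    """Hypothetical stand-in for global session options."""

    location: Optional[str] = None


def set_default_location(options: FakeOptions, location: str) -> None:
    # Only fill in a location if the user has not already chosen one.
    if options.location is None:
        options.location = location


def read_gbq_colab_sketch(
    query: str, options: FakeOptions, *, dry_run: bool = False
) -> dict:
    if dry_run:
        # A dry run only inspects the query, so it leaves global state alone.
        return {"dry_run": True, "query": query}
    else:
        # Only a real run binds the options to a default location.
        set_default_location(options, "US")
        return {"dry_run": False, "query": query, "location": options.location}
```

With this shape, a unit test can assert that `options.location` is still `None` after a dry run, which is roughly what `test_read_gbq_colab_dry_run_doesnt_call_set_location` checks.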


tswast · 2025-06-13T15:31:05Z

Windows failures look like real ones too:

FAILED tests/unit/session/test_read_gbq_colab.py::test_read_gbq_colab_includes_formatted_values_in_dry_run[True]
FAILED tests/unit/session/test_read_gbq_colab.py::test_read_gbq_colab_includes_formatted_values_in_dry_run[False]

E       TypeError: Unexpected Arrow data type int32. Share your usecase with the BigQuery DataFrames team at the https://bit.ly/bigframes-feedback survey. You are currently running BigFrames version 2.6.0.

Edit: I believe this has been fixed by 07b0fa9


TrevorBergeron · 2025-06-13T17:54:36Z

bigframes/core/pyformat.py

+def _pandas_df_to_sql_dry_run(pd_df: pandas.DataFrame) -> str:
+    managed_table = bigframes.core.local_data.ManagedArrowTable.from_pandas(pd_df)
+    bqschema = managed_table.schema.to_bigquery()
+    return bigquery_schema.to_sql_dry_run(bqschema)


I think the schema here might drift a bit from the eventual "real" schema when it comes to duplicate labels. Elsewhere, I think we disambiguate just before calling ManagedArrowTable.from_pandas, and we might need to push that logic into from_pandas itself.

I think it depends on the case. In some cases I think we try to preserve the pandas-y names. In this case we want it to be compatible with SQL, so I can run the de-duper before calling from_pandas.
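A rough sketch of what running a de-duper over column labels could look like. This is an illustration under assumptions, not the helper used in `bigframes`: the function name `dedupe_labels` and the `_1`, `_2` suffix scheme are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def dedupe_labels(labels: Sequence[Hashable]) -> List[str]:
    """Return string column names with duplicates disambiguated by suffix."""
    counts: Counter = Counter()  # how many suffixes were tried per base name
    taken = set()  # names already assigned to an earlier column
    result = []
    for label in labels:
        base = str(label)
        candidate = base
        # Keep bumping the suffix until the name is unused.
        while candidate in taken:
            counts[base] += 1
            candidate = f"{base}_{counts[base]}"
        taken.add(candidate)
        result.append(candidate)
    return result
```

Applying the same function before both the dry-run schema and the real upload would keep the two schemas from drifting on duplicate labels.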

TrevorBergeron · 2025-06-13T17:56:11Z

bigframes/dtypes.py

@@ -444,6 +444,23 @@ def dtype_for_etype(etype: ExpressionType) -> Dtype:
    if mapping.arrow_dtype is not None
 }

+# Include types that aren't 1:1 to BigQuery but allowed to be loaded in to BigQuery:
+_ARROW_TO_BIGFRAMES.update(


Can we maybe only use this extended definition for cases where we want to be lenient (e.g., accepting external data sources), while still being strict for most internal stuff? I worry this could allow our internal types to drift accidentally in places.
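One way to read this suggestion is as a strict mapping plus an opt-in lenient one. The sketch below is illustrative only: it uses plain strings instead of real Arrow types, and the names `_STRICT`, `_LOSSLESS_EXTRAS`, `to_internal_dtype`, and `allow_lossless_cast` are assumptions, not the identifiers in `bigframes/dtypes.py`.

```python
# Hypothetical type names as plain strings; the real code maps Arrow types.
_STRICT = {"int64": "Int64", "float64": "Float64", "string": "string"}

# Types that are not 1:1 with the internal types but can be widened losslessly.
_LOSSLESS_EXTRAS = {
    "int8": "Int64",
    "int16": "Int64",
    "int32": "Int64",
    "float32": "Float64",
}


def to_internal_dtype(arrow_type: str, *, allow_lossless_cast: bool = False) -> str:
    """Map a type name, accepting widening casts only when explicitly asked."""
    if arrow_type in _STRICT:
        return _STRICT[arrow_type]
    if allow_lossless_cast and arrow_type in _LOSSLESS_EXTRAS:
        return _LOSSLESS_EXTRAS[arrow_type]
    raise TypeError(f"Unexpected Arrow data type {arrow_type}.")
```

Internal callers keep the strict default, so an unexpected `int32` still raises, while code that accepts external data such as a user's pandas DataFrame opts in with `allow_lossless_cast=True`.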

TrevorBergeron · 2025-06-13T18:01:45Z

bigframes/core/blocks.py

            idx_labels,
        )

-    def to_view(self, include_index: bool) -> bigquery.TableReference:
+    def to_view(


Since this is not necessarily a view anymore, maybe rename it to something like to_placeholder_sql?

In this particular case there is a session, so it's either a view or a table. Renamed to _to_placeholder_table since in the BQ API views are table resources.


chore: _read_gbq_colab supports querying a pandas DataFrame

cd157bd

product-auto-label bot added the size: s and api: bigquery labels Jun 9, 2025

tswast added 8 commits June 10, 2025 09:36

Merge remote-tracking branch 'origin/main' into b406027008-read_gbq_c…

7bc1639

…olab-local-df

make more unit test

1a831df

add session and dry_run arguments

d227076

add dry_run to to_view

ae7a840

initial pandas support with slow dry_run

7c8c8b0

speed up dry run

d7c85bc

test with inline sql and load jobs

96a3796

Merge remote-tracking branch 'origin/main' into b406027008-read_gbq_c…

9da0213

…olab-local-df

tswast marked this pull request as ready for review June 10, 2025 20:19

tswast requested review from a team as code owners June 10, 2025 20:19

product-auto-label bot added the size: l label and removed the size: s label Jun 10, 2025

tswast requested a review from TrevorBergeron June 10, 2025 20:19

blunderbuss-gcf bot assigned TrevorBergeron Jun 10, 2025

tswast commented Jun 10, 2025

bigframes/core/pyformat.py (outdated)

Update bigframes/core/pyformat.py

0412c7f

tswast commented Jun 10, 2025

tests/system/small/session/test_read_gbq_colab.py (outdated)

tswast added 2 commits June 10, 2025 15:28

remove redundant test

9eda903

Merge remote-tracking branch 'origin/b406027008-read_gbq_colab-local-…

9558438

…df' into b406027008-read_gbq_colab-local-df

tswast commented Jun 10, 2025

tests/unit/session/test_read_gbq_colab.py (outdated)

Update tests/unit/session/test_read_gbq_colab.py

64b5079

tswast commented Jun 10, 2025

bigframes/core/pyformat.py (outdated)

tswast added 3 commits June 11, 2025 13:04

Merge remote-tracking branch 'origin/main' into b406027008-read_gbq_c…

423ef12

…olab-local-df

Merge remote-tracking branch 'origin/main' into b406027008-read_gbq_c…

a62e11b

…olab-local-df

add dry run that works without a session

1b2e709

tswast added 3 commits June 12, 2025 13:30

fix unit test

0ab2199

add unit tests for sessionless dry run

41d2333

Merge remote-tracking branch 'origin/main' into b406027008-read_gbq_c…

8a2b26a

…olab-local-df

tswast commented Jun 12, 2025

bigframes/pandas/io/api.py (outdated)

avoid binding to a location too early

268c2aa

tswast added 3 commits June 13, 2025 09:38

Merge remote-tracking branch 'origin/main' into b406027008-read_gbq_c…

69dad00

…olab-local-df

dont try to set the default location unless its not a dry run

6e5b98c

dont try to run any assertion on the response type

109d1f7

tswast added 3 commits June 13, 2025 11:24

add support for small ints and floats

07b0fa9

Merge remote-tracking branch 'origin/main' into b406027008-read_gbq_c…

cb04766

…olab-local-df

don't cast from float16 in earlier versions of arrow

f349ba1

TrevorBergeron reviewed Jun 13, 2025


tswast added 4 commits June 13, 2025 14:43

Merge remote-tracking branch 'origin/main' into b406027008-read_gbq_c…

d9308da

…olab-local-df

rename _to_view to _to_placeholder_table

a6c72a8

deduplicate column names in dry run

e770289

only allow lossless conversion if explicitly requested

d53a52e

TrevorBergeron approved these changes Jun 13, 2025


tswast merged commit 8ebfa57 into main Jun 13, 2025
24 checks passed

tswast deleted the b406027008-read_gbq_colab-local-df branch June 13, 2025 21:54
