Improve dictionary null handling in hashing and expand aggregate test coverage for nulls #16458
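
As a rough illustration of what "dictionary null handling in hashing" can mean, here is a minimal sketch using only the Rust standard library. It is not the code from this pull request. `DictColumn`, `hash_value`, and `hash_dictionary` are hypothetical stand-ins, and the rule shown (a row is treated as null when its key is null or when its key points at a null dictionary value) is an assumption inferred from the title and commit messages.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a dictionary-encoded string column:
/// each row stores an optional key that indexes into `values`.
struct DictColumn {
    keys: Vec<Option<usize>>,
    values: Vec<Option<String>>,
}

/// Hash one non-null value with the standard library hasher.
fn hash_value(value: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Write one hash per row into `hashes`.
/// Assumption: a slot is left untouched when the row is logically null,
/// i.e. the key is null OR the key refers to a null dictionary value.
fn hash_dictionary(col: &DictColumn, hashes: &mut [u64]) {
    // Hash each dictionary value once; `None` marks a null value.
    let value_hashes: Vec<Option<u64>> = col
        .values
        .iter()
        .map(|v| v.as_deref().map(hash_value))
        .collect();

    for (slot, key) in hashes.iter_mut().zip(col.keys.iter()) {
        // `and_then` collapses both kinds of null into `None`.
        if let Some(h) = key.and_then(|k| value_hashes[k]) {
            *slot = h;
        }
    }
}
```

With keys `[Some(0), None, Some(1), Some(0)]` and values `[Some("a"), None]`, rows 0 and 3 receive the same hash, while rows 1 and 2 keep whatever value the buffer held before the call, so both kinds of null row hash identically.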

Closed
wants to merge 7,968 commits into from
d388670
Updated extending operators documentation (#15612)
the0ninjas May 12, 2025
10c685b
feat(proto): udf decoding fallback (#15997)
leoyvens May 12, 2025
075f4bb
chore: Replace MSRV link on main page with Github badge (#16020)
comphead May 12, 2025
a4e5afe
Add note to upgrade guide for removal of `ParquetExec`, `AvroExec`, `…
alamb May 13, 2025
7aa717c
refactor: remove deprecated ArrowExec (#16006)
miroim May 13, 2025
946857e
refactor: remove deprecated MemoryExec (#16007)
miroim May 13, 2025
718408b
refactor: remove deprecated JsonExec (#16005)
miroim May 13, 2025
05e1409
chore(deps): bump sqllogictest from 0.28.1 to 0.28.2 (#16037)
dependabot[bot] May 13, 2025
602eb55
chores: Add lint rule to enforce string formatting style (#16024)
Lordworms May 13, 2025
f2e9c37
Use human-readable byte sizes in EXPLAIN (#16043)
tlm365 May 14, 2025
8e6296d
Docs: Add example of creating a field in `return_field_from_args` (#1…
alamb May 14, 2025
db50b12
Support `MIN` and `MAX` for `DataType::List` (#16025)
gabotechs May 14, 2025
d7e395f
fix: overcounting of memory in first/last. (#15924)
ashdnazg May 14, 2025
916b266
Improve docs for Exprs and scalar functions (#16036)
alamb May 14, 2025
673acab
Add h2o window benchmark (#16003)
2010YOUY01 May 14, 2025
625c434
fix: track coalescer's consumption (#16048)
waynexia May 14, 2025
8a1b8e2
Fix Infer prepare statement type tests (#15743)
brayanjuls May 15, 2025
7feaabd
fix: Clarify that it is only the name of the field that is ignored (#…
alamb May 15, 2025
7d10e95
style: simplify some strings for readability (#15999)
hamirmahal May 15, 2025
edea9f9
support simple/cross lateral joins (#16015)
jayzhan211 May 16, 2025
bc9ee4d
Make error msg for oom human readable (#16050)
ding-young May 16, 2025
4152f27
chore(deps): bump the arrow-parquet group with 7 updates (#16047)
dependabot[bot] May 16, 2025
e9073db
chore(deps): bump petgraph from 0.7.1 to 0.8.1 (#15669)
dependabot[bot] May 16, 2025
37c802a
[datafusion-spark] Add Spark-compatible `char` expression (#15994)
andygrove May 16, 2025
2d7aeb4
[Docs]: Added SQL example for all window functions (#16074)
Adez017 May 19, 2025
efe6fa5
chore(deps): bump substrait from 0.55.1 to 0.56.0 (#16091)
dependabot[bot] May 19, 2025
da9412e
Add test for collect_statistics (#16098)
alamb May 19, 2025
5560dfc
Add window function examples in code (#16102)
alamb May 19, 2025
41e0433
Refactor substrait producer into multiple files (#16089)
gabotechs May 19, 2025
bd76b64
Fix temp dir leak in tests (#16094)
findepi May 19, 2025
7943d9a
Label Spark functions PRs with spark label (#16095)
findepi May 19, 2025
b6abddd
feat: add slt tests for imdb data (#16067)
kumarlokesh May 19, 2025
73e09ba
fix: stack overflow for substrait functions with large argument lists…
fmonjalet May 19, 2025
190e2bf
chore: Remove SMJ experimental status (#16072)
comphead May 19, 2025
1ad0346
chore(CI) Update workspace / CI to Rust 1.87 (#16068)
kadai0308 May 19, 2025
24f8eba
minor: Add benchmark query and corresponding documentation for Averag…
logan-keede May 19, 2025
6c304b6
feat: metadata handling for aggregates and window functions (#15911)
timsaucer May 19, 2025
9398ed2
doc: fix indent format explain (#16085)
chenkovsky May 20, 2025
f8ab49c
fix: coerce int96 resolution inside of list, struct, and map types (#…
mbutrovich May 20, 2025
9ad6fac
Update documentation for `datafusion.execution.collect_statistics` (#…
alamb May 20, 2025
795c964
fix: Add coercion rules for Float16 types (#15816)
etseidl May 20, 2025
182883a
Use qualified names on DELETE selections (#16033)
nuno-faria May 20, 2025
6043ad7
chore(deps): bump testcontainers from 0.23.3 to 0.24.0 (#15989)
dependabot[bot] May 20, 2025
39b89bd
feat: make error handling in indent explain consistent with that in t…
chenkovsky May 20, 2025
65911fe
Clean up ExternalSorter and use upstream converter (#16109)
alamb May 20, 2025
e258007
Support `GroupsAccumulator` for Avg duration (#15748)
shruti2522 May 21, 2025
fa671e7
Test Duration in `fuzz` tests (#16111)
alamb May 21, 2025
5521fa6
Move PruningStatistics into datafusion::common (#16069)
adriangb May 21, 2025
3a383ad
Revert use file schema in parquet pruning (#16086)
adriangb May 21, 2025
c9bdbb1
Make `SessionContext::register_parquet` obey `collect_statistics` con…
adriangb May 21, 2025
bc5d70b
fix: describe escaped quoted identifiers (#16082)
jfahne May 21, 2025
4506a3b
Minor: Add `ScalarFunctionArgs::return_type` method (#16113)
alamb May 21, 2025
d2837fd
feat: coerce from fixed size binary to binary view (#16110)
chenkovsky May 21, 2025
d12c8ac
Improve the DML / DDL Documentation (#16115)
alamb May 21, 2025
8b1ce38
Fix `contains` function expression (#16046)
liamzwbao May 21, 2025
e1a55bb
Optimize performance of `string::ascii` function (#16087)
tlm365 May 21, 2025
632f6bf
chore: Use materialized data for filter pushdown tests (#16123)
comphead May 21, 2025
e9601b7
chore: Upgrade rand crate and some other minor crates (#16062)
comphead May 21, 2025
3dec526
Include data types in logical plans of inferred prepare statements (#…
brayanjuls May 21, 2025
9d57b72
docs: Fix typos and minor grammatical issues in Architecture docs (#1…
patrickcsullivan May 22, 2025
e3391b6
add top-memory-consumers option in cli (#16081)
ding-young May 22, 2025
11dfd74
fix ci extended test (#16144)
2010YOUY01 May 22, 2025
44a9407
Fix: handle column name collisions when combining UNION logical input…
LiaCastaneda May 22, 2025
24ae9b3
adding support for Min/Max over LargeList and FixedSizeList (#16071)
logan-keede May 22, 2025
b7aa55e
Move prepare/parameter handling tests into `params.rs` (#16141)
liamzwbao May 22, 2025
9122abc
Add `StateFieldsArgs::return_field` (#16112)
alamb May 22, 2025
3ca0eb2
Support filtering specific sqllogictests identified by line number (#…
gabotechs May 23, 2025
1eb541b
Enrich GroupedHashAggregateStream name to ease debugging Resources ex…
ahmed-mez May 24, 2025
d0689c1
chore(deps): bump uuid from 1.16.0 to 1.17.0 (#16162)
dependabot[bot] May 24, 2025
73e75a7
Minor: Fix links in substrait readme (#16156)
alamb May 24, 2025
1ff67cb
Remove Filter::having field (#16154)
findepi May 24, 2025
938196e
Clarify docs and names in parquet predicate pushdown tests (#16155)
alamb May 24, 2025
1a1fa91
Minor: Fix name() for FilterPushdown physical optimizer rule (#16175)
adriangb May 24, 2025
b37cd0e
migrate tests in `pool.rs` to use insta (#16145)
lifan-ake May 24, 2025
3342852
refactor(optimizer): add `.with_schema` for defining test tables (#16…
atahanyorganci May 24, 2025
0ae8bba
[Minor] Speedup TPC-H benchmark run with memtable option (#16159)
Dandandan May 24, 2025
f0aef35
Fast path for joins with distinct values in build side (#16153)
Dandandan May 24, 2025
85166b0
chore: Reduce repetition in the parameter type inference tests (#16079)
jsai28 May 25, 2025
3287fe8
feat: array_length for fixed size list (#16167)
chenkovsky May 25, 2025
cde7b77
fix: remove trailing whitespace in `Display` for `LogicalPlan::Projec…
atahanyorganci May 26, 2025
3f46869
chore(deps): bump tokio from 1.45.0 to 1.45.1 (#16190)
dependabot[bot] May 26, 2025
f3227c2
Improve `unproject_sort_expr` to handle arbitrary expressions (#16127)
phillipleblanc May 27, 2025
fef7aa4
chore(deps): bump rustyline from 15.0.0 to 16.0.0 (#16194)
dependabot[bot] May 27, 2025
d19ce3c
feat: ADD sha2 spark function (#16168)
getChan May 27, 2025
2b86a5d
Add macro for creating DataFrame (#16090) (#16104)
cj-zhukov May 27, 2025
d999afa
migrate `logical_plan` tests to insta (#16184)
lifan-ake May 27, 2025
4b34d96
doc: Move `dataframe!` example into dedicated example (#16197)
comphead May 27, 2025
f29902d
chore(deps): bump clap from 4.5.38 to 4.5.39 (#16204)
dependabot[bot] May 28, 2025
c01e456
implement `AggregateExec.partition_statistics` (#15954)
UBarney May 28, 2025
814c88f
doc: add diagram to describe how DataSource, FileSource, and DataSour…
onlyjackfrost May 28, 2025
e29b050
Clarify documentation about gathering statistics for parquet files (#…
alamb May 28, 2025
fb21cea
Propagate .execute() calls immediately in `RepartitionExec` (#16093)
gabotechs May 28, 2025
f20894b
Shift from Field to FieldRef for all user defined functions (#16122)
timsaucer May 28, 2025
064f63f
Set aggregation hash seed (#16165)
ctsk May 28, 2025
520da88
feat: create builder for disk manager (#16191)
jdrouet May 29, 2025
0cdb49e
Fix ScalarStructBuilder::build() for an empty struct (#16205)
Blizzara May 29, 2025
c4271a7
Return an error on overflow in `do_append_val_inner` (#16201)
liamzwbao May 29, 2025
2f26081
Change default SQL mapping for `VARCHAR` from `Utf8` to `Utf8View` (…
zhuqi-lucas May 30, 2025
9eefac5
chore(deps): bump testcontainers-modules from 0.12.0 to 0.12.1 (#16212)
dependabot[bot] May 30, 2025
0635bd8
Substrait: handle identical grouping expressions (#16189)
cht42 May 30, 2025
76ce4d5
Add new stats pruning helpers to allow combining partition values in …
adriangb May 30, 2025
0dbb214
Implement schema adapter support for FileSource and add integration t…
kosiew May 30, 2025
3053881
Minor: update documentation for PrunableStatistics (#16213)
alamb May 30, 2025
5299a3e
Minor: Remove dead code (#16215)
alamb May 31, 2025
eb65939
Add change to VARCHAR in the upgrade guide (#16216)
alamb Jun 1, 2025
a819a55
Reduce size of `Expr` struct (#16207)
hendrikmakait Jun 2, 2025
c6b5cab
fix: metadata of join schema (#16221)
chenkovsky Jun 2, 2025
750b46a
fix: add missing row count limits to TPC-H queries (#16230)
0ax1 Jun 3, 2025
4893f50
Remove use of deprecated dict_ordered in datafusion-proto (#16218) (#…
cj-zhukov Jun 3, 2025
b90fada
debug cargo command in bench script (#16236)
2010YOUY01 Jun 3, 2025
f116e5e
Add iceberg-rust to user list (#16246)
jonathanc-n Jun 4, 2025
a16ea98
Simplify FileSource / SchemaAdapterFactory API (#16214)
alamb Jun 4, 2025
bea45bd
Prepare for 48.0.0 release: Version and Changelog (#16238)
xudong963 Jun 4, 2025
bfa7efd
Add dicts to aggregation fuzz testing (#16232)
blaginin Jun 4, 2025
166ab26
chore(deps): bump sysinfo from 0.35.1 to 0.35.2 (#16247)
dependabot[bot] Jun 4, 2025
e26ad2b
Improve performance of constant aggregate window expression (#16234)
suibianwanwank Jun 4, 2025
bd32977
Support compound identifier when parsing tuples (#16225)
hozan23 Jun 4, 2025
2a18895
Schema adapter helper (#16108)
kosiew Jun 4, 2025
084778f
Update tpch, clickbench, sort_tpch to mark failed queries (#16182)
ding-young Jun 5, 2025
dc44959
Adjust slttest to pass without RUST_BACKTRACE enabled (#16251)
alamb Jun 5, 2025
767f13b
fix: NaN semantics in GROUP BY (#16256)
chenkovsky Jun 5, 2025
43711d0
Handle dicts for distinct count (#15871)
blaginin Jun 5, 2025
ad1130b
Add `--substrait-round-trip` option in sqllogictests (#16183)
gabotechs Jun 5, 2025
d3921e5
Minor: fix upgrade papercut where structure was moved (#16264)
alamb Jun 5, 2025
a3c1c40
feat: Add Aggregate UDF to FFI crate (#14775)
timsaucer Jun 5, 2025
13881b2
feat(small): Add `BaselineMetrics` to `generate_series()` table funct…
2010YOUY01 Jun 5, 2025
bd057ac
feat: Add Window UDFs to FFI Crate (#16261)
timsaucer Jun 5, 2025
bfa5cd5
Chore: update DF48 changelog (#16269)
xudong963 Jun 6, 2025
5a468c3
chore(deps): bump sqllogictest from 0.28.2 to 0.28.3 (#16286)
dependabot[bot] Jun 6, 2025
5fa0efd
chore(deps-dev): bump webpack-dev-server (#16253)
dependabot[bot] Jun 6, 2025
a607053
Improve DataFusion subcrate readme files (#16263)
alamb Jun 6, 2025
cf58ded
Fix intermittent SQL logic test failure in limit.slt by adding ORDER …
kosiew Jun 6, 2025
0ceaa63
Extend benchmark comparison script with more detailed statistics (#16…
pepijnve Jun 6, 2025
277e0ba
feat: Support defining custom MetricValues in PhysicalPlans (#16195)
sfluor Jun 6, 2025
577e9cf
feat: add metadata to literal expressions (#16170)
timsaucer Jun 6, 2025
6bff3ca
[MAJOR] Equivalence System Overhaul (#16217)
ozankabak Jun 7, 2025
dd2a19a
Minor: Add upgrade guide for `Expr::WindowFunction` (#16313)
alamb Jun 8, 2025
7d4acdd
Fix `array_position` on empty list (#16292)
Blizzara Jun 8, 2025
708d4dd
chore(deps): bump flate2 from 1.1.1 to 1.1.2 (#16338)
dependabot[bot] Jun 9, 2025
766872f
chore(deps): bump petgraph from 0.8.1 to 0.8.2 (#16337)
dependabot[bot] Jun 9, 2025
58e0eb7
chore(deps): bump substrait from 0.56.0 to 0.57.0 (#16143)
dependabot[bot] Jun 9, 2025
bc7efe4
feat: Allow cancelling of grouping operations which are CPU bound (#1…
zhuqi-lucas Jun 9, 2025
08eb3bf
Add test for ordering of predicate pushdown into parquet (#16169)
adriangb Jun 9, 2025
4606569
Fix distinct count for DictionaryArray to correctly account for nulls…
kosiew Jun 9, 2025
726cfbc
Fix inconsistent schema projection in ListingTable even when schema i…
kosiew Jun 10, 2025
7cf40fc
Fix: mark "Spilling (to disk) Joins" as supported in features (#16343)
kosiew Jun 10, 2025
1389e9e
Add late pruning of Parquet files based on file level statistics (#16…
adriangb Jun 10, 2025
3e1c1fc
tpch: move reading of SQL queries out of timed span. (#16357)
pepijnve Jun 10, 2025
c8a46d7
chore(deps): bump clap from 4.5.39 to 4.5.40 (#16354)
dependabot[bot] Jun 11, 2025
5d684d9
chore(deps): bump syn from 2.0.101 to 2.0.102 (#16355)
dependabot[bot] Jun 11, 2025
47cf40a
Fix cp_solver doc format (#16352)
xudong963 Jun 11, 2025
a5af89a
docs: Expand `MemoryPool` docs with related structs (#16289)
2010YOUY01 Jun 11, 2025
67f3c94
Encapsulate metadata for literals on to a `FieldMetadata` structure (…
alamb Jun 11, 2025
c3443c5
Add support `UInt64` and other integer data types for `to_hex` (#16335)
tlm365 Jun 11, 2025
f9c7359
Support datafusion-cli access to public S3 buckets that do not requir…
alamb Jun 11, 2025
71d9aaa
Document `copy_array_data` function with example (#16361)
alamb Jun 11, 2025
e3304e6
Fix array_agg memory over use (#16346)
gabotechs Jun 11, 2025
391b057
Update publish command (#16377)
xudong963 Jun 11, 2025
da9ee2e
Add more context to error message for datafusion-cli config failure (…
alamb Jun 11, 2025
879d632
Fix: datafusion-sqllogictest 48.0.0 can't be published (#16376)
xudong963 Jun 11, 2025
668ef78
fix: preserve null_equals_null flag in eliminate_cross_join rule (#16…
waynexia Jun 11, 2025
1c27b80
feat: support FixedSizeList for array_has (#16333)
chenkovsky Jun 11, 2025
362fda2
bug: remove busy-wait while sort is ongoing (#16322)
pepijnve Jun 12, 2025
8254259
Document Table Constraint Enforcement Behavior in Custom Table Provid…
kosiew Jun 12, 2025
10f345a
chore: refactor Substrait consumer's "rename_field" and implement the…
Blizzara Jun 12, 2025
f3daa32
feat: Support tpch and tpch10 csv format (#16373)
zhuqi-lucas Jun 12, 2025
ce21f3a
chore(deps): bump object_store from 0.12.1 to 0.12.2 (#16368)
dependabot[bot] Jun 12, 2025
f846d72
Disable `datafusion-cli` tests in hash (#16382)
alamb Jun 12, 2025
52b5b66
Fix array_concat with NULL arrays (#16348)
alexanderbianchi Jun 12, 2025
0be53c0
doc: Add SQL examples for SEMI + ANTI Joins (#16316)
jonathanc-n Jun 12, 2025
1c929dd
Minor: add testing case for add YieldStreamExec and polish docs (#16369)
zhuqi-lucas Jun 13, 2025
bc3b91b
chore(deps): bump aws-config from 1.6.3 to 1.8.0 (#16394)
dependabot[bot] Jun 13, 2025
cfc959b
fix: Fix SparkSha2 to be compliant with Spark response and add suppor…
rishvin Jun 13, 2025
2ca67c6
fix typo in test file name (#16403)
adriangb Jun 13, 2025
3b19243
Add topk_tpch benchmark (#16410)
Dandandan Jun 14, 2025
8e0d12f
Remove some clones (#16404)
simonvandel Jun 14, 2025
f74103b
chore(deps): bump syn from 2.0.102 to 2.0.103 (#16393)
dependabot[bot] Jun 15, 2025
216e237
[datafusion-spark] Example of using Spark compatible function library…
alamb Jun 15, 2025
37a633b
Simplify expressions passed to table functions (#16388)
simonvandel Jun 15, 2025
ea4b235
Add fast paths for try_process_unnest (#16389)
simonvandel Jun 15, 2025
5562701
cleanup bench.sh usage message (#16416)
2010YOUY01 Jun 15, 2025
052b344
Add note in upgrade guide about changes to `Expr::Scalar` in 48.0.0 (…
alamb Jun 16, 2025
f3bce10
Update PMC management instructions to follow new ASF process (#16417)
alamb Jun 16, 2025
7c28959
feat: Support RightMark join for NestedLoop and Hash join (#16083)
jonathanc-n Jun 16, 2025
5b34e45
fix: Fixed error handling for `generate_series/range` (#16391)
jonathanc-n Jun 16, 2025
3e1f476
Add design process section to the docs (#16397)
alamb Jun 16, 2025
2a7b549
chore(deps): bump rust_decimal from 1.37.1 to 1.37.2 (#16422)
dependabot[bot] Jun 16, 2025
cb30dee
feat: mapping sql Char/Text/String default to Utf8View (#16290)
zhuqi-lucas Jun 17, 2025
edd7545
Migrate core test to insta, part1 (#16324)
Chen-Yuan-Lai Jun 17, 2025
4f220f0
fuzz: include dict with null values
kosiew Jun 17, 2025
35529eb
chore(deps): bump mimalloc from 0.1.46 to 0.1.47 (#16426)
dependabot[bot] Jun 17, 2025
6ce8205
chore(deps): bump libc from 0.2.172 to 0.2.173 (#16421)
dependabot[bot] Jun 17, 2025
5c8088d
Unify Metadata Handling: use `FieldMetadata` in `Expr::Alias` and `Exp…
alamb Jun 17, 2025
8a1e0f8
Dynamic filter pushdown for TopK sorts (#15770)
adriangb Jun 17, 2025
ddb5e9c
fix: Enable WASM compilation by making sqlparser's recursive-protecti…
jonmmease Jun 17, 2025
276ae31
Use dedicated NullEquality enum instead of null_equals_null boolean (…
tobixdev Jun 17, 2025
cdaf473
chore: generate basic spark function tests (#16409)
shehabgamin Jun 17, 2025
4cfa2a3
replace false with NullEqualsNothing (#16437)
ding-young Jun 18, 2025
ed29567
chore(deps): bump bzip2 from 0.5.2 to 0.6.0 (#16441)
dependabot[bot] Jun 18, 2025
c9fca98
Update Roadmap documentation (#16399)
alamb Jun 18, 2025
04c9ee3
chore(deps): bump libc from 0.2.173 to 0.2.174 (#16440)
dependabot[bot] Jun 18, 2025
410a7fc
feat: support fixed size list for array reverse (#16423)
chenkovsky Jun 18, 2025
801e43d
feat: add SchemaProvider::table_type(table_name: &str) (#16401)
epgif Jun 18, 2025
8fb7133
fix: create file for empty stream (#16342)
chenkovsky Jun 18, 2025
12118b1
Remove redundant license-header-check CI job (#16451)
alamb Jun 19, 2025
1cab2b7
. (#16449)
AdamGS Jun 19, 2025
f4cddf7
feat: enhance error reporting in AggregationFuzzTestTask with dataset…
kosiew Jun 17, 2025
ad8a694
Fix typo in expect message
kosiew Jun 18, 2025
6dcbcf0
feat: add tests for GROUP BY with MAX aggregations, including handlin…
kosiew Jun 18, 2025
ae3914e
feat: add tests for GROUP BY with MAX aggregations on dictionary arra…
kosiew Jun 18, 2025
629274a
feat: enhance GROUP BY tests with detailed assertions for dictionary …
kosiew Jun 18, 2025
fa0cd98
feat: enhance GROUP BY MAX tests with comprehensive assertions for nu…
kosiew Jun 18, 2025
f842154
feat: update test assertions for GROUP BY MAX with nulls and dictiona…
kosiew Jun 18, 2025
47eb439
feat: add tests for COUNT, SUM, MIN, MEDIAN, and FIRST/LAST_VALUE han…
kosiew Jun 18, 2025
458acdf
feat: add comprehensive tests for GROUP BY with dictionary columns ha…
kosiew Jun 18, 2025
8d09dca
fix: correct median value assertions in GROUP BY tests for dictionary…
kosiew Jun 18, 2025
dc1059b
feat: remove redundant GROUP BY tests for MAX aggregations and enhanc…
kosiew Jun 18, 2025
2260efb
feat: enhance COUNT, SUM, MIN, and MEDIAN tests for null handling wit…
kosiew Jun 18, 2025
5b1d332
feat: enhance COUNT, SUM, MIN, and MEDIAN tests for null handling wit…
kosiew Jun 18, 2025
e65936c
feat: enhance COUNT, SUM, MIN, and MEDIAN tests for null handling wit…
kosiew Jun 18, 2025
09834c2
feat: add helper functions and test data structures for handling null…
kosiew Jun 18, 2025
10804ba
Merge branch 'main' into fuzz-16266a
kosiew Jun 19, 2025
4c09b97
refactor: improve batch splitting logic in setup_test_contexts
kosiew Jun 19, 2025
1913471
refactor: add num_partitions variable, streamline setup_test_contexts…
kosiew Jun 19, 2025
0049921
refactor: simplify setup_test_contexts by introducing create_context_…
kosiew Jun 19, 2025
4b9b57a
refactor: consolidate dictionary creation functions in aggregate tests
kosiew Jun 19, 2025
da42e17
refactor: introduce string_dict_type function to simplify dictionary …
kosiew Jun 19, 2025
b330bc3
refactor: move create_context_with_partitions function to improve tes…
kosiew Jun 19, 2025
ef5e0f4
add more first_value, last_value tests
kosiew Jun 19, 2025
2f563a4
test: add fuzz tests for MAX with dictionary columns containing null …
kosiew Jun 19, 2025
8a3f964
test: add fuzz tests for MIN aggregation with timestamp and dictionar…
kosiew Jun 19, 2025
b460864
test: add COUNT and COUNT DISTINCT tests for fuzz table with dictiona…
kosiew Jun 19, 2025
b23c64d
test: add median and median distinct tests for fuzz table with numeri…
kosiew Jun 19, 2025
31e2050
test: fix first and last value null handling in aggregate tests
kosiew Jun 19, 2025
1124234
test: update expected values in count distinct test for dictionary co…
kosiew Jun 19, 2025
a244905
test: update expected values in aggregate tests for dictionary column…
kosiew Jun 19, 2025
2d7b02b
fix: improve null handling in dictionary hash calculations
kosiew Jun 19, 2025
92be3a7
test: enable generation of null values in record batch generator
kosiew Jun 19, 2025
74ed9f2
fix: remove dataset rows and sort keys from error report in Aggregati…
kosiew Jun 19, 2025
b7f3840
test: remove overlapping tests
kosiew Jun 19, 2025
87afa31
fix: correct formatting in AggregationFuzzTestTask output
kosiew Jun 19, 2025
fd2d6bb
Merge branch 'main' into fuzz-16266a
kosiew Jun 19, 2025
5ed1dfc
fix: rename variables for clarity in hash_dictionary function
kosiew Jun 20, 2025
0f6f050
fix: refactor hash_dictionary to use helper function for updating dic…
kosiew Jun 20, 2025
08fb46c
fix: add debug assertion for null percentage in RecordBatchGenerator
kosiew Jun 20, 2025
d25b703
fix: introduce create_test_dict helper function for improved dictiona…
kosiew Jun 20, 2025
00b1fe0
refactor: remove legacy create_dict function in favor of create_test_…
kosiew Jun 20, 2025
44b3f30
refactor: enhance test helper functions for improved readability and …
kosiew Jun 20, 2025
86db749
feat: add fuzz timestamp test data structure and related setup functions
kosiew Jun 20, 2025
1968a37
refactor: simplify run_snapshot_test function by removing unused para…
kosiew Jun 20, 2025
The diff you're trying to view is too large. We only load the first 3000 changed files.
46 changes: 35 additions & 11 deletions .asf.yaml
@@ -15,22 +15,46 @@
 # specific language governing permissions and limitations
 # under the License.

+# This file controls the settings of this repository
+#
+# See more details at
+# https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features
+
 notifications:
-  commits: [email protected]
-  issues: [email protected]
-  pullrequests: [email protected]
+  commits: [email protected]
+  issues: [email protected]
+  pullrequests: [email protected]
+  discussions: [email protected]
   jira_options: link label worklog
 github:
-  description: "Apache Arrow DataFusion and Ballista query engines"
-  homepage: https://arrow.apache.org/
+  description: "Apache DataFusion SQL Query Engine"
+  homepage: https://datafusion.apache.org/
   labels:
-    datafusion
-    ballista
-    bug
-    performance
+    - arrow
+    - big-data
+    - dataframe
+    - datafusion
+    - olap
+    - python
+    - query-engine
+    - rust
+    - sql
   enabled_merge_buttons:
     squash: true
     merge: false
-    rebase: false
+    rebase: false
   features:
-    issues: true
+    issues: true
+    discussions: true
+  protected_branches:
+    main:
+      required_pull_request_reviews:
+        required_approving_review_count: 1
+  pull_requests:
+    # enable updating head branches of pull requests
+    allow_update_branch: true
+
+# publishes the content of the `asf-site` branch to
+# https://datafusion.apache.org/
+publish:
+  whoami: asf-site
13 changes: 13 additions & 0 deletions .devcontainer/Dockerfile
@@ -0,0 +1,13 @@
+FROM rust:bookworm
+
+RUN apt-get update && export DEBIAN_FRONTEND=noninteractive \
+    # Remove imagemagick due to https://security-tracker.debian.org/tracker/CVE-2019-10131
+    && apt-get purge -y imagemagick imagemagick-6-common
+
+# Add protoc
+# https://datafusion.apache.org/contributor-guide/getting_started.html#protoc-installation
+RUN curl -LO https://github.com/protocolbuffers/protobuf/releases/download/v25.1/protoc-25.1-linux-x86_64.zip \
+    && unzip protoc-25.1-linux-x86_64.zip -d $HOME/.local \
+    && rm protoc-25.1-linux-x86_64.zip
+
+ENV PATH="$PATH:$HOME/.local/bin"
16 changes: 16 additions & 0 deletions .devcontainer/devcontainer.json
@@ -0,0 +1,16 @@
+{
+    "build": {
+        "dockerfile": "./Dockerfile",
+        "context": "."
+    },
+    "customizations": {
+        "vscode": {
+            "extensions": [
+                "rust-lang.rust-analyzer"
+            ]
+        }
+    },
+    "features": {
+        "ghcr.io/devcontainers/features/rust:1": "latest"
+    }
+}
22 changes: 0 additions & 22 deletions .dir-locals.el

This file was deleted.

28 changes: 2 additions & 26 deletions .dockerignore
@@ -1,26 +1,2 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-# Turn .dockerignore to .dockerallow by excluding everything and explicitly
-# allowing specific files and directories. This enables us to quickly add
-# dependency files to the docker content without scanning the whole directory.
-# This setup requires to all of our docker containers have arrow's source
-# as a mounted directory.
-
-ci
-dev
-**/target/*
+.git
+**target
27 changes: 27 additions & 0 deletions .editorconfig
@@ -0,0 +1,27 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+root = true
+
+[*]
+charset = utf-8
+end_of_line = lf
+insert_final_newline = true
+
+[*.rs]
+indent_style = space
+indent_size = 4
74 changes: 0 additions & 74 deletions .env

This file was deleted.

10 changes: 4 additions & 6 deletions .gitattributes
@@ -1,6 +1,4 @@
-r/R/RcppExports.R linguist-generated=true
-r/R/arrowExports.R linguist-generated=true
-r/src/RcppExports.cpp linguist-generated=true
-r/src/arrowExports.cpp linguist-generated=true
-r/man/*.Rd linguist-generated=true
-
+.github/ export-ignore
+datafusion/core/tests/data/newlines_in_values.csv text eol=lf
+datafusion/proto/src/generated/prost.rs linguist-generated
+datafusion/proto/src/generated/pbjson.rs linguist-generated
19 changes: 0 additions & 19 deletions .github/.dir-locals.el

This file was deleted.

72 changes: 0 additions & 72 deletions .github/CONTRIBUTING.md

This file was deleted.

27 changes: 27 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.yml
@@ -0,0 +1,27 @@
+name: Bug report
+description: Create a report to help us improve
+labels: bug
+body:
+  - type: textarea
+    attributes:
+      label: Describe the bug
+      description: Describe the bug.
+      placeholder: >
+        A clear and concise description of what the bug is.
+    validations:
+      required: true
+  - type: textarea
+    attributes:
+      label: To Reproduce
+      placeholder: >
+        Steps to reproduce the behavior:
+  - type: textarea
+    attributes:
+      label: Expected behavior
+      placeholder: >
+        A clear and concise description of what you expected to happen.
+  - type: textarea
+    attributes:
+      label: Additional context
+      placeholder: >
+        Add any other context about the problem here.
22 changes: 0 additions & 22 deletions .github/ISSUE_TEMPLATE/config.yml

This file was deleted.
