feat: Support u32 indices for HashJoinExec #16434
base: main
cc @Dandandan

🤖: Benchmark completed

Those benchmarks make sense; it just saves memory.
Which issue does this PR close?
Rationale for this change
We can use `u32` indices instead of `u64` indices when the build side of the hash map has fewer than `u32::MAX` rows. Each stored index then takes 4 bytes instead of 8, so this acts as a memory optimization.

What changes are included in this PR?
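The memory argument can be made concrete with a small sketch. It assumes, hypothetically, one stored index per build-side row and ignores all hash-table overhead. `index_bytes` and `fits_u32_indices` are illustrative helpers, not DataFusion functions.

```rust
use std::mem::size_of;

/// Hypothetical helper: bytes taken by one stored index per build-side row,
/// ignoring any hash-table overhead.
fn index_bytes<T>(num_rows: usize) -> usize {
    num_rows * size_of::<T>()
}

/// Hypothetical rule mirroring the rationale: u32 indices are used only when
/// the build side has fewer than u32::MAX rows.
fn fits_u32_indices(num_rows: usize) -> bool {
    (num_rows as u64) < u64::from(u32::MAX)
}
```

For a build side of 10,000,000 rows, `index_bytes::<u64>` gives 80,000,000 bytes and `index_bytes::<u32>` gives 40,000,000 bytes. The saving on the stored indices is exactly half. The effect on the whole hash map is smaller, because keys and table overhead are unchanged.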
During `HashJoinExec`, we construct the `JoinLeftData` with a `Box<dyn JoinHashMapType>`. At runtime, we choose between a `JoinHashMap` that stores `u32` indices and one that stores `u64` indices.

I changed the `JoinHashMapType` trait to hold the `update_from_iter`, `get_matched_indices`, and `get_matched_indices_with_limit_offset` methods. I also split the `JoinHashMap` into `JoinHashMapU32` and `JoinHashMapU64`.

I deliberately did not expose a generic in the trait. I also did not make the `JoinHashMap` struct generic. Either choice would force every preceding function to be called with that generic parameter. That is not possible here, because the `JoinHashMapType` is only determined at runtime.

Are these changes tested?
Yes. I added a test that checks the hash map created with `u32` indices.
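The overall shape of the change can be modeled in a self-contained way: a non-generic trait object whose concrete type is picked from the build-side row count. This is a simplified, hypothetical sketch, not DataFusion's code. `IndexedJoinMap`, `MapU32`, `MapU64`, `new_join_map`, and `matched_rows` are invented names. The chained layout (one hash per row, index 0 reserved as "end of chain") is an assumption. The real `JoinHashMapType` methods named in this PR have different signatures.

```rust
use std::collections::HashMap;

/// Hypothetical, non-generic trait: callers only see `Box<dyn IndexedJoinMap>`,
/// so the index width never appears in their signatures.
trait IndexedJoinMap {
    /// Insert build-side rows given as (row index, key hash) pairs.
    fn update_from_iter(&mut self, rows: &mut dyn Iterator<Item = (usize, u64)>);
    /// Return every build-side row whose key hash equals `hash` (newest first).
    fn matched_rows(&self, hash: u64) -> Vec<u64>;
    /// Width in bytes of each stored index (4 or 8 in this sketch).
    fn index_width(&self) -> usize;
}

/// Generates a concrete chained hash map whose stored indices have type `$idx`.
/// Index 0 is reserved as the chain terminator, so row `r` is stored as `r + 1`.
macro_rules! chained_map {
    ($name:ident, $idx:ty) => {
        struct $name {
            /// key hash -> (most recently inserted row + 1)
            heads: HashMap<u64, $idx>,
            /// next[row] = (previous row with the same hash + 1), or 0
            next: Vec<$idx>,
        }

        impl $name {
            fn with_capacity(num_rows: usize) -> Self {
                Self { heads: HashMap::with_capacity(num_rows), next: vec![0; num_rows] }
            }
        }

        impl IndexedJoinMap for $name {
            fn update_from_iter(&mut self, rows: &mut dyn Iterator<Item = (usize, u64)>) {
                for (row, hash) in rows {
                    // Make this row the new head and link it to the old head.
                    let prev = self.heads.insert(hash, (row + 1) as $idx).unwrap_or(0);
                    self.next[row] = prev;
                }
            }

            fn matched_rows(&self, hash: u64) -> Vec<u64> {
                let mut out = Vec::new();
                let mut cur = self.heads.get(&hash).copied().unwrap_or(0);
                // Walk the chain until the 0 terminator.
                while cur != 0 {
                    let row = (cur - 1) as usize;
                    out.push(row as u64);
                    cur = self.next[row];
                }
                out
            }

            fn index_width(&self) -> usize {
                std::mem::size_of::<$idx>()
            }
        }
    };
}

chained_map!(MapU32, u32);
chained_map!(MapU64, u64);

/// Choose the index width at runtime from the build-side row count.
fn new_join_map(num_rows: usize) -> Box<dyn IndexedJoinMap> {
    // Fewer than u32::MAX rows: every stored value (row + 1) fits in a u32.
    if num_rows < u32::MAX as usize {
        Box::new(MapU32::with_capacity(num_rows))
    } else {
        Box::new(MapU64::with_capacity(num_rows))
    }
}
```

A macro stamps out the two concrete structs, so the trait itself carries no generic parameter, in the same spirit as the PR. Only `new_join_map` decides the index width. Everything downstream handles a `Box<dyn IndexedJoinMap>`. With four build-side rows, `new_join_map(4)` returns the `u32` variant. Both variants return the same matches for a given hash; only the per-index width differs.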