Commit 1bed04c

Optimize coalesce kernel for StringView (10-50% faster) (#7650)
# Which issue does this PR close?

- Part of #7456

# Rationale for this change

Currently the `coalesce` kernel buffers views / data until there are enough rows and then concatenates the results. `StringViewArray`s can be even worse, as there is a second copy in `gc_string_view_batch`.

This is wasteful because it:
1. Buffers memory (has 2x the peak usage)
2. Copies the data twice

We can make it faster and more memory efficient by directly creating the output array.

# What changes are included in this PR?

1. Add a specialization for incrementally building `StringViewArray` without buffering

Note this PR does NOT (yet) add specialized filtering -- instead it focuses on reducing the overhead of appending views by not copying them (again!) with `gc_string_view_batch`.

# Open questions:

1. There is substantial overlap / duplication with `StringViewBuilder` -- I wonder if we can / should consolidate them somehow

The differences are:
1. Block size calculation management (aka look at the buffer sizes of the incoming buffers)
2. Finishing the array allocates sufficient space for views

# Are there any user-facing changes?

The kernel is faster, no API changes
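
To illustrate the rationale above, here is a minimal sketch of the "directly create the output" idea. It is a toy, not the arrow-rs implementation: `ToyCoalescer`, `push_batch`, `finish`, and `coalesce_demo` are hypothetical names, and plain `Vec<String>` stands in for Arrow arrays. Rows are appended straight into the in-progress output, so input batches are never buffered and re-concatenated.

```rust
/// Hypothetical toy coalescer (an assumption for illustration, not arrow-rs code).
/// Rows are copied once, directly into the output batch being built.
struct ToyCoalescer {
    target_rows: usize,
    in_progress: Vec<String>,
    completed: Vec<Vec<String>>,
}

impl ToyCoalescer {
    fn new(target_rows: usize) -> Self {
        assert!(target_rows > 0, "target_rows must be non-zero");
        Self {
            target_rows,
            in_progress: Vec::with_capacity(target_rows),
            completed: Vec::new(),
        }
    }

    /// Append each row to the in-progress output; emit it once it is full.
    fn push_batch(&mut self, batch: &[&str]) {
        for row in batch {
            self.in_progress.push((*row).to_string());
            if self.in_progress.len() == self.target_rows {
                let full = std::mem::replace(
                    &mut self.in_progress,
                    Vec::with_capacity(self.target_rows),
                );
                self.completed.push(full);
            }
        }
    }

    /// Flush any partially filled output and return all completed batches.
    fn finish(mut self) -> Vec<Vec<String>> {
        if !self.in_progress.is_empty() {
            let rest = std::mem::take(&mut self.in_progress);
            self.completed.push(rest);
        }
        self.completed
    }
}

/// Coalesce two small input batches into outputs of (at most) 3 rows.
fn coalesce_demo() -> Vec<Vec<String>> {
    let mut coalescer = ToyCoalescer::new(3);
    coalescer.push_batch(&["a", "b"]);
    coalescer.push_batch(&["c", "d", "e"]);
    coalescer.finish()
}
```

In this sketch each row is copied exactly once, and the only memory held between calls is the single in-progress output plus completed batches.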
1 parent 7276819 commit 1bed04c

4 files changed: +915 -148 lines changed

arrow-array/src/array/byte_view_array.rs

Lines changed: 26 additions & 0 deletions
@@ -479,6 +479,32 @@ impl<T: ByteViewType + ?Sized> GenericByteViewArray<T> {
         builder.finish()
     }
 
+    /// Returns the total number of bytes used by all non inlined views in all
+    /// buffers.
+    ///
+    /// Note this does not account for views that point at the same underlying
+    /// data in buffers
+    ///
+    /// For example, if the array has three strings views:
+    /// * View with length = 9 (inlined)
+    /// * View with length = 32 (non inlined)
+    /// * View with length = 16 (non inlined)
+    ///
+    /// Then this method would report 48
+    pub fn total_buffer_bytes_used(&self) -> usize {
+        self.views()
+            .iter()
+            .map(|v| {
+                let len = (*v as u32) as usize;
+                if len > 12 {
+                    len
+                } else {
+                    0
+                }
+            })
+            .sum()
+    }
+
     /// Compare two [`GenericByteViewArray`] at index `left_idx` and `right_idx`
     ///
     /// Comparing two ByteView types are non-trivial.
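
The arithmetic in the doc comment of `total_buffer_bytes_used` can be checked with a standalone sketch that works on raw `u128` views rather than a real `GenericByteViewArray`. It assumes only what the diff itself relies on: the length sits in the low 32 bits of each view, and views of 12 bytes or fewer are inlined. The helper `view_with_len` and the wrapper `doc_comment_example` are hypothetical and exist only for this illustration.

```rust
/// Views of this length or shorter keep their bytes inline (the `12` in the diff).
const MAX_INLINE_LEN: usize = 12;

/// Hypothetical helper: build a raw view whose low 32 bits hold `len`.
/// The upper bits are filled with an arbitrary pattern to show they are ignored.
fn view_with_len(len: u32) -> u128 {
    (0xABCD_u128 << 32) | len as u128
}

/// Same computation as the method in the diff, applied to a slice of raw views.
fn total_buffer_bytes_used(views: &[u128]) -> usize {
    views
        .iter()
        .map(|v| {
            // Truncating to u32 keeps only the length field.
            let len = (*v as u32) as usize;
            if len > MAX_INLINE_LEN {
                len
            } else {
                0
            }
        })
        .sum()
}

/// The three views from the doc comment: lengths 9 (inlined), 32, and 16.
fn doc_comment_example() -> usize {
    let views = [view_with_len(9), view_with_len(32), view_with_len(16)];
    total_buffer_bytes_used(&views)
}
```

Only the 32-byte and 16-byte views contribute, giving 32 + 16 = 48. As the doc comment notes, two views pointing at the same buffer bytes are each counted, so the result can exceed the bytes actually stored.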
