
[coalesce] Implement specialized BatchCoalescer::push_batch for StringArray #7764

Open
@alamb

Description


Is your feature request related to a problem or challenge? Please describe what you are trying to do.

The `BatchCoalescer`'s `push_batch` API incrementally builds up output arrays from the pushed batches and produces a final output.
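Here is a rough, dependency-free Rust toy model of that incremental behavior. All names are hypothetical and exist only for this sketch; it is not the arrow-rs `BatchCoalescer` API or implementation. `ToyStringArray` loosely mirrors the Arrow `StringArray` layout: an offsets buffer plus one contiguous byte buffer, with nulls ignored. `ToyStringCoalescer` provides `push_batch`, `finish`, and `next_completed`. This version copies rows one string at a time.

```rust
use std::collections::VecDeque;

/// Hypothetical toy stand-in for an Arrow `StringArray` (nulls ignored):
/// value `i` is `values[offsets[i]..offsets[i + 1]]`, so `offsets` has `len + 1` entries.
struct ToyStringArray {
    offsets: Vec<i32>,
    values: Vec<u8>,
}

impl ToyStringArray {
    fn new() -> Self {
        ToyStringArray { offsets: vec![0], values: Vec::new() }
    }

    fn from_strs(strs: &[&str]) -> Self {
        let mut array = Self::new();
        for s in strs {
            array.push_str(s);
        }
        array
    }

    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    fn value(&self, i: usize) -> &str {
        let (start, end) = (self.offsets[i] as usize, self.offsets[i + 1] as usize);
        std::str::from_utf8(&self.values[start..end]).unwrap()
    }

    /// Append a single string, copying its bytes and recording its end offset.
    fn push_str(&mut self, s: &str) {
        self.values.extend_from_slice(s.as_bytes());
        self.offsets.push(self.values.len() as i32);
    }
}

/// Hypothetical coalescer: buffers rows and emits arrays of exactly `target_rows` rows.
struct ToyStringCoalescer {
    target_rows: usize,
    in_progress: ToyStringArray,
    completed: VecDeque<ToyStringArray>,
}

impl ToyStringCoalescer {
    fn new(target_rows: usize) -> Self {
        ToyStringCoalescer { target_rows, in_progress: ToyStringArray::new(), completed: VecDeque::new() }
    }

    /// Append every row of `batch`, closing the in-progress array whenever it is full.
    fn push_batch(&mut self, batch: &ToyStringArray) {
        for i in 0..batch.len() {
            self.in_progress.push_str(batch.value(i));
            if self.in_progress.len() == self.target_rows {
                let full = std::mem::replace(&mut self.in_progress, ToyStringArray::new());
                self.completed.push_back(full);
            }
        }
    }

    /// Emit whatever is left in the in-progress array, even if it is short.
    fn finish(&mut self) {
        if self.in_progress.len() > 0 {
            let rest = std::mem::replace(&mut self.in_progress, ToyStringArray::new());
            self.completed.push_back(rest);
        }
    }

    fn next_completed(&mut self) -> Option<ToyStringArray> {
        self.completed.pop_front()
    }
}

fn main() {
    let mut coalescer = ToyStringCoalescer::new(3);
    coalescer.push_batch(&ToyStringArray::from_strs(&["a", "bb"]));
    coalescer.push_batch(&ToyStringArray::from_strs(&["ccc", "d", "ee"]));
    coalescer.finish();

    let first = coalescer.next_completed().unwrap();
    assert_eq!((first.value(0), first.value(1), first.value(2)), ("a", "bb", "ccc"));
    let second = coalescer.next_completed().unwrap();
    assert_eq!(second.len(), 2);
    assert!(coalescer.next_completed().is_none());
}
```

Output boundaries do not line up with input boundaries here: the third output row comes from the second input batch. Any specialized `push_batch` has to handle that case.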

Specialized implementations for particular array types can be quite a bit faster (30-50%, depending on the workload).

Describe the solution you'd like
Improved performance for `StringArray`, as measured by the `coalesce_kernels` benchmarks:

cargo bench --bench coalesce_kernels

Describe alternatives you've considered

For `StringArray` and `BinaryArray`, the tricky part will be avoiding copies of the string data as much as possible, for example by pre-allocating buffer space and postponing the copies until the required space is known.
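Here is a minimal, dependency-free Rust sketch of the pre-allocate-then-copy idea. The names `ToyStringArray`, `DeferredStringBuilder`, and `push_rows` are hypothetical; this is not arrow-rs code. The builder only records which row ranges of which input batch to take. At `finish`, it computes the exact number of offsets and value bytes from the offset buffers alone and allocates each output buffer once. It then copies each range's bytes with a single `extend_from_slice`. Nulls and offset overflow are ignored.

```rust
use std::ops::Range;
use std::rc::Rc;

/// Hypothetical toy stand-in for an Arrow `StringArray` (i32 offsets, nulls ignored).
struct ToyStringArray {
    offsets: Vec<i32>,
    values: Vec<u8>,
}

impl ToyStringArray {
    fn from_strs(strs: &[&str]) -> Self {
        let mut offsets = vec![0i32];
        let mut values = Vec::new();
        for s in strs {
            values.extend_from_slice(s.as_bytes());
            offsets.push(values.len() as i32);
        }
        ToyStringArray { offsets, values }
    }

    fn value(&self, i: usize) -> &str {
        let (start, end) = (self.offsets[i] as usize, self.offsets[i + 1] as usize);
        std::str::from_utf8(&self.values[start..end]).unwrap()
    }
}

/// Hypothetical builder that records which rows to take and postpones all copying to `finish`.
struct DeferredStringBuilder {
    pending: Vec<(Rc<ToyStringArray>, Range<usize>)>,
}

impl DeferredStringBuilder {
    fn new() -> Self {
        DeferredStringBuilder { pending: Vec::new() }
    }

    /// Remember rows `rows` of `batch`; no string bytes are copied here.
    fn push_rows(&mut self, batch: Rc<ToyStringArray>, rows: Range<usize>) {
        self.pending.push((batch, rows));
    }

    /// Size both buffers exactly from the offsets, allocate once, then bulk-copy each range.
    fn finish(self) -> ToyStringArray {
        let num_rows: usize = self.pending.iter().map(|(_, r)| r.len()).sum();
        let num_bytes: usize = self
            .pending
            .iter()
            .map(|(b, r)| (b.offsets[r.end] - b.offsets[r.start]) as usize)
            .sum();

        let mut offsets: Vec<i32> = Vec::with_capacity(num_rows + 1);
        let mut values: Vec<u8> = Vec::with_capacity(num_bytes);
        offsets.push(0);
        for (batch, rows) in &self.pending {
            let byte_start = batch.offsets[rows.start] as usize;
            let byte_end = batch.offsets[rows.end] as usize;
            // Shift source offsets so they point into the output values buffer.
            let base = values.len() as i32 - batch.offsets[rows.start];
            offsets.extend(batch.offsets[rows.start + 1..rows.end + 1].iter().map(|o| o + base));
            // The strings of a row range are contiguous in the source, so one copy suffices.
            values.extend_from_slice(&batch.values[byte_start..byte_end]);
        }
        ToyStringArray { offsets, values }
    }
}

fn main() {
    let b1 = Rc::new(ToyStringArray::from_strs(&["a", "bb", "ccc"]));
    let b2 = Rc::new(ToyStringArray::from_strs(&["dddd", "e"]));
    let mut builder = DeferredStringBuilder::new();
    builder.push_rows(Rc::clone(&b1), 1..3); // "bb", "ccc"
    builder.push_rows(Rc::clone(&b2), 0..2); // "dddd", "e"
    let out = builder.finish();
    assert_eq!(out.offsets, vec![0, 2, 5, 9, 10]);
    assert_eq!(out.value(2), "dddd");
}
```

Holding `Rc` references to the inputs is what makes postponing the copy possible in this sketch. A real implementation would likely need to weigh that against keeping large input buffers alive for longer.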

Additional context

Metadata


Assignees

No one assigned

    Labels

    enhancement: Any new improvement worthy of an entry in the changelog

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
