Partition file filtering logic is incorrect for logical not() function #1355

Open
@Nathan-Fenner

Description

Apache Iceberg Rust version

0.4.0 (latest version)

Describe the bug

Consider the following example table partition file:

| id  | count |
|-----|-------|
| "a" | 3     |
| "b" | 2     |
| "c" | 5     |
| "d" | 7     |
| "e" | 5     |

This partition file will have statistics:

| statistic   | column | value |
|-------------|--------|-------|
| lower_bound | count  | 2     |
| upper_bound | count  | 7     |

If I perform a query with predicate count >= 5, this partition must be included, because rows "c", "d", and "e" match.

If I perform a query with predicate not(count >= 5), this partition must still be included, because rows "a" and "b" have count < 5.
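To make the failure concrete, here is a minimal sketch of the two-valued evaluation against the statistics above. It assumes `not` is implemented as plain boolean negation of the inner result. The `Bounds` struct and the function names are hypothetical and are not the crate's API.

```rust
// Hypothetical stand-ins for the evaluator's two outcomes.
const ROWS_MIGHT_MATCH: bool = true;
const ROWS_CANNOT_MATCH: bool = false;

/// Lower and upper bounds of one column in one file (hypothetical type).
struct Bounds {
    lower: i64,
    upper: i64,
}

/// `col >= literal` can match some row only if the upper bound reaches the literal.
fn might_match_gt_eq(b: &Bounds, literal: i64) -> bool {
    if b.upper >= literal {
        ROWS_MIGHT_MATCH
    } else {
        ROWS_CANNOT_MATCH
    }
}

/// `col < literal`, the logical equivalent of `not(col >= literal)`:
/// it can match some row only if the lower bound is below the literal.
fn might_match_lt(b: &Bounds, literal: i64) -> bool {
    if b.lower < literal {
        ROWS_MIGHT_MATCH
    } else {
        ROWS_CANNOT_MATCH
    }
}

/// Assumed faulty behavior: negating a "might match" result
/// turns it into "cannot match" and prunes the file.
fn naive_not(inner: bool) -> bool {
    !inner
}
```

With bounds 2 and 7, `might_match_gt_eq` returns ROWS_MIGHT_MATCH for the literal 5, so `naive_not` yields ROWS_CANNOT_MATCH and the file is skipped, while evaluating the equivalent `count < 5` directly keeps it.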

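The logic in manifest_evaluator.rs is currently two-valued, with options ROWS_MIGHT_MATCH and ROWS_CANNOT_MATCH. With only two values, a ROWS_MIGHT_MATCH result for the inner predicate says nothing about whether some rows fail it, so the only correct behavior for not would be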

    fn not(&mut self, _inner: bool) -> crate::Result<bool> {
      // Conservative: a two-valued result cannot be negated safely.
      Ok(ROWS_MIGHT_MATCH)
    }

The tests need to be expanded in order to catch this logical error, and a three-valued logic with (MUST_MATCH, MIGHT_MATCH, CANNOT_MATCH) values probably needs to be adopted to correctly handle all of these cases.
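A rough sketch of what such a three-valued result could look like follows. The `Match` enum, its method names, and `eval_gt_eq` are hypothetical illustrations rather than existing crate types, and the sketch assumes the column has no nulls or NaNs (otherwise the "must match" case would need extra checks).

```rust
/// Hypothetical three-valued outcome of evaluating a predicate against file statistics.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Match {
    /// Every row in the file satisfies the predicate.
    Must,
    /// Some rows may satisfy it; the statistics cannot decide.
    Might,
    /// No row in the file can satisfy the predicate.
    Cannot,
}

impl Match {
    /// Negation swaps the two certain outcomes and leaves `Might` unchanged.
    fn negate(self) -> Match {
        match self {
            Match::Must => Match::Cannot,
            Match::Might => Match::Might,
            Match::Cannot => Match::Must,
        }
    }

    /// Conjunction: `Cannot` dominates, and `Must` requires both sides.
    fn and(self, other: Match) -> Match {
        match (self, other) {
            (Match::Cannot, _) | (_, Match::Cannot) => Match::Cannot,
            (Match::Must, Match::Must) => Match::Must,
            _ => Match::Might,
        }
    }

    /// Disjunction: `Must` dominates, and `Cannot` requires both sides.
    fn or(self, other: Match) -> Match {
        match (self, other) {
            (Match::Must, _) | (_, Match::Must) => Match::Must,
            (Match::Cannot, Match::Cannot) => Match::Cannot,
            _ => Match::Might,
        }
    }

    /// A file is skipped only when no row can match.
    fn should_read_file(self) -> bool {
        self != Match::Cannot
    }
}

/// Evaluate `col >= literal` from the column's bounds (assumes no nulls or NaNs).
fn eval_gt_eq(lower: i64, upper: i64, literal: i64) -> Match {
    if lower >= literal {
        Match::Must
    } else if upper < literal {
        Match::Cannot
    } else {
        Match::Might
    }
}
```

For the example file (bounds 2 and 7), `eval_gt_eq(2, 7, 5)` is `Might`, its negation is still `Might`, and the file is read for both count >= 5 and not(count >= 5). A file is pruned under negation only when the inner predicate is `Must`, for example with bounds 5 and 7.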

To Reproduce

No response

Expected behavior

No response

Willingness to contribute

I cannot contribute a fix for this bug at this time

Labels: bug