Conversation

rkhachatryan
Contributor

@rkhachatryan rkhachatryan commented Sep 30, 2025

Introduce OrderedMultiSetState and three implementations (map, value, adaptive) to be used in SinkUpsertMaterializerV2.

Test coverage is currently provided on the operator level (#27070).

I'm planning to add lower-level unit tests later to this PR.

@flinkbot
Collaborator

flinkbot commented Sep 30, 2025

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • `@flinkbot run azure`: re-run the last Azure build

@rkhachatryan rkhachatryan marked this pull request as ready for review October 1, 2025 08:40
@rkhachatryan rkhachatryan requested a review from pnowojski October 1, 2025 08:40

/**
* Remove the given element. If there are multiple instances of the same element, remove the
* first one in insertion order.
Contributor

I am curious: should we allow the user to choose LIFO or FIFO for the removal?

Contributor Author

While reviewing the potential usages of this data structure (listed in the FLIP document) I couldn't find any that would require removal of the last element.

* An element was removed, it was not the most recently added, there are more elements. The
* result will not contain any elements
*/
REMOVED_OTHER
Contributor

I am curious why nothing is returned in this case, this seems inconsistent with REMOVED_LAST_ADDED which will return the element added before it.

Contributor Author

There are no use cases that require the element removed "from the middle" (or from the beginning) of the data structure.
I think generalizing the contract here to something like "always return a row except for NOTHING_REMOVED" might actually make it more confusing, because the semantics differ between cases: return the removed element in case of ALL_REMOVED, but return the new last element in case of REMOVED_LAST_ADDED.
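
To make the four outcomes discussed in this thread concrete, here is a minimal in-memory sketch. It is not the PR's state-backed code: the class names `RemovalSemanticsSketch`, `RemovalResult`, and `InMemoryOrderedMultiSet`, as well as the enum name `RemovalResultType`, are assumptions made for illustration. Timestamps are left out and elements are assumed to be non-null. Only the four result names and their described payloads come from the discussion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Illustrative in-memory sketch only; NOT the state-backed implementation from this PR. */
final class RemovalSemanticsSketch {

    /** Result kinds mentioned in this thread; the enum name itself is an assumption. */
    enum RemovalResultType {
        NOTHING_REMOVED,
        ALL_REMOVED,
        REMOVED_LAST_ADDED,
        REMOVED_OTHER
    }

    /** Hypothetical result holder: a result type plus an optional payload. */
    static final class RemovalResult<T> {
        final RemovalResultType type;
        final Optional<T> payload;

        RemovalResult(RemovalResultType type, Optional<T> payload) {
            this.type = type;
            this.payload = payload;
        }
    }

    /** Hypothetical list-backed multiset that keeps insertion order (timestamps omitted). */
    static final class InMemoryOrderedMultiSet<T> {
        private final List<T> elements = new ArrayList<>();

        void append(T element) {
            elements.add(element);
        }

        RemovalResult<T> remove(T element) {
            // indexOf finds the first occurrence, i.e. the earliest inserted duplicate.
            int index = elements.indexOf(element);
            if (index < 0) {
                return new RemovalResult<>(RemovalResultType.NOTHING_REMOVED, Optional.empty());
            }
            boolean wasLastAdded = index == elements.size() - 1;
            T removed = elements.remove(index);
            if (elements.isEmpty()) {
                // Nothing remains: hand back the removed element.
                return new RemovalResult<>(RemovalResultType.ALL_REMOVED, Optional.of(removed));
            }
            if (wasLastAdded) {
                // The tail changed: hand back the element that is now last.
                T newLast = elements.get(elements.size() - 1);
                return new RemovalResult<>(
                        RemovalResultType.REMOVED_LAST_ADDED, Optional.of(newLast));
            }
            // Removal from the beginning or middle: no payload.
            return new RemovalResult<>(RemovalResultType.REMOVED_OTHER, Optional.empty());
        }
    }
}
```

In this sketch a caller that only tracks the latest element needs to react to ALL_REMOVED and REMOVED_LAST_ADDED, and can ignore the other two results.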

SizeChangeInfo append(T element, long timestamp) throws Exception;

/** Get iterator over all remaining elements and their timestamps, in order of insertion. */
Iterator<Tuple2<T, Long>> iterator() throws Exception;
Contributor

@davidradl davidradl Oct 2, 2025

If the multiset changes (i.e. there is a removal) under the iterator, what will happen? A unit test for this would be good. It would also be useful to understand any locking that has been considered or is in place.

Contributor Author

This depends on the implementation:

  • in case of LinkedMultiSetState, the iteration is over a copy of the state (so change has no impact)
  • in case of ValueStateMultiSetState, it uses ArrayList.iterator(), which is fail-fast

But since the client code should not make any assumptions about the implementation, I don't think this should be part of the contract.
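
The two behaviours described above can be reproduced with plain JDK collections. The sketch below does not use the PR's classes: `IterationBehaviorSketch` and both method names are hypothetical, and an `ArrayList` stands in for the underlying state. Note that the JDK documents fail-fast behaviour as best-effort, so it is a debugging aid and not a guarantee.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

/** Illustrative sketch only; a plain ArrayList stands in for the state. */
final class IterationBehaviorSketch {

    /** Iterates over a snapshot, so a removal from the backing list is not observed. */
    static List<String> iterateOverCopy(List<String> backing) {
        List<String> seen = new ArrayList<>();
        Iterator<String> it = new ArrayList<>(backing).iterator();
        boolean removed = false;
        while (it.hasNext()) {
            seen.add(it.next());
            if (!removed) {
                // Mutate the backing list once while iteration is in progress.
                backing.remove(backing.size() - 1);
                removed = true;
            }
        }
        return seen;
    }

    /** Iterates the list directly; returns true if the iterator fails fast. */
    static boolean directIterationFailsFast(List<String> backing) {
        Iterator<String> it = backing.iterator();
        it.next();
        // Structural modification that bypasses the iterator.
        backing.remove(backing.size() - 1);
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }
}
```

Both methods expect a mutable list with at least two elements. Code that must behave the same with either implementation can avoid modifying the multiset while an iterator is open.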

Contributor

@pnowojski pnowojski left a comment

(Partial review; I haven't yet reviewed the linked variant and the tests.)

@github-actions github-actions bot added the community-reviewed PR has been reviewed by the community. label Oct 3, 2025
Contributor

@pnowojski pnowojski left a comment

Thanks for the update. As discussed offline, I've shared some pointers to test coverage that may still be missing.

Contributor

@pnowojski pnowojski left a comment

Thanks for the update. LGTM % green build & assuming comments from @davidradl are also resolved

@rkhachatryan
Contributor Author

Thanks a lot for the reviews!
I'll squash the commits, rebase, and merge the PR unless there are any objections.

@rkhachatryan rkhachatryan merged commit 83db8f7 into apache:master Oct 15, 2025
Labels

community-reviewed PR has been reviewed by the community.

4 participants