Semgrex uniqueness operation #1490

Merged
AngledLuffa merged 2 commits into dev from semgrex_sort
Jun 6, 2025
AngledLuffa
Contributor

No description provided.

stanfordnlp deleted a comment from strongerfly Jun 6, 2025
AngledLuffa merged commit 2d64892 into dev Jun 6, 2025
1 check passed
AngledLuffa deleted the semgrex_sort branch June 6, 2025 05:53
Add a UniqPattern, which removes duplicate matches based on the given node names (using the values of those nodes)
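
The deduplication described above can be pictured with a small sketch. This is not the CoreNLP implementation: it assumes each match is a plain dict from node name to that node's value, `uniq_matches` is a hypothetical helper name, and keeping the first match for each key is an assumption rather than documented behavior.

```python
def uniq_matches(matches, node_names):
    """Keep one match per distinct tuple of values of the named nodes.

    Assumption: the first match seen for a given key is the one kept.
    """
    seen = set()
    unique = []
    for match in matches:
        # The key uses only the requested node names; other nodes are ignored.
        key = tuple(match[name] for name in node_names)
        if key not in seen:
            seen.add(key)
            unique.append(match)
    return unique


# Hypothetical matches: node name -> node value
matches = [
    {"verb": "eat", "obj": "apple"},
    {"verb": "eat", "obj": "pear"},
    {"verb": "eat", "obj": "apple"},
]
by_verb = uniq_matches(matches, ["verb"])
by_both = uniq_matches(matches, ["verb", "obj"])
```

With these inputs, deduplicating on `verb` alone leaves a single match, while deduplicating on both `verb` and `obj` leaves two.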

Add unit tests for parsing the uniq search:
SemgrexParser should fail if a requested node is not in the pattern.
Also, uniq should still be usable as a node name

Test a couple of varieties of this operation

To allow uniq for a ProcessSemgrexRequest, first decode all sentences from the request, then turn the results into a response.
Flip the order of matching in ProcessSemgrexRequest so that each pattern is matched against all of the sentences at once. This allows operations on the complete batch of matches, such as the new uniq operator
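
The reordering described above can be sketched as follows. This is an illustration only, not the actual ProcessSemgrexRequest code: sentences are modeled as token lists, a pattern is a hypothetical pair of a match function and an optional batch operation, and all names are made up for the example.

```python
def process_request(patterns, sentences):
    """Pattern-outer loop: each pattern sees every sentence before any
    batch-level operation (such as uniq) is applied to its matches."""
    results = []
    for find_matches, batch_op in patterns:
        batch = []
        for sent_idx, sentence in enumerate(sentences):
            for match in find_matches(sentence):
                batch.append((sent_idx, match))
        # The batch operation runs once, over matches from all sentences.
        if batch_op is not None:
            batch = batch_op(batch)
        results.append(batch)
    return results


def find_dogs(sentence):
    # Toy matcher: one match per token equal to "dog".
    return [{"word": tok} for tok in sentence if tok == "dog"]


def uniq_by_word(batch):
    # Toy batch operation: keep the first match for each distinct word.
    seen, out = set(), []
    for sent_idx, match in batch:
        if match["word"] not in seen:
            seen.add(match["word"])
            out.append((sent_idx, match))
    return out


sentences = [["the", "dog"], ["a", "dog"]]
results = process_request(
    [(find_dogs, None), (find_dogs, uniq_by_word)], sentences
)
```

Without the batch operation the two sentences yield two matches; with it, only one survives, because the duplicate in the second sentence is visible to the same pass.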

Also refactor ProcessSemgrexRequest and make the CoreNLPServer use the refactored method as well

Add a test of uniq for ProcessSemgrexRequest as well (it should now produce only one result for two graphs, not two)