-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Optimize fixed-length regex patterns with both leading and trailing anchors #120886
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ling anchors Co-authored-by: stephentoub <[email protected]>
…present Co-authored-by: stephentoub <[email protected]>
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions |
trailingAnchor is RegexNodeKind.End or RegexNodeKind.EndZ && | ||
maxLength == rm.Tree.FindOptimizations.MinRequiredLength; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot, if trailingAnchor is RegexNodeKind.EndZ, will maxLength ever equal MinRequiredLength? Seems like "no" because EndZ has to allow for an optional \n
at the end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot, please reorder the checks to minimize cost, e.g. this can be root.ComputeMaxLength() == rm.Tree.FindOptimizations.MinRequiredLength
rather than computing the max earlier, so that we only compute it once we've verified other data we already have (like the kind of anchor in use).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're correct. I've updated the code to only apply this optimization for RegexNodeKind.End
(the \z
anchor), excluding RegexNodeKind.EndZ
(the $
anchor) since EndZ allows an optional \n
at the end. Fixed in commit 8e7b2d9.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I've reordered the checks so that ComputeMaxLength()
is only called after verifying !rtl
and trailingAnchor == RegexNodeKind.End
, minimizing the cost when those conditions aren't met. Fixed in commit 8e7b2d9.
Assert.Contains("base.pattern = \"\\n\";", actual); | ||
} | ||
|
||
[Fact] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot, we don't need this [Fact] test. Instead, just make sure that there's at least one pattern in Regex.Match.Tests that will exercise this optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I've removed the [Fact] test and added test cases to Match_MemberData_Cases
for patterns like ^1234\z
and ^abc\z
with various inputs to exercise the optimization. Fixed in commit 8e7b2d9.
Co-authored-by: stephentoub <[email protected]>
Summary
Successfully implemented and refined the optimization for regex patterns with both leading and trailing anchors and fixed length. The optimization now correctly:
^1234\z
Original prompt
Fixes #118489
💬 Share your feedback on Copilot coding agent for the chance to win a $200 gift card! Click here to start the survey.