
Conversation

wrobson-js

This PR exposes a way to get the comments when parsing, using the existing Lexer.comments function. I renamed it to make it clearer that this is mutable state that gets reset every time you parse.

Testing: I have tested this in the context of the internal feature I'm writing, and I'm relying on CI to check stability across versions.

@wrobson-js force-pushed the expose-comments-from-lexer-in-parse branch from cb467c1 to b0aa304 on August 5, 2025 20:17
@wrobson-js
Author

Not sure why one of the CI builds failed, but it seems unrelated?

@patricoferris
Collaborator

Hi @wrobson-js,

Thank you for the PR. I was wondering, can we have a little more information about why this feature would be useful?

@wrobson-js
Author

wrobson-js commented Aug 6, 2025

The use case is an extender that can optionally use Driver.register_correction to automatically sort its payload alphabetically (the payload looks like a ;-separated list with very constrained syntax, which makes this feasible). The current implementation drops comments as a side effect of this process, but we'd rather not do that. So the plan is: if we detect from the AST that the input is unsorted, we re-read the input file, pull out the raw payload text, split it by semicolon and emit the correction from that, roughly as sketched below[1].
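To make that flow concrete, here is a minimal sketch of what I have in mind. The helper names are mine, not from the actual extension, and I'm assuming the usual `Driver.register_correction ~loc ~repl` shape of ppxlib's correction API; `split` stands in for the comment-aware splitting described further down.

```ocaml
(* Sketch only: re-read the source file, slice out the payload using the
   location's character offsets, and register a sorted replacement if the
   entries are out of order. [split] is the comment-aware splitter. *)
let read_file path =
  let ic = open_in_bin path in
  let contents = really_input_string ic (in_channel_length ic) in
  close_in ic;
  contents

let maybe_register_sort_correction ~split ~(loc : Ppxlib.Location.t) =
  let raw = read_file loc.loc_start.pos_fname in
  let payload_text =
    String.sub raw loc.loc_start.pos_cnum
      (loc.loc_end.pos_cnum - loc.lo_start.pos_cnum)
  in
  let entries = List.map String.trim (split payload_text) in
  let sorted = List.sort String.compare entries in
  if entries <> sorted then
    Ppxlib.Driver.register_correction ~loc ~repl:(String.concat "; " sorted)
```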

However, doing that naively breaks because comments can themselves contain semicolons, so we need to account for them. OCaml comments are a little complicated (they nest, and they have rules for string literals nested inside them), so handling this correctly in all cases requires fairly involved logic. The compiler libs already expose a mechanism for this (this one), but our build rules around ppxes specifically block depending on internal compiler libs, so we need to expose it through ppxlib instead[2].

My thought was that this shouldn't be overly controversial given ppxlib already exposes these parse functions. The use case is a little niche, but it's plausible other ppxes would want it for similar reasons. I understand not wanting to expose a way for people to assign semantic meaning to comments, but this is more about not dropping them when registering a correction. I'm not actually using the comment values, just the source offsets from their locations, to know which semicolons to ignore when splitting (see the sketch below).
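Roughly like this, assuming the comment locations have already been reduced to (start, end) character offsets within the payload text; again, the names are illustrative rather than the real code:

```ocaml
(* Sketch only: split on ';' but skip separators that fall inside any of
   the comment spans recovered from the lexer. [comment_spans] is a
   hypothetical list of (start_offset, end_offset) pairs taken from the
   comment locations. *)
let split_outside_comments ~comment_spans text =
  let in_comment i =
    List.exists (fun (start_, stop) -> i >= start_ && i < stop) comment_spans
  in
  let pieces = ref [] in
  let start = ref 0 in
  String.iteri
    (fun i c ->
      if c = ';' && not (in_comment i) then begin
        pieces := String.sub text !start (i - !start) :: !pieces;
        start := i + 1
      end)
    text;
  pieces := String.sub text !start (String.length text - !start) :: !pieces;
  List.rev !pieces
```

With that, a payload like `a; (* keep; this *) b; c` splits into three entries and the semicolon inside the comment is left alone.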

Let me know if you want more details.

[1]: re-reading the input file is definitely cursed (though not the first PPX at JS that works this way to get around the abstractness of the AST), but we think retaining comments in this case makes it a necessary evil compared to throwing them away.
[2]: the alternatives are a more naive approach to detecting where comments are that will sometimes break (which I'll probably fall back to if we can't get this merged), or essentially reimplementing the lexer rules for comments and string literals in a brittle way.
