
Conversation

wrobson-js

This PR exposes a way to get the comments when parsing, using the existing Lexer.comments function. I renamed it to make it clearer that this is mutable state that gets reset every time you parse.

Testing: I have tested this in the context of the internal feature I'm writing, and I'm relying on CI to check stability across versions.

@wrobson-js force-pushed the expose-comments-from-lexer-in-parse branch from cb467c1 to b0aa304 on August 5, 2025 20:17
@wrobson-js
Author

Not sure why one of the CI builds failed, but it seems unrelated?

@patricoferris
Collaborator

Hi @wrobson-js,

Thank you for the PR. I was wondering, can we have a little more information about why this feature would be useful?

@wrobson-js
Author

wrobson-js commented Aug 6, 2025

The use case is an extender that can optionally use Driver.register_correction to automatically sort its payload alphabetically (the payload looks like a ;-separated list with very constrained syntax, which makes this feasible). The current implementation drops comments as a side effect of this process, but we'd rather not do that. So the plan is: if we detect from the AST that the input is unsorted, we re-read the input file, pull out the raw payload text, split it by semicolon and emit the correction from that, roughly as sketched below[1].
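To make that flow concrete, here is a minimal sketch of what I have in mind. The helper names are mine, not from the actual extension, and I'm assuming the usual `Driver.register_correction ~loc ~repl` shape of ppxlib's correction API; `split` stands in for the comment-aware splitting described further down.

```ocaml
(* Sketch only: re-read the source file, slice out the payload using the
   location's character offsets, and register a sorted replacement if the
   entries are out of order. [split] is the comment-aware splitter. *)
let read_file path =
  let ic = open_in_bin path in
  let contents = really_input_string ic (in_channel_length ic) in
  close_in ic;
  contents

let maybe_register_sort_correction ~split ~(loc : Ppxlib.Location.t) =
  let raw = read_file loc.loc_start.pos_fname in
  let payload_text =
    String.sub raw loc.loc_start.pos_cnum
      (loc.loc_end.pos_cnum - loc.lo_start.pos_cnum)
  in
  let entries = List.map String.trim (split payload_text) in
  let sorted = List.sort String.compare entries in
  if entries <> sorted then
    Ppxlib.Driver.register_correction ~loc ~repl:(String.concat "; " sorted)
```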

However, doing that naively breaks because comments can themselves contain semicolons, so we need to account for them. OCaml comments are a little complicated (they nest, and they have rules for string literals nested inside them), so handling this correctly in all cases requires fairly involved logic. The compiler libs already expose a mechanism for this (this one), but our build rules around ppxes specifically block depending on internal compiler libs, so we need to expose it through ppxlib instead[2].

My thought was that this shouldn't be overly controversial given ppxlib already exposes these parse functions. The use case is a little niche, but it's plausible other ppxes would want it for similar reasons. I understand not wanting to expose a way for people to assign semantic meaning to comments, but this is more about not dropping them when registering a correction. I'm not actually using the comment values, just the source offsets from their locations, to know which semicolons to ignore when splitting (see the sketch below).
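Roughly like this, assuming the comment locations have already been reduced to (start, end) character offsets within the payload text; again, the names are illustrative rather than the real code:

```ocaml
(* Sketch only: split on ';' but skip separators that fall inside any of
   the comment spans recovered from the lexer. [comment_spans] is a
   hypothetical list of (start_offset, end_offset) pairs taken from the
   comment locations. *)
let split_outside_comments ~comment_spans text =
  let in_comment i =
    List.exists (fun (start_, stop) -> i >= start_ && i < stop) comment_spans
  in
  let pieces = ref [] in
  let start = ref 0 in
  String.iteri
    (fun i c ->
      if c = ';' && not (in_comment i) then begin
        pieces := String.sub text !start (i - !start) :: !pieces;
        start := i + 1
      end)
    text;
  pieces := String.sub text !start (String.length text - !start) :: !pieces;
  List.rev !pieces
```

With that, a payload like `a; (* keep; this *) b; c` splits into three entries and the semicolon inside the comment is left alone.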

Let me know if you want more details.

[1]: re-reading the input file is definitely cursed (though not the first PPX at JS that works this way to get around the abstractness of the AST), but we think retaining comments in this case makes it a necessary evil compared to throwing them away.
[2]: the alternatives are a more naive approach to detecting where comments are that will sometimes break (which I'll probably fall back to if we can't get this merged), or essentially reimplementing the lexer rules for comments and string literals in a brittle way.
