internal/encoding/yaml: encode YAML anchors as CUE definitions #3987

Open: wants to merge 2 commits into base: master

Conversation

OmriSteiner
Contributor

Fixes #3818

There's one small open issue left to handle. Tests are currently failing with:

--- FAIL: TestUnmarshalErrors (0.00s)
    --- FAIL: TestUnmarshalErrors/test_7:_"a:_&a\n__b:_*a\n" (0.00s)
        decode_test.go:971:   yaml:
            a: &a
              b: *a
        decode_test.go:977: got <nil>; want test.yaml:2: anchor "a" value contains itself; (value #a: {
                b: #a
            }
            a: #a)
--- FAIL: TestDecoderErrors (0.00s)
    --- FAIL: TestDecoderErrors/test_7:_"a:_&a\n__b:_*a\n" (0.00s)
        decode_test.go:986:   yaml:
            a: &a
              b: *a
        decode_test.go:988: got <nil>; want test.yaml:2: anchor "a" value contains itself

I could change the code to match the old behavior of rejecting such YAML documents, since the resulting CUE document is invalid.
However, I think the original reason we errored out on such documents was that they would lead to endless recursion in the parser. That is no longer the case, so I'm debating whether the correct fix is to error out or to change the test.
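
For reference, rejecting such documents would not require unbounded recursion either. A minimal sketch, assuming a simplified node type (the `node` struct and `selfReferencing` function are hypothetical and not part of the actual decoder), tracks which anchors are currently open while walking the tree and reports an alias that points at one of them:

```go
package main

import "fmt"

// node is a hypothetical, simplified YAML node used only for this sketch.
type node struct {
	anchor   string  // non-empty if the node declares &anchor
	alias    string  // non-empty if the node is *alias
	children []*node // mapping values or sequence items
}

// selfReferencing returns the name of the first anchor whose value contains
// an alias to itself, or "" if there is none. The open set holds the anchors
// whose values are currently being walked.
// Assumption: anchor names are not redefined inside their own value.
func selfReferencing(n *node, open map[string]bool) string {
	if n == nil {
		return ""
	}
	if n.alias != "" {
		if open[n.alias] {
			return n.alias
		}
		return ""
	}
	if n.anchor != "" {
		open[n.anchor] = true
		defer delete(open, n.anchor)
	}
	for _, c := range n.children {
		if name := selfReferencing(c, open); name != "" {
			return name
		}
	}
	return ""
}

func main() {
	// Models the failing test input:
	//   a: &a
	//     b: *a
	doc := &node{children: []*node{
		{anchor: "a", children: []*node{{alias: "a"}}},
	}}
	if name := selfReferencing(doc, map[string]bool{}); name != "" {
		fmt.Printf("anchor %q value contains itself\n", name)
	}
}
```

The walk visits each node once, so it terminates even on self-referencing input. Whether to use such a check or simply emit the recursive definition is the open question here.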

@OmriSteiner OmriSteiner requested a review from cueckoo as a code owner July 7, 2025 21:31
@mvdan
Member

mvdan commented Jul 7, 2025

If recursive anchors like that no longer fail, what CUE do they result in?

@OmriSteiner
Contributor Author

> If recursive anchors like that no longer fail, what CUE do they result in?

You can see in the error above:

#a: {
    b: #a
}
a: #a

Still recursive, so it's invalid. But the encoder doesn't recurse infinitely.

@mvdan
Member

mvdan commented Jul 7, 2025

Ah I see. That's fine; I don't imagine these cases will actually show up in practice often. Indeed I think we avoided cycles because of endless recursion.

@OmriSteiner
Contributor Author

On the one hand it makes no sense to emit an invalid CUE file.
But on the other hand - perhaps there's a use-case I'm missing here?

@OmriSteiner
Contributor Author

> Ah I see. That's fine; I don't imagine these cases will actually show up in practice often. Indeed I think we avoided cycles because of endless recursion.

Fine, then I'll edit the test.

return nil, err
}

return d.addAnchorNodes(expr)
Member

Rather than buffering these for the entire document and adding them at the very top, it would be better to add them closer to where they are used. If an anchor is declared in the middle of the document, we should place the definition in roughly the same spot, not at the very start or end of the whole file.

Contributor Author

I think we discussed this when we created this issue several months ago.

Since CUE definitions are scoped, we can't place the anchor definition exactly in the scope of the original anchor, because there may be aliases outside that scope.

So there are two ways to go about this. The simpler one is to lift all anchor definitions to the top level of the CUE document. But then, is that any different from putting everything at the top or bottom?
The more complex solution is to find the innermost scope containing all of the anchor's aliases. But since this code is recursive, doing so without buffering would be quite complex. Is it worth it?

Member

To be clear, I don't mean to place them in an inner scope - the top level file scope is fine. I just mean placing them relatively close to their original position - in my example above, near the middle of the file - rather than at the start or end of the whole file.

Contributor Author

I have an implementation for this, but it's not really nice.
I can push this as a separate commit if you want to have a look.
Basically, you have to add a parameter to the recursion to distinguish handling the top-level map from handling any other map.
IMO this just makes the code more complicated for little actual benefit.
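
To illustrate the idea, here is a simplified sketch rather than the pushed patch: the `field` and `encoder` types are hypothetical, and values are rendered as plain strings instead of CUE AST nodes. Definitions are buffered only while one top-level field is being encoded, then flushed just before that field:

```go
package main

import (
	"fmt"
	"strings"
)

// field is a hypothetical, simplified YAML mapping entry.
type field struct {
	name   string
	anchor string  // set if the value declares &anchor
	alias  string  // set if the value is *alias
	scalar string  // scalar value, used when alias and fields are empty
	fields []field // nested mapping
}

// encoder buffers definitions found while encoding one top-level field.
type encoder struct {
	defs []string
}

// value renders the value of f. Anchored values are moved into a buffered
// definition and replaced by a reference to it.
func (e *encoder) value(f field, indent string) string {
	if f.alias != "" {
		return "#" + f.alias
	}
	if f.anchor != "" {
		g := f
		g.anchor = ""
		// The definition is lifted to file scope, so render it unindented.
		e.defs = append(e.defs, "#"+f.anchor+": "+e.value(g, "")+"\n")
		return "#" + f.anchor
	}
	if len(f.fields) > 0 {
		return "{\n" + e.fields(f.fields, indent+"\t", false) + indent + "}"
	}
	return f.scalar
}

// fields renders a mapping. Only the top-level call flushes the buffered
// definitions, placing them right before the field that declared them.
func (e *encoder) fields(fs []field, indent string, topLevel bool) string {
	var sb strings.Builder
	for _, f := range fs {
		v := e.value(f, indent)
		if topLevel {
			for _, d := range e.defs {
				sb.WriteString(d)
			}
			e.defs = nil
		}
		sb.WriteString(indent + f.name + ": " + v + "\n")
	}
	return sb.String()
}

func main() {
	// x: 1
	// y:
	//   z: &a 3
	// w: *a
	doc := []field{
		{name: "x", scalar: "1"},
		{name: "y", fields: []field{{name: "z", anchor: "a", scalar: "3"}}},
		{name: "w", alias: "a"},
	}
	fmt.Print((&encoder{}).fields(doc, "", true))
}
```

In this sketch `#a: 3` ends up between `x` and `y` instead of at the start of the file, and the `topLevel` flag is the extra recursion parameter in question.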

Contributor Author

Anyway, I pushed the patch. But I think it's unwise to merge it like this.
If you have any suggestions for a cleaner implementation I'm up for it.
But in the current form, I think keeping things simple is more important than being marginally better.

@OmriSteiner OmriSteiner force-pushed the retain-anchors branch 3 times, most recently from 0e46f6f to add7dce Compare July 8, 2025 18:27
@OmriSteiner
Contributor Author

I pushed some changes to make the tests pass.
I also found an additional edge case I'll have to fix, so currently working on that.

This commit supports encoding YAML documents such as:
a: &a 3
b: *a

To this CUE document:
#a: 3
a: #a
b: #a

Fixes cue-lang#3818

Signed-off-by: Omri Steiner <[email protected]>
@OmriSteiner
Contributor Author

> I pushed some changes to make the tests pass. I also found an additional edge case I'll have to fix, so currently working on that.

Alright, that should be sorted. Had a positioning issue where I accidentally called d.pos for the anchor reference after extracting the node itself.

@OmriSteiner OmriSteiner requested a review from mvdan July 16, 2025 18:41
@mvdan
Member

mvdan commented Jul 28, 2025

@OmriSteiner the tests are failing, not sure if you saw that?

@OmriSteiner
Contributor Author

> @OmriSteiner the tests are failing, not sure if you saw that?

Yes, I did.
As I wrote:

> Anyway, I pushed the patch. But I think it's unwise to merge it like this.
> If you have any suggestions for a cleaner implementation I'm up for it.
> But in the current form, I think keeping things simple is more important than being marginally better.

I only pushed the additional commit for you to look at. If you think this is worth the tradeoff with implementation complexity, then I can fix the tests. Otherwise I think it's better if I remove the last commit here (and then the tests pass).

@mvdan
Member

mvdan commented Aug 15, 2025

Ah, I had misunderstood. And apologies for the slowness. Yes, I think your latest commit is a perfectly fine approach. If you can fix the tests, I'll do a final review :)

Development

Successfully merging this pull request may close these issues.

encoding/yaml: decode anchors without expanding them
2 participants