Skip to content

Fix linkcheck anchor encoding issue #13621

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

nrdlngr
Copy link

@nrdlngr nrdlngr commented Jun 6, 2025

Fix linkcheck anchor encoding issue (#13620)

Description

This PR fixes an issue where the linkcheck builder incorrectly reports "Anchor not found" errors
for URLs with encoded characters in fragment identifiers (anchors), despite these URLs working
correctly in web browsers.

Current Behavior

When encountering a URL with percent-encoded characters in the anchor/fragment (e.g.,
https://example.com/page#standard-input%2Foutput-stdio), the linkcheck builder:

  1. Extracts the fragment: standard-input%2Foutput-stdio
  2. Decodes it to: standard-input/output-stdio
  3. Searches for an HTML element with id="standard-input/output-stdio" or
    name="standard-input/output-stdio"
  4. Reports a broken link when the element isn't found, even though the URL works in browsers

Changes Made

  • Enhanced AnchorCheckParser to check for multiple variants of the anchor:
    • The decoded version (current behavior)
    • The original encoded version
    • A re-encoded version if the decoded version contains encoding-required characters
  • Added comprehensive tests to verify the new behavior
  • Updated the contains_anchor function to accept both decoded and original encoded anchors
  • Added entry to CHANGES.rst

Testing Done

  • Added unit tests for the AnchorCheckParser class
  • Added integration tests with a mock HTTP server that serves HTML with encoded anchors
  • Verified that all tests pass with the new implementation

Fixes

Fixes #13620

@nrdlngr nrdlngr force-pushed the fix-linkcheck-anchor-encoding branch 3 times, most recently from 98f3a53 to f896df9 Compare June 6, 2025 17:42
- Enhanced AnchorCheckParser to handle multiple anchor variations
- Added comprehensive test coverage for encoded anchors
- Fixed false 'Anchor not found' errors for URLs with encoded characters
- Maintains full backward compatibility
- All linting checks pass
@nrdlngr nrdlngr force-pushed the fix-linkcheck-anchor-encoding branch from f896df9 to 5265662 Compare June 6, 2025 17:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Linkcheck Builder: False Positive "Anchor not found" Errors with Encoded Characters in Fragment Identifiers
1 participant