Conversation

DHowett
Member

@DHowett DHowett commented Sep 9, 2025

No description provided.

@DHowett
Member Author

DHowett commented Sep 9, 2025

@lhecker I have no idea how to validate this

@DHowett
Member Author

DHowett commented Sep 9, 2025

From the Unicode 17.0.0 release

UTC 181 approved a significant change to the linebreaking algorithm that introduces a new Line_Break character property value, Unambiguous_Hyphen.
U+034F COMBINING GRAPHEME JOINER (CGJ) is not frequently used but is essential for certain situations, including in German and in Biblical Hebrew text. Although CGJ was first added to Unicode 3.2 in 2002, it has been difficult to specify stable character properties and segmentation rules for it. An analysis of the issues has now been done. A detailed history of how the handling of this character in Unicode’s specifications has evolved over the years has been added to UAX #14. See Section 6.3 of L2/24-224 for details.

That last section is titled,

UAX #14 CGJ should not break a combining character sequence

Do we need to worry about CGJ?
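
For context on the question above, here is a minimal sketch that inspects the basic properties of U+034F using Python's standard `unicodedata` module. This is only an illustration, not part of this change: `unicodedata` reflects whichever Unicode version the Python build bundles (not necessarily 17.0.0), and it does not expose Line_Break or Grapheme_Cluster_Break, so it cannot confirm the new segmentation rules. The helper name `describe` is made up for this example.

```python
import unicodedata

CGJ = "\u034f"  # U+034F COMBINING GRAPHEME JOINER


def describe(ch: str) -> dict:
    """Collect a few stable character properties for a single code point."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),  # general category
        "ccc": unicodedata.combining(ch),      # canonical combining class
    }


cgj_info = describe(CGJ)
print(cgj_info)
```

CGJ is a nonspacing mark with canonical combining class 0, which is why it can sit inside a run of combining marks without being reordered during normalization.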

@DHowett
Member Author

DHowett commented Sep 10, 2025

Well, @lhecker, the breaking rules have changed enough that even regenerating the tests from Unicode 17 results in hundreds of failures :)

@DHowett
Member Author

DHowett commented Sep 10, 2025

First off, they added tests for grapheme breaking around NULL (0x0)
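
As a rough illustration of what such a test looks like, here is a sketch that parses one line in the `GraphemeBreakTest.txt` style, where `÷` marks a break opportunity and `×` marks no break between hexadecimal code points. It is not the generator used in this repository; the function names are hypothetical, and the sample line is made up to be consistent with the rules (break after a control character, no break before an Extend character) rather than copied from the Unicode 17 data file.

```python
from typing import List, Tuple

BREAK = "\u00f7"     # ÷ : a break is allowed at this position
NO_BREAK = "\u00d7"  # × : no break at this position


def parse_break_test_line(line: str) -> Tuple[List[int], List[bool]]:
    """Split a test line into code points and break flags.

    The flags list has one more entry than the code points list,
    because the line starts and ends with a break marker.
    """
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    code_points: List[int] = []
    breaks: List[bool] = []
    for token in data.split():
        if token == BREAK:
            breaks.append(True)
        elif token == NO_BREAK:
            breaks.append(False)
        else:
            code_points.append(int(token, 16))
    return code_points, breaks


def clusters(code_points: List[int], breaks: List[bool]) -> List[List[int]]:
    """Group code points into the clusters implied by the break flags."""
    out: List[List[int]] = []
    current: List[int] = []
    for i, cp in enumerate(code_points):
        if breaks[i] and current:
            out.append(current)
            current = []
        current.append(cp)
    if current:
        out.append(current)
    return out


# Illustrative line only: NUL, then SPACE followed by COMBINING DIAERESIS.
sample = "÷ 0000 ÷ 0020 × 0308 ÷\t# illustrative, not from the UCD file"
cps, brks = parse_break_test_line(sample)
expected = clusters(cps, brks)
print(expected)
```

A segmenter under test would be expected to produce the same grouping for the same input, with NUL standing alone as its own cluster.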

…n-comment-out some of the illegal forms Leonard mentioned
@DHowett
Member Author

DHowett commented Sep 10, 2025

They introduced a bunch of weird phrases like ExtendmConjunctLinkermConjunctExtender to this version of the break test table

@check-spelling-bot Report

🔴 Please review

See the 📂 files view, the 📜action log, or 📝 job summary for details.

Unrecognized words (16)
ASAT
BISAH
CANDRA
COENG
Consonantm
Extenderm
Extendm
Linkerm
OKARA
REPA
SAMYOK
SANNYA
SHADDA
SUKU
TEDUNG
XXm
These words are not needed and should be removed: CANDRABINDU Ccc ESFCIB foob fuzzyfinder lstrcmpi oob REPH

To accept these unrecognized words as correct and remove the previously acknowledged and now absent words, you could run the following commands

... in a clone of the git@github.com:microsoft/terminal.git repository
on the dev/duhowett/unicode-17 branch (ℹ️ how do I use this?):

curl -s -S -L 'https://raw.githubusercontent.com/check-spelling/check-spelling/v0.0.25/apply.pl' |
perl - 'https://github.com/microsoft/terminal/actions/runs/17599128086/attempts/1' &&
git commit -m 'Update check-spelling metadata'
Forbidden patterns 🙅 (1)

In order to address this, you could change the content to not match the forbidden patterns (comments before forbidden patterns may help explain why they're forbidden), add patterns for acceptable instances, or adjust the forbidden patterns themselves.

These forbidden patterns matched content:

In English, duplicated words are generally mistakes

There are a few exceptions (e.g. "that that").
If the highlighted doubled word pair is in:

  • code, write a pattern to mask it.
  • prose, have someone read the English before you dismiss this error.
\s([A-Z]{3,}|[A-Z][a-z]{2,}|[a-z]{3,})\s\g{-1}\s
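
To see what the pattern above flags, here is a small sketch in Python. One assumption to note: the bot's pattern is a Perl 5 regular expression and uses the relative backreference `\g{-1}`, which Python's `re` module does not support, so the sketch substitutes the equivalent absolute backreference `\1`. The function name is hypothetical.

```python
import re

# Same pattern as above, with Perl's \g{-1} replaced by \1 for Python's re.
DOUBLED_WORD = re.compile(r"\s([A-Z]{3,}|[A-Z][a-z]{2,}|[a-z]{3,})\s\1\s")


def find_doubled_words(line: str) -> list:
    """Return each word that appears twice in a row, separated by whitespace."""
    return [m.group(1) for m in DOUBLED_WORD.finditer(line)]


hits = find_doubled_words("We regenerate the the tests from the data files.")
clean = find_doubled_words("We regenerate the tests from the data files.")
print(hits, clean)
```

The pattern only considers words of three or more letters, and it needs whitespace on both sides of the pair, so a doubled word at the very start or end of a line is not reported.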
Errors and Warnings ❌ (2)

See the 📂 files view, the 📜action log, or 📝 job summary for details.

❌ Errors and Warnings Count
❌ forbidden-pattern 6
⚠️ ignored-expect-variant 1

See ❌ Event descriptions for more information.

✏️ Contributor please read this

By default the command suggestion will generate a file named based on your commit. That's generally ok as long as you add the file to your commit. Someone can reorganize it later.

If the listed items are:

  • ... misspelled, then please correct them instead of using the command.
  • ... names, please add them to .github/actions/spelling/allow/names.txt.
  • ... APIs, you can add them to a file in .github/actions/spelling/allow/.
  • ... just things you're using, please add them to an appropriate file in .github/actions/spelling/expect/.
  • ... tokens you only need in one place and shouldn't generally be used, you can add an item in an appropriate file in .github/actions/spelling/patterns/.

See the README.md in each directory for more information.

🔬 You can test your commits without appending to a PR by creating a new branch with that extra change and pushing it to your fork. The check-spelling action will run in response to your push -- it doesn't require an open pull request. By using such a branch, you can limit the number of typos your peers see you make. 😉

If the flagged items are 🤯 false positives

If items relate to a ...

  • binary file (or some other file you wouldn't want to check at all).

    Please add a file path to the excludes.txt file matching the containing file.

    File paths are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your files.

    ^ refers to the file's path from the root of the repository, so ^README\.md$ would exclude README.md (on whichever branch you're using).

  • well-formed pattern.

    If you can write a pattern that would match it,
    try adding it to the patterns.txt file.

    Patterns are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your lines.

    Note that patterns can't match multiline strings.
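
The two checks described above can be tried locally before committing. The sketch below uses Python's `re` module as a stand-in, on the assumption that simple patterns like these behave the same as in Perl 5; the function names and the `Extendm` pattern are hypothetical and are not entries from this repository's `excludes.txt` or `patterns.txt`.

```python
import re


def is_excluded(path: str, exclude_patterns: list) -> bool:
    """True if any exclude pattern matches the repository-relative path."""
    return any(re.search(p, path) for p in exclude_patterns)


def pattern_matches(line: str, patterns: list) -> list:
    """Return the substrings of a single line matched by any pattern.

    Patterns are applied one line at a time, which is why they cannot
    match text that spans multiple lines.
    """
    found = []
    for p in patterns:
        found.extend(m.group(0) for m in re.finditer(p, line))
    return found


excludes = [r"^README\.md$"]
root_readme = is_excluded("README.md", excludes)
nested_readme = is_excluded("docs/README.md", excludes)

# Hypothetical pattern that would cover tokens starting with "Extendm".
patterns = [r"\bExtendm\w+"]
masked = pattern_matches(
    "phrases like ExtendmConjunctLinkermConjunctExtender in the table",
    patterns,
)
print(root_readme, nested_readme, masked)
```

Because `^` anchors at the repository root, the exclude pattern matches only the top-level `README.md` and leaves `docs/README.md` subject to checking.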
