[libc++] Avoid duplicate LWGXYZ prefixes in status tables #148874

Open: wants to merge 1 commit into base: main

ldionne
Member

@ldionne ldionne commented Jul 15, 2025

When synchronizing the status tables with GitHub issues, we use the title of the GitHub issue as the name of the paper in the status table. However, the GitHub issue titles are prefixed with PXYZ or LWGXYZ (which is useful to quickly find papers), and that prefix is redundant in the context of status tables. This patch ensures that we don't add the redundant PXYZ or LWGXYZ prefix.
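
A minimal sketch of the prefix stripping, assuming issue titles have the form `<paper number>: <title>`. The helper name `strip_paper_prefix` and the paper numbers and titles are hypothetical and only illustrate `str.removeprefix` (available in Python 3.9 and later); the patch itself calls `removeprefix` inline.

```python
def strip_paper_prefix(paper_number: str, title: str) -> str:
    """Drop a leading '<paper_number>: ' from an issue title, if present."""
    # str.removeprefix returns the string unchanged when the prefix is
    # absent, so titles without the prefix pass through untouched.
    return title.removeprefix(paper_number + ": ")


# Hypothetical paper numbers and titles, for illustration only.
print(strip_paper_prefix("LWG1234", "LWG1234: Example issue title"))  # Example issue title
print(strip_paper_prefix("P0000R0", "P0000R0: Example paper title"))  # Example paper title
print(strip_paper_prefix("LWG1234", "Title without a prefix"))        # Title without a prefix
```

Because the prefix is built from the paper number of the same issue, a title that merely mentions another paper number elsewhere is left alone.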

As a drive-by, also specify the encoding for opening files explicitly, which fixes issues on Windows.
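
For context on the encoding change: when `encoding` is omitted, `open()` in text mode uses a locale-dependent default, which on Windows is often a legacy code page rather than UTF-8. The sketch below mirrors the patched `load_csv` and `write_csv` and round-trips a row containing a non-ASCII character. The row contents and the temporary file name are made up for illustration, and which character actually caused trouble on Windows is not stated in this PR.

```python
import csv
import pathlib
import tempfile
from typing import List


def write_csv(output: pathlib.Path, rows: List[List[str]]) -> None:
    # An explicit encoding makes the bytes on disk independent of the locale.
    with open(output, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, quoting=csv.QUOTE_ALL, lineterminator="\n")
        for row in rows:
            writer.writerow(row)


def load_csv(file: pathlib.Path) -> List[List[str]]:
    # Read with the same explicit encoding that was used for writing.
    with open(file, newline="", encoding="utf-8") as f:
        return list(csv.reader(f, delimiter=","))


# Hypothetical row with a non-ASCII character in the title.
rows = [["LWG0000", "Title with a non-ASCII character: é"]]
with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "status.csv"
    write_csv(path, rows)
    loaded = load_csv(path)

print(loaded == rows)  # True
```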

@ldionne requested a review from a team as a code owner July 15, 2025 15:48
@llvmbot added the libc++ label (libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.) Jul 15, 2025
@llvmbot
Member

llvmbot commented Jul 15, 2025

@llvm/pr-subscribers-libcxx

Author: Louis Dionne (ldionne)


Full diff: https://github.com/llvm/llvm-project/pull/148874.diff

1 Files Affected:

  • (modified) libcxx/utils/synchronize_csv_status_files.py (+4-4)
diff --git a/libcxx/utils/synchronize_csv_status_files.py b/libcxx/utils/synchronize_csv_status_files.py
index 3132857434e7a..5dbd734de7fb0 100755
--- a/libcxx/utils/synchronize_csv_status_files.py
+++ b/libcxx/utils/synchronize_csv_status_files.py
@@ -231,7 +231,7 @@ def from_github_issue(issue: Dict):# -> PaperInfo:
 
         return PaperInfo(
             paper_number=paper,
-            paper_name=issue['title'],
+            paper_name=issue['title'].removeprefix(paper + ': '),
             status=PaperStatus.from_github_issue(issue),
             meeting=issue.get('meeting Voted', None),
             first_released_version=None, # TODO
@@ -269,14 +269,14 @@ def merge(paper: PaperInfo, gh: PaperInfo) -> PaperInfo:
 
 def load_csv(file: pathlib.Path) -> List[Tuple]:
     rows = []
-    with open(file, newline='') as f:
+    with open(file, newline='', encoding='utf-8') as f:
         reader = csv.reader(f, delimiter=',')
         for row in reader:
             rows.append(row)
     return rows
 
 def write_csv(output: pathlib.Path, rows: List[Tuple]):
-    with open(output, 'w', newline='') as f:
+    with open(output, 'w', newline='', encoding='utf-8') as f:
         writer = csv.writer(f, quoting=csv.QUOTE_ALL, lineterminator='\n')
         for row in rows:
             writer.writerow(row)
@@ -417,7 +417,7 @@ def main(argv):
     # Load all the Github issues tracking papers from Github.
     if args.load_github_from:
         print(f"Loading all issues from {args.load_github_from}")
-        with open(args.load_github_from, 'r') as f:
+        with open(args.load_github_from, 'r', encoding='utf-8') as f:
             project_info = json.load(f)
     else:
         print("Loading all issues from Github")


⚠️ Python code formatter, darker found issues in your code. ⚠️

You can test this locally with the following command:
darker --check --diff -r HEAD~1...HEAD libcxx/utils/synchronize_csv_status_files.py
Diff from darker:
--- synchronize_csv_status_files.py	2025-07-15 15:41:40.000000 +0000
+++ synchronize_csv_status_files.py	2025-07-15 15:51:34.690656 +0000
@@ -229,11 +229,11 @@
         notes = extract_between_markers(issue_description, 'BEGIN-RST-NOTES', 'END-RST-NOTES')
         notes = notes.strip() if notes is not None else notes
 
         return PaperInfo(
             paper_number=paper,
-            paper_name=issue['title'].removeprefix(paper + ': '),
+            paper_name=issue["title"].removeprefix(paper + ": "),
             status=PaperStatus.from_github_issue(issue),
             meeting=issue.get('meeting Voted', None),
             first_released_version=None, # TODO
             notes=notes,
             original=issue,
@@ -267,19 +267,19 @@
         result = copy.deepcopy(paper)
     return result
 
 def load_csv(file: pathlib.Path) -> List[Tuple]:
     rows = []
-    with open(file, newline='', encoding='utf-8') as f:
-        reader = csv.reader(f, delimiter=',')
+    with open(file, newline="", encoding="utf-8") as f:
+        reader = csv.reader(f, delimiter=",")
         for row in reader:
             rows.append(row)
     return rows
 
 def write_csv(output: pathlib.Path, rows: List[Tuple]):
-    with open(output, 'w', newline='', encoding='utf-8') as f:
-        writer = csv.writer(f, quoting=csv.QUOTE_ALL, lineterminator='\n')
+    with open(output, "w", newline="", encoding="utf-8") as f:
+        writer = csv.writer(f, quoting=csv.QUOTE_ALL, lineterminator="\n")
         for row in rows:
             writer.writerow(row)
 
 def create_github_issue(paper: PaperInfo, labels: List[str]) -> None:
     """
@@ -415,11 +415,11 @@
         return
 
     # Load all the Github issues tracking papers from Github.
     if args.load_github_from:
         print(f"Loading all issues from {args.load_github_from}")
-        with open(args.load_github_from, 'r', encoding='utf-8') as f:
+        with open(args.load_github_from, "r", encoding="utf-8") as f:
             project_info = json.load(f)
     else:
         print("Loading all issues from Github")
         gh_command_line = ['gh', 'project', 'item-list', LIBCXX_CONFORMANCE_PROJECT, '--owner', 'llvm', '--format', 'json', '--limit', '9999999']
         project_info = json.loads(subprocess.check_output(gh_command_line))

@ldionne
Member Author

ldionne commented Jul 15, 2025

CC @frederick-vs-ja @H-G-Hristov

I did a few things:

  • I tweaked the script so it wouldn't add spurious prefixes to the status rows
  • I updated several GitHub issues that had incorrect, outdated or missing BEGIN-RST-NOTES/END-RST-NOTES

The way this works is that the script parses the body of each issue for the BEGIN-RST-NOTES and END-RST-NOTES markers, and then slurps the text between them into the notes for the status pages. This means that we should keep these annotations up-to-date in GitHub issues as we progress on partially-implemented papers.
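
A rough sketch of the marker extraction, for readers who have not looked at the script. The function name and call shape match the `extract_between_markers(issue_description, 'BEGIN-RST-NOTES', 'END-RST-NOTES')` call visible in the diff context, but the body below is an assumption rather than the script's actual implementation. In particular, raising on a missing end marker is a choice made for this sketch, and the issue body is invented.

```python
from typing import Optional


def extract_between_markers(text: str, begin_marker: str, end_marker: str) -> Optional[str]:
    """Return the text between the two markers, or None if there is no begin marker."""
    start = text.find(begin_marker)
    if start == -1:
        return None
    start += len(begin_marker)  # skip past the begin marker itself
    end = text.find(end_marker, start)
    if end == -1:
        # Assumption for this sketch: an unterminated block is treated as an error.
        raise ValueError(f"found {begin_marker!r} without a matching {end_marker!r}")
    return text[start:end]


# Invented issue body, for illustration only.
issue_description = """Some description of the paper.

BEGIN-RST-NOTES
Only the first part is implemented.
END-RST-NOTES
"""

notes = extract_between_markers(issue_description, "BEGIN-RST-NOTES", "END-RST-NOTES")
notes = notes.strip() if notes is not None else notes
print(notes)  # Only the first part is implemented.
```

An issue without the markers yields `None`, which is why the `strip()` call is guarded.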

For example, I added missing RST notes to #105180 and a few others, and I removed now-obsolete notes from other issues. As of right now, running the script produces no diff at all, which should mean that our status pages and our GitHub issues are 100% in sync.

If you want to try it out, I suggest you do this to avoid rate limiting by GitHub (this uses the gh command-line tool):

$ gh project item-list 31 --owner llvm --format json --limit 9999999 > issues.json
$ libcxx/utils/synchronize_csv_status_files.py --load-github-from issues.json

That way, you can run the script multiple times without re-fetching the project's issue list each time. Fetching it more than 2-3 times in a row won't work, due to rate limiting.
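
The same caching idea can be sketched in Python, loosely following the `--load-github-from` branch shown in the diff. The helper name `load_project_info` and the placeholder payload are hypothetical; the `gh` arguments are copied from the command above, with project number 31 hard-coded as an assumption.

```python
import json
import pathlib
import subprocess
import tempfile
from typing import Optional

PROJECT_NUMBER = "31"  # taken from the gh command above


def load_project_info(cached: Optional[pathlib.Path] = None) -> dict:
    """Load the project item list from a cached JSON file, or from gh if none is given."""
    if cached is not None:
        # Reading the cached file avoids another rate-limited request.
        with open(cached, "r", encoding="utf-8") as f:
            return json.load(f)
    gh_command_line = ["gh", "project", "item-list", PROJECT_NUMBER,
                       "--owner", "llvm", "--format", "json", "--limit", "9999999"]
    return json.loads(subprocess.check_output(gh_command_line))


# Placeholder payload standing in for a saved issues.json, not real gh output.
with tempfile.TemporaryDirectory() as tmp:
    cache = pathlib.Path(tmp) / "issues.json"
    cache.write_text(json.dumps({"placeholder": True}), encoding="utf-8")
    project_info = load_project_info(cache)

print(project_info)  # {'placeholder': True}
```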

Going forward, I would suggest that we strive to keep the GitHub issues up-to-date as the primary source of truth and then use the script to synchronize the status pages. This is just a suggestion, though: if you folks think another approach would work better, let's talk about it.
