Skip to content

Conversation

clarfonthey
Copy link
Contributor

@clarfonthey clarfonthey commented Aug 13, 2025

This is basically a small portion of #129411 with a smaller scope. It does not* affect any public APIs; this code is still internal to the standard library. It just moves the WTF-8 code into core and alloc so it can be accessed by no_std crates like backtrace.

* The only public API this affects is by adding a Debug implementation to std::os::windows::ffi::EncodeWide, which was not present before. This is due to the fact that core requires Debug implementations for all types, but std does not (yet) require this. Even though this was ultimately changed to be a wrapper over the original type, not a re-export, I decided to keep the Debug implementation so it remains useful.

Like we do with ordinary strings, the tests are still located entirely in alloc, rather than splitting them into core and alloc.


Reviewer note: for ease of review, this is split into three commits:

  1. Moving the original files into their new "locations"
  2. Actually modifying the code to compile.
  3. Removing aesthetic changes that were made so that the diff for commit 2 was readable.

You can review commits 1 and 3 to verify these claims, but commit 2 contains the majority of the changes you should care about.


API changes: impl Debug for std::os::windows::ffi::EncodeWide

@rustbot
Copy link
Collaborator

rustbot commented Aug 13, 2025

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added O-windows Operating system: Windows S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Aug 13, 2025
@rust-log-analyzer

This comment has been minimized.

@rustbot rustbot added A-tidy Area: The tidy tool T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) labels Aug 13, 2025
@rust-log-analyzer

This comment has been minimized.

@clarfonthey
Copy link
Contributor Author

Hmm, that appears to be a genuine rustdoc bug: if the module is marked as #[doc(hidden)], the types still show up on traits, which are broken links. Will play around with extra attributes and then file an issue.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@clarfonthey clarfonthey force-pushed the wtf8-core-alloc branch 3 times, most recently from d0674a4 to 3fc7d04 Compare August 13, 2025 23:03
Comment on lines -602 to -627
#[test]
fn wtf8_to_ascii_lowercase() {
let lowercase = Wtf8::from_str("").to_ascii_lowercase();
assert_eq!(lowercase.bytes, b"");

let lowercase = Wtf8::from_str("GrEeN gRaPeS! 🍇").to_ascii_lowercase();
assert_eq!(lowercase.bytes, b"green grapes! \xf0\x9f\x8d\x87");

let lowercase = unsafe { Wtf8::from_bytes_unchecked(b"\xED\xA0\x80").to_ascii_lowercase() };
assert_eq!(lowercase.bytes, b"\xED\xA0\x80");
assert!(!lowercase.is_known_utf8);
}

#[test]
fn wtf8_to_ascii_uppercase() {
let uppercase = Wtf8::from_str("").to_ascii_uppercase();
assert_eq!(uppercase.bytes, b"");

let uppercase = Wtf8::from_str("GrEeN gRaPeS! 🍇").to_ascii_uppercase();
assert_eq!(uppercase.bytes, b"GREEN GRAPES! \xf0\x9f\x8d\x87");

let uppercase = unsafe { Wtf8::from_bytes_unchecked(b"\xED\xA0\x80").to_ascii_uppercase() };
assert_eq!(uppercase.bytes, b"\xED\xA0\x80");
assert!(!uppercase.is_known_utf8);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to get rid of these tests because they didn't add much on top of the make_ascii_*case tests, and because the #[cfg(not(test))] on the incoherent impl Wtf8 block would have required me to make separate standalone functions for these to ensure that we're testing the correct version of Wtf8Buf. Just felt easier to delete them since the tests aren't adding a whole lot.

@@ -0,0 +1 @@
// All `wtf8` tests live in library/alloctests/tests/wtf8.rs
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str module has its own version of this so I decided to make one here too.

@@ -181,6 +88,7 @@ impl fmt::Display for Wtf8Buf {
}
}

#[cfg_attr(test, allow(dead_code))]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, all methods would be tested, but my goal was to move the code, not improve its test suite. The code originally had allow(dead_code) in the entire module, so, strictly speaking, this is an improvement.

[.., 0xED, b2 @ 0xA0..=0xAF, b3] => Some(decode_surrogate(b2, b3)),
_ => None,
}
#[cfg(not(test))]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these are due to the unfortunate way that alloctests works when testing private internals. Since we have to include a copy of the module in the tests crate, we end up having two versions of these methods implemented on the same Wtf8 type, but returning Wtf8Buf from two different crates. This is why I decided to leave standalone methods that are emitted in all cases and just wrap them outside of tests.

@@ -1046,21 +572,19 @@ impl Iterator for EncodeWide<'_> {
}
}

impl fmt::Debug for EncodeWide<'_> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.debug_struct("EncodeWide").finish_non_exhaustive()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike Wtf8CodePoints, it's not entirely clear how you'd reconstruct the original string from EncodeWide and include the unpaired surrogate, so, I decided to not bother for now. Someone can write a better debug implementation later.

@clarfonthey clarfonthey force-pushed the wtf8-core-alloc branch 2 times, most recently from 556a689 to 486f9ed Compare August 13, 2025 23:17
@rustbot
Copy link
Collaborator

rustbot commented Aug 21, 2025

This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@Mark-Simulacrum
Copy link
Member

Even though this was ultimately changed to be a wrapper over the original type, not a re-export, I decided to keep the Debug implementation so it remains useful.

Is this still a public API change? Can we allow the lint if so? I think a public API change technically would need libs-api FCP (for an insta-stable change), even if it's likely to be non-controversial...

@Mark-Simulacrum
Copy link
Member

r=me if this isn't changing public APIs, otherwise we'll need to libs-api nominate and run through FCP

@clarfonthey
Copy link
Contributor Author

clarfonthey commented Aug 21, 2025

I guess it needs a libs-api FCP for the one Debug impl…

The purpose of that lint was to make sure no new types get added to core without it, but libstd never had it, so, yeah.

Specifically, std::os::windows::ffi::EncodeWide now has a Debug impl. I could remove it, but it feels wrong to add it to core and not add it to std too.

@Mark-Simulacrum Mark-Simulacrum added needs-fcp This change is insta-stable, or significant enough to need a team FCP to proceed. I-libs-api-nominated Nominated for discussion during a libs-api team meeting. labels Aug 21, 2025
@Mark-Simulacrum
Copy link
Member

(I'd also be happy merging this with an #[allow(...)] for the lint, if you'd prefer. I'm not sure how much rebasing is likely to be needed for this code).

@clarfonthey
Copy link
Contributor Author

To be clear, right now it's not explicitly required since I turned the type into a wrapper instead of a re-export, and I felt it was reasonable to add Debug to the original type anyway. But it feels wrong to not just copy it onto the libstd type as well, and since Debug impls are required for all types in libcore, I figured it'd be uncontroversial to add to libstd too, especially since the impl in question just shows EncodeWide { .. }.

If an FCP is required, we can go with it IMHO. I'd just be making another PR to add it anyway.

@Amanieu
Copy link
Member

Amanieu commented Aug 26, 2025

We already have an accepted FCP for adding a Debug impl to EncodeWide in #140153. The PR hasn't been merged yet because it seems to be stuck on bikeshedding the style.

@Amanieu Amanieu removed the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Aug 26, 2025
@clarfonthey
Copy link
Contributor Author

From that perspective, I think it's okay to merge this without a "helpful" debug implementation then, and then that PR can decide how to actually do it.

@Mark-Simulacrum Mark-Simulacrum removed the needs-fcp This change is insta-stable, or significant enough to need a team FCP to proceed. label Aug 26, 2025
@Mark-Simulacrum
Copy link
Member

@bors r+

@bors
Copy link
Collaborator

bors commented Aug 26, 2025

📌 Commit 7c81a06 has been approved by Mark-Simulacrum

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 26, 2025
bors added a commit that referenced this pull request Aug 27, 2025
Rollup of 6 pull requests

Successful merges:

 - #142215 (Use -Zmir-opt-level=0 in tests for MIR building)
 - #143341 (Mention that casting to *const () is a way to roundtrip with from_raw_parts)
 - #145078 (Fix wrong cache line size of riscv64)
 - #145290 (Improve std::fs::read_dir docs)
 - #145335 (Move WTF-8 code from std into core and alloc)
 - #145904 (Move `riscv64-gc-unknown-linux-musl` from Tier 2 with Host tools to Tier 2)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 62e5341 into rust-lang:master Aug 27, 2025
10 checks passed
@rustbot rustbot added this to the 1.91.0 milestone Aug 27, 2025
rust-timer added a commit that referenced this pull request Aug 27, 2025
Rollup merge of #145335 - clarfonthey:wtf8-core-alloc, r=Mark-Simulacrum

Move WTF-8 code from std into core and alloc

This is basically a small portion of #129411 with a smaller scope. It *does not*\* affect any public APIs; this code is still internal to the standard library. It just moves the WTF-8 code into `core` and `alloc` so it can be accessed by `no_std` crates like `backtrace`.

> \* The only public API this affects is by adding a `Debug` implementation to `std::os::windows::ffi::EncodeWide`, which was not present before. This is due to the fact that `core` requires `Debug` implementations for all types, but `std` does not (yet) require this. Even though this was ultimately changed to be a wrapper over the original type, not a re-export, I decided to keep the `Debug` implementation so it remains useful.

Like we do with ordinary strings, the tests are still located entirely in `alloc`, rather than splitting them into `core` and `alloc`.

----

Reviewer note: for ease of review, this is split into three commits:

1. Moving the original files into their new "locations"
2. Actually modifying the code to compile.
3. Removing aesthetic changes that were made so that the diff for commit 2 was readable.

You can review commits 1 and 3 to verify these claims, but commit 2 contains the majority of the changes you should care about.

----

API changes: `impl Debug for std::os::windows::ffi::EncodeWide`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tidy Area: The tidy tool O-windows Operating system: Windows S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants