Description
Found while reviewing #126597 for the nth and final time. Many parts of the emitter look fishy to me with respect to Unicode handling. To be honest, reviewing code related to it was quite frustrating, because the emitter makes no attempt to newtype the different units it uses, namely byte lengths/offsets, `char` (Unicode scalar value) lengths/offsets, and Unicode display widths (I hope rust-lang/annotate-snippets-rs will remedy that). That's just an aside.
The string offset acrobatics performed in `HumanEmitter::render_source_line` and `HumanEmitter::draw_line` look incredibly suspicious to me. Here are some of the places where we likely misinterpret one unit as another (byte length, char count, Unicode width):
rust/compiler/rustc_errors/src/emitter.rs, lines 733 to 736 in c22887b:

```rust
let left = margin.left(source_string.len());
// Account for unicode characters of width !=0 that were removed.
let left = source_string.chars().take(left).map(|ch| char_width(ch)).sum();
```
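To make the suspected mix-up in this snippet concrete, here is a minimal sketch. It assumes, as the code's shape suggests, that `chars().take(left).map(char_width).sum()` converts a *char count* into a display width; if `left` is really derived from the byte length, the result is off as soon as the line contains multi-byte chars. `approx_char_width`, `char_count_to_width` and `byte_offset_to_width` are hypothetical helpers made up for illustration; `approx_char_width` is a simplified stand-in for rustc's `char_width` that only treats a few East Asian wide ranges as two columns.

```rust
/// Hypothetical, simplified stand-in for rustc's `char_width`: a few East Asian
/// wide ranges count as 2 columns, everything else as 1.
fn approx_char_width(ch: char) -> usize {
    match ch as u32 {
        0x1100..=0x115F | 0x2E80..=0x303E | 0x3041..=0x33FF | 0x3400..=0x4DBF
        | 0x4E00..=0x9FFF | 0xAC00..=0xD7A3 | 0xF900..=0xFAFF | 0xFF01..=0xFF60
        | 0xFFE0..=0xFFE6 => 2,
        _ => 1,
    }
}

/// Treats `n` as a number of chars and returns the display width of that prefix.
/// This is the conversion `chars().take(n).map(width).sum()` performs.
fn char_count_to_width(s: &str, n: usize) -> usize {
    s.chars().take(n).map(approx_char_width).sum()
}

/// Treats `b` as a byte offset (must be on a char boundary) and returns the
/// display width of `s[..b]`.
fn byte_offset_to_width(s: &str, b: usize) -> usize {
    s[..b].chars().map(approx_char_width).sum()
}

fn main() {
    let line = "这是宽的。?";
    let b = line.find('?').unwrap(); // byte offset of '?'
    assert_eq!(b, 15); // five 3-byte chars precede it
    // Correct: the five wide chars before '?' occupy 10 columns.
    assert_eq!(byte_offset_to_width(line, b), 10);
    // Wrong: a byte offset fed in as a char count overshoots; `take(15)`
    // silently clamps to all 6 chars and counts the '?' as well.
    assert_eq!(char_count_to_width(line, b), 11);
    // Only the actual char count (5) yields the right width here.
    assert_eq!(char_count_to_width(line, line[..b].chars().count()), 10);
}
```

Note that `take` never complains about an oversized count, so this kind of unit confusion produces plausible-looking numbers instead of a panic.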
rust/compiler/rustc_errors/src/emitter.rs, lines 663 to 681 in c22887b:

```rust
let line_len = source_string.len();
// Create the source line we will highlight.
let left = margin.left(line_len);
let right = margin.right(line_len);
// On long lines, we strip the source line, accounting for unicode.
let mut taken = 0;
let code: String = source_string
    .chars()
    .skip(left)
    .take_while(|ch| {
        // Make sure that the trimming on the right will fall within the terminal width.
        let next = char_width(*ch);
        if taken + next > right - left {
            return false;
        }
        taken += next;
        true
    })
    .collect();
```
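The trimming loop has the same smell: `skip(left)` advances by *chars*, while the `take_while` budget `right - left` is spent in *display width*. The sketch below assumes `left` and `right` are meant to be display columns (the width budget suggests so, but I haven't confirmed it) and contrasts a copy of the snippet's logic, `trim_like_snippet`, with a hypothetical helper, `visible_slice`, that measures both edges in display columns. Both function names are made up, and `approx_char_width` is a simplified stand-in for rustc's `char_width`.

```rust
/// Hypothetical, simplified stand-in for rustc's `char_width`.
fn approx_char_width(ch: char) -> usize {
    match ch as u32 {
        0x1100..=0x115F | 0x2E80..=0x303E | 0x3041..=0x33FF | 0x3400..=0x4DBF
        | 0x4E00..=0x9FFF | 0xAC00..=0xD7A3 | 0xF900..=0xFAFF | 0xFF01..=0xFF60
        | 0xFFE0..=0xFFE6 => 2,
        _ => 1,
    }
}

/// Same shape as the quoted snippet: skip `left` chars, then keep chars while
/// their summed display width fits in `right - left`.
fn trim_like_snippet(line: &str, left: usize, right: usize) -> String {
    let mut taken = 0;
    line.chars()
        .skip(left)
        .take_while(|ch| {
            let next = approx_char_width(*ch);
            if taken + next > right - left {
                return false;
            }
            taken += next;
            true
        })
        .collect()
}

/// Hypothetical width-aware alternative: returns the chars that start at or
/// after display column `left_col` and end at or before `right_col`.
/// A wide char straddling either edge is dropped rather than split.
fn visible_slice(line: &str, left_col: usize, right_col: usize) -> &str {
    let mut col = 0; // display column where the current char starts
    let mut start = None; // byte offset of the first visible char
    let mut end = line.len(); // byte offset just past the last visible char
    for (byte_idx, ch) in line.char_indices() {
        let w = approx_char_width(ch);
        if start.is_none() && col >= left_col {
            start = Some(byte_idx);
        }
        if start.is_some() && col + w > right_col {
            end = byte_idx;
            break;
        }
        col += w;
    }
    match start {
        Some(s) => &line[s..end],
        None => "",
    }
}

fn main() {
    let line = "这是宽的。?";
    // Display-column window [4, 8) covers exactly "宽的".
    assert_eq!(visible_slice(line, 4, 8), "宽的");
    // Skipping 4 *chars* lands at display column 8 instead, showing "。?".
    assert_eq!(trim_like_snippet(line, 4, 8), "。?");
    // With ASCII both agree, since every char is one byte and one column wide.
    assert_eq!(visible_slice("abcdef", 2, 4), "cd");
    assert_eq!(trim_like_snippet("abcdef", 2, 4), "cd");
}
```

The ASCII case agreeing may be part of why tests with plain ASCII sources don't catch this.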
I gave up trying to make sense of this, having to look at all these weakly typed variables and fields of type `usize`.
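A minimal sketch of the kind of newtyping I have in mind. All names here (`ByteOffset`, `CharIdx`, `DisplayCol`, `SourceLine`, `approx_char_width`) are made up for illustration, and `approx_char_width` is a simplified stand-in for rustc's `char_width`. Each unit gets its own type, and conversions only happen through methods that also have access to the source text, so passing a byte length where a char index is expected becomes a type error instead of a silent bug.

```rust
/// Hypothetical newtypes, one per unit the emitter mixes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ByteOffset(usize);
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct CharIdx(usize);
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct DisplayCol(usize);

/// Hypothetical, simplified stand-in for rustc's `char_width`.
fn approx_char_width(ch: char) -> usize {
    match ch as u32 {
        0x1100..=0x115F | 0x2E80..=0x303E | 0x3041..=0x33FF | 0x3400..=0x4DBF
        | 0x4E00..=0x9FFF | 0xAC00..=0xD7A3 | 0xF900..=0xFAFF | 0xFF01..=0xFF60
        | 0xFFE0..=0xFFE6 => 2,
        _ => 1,
    }
}

/// Hypothetical wrapper: a source line plus the only allowed unit conversions.
struct SourceLine<'a>(&'a str);

impl SourceLine<'_> {
    fn byte_len(&self) -> ByteOffset {
        ByteOffset(self.0.len())
    }
    /// Byte offset -> char index. Panics if `b` is not on a char boundary.
    fn char_idx(&self, b: ByteOffset) -> CharIdx {
        CharIdx(self.0[..b.0].chars().count())
    }
    /// Char index -> 0-based display column of that char's left edge.
    fn display_col(&self, c: CharIdx) -> DisplayCol {
        DisplayCol(self.0.chars().take(c.0).map(approx_char_width).sum())
    }
}

fn main() {
    let line = SourceLine("这是宽的。?");
    let q = ByteOffset(line.0.find('?').unwrap());
    let c = line.char_idx(q);
    let d = line.display_col(c);
    assert_eq!(
        (line.byte_len(), q, c, d),
        (ByteOffset(16), ByteOffset(15), CharIdx(5), DisplayCol(10))
    );
    // line.display_col(line.byte_len()); // error: expected `CharIdx`, found `ByteOffset`
}
```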
However, based on these functions, I crafted a pathological input file that clearly shows something is amiss.
Example Reproducer
/*这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。这是宽的。*/?
Compiler Output
Clearly butchered:
error: expected item, found `?`
--> col.rs:1:170
|
1.|....
|^ expected item
|
= note: for a full list of items that can appear in modules, see <https://doc.rust-lang.org/reference/items.html>
error: aborting due to 1 previous error
Counterexample
Compare this to ASCII-only input:
/*aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa*/?
Compiler Output
Perfectly sensible:
error: expected item, found `?`
--> col_ascii.rs:1:335
|
1 | ...aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa*/?
| ^ expected item
|
= note: for a full list of items that can appear in modules, see <https://doc.rust-lang.org/reference/items.html>
error: aborting due to 1 previous error
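To put numbers on the two reproducers, the sketch below rebuilds both lines and computes the byte offset, the 1-based char column and the 1-based display column of the `?`. It assumes the wide line repeats `这是宽的。` 33 times and the ASCII line contains 330 `a`s, which is what the reported columns 1:170 and 1:335 imply if the `-->` column counts chars. `units_of` is a hypothetical helper and `approx_char_width` a simplified stand-in for rustc's `char_width`.

```rust
/// Hypothetical, simplified stand-in for rustc's `char_width`.
fn approx_char_width(ch: char) -> usize {
    match ch as u32 {
        0x1100..=0x115F | 0x2E80..=0x303E | 0x3041..=0x33FF | 0x3400..=0x4DBF
        | 0x4E00..=0x9FFF | 0xAC00..=0xD7A3 | 0xF900..=0xFAFF | 0xFF01..=0xFF60
        | 0xFFE0..=0xFFE6 => 2,
        _ => 1,
    }
}

/// Returns (byte offset, 1-based char column, 1-based display column) of the
/// first occurrence of `needle` in `line`.
fn units_of(line: &str, needle: char) -> (usize, usize, usize) {
    let byte = line.find(needle).expect("needle not found");
    let prefix = &line[..byte];
    let char_col = prefix.chars().count() + 1;
    let display_col = prefix.chars().map(approx_char_width).sum::<usize>() + 1;
    (byte, char_col, display_col)
}

fn main() {
    // Assumed reconstructions of the two reproducer files.
    let wide = format!("/*{}*/?", "这是宽的。".repeat(33));
    let ascii = format!("/*{}*/?", "a".repeat(330));
    // Wide line: char column 170 (as reported), but display column 335.
    assert_eq!(units_of(&wide, '?'), (499, 170, 335));
    // ASCII line: char and display columns coincide (the byte offset is 0-based).
    assert_eq!(units_of(&ascii, '?'), (334, 335, 335));
}
```

So the `?` sits at the same display column (335) in both files, and the wide line presumably needs the same amount of trimming as the ASCII one; only the char and byte counts differ. In the ASCII case all three units line up, which hides any mix-up between them.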