CommandScreen: handle multi-byte UTF-8 code sequences while wrapping #1726

ryenus · 2025-06-22T14:22:58Z

In the earlier implementation of the command screen, the process command was treated as a sequence of single-byte characters. When the command contains multi-byte UTF-8 sequences, such as wide characters, part of a sequence could be wrapped onto the next line.

To fix that, we now use mbrtowc and wcwidth to determine the byte count and display width of non-ASCII characters, so that wrapping happens near the window edge and, more importantly, at character boundaries.
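As a rough illustration of the approach described above (not the code from this PR), the sketch below counts how many bytes of a command fit into a given number of columns without splitting a multi-byte character. The name fitBytes and the fallback of treating invalid or unprintable input as one byte and one column are assumptions made for this example.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* exposes wcwidth() on most libcs */
#endif
#include <locale.h> /* callers select a UTF-8 locale via setlocale() */
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not part of htop): returns the number of bytes of
 * `s` that fit into `max_cols` terminal columns, stopping only at
 * character boundaries. Assumes LC_CTYPE is already a UTF-8 locale.
 */
static size_t fitBytes(const char* s, size_t len, int max_cols) {
   mbstate_t state;
   memset(&state, 0, sizeof(state));

   size_t offset = 0;
   int cols = 0;
   while (offset < len) {
      wchar_t wc;
      size_t bytes = mbrtowc(&wc, s + offset, len - offset, &state);
      int width;
      if (bytes == (size_t)-1 || bytes == (size_t)-2 || bytes == 0) {
         /* Invalid, truncated, or NUL input: consume a single byte.
          * Counting it as one column is an assumption of this sketch. */
         memset(&state, 0, sizeof(state));
         bytes = 1;
         width = 1;
      } else {
         width = wcwidth(wc);
         if (width < 0)
            width = 1; /* unprintable: placeholder width (assumption) */
      }
      if (cols + width > max_cols)
         break; /* wrap before this character, never inside it */
      offset += bytes;
      cols += width;
   }
   return offset;
}
```

A caller would typically run setlocale(LC_CTYPE, "") once at startup and then call fitBytes repeatedly on the remaining text. The function returns 0 when even the first character is wider than max_cols, so a caller needs to handle that case to avoid looping forever.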

@BenBE Would you mind having a look at this when you get a chance?

size line buffer for wide chars w/ arbitrary marks

This ensures sufficient space to handle UTF-8 code sequences, especially Unicode characters with arbitrary combining marks or diacritics, that might otherwise cause buffer overflow errors.

Explorer09 · 2025-06-22T15:43:59Z

I have been working on a few PRs (pull requests) to bring Unicode character width support to htop, so you are not the first to propose the idea.

There are many places in the htop codebase that need to be upgraded for Unicode character width support, so it would be better if your width calculation function were not limited to use in CommandScreen.c.

To avoid duplicate effort, maybe you could look at my PR #1642 and see what you can do with the width calculation.

Explorer09 · 2025-06-22T15:47:28Z

CommandScreen.c

-      int width = wcwidth(wc);
-      if (bytes == (size_t)-1 || bytes == (size_t)-2 || bytes == 0 || width < 0) {
+      unsigned char c = (unsigned char)p[i];
+      if (c < 0x80) { // skip mbrtowc for ASCII characters


This lacks checking or assertion on control characters.

ryenus · 2025-06-23

Could you please clarify what the expected behavior is? I'm asking because, as far as I know, it has always worked this way, even in the main screen. For now, I'd prefer to focus on resolving the line wrapping issue first, one step at a time. 😊

Explorer09 · 2025-06-23

Code points in the range [0x00, 0x1F] and the code point 0x7F are unprintable. You shouldn't assume the width would be 1 here.
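To make the point about control characters concrete, here is a minimal sketch of an ASCII fast path that only reports a width of 1 for printable bytes. The function name asciiColumnWidth and the choice to return -1 for everything else are assumptions for illustration. How the caller should render such bytes is not decided in this thread.

```c
/*
 * Hypothetical helper (not part of htop): width of a single byte on the
 * ASCII fast path. Printable ASCII (0x20..0x7E) occupies one column.
 * Control characters (0x00..0x1F and 0x7F) and non-ASCII bytes return -1
 * so the caller must handle them explicitly instead of assuming width 1.
 */
static int asciiColumnWidth(unsigned char c) {
   if (c >= 0x20 && c < 0x7F)
      return 1;
   return -1;
}
```

Returning -1 follows the convention wcwidth uses for non-printable characters, which lets the ASCII and multi-byte paths share the same error check.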

Explorer09 · 2025-06-23T12:37:51Z

CommandScreen.c

+            last_spc_offset = line_offset;
+            last_spc_cols = line_cols;
+         }
+         bytes = width = 1;


Code style issue. Don't chain assignments like this, especially when the two variables have different types (size_t and int).

Write the two assignments on separate lines.
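The requested style could look like the sketch below. The wrapper function resetToSingleColumn is hypothetical and exists only to make the snippet self-contained. In the PR, bytes and width are local variables.

```c
#include <stddef.h>

/* Hypothetical wrapper for illustration; names mirror the diff above. */
static void resetToSingleColumn(size_t* bytes, int* width) {
   *bytes = 1; /* size_t: number of bytes consumed */
   *width = 1; /* int: number of terminal columns */
}
```

Keeping the assignments separate avoids an implicit conversion between int and size_t in the chained form.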

ryenus added 3 commits June 20, 2025 13:51

CommandScreen: wrap at CJK character boundaries

2514b54

skip UTF-8 codepoint decoding for ASCII characters

d57bd98

ryenus changed the title ~~Wrap cmd wide~~ CommandScreen: handle multi-byte UTF-8 code sequences while wrapping Jun 22, 2025

ryenus force-pushed the wrap-cmd-wide branch 2 times, most recently from ebbfd75 to fe82f5a on June 22, 2025 14:34

add missing include and ifdef guard for CRT_utf8

5529d2b

ryenus force-pushed the wrap-cmd-wide branch 2 times, most recently from cf02101 to 4f202bb on June 22, 2025 15:26

Explorer09 reviewed Jun 22, 2025

check mbrtowc before char width calculation

8dc7e12

ryenus force-pushed the wrap-cmd-wide branch from 4f202bb to 8dc7e12 on June 23, 2025 12:04

Explorer09 reviewed Jun 23, 2025
