fix(cli): avoid splitting emoji when truncating display strings #28224
feizhuzheng wants to merge 1 commit into
sanitizeForDisplay measured length with String.length and cut with substring, both of which count UTF-16 code units. When maxLength lands inside a surrogate pair (an emoji or other astral character), the pair is split and the leftover lone surrogate renders as a replacement character in notification titles/bodies and command descriptions. Use the cpLen/cpSlice helpers already defined in this file so truncation happens on code point boundaries.
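To make the failure concrete, here is a minimal sketch of the difference between code-unit and code-point truncation. The helpers `cpLenSketch` and `cpSliceSketch` are hypothetical stand-ins written for this illustration; they are assumed to behave like the file's `cpLen`/`cpSlice` but are not the project's implementation.

```typescript
// Hypothetical stand-ins for the file's cpLen/cpSlice helpers; the real
// implementations are not shown in this PR description.
function cpLenSketch(str: string): number {
  // Array.from iterates by code point, so a surrogate pair counts once.
  return Array.from(str).length;
}

function cpSliceSketch(str: string, start: number, end?: number): string {
  // Slice the code point array, then join back into a string.
  return Array.from(str).slice(start, end).join('');
}

const sample = 'ab😀cd';

const unitCount = sample.length;       // counts UTF-16 code units
const cpCount = cpLenSketch(sample);   // counts code points

// Cutting at code unit 3 lands in the middle of the emoji's surrogate pair.
const cutByUnits = sample.substring(0, 3);
// Cutting at code point 3 keeps the emoji whole.
const cutByCodePoints = cpSliceSketch(sample, 0, 3);

console.log(unitCount, cpCount);              // 6 5
console.log(cutByUnits === 'ab\uD83D');       // true: a lone high surrogate remains
console.log(cutByCodePoints === 'ab😀');      // true
```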
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request addresses an issue where display strings containing emojis or other astral characters were being incorrectly truncated. By switching from UTF-16 code unit-based string operations to code-point-aware utilities, the implementation now ensures that multi-unit characters remain intact during truncation, improving the visual consistency of terminal outputs and command descriptions.
Code Review
This pull request updates sanitizeForDisplay in textUtils.ts to use code point length (cpLen) and slicing (cpSlice) instead of UTF-16 code unit operations, preventing surrogate pairs (such as emojis) from being split during truncation. A corresponding unit test was also added. The reviewer identified a potential bug where a maxLength of less than 3 would result in a negative end index for cpSlice, which behaves differently than substring and can cause unexpected truncation behavior. A suggestion was provided to use Math.max(0, maxLength - 3) to prevent this issue.
    if (maxLength && cpLen(sanitized) > maxLength) {
      sanitized = cpSlice(sanitized, 0, maxLength - 3) + '...';
    }
Using cpSlice with maxLength - 3 can result in a negative end index if maxLength is less than 3. Unlike String.prototype.substring (which treats negative indices as 0), cpSlice uses slice under the hood, where a negative index is treated as an offset from the end of the string/array. This causes the function to return a much longer string than intended (e.g., slicing all but the last character) and append '...', leading to a bug.
To fix this, we should ensure the end index is at least 0 by using Math.max(0, maxLength - 3).
Suggested change:
    - if (maxLength && cpLen(sanitized) > maxLength) {
    -   sanitized = cpSlice(sanitized, 0, maxLength - 3) + '...';
    - }
    + if (maxLength && cpLen(sanitized) > maxLength) {
    +   sanitized = cpSlice(sanitized, 0, Math.max(0, maxLength - 3)) + '...';
    + }
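The negative-index concern can be reproduced in isolation. The sketch below uses two hypothetical functions, `truncateUnclamped` and `truncateClamped`, which are written only for this illustration and assume (as the review states) that the slicing helper delegates to `slice` semantics.

```typescript
// Hypothetical truncation helpers for illustration only; not the project's code.
function truncateUnclamped(text: string, maxLength: number): string {
  const cps = Array.from(text);
  if (maxLength && cps.length > maxLength) {
    // Array.prototype.slice treats a negative end as an offset from the end.
    return cps.slice(0, maxLength - 3).join('') + '...';
  }
  return text;
}

function truncateClamped(text: string, maxLength: number): string {
  const cps = Array.from(text);
  if (maxLength && cps.length > maxLength) {
    // Clamping the end index to 0 restores substring-like behavior.
    return cps.slice(0, Math.max(0, maxLength - 3)).join('') + '...';
  }
  return text;
}

const unclamped = truncateUnclamped('hello world', 2);
const clamped = truncateClamped('hello world', 2);

console.log(unclamped);                        // hello worl...
console.log(clamped);                          // ...
console.log('hello world'.substring(0, -1));   // prints an empty string
```

Note that with the clamp the result for `maxLength` below 3 is just `'...'`, which is still longer than `maxLength`; that matches what the previous `substring`-based code produced, so the clamp preserves existing behavior rather than introducing a stricter limit.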
📊 PR Size: size/S
Summary
sanitizeForDisplay truncates with str.length and str.substring, which count UTF-16 code units. When maxLength falls inside a surrogate pair (an emoji or other astral character), the pair is split and the leftover lone surrogate renders as a replacement character. This affects terminal notification titles/bodies and slash-command descriptions, which can contain emoji.

Details
The same file already exports cpLen and cpSlice, which operate on code points. sanitizeForDisplay now uses them, so length checks and truncation happen on code-point boundaries and a trailing emoji is either kept whole or dropped, never cut in half.
For a purely ASCII string the behavior is unchanged (cpLen/cpSlice short-circuit on the ASCII fast path).

How to Validate
Added a unit test in textUtils.test.ts covering this case (asserts the result contains no lone surrogate) alongside the existing ASCII truncation test.
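For readers who want to check the "no lone surrogate" property by hand, the following is a rough sketch of such a check. `hasLoneSurrogate` is a hypothetical helper written for this note; the actual assertion in textUtils.test.ts is not reproduced here and may be written differently.

```typescript
// Hypothetical checker, not taken from the PR's test file.
// Returns true if the string contains an unpaired UTF-16 surrogate.
function hasLoneSurrogate(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: the next unit must be a low surrogate.
      const next = s.charCodeAt(i + 1); // NaN when past the end
      if (!(next >= 0xdc00 && next <= 0xdfff)) return true;
      i++; // skip the low surrogate of a valid pair
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Low surrogate with no preceding high surrogate.
      return true;
    }
  }
  return false;
}

const input = 'emoji 😀 tail';

// Code-unit cut: index 7 falls between the two halves of the emoji.
const byUnits = input.substring(0, 7);
// Code-point cut: the emoji is the seventh code point and stays whole.
const byCodePoints = Array.from(input).slice(0, 7).join('');

console.log(hasLoneSurrogate(byUnits));       // true
console.log(hasLoneSurrogate(byCodePoints));  // false
```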