New issue
Advanced search Search tips

Issue 757476 link

Starred by 0 users

Issue metadata

Status: Fixed
Owner:
Closed: Aug 2017
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 3
Type: Bug



Sign in to add a comment

libdot: lib.wc.substr incorrectly splits surrogate pairs

Project Member Reported by vapier@chromium.org, Aug 21 2017

Issue description

since JS uses UTF-16 for string encoding, it uses surrogate pairs to encode high elements, and native string operations count code units.  since lib.wc.substr is currently written based on charCodeAt, we end up incorrectly splitting surrogate pairs.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/charCodeAt

example:
U+2099d 𠦝 lib.wc.substr('\u{2099d}', 0, 1) -> "�"
U+1f60e 😎 lib.wc.substr('\u{1f60e}', 0, 1) -> "�"

ideally lib.wc.substr should be operating on graphemes, but that's a bit more work (and tracked elsewhere), so we should be able to fix up this one func in the meantime.
 
Project Member

Comment 1 by bugdroid1@chromium.org, Aug 21 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/apps/libapps/+/02dbf77169360ddf936d0050b3d63227db468b03

commit 02dbf77169360ddf936d0050b3d63227db468b03
Author: Mike Frysinger <vapier@chromium.org>
Date: Mon Aug 21 23:22:38 2017

libdot: wc: fix substring handling of surrogate pairs

The lib.wc.strWidth function was operating on codeunits (charCodeAt)
instead of codepoints (codePointAt) causing it to incorrectly count
and split surrogate pairs.  This doesn't normally matter as JS uses
UTF-16 for strings which means the two are the same for codepoints
in U+FFFF and lower (which are the most common), but when processing
codepoints higher than that (like U+2099d 𠦝 or U+1f60e 😎), we would
chop off a following character one too earlier.

Rewrite the function to operate on codepoints like we already do in
lib.wc.strWidth.

BUG= chromium:757476 

Change-Id: I5891f56bcf1b26bc5639ee88c25de8560c1fd0cf
Reviewed-on: https://chromium-review.googlesource.com/624195
Tested-by: Mike Frysinger <vapier@chromium.org>
Reviewed-by: Brandon Gilmore <varz@google.com>

[modify] https://crrev.com/02dbf77169360ddf936d0050b3d63227db468b03/libdot/third_party/wcwidth/lib_wc.js
[modify] https://crrev.com/02dbf77169360ddf936d0050b3d63227db468b03/libdot/third_party/wcwidth/lib_wc_tests.js

Comment 2 by vapier@chromium.org, Aug 21 2017

Status: Fixed (was: Untriaged)
this will be fixed in hterm-1.71+ and nassh-0.8.36.12+

Sign in to add a comment