
Issue 782565


Issue metadata

Status: Fixed
Owner:
Closed: Nov 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug




WTF::kEntitiesForUnencodables and related substitutions don't work with ISO-2022-JP

Project Member Reported by bsittler@chromium.org, Nov 8 2017

Issue description

Chrome Version: (copy from chrome://version)
OS: (e.g. Win7, OSX 10.9.5, etc...)

What steps will reproduce the problem?
(1) Open this URL:

data:text/html;charset=UTF-8,<title>ISO-2022-JP</title><style>*{margin:0;padding:0;border:none;width:100%}</style><body onload=document.forms[0].submit()><form action="data:text/html;charset=ISO-2022-JP,<body style='font-family:monospace' onload='document.body.innerText=document.all.data.innerText.substr(7)+String.fromCharCode(10,10)+location.href.split(String.fromCharCode(61)).pop()'><plaintext id=data style='display:none'>&#63;" accept-charset="ISO-2022-JP" target="output"><label for="input">Unicode</label><br><input readonly name=input value="ABC~%C2%A4%E2%80%A2%E2%98%85%E6%98%9F%F0%9F%8C%9F%E6%98%9F%E2%98%85%E2%80%A2%C2%A4~XYZ"><br><label for="output">ISO-2022-JP</label><br><iframe name=output id=output></iframe></form>

a.k.a.

data:text/html;charset=utf-8,%3Chtml%3E%3Chead%3E%3Ctitle%3EISO-2022-JP%3C%2Ftitle%3E%3Cstyle%3E*%7Bmargin%3A0%3Bpadding%3A0%3Bborder%3Anone%3Bwidth%3A100%25%7D%3C%2Fstyle%3E%3C%2Fhead%3E%3Cbody%20onload%3D%22document.forms%5B0%5D.submit()%22%3E%3Cform%20action%3D%22data%3Atext%2Fhtml%3Bcharset%3DISO-2022-JP%2C%26lt%3Bbody%20style%3D'font-family%3Amonospace'%20onload%3D'document.body.innerText%3Ddocument.all.data.innerText.substr(7)%2BString.fromCharCode(10%2C10)%2Blocation.href.split(String.fromCharCode(61)).pop()'%26gt%3B%26lt%3Bplaintext%20id%3Ddata%20style%3D'display%3Anone'%26gt%3B%3F%22%20accept-charset%3D%22ISO-2022-JP%22%20target%3D%22output%22%3E%3Clabel%20for%3D%22input%22%3EUnicode%3C%2Flabel%3E%3Cbr%3E%3Cinput%20readonly%3D%22%22%20name%3D%22input%22%20value%3D%22ABC~%C2%A4%E2%80%A2%E2%98%85%E6%98%9F%F0%9F%8C%9F%E6%98%9F%E2%98%85%E2%80%A2%C2%A4~XYZ%22%3E%3Cbr%3E%3Clabel%20for%3D%22output%22%3EISO-2022-JP%3C%2Flabel%3E%3Cbr%3E%3Ciframe%20name%3D%22output%22%20id%3D%22output%22%3E%3C%2Fiframe%3E%3C%2Fform%3E%3C%2Fbody%3E%3C%2Fhtml%3E

What is the expected result?

The ISO-2022-JP shift state is reset to ASCII before each numeric character reference is inserted, so the reference is interpreted as ASCII.


Unicode

ABC~¤•★星🌟星★•¤~XYZ

ISO-2022-JP

ABC~&#164;&#8226;★星&#127775;星★&#8226;&#164;~XYZ

ABC%7E%26%23164%3B%26%238226%3B%1B%24B%21z%401%1B%28B%26%23127775%3B%1B%24B%401%21z%1B%28B%26%238226%3B%26%23164%3B%7EXYZ
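
The expected bytes above can be checked outside the browser. The following is a minimal sketch, assuming Python's standard-library `iso2022_jp` codec is an acceptable stand-in for the browser's decoder: it percent-decodes the expected value, decodes it as ISO-2022-JP, and then resolves the numeric character references to recover the original Unicode input. The variable names are illustrative only.

```python
from html import unescape
from urllib.parse import unquote_to_bytes

# Percent-encoded form value copied from the expected result above.
expected_query_value = (
    "ABC%7E%26%23164%3B%26%238226%3B%1B%24B%21z%401%1B%28B"
    "%26%23127775%3B%1B%24B%401%21z%1B%28B%26%238226%3B%26%23164%3B%7EXYZ"
)

# Step 1: undo the percent-encoding to get the raw ISO-2022-JP bytes.
raw = unquote_to_bytes(expected_query_value)

# Step 2: decode as ISO-2022-JP; the escape sequences switch between
# ASCII and two-byte JIS X 0208, so the references stay readable ASCII.
decoded = raw.decode("iso2022_jp")

# Step 3: resolve the numeric character references.
original = unescape(decoded)

print(decoded)
print(original)
```

Because every reference is emitted in the ASCII state, the last step round-trips back to the string in the Unicode field.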


What happens instead?

The ISO-2022-JP shift state is not reset before a numeric character reference is inserted, so the reference is interpreted in an ASCII-incompatible state. The remainder of the string is misinterpreted as well (at least until the next shift or reset).


Unicode

ABC~¤•★星🌟星★•¤~XYZ

ISO-2022-JP

ABC~&#164;&#8226;★星Γ渦祁卦酸院惕8臆胸Γ蔚柑~XYZ

ABC%7E%26%23164%3B%26%238226%3B%1B%24B%21z%401%26%23127775%3B%401%21z%26%238226%3B%26%23164%3B%1B%28B%7EXYZ
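
To see why the tail of the string turns into unrelated characters, it helps to look at how a decoder in the two-byte state consumes the bytes. This is a rough sketch, again assuming Python's standard-library `iso2022_jp` codec as a stand-in decoder: it isolates the run between the `ESC $ B` and `ESC ( B` escape sequences and splits it into the two-byte units a decoder would read. The names `run`, `pairs` and `prefix` are illustrative.

```python
from urllib.parse import unquote_to_bytes

ESC_JIS = b"\x1b$B"    # switch G0 to two-byte JIS X 0208
ESC_ASCII = b"\x1b(B"  # switch G0 back to ASCII

# Percent-encoded form value copied from the actual (buggy) result above.
actual = unquote_to_bytes(
    "ABC%7E%26%23164%3B%26%238226%3B%1B%24B%21z%401%26%23127775%3B"
    "%401%21z%26%238226%3B%26%23164%3B%1B%28B%7EXYZ"
)

# Everything between the two escape sequences is read in two-byte mode.
start = actual.index(ESC_JIS) + len(ESC_JIS)
end = actual.index(ESC_ASCII, start)
run = actual[start:end]

# Split the run into the two-byte units the decoder sees.
pairs = [run[i:i + 2].decode("ascii") for i in range(0, len(run), 2)]
print(pairs)

# Decode only the first eight units; later units may fall outside the
# table of the codec used here, so they are left out of this sketch.
prefix = (ESC_JIS + run[:16] + ESC_ASCII).decode("iso2022_jp")
print(prefix)
```

The pairs `&#`, `12`, `77`, `75` show the reference for U+1F31F being consumed two bytes at a time instead of as ASCII text.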


Note that WTF::kEntitiesForUnencodables is not alone in having this bug - other WTF::*ForUnencodables replacement modes have this problem, too.

BTW Firefox gets this one right:

Unicode

ABC~¤•★星🌟星★•¤~XYZ

ISO-2022-JP

ABC~&#164;&#8226;★星&#127775;星★&#8226;&#164;~XYZ

ABC%7E%26%23164%3B%26%238226%3B%1B%24B%21z%401%1B%28B%26%23127775%3B%1B%24B%401%21z%1B%28B%26%238226%3B%26%23164%3B%7EXYZ

Safari also gets this right, and produces exactly the expected output.
Project Member

Comment 4 by bugdroid1@chromium.org, Nov 10 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/3d17f1f120dc22992fd938afcb7a3cd7edd6aae0

commit 3d17f1f120dc22992fd938afcb7a3cd7edd6aae0
Author: Benjamin C. Wiley Sittler <bsittler@chromium.org>
Date: Fri Nov 10 02:23:31 2017

TextCodecICU: respect encoder state when replacing unencodables

This also fixes the IgnorableCodePoint test to use inputs which have
nontrivial ISO-2022-JP shift states; previously this test did not
actually ensure correct behavior when non-ASCII characters
representable in ISO-2022-JP and non-ASCII characters not
representable in ISO-2022-JP occur in sequence.

This also adds WPT coverage.

Prior to this fix, encoder state was not respected, leading to
incorrect interpretation of the replacements and sometimes following
bytes too, depending on whether the replacement lengths were even or
odd, and on whether the active state of the ISO-2022-JP G0 character
set was one-byte or two-byte. An example, with results transcribed in
Unicode for readability:

Input: ABC~¤•★星🌟星★•¤~XYZ

Old output:
Bytes: ABC~&#164;&#8226;␛$B!z@1&#127775;@1!z&#8226;&#164;␛(B~XYZ
Meaning: ABC~&#164;&#8226;★星Γ渦祁卦酸院惕8臆胸Γ蔚柑~XYZ

New output:
Bytes: ABC~&#164;&#8226;␛$B!z@1␛(B&#127775;␛$B@1!z␛(B&#8226;&#164;~XYZ
Meaning: ABC~&#164;&#8226;★星&#127775;星★&#8226;&#164;~XYZ

Bug: 782565
Change-Id: If2a7b76b99ce77cbec433af5384ed5c4d2e3c581
Reviewed-on: https://chromium-review.googlesource.com/758405
Commit-Queue: Benjamin Wiley Sittler <bsittler@chromium.org>
Reviewed-by: Jungshik Shin <jungshik@google.com>
Reviewed-by: Joshua Bell <jsbell@chromium.org>
Reviewed-by: Emil A Eklund <eae@chromium.org>
Cr-Commit-Position: refs/heads/master@{#515425}
[add] https://crrev.com/3d17f1f120dc22992fd938afcb7a3cd7edd6aae0/third_party/WebKit/LayoutTests/external/wpt/encoding/legacy-mb-japanese/iso-2022-jp/iso2022jp-encode-form-errors-stateful.html
[modify] https://crrev.com/3d17f1f120dc22992fd938afcb7a3cd7edd6aae0/third_party/WebKit/Source/platform/wtf/text/TextCodecICU.cpp
[modify] https://crrev.com/3d17f1f120dc22992fd938afcb7a3cd7edd6aae0/third_party/WebKit/Source/platform/wtf/text/TextCodecICUTest.cpp
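
The old and new byte sequences in the commit message above can be reproduced with a toy encoder. This is only an illustrative sketch of the idea (track the shift state, and return to ASCII before writing a replacement); it is not the code in TextCodecICU.cpp, which uses ICU. The two-entry table, the `encode` function and its `reset_before_replacement` flag are all hypothetical and exist only for this example.

```python
ESC_JIS = b"\x1b$B"    # switch G0 to two-byte JIS X 0208
ESC_ASCII = b"\x1b(B"  # switch G0 back to ASCII

# Hypothetical mini-table: only the two JIS X 0208 characters that
# appear in the report's byte dumps.
JIS_TABLE = {"★": b"!z", "星": b"@1"}


def encode(text: str, reset_before_replacement: bool) -> bytes:
    """Toy ISO-2022-JP encoder with numeric character reference fallback."""
    out = bytearray()
    two_byte = False  # current shift state of the output stream
    for ch in text:
        if ord(ch) < 0x80:
            if two_byte:
                out += ESC_ASCII
                two_byte = False
            out += ch.encode("ascii")
        elif ch in JIS_TABLE:
            if not two_byte:
                out += ESC_JIS
                two_byte = True
            out += JIS_TABLE[ch]
        else:
            # Unencodable: the replacement is ASCII text, so the stream
            # must be in the ASCII state for it to be read correctly.
            if two_byte and reset_before_replacement:
                out += ESC_ASCII
                two_byte = False
            out += f"&#{ord(ch)};".encode("ascii")
    if two_byte:
        out += ESC_ASCII
    return bytes(out)


text = "ABC~¤•★星🌟星★•¤~XYZ"
old_output = encode(text, reset_before_replacement=False)
new_output = encode(text, reset_before_replacement=True)
print(old_output)
print(new_output)
```

With the flag off, the replacement lands inside the two-byte run and the closing escape only appears before `~XYZ`; with the flag on, each replacement is preceded by a return to ASCII.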

Comment 5 by pwnall@chromium.org, Nov 10 2017

Owner: bsittler@chromium.org
Status: Fixed (was: Untriaged)
bsittler@: It seems like you landed a CL that fixes this. Please reopen if I guessed incorrectly.
Correct! Thanks for closing. I had intended to do so today if the fix had not been reverted (apparently it has not!)
