
Issue 833255

Issue metadata

Status: Duplicate
Merged: issue 829868
Owner:
Closed: Apr 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux, Android, Windows, iOS, Chrome, Mac, Fuchsia
Pri: 3
Type: Bug


net::UnescapeURLComponent decodes invalid UTF-8 sequences

Project Member Reported by mgiuca@chromium.org, Apr 16 2018

Issue description

Chrome Version: 68
OS: All

net::UnescapeURLComponent("%D8%9C%85%D8%9C%85", UnescapeRule::NORMAL)

Expected output:
"%D8%9C%85%D8%9C%85"

Actual output:
"%D8%9C\x85%D8%9C\x85"

This function is how a URL is formatted for display: spoofing characters, control characters and other "non-displayable" characters should be left percent-encoded. Yet ill-formed UTF-8 sequences (e.g., "%85", a lone continuation byte) are decoded by this method, so it can hand ill-formed UTF-8 byte sequences to the display code.

The method should always return a valid UTF-8 string, so it should simply not decode those byte sequences.
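A minimal sketch of the requested behavior, assuming a simple design: a run of %XX escapes is decoded only when it forms one complete, well-formed UTF-8 character (no lone continuation bytes, no truncated or overlong sequences, no surrogates) whose code point is also displayable; otherwise the escapes are copied through unchanged. All names here (`UnescapeForDisplaySketch`, `DecodeEscapedUtf8`, `IsDisplayable`) are hypothetical, not Chromium code. `IsDisplayable` is a deliberately tiny, illustrative blocklist (controls plus a few bidi marks, including U+061C ARABIC LETTER MARK, which `%D8%9C` encodes), not Chromium's real spoofing list, and for brevity the sketch leaves every ASCII escape encoded, which the real function does not necessarily do.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Returns the value of a hex digit, or -1 if c is not one.
int HexValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Returns the byte encoded by a "%XX" escape at s[i], or -1 if none is there.
int EscapedByteAt(const std::string& s, std::size_t i) {
  if (i + 2 >= s.size() || s[i] != '%') return -1;
  int hi = HexValue(s[i + 1]);
  int lo = HexValue(s[i + 2]);
  if (hi < 0 || lo < 0) return -1;
  return hi * 16 + lo;
}

// HYPOTHETICAL, incomplete blocklist for illustration only.
bool IsDisplayable(std::uint32_t cp) {
  if (cp < 0x20 || (cp >= 0x7F && cp <= 0x9F)) return false;  // C0, DEL, C1
  if (cp == 0x061C || cp == 0x200E || cp == 0x200F) return false;  // bidi marks
  if (cp >= 0x202A && cp <= 0x202E) return false;  // bidi embeddings/overrides
  if (cp >= 0x2066 && cp <= 0x2069) return false;  // bidi isolates
  return true;
}

// Tries to decode one well-formed multi-byte UTF-8 character from consecutive
// %XX escapes starting at s[i]. On success stores the code point and the
// number of encoded bytes. ASCII lead bytes are rejected on purpose.
bool DecodeEscapedUtf8(const std::string& s, std::size_t i,
                       std::uint32_t* cp, int* len) {
  int b0 = EscapedByteAt(s, i);
  int n;
  std::uint32_t min_cp;
  if (b0 >= 0xC2 && b0 <= 0xDF) { n = 2; min_cp = 0x80; *cp = b0 & 0x1F; }
  else if (b0 >= 0xE0 && b0 <= 0xEF) { n = 3; min_cp = 0x800; *cp = b0 & 0x0F; }
  else if (b0 >= 0xF0 && b0 <= 0xF4) { n = 4; min_cp = 0x10000; *cp = b0 & 0x07; }
  else return false;  // no escape, ASCII, lone continuation, or invalid lead
  for (int k = 1; k < n; ++k) {
    int b = EscapedByteAt(s, i + 3 * k);
    if (b < 0x80 || b > 0xBF) return false;  // missing or bad continuation byte
    *cp = (*cp << 6) | static_cast<std::uint32_t>(b & 0x3F);
  }
  // Reject overlong forms, surrogates, and code points beyond U+10FFFF.
  if (*cp < min_cp || *cp > 0x10FFFF || (*cp >= 0xD800 && *cp <= 0xDFFF))
    return false;
  *len = n;
  return true;
}

// Decodes only escapes that yield valid, displayable UTF-8; the output is
// therefore always valid UTF-8 whenever the input's unescaped text is.
std::string UnescapeForDisplaySketch(const std::string& s) {
  std::string out;
  std::size_t i = 0;
  while (i < s.size()) {
    std::uint32_t cp;
    int n;
    if (DecodeEscapedUtf8(s, i, &cp, &n) && IsDisplayable(cp)) {
      for (int k = 0; k < n; ++k)  // copy the raw UTF-8 bytes
        out.push_back(static_cast<char>(EscapedByteAt(s, i + 3 * k)));
      i += 3 * static_cast<std::size_t>(n);
    } else if (EscapedByteAt(s, i) >= 0) {
      out.append(s, i, 3);  // leave this escape percent-encoded
      i += 3;
    } else {
      out.push_back(s[i]);  // ordinary character
      ++i;
    }
  }
  return out;
}
```

With the input from this report, the sketch returns "%D8%9C%85%D8%9C%85": the ALM stays escaped because `IsDisplayable` rejects it, and each "%85" stays escaped because a continuation byte cannot start a character. A well-formed, displayable sequence such as "caf%C3%A9" is still decoded to "café".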

EXCEPTION: When called with UnescapeRule::SPOOFING_AND_CONTROL_CHARS, the function has a completely different purpose, which is to decode all escape sequences irrespective of whether they are displayable or even legal. This version should not consider UTF-8 at all, and simply return a byte sequence.

Actually, as pointed out by mmenke in https://crrev.com/c/998014, SPOOFING_AND_CONTROL_CHARS is essentially a different function (it has a different "return type": it returns a byte sequence while the normal invocation returns a text string; it just happens that both of those "types" are std::string in C++). SPOOFING_AND_CONTROL_CHARS should be removed from the UnescapeRule enum, and a separate function should be written for the non-displayable 8-bit-clean version of unescaping.
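A minimal sketch of what a separate 8-bit-clean function could look like, assuming the split described above. `UnescapeToBytesSketch` is a hypothetical name, not an existing Chromium API. It decodes every complete %XX escape regardless of the resulting byte and returns `std::vector<std::uint8_t>`, so the "byte sequence" return type is distinct from the display-text `std::string` in the type system rather than only by convention.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// HYPOTHETICAL byte-oriented unescaper: decodes all %XX escapes, including
// control characters and invalid UTF-8, and returns raw bytes (not text).
std::vector<std::uint8_t> UnescapeToBytesSketch(const std::string& s) {
  auto hex = [](char c) -> int {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
  };
  std::vector<std::uint8_t> out;
  for (std::size_t i = 0; i < s.size(); ++i) {
    if (s[i] == '%' && i + 2 < s.size()) {
      int hi = hex(s[i + 1]);
      int lo = hex(s[i + 2]);
      if (hi >= 0 && lo >= 0) {
        out.push_back(static_cast<std::uint8_t>(hi * 16 + lo));
        i += 2;  // skip the two hex digits; the loop's ++i moves past them
        continue;
      }
    }
    // Not a complete escape: copy the character through unchanged.
    out.push_back(static_cast<std::uint8_t>(s[i]));
  }
  return out;
}
```

For "%D8%9C%85" this yields the bytes {0xD8, 0x9C, 0x85}, which is exactly the kind of output that should never reach display code; keeping it in a different type makes accidental mixing a compile error instead of a spoofing bug.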

Prior art (my own!): https://docs.python.org/3/library/urllib.parse.html has both unquote(), which returns a str, and unquote_to_bytes(), which returns a bytes object.
 

Comment 1 by mgiuca@chromium.org, Apr 16 2018

Mergedinto: 829868
Status: Duplicate (was: Assigned)
