Issue 796192

Starred by 2 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug




Fetch API: BOM is lost when Request and Response's |body| is retrieved as form data

Project Member Reported by raphael....@intel.com, Dec 19 2017

Issue description

This causes "request.formData() with input: test=" and "response.formData() with input: test=" to fail in LayoutTests/external/wpt/url/urlencoded-parser.html:

    FAIL request.formData() with input: test= assert_array_equals: property 0, expected "test" but got "test"
    FAIL response.formData() with input: test= assert_array_equals: property 0, expected "test" but got "test"

The strings above are misleading because the BOM characters in them are invisible. The test actually creates Request and Response objects like this:

    let init = new Request("about:blank", { body: "\uFEFFtest=\uFEFF", method: "LADIDA", headers: {"Content-Type": "application/x-www-form-urlencoded;charset=windows-1252"} }).formData()

"\uFEFF" is the UTF-16 BOM.

When |body| is being extracted (https://fetch.spec.whatwg.org/#concept-bodyinit-extract), we parse a USVString ("\uFEFFtest=\uFEFF") and correctly encode it into UTF-8 ("\xef\xbb\xbftest=\xef\xbb\xbf").
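The encoding step can be checked outside Blink. The snippet below is only a sketch that uses the standard TextEncoder API in place of Blink's internal body extraction code; the variable names are illustrative.

```javascript
// Encode the test input the way a USVString body is encoded: as UTF-8.
const bodyText = "\uFEFFtest=\uFEFF";
const encodedBody = new TextEncoder().encode(bodyText);

// Render the bytes as hex so the otherwise invisible BOMs can be seen.
const encodedHex = Array.from(encodedBody, (b) =>
  b.toString(16).padStart(2, "0")
).join(" ");

console.log(encodedHex); // ef bb bf 74 65 73 74 3d ef bb bf
```

Both U+FEFF code points become the byte sequence EF BB BF, so nothing is lost at this stage.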

However, if we later read that request or response by calling formData(), json() or text(), the initial "\uFEFF" is lost and we're left with "test=\uFEFF". https://url.spec.whatwg.org/#urlencoded-parsing says to decode the bytes with "UTF-8 decode without BOM", which does not treat a leading BOM specially, so we shouldn't be stripping the BOM from the beginning.

The BOM is being stripped because TextResourceDecoder::Decode() unconditionally decodes content without the BOM if it finds one.
 
Cc: hirosh...@chromium.org
Components: -Blink>HTML>Parser
Summary: Fetch API: BOM is lost when Request and Response's |body| is retrieved as form data (was: Fetch API: BOM is lost when Request and Response's |body| is retrieved as form data, json or text)
Thanks for reporting!

UTF-16 BOM \uFEFF is encoded to \xef\xbb\xbf in UTF-8, and
\xef\xbb\xbf (in raw bytes) is UTF-8 BOM.

Body's text() and json() use "UTF-8 decode"
https://encoding.spec.whatwg.org/#utf-8-decode
which strips a leading 0xEF 0xBB 0xBF, so the current behavior of text() and json() is spec conformant.
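The difference between the two decode algorithms can be approximated with the standard TextDecoder API. This is a sketch, assuming that the default TextDecoder behavior stands in for "UTF-8 decode" and that the ignoreBOM option stands in for "UTF-8 decode without BOM"; it does not exercise Blink's TextResourceDecoder.

```javascript
// UTF-8 bytes for "\uFEFFtest=\uFEFF".
const bomBytes = new Uint8Array([
  0xef, 0xbb, 0xbf, 0x74, 0x65, 0x73, 0x74, 0x3d, 0xef, 0xbb, 0xbf,
]);

// Default: a leading BOM is removed, as text() and json() expect.
const stripped = new TextDecoder("utf-8").decode(bomBytes);

// ignoreBOM: true keeps the leading BOM in the decoded string.
const kept = new TextDecoder("utf-8", { ignoreBOM: true }).decode(bomBytes);

console.log(stripped.length); // 6
console.log(kept.length); // 7
```

In both cases the trailing BOM survives; only the one at the very start of the input is affected.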

formData() uses, as you mentioned,
https://url.spec.whatwg.org/#urlencoded-parsing, which uses "UTF-8 decode without BOM". The "without BOM" variant is probably used so that BOMs in the middle of the body are not stripped, but it also means that the BOM at the beginning of the body is not stripped.
Thus, the current Blink behavior is not spec conformant.
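To illustrate what the spec asks of formData() for this input, here is a deliberately simplified sketch of urlencoded parsing. The function name parseUrlencodedSketch is hypothetical, percent-decoding and "+" handling are omitted, and TextDecoder with ignoreBOM is assumed to approximate "UTF-8 decode without BOM".

```javascript
// Decode one name or value without removing a leading BOM.
function decodeKeepingBOM(bytes) {
  return new TextDecoder("utf-8", { ignoreBOM: true }).decode(bytes);
}

// Simplified sketch: split on "&" and "=", then decode each piece.
// Percent-decoding and "+" to space conversion are intentionally left out.
function parseUrlencodedSketch(bytes) {
  const entries = [];
  let start = 0;
  for (let i = 0; i <= bytes.length; i++) {
    if (i === bytes.length || bytes[i] === 0x26 /* "&" */) {
      const seq = bytes.subarray(start, i);
      start = i + 1;
      if (seq.length === 0) continue; // skip empty sequences
      const eq = seq.indexOf(0x3d /* "=" */);
      const name = eq === -1 ? seq : seq.subarray(0, eq);
      const value = eq === -1 ? new Uint8Array(0) : seq.subarray(eq + 1);
      entries.push([decodeKeepingBOM(name), decodeKeepingBOM(value)]);
    }
  }
  return entries;
}

const parsed = parseUrlencodedSketch(
  new TextEncoder().encode("\uFEFFtest=\uFEFF")
);
console.log(JSON.stringify(parsed));
```

Under these assumptions the single entry is ["\uFEFFtest", "\uFEFF"], which matches the name that the failing test expects.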

However, I feel it would be more consistent if we could modify the spec so that a UTF-8 BOM at the beginning of a body is also stripped in formData().
Thanks for pointing out that json() and text() are working as expected. I've filed https://github.com/whatwg/fetch/issues/650 to see if there's anything that should be changed in the spec.
It doesn't look like the spec's going to change according to the discussion there, so I guess I should look into changing the Blink implementation.
Cc: domfarolino@gmail.com
