New issue
Advanced search Search tips

Issue 904017 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Nov 16
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



Sign in to add a comment

Parent encoding (not default) used for XML resources without encoding specification

Project Member Reported by jsb...@chromium.org, Nov 10

Issue description

If an XHTML file loaded into an iframe does not specify an encoding, the parent document's encoding is used. This differs from Firefox, despite protestations in our code that we're matching behavior here and using UTF-8 as the default.

Repro:
1. Serve out attached files
2. load parent.html

Expected: reports "child encoding: UTF-8"
Actual: reports "child encoding: windows-1252"

Works as expected in Firefox

...

In BuildTextResourceDecoderFor() there's a clause 

"Disable autodetection for XML/JSON to honor the default encoding (UTF-8) for unlabelled documents" 

... but then at the bottom of the function, the encoding source is set to TextResourceDecoder::kEncodingFromParentFrame which causes this behavior to be ignored for subresources. This manifests in loading an iframe.

This seems to be the root cause of Firefox/Chrome inconsistency noted in https://github.com/web-platform-tests/wpt/pull/13082

A fix might be as simple as setting `use_hint_encoding` to false for XML (and JSON?) MIME types.
 
parent.html
276 bytes View Download
child.xml
196 bytes View Download
Description: Show this description
FYI, this is captured in third_party/WebKit/LayoutTests/external/wpt/encoding/utf-32-from-win1252.html

This is a testharness.js-based test.
PASS Expect resources/utf-32-big-endian-bom.html to parse as windows-1252
FAIL Expect resources/utf-32-big-endian-bom.xml to parse as UTF-8 assert_equals: expected "UTF-8" but got "windows-1252"
PASS Expect resources/utf-32-big-endian-nobom.html to parse as windows-1252
FAIL Expect resources/utf-32-big-endian-nobom.xml to parse as UTF-8 assert_equals: expected "UTF-8" but got "windows-1252"
PASS Expect resources/utf-32-little-endian-bom.html to parse as UTF-16LE
PASS Expect resources/utf-32-little-endian-bom.xml to parse as UTF-16LE
PASS Expect resources/utf-32-little-endian-nobom.html to parse as windows-1252
FAIL Expect resources/utf-32-little-endian-nobom.xml to parse as UTF-8 assert_equals: expected "UTF-8" but got "windows-1252"
Harness: the test ran to completion.

Passes in Firefox.
Owner: jinsuk...@chromium.org
Status: Assigned (was: Untriaged)
jsbell@ I can take it if you haven't already started. Feel free to take it back if you have.
Owner: jsb...@chromium.org
Looks like you already did. 
Cc: japhet@chromium.org
I uploaded a CL just to see if anything would fail with a trivial fix: 

https://chromium-review.googlesource.com/c/chromium/src/+/1331953

Result: No failures. So we're missing tests for this behavior.

jinsukkim@ - you probably have more context, so you're welcome to take it further. We probably need to find the right Blink>Loader folks to weigh in, too.
Status: Started (was: Assigned)
Project Member

Comment 7 by bugdroid1@chromium.org, Nov 15

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/3d1e2e08c8e1118ef9a53d8c97dca95d422411dd

commit 3d1e2e08c8e1118ef9a53d8c97dca95d422411dd
Author: Joshua Bell <jsbell@chromium.org>
Date: Thu Nov 15 20:02:01 2018

Loader: use default encoding not parent encoding for XML/JSON

If unspecified (in headers or content), resources like HTML in iframes
fall back to the parent's encoding. This should not be the case for
XML/JSON which have well-defined defaults.

Bug:  904017 
Change-Id: I42f8950c5c2ef63c98bcec58a1f4b0bd0b190c33
Reviewed-on: https://chromium-review.googlesource.com/c/1331953
Reviewed-by: Nate Chapin <japhet@chromium.org>
Reviewed-by: Jinsuk Kim <jinsukkim@chromium.org>
Commit-Queue: Joshua Bell <jsbell@chromium.org>
Cr-Commit-Position: refs/heads/master@{#608483}
[delete] https://crrev.com/d503ca09233b6a5e72c2816ae4d84c4cff7aa4cd/third_party/WebKit/LayoutTests/external/wpt/encoding/utf-32-from-win1252-expected.txt
[add] https://crrev.com/3d1e2e08c8e1118ef9a53d8c97dca95d422411dd/third_party/WebKit/LayoutTests/fast/encoding/resource-default-encoding.html
[add] https://crrev.com/3d1e2e08c8e1118ef9a53d8c97dca95d422411dd/third_party/WebKit/LayoutTests/fast/encoding/resources/undeclared-encoding.json
[add] https://crrev.com/3d1e2e08c8e1118ef9a53d8c97dca95d422411dd/third_party/WebKit/LayoutTests/fast/encoding/resources/undeclared-encoding.xhtml
[modify] https://crrev.com/3d1e2e08c8e1118ef9a53d8c97dca95d422411dd/third_party/blink/renderer/core/loader/text_resource_decoder_builder.cc

Status: Fixed (was: Started)

Sign in to add a comment