
Issue 655801

Starred by 3 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug




NULs in stream result in replacement characters

Project Member Reported by jsb...@chromium.org, Oct 13 2016

Issue description

Noticed when dealing with UTF-32 parsed as UTF-16 (see  issue 417850 ) - try out attached utf32-as-16.html
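For illustration, here is a minimal sketch of why UTF-32 content read as UTF-16 ends up full of NULs. It assumes little-endian encodings without a byte order mark, which may not match the attached utf32-as-16.html; the function name `misdecode_utf32_as_utf16` is hypothetical.

```python
def misdecode_utf32_as_utf16(text: str) -> str:
    """Encode text as UTF-32 and decode the same bytes as UTF-16.

    Assumption: little-endian, no BOM. Each ASCII character becomes
    four bytes (e.g. 3C 00 00 00), which UTF-16LE reads as the original
    character followed by U+0000.
    """
    return text.encode("utf-32-le").decode("utf-16-le")


if __name__ == "__main__":
    # Show the interleaved NULs with repr() so they are visible.
    print(repr(misdecode_utf32_as_utf16("<html>")))
```

With big-endian input the NUL would come before each character instead of after it.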

The most minimal repro I have is nulls.html which is basically:

NUL NUL < NUL h t m l >

In Firefox this shows as "<html>" and parses equivalent to:

&lt;html&gt;

In Chrome this shows as "<�html>" and parses equivalent to:

&lt;�html&gt;

I don't know who is incorrect, so I'll assume it's us. :)
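A rough way to regenerate the byte sequence described above (NUL NUL < NUL h t m l >) is sketched here. The output filename `nulls-sketch.html` is hypothetical, and this sketch writes only the nine bytes spelled out in the description; the attached nulls.html is listed as 10 bytes, so it presumably contains one more byte that is not reproduced here.

```python
from pathlib import Path

# NUL NUL < NUL h t m l >, as spelled out in the report (9 bytes).
REPRO = b"\x00\x00<\x00html>"


def write_repro(path: str = "nulls-sketch.html") -> int:
    """Write the repro bytes to `path` and return the number of bytes written."""
    return Path(path).write_bytes(REPRO)


if __name__ == "__main__":
    print(write_repro(), "bytes written")
```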

 
nulls.html
10 bytes
utf32-as-16.html
3.3 KB

Comment 1 by tkent@chromium.org, Oct 20 2016

Status: Available (was: Untriaged)

Comment 2 by tkent@chromium.org, Oct 20 2016

Labels: Hotlist-Interop

Comment 3 by sheriffbot@chromium.org, Oct 23 2017

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available. If you change it back, also remove the "Hotlist-Recharge-Cold" label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 4 by kochi@chromium.org, Dec 27 2017

Status: Available (was: Untriaged)
Status: WontFix (was: Available)
I believe our behavior here is WAI and spec-compliant, and that we do it this way for reasons of security, to avoid NUL injection attacks that might otherwise bypass pattern-based rejections of specific markup by out-of-browser HTML preprocessors, e.g. forum software or webmail software.

Spec:

https://html.spec.whatwg.org/#parse-error-unexpected-null-character

Josh and/or Jungshik, do you agree with my assessment? If not, please bounce this back to Available. Otherwise, this is probably deserving of a Firefox bug report to encourage consistent and secure behavior across browsers.

Status: Available (was: WontFix)
That is not the right spec to cite. That is an explicitly non-normative description of a diagnostic message intended for HTML validation tools, not browsers.

Start at https://html.spec.whatwg.org/#data-state. You end up after tokenization with data tokens for each character:

U+0000 U+0000 < U+0000 h t m l >

(note: no tag tokens, i.e. the < and > do not create a tag.)

I am then pretty sure this ends up being interpreted as content of the <body> element. In https://html.spec.whatwg.org/#parsing-main-inbody, U+0000 is ignored. So Firefox is correct, I am pretty sure.
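The reasoning above can be approximated with a small model. This is a simplified sketch, not the spec algorithm: it assumes every character token reaches the "in body" insertion mode, it models only the tokenizer paths needed for this input, and the names `tokenize_data_state` and `in_body_text` are hypothetical.

```python
NUL = "\u0000"


def tokenize_data_state(text: str) -> list[str]:
    """Emit one character token per input character.

    Models two behaviours only: the data state emits U+0000 unchanged,
    and a "<" followed by something that cannot start a tag is emitted
    as a literal "<" character token. Real markup is not handled.
    """
    tokens = []
    for i, ch in enumerate(text):
        if ch == "<":
            nxt = text[i + 1] if i + 1 < len(text) else ""
            starts_markup = (nxt.isascii() and nxt.isalpha()) or nxt in ("/", "!", "?")
            if starts_markup:
                raise NotImplementedError("real markup is outside this sketch")
        tokens.append(ch)
    return tokens


def in_body_text(tokens: list[str]) -> str:
    """Drop U+0000 character tokens, as the "in body" rules do, and join the rest."""
    return "".join(t for t in tokens if t != NUL)


if __name__ == "__main__":
    tokens = tokenize_data_state("\u0000\u0000<\u0000html>")
    print(in_body_text(tokens))  # prints "<html>"
```

Under these assumptions the resulting text is "<html>", which matches the Firefox behavior described in the report.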
Apologies for the misinformation, and thanks for the correction.
Cc: bsittler@chromium.org
(I think it's a rite of passage for web platform contributors to fall down the wrong side of the HTML-for-validators vs. HTML-for-browsers spec -- and get politely corrected by domenic or annevk -- at least 3 times. We could probably make a checklist of the other rites of passage. Or maybe it's just me flailing in the darkness...)

Comment 10 by sheriffbot@chromium.org, Jan 3

Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Hotlist-Recharge-Cold
Status: Available (was: Untriaged)
