NULs in stream result in replacement characters
Issue description
Noticed when dealing with UTF-32 parsed as UTF-16 (see issue 417850); try out the attached utf32-as-16.html.
The most minimal repro I have is nulls.html, which is basically: NUL NUL < NUL h t m l >
In Firefox this shows as "<html>" and parses equivalent to: <html>
In Chrome this shows as "<�html>" and parses equivalent to: <�html>
I don't know who is incorrect, so I'll assume it's us. :)
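For anyone without the attachment, here is a rough way to regenerate the minimal repro. The exact bytes of the attached nulls.html are not shown in this thread, so the byte sequence below is an assumption based on the "NUL NUL < NUL h t m l >" description, and the helper names are made up for illustration.

```python
def nulls_repro() -> bytes:
    # Assumed content of nulls.html: NUL NUL < NUL h t m l >
    return b"\x00\x00<\x00html>"


def write_repro(path: str = "nulls.html") -> None:
    # Write in binary mode so the NUL bytes reach the file untouched.
    with open(path, "wb") as f:
        f.write(nulls_repro())
```

Calling write_repro() and opening the resulting file in each browser should be enough to compare the rendered text.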
,
Oct 20 2016
,
Oct 23 2017
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue. Sorry for the inconvenience if the bug really should have been left as Available. If you change it back, also remove the "Hotlist-Recharge-Cold" label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Dec 27 2017
,
Dec 27 2017
I believe our behavior here is WAI and spec-compliant, and that we do it this way for reasons of security, to avoid NUL injection attacks that might otherwise bypass pattern-based rejections of specific markup by out-of-browser HTML preprocessors, e.g. forum software or webmail software. Spec: https://html.spec.whatwg.org/#parse-error-unexpected-null-character Josh and/or Jungshik, do you agree with my assessment? If not, please bounce this back to Available. Otherwise, this is probably deserving of a Firefox bug report to encourage consistent and secure behavior across browsers.
,
Dec 29 2017
That is not the right spec to cite. That is an explicitly non-normative description of a diagnostic message intended for HTML validation tools, not browsers. Start at https://html.spec.whatwg.org/#data-state. You end up after tokenization with data tokens for each character: U+0000 U+0000 < U+0000 h t m l > (note: no tag tokens, i.e. the < and > do not create a tag.) I am then pretty sure this ends up being interpreted as content of the <body> element. In https://html.spec.whatwg.org/#parsing-main-inbody, U+0000 is ignored. So Firefox is correct, I am pretty sure.
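The reasoning above can be sketched in a few lines of Python. This is a toy model, not the spec algorithm: it assumes the input never starts a real tag (a "<" followed by anything other than an ASCII letter, "/", "!" or "?" is emitted as a plain character token), it skips character references, and it jumps straight to the "in body" rule instead of walking the earlier insertion modes. The function names are hypothetical.

```python
import string

NUL = "\x00"


def tokenize(text):
    """Emit one character token per input character.

    Models only the data state plus the fallback branch of the tag open
    state, where "<" is emitted as a character. Real tags are out of scope.
    """
    tokens = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        # Guard on nxt first: the empty string is "in" every string.
        if ch == "<" and nxt and (nxt in string.ascii_letters or nxt in "/!?"):
            raise NotImplementedError("real tags are outside this sketch")
        tokens.append(ch)
    return tokens


def in_body_text(tokens):
    """Apply the "in body" rule: U+0000 character tokens are ignored."""
    return "".join(t for t in tokens if t != NUL)
```

With the repro input, in_body_text(tokenize("\x00\x00<\x00html>")) gives the text "<html>", which matches what Firefox displays.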
,
Jan 2 2018
Apologies for the misinformation, and thanks for the correction.
,
Jan 2 2018
,
Jan 2 2018
(I think it's a rite of passage for web platform contributors to fall down the wrong side of the HTML-for-validators vs. HTML-for-browsers spec -- and get politely corrected by domenic or annevk -- at least 3 times. We could probably make a checklist of the other rites of passage. Or maybe it's just me flailing in the darkness...)
,
Jan 3
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue. Sorry for the inconvenience if the bug really should have been left as Available. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Jan 7
Comment 1 by tkent@chromium.org
, Oct 20 2016