New issue
Advanced search Search tips

Issue 839945 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: May 2018
Cc:
Components:
EstimatedDays: ----
NextAction: 2018-05-10
OS: Linux , Android , Windows , Chrome , Mac , Fuchsia
Pri: 1
Type: Bug

Blocking:
issue 268640



Sign in to add a comment

CORB: confirmation sniffing is confused by html/js polyglots using <!-- --> <script>

Project Member Reported by lukasza@chromium.org, May 4 2018

Issue description

REPRO STEPS:
1. Visit http://www.pythonforbeginners.com/files/reading-and-writing-files-in-python
2. Observe some CORB message in the DevTools console.  Among them is a message about blocking https://clarium.global.ssl.fastly.net/?wrapper=bbdvOAJnqH-Idffgn_02C2Cyx_E&tpid=YmJkdk9BSm5xSC1JZGZmZ25fMDJDMkN5eF9FL2Fkc3BhcmM6MzAweDYwMA%3D%3D&d=eyJ3aCI6IlltSmtkazlCU201eFNDMUpaR1ptWjI1Zk1ESkRNa041ZUY5RkwyRmtjM0JoY21NNk16QXdlRFl3TUE9PSIsIndkIjp7ImsiOnsiaGJfYmlkZGVyIjpbImFkc3BhcmMiXSwiaGJfc2l6ZSI6WyIzMDB4NjAwIl19fSwid3IiOjB9

NOTES:
- https://clarium.global.ssl.fastly.net/?wrapper responds with:
    - Content-Length: 11496
    - Content-Type: text/html; charset=UTF-8
    - no X-Content-Type-Options header
    - body:
        <!-- confiant --> <script type='text/javascript'>
        //<![CDATA[
        ...... valid javascript goes here ......
        //]]>--></script>

- I've tested that this resource works fine 1) when sourced by a script tag and 2) when opened as a top-level html doc.
 

Comment 1 by creis@chromium.org, May 4 2018

Cc: lukasza@chromium.org creis@chromium.org dcheng@chromium.org
My first guess is that the script tag parser might be treating the HTML comment as a // comment, since this example stops working if you put a newline between the HTML comment and script tag on the first line.

We look more closely at the actual script tag parsing rules.
https://www.ecma-international.org/ecma-262/8.0/index.html#sec-html-like-comments includes the following possible BNF rules/expansions:

    <Comment> ::= <SingleLineHTMLOpenComment>

    <SingleLineHTMLOpenComment> ::= "<!--" [<SingleLineCommentChars>]

    <SingleLineCommentChars> ::= <SingleLineCommentChar> [<SingleLineCommentChars>]

    <SingleLineCommentChar> ::= <SourceCharacter but not LineTerminator>

    <SourceCharacter> ::= any Unicode code point

    <LineTerminator> ::= <LF> | <CR> | <LS> | <PS>


AFAIU this means that the following 2 comments single lines are parsed in a similar way

    <!-- this --> is a comment
    x = 123;

    // this is a comment
    x = 123;


There is also:

    <SingleLineHTMLCloseComment> ::= <LineTerminatorSequence> <HTMLCloseComment>

    <HTMLCloseComment> ::= [<WhiteSpaceSequence>] [<SingleLineDelimitedCommentSequence>] "-->" [<SingleLineCommentChars>]


I am not yet sure what this means for CORB's HTML sniffer.  In particular - I am not sure when skipping of HTML comments should end after a single line VS after encountering "-->" characters.  Some examples to consider:

    Should not sniff as HTML:
        <!-- blah --> <script>
        var x = "I am a html/javascript polyglot"
        // </script>

    Should sniff as HTML (this javascript contains only a comment):
        <!-- blah --> <html> <!-- single-line HTML / no javascript here -->

    Should sniff as HTML ("<script>" is not a valid javascript statement):
        <!-- blah // -->
        <script>

    ???:
        <!-- this line is a valid comment in javascript -> SingleLineHTMLOpenComment rule

        // We can have javascript here (inside HTML comment) delimited by <! -- and -- >
        var we_are_inside = "html comment"; // here.

        /*  
        --> this line is a valid javascript comment -> SingleLineHTMLCloseComment rule

        <!-- we can have html here (inside javascript comment) delimited by / * ... * / -->
        <p>Blah<p>

        <!--
        */
        -->

Comment 3 by creis@chromium.org, May 4 2018

Blocking: 268640
Labels: -Pri-2 M-67 OS-Android OS-Chrome OS-Fuchsia OS-Linux OS-Mac OS-Windows Pri-1
It looks like we need to ignore not just leading HTML comments, but anything on the line after the close HTML comment.  We can only block it if the first line following the leading HTML comments sniffs as HTML.

For example, we could not block this:
  <!-- a -->
  <!-- b
  --><something>
  code();

But we could block this:
  <!-- a
  -->anything
  <html>...</html>

Does that sound right?

Comment 4 by creis@chromium.org, May 4 2018

Commenting on your examples from comment 2:

1)    Should not sniff as HTML:
        <!-- blah --> <script>
        var x = "I am a html/javascript polyglot"
        // </script>

Agreed.  This is script and we can't block it.


2)    Should sniff as HTML (this javascript contains only a comment):
        <!-- blah --> <html> <!-- single-line HTML / no javascript here -->

It's probably ok if we don't block this.  It's not really possible to identify if it actually contains JavaScript.  It's a valid prefix to a JavaScript file, so I don't think we can block it.


3)    Should sniff as HTML ("<script>" is not a valid javascript statement):
        <!-- blah // -->
        <script>

Agreed.  To the extent that we can sniff line 2 (<script>) as HTML, we should block this, since it's not parseable as valid JavaScript.


4)    ???:
        <!-- this line is a valid comment in javascript -> SingleLineHTMLOpenComment rule

        // We can have javascript here (inside HTML comment) delimited by <! -- and -- >
        var we_are_inside = "html comment"; // here.

        /*  
        --> this line is a valid javascript comment -> SingleLineHTMLCloseComment rule

        <!-- we can have html here (inside javascript comment) delimited by / * ... * / -->
        <p>Blah<p>

        <!--
        */
        -->

I don't think we would block this.  It looks like several HTML comments, and thus could be JavaScript.
I think I like the current proposal to drop everything until the end of the line after "-->" characters (when skipping a HTML comment).  OTOH, this would mean that the following wouldn't sniff as HTML - I assume this is acceptable?

    <!-- blah --> <p>
    Paragraph <!-- <- "Paragraph" doesn't sniff as a html tag. -->
    </p>

Comment 6 by creis@chromium.org, May 4 2018

Owner: lukasza@chromium.org
Status: Assigned (was: Available)
Correct.  It does cut into our coverage of HTML documents in a non-trivial way, but it was always kind of best effort there.  We're trying to encourage people to use nosniff (or perhaps From-Origin) to get more effective coverage in general.  I think it's more important not to block things that could be functional JavaScript, as long as we can keep the HTML definition reasonably large.

Would you be able to help update the sniffer for this?
Is this M67 stable blocker? If yes, pls apply "RBS" label.
Labels: ReleaseBlock-Stable Proj-SiteIsolation-LaunchBlocking
Status: Started (was: Assigned)
RE: #c7

Yes - this is a blocker for shipping site-per-process to 100% of M67 stable.

WIP CL @ https://chromium-review.googlesource.com/c/chromium/src/+/1047851
Project Member

Comment 9 by bugdroid1@chromium.org, May 9 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ebd882cf95df2b1a3376eefba28a410c9e76c41f

commit ebd882cf95df2b1a3376eefba28a410c9e76c41f
Author: Lukasz Anforowicz <lukasza@chromium.org>
Date: Wed May 09 00:04:02 2018

Handle more html/js polyglots in CORB confirmation sniffing.

Cross-Origin Read Blocking (CORB) tries to protect certain resource
types (e.g. text/html).  To be resilient against HTTP responses
mislabeled with an incorrect Content-Type, CORB sniffs the response body
to confirm if it truly is the protected type.

Before this CL the confirmation sniffing logic blocked resources that
are both a valid html and a valid javascript.  Blocking of such
resources is undesirable, because it is disruptive to existing websites
that use such polyglot responses in <script> tags.

After this CL, CORB sniffer takes into account the
https://www.ecma-international.org/ecma-262/8.0/index.html#prod-annexB-SingleLineHTMLCloseComment
rule which means that the sniffing doesn't resume immediately after
"-->" characters, but instead also consumes all the characters until the
first line terminator.

Bug:  839945 
Cq-Include-Trybots: master.tryserver.chromium.linux:linux_mojo
Change-Id: I7c8221acc2013adffe8095d188ae22e1c6a2fdab
Reviewed-on: https://chromium-review.googlesource.com/1047851
Commit-Queue: Charlie Reis <creis@chromium.org>
Reviewed-by: Charlie Reis <creis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#557018}
[modify] https://crrev.com/ebd882cf95df2b1a3376eefba28a410c9e76c41f/content/browser/loader/cross_site_document_blocking_browsertest.cc
[modify] https://crrev.com/ebd882cf95df2b1a3376eefba28a410c9e76c41f/content/browser/loader/cross_site_document_resource_handler_unittest.cc
[add] https://crrev.com/ebd882cf95df2b1a3376eefba28a410c9e76c41f/content/test/data/site_isolation/js-html-polyglot2.html
[modify] https://crrev.com/ebd882cf95df2b1a3376eefba28a410c9e76c41f/services/network/cross_origin_read_blocking.cc
[modify] https://crrev.com/ebd882cf95df2b1a3376eefba28a410c9e76c41f/services/network/cross_origin_read_blocking_explainer.md
[modify] https://crrev.com/ebd882cf95df2b1a3376eefba28a410c9e76c41f/services/network/cross_origin_read_blocking_unittest.cc
[modify] https://crrev.com/ebd882cf95df2b1a3376eefba28a410c9e76c41f/third_party/WebKit/LayoutTests/external/wpt/fetch/corb/resources/html-js-polyglot.js
[add] https://crrev.com/ebd882cf95df2b1a3376eefba28a410c9e76c41f/third_party/WebKit/LayoutTests/external/wpt/fetch/corb/resources/html-js-polyglot2.js
[add] https://crrev.com/ebd882cf95df2b1a3376eefba28a410c9e76c41f/third_party/WebKit/LayoutTests/external/wpt/fetch/corb/resources/html-js-polyglot2.js.headers
[modify] https://crrev.com/ebd882cf95df2b1a3376eefba28a410c9e76c41f/third_party/WebKit/LayoutTests/external/wpt/fetch/corb/script-html-js-polyglot.sub.html

NextAction: 2018-05-10
Thank you lukasza@, pls update the bug with canary result on Thursday and request a merge to M67 if canary result looks good. 
Labels: Target-67
Status: Fixed (was: Started)
Commit ebd882cf... initially landed in 68.0.3425.0.  I've tried this Canary version on a Mac and while http://www.pythonforbeginners.com/files/reading-and-writing-files-in-python still hit quite a few "Blocked current origin from receiving cross-site document" messages, I didn't see any html/js polyglots after  extracting the URLs of the affected resources, downloading them via wget and manually inspecting the response headers and bodies.  Note that some requests were rejected when done by the wget client (like www.atdmt.com one), but looking at the response manually shows that it has content-length: 0.

Let's give this a bit more time until Thursday and request the merge tomorrow if everything looks good.
Labels: Merge-TBD
[Auto-generated comment by a script] We noticed that this issue is targeted for M-67; it appears the fix may have landed after branch point, meaning a merge might be required. Please confirm if a merge is required here - if so add Merge-Request-67 label, otherwise remove Merge-TBD label. Thanks.
The NextAction date has arrived: 2018-05-10
Cc: gov...@chromium.org
Labels: Merge-Request-67
Things are still looking good (e.g. no magic signature looks related when I look at crashes from 68.0.3425.0).  I've already confirmed yesterday that the fix works as intended on Canary - see #c11 above.  So - let me request a merge to M67.
Project Member

Comment 15 by sheriffbot@chromium.org, May 10 2018

Labels: -Merge-Request-67 Merge-Review-67 Hotlist-Merge-Review
This bug requires manual review: M67 has already been promoted to the beta branch, so this requires manual review
Please contact the milestone owner if you have questions.
Owners: cmasso@(Android), cmasso@(iOS), kbleicher@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Merge-TBD -Merge-Review-67 Merge-Approved-67
Approving merge to M67 branch 3396 based on comment #8, #11 and #14. Please merge ASAP. Thank you.
Project Member

Comment 17 by bugdroid1@chromium.org, May 10 2018

Labels: -merge-approved-67 merge-merged-3396
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/54eb5b35cd06d8092d0c9a91a6f6c0479e1f1668

commit 54eb5b35cd06d8092d0c9a91a6f6c0479e1f1668
Author: Lukasz Anforowicz <lukasza@chromium.org>
Date: Thu May 10 18:35:34 2018

Handle more html/js polyglots in CORB confirmation sniffing.

Cross-Origin Read Blocking (CORB) tries to protect certain resource
types (e.g. text/html).  To be resilient against HTTP responses
mislabeled with an incorrect Content-Type, CORB sniffs the response body
to confirm if it truly is the protected type.

Before this CL the confirmation sniffing logic blocked resources that
are both a valid html and a valid javascript.  Blocking of such
resources is undesirable, because it is disruptive to existing websites
that use such polyglot responses in <script> tags.

After this CL, CORB sniffer takes into account the
https://www.ecma-international.org/ecma-262/8.0/index.html#prod-annexB-SingleLineHTMLCloseComment
rule which means that the sniffing doesn't resume immediately after
"-->" characters, but instead also consumes all the characters until the
first line terminator.

Bug:  839945 
Cq-Include-Trybots: master.tryserver.chromium.linux:linux_mojo
Change-Id: I7c8221acc2013adffe8095d188ae22e1c6a2fdab
Reviewed-on: https://chromium-review.googlesource.com/1047851
Commit-Queue: Charlie Reis <creis@chromium.org>
Reviewed-by: Charlie Reis <creis@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#557018}
Reviewed-on: https://chromium-review.googlesource.com/1054192
Cr-Commit-Position: refs/branch-heads/3396@{#549}
Cr-Branched-From: 9ef2aa869bc7bc0c089e255d698cca6e47d6b038-refs/heads/master@{#550428}
[modify] https://crrev.com/54eb5b35cd06d8092d0c9a91a6f6c0479e1f1668/content/browser/loader/cross_site_document_blocking_browsertest.cc
[modify] https://crrev.com/54eb5b35cd06d8092d0c9a91a6f6c0479e1f1668/content/browser/loader/cross_site_document_resource_handler_unittest.cc
[add] https://crrev.com/54eb5b35cd06d8092d0c9a91a6f6c0479e1f1668/content/test/data/site_isolation/js-html-polyglot2.html
[modify] https://crrev.com/54eb5b35cd06d8092d0c9a91a6f6c0479e1f1668/services/network/cross_origin_read_blocking.cc
[modify] https://crrev.com/54eb5b35cd06d8092d0c9a91a6f6c0479e1f1668/services/network/cross_origin_read_blocking_explainer.md
[modify] https://crrev.com/54eb5b35cd06d8092d0c9a91a6f6c0479e1f1668/services/network/cross_origin_read_blocking_unittest.cc
[modify] https://crrev.com/54eb5b35cd06d8092d0c9a91a6f6c0479e1f1668/third_party/WebKit/LayoutTests/external/wpt/fetch/corb/resources/html-js-polyglot.js
[add] https://crrev.com/54eb5b35cd06d8092d0c9a91a6f6c0479e1f1668/third_party/WebKit/LayoutTests/external/wpt/fetch/corb/resources/html-js-polyglot2.js
[add] https://crrev.com/54eb5b35cd06d8092d0c9a91a6f6c0479e1f1668/third_party/WebKit/LayoutTests/external/wpt/fetch/corb/resources/html-js-polyglot2.js.headers
[modify] https://crrev.com/54eb5b35cd06d8092d0c9a91a6f6c0479e1f1668/third_party/WebKit/LayoutTests/external/wpt/fetch/corb/script-html-js-polyglot.sub.html

Sign in to add a comment