Remove background HTML parser |
|||||
Issue descriptionThe background HTML parser consists of rather a large amount of complicated code, was supposed to execute asynchronously (but currently doesn't) and causes issues (e.g. exacerbating https://bugs.chromium.org/p/chromium/issues/detail?id=777763). There's a performance advantage to removing it of about 5% on TTFCP.
,
Nov 1
I'm very interested in your findings, can you share data/code?
,
Nov 2
Absolutely, the patches are undergoing internal review ATM (this bug's open to track them and any layout test regressions), and will be uploaded once this is done. The full result file's too big to upload (it's about 250 MB), but I've attached a screenshot of some WIP results. Would you be interesting in reviewing the patches once they're uploaded?
,
Nov 2
(Also attaching TTFMP shot). Should mention that there are some heuristics inside the patch which still need tuning, so hopefully some of the big percentage regressions will come down. Rough plan is to implement behind a flag so any layout test regressions can be fixed.
,
Nov 2
+cc html/parser/OWNERS who may be interested
,
Nov 2
Yep this interests me, and I'm not surprised there are performance wins in removing it.
,
Nov 6
Thanks for splitting CLs. Would you mind sharing the high-level picture of the change before landing them? I'd love to see a design doc if possible, or the entire diff at least. In general, I agree that there are a lot of technical debt that are also causing performance issues. However, I also want to be careful not to make everything synchronous, since they can affect responsiveness.
,
Nov 6
,
Nov 14
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/9d9cd7eec7fc024ebb8debfabae0f745dc706152 commit 9d9cd7eec7fc024ebb8debfabae0f745dc706152 Author: Richard Townsend <Richard.Townsend@arm.com> Date: Wed Nov 14 19:40:37 2018 Add CSP meta detection to HTMLPreloadScanner::Scan The content-security policy meta tag must be read before preloads can be dispatched. This change allows the foreground HTML parser to use a single HTMLPreloadScanner to read when this tag's been consumed. It can then dispatch preloads at the correct time. Bug: 901056 Change-Id: I70d0c75132b4203992f3b9d16db5c4442037ab13 Reviewed-on: https://chromium-review.googlesource.com/c/1320853 Commit-Queue: Richard Townsend <richard.townsend@arm.com> Reviewed-by: Charlie Harrison <csharrison@chromium.org> Cr-Commit-Position: refs/heads/master@{#608086} [modify] https://crrev.com/9d9cd7eec7fc024ebb8debfabae0f745dc706152/third_party/blink/renderer/core/html/parser/html_document_parser.cc [modify] https://crrev.com/9d9cd7eec7fc024ebb8debfabae0f745dc706152/third_party/blink/renderer/core/html/parser/html_preload_scanner.cc [modify] https://crrev.com/9d9cd7eec7fc024ebb8debfabae0f745dc706152/third_party/blink/renderer/core/html/parser/html_preload_scanner.h [modify] https://crrev.com/9d9cd7eec7fc024ebb8debfabae0f745dc706152/third_party/blink/renderer/core/html/parser/html_preload_scanner_fuzzer.cc [modify] https://crrev.com/9d9cd7eec7fc024ebb8debfabae0f745dc706152/third_party/blink/renderer/core/html/parser/html_preload_scanner_test.cc
,
Nov 16
@kouhei: So https://chromium-review.googlesource.com/c/chromium/src/+/1333755/4 is roughly what I'm thinking: keep the existing synchronous parser but add a budget that controls the maximum it's allowed to parse in one go (this allows it to maintain responsiveness). I'd quite like to be able to review as-is, but I seem to be hitting some flakiness on Windows (basically, the changes made to html_document_parser.cc end up crashing the renderer process when it displays chrome://welcome_win10, making all the installation tests time out. This problem only seems to affect this particular page). I'm trying to get access to a powerful enough machine to build a Windows build and track down what's going on, but if I can't do that then the plan looks like this: * Produce a BaseHTMLDocumentParser that can attach to Document, and all the other places it's used. * Duplicate the existing HTMLDocumentParser to LegacyHTMLDocumentParser and ForegroundHTMLDocumentParser * Remove all traces of the BackgroundHTMLParser from ForegroundHTMLParser, place behind a flag. * Fix the layout test regressions behind the flag. * Eventually, the flag becomes the default. * Remove the LegacyHTMLDocumentParser, BaseHTMLDocumentParser. * Rename ForegroundHTMLDocumentParser back to HTMLDocumentParser. Does that sound reasonable to you?
,
Nov 19
(Reposted with a typo correction). OK, so despite my best efforts, https://chromium-review.googlesource.com/c/chromium/src/+/1333755/ still seems to be hitting some state management/timing issues meaning that it locks up on Windows systems (debugging it in my Windows VM, it looks like there's a circular pattern of HTMLImport::StateDidChange, calling HTMLDocumentParser::ExecuteScriptsWaitingForResources, which ends up calling HTMLImport::StateDidChange again, all with a bunch of squiggly asynchronous callback stuff in the middle). It's not clear why this only affects Windows, but I think complete separation of the two HTML parsers, gating via a runtime flag is probably the most reliable option for the time being.
,
Nov 21
+cc: hayato Thanks for the explanation and the CL. re: plan, I'd like to see it in more detail. If you can create a design doc, it would be great. Sorry for the delay on the reviews. I'm currently overloaded so I'm asking hayato@ (DOM TL ccd) if he can find another reviewer for this.
,
Nov 26
Here's a design doc outlining some potential options: https://docs.google.com/document/d/1qbKybVe3Vx85lTQTDXWunxg-jKxEs4b80APD_MxB6Nk/edit?usp=sharing Basically, how best to proceed boils down to these two decisions: * Should we wait until HTML Imports are removed? * Should we have two HTML parsers for a while, so that the behaviour change can be more reliably isolated behind a feature flag?
,
Dec 3
@kouhei: looking at https://bugs.chromium.org/p/chromium/issues/detail?id=766694, it's not looking like there's much progress at the moment RE removing HTML Imports. I'm tilting towards duplicating the parser (see image) and putting the budgeted parser entirely behind the feature flag for the time being, what do you think? |
|||||
►
Sign in to add a comment |
|||||
Comment 1 by richard....@arm.com
, Nov 1