New issue
Advanced search Search tips

Issue 841549 link

Starred by 2 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocking:
issue 268640



Sign in to add a comment

Performance improvement opportunity: CORB sniffing

Project Member Reported by lukasza@chromium.org, May 9 2018

Issue description

There might be some performance improvement opportunities in CORB sniffing implementation.

- HTML/XML/JSON sniffing is currently redone from scratch after each network packet.  This means that some degenerate cases take O(N^2) instead of O(N) time.  We could fix this by introducing state (which is also a prerequisite for fixing issue 795470) although it is not clear to me what kind of state (manually implemented state machine?  regex/yacc compiled into a state machine?).  CORB sniffing is in theory based on context-free BNF rules, but AFAICT they can be replaced with an equivalent regex.

- There might be some extra work done when looking for a line terminator in MaybeSkipHtmlComment.  For example - looking through the *whole* input string when a \u2028 character might be present in the first few characters of the input string.  This means that in some cases O(N) work will be done instead of O(K) (where K is the number of characters needed to be looked at [e.g. don't have to look beyond "<!-- -->\n<html"] and N is the length of the input string [e.g. "<!-- -->\n<html> no \u2028 character in a long string"]


 
I see that RE2 library is listed as allowed in content/browser/DEPS and chrome/browser/DEPS, but AFAICT it only exposes methods for doing a match from scratch (no way to resume a match once more of the input string is available).

AFAICT flex also doesn't support resuming.

Maybe we could manually introduce a state - for HTML this could be a tuple of (currentPos, sniffingState) where sniffingState is one of (normal, inside-html-comment, after-html-comment-before-line-terminator).

Or maybe we could write our own customized tool that takes a regex as input and spits out code that conforms to the CrossOriginReadBlocking::ResponseAnalyzer::ConfirmationSniffer o_O


I guess even non-resumable RE2 lib can help improve searching for "(\r|\n|\u2028|\u2029)" but this would feel a bit heavyweight.  I guess I can try to write a manual mini-state machine instead.
Owner: lukasza@chromium.org
Status: Started (was: Untriaged)
WIP CL with the mini-state machine for line terminator search: https://crrev.com/c/1053243
Project Member

Comment 3 by bugdroid1@chromium.org, May 14 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/40cba7aed7e82197e414b22b8ec874b1dc51aa01

commit 40cba7aed7e82197e414b22b8ec874b1dc51aa01
Author: Lukasz Anforowicz <lukasza@chromium.org>
Date: Mon May 14 22:53:32 2018

Performance improvement for searching for line terminators.

Bug: 841549
Cq-Include-Trybots: master.tryserver.chromium.linux:linux_mojo
Change-Id: I259234712ef9dbef570bca26ce3e5028da6a83a3
Reviewed-on: https://chromium-review.googlesource.com/1053243
Commit-Queue: Ɓukasz Anforowicz <lukasza@chromium.org>
Reviewed-by: Charlie Reis <creis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#558502}
[modify] https://crrev.com/40cba7aed7e82197e414b22b8ec874b1dc51aa01/services/network/cross_origin_read_blocking.cc
[modify] https://crrev.com/40cba7aed7e82197e414b22b8ec874b1dc51aa01/services/network/cross_origin_read_blocking_unittest.cc

Cc: lukasza@chromium.org
Owner: ----
Status: Available (was: Started)
I don't have any immediate plans to look into preserving the state of sniffers in-between network packets (this situation seems rate) - marking as Available.

Sign in to add a comment