New issue
Advanced search Search tips

Issue 675493 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Jul 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



Sign in to add a comment

containsOnlyASCII() scans strings many times unnecessarily

Project Member Reported by csharrison@chromium.org, Dec 19 2016

Issue description

There are many places in Blink code where we want to know if the string contains only ASCII characters. This often happens when we want to get access to the strings UTF8 bytes using WebString::utf8 or StringUTF8Adaptor. 

Today, each check involves scanning each character of the string, even if we know (via some property of a parser like our URL parser) that the string definitely only has ASCII characters. For many Strings we do this kind of scanning many times.

We should:
1. Cache a bit in StringImpl if the String contains only ASCII characters. This can be lazily set on the first call to containsOnlyASCII().

2. Add a setter (or constructor) to StringImpl which automatically sets the containsOnlyASCII bits. I could see this used in a few different places:
 - substring()
 - StringView::toString()
 - String copies
 - URL parts (e.g. KURL::host will always contain ascii, even if the ref part doesn't)
 
Project Member

Comment 1 by bugdroid1@chromium.org, Jan 5 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/36ecc9ec6ab699d9300f50fd4ca26030acad3e26

commit 36ecc9ec6ab699d9300f50fd4ca26030acad3e26
Author: csharrison <csharrison@chromium.org>
Date: Thu Jan 05 00:52:37 2017

Cache contains only ascii in StringImpl

This patch adds a lazily calculated value in StringImpl for whether the
string contains only ascii characters. This avoids code that re-scans
the same string many times.

BUG= 675493 

Review-Url: https://codereview.chromium.org/2585063002
Cr-Commit-Position: refs/heads/master@{#441535}

[modify] https://crrev.com/36ecc9ec6ab699d9300f50fd4ca26030acad3e26/third_party/WebKit/Source/wtf/text/ASCIIFastPath.h
[modify] https://crrev.com/36ecc9ec6ab699d9300f50fd4ca26030acad3e26/third_party/WebKit/Source/wtf/text/StringImpl.cpp
[modify] https://crrev.com/36ecc9ec6ab699d9300f50fd4ca26030acad3e26/third_party/WebKit/Source/wtf/text/StringImpl.h
[modify] https://crrev.com/36ecc9ec6ab699d9300f50fd4ca26030acad3e26/third_party/WebKit/Source/wtf/text/WTFString.h

Status: Fixed (was: Started)
Marking as fixed for now. (1) is implemented and if we find that (2) would be useful in profiling we could re-visit.

I think (1) solves the vast majority of cases.

Sign in to add a comment