Issue 702321

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



Autofill: no need for a transliterator to compare strings while disregarding diacritics / case and taking normalization into account

Project Member Reported by js...@chromium.org, Mar 16 2017

Issue description

Came across this while investigating trybot failures (https://codereview.chromium.org/2755963002/; even without what I'm suggesting here, it should not fail).

In https://codereview.chromium.org/2041413004, the following was done:

Normalize fields before comparison to fold case, remove diacritics,
remove punctuation and collapse or remove whitespace.


  std::unique_ptr<icu::Transliterator> transliterator(
      icu::Transliterator::createInstance(
          "NFD; [:Nonspacing Mark:] Remove; Lower; NFC", UTRANS_FORWARD,
          status));

However, "NFD; [:Nonspacing Mark:] Remove; Lower; NFC" is not necessary, because an ICU Collator is already used (or can be used) to compare strings with strength set to PRIMARY (in some locales, the default might not be PRIMARY, though). Normalization is also taken care of by the ICU collator.
PRIMARY strength means both diacritic and case differences are ignored.


In the following code snippet, strength can be set to PRIMARY explicitly for the 1st ctor to avoid any locale difference. 

----------
CaseInsensitiveCompare::CaseInsensitiveCompare()
    : CaseInsensitiveCompare(icu::Locale::getDefault()) {}

CaseInsensitiveCompare::CaseInsensitiveCompare(const icu::Locale& locale)
    : collator_(GetCollatorForLocale(locale)) {
  if (collator_)
    collator_->setStrength(icu::Collator::PRIMARY);
}
----------------


 

Comment 1 by js...@chromium.org, Mar 16 2017

Cc: riesa@chromium.org
Roger, is there any reason to pre-normalize (including removal of diacritics and case-folding) values in autofill before comparison? 

If not, I can remove the transliteration part.

Comment 2 by rogerm@chromium.org, Mar 16 2017

For some structured comparisons (address fields, for example), there are
some transformations we apply (string replacement for abbreviations and
what not) that take place post normalization.

Comment 3 by js...@chromium.org, Mar 16 2017

Thanks, Roger. I'll leave it alone for now, but leave this issue open for the reason below:

Even replacement of abbreviations etc. can be done in a case-insensitive, diacritic-agnostic and Unicode-normalization-independent manner with ICU's string search API (+ some caller-side code) and an ICU collator (strength=PRIMARY).
