Autofill: don't need transliterator to compare strings disregarding diacritics / case and taking normalization into account
Issue description

I came across this while investigating trybot failures ( https://codereview.chromium.org/2755963002/ ; even without what I'm suggesting here, it should not fail).

In https://codereview.chromium.org/2041413004 , the following was done:

  Normalize fields before comparison to fold case, remove diacritics, remove punctuation and collapse or remove whitespace.

  std::unique_ptr<icu::Transliterator> transliterator(
      icu::Transliterator::createInstance(
          "NFD; [:Nonspacing Mark:] Remove; Lower; NFC", UTRANS_FORWARD, status));

However, "NFD; [:Nonspacing Mark:] Remove; Lower; NFC" is not necessary because an ICU Collator is already used (or can be used) to compare strings with strength set to PRIMARY (in some locales, it might not be PRIMARY, though). Normalization is also taken care of by the ICU collator. PRIMARY strength means that both diacritic and case differences are ignored.

In the following code snippet, strength can be set to PRIMARY explicitly for the first constructor to avoid any locale difference.

----------
CaseInsensitiveCompare::CaseInsensitiveCompare()
    : CaseInsensitiveCompare(icu::Locale::getDefault()) {}

CaseInsensitiveCompare::CaseInsensitiveCompare(const icu::Locale& locale)
    : collator_(GetCollatorForLocale(locale)) {
  if (collator_)
    collator_->setStrength(icu::Collator::PRIMARY);
}
----------------
Mar 16 2017
For some structured comparisons (address fields, for example), there are some transformations we apply (string replacement for abbreviations and whatnot) that take place post-normalization.
Mar 16 2017
Thanks, Roger. I'll leave it alone for now, but I'm keeping this issue open for the following reason: even replacement of abbreviations etc. can be done in a case-insensitive, diacritic-agnostic, and Unicode-normalization-independent manner with ICU's string search API (plus some caller-side code) and an ICU collator (strength=PRIMARY).
Comment 1 by js...@chromium.org, Mar 16 2017