
Issue 612043


Issue metadata

Status: Fixed
Owner:
Closed: May 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 2
Type: Bug




Investigate using Unicode normalization (through ICU's Normalizer2)

Project Member Reported by ma...@chromium.org, May 15 2016

Issue description

On the server side, address comparison improves when addresses are normalized before they are compared. Normalization makes it possible to get rid of diacritics (é -> e), for example. The server uses the NFKD normalizer, and that variant (along with many others) is available in ICU's Normalizer2 class [1].

We should investigate whether Normalizer2 can help us in address comparison. This would be a good fix for the week leading up to the M-52 branch. A first point to investigate is generalizing CaseInsensitiveCompare [2].

[1] https://code.google.com/p/chromium/codesearch#chromium/src/third_party/icu/source/common/unicode/normalizer2.h&q=Normalizer2&sq=package:chromium&type=cs&l=78

[2] https://code.google.com/p/chromium/codesearch#chromium/src/components/autofill/core/common/autofill_l10n_util.h&rcl=1463311200&l=23
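
For illustration only, the idea in the description can be sketched with Python's standard `unicodedata` module rather than ICU's Normalizer2. The function names `normalize_for_comparison` and `addresses_match` are hypothetical and do not come from the Chromium code. Note that NFKD by itself only decomposes "é" into "e" plus a combining accent; the sketch assumes the combining marks are then dropped as a separate step.

```python
import unicodedata


def normalize_for_comparison(text: str) -> str:
    """Hypothetical helper: NFKD-decompose, drop combining marks, case-fold."""
    # NFKD splits "é" into "e" + U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFKD", text)
    # Characters with a non-zero combining class are the detached accents.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower() intended for caseless matching.
    return stripped.casefold()


def addresses_match(a: str, b: str) -> bool:
    """Hypothetical helper: compare two address strings after normalization."""
    return normalize_for_comparison(a) == normalize_for_comparison(b)
```

With this sketch, `addresses_match("123 Rue Sainte-Cécile, Montréal", "123 rue sainte-cecile, montreal")` evaluates to `True`. Because NFKD is a compatibility decomposition, it also folds characters such as ligatures and full-width letters, which may or may not be desirable for address comparison.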
 

Comment 1 by ma...@chromium.org, May 15 2016

Summary: Investigate using Unicode normalization (through ICU's Normalizer2) (was: Investigate using Unicode's Normalizer2)

Comment 2 by rogerm@chromium.org, May 26 2016

Cc: -rogerm@chromium.org se...@chromium.org
Owner: rogerm@chromium.org
ICU Transliterator is your friend.
Project Member

Comment 3 by bugdroid1@chromium.org, May 30 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/96e2312ade0d985d9b2c4877f54dfacd7939bc9f

commit 96e2312ade0d985d9b2c4877f54dfacd7939bc9f
Author: rogerm <rogerm@chromium.org>
Date: Mon May 30 18:16:56 2016

Remove diacritics when normalizing autofill profile strings for comparison.

Uses the ICU Transliterator to remove accents (and other non-spacing marks)
from characters that have an ASCII equivalent and transforms uppercase chars
to lower case, while leaving other characters unchanged.

See: http://userguide.icu-project.org/transforms/general

BUG= 612043 
R=mathp@chromium.org, sebsg@chromium.org

Review-Url: https://codereview.chromium.org/2013063002
Cr-Commit-Position: refs/heads/master@{#396748}

[modify] https://crrev.com/96e2312ade0d985d9b2c4877f54dfacd7939bc9f/components/autofill/core/browser/autofill_manager_unittest.cc
[modify] https://crrev.com/96e2312ade0d985d9b2c4877f54dfacd7939bc9f/components/autofill/core/browser/autofill_profile.cc
[modify] https://crrev.com/96e2312ade0d985d9b2c4877f54dfacd7939bc9f/components/autofill/core/browser/autofill_profile_unittest.cc
[modify] https://crrev.com/96e2312ade0d985d9b2c4877f54dfacd7939bc9f/components/autofill/core/browser/personal_data_manager_unittest.cc

Comment 4 by rogerm@chromium.org, May 30 2016

Status: Fixed (was: Assigned)
We might be able to leverage ICU for other aspects of our address normalization... it seems to be a pretty generalizable/programmable sed-like facility... but, for the purposes of this bug, this is resolved.
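
As a rough sketch of the kind of transform chain described in the commit above (decompose, remove non-spacing marks, lowercase, recompose), here is a standard-library Python approximation. It does not use ICU, and the function name `strip_marks_and_lower` is hypothetical. The ICU user guide documents compound transform IDs of the form `NFD; [:Nonspacing Mark:] Remove; NFC`; whether the commit uses exactly that rule is an assumption here.

```python
import unicodedata


def strip_marks_and_lower(text: str) -> str:
    """Hypothetical stand-in for an ICU-style transform chain.

    Steps: canonical decomposition (NFD), removal of non-spacing marks
    (general category Mn), lowercasing, then recomposition (NFC).
    """
    decomposed = unicodedata.normalize("NFD", text)
    no_marks = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return unicodedata.normalize("NFC", no_marks.lower())
```

Unlike the NFKD approach mentioned in the issue description, canonical decomposition leaves compatibility characters alone, so text with no accents to strip (for example "東京") passes through unchanged apart from lowercasing.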
