Investigate using Unicode normalization (through ICU's Normalizer2)
Issue description

On the server side, improvements can be seen by first normalizing addresses before comparing them. Normalization gets rid of diacritics (é -> e), for example. We use the NFKD normalizer, and that variant (along with many others) is available in ICU's Normalizer2 class [1]. We should investigate whether Normalizer2 can help us in address comparison. This would be a good fix for the week leading up to the M-52 branch. A first investigation point is generalizing CaseInsensitiveCompare [2].

[1] https://code.google.com/p/chromium/codesearch#chromium/src/third_party/icu/source/common/unicode/normalizer2.h&q=Normalizer2&sq=package:chromium&type=cs&l=78
[2] https://code.google.com/p/chromium/codesearch#chromium/src/components/autofill/core/common/autofill_l10n_util.h&rcl=1463311200&l=23
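To make the idea concrete, here is a minimal sketch of NFKD-based comparison. It uses Python's standard `unicodedata` module as a stand-in for ICU's Normalizer2, so it is an illustration of the technique rather than the Chromium or server-side code. The helper names `normalize_for_comparison` and `addresses_match` are hypothetical. Note that NFKD on its own only decomposes characters (é becomes e plus a combining accent); dropping the combining marks is a separate step.

```python
import unicodedata


def normalize_for_comparison(text: str) -> str:
    """Fold a string for loose comparison (hypothetical helper)."""
    # NFKD splits "é" into "e" + U+0301 and maps compatibility
    # characters (ligatures, fullwidth forms) to plain equivalents.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop characters with a non-zero canonical combining class,
    # i.e. the accents that the decomposition exposed.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower() meant for caseless matching.
    return stripped.casefold()


def addresses_match(a: str, b: str) -> bool:
    """Compare two address strings after normalization."""
    return normalize_for_comparison(a) == normalize_for_comparison(b)
```

With this sketch, `addresses_match("12 Rue de l'Église, Montréal", "12 RUE DE L'EGLISE, MONTREAL")` returns `True`, which is the kind of generalized case-insensitive comparison the issue asks about.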
May 26 2016
ICU Transliterator is your friend.
May 30 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/96e2312ade0d985d9b2c4877f54dfacd7939bc9f

commit 96e2312ade0d985d9b2c4877f54dfacd7939bc9f
Author: rogerm <rogerm@chromium.org>
Date: Mon May 30 18:16:56 2016

Remove diacritics when normalizing autofill profile strings for comparison.

Uses the ICU Transliterator to remove accents (and other non-spacing marks) from characters that have an ASCII equivalent and transforms uppercase chars to lower case, while leaving other characters unchanged.

See: http://userguide.icu-project.org/transforms/general

BUG= 612043
R=mathp@chromium.org, sebsg@chromium.org
Review-Url: https://codereview.chromium.org/2013063002
Cr-Commit-Position: refs/heads/master@{#396748}

[modify] https://crrev.com/96e2312ade0d985d9b2c4877f54dfacd7939bc9f/components/autofill/core/browser/autofill_manager_unittest.cc
[modify] https://crrev.com/96e2312ade0d985d9b2c4877f54dfacd7939bc9f/components/autofill/core/browser/autofill_profile.cc
[modify] https://crrev.com/96e2312ade0d985d9b2c4877f54dfacd7939bc9f/components/autofill/core/browser/autofill_profile_unittest.cc
[modify] https://crrev.com/96e2312ade0d985d9b2c4877f54dfacd7939bc9f/components/autofill/core/browser/personal_data_manager_unittest.cc
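For readers without the change open, the behavior described in the commit message can be approximated as below. This is a sketch under stated assumptions: the commit's actual transliterator rule string is not quoted in this thread, so the chain "NFD; [:Nonspacing Mark:] Remove; Lower; NFC" is assumed here, modeled on the accent-removal example in the ICU transforms guide. The code emulates each step with Python's standard `unicodedata` module instead of calling ICU, and the function name `strip_accents_and_lowercase` is hypothetical.

```python
import unicodedata


def strip_accents_and_lowercase(text: str) -> str:
    """Emulate an assumed 'NFD; [:Nonspacing Mark:] Remove; Lower; NFC' chain."""
    # Step 1, "NFD": canonical decomposition, so "é" becomes "e" + U+0301.
    decomposed = unicodedata.normalize("NFD", text)
    # Step 2, "[:Nonspacing Mark:] Remove": drop general category Mn.
    without_marks = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Step 3, "Lower": map uppercase characters to lowercase.
    lowered = without_marks.lower()
    # Step 4, "NFC": recompose whatever is left into canonical form.
    return unicodedata.normalize("NFC", lowered)
```

Characters with no canonical decomposition pass through apart from lowercasing, which matches the "leaving other characters unchanged" wording: "Crème Brûlée" becomes "creme brulee", while "Ørsted" only becomes "ørsted" because "ø" does not decompose into "o" plus a mark.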
May 30 2016
We might be able to leverage ICU for other aspects of our address normalization... it seems to be a pretty generalizable/programmable sed-like facility... but, for the purposes of this bug, this is resolved.
Comment 1 by ma...@chromium.org, May 15 2016