Issue 630113

Issue metadata

Status: Fixed
Owner:
Closed: Aug 2016
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Use CED encoding detection lib in place of ICU

Project Member Reported by jinsuk...@chromium.org, Jul 21 2016

Issue description

CED (Compact Encoding Detection), recently open-sourced, is a better alternative to ICU in terms of speed and accuracy. This bug tracks adoption of the new library, with a view to replacing every use of ICU encoding detection in the Chrome repository.

Blink already uses CED for autodetection: crrev.com/2138643002
 
Quoted from crrev.com/2081653007 for more background:

Original issue's description:
> Replace ICU with CED for auto encoding detection
>
> This is a drop-in replacement of ICU library performing automatic text
> encoding detection with CED (Compact Encoding Detection).
>
> CED is used extensively in Google for every crawled web page,
> email message, query string, etc., and recently open-sourced for
> public use. (https://github.com/google/compact_enc_det)
>
> It is also a much better alternative to ICU in terms of speed.
> ICU introduces a significant regression in page loading (up to 30%):
>
> = ICU auto-detection vs. TOT =
> page_cycler.typical_25:cold_times.page_load_time ms 1085.13±9.31% vs. 754.28±12.03% (30.49%)
>
> http://storage.googleapis.com/chromium-telemetry/html-results/results-2016-05-08_21-20-58
>
> while CED adds virtually no additional loading time (delta < sigma):
>
> = CED auto-detection vs. TOT =
> page_cycler.typical_25:cold_times.page_load_time ms 705.70±9.49% vs. 760.31±11.90% (-7.74%)
>
> http://storage.googleapis.com/chromium-telemetry/html-results/results-2016-05-08_20-37-54
>
> With CED, it is feasible to turn on auto encoding detection by default
> so that web pages without an encoding label can be handled. It will be
> done in a follow-up CL.
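
The percentages quoted above can be reproduced with a little arithmetic. The sketch below is not part of the original CL. It assumes that each "±" figure is a relative standard deviation and that the percentage in parentheses is the change relative to the run with auto-detection enabled. The function names are hypothetical and exist only for illustration.

```python
def relative_change(detection_ms, tot_ms):
    """Fraction of the detection run's load time attributable to detection.

    Positive means the detection build is slower than TOT (tip of tree).
    """
    return (detection_ms - tot_ms) / detection_ms


def within_noise(detection_ms, detection_rsd_pct, tot_ms, tot_rsd_pct):
    """True if the difference is smaller than either run's standard deviation.

    Assumption: the "±x%" values are relative standard deviations, so
    sigma = mean * x / 100. The smaller of the two sigmas is used, which is
    the stricter reading of "delta < sigma".
    """
    sigma_detection = detection_ms * detection_rsd_pct / 100.0
    sigma_tot = tot_ms * tot_rsd_pct / 100.0
    return abs(detection_ms - tot_ms) < min(sigma_detection, sigma_tot)


# Figures copied from the quoted telemetry lines.
icu = (1085.13, 9.31, 754.28, 12.03)
ced = (705.70, 9.49, 760.31, 11.90)

for name, (det, det_rsd, tot, tot_rsd) in (("ICU", icu), ("CED", ced)):
    change_pct = 100.0 * relative_change(det, tot)
    print(f"{name}: {change_pct:+.2f}% within noise: "
          f"{within_noise(det, det_rsd, tot, tot_rsd)}")
```

Under these assumptions the ICU difference (about 331 ms) is well above either run's standard deviation, while the CED difference (about 55 ms) falls below both, which matches the "delta < sigma" remark.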

Project Member

Comment 2 by bugdroid1@chromium.org, Jul 21 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/64cb3a3e39bf7972db8b1834a0ab5c23380bc6bb

commit 64cb3a3e39bf7972db8b1834a0ab5c23380bc6bb
Author: jinsukkim <jinsukkim@chromium.org>
Date: Thu Jul 21 22:45:27 2016

Rolling DEPS for third_party/ced

This CL cleans up unnecessary build configuration to facilitate
Chrome build.

BUG= 630113 
TEST=gclient sync, make

Review-Url: https://codereview.chromium.org/2169843002
Cr-Commit-Position: refs/heads/master@{#406976}

[modify] https://crrev.com/64cb3a3e39bf7972db8b1834a0ab5c23380bc6bb/DEPS

Project Member

Comment 3 by bugdroid1@chromium.org, Jul 27 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/65676f001882f6273c8745d786ab906fc4079ad6

commit 65676f001882f6273c8745d786ab906fc4079ad6
Author: jinsukkim <jinsukkim@chromium.org>
Date: Wed Jul 27 02:48:47 2016

Rolling DEPS for third_party/ced

9012c0a Introduce HTML5 mode

BUG= 630113 
TEST=gclient sync, make
TBR=brettw@chromium.org

Review-Url: https://codereview.chromium.org/2188663002
Cr-Commit-Position: refs/heads/master@{#408023}

[modify] https://crrev.com/65676f001882f6273c8745d786ab906fc4079ad6/DEPS

Project Member

Comment 4 by bugdroid1@chromium.org, Aug 10 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/70de1704feea35840fbed4bc1e864a24b620569e

commit 70de1704feea35840fbed4bc1e864a24b620569e
Author: Jinsuk Kim <jinsukkim@chromium.org>
Date: Wed Aug 10 22:23:56 2016

Replace ICU encoding detection with CED

CED (Compact Encoding Detection) is a better alternative to
ICU in terms of speed and accuracy. This CL switches the encoding
detection library to CED and fixes some of the failing cases.

BUG= 630113 
R=armansito@chromium.org, cbentzel@chromium.org, ellyjones@chromium.org, jshin@chromium.org, phajdan.jr@chromium.org, sievers@chromium.org, thestig@chromium.org

Review URL: https://codereview.chromium.org/2168003003 .

Cr-Commit-Position: refs/heads/master@{#411163}

[modify] https://crrev.com/70de1704feea35840fbed4bc1e864a24b620569e/base/BUILD.gn
[modify] https://crrev.com/70de1704feea35840fbed4bc1e864a24b620569e/base/DEPS
[modify] https://crrev.com/70de1704feea35840fbed4bc1e864a24b620569e/base/base.gyp
[modify] https://crrev.com/70de1704feea35840fbed4bc1e864a24b620569e/base/base.gypi
[modify] https://crrev.com/70de1704feea35840fbed4bc1e864a24b620569e/base/base_nacl.gyp
[add] https://crrev.com/70de1704feea35840fbed4bc1e864a24b620569e/base/i18n/encoding_detection.cc
[add] https://crrev.com/70de1704feea35840fbed4bc1e864a24b620569e/base/i18n/encoding_detection.h
[delete] https://crrev.com/713aedbdd27d1bd0d431a02220748c740b471af9/base/i18n/icu_encoding_detection.cc
[delete] https://crrev.com/713aedbdd27d1bd0d431a02220748c740b471af9/base/i18n/icu_encoding_detection.h
[modify] https://crrev.com/70de1704feea35840fbed4bc1e864a24b620569e/chromeos/network/network_state_unittest.cc
[modify] https://crrev.com/70de1704feea35840fbed4bc1e864a24b620569e/chromeos/network/shill_property_util.cc
[modify] https://crrev.com/70de1704feea35840fbed4bc1e864a24b620569e/chromeos/test/data/network/shill_wifi_non_utf8_ssid.json
[modify] https://crrev.com/70de1704feea35840fbed4bc1e864a24b620569e/chromeos/test/data/network/translation_of_shill_wifi_non_utf8_ssid.onc
[modify] https://crrev.com/70de1704feea35840fbed4bc1e864a24b620569e/content/child/BUILD.gn
[modify] https://crrev.com/70de1704feea35840fbed4bc1e864a24b620569e/content/child/ftp_directory_listing_response_delegate.cc
[modify] https://crrev.com/70de1704feea35840fbed4bc1e864a24b620569e/ios/net/protocol_handler_util.mm
[modify] https://crrev.com/70de1704feea35840fbed4bc1e864a24b620569e/net/data/ftp/dir-listing-ls-20.expected
[modify] https://crrev.com/70de1704feea35840fbed4bc1e864a24b620569e/net/data/ftp/dir-listing-ls-21.expected
[modify] https://crrev.com/70de1704feea35840fbed4bc1e864a24b620569e/net/data/ftp/dir-listing-ls-22.expected
[modify] https://crrev.com/70de1704feea35840fbed4bc1e864a24b620569e/net/ftp/ftp_directory_listing_parser.cc
[modify] https://crrev.com/70de1704feea35840fbed4bc1e864a24b620569e/pdf/pdfium/pdfium_engine.cc
[modify] https://crrev.com/70de1704feea35840fbed4bc1e864a24b620569e/third_party/ced/BUILD.gn
[modify] https://crrev.com/70de1704feea35840fbed4bc1e864a24b620569e/third_party/ced/ced.gyp
[add] https://crrev.com/70de1704feea35840fbed4bc1e864a24b620569e/third_party/ced/ced.gypi
[add] https://crrev.com/70de1704feea35840fbed4bc1e864a24b620569e/third_party/ced/ced_nacl.gyp

Status: Fixed (was: Untriaged)
Project Member

Comment 6 by bugdroid1@chromium.org, Mar 9 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/706a26bfd8ee3677c8b59fc1a6859276ab7686c7

commit 706a26bfd8ee3677c8b59fc1a6859276ab7686c7
Author: jinsukkim <jinsukkim@chromium.org>
Date: Thu Mar 09 05:48:02 2017

Rolling DEPS for third_party/ced

e21eb6a Post-detection mapping for HTML5 mode

BUG= 630113 
TEST=gclient sync, make
TBR=brettw@chromium.org

Review-Url: https://codereview.chromium.org/2736423002
Cr-Commit-Position: refs/heads/master@{#455675}

[modify] https://crrev.com/706a26bfd8ee3677c8b59fc1a6859276ab7686c7/DEPS

Project Member

Comment 7 by bugdroid1@chromium.org, Sep 12 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8ecd07bb059073145fbba3d9d6c26e84e649edc6

commit 8ecd07bb059073145fbba3d9d6c26e84e649edc6
Author: Jinsuk Kim <jinsukkim@chromium.org>
Date: Tue Sep 12 22:09:35 2017

Rolling DEPS for third_party/ced

94c367a Exclude UTF-16 encoding for automatic detection

BUG= 630113 
Test=gclient sync, make
TBR=brettw@chromium.org

Change-Id: I8457204afbeb489a8259bda34314b1dca30135fe
Reviewed-on: https://chromium-review.googlesource.com/662082
Reviewed-by: Jinsuk Kim <jinsukkim@chromium.org>
Commit-Queue: Jinsuk Kim <jinsukkim@chromium.org>
Cr-Commit-Position: refs/heads/master@{#501423}
[modify] https://crrev.com/8ecd07bb059073145fbba3d9d6c26e84e649edc6/DEPS