Investigate pak file duplicate entries |
|||
Issue description
Running:
unzip ChromePublic.apk assets/en-US.pak
tools/grit/pak_util.py print assets/en-US.pak | cut -d: -f2 | sort | uniq -c | sort -rn
Shows:
8 This server could not prove that it is <strong>$1</strong>; ...
4 You attempted to reach <strong>$1</strong>, but the server p...
4 Status
4 Reload
4 Other…
4 Exp
4 Details
4 Block
4 Allow
3 Unknown error
3 Times New Roman
3 Save
3 OK
3 Never
3 Name
3 Learn more
3 Hide details
3 Google Safe Browsing recently <a href="#" id="diagnostic-lin...
3 Continue
3 Chromium
3 <a jsvalues="href
3 $1, $2
2 ZIP code
2 Year
2 Week
2 Waiting for $1…
2 video
2 User
2 Update
2 Today
2 This ZIP code format is not recognized. Example of a valid Z...
2 This ZIP code does not appear to match the rest of this addr...
2 This postal code format is not recognized. Example of a vali...
2 This postal code does not appear to match the rest of this a...
2 The certificate chain for this site contains a certificate s...
2 Submit
2 <strong jscontent="hostName"></strong> took too long to resp...
2 State
2 Show saved copy
2 Show original
2 Settings
2 Secure connection
2 {SECONDS, plural, =1 {1 second} other {# seconds}}
2 Refresh
2 Province
2 Prefecture
2 Postal code
2 Please lengthen this text to $2 characters or more (you are ...
2 play on remote device
2 Parish
2 Not secure
2 Notifications
2 None
2 Network error
2 Month
2 {MINUTES, plural, =1 {1 minute} other {# minutes}}
2 It may have been moved or deleted.
2 Island
2 If you understand the risks to your security, you may <a hre...
2 {HOURS, plural, =1 {1 hour} other {# hours}}
2 Guest
2 Further, this page includes other resources which are not se...
2 exit full screen
2 Emirate
2 District
2 Deprovisioned
2 Department
2 County
2 {COUNT, plural,\n =0 {None}\n =1 {1 item}\n other {...
2 control remote playback
2 Close
2 Chromium Autofill settings…
2 Chromium Apps
2 Back to safety
2 audio
2 Area
2 $1 with $2
2 {1, plural,\n =1 {This server could not prove that it is ...
1 yyyy
<snip>
Running:
strings en-US.pak | grep "Times New Roman"
shows that the strings are actually being stored multiple times.
While some of this duplication is probably unnecessary, duplication of strings for translation can also happen for words that are the same in english, but have different translations depending on context / semantics.
Rather than audit the strings, we should change the .pak file format to accommodate duplicates.
Idea #1:
Store two tables at the start of the file. One that maps IDs->offsets, and the other that maps IDs->IDs for duplicates.
* This would mean doing two binary searches for each ID lookup at runtime.
Idea #2:
Store just one table, but have duplicates point to the same offset within the file.
* This would require us to change the runtime logic that calculates: entry_length = next_offset - my_offset. It would instead become: entry_length = next_offset_that_is_larget_than_my_offset - my_offset.
Idea 2 adds in an O(n) search, but in reality, given the small number of duplicates, I suspect it will always be O(less than 3)
,
Jul 6 2017
Did a brainstorm with flackr about the file format, and we're going to go with: 1. Store an extra 2 bytes in the header for the number of aliases 2. Store the main table of entries as with today 3. Store a second table of id->index into main table 4. Search first table first, then search alias table 5. To be backwards compatible with v4, set the size of alias table to 0 when loading.
,
Jul 20 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/ade347f539a378fb37500f6d17e8835edc1d8ec0 commit ade347f539a378fb37500f6d17e8835edc1d8ec0 Author: agrieve <agrieve@chromium.org> Date: Thu Jul 20 12:48:12 2017 Add deduplication logic to .pak files Now, when multiple entries contain the same content, multiple table-of-contents entries will be created, but only one data region. As of now, en-US.pak has ~3200 entries, and 350 of them are duplicates. For MonochromePublic.apk, shrinks uncompressed .pak size by ~130kb, and compressed .pak size by 32kb. BUG= 738566 Review-Url: https://codereview.chromium.org/2969123002 Cr-Commit-Position: refs/heads/master@{#488215} [modify] https://crrev.com/ade347f539a378fb37500f6d17e8835edc1d8ec0/build/android/resource_sizes.py [modify] https://crrev.com/ade347f539a378fb37500f6d17e8835edc1d8ec0/tools/grit/grit/format/data_pack.py [modify] https://crrev.com/ade347f539a378fb37500f6d17e8835edc1d8ec0/tools/grit/grit/format/data_pack_unittest.py [modify] https://crrev.com/ade347f539a378fb37500f6d17e8835edc1d8ec0/tools/grit/pak_util.py [modify] https://crrev.com/ade347f539a378fb37500f6d17e8835edc1d8ec0/ui/base/BUILD.gn [modify] https://crrev.com/ade347f539a378fb37500f6d17e8835edc1d8ec0/ui/base/resource/data_pack.cc [modify] https://crrev.com/ade347f539a378fb37500f6d17e8835edc1d8ec0/ui/base/resource/data_pack.h [modify] https://crrev.com/ade347f539a378fb37500f6d17e8835edc1d8ec0/ui/base/resource/data_pack_literal.cc [add] https://crrev.com/ade347f539a378fb37500f6d17e8835edc1d8ec0/ui/base/resource/data_pack_literal.h [modify] https://crrev.com/ade347f539a378fb37500f6d17e8835edc1d8ec0/ui/base/resource/data_pack_unittest.cc [modify] https://crrev.com/ade347f539a378fb37500f6d17e8835edc1d8ec0/ui/base/resource/resource_bundle_unittest.cc
,
Jul 20 2017
,
Jul 20 2017
Great work, thanks for adding this :) It's 1000x better than having to manage the strings source files!
,
Jul 21 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/59f00a7d06b43ae4127637baaf7d8e8f5ee7847a commit 59f00a7d06b43ae4127637baaf7d8e8f5ee7847a Author: achuith <achuith@chromium.org> Date: Fri Jul 21 02:11:35 2017 Revert of Add deduplication logic to .pak files (patchset #10 id:180001 of https://codereview.chromium.org/2969123002/ ) Reason for revert: https://bugs.chromium.org/p/chromium/issues/detail?id=747171 Original issue's description: > Add deduplication logic to .pak files > > Now, when multiple entries contain the same content, multiple > table-of-contents entries will be created, but only one data region. > > As of now, en-US.pak has ~3200 entries, and 350 of them are duplicates. > > For MonochromePublic.apk, shrinks uncompressed .pak size by ~130kb, and > compressed .pak size by 32kb. > > BUG= 738566 > > Review-Url: https://codereview.chromium.org/2969123002 > Cr-Commit-Position: refs/heads/master@{#488215} > Committed: https://chromium.googlesource.com/chromium/src/+/ade347f539a378fb37500f6d17e8835edc1d8ec0 TBR=flackr@chromium.org,sadrul@chromium.org,agrieve@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG= 738566 Review-Url: https://codereview.chromium.org/2989443002 Cr-Commit-Position: refs/heads/master@{#488554} [modify] https://crrev.com/59f00a7d06b43ae4127637baaf7d8e8f5ee7847a/build/android/resource_sizes.py [modify] https://crrev.com/59f00a7d06b43ae4127637baaf7d8e8f5ee7847a/tools/grit/grit/format/data_pack.py [modify] https://crrev.com/59f00a7d06b43ae4127637baaf7d8e8f5ee7847a/tools/grit/grit/format/data_pack_unittest.py [modify] https://crrev.com/59f00a7d06b43ae4127637baaf7d8e8f5ee7847a/tools/grit/pak_util.py [modify] https://crrev.com/59f00a7d06b43ae4127637baaf7d8e8f5ee7847a/ui/base/BUILD.gn [modify] https://crrev.com/59f00a7d06b43ae4127637baaf7d8e8f5ee7847a/ui/base/resource/data_pack.cc [modify] https://crrev.com/59f00a7d06b43ae4127637baaf7d8e8f5ee7847a/ui/base/resource/data_pack.h [modify] https://crrev.com/59f00a7d06b43ae4127637baaf7d8e8f5ee7847a/ui/base/resource/data_pack_literal.cc [delete] https://crrev.com/b1c22e2d5380b32a6379f43fff578fff2c807930/ui/base/resource/data_pack_literal.h [modify] https://crrev.com/59f00a7d06b43ae4127637baaf7d8e8f5ee7847a/ui/base/resource/data_pack_unittest.cc [modify] https://crrev.com/59f00a7d06b43ae4127637baaf7d8e8f5ee7847a/ui/base/resource/resource_bundle_unittest.cc
,
Jul 24 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/1595ed49627bfa8af05e54112e27b6c904e160b2 commit 1595ed49627bfa8af05e54112e27b6c904e160b2 Author: achuith <achuith@chromium.org> Date: Mon Jul 24 19:56:53 2017 Revert of Add deduplication logic to .pak files (patchset #10 id:180001 of https://codereview.chromium.org/2969123002/ ) Reason for revert: https://bugs.chromium.org/p/chromium/issues/detail?id=747171 Original issue's description: > Add deduplication logic to .pak files > > Now, when multiple entries contain the same content, multiple > table-of-contents entries will be created, but only one data region. > > As of now, en-US.pak has ~3200 entries, and 350 of them are duplicates. > > For MonochromePublic.apk, shrinks uncompressed .pak size by ~130kb, and > compressed .pak size by 32kb. > > BUG= 738566 > > Review-Url: https://codereview.chromium.org/2969123002 > Cr-Commit-Position: refs/heads/master@{#488215} > Committed: https://chromium.googlesource.com/chromium/src/+/ade347f539a378fb37500f6d17e8835edc1d8ec0 TBR=flackr@chromium.org,sadrul@chromium.org,agrieve@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG= 738566 Review-Url: https://codereview.chromium.org/2989443002 Cr-Original-Commit-Position: refs/heads/master@{#488554} Change-Id: I33eff508d4177907d050424bee967d643fe022e4 Reviewed-on: https://chromium-review.googlesource.com/583542 Reviewed-by: Bernie Thompson <bhthompson@chromium.org> Cr-Commit-Position: refs/branch-heads/3163@{#13} Cr-Branched-From: ff259bab28b35d242e10186cd63af7ed404fae0d-refs/heads/master@{#488528} [modify] https://crrev.com/1595ed49627bfa8af05e54112e27b6c904e160b2/build/android/resource_sizes.py [modify] https://crrev.com/1595ed49627bfa8af05e54112e27b6c904e160b2/tools/grit/grit/format/data_pack.py [modify] https://crrev.com/1595ed49627bfa8af05e54112e27b6c904e160b2/tools/grit/grit/format/data_pack_unittest.py [modify] https://crrev.com/1595ed49627bfa8af05e54112e27b6c904e160b2/tools/grit/pak_util.py [modify] https://crrev.com/1595ed49627bfa8af05e54112e27b6c904e160b2/ui/base/BUILD.gn [modify] https://crrev.com/1595ed49627bfa8af05e54112e27b6c904e160b2/ui/base/resource/data_pack.cc [modify] https://crrev.com/1595ed49627bfa8af05e54112e27b6c904e160b2/ui/base/resource/data_pack.h [modify] https://crrev.com/1595ed49627bfa8af05e54112e27b6c904e160b2/ui/base/resource/data_pack_literal.cc [delete] https://crrev.com/48aa68957520402a20073d92b6c5f4965479528e/ui/base/resource/data_pack_literal.h [modify] https://crrev.com/1595ed49627bfa8af05e54112e27b6c904e160b2/ui/base/resource/data_pack_unittest.cc [modify] https://crrev.com/1595ed49627bfa8af05e54112e27b6c904e160b2/ui/base/resource/resource_bundle_unittest.cc
,
Jul 28 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/4735fae6caad85ab08433367108d3b96c0f7d646 commit 4735fae6caad85ab08433367108d3b96c0f7d646 Author: agrieve <agrieve@chromium.org> Date: Fri Jul 28 20:39:08 2017 Reland of Add deduplication logic to .pak files Revert was here: https://codereview.chromium.org/2989443002/ Reason for reland: No changes from origina. Bot issue fixed by: https://chromium-review.googlesource.com/c/589888/ Now, when multiple entries contain the same content, multiple table-of-contents entries will be created, but only one data region. As of now, en-US.pak has ~3200 entries, and 350 of them are duplicates. For MonochromePublic.apk, shrinks uncompressed .pak size by ~130kb, and compressed .pak size by 32kb. TBR=flackr@chromium.org,sadrul@chromium.org,achuith@chromium.org BUG= 738566 Review-Url: https://codereview.chromium.org/2984383002 Cr-Commit-Position: refs/heads/master@{#490501} [modify] https://crrev.com/4735fae6caad85ab08433367108d3b96c0f7d646/build/android/resource_sizes.py [modify] https://crrev.com/4735fae6caad85ab08433367108d3b96c0f7d646/tools/grit/grit/format/data_pack.py [modify] https://crrev.com/4735fae6caad85ab08433367108d3b96c0f7d646/tools/grit/grit/format/data_pack_unittest.py [modify] https://crrev.com/4735fae6caad85ab08433367108d3b96c0f7d646/tools/grit/pak_util.py [modify] https://crrev.com/4735fae6caad85ab08433367108d3b96c0f7d646/ui/base/BUILD.gn [modify] https://crrev.com/4735fae6caad85ab08433367108d3b96c0f7d646/ui/base/resource/data_pack.cc [modify] https://crrev.com/4735fae6caad85ab08433367108d3b96c0f7d646/ui/base/resource/data_pack.h [modify] https://crrev.com/4735fae6caad85ab08433367108d3b96c0f7d646/ui/base/resource/data_pack_literal.cc [add] https://crrev.com/4735fae6caad85ab08433367108d3b96c0f7d646/ui/base/resource/data_pack_literal.h [modify] https://crrev.com/4735fae6caad85ab08433367108d3b96c0f7d646/ui/base/resource/data_pack_unittest.cc [modify] https://crrev.com/4735fae6caad85ab08433367108d3b96c0f7d646/ui/base/resource/resource_bundle_unittest.cc |
|||
►
Sign in to add a comment |
|||
Comment 1 by agrieve@chromium.org
, Jul 4 2017