
Issue 749521

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Closed: Oct 15
Cc:
EstimatedDays: ----
NextAction: ----
OS: Android, All
Pri: 3
Type: Bug




Make pak file table of contents smaller (200kb)

Project Member Reported by agrieve@chromium.org, Jul 27 2017

Issue description

$ ../../tools/grit/pak_util.py list-id assets/en-US.pak |wc -l
1175
$ ../../tools/grit/pak_util.py list-id assets/resources.pak |wc -l
473
$ ../../tools/grit/pak_util.py list-id assets/chrome_100_percent.pak |wc -l
224

At 53 locales, we have 53*1175+224+473 = 62972 pak entries.


As for size, unzip -lv reports:
  211202  Stored   211202   0% 2001-01-01 00:00 31a7aa96  assets/chrome_100_percent.pak
 1874938  Stored  1874938   0% 2001-01-01 00:00 27062e69  assets/resources.pak
   54522  Defl:N    20280  63% 2001-01-01 00:00 395dbe47  assets/fi.pak
   60134  Defl:N    20540  66% 2001-01-01 00:00 1980577b  assets/fil.pak
   62705  Defl:N    21369  66% 2001-01-01 00:00 167b6134  assets/fr.pak
   69598  Defl:N    21550  69% 2001-01-01 00:00 6e2b2422  assets/he.pak
  116213  Defl:N    25570  78% 2001-01-01 00:00 4cfa7898  assets/hi.pak
...

Observations:
* Each entry is 6 bytes for the table of contents header (377832 bytes total).
* If we used two bytes for the file offset rather than 4, the overhead would be 251888 bytes (a quick check of these numbers follows this list).
* Locale pak files are stored compressed, so actual savings would likely be < 100k.
* resources.pak is almost 2MB. Using 2 byte offsets wouldn't help.
* Largest locale pak is ~116k bytes (assets/hi.pak). Average seems to be around 64k
  * Most entry offsets need only 2 bytes (64k)
  * All entry offsets would fit in two tables of 2 bytes (128k)
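
As a quick check of the arithmetic above, using the entry counts from the pak_util.py output (a throwaway snippet, not part of the tooling):

  NUM_LOCALES = 53
  ENTRIES_PER_LOCALE = 1175
  NON_LOCALE_ENTRIES = 224 + 473  # chrome_100_percent.pak + resources.pak

  total_entries = NUM_LOCALES * ENTRIES_PER_LOCALE + NON_LOCALE_ENTRIES
  print(total_entries)      # 62972 pak entries
  print(total_entries * 6)  # 377832 bytes of TOC with a 2-byte id + 4-byte offset
  print(total_entries * 4)  # 251888 bytes if offsets shrank to 2 bytes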


Here is a proposed new file format:

Header: [int32 version][int8 encoding][int8 null byte][int16 num_resources][int16 num_aliases][int16 num_resources in first 64k][int16 num_resources in second 64k][int16 num_resources in third 64k]
id_table: [int16] * num_resources
alias_table: [int16] * num_aliases
offset_table1: [int16] * num_resources in first 64k
offset_table2: [int16] * num_resources in second 64k
offset_table3: [int16] * num_resources in third 64k
offset_table4: [int32] * num_resources with offsets > 192k
data: packed in tight!

* header is now 16 bytes.
* tables with no entries consume 0 bytes
* offsets are now relative to start of &data, plus a base that depends on which offset_table they are in
* Binary search of the id now needs to return the index of the id for lookup in the offset tables
* id_table is now denser than in the previous version, so more cpu cache friendly.
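
For illustration, the proposed 16-byte header could be unpacked along these lines (little-endian assumed; this is a sketch of the layout above, not committed grit code):

  import struct

  # Layout: int32 version, int8 encoding, int8 null, then five int16 counts.
  HEADER_FMT = '<IBBHHHHH'  # 16 bytes
  assert struct.calcsize(HEADER_FMT) == 16

  def parse_header(data):
    (version, encoding, _null, num_resources, num_aliases,
     num_in_first_64k, num_in_second_64k, num_in_third_64k) = struct.unpack_from(
         HEADER_FMT, data, 0)
    return (version, encoding, num_resources, num_aliases,
            (num_in_first_64k, num_in_second_64k, num_in_third_64k))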



 
Owner: mheikal@chromium.org
mheikal - Could you prototype this new format to see what the post-compression savings would be?
Here is the resulting improvement data using this new format:
MonochromePublic.apk_Breakdown (-32,768 bytes)
      -576 bytes Native resources (no l10n) size
      +415 bytes Zip Overhead
   -10,888 bytes Native resources (l10n) size
        +7 bytes Package metadata size
   -21,726 bytes Native resources stored (l10n) size
MonochromePublic.apk_Specifics
   -38,223 bytes normalized apk size


uncompressed diff is 5339963-5217857 = 122106

Header: [int32 version][int8 encoding][int8 null byte][int16 num_resources][int16 num_aliases][int16 num_resources in first 64k]
[int16 num_resources in second 64k][int16 num_resources in third 64k]
id_table: [int16][int16] * num_grd_files (resource_id -> index in offset table)
alias_table: [int16][int16] * num_aliases
offset_table1: [int16] * num_resources in first 64k
offset_table2: [int16] * num_resources in second 64k
offset_table3: [int16] * num_resources in third 64k
offset_table4: [int32] * num_resources with offsets > 192k
data: packed in tight!

Comment 4 by flackr@chromium.org, Nov 10 2017

I find this level of optimization to be making the format a bit complicated. If we're willing to make the pak file more complex to reduce header overhead there are likely many other optimizations we should consider: e.g.
- have two id tables: one with an entry for each resource that starts a new 64kb block (6 bytes per entry) and another for all the resources in between (4 bytes per entry). Then with a binary search into both of these tables you could locate the current and next resource. (somewhere between 4 and 6 bytes per resource)
- store contiguous ranges of ids (I believe there are many of these) in a way that doesn't repeat the id value. (slightly more than 4 bytes per resource; see the sketch after this list)
- combine the above idea with only grouping ranges which are in the same 64kb block, store the 64kb block offset at the start of the range so it doesn't need to be repeated. (may need only slightly more than 2 bytes per resource)
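
A sketch of the range-grouping idea (the block-building logic here is illustrative, not the code from cr/766707):

  # Group contiguous resource IDs into (base_id, start_index) blocks so that
  # only the first ID of each run is stored.
  def build_id_table(sorted_resource_ids):
    keys, values = [], []  # id_table_keys, id_table_values
    for index, resource_id in enumerate(sorted_resource_ids):
      if not keys or resource_id != sorted_resource_ids[index - 1] + 1:
        keys.append(resource_id)  # first ID of a new contiguous run
        values.append(index)      # that resource's index in the offset table
    return keys, values

  # Lookup: binary search keys for the closest lower base_id, then
  # index = values[i] + (resource_id - keys[i]).
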
MonochromePublic.apk_Breakdown (-118,784 bytes)
    -1,520 bytes Native resources (no l10n) size
    +2,842 bytes Zip Overhead
   -81,323 bytes Native resources (l10n) size
        +1 bytes Package metadata size
   -38,784 bytes Native resources stored (l10n) size
MonochromePublic.apk_Specifics
  -145,670 bytes normalized apk size

flackr@: I tried one of your ideas (grouping ranges of ids). Comment #5 shows the results of that improvement. I have submitted cr/766707 with that change for further discussion.
Next steps:

a) See what effect using extern ints has on binary size:
Change .h files to:
  #define IDS_FOO pak::kIdsFoo
Generate a .cc file like:
  namespace pak { const uint16_t kIdsFoo = 3; ... }

E.g.: If this works, we can do away with IDs and directly use indices in code.

b) To shrink offsets, consider using a variable number of int16 offset tables, where each one defines: cumulative size, start-offset
Just realized is_component_build throws a sizeable wrench into a). In order to appease the linker, we'd need to generate a dummy "pak_ids.so" for each component library so they don't have link errors for their extern'ed pak constants. We'd then have to create a per-binary pak_ids.so that holds the correct IDs, and ensure that it's loaded first.

Let's first check how large the id->index tables are after the last optimization to see how much is left to shrink.
Here is the partitioned size data for the en-US.pak

Size of id table is 564 bytes for 141 entries
Size of offset_table_1 is 4250 bytes for 2125 entries
Size of offset_table_2 is 1996 bytes for 998 entries
Size of offset_table_3 is 0 bytes for 0 entries
Size of offset_table_4 is 0 bytes for 0 entries
Size of alias_table is 1312 bytes for 328 entries
Size of data is 98307 bytes for 3123 entries

Thanks! So for ~60 locale pak files, eliminating the id table would save 564 * 60 = 33840 bytes (uncompressed). I'd guess 50% compression (on average, the entire files compress at 66%, so it's a conservative estimate), so it's ~16kb we'd save.

Encoding IDR_ defines as indices would still be worth it for code-simplification.

Just did a quick test, and turns out I was wrong about there being a linker issue. I'd guess as long as we manually ensure libPakIds.so is loaded first, it should be fine.

I'll play around a bit more.
I haven't yet gotten around to trying to move from ids->indices for the generated symbols, but another realization:

The ID->index table will be the same for all translations. We could just extract it to a locale.pakindex and strip it from all locale .pak files.
After some digging & offline conversation, I'd like to conclude that trying to remove resource IDs (in favor of storing resource indices directly) would be too little return-on-investment for the complexity of it.

Here's a concrete proposal for changes to implement:

Create a file: assets/locale_pak_ids, which contains:
Header: [int16 version][int16 num_resources]
id_table_keys: [int16 base_id] * num_sequential_blocks
id_table_values: [int16 resource_index] * num_sequential_blocks

Change locale pak files to:
Header: [int32 version][int8 id_table_present_and_encoding][int8 null byte][int16 num_resources][int16 num_aliases]
Header (cont'd): [int16 start_index of offset_table2][int16 start_index of offset_table3][int16 start_index of offset_table4]
alias_table_keys: [int16 src_index] * num_aliases
alias_table_values: [int16 dst_index] * num_aliases
offset_table1: [int16] * (num_resources + 1) in first 64k # +1 so that all entries have a next entry.
offset_table2: [int16] * (num_resources + 1) in second 64k
offset_table3: [int16] * (num_resources + 1) in third 64k
padding: 0 or two bytes of padding to ensure offset_table4 is 4-byte aligned
offset_table4: [int32] * num_resources with offsets > 192k
data: packed in tight!

To look up an index given an ID:
1. Binary search id_table_keys, finding the closest lower key.
2. Index is resource_index + (resource_id - base_id)
3. Sanity check ID is valid by CHECK'ing resource_index < num_resources && resource_index < next_id_table_resource_index

To look up data given index:
1. Find offset_table with closest-while-smaller start_index.
2. Read offset_table[index - start_index] to get offset 
3. Read offset_table[index - start_index + 1] to get next_offset
4. If offset != next_offset, return data[base_offset + offset : base_offset + next_offset]
5. Otherwise: it's an alias:
5a) Binary Search alias_table_keys to find index
5b) Go back to step 1. with result of lookup (but don't allow subsequent aliases).
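
Putting the two lookups above together, a rough sketch (bisect stands in for the binary searches; all tables are assumed to already be decoded into Python lists, and the int32 table for offsets > 192k is omitted for brevity):

  import bisect

  def id_to_index(resource_id, id_table_keys, id_table_values, num_resources):
    i = bisect.bisect_right(id_table_keys, resource_id) - 1  # step 1: closest lower key
    assert i >= 0
    index = id_table_values[i] + (resource_id - id_table_keys[i])  # step 2
    assert index < num_resources  # step 3 sanity checks
    if i + 1 < len(id_table_values):
      assert index < id_table_values[i + 1]
    return index

  def index_to_data(index, offset_tables, start_indices, alias_keys, alias_values,
                    data, allow_alias=True):
    t = bisect.bisect_right(start_indices, index) - 1  # step 1: pick the offset table
    base_offset = t * 0x10000  # each int16 table covers one 64k window of data
    local = index - start_indices[t]
    offset = offset_tables[t][local]           # step 2
    next_offset = offset_tables[t][local + 1]  # step 3 (+1 entry always exists)
    if offset != next_offset:
      return data[base_offset + offset:base_offset + next_offset]  # step 4
    assert allow_alias  # step 5: equal offsets mark an alias; don't chain them
    a = bisect.bisect_left(alias_keys, index)  # step 5a
    return index_to_data(alias_values[a], offset_tables, start_indices,
                         alias_keys, alias_values, data, allow_alias=False)  # step 5b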


Remarks:
* id_table_present_and_encoding byte will contain a bit denoting whether the id_table is present in the file or not.
  * For non-locale paks, id_table will be embedded.
* Separate keys from values for tables in order to minimize cpu cache misses for binary searches
* Exactly 3 int16 offset tables to simplify code (compared to an arbitrary number). We should never require more for our locale .paks.
* No need to maintain support for reading previous file formats.
 * Chrome always bundles pak files
 * The exception is extension themes, which just required a version number to be incremented to have .paks regenerated.

Estimated savings: 160kb (based on prototype in #5 + externalizing id_table)




Main points of contention:
#1: Should we allow a variable number of int16 offset tables?
#2: Should we instead pack offset tables as tight as possible (e.g. by encoding the size of each entry using LEB128, sketched below), and then extract them to RAM?
  * Would currently require ~1200 * sizeof(int32) = 4800 bytes of dirty RAM.
  * Assuming avg 1 byte saved per string, 1430 strings (the current count), and 60 locales, this would save:
    * 60 * 1430 * 1 = 86kb of disk (uncompressed), and 29kb compressed (assuming ratio from #2)

My opinion is that #1 isn't worth the added complexity, and that #2 isn't enough savings for the added RAM / start-up computation.
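
For reference, the LEB128 packing in point #2 looks roughly like this (standard unsigned LEB128; a sketch, not a grit implementation):

  def leb128_encode(value):
    out = bytearray()
    while True:
      byte = value & 0x7f
      value >>= 7
      if value:
        out.append(byte | 0x80)  # set the continuation bit
      else:
        out.append(byte)
        return bytes(out)

  def leb128_decode(buf, pos=0):
    result = shift = 0
    while True:
      byte = buf[pos]
      pos += 1
      result |= (byte & 0x7f) << shift
      if not byte & 0x80:
        return result, pos
      shift += 7

  assert leb128_encode(100) == b'\x64'  # most entry sizes fit in 1 byte
  assert leb128_decode(leb128_encode(300)) == (300, 2)
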
Project Member Comment 13 by bugdroid1@chromium.org, Nov 29 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/18d614e2a09a57fc9d3ed7e0b9c01934bf4793a3

commit 18d614e2a09a57fc9d3ed7e0b9c01934bf4793a3
Author: Andrew Grieve <agrieve@chromium.org>
Date: Wed Nov 29 16:39:39 2017

Add size breakdown to //tools/grit/pak_util.py print

Example output for: pak_util.py print locales/zh-CN.pak
version: 5
encoding: utf-8
num_resources: 3120
num_aliases: 413
size_header: 12
size_id_table: 16248
size_alias_table: 1652
size_data: 99490
total_size: 117402
Entry(id=423, canonical_id=423, size=271, sha1=bd30bee710): ...
...

Bug:  749521 
Change-Id: Id0c3e788c83d9e97d67b7eb6a4090ca68cb97431
Reviewed-on: https://chromium-review.googlesource.com/794340
Commit-Queue: agrieve <agrieve@chromium.org>
Reviewed-by: Robert Flack <flackr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#520138}
[modify] https://crrev.com/18d614e2a09a57fc9d3ed7e0b9c01934bf4793a3/tools/grit/grit/format/data_pack.py
[modify] https://crrev.com/18d614e2a09a57fc9d3ed7e0b9c01934bf4793a3/tools/grit/grit/format/data_pack_unittest.py
[modify] https://crrev.com/18d614e2a09a57fc9d3ed7e0b9c01934bf4793a3/tools/grit/pak_util.py

Realized that we can skip adding redundant offset entries for aliases if we look in the aliases table first.

id_table changes to:
id_table_keys: [int16 base_id] * num_sequential_blocks
id_table_values: [int16 resource_index - num_preceding_aliases] * num_sequential_blocks

To look up an index given an ID:
1. Binary search the alias table
2a. If entry is found, its value is the index
2b. Otherwise, record the index of the closest alias (alias_shift)
3. Binary search id_table_keys, finding the closest lower key.
4. Index is id_table_value + (resource_id - id_table_key) - alias_shift

To look up data given index:
1. Find offset_table with closest-while-smaller start_index.
2. Read offset_table[index - start_index] to get offset 
3. Read offset_table[index - start_index + 1] to get next_offset
4. return = data[base_offset + offset:base_offset + next_offset]

(we have about 100 aliases per locale .pak, so 100*60*2 = 12k of offset bytes)
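
A minimal sketch of the revised ID -> index lookup (bisect in place of the binary searches; one reading of step 2b is used here, with alias_shift taken as the number of alias keys below resource_id):

  import bisect

  def id_to_index(resource_id, alias_keys, alias_values, id_table_keys, id_table_values):
    a = bisect.bisect_left(alias_keys, resource_id)           # step 1
    if a < len(alias_keys) and alias_keys[a] == resource_id:
      return alias_values[a]                                  # step 2a: aliased entry
    alias_shift = a                                           # step 2b
    i = bisect.bisect_right(id_table_keys, resource_id) - 1   # step 3: closest lower key
    return id_table_values[i] + (resource_id - id_table_keys[i]) - alias_shift  # step 4
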
Cc: -flackr@chromium.org mheikal@chromium.org
Owner: flackr@chromium.org
Wanted to elaborate on #12: "remove resource IDs (in favor of storing resource indices directly) would be too little return-on-investment for the complexity of it."

The idea here was to define all IDS_ variables as extern'ed constants, and then generate a grit_constants.cc that defined them all as resource indices rather than resource IDs.

Why this adds complexity:

* Unreachable sections are not removed until linking (--gc-sections), so we couldn't use the variables to create the usage whitelist unless we were willing to link all targets twice (I don't think we are).

* Component builds complicate this. For them, we'd need to ensure we load a "resource_ids.cr.so" before any .cr.so is loaded (where that library defined the constants).



Rob - I want to start working on this, and would feel better if you could comment on the approach outlined in #12.
Note about string lengths:

for en-US, 1374 of 1438 are < 128 bytes (95%)
for ml (largest .pak), 1079 of 1438 are < 128 bytes (75%), and 1313 of 1438 are < 255.

From #12 "Main points of contention", using 1 byte per offset would save ~30kb and require ~5kb of RAM.

The 5kb of ram assumes that we expand every size to be a uint32, but perhaps there's another way?

E.g., we could linearly search the offset table for each lookup. This would require decoding and summing up to 1438 LEB128 numbers, most of which would be 1 byte. I'd guess this would be on the border of how slow we'd want look-up to be.

Alternatively, maybe we could cache the value and position of every 100th offset. This would then require only 14 * sizeof(int) * 2 = ~100 bytes of RAM and have a one-time look-up cost for populating these cached values.


Revamped proposal (Copied from #12, updated with #14 and #17).

Create a file: assets/locale_pak_ids, which contains:
Header: [int16 version][int16 num_resources]
id_table_keys: [int16 base_id] * num_sequential_blocks (keys are resource_ids)
id_table_values: [int16 resource_index - num_preceding_aliases] * num_sequential_blocks (values are indices into size_table)

Change locale pak files to:
Header: [int32 version][int8 id_table_present_and_encoding][int8 null byte][int16 num_resources][int16 num_aliases]
alias_table_keys: [int16 src_index] * num_aliases (keys are resource ids)
alias_table_values: [int16 dst_index] * num_aliases (values are indices into offset table)
size_table: [sleb128-encoded uint32] * num_resources (values are entry sizes)
data: packed in tight!

To look up an index given an ID:
1. Binary search the alias table
2a. If entry is found, its value is the index
2b. Otherwise, record the index of the closest alias as "alias_shift"
3. Binary search id_table_keys, finding the closest lower key.
4. Index is id_table_value + (resource_id - id_table_key) - alias_shift

To look up data given index:
1. Let there be an array "CACHE" of size num_resources / K (where K=100 in the example from #17; a sketch follows these steps)
2. Iterate through CACHE until either the (index / K)th entry, or until an entry is missing
3. Iterate and sum entries in size_table until "index" entries have been summed (using CACHE value as a jump-start into this).
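
A sketch of the size_table walk with a lazily filled stride-K cache (K=100 from #17; the decoder here reads plain unsigned varints for simplicity, and all names are illustrative):

  K = 100

  def decode_varint(buf, pos):
    result = shift = 0
    while True:
      byte = buf[pos]
      pos += 1
      result |= (byte & 0x7f) << shift
      if not byte & 0x80:
        return result, pos
      shift += 7

  def entry_bounds(index, size_table, cache):
    # cache[j] holds (data_offset, size_table_pos) for entry j*K; it is filled
    # lazily (step 2) and reused across lookups.
    if not cache:
      cache.append((0, 0))
    while len(cache) <= index // K:  # extend the cache up to the needed block
      offset, pos = cache[-1]
      for _ in range(K):
        size, pos = decode_varint(size_table, pos)
        offset += size
      cache.append((offset, pos))
    offset, pos = cache[index // K]  # step 3: walk the remaining < K sizes
    for _ in range(index % K):
      size, pos = decode_varint(size_table, pos)
      offset += size
    size, _ = decode_varint(size_table, pos)  # this entry's own size
    return offset, offset + size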


Remarks:
* id_table_present_and_encoding byte will contain a bit denoting whether the id_table is present in the file or not.
  * For non-locale paks, id_table will be embedded.
* Separate keys from values for tables in order to minimize cpu cache misses for binary searches
* No need to maintain support for reading previous file formats.
 * Chrome always bundles pak files
 * The exception is extension themes, which just required a version number to be incremented to have .paks regenerated.

Estimated savings: ~200kb (based on estimate in #12, plus tweaks from #14 and #17)
Summary: Make pak file table of contents smaller (200kb) (was: Make pak file table of contents smaller (100kb))
Status: Assigned (was: Available)
Cc: p...@chromium.org
Cc: digit@chromium.org
Status: WontFix (was: Assigned)
Moving locale pak files to dynamic modules will reduce the savings of this proposal to the point where it wouldn't be worth it.

With only 2 locale paks (the case for english: en-US, en-GB), the per-entry overhead is:

(224+473+1175+1175)*6 ~= 18kb

Closing as it wouldn't be worth the effort!

