Make pak file table of contents smaller (200kb) |
||||||
Issue description$ ../../tools/grit/pak_util.py list-id assets/en-US.pak |wc -l 1175 $ ../../tools/grit/pak_util.py list-id assets/resources.pak |wc -l 473 $ ../../tools/grit/pak_util.py list-id assets/chrome_100_percent.pak |wc -l 224 At 53 locales, we have 53*1175+224+473 = 62972 pak entries. As for size, unzip -lv reports: 211202 Stored 211202 0% 2001-01-01 00:00 31a7aa96 assets/chrome_100_percent.pak 1874938 Stored 1874938 0% 2001-01-01 00:00 27062e69 assets/resources.pak 54522 Defl:N 20280 63% 2001-01-01 00:00 395dbe47 assets/fi.pak 60134 Defl:N 20540 66% 2001-01-01 00:00 1980577b assets/fil.pak 62705 Defl:N 21369 66% 2001-01-01 00:00 167b6134 assets/fr.pak 69598 Defl:N 21550 69% 2001-01-01 00:00 6e2b2422 assets/he.pak 116213 Defl:N 25570 78% 2001-01-01 00:00 4cfa7898 assets/hi.pak ... Observations: * Each entry is 6 bytes for the table of contents header (377832 bytes total). * If we used two bytes for the file offset rather than 4, the overhead would 251888. * Locale pak files are stored compressed, so actual savings would likely be < 100k. * resources.pak is almost 2MB. Using 2 byte offsets wouldn't help. * Larges locale pak is 115k bytes (assets/hi.pak). Average seems to be around 64k * Most entry offsets need only 2 bytes (64k) * All entry offsets would fit in two tables of 2 bytes (128k) Here is a proposed new file format: Header: [int32 version][int8 encoding][int8 null byte][int16 num_resources][int16 num_aliases][int16 num_resources in first 64k][int16 num_resources in second 64k][int16 num_resources in third 64k] id_table: [int16] * num_resources alias_table: [int16] * num_aliases offset_table1: [int16] * num_resources in first 64k offset_table2: [int16] * num_resources in second 64k offset_table3: [int16] * num_resources in third 64k offset_table4: [int32] * num_resources with offsets > 192k data: packed in tight! * header is now 16 bytes. * tables with no entries consume 0 bytes * offsets are now relative to start of &data, and have an offset depending on which offset_table they are in * Binary search of id now needs to return the index of the id for lookup in the offset tables * id_table now more dense than in previous version, so more cpu cache friendly.
,
Nov 9 2017
Here is the resulting improvements data using this new format:
MonochromePublic.apk_Breakdown (-32,768 bytes)
-576 bytes Native resources (no l10n) size
+415 bytes Zip Overhead
-10,888 bytes Native resources (l10n) size
+7 bytes Package metadata size
-21,726 bytes Native resources stored (l10n) size
MonochromePublic.apk_Specifics
-38,223 bytes normalized apk size
uncompressed diff is 5339963-5217857 = 122106
,
Nov 10 2017
Header: [int32 version][int8 encoding][int8 null byte][int16 num_resources][int16 num_aliases][int16 num_resources in first 64k] [int16 num_resources in second 64k][int16 num_resources in third 64k] id_table: [int16][int16] * num_grd_files(resource_id -> index in offset table) alias_table: [int16][int16] * num_aliases offset_table1: [int16] * num_resources in first 64k offset_table2: [int16] * num_resources in second 64k offset_table3: [int16] * num_resources in third 64k offset_table4: [int32] * num_resources with offsets > 192k data: packed in tight!
,
Nov 10 2017
I find this level of optimization to be making the format a bit complicated. If we're willing to make the pak file more complex to reduce header overhead there are likely many other optimizations we should consider: e.g. - have two id tables, one for each resource at a new 64kb block (6 bytes per entry) and another for all the resources in between (4 bytes per entry). Then with a binary search into both of these tables you could locate the current and next resource. (somewhere between 4 to 6 bytes per resource) - store contiguous ranges of ids (I believe there are many of these) in a way that doesn't repeat id value. (slightly more than 4 bytes per resource) - combine the above idea with only grouping ranges which are in the same 64kb block, store the 64kb block offset at the start of the range so it doesn't need to be repeated. (may need only slightly more than 2 bytes per resource)
,
Nov 13 2017
MonochromePublic.apk_Breakdown (-118,784 bytes)
-1,520 bytes Native resources (no l10n) size
+2,842 bytes Zip Overhead
-81,323 bytes Native resources (l10n) size
+1 bytes Package metadata size
-38,784 bytes Native resources stored (l10n) size
MonochromePublic.apk_Specifics
-145,670 bytes normalized apk size
,
Nov 13 2017
flackr@: I tried one of your ideas (grouping ranges of ids). The comment #5 shows the results of that improvement. I have submitted a cr/766707 with that change for further discussion.
,
Nov 13 2017
Next steps:
a) See what effect using extern ints has on binary size:
Change .h files to:
#define IDS_FOO pak::kIdsFoo
Generate a .cc file like:
namespace pak { const uint16_t kIdsFoo = 3; ... }
E.g.: If this works, we can do away with IDs and directly use indices in code.
b) To shrink offsets, consider using a variable number of int16 offset tables, where each one defines: cumulative size, start-offset
,
Nov 14 2017
Just realized is_component_build throws a sizeable wrench into a). In order to appease the linker, we'd need to generate a dummy "pak_ids.so" for each of them to not have link errors for their extern'ed pak constants. We'd then have to create a per-binary pak_ids.so that holds the correct IDs, and ensure that it's loaded first. Let's first check to see how large the id->index tables are after the last optimization to see how much is left to shrink.
,
Nov 14 2017
Here is the partitioned size data for the en-US.pak Size of id table is 564 bytes for 141 entries Size of offset_table_1 is 4250 bytes for 2125 entries Size of offset_table_2 is 1996 bytes for 998 entries Size of offset_table_3 is 0 bytes for 0 entries Size of offset_table_4 is 0 bytes for 0 entries Size of alias_table is 1312 bytes for 328 entries Size of data is 98307 bytes for 3123 entries
,
Nov 14 2017
Thanks! So for ~60 locale pak files, eliminating the id table would save 564 * 60 = 33840 bytes (uncompressed). I'd guess 50% compression (on average, the entire files are 66%, so it's a conservative estimate), then it's ~16kb we'd save. Encoding IDR_ defines as indices would still be worth it for code-simplification. Just did a quick test, and turns out I was wrong about there being a linker issue. I'd guess as long as we manually ensure libPakIds.so is loaded first, it should be fine. I'll play around a bit more.
,
Nov 23 2017
I haven't yet gotten around to trying to move from ids->indices for the generated symbols, but another realization: The ID->index table will be the same for all translations. We could just extract it to a locale.pakindex and strip it from all locale .pak files.
,
Nov 27 2017
After some digging & offline conversation, I'd like to conclude that trying to remove resource IDs (in favor of storing resource indices directly) would be too little return-on-investment for the complexity of it.
Here's a concrete proposal for changes to implement:
Create a file: assets/locale_pak_ids, which contains:
Header: [int16 version][int16 num_resources]
id_table_keys: [int16 base_id] * num_sequential_blocks
id_table_values: [int16 resource_index] * num_sequential_blocks
Change locale pak files to:
Header: [int32 version][int8 id_table_present_and_encoding][int8 null byte][int16 num_resources][int16 num_aliases]
Header (cont'd): [int16 start_index of offset_table2][int16 start_index of offset_table3][int16 start_index of offset_table4]
alias_table_keys: [int16 src_index] * num_aliases
alias_table_values: [int16 dst_index] * num_aliases
offset_table1: [int16] * (num_resources + 1) in first 64k # +1 so that all entries have a next entry.
offset_table2: [int16] * (num_resources + 1) in second 64k
offset_table3: [int16] * (num_resources + 1) in third 64k
padding: 0 or two bytes of padding to ensure offset_table4 is 4-byte aligned
offset_table4: [int32] * num_resources with offsets > 192k
data: packed in tight!
To look up an index given an ID:
1. Binary search id_table_keys, finding the closest lower key.
2. Index is resource_index + (resource_id - base_id)
3. Sanity check ID is valid by CHECK'ing resource_index < num_resources && resource_index < next_id_table_resource_index
To look up data given index:
1. Find offset_table with closest-while-smaller start_index.
2. Read offset_table[index - start_index] to get offset
3. Read offset_table[index - start_index + 1] to get next_offset
4. If offset!=next_offset, return data given by offset+base_offset->next_offset+base_offset
5. Otherwise: it's an alias:
5a) Binary Search alias_table_keys to find index
5b) Go back to step 1. with result of lookup (but don't allow subsequent aliases).
Remarks:
* id_table_present_and_encoding byte will contain a bit denoting whether the id_table is present in the file or not.
* For non-locale paks, id_table will be embedded.
* Separate keys from values for tables in order to minimize cpu cache misses for binary searches
* Exactly 3 int16 offset tables to simplify code (compared to an arbitrary number). We should never required for our locale .paks.
* No need to maintain support for reading previous file formats.
* Chrome always bundles pak files
* The exception is extension themes, which just required a version number to be incremented to have .paks regenerated.
Estimated savings: 160kb (based on prototype in #5 + externalizing id_table)
Main points of contention:
#1: Should we allow for variable number of int16 offset tables
#2: Should we instead pack offset tables as tight as possible (e.g. by encoding size of each entry using LEB128), and the extract it to RAM.
* Would currently require ~1200 * sizeof(int32) = 4800 bytes of dirty RAM.
* Assuming avg 1 byte saved per string, 1430 strings (the current count) 60 locales, this would save:
* 60 * 1430 * 1 = 86kb of disk (uncompressed), and 29kb compressed (assuming ratio from #2)
My opinion is that #1 isn't worth the added complexity, and that #2 isn't enough savings for the added RAM / start-up computation.
,
Nov 29 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/18d614e2a09a57fc9d3ed7e0b9c01934bf4793a3 commit 18d614e2a09a57fc9d3ed7e0b9c01934bf4793a3 Author: Andrew Grieve <agrieve@chromium.org> Date: Wed Nov 29 16:39:39 2017 Add size breakdown to //tools/grit/pak_util.py print Example output for: pak_util.py print locales/zh-CN.pak version: 5 encoding: utf-8 num_resources: 3120 num_aliases: 413 size_header: 12 size_id_table: 16248 size_alias_table: 1652 size_data: 99490 total_size: 117402 Entry(id=423, canonical_id=423, size=271, sha1=bd30bee710): ... ... Bug: 749521 Change-Id: Id0c3e788c83d9e97d67b7eb6a4090ca68cb97431 Reviewed-on: https://chromium-review.googlesource.com/794340 Commit-Queue: agrieve <agrieve@chromium.org> Reviewed-by: Robert Flack <flackr@chromium.org> Cr-Commit-Position: refs/heads/master@{#520138} [modify] https://crrev.com/18d614e2a09a57fc9d3ed7e0b9c01934bf4793a3/tools/grit/grit/format/data_pack.py [modify] https://crrev.com/18d614e2a09a57fc9d3ed7e0b9c01934bf4793a3/tools/grit/grit/format/data_pack_unittest.py [modify] https://crrev.com/18d614e2a09a57fc9d3ed7e0b9c01934bf4793a3/tools/grit/pak_util.py
,
Dec 1 2017
Realized that we can skip adding redundant offset entries for aliases if we look in the aliases table first. id_table changes to: id_table_keys: [int16 base_id] * num_sequential_blocks id_table_values: [int16 resource_index - num_proceeding_aliases] * num_sequential_blocks To look up an index given an ID: 1. Binary search the alias table 2a. If entry is found, it's value is the index 2b. Otherwise, record the index of the closest alias (alias_shift) 3. Binary search id_table_keys, finding the closest lower key. 4. Index is id_table_value + (resource_id - id_table_key) - alias_shift To look up data given index: 1. Find offset_table with closest-while-smaller start_index. 2. Read offset_table[index - start_index] to get offset 3. Read offset_table[index - start_index + 1] to get next_offset 4. return = data[base_offset + offset:base_offset + next_offset]
,
Dec 1 2017
(we have about 100 aliases per locale .pak, so 100*60*2 = 12k of offset bytes)
,
Dec 19 2017
Wanted to elaborate on #12: "remove resource IDs (in favor of storing resource indices directly) would be too little return-on-investment for the complexity of it." The idea here was to define all IDS_ variables as extern'ed constants, and then generate a grit_consants.cc that defined them all as resource indices rather than resource IDs. Why this adds complexity: * Unreachable sections are not removed until linking (--gc-sections), so we couldn't use the variables to create the usage whitelist unless we were willing to link all targets twice (I don't think we are). * Component builds complicate this. For them, we'd need to ensure we load a "resource_ids.cr.so" before any .cr.so is loaded (where that library defined the constants). Rob - Wanting to start working on this, and would feel better if you would comment on the approach outlined by #12.
,
Jan 10 2018
Note about string lengths: for en-US, 1374 of 1438 are < 128 bytes (95%) for ml (largest .pak), 1079 of 1438 are < 128 bytes (75%), and 1313 of 1438 are < 255. From #12 "Main points of contention", using 1 byte per offset would save ~30kb and require ~5kb of RAM. The 5kb of ram assumes that we expand every size to be a uint32, but perhaps there's another way? E.g., we could linear search the offset table for each lookup. This would require decoding and summing up to 1438 LEB128 numbers, most of which would be 1 byte. I'd guess this would be on the boarder of how slow we'd want look-up to be. Alternatively, maybe we could cache the value and position of every 100th offset. This would then require only 14 * sizeof(int) * 2 = ~100 bytes of RAM and have a one-time look-up cost for populating these cached values.
,
Jan 10 2018
Revamped proposal (Copied from #12, updated with #14 and #17). Create a file: assets/locale_pak_ids, which contains: Header: [int16 version][int16 num_resources] id_table_keys: [int16 base_id] * num_sequential_blocks (keys are resource_ids) id_table_values: [int16 resource_index - num_proceeding_aliases] * num_sequential_blocks (values are indices into size_table) Change locale pak files to: Header: [int32 version][int8 id_table_present_and_encoding][int8 null byte][int16 num_resources][int16 num_aliases] Header (cont'd): alias_table_keys: [int16 src_index] * num_aliases (keys are resource ids) alias_table_values: [int16 dst_index] * num_aliases (values are indices into offset table) size_table: [sleb128-encoded uint32] * num_resources (values are entry sizes) data: packed in tight! To look up an index given an ID: 1. Binary search the alias table 2a. If entry is found, it's value is the index 2b. Otherwise, record the index of the closest alias as "alias_shift" 3. Binary search id_table_keys, finding the closest lower key. 4. Index is id_table_value + (resource_id - id_table_key) - alias_shift To look up data given index: 1. Let the be an array "CACHE" of size num_resources / K (where K=100 in the example from #17) 2. Iterate through CACHE until either the index / K entry, or until an entry is missing 3. Iterate and sum entries in size_table until "index" entries have been summed (using CACHE value as a jump-start into this). Remarks: * id_table_present_and_encoding byte will contain a bit denoting whether the id_table is present in the file or not. * For non-locale paks, id_table will be embedded. * Separate keys from values for tables in order to minimize cpu cache misses for binary searches * No need to maintain support for reading previous file formats. * Chrome always bundles pak files * The exception is extension themes, which just required a version number to be incremented to have .paks regenerated. Estimated savings: ~200kb (based on estimate in #12, plus tweaks from #14 and #17)
,
Jan 19 2018
,
Aug 1
,
Aug 10
,
Oct 15
Moving locale pak files to dynamic modules will reduce the savings of this proposal to the point where it wouldn't be worth it. With only 2 locale paks (the case for english: en-US, en-GB), the per-entry overhead is: (224+473+1175+1175)*6 ~= 18kb Closing as it wouldn't be worth the effort! |
||||||
►
Sign in to add a comment |
||||||
Comment 1 by agrieve@chromium.org
, Nov 3 2017