Sanitizer binaries are huge in size
Issue description: See title. Sanitizer build archives on all platforms are growing to an unreasonably large size. We need to see whether they can be optimized. We can create sub-bugs later; this bug tracks the meta issue.
,
Aug 10 2016
Spun out from https://bugs.chromium.org/p/chromium/issues/detail?id=635715#c48: I think it broke between r408781 and r409964.
asan-win32-release-408692.zip  2016-07-29 22:06:20  1708.49MB  408692  3a796de6b43352fb756aa1c9c24a8bc5fc481553
asan-win32-release-408734.zip  2016-07-30 00:04:56  1707.06MB  408734  a8e4ad8678760fe8d08c24ad1788347990042b34
asan-win32-release-408781.zip  2016-07-30 02:01:04  1707.06MB  408781  59e7b40948030815a49af887b922b370ba048b8b
asan-win32-release-409964.zip  2016-08-05 04:34:26  2555.79MB  409964  3a6cbe4c1dab38c5e9094a3ce6164b902c319bf2
asan-win32-release-409973.zip  2016-08-05 06:22:34  2555.75MB  409973  4f70f4aa155228044960e6521266c22e9f4e5539
,
Aug 10 2016
r408890 is the gn switch for the lkgr win asan bots, so this is likely due to that. First step is probably to download a smaller and a larger zip and check if the larger one has more files, or if some file that's in both is larger. (Builds are archived at http://commondatastorage.googleapis.com/chromium-browser-asan/index.html?prefix=win32-release/)
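The check suggested above (more files, or the same files grown larger?) could be sketched roughly as follows. This is only an illustration using Python's standard `zipfile` module; the function names and the `strip_components` parameter are hypothetical, and `strip_components` assumes each archive may wrap its contents in a differently named top-level directory.

```python
import zipfile


def member_sizes(zip_path, strip_components=0):
    """Map each file in the archive to its uncompressed size in bytes."""
    sizes = {}
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Optionally drop leading path components (hypothetical helper
            # option) so that differently named top-level folders still match.
            parts = info.filename.split("/")[strip_components:]
            if parts:
                sizes["/".join(parts)] = info.file_size
    return sizes


def compare_archives(small_zip, large_zip, strip_components=0):
    """Return (added, removed, grown) relative to the smaller archive."""
    small = member_sizes(small_zip, strip_components)
    large = member_sizes(large_zip, strip_components)
    added = {n: s for n, s in large.items() if n not in small}
    removed = {n: s for n, s in small.items() if n not in large}
    grown = {
        n: large[n] - small[n]
        for n in large.keys() & small.keys()
        if large[n] > small[n]
    }
    return added, removed, grown
```

Sorting `grown` by value in descending order would surface the files that changed the most.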
,
Aug 10 2016
Two ideas:
* Was full debug info enabled somewhere?
* 'xz' may offer a much higher compression rate than zip.
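One way to get a feel for the second idea is to compress the same bytes with DEFLATE (the algorithm zip archives normally use) and with LZMA (the algorithm behind xz). The sketch below uses Python's standard `zlib` and `lzma` modules; `compare_compression` is a hypothetical helper, and the actual ratios depend entirely on the input, so no particular gain is implied for the sanitizer archives.

```python
import lzma
import zlib


def compare_compression(data):
    """Return raw, DEFLATE and xz sizes in bytes for the same input."""
    deflated = zlib.compress(data, 9)       # highest DEFLATE level
    xz = lzma.compress(data, preset=9)      # .xz container, highest preset
    return {"raw": len(data), "deflate": len(deflated), "xz": len(xz)}
```

Running it on a representative binary, for example `compare_compression(open(path, "rb").read())`, would show whether switching formats is worth pursuing.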
,
Aug 15 2016
Max, can you check c#3 and the first point in c#4?
,
Aug 15 2016
I see this significant difference:
r408781:
...
-rw-r----- 1 mmoroz eng 134457344 Jul 30 05:19 ipc_fuzzer.exe
-rw-r----- 1 mmoroz eng  19888128 Jul 30 05:16 ipc_fuzzer_replay.exe
-rw-r----- 1 mmoroz eng   1074688 Jul 30 05:16 ipc_message_list.exe
-rw-r----- 1 mmoroz eng  17195520 Jul 30 05:16 ipc_message_util.exe
...
r409964:
...
-rw-r----- 1 mmoroz eng 673590272 Aug 5 07:41 ipc_fuzzer.exe
-rw-r----- 1 mmoroz eng 649572352 Aug 5 07:41 ipc_fuzzer_replay.exe
-rw-r----- 1 mmoroz eng 647089664 Aug 5 07:41 ipc_message_list.exe
-rw-r----- 1 mmoroz eng 649726464 Aug 5 07:41 ipc_message_util.exe
...
,
Aug 15 2016
The number of files in the "small" archive is actually bigger than the number of files in the "large" one, because "small" contains the following files:
-rw-r----- 1 mmoroz eng 175384064 Jul 30 04:59 v8_shell.exe
-rw-r----- 1 mmoroz eng       822 Jul 30 04:59 v8_shell.exe.assert.manifest
-rw-r----- 1 mmoroz eng       822 Jul 30 04:59 v8_shell.exe.manifest
-rw-r----- 1 mmoroz eng       109 Jul 30 04:59 v8_shell.exe.manifest.rc
-rw-r----- 1 mmoroz eng       888 Jul 30 04:59 v8_shell.exe.manifest.res
-rw-r----- 1 mmoroz eng 142381056 Jul 30 04:59 v8_shell.exe.pdb
while the "large" one has only .exe and .pdb:
-rw-r----- 1 mmoroz eng 176911360 Aug 5 07:27 v8_shell.exe
-rw-r----- 1 mmoroz eng  14094336 Aug 5 07:27 v8_shell.exe.pdb
,
Aug 15 2016
I mean this applies to other executables as well; v8_shell is just an example.
,
Aug 15 2016
A few more significant differences:
r408781:
...
-rw-r----- 1 mmoroz eng   1156608 Jul 30 05:03 blink_deprecated_test_plugin.dll
-rw-r----- 1 mmoroz eng    756736 Jul 30 05:03 blink_test_plugin.dll
...
-rw-r----- 1 mmoroz eng  13796352 Jul 30 05:16 ipc_message_dump.dll
...
-rw-r----- 1 mmoroz eng    121344 Jul 30 04:59 libEGL.dll
...
r409964:
...
-rw-r----- 1 mmoroz eng  48297984 Aug 5 07:27 blink_deprecated_test_plugin.dll
-rw-r----- 1 mmoroz eng  48059904 Aug 5 07:27 blink_test_plugin.dll
...
-rw-r----- 1 mmoroz eng 649198592 Aug 5 07:41 ipc_message_dump.dll
...
-rw-r----- 1 mmoroz eng    282624 Aug 5 07:22 libEGL.dll
...
-rw-r----- 1 mmoroz eng 131174912 Aug 5 07:27 ui_library.dll
...
,
Aug 16 2016
Could it be that the binaries are being built with -g instead of -gline-tables-only?
,
Aug 16 2016
I'll take a look into differences between those binaries in 1-2 hours.
,
Aug 16 2016
Also there are huge differences in subdirectories.
,
Aug 16 2016
Since we suspect the gn switch as a culprit, is it possible that we link too many things while building with gn?
,
Aug 16 2016
I suggest you compare the lists of functions in the .dll files, so that it gives you the idea of where the extra code comes from. Maybe this is dead code, and we're just missing the option that should remove it? (Another cause of binary size mismatch could be disabling of ICF, but that shouldn't affect the symbol table size). Regarding the -gline-tables-only flag, I don't think compilation flags are present in the bot logs (at least I couldn't find them), so you'll probably need to generate the .ninja files locally and look the flags up.
,
Aug 16 2016
The .dll files themselves don't contain function names. Those should be in the .pdb files. I don't know a good way to parse them on Linux, so I'll have to set up a build environment on my Windows desktop tomorrow and see what's going on.
,
Aug 18 2016
It may sound silly, but my Windows workstation is still not ready. Yesterday I had to replace some hardware and re-image it. Today I'm waiting for the required MSVS version (it takes more than 1 day to obtain that). I'll be OOO until next Wednesday.
,
Aug 18 2016
I think the new gn builds include a large number of object files:
$ fd obj
./clang_nacl_win64/obj
./clang_newlib_x64/obj
./clang_newlib_x86/obj
./clang_x64/obj
./glibc_x64/obj
./glibc_x86/obj
./irt_x64/obj
./irt_x86/obj
./nacl_win_as_x64/obj
./nacl_win_as_x86/obj
./newlib_pnacl/obj
./newlib_pnacl_nonsfi/obj
$ ll irt_x86/obj/url/url
total 2140
drwxr-xr-x 1 rnk 1049089      0 Aug 4 21:16 .
drwxr-xr-x 1 rnk 1049089      0 Aug 4 21:20 ..
-rw-r--r-- 1 rnk 1049089 234872 Aug 4 21:16 gurl.o
-rw-r--r-- 1 rnk 1049089 115200 Aug 4 21:16 origin.o
-rw-r--r-- 1 rnk 1049089 150364 Aug 4 21:16 scheme_host_port.o
-rw-r--r-- 1 rnk 1049089  79680 Aug 4 21:16 url_canon_etc.o
-rw-r--r-- 1 rnk 1049089  75844 Aug 4 21:16 url_canon_filesystemurl.o
-rw-r--r-- 1 rnk 1049089  77736 Aug 4 21:16 url_canon_fileurl.o
...
These files aren't present with gyp. Why aren't these object files in obj/? Where do we decide which files to include in the archived build?
,
Aug 18 2016
We need to update build/scripts/slave/recipe_modules/archive/api.py to exclude object file directories created by gn, which are no longer at the top-level of the build directory. It's a bit involved, because the existing filter only really works on the top level.
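To illustrate the kind of filter being described, here is a minimal sketch that rejects any file living under a directory named `obj` at any depth, rather than only at the top level. It is not the code in archive/api.py; `should_archive` and `EXCLUDED_DIR_NAMES` are hypothetical names, and the choice of `obj` as the only excluded directory name is an assumption based on the listing in this thread.

```python
from pathlib import PurePosixPath

# Assumption: object files live in directories literally named "obj",
# e.g. irt_x86/obj/... or clang_x64/obj/..., as seen in the gn build output.
EXCLUDED_DIR_NAMES = frozenset({"obj"})


def should_archive(rel_path, excluded=EXCLUDED_DIR_NAMES):
    """Return False if any parent directory of rel_path is excluded."""
    # Normalize Windows separators so the same check works on both platforms.
    parts = PurePosixPath(rel_path.replace("\\", "/")).parts
    # Only directory components are checked; the file name itself is ignored.
    return not any(part in excluded for part in parts[:-1])
```

With this check, `irt_x86/obj/url/url/gurl.o` is skipped while `ipc_fuzzer.exe` is kept.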
,
Aug 18 2016
That's certainly possible, if you've been filtering things out in the past. The file layout for objects changed from GYP->GN, particularly with respect to NaCl objects. Also, we know we build significantly more objects in the NaCl toolchains now than we did w/ GYP; in GYP we had separate copies of //base, //ipc, etc. in the build files that were kept minimal; in GN, we got rid of that to lower the overall build maintenance, at the cost of slightly increased cycle time and of counting on the linker to strip out the unneeded code.
,
Aug 25 2016
A side note from bug 607627 regarding sizes of libfuzzer-based fuzzers: Interestingly, debug builds are smaller than release ones, for example:
Release:
-rwxr-x--- 1 mmoroz eng 21014352 Aug 11 13:48 out/Release/icu_break_iterator_fuzzer
Debug:
-rwxr-x--- 1 mmoroz eng  2974824 Aug 11 13:59 out/Release/icu_break_iterator_fuzzer
,
Aug 25 2016
+machenbach, who looks like the best owner for archive/api.py
,
Sep 7 2016
Is it still important that we filter object files out of the archived clusterfuzz builds? Do we need to reassign?
,
Sep 15 2016
,
Sep 15 2016
Filed issue 647353 for the api.py problem. It sounds like there's more than that here -- mmoroz, updates to comment 18?
,
Sep 16 2016
Well, I don't think that the problem is caused by debug symbols. Debug symbols should be in the .pdb file (if we don't use the "/Z7" flag, and I believe we don't):
r408781:
129M ipc_fuzzer.exe
100M ipc_fuzzer.exe.pdb
r409964:
643M ipc_fuzzer.exe
59M ipc_fuzzer.exe.pdb
So the .pdb is even smaller after the gn switch. Unfortunately, I haven't found a convenient way to compare the contents of two .pdb files. Not sure if anybody on Earth knows how to parse/analyze them :(
I've compiled ipc_fuzzer.exe locally with the same configuration as the build bot (https://build.chromium.org/p/chromium.lkgr/builders/Win%20ASan%20Release/builds/2922/steps/generate_build_files/logs/stdio):
enable_ipc_fuzzer = true
is_asan = true
is_clang = true
is_component_build = false
is_debug = false
target_cpu = "x86"
use_goma = false
v8_enable_verify_heap = true
and got other sizes:
09/16/2016 02:14 PM 269,165,056 ipc_fuzzer.exe
09/16/2016 02:14 PM 421,990,400 ipc_fuzzer.exe.pdb
I've tried to compare the exact options passed to the compiler, but it looks like we don't have logs from the GYP time: https://build.chromium.org/p/chromium.lkgr/builders/Win%20ASan%20Release/builds/2669/steps/generate_build_files
I'm trying one more way to compare the symbols, will post an update soon.
,
Sep 16 2016
We have a pretty good understanding of pdbs in llvm land. zturner, do we have any tools to dump contents of pdb files which are useful for understanding why one pdb is larger than another one?
,
Sep 16 2016
I've opened both binaries in IDA Pro and loaded PDBs as well. Extracted symbols attached. Roughly looks the same... Probably that confirms that debug symbols are not the issue.
,
Sep 16 2016
Though there are many symbols, so I'm not sure about my statement that they are roughly the same.
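When the symbol lists are too long to compare by eye, a set difference gives a definite answer. The sketch below assumes each build's symbols were exported to a text file with one symbol name per line (the actual format of the attached files is not shown in this thread); `read_symbols` and `diff_symbols` are hypothetical helper names.

```python
def read_symbols(path):
    """Read one symbol name per line, skipping blank lines."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return [line.strip() for line in f if line.strip()]


def diff_symbols(old_symbols, new_symbols):
    """Return (only_in_old, only_in_new) as sorted lists."""
    old, new = set(old_symbols), set(new_symbols)
    return sorted(old - new), sorted(new - old)
```

If both returned lists are short, the symbol sets really are roughly the same and the size difference has to come from elsewhere.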
,
Sep 16 2016
I'm going to take a wild guess that one has a larger internal block size. If you can make the pdbs available I'll take a look later this afternoon
,
Sep 16 2016
All builds are here: http://commondatastorage.googleapis.com/chromium-browser-asan/index.html?prefix=win32-release/ I've looked into 408781 and 409964. Thanks!
,
Sep 16 2016
I'm taking a look at the two builds you suggest. There are some interesting differences. The biggest is that the big pdb has a stream of "Fixup data" that is 86MB. This stream is completely nonexistent in the small PDB. The difference in size between the two is obviously less than 86MB, but there are some places in the small PDB where the streams are actually bigger than in the big PDB, and there are also MORE streams in the small PDB.
Anyway, this fixup data is almost certainly the culprit since it accounts for 80% of the entire file size of the big pdb.
Fixup data seems to be related to instrumented code. For example, the `XFIXUP_DATA` structure in Microsoft's cvinfo.h header is defined like this:
typedef struct tagXFIXUP_DATA {
    unsigned short wType;
    unsigned short wExtra;
    unsigned long rva;
    unsigned long rvaTarget;
} XFIXUP_DATA;
The `rva` and `rvaTarget` fields seem to indicate that it's mapping an address from one binary's address space to another, which is something you might do after instrumenting some code.
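For a rough look at such a stream, the records could be decoded along these lines. This is a sketch only: it assumes the stream is a flat array of `XFIXUP_DATA` records with the MSVC sizes (2-byte `unsigned short`, 4-byte `unsigned long`, little-endian, no padding), which gives 12 bytes per record. How to extract the stream bytes from the PDB is outside the sketch, and `iter_fixups` is a hypothetical helper name.

```python
import struct

# Assumed on-disk layout: wType, wExtra, rva, rvaTarget (12 bytes total).
XFIXUP = struct.Struct("<HHII")


def iter_fixups(stream_bytes):
    """Yield (wType, wExtra, rva, rvaTarget) tuples from raw stream bytes."""
    # Ignore any trailing partial record instead of raising an error.
    usable = len(stream_bytes) - len(stream_bytes) % XFIXUP.size
    yield from XFIXUP.iter_unpack(stream_bytes[:usable])
```

Under these assumptions, an 86MB stream would hold several million records, which a `collections.Counter` over `wType` could summarize quickly.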
I don't know *why* 408781 contains fixup data and 409964 doesn't though.
Anyway, maybe this is a red herring. 409964 is the BIG build and 408781 is the SMALL build right?
Yet the sizes of EXEs and PDBs seem to tell a different story:
(Don't shoot me for using Powershell)
PS D:\bigpdb\408781> Get-ChildItem *.pdb,*.exe | Measure-Object -property length -sum
Count    : 63
Average  :
Sum      : 5635425792
Maximum  :
Minimum  :
Property : Length
PS D:\bigpdb\409964> Get-ChildItem *.pdb,*.exe | Measure-Object -property length -sum
Count    : 65
Average  :
Sum      : 5362196992
Maximum  :
Minimum  :
Property : Length
So your size discrepancy is coming from somewhere else.
,
Sep 16 2016
Ahh, I forgot to include DLLs. The size is coming from DLLs.
PS D:\bigpdb\409964 (big)> Get-ChildItem *.dll | Measure-Object -property length -sum
Count    : 59
Average  :
Sum      : 3145553408
Maximum  :
Minimum  :
Property : Length
PS D:\bigpdb\408781 (small)> Get-ChildItem *.dll | Measure-Object -property length -sum
Count    : 56
Average  :
Sum      : 2180530688
Maximum  :
Minimum  :
Property : Length
So 409964 has 1GB extra of DLLs, while the sum of PDB and EXE is roughly the same.
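Grouping by extension in one pass avoids forgetting a file type, as happened with the DLLs here. The following is a minimal sketch, not a replacement for the PowerShell commands above; `size_by_extension` is a hypothetical helper and, like `Get-ChildItem` without `-Recurse`, it looks only at the top level of the given directory.

```python
from collections import defaultdict
from pathlib import Path


def size_by_extension(directory):
    """Return {extension: (file_count, total_bytes)} for one directory."""
    totals = defaultdict(lambda: [0, 0])
    for entry in Path(directory).iterdir():
        if entry.is_file():
            bucket = totals[entry.suffix.lower()]
            bucket[0] += 1
            bucket[1] += entry.stat().st_size
    return {ext: (count, size) for ext, (count, size) in totals.items()}
```

Calling it on both unpacked builds and comparing the dictionaries key by key shows at a glance which file type accounts for the growth.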
,
Sep 16 2016
A ton of it seems to be coming from ipc_message_dump.dll. It's 633MB in 409964, and 13MB in 408781. Total difference is 500MB, which accounts for ~25% of the entire difference between the two builds.
,
Sep 16 2016
Also you've got 1.2GB coming from these irt_x86 and irt_x64 directories, which aren't present in the smaller builds. I think that accounts for roughly 90% of the increased size.
ipc_message_dump.dll - 600 / 2000 = 30%
irt* folders - 1200 / 2000 = 60%
,
Sep 16 2016
I'm pretty sure irt* is full of object files, aka issue 647353 .
,
Sep 20 2016
Bruce, do comments 33 to 35 speak to you? Any idea why this might've changed? Possibly /INCREMENTAL?
,
Sep 20 2016
13 MB to 633 MB (or is that 133 MB to 633 MB?). Either way, huge jump. The main cause of differences in .dll and .exe sizes from the gyp to gn transition usually turns out to be source sets. I'm currently working on figuring out which ones are causing gn's chrome.dll to be larger than gyp's, and ipc_message_dump.dll could be hitting the same problem. Unfortunately there are hundreds of source sets, some have to stay that way, and it's hard to tell which ones you need to change to fix a particular size problem.
Another possibility is that /opt:icf or /opt:ref could be disabled. These tell the linker to discard redundant and unreachable blocks of code/data, which can make an enormous difference to the size of binaries. These two work together. The cost of linking in a lot of source sets is reduced if /opt:ref is used. I believe that /opt:ref is normally only on for official builds. I would definitely experiment with that.
,
Sep 22 2016
Clearing the Proj-GN-Migration label since it didn't block the GN migration (I'm trying to figure out what, if any, GYP/GN-related tasks might be left).
,
Mar 3 2017
Assigning to Bruce as per c#39.
,
Mar 18 2017
,
Mar 18 2017
,
Mar 24 2017
I think I happened to run into a possible cause for this just now. I just learned that `-gline-tables-only -gsplit-dwarf` has the same effect as `-g2 -gsplit-dwarf` (since http://llvm.org/viewvc/llvm-project?view=revision&revision=279687). If the order of the flags is the other way round, that's not the case. I'm guessing that the order of these two flags happened to flip during the gn switch. The fix is to not pass -gsplit-dwarf in config("minimal_symbols"). I'll give that a try.
,
Mar 24 2017
Ah no, we only pass -gsplit-dwarf in debug builds, and the asan bots are all release builds. Ah well. At least I'll make the debug compile bots faster.
,
Mar 25 2017
,
Jul 11
brucedawson@, do you have any plan / ETA on this?
,
Jul 12
Apologies for not investigating this lately. I just grabbed the data (archive size versus commit position) from http://commondatastorage.googleapis.com/chromium-browser-asan/index.html?prefix=win32-release/, graphed it, and then grabbed key points:
Commit Pos  Size(MB)
408781      1707.1
409964      2555.8 - initial regression
...
433191      2958.3 - worst case
...
434178      2423.6
434216      1704.3 - big improvement!
...
443500      1724.9
443512      1105.9 - even more improvement
...
523446      1450.9 - gradual increase
The last data point is from 2017-12-12. I don't know why they stop at this point.
For http://commondatastorage.googleapis.com/chromium-browser-asan/index.html?prefix=linux-debug/ I don't see any sudden increases, just gradual growth over time.
For http://commondatastorage.googleapis.com/chromium-browser-asan/index.html?prefix=linux-debug-v8-arm/ there is effectively no size growth over the last four years.
So, the original issue has been solved. I'm not sure when, but either some gn cleanup or packaging fixes or compiler fixes have done the job. I'm going to close as fixed.
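The key points quoted in this comment can be turned into per-interval changes with a few lines of code. The sketch below uses only the eight data points listed in the comment; because the original data had elided rows between some of them, neighbouring entries here are not necessarily consecutive archives. `size_deltas` is a hypothetical helper name.

```python
# (commit position, archive size in MB), copied from the key points above.
KEY_POINTS = [
    (408781, 1707.1),
    (409964, 2555.8),
    (433191, 2958.3),
    (434178, 2423.6),
    (434216, 1704.3),
    (443500, 1724.9),
    (443512, 1105.9),
    (523446, 1450.9),
]


def size_deltas(points):
    """Return (from_pos, to_pos, delta_mb) for each neighbouring pair."""
    return [
        (a_pos, b_pos, b_size - a_size)
        for (a_pos, a_size), (b_pos, b_size) in zip(points, points[1:])
    ]


if __name__ == "__main__":
    # Print the intervals ordered from largest growth to largest shrinkage.
    for start, end, delta in sorted(size_deltas(KEY_POINTS), key=lambda d: -d[2]):
        print(f"r{start} -> r{end}: {delta:+.1f} MB")
```

Applied to the full index listing instead of the key points, the same approach would narrow each regression or improvement down to a specific commit range.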
,
Jul 12
We use the 64-bit builds on Windows now, which are archived in a different sub-dir: http://commondatastorage.googleapis.com/chromium-browser-asan/index.html?prefix=win32-release_x64/
Comment 1 by infe...@chromium.org
, Aug 10 2016