Win x64 Builder failure on chromium.perf |
|||||||||||||||
Issue descriptionhttps://build.chromium.org/p/chromium.perf/builders/Win%20x64%20Builder?numbuilds=200 First seen at https://build.chromium.org/p/chromium.perf/builders/Win%20x64%20Builder/builds/9544 [5442/5460] LINK(DLL) chrome.dll chrome.dll.lib FAILED: chrome.dll chrome.dll.lib C:/b/depot_tools/python276_bin/python.exe gyp-win-tool link-wrapper environment.x64 False link.exe /nologo /IMPLIB:./chrome.dll.lib /DLL /OUT:./chrome.dll /PDB:./chrome.dll.pdb @./chrome.dll.rsp obj/chrome/browser/ui/ui.lib : fatal error LNK1107: invalid or corrupt file: cannot read at 0x14B19402 [5443/5460] LINK(DLL) chrome_child.dll chrome_child.dll.lib ninja: build stopped: subcommand failed. Looks like https://codereview.chromium.org/1517463003 causes it, which modified some ui files. Will go ahead and revert the CL.
,
Jun 14 2016
Hmm, seems the revert didn't work... Last known good compile was 9541: https://build.chromium.org/p/chromium.perf/builders/Win%20x64%20Builder/builds/9541 Then we have compile exceptions in 9542 and 9543. We are not entirely sure if they would pass compilation. Then compile fail in 9544. All three have touched chrome/browser/ui/. The range for 9532-9544 is: 399598-399650
,
Jun 14 2016
Starting return code bisect at: https://chromeperf.appspot.com/group_report?bug_id=619949
,
Jun 14 2016
This is a Visual Studio linker limitation, we need to do something like this https://codereview.chromium.org/2046573003/ to fix it.
,
Jun 14 2016
+brucedawson@ looks like our first LNK1107 (lib file > 2GB) with GN, any idea how to fix this ? The sourceset for this target[1] should probably be splitted ? [1] https://cs.chromium.org/chromium/src/chrome/browser/ui/BUILD.gn?q=chrome/browser/ui/bui&sq=package:chromium&l=32
,
Jun 15 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/25722a4a11f04300296119496fcc644a4fb03667 commit 25722a4a11f04300296119496fcc644a4fb03667 Author: zhenw <zhenw@chromium.org> Date: Tue Jun 14 15:18:47 2016 Revert of Added comprehensive tests for views::Border. (patchset #13 id:260001 of https://codereview.chromium.org/1517463003/ ) Reason for revert: This CL causes Win x64 Builder failure on chromium.perf. https://build.chromium.org/p/chromium.perf/builders/Win%20x64%20Builder/builds/9544 [5442/5460] LINK(DLL) chrome.dll chrome.dll.lib FAILED: chrome.dll chrome.dll.lib C:/b/depot_tools/python276_bin/python.exe gyp-win-tool link-wrapper environment.x64 False link.exe /nologo /IMPLIB:./chrome.dll.lib /DLL /OUT:./chrome.dll /PDB:./chrome.dll.pdb @./chrome.dll.rsp obj/chrome/browser/ui/ui.lib : fatal error LNK1107: invalid or corrupt file: cannot read at 0x14B19402 [5443/5460] LINK(DLL) chrome_child.dll chrome_child.dll.lib ninja: build stopped: subcommand failed. BUG= 619949 Original issue's description: > Added comprehensive tests for views::Border. > > BUG= 568389 > > Committed: https://crrev.com/92614dc154ad143a1ef9849ae0fe90c02df09f30 > Cr-Commit-Position: refs/heads/master@{#399650} TBR=sadrul@chromium.org,danakj@chromium.org,mgiuca@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG= 568389 Review-Url: https://codereview.chromium.org/2060983004 Cr-Commit-Position: refs/heads/master@{#399709} [delete] https://crrev.com/8458e05a54ad72235f9329863ebb87307ace8291/ui/views/border_unittest.cc [modify] https://crrev.com/25722a4a11f04300296119496fcc644a4fb03667/ui/views/views.gyp
,
Jun 15 2016
+picksi (next sheriff) The bisect is still running. Comment 6 is just a repetition of comment 1 (not sure why it happened).
,
Jun 15 2016
Is this potentially an infra-structure issue? There is occasional purple and the failure in the logs says: FAILED: chrome.dll chrome.dll.lib C:/b/depot_tools/python276_bin/python.exe gyp-win-tool link-wrapper environment.x64 False link.exe /nologo /IMPLIB:./chrome.dll.lib /DLL /OUT:./chrome.dll /PDB:./chrome.dll.pdb @./chrome.dll.rsp obj/chrome/browser/ui/ui.lib : fatal error LNK1107: invalid or corrupt file: cannot read at 0x14B62506 Has a file become locked or something? Adding infra>labs and infra-troopers in case it is an infra problem.
,
Jun 15 2016
No, the problem is that there's a 2GB limitation on the size of the static libraries on Windows (.lib). You can still produce a library bigger than that but the linker won't be able to read it and you'll hit the error 'LNK1107'. We had this issue several time with Gyp but there was an easy way to fix it (msvs_shard[1]) but this doesn't exist in GN (afaik). We thought that this wouldn't be an issue in GN because the sourcesets (i.e. it sounded like a nice way to implicitly limit the number of files that get compiled in the same static library), but it look like it's not the case (i.e. some source sets are huge). So the solution here is to split the ui.lib sourceset defined here: https://cs.chromium.org/chromium/src/chrome/browser/ui/BUILD.gn?q=chrome/browser/ui/bui&sq=package:chromium&l=32 , I don't know if there's an easy way to do this. [1] https://cs.chromium.org/chromium/src/tools/gyp/pylib/gyp/MSVSUtil.py?q=msvs_shard+file:%5Esrc/tools/gyp/&sq=package:chromium&dr=C&l=73
,
Jun 15 2016
(Please disregard my last three deleted comments: they were about the FYI builder rather than the perf.fyi builder)
,
Jun 15 2016
Wow, I just did a local test and obj/chrome/browser/ui/ui.lib has a size of ~4GB, this is way above the 2GB limit !
,
Jun 15 2016
You should try to revert https://codereview.chromium.org/2060983002 to fix this, it'll have some performance implication but it'll fix the build.
,
Jun 15 2016
I'm reverting this now
,
Jun 15 2016
CL for revert is here: https://codereview.chromium.org/2071493002/
,
Jun 15 2016
I will change this target to be a source set in 64-bits. This gets around the issue easily (in case it comes up for other targets) but causes some slowdown in linking so we prefer static libraries for these. Reverting the patch will cause some regressions in cycle times for Windows builders which are already on the edge of their SLA. Since this patch has landed we flipped some more Windows bots.
,
Jun 15 2016
,
Jun 15 2016
+1 to what brettw wrote in #19 ... reverting this CL will affect the CQ in a bad way; brettw@ has a CL posted that should fix the perf bot and we should land that instead.
,
Jun 15 2016
For completeness brettw's CL is here: https://codereview.chromium.org/2067143003/
,
Jun 15 2016
awesome, I'll do more tests on my end to check if more static libraries are about to hit the 2GB lib file limit. Would it make sense to have an equivalent to the GYP's msvs_shard flag in GN ?
,
Jun 15 2016
The much easier solution is to make them source sets. The shard thing is kind of a complicated hack. The fact that these are shared libraries is only a linking performance optimization anyway.
,
Jun 15 2016
yeah, it's just that it's sad to have to sacrifice the linking performance optimizations. So I understand that the short term easy solution is to make them source sets but we should probably try to find a better long term one ? The solution might be to split the ui static library into several distinct libraries ? i.e. rework its build file. (I'm not familiar with this part of the codebase so I don't know if there's a clear way to split this into separate UI components).
,
Jun 15 2016
I think there's a bunch more work to do here as we try to figure out the best way to speed up link time and keep things below the limits.
,
Jun 15 2016
Ok, I just want to make sure that we do something about these limitations. This LNK1107 link error caused us a lot of trouble with Gyp as not so many people know the msvs_shard option. A lot of people have wasted a lot of time trying to fix the build (generally by randomly reverting some CLs) and I want to make sure that we don't start hitting this with GN as well.
,
Jun 15 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/c840a2adcf6540d189c41e7cb9825536035082a4 commit c840a2adcf6540d189c41e7cb9825536035082a4 Author: brettw <brettw@chromium.org> Date: Wed Jun 15 16:20:54 2016 Make chrome/browser/ui a source set on 64-bit official builds. The static library for this target was exceeding 4GB on Windows. This condition matches the one we added chrome/browser. BUG= 619949 TBR=picksi1 Review-Url: https://codereview.chromium.org/2067143003 Cr-Commit-Position: refs/heads/master@{#399916} [modify] https://crrev.com/c840a2adcf6540d189c41e7cb9825536035082a4/chrome/browser/ui/BUILD.gn
,
Jun 15 2016
@sebmarchand - yup, no question.
,
Jun 15 2016
The build has failed but for a different reason (timeout, I'm 'fixing' this in https://codereview.chromium.org/2070623002/) The fact that it timed out make me believe that the original issue (LNK1107) is fixed as it was usually failing after ~40 mins, now we're just timing out while linking the big tests.
,
Jun 15 2016
It's not super hard to split these targets up manually if we desire.
,
Jun 16 2016
Apparently the file size limit for .lib files has been raised recently from 2 GiB to 4 GiB. This is proven by the fact that I've seen ui.lib be successfully used at just over 3.9 GiB. Given how much .lib file sizes can very between build flavors I'm not surprised that it broke 4 GB on some variants. The second largest .lib file on the build I examined was only 1.5 GiB so if we fix ui.lib we will have a lot of headroom. Large source sets are proving problematic for build times (see comment #30) so we probably need to break ui.lib into two or three .lib files - the manual equivalent of msvs_shard.
,
Jun 16 2016
This splitting will be easier after we remove GYP, since currently the file list is slurped out of a .gypi.
,
Jun 23 2016
Friendly sheriff ping (According to the Perf Bot Sheriffing guide, Pri-1 build bot bugs should be pinged daily. Please close the issue if it's already been fixed.)
,
Jun 23 2016
The address error on the UI target should be fixed now. It looks like currently the bot is just timing out a lot of the time (sometimes it succeeds).
,
Jun 23 2016
,
Jun 23 2016
Stealing this ticket, time to disconnect faulty machines.
,
Jun 23 2016
So, there are several issues that are in play here. brettw@ fixed one of them. However, another is that there are three machines serving this builder: - build131-m1, which has 128GB of RAM and seems to currently be happy - build34-m1, which has 64GB and isn't - vm326-m1, which has 16GB and is even less so. If I'm reading the output correctly, all of them have 8 cores. I've asked pgervais@ to shut off build34-m1 and vm326-m1 temporarily. We'll drop down to even slower cycle times, but it should at least work. Once we get things working somewhat reliably, it looks like the builder cycle time is still maybe 4x slower than GYP, so we still have work to do there.
,
Jun 23 2016
I disconnected vm326-m1 and build34-m1. Reassigning the ticket to brettw@
,
Jun 23 2016
I'll take this over for now ... brettw@'s going to work on GN link time stuff, but I'll own the overall issue.
,
Jun 23 2016
Issue 617982 has been merged into this issue.
,
Jun 23 2016
Talked to Dirk about this in person.. we're going to make sure all the hw matches for this builder. I'll replace build34-m1 in place and allocate a new machine for vm326-m1.
,
Jun 23 2016
thanks, dba@!
,
Jun 23 2016
The following revision refers to this bug: https://chrome-internal.googlesource.com/chrome-golo/chrome-golo.git/+/bfa8014a9b8c8148b3492ac90efc21afa355183a commit bfa8014a9b8c8148b3492ac90efc21afa355183a Author: dba <dba@google.com> Date: Thu Jun 23 22:52:33 2016
,
Jun 24 2016
The following revision refers to this bug: https://chrome-internal.googlesource.com/chrome-golo/chrome-golo.git/+/8115d7b047d585a8a3a434014be8a62e5b235ab9 commit 8115d7b047d585a8a3a434014be8a62e5b235ab9 Author: Bryce Albritton <dba@google.com> Date: Fri Jun 24 02:01:18 2016
,
Jun 24 2016
build34-m1 has been redeployed on newer hardware and build13-m1 is ready to take the place for vm326-m1 once someone updates the slaves config.
,
Jun 24 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d28aba56dde82051680f9048daf1b1501b9bd89a commit d28aba56dde82051680f9048daf1b1501b9bd89a Author: brettw <brettw@chromium.org> Date: Fri Jun 24 03:04:28 2016 Make GN components static libraries on Win x64 For the static builds, previously there was a static library exception for the windows official builds. They were source sets instead on the theory that some of them could get over 2GB. There's only one component over 2GB (blink_modules). The rest of the targets that were too large before were all explicit static libraries which have been dealt with already. This change whitelists blink_modules. After that, the largest static library is now blink_platform (1.2GB). The rest are less than 1 GB. So we can now disable the special case for this configuration. BUG= 619949 Review-Url: https://codereview.chromium.org/2098673002 Cr-Commit-Position: refs/heads/master@{#401800} [modify] https://crrev.com/d28aba56dde82051680f9048daf1b1501b9bd89a/build/config/BUILDCONFIG.gn [modify] https://crrev.com/d28aba56dde82051680f9048daf1b1501b9bd89a/third_party/WebKit/Source/modules/BUILD.gn
,
Jun 24 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/tools/build.git/+/ed4365255ca8636f33e75b36c6e98f4d8aeff6f8 commit ed4365255ca8636f33e75b36c6e98f4d8aeff6f8 Author: dpranke <dpranke@chromium.org> Date: Fri Jun 24 03:07:58 2016 Update build slave assignment for Win x64 Builder on chromium.perf. TBR=dba@chromium.org BUG= 619949 Review-Url: https://codereview.chromium.org/2091233002 [modify] https://crrev.com/ed4365255ca8636f33e75b36c6e98f4d8aeff6f8/masters/master.chromium.perf/slaves.cfg
,
Jun 24 2016
The following revision refers to this bug: https://chrome-internal.googlesource.com/infradata/master-manager.git/+/1a2009cb0519e43151ba0a1f033edf30cd782b9a commit 1a2009cb0519e43151ba0a1f033edf30cd782b9a Author: dpranke <dpranke@google.com> Date: Fri Jun 24 03:13:55 2016
,
Jun 24 2016
bounce, bounce ... The master has been restarted and the new slave added. dba@, pgervais@, can one of you start up the new builder (build13-m1)?
,
Jun 24 2016
Just restarted build13-m1. It's now connected up.
,
Jun 24 2016
Thanks, pschmidt@! Looks like things should be happy now, so I'm closing this. |
|||||||||||||||
►
Sign in to add a comment |
|||||||||||||||
Comment 1 by bugdroid1@chromium.org
, Jun 14 2016