falco-release times out at BuildPackage
Issue description

https://luci-milo.appspot.com/buildbot/chromeos/falco-release/3028

The build ran for ~7 hours before being killed. It looks to me like this is a problem uncovered by the chrome experiment we are doing [1]. I have a few successful tryjobs which took 14-15 hours in BuildPackage.

The build log is much larger than that of a successful build. It seems that the build process is restarted repeatedly. For example, I saw this line 10 times:

> chromeos-chrome-64.0.3274.0_rc-r1: [13557/41631] CXX host/obj/base/base/process_handle_linux.o

I cannot reproduce it locally. However, I observed heavy load on my system when building: a sampled "load" showed more than 200 jobs, and the build ate up all the memory.

[1] We use different AutoFDO profiles on falco-release and peppy-release. Therefore, they may not have a prebuilt chrome package and need to build from source.

It is strange that this started to show up after I manually updated a profile for falco. The tryjob I used to test the CL finished in ~1h. peppy-release is also fine. I'm trying to see if reverting the profile hides the problem.
,
Nov 23 2017
Update: the repeated log seems to be printed every 60m, so it is unlikely that the build was restarted repeatedly. Another observation: the profile is ~4x larger than the previous ones. I'm trying to measure the time spent in each process.
,
Nov 23 2017
Since the previous tryjob didn't catch the problem, a tryjob that reverts the CL may not tell us anything either. Therefore, I reverted it on R63. Let's see if that fixes the problem.
,
Nov 23 2017
Alright, the new profile makes compilation much slower and eats a lot of memory. For example, when compiling memory_region_map.cc in tcmalloc (elapsed time : user time : sys time : max RSS in KB):

old profile: 2.464410 : 1.944000 : 0.312000 : 334176
new profile: 53.394735 : 4.712000 : 6.612000 : 1545784

I guess most of the elapsed time comes from thrashing, because my workstation becomes unusable when building chrome. I'm going to revert the profile on master, too.
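A minimal sketch of how figures like these could be gathered and read, assuming a Linux host and Python 3. The helper names `measure` and `off_cpu_seconds` are made up for illustration, and this is not necessarily how the numbers above were collected. The second half only does arithmetic on the two samples quoted in the comment.

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd once; return (elapsed_s, user_s, sys_s, max_rss_kb).

    Hypothetical helper. On Linux, ru_maxrss is reported in kilobytes and,
    for RUSAGE_CHILDREN, is the high-water mark over every child waited for
    so far, so call this from a fresh process for each measurement.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    elapsed = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return (
        elapsed,
        after.ru_utime - before.ru_utime,
        after.ru_stime - before.ru_stime,
        after.ru_maxrss,
    )


def off_cpu_seconds(elapsed, user, system):
    """Elapsed time that is not accounted for by user or system CPU time."""
    return elapsed - (user + system)


# Samples quoted in the comment: elapsed, user, sys, max RSS (KB).
OLD_PROFILE = (2.464410, 1.944000, 0.312000, 334176)
NEW_PROFILE = (53.394735, 4.712000, 6.612000, 1545784)

if __name__ == "__main__":
    for name, sample in (("old", OLD_PROFILE), ("new", NEW_PROFILE)):
        waited = off_cpu_seconds(*sample[:3])
        print(f"{name}: {waited:.2f}s off-CPU of {sample[0]:.2f}s elapsed")
    # Stand-in command; replace it with the real compiler invocation.
    print(measure([sys.executable, "-c", "pass"]))
```

With the quoted samples, roughly 42 of the 53 elapsed seconds under the new profile are not accounted for by CPU time. That is consistent with the thrashing guess, although it does not prove it.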
,
Nov 23 2017
,
Nov 23 2017
,
Nov 23 2017
I noticed that the profile was larger when I created it. It is about 4.4x larger than the previous one, but "only" 2.3x larger than the first one:

22884910 -> Oct 10 13:16 chrome_cwp_62.0.3202.43_solo_peppy.afdo
11769227 -> Nov 2 17:22 chrome_cwp_63.0.3239.20_celes.afdo
52325126 -> Nov 16 15:23 chrome_cwp_63.0.3239.42_peppy.afdo

Since the previous, smaller profile didn't perform that well (performance wise), I had hoped that this one would do better. But it is too large.

The likely cause is the symbol aliasing introduced by ICF (b/38454265). My feeling is that aliases are handled incorrectly by the autofdo profile creator. I am looking over the code and I think that I can fix at least the part that dumps the symbol information for each alias, which causes the size blowout.
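The size ratios can be checked directly from the three byte counts listed in the comment. This is plain arithmetic on those figures; the `growth` helper is a name made up for the example.

```python
# Profile sizes in bytes, as listed in the comment.
SIZES = {
    "chrome_cwp_62.0.3202.43_solo_peppy.afdo": 22884910,
    "chrome_cwp_63.0.3239.20_celes.afdo": 11769227,
    "chrome_cwp_63.0.3239.42_peppy.afdo": 52325126,
}


def growth(new_name, old_name, sizes=SIZES):
    """Return how many times larger new_name is than old_name."""
    return sizes[new_name] / sizes[old_name]


NEWEST = "chrome_cwp_63.0.3239.42_peppy.afdo"
vs_previous = growth(NEWEST, "chrome_cwp_63.0.3239.20_celes.afdo")
vs_first = growth(NEWEST, "chrome_cwp_62.0.3202.43_solo_peppy.afdo")

print(f"newest vs previous: {vs_previous:.2f}x")
print(f"newest vs first:    {vs_first:.2f}x")
```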
,
Nov 23 2017
> It seems that the build process is restarted repeatedly.

Nothing was restarted. emerge is configured to dump the build log every 60 minutes. If you look at the parallel_emerge output that brackets the chrome build log:

=== Start output for job chromeos-chrome-64.0.3274.0_rc-r1 (60m3.6s) ===
...
=== Still running: job chromeos-chrome-64.0.3274.0_rc-r1 (60m3.6s) ===
...
=== Continue output for job chromeos-chrome-64.0.3274.0_rc-r1 (120m7.2s) ===
...
=== Still running: job chromeos-chrome-64.0.3274.0_rc-r1 (120m7.2s) ===
...
=== Continue output for job chromeos-chrome-64.0.3274.0_rc-r1 (180m10.8s) ===
...
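One way to tell a periodic log dump from a real restart is to pull out the bracketing markers and look at their elapsed times. The sketch below is only illustrative: the marker format is copied from the lines quoted in this comment, the function names are hypothetical, and the restart heuristic (a second "Start output for" marker, or elapsed time going backwards) is an assumption rather than documented parallel_emerge behaviour. It also assumes the text covers a single job.

```python
import re

# Marker shape taken from the quoted parallel_emerge output.
MARKER = re.compile(
    r"=== (Start output for|Continue output for|Still running:) "
    r"job (\S+) \((\d+)m(\d+(?:\.\d+)?)s\) ==="
)


def parse_markers(log_text):
    """Return (kind, job, elapsed_seconds) for each marker, in log order."""
    events = []
    for kind, job, minutes, seconds in MARKER.findall(log_text):
        elapsed = int(minutes) * 60 + float(seconds)
        events.append((kind.rstrip(":"), job, elapsed))
    return events


def looks_restarted(events):
    """Assumed heuristic: more than one start marker, or time going backwards."""
    starts = sum(1 for kind, _, _ in events if kind == "Start output for")
    times = [elapsed for _, _, elapsed in events]
    went_back = any(b < a for a, b in zip(times, times[1:]))
    return starts > 1 or went_back


SAMPLE = """\
=== Start output for job chromeos-chrome-64.0.3274.0_rc-r1 (60m3.6s) ===
=== Still running: job chromeos-chrome-64.0.3274.0_rc-r1 (60m3.6s) ===
=== Continue output for job chromeos-chrome-64.0.3274.0_rc-r1 (120m7.2s) ===
=== Still running: job chromeos-chrome-64.0.3274.0_rc-r1 (120m7.2s) ===
=== Continue output for job chromeos-chrome-64.0.3274.0_rc-r1 (180m10.8s) ===
"""

events = parse_markers(SAMPLE)
for kind, job, elapsed in events:
    print(f"{elapsed:9.1f}s  {kind}  {job}")
print("restart suspected:", looks_restarted(events))
```

On the quoted markers the elapsed time only ever increases and there is a single start marker, which matches the explanation that the log is simply dumped again every 60 minutes.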
,
Nov 25 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/a8889de841211103d105d397cd9f94a029227726

commit a8889de841211103d105d397cd9f94a029227726
Author: Ting-Yuan Huang <laszio@chromium.org>
Date: Sat Nov 25 07:18:48 2017

Revert "chrome: update experimental autofdo profiles to 3239.42"

This reverts commit fd92f4beba2f98bd29756a0a64fc922c60c66073.

Reason for revert: the CL seemed to cause BuildPackages timeout.

Original change's description:
> chrome: update experimental autofdo profiles to 3239.42
>
> BUG=b:37251947, chromium:777730
> TEST=USE=chrome_afdo_exp1 emerge-peppy chromeos-chrome
> cros tryjob falco-release
>
> Change-Id: Ic549960c8db4b124bb0208faaba4198f2e749d28
> Reviewed-on: https://chromium-review.googlesource.com/777026
> Reviewed-by: Ting-Yuan Huang <laszio@chromium.org>
> Tested-by: Ting-Yuan Huang <laszio@chromium.org>

BUG=b:37251947, chromium:777730, chromium:788017
TEST=USE=afdo_chrome_exp1 emerge-falco chromeos-chrome

Change-Id: I3f31f39fe4ce45c9097f935843c3a53fc21ccf21
Reviewed-on: https://chromium-review.googlesource.com/786826
Commit-Ready: Ting-Yuan Huang <laszio@chromium.org>
Tested-by: Ting-Yuan Huang <laszio@chromium.org>
Reviewed-by: Ting-Yuan Huang <laszio@chromium.org>

[modify] https://crrev.com/a8889de841211103d105d397cd9f94a029227726/chromeos-base/chromeos-chrome/chromeos-chrome-9999.ebuild
[modify] https://crrev.com/a8889de841211103d105d397cd9f94a029227726/chromeos-base/chromeos-chrome/Manifest
,
Nov 27 2017
falco-release is still failing:
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Ffalco-release%2F3039%2F%2B%2Frecipes%2Fsteps%2FBuildPackages__afdo_use_%2F0%2Fstdout
This is probably because we are still trying to build chromeos-chrome-64.0.3274.0_rc-r1, which does not have the revert in #9.
,
Nov 27 2017
Chrome just got up-reved.
,
Nov 28 2017
We had to pin Chrome due to 788925, so not fixed yet.
,
Nov 28 2017
The bad profile only affected 3274. Pinning Chrome to 3273 hid the problem, and after unpinning it is already solved. Unless you want to pin to 3274, this can be marked fixed.
,
Jan 22 2018
,
Jan 23 2018
Comment 1 by laszio@chromium.org
, Nov 22 2017