lakitu adding 40-50 minutes to most CrOS CQ build times
Issue description
Example builds:
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8925497126423509504
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8925513420745731760
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8925538343723018752
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8925549980962617328
Going to remove lakitu from the CrOS CQ until this is resolved.
Jan 3
Jan 3
Assigning to our oncall to take a look.
Jan 3
Thanks for filing the bug. For future incidents, instead of cc'ing individuals, you could assign the bug to our production oncall (listed at go/cos) and cc lakitu-dev@google.com.
Jan 4
I have spent some time on this. Let me take a stab.
Jan 4
Has a root cause been identified, or should I proceed with removal for now?
Jan 4
Jason, feel free to mark lakitu-paladin(s) as experimental (if that's still a thing) while Ke is figuring it out.
Jan 5
I fixed this in my local SDK by building and using an older version of Portage (the new one reports 2.2.28 in `sudo emerge --version`). Still root-causing.
Jan 5
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/a31ffc024724bcb8c016595b07615957e85d33ee

commit a31ffc024724bcb8c016595b07615957e85d33ee
Author: Jason D. Clinton <jclinton@chromium.org>
Date: Sat Jan 05 17:39:53 2019

Mark Lakitu as experimental until performance problems addressed

BUG=chromium:918723
TEST=run_tests

Change-Id: I4803d902453086fc7cc45c465f9becfa10748ee5
Reviewed-on: https://chromium-review.googlesource.com/c/1396263
Tested-by: Jason Clinton <jclinton@chromium.org>
Reviewed-by: David Burger <dburger@chromium.org>
Commit-Queue: Jason Clinton <jclinton@chromium.org>

[modify] https://crrev.com/a31ffc024724bcb8c016595b07615957e85d33ee/config/chromeos_config.py
[modify] https://crrev.com/a31ffc024724bcb8c016595b07615957e85d33ee/lib/constants.py
[modify] https://crrev.com/a31ffc024724bcb8c016595b07615957e85d33ee/config/config_dump.json
Jan 7
The issue was introduced between commits 1879d3ca019ebe4b870c3ee8d80910a90a8e4408 and 47a3f7e905b3ba252659fa1a98071d03a1560807 in portage_tool. This strongly suggests that the new INSTALL_MASK implementation in Portage is broken somehow.
Jan 9
As Robert suggested, the INSTALL_MASK implementation is the culprit. The INSTALL_MASK in lakitu looks like this:
'*.a *.c *.cc *.go *.la *.h *.hh *.hpp *.h++ *.hxx */.keep* /etc/init.d /etc/runlevels ... /usr/src /boot/config-* /boot/System.map-* /usr/local/build/autotest /lib/modules/*/build /lib/modules/*/source test_*.ko /etc/init /boot/kdump/System.map-* /usr/share/man /usr/share/info /usr/share/doc'
In the old implementation, every entry in the list is deleted as a whole: https://chromium.googlesource.com/chromiumos/third_party/portage_tool/+/1879d3ca019ebe4b870c3ee8d80910a90a8e4408/bin/misc-functions.sh#348
In the new implementation, however, the files matching an entry are removed one by one: https://chromium.googlesource.com/chromiumos/third_party/portage_tool/+/47a3f7e905b3ba252659fa1a98071d03a1560807/pym/portage/util/install_mask.py#110
This is what takes so much time.
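To make the difference concrete, here is a toy Python model (not Portage's code) that counts removal operations under the two strategies described above: one recursive delete per matched mask entry versus one unlink per matched file. The helper names (`topmost_match`, `whole_entry_removals`, `per_file_removals`), the simplified matching rules and the sample file list are all hypothetical; real Portage matching semantics differ in detail.

```python
"""Toy model of INSTALL_MASK removal cost (hypothetical; not Portage code)."""
from fnmatch import fnmatchcase
from typing import Iterable, Optional, Sequence


def topmost_match(path: str, pattern: str) -> Optional[str]:
    """Return the shortest prefix of `path` matched by `pattern`, or None.

    Simplified rules (an assumption): absolute patterns such as
    '/usr/share/doc' match the path or any of its ancestor directories;
    bare patterns such as '*.a' match the file name only.
    """
    if pattern.startswith("/"):
        parts = path.split("/")
        # Try '/usr', then '/usr/src', ... up to the full path.
        for i in range(2, len(parts) + 1):
            prefix = "/".join(parts[:i])
            if fnmatchcase(prefix, pattern):
                return prefix
        return None
    name = path.rsplit("/", 1)[-1]
    return path if fnmatchcase(name, pattern) else None


def whole_entry_removals(files: Iterable[str], mask: Sequence[str]) -> int:
    """Old strategy: one recursive delete per distinct matched entry."""
    targets = set()
    for f in files:
        for pattern in mask:
            target = topmost_match(f, pattern)
            if target is not None:
                targets.add(target)
    return len(targets)


def per_file_removals(files: Iterable[str], mask: Sequence[str]) -> int:
    """New strategy: every masked file is unlinked individually."""
    return sum(
        1 for f in files
        if any(topmost_match(f, p) is not None for p in mask)
    )


if __name__ == "__main__":
    mask = ["/usr/src", "/usr/share/doc", "*.a"]
    files = (
        [f"/usr/src/linux/f{i}.txt" for i in range(1000)]
        + [f"/usr/share/doc/pkg/f{i}.txt" for i in range(200)]
        + ["/usr/lib/libfoo.a", "/usr/bin/tool"]
    )
    print(whole_entry_removals(files, mask))  # 3
    print(per_file_removals(files, mask))     # 1201
```

In this sample the whole-entry strategy issues 3 removals and the per-file strategy 1,201; the gap grows with the number of files under masked directories such as /usr/src or /usr/share/doc. The model ignores the cost of walking the tree and matching patterns, so treat it as an illustration of the point above rather than a benchmark.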
Jan 9
What's the next step here?
Jan 10
It seems to me that we need to patch Portage. I can take this over if Ke doesn't have time.
Jan 10
Thanks for volunteering, Robert. I'm assigning it to you then. We need a resolution for this so that we can add lakitu-paladin back to the CQ.
Jan 10
Thanks, Robert, for the help. Another thing worth trying is to find out why other CrOS boards don't suffer from this as severely.
Jan 11
We've fixed this on Lakitu's end in CL:1406266. Can we enable Lakitu in the CQ again?
Jan 11
Lakitu is enabled in the CQ today:
https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=lakitu-paladin&buildBranch=master
It is currently marked as "not important" but will continue to run. I'd suggest we leave it as not important until we have consistent runs with a lowered execution time. At that time we can definitely remove it from that classification. Does that sound reasonable?
-- Mike
Jan 11
sgtm.
Jan 18
I believe we now have consistent runs with lowered execution time. WDYT about marking lakitu-paladin as important again?
Jan 18
I'll update it today to important. -- Mike
Jan 18
You can probably just revert my CL.
Jan 20
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/60a66c9da216b252d3397d0e12ea25337eb8d85c

commit 60a66c9da216b252d3397d0e12ea25337eb8d85c
Author: Mike Nichols <mikenichols@chromium.org>
Date: Sun Jan 20 06:24:56 2019

Remove lakitu from experimental

The issues with the blocking bug, regarding lakitu performance, has been resolved. Removing lakitu from experimental.

BUG=chromium:918723
TEST=run_tests

Change-Id: I8a2253ea728ee8b520042f7eea0289e563b8f2b4
Reviewed-on: https://chromium-review.googlesource.com/1422221
Commit-Ready: Mike Nichols <mikenichols@chromium.org>
Tested-by: Mike Nichols <mikenichols@chromium.org>
Reviewed-by: David Burger <dburger@chromium.org>
Reviewed-by: Robert Kolchmeyer <rkolchmeyer@google.com>

[modify] https://crrev.com/60a66c9da216b252d3397d0e12ea25337eb8d85c/config/chromeos_config.py
[modify] https://crrev.com/60a66c9da216b252d3397d0e12ea25337eb8d85c/lib/constants.py
[modify] https://crrev.com/60a66c9da216b252d3397d0e12ea25337eb8d85c/config/config_dump.json
Comment 1 by mikewu@google.com, Jan 2