
Issue 918723

Starred by 2 users

Issue metadata

Status: Started
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




lakitu adding 40-50 minutes to most CrOS CQ build times

Project Member Reported by jclinton@chromium.org, Jan 2

Issue description

It looks like starting from 11445.0.0, the BuildImage step takes ~2 hours to build the sys-kernel/lakitu-kernel-4_14 package. The last build before 11445.0.0, 11438.0.0, took only 6 minutes to build the same package.
Cc: -wonderfly@chromium.org wonderfly@google.com
Owner: xueweiz@google.com
Assigning to our oncall to take a look.
Thanks for filing the bug. For future incidents, instead of cc'ing individuals, you could assign the bug to our production on-call (listed at go/cos) and cc lakitu-dev@google.com.
Cc: lakitu-dev@google.com
Owner: mikewu@google.com
I have spent some time on this. Let me take a stab.
Any root cause identified or should I proceed with removal for now?
Jason, feel free to make lakitu-paladin(s) experimental (if that's still a thing) while Ke is figuring it out.
I fixed this in my local SDK by building and using an older version of Portage (`sudo emerge --version` now reports 2.2.28). Still root-causing.
Project Member

Comment 9 by bugdroid1@chromium.org, Jan 5

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/a31ffc024724bcb8c016595b07615957e85d33ee

commit a31ffc024724bcb8c016595b07615957e85d33ee
Author: Jason D. Clinton <jclinton@chromium.org>
Date: Sat Jan 05 17:39:53 2019

Mark Lakitu as experimental until performance problems addressed

BUG=chromium:918723
TEST=run_tests

Change-Id: I4803d902453086fc7cc45c465f9becfa10748ee5
Reviewed-on: https://chromium-review.googlesource.com/c/1396263
Tested-by: Jason Clinton <jclinton@chromium.org>
Reviewed-by: David Burger <dburger@chromium.org>
Commit-Queue: Jason Clinton <jclinton@chromium.org>

[modify] https://crrev.com/a31ffc024724bcb8c016595b07615957e85d33ee/config/chromeos_config.py
[modify] https://crrev.com/a31ffc024724bcb8c016595b07615957e85d33ee/lib/constants.py
[modify] https://crrev.com/a31ffc024724bcb8c016595b07615957e85d33ee/config/config_dump.json

The issue was introduced between commits 1879d3ca019ebe4b870c3ee8d80910a90a8e4408 and 47a3f7e905b3ba252659fa1a98071d03a1560807 in portage_tool. This strongly suggests that the new INSTALL_MASK implementation in Portage is broken somehow.
As Robert suggested, INSTALL_MASK implementation is the culprit.

The INSTALL_MASK in lakitu looks like this:
'*.a *.c *.cc *.go *.la *.h *.hh *.hpp *.h++ *.hxx */.keep* /etc/init.d /etc/runlevels ... /usr/src /boot/config-* /boot/System.map-* /usr/local/build/autotest /lib/modules/*/build /lib/modules/*/source test_*.ko /etc/init /boot/kdump/System.map-* /usr/share/man /usr/share/info /usr/share/doc'
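INSTALL_MASK is a whitespace-separated list of entries, and the value above mixes two kinds: absolute paths (leading `/`, e.g. `/usr/src`, `/lib/modules/*/build`) and bare globs (e.g. `*.h`, `test_*.ko`). The sketch below groups entries that way; `split_install_mask` is a hypothetical helper for illustration, not a Portage API, and the test string is only an excerpt of the value above (the `...` elision is left unfilled).

```python
def split_install_mask(install_mask: str) -> dict[str, list[str]]:
    """Hypothetical helper (not part of Portage): group whitespace-separated
    INSTALL_MASK entries into absolute paths and other globs."""
    groups: dict[str, list[str]] = {"absolute": [], "glob": []}
    for entry in install_mask.split():
        # Entries with a leading "/" are anchored at the image root;
        # everything else is treated here as a plain glob.
        key = "absolute" if entry.startswith("/") else "glob"
        groups[key].append(entry)
    return groups
```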

In the old implementation, each entry in the list is deleted as a whole: https://chromium.googlesource.com/chromiumos/third_party/portage_tool/+/1879d3ca019ebe4b870c3ee8d80910a90a8e4408/bin/misc-functions.sh#348.
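To make "deleted as a whole" concrete, here is a simplified Python model of that behaviour (the linked original is a shell function, and this is not a port of it). Each absolute-path entry is glob-expanded under the image root and every match is removed with a single call, so a masked directory such as `/usr/src` costs one recursive delete no matter how many files it holds. `remove_masked_entries_whole` is a hypothetical name, and bare globs like `*.h` are deliberately skipped to keep the sketch short.

```python
import glob
import os
import shutil


def remove_masked_entries_whole(root: str, install_mask: str) -> int:
    """Simplified model (not Portage code): delete each absolute-path
    INSTALL_MASK match under `root` in one call. Returns the delete count."""
    calls = 0
    for entry in install_mask.split():
        if not entry.startswith("/"):
            continue  # this sketch only models absolute-path entries
        for match in glob.glob(os.path.join(root, entry.lstrip("/"))):
            if os.path.isdir(match) and not os.path.islink(match):
                shutil.rmtree(match)  # whole directory tree in one call
            else:
                os.remove(match)
            calls += 1
    return calls
```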

In the new implementation, however, the files matching an entry are removed one by one: https://chromium.googlesource.com/chromiumos/third_party/portage_tool/+/47a3f7e905b3ba252659fa1a98071d03a1560807/pym/portage/util/install_mask.py#110. This is what takes so much time.
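The cost difference shows up in a simplified model of per-file masking. This is a sketch, not the linked `install_mask.py`: it walks every installed file, tests it against the INSTALL_MASK patterns, and unlinks each match individually. The matching rules are assumptions for illustration: absolute patterns match the path itself or anything below it, and bare patterns match only the file name. `remove_masked_files_one_by_one` is a hypothetical name, and empty directories are left in place.

```python
import fnmatch
import os


def remove_masked_files_one_by_one(root: str, install_mask: str) -> tuple[int, int]:
    """Simplified model (not Portage code): test every file under `root`
    against the patterns and unlink matches one at a time.
    Returns (pattern checks performed, files removed)."""
    patterns = install_mask.split()
    checks = removed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = "/" + os.path.relpath(full, root)  # path as seen from the image root
            for pattern in patterns:
                checks += 1
                if pattern.startswith("/"):
                    # Absolute pattern: match the path itself or anything below it.
                    hit = (fnmatch.fnmatch(rel, pattern)
                           or fnmatch.fnmatch(rel, pattern.rstrip("/") + "/*"))
                else:
                    # Bare pattern: match against the file name only.
                    hit = fnmatch.fnmatch(name, pattern)
                if hit:
                    os.remove(full)  # one unlink per masked file
                    removed += 1
                    break
    return checks, removed
```

With 100 files under `/usr/src/linux`, this model performs 100 separate unlinks for the `/usr/src` entry, whereas the old whole-entry approach linked above removes that directory with a single recursive delete. For a package that installs a large tree under a masked directory, the per-file work grows with the number of files.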
What's the next step here?
Seems to me that we need to patch portage. I can take this over if Ke doesn't have time.
Owner: rkolchmeyer@google.com
Thanks for volunteering, Robert. I'm assigning it to you then. We need a resolution for this so that we can add lakitu-paladin back to the CQ.
Thanks Robert for the help. Another thing worth trying is to find out why other CrOS boards don't suffer from this as severely.
We've fixed this on Lakitu's end in CL:1406266. Can we enable Lakitu in the CQ again?
Lakitu is enabled in the CQ today: https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=lakitu-paladin&buildBranch=master

It is currently marked as "not important" but will continue to run. I'd suggest we leave it as not important until we have consistent runs with a lowered execution time. At that time we can definitely remove it from that classification.

Does that sound reasonable?

-- Mike
sgtm.

Comment 19 by rkolchmeyer@google.com, Jan 18

I believe we now have consistent runs with lowered execution time. WDYT about marking lakitu-paladin as important again?

Comment 20 by mikenichols@chromium.org, Jan 18

I'll update it to important today.

-- Mike

Comment 21 by jclinton@chromium.org, Jan 18

You can probably just revert my CL.
Project Member

Comment 22 by bugdroid1@chromium.org, Jan 20

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/60a66c9da216b252d3397d0e12ea25337eb8d85c

commit 60a66c9da216b252d3397d0e12ea25337eb8d85c
Author: Mike Nichols <mikenichols@chromium.org>
Date: Sun Jan 20 06:24:56 2019

Remove lakitu from experimental

The issues with the blocking bug, regarding lakitu performance, has been
resolved.  Removing lakitu from experimental.

BUG=chromium:918723
TEST=run_tests

Change-Id: I8a2253ea728ee8b520042f7eea0289e563b8f2b4
Reviewed-on: https://chromium-review.googlesource.com/1422221
Commit-Ready: Mike Nichols <mikenichols@chromium.org>
Tested-by: Mike Nichols <mikenichols@chromium.org>
Reviewed-by: David Burger <dburger@chromium.org>
Reviewed-by: Robert Kolchmeyer <rkolchmeyer@google.com>

[modify] https://crrev.com/60a66c9da216b252d3397d0e12ea25337eb8d85c/config/chromeos_config.py
[modify] https://crrev.com/60a66c9da216b252d3397d0e12ea25337eb8d85c/lib/constants.py
[modify] https://crrev.com/60a66c9da216b252d3397d0e12ea25337eb8d85c/config/config_dump.json
