New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 604920 link

Starred by 0 users

Issue metadata

Status: Duplicate
Owner:
Closed: May 2016
Cc:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

Potentially remove UnitTest step from PFQ bots

Project Member Reported by achuith@chromium.org, Apr 19 2016

Issue description

It's not clear if we need this step.

If there's no dependence on chrome, we should remove it from the PFQ. It would save ~25 min.
 
Unit tests run in parallel with other stages, including HWTest. You might save some time (if the long pole is SimpleChrome and it is significantly speeded up for instance), but not the full 25 minutes.

I'm not certain if there are unit tests that are run that are affected by chrome. It's certainly possible in theory, but I can't point to a specific example without investigation.
The unittests run on the builder, not on the device.

Basically they call cros_run_unit_tests from within the chroot.
Yes, they run on the builder. That doesn't mean they can't be affected by the chrome binary or by the artifacts that the chromeos-chrome ebuild produced.

I'm doing a little cidb query to fish out unit test failures on the chrome pfq. I'll post back here.
mysql> SELECT b.build_config, b.build_number, b.start_time, t.name, t.status from buildTable b JOIN buildStageTable t on b.id = t.build_id WHERE b.build_type="chrome" AND t.name = "UnitTest" AND b.waterfall="chromeos" AND t.status="fail" ORDER BY b.id DESC LIMIT 25;
+----------------------------+--------------+---------------------+----------+--------+
| build_config               | build_number | start_time          | name     | status |
+----------------------------+--------------+---------------------+----------+--------+
| x86-generic-chromium-pfq   |         8496 | 2016-04-11 21:18:57 | UnitTest | fail   |
| lumpy-chrome-pfq           |         8511 | 2016-04-11 21:18:55 | UnitTest | fail   |
| tricky-chrome-pfq          |         1744 | 2016-04-11 21:18:54 | UnitTest | fail   |
| x86-alex-chrome-pfq        |          471 | 2016-04-11 21:18:54 | UnitTest | fail   |
| x86-alex-chrome-pfq        |          470 | 2016-04-11 07:07:22 | UnitTest | fail   |
| tricky-chrome-pfq          |         1743 | 2016-04-11 07:07:21 | UnitTest | fail   |
| peppy-chrome-pfq           |         1915 | 2016-04-11 03:33:21 | UnitTest | fail   |
| lumpy-chrome-pfq           |         8508 | 2016-04-11 00:08:49 | UnitTest | fail   |
| x86-alex-chrome-pfq        |          467 | 2016-04-10 17:58:49 | UnitTest | fail   |
| amd64-generic-chromium-pfq |         8506 | 2016-04-10 01:38:24 | UnitTest | fail   |
| x86-generic-chromium-pfq   |         8491 | 2016-04-10 01:38:23 | UnitTest | fail   |
| peppy-chrome-pfq           |         1910 | 2016-04-09 07:21:14 | UnitTest | fail   |
| tricky-chrome-pfq          |         1737 | 2016-04-09 07:21:14 | UnitTest | fail   |
| amd64-generic-chromium-pfq |         8503 | 2016-04-09 03:36:25 | UnitTest | fail   |
| tricky-chrome-pfq          |         1736 | 2016-04-09 03:36:21 | UnitTest | fail   |
| amd64-generic-chromium-pfq |         8502 | 2016-04-09 00:07:56 | UnitTest | fail   |
| lumpy-chrome-pfq           |         8501 | 2016-04-08 18:02:50 | UnitTest | fail   |
| tricky-chrome-pfq          |         1734 | 2016-04-08 18:02:49 | UnitTest | fail   |
| cyan-cheets-chrome-pfq     |           43 | 2016-04-08 04:13:48 | UnitTest | fail   |
| amd64-generic-chromium-pfq |         8500 | 2016-04-08 03:45:20 | UnitTest | fail   |
| tricky-chrome-pfq          |         1733 | 2016-04-08 03:45:16 | UnitTest | fail   |
| cyan-cheets-chrome-pfq     |           42 | 2016-04-08 00:08:54 | UnitTest | fail   |
| amd64-generic-chromium-pfq |         8499 | 2016-04-08 00:08:54 | UnitTest | fail   |
| lumpy-chrome-pfq           |         8262 | 2016-02-08 01:09:50 | UnitTest | fail   |
| x86-generic-chromium-pfq   |         8200 | 2016-01-25 20:18:41 | UnitTest | fail   |
+----------------------------+--------------+---------------------+----------+--------+
25 rows in set (0.35 sec)

mysql> 
Of course, many of those failures are likely due to ToT being broken, not due to chrome. These are the most recent 25. If I increase the limit much beyond that then the query time goes up enormously, suggesting to me that these are quite sparse.
The recent cluster of failures looks like it is "Error: the test process started auxillary processes which did not shutdown (they were forcefully killed)" which was a bug in the sdk that has been fixed (I looked at only a few, but given that there's only been this cluster and then a long quiet spell it seems like a safe bet).
Would it make sense to remove them from most of the bots, and have them on one x86 and one arm bot? That way we still retain some coverage. 

My experience with this suite is that it's mostly quite stable, and the few failures are infra issues or cros failures. I haven't seen a failure that we traced to a chrome CL.

I definitely want to remove them from the chromiumos.chromium bots here:
https://uberchromegw.corp.google.com/i/chromiumos.chromium/console

If this seems reasonable, I think it would be a good task for the shadow gardener this week.
+1 to removing them from the continuous builders you mentioned in #7 first.

Then yeah, removing them on the pfq or most of it is probably fine.
Cc: lhchavez@chromium.org achuith@chromium.org
Owner: victorhsieh@chromium.org
Victor, could you take this on? Let's start by removing the UnitTest step from the informational continuous builders. 

Luis has done some work with changing timeout configs of these builders, so he may be able to point you in the right direction.

Thanks!
Great! Thanks for all the analysis! This is huge improvement from a pfq perspective.
I'm looking at the output of a UnitTests run and I see the following output:

14:18:08: WARNING: The following packages do not have tests:
...
chromeos-base/chromeos-chrome

Is it possible to determine if any of the packages that do have tests actually depend on chromeos-chrome?

Unless we know about a test dependency on chromeos-chrome, I would suggest that we can skip this stage entirely on the PFQ. If we do know about any dependencies, maybe we can run just those?

Agreed that we should remove them from the informational builders first.

Victor, ping me if you have any questions, although I don't know that I will be able to answer them :)

This will eliminate redundancy and potentially reduce build times for the pfq.
Hint: it should just be a matter of turning the "unittest" value to false for these configs in chromeos_config.py
Status: Started (was: Assigned)
Sure.  Looking.
CL to disable unit test for informational builders:
  https://chromium-review.googlesource.com/#/c/339663

Steven, what do you mean in #11 about chrome?  Do you mean some unit tests are depending on chromeos-base/chromeos-chrome, and we could still keep some of unit tests?
Some -packages- that have unit tests -may- depend on chromeos-chrome.

I'm not familiar enough with portage to know how to determine this. I do know that a fairly large number of packages have chromeos-chrome dependencies.

Project Member

Comment 17 by bugdroid1@chromium.org, Apr 26 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/8a699279b2706c9b4871190fdcb3e568410c8e15

commit 8a699279b2706c9b4871190fdcb3e568410c8e15
Author: Victor Hsieh <victorhsieh@google.com>
Date: Tue Apr 19 23:52:33 2016

Disable unit test on informational PFQ

Running unit test on PFQ is not very helpful.  Removing it would save us
about 25 minutes.

BUG= chromium:604920 
TEST=run_tests

Change-Id: Ie8060d30c5f22ae7ff5af60cc72c1d1276b07c46
Reviewed-on: https://chromium-review.googlesource.com/339663
Commit-Ready: Victor Hsieh <victorhsieh@chromium.org>
Tested-by: Victor Hsieh <victorhsieh@chromium.org>
Reviewed-by: Aviv Keshet <akeshet@chromium.org>

[modify] https://crrev.com/8a699279b2706c9b4871190fdcb3e568410c8e15/cbuildbot/config_dump.json
[modify] https://crrev.com/8a699279b2706c9b4871190fdcb3e568410c8e15/cbuildbot/chromeos_config.py

Cc: victorhsieh@chromium.org
Owner: ----
Status: Available (was: Started)
Releasing to future gardeners.
Owner: akes...@chromium.org
Status: Assigned (was: Available)
Thank you for this, Victor! Aviv, do the bots need to be restarted or do they pick up the new config automatically?
They will pick this up on their next run.
Owner: ----
Status: Available (was: Assigned)
Ok, let's monitor this over the week and look at disabling this on the pfq bots next week.
Labels: Hotlist-CrOS-Gardener
Achuith@ Have we confirmed that the continuous builders are fine with the new config? Are we ready to port this change to the actual config.
Cc: jen...@chromium.org afakhry@chromium.org
Adding current gardeners.
Owner: achuith@chromium.org
Status: Started (was: Available)
The informational pfq builders look healthy with UnitTest removed.
I looked at the stats from the spreadsheet:

https://docs.google.com/spreadsheets/d/1JnEQNEFhnBhncv_COqLC5hQC9Ge9oMjww5Xv-7mzIks/edit#gid=0

and the only time we saw more than one failure in UniTests was the week of April 25th where a failure in chromeos_config_unittest.py affected all pfq builders but was found and fixed in the canary builders before it belatedly showed up in the pfq builders.

I put up https://chromium-review.googlesource.com/#/c/344458/ to change this on the pfq builders.

Owner: steve...@chromium.org
https://chromium-review.googlesource.com/#/c/344458/

We also need to get rid of these on the telemetry builders
I'm not nearly as concerned about the cycle time on the telemetry builders so I don't think that it is especially important whether or not the tests run there.

Project Member

Comment 30 by bugdroid1@chromium.org, May 13 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/64e8b9a603e481483293a13872a3d01e2dfef22a

commit 64e8b9a603e481483293a13872a3d01e2dfef22a
Author: Steven Bennetts <stevenjb@chromium.org>
Date: Thu May 12 21:59:24 2016

Disable chromeos unittests stage from PFQ builders

These tests take 20-25 minutes on the pfq builders and should be
redundant since it is none of the tests should be affected by
chrome behavior. See issue for more info.

BUG= chromium:604920 
TEST=-chrome-pfq and -chromium-pfq builders should not have a
     UnitTests stage.

Change-Id: Iff902a6d13014c7d2337c21dff4b744e4f8fc0e8
Reviewed-on: https://chromium-review.googlesource.com/344458
Commit-Ready: Steven Bennetts <stevenjb@chromium.org>
Tested-by: Steven Bennetts <stevenjb@chromium.org>
Reviewed-by: Achuith Bhandarkar <achuith@chromium.org>
Reviewed-by: Aviv Keshet <akeshet@chromium.org>

[modify] https://crrev.com/64e8b9a603e481483293a13872a3d01e2dfef22a/cbuildbot/config_dump.json
[modify] https://crrev.com/64e8b9a603e481483293a13872a3d01e2dfef22a/cbuildbot/chromeos_config.py

Labels: -Pri-2 Pri-1
Status: Fixed (was: Started)
I am marking this as done. If we decide to remove these from the telemetry builders I think we should file a separate lower priority issue for that.

Cc: josa...@chromium.org
Thanks for fixing this folks! Huge first step in getting the pfq in better shape.
Mergedinto: 591630
Status: Duplicate (was: Fixed)

Sign in to add a comment