Potentially remove UnitTest step from PFQ bots |
|||||||||||||
Issue descriptionIt's not clear if we need this step. If there's no dependence on chrome, we should remove it from the PFQ. It would save ~25 min.
,
Apr 19 2016
The unittests run on the builder, not on the device. Basically they call cros_run_unit_tests from within the chroot.
,
Apr 19 2016
Yes, they run on the builder. That doesn't mean they can't be affected by the chrome binary or by the artifacts that the chromeos-chrome ebuild produced. I'm doing a little cidb query to fish out unit test failures on the chrome pfq. I'll post back here.
,
Apr 19 2016
mysql> SELECT b.build_config, b.build_number, b.start_time, t.name, t.status from buildTable b JOIN buildStageTable t on b.id = t.build_id WHERE b.build_type="chrome" AND t.name = "UnitTest" AND b.waterfall="chromeos" AND t.status="fail" ORDER BY b.id DESC LIMIT 25; +----------------------------+--------------+---------------------+----------+--------+ | build_config | build_number | start_time | name | status | +----------------------------+--------------+---------------------+----------+--------+ | x86-generic-chromium-pfq | 8496 | 2016-04-11 21:18:57 | UnitTest | fail | | lumpy-chrome-pfq | 8511 | 2016-04-11 21:18:55 | UnitTest | fail | | tricky-chrome-pfq | 1744 | 2016-04-11 21:18:54 | UnitTest | fail | | x86-alex-chrome-pfq | 471 | 2016-04-11 21:18:54 | UnitTest | fail | | x86-alex-chrome-pfq | 470 | 2016-04-11 07:07:22 | UnitTest | fail | | tricky-chrome-pfq | 1743 | 2016-04-11 07:07:21 | UnitTest | fail | | peppy-chrome-pfq | 1915 | 2016-04-11 03:33:21 | UnitTest | fail | | lumpy-chrome-pfq | 8508 | 2016-04-11 00:08:49 | UnitTest | fail | | x86-alex-chrome-pfq | 467 | 2016-04-10 17:58:49 | UnitTest | fail | | amd64-generic-chromium-pfq | 8506 | 2016-04-10 01:38:24 | UnitTest | fail | | x86-generic-chromium-pfq | 8491 | 2016-04-10 01:38:23 | UnitTest | fail | | peppy-chrome-pfq | 1910 | 2016-04-09 07:21:14 | UnitTest | fail | | tricky-chrome-pfq | 1737 | 2016-04-09 07:21:14 | UnitTest | fail | | amd64-generic-chromium-pfq | 8503 | 2016-04-09 03:36:25 | UnitTest | fail | | tricky-chrome-pfq | 1736 | 2016-04-09 03:36:21 | UnitTest | fail | | amd64-generic-chromium-pfq | 8502 | 2016-04-09 00:07:56 | UnitTest | fail | | lumpy-chrome-pfq | 8501 | 2016-04-08 18:02:50 | UnitTest | fail | | tricky-chrome-pfq | 1734 | 2016-04-08 18:02:49 | UnitTest | fail | | cyan-cheets-chrome-pfq | 43 | 2016-04-08 04:13:48 | UnitTest | fail | | amd64-generic-chromium-pfq | 8500 | 2016-04-08 03:45:20 | UnitTest | fail | | tricky-chrome-pfq | 1733 | 2016-04-08 03:45:16 | UnitTest | fail | | cyan-cheets-chrome-pfq | 42 | 2016-04-08 00:08:54 | UnitTest | fail | | amd64-generic-chromium-pfq | 8499 | 2016-04-08 00:08:54 | UnitTest | fail | | lumpy-chrome-pfq | 8262 | 2016-02-08 01:09:50 | UnitTest | fail | | x86-generic-chromium-pfq | 8200 | 2016-01-25 20:18:41 | UnitTest | fail | +----------------------------+--------------+---------------------+----------+--------+ 25 rows in set (0.35 sec) mysql>
,
Apr 19 2016
Of course, many of those failures are likely due to ToT being broken, not due to chrome. These are the most recent 25. If I increase the limit much beyond that then the query time goes up enormously, suggesting to me that these are quite sparse.
,
Apr 19 2016
The recent cluster of failures looks like it is "Error: the test process started auxillary processes which did not shutdown (they were forcefully killed)" which was a bug in the sdk that has been fixed (I looked at only a few, but given that there's only been this cluster and then a long quiet spell it seems like a safe bet).
,
Apr 19 2016
Would it make sense to remove them from most of the bots, and have them on one x86 and one arm bot? That way we still retain some coverage. My experience with this suite is that it's mostly quite stable, and the few failures are infra issues or cros failures. I haven't seen a failure that we traced to a chrome CL. I definitely want to remove them from the chromiumos.chromium bots here: https://uberchromegw.corp.google.com/i/chromiumos.chromium/console If this seems reasonable, I think it would be a good task for the shadow gardener this week.
,
Apr 19 2016
+1 to removing them from the continuous builders you mentioned in #7 first. Then yeah, removing them on the pfq or most of it is probably fine.
,
Apr 19 2016
Victor, could you take this on? Let's start by removing the UnitTest step from the informational continuous builders. Luis has done some work with changing timeout configs of these builders, so he may be able to point you in the right direction. Thanks!
,
Apr 19 2016
Great! Thanks for all the analysis! This is huge improvement from a pfq perspective.
,
Apr 19 2016
I'm looking at the output of a UnitTests run and I see the following output: 14:18:08: WARNING: The following packages do not have tests: ... chromeos-base/chromeos-chrome Is it possible to determine if any of the packages that do have tests actually depend on chromeos-chrome? Unless we know about a test dependency on chromeos-chrome, I would suggest that we can skip this stage entirely on the PFQ. If we do know about any dependencies, maybe we can run just those? Agreed that we should remove them from the informational builders first. Victor, ping me if you have any questions, although I don't know that I will be able to answer them :)
,
Apr 19 2016
This will eliminate redundancy and potentially reduce build times for the pfq.
,
Apr 19 2016
Hint: it should just be a matter of turning the "unittest" value to false for these configs in chromeos_config.py
,
Apr 19 2016
Sure. Looking.
,
Apr 20 2016
CL to disable unit test for informational builders: https://chromium-review.googlesource.com/#/c/339663 Steven, what do you mean in #11 about chrome? Do you mean some unit tests are depending on chromeos-base/chromeos-chrome, and we could still keep some of unit tests?
,
Apr 20 2016
Some -packages- that have unit tests -may- depend on chromeos-chrome. I'm not familiar enough with portage to know how to determine this. I do know that a fairly large number of packages have chromeos-chrome dependencies.
,
Apr 26 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/8a699279b2706c9b4871190fdcb3e568410c8e15 commit 8a699279b2706c9b4871190fdcb3e568410c8e15 Author: Victor Hsieh <victorhsieh@google.com> Date: Tue Apr 19 23:52:33 2016 Disable unit test on informational PFQ Running unit test on PFQ is not very helpful. Removing it would save us about 25 minutes. BUG= chromium:604920 TEST=run_tests Change-Id: Ie8060d30c5f22ae7ff5af60cc72c1d1276b07c46 Reviewed-on: https://chromium-review.googlesource.com/339663 Commit-Ready: Victor Hsieh <victorhsieh@chromium.org> Tested-by: Victor Hsieh <victorhsieh@chromium.org> Reviewed-by: Aviv Keshet <akeshet@chromium.org> [modify] https://crrev.com/8a699279b2706c9b4871190fdcb3e568410c8e15/cbuildbot/config_dump.json [modify] https://crrev.com/8a699279b2706c9b4871190fdcb3e568410c8e15/cbuildbot/chromeos_config.py
,
Apr 26 2016
Releasing to future gardeners.
,
Apr 26 2016
Thank you for this, Victor! Aviv, do the bots need to be restarted or do they pick up the new config automatically?
,
Apr 26 2016
They will pick this up on their next run.
,
Apr 26 2016
Ok, let's monitor this over the week and look at disabling this on the pfq bots next week.
,
May 6 2016
,
May 6 2016
Achuith@ Have we confirmed that the continuous builders are fine with the new config? Are we ready to port this change to the actual config.
,
May 6 2016
Adding current gardeners.
,
May 6 2016
,
May 6 2016
The informational pfq builders look healthy with UnitTest removed.
,
May 12 2016
I looked at the stats from the spreadsheet: https://docs.google.com/spreadsheets/d/1JnEQNEFhnBhncv_COqLC5hQC9Ge9oMjww5Xv-7mzIks/edit#gid=0 and the only time we saw more than one failure in UniTests was the week of April 25th where a failure in chromeos_config_unittest.py affected all pfq builders but was found and fixed in the canary builders before it belatedly showed up in the pfq builders. I put up https://chromium-review.googlesource.com/#/c/344458/ to change this on the pfq builders.
,
May 12 2016
https://chromium-review.googlesource.com/#/c/344458/ We also need to get rid of these on the telemetry builders
,
May 12 2016
I'm not nearly as concerned about the cycle time on the telemetry builders so I don't think that it is especially important whether or not the tests run there.
,
May 13 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/64e8b9a603e481483293a13872a3d01e2dfef22a commit 64e8b9a603e481483293a13872a3d01e2dfef22a Author: Steven Bennetts <stevenjb@chromium.org> Date: Thu May 12 21:59:24 2016 Disable chromeos unittests stage from PFQ builders These tests take 20-25 minutes on the pfq builders and should be redundant since it is none of the tests should be affected by chrome behavior. See issue for more info. BUG= chromium:604920 TEST=-chrome-pfq and -chromium-pfq builders should not have a UnitTests stage. Change-Id: Iff902a6d13014c7d2337c21dff4b744e4f8fc0e8 Reviewed-on: https://chromium-review.googlesource.com/344458 Commit-Ready: Steven Bennetts <stevenjb@chromium.org> Tested-by: Steven Bennetts <stevenjb@chromium.org> Reviewed-by: Achuith Bhandarkar <achuith@chromium.org> Reviewed-by: Aviv Keshet <akeshet@chromium.org> [modify] https://crrev.com/64e8b9a603e481483293a13872a3d01e2dfef22a/cbuildbot/config_dump.json [modify] https://crrev.com/64e8b9a603e481483293a13872a3d01e2dfef22a/cbuildbot/chromeos_config.py
,
May 13 2016
I am marking this as done. If we decide to remove these from the telemetry builders I think we should file a separate lower priority issue for that.
,
May 16 2016
Thanks for fixing this folks! Huge first step in getting the pfq in better shape.
,
May 17 2016
|
|||||||||||||
►
Sign in to add a comment |
|||||||||||||
Comment 1 by akes...@chromium.org
, Apr 19 2016