coral-release: HWTest times out because autoupdate_EndToEndTest_paygen... takes a long time |
|||||||
Issue description
A lot of the coral-release builds fail because the HWTest suite times out after 3 hours. The failing builds spend a lot of time running the autoupdate_EndToEnd_paygen... tests. Here's an example of a failing test: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/suiteDetails?suiteId=255781691 Here is a different board that passes; it doesn't look like it runs the autoupdate test at all: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/suiteDetails?suiteId=255781693
,
Nov 7
I can't find a release build that runs EndToEnd tests at all. dhaddock@ do you know what's going on here?
,
Nov 7
Here's the coral-release builder on legoland https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=coral-release&buildBranch=master Here's a build that has failed https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8930523277315354496
,
Nov 7
I can see those, but I'm looking for a board that runs those tests successfully and doesn't time out, so we can compare them.
,
Nov 7
HWTest [bvt-inline] [nasher] from the second link runs the tests. There is one failure, but I guess it retries and passes. https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/suiteDetails?suiteId=255781688
,
Nov 8
The paygen tests are not run in HWTest phases (bvt-inline etc.). They are run in the PaygenTestCanary and PaygenTestDev phases, and they are not run on every coral board, just the ones we have DUTs for in the lab. So if you click on your builder link from c#3: https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=coral-release&buildBranch=master then click on the first failure: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8930492698799646352 then scroll down to PaygenTestCanary and click on the "link to suite" link, you will be brought to a page like this that shows they are running OK: http://cautotest-prod/afe/#tab_id=view_job&object_id=255900321
,
Nov 8
This bug is confusing me though. Where are you seeing that autoupdate_EndToEndTest is taking too long?
,
Nov 8
The issue is that HWTest only has 3 hours to run. It seems like whenever autoupdate_EndToEndTest is run, coral-release hits this timeout.
,
Nov 8
Ah, and all of the suites are sharing the same pool of DUTs.
,
Nov 8
Can you point me to how you know that the HWTest times out? Is there a log entry or tool that shows it?
,
Nov 8
I'm wondering why this is only an issue on coral.
,
Nov 8
Re #6: Aren't autoupdate_EndToEnd_paygen tests different from the paygen tests in PaygenTest*?
,
Nov 8
Nope. autoupdate_EndToEndTest is never run as a standalone test. It is only ever run as part of the paygen_au* suites, which are generated during the PaygenBuild* builder phase and kicked off by the PaygenTest* builder phase.
,
Nov 8
Right? Or am I still jetlagged and forgetting something obvious :D ?
,
Nov 8
But why are they running as part of the bvt suite (and before paygen)? Aren't they a different stage in the build? I'm sorry, I'm not really familiar with how autotests work in the lab; that's why I'm asking these stupid questions ;)
,
Nov 8
They are paygen tests for a different coral build! The failing build from the builder link in #3 is building coral-release/R72-11238.0.0: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8930523277315354496 But the paygen tests that appear during the failed bvt-inline nasher run: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/suiteDetails?suiteId=255781688 are for coral stable R70-11021.72.0. So it appears the HWTest phase is being allocated DUTs that already have a bunch of stuff queued.
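To make the timeout mechanism concrete, here is a minimal sketch of how pre-queued paygen work on a shared DUT pool pushes bvt-inline past its 3-hour HWTest deadline. The 3-hour deadline is from this bug; the queue contents and durations are illustrative assumptions, not measured values.

    # Illustrative sketch only: queued paygen_au_* suites from an older
    # build occupy the shared coral DUTs before bvt-inline is scheduled.
    # All durations below are hypothetical.
    HWTEST_DEADLINE_HRS = 3.0

    queued_paygen_hrs = [1.5, 1.0]  # hypothetical queued AU suite durations
    bvt_inline_hrs = 1.0            # hypothetical bvt-inline runtime

    finish = sum(queued_paygen_hrs) + bvt_inline_hrs
    status = "TIMEOUT" if finish > HWTEST_DEADLINE_HRS else "ok"
    print(f"bvt-inline finishes at {finish:.1f}h "
          f"(deadline {HWTEST_DEADLINE_HRS}h): {status}")

Even a modest backlog of AU tests ahead in the queue is enough to blow the wall-clock budget, regardless of how fast bvt-inline itself runs.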
,
Nov 8
OK, I think I have an idea about the problem and why it is coral-only. There are now 15 different FSIs for coral: a different FSI for each coral model. During stable we update FROM every FSI to the current stable build, so paygen_au_stable for coral runs 30 tests (one delta and one full from each FSI)! We are not running only the tests for the current model's FSI. What's worse, every coral model is doing the same thing: running AU tests for every FSI of every model. That's a lot of wasted time. You can see this for the stable build I mentioned above in the suite details of paygen_au_stable for two different models: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/suiteDetails?suiteId=255787361 https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/suiteDetails?suiteId=255787363
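As a back-of-the-envelope check on the blow-up, here is a small sketch; the counts (15 FSIs, one delta plus one full payload per FSI, and the 6 models with lab DUTs mentioned in #24 below) are taken from this thread, and the script itself is just the arithmetic:

    # Counts from this thread: 15 coral FSIs, delta + full payload per FSI,
    # and 6 coral models with DUTs in the lab (see #24).
    NUM_FSIS = 15
    PAYLOADS_PER_FSI = 2   # one delta, one full
    MODELS_TESTED = 6

    tests_per_model = NUM_FSIS * PAYLOADS_PER_FSI      # 30 AU tests per model
    total = tests_per_model * MODELS_TESTED            # 180 tests per build

    # If each model only updated from its own FSI, it would need just:
    needed = PAYLOADS_PER_FSI * MODELS_TESTED          # 12 tests per build
    print(f"running {total} AU tests per build where {needed} would do")

So the shared coral DUT pool is absorbing roughly 15x more AU work per build than a per-model FSI scheme would require.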
,
Nov 8
What's also weird is this is probably putting older coral images on new models. So let's say coral model A FSI'd with version N. This could be putting version N-1 on it.
,
Nov 13
I think this is something for the GoldenEye team to fix. I have a sync with them this week; I'll bring it up with them.
,
Nov 26
-> dhaddock to address with GoldenEye
,
Nov 27
This is ongoing. In the meantime, I've made it so AU tests are now only running against one coral model (astronaut), so this shouldn't be as much of an issue as before (it was previously running against 6 models).
,
Nov 29
The problem identified in #17 will be fixed once these two bugs are resolved:
1. https://b.corp.google.com/issues/120164701
2. https://bugs.chromium.org/p/chromium/issues/detail?id=909972
Please follow along there if interested. The main problem in this bug should have been fixed by #24 anyway. |
|||||||