[reef] paygen_au suites not running on reef
Issue description: During investigation into b/122599583, it appears that reef is not running any paygen_au suites (possibly since M70). It is unclear whether this is related to the unique lab deployment of reef devices, to Skylab, or to some other reason. Here's a look at the results in Skylab:
https://stainless.corp.google.com/search?view=matrix&row=build&col=test&first_date=2018-12-30&last_date=2019-01-13&owner=chromeos-test&suite=%5CQpaygen_au_canary%5CE&model=electro&exclude_cts=false&exclude_not_run=false&exclude_non_release=true&exclude_au=false&exclude_acts=true&exclude_retried=false&exclude_non_production=false
These show that the tests are aborted.
,
Jan 9
[hi on] xixuan@xixuan0:~/chromiumos/chromite$ git grep '\.RunHWTestSuite(' | grep -v unittest
cbuildbot/stages/test_stages.py: cmd_result = commands.RunHWTestSuite(
lib/paygen/paygen_build_lib.py: cmd_result = commands.RunHWTestSuite(
Seems paygen test is the only hole here.
,
Jan 9
This is worse than it sounds. IIRC, PaygenTest*** dynamically generates control files and injects them into the database somehow. The test then executes these control file blobs from the DB. In Skylab, there is no way to generate arbitrary control files to execute. Users must request a test by name, and Skylab searches for the control file before execution.
,
Jan 9
Re #3, I don't think it's too different from a common suite. From my understanding, PaygenBuildStage (e.g. PaygenBuildCanary) generates the arbitrary control files and uploads them to Google Storage.
https://luci-logdog.appspot.com/logs/chromeos/buildbucket/cr-buildbucket.appspot.com/8940429518364006000/+/steps/PaygenBuildCanary/0/stdout
PaygenTestStage (e.g. PaygenTestCanary) kicks off a suite, which uses the devserver to stage control files, then parses them and schedules child tests...
https://luci-logdog.appspot.com/logs/chromeos/buildbucket/cr-buildbucket.appspot.com/8940429518364006000/+/steps/PaygenTestCanary/0/stdout
Finally, each child test also calls the devserver to stage control files, then parses and executes them. Maybe Skylab needs to add some special cases to handle the arbitrary test names. + @ayatane to estimate whether Skylab needs extra work.
,
Jan 9
We may as well enable it and see if it works, since it's not even running now. I still need to wrap my head around what paygen is doing, but Skylab can run a test given only the name, if the test is actually packaged with the build.
,
Jan 10
After taking a closer look, the only difference between the "generated" control file and the standard autoupdate_EndToEndTest test is that some extra parts get appended to the test name, e.g. autoupdate_EndToEndTest_paygen_au_canary_full_10896.0.0 (and the DOC string is different for some reason, probably because I'm looking at a different branch). We can just go ahead with autoupdate_EndToEndTest as the test name and everything should work. I think the infra side has no problem with it; I'm unsure whether downstream depends on it. Even if downstream depends on it, I think the onus is on downstream to fix whatever weird thing they are doing, with help from us as needed of course.
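To make the naming pattern above concrete, here is a minimal Python sketch that splits a generated name back into its parts. The layout (base test, suite, update type, release) is inferred only from the example names quoted in this thread, and `split_generated_name` is a hypothetical helper, not something that exists in chromite or autotest.

```python
import re

# Assumption: the layout is <base>_<suite>_<update_type>_<release>, inferred
# from the example names in this thread rather than from the paygen source.
_NAME_RE = re.compile(
    r'^(?P<base>autoupdate_EndToEndTest)_'
    r'(?P<suite>paygen_au_[a-z]+)_'
    r'(?P<update_type>[a-z]+)_'
    r'(?P<release>\d+\.\d+\.\d+)$')


def split_generated_name(name):
    """Return the parts of a generated paygen test name, or None."""
    match = _NAME_RE.match(name)
    return match.groupdict() if match else None


parts = split_generated_name(
    'autoupdate_EndToEndTest_paygen_au_canary_full_10896.0.0')
print(parts)
```

A plain `autoupdate_EndToEndTest` name does not match the pattern, so the helper returns None for it.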
,
Jan 10
Downstream does depend on the different names, because the test is actually different depending on the name: each name corresponds to a payload-specific test. It's not OK to aggregate the results of these into the pass/fail of a single test.
,
Jan 10
,
Jan 10
There are some args that get injected into the control file, which is troublesome. I think we can derive all/most of them from the cros-version though.
name = 'paygen_au_dev'
update_type = 'full'
source_release = '10896.0.0'
target_release = '10896.0.0'
target_payload_uri = 'gs://chromeos-releases/dev-channel/link/10896.0.0/payloads/chromeos_10896.0.0_link_dev-channel_full_test.bin-fe24000b79a54945fe8fe4b70a1f9ba6'
SUITE = 'paygen_au_dev'
source_payload_uri = 'gs://chromeos-releases/dev-channel/link/10896.0.0/payloads/chromeos_10896.0.0_link_dev-channel_full_test.bin-fe24000b79a54945fe8fe4b70a1f9ba6'
source_archive_uri = 'gs://chromeos-releases/dev-channel/link/10896.0.0'
I don't think we've ever settled how we want to deal with parametrized tests. Might be good to sync up with Tast to see if they've decided on the issue too.
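As a rough illustration of deriving "most" of these args, the sketch below rebuilds the ones that follow a visible pattern in the example above. `derive_paygen_args` is a hypothetical helper. It assumes the source and target releases are the same (as in this full-payload example), and it deliberately leaves out the payload URIs, since their trailing hash cannot be derived from the version alone.

```python
def derive_paygen_args(channel, board, version, update_type='full'):
    """Rebuild the pattern-following control file args for one build.

    Assumptions (not confirmed by the paygen source): the suite name is
    'paygen_au_<channel>' and the archive lives under
    gs://chromeos-releases/<channel>-channel/<board>/<version>.
    """
    suite = 'paygen_au_%s' % channel
    archive_uri = 'gs://chromeos-releases/%s-channel/%s/%s' % (
        channel, board, version)
    return {
        'name': suite,
        'SUITE': suite,
        'update_type': update_type,
        'source_release': version,
        'target_release': version,
        'source_archive_uri': archive_uri,
    }


args = derive_paygen_args('dev', 'link', '10896.0.0')
for key in sorted(args):
    print('%s = %r' % (key, args[key]))
```

The payload URIs would still have to come from somewhere that knows the generated payload's file name.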
,
Jan 10
,
Jan 10
First effort: make PaygenTest kick off a Skylab suite: https://chromium-review.googlesource.com/c/chromiumos/chromite/+/1404494/
,
Jan 11
The Skylab suite is able to kick off the paygen_au_* suites after https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/1406881 is merged. But the tests cannot run, due to:
https://stainless.corp.google.com/browse/chromeos-autotest-results/swarming-4256be391a974411/
Using client trampoline because of: Failed to find any control files with NAME autoupdate_EndToEndTest_paygen_au_beta_full_11316.66.0
...
01/11 09:34:00.941 ERROR| traceback:0013| File "/usr/local/autotest/results/lxc_job_folder/control.srv", line 11, in _client_trampoline
01/11 09:34:00.942 ERROR| traceback:0013| path = job.stage_control_file(trampoline_testname)
01/11 09:34:00.942 ERROR| traceback:0013| AttributeError: 'server_job' object has no attribute 'stage_control_file'
...
,
Jan 11
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/78ef6f913119134c2794c0342c1a0c02f7c9c656

commit 78ef6f913119134c2794c0342c1a0c02f7c9c656
Author: Xixuan Wu <xixuan@google.com>
Date: Fri Jan 11 21:32:10 2019

cbuildbot: Refactor logics of detecting whether hwtest is run in skylab.

BUG=chromium:920393
TEST=Tryjob.

Change-Id: I0f4e53378a70fde9a2d656d0a99d27ba8ab32d45
Reviewed-on: https://chromium-review.googlesource.com/1403884
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Xixuan Wu <xixuan@chromium.org>

[modify] https://crrev.com/78ef6f913119134c2794c0342c1a0c02f7c9c656/lib/constants.py
[modify] https://crrev.com/78ef6f913119134c2794c0342c1a0c02f7c9c656/lib/config_lib.py
[modify] https://crrev.com/78ef6f913119134c2794c0342c1a0c02f7c9c656/cbuildbot/stages/test_stages.py
,
Jan 11
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/d3797f1399361a1b170cecdfb0b78933a7a9e049

commit d3797f1399361a1b170cecdfb0b78933a7a9e049
Author: Xixuan Wu <xixuan@google.com>
Date: Fri Jan 11 21:32:11 2019

SkylabHWTest: Add job_keyvals to skylab.

BUG=chromium:920393
TEST=Tryjob.

Change-Id: I14862079285853f633a1723c0832c514cb569606
Reviewed-on: https://chromium-review.googlesource.com/1403885
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>

[modify] https://crrev.com/d3797f1399361a1b170cecdfb0b78933a7a9e049/cbuildbot/commands.py
[modify] https://crrev.com/d3797f1399361a1b170cecdfb0b78933a7a9e049/cbuildbot/stages/test_stages.py
,
Jan 11
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/6ef1585f56c3ccc92ba51005beafbb192e6130fe

commit 6ef1585f56c3ccc92ba51005beafbb192e6130fe
Author: Xixuan Wu <xixuan@google.com>
Date: Fri Jan 11 21:32:11 2019

PaygenStage: Enable skylab in paygen test stage.

BUG=chromium:920393
TEST=Tryjob.

Change-Id: I3175f29ba62b384bf1f123154351489eae8e6091
Reviewed-on: https://chromium-review.googlesource.com/1404494
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Yaakov Shaul <yshaul@chromium.org>
Reviewed-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Don Garrett <dgarrett@chromium.org>
Reviewed-by: C Shapiro <shapiroc@chromium.org>

[modify] https://crrev.com/6ef1585f56c3ccc92ba51005beafbb192e6130fe/cbuildbot/stages/release_stages_unittest.py
[modify] https://crrev.com/6ef1585f56c3ccc92ba51005beafbb192e6130fe/lib/paygen/paygen_build_lib.py
[modify] https://crrev.com/6ef1585f56c3ccc92ba51005beafbb192e6130fe/cbuildbot/stages/release_stages.py
,
Jan 12
,
Jan 12
,
Jan 12
,
Jan 12
Plan is to:
- Modify the paygen stage to directly create Skylab tasks (no suite involved), with test arguments for the 4 different tasks created.
- Have the paygen stage create the tests with the names required by downstream test consumers.
- Modify skylab_swarming_worker and lucifer to pass test arguments through to autoupdate_EndToEndTest.
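The plan above depends on handing test arguments to each task as a single string. The Python 3 sketch below shows one way a builder could serialize them. The space-separated `key=value` layout is an assumption about what a test-args consumer would accept, not a documented format, and `format_test_args` is a hypothetical helper.

```python
import shlex


def format_test_args(args):
    """Serialize a dict of test arguments as 'key=value key=value ...'.

    Keys are emitted in sorted order so the string is deterministic, and
    values are shell-quoted so that spaces cannot split one argument.
    """
    return ' '.join(
        '%s=%s' % (key, shlex.quote(str(args[key]))) for key in sorted(args))


test_args = format_test_args({
    'update_type': 'full',
    'name': 'paygen_au_dev',
    'source_archive_uri': 'gs://chromeos-releases/dev-channel/link/10896.0.0',
})
print(test_args)
```

Sorting the keys keeps the argument string stable across runs, which makes it easier to compare task requests for the same payload.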
,
Jan 12
We will need to work with downstream results consumers, because the test name will be the same across the four tasks generated by the builder. We could include the test arguments as a TKO entry so that they are available for disambiguation.
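To illustrate the disambiguation idea, the sketch below keys results on the test name plus its arguments, so runs that share one name stay distinct. This is only a picture of the concept in plain Python: `result_key` is hypothetical, the argument values are illustrative, and how the arguments would actually be stored in TKO is not settled in this thread.

```python
def result_key(test_name, test_args):
    """Build a hashable key that tells apart runs sharing one test name."""
    return (test_name,) + tuple(sorted(test_args.items()))


results = {}
full_run = {'update_type': 'full', 'source_release': '10896.0.0'}
# Illustrative second run: same test name, different arguments.
other_run = {'update_type': 'delta', 'source_release': '10896.0.0'}

results[result_key('autoupdate_EndToEndTest', full_run)] = 'PASS'
results[result_key('autoupdate_EndToEndTest', other_run)] = 'FAIL'

for key, status in sorted(results.items()):
    print(key, status)
```

With only the test name as the key, the second result would overwrite the first, which is the aggregation problem raised earlier in the thread.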
,
Jan 12
As a result, I think PaygenBuildStage no longer needs to generate arbitrary control files. @dgarrett, does that make sense?
,
Jan 12
Re: comment #20, this is a pretty big deal for us because it's not clear how we will correlate the different results (and they are different).
,
Jan 12
Re #21, I mean PaygenBuildStage has no need to generate arbitrary control files in Skylab. In autotest it will stay the same. I talked with @don offline; he thinks it makes more sense, but @ahassani is the current owner of paygen.
,
Jan 12
I set up a meeting to discuss this on Monday (even though I am at an offsite) because this will break GE paygen results reporting in a significant way.
,
Jan 15
,
Jan 15
Note that this is blocking reef from being included in the M71 Chrome OS Stable refresh.
,
Jan 15
Blocking stable releases qualifies as an emergency P0.
,
Jan 15
The following revision refers to this bug:
https://chromium.googlesource.com/infra/infra/+/9179c3dcf797b7cc2a458d15656a2b3d7764c3bb

commit 9179c3dcf797b7cc2a458d15656a2b3d7764c3bb
Author: Allen Li <ayatane@chromium.org>
Date: Tue Jan 15 20:55:55 2019

skylab_swarming_worker: Add -test-args flag

R=pprabhu@chromium.org
Bug: 920393

Change-Id: I9c342e800c49682497939659930a90063fba0695
Reviewed-on: https://chromium-review.googlesource.com/c/1409369
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
Commit-Queue: Allen Li <ayatane@chromium.org>
Cr-Commit-Position: refs/heads/master@{#19994}

[modify] https://crrev.com/9179c3dcf797b7cc2a458d15656a2b3d7764c3bb/go/src/infra/cmd/skylab_swarming_worker/internal/lucifer/lucifer.go
[modify] https://crrev.com/9179c3dcf797b7cc2a458d15656a2b3d7764c3bb/go/src/infra/cmd/skylab_swarming_worker/main.go
,
Jan 15
OK, let's first get this to *not* be a P0. Xixuan: Can you move 3 reef DUTs back to autotest, and work with leecy@ / dhaddock@ to make sure we get the testing needed for reef M71 stable release to work with those reef DUTs. Making long-term design decisions for Skylab with a fire under our butt will not get us a happy path in the future. Let's release the pressure some.
,
Jan 15
In an earlier talk with @leecy I noticed that nyan_blaze is not experiencing this problem, because the TPM manually kicks off these paygen suites against pool:suites. So I will migrate some reef DUTs back to pool:suites.
,
Jan 15
[hi on] xixuan@xixuan0:~/chromiumos/infra_internal/skylab_inventory$ dut-status -p suites -b reef
hostname                      S   last checked         URL
chromeos6-row4-rack10-host13  OK  2019-01-15 14:50:28  https://stainless.corp.google.com/browse/chromeos-autotest-results/hosts/chromeos6-row4-rack10-host13/1998957-repair/
chromeos6-row3-rack10-host1   --  ---                  ---
chromeos6-row4-rack10-host12  OK  2019-01-15 14:50:57  https://stainless.corp.google.com/browse/chromeos-autotest-results/hosts/chromeos6-row4-rack10-host12/1998960-repair/

2 reef suites DUTs are working. Filed b/122912651 to fix chromeos6-row3-rack10-host1. So... who should I contact to manually trigger paygen suites?
,
Jan 15
Thanks xixuan@. I just kicked off a run on reef: http://cautotest/afe/#tab_id=view_job&object_id=277289489 The DUTs are in the repair stage now.
,
Jan 16
,
Jan 16
Hmm, something keeps moving one of these reef DUTs to other pools. Just added chromeos6-row4-rack10-host13 back to pool:suites.
,
Jan 16
removing rbs ; thanks
,
Jan 16
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/e9e737b0828e72542ff8757581c09f0f39b942f5

commit e9e737b0828e72542ff8757581c09f0f39b942f5
Author: Xixuan Wu <xixuan@google.com>
Date: Wed Jan 16 20:42:37 2019

autotest: Force paygen test to run in autotest.

BUG=chromium:920393
TEST=None

Change-Id: I8b6f16d0f0398c5c2688ac33d65562c281a43636
Reviewed-on: https://chromium-review.googlesource.com/c/1415912
Commit-Queue: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Trybot-Ready: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Xixuan Wu <xixuan@chromium.org>

[modify] https://crrev.com/e9e737b0828e72542ff8757581c09f0f39b942f5/cbuildbot/stages/release_stages.py
,
Jan 16
Migrate hosts back to autotest in pool:bvt:
atest host migrate chromeos6-row6-rack3-host20 chromeos4-row7-rack10-host1 chromeos6-row4-rack9-host19 --rollback --env prod
,
Jan 16
Re #39, 2 of them are broken; filed b/122968846 to fix them.
,
Jan 16
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/7f4fca60eabb6aba7d67ebdb81df63190e36831d

commit 7f4fca60eabb6aba7d67ebdb81df63190e36831d
Author: Xixuan Wu <xixuan@chromium.org>
Date: Wed Jan 16 21:57:47 2019

Revert "autotest: Force paygen test to run in autotest."

This reverts commit e9e737b0828e72542ff8757581c09f0f39b942f5.

Reason for revert: <INSERT REASONING HERE>

Original change's description:
> autotest: Force paygen test to run in autotest.
>
> BUG=chromium:920393
> TEST=None
>
> Change-Id: I8b6f16d0f0398c5c2688ac33d65562c281a43636
> Reviewed-on: https://chromium-review.googlesource.com/c/1415912
> Commit-Queue: Xixuan Wu <xixuan@chromium.org>
> Tested-by: Xixuan Wu <xixuan@chromium.org>
> Trybot-Ready: Xixuan Wu <xixuan@chromium.org>
> Reviewed-by: Xixuan Wu <xixuan@chromium.org>

Bug: chromium:920393
Change-Id: I301b16cda8e2bac59a753eb1c3f02063729b564d
Reviewed-on: https://chromium-review.googlesource.com/c/1416430
Reviewed-by: Xixuan Wu <xixuan@chromium.org>
Commit-Queue: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Trybot-Ready: Xixuan Wu <xixuan@chromium.org>

[modify] https://crrev.com/7f4fca60eabb6aba7d67ebdb81df63190e36831d/cbuildbot/stages/release_stages.py
,
Jan 17
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/d8a4f93b2aea63ff47d3be36b73617ea705d97db

commit d8a4f93b2aea63ff47d3be36b73617ea705d97db
Author: Xixuan Wu <xixuan@google.com>
Date: Thu Jan 17 19:11:55 2019

PaygenTest: Force paygen test to run in autotest.

This CL will be reverted after client-side (skylab) is able to run paygen tests.

BUG=chromium:920393
TEST=None

Change-Id: I17f32aedcc883313b01d0735141cda9c95dbf5ee
Reviewed-on: https://chromium-review.googlesource.com/c/1416412
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Don Garrett <dgarrett@chromium.org>
Commit-Queue: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>

[modify] https://crrev.com/d8a4f93b2aea63ff47d3be36b73617ea705d97db/cbuildbot/stages/release_stages_unittest.py
[modify] https://crrev.com/d8a4f93b2aea63ff47d3be36b73617ea705d97db/cbuildbot/stages/release_stages.py
,
Jan 18
Are the jobs now running in autotest? Once they are runnable in autotest (i.e. we have DUTs in autotest) and are being run there by the builders, lower the priority of this bug to P1. I also request that the BlockingAuSignoff label be removed at that point. If there are failures in those tests that need to be chased and that are BlockingAuSignoff, use separate, specific bugs for those (this bug is only about running the tests, not about whether cros passes those tests). We will use this bug (as P1) to implement paygen testing in Skylab in the following 2-3 weeks.
,
Jan 18
,
Jan 18
Yes, it looks like these are running in autotest (today's canary): https://screenshot.googleplex.com/2otVP1wL9Xr
,
Jan 18
OK cool. Looks like we are unblocked for now.
,
Today
(4 hours ago)
A passed autoupdateEnd2End test with test args:
https://chromium-swarm-dev.appspot.com/task?id=4290f317cf039110&refresh=10&request_detail=true
Need to figure out why some of the params are None:
01/22 16:36:27.823 DEBUG|autoupdate_EndToEn:0366| The test configuration supplied: {'source_payload_uri': 'gs://chromeos-releases/beta-channel/nyan-blaze/11316.66.0/payloads/chromeos_11316.66.0_nyan-blaze_beta-channel_full_test.bin-625e4d287611f1ecce85af1ef4d30a75', 'name': None, 'target_archive_uri': None, 'target_release': '11316.66.0', 'source_release': '11316.66.0', 'update_type': None, 'source_archive_uri': 'gs://chromeos-releases/beta-channel/nyan-blaze/11316.66.0', 'target_payload_uri': 'gs://chromeos-releases/beta-channel/nyan-blaze/11316.66.0/payloads/chromeos_11316.66.0_nyan-blaze_beta-channel_full_test.bin-625e4d287611f1ecce85af1ef4d30a75'}
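A quick way to see which parameters arrived as None is to filter the logged configuration. The dict below is copied from the log line above; `missing_params` is a hypothetical helper for inspecting it, not part of the test.

```python
def missing_params(config):
    """Return the sorted names of parameters whose value is None."""
    return sorted(key for key, value in config.items() if value is None)


PAYLOAD_URI = (
    'gs://chromeos-releases/beta-channel/nyan-blaze/11316.66.0/payloads/'
    'chromeos_11316.66.0_nyan-blaze_beta-channel_full_test.bin-'
    '625e4d287611f1ecce85af1ef4d30a75')

config = {
    'source_payload_uri': PAYLOAD_URI,
    'name': None,
    'target_archive_uri': None,
    'target_release': '11316.66.0',
    'source_release': '11316.66.0',
    'update_type': None,
    'source_archive_uri':
        'gs://chromeos-releases/beta-channel/nyan-blaze/11316.66.0',
    'target_payload_uri': PAYLOAD_URI,
}

print(missing_params(config))
```

For this configuration the missing parameters are name, target_archive_uri and update_type.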
,
Today
(3 hours ago)
I kicked off another one successfully: https://chromium-swarm-dev.appspot.com/task?id=42913c495ec04e10&refresh=10&request_detail=true Looks like this time update_type is correctly recognized, but name is not. The first arg in test-args is being dropped for some reason.
Comment 1 by xixuan@chromium.org
, Jan 9