
Issue 920393


Issue metadata

Status: Started
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocked on:
issue 921262
issue 921263
issue 919258
issue 923151




[reef] paygen_au suites not running on reef

Project Member Reported by leecy@google.com, Jan 9

Issue description

During investigation into b/122599583, it looks like reef is not running any paygen_au suites (possibly since M70).  Not sure if this is related to the unique lab deployment of reef devices or Skylab or some other reason.  

Here's a look at results in Skylab: https://stainless.corp.google.com/search?view=matrix&row=build&col=test&first_date=2018-12-30&last_date=2019-01-13&owner=chromeos-test&suite=%5CQpaygen_au_canary%5CE&model=electro&exclude_cts=false&exclude_not_run=false&exclude_non_release=true&exclude_au=false&exclude_acts=true&exclude_retried=false&exclude_non_production=false

These results show that the tests were aborted.



 
Status: Started (was: Untriaged)
The problem is that we only enable Skylab in the HWTest stage, but paygen_au* suites are kicked off by the PaygenTest* stages.

I will fix this ASAP. After that, I need to check whether any stages other than HWTest schedule suites.
[hi on] xixuan@xixuan0:~/chromiumos/chromite$ git grep '\.RunHWTestSuite(' | grep -v unittest
cbuildbot/stages/test_stages.py:    cmd_result = commands.RunHWTestSuite(
lib/paygen/paygen_build_lib.py:  cmd_result = commands.RunHWTestSuite(

It seems the paygen test is the only gap here.
This is worse than it sounds. IIRC, PaygenTest* dynamically generates control files and injects them into the database somehow. The test then executes these control-file blobs from the DB.

In skylab, there is no way to generate arbitrary control files to execute. Users must request a test to be run by the name, and skylab searches for the control file before execution.
Cc: ayatane@chromium.org
Re #3, I don't think it's much different from a common suite. From my understanding, PaygenBuildStage (e.g. PaygenBuildCanary) generates the arbitrary control files and uploads them to Google Storage.

https://luci-logdog.appspot.com/logs/chromeos/buildbucket/cr-buildbucket.appspot.com/8940429518364006000/+/steps/PaygenBuildCanary/0/stdout

PaygenTestStage (e.g. PaygenTestCanary) kicks off a suite, which uses the devserver to stage control files, then parses them and schedules child tests...

https://luci-logdog.appspot.com/logs/chromeos/buildbucket/cr-buildbucket.appspot.com/8940429518364006000/+/steps/PaygenTestCanary/0/stdout

Finally, each child test also calls the devserver to stage its control file, then parses and executes it. Maybe Skylab needs to add some special cases to handle the arbitrary test names. +@ayatane to estimate whether Skylab needs extra work.
We may as well enable it and see if it works, since it's not even running now.

I still need to wrap my head around what paygen is doing, but Skylab can run a test given only the name, if the test is actually packaged with the build.
After taking a closer look, the only difference between the "generated" control file and the standard autoupdate_EndToEndTest test is that some stuff gets appended to the test name, e.g. autoupdate_EndToEndTest_paygen_au_canary_full_10896.0.0

(And the DOC string is different for some reason, probably because I'm looking at a different branch.)

We can just go ahead with autoupdate_EndToEndTest as the test name and everything should work. I think the infra side has no problem with it; I'm unsure whether downstream depends on it.

Even if downstream depends on it, I think the onus is on downstream to fix whatever weird thing they are doing, with help from us as needed of course.
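A minimal sketch of the renaming described above, assuming generated names always follow the one pattern seen in this thread (`<base test>_<suite>_<update type>_<release>`). The regex and the `PaygenTestName` / `split_paygen_test_name` names are hypothetical; they only show that the packaged control-file name can be recovered from a generated name, while the suffix still carries the payload-specific information.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern, inferred only from names quoted in this thread, e.g.
#   autoupdate_EndToEndTest_paygen_au_canary_full_10896.0.0
_PAYGEN_NAME_RE = re.compile(
    r"^(?P<base>autoupdate_EndToEndTest)"
    r"_(?P<suite>paygen_au_[a-z]+)"
    r"_(?P<update_type>[a-z]+)"
    r"_(?P<release>\d+\.\d+\.\d+)$"
)


class PaygenTestName(NamedTuple):
    base: str         # control file actually packaged with the build
    suite: str        # e.g. paygen_au_canary
    update_type: str  # e.g. full
    release: str      # e.g. 10896.0.0


def split_paygen_test_name(name: str) -> Optional[PaygenTestName]:
    """Split a generated paygen test name into parts, or return None."""
    m = _PAYGEN_NAME_RE.match(name)
    if m is None:
        return None
    return PaygenTestName(**m.groupdict())


parts = split_paygen_test_name(
    "autoupdate_EndToEndTest_paygen_au_canary_full_10896.0.0")
print(parts.base)  # autoupdate_EndToEndTest
```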
Downstream does depend on the different names, because the test actually differs depending on the name: these generate payload-specific tests. It's not OK to aggregate the results of these tests into a single pass/fail.
Cc: dgarr...@chromium.org
There are some args that get injected into the control file, which is troublesome.  I think we can derive all/most of them from the cros-version though.

name = 'paygen_au_dev'
update_type = 'full'
source_release = '10896.0.0'
target_release = '10896.0.0'
target_payload_uri = 'gs://chromeos-releases/dev-channel/link/10896.0.0/payloads/chromeos_10896.0.0_link_dev-channel_full_test.bin-fe24000b79a54945fe8fe4b70a1f9ba6'
SUITE = 'paygen_au_dev'
source_payload_uri = 'gs://chromeos-releases/dev-channel/link/10896.0.0/payloads/chromeos_10896.0.0_link_dev-channel_full_test.bin-fe24000b79a54945fe8fe4b70a1f9ba6'
source_archive_uri = 'gs://chromeos-releases/dev-channel/link/10896.0.0'
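To check the idea that most of these args can be derived, here is a minimal sketch that recovers them from a full-payload URI. The URI layout is inferred only from the link and nyan-blaze examples in this thread, and `args_from_full_payload_uri` is a hypothetical helper, not existing chromite code. It handles only full payloads, where (as in the example above) source and target release and payload URI coincide.

```python
import re
from typing import Dict

# Layout assumed from the examples in this thread:
# gs://chromeos-releases/<channel>-channel/<board>/<version>/payloads/
#   chromeos_<version>_<board>_<channel>-channel_<update_type>_test.bin-<digest>
_FULL_PAYLOAD_URI_RE = re.compile(
    r"^gs://chromeos-releases/"
    r"(?P<channel>[a-z]+)-channel/"
    r"(?P<board>[a-z0-9-]+)/"
    r"(?P<version>\d+\.\d+\.\d+)/payloads/"
    r"chromeos_(?P=version)_(?P=board)_(?P=channel)-channel_"
    r"(?P<update_type>[a-z]+)_test\.bin-(?P<digest>[0-9a-f]+)$"
)


def args_from_full_payload_uri(payload_uri: str) -> Dict[str, str]:
    """Hypothetical: derive paygen control-file args from a full payload URI."""
    m = _FULL_PAYLOAD_URI_RE.match(payload_uri)
    if m is None:
        raise ValueError(f"unrecognized payload URI: {payload_uri}")
    if m.group("update_type") != "full":
        raise ValueError("this sketch only handles full payloads")
    channel, board, version = m.group("channel", "board", "version")
    return {
        "name": f"paygen_au_{channel}",  # matches the examples, not verified
        "update_type": m.group("update_type"),
        "source_release": version,
        "target_release": version,
        "source_archive_uri": f"gs://chromeos-releases/{channel}-channel/{board}/{version}",
        "source_payload_uri": payload_uri,
        "target_payload_uri": payload_uri,
    }
```

The trailing digest on the payload file name is not derivable from the version, so the payload URI itself would still have to come from the paygen stage.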

I don't think we've ever settled on how we want to deal with parametrized tests. Might be good to sync up with Tast to see whether they've decided on the issue too.
Cc: ahass...@chromium.org yshaul@chromium.org
First effort:

Make PaygenTest kick off a Skylab suite: https://chromium-review.googlesource.com/c/chromiumos/chromite/+/1404494/

The Skylab suite is able to kick off paygen_au_* suites after https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/1406881 is merged.

But it cannot run due to: https://stainless.corp.google.com/browse/chromeos-autotest-results/swarming-4256be391a974411/

Using client trampoline because of: Failed to find any control files with NAME autoupdate_EndToEndTest_paygen_au_beta_full_11316.66.0
...
01/11 09:34:00.941 ERROR|         traceback:0013|   File "/usr/local/autotest/results/lxc_job_folder/control.srv", line 11, in _client_trampoline
01/11 09:34:00.942 ERROR|         traceback:0013|     path = job.stage_control_file(trampoline_testname)
01/11 09:34:00.942 ERROR|         traceback:0013| AttributeError: 'server_job' object has no attribute 'stage_control_file'
...

Comment 13 by bugdroid1@chromium.org, Jan 11

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/78ef6f913119134c2794c0342c1a0c02f7c9c656

commit 78ef6f913119134c2794c0342c1a0c02f7c9c656
Author: Xixuan Wu <xixuan@google.com>
Date: Fri Jan 11 21:32:10 2019

cbuildbot: Refactor logics of detecting whether hwtest is run in skylab.

BUG=chromium:920393
TEST=Tryjob.

Change-Id: I0f4e53378a70fde9a2d656d0a99d27ba8ab32d45
Reviewed-on: https://chromium-review.googlesource.com/1403884
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Xixuan Wu <xixuan@chromium.org>

[modify] https://crrev.com/78ef6f913119134c2794c0342c1a0c02f7c9c656/lib/constants.py
[modify] https://crrev.com/78ef6f913119134c2794c0342c1a0c02f7c9c656/lib/config_lib.py
[modify] https://crrev.com/78ef6f913119134c2794c0342c1a0c02f7c9c656/cbuildbot/stages/test_stages.py


Comment 14 by bugdroid1@chromium.org, Jan 11

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/d3797f1399361a1b170cecdfb0b78933a7a9e049

commit d3797f1399361a1b170cecdfb0b78933a7a9e049
Author: Xixuan Wu <xixuan@google.com>
Date: Fri Jan 11 21:32:11 2019

SkylabHWTest: Add job_keyvals to skylab.

BUG=chromium:920393
TEST=Tryjob.

Change-Id: I14862079285853f633a1723c0832c514cb569606
Reviewed-on: https://chromium-review.googlesource.com/1403885
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>

[modify] https://crrev.com/d3797f1399361a1b170cecdfb0b78933a7a9e049/cbuildbot/commands.py
[modify] https://crrev.com/d3797f1399361a1b170cecdfb0b78933a7a9e049/cbuildbot/stages/test_stages.py


Comment 15 by bugdroid1@chromium.org, Jan 11

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/6ef1585f56c3ccc92ba51005beafbb192e6130fe

commit 6ef1585f56c3ccc92ba51005beafbb192e6130fe
Author: Xixuan Wu <xixuan@google.com>
Date: Fri Jan 11 21:32:11 2019

PaygenStage: Enable skylab in paygen test stage.

BUG=chromium:920393
TEST=Tryjob.

Change-Id: I3175f29ba62b384bf1f123154351489eae8e6091
Reviewed-on: https://chromium-review.googlesource.com/1404494
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Yaakov Shaul <yshaul@chromium.org>
Reviewed-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Don Garrett <dgarrett@chromium.org>
Reviewed-by: C Shapiro <shapiroc@chromium.org>

[modify] https://crrev.com/6ef1585f56c3ccc92ba51005beafbb192e6130fe/cbuildbot/stages/release_stages_unittest.py
[modify] https://crrev.com/6ef1585f56c3ccc92ba51005beafbb192e6130fe/lib/paygen/paygen_build_lib.py
[modify] https://crrev.com/6ef1585f56c3ccc92ba51005beafbb192e6130fe/cbuildbot/stages/release_stages.py

Blockedon: 919258
Blockedon: 921262
Blockedon: 921263
The plan is to:

- Modify the paygen stage to directly create Skylab tasks (no suite involved), with test arguments for the 4 different tasks created.
  - The paygen stage can create the tests with the names required by downstream test consumers.
- Modify skylab_swarming_worker and lucifer to pass test arguments through to autoupdate_EndToEndTest.
We will need to work with downstream results consumers, because the test name will be the same across the four tasks generated by the builder.
We could include the test arguments as some TKO entry so that it is available for disambiguation.
As a result, I think PaygenBuildStage no longer needs to generate arbitrary control files. @dgarrett, does that make sense?
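A minimal sketch of the plan above: every task runs the packaged `autoupdate_EndToEndTest`, and the per-payload name is kept as a keyval so downstream can still tell the tasks apart. `TaskRequest`, `build_paygen_requests`, and the `paygen_test_name` keyval key are hypothetical; this is not the Skylab request API.

```python
from dataclasses import dataclass, field
from typing import Dict, List

BASE_TEST = "autoupdate_EndToEndTest"


@dataclass
class TaskRequest:
    """Hypothetical stand-in for a Skylab task request (not a real API)."""
    test_name: str
    test_args: Dict[str, str]
    keyvals: Dict[str, str] = field(default_factory=dict)


def build_paygen_requests(payload_args: List[Dict[str, str]]) -> List[TaskRequest]:
    """Create one task per payload, all running the packaged control file."""
    requests = []
    for args in payload_args:
        # Rebuild the per-payload name downstream consumers key on today,
        # e.g. autoupdate_EndToEndTest_paygen_au_canary_full_10896.0.0.
        legacy_name = "_".join(
            [BASE_TEST, args["name"], args["update_type"], args["target_release"]])
        requests.append(TaskRequest(
            test_name=BASE_TEST,     # identical across all tasks
            test_args=dict(args),    # passed through to the test
            keyvals={"paygen_test_name": legacy_name},  # hypothetical TKO key
        ))
    return requests
```

Keeping the legacy name as a result keyval, rather than as the test name, is one way to follow the suggestion of recording the test arguments somewhere downstream can use for disambiguation.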
Re: comment #20, this is a pretty big deal for us because it's not clear how we will correlate the different results (and they are different). 
Re #21, I mean PaygenBuildStage would not need to generate arbitrary control files for Skylab. In autotest, it will stay the same.

Talked with @don offline; he thinks it makes more sense, but @ahassani is the current owner of paygen.
I set up a meeting to discuss this on Monday (even though I am at an offsite) because this will break GE paygen results reporting in a significant way.
Labels: BlockingAuSignoff
Labels: ReleaseBlock-Stable M-71
Summary: [reef] paygen_au suites not running on reef (was: paygen_au suites not running on reef)
Note that this is blocking reef from being included in the M71 Chrome OS Stable refresh.
Labels: -Pri-1 Pri-0
Blocking stable releases qualifies as an emergency P0.

Comment 29 by bugdroid1@chromium.org, Jan 15

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/9179c3dcf797b7cc2a458d15656a2b3d7764c3bb

commit 9179c3dcf797b7cc2a458d15656a2b3d7764c3bb
Author: Allen Li <ayatane@chromium.org>
Date: Tue Jan 15 20:55:55 2019

skylab_swarming_worker: Add -test-args flag

R=pprabhu@chromium.org

Bug: 920393
Change-Id: I9c342e800c49682497939659930a90063fba0695
Reviewed-on: https://chromium-review.googlesource.com/c/1409369
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
Commit-Queue: Allen Li <ayatane@chromium.org>
Cr-Commit-Position: refs/heads/master@{#19994}
[modify] https://crrev.com/9179c3dcf797b7cc2a458d15656a2b3d7764c3bb/go/src/infra/cmd/skylab_swarming_worker/internal/lucifer/lucifer.go
[modify] https://crrev.com/9179c3dcf797b7cc2a458d15656a2b3d7764c3bb/go/src/infra/cmd/skylab_swarming_worker/main.go

OK, let's first get this to *not* be a P0.

Xixuan: Can you move 3 reef DUTs back to autotest, and work with leecy@ / dhaddock@ to make sure the testing needed for the reef M71 stable release works with those reef DUTs?

Making long-term design decisions for Skylab with a fire under our butt will not get us a happy path in the future. Let's release the pressure some.
In an early talk with @leecy, I noticed that nyan_blaze is not experiencing this problem because the TPM manually kicks off these paygen suites in pool:suites. So I will migrate some reef DUTs back to pool:suites.
[hi on] xixuan@xixuan0:~/chromiumos/infra_internal/skylab_inventory$ dut-status -p suites -b reef
hostname                       S   last checked         URL
chromeos6-row4-rack10-host13   OK  2019-01-15 14:50:28  https://stainless.corp.google.com/browse/chromeos-autotest-results/hosts/chromeos6-row4-rack10-host13/1998957-repair/
chromeos6-row3-rack10-host1    --  ---                  ---
chromeos6-row4-rack10-host12   OK  2019-01-15 14:50:57  https://stainless.corp.google.com/browse/chromeos-autotest-results/hosts/chromeos6-row4-rack10-host12/1998960-repair/

2 reef DUTs in pool:suites are working. Filed b/122912651 to fix chromeos6-row3-rack10-host1.
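A minimal sketch for doing the same check from a script, assuming the `dut-status` output layout shown above (a header row, then the hostname in the first column and the status in the second). `ok_hosts` is a hypothetical helper, not part of the dut-status tool.

```python
from typing import List


def ok_hosts(dut_status_output: str) -> List[str]:
    """Return hostnames whose status column reads OK.

    Assumes a header row followed by one whitespace-separated row per DUT,
    with the hostname first and the status second (as in the output above).
    """
    hosts = []
    for line in dut_status_output.splitlines()[1:]:  # skip the header row
        cols = line.split()
        if len(cols) >= 2 and cols[1] == "OK":
            hosts.append(cols[0])
    return hosts
```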

So...who should I contact to manually trigger paygen suites?

Comment 33 Deleted

Thanks xixuan@

I just kicked off a run on reef

http://cautotest/afe/#tab_id=view_job&object_id=277289489

The DUTs are in the repair stage now.

Hmm, something keeps moving one of these reef DUTs to other pools. I just added chromeos6-row4-rack10-host13 back to pool:suites.

Comment 37 by kbleicher@google.com, Jan 16 (6 days ago)

Labels: -ReleaseBlock-Stable
Removing RBS (ReleaseBlock-Stable); thanks.

Comment 38 by bugdroid1@chromium.org, Jan 16 (6 days ago)

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/e9e737b0828e72542ff8757581c09f0f39b942f5

commit e9e737b0828e72542ff8757581c09f0f39b942f5
Author: Xixuan Wu <xixuan@google.com>
Date: Wed Jan 16 20:42:37 2019

autotest: Force paygen test to run in autotest.

BUG=chromium:920393
TEST=None

Change-Id: I8b6f16d0f0398c5c2688ac33d65562c281a43636
Reviewed-on: https://chromium-review.googlesource.com/c/1415912
Commit-Queue: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Trybot-Ready: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Xixuan Wu <xixuan@chromium.org>

[modify] https://crrev.com/e9e737b0828e72542ff8757581c09f0f39b942f5/cbuildbot/stages/release_stages.py

Comment 39 by xixuan@chromium.org, Jan 16 (6 days ago)

Migrate hosts back to autotest in pool:bvt:

atest host migrate chromeos6-row6-rack3-host20 chromeos4-row7-rack10-host1 chromeos6-row4-rack9-host19 --rollback --env prod

Comment 40 by xixuan@chromium.org, Jan 16 (6 days ago)

Re #39, 2 of them are broken; filed b/122968846 to fix them.

Comment 41 by bugdroid1@chromium.org, Jan 16 (6 days ago)

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/7f4fca60eabb6aba7d67ebdb81df63190e36831d

commit 7f4fca60eabb6aba7d67ebdb81df63190e36831d
Author: Xixuan Wu <xixuan@chromium.org>
Date: Wed Jan 16 21:57:47 2019

Revert "autotest: Force paygen test to run in autotest."

This reverts commit e9e737b0828e72542ff8757581c09f0f39b942f5.

Reason for revert: <INSERT REASONING HERE>

Original change's description:
> autotest: Force paygen test to run in autotest.
> 
> BUG=chromium:920393
> TEST=None
> 
> Change-Id: I8b6f16d0f0398c5c2688ac33d65562c281a43636
> Reviewed-on: https://chromium-review.googlesource.com/c/1415912
> Commit-Queue: Xixuan Wu <xixuan@chromium.org>
> Tested-by: Xixuan Wu <xixuan@chromium.org>
> Trybot-Ready: Xixuan Wu <xixuan@chromium.org>
> Reviewed-by: Xixuan Wu <xixuan@chromium.org>

Bug: chromium:920393
Change-Id: I301b16cda8e2bac59a753eb1c3f02063729b564d
Reviewed-on: https://chromium-review.googlesource.com/c/1416430
Reviewed-by: Xixuan Wu <xixuan@chromium.org>
Commit-Queue: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Trybot-Ready: Xixuan Wu <xixuan@chromium.org>

[modify] https://crrev.com/7f4fca60eabb6aba7d67ebdb81df63190e36831d/cbuildbot/stages/release_stages.py


Comment 42 by bugdroid1@chromium.org, Jan 17 (5 days ago)

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/d8a4f93b2aea63ff47d3be36b73617ea705d97db

commit d8a4f93b2aea63ff47d3be36b73617ea705d97db
Author: Xixuan Wu <xixuan@google.com>
Date: Thu Jan 17 19:11:55 2019

PaygenTest: Force paygen test to run in autotest.

This CL will be reverted after client-side (skylab) is able to run
paygen tests.

BUG=chromium:920393
TEST=None

Change-Id: I17f32aedcc883313b01d0735141cda9c95dbf5ee
Reviewed-on: https://chromium-review.googlesource.com/c/1416412
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Don Garrett <dgarrett@chromium.org>
Commit-Queue: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>

[modify] https://crrev.com/d8a4f93b2aea63ff47d3be36b73617ea705d97db/cbuildbot/stages/release_stages_unittest.py
[modify] https://crrev.com/d8a4f93b2aea63ff47d3be36b73617ea705d97db/cbuildbot/stages/release_stages.py

Comment 43 by pprabhu@chromium.org, Jan 18 (4 days ago)

Are the jobs now running in autotest?
Once they are runnable in autotest (i.e. we have DUTs in autotest) and are being run there by the builders, lower the priority of this bug to P1.
I also request that the BlockingAuSignoff label be removed at that point.

If there are failures in those tests that need to be chased and are BlockingAuSignoff, use separate, specific bugs for those (this bug is only about running the tests, not about whether cros passes them).

We will use this bug (as P1) to implement paygen testing in Skylab in the following 2-3 weeks.

Comment 44 by pprabhu@chromium.org, Jan 18 (4 days ago)

Blockedon: 923151

Comment 45 by leecy@google.com, Jan 18 (4 days ago)

Yes, it looks like these are running in autotest (today's canary): https://screenshot.googleplex.com/2otVP1wL9Xr

Comment 46 by dhadd...@chromium.org, Jan 18 (4 days ago)

Labels: -Pri-0 -BlockingAuSignoff Pri-1
OK cool. Looks like we are unblocked for now 

Comment 47 by xixuan@chromium.org, Today (4 hours ago)

A passing autoupdate_EndToEndTest run with test args:

https://chromium-swarm-dev.appspot.com/task?id=4290f317cf039110&refresh=10&request_detail=true

Need to figure out why some of the params are None:

01/22 16:36:27.823 DEBUG|autoupdate_EndToEn:0366| The test configuration supplied: {'source_payload_uri': 'gs://chromeos-releases/beta-channel/nyan-blaze/11316.66.0/payloads/chromeos_11316.66.0_nyan-blaze_beta-channel_full_test.bin-625e4d287611f1ecce85af1ef4d30a75', 'name': None, 'target_archive_uri': None, 'target_release': '11316.66.0', 'source_release': '11316.66.0', 'update_type': None, 'source_archive_uri': 'gs://chromeos-releases/beta-channel/nyan-blaze/11316.66.0', 'target_payload_uri': 'gs://chromeos-releases/beta-channel/nyan-blaze/11316.66.0/payloads/chromeos_11316.66.0_nyan-blaze_beta-channel_full_test.bin-625e4d287611f1ecce85af1ef4d30a75'}

Comment 48 by xixuan@chromium.org, Today (3 hours ago)

I kicked off another one successfully:

https://chromium-swarm-dev.appspot.com/task?id=42913c495ec04e10&refresh=10&request_detail=true

It looks like this time update_type is recognized correctly, but name is not. The first arg in test-args is being eaten for some reason.
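One hypothesis worth ruling out (not confirmed here) is an argv-style off-by-one, where the decoder skips the first token as if it were a program name. The sketch below assumes test args travel as space-separated `key=value` tokens; that encoding and all function names are hypothetical, not the actual `-test-args` format of skylab_swarming_worker. A round-trip check like this one would catch such a bug.

```python
import shlex
from typing import Dict, List


def encode_test_args(args: Dict[str, str]) -> str:
    """Serialize test args as space-separated key=value tokens (assumed format)."""
    return " ".join(shlex.quote(f"{key}={value}") for key, value in args.items())


def _parse_tokens(tokens: List[str]) -> Dict[str, str]:
    parsed = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"malformed test arg: {token!r}")
        parsed[key] = value
    return parsed


def decode_test_args(encoded: str) -> Dict[str, str]:
    """Correct decoding: every token is a key=value pair."""
    return _parse_tokens(shlex.split(encoded))


def decode_as_argv(encoded: str) -> Dict[str, str]:
    """Suspected bug pattern: tokens treated like argv, so token 0 is dropped."""
    return _parse_tokens(shlex.split(encoded)[1:])
```

With this pattern, whichever arg happens to be serialized first silently disappears, which would be consistent with `name` coming back as None while later args survive.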
