All boards in the auron, kunimitsu, and strago families have RED build status on ToT (61). OOB suites do not run on them.
Issue description

GE builds view: https://screenshot.googleplex.com/MtSNtzknJJA

Happening since 2017-06-23, 9679.0.0 / 61.0.3138.0.

Red status stages:
- PaygenTestDev - all boards
- HW tests - many of the boards

Other families also have boards in the same state, in varying numbers.

OOB test suites do not run on boards with a red build state. This is a persistent problem for me, because I track the test results that the test team uses to prioritize its daily work.

Can the dependency of the OOB suites on the PayGen tests be removed?
,
Jun 26 2017
,
Jun 26 2017
I went and looked at the history of one Paygen test failure:
http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=125175082
That test ran on chromeos4-row8-rack4-host18. Looking at the DUT's history,
you see this:
chromeos4-row8-rack4-host18
2017-06-25 23:59:35 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row8-rack4-host18/959995-repair/
2017-06-25 23:57:10 -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row8-rack4-host18/959991-reset/
2017-06-25 23:17:13 -- http://cautotest/tko/retrieve_logs.cgi?job=/results/125175082-chromeos-test/
2017-06-25 23:16:49 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row8-rack4-host18/959965-reset/
That log shows that after running the test, the DUT failed reset
testing, and required repair. The repair logs are here:
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos4-row8-rack4-host18/959995-repair/20172506235933/
The status.log from the repair task shows that the DUT was offline at
the start of repair, and required reinstallation from USB before it would
work.
This sort of symptom is caused by product bugs, not infra bugs. Passing to
a sheriff for more evaluation.
Escalating, because this doesn't look like something we can afford to
ignore.
,
Jun 26 2017
,
Jun 26 2017
,
Jun 26 2017
,
Jun 26 2017
09:10:05: INFO: RunCommand: /b/c/cbuild/repository/.cache/common/gsutil_4.19.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' stat -- gs://chromeos-releases/canary-channel/auron-paine/9687.0.0/payloads/signing/28791-140248560363328/1.payload.hash.update_signer.signed.bin
09:10:05: WARNING: GS_ERROR: No URLs matched: gs://chromeos-releases/canary-channel/auron-paine/9687.0.0/payloads/signing/28791-140248560363328/1.payload.hash.update_signer.signed.bin
,
Jun 26 2017
It seems to generate and sign the payloads OK. Then PaygenTestCanary says this:

09:13:16: INFO: RunCommand: /b/c/cbuild/repository/chromite/third_party/swarming.client/swarming.py run --swarming chromeos-proxy.appspot.com --task-summary-json /tmp/cbuildbot-tmpnOa2xh/tmpTA3zxf/temp_summary.json --raw-cmd --task-name auron_paine-release/R61-9687.0.0-paygen_au_canary --dimension os Ubuntu-14.04 --dimension pool default --print-status-updates --timeout 14400 --io-timeout 14400 --hard-timeout 14400 --expiration 1200 '--tags=priority:Build' '--tags=suite:paygen_au_canary' '--tags=build:auron_paine-release/R61-9687.0.0' '--tags=task_name:auron_paine-release/R61-9687.0.0-paygen_au_canary' '--tags=board:auron_paine' -- /usr/local/autotest/site_utils/run_suite.py --build auron_paine-release/R61-9687.0.0 --board auron_paine --suite_name paygen_au_canary --pool bvt --file_bugs True --priority Build --timeout_mins 180 --retry True --suite_min_duts 2 -m 125235548
@@@STEP_FAILURE@@@
10:02:36: ERROR: Timeout occurred- waited 27622 seconds, failing. Timeout reason: This build has reached the timeout deadline set by the master. Either this stage or a previous one took too long (see stage timing historical summary in ReportStage) or the build failed to start on time.

https://uberchromegw.corp.google.com/i/chromeos/builders/auron_paine-release/builds/1249/steps/PaygenTestCanary/logs/stdio
,
Jun 26 2017
> 10:02:36: ERROR: Timeout occurred- waited 27622 seconds, failing.
> Timeout reason: This build has reached the timeout deadline set
> by the master. Either this stage or a previous one took too long
> (see stage timing historical summary in ReportStage) or the build
> failed to start on time.

My assumption is that this is a downstream impact of the real problem. The logs from the test suite show failures, DUTs being forced into repair, and aborts. I expect that this is the causal chain:
* DUT goes offline, as described in bug 736807.
* The offline DUT causes a test failure.
* The offline DUT (and the failure) force repair.
* The time required to complete repair means that the test suite times out, and some tests abort.
* The timeout on the Autotest side shows up as the builder message above.
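For scale, the durations in the quoted builder log can be laid side by side. This is a rough sketch, not a diagnosis: the values are copied from the PaygenTestCanary log in the previous comment, the variable names are illustrative, and it assumes that the two log timestamps (09:13:16 and 10:02:36) fall on the same day. The log does not say what starting point the 27622 seconds is measured from.

```python
from datetime import datetime, timedelta

# Values copied from the PaygenTestCanary log quoted in the previous comment.
waited = timedelta(seconds=27622)            # "waited 27622 seconds"
swarming_timeout = timedelta(seconds=14400)  # --timeout / --io-timeout / --hard-timeout
suite_timeout = timedelta(minutes=180)       # run_suite.py --timeout_mins 180

# Time between the RunCommand line and the ERROR line, assuming the same day.
fmt = "%H:%M:%S"
stage_elapsed = datetime.strptime("10:02:36", fmt) - datetime.strptime("09:13:16", fmt)

rows = [
    ("waited (builder error)", waited),
    ("swarming timeout", swarming_timeout),
    ("suite timeout", suite_timeout),
    ("RunCommand to ERROR", stage_elapsed),
]

if __name__ == "__main__":
    for label, value in rows:
        print(f"{label:<24} {value}")
```

The 27622 seconds come to 7:40:22, while only 0:49:20 passed between the `run_suite.py` launch and the error line, which fits the builder's wording that the deadline is one set by the master for the whole build.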
,
Jun 26 2017
https://bugs.chromium.org/p/chromium/issues/detail?id=722603#c29
Looks like sentry-release is also affected by this.
,
Jun 26 2017
,
Jun 27 2017
I believe the PaygenTestDev failure is fixed by the toolchain revert. sentry-release just had a green run:
https://uberchromegw.corp.google.com/i/chromeos/builders/sentry-release/builds/1252
I'm closing this since I believe the root cause is fixed.
,
Jun 27 2017
Thanks. Yes, most of these failures are gone.
Comment 1 by jrbarnette@chromium.org
, Jun 26 2017