quawks-release seldom passes
Issue description
This shows a very low success rate, perhaps 5-10%. We should fix it, or if this builder is not needed, remove it.
https://luci-milo.appspot.com/buildbot/chromeos/quawks-release/?limit=200
quawks-release:1543 failed
Builders failed on:
- quawks-release: https://luci-milo.appspot.com/buildbot/chromeos/quawks-release/1543
Jun 30 2017
Looking at:
https://uberchromegw.corp.google.com/i/chromeos/builders/quawks-release/builds/1548/steps/PaygenTestCanary/logs/stdio
autoupdate_EndToEndTest.paygen_au_canary_full [ FAILED ]
autoupdate_EndToEndTest.paygen_au_canary_full ABORT: Autotest client terminated unexpectedly: DUT is pingable, SSHable and did NOT restart un-expectedly. We probably lost connectivity during the test., Host did not return from reboot
autoupdate_EndToEndTest.paygen_au_canary_full retry_count: 1
autoupdate_EndToEndTest.paygen_au_canary_full http://cautotest/tko/retrieve_logs.cgi?job=/results/125993103-chromeos-test/
autoupdate_EndToEndTest.paygen_au_canary_full http://cautotest/tko/retrieve_logs.cgi?job=/results/126007864-chromeos-test/
06-30-2017 [07:30:29] Output below this line is for buildbot consumption:
@@@STEP_LINK@[Auto-Bug]: autoupdate_EndToEndTest.paygen_au_canary_full: retry_count: 1, ABORT: Autotest client terminated unexpectedly: DUT is pingable, SSHable and did NOT restart un-expectedly. We probably lost connectivity during the test., Host did not return from reboot, 168 reports@https://code.google.com/p/chromium/issues/detail?id=688719@@@
@@@STEP_LINK@[Test-Logs]: autoupdate_EndToEndTest.paygen_au_canary_full: retry_count: 1, ABORT: Autotest client terminated unexpectedly: DUT is pingable, SSHable and did NOT restart un-expectedly. We probably lost connectivity during the test., Host did not return from reboot@http://cautotest/tko/retrieve_logs.cgi?job=/results/126007864-chromeos-test/@@@
@@@STEP_LINK@[Flake-Dashboard]: autoupdate_EndToEndTest.paygen_au_canary_full@https://wmatrix.googleplex.com/retry_teststats/?days_back=30&tests=autoupdate_EndToEndTest.paygen_au_canary_full@@@
Will return from run_suite with status: ERROR
The same information appears to be repeated twice in the log. Filed crbug.com/738545 for that.
Jun 30 2017
First paygen_au_canary_full looks OK. Not sure why it retried.
Second paygen_au_canary_full link has this:
AutoservRebootError: Host did not return from reboot
ABORT ---- ---- timestamp=1498832725 localtime=Jun 30 07:25:25 Autotest client terminated unexpectedly: DUT is pingable, SSHable and did NOT restart un-expectedly. We probably lost connectivity during the test.
END ABORT ---- ---- timestamp=1498832725 localtime=Jun 30 07:25:25
ABORT ---- ---- timestamp=1498832735 localtime=Jun 30 07:25:35 Autotest client terminated unexpectedly: DUT is pingable, SSHable and did NOT restart un-expectedly. We probably lost connectivity during the test.
END ABORT ---- ---- timestamp=1498832735 localtime=Jun 30 07:25:35
FAIL autoupdate_EndToEndTest.paygen_au_canary_full autoupdate_EndToEndTest.paygen_au_canary_full timestamp=1498832743 localtime=Jun 30 07:25:43 Unhandled AutoservRebootError: Host did not return from reboot
Not sure what to do with this one.
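When lining these log entries up against other timelines, it can help to confirm that each `timestamp=` epoch value agrees with its `localtime=` field. The following is only a sketch: it assumes the lab logs local time at UTC-7 (Pacific Daylight Time), which the log does not state, and `check_log_times` is a hypothetical helper, not part of Autotest.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: the lab writes localtime in Pacific Daylight Time (UTC-7).
LAB_TZ = timezone(timedelta(hours=-7))

# Matches pairs such as "timestamp=1498832725 localtime=Jun 30 07:25:25".
FIELD = re.compile(r"timestamp=(\d+)\s+localtime=(\w{3} \d{1,2} \d{2}:\d{2}:\d{2})")


def check_log_times(line):
    """Return (epoch, logged_localtime, matches) for each timestamp pair in line."""
    results = []
    for epoch, logged in FIELD.findall(line):
        rendered = datetime.fromtimestamp(int(epoch), tz=LAB_TZ).strftime("%b %d %H:%M:%S")
        results.append((int(epoch), logged, rendered == logged))
    return results
```

A `False` in the third field would suggest the timezone assumption is wrong or the entry was logged inconsistently.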
Jun 30 2017
For reasons I can't explain, the quawks shard is running slow. That causes the suite to abort before it completes. I'm having trouble extracting host history to see what's happening. More to come.
Jun 30 2017
There's something wrong with the quawks shard; possibly, the
database isn't in sync with the master.
A recent job on the master:
https://ubercautotest.corp.google.com/afe/#tab_id=view_job&object_id=125956880
The same job as known to the shard:
http://chromeos-server48.hot.corp.google.com/afe/#tab_id=view_job&object_id=125956880
On the shard, the job shows no TKO results, yet the master knows
about the results.
Jun 30 2017
Also curious is the host history page:
https://ubercautotest.corp.google.com/afe/#tab_id=view_host&object_id=2037
Here's what the database says about that host's history:
mysql> select job_id, started_on, finished_on from afe_host_queue_entries where host_id=2037 and started_on >= '2017-06-30 10:00' order by started_on;
+-----------+---------------------+---------------------+
| job_id | started_on | finished_on |
+-----------+---------------------+---------------------+
| 125957179 | 2017-06-30 10:03:03 | 2017-06-30 10:06:35 |
| 125957208 | 2017-06-30 10:07:18 | 2017-06-30 10:09:25 |
| 125957220 | 2017-06-30 10:10:08 | 2017-06-30 10:13:20 |
| 125957223 | 2017-06-30 10:14:08 | 2017-06-30 10:16:59 |
| 125957238 | 2017-06-30 10:17:22 | 2017-06-30 10:19:21 |
| 125957250 | 2017-06-30 10:19:58 | 2017-06-30 10:24:38 |
| 125957257 | 2017-06-30 10:24:56 | 2017-06-30 10:26:59 |
| 125957263 | 2017-06-30 10:27:35 | 2017-06-30 10:31:19 |
| 125957270 | 2017-06-30 10:31:51 | 2017-06-30 10:33:48 |
| 126027349 | 2017-06-30 10:34:43 | 2017-06-30 10:36:55 |
| 125956935 | 2017-06-30 11:16:42 | 2017-06-30 11:21:15 |
| 125957046 | 2017-06-30 11:21:48 | 2017-06-30 11:23:42 |
| 125956880 | 2017-06-30 11:24:13 | 2017-06-30 12:04:53 |
| 125956961 | 2017-06-30 12:05:28 | 2017-06-30 12:08:16 |
| 125956965 | 2017-06-30 12:09:04 | 2017-06-30 12:12:21 |
| 125956990 | 2017-06-30 12:13:08 | 2017-06-30 12:16:40 |
| 125956998 | 2017-06-30 12:17:23 | 2017-06-30 12:20:41 |
| 125957038 | 2017-06-30 12:21:18 | 2017-06-30 12:25:16 |
| 125957086 | 2017-06-30 12:25:56 | 2017-06-30 12:29:54 |
| 126044361 | 2017-06-30 12:46:18 | 2017-06-30 12:58:04 |
| 126044366 | 2017-06-30 12:58:21 | 2017-06-30 13:00:35 |
| 126044368 | 2017-06-30 13:00:54 | 2017-06-30 13:02:34 |
| 126044370 | 2017-06-30 13:03:06 | 2017-06-30 13:05:17 |
| 126044372 | 2017-06-30 13:05:54 | 2017-06-30 13:07:39 |
+-----------+---------------------+---------------------+
There's a steady stream of jobs from 10:03 onward. But the
AFE host page for both the master and the shard shows no
jobs after 6:58 until the job at 10:34, then skips jobs
until the job at 12:46. Basically, history is incomplete.
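One way to check how continuous the job stream in the query output really is: measure the idle time between one job's `finished_on` and the next job's `started_on`. This is a minimal sketch, not an existing lab tool. The rows are a subset copied from the table above, and `find_idle_gaps` and the 5-minute threshold are illustrative assumptions.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# (job_id, started_on, finished_on): a subset of the rows in the query output.
ROWS = [
    (125957270, "2017-06-30 10:31:51", "2017-06-30 10:33:48"),
    (126027349, "2017-06-30 10:34:43", "2017-06-30 10:36:55"),
    (125956935, "2017-06-30 11:16:42", "2017-06-30 11:21:15"),
    (125957046, "2017-06-30 11:21:48", "2017-06-30 11:23:42"),
    (125956880, "2017-06-30 11:24:13", "2017-06-30 12:04:53"),
    (125956961, "2017-06-30 12:05:28", "2017-06-30 12:08:16"),
]


def find_idle_gaps(rows, threshold=timedelta(minutes=5)):
    """Return (prev_job_id, next_job_id, idle) where idle time exceeds threshold."""
    ordered = sorted(rows, key=lambda r: datetime.strptime(r[1], FMT))
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        idle = datetime.strptime(nxt[1], FMT) - datetime.strptime(prev[2], FMT)
        if idle > threshold:
            gaps.append((prev[0], nxt[0], idle))
    return gaps
```

On this subset the only idle period over five minutes is between jobs 126027349 and 125956935. The other jobs follow each other within about a minute.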
Jun 30 2017
Also, the dut-status command shows the same anomaly:
chromeos4-row10-rack10-host3
2017-06-30 13:14:34 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570398-reset/
2017-06-30 13:12:18 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570391-reset/
2017-06-30 13:10:57 -- http://cautotest/tko/retrieve_logs.cgi?job=/results/126044391-chromeos-test/
2017-06-30 13:10:07 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570386-reset/
2017-06-30 13:08:23 -- http://cautotest/tko/retrieve_logs.cgi?job=/results/126044381-chromeos-test/
2017-06-30 13:07:21 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570382-reset/
2017-06-30 13:05:54 -- http://cautotest/tko/retrieve_logs.cgi?job=/results/126044372-chromeos-test/
2017-06-30 13:04:58 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570377-reset/
2017-06-30 13:03:06 -- http://cautotest/tko/retrieve_logs.cgi?job=/results/126044370-chromeos-test/
2017-06-30 13:02:15 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570375-reset/
2017-06-30 13:00:54 -- http://cautotest/tko/retrieve_logs.cgi?job=/results/126044368-chromeos-test/
2017-06-30 13:00:03 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570374-reset/
2017-06-30 12:58:21 -- http://cautotest/tko/retrieve_logs.cgi?job=/results/126044366-chromeos-test/
2017-06-30 12:57:30 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570372-reset/
2017-06-30 12:46:18 -- http://cautotest/tko/retrieve_logs.cgi?job=/results/126044361-chromeos-test/
2017-06-30 12:29:45 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570343-provision/
2017-06-30 12:24:58 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570333-reset/
2017-06-30 12:20:22 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570326-reset/
2017-06-30 12:16:21 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570319-reset/
2017-06-30 12:12:05 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570310-reset/
2017-06-30 12:07:57 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570306-reset/
2017-06-30 12:04:37 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570305-reset/
2017-06-30 11:23:22 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570265-reset/
2017-06-30 11:20:40 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570258-reset/
2017-06-30 11:00:24 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570242-provision/
2017-06-30 10:40:54 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570218-repair/
2017-06-30 10:36:36 -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570211-provision/
2017-06-30 10:34:43 -- http://cautotest/tko/retrieve_logs.cgi?job=/results/126027349-chromeos-test/
2017-06-30 10:33:31 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570195-reset/
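The discrepancy can be made concrete by pulling the test job IDs out of the dut-status lines and comparing them with the job IDs returned by the database query. The sketch below assumes the URL shape seen in the output above (`/results/<job_id>-chromeos-test/`). The function names are hypothetical and not part of dut-status.

```python
import re

# Matches test-job result URLs such as .../results/126044361-chromeos-test/.
# Special tasks (reset, provision, repair) live under /results/hosts/ and do not match.
JOB_URL = re.compile(r"/results/(\d+)-chromeos-test/")


def job_ids_in_dut_status(lines):
    """Collect the test job IDs that appear in dut-status output lines."""
    ids = set()
    for line in lines:
        match = JOB_URL.search(line)
        if match:
            ids.add(int(match.group(1)))
    return ids


def missing_from_history(db_job_ids, status_lines):
    """Return job IDs known to the database but absent from dut-status output."""
    return sorted(set(db_job_ids) - job_ids_in_dut_status(status_lines))
```

For example, passing the `job_id` column from the query and the lines above would list the jobs that the database recorded but the host history omits.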
Jun 30 2017
phobbs@ - could this be related to the recent changes to the database schema? I'm guessing not; I don't think any other shard is doing this, but I haven't dived in.
Jun 30 2017
Will look into this this afternoon
Jul 11 2017
Jul 17 2017
Will continue looking into it this week (was OOO last week)
Jul 17 2017
Jul 24 2017
There are two components here: quawks rarely passes (possibly addressed by crbug.com/746997) and the db inconsistencies (tracked by crbug.com/748209).
Jul 31 2017
Sep 7 2017
Comment 1 by sjg@chromium.org, Jun 30 2017
Status: Started (was: Available)