
Issue 738036

Starred by 1 user

Issue metadata

Status: Duplicate
Merged: issue 746997
Owner:
Closed: Jul 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug-Regression

Blocked on:
issue 746997




quawks-release seldom passes

Project Member Reported by sjg@google.com, Jun 29 2017

Issue description

This builder shows a very low success rate, perhaps 5-10%.

We should fix it, or if this builder is not needed, remove it.


https://luci-milo.appspot.com/buildbot/chromeos/quawks-release/?limit=200


quawks-release:1543 failed

Builders failed on: 
- quawks-release: 
  https://luci-milo.appspot.com/buildbot/chromeos/quawks-release/1543



 

Comment 1 by sjg@chromium.org, Jun 30 2017

Owner: sjg@chromium.org
Status: Started (was: Available)
Will dig into this

Comment 2 by sjg@chromium.org, Jun 30 2017

Looking at:

https://uberchromegw.corp.google.com/i/chromeos/builders/quawks-release/builds/1548/steps/PaygenTestCanary/logs/stdio

  autoupdate_EndToEndTest.paygen_au_canary_full    [ FAILED ]
  autoupdate_EndToEndTest.paygen_au_canary_full      ABORT: Autotest client terminated unexpectedly: DUT is pingable, SSHable and did NOT restart un-expectedly. We probably lost connectivity during the test., Host did not return from reboot
  autoupdate_EndToEndTest.paygen_au_canary_full      retry_count: 1


  autoupdate_EndToEndTest.paygen_au_canary_full http://cautotest/tko/retrieve_logs.cgi?job=/results/125993103-chromeos-test/
  autoupdate_EndToEndTest.paygen_au_canary_full http://cautotest/tko/retrieve_logs.cgi?job=/results/126007864-chromeos-test/
  

   06-30-2017 [07:30:29] Output below this line is for buildbot consumption:
  @@@STEP_LINK@[Auto-Bug]: autoupdate_EndToEndTest.paygen_au_canary_full: retry_count: 1, ABORT: Autotest client terminated unexpectedly: DUT is pingable, SSHable and did NOT restart un-expectedly. We probably lost connectivity during the test., Host did not return from reboot, 168 reports@https://code.google.com/p/chromium/issues/detail?id=688719@@@
  @@@STEP_LINK@[Test-Logs]: autoupdate_EndToEndTest.paygen_au_canary_full: retry_count: 1, ABORT: Autotest client terminated unexpectedly: DUT is pingable, SSHable and did NOT restart un-expectedly. We probably lost connectivity during the test., Host did not return from reboot@http://cautotest/tko/retrieve_logs.cgi?job=/results/126007864-chromeos-test/@@@
  @@@STEP_LINK@[Flake-Dashboard]: autoupdate_EndToEndTest.paygen_au_canary_full@https://wmatrix.googleplex.com/retry_teststats/?days_back=30&tests=autoupdate_EndToEndTest.paygen_au_canary_full@@@
  Will return from run_suite with status: ERROR



The same information appears twice in the log. Filed crbug.com/738545 for that.


Comment 3 by sjg@chromium.org, Jun 30 2017

Owner: jrbarnette@chromium.org
Status: Available (was: Started)
First paygen_au_canary_full looks OK. Not sure why it retried.

Second paygen_au_canary_full link has this:

  AutoservRebootError: Host did not return from reboot
	ABORT	----	----	timestamp=1498832725	localtime=Jun 30 07:25:25	Autotest client terminated unexpectedly: DUT is pingable, SSHable and did NOT restart un-expectedly. We probably lost connectivity during the test.
END ABORT	----	----	timestamp=1498832725	localtime=Jun 30 07:25:25	
	ABORT	----	----	timestamp=1498832735	localtime=Jun 30 07:25:35	Autotest client terminated unexpectedly: DUT is pingable, SSHable and did NOT restart un-expectedly. We probably lost connectivity during the test.
END ABORT	----	----	timestamp=1498832735	localtime=Jun 30 07:25:35	
	FAIL	autoupdate_EndToEndTest.paygen_au_canary_full	autoupdate_EndToEndTest.paygen_au_canary_full	timestamp=1498832743	localtime=Jun 30 07:25:43	Unhandled AutoservRebootError: Host did not return from reboot


Not sure what to do with this one.
For reasons I can't explain, the quawks shard is running
slow.  That causes the suite to abort before it completes.

I'm having trouble extracting host history to see what's
happening.  More to come.

There's something wrong with the quawks shard; possibly, the
database isn't in sync with the master.

A recent job on the master:
    https://ubercautotest.corp.google.com/afe/#tab_id=view_job&object_id=125956880
The same job as known to the shard:
    http://chromeos-server48.hot.corp.google.com/afe/#tab_id=view_job&object_id=125956880

On the shard, the job shows no TKO results, yet the master knows
about the results.

Also curious is the host history page:
    https://ubercautotest.corp.google.com/afe/#tab_id=view_host&object_id=2037

Here's what the database says about that host's history:

mysql> select job_id, started_on, finished_on from afe_host_queue_entries where host_id=2037 and started_on >= '2017-06-30 10:00' order by started_on;
+-----------+---------------------+---------------------+
| job_id    | started_on          | finished_on         |
+-----------+---------------------+---------------------+
| 125957179 | 2017-06-30 10:03:03 | 2017-06-30 10:06:35 |
| 125957208 | 2017-06-30 10:07:18 | 2017-06-30 10:09:25 |
| 125957220 | 2017-06-30 10:10:08 | 2017-06-30 10:13:20 |
| 125957223 | 2017-06-30 10:14:08 | 2017-06-30 10:16:59 |
| 125957238 | 2017-06-30 10:17:22 | 2017-06-30 10:19:21 |
| 125957250 | 2017-06-30 10:19:58 | 2017-06-30 10:24:38 |
| 125957257 | 2017-06-30 10:24:56 | 2017-06-30 10:26:59 |
| 125957263 | 2017-06-30 10:27:35 | 2017-06-30 10:31:19 |
| 125957270 | 2017-06-30 10:31:51 | 2017-06-30 10:33:48 |
| 126027349 | 2017-06-30 10:34:43 | 2017-06-30 10:36:55 |
| 125956935 | 2017-06-30 11:16:42 | 2017-06-30 11:21:15 |
| 125957046 | 2017-06-30 11:21:48 | 2017-06-30 11:23:42 |
| 125956880 | 2017-06-30 11:24:13 | 2017-06-30 12:04:53 |
| 125956961 | 2017-06-30 12:05:28 | 2017-06-30 12:08:16 |
| 125956965 | 2017-06-30 12:09:04 | 2017-06-30 12:12:21 |
| 125956990 | 2017-06-30 12:13:08 | 2017-06-30 12:16:40 |
| 125956998 | 2017-06-30 12:17:23 | 2017-06-30 12:20:41 |
| 125957038 | 2017-06-30 12:21:18 | 2017-06-30 12:25:16 |
| 125957086 | 2017-06-30 12:25:56 | 2017-06-30 12:29:54 |
| 126044361 | 2017-06-30 12:46:18 | 2017-06-30 12:58:04 |
| 126044366 | 2017-06-30 12:58:21 | 2017-06-30 13:00:35 |
| 126044368 | 2017-06-30 13:00:54 | 2017-06-30 13:02:34 |
| 126044370 | 2017-06-30 13:03:06 | 2017-06-30 13:05:17 |
| 126044372 | 2017-06-30 13:05:54 | 2017-06-30 13:07:39 |
+-----------+---------------------+---------------------+

There's a steady stream of jobs from 10:03 onward.  But the
AFE host page for both the master and the shard shows no jobs
after 6:58 until the job at 10:34, then skips jobs until the
job at 12:46.  Basically, history is incomplete.

Also, the dut-status command shows the same anomaly:

chromeos4-row10-rack10-host3
    2017-06-30 13:14:34  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570398-reset/
    2017-06-30 13:12:18  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570391-reset/
    2017-06-30 13:10:57  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/126044391-chromeos-test/
    2017-06-30 13:10:07  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570386-reset/
    2017-06-30 13:08:23  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/126044381-chromeos-test/
    2017-06-30 13:07:21  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570382-reset/
    2017-06-30 13:05:54  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/126044372-chromeos-test/
    2017-06-30 13:04:58  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570377-reset/
    2017-06-30 13:03:06  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/126044370-chromeos-test/
    2017-06-30 13:02:15  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570375-reset/
    2017-06-30 13:00:54  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/126044368-chromeos-test/
    2017-06-30 13:00:03  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570374-reset/
    2017-06-30 12:58:21  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/126044366-chromeos-test/
    2017-06-30 12:57:30  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570372-reset/
    2017-06-30 12:46:18  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/126044361-chromeos-test/
    2017-06-30 12:29:45  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570343-provision/
    2017-06-30 12:24:58  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570333-reset/
    2017-06-30 12:20:22  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570326-reset/
    2017-06-30 12:16:21  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570319-reset/
    2017-06-30 12:12:05  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570310-reset/
    2017-06-30 12:07:57  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570306-reset/
    2017-06-30 12:04:37  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570305-reset/
    2017-06-30 11:23:22  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570265-reset/
    2017-06-30 11:20:40  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570258-reset/
    2017-06-30 11:00:24  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570242-provision/
    2017-06-30 10:40:54  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570218-repair/
    2017-06-30 10:36:36  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570211-provision/
    2017-06-30 10:34:43  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/126027349-chromeos-test/
    2017-06-30 10:33:31  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570195-reset/
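
The comparison above (job IDs in the database versus test jobs listed by dut-status) can be automated. This is a minimal sketch, not part of the original report: the function names are hypothetical, and it assumes the dut-status output has been pasted in as text and that test-job URLs follow the `/results/<job_id>-chromeos-test/` pattern seen in the listing.

```python
import re

# Matches test-job result URLs such as .../results/126044361-chromeos-test/
# and captures the numeric job ID. Special-task URLs of the form
# .../results/hosts/<hostname>/570398-reset/ do not match, because the text
# right after "/results/" is "hosts", not digits.
JOB_URL_RE = re.compile(r"/results/(\d+)-chromeos-test/")


def job_ids_in_dut_status(output):
    """Return the set of test-job IDs found in pasted dut-status output."""
    return {int(m.group(1)) for m in JOB_URL_RE.finditer(output)}


def missing_from_history(db_job_ids, dut_status_output):
    """Return job IDs present in the database but absent from dut-status."""
    seen = job_ids_in_dut_status(dut_status_output)
    return sorted(set(db_job_ids) - seen)


if __name__ == "__main__":
    # A few job IDs copied from the afe_host_queue_entries result above.
    db_job_ids = [125957270, 126027349, 125956880, 125957086, 126044361]

    # A few lines copied from the dut-status listing above.
    dut_status_output = """\
    2017-06-30 12:46:18  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/126044361-chromeos-test/
    2017-06-30 12:29:45  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack10-host3/570343-provision/
    2017-06-30 10:34:43  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/126027349-chromeos-test/
    """

    # Prints [125956880, 125957086, 125957270]: jobs the database knows
    # about that never appear in the host history.
    print(missing_from_history(db_job_ids, dut_status_output))
```

Feeding it the full query result and the full listing would give every job that dropped out of the history, not just the ones spotted by eye.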



Owner: pho...@chromium.org
Status: Assigned (was: Available)
phobbs@ - could this be related to the recent changes to the
database schema?

I'm guessing not; I don't think any other shard is doing this,
but I haven't dived in.

Comment 9 by pho...@chromium.org, Jun 30 2017

Will look into this this afternoon

Comment 10 by sheriffbot@chromium.org, Jul 11 2017

Labels: Hotlist-Google
Will continue looking into it this week (was OOO last week)
Cc: -cernekee@chromium.org
Blockedon: 746997 748209
There are two components here: quawks rarely passes (possibly addressed by crbug.com/746997) and the db inconsistencies (tracked by crbug.com/748209).
Mergedinto: 746997
Status: Duplicate (was: Assigned)
Blockedon: -748209
