
Issue 715368


Issue metadata

Status: Archived
Owner:
Closed: Mar 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




Weekly runs of control.power_daily contain lots of "MISSING_TEST"

Project Member Reported by dbasehore@chromium.org, Apr 26 2017

Issue description

It seems that all weekly runs on release branches of control.power_daily are aborted with no DUT provisioned for the test suite. Can someone from Infra help diagnose what the actual problem is?
 

Comment 1 by tbroch@chromium.org, Apr 26 2017

Labels: Restrict-View-Google
I suspect this might be related to the similar aborts in crbug.com/715367.

Here's the first aborted one, I believe (4/22):
https://ubercautotest.corp.google.com/afe/#tab_id=view_job&object_id=113798749

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/113798749-chromeos-test/hostless/

status.log:

INFO	----	----	Job aborted by autotest_system on 2017-04-24 12:03:17

And from autoserv.INFO, it does look like the log output just stops:

04/24 11:45:08.099 INFO |          autoserv:0687| Results placed in /usr/local/autotest/results/113798749-chromeos-test/hostless
04/24 11:45:08.099 INFO |           pidfile:0016| Logged pid 23422 to /usr/local/autotest/results/113798749-chromeos-test/hostless/.autoserv_execute
04/24 11:45:08.225 INFO |    connectionpool:0188| Starting new HTTP connection (1): metadata.google.internal
04/24 11:45:08.447 NOTIC|      cros_logging:0037| ts_mon was set up.
04/24 11:45:08.539 INFO |        server_job:0719| I am PID 23422
04/24 11:45:08.544 WARNI|        subcommand:0081| parallel_simple was called with an empty arglist, did you forget to pass in a list of machines?
04/24 11:45:08.544 WARNI|        server_job:0776| Not checking if job_repo_url contains autotest packages on []
04/24 11:45:08.545 INFO |        server_job:0799| Processing control file
04/24 11:45:08.588 WARNI|             suite:0927| /usr/local/autotest/server/cros/dynamic_suite/suite.py:927: UserWarning: Calling this method from Suite is deprecated
  warnings.warn('Calling this method from Suite is deprecated')

04/24 11:45:08.590 INFO |        dev_server:1094| Staging artifacts on devserver http://100.115.219.129:8082: build=enguarde-release/R60-9483.0.0, artifacts=['control_files', 'test_suites'], files=, archive_url=gs://chromeos-image-archive/enguarde-release/R60-9483.0.0
04/24 11:45:09.845 WARNI|             retry:0238| <class 'autotest_lib.client.common_lib.error.CmdError'>(Command <ssh 100.115.219.129 'curl "http://100.115.219.129:8082/stage?artifacts=control_files,test_suites&files=&async=True&archive_url=gs://chromeos-image-archive/enguarde-release/R60-9483.0.0"'> failed, rc=255, Command returned non-zero exit status
* Command: 
    ssh 100.115.219.129 'curl "http://100.115.219.129:8082/stage?artifacts=co
    ntrol_files,test_suites&files=&async=True&archive_url=gs://chromeos-image-
    archive/enguarde-release/R60-9483.0.0"'
Exit status: 255
Duration: 1.18088102341

stderr:
ssh_exchange_identification: read: Connection reset by peer)
04/24 11:45:09.847 WARNI|             retry:0193| Retrying in 3.465777 seconds...
04/24 11:45:13.527 WARNI|             retry:0238| <class 'autotest_lib.client.common_lib.error.CmdError'>(Command <ssh 100.115.219.129 'curl "http://100.115.219.129:8082/stage?artifacts=control_files,test_suites&files=&async=True&archive_url=gs://chromeos-image-archive/enguarde-release/R60-9483.0.0"'> failed, rc=255, Command returned non-zero exit status
* Command: 
    ssh 100.115.219.129 'curl "http://100.115.219.129:8082/stage?artifacts=co
    ntrol_files,test_suites&files=&async=True&archive_url=gs://chromeos-image-
    archive/enguarde-release/R60-9483.0.0"'
Exit status: 255
Duration: 0.157376050949

stderr:
ssh_exchange_identification: Connection closed by remote host)
04/24 11:45:13.528 WARNI|             retry:0193| Retrying in 3.067414 seconds...
04/24 11:45:25.118 WARNI|        dev_server:1004| CmdError happens in is_stage: CmdError('ssh 100.115.219.129 \'curl "http://100.115.219.129:8082/is_staged?artifacts=control_files,test_suites&files=&archive_url=gs://chromeos-image-archive/enguarde-release/R60-9483.0.0"\'', * Command: 
    ssh 100.115.219.129 'curl "http://100.115.219.129:8082/is_staged?artifact
    s=control_files,test_suites&files=&archive_url=gs://chromeos-image-archive
    /enguarde-release/R60-9483.0.0"'
Exit status: 255
Duration: 0.167067050934

stderr:
ssh_exchange_identification: Connection closed by remote host, 'Command returned non-zero exit status'), will retry
04/24 11:45:50.060 INFO |        dev_server:1112| Finished staging artifacts: build=enguarde-release/R60-9483.0.0, artifacts=['control_files', 'test_suites'], files=, archive_url=gs://chromeos-image-archive/enguarde-release/R60-9483.0.0
04/24 11:45:50.319 WARNI|             retry:0238| <class 'autotest_lib.client.common_lib.error.CmdError'>(Command <ssh 100.115.219.129 'curl "http://100.115.219.129:8082/list_suite_controls?suite_name=power_daily&build=enguarde-release/R60-9483.0.0"'> failed, rc=255, Command returned non-zero exit status
* Command: 
    ssh 100.115.219.129 'curl "http://100.115.219.129:8082/list_suite_control
    s?suite_name=power_daily&build=enguarde-release/R60-9483.0.0"'
Exit status: 255
Duration: 0.160670042038

stderr:
ssh_exchange_identification: Connection closed by remote host)
04/24 11:45:50.320 WARNI|             retry:0193| Retrying in 2.791393 seconds...

I tried looking at a few other aborted jobs, but all their logs are missing.
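
To see at a glance how often the devserver connection was refused in an excerpt like the one above, the log can be tallied with a short script. This is only a sketch: the record layout (timestamp, level, module:line, message) is inferred from the lines pasted in this comment, and the name `summarize` is made up for illustration. It is not part of autotest.

```python
import re
from collections import Counter

# Header of an autoserv log record as it appears in the excerpt above, e.g.
# "04/24 11:45:09.845 WARNI|             retry:0238| <message>".
# The layout is inferred from the pasted lines, not from an autotest spec.
RECORD_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>\w+)\s*\|\s*(?P<module>\w+):(?P<line>\d+)\| (?P<msg>.*)$"
)

# stderr lines emitted by ssh when the handshake is cut off; the reason is
# everything up to the first ")" or "," that follows it in the log.
SSH_ERR_RE = re.compile(r"^ssh_exchange_identification: (?P<reason>.*?)[),]")


def summarize(log_text):
    """Count records per (level, module) and SSH handshake errors per reason."""
    records = Counter()
    ssh_errors = Counter()
    for line in log_text.splitlines():
        match = RECORD_RE.match(line)
        if match:
            records[(match.group("level"), match.group("module"))] += 1
            continue
        match = SSH_ERR_RE.match(line)
        if match:
            ssh_errors[match.group("reason")] += 1
    return records, ssh_errors
```

Calling `summarize(open("autoserv.INFO").read())` on a downloaded log returns two counters. A high count under `("WARNI", "retry")` together with many "Connection closed by remote host" entries would point at the devserver refusing SSH connections rather than at the suite itself.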

Comment 2 by aut...@google.com, Jun 2 2017

Sorry for the delay. Is this still an issue?
It looks like things run from time to time, but wmatrix still shows a lot of 'missing tests' for M60 and M59:

https://wmatrix.googleplex.com/power_daily?releases=60&days_back=30
https://wmatrix.googleplex.com/power_daily?releases=59&days_back=30
Screen Shot 2017-06-02 at 6.05.57 PM.png (494 KB)
Screen Shot 2017-06-02 at 6.06.35 PM.png (567 KB)

Comment 4 by nxia@chromium.org, Jun 7 2017

Cc: nxia@chromium.org dgarr...@chromium.org
+ current deputies.

Do we have more examples of the failures? Were they trying to connect to 100.115.219.129 when the failures happened?
Cc: pprabhu@chromium.org
Owner: tbroch@chromium.org
+this week's deputies.

-> tbroch: is this still an issue?
Summary: Weekly runs of control.power_daily contain lots of "MISSING_TEST" (was: Weekly runs of control.power_daily on release branches are all aborted)
I have not spent much time looking at wmatrix generally, so tell me if I'm just reading it wrong.

- Not all builds have results. This is OK. The bug claims that these tests are weekly (I'm assuming suite_scheduler runs them)
- The day on which the results do exist drifts a bit. This may not be ideal, but it's OK as far as this bug is concerned and as long as we get some run roughly weekly.

- The runs that do exist have tests with the result MISSING_TEST. This is _not OK_.

Updated the bug to refer to just this last problem. Will take a look at just that. (Otherwise the bug is too generic).
Also, tbroch@: What's the reason for Restrict-View-Google here?
Please take a second look and remove the label if nothing on the bug is private.
Focussing on one instance: https://wmatrix.googleplex.com/failures/power_daily?platforms=cyan&tests=power_Consumption&days_back=30&releases=60

This has both MISSING_TEST and ERROR states, so we know that the MISSING_TEST is an anomaly, not just that the board doesn't support that test.
Owner: pprabhu@chromium.org
Status: Started (was: Untriaged)
Another useful view: https://wmatrix.googleplex.com/power_daily?days_back=30&suites=power_daily&tests=power_Consumption

This test hardly ever passes. But let's make eliminating the MISSING_TEST results the goal of this bug.
So the problem is at the suite level. Whenever a test is labelled MISSING_TEST, the whole suite is missing: https://wmatrix.googleplex.com/failures/power_daily?platforms=cyan&days_back=30&releases=60&suites=power_daily


Also, I couldn't even find the job corresponding to one of the suites that is missing: "yan-release/R60-9529.0.0/power_daily/Power daily tests"

So the suite wasn't aborted; it was never created.
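
The suite-level observation above (a MISSING_TEST result always comes with the whole suite missing) could be checked mechanically along these lines. This is a rough sketch under stated assumptions: the `(build, suite, test, status)` record shape and the function name `partially_missing_suites` are hypothetical, and nothing here reflects an actual wmatrix export format or API.

```python
from collections import defaultdict


def partially_missing_suites(results):
    """Return (build, suite) runs where only some tests are MISSING_TEST.

    `results` is an iterable of (build, suite, test, status) tuples. This
    record shape is an assumption made for illustration only.
    """
    statuses = defaultdict(set)
    for build, suite, _test, status in results:
        statuses[(build, suite)].add(status)
    # A run counts as "partially missing" if MISSING_TEST appears alongside
    # any other status. Runs that are entirely MISSING_TEST are excluded.
    return sorted(
        key
        for key, seen in statuses.items()
        if "MISSING_TEST" in seen and len(seen) > 1
    )
```

An empty list is consistent with the problem being at the suite level. Any entry returned would be a counterexample, meaning an individual test went missing inside a suite that otherwise ran.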
Labels: -Restrict-View-Google
Removed RVG.  Thanks for having a look.

Status: Archived (was: Started)
