New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 766342 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Oct 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocked on:
issue 766196



Sign in to add a comment

autotest failed to stage artifact to DUT | HTTP OK not accompanied by 'Success'

Project Member Reported by akes...@chromium.org, Sep 18 2017

Issue description

https://uberchromegw.corp.google.com/i/chromeos/builders/edgar-paladin/builds/814

My guess is this is another symptom of devserver network load problem.
 
Blockedon: 766196
Cc: davidri...@chromium.org jrbarnette@chromium.org
Could this be from jrbarnette's load limiting changes?

When I looked at the logs, the devserver had already staged the image long before the failure was being reported.  I don't see the failure in the devserver logs.

09/18 14:07:36.148 DEBUG|             utils:0212| Running 'ssh 100.115.219.132 'curl "http://100.115.219.132:8082/stage?artifacts=autotest_packages&files=&async=True&archive_url=gs://chromeos-image-archive/edgar-paladin/R63-9954.0.0-rc1"''
09/18 14:07:36.821 DEBUG|        dev_server:1067| response for RPC: '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>503 Service Unavailable</title>\n</head><body>\n<h1>Service Unavailable</h1>\n<p>The server is temporarily unable to service your\nrequest due to maintenance downtime or capacity\nproblems. Please try again later.</p>\n<hr>\n<address>Apache/2.4.7 (Ubuntu) Server at 100.115.219.132 Port 8082</address>\n</body></html>\n'

Images previously staged:
[18/Sep/2017:13:52:13] AUTOTEST_TARBALL ArtifactStaged() -> yes, autotest_packages.tar->/home/chromeos-test/images/edgar-paladin/R63-9954.0.0-rc1 is staged.
[18/Sep/2017:13:52:13] AUTOTEST_TARBALL autotest_packages.tar->/home/chromeos-test/images/edgar-paladin/R63-9954.0.0-rc1 is already staged.
> Could this be from jrbarnette's load limiting changes?

No.  The workqueue throttling doesn't impact staging.  More importantly,
the feature is currently off; see  bug 760652 .

Cc: puneetster@chromium.org
Owner: xixuan@chromium.org
xixuan@ had some comments about this on crosoncall, maybe you can distill them to this bug?

Comment 5 by xixuan@chromium.org, Sep 19 2017

Status: Started (was: Untriaged)
The error happens at:
09/18 14:07:36.148 DEBUG|             utils:0212| Running 'ssh 100.115.219.132 'curl "http://100.115.219.132:8082/stage?artifacts=autotest_packages&files=&async=True&archive_url=gs://chromeos-image-archive/edgar-paladin/R63-9954.0.0-rc1"''
09/18 14:07:36.821 DEBUG|        dev_server:1067| response for RPC: '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>503 Service Unavailable</title>\n</head><body>\n<h1>Service Unavailable</h1>\n<p>The server is temporarily unable to service your\nrequest due to maintenance downtime or capacity\nproblems. Please try again later.</p>\n<hr>\n<address>Apache/2.4.7 (Ubuntu) Server at 100.115.219.132 Port 8082</address>\n</body></html>\n'


The culprit devserver is: chromeos4-devserver4, here's its logs:
[18/Sep/2017:14:07:26] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('::', 8080)) shut down
[18/Sep/2017:14:07:26] ENGINE Stopped thread '_TimeoutMonitor'.
[18/Sep/2017:14:07:26] ENGINE Bus STOPPED
[18/Sep/2017:14:07:26] ENGINE Bus EXITING
[18/Sep/2017:14:07:26] ENGINE Bus EXITED
[18/Sep/2017:14:07:26] ENGINE Waiting for child threads to terminate...
[18/Sep/2017:14:07:26] ENGINE Caught signal SIGTERM.
[18/Sep/2017:14:07:26] ENGINE Bus STOPPING
[18/Sep/2017:14:07:26] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('::', 8080)) already shut down
[18/Sep/2017:14:07:26] ENGINE No thread running for None.
[18/Sep/2017:14:07:26] ENGINE Bus STOPPED
[18/Sep/2017:14:07:26] ENGINE Bus EXITING
[18/Sep/2017:14:07:26] ENGINE Bus EXITED
[18/Sep/2017:14:07:35] DEVSERVER Using cache directory /home/chromeos-test/images/cache
[18/Sep/2017:14:07:35] DEVSERVER Serving from /home/chromeos-test/images
[18/Sep/2017:14:07:36] XBUDDY Using shadow config file stored at /home/chromeos-test/chromiumos/src/platform/dev/shadow_xbuddy_config.ini
[18/Sep/2017:14:07:39] ENGINE Listening for SIGHUP.
[18/Sep/2017:14:07:39] ENGINE Listening for SIGTERM.
[18/Sep/2017:14:07:39] ENGINE Listening for SIGUSR1.
[18/Sep/2017:14:07:39] ENGINE Bus STARTING
[18/Sep/2017:14:07:39] ENGINE Started monitor thread '_TimeoutMonitor'.

In the same period, that devserver is restarting, so returns 503 error for any calls, which is not retryable.

For devserver, it's restarting == it's not working, we can't differentiate the 2 cases. A quick fix may be we wait and retry such errors for a very short time (1-2 minutes) to cover this case. I will prepare a CL for this. Any other suggestions? 
Labels: -Pri-2 Chase-Pending Pri-1
xixuan@ has a suggested fix above.
Labels: -Chase-Pending Chase
Project Member

Comment 8 by bugdroid1@chromium.org, Sep 27 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/e3dfe00d35c5d1b4ac48a23b05539fc3347e82b3

commit e3dfe00d35c5d1b4ac48a23b05539fc3347e82b3
Author: Xixuan Wu <xixuan@chromium.org>
Date: Wed Sep 27 03:44:02 2017

autotest: Add retry if devserver call fails due to down service.

It's possible that when devserver service is restarting, it temporarily
doesn't work for a very short time window, but may cause HWTest errors.
This CL adds retry for this case. If after a given time period (60s) the
service is still down, we will declare that this devserver is not serving
and won't retry.

BUG= chromium:766342 
TEST=Ran unittest.

Change-Id: I66ed4f138ee89909797b889458ddb490dd5aa304
Reviewed-on: https://chromium-review.googlesource.com/685260
Commit-Ready: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Dan Shi <dshi@google.com>

[modify] https://crrev.com/e3dfe00d35c5d1b4ac48a23b05539fc3347e82b3/client/common_lib/utils.py
[modify] https://crrev.com/e3dfe00d35c5d1b4ac48a23b05539fc3347e82b3/client/common_lib/cros/dev_server.py
[modify] https://crrev.com/e3dfe00d35c5d1b4ac48a23b05539fc3347e82b3/client/common_lib/cros/dev_server_unittest.py

Status: Fixed (was: Started)

Sign in to add a comment