Issue 721436

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Closed: Jan 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocking:
issue 708500
issue 719307




peppy-chrome-pfq failure "FAIL: No answer to ping from chromeos4-row6-rack13-host7, completed successfully"

Project Member Reported by jamescook@chromium.org, May 11 2017

Issue description

Chrome PFQ failed to uprev last night with this failure:

https://uberchromegw.corp.google.com/i/chromeos/builders/peppy-chrome-pfq/builds/3474

Chrome ToT informational also failed:

https://uberchromegw.corp.google.com/i/chromeos.chrome/builders/tricky-tot-chrome-pfq-informational/builds/4380

Autofiled bugs are:
Issue 719307: [sanity] provision Failure on wizpig-release/R60-9531.0.0
Issue 708500: [bvt-inline] provision Failure on peppy-chrome-pfq/R59-9432.0.0-rc2

I'm not sure from the logs what is going on. I see this in the autoserv logs:

05/11 02:17:46.278 INFO |        dev_server:1771| Received response from devserver for cros_au call: '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n<html>\n<head>\n    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"></meta>\n    <title>500 Internal Server Error</title>\n    <style type="text/css">\n    #powered_by {\n        margin-top: 20px;\n        border-top: 2px solid black;\n        font-style: italic;\n    }\n\n    #traceback {\n        color: red;\n    }\n    </style>\n</head>\n    <body>\n        <h2>500 Internal Server Error</h2>\n        <p>The server encountered an unexpected condition which prevented it from fulfilling the request.</p>\n        <pre id="traceback">Traceback (most recent call last):\n  File "/usr/lib/python2.7/dist-packages/cherrypy/_cprequest.py", line 656, in respond\n    response.body = self.handler()\n  File "/usr/lib/python2.7/dist-packages/cherrypy/lib/encoding.py", line 188, in __call__\n    self.body = self.oldhandler(*args, **kwargs)\n  File "/usr/lib/python2.7/dist-packages/cherrypy/_cpdispatch.py", line 34, in __call__\n    return self.callable(*self.args, **self.kwargs)\n  File "/home/chromeos-test/chromiumos/src/platform/dev/devserver.py", line 926, in cros_au\n    archive_url=release_archive_url)\n  File "/home/chromeos-test/chromiumos/src/platform/dev/devserver.py", line 877, in stage\n    dl.Download(factory, async=async)\n  File "/home/chromeos-test/chromiumos/src/platform/dev/downloader.py", line 196, in Download\n    self._DownloadArtifactsSerially(required_artifacts, no_wait=True)\n  File "/home/chromeos-test/chromiumos/src/platform/dev/downloader.py", line 235, in _DownloadArtifactsSerially\n    artifact.Process(self, no_wait)\n  File "/home/chromeos-test/chromiumos/src/platform/dev/build_artifact.py", line 337, in Process\n    self.name, self.is_regex_name, timeout)\n  File 
"/home/chromeos-test/chromiumos/src/platform/dev/downloader.py", line 336, in Wait\n    (name, self._archive_url))\nArtifactDownloadError: Could not find stateful.tgz in Google Storage at gs://chromeos-releases/stable-channel/peppy/9540.0.0\n</pre>\n    <div id="powered_by">\n    <span>Powered by <a href="http://www.cherrypy.org">CherryPy 3.2.2</a></span>\n    </div>\n    </body>\n</html>\n'
05/11 02:27:26.042 INFO |      abstract_ssh:0795| Master ssh connection to chromeos4-row6-rack13-host7 is down.
05/11 02:27:26.042 INFO |      abstract_ssh:0809| Starting master ssh connection '/usr/bin/ssh -a -x   -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_4lbf2tssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos4-row6-rack13-host7'
05/11 02:27:31.141 INFO |      abstract_ssh:0824| Timed out waiting for master-ssh connection to be established.
05/11 02:28:34.368 ERROR|             utils:0297| [stderr] ssh: connect to host chromeos4-row6-rack13-host7 port 22: Connection timed out
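
The first log line above wraps the real error in a CherryPy 500 HTML page, which makes it easy to miss. As a rough sketch (not part of autotest or devserver; `extract_error_summary` is a hypothetical helper name), the final traceback line can be pulled out of such a response like this, assuming the page keeps the `<pre id="traceback">` element seen in the excerpt:

```python
import html
import re

# Matches the <pre id="traceback"> element that appears in the 500 page
# quoted in the log excerpt above.
TRACEBACK_RE = re.compile(r'<pre id="traceback">(.*?)</pre>', re.DOTALL)


def extract_error_summary(response_text):
    """Return the last non-empty traceback line, or None if there is none.

    `response_text` may be raw HTML or the repr-style text from the log,
    where newlines show up as a literal backslash followed by 'n'.
    """
    # Turn literal "\n" sequences (as logged) into real newlines.
    text = response_text.replace("\\n", "\n")
    match = TRACEBACK_RE.search(text)
    if match is None:
        return None
    body = html.unescape(match.group(1))
    lines = [line.strip() for line in body.splitlines() if line.strip()]
    return lines[-1] if lines else None


if __name__ == "__main__":
    # Shortened copy of the response from the log; intermediate frames omitted.
    sample = (
        '<h2>500 Internal Server Error</h2>\\n'
        '<pre id="traceback">Traceback (most recent call last):\\n'
        '  File "/home/chromeos-test/chromiumos/src/platform/dev/downloader.py", '
        'line 336, in Wait\\n'
        '    (name, self._archive_url))\\n'
        'ArtifactDownloadError: Could not find stateful.tgz in Google Storage at '
        'gs://chromeos-releases/stable-channel/peppy/9540.0.0\\n</pre>'
    )
    print(extract_error_summary(sample))
```

For the response quoted above, this surfaces the `ArtifactDownloadError` line about the missing stateful.tgz rather than the surrounding HTML.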

Note the roughly 10-minute gap between the devserver response (02:17:46) and the next log line (02:27:26).
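
The gap can be measured from the timestamps instead of by eye. The following is a minimal sketch, not an existing autotest tool: `parse_stamp` and `find_gaps` are hypothetical names, the 5-minute threshold is arbitrary, and the year 2017 is assumed from the report date because the log lines do not carry one. The first excerpt line is shortened.

```python
import re
from datetime import datetime, timedelta

# Leading "MM/DD HH:MM:SS.mmm" stamp, as in the autoserv lines above.
STAMP_RE = re.compile(r"^(\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s")


def parse_stamp(line, year=2017):
    """Return the line's timestamp, or None for lines without one.

    The log omits the year, so `year` is an assumption supplied by the caller.
    """
    match = STAMP_RE.match(line)
    if match is None:
        return None
    return datetime.strptime(f"{year}/{match.group(1)}", "%Y/%m/%d %H:%M:%S.%f")


def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Yield (gap, previous_line, next_line) for silences longer than threshold."""
    previous_time = previous_line = None
    for line in lines:
        current_time = parse_stamp(line)
        if current_time is None:
            continue  # continuation line (e.g. wrapped traceback); skip it
        if previous_time is not None and current_time - previous_time > threshold:
            yield current_time - previous_time, previous_line, line
        previous_time, previous_line = current_time, line


if __name__ == "__main__":
    excerpt = [
        # Message shortened here; the full text is in the log above.
        "05/11 02:17:46.278 INFO |        dev_server:1771| Received response from devserver for cros_au call: ...",
        "05/11 02:27:26.042 INFO |      abstract_ssh:0795| Master ssh connection to chromeos4-row6-rack13-host7 is down.",
        "05/11 02:27:31.141 INFO |      abstract_ssh:0824| Timed out waiting for master-ssh connection to be established.",
        "05/11 02:28:34.368 ERROR|             utils:0297| [stderr] ssh: connect to host chromeos4-row6-rack13-host7 port 22: Connection timed out",
    ]
    for gap, before, after in find_gaps(excerpt):
        print(f"{gap} of silence before: {after[:70]}")
```

On the four lines quoted above, only the interval between the devserver response and the "Master ssh connection ... is down" message exceeds the threshold; it works out to 9 minutes 39.764 seconds.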

Maybe related to Issue 696668 (ssh connection failure after post-stateful-update reboot)?

Filing as P0 because this broke the Chrome uprev and is also happening on informational builders on a different board.

ayatane, can you get this to the right person?

 

Comment 1 by xixuan@chromium.org, May 11 2017

Labels: -Pri-0 Pri-1
Owner: jwer...@chromium.org
Decreasing the priority to 1, since (1) the two failures are different and (2) the peppy failure is a flake.

The peppy failure has nothing to do with Issue 696668: the DUT went offline before provisioning started. It looks like a flaky DUT.

The tricky-tot-chrome-pfq failure is different again: after the rootfs update, the DUT could not reboot. That sounds like a build problem, so the sheriff should investigate why the DUT could not reboot.

Comment 2 by xixuan@chromium.org, May 11 2017

Cc: ayatane@chromium.org

Comment 3 by jamescook@chromium.org, May 11 2017

xixuan, is there a bug filed for the flaky DUT? I don't want it to block another Chrome PFQ run.

(FYI, I started at P0 because, according to our gardener documentation, issues that cause a Chrome PFQ failure start at P0: https://docs.google.com/document/d/13zse2T7S-rMaFEvd0HhFvF95GWJWbuwY-QfeWUlOwbg/edit#heading=h.kvym7mkd16wt )

Comment 4 by xixuan@chromium.org, May 11 2017

Re #3: this DUT is working fine, so no bug was filed for it and there is no need to lock it.

Here are the possibly useful logs for these two rebooted DUTs, for sheriffs to debug further:

peppy DUT: chromeos4-row6-rack13-host7

2017-05-11 02:38:05  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row6-rack13-host7/681501-repair/         (successful repair to bring it back)
2017-05-11 02:08:22  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row6-rack13-host7/681372-provision/      (failed provision due to offline DUT)

tricky DUT: chromeos4-row2-rack3-host19

2017-05-11 07:02:18  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row2-rack3-host19/1214865-repair/        (successful repair to bring it back)
2017-05-11 06:17:37  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row2-rack3-host19/1214608-provision/     (failed provision due to offline DUT)
Labels: akeshet-pending-downgrade
ChromeOS Infra P1 Bugscrub.

P1 Bugs in this component should be important enough to get weekly status updates.

Is this already fixed?  -> Fixed
Is this no longer relevant? -> Archived or WontFix
Is this not a P1, based on go/chromeos-infra-bug-slo rubric? -> lower priority.
Is this a Feature Request rather than a bug? Type -> Feature
Is this missing important information or scope needed to decide how to proceed? -> Ask question on bug, possibly reassign.
Does this bug have the wrong owner? -> reassign.

Bugs that remain in this state next week will be downgraded to P2.
Labels: -akeshet-pending-downgrade Pri-2
ChromeOS Infra P1 Bugscrub.

Issue untouched in a week after previous message. Downgrading to P2.

Comment 7 by derat@chromium.org, Jan 15 2018

Status: WontFix (was: Assigned)
Closing, as I'm doubtful it's even possible to figure out what happened at this point (the links above are 404s). :-/
