
Issue 654337

Starred by 2 users

Issue metadata

Status: Archived
Owner:
Closed: Jul 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




timed out crash_collecting ap_controller causing suite to time out

Project Member Reported by kevcheng@chromium.org, Oct 10 2016

Issue description

Build: https://uberchromegw.corp.google.com/i/chromeos/builders/whirlwind-paladin/builds/5988

I'm not quite sure what the issue is. It's either a problem getting crash collection info for ap_controller, or repeated devserver calls failing (the snippet below repeats itself for 6 hours).

10/09 12:50:55.605 INFO | site_crashcollect:0094| Generating stack trace using devserver for /usr/local/autotest/results/80070198-chromeos-test/chromeos4-row10-jetstream-host7/jetstream_ApiServerAttestation/sysinfo/var/spool/crash/ap_controller.20160928.110232.5363.dmp
10/09 12:51:55.607 ERROR|        dev_server:0305| Devserver call failed: "http://172.17.40.24:8082/check_health?", timeout: 60 seconds, Error: Call is timed out.
10/09 12:52:55.608 ERROR|        dev_server:0305| Devserver call failed: "http://172.17.40.28:8082/check_health?", timeout: 60 seconds, Error: Call is timed out.
10/09 12:53:55.609 ERROR|        dev_server:0305| Devserver call failed: "http://172.17.40.17:8082/check_health?", timeout: 60 seconds, Error: Call is timed out.
10/09 12:53:57.220 DEBUG|             retry:0155| Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0)
10/09 12:53:57.221 INFO |    connectionpool:0188| Starting new HTTP connection (1): 100.115.219.130
10/09 12:53:57.427 DEBUG|    connectionpool:0362| "POST /symbolicate_dump?archive_url=gs://chromeos-image-archive/whirlwind-paladin/R56-8882.0.0-rc1 HTTP/1.1" 500 1626
10/09 12:53:57.458 INFO | site_crashcollect:0107| Failed to generate stack trace on devserver for dump /usr/local/autotest/results/80070198-chromeos-test/chromeos4-row10-jetstream-host7/jetstream_ApiServerAttestation/sysinfo/var/spool/crash/ap_controller.20160928.110232.5363.dmp:
DevServerException('\n\n\n    \n    500 Internal Server Error\n    \n    #powered_by {\n        margin-top: 20px;\n        border-top: 2px solid black;\n        font-style: italic;\n    }\n\n    #traceback {\n        color: red;\n    }\n    \n\n    \n        500 Internal Server Error\n        The server encountered an unexpected condition which prevented it from fulfilling the request.\n        Traceback (most recent call last):\n  File "/usr/lib/python2.7/dist-packages/cherrypy/_cprequest.py", line 656, in respond\n    response.body = self.handler()\n  File "/usr/lib/python2.7/dist-packages/cherrypy/lib/encoding.py", line 188, in __call__\n    self.body = self.oldhandler(*args, **kwargs)\n  File "/usr/lib/python2.7/dist-packages/cherrypy/_cpdispatch.py", line 34, in __call__\n    return self.callable(*self.args, **self.kwargs)\n  File "/home/chromeos-test/chromiumos/src/platform/dev/devserver.py", line 1105, in symbolicate_dump\n    stdout=subprocess.PIPE, stderr=subprocess.PIPE)\n  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__\n    errread, errwrite)\n  File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child\n    raise child_exception\nOSError: [Errno 2] No such file or directory\n\n    \n    Powered by CherryPy 3.2.2\n    \n    \n\n',)
10/09 12:53:57.458 WARNI| site_crashcollect:0111| Failed to generate stack trace for /usr/local/autotest/results/80070198-chromeos-test/chromeos4-row10-jetstream-host7/jetstream_ApiServerAttestation/sysinfo/var/spool/crash/ap_controller.20160928.110232.5363.dmp (see info logs)
10/09 12:53:57.458 INFO | site_crashcollect:0083| Trying to generate stack trace locally for /usr/local/autotest/results/80070198-chromeos-test/chromeos4-row10-jetstream-host7/jetstream_ApiServerAttestation/sysinfo/var/spool/crash/ap_group_manager.20160928.110253.5496.dmp
10/09 12:53:57.458 INFO | site_crashcollect:0036| symbol_dir: /usr/local/autotest/server/../../../lib/debug
10/09 12:53:57.458 DEBUG|        base_utils:0185| Running 'minidump_stackwalk "/usr/local/autotest/results/80070198-chromeos-test/chromeos4-row10-jetstream-host7/jetstream_ApiServerAttestation/sysinfo/var/spool/crash/ap_group_manager.20160928.110253.5496.dmp" "/usr/local/autotest/server/../../../lib/debug" > "/usr/local/autotest/results/80070198-chromeos-test/chromeos4-row10-jetstream-host7/jetstream_ApiServerAttestation/sysinfo/var/spool/crash/ap_group_manager.20160928.110253.5496.dmp.txt"'
10/09 12:53:57.499 INFO | site_crashcollect:0090| Failed to generate stack trace locally for dump /usr/local/autotest/results/80070198-chromeos-test/chromeos4-row10-jetstream-host7/jetstream_ApiServerAttestation/sysinfo/var/spool/crash/ap_group_manager.20160928.110253.5496.dmp (rc=127):
CmdError('minidump_stackwalk "/usr/local/autotest/results/80070198-chromeos-test/chromeos4-row10-jetstream-host7/jetstream_ApiServerAttestation/sysinfo/var/spool/crash/ap_group_manager.20160928.110253.5496.dmp" "/usr/local/autotest/server/../../../lib/debug" > "/usr/local/autotest/results/80070198-chromeos-test/chromeos4-row10-jetstream-host7/jetstream_ApiServerAttestation/sysinfo/var/spool/crash/ap_group_manager.20160928.110253.5496.dmp.txt"', * Command: 
    minidump_stackwalk "/usr/local/autotest/results/80070198-chromeos-
    test/chromeos4-row10-jetstream-host7/jetstream_ApiServerAttestation/sysinf
    o/var/spool/crash/ap_group_manager.20160928.110253.5496.dmp"
    "/usr/local/autotest/server/../../../lib/debug" >
    "/usr/local/autotest/results/80070198-chromeos-test/chromeos4-row10
    -jetstream-host7/jetstream_ApiServerAttestation/sysinfo/var/spool/crash/ap
    _group_manager.20160928.110253.5496.dmp.txt"
Exit status: 127
Duration: 0.00235891342163

stderr:
/bin/bash: minidump_stackwalk: command not found, 'Command returned non-zero exit status')
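
Since the snippet repeats for hours, it may help to tally which devservers keep timing out. Below is a minimal sketch, assuming the log lines keep the "Devserver call failed" format shown above; `count_failed_devservers` is a hypothetical helper name, not part of autotest.

```python
import re
from collections import Counter

# Matches the ERROR lines from dev_server shown in the log above, e.g.
#   ... Devserver call failed: "http://172.17.40.24:8082/check_health?", timeout: 60 seconds, ...
# and captures only the scheme, host and port of the devserver.
FAILED_CALL = re.compile(r'Devserver call failed: "(?P<url>http://[^/"]+)')


def count_failed_devservers(log_lines):
    """Return a Counter mapping devserver base URL -> number of failed calls."""
    counts = Counter()
    for line in log_lines:
        match = FAILED_CALL.search(line)
        if match:
            counts[match.group("url")] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical log file name): python tally.py < autoserv.DEBUG
    for url, n in count_failed_devservers(sys.stdin).most_common():
        print(n, url)
```

A server that shows up on every repetition with a timeout is a likely candidate for being gone from the lab rather than temporarily unhealthy.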
 

Comment 1 by dshi@chromium.org, Oct 10 2016

Cc: akes...@chromium.org
Labels: -Pri-3 Pri-1
There are a couple of issues:

1. These 3 devservers might no longer exist in the lab. Check with the lab wrangler and maybe remove them from the shadow config:
http://172.17.40.24:8082, http://172.17.40.28:8082, http://172.17.40.17:8082

2. The minidump_stackwalk issue should be fixed in this CL:
https://chrome-internal-review.googlesource.com/#/c/294816
But it's failing in chromeos4-crash1 because
 when /chromeos-crash.*/;            'devserver'
in puppet/modules/facter/server_type.rb
We might as well remove that crash server and tell syslab that they can recycle it as a devserver.

3. Many (if not all) crash servers are still running old code. I requested a push to the crash servers last week, but it seems it didn't happen. Please do a push to the crash servers.
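
For item 2, both failures in the log look like a missing executable: the local run exits with status 127 ("command not found" from bash), and the devserver traceback ends in OSError [Errno 2] from subprocess. A minimal sketch of checking for the tool before invoking it is below. This is not the fix in the CL above; `stackwalk` is a hypothetical helper, and it assumes Python 3 (`shutil.which`), whereas the servers in the log run Python 2.7.

```python
import shutil
import subprocess


def stackwalk(dump_path, symbol_dir, tool="minidump_stackwalk"):
    """Run the stackwalk tool on a minidump and return its stdout as text.

    Raises FileNotFoundError with a clear message when the tool is not
    installed, instead of surfacing a bare rc=127 or Errno 2 later.
    """
    tool_path = shutil.which(tool)
    if tool_path is None:
        raise FileNotFoundError(
            "%s is not on PATH; cannot symbolicate %s" % (tool, dump_path))
    completed = subprocess.run(
        [tool_path, dump_path, symbol_dir],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout.decode("utf-8", errors="replace")
```

Failing early like this makes it obvious from the first log line that the host is missing the binary, rather than that the dump itself is bad.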

Comment 2 by dshi@chromium.org, Oct 10 2016

Cc: -kevcheng@chromium.org
Owner: kevcheng@chromium.org
Cc: haoweiw@chromium.org
Status: Assigned (was: Untriaged)
Hey Haowei,

Can you comment on the devservers in #1?


I'll start a push to crash servers.
Yes, those devservers were moved or migrated into new subnets.
Push to crash servers done. Will remove http://172.17.40.24:8082, http://172.17.40.28:8082, http://172.17.40.17:8082 from the crash_server list,
and also chromeos4-crash1.
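
The list cleanup could be sketched as follows, assuming the crash_server value is a comma-separated list of base URLs (an assumption about the config format, not confirmed in this bug). `remove_servers` and the example.com entry are hypothetical.

```python
def remove_servers(server_list, to_remove):
    """Return a comma-separated server list without the given entries.

    Whitespace and a trailing slash are ignored when comparing, and the
    order of the remaining servers is preserved.
    """
    def normalize(url):
        return url.strip().rstrip("/")

    drop = {normalize(url) for url in to_remove}
    kept = []
    for entry in server_list.split(","):
        entry = entry.strip()
        if entry and normalize(entry) not in drop:
            kept.append(entry)
    return ",".join(kept)


# The three addresses reported as moved in this bug.
STALE = [
    "http://172.17.40.24:8082",
    "http://172.17.40.28:8082",
    "http://172.17.40.17:8082",
]

if __name__ == "__main__":
    # Hypothetical current value; the last entry is a placeholder host.
    current = ("http://172.17.40.24:8082,http://172.17.40.28:8082,"
               "http://172.17.40.17:8082,http://devserver.example.com:8082")
    print(remove_servers(current, STALE))
```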

Comment 7 by bugdroid1@chromium.org, Oct 11 2016

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/edd70ee4a38d4ffe5495826ff83cdc9fc5eec04b

commit edd70ee4a38d4ffe5495826ff83cdc9fc5eec04b
Author: Kevin Cheng <kevcheng@chromium.org>
Date: Tue Oct 11 21:06:00 2016

Labels: akeshet-pending-downgrade
ChromeOS Infra P1 Bugscrub.

P1 Bugs in this component should be important enough to get weekly status updates.

Is this already fixed?  -> Fixed
Is this no longer relevant? -> Archived or WontFix
Is this not a P1, based on go/chromeos-infra-bug-slo rubric? -> lower priority.
Is this a Feature Request rather than a bug? Type -> Feature
Is this missing important information or scope needed to decide how to proceed? -> Ask question on bug, possibly reassign.
Does this bug have the wrong owner? -> reassign.

Bugs that remain in this state next week will be downgraded to P2.

Comment 9 by sosa@chromium.org, Jul 18 2017

Status: Archived (was: Assigned)
