PFQ time out due to "Could not resolve host: storage.googleapis.com"
Issue description

Filed by sheriff-o-matic@appspot.gserviceaccount.com on behalf of yuhsuan@google.com

kevin-arcnext-paladin:1260, kevin-paladin:4400 timed out

Builders failed on:
- kevin-arcnext-paladin: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8936996884616956256

19:48:26: INFO: Re-run swarming_cmd to avoid buildbot salency check.
19:48:26: INFO: RunCommand: /b/c/cbuild/repository/chromite/third_party/swarming.client/swarming.py run --swarming chromeos-proxy.appspot.com --task-summary-json /tmp/cbuildbot-tmpAScgIR/tmp1VEna9/temp_summary.json --print-status-updates --timeout 9000 --raw-cmd --task-name kevin-arcnext-paladin/R70-11010.0.0-rc2-bvt-arc --dimension os Ubuntu-14.04 --dimension pool default --io-timeout 9000 --hard-timeout 9000 --expiration 1200 '--tags=priority:CQ' '--tags=suite:bvt-arc' '--tags=build:kevin-arcnext-paladin/R70-11010.0.0-rc2' '--tags=task_name:kevin-arcnext-paladin/R70-11010.0.0-rc2-bvt-arc' '--tags=board:kevin-arcnext' -- /usr/local/autotest/site_utils/run_suite.py --build kevin-arcnext-paladin/R70-11010.0.0-rc2 --board kevin --suite_name bvt-arc --pool cq --file_bugs False --priority CQ --timeout_mins 90 --retry True --max_retries 5 --minimum_duts 4 --offload_failures_only False --job_keyvals "{'cidb_build_stage_id': 90436842L, 'cidb_build_id': 2884616, 'datastore_parent_key': ('Build', 2884616, 'BuildStage', 90436842L)}" --test_args "{'fast': 'True'}" -m 231392899
20:32:39: INFO: Refreshing due to a 401 (attempt 1/2)
20:32:39: INFO: Refreshing access_token
20:38:29: INFO: Refreshing due to a 401 (attempt 1/2)
20:38:29: INFO: Refreshing access_token
20:42:15: INFO: Refreshing due to a 401 (attempt 1/2)
20:42:15: INFO: Refreshing access_token
20:52:10: ERROR: Timeout occurred- waited 15236 seconds, failing. Timeout reason: This build has reached the timeout deadline set by the master.
Either this stage or a previous one took too long (see stage timing historical summary in ReportStage) or the build failed to start on time.
,
Aug 28
Looks like a dup of issue 874308
,
Aug 28
,
Aug 28
"Refreshing due to a 401" is a red herring. There is actually a failure.
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8936978623559055952

ssp_logs shows:

08/27 20:31:12.002 DEBUG| container:0344| Command <sudo lxc-attach -P /usr/local/autotest/containers -n test_231392919_1535426243_163245 -- bash -c "curl --head https://storage.googleapis.com/abci-ssp/autotest-containers/base_09.tar.xz"> failed, rc=6, Command returned non-zero exit status
* Command:
sudo lxc-attach -P /usr/local/autotest/containers -n test_231392919_1535426243_163245 -- bash -c "curl --head https://storage.googleapis.com/abci-ssp/autotest-containers/base_09.tar.xz"
Exit status: 6
Duration: 19.8597819805
stderr:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
curl: (6) Could not resolve host: storage.googleapis.com

jkop@, could you check it out and help to triage this? Thanks.
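
For triage, the "failed, rc=N" marker in a log line like the one above can be mapped to a rough cause. The following is only a sketch: the function and table names are made up for illustration, the regular expression assumes the exact "failed, rc=" wording seen in this log, and the table covers just a few of curl's documented exit codes.

```python
import re

# A small subset of curl's documented exit codes that indicate
# network-level problems rather than a bad URL or a server error.
CURL_NETWORK_EXIT_CODES = {
    5: "could not resolve proxy",
    6: "could not resolve host",
    7: "failed to connect to host",
    28: "operation timed out",
}

# Assumption: failures are logged as "... failed, rc=<N>, ..." as in the
# ssp_logs excerpt above.
RC_PATTERN = re.compile(r"failed, rc=(\d+)")


def classify_failed_command(log_line):
    """Return (rc, description) for a 'failed, rc=N' line, or None if absent."""
    match = RC_PATTERN.search(log_line)
    if match is None:
        return None
    rc = int(match.group(1))
    return rc, CURL_NETWORK_EXIT_CODES.get(rc, "other failure")
```

Applied to the line quoted in this comment, it would report exit code 6, i.e. a DNS resolution failure rather than an authentication problem.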
,
Aug 28
,
Aug 30
Unmerged from issue 878403.
This now also happens for caroline-arcnext-chrome-pfq, e.g. https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8936803581058848320
,
Aug 30
There are multiple different failures here:
* The failure in the description looks like bug 874308.
* Concurrent with the symptom of bug 874308, there are also multiple
test failures in the underlying suite.
* The failure in #c4 is a new failure and TTBOMK not reported elsewhere.
* The caroline failure is also bug 874308.
ATM, I'm inclined to say that this bug is the failure in #c4, but I'd like
to know whether that problem is still seen. My expectation is that that
sort of failure is transient, and won't be recurring.
,
Aug 30
Passing to the Chrome gardener to answer whether the problem in #c4 has been seen recently on any builder.
,
Aug 30
The caroline failure has the same symptom as #c4. The error is hidden in the ssp_logs of the test failures. If you click into "GE Suite Details", then click on one of the failed tests, say, "cheets_CTS_N.7.1", you get to here:
https://stainless.corp.google.com/browse/chromeos-autotest-results/232140354-chromeos-test/
Then if you look at autoserv.DEBUG here:
├── ssp_logs
│   └── debug
│       ├── autoserv.DEBUG [58.4 kB]
you will see that the failure is actually caused by the name resolution failure.

The problem is not that transient. When it happens, it happens for several builds in a row. E.g. caroline has failed for the last 4 builds. And I saw the same error for all builds listed here.
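
Since the error is buried inside autoserv.DEBUG, a quick scan of the downloaded log text can confirm whether a given test result has this symptom. This is a minimal sketch, not an existing autotest tool: the function name is hypothetical, and it assumes the log has already been fetched and read into a string.

```python
import re
from collections import Counter

# Matches curl's error text as it appears in the ssp_logs excerpt,
# capturing the hostname that failed to resolve.
RESOLVE_ERROR = re.compile(r"Could not resolve host: (\S+)")


def count_unresolved_hosts(log_text):
    """Count 'Could not resolve host' errors per hostname in a log."""
    return Counter(RESOLVE_ERROR.findall(log_text))
```

An empty result for a failed test would suggest that the test failed for some other reason and belongs to a different bug.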
,
Aug 30
> The caroline failure has the same symptom as #c4. [ ... ]
The only caroline build cited is in #c7:
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8936803581058848320
That build ran this test suite:
http://cautotest-prod/afe/#tab_id=view_job&object_id=232212677
There are no test failures in that suite. The only failure in the
associated build is bug 874308.
> The error is hidden in the ssp_logs of the test failures. If you click
> into "GE Suite Details", then click on the failed tests, say,
> "cheets_CTS_N.7.1", you get to here:
> https://stainless.corp.google.com/browse/chromeos-autotest-results/232140354-chromeos-test/
That test result is from testing caroline-release/R70-11017.0.0, not
the caroline-arcnext-chrome-pfq at all. That's this build:
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8936804363040971296
The fact that this is happening on cros-full-0005.mtv makes me
suspect some side-effect of bug 874308, but I don't yet have the
evidence to show it.
,
Aug 30
> The fact that this is happening on cros-full-0005.mtv makes me
> suspect some side-effect of bug 874308, but I don't yet have the
> evidence to show it.
<sigh> the proper bug to suspect is bug 878403.

The general sickness on the affected shard may be having unexpected side-effects, such as networking glitches.

Whatever is causing it, the problem is intermittent. I just tried the failing 'curl' and it passed.

chromeos-test@cros-full-0005:/tmp$ curl --head https://storage.googleapis.com/abci-ssp/autotest-containers/base_09.tar.xz >/dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0  432M    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
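
A single passing curl says little about an intermittent failure, so repeating the lookup several times may be more informative. The sketch below is an assumption-laden illustration, not something that was run for this bug: `probe_dns` and its parameters are hypothetical names, it uses the host resolver through `socket.getaddrinfo`, and it would need to run inside the affected container to exercise the same resolver configuration as the failing curl.

```python
import socket
import time


def probe_dns(host, attempts=5, delay_s=1.0,
              resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve `host` repeatedly and return how many attempts failed.

    `resolver` and `sleep` are injectable so the function can be
    exercised without real network access or real delays.
    """
    failures = 0
    for attempt in range(attempts):
        try:
            resolver(host, 443)
        except socket.gaierror:
            # Name resolution failed, the same class of error that
            # curl reports as exit code 6.
            failures += 1
        if attempt < attempts - 1:
            sleep(delay_s)
    return failures
```

For example, `probe_dns("storage.googleapis.com", attempts=20)` returning a non-zero count would be consistent with the intermittent behavior described here.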
,
Aug 30
Thanks for the investigation. So, merge back into issue 878403?
,
Aug 30
I've confirmed that three of the past four caroline-arcnext-chrome-pfq runs failed with the symptom at issue here. The fourth (the one with a different symptom) is the build cited in #c7.
,
Aug 30
> Thanks for the investigation. So merge back to issue 878403 ?
I'm inclined to hold this open as a separate bug until cros-full-0005.mtv is more healthy. If the problem doesn't persist past that point, we can close this. If the problem _does_ persist, it'll be easier not to have the extra churn.
,
Aug 30
,
Sep 1
,
Sep 4
The affected builder has been green over the past couple days. Let's blame it on the shard slowness.
Comment 1 by yuhsuan@chromium.org
, Aug 28