Issue 602573

Starred by 4 users

Issue metadata

Status: Archived
Owner: ----
Closed: Aug 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 2
Type: Bug



run_isolated upload failure didn't mark the task as internal failure

Project Member Reported by kbr@chromium.org, Apr 12 2016

Issue description

The linux_chromium_asan_rel_ng trybot is failing many tryjobs while running net_unittests due to the following failure after shards complete successfully:

16792 2016-04-12 02:17:09.658 E: Unable to open given url, https://isolateserver.appspot.com/_ah/api/isolateservice/v1/preupload, after 30 attempts.
HTTPSConnectionPool(host='isolateserver.appspot.com', port=443): Max retries exceeded with url: /_ah/api/isolateservice/v1/preupload (Caused by NewConnectionError('<third_party.requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x36669d0>: Failed to establish a new connection: [Errno 101] Network is unreachable',))
16792 2016-04-12 02:17:09.665 E: Leaking out_dir /tmp/outdSTiKa: Failed to execute preupload query
Traceback (most recent call last):
  File "/b/swarm_slave/swarming_bot.1.zip/client/run_isolated.py", line 402, in map_and_run
    storage, out_dir, leak_temp_dir)
  File "/b/swarm_slave/swarming_bot.1.zip/client/run_isolated.py", line 248, in delete_and_upload
    storage, [out_dir], None)
  File "/b/swarm_slave/swarming_bot.1.zip/client/isolateserver.py", line 2050, in archive_files_to_storage
    uploaded = storage.upload_items(items_to_upload)
  File "/b/swarm_slave/swarming_bot.1.zip/client/isolateserver.py", line 449, in upload_items
    for missing_item, push_state in self.get_missing_items(items):
  File "/b/swarm_slave/swarming_bot.1.zip/client/isolateserver.py", line 628, in get_missing_items
    for missing_item, push_state in channel.pull().iteritems():
  File "/b/swarm_slave/swarming_bot.1.zip/utils/threading_utils.py", line 377, in _task_executer
    result = func(*args, **kwargs)
  File "/b/swarm_slave/swarming_bot.1.zip/client/isolateserver.py", line 618, in contains
    return self._storage_api.contains(batch)
  File "/b/swarm_slave/swarming_bot.1.zip/client/isolateserver.py", line 1112, in contains
    'Failed to execute preupload query')
MappingError: Failed to execute preupload query


Example failures of this nature:

https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_asan_rel_ng/builds/144311
https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_asan_rel_ng/builds/144306
https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_asan_rel_ng/builds/144305
https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_asan_rel_ng/builds/144303
https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_asan_rel_ng/builds/144297
https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_asan_rel_ng/builds/144295
https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_asan_rel_ng/builds/144292
https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_asan_rel_ng/builds/144290

https://chromium-swarm.appspot.com/user/task/2e21d81d25e9be10
https://chromium-swarm.appspot.com/user/task/2e21d81f6de79310
https://chromium-swarm.appspot.com/user/task/2e21d822bd1b8e10
https://chromium-swarm.appspot.com/user/task/2e21d823ffb67810

Marking P0. Requires immediate attention as it's blocking CQ jobs.

 
Labels: -Pri-0 Pri-1
I believe this was due to a GCE outage: https://status.cloud.google.com/incident/compute/16007
The latest build I can find with that error is from before midnight PST, which is more than 3 hours ago, so this is no longer Pri-0.

maruel@: Is this exit code 0 for the tests, or for the swarming process itself? If the latter, isn't that a bug?

8856 2016-04-12 02:16:20.204 E: Unable to open given url, https://isolateserver.appspot.com/_ah/api/isolateservice/v1/preupload, after 30 attempts.
HTTPSConnectionPool(host='isolateserver.appspot.com', port=443): Max retries exceeded with url: /_ah/api/isolateservice/v1/preupload (Caused by NewConnectionError('<third_party.requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x4923790>: Failed to establish a new connection: [Errno 101] Network is unreachable',))
8856 2016-04-12 02:16:20.222 E: Leaking out_dir /tmp/outx1ZU84: Failed to execute preupload query
Traceback (most recent call last):
  ----< cut, see above >----
MappingError: Failed to execute preupload query
+----------------------------------------------------------------------------+
| End of shard 3  Pending: N/A  Duration: 1557.0s  Bot: swarm241-c4  Exit: 0 |
+----------------------------------------------------------------------------+

Comment 2 by mar...@chromium.org, Apr 12 2016

Labels: -Infra-Troopers
Status: Available (was: Untriaged)
Summary: run_isolated upload failure didn't mark the task as internal failure (was: linux_chromium_asan_rel_ng failing net_unittests due to Swarming internal failure)
The fact that upload failure wasn't marked as internal failure is a bug.

Removing the Infra-Troopers label since this was caused by the GCE incident.
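
To illustrate the behavior asked for here, below is a minimal Python sketch. It is not the actual run_isolated.py code or the eventual fix. The names `UploadError`, `run_and_upload` and `failing_upload` are hypothetical, and the upload step is simulated. The idea is that the command's exit code and the outcome of the output upload are tracked separately, so a shard whose tests exit 0 is still flagged when its outputs could not be uploaded.

```python
import subprocess
import sys


class UploadError(Exception):
    """Hypothetical stand-in for an upload failure such as MappingError."""


def run_and_upload(cmd, upload):
    """Runs cmd, then calls upload(), and reports both outcomes separately.

    Returns a dict with the command's exit code and an 'internal_failure'
    message, which is None only when the upload step also succeeded.
    """
    result = {'exit_code': None, 'internal_failure': None}
    result['exit_code'] = subprocess.call(cmd)
    try:
        upload()
    except UploadError as e:
        # The tests may have passed, but their outputs were lost, so the task
        # as a whole should not be reported as a clean success.
        result['internal_failure'] = 'Failed to upload outputs: %s' % e
    return result


def failing_upload():
    # Simulates the preupload query failing because the network is unreachable.
    raise UploadError('Failed to execute preupload query')


# The command itself succeeds (exit code 0), but the upload step fails.
outcome = run_and_upload([sys.executable, '-c', 'pass'], failing_upload)
print(outcome)
```

In this sketch a caller would treat any non-None `internal_failure` as an infrastructure error regardless of `exit_code`, rather than showing "Exit: 0" for a shard whose results never reached the server.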

Comment 3 by aga...@chromium.org, Apr 26 2016

Components: Infra>Platform>Swarming
Labels: -Infra-Swarming

Comment 4 by sheriffbot@chromium.org, Aug 9 2017

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available. If you change it back, also remove the "Hotlist-Recharge-Cold" label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Status: Archived (was: Untriaged)
I'm pretty sure I fixed this specific problem a bit later.
