Swarming tasks unable to reach isolateserver.appspot.com after all tests pass (net_unittests)
Issue description

There are two bots that have problems with their swarming execution:
https://build.chromium.org/p/chromium.memory/builders/Linux%20ASan%20LSan%20Tests%20(1)
https://build.chromium.org/p/chromium.linux/builders/Linux%20Tests%20(dbg)(1)(32)

It's only affecting net_unittests, which makes me believe something in the test is causing a crash somewhere which propagates into hard-to-read "some shards did not complete" errors.

By comparing the blamelists on the two bots I end up with:
https://chromium.googlesource.com/chromium/src/+log/cb7d55e9e2424f2676c5a4656548cdda69443fb2%5E..0646cce6b4397895614b152f1a1547e2075e40ea?pretty=fuller
as a common blamelist, but nothing stands out here (since I'm unable to find where the test actually fails/crashes).

If I take a closer look at one failure:
https://build.chromium.org/p/chromium.memory/builders/Linux%20ASan%20LSan%20Tests%20%281%29/builds/25123
has 4 missing shards. But when I look at those shards, all tests pass in each one of them. At the end of the run, there's an error like this:

SUCCESS: all tests passed.
Tests took 1065 seconds.
Additional test environment:
    ASAN_OPTIONS=symbolize=1 external_symbolizer_path=/tmp/runNZyNPJ/third_party/llvm-build/Release+Asserts/bin/llvm-symbolizer detect_leaks=1
    CHROME_DEVEL_SANDBOX=/opt/chromium/chrome_sandbox
    G_SLICE=always-malloc
    LANG=en_US.UTF-8
    LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/debug:
    LSAN_OPTIONS=
    NSS_DISABLE_ARENA_FREE_LIST=1
    NSS_DISABLE_UNLOAD=1
Command: ../out/Release/net_unittests --brave-new-test-launcher --test-launcher-bot-mode --test-launcher-print-test-stdio=always --test-launcher-batch-limit=1 --test-launcher-summary-output=/tmp/outhmFPuu/output.json --no-sandbox

24110 2016-04-12 02:19:53.051 E: Unable to open given url, https://isolateserver.appspot.com/_ah/api/isolateservice/v1/preupload, after 30 attempts.
HTTPSConnectionPool(host='isolateserver.appspot.com', port=443): Max retries exceeded with url: /_ah/api/isolateservice/v1/preupload (Caused by NewConnectionError('<third_party.requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x3872750>: Failed to establish a new connection: [Errno 101] Network is unreachable',))
24110 2016-04-12 02:19:53.064 E: Leaking out_dir /tmp/outhmFPuu: Failed to execute preupload query
Traceback (most recent call last):
  File "/b/swarm_slave/swarming_bot.1.zip/client/run_isolated.py", line 402, in map_and_run
    storage, out_dir, leak_temp_dir)
  File "/b/swarm_slave/swarming_bot.1.zip/client/run_isolated.py", line 248, in delete_and_upload
    storage, [out_dir], None)
  File "/b/swarm_slave/swarming_bot.1.zip/client/isolateserver.py", line 2050, in archive_files_to_storage
    uploaded = storage.upload_items(items_to_upload)
  File "/b/swarm_slave/swarming_bot.1.zip/client/isolateserver.py", line 449, in upload_items
    for missing_item, push_state in self.get_missing_items(items):
  File "/b/swarm_slave/swarming_bot.1.zip/client/isolateserver.py", line 628, in get_missing_items
    for missing_item, push_state in channel.pull().iteritems():
  File "/b/swarm_slave/swarming_bot.1.zip/utils/threading_utils.py", line 377, in _task_executer
    result = func(*args, **kwargs)
  File "/b/swarm_slave/swarming_bot.1.zip/client/isolateserver.py", line 618, in contains
    return self._storage_api.contains(batch)
  File "/b/swarm_slave/swarming_bot.1.zip/client/isolateserver.py", line 1112, in contains
    'Failed to execute preupload query')
MappingError: Failed to execute preupload query
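For triage, the pattern in the log above (tests pass, then the result upload fails) can be told apart from a real test failure by scanning the shard output for the relevant markers. The following is only a rough sketch: the function name `classify_shard_log` and the labels it returns are hypothetical and not part of swarming or run_isolated.py. Only the marker strings are taken from the log excerpt.

```python
import re

# Marker strings copied from the log excerpt in this report. The label
# strings returned below are invented for this sketch.
TESTS_PASSED = "SUCCESS: all tests passed."
UPLOAD_FAILED = re.compile(r"MappingError: Failed to execute preupload query")
NETWORK_DOWN = re.compile(r"\[Errno 101\] Network is unreachable")


def classify_shard_log(log_text):
    """Return a coarse, hypothetical triage label for one shard's log."""
    tests_ok = TESTS_PASSED in log_text
    upload_failed = bool(UPLOAD_FAILED.search(log_text))
    network_down = bool(NETWORK_DOWN.search(log_text))

    if tests_ok and upload_failed:
        # The tests finished cleanly; only the upload of results failed.
        if network_down:
            return "infra: network unreachable"
        return "infra: upload failed"
    if tests_ok:
        return "ok"
    # No success marker: a test failure or crash may be involved.
    return "needs investigation"
```

A shard log like the one quoted in this report would get the "infra: network unreachable" label, which points away from the blamelist and toward the bot's network or the isolate server.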
Apr 12 2016
https://status.cloud.google.com/incident/compute/16007 is the cause. Issue 602573 then caused a cascading failure.
Apr 26 2016
Comment 1 by kjellander@chromium.org, Apr 12 2016