Issue metadata
Sign in to add a comment
|
Swarming job on Android completed successfully but collection of results timed out |
||||||||||||||||||||||||
Issue descriptionIn this tryjob: https://build.chromium.org/p/tryserver.chromium.android/builders/android_optional_gpu_tests_rel/builds/1243 which ran this Swarming job on the Nexus 5X pool: https://chromium-swarm.appspot.com/task?id=32e963e03b9b7510&refresh=10&show_raw=1 the job completed successfully, but per the logs from the recipe, the collection step timed out. There are a couple of instances of this on https://build.chromium.org/p/tryserver.chromium.android/builders/android_optional_gpu_tests_rel?numbuilds=200 -- and, actually, these are essentially the last source of flakiness seen today per Issue 596622 . Are there enough logs for someone to see what happened to the collect step? ---------- python -u /b/c/b/android/src/tools/swarming_client/swarming.py collect --swarming https://chromium-swarm.appspot.com --decorate --print-status-updates --json /tmp/tmp5HILAV.json --task-output-dir /tmp/tmpGdGNrW in dir /b/c/b/android: @@@STEP_LINK@stdout-->stdio@https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Ftryserver.chromium.android%2Fandroid_optional_gpu_tests_rel%2F1243%2F%2B%2Frecipes%2Fsteps%2Fwebgl_conformance_tests__with_patch__on_Android%2F0%2Fstdout@@@ allow_subannotations: False base_name: webgl_conformance_tests (with patch) on Android cmd: ['python', '-u', '/b/c/b/android/src/tools/swarming_client/swarming.py', 'collect', '--swarming', 'https://chromium-swarm.appspot.com', '--decorate', '--print-status-updates', '--json', '/tmp/tmp5HILAV.json', '--task-output-dir', '/tmp/tmpGdGNrW'] cwd: /b/c/b/android env: {'GOMA_SERVICE_ACCOUNT_JSON_FILE': '/creds/service_accounts/service-account-goma-client.json', 'PATH': '/b/c/b/android/src/third_party/android_tools/sdk/platform-tools:/b/c/b/android/src/build/android:%(PATH)s'} infra_step: False name: webgl_conformance_tests (with patch) on Android nest_level: 0 ok_ret: frozenset([0]) step_test_data: <lambda>(...) trigger_specs: [] full environment: AWS_CREDENTIAL_FILE: /b/build/site_config/.boto BOTO_CONFIG: /b/build/site_config/.boto BUILDBOT_BLAMELIST: [u'geofflang@chromium.org'] BUILDBOT_BRANCH: BUILDBOT_BUILDBOTURL: https://build.chromium.org/p/tryserver.chromium.android/ BUILDBOT_BUILDERNAME: android_optional_gpu_tests_rel BUILDBOT_BUILDNUMBER: 1243 BUILDBOT_CLOBBER: BUILDBOT_GOT_REVISION: None BUILDBOT_MASTERNAME: tryserver.chromium.android BUILDBOT_REVISION: BUILDBOT_SCHEDULER: None BUILDBOT_SLAVENAME: slave1000-c4 CHROME_HEADLESS: 1 DISPLAY: :0.0 GIT_USER_AGENT: linux2 git/2.11.0 slave1000-c4.c.chromecompute.google.com.internal GOMA_SERVICE_ACCOUNT_JSON_FILE: /creds/service_accounts/service-account-goma-client.json HOME: /home/chrome-bot LANG: en_US.UTF-8 LOGDOG_STREAM_PREFIX: bb/tryserver.chromium.android/android_optional_gpu_tests_rel/1243 LOGDOG_STREAM_PROJECT: chromium LOGDOG_STREAM_SERVER_PATH: unix:/b/build/rr/tmpurkoVW/butler.sock PAGER: cat PATH: /b/c/b/android/src/third_party/android_tools/sdk/platform-tools:/b/c/b/android/src/build/android:/home/chrome-bot/slavebin:/b/depot_tools:/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin PWD: /b/build/slave/android/build PYTHONPATH: /b/rr/tmpeA5m8p/rw/checkout/scripts:/b/rr/tmpeA5m8p/rw/checkout/site_config:/b/rr/tmpeA5m8p/rw/checkout/third_party:/b/rr/tmpeA5m8p/rw/checkout/third_party/buildbot_8_4p1:/b/rr/tmpeA5m8p/rw/checkout/third_party/buildbot_slave_8_4:/b/rr/tmpeA5m8p/rw/checkout/third_party/coverage-3.7.1:/b/rr/tmpeA5m8p/rw/checkout/third_party/decorator_3_3_1:/b/rr/tmpeA5m8p/rw/checkout/third_party/google_api_python_client:/b/rr/tmpeA5m8p/rw/checkout/third_party/httplib2/python2:/b/rr/tmpeA5m8p/rw/checkout/third_party/infra_libs:/b/rr/tmpeA5m8p/rw/checkout/third_party/jinja2:/b/rr/tmpeA5m8p/rw/checkout/third_party/markupsafe:/b/rr/tmpeA5m8p/rw/checkout/third_party/mock-1.0.1:/b/rr/tmpeA5m8p/rw/checkout/third_party/oauth2client:/b/rr/tmpeA5m8p/rw/checkout/third_party/pyasn1:/b/rr/tmpeA5m8p/rw/checkout/third_party/pyasn1-modules:/b/rr/tmpeA5m8p/rw/checkout/third_party/python-rsa:/b/rr/tmpeA5m8p/rw/checkout/third_party/requests_2_10_0:/b/rr/tmpeA5m8p/rw/checkout/third_party/setuptools-0.6c11:/b/rr/tmpeA5m8p/rw/checkout/third_party/sqlalchemy_0_7_1:/b/rr/tmpeA5m8p/rw/checkout/third_party/sqlalchemy_migrate_0_7_1:/b/rr/tmpeA5m8p/rw/checkout/third_party/tempita_0_5:/b/rr/tmpeA5m8p/rw/checkout/third_party/twisted_10_2:/b/rr/tmpeA5m8p/rw/checkout/third_party/uritemplate:/b/rr/tmpeA5m8p/rw/checkout/third_party/site-packages:/b/rr/tmpeA5m8p/rw/checkout/scripts/slave/recipe_modules/test_results/resources:/b/rr/tmpeA5m8p/rw/checkout/scripts/slave/.recipe_deps/recipe_engine/recipe_engine/third_party:/b/rr/tmpeA5m8p/rw/checkout/scripts/slave/.recipe_deps/recipe_engine/recipe_engine/third_party/requests:/b/rr/tmpeA5m8p/rw/checkout/scripts/slave/.recipe_deps/recipe_engine/recipe_engine/third_party/six:/b/rr/tmpeA5m8p/rw/checkout/scripts/slave/.recipe_deps/recipe_engine/recipe_engine/third_party/client-py:/b/rr/tmpeA5m8p/rw/checkout/scripts/slave/.recipe_deps/recipe_engine/recipe_engine/third_party/mock-1.0.1:/b/rr/tmpeA5m8p/rw/checkout/scripts/slave/.recipe_deps/recipe_engine/recipe_engine/third_party/astunparse:/b/rr/tmpeA5m8p/rw/checkout/scripts/slave/.recipe_deps/recipe_engine:/b/build/site_config:/b/build/scripts:/b/build/scripts/release:/b/build/third_party:/b/build/third_party/requests_2_10_0:/b/build_internal/site_config:/b/build_internal/symsrc:/b/build/slave:/b/build/third_party/buildbot_slave_8_4:/b/build/third_party/twisted_10_2:/b/build/slave/android/build:/usr/lib/python2.7:/usr/lib/python2.7/plat-x86_64-linux-gnu:/usr/lib/python2.7/lib-tk:/usr/lib/python2.7/lib-old:/usr/lib/python2.7/lib-dynload PYTHONUNBUFFERED: 1 TESTING_SLAVENAME: slave1000-c4 USER: chrome-bot USERNAME: chrome-bot Waiting for results from the following shards: 0 command timed out: 6900 seconds elapsed, attempting to kill process killed by signal 9 program finished with exit code -1 elapsedTime=6900.007838
,
Dec 5 2016
M-A indicated on Issue 670866 that there might be a race condition in the Swarming server. Could this be another symptom of the same issue?
,
Dec 5 2016
No, compile alone took 1h17m. It waited for 26m but the task took 34m. So the global timeout is the problem here. Swarming correctly behaved.
,
Dec 5 2016
Ohhhhhh. Thanks, I see. I'll increase that timeout.
,
Dec 5 2016
I'm taking care of that in issue 665492
,
Dec 5 2016
Can I take that? I'm preparing a CL now.
,
Dec 5 2016
That bot is in the same pool as linux_android_rel_ng and android_n5x_swarming_rel (and only those bots). Is there a reason for that? If so, we should make sure we have capacity for longer-running builds from this bot; if not, we should move it into ccompute_optional_bots.
,
Dec 5 2016
We mimicked the configuration of android_n5x_swarming_rel when setting up this tryserver. It runs very few jobs so I don't think it will be hogging capacity. Let me know if it seems to be a problem. Going to duplicate this into Yuly's Issue 665492 .
,
Dec 5 2016
Ah. If that's the case, it's probably not a problem in practice, but it should still probably be using ccompute_optional_bots rather than the CQ pool.
,
Dec 5 2016
I'd like ccompute_optional_bots to be expanded out if we're going to put android_optional_gpu_tests_rel on it. It is used by the ANGLE team and enough Chromium developers that we don't want jobs waiting for a builder to pick them up (and right now there are only 4 machines in the ccompute_optional_bots pool).
,
Dec 5 2016
That seems reasonable enough. |
|||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||
Comment 1 by jbudorick@chromium.org
, Dec 5 2016