chromium-swarm.appspot.com returning 503 -- BadValueError: state must be COMPLETED on deduped task |
||||
Issue description"[trigger] (with patch)" is flaky. This issue was created automatically by the chromium-try-flakes app. Please find the right owner to fix the respective test/step and assign this issue to them. If the step/test is infrastructure-related, please add Infra-Troopers label and change issue status to Untriaged. When done, please remove the issue from Sheriff Bug Queue by removing the Sheriff-Chromium label. We have detected 32 recent flakes. List of all flakes can be found at https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyIQsSBUZsYWtlIhZbdHJpZ2dlcl0gKHdpdGggcGF0Y2gpDA. This flaky test/step was previously tracked in issue 604954 .
,
Aug 30 2016
,
Aug 30 2016
Found a row of HTTP 500s: https://screenshot.googleplex.com/3y1VvNQ5Kr6.png (https://pantheon.corp.google.com/logs/viewer?project=chromium-swarm&minLogLevel=0&expandAll=false&resource=appengine.googleapis.com&logName=%2Fprojects%2Fchromium-swarm%2Flogs%2Fappengine.googleapis.com%252Frequest_log&filters=status:500&filters=text:%2F_ah%2Fspi%2FSwarmingTasksService.new&advancedFilter=) Request: <NewTaskRequest (/base/data/home/apps/s~chromium-swarm/2300-1e18b15.395200628328903120/handlers_endpoints.py:296) expiration_secs: 3600 name: u'events_unittests (with patch)/Ubuntu-12.04/d0f6e5bbba/linux_chromium_rel_ng/289029' parent_task_id: u'' priority: 30 properties: <TaskProperties command: [] dimensions: [<StringPair key: u'cpu' value: u'x86-64'>, <StringPair key: u'gpu' value: u'none'>, <StringPair key: u'os' value: u'Ubuntu-12.04'>, <StringPair key: u'pool' value: u'Chrome'>] env: [] execution_timeout_secs: 3600 extra_args: [u'--test-launcher-summary-output=${ISOLATED_OUTDIR}/output.json'] grace_period_secs: 30 idempotent: True inputs_ref: <FilesRef isolated: u'd0f6e5bbbac03262d2c142de3e3683b0cc3a0050' isolatedserver: u'https://isolateserver.appspot.com' namespace: u'default-gzip'> io_timeout_secs: 1200> tags: [u'buildername:linux_chromium_rel_ng', u'buildnumber:289029', u'data:d0f6e5bbbac03262d2c142de3e3683b0cc3a0050', u'master:tryserver.chromium.linux', u'name:events_unittests', u'os:Ubuntu-12.04', u'patch_project:chromium', u'project:chromium', u'purpose:ManualTS', u'purpose:pre-commit', u'rietveld:https://codereview.chromium.org/2260303002/#ps220001', u'slavename:slave643-c4', u'stepname:events_unittests (with patch) on Ubuntu-12.04'] user: u'shivanisha@chromium.org'> ---- Stack trace: Traceback (most recent call last): File "third_party/protorpc-1.0/protorpc/wsgi/service.py", line 181, in protorpc_service_app response = method(instance, request) File "gae_ts_mon/config.py", line 274, in decorated ret = fn(service, *args, **kwargs) File "components/auth/endpoints_support.py", line 202, in wrapper return method(service, *args, **kwargs) File "third_party/endpoints-1.0/endpoints/api_config.py", line 1331, in invoke_remote return remote_method(service_instance, request) File "third_party/protorpc-1.0/protorpc/remote.py", line 414, in invoke_remote_method response = method(service_instance, request) File "components/auth/endpoints_support.py", line 161, in wrapper return func(service, *args, **kwargs) File "components/auth/api.py", line 952, in wrapper return func(*args, **kwargs) File "handlers_endpoints.py", line 306, in new result_summary = task_scheduler.schedule_request(request) File "server/task_scheduler.py", line 451, in schedule_request request, get_new_keys, extra=[task, result_summary]) File "components/utils.py", line 633, in sync return async_fn(*args, **kwargs).get_result() File "appengine/ext/ndb/tasklets.py", line 383, in get_result self.check_success() File "appengine/ext/ndb/tasklets.py", line 427, in _help_tasklet_along value = gen.throw(exc.__class__, exc, tb) File "components/datastore_utils/monotonic.py", line 116, in insert_async if (yield txn.transaction_async(run, retries=0)): File "appengine/ext/ndb/tasklets.py", line 427, in _help_tasklet_along value = gen.throw(exc.__class__, exc, tb) File "components/datastore_utils/txn.py", line 46, in transaction_async result = yield ndb.transaction_async(callback, **ctx_options) File "appengine/ext/ndb/tasklets.py", line 430, in _help_tasklet_along value = gen.send(val) File "appengine/ext/ndb/context.py", line 1016, in transaction result = yield result File "appengine/ext/ndb/tasklets.py", line 430, in _help_tasklet_along value = gen.send(val) File "components/datastore_utils/monotonic.py", line 102, in run yield ndb.put_multi_async(entities) File "appengine/ext/ndb/model.py", line 3937, in put_multi_async return [entity.put_async(**ctx_options) for entity in entities] File "appengine/ext/ndb/model.py", line 3466, in _put_async self._pre_put_hook() File "server/task_result.py", line 599, in _pre_put_hook 'state must be COMPLETED on deduped task') BadValueError: state must be COMPLETED on deduped task
,
Aug 30 2016
Based on server logs, events_unittests with isolate d0f6e5bbbac03262d2c142de3e3683b0cc3a0050 is the only such "broken" task and it stopped being used after ~9:00 AM PST today (probably the binary has changed, and now Swarming dedups it from some other task, that's in some valid state). So the immediate danger is gone (Pri-1 -> Pri-2). But there's a bug somewhere that allowed the broken state to happen in the first place. Tentatively assigning to M-A to investigate.
,
Aug 31 2016
The following revision refers to this bug: https://chromium.googlesource.com/external/github.com/luci/luci-py.git/+/62dac3d6729d3389437fb2710e3191eb8834f3db commit 62dac3d6729d3389437fb2710e3191eb8834f3db Author: maruel <maruel@chromium.org> Date: Wed Aug 31 21:57:21 2016 Add more safety checks. A rootless index has no consistency guarantee. So ensure that the returned value is good. There has been at least one case on production where the index returned an object matching properties_hash != None but had failed. R=vadimsh@chromium.org BUG= chromium:642511 Review-Url: https://codereview.chromium.org/2298053002 [modify] https://crrev.com/62dac3d6729d3389437fb2710e3191eb8834f3db/appengine/swarming/server/task_result.py [modify] https://crrev.com/62dac3d6729d3389437fb2710e3191eb8834f3db/appengine/swarming/server/task_scheduler.py
,
Aug 31 2016
The following revision refers to this bug: https://chromium.googlesource.com/external/github.com/luci/luci-py.git/+/62dac3d6729d3389437fb2710e3191eb8834f3db commit 62dac3d6729d3389437fb2710e3191eb8834f3db Author: maruel <maruel@chromium.org> Date: Wed Aug 31 21:57:21 2016 Add more safety checks. A rootless index has no consistency guarantee. So ensure that the returned value is good. There has been at least one case on production where the index returned an object matching properties_hash != None but had failed. R=vadimsh@chromium.org BUG= chromium:642511 Review-Url: https://codereview.chromium.org/2298053002 [modify] https://crrev.com/62dac3d6729d3389437fb2710e3191eb8834f3db/appengine/swarming/server/task_result.py [modify] https://crrev.com/62dac3d6729d3389437fb2710e3191eb8834f3db/appengine/swarming/server/task_scheduler.py
,
Aug 31 2016
|
||||
►
Sign in to add a comment |
||||
Comment 1 by dgro...@chromium.org
, Aug 30 2016Summary: chromium-swarm.appspot.com returning 503 -- "[trigger] events_unittests (with patch)" is flaky (was: "[trigger] (with patch)" is flaky)