Dashboard upload failures on large tests split across shards
Issue description

Dashboard uploads are failing with a 500 error when we upload results that were run on two different shards. Currently we just concatenate results from different shards together and upload them. Maybe the size is too big when we just concat?

OBBS Mac 10.12 Perf: system_health.memory_desktop failing

Link to most recent failing job: https://uberchromegw.corp.google.com/i/chromium.perf.fyi/builders/OBBS%20Mac%2010.12%20Perf/builds/7

Here are the perf results we are trying to upload: https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf.fyi%2FOBBS_Mac_10.12_Perf%2F7%2F%2B%2Fsystem_health.memory_desktop

If you look at the perf result sizes produced:
https://isolateserver.appspot.com/browse?namespace=default-gzip&hash=45b3d19f337884580a4e451f68e7f5ab412a1476
and
https://isolateserver.appspot.com/browse?namespace=default-gzip&hash=80cff9b30b2d311f9b567ea0464b7e5ecbf7d658
4664203 + 42101124 = 46765327

They are roughly on par with the perf results that are generated when you run all the stories on one shard:
https://isolateserver.appspot.com/browse?namespace=default-gzip&hash=8af4acb16aeaaaf09c62e75dea6c132b8a8be143
46643885

About a difference of 343306 bytes. I am not sure if adding diagnostics to both sets of data is tipping the scales?

android-pixel2-perf: rendering.mobile and system_health.memory_mobile failing

Link to most recently failing build: https://uberchromegw.corp.google.com/i/chromium.perf.fyi/builders/android-pixel2-perf/builds/812

Also with 500 errors.
Jun 5 2018
Looking at it.
Jun 5 2018
As far as I can tell, this is actually failing a validation check on the dashboard. It's complaining that the diagnostics (specifically tagmap) aren't the same across all the histograms. If you could supply the pre-merged histogram data, I could debug this further.
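The kind of check described above can be sketched roughly as follows. This is only an illustration, not the dashboard's actual validation code: the `TagMap` alias, the plain-dict shape (tag name mapped to a set of story names), and the helper `has_single_tagmap` are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical shape for illustration: tag name -> set of story names.
TagMap = Dict[str, Set[str]]


def has_single_tagmap(tagmaps: List[TagMap]) -> bool:
    """Return True when all tag maps are identical (effectively one map)."""
    return all(tagmap == tagmaps[0] for tagmap in tagmaps[1:])


# Each shard ran a different subset of stories, so the maps differ.
shard_a = {"memory": {"story_1"}}
shard_b = {"memory": {"story_2"}, "load": {"story_2"}}

if not has_single_tagmap([shard_a, shard_b]):
    print("tag maps differ across shards; upload would be rejected")
```

Under these assumptions, naively concatenating per-shard results leaves more than one distinct tag map in the upload, which is the condition the check complains about.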
Jun 5 2018
So the second system_health.memory_desktop is too big for the swarming isolated outputs. Is rendering.mobile failing in the same way? Here are the links to the two sets of perf_results output by telemetry:
first shard: https://isolateserver.appspot.com/browse?namespace=default-gzip&digest=e5a510b048d44d0182e614553680b8852f3f4d6b&as=perf_results.json
second shard: https://isolateserver.appspot.com/browse?namespace=default-gzip&digest=9e34bd1b0ca86eb4251c78d9106d8383d146a645&as=perf_results.json
Jun 5 2018
media.mobile, rendering.mobile, loading.mobile, etc. It kinda seems like every upload from OBBS is failing with invalid data. Thanks for the links, I'll download those and get back to you.
Jun 5 2018
OK, the links I sent you are from the "android-pixel2-perf" configuration. system_health.memory_desktop was from "OBBS Mac 10.12 Perf". I am not seeing the failures from media.mobile and loading.mobile that you are, but hopefully they are all the same problem. Thanks for investigating!
Jun 5 2018
+benjhayden From poking through output, it looks like the tagmap generated for each shard is different. The expectation on the dashboard is that there's only 1 and it's global across everything. We could force telemetry to generate the entire tagmap, but I kinda think telemetry is doing the right thing in that when you run with a subset, you get a subset of the tagmap. Could just have a list of diagnostics that add_reserved_diagnostics is expected to merge together. Any other ideas?
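A minimal sketch of the merge idea, assuming each shard's tag map can be treated as a plain dict from tag name to a set of story names. The function `merge_tagmaps` and the dict shape are hypothetical and are not the real TagMap class or the add_reserved_diagnostics code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def merge_tagmaps(tagmaps: Iterable[Dict[str, Set[str]]]) -> Dict[str, Set[str]]:
    """Union per-shard tag maps into one map covering every shard."""
    merged = defaultdict(set)
    for tagmap in tagmaps:
        for tag, stories in tagmap.items():
            # A tag seen on several shards collects the stories from all of them.
            merged[tag].update(stories)
    return dict(merged)


shard_a = {"memory": {"story_1"}}
shard_b = {"memory": {"story_2"}, "load": {"story_2"}}
merged = merge_tagmaps([shard_a, shard_b])
```

With this approach each shard still only reports the tags for the stories it ran, and the single global map is rebuilt at merge time rather than forcing telemetry to emit the full map on every shard.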
Jun 5 2018
+1 to merging tagmaps in add_reserved_diagnostics.
Jun 5 2018
SG, will have a CL up shortly.
Jun 7 2018
The following revision refers to this bug:
https://chromium.googlesource.com/catapult/+/6160b40d1ce394e3b19c160775bd9afd81281de2

commit 6160b40d1ce394e3b19c160775bd9afd81281de2
Author: Simon <simonhatch@chromium.org>
Date: Thu Jun 07 16:15:46 2018

HistogramSet - Merge TagMap diagnostics in add_reserved_diagnostics

We require there be only 1, but coming out of OBBS we may have several (one from each shard). Merge them all back together.

Bug: chromium:849752
Change-Id: Id3f9f8003066e29385b64725cfbbcb5dfb62fad6
Reviewed-on: https://chromium-review.googlesource.com/1087584
Reviewed-by: Ben Hayden <benjhayden@chromium.org>
Commit-Queue: Simon Hatch <simonhatch@chromium.org>

[modify] https://crrev.com/6160b40d1ce394e3b19c160775bd9afd81281de2/tracing/tracing/value/diagnostics/add_reserved_diagnostics_unittest.py
[modify] https://crrev.com/6160b40d1ce394e3b19c160775bd9afd81281de2/tracing/tracing/value/diagnostics/add_reserved_diagnostics.py
Jun 7 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/fa70f90f5307a90eee037b058283a924751008db

commit fa70f90f5307a90eee037b058283a924751008db
Author: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Thu Jun 07 19:45:18 2018

Roll src/third_party/catapult 1331d4b..72a8685 (4 commits)

https://chromium.googlesource.com/catapult.git/+log/1331d4b..72a8685

git log 1331d4b..72a8685 --date=short --no-merges --format='%ad %ae %s'
2018-06-07 simonhatch@chromium.org Dashboard - Remove some unused indexes from TestMetadata.
2018-06-07 simonhatch@chromium.org Dashboard - Skip empty last_ran_timestamps
2018-06-07 simonhatch@chromium.org HistogramSet - Merge TagMap diagnostics in add_reserved_diagnostics
2018-06-07 simonhatch@chromium.org Dashboard - Removed unused global from pinpoint_request.py

Created with:
gclient setdep -r src/third_party/catapult@72a8685

The AutoRoll server is located here: https://catapult-roll.skia.org

Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel

BUG= chromium:849752
TBR=sullivan@chromium.org

Change-Id: I34997ad164552bf636bd13c5dce8f68fd2f47c16
Reviewed-on: https://chromium-review.googlesource.com/1091173
Reviewed-by: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Commit-Queue: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#565382}

[modify] https://crrev.com/fa70f90f5307a90eee037b058283a924751008db/DEPS
Jun 8 2018
https://uberchromegw.corp.google.com/i/chromium.perf.fyi/builders/OBBS%20Mac%2010.12%20Perf/builds/19
Looks green, marking fixed.
Comment 1 by eyaich@chromium.org, Jun 5 2018