
Issue 849752


Issue metadata

Status: Fixed
Owner:
Closed: Jun 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android, Mac
Pri: 1
Type: Bug




Dashboard upload failures on large tests split across shards

Project Member Reported by eyaich@chromium.org, Jun 5 2018

Issue description

Dashboard uploads are failing with a 500 error when we upload results from a test that was run across two different shards. Currently we just concatenate the results from the different shards and upload them. Maybe the payload is too big when we just concatenate?

OBBS Mac 10.12 Perf: system_health.memory_desktop failing
Link to most recent failing job: https://uberchromegw.corp.google.com/i/chromium.perf.fyi/builders/OBBS%20Mac%2010.12%20Perf/builds/7

Here are the perf results we are trying to upload: 
https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf.fyi%2FOBBS_Mac_10.12_Perf%2F7%2F%2B%2Fsystem_health.memory_desktop

If you look at the perf result sizes produced: 
https://isolateserver.appspot.com/browse?namespace=default-gzip&hash=45b3d19f337884580a4e451f68e7f5ab412a1476

and

https://isolateserver.appspot.com/browse?namespace=default-gzip&hash=80cff9b30b2d311f9b567ea0464b7e5ecbf7d658

4664203 + 42101124 = 46765327

They are roughly on par with the perf results that are generated when you run all the stories on one shard: 
https://isolateserver.appspot.com/browse?namespace=default-gzip&hash=8af4acb16aeaaaf09c62e75dea6c132b8a8be143

46643885

A difference of about 121442 bytes (46765327 - 46643885).  I am not sure if adding diagnostics to both sets of data is tipping the scales?
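
As a quick sanity check of the size comparison, here is a small sketch that recomputes the totals from the three byte counts listed above. The variable names are illustrative only; the numbers are copied from this report.

```python
# Byte counts copied from the isolate listings in this report.
shard_sizes = [4664203, 42101124]   # two-shard run
single_shard_size = 46643885        # all stories on one shard

combined = sum(shard_sizes)
difference = combined - single_shard_size
overhead_pct = 100.0 * difference / single_shard_size

print(combined)                 # 46765327
print(difference)               # 121442
print(round(overhead_pct, 2))   # 0.26
```

On these figures the concatenated output is only a fraction of a percent larger than the single-shard output, which suggests size alone is unlikely to explain the failure.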


android-pixel2-perf: rendering.mobile and system_health.memory_mobile failing

Link to most recent failing build: 
https://uberchromegw.corp.google.com/i/chromium.perf.fyi/builders/android-pixel2-perf/builds/812

Also with 500 errors.  



 
Cc: nednguyen@chromium.org
Owner: simonhatch@chromium.org
Looking at it.
As far as I can tell, this is actually failing a validation check on the dashboard. It's complaining that the diagnostics (specifically tagmap) aren't the same across all the histograms.

If you could supply the pre-merged histogram data, I could debug this further.
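
To illustrate the kind of mismatch described here, this is a rough sketch that counts the distinct TagMap diagnostics in a concatenated result list. It assumes each entry is a dict and that TagMap entries carry "type": "TagMap" and a "tagsToStoryNames" mapping; the helper name and the sample data are hypothetical, and this is not the dashboard's actual validation code.

```python
def distinct_tag_maps(entries):
    """Return the distinct tag-to-stories mappings found in a list of dicts.

    Assumes TagMap entries look like
    {"type": "TagMap", "tagsToStoryNames": {tag: [story, ...]}}.
    """
    seen = []
    for entry in entries:
        if entry.get("type") != "TagMap":
            continue
        # Normalize story order so equal maps compare equal.
        mapping = {
            tag: sorted(stories)
            for tag, stories in entry.get("tagsToStoryNames", {}).items()
        }
        if mapping not in seen:
            seen.append(mapping)
    return seen


# Made-up data: each shard ran a different subset of stories.
shard_one = [{"type": "TagMap", "guid": "a",
              "tagsToStoryNames": {"health_check": ["story_a"]}}]
shard_two = [{"type": "TagMap", "guid": "b",
              "tagsToStoryNames": {"health_check": ["story_b"]}}]

concatenated = shard_one + shard_two
print(len(distinct_tag_maps(concatenated)))  # 2
```

A result of more than one distinct map would be consistent with a check that expects the tagmap to be identical across all histograms.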
So the second system_health.memory_desktop result is too big for the swarming isolated outputs. Is rendering.mobile failing in the same way?  Here are the links to the two sets of perf_results output by telemetry: 

first shard: 
https://isolateserver.appspot.com/browse?namespace=default-gzip&digest=e5a510b048d44d0182e614553680b8852f3f4d6b&as=perf_results.json

second shard:
https://isolateserver.appspot.com/browse?namespace=default-gzip&digest=9e34bd1b0ca86eb4251c78d9106d8383d146a645&as=perf_results.json
media.mobile, rendering.mobile, loading.mobile, etc.: it seems like every upload from OBBS is failing with invalid data.

Thanks for the links, I'll download those and get back to you.
OK, the links I sent you are from the "android-pixel2-perf" configuration.  

system_health.memory_desktop was from "OBBS Mac 10.12 Perf". I am not seeing the failures from media.mobile and loading.mobile that you are, but hopefully they are all the same problem. 

Thanks for investigating!

Cc: benjhayden@chromium.org
+benjhayden

From poking through output, it looks like the tagmap generated for each shard is different. The expectation on the dashboard is that there's only 1 and it's global across everything.

We could force telemetry to generate the entire tagmap, but I kinda think telemetry is doing the right thing in that when you run with a subset, you get a subset of the tagmap.

We could just have a list of diagnostics that add_reserved_diagnostics is expected to merge together.

Any other ideas?
+1 to merging tagmaps in add_reserved_diagnostics.
Sounds good, I'll have a CL up shortly.
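
For illustration only, the merge being discussed could look roughly like the sketch below: take the per-shard tag-to-stories mappings and union the story names under each tag. The function name and sample data are hypothetical, and this is not the code from the CL.

```python
from collections import defaultdict


def merge_tag_maps(tag_maps):
    """Union several {tag: [story, ...]} mappings into a single mapping."""
    merged = defaultdict(set)
    for tag_map in tag_maps:
        for tag, stories in tag_map.items():
            merged[tag].update(stories)
    # Sort for deterministic output regardless of shard order.
    return {tag: sorted(stories) for tag, stories in sorted(merged.items())}


# Made-up per-shard tag maps; each shard only knows the stories it ran.
shard_one_tags = {"health_check": ["story_a"], "2018": ["story_a"]}
shard_two_tags = {"health_check": ["story_b"]}

merged_tags = merge_tag_maps([shard_one_tags, shard_two_tags])
print(merged_tags)
# {'2018': ['story_a'], 'health_check': ['story_a', 'story_b']}
```

Using sets makes the merge insensitive to shard order and to a story appearing in more than one shard's map.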
Project Member

Comment 10 by bugdroid1@chromium.org, Jun 7 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/6160b40d1ce394e3b19c160775bd9afd81281de2

commit 6160b40d1ce394e3b19c160775bd9afd81281de2
Author: Simon <simonhatch@chromium.org>
Date: Thu Jun 07 16:15:46 2018

HistogramSet - Merge TagMap diagnostics in add_reserved_diagnostics

We require there be only 1, but coming out of OBBS we may have several
(one from each shard). Merge them all back together.

Bug:  chromium:849752 
Change-Id: Id3f9f8003066e29385b64725cfbbcb5dfb62fad6
Reviewed-on: https://chromium-review.googlesource.com/1087584
Reviewed-by: Ben Hayden <benjhayden@chromium.org>
Commit-Queue: Simon Hatch <simonhatch@chromium.org>

[modify] https://crrev.com/6160b40d1ce394e3b19c160775bd9afd81281de2/tracing/tracing/value/diagnostics/add_reserved_diagnostics_unittest.py
[modify] https://crrev.com/6160b40d1ce394e3b19c160775bd9afd81281de2/tracing/tracing/value/diagnostics/add_reserved_diagnostics.py

Project Member

Comment 11 by bugdroid1@chromium.org, Jun 7 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/fa70f90f5307a90eee037b058283a924751008db

commit fa70f90f5307a90eee037b058283a924751008db
Author: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Thu Jun 07 19:45:18 2018

Roll src/third_party/catapult 1331d4b..72a8685 (4 commits)

https://chromium.googlesource.com/catapult.git/+log/1331d4b..72a8685


git log 1331d4b..72a8685 --date=short --no-merges --format='%ad %ae %s'
2018-06-07 simonhatch@chromium.org Dashboard - Remove some unused indexes from TestMetadata.
2018-06-07 simonhatch@chromium.org Dashboard - Skip empty last_ran_timestamps
2018-06-07 simonhatch@chromium.org HistogramSet - Merge TagMap diagnostics in add_reserved_diagnostics
2018-06-07 simonhatch@chromium.org Dashboard - Removed unused global from pinpoint_request.py


Created with:
  gclient setdep -r src/third_party/catapult@72a8685

The AutoRoll server is located here: https://catapult-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel

BUG= chromium:849752 
TBR=sullivan@chromium.org

Change-Id: I34997ad164552bf636bd13c5dce8f68fd2f47c16
Reviewed-on: https://chromium-review.googlesource.com/1091173
Reviewed-by: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Commit-Queue: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#565382}
[modify] https://crrev.com/fa70f90f5307a90eee037b058283a924751008db/DEPS
