
Issue 874856 link

Issue metadata

Status: Fixed
Owner:
Closed: Aug 24
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocking:
issue 867379




Dashboard - 500's on rendering.mobile uploads

Project Member Reported by simonhatch@chromium.org, Aug 16

Issue description

Looks like these uploads got really big recently. Checking the logs reveals that nearly 50% of the 60s deadline is spent in deserialization (json.loads + hs.ImportDicts) alone.

Only part of the histogramset is getting processed by the time the 60s deadline kicks in, so we'll see some data, but I'm unsure what % of the total histogramset is making it into /add_histogram_queue.


 
Blocking: 867379
Nexus5x webview rendering.mobile (55s):
json.loads:wall=3.451520
hs.ImportDicts:wall=13.377610

Nexus5x rendering.mobile (> 60s):
json.loads:wall=8.491890
hs.ImportDicts:wall=23.471280


Pulling some numbers from the logs, the rendering.mobile upload is enormous. We don't have logs on the size of the histogramset, but if json.loads scaling is representative, it would take upwards of 2 mins to process fully with the current implementation, given that the webview version takes nearly 1 min.
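The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes total processing time scales linearly with json.loads time, and it takes the 55s figure for the webview upload from the log excerpt. The variable names are illustrative.

```python
# Wall-clock timings copied from the log excerpts above (seconds).
WEBVIEW_JSON_LOADS = 3.451520
MOBILE_JSON_LOADS = 8.491890
WEBVIEW_TOTAL = 55.0  # the webview upload finished in roughly 55s

# Assumption: total processing time scales with json.loads time.
scale = MOBILE_JSON_LOADS / WEBVIEW_JSON_LOADS
estimated_total = WEBVIEW_TOTAL * scale

print("scale factor: %.2f" % scale)
print("estimated total: %.0fs" % estimated_total)
```

Under that assumption the scale factor comes out at about 2.5x, which puts the full rendering.mobile upload at a bit over 2 minutes, well past the 60s deadline.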

We can do some profiling here, but shaving off 50% of the processing time is a large undertaking. 

Another possibility would be to immediately write the data to cloud storage ourselves, and queue this on the backend to process. The downside to that is we can't report any invalid data issues at that point, unless we take a hybrid approach and do some quickie validation first.
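A rough sketch of what the hybrid approach could look like. Everything here is hypothetical: STORAGE and TASK_QUEUE are in-memory stand-ins for cloud storage and a task queue, and quick_validate only assumes the upload is a JSON list of dicts. It is not the dashboard's actual code.

```python
import json
import uuid
from collections import deque

# Hypothetical in-memory stand-ins for cloud storage and a task queue.
STORAGE = {}
TASK_QUEUE = deque()


def quick_validate(raw_body):
    """Cheap checks only: non-empty and shaped like a JSON list."""
    stripped = raw_body.strip()
    if not stripped:
        raise ValueError("empty upload")
    if not (stripped.startswith("[") and stripped.endswith("]")):
        raise ValueError("expected a JSON list of histogram dicts")


def handle_upload(raw_body):
    """Front-end handler: validate cheaply, persist, enqueue, return."""
    quick_validate(raw_body)
    key = "uploads/%s.json" % uuid.uuid4().hex
    STORAGE[key] = raw_body
    TASK_QUEUE.append(key)
    return key


def process_next():
    """Back-end worker: the expensive deserialization happens here."""
    key = TASK_QUEUE.popleft()
    dicts = json.loads(STORAGE.pop(key))
    return len(dicts)
```

The point of the split is that handle_upload never parses the payload, so its cost stays small no matter how large the histogramset gets; only obviously broken uploads are rejected while the client is still waiting.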
Silly question: is there any way to extend the 60s deadline to 180s as a band-aid fix?
Cc: sadrul@chromium.org
No, unfortunately not. Afaik it's a hard limit from AppEngine on FE requests.
Then writing the data to cloud storage and queueing it on the backend to process SGTM. If that's not easy, feel free to suggest another, easier approach from the Telemetry client and I would be happy to implement it.
Project Member

Comment 7 by bugdroid1@chromium.org, Aug 20

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/ba76717a8d18da91240f0c0a2dcb4a3aa2794f7c

commit ba76717a8d18da91240f0c0a2dcb4a3aa2794f7c
Author: Simon <simonhatch@chromium.org>
Date: Mon Aug 20 16:25:09 2018

Dashboard - Push HistogramSet processing into the backend

Currently, this is done in the initial call by the client to
/add_histograms. AppEngine imposes a hard 60s limit to respond to FE
requests, and the uploaded datasets are only getting bigger, not
smaller.

Latest rendering.mobile uploads look like they'd roughly take
about 2 minutes of processing time now (guesstimate based on timings of
what's able to actually run).

This CL writes the dataset out to cloud storage and then queues a task
to do the actual validation and processing of the data. This gives us
more breathing room, since we get 10 mins to run on the backend.

The downside is that validation errors could previously be
communicated back to the client, but now they'll only appear in
logs. We could think about passing back an ID from the initial
/add_histograms call so that the client could look up the status
of an upload later.

Bug:  chromium:874856 
Change-Id: I356b62a062b815bde98c4afbe9b2636cba62de56
Reviewed-on: https://chromium-review.googlesource.com/1179991
Commit-Queue: Simon Hatch <simonhatch@chromium.org>
Reviewed-by: Ethan Kuefner <eakuefner@chromium.org>

[modify] https://crrev.com/ba76717a8d18da91240f0c0a2dcb4a3aa2794f7c/dashboard/dashboard/add_histograms_test.py
[modify] https://crrev.com/ba76717a8d18da91240f0c0a2dcb4a3aa2794f7c/dashboard/dashboard/dispatcher.py
[modify] https://crrev.com/ba76717a8d18da91240f0c0a2dcb4a3aa2794f7c/dashboard/dashboard/add_histograms.py
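
The commit message floats the idea of handing back an ID from /add_histograms so a client can check on an upload later. A minimal sketch of that idea follows. The function names, the state strings, and the in-memory UPLOAD_STATUS dict are all hypothetical and are not part of the CL.

```python
import uuid

# Hypothetical status store; a real service would persist this.
UPLOAD_STATUS = {}


def start_upload():
    """Front end: register the upload and return an ID to poll with."""
    upload_id = uuid.uuid4().hex
    UPLOAD_STATUS[upload_id] = {"state": "PENDING", "error": None}
    return upload_id


def finish_upload(upload_id, process):
    """Backend task: run processing, record success or the error."""
    try:
        process()
        UPLOAD_STATUS[upload_id] = {"state": "COMPLETED", "error": None}
    except ValueError as e:
        UPLOAD_STATUS[upload_id] = {"state": "FAILED", "error": str(e)}


def get_status(upload_id):
    """Client-facing lookup; unknown IDs get an explicit UNKNOWN state."""
    return UPLOAD_STATUS.get(upload_id, {"state": "UNKNOWN", "error": None})
```

With something like this, a validation failure that now only shows up in logs would also be visible to whoever holds the upload ID.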

Project Member

Comment 8 by bugdroid1@chromium.org, Aug 20

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/128cd6c67006fdd2799963f1dd3e8188438711bc

commit 128cd6c67006fdd2799963f1dd3e8188438711bc
Author: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Mon Aug 20 18:39:58 2018

Roll src/third_party/catapult e8964a2cd375..ba76717a8d18 (3 commits)

https://chromium.googlesource.com/catapult.git/+log/e8964a2cd375..ba76717a8d18


git log e8964a2cd375..ba76717a8d18 --date=short --no-merges --format='%ad %ae %s'
2018-08-20 simonhatch@chromium.org Dashboard - Push HistogramSet processing into the backend
2018-08-20 wangge@google.com Add Retry if We Cannot Get Battery Info
2018-08-20 vollick@chromium.org Update branch parsing


Created with:
  gclient setdep -r src/third_party/catapult@ba76717a8d18

The AutoRoll server is located here: https://catapult-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel

BUG= chromium:874856 ,chromium:871748
TBR=sullivan@chromium.org

Change-Id: I3b691c48ea47b72c0b284a1bbd855817c07f2e50
Reviewed-on: https://chromium-review.googlesource.com/1181541
Reviewed-by: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Commit-Queue: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#584510}
[modify] https://crrev.com/128cd6c67006fdd2799963f1dd3e8188438711bc/DEPS

Project Member

Comment 9 by bugdroid1@chromium.org, Aug 20

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/f86bf0b3b84cf7dc70f8caf0ce0bc065cad2bbb9

commit f86bf0b3b84cf7dc70f8caf0ce0bc065cad2bbb9
Author: Simon <simonhatch@chromium.org>
Date: Mon Aug 20 19:18:28 2018

Dashboard - Fix gcs read.

Forgot to upload this local change before cq'ing.

TBR=eakuefner@chromium.org

Bug:  chromium:874856 
Change-Id: I66abd1eb7c3548f7b7b8fdcf1c6916cac8a9878a
Reviewed-on: https://chromium-review.googlesource.com/1181472
Reviewed-by: Simon Hatch <simonhatch@chromium.org>
Commit-Queue: Simon Hatch <simonhatch@chromium.org>

[modify] https://crrev.com/f86bf0b3b84cf7dc70f8caf0ce0bc065cad2bbb9/dashboard/dashboard/add_histograms.py

Project Member

Comment 10 by bugdroid1@chromium.org, Aug 22

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/bcc387a51abaa21aa580d1703ff431c1356c07b4

commit bcc387a51abaa21aa580d1703ff431c1356c07b4
Author: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Wed Aug 22 17:46:38 2018

Roll src/third_party/catapult ba76717a8d18..bbb04a38bbdb (23 commits)

https://chromium.googlesource.com/catapult.git/+log/ba76717a8d18..bbb04a38bbdb


git log ba76717a8d18..bbb04a38bbdb --date=short --no-merges --format='%ad %ae %s'
2018-08-22 wangge@google.com Fix Bug When No APK is present and Add Relevant Test Cases.
2018-08-22 anthonyalridge@google.com Initial application of mann whitney testing.
2018-08-22 benjhayden@chromium.org [chromeperf v2] Simple redux helpers.
2018-08-22 benjhayden@chromium.org Add icons for V2SPA.
2018-08-21 mseaborn@google.com [dashboard] Update docs to mention old issues filed in the Github tracker
2018-08-21 benjhayden@chromium.org Fix minify script for v2spa.
2018-08-21 benjhayden@chromium.org Add some utility functions to V2SPA.
2018-08-21 eakuefner@chromium.org [Tracing] Fix Pylint errors
2018-08-21 amyqiu@google.com Fix search bug in metrics visualization tool
2018-08-21 benjhayden@chromium.org Add ElementBase for V2SPA.
2018-08-21 benjhayden@chromium.org Plumb test case tag maps via test suite descriptors.
2018-08-21 vovoy@chromium.org Add story property: wpr_mode
2018-08-21 wangge@google.com Restructure Long Term Health Tool Output File Structure.
2018-08-20 benjhayden@chromium.org Add Material textarea for V2SPA.
2018-08-20 benjhayden@chromium.org Add checkbox to V2SPA.
2018-08-20 benjhayden@chromium.org Add cp-loading for V2SPA.
2018-08-20 benjhayden@chromium.org Add raised-button to V2SPA.
2018-08-20 chiniforooshan@chromium.org Telemetry: pixel metrics in TBMv2
2018-08-20 simonhatch@chromium.org Dashboard - Add a path for inserting out-of-order diagnostics.
2018-08-20 simonhatch@chromium.org Dashboard - Fix gcs read.
2018-08-20 chiniforooshan@chromium.org Telemetry: rename metrics as per  crbug.com/627461 
2018-08-20 simonhatch@chromium.org Dashboard - Cleanup unused masters and bots
2018-08-20 chiniforooshan@chromium.org Telemetry: break rendering_metric.html


Created with:
  gclient setdep -r src/third_party/catapult@bbb04a38bbdb

The AutoRoll server is located here: https://catapult-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel

BUG= chromium:863390 ,chromium:866423,chromium:862077, chromium:863390 ,chromium:760553, chromium:874856 , chromium:627461 ,chromium:760553
TBR=sullivan@chromium.org

Change-Id: I6f9a52e2e301d0e04b4800a9c57a33000ec51f26
Reviewed-on: https://chromium-review.googlesource.com/1185141
Reviewed-by: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Commit-Queue: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#585147}
[modify] https://crrev.com/bcc387a51abaa21aa580d1703ff431c1356c07b4/DEPS

Status: Started (was: Untriaged)
My spot check shows that we no longer have dashboard upload errors on the perf waterfall.

Simon: can you check the error log to confirm that this is fixed?
Status: Fixed (was: Started)
Processing of rendering.mobile is getting a lot further, and I don't see any 500's. The data seems to be malformed, though, and is getting rejected. I'll file a bug.
