Dashboard - 500's on rendering.mobile uploads
Issue description
Looks like these uploads got really big recently. Some checking in the logs reveals that nearly 50% of the 60s deadline is spent in deserialization (json.loads + ImportDicts) alone. Only part of the histogramset is getting processed by the time the 60s deadline kicks in, so we'll see some data, but I'm unsure what % of the total histogramset is making it into /add_histogram_queue.
,
Aug 16
Nexus5x webview rendering.mobile (55s):
json.loads:wall=3.451520
hs.ImportDicts:wall=13.377610

Nexus5x rendering.mobile (> 60s):
json.loads:wall=8.491890
hs.ImportDicts:wall=23.471280

So pulling some numbers from the logs, the rendering.mobile upload is enormous. I don't have logs on the size of the histogramset, but if json.loads scaling is representative, this would take upwards of 2 mins to process fully with the current implementation, given that the webview version takes nearly 1 min.

We can do some profiling here, but shaving off 50% of the processing time is a large undertaking. Another possibility would be to immediately write the data to cloud storage ourselves and queue it for processing on the backend. The downside is that we can't report any invalid data issues at that point, unless we take a hybrid approach and do some quick validation first.
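The "upwards of 2 mins" guess above can be reproduced from the logged timings. This is only a rough sketch: it assumes total processing time scales linearly with the json.loads (or hs.ImportDicts) wall time, which the logs do not confirm, and the variable names are made up for illustration.

```python
# Wall-clock timings (seconds) copied from the log lines quoted above.
webview = {"json.loads": 3.451520, "hs.ImportDicts": 13.377610, "total": 55.0}
nexus5x = {"json.loads": 8.491890, "hs.ImportDicts": 23.471280}

# Assumption: the whole request scales like one of the measured phases.
scale_loads = nexus5x["json.loads"] / webview["json.loads"]
scale_import = nexus5x["hs.ImportDicts"] / webview["hs.ImportDicts"]

# Two extrapolations of the full processing time for the Nexus5x upload.
estimate_from_loads = webview["total"] * scale_loads
estimate_from_import = webview["total"] * scale_import

# Share of the 60s frontend deadline spent purely on deserialization.
deadline = 60.0
deserialization = nexus5x["json.loads"] + nexus5x["hs.ImportDicts"]
deadline_fraction = deserialization / deadline

print("estimate via json.loads scaling: %.0fs" % estimate_from_loads)
print("estimate via ImportDicts scaling: %.0fs" % estimate_from_import)
print("deserialization share of deadline: %.0f%%" % (100 * deadline_fraction))
```

Either way the estimate lands well past the 60s deadline, so trimming deserialization alone would not be enough.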
,
Aug 16
Silly question: is there any way to extend the 60s deadline to 180s as a band-aid fix?
,
Aug 16
,
Aug 16
No, unfortunately not. AFAIK it's a hard limit from App Engine on FE requests.
,
Aug 16
Then writing the data to cloud storage and queueing it on the backend to process SGTM. If that's not easy, feel free to suggest other, easier approaches from the Telemetry client side and I would be happy to implement them.
,
Aug 20
The following revision refers to this bug:
https://chromium.googlesource.com/catapult/+/ba76717a8d18da91240f0c0a2dcb4a3aa2794f7c

commit ba76717a8d18da91240f0c0a2dcb4a3aa2794f7c
Author: Simon <simonhatch@chromium.org>
Date: Mon Aug 20 16:25:09 2018

Dashboard - Push HistogramSet processing into the backend

Currently, this is done in the initial call by the client to /add_histograms. AppEngine imposes a hard 60s limit to respond to FE requests, and the uploaded datasets are only getting bigger, not smaller. Latest rendering.mobile uploads look like they'd roughly take about 2 minutes of processing time now (guesstimate based on timings of what's able to actually run).

This CL writes the dataset out to cloud storage and then queues a task to do the actual validation and processing of the data. This gives us more breathing room, since we get 10 mins to run on the backend.

The downside currently is that any validation and errors surfaced can be communicated back to the client, but now they'll only appear in logs. We could think about doing something like passing back an ID or something from the initial /add_histograms call so that the client could potentially look up the status of an upload later.

Bug: chromium:874856
Change-Id: I356b62a062b815bde98c4afbe9b2636cba62de56
Reviewed-on: https://chromium-review.googlesource.com/1179991
Commit-Queue: Simon Hatch <simonhatch@chromium.org>
Reviewed-by: Ethan Kuefner <eakuefner@chromium.org>

[modify] https://crrev.com/ba76717a8d18da91240f0c0a2dcb4a3aa2794f7c/dashboard/dashboard/add_histograms_test.py
[modify] https://crrev.com/ba76717a8d18da91240f0c0a2dcb4a3aa2794f7c/dashboard/dashboard/dispatcher.py
[modify] https://crrev.com/ba76717a8d18da91240f0c0a2dcb4a3aa2794f7c/dashboard/dashboard/add_histograms.py
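For readers unfamiliar with the pattern, here is a minimal sketch of the store-then-queue flow the commit message describes, including the suggested idea of handing back an ID for later status lookup. It is not the code from the CL: `STORAGE`, `TASK_QUEUE`, `STATUS`, `handle_upload` and `process_next_task` are hypothetical in-memory stand-ins for cloud storage, the task queue and the two handlers, and the "payload is a JSON list" check is an assumed placeholder for the real validation.

```python
import json
import uuid
from collections import deque

# Hypothetical in-memory stand-ins for cloud storage, the task queue,
# and a status table. The real service would use external systems.
STORAGE = {}
TASK_QUEUE = deque()
STATUS = {}


def handle_upload(raw_body):
    """Frontend step: store the raw payload and enqueue work, no parsing."""
    token = uuid.uuid4().hex
    STORAGE[token] = raw_body
    STATUS[token] = "queued"
    TASK_QUEUE.append(token)
    # Returning the token lets a client look up the result later.
    return token


def process_next_task():
    """Backend step: read the stored payload, then validate and process it."""
    token = TASK_QUEUE.popleft()
    try:
        histograms = json.loads(STORAGE[token])
        # Assumed placeholder validation: expect a list of histogram dicts.
        if not isinstance(histograms, list):
            raise ValueError("expected a JSON list of histogram dicts")
        STATUS[token] = "processed %d entries" % len(histograms)
    except ValueError as error:
        # The client has already received its response, so the error is
        # recorded for later lookup instead of being returned directly.
        STATUS[token] = "failed: %s" % error
    finally:
        del STORAGE[token]
    return token


good = handle_upload('[{"name": "a"}, {"name": "b"}]')
bad = handle_upload('{"not": "a list"}')
process_next_task()
process_next_task()
print(STATUS[good])
print(STATUS[bad])
```

The frontend call does only constant-time work per upload, which is what keeps it under the 60s limit; the trade-off is that validation failures surface asynchronously.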
,
Aug 20
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/128cd6c67006fdd2799963f1dd3e8188438711bc

commit 128cd6c67006fdd2799963f1dd3e8188438711bc
Author: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Mon Aug 20 18:39:58 2018

Roll src/third_party/catapult e8964a2cd375..ba76717a8d18 (3 commits)

https://chromium.googlesource.com/catapult.git/+log/e8964a2cd375..ba76717a8d18

git log e8964a2cd375..ba76717a8d18 --date=short --no-merges --format='%ad %ae %s'
2018-08-20 simonhatch@chromium.org Dashboard - Push HistogramSet processing into the backend
2018-08-20 wangge@google.com Add Retry if We Cannot Get Battery Info
2018-08-20 vollick@chromium.org Update branch parsing

Created with:
gclient setdep -r src/third_party/catapult@ba76717a8d18

The AutoRoll server is located here: https://catapult-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel

BUG=chromium:874856,chromium:871748
TBR=sullivan@chromium.org

Change-Id: I3b691c48ea47b72c0b284a1bbd855817c07f2e50
Reviewed-on: https://chromium-review.googlesource.com/1181541
Reviewed-by: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Commit-Queue: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#584510}

[modify] https://crrev.com/128cd6c67006fdd2799963f1dd3e8188438711bc/DEPS
,
Aug 20
The following revision refers to this bug:
https://chromium.googlesource.com/catapult/+/f86bf0b3b84cf7dc70f8caf0ce0bc065cad2bbb9

commit f86bf0b3b84cf7dc70f8caf0ce0bc065cad2bbb9
Author: Simon <simonhatch@chromium.org>
Date: Mon Aug 20 19:18:28 2018

Dashboard - Fix gcs read.

Forgot to upload this local change before cq'ing.

TBR=eakuefner@chromium.org

Bug: chromium:874856
Change-Id: I66abd1eb7c3548f7b7b8fdcf1c6916cac8a9878a
Reviewed-on: https://chromium-review.googlesource.com/1181472
Reviewed-by: Simon Hatch <simonhatch@chromium.org>
Commit-Queue: Simon Hatch <simonhatch@chromium.org>

[modify] https://crrev.com/f86bf0b3b84cf7dc70f8caf0ce0bc065cad2bbb9/dashboard/dashboard/add_histograms.py
,
Aug 22
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/bcc387a51abaa21aa580d1703ff431c1356c07b4

commit bcc387a51abaa21aa580d1703ff431c1356c07b4
Author: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Wed Aug 22 17:46:38 2018

Roll src/third_party/catapult ba76717a8d18..bbb04a38bbdb (23 commits)

https://chromium.googlesource.com/catapult.git/+log/ba76717a8d18..bbb04a38bbdb

git log ba76717a8d18..bbb04a38bbdb --date=short --no-merges --format='%ad %ae %s'
2018-08-22 wangge@google.com Fix Bug When No APK is present and Add Relevant Test Cases.
2018-08-22 anthonyalridge@google.com Initial application of mann whitney testing.
2018-08-22 benjhayden@chromium.org [chromeperf v2] Simple redux helpers.
2018-08-22 benjhayden@chromium.org Add icons for V2SPA.
2018-08-21 mseaborn@google.com [dashboard] Update docs to mention old issues filed in the Github tracker
2018-08-21 benjhayden@chromium.org Fix minify script for v2spa.
2018-08-21 benjhayden@chromium.org Add some utility functions to V2SPA.
2018-08-21 eakuefner@chromium.org [Tracing] Fix Pylint errors
2018-08-21 amyqiu@google.com Fix search bug in metrics visualization tool
2018-08-21 benjhayden@chromium.org Add ElementBase for V2SPA.
2018-08-21 benjhayden@chromium.org Plumb test case tag maps via test suite descriptors.
2018-08-21 vovoy@chromium.org Add story property: wpr_mode
2018-08-21 wangge@google.com Restructure Long Term Health Tool Output File Structure.
2018-08-20 benjhayden@chromium.org Add Material textarea for V2SPA.
2018-08-20 benjhayden@chromium.org Add checkbox to V2SPA.
2018-08-20 benjhayden@chromium.org Add cp-loading for V2SPA.
2018-08-20 benjhayden@chromium.org Add raised-button to V2SPA.
2018-08-20 chiniforooshan@chromium.org Telemetry: pixel metrics in TBMv2
2018-08-20 simonhatch@chromium.org Dashboard - Add a path for inserting out-of-order diagnostics.
2018-08-20 simonhatch@chromium.org Dashboard - Fix gcs read.
2018-08-20 chiniforooshan@chromium.org Telemetry: rename metrics as per crbug.com/627461
2018-08-20 simonhatch@chromium.org Dashboard - Cleanup unused masters and bots
2018-08-20 chiniforooshan@chromium.org Telemetry: break rendering_metric.html

Created with:
gclient setdep -r src/third_party/catapult@bbb04a38bbdb

The AutoRoll server is located here: https://catapult-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel

BUG=chromium:863390,chromium:866423,chromium:862077,chromium:863390,chromium:760553,chromium:874856,chromium:627461,chromium:760553
TBR=sullivan@chromium.org

Change-Id: I6f9a52e2e301d0e04b4800a9c57a33000ec51f26
Reviewed-on: https://chromium-review.googlesource.com/1185141
Reviewed-by: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Commit-Queue: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#585147}

[modify] https://crrev.com/bcc387a51abaa21aa580d1703ff431c1356c07b4/DEPS
,
Aug 24
My spot check shows that we no longer have dashboard upload errors on the perf waterfall. Simon: can you check the error log to confirm that this is fixed?
,
Aug 24
Processing of rendering.mobile is getting a lot further, and I don't see any 500's. The data seems to be malformed, though, and is getting rejected. I'll file a bug.
Comment 1 by simonhatch@chromium.org
, Aug 16