Some Telemetry tests don't exit after running all stories and reporting failures
Issue description

Revision range first seen: ???? (but a long time)

Example: https://luci-milo.appspot.com/buildbot/chromium.perf/Mac%20Air%2010.11%20Perf/1160, see battor.trivial_pages and battor.trivial_pages.reference.

Both of these benchmarks failed very quickly because the attached BattOrs need to be reset. Digging into battor.trivial_pages (http://bit.ly/2vDn1Qc), the first log line looks like:

-----------------------------------------------------------------
(WARNING) 2017-08-15 20:00:20,200 desktop_browser_finder.FindAllAvailableBrowsers:171 Chrome build location for mac_x86_64 not found. Browser will be run without Flash.
-----------------------------------------------------------------

The end of the log looks like:

-----------------------------------------------------------------
(INFO) 2017-08-15 20:01:47,038 cloud_storage.Insert:377 Uploading /b/s/w/itT1LWMF/tmp7uruIS.png to gs://chrome-telemetry-output/profiler-file-id_0-2017-08-15_20-01-4792731.png
View generated profiler files online at https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/profiler-file-id_0-2017-08-15_20-01-4792731.png for page TrivialScrollingPageSharedPageState
[ PASSED ] 0 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] TrivialScrollingPageSharedPageState

1 FAILED TEST
View result at file:///b/s/w/itT1LWMF/tmph6VKfctelemetry/results-chart.json
View result at file:///b/s/w/itT1LWMF/tmph6VKfctelemetry/test-results.json
-----------------------------------------------------------------

Based on the logging, you can reasonably infer that the test took about 1m47s to fail. However, swarming doesn't corroborate this. It says:

Started: 8/15/2017, 11:00:06 PM (EDT)
Completed: 8/16/2017, 12:01:49 AM (EDT)

In other words, swarming says that the task took a full hour to fail (!) and was eventually killed by a swarming shard timeout.
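Because swarming's start and completion times straddle midnight, the gap is easiest to check with date arithmetic. The sketch below simply subtracts the timestamps quoted above. Swarming reports EDT while the log lines use the bot's local clock, so it only compares timestamps from the same source and makes no assumption about the offset between them.

```python
from datetime import datetime

# Timestamps copied from the issue. Swarming reports EDT; the log lines use the
# bot's local clock, so only differences within the same source are computed.
swarming_started = datetime(2017, 8, 15, 23, 0, 6)
swarming_completed = datetime(2017, 8, 16, 0, 1, 49)

LOG_FORMAT = "%Y-%m-%d %H:%M:%S,%f"  # matches "2017-08-15 20:00:20,200"
first_log_line = datetime.strptime("2017-08-15 20:00:20,200", LOG_FORMAT)
last_log_line = datetime.strptime("2017-08-15 20:01:47,038", LOG_FORMAT)

swarming_duration = swarming_completed - swarming_started
logged_span = last_log_line - first_log_line

print("swarming task duration:", swarming_duration)
print("span covered by the quoted log lines:", logged_span)
```

The quoted log lines cover well under two minutes, while swarming's timestamps span just over an hour, which is the discrepancy the report describes.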
Other benchmark runs on the same bot (build125-b1) seem to corroborate swarming's story: the next benchmark to run on the bot is blink_perf.layout, which has a "pending" time of 2h4m. This is suspiciously close to the 2h we would expect to see if battor.trivial_pages and battor.trivial_pages.reference each took an hour to time out.

This doesn't appear to be a Telemetry-wide problem: on the same bot, smoothness.top_25 is failing in 8m14s.

Ned suggested that we add a timestamp before the "View result at file..." log lines at the end of the trace to determine how long each of those uploads takes. My suspicion, though, is that the uploads aren't the problem, and that one of the atexit handlers that the BattOr code registers is hanging instead.
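The suspected failure mode can be reproduced in isolation with a minimal sketch: a child process logs its final "View result" line with a timestamp (as Ned suggested), then an atexit handler blocks, so the process outlives its last output. The child script, its log format and `HANG_SECONDS` are illustrative assumptions standing in for the hour-long hang; this is not the BattOr code.

```python
import subprocess
import sys
import time

# Illustrative stand-in for the hour-long hang observed on the bot.
HANG_SECONDS = 2

# Hypothetical child script: logs a timestamped final line, then an atexit
# handler sleeps, delaying interpreter exit after all output has been written.
CHILD = f"""
import atexit, logging, time
logging.basicConfig(level=logging.INFO, format="(%(levelname)s) %(asctime)s %(message)s")

def stuck_cleanup():
    time.sleep({HANG_SECONDS})  # stands in for a handler that never returns promptly

atexit.register(stuck_cleanup)
logging.info("View result at file:///tmp/test-results.json")
"""


def run_child():
    """Runs the child script and returns (completed_process, elapsed_seconds)."""
    start = time.monotonic()
    proc = subprocess.run([sys.executable, "-c", CHILD], capture_output=True, text=True)
    return proc, time.monotonic() - start


proc, elapsed = run_child()
print(proc.stderr.strip())  # logging writes to stderr by default
print(f"process exited after {elapsed:.1f}s")
```

The last log line's timestamp is printed before the handler starts sleeping, so comparing it with the time the process actually exits exposes the hang, just as comparing the Telemetry log with swarming's completion time did.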
Aug 16 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/91c6b1bce0497e1d95aaf950e82e901ddbeb8219

commit 91c6b1bce0497e1d95aaf950e82e901ddbeb8219
Author: Charlie Andrews <charliea@chromium.org>
Date: Wed Aug 16 16:55:39 2017

Decrease the swarming I/O timeout for perf tests from 1h to 10m

We are having a problem where the atexit handler on some BattOr-related code is hanging indefinitely, which in our case, means for an hour.

In general, tests should probably never hang for an hour without I/O. If they do, we should probably special-case that test's I/O timeout rather than having a default I/O timeout of 1 hour.

Bug: 755981
Change-Id: I29142379bc80009a7012397390e9dc2571f8db5f
Reviewed-on: https://chromium-review.googlesource.com/617102
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Reviewed-by: Charlie Andrews <charliea@chromium.org>
Commit-Queue: Charlie Andrews <charliea@chromium.org>
Cr-Commit-Position: refs/heads/master@{#494828}
[modify] https://crrev.com/91c6b1bce0497e1d95aaf950e82e901ddbeb8219/testing/buildbot/chromium.perf.fyi.json
[modify] https://crrev.com/91c6b1bce0497e1d95aaf950e82e901ddbeb8219/testing/buildbot/chromium.perf.json
[modify] https://crrev.com/91c6b1bce0497e1d95aaf950e82e901ddbeb8219/tools/perf/core/perf_data_generator.py
[modify] https://crrev.com/91c6b1bce0497e1d95aaf950e82e901ddbeb8219/tools/perf/core/perf_data_generator_unittest.py
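As we understand it, an I/O timeout kills a task that has produced no output for a set number of seconds, which is why lowering it from 1h to 10m bounds the cost of a hung atexit handler. The sketch below illustrates that idea with a hypothetical `run_with_io_timeout` helper built on the standard library; it is not swarming's implementation, and the child commands in the usage note are made up for illustration.

```python
import subprocess
import sys
import threading
import time


def run_with_io_timeout(cmd, io_timeout):
    """Hypothetical helper: runs cmd, killing it after io_timeout seconds without output.

    Returns (returncode, timed_out).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    last_output = time.monotonic()
    lock = threading.Lock()

    def pump():
        # Record the time of every line the child writes to stdout/stderr.
        nonlocal last_output
        for _ in iter(proc.stdout.readline, b""):
            with lock:
                last_output = time.monotonic()

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    timed_out = False
    while proc.poll() is None:
        with lock:
            idle = time.monotonic() - last_output
        if idle > io_timeout:
            proc.kill()  # no output for too long: treat the task as hung
            timed_out = True
            break
        time.sleep(0.05)

    proc.wait()
    reader.join(timeout=1)
    proc.stdout.close()
    return proc.returncode, timed_out
```

For example, a child that prints its results and then sleeps for 30 seconds (`[sys.executable, "-u", "-c", "print('results written'); import time; time.sleep(30)"]`) is killed after about one second with `io_timeout=1.0`, while a child that prints and exits returns `(0, False)`. The `-u` flag keeps the child's stdout unbuffered so its output is seen immediately.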
Aug 16 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/5f27e44dcf49e79d287d0068bab57e3824cdb9fe

commit 5f27e44dcf49e79d287d0068bab57e3824cdb9fe
Author: catapult-deps-roller@chromium.org <catapult-deps-roller@chromium.org>
Date: Wed Aug 16 19:51:00 2017

Roll src/third_party/catapult/ 818332ed8..18998c1fd (2 commits)

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/818332ed8043..18998c1fd0cc

$ git log 818332ed8..18998c1fd --date=short --no-merges --format='%ad %ae %s'
2017-08-16 simonhatch Dashboard - Bump alert limits for group_report.
2017-08-16 charliea Move atexit_with_log into py_utils and make BattOrWrapper use it

Created with:
  roll-dep src/third_party/catapult
BUG=755661,755981

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, see:
http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls

CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel
TBR=sullivan@chromium.org

Change-Id: If9400d60d42f1e6dc8c413231386c43170a7f4c4
Reviewed-on: https://chromium-review.googlesource.com/617269
Reviewed-by: <catapult-deps-roller@chromium.org>
Commit-Queue: <catapult-deps-roller@chromium.org>
Cr-Commit-Position: refs/heads/master@{#494906}
[modify] https://crrev.com/5f27e44dcf49e79d287d0068bab57e3824cdb9fe/DEPS
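The roll above moves `atexit_with_log` into py_utils. Its actual signature isn't shown in this thread, so the following is only a hypothetical sketch of the idea the name suggests: register a cleanup function with `atexit`, logging before and after it runs, so a handler that hangs shows up in the log as a "Running" line with no matching "Finished" line.

```python
import atexit
import functools
import logging


def atexit_with_log(func, *args, **kwargs):
    """Hypothetical sketch: registers func with atexit and logs around its execution."""
    name = getattr(func, "__name__", repr(func))

    @functools.wraps(func)
    def wrapper():
        logging.info("Running atexit handler %s", name)
        func(*args, **kwargs)
        # Never reached if func hangs, which is what makes the hang visible.
        logging.info("Finished atexit handler %s", name)

    atexit.register(wrapper)
    return wrapper
```

Returning the wrapper lets a caller remove it again with `atexit.unregister` if the resource it cleans up is released earlier. Because `logging` is imported before any handler is registered, these handlers run before logging's own shutdown handler (atexit calls handlers in reverse registration order), so their messages can still be emitted.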
Aug 17 2017
The NextAction date has arrived: 2017-08-17
Aug 21 2017
Try job in progress here: https://build.chromium.org/p/tryserver.chromium.perf/builders/mac_10_11_perf_bisect/builds/1715
Dec 19 2017
Based on what's happening in https://bugs.chromium.org/p/chromium/issues/detail?id=795060#c12, I'm going to say that this can happen with other Telemetry tests, not just the power ones.
Aug 6
Marking this as "available" given my recent pivot away from core Telemetry work.
Jan 16
Jan 16
Comment 1 by charliea@chromium.org, Aug 16 2017