"run_benchmark try" hangs on "Performance Test" with no output
Issue description
I've started a couple of try jobs with the command:
tools/perf/run_benchmark try android-nexus5 system_health.memory_mobile
Both failed, timing out after one hour in the "Performance Test (With Patch) 1 of 1" step without producing any output.
The two try jobs:
- https://build.chromium.org/p/tryserver.chromium.perf/builders/android_nexus5_perf_bisect/builds/4475
- https://build.chromium.org/p/tryserver.chromium.perf/builders/android_nexus5_perf_bisect/builds/4476
And the CL (which doesn't really do much; I was actually writing the docs on how to run this sort of try job):
https://codereview.chromium.org/2595613002
Dec 21 2016
I thought that was always implicit, but will retry with it now.
Dec 21 2016
Dec 21 2016
The build failed again with no output. The same thing happened with another CL: https://build.chromium.org/p/tryserver.chromium.perf/builders/android_nexus5_perf_bisect/builds/4477
Dec 21 2016
The command you appear to be running is:
src/tools/perf/run_benchmark --browser=android-chromium system_health.memory_mobile --pageset-repeat 5 --verbose
Pulling the timing info from a normal run of system_health.memory_mobile on the Nexus 5 perf waterfall, it takes ~7000 s to run one iteration (I think the default is 3 repeats?), or close to 2 hours, so with --pageset-repeat=5 this would take over 3 hours. Does anybody know if there's a 1-hour limit on a single step?
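A quick back-of-the-envelope check of that estimate. This assumes, as the comment guesses but does not confirm, that the ~7000 s waterfall figure covers the default of 3 pageset repeats:

```python
# Rough runtime estimate for system_health.memory_mobile on the Nexus 5.
WATERFALL_RUN_S = 7000   # ~7000 s for one waterfall run (from the comment)
DEFAULT_REPEATS = 3      # ASSUMPTION: the guessed default repeat count
REQUESTED_REPEATS = 5    # --pageset-repeat 5 from the try job command

# Scale the waterfall time by the ratio of requested to default repeats.
per_repeat_s = WATERFALL_RUN_S / DEFAULT_REPEATS
estimate_s = per_repeat_s * REQUESTED_REPEATS
print(f"{estimate_s:.0f} s = {estimate_s / 3600:.2f} h")  # prints: 11667 s = 3.24 h
```

Under that assumption the step would run for over three times the one-hour mark observed in the failed builds.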
Dec 22 2016
I think there is a timeout if the step _does not_ produce any output for a period of time. It's unclear to me, however, why we're not seeing any output at all. I kicked off another try job with a story filter, so in theory the job should be able to complete in less than 1 hour. It's running here: https://build.chromium.org/p/tryserver.chromium.perf/builders/android_nexus5_perf_bisect/builds/4481
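The timeout described here can be modelled as a "silence" watchdog: the step is killed if no new output line arrives within a fixed window, regardless of how long it has been running overall. A minimal Python sketch, assuming a line-oriented child process; `run_with_silence_timeout` is a hypothetical helper for illustration, not buildbot's actual implementation:

```python
import queue
import subprocess
import threading


def run_with_silence_timeout(cmd, silence_timeout):
    """Run cmd, killing it if it prints nothing for silence_timeout seconds.

    Returns (killed, lines). Simplified model of a no-output timeout.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    lines = queue.Queue()

    def reader():
        # Forward every line the child prints; None marks end of output.
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)

    threading.Thread(target=reader, daemon=True).start()
    captured = []
    while True:
        try:
            # Each new line resets the silence window.
            line = lines.get(timeout=silence_timeout)
        except queue.Empty:
            # No output for too long: treat the step as hung and kill it.
            proc.kill()
            proc.wait()
            return True, captured
        if line is None:
            proc.wait()
            return False, captured
        captured.append(line)
```

Under this model, a wrapper that buffers its child's output and prints it only at exit looks exactly like a hung process, even while the child is busy.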
Dec 22 2016
Interesting: the try job in #6 did work, since each run of "Performance Test" took only a few minutes. I guess the problem is how this step works. Because it "captures" all output and only emits it in one go when it's done, the lack of output makes buildbot think that the step is stuck, so it kills it. This is bad for system health, since running the full benchmark to test the memory effects of a CL across a variety of scenarios is one of the main reasons devs would want to run try jobs.
Additionally:
- The step later failed at "Post bisect results", with the dashboard responding "Error response: 400". Should that be a separate bug?
- I was expecting to see a link to a results2.html file somewhere. Wasn't it supposed to be there?
Dec 22 2016
+mikecase
Adding mikecase@ since he's done some work in this area while investigating the mysterious timeouts. I vaguely recall the old perf/test_runner.py used to have a heartbeat logger that prevented timeouts for long-running tests.
Blocking on:
- crbug.com/664765 for the failed post to the dashboard
- crbug.com/670316 for the missing results file
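The heartbeat idea can be sketched as a background thread that prints a line at a fixed interval while the long-running work happens. This is a minimal, hypothetical sketch (the `heartbeat` helper, its message, and its parameters are illustrative), not the code from the old perf/test_runner.py:

```python
import contextlib
import sys
import threading


@contextlib.contextmanager
def heartbeat(interval, stream=sys.stdout, message="Still running..."):
    """Write `message` to `stream` every `interval` seconds until exit."""
    stop = threading.Event()

    def beat():
        # Event.wait returns False on timeout and True once stop is set,
        # so this loop beats once per interval until the block exits.
        while not stop.wait(interval):
            stream.write(message + "\n")
            stream.flush()

    thread = threading.Thread(target=beat, daemon=True)
    thread.start()
    try:
        yield
    finally:
        stop.set()
        thread.join()
```

The beats only reset a no-output timeout if they reach the build log directly; if they are written into a stream that a wrapper captures and replays later, they are held back along with everything else.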
Dec 22 2016
Not sure a heartbeat will work, since we intercept all stdout from the test run; even the heartbeat would be intercepted unless we found some clever way around it. How about a change like this (pretty rough CL): https://chromium-review.googlesource.com/c/423387/1/scripts/slave/recipe_modules/bisect_tester/perf_test.py
Basically, instead of intercepting stdout, just do:
test_run_cmd.py | tee captured_stdout.txt
Then stdout should get streamed to buildbot to avoid timeouts, and we still get stdout in recipes to parse and so on.
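In Python terms, the `| tee` approach amounts to reading the child's output line by line and writing each line both to the live console and to a log file. A minimal sketch; `run_and_tee` and its parameters are hypothetical names for illustration, not the code in the linked perf_test.py CL:

```python
import subprocess
import sys


def run_and_tee(cmd, log_path, out=sys.stdout):
    """Run cmd, streaming its output live while also saving a copy.

    Roughly `cmd | tee log_path`: each line goes to `out` immediately (so
    the build sees activity) and to log_path (so it can be parsed later).
    Returns (returncode, full_output).
    """
    captured = []
    with open(log_path, "w") as log, subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
            text=True) as proc:
        for line in proc.stdout:
            out.write(line)
            out.flush()  # push each line out now instead of at exit
            log.write(line)
            captured.append(line)
    # Popen's context manager waits for the child, so returncode is set.
    return proc.returncode, "".join(captured)
```

One caveat applies equally to the shell pipeline: when stdout is a pipe, many programs (Python included) switch to block buffering, so lines may still arrive in bursts unless the child flushes or runs unbuffered (e.g. `python -u`).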
Dec 22 2016
Link to the cleaned-up CL: https://chromium-review.googlesource.com/c/423387/
Jan 24 2017
This is awesome! Thanks, mikecase, for the fix!
1. Some random test CL: https://codereview.chromium.org/2651713004
2. Try job build: https://build.chromium.org/p/tryserver.chromium.perf/builders/android_nexus5_perf_bisect/builds/4566
3. With an easy link to the Results HTML (thanks simonhatch for fixing bug 670316!)
4. Full system health results: https://console.developers.google.com/m/cloudstorage/b/chromium-telemetry/o/html-results/results-2017-01-23_22-44-51
This is going to make the lives of developers chasing regressions so much easier!
Feb 3 2017
Comment 1 by sullivan@chromium.org, Dec 21 2016
Components: Tests>AutoBisect