I didn't see any obvious problem with the tryjob configs or the test command.
From the logs, it looks like the job timed out after producing no output for 60 minutes.
Also, I noticed that the config sets max_time_minutes to 120.
Do we expect this test to run for more than 60 minutes?
Update:
I tried a couple of perf try jobs for startup.cold.blank_page on the Winx64 Nvidia perf bisect bot, but got the same failure.
Then I remotely logged into the bot and ran the same benchmark. Still no luck: the benchmark seems to do nothing, and no browser is launched.
Debugged this locally with Prasadv. The root cause is that the bisect recipe never tries to clear the out/Release_x64/ directory, which contains the browser.
For the startup.cold benchmark, Telemetry first tries to flush the cache of the browser directory. Because we never clear out/Release_x64/, build artifacts and data from browser runs keep accumulating in this directory (currently 50 GB), which makes the flush command take forever to run.
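To check whether this kind of accumulation is what slows a given bot down, one could measure how much the directory holds. The following is a minimal, hypothetical diagnostic in plain Python (it is not part of Telemetry or the recipes); it assumes that the flush cost grows with the number and total size of files in the browser directory.

```python
import os


def directory_footprint(path):
    """Return (file_count, total_bytes) for every regular file under path.

    Hypothetical diagnostic: if a cache flush has to visit each file in the
    browser directory, its running time grows with these two numbers.
    A nonexistent path yields (0, 0) because os.walk produces nothing.
    """
    file_count = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full_path = os.path.join(root, name)
            # Skip symlinks so the same data is not counted twice.
            if os.path.islink(full_path):
                continue
            file_count += 1
            total_bytes += os.path.getsize(full_path)
    return file_count, total_bytes
```

For example, `directory_footprint(os.path.join("out", "Release_x64"))` run on the bot would show how far the directory has grown since it was last cleared.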
@prasadv says we do clear out the build directory in the perf bot and bisect recipes. So to me, this is a symptom of the fact that we have zero code sharing between the recipes that run Telemetry benchmarks.
It's unclear whether we can share this particular behavior. The perf and bisect bots use packaged builds, so clobbering the browser directory is safe. The CQ bots, however, build and test on the same machine, so clobbering the Release directory would disable incremental builds and increase build latency.
But we definitely have code sharing problems overall. There may be things we can do to unify the behavior here, such as using packaged builds on the CQ bots as well.
First, we need to understand that perf try jobs and bisect have different features. Even though the two recipes share a lot of code, their workflows are separate. Also, most of the bisect recipe's behavior conforms to that of the recipes that run Telemetry benchmarks.
Perf try jobs compile the binaries and then run the benchmarks, whereas the perf tests (chromium.perf) and bisect jobs download builds after removing the build directory.
Perf try jobs need to build and run tests both with and without the patch. If we deleted the build directory before every compile, one side effect would be that the build without the patch could take longer.
One possible solution is to delete the build directory once, at the start of each perf try job, rather than before every compile.
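A minimal sketch of that idea in plain Python, assuming the recipe can run a cleanup step before its first compile. The helper name `clobber_build_dir` is hypothetical and not an existing recipe API; the point is only that the directory is wiped once per job, so the with-patch and without-patch builds inside the job can still build incrementally.

```python
import os
import shutil


def clobber_build_dir(build_dir):
    """Delete build_dir (e.g. out/Release_x64) if present, then recreate it empty.

    Hypothetical helper: meant to run once at the start of a perf try job,
    before both the with-patch and without-patch builds, not between them.
    """
    if os.path.isdir(build_dir):
        # Removes build artifacts and any browser data left by earlier runs.
        shutil.rmtree(build_dir)
    os.makedirs(build_dir, exist_ok=True)
```

On Windows, `shutil.rmtree` can fail on read-only files, so a real cleanup step may need extra error handling.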
Also, I think the benchmarks should clear their data once test execution is done. Are we not doing this in the benchmarks?
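One way a benchmark harness could guarantee that cleanup is a context manager around each run. This is a hypothetical sketch, not how Telemetry currently works; it assumes the benchmark can be pointed at a per-run data directory (for Chrome, for example, through the `--user-data-dir` switch) instead of writing into the build output directory.

```python
import contextlib
import shutil
import tempfile


@contextlib.contextmanager
def scratch_browser_data_dir():
    """Yield a fresh directory for one benchmark run and delete it afterwards.

    Hypothetical: assumes the browser's run data can be redirected here
    rather than into out/Release_x64/.
    """
    path = tempfile.mkdtemp(prefix="benchmark-run-")
    try:
        yield path
    finally:
        # Runs even if the benchmark raises, so run data never accumulates.
        shutil.rmtree(path, ignore_errors=True)
```

Usage would look like `with scratch_browser_data_dir() as data_dir:` followed by launching the benchmark with `data_dir` as its data directory.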
Labels: triaged
Summary: Perf tryjob test run times out on Winx64 Nvidia perf bisect bot (browser cache directory not cleared). (was: Perf tryjob test run times out on Winx64 Nvidia perf bisect bot.)
I think the job referred to in #13 ran dummy benchmarks.
The changes have already landed. I triggered a perf try job with the failing benchmark:
https://codereview.chromium.org/1776923004.
I'll wait for it to complete to know the actual status.
Comment 1 by pras...@chromium.org, Mar 2 2016