platform_BootPerfServer has long run time in test lab |
|||||
Issue descriptionThe best practice for measuring boot time is to boot the device 50+ times http://www.chromium.org/chromium-os/how-tos-and-troubleshooting/measuring-boot-time-performance. Doing so seems to up the run time of the platform_BootPerfServer to ~2hrs in some instances which is unacceptably high given our current resourcing for performance devices in the lab. We need to look into the variance in this test and determine why the runs are so long + see if we can minimize the run time. Otherwise, we will need to add more devices to the lab, which is costly.
,
May 25 2016
It's possible that the variability is mostly due to delays on the Autotest server side, rather than on the DUT. Also, I think the ssh timeouts are set awkwardly; those could be causing longer waits before "DUT is up" is detected. A quick hack for the ssh problem might be after the reboot command to sleep for 30 seconds or so before starting to poll for "DUT is up". If the sleep is long enough, the DUT should answer to ssh right away.
,
May 26 2016
For my two runs on gandof the primary difference in the 'fast' vs 'slow' run was between END of reboot and START of sysinfo.iteration.after step.
In fast case only ~10secs and in slow case ~50secs ... for example (slow)
END GOOD ---- reboot kernel=3.14.0 localtime=May 24 23:33:18 timestamp=1464150798
START ---- ---- timestamp=1464150854 localtime=May 24 23:34:14
GOOD ---- sysinfo.iteration.after timestamp=1464150855 localtime=May 24 23:34:15
... while fast is ...
END GOOD ---- reboot kernel=3.14.0 localtime=May 24 21:34:25 timestamp=1464143665
START ---- ---- timestamp=1464143673 localtime=May 24 21:34:33
GOOD ---- sysinfo.iteration.after timestamp=1464143673 localtime=May 24 21:34:33
Will focus further investigation there
,
Jun 4 2016
Uploaded CL: https://chromium-review.googlesource.com/#/c/349713 to fix and results so far look promising. Results below show ~50min each loop of 50 iterations. Not sure why the 'finished' timestamp ( test_runner_utils:0707 ) is +2hrs in all cases. Will let the loop continue for 24hrs or so. for d in `ls -trd test_that* | tail -7`; do head -2 $d/debug/test_that.DEBUG ; tail -1 $d/debug/test_that.DEBUG ; done ... 06/03 18:57:02.442 INFO | test_runner_utils:0660| Began logging to /tmp/test_that_results_xNd9AI 06/03 18:57:02.442 DEBUG| test_runner_utils:0662| test_that command line was: ['--board', 'gandof', 'chromeos4-row5-rack5-host17.c ros', '--autotest_dir=../third_party/autotest/files', 'platform_BootPerfServer.bootperf', '--args', '50'] 06/03 21:45:18.916 INFO | test_runner_utils:0707| Finished running tests. Results can be found in /tmp/test_that_results_xNd9AI or /tmp/test_that_latest 06/03 19:45:51.016 INFO | test_runner_utils:0660| Began logging to /tmp/test_that_results_p0AHbw 06/03 19:45:51.017 DEBUG| test_runner_utils:0662| test_that command line was: ['--board', 'gandof', 'chromeos4-row5-rack5-host17.c ros', '--autotest_dir=../third_party/autotest/files', 'platform_BootPerfServer.bootperf', '--args', '50'] 06/03 22:31:50.175 INFO | test_runner_utils:0707| Finished running tests. Results can be found in /tmp/test_that_results_p0AHbw or /tmp/test_that_latest | 06/03 20:32:22.279 INFO | test_runner_utils:0660| Began logging to /tmp/test_that_results_vdINLD 06/03 20:32:22.280 DEBUG| test_runner_utils:0662| test_that command line was: ['--board', 'gandof', 'chromeos4-row5-rack5-host17.c ros', '--autotest_dir=../third_party/autotest/files', 'platform_BootPerfServer.bootperf', '--args', '50'] 06/03 23:17:32.407 INFO | test_runner_utils:0707| Finished running tests. Results can be found in /tmp/test_that_results_vdINLD or /tmp/test_that_latest 06/03 21:18:04.519 INFO | test_runner_utils:0660| Began logging to /tmp/test_that_results_LaSoju 06/03 21:18:04.519 DEBUG| test_runner_utils:0662| test_that command line was: ['--board', 'gandof', 'chromeos4-row5-rack5-host17.c ros', '--autotest_dir=../third_party/autotest/files', 'platform_BootPerfServer.bootperf', '--args', '50'] 06/04 00:07:27.734 INFO | test_runner_utils:0707| Finished running tests. Results can be found in /tmp/test_that_results_LaSoju or /tmp/test_that_latest 06/03 21:18:04.519 INFO | test_runner_utils:0660| Began logging to /tmp/test_that_results_LaSoju 06/03 21:18:04.519 DEBUG| test_runner_utils:0662| test_that command line was: ['--board', 'gandof', 'chromeos4-row5-rack5-host17.c ros', '--autotest_dir=../third_party/autotest/files', 'platform_BootPerfServer.bootperf', '--args', '50'] 06/04 00:07:27.734 INFO | test_runner_utils:0707| Finished running tests. Results can be found in /tmp/test_that_results_LaSoju or /tmp/test_that_latest
,
Jun 5 2016
Completed a bunch (20) more runs with patch in #c4. Runtime between 50-75min for most. The longer (90min) runs appear to be related to delay of rsync being run on dut to sync data back to autotest server (my localhost in test_that case).
,
Jun 6 2016
@haddowk - Would you be able to provide a max runtime that would be necessary for Crosbolt if we were to use the new test w/o adding further resources?
,
Sep 9 2016
Not going to have time to revisit this beyond the initial analysis. Making it available for others to pickup.
,
Sep 13 2016
,
Sep 14 2017
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue. Sorry for the inconvenience if the bug really should have been left as Available. If you change it back, also remove the "Hotlist-Recharge-Cold" label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot |
|||||
►
Sign in to add a comment |
|||||
Comment 1 by mshe...@chromium.org
, May 25 2016