New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 614821 link

Starred by 1 user

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug



Sign in to add a comment

platform_BootPerfServer has long run time in test lab

Project Member Reported by mshe...@chromium.org, May 25 2016

Issue description

The best practice for measuring boot time is to boot the device 50+ times http://www.chromium.org/chromium-os/how-tos-and-troubleshooting/measuring-boot-time-performance.

Doing so seems to up the run time of the platform_BootPerfServer to ~2hrs in some instances which is unacceptably high given our current resourcing for performance devices in the lab.

We need to look into the variance in this test and determine why the runs are so long + see if we can minimize the run time. Otherwise, we will need to add more devices to the lab, which is costly.
 
Further context from haddowk@:

"Full logs are here:

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/64552022-haddowk/chromeos4-row11-rack3-host4/

and you can see that each cycle take about 2 mins

START	----	reboot	timestamp=1464191800	localtime=May 25 08:56:40	
		GOOD	----	reboot.start	timestamp=1464191800	localtime=May 25 08:56:40	
		GOOD	----	reboot.verify	timestamp=1464191820	localtime=May 25 08:57:00	
	END GOOD	----	reboot	kernel=3.10.18	localtime=May 25 08:57:02	timestamp=1464191822	
	START	----	----	timestamp=1464191917	localtime=May 25 08:58:37	
		GOOD	----	sysinfo.iteration.after	timestamp=1464191917	localtime=May 25 08:58:37	
	END GOOD	----	----	timestamp=1464191917	localtime=May 25 08:58:37	


I have seen the test run in closer to 1 hour at some times, it is not clear why its runtime is so variable."
Components: Infra>Client>ChromeOS
It's possible that the variability is mostly due to delays
on the Autotest server side, rather than on the DUT.
Also, I think the ssh timeouts are set awkwardly; those
could be causing longer waits before "DUT is up" is detected.

A quick hack for the ssh problem might be after the reboot command
to sleep for 30 seconds or so before starting to poll for "DUT is up".
If the sleep is long enough, the DUT should answer to ssh right away.

Comment 3 by tbroch@chromium.org, May 26 2016

For my two runs on gandof the primary difference in the 'fast' vs 'slow' run was between END of reboot and START of sysinfo.iteration.after step.

In fast case only ~10secs and in slow case ~50secs ... for example (slow)

        END GOOD        ----    reboot  kernel=3.14.0   localtime=May 24 23:33:18       timestamp=1464150798
        START   ----    ----    timestamp=1464150854    localtime=May 24 23:34:14
                GOOD    ----    sysinfo.iteration.after timestamp=1464150855    localtime=May 24 23:34:15

... while fast is ...

        END GOOD        ----    reboot  kernel=3.14.0   localtime=May 24 21:34:25       timestamp=1464143665
        START   ----    ----    timestamp=1464143673    localtime=May 24 21:34:33
                GOOD    ----    sysinfo.iteration.after timestamp=1464143673    localtime=May 24 21:34:33

Will focus further investigation there 
Uploaded CL: https://chromium-review.googlesource.com/#/c/349713 to fix and results so far look promising.  

Results below show ~50min each loop of 50 iterations.  Not sure why the 'finished' timestamp ( test_runner_utils:0707 ) is +2hrs in all cases.  Will let the loop continue for 24hrs or so.


for d in `ls -trd test_that* | tail -7`; do head -2 $d/debug/test_that.DEBUG ; tail -1 $d/debug/test_that.DEBUG ; done

...
06/03 18:57:02.442 INFO | test_runner_utils:0660| Began logging to /tmp/test_that_results_xNd9AI
06/03 18:57:02.442 DEBUG| test_runner_utils:0662| test_that command line was: ['--board', 'gandof', 'chromeos4-row5-rack5-host17.c
ros', '--autotest_dir=../third_party/autotest/files', 'platform_BootPerfServer.bootperf', '--args', '50']
06/03 21:45:18.916 INFO | test_runner_utils:0707| Finished running tests. Results can be found in /tmp/test_that_results_xNd9AI or
 /tmp/test_that_latest
06/03 19:45:51.016 INFO | test_runner_utils:0660| Began logging to /tmp/test_that_results_p0AHbw
06/03 19:45:51.017 DEBUG| test_runner_utils:0662| test_that command line was: ['--board', 'gandof', 'chromeos4-row5-rack5-host17.c
ros', '--autotest_dir=../third_party/autotest/files', 'platform_BootPerfServer.bootperf', '--args', '50']
06/03 22:31:50.175 INFO | test_runner_utils:0707| Finished running tests. Results can be found in /tmp/test_that_results_p0AHbw or
 /tmp/test_that_latest                                                                                                            |
06/03 20:32:22.279 INFO | test_runner_utils:0660| Began logging to /tmp/test_that_results_vdINLD
06/03 20:32:22.280 DEBUG| test_runner_utils:0662| test_that command line was: ['--board', 'gandof', 'chromeos4-row5-rack5-host17.c
ros', '--autotest_dir=../third_party/autotest/files', 'platform_BootPerfServer.bootperf', '--args', '50']
06/03 23:17:32.407 INFO | test_runner_utils:0707| Finished running tests. Results can be found in /tmp/test_that_results_vdINLD or
 /tmp/test_that_latest
06/03 21:18:04.519 INFO | test_runner_utils:0660| Began logging to /tmp/test_that_results_LaSoju
06/03 21:18:04.519 DEBUG| test_runner_utils:0662| test_that command line was: ['--board', 'gandof', 'chromeos4-row5-rack5-host17.c
ros', '--autotest_dir=../third_party/autotest/files', 'platform_BootPerfServer.bootperf', '--args', '50']
06/04 00:07:27.734 INFO | test_runner_utils:0707| Finished running tests. Results can be found in /tmp/test_that_results_LaSoju or
 /tmp/test_that_latest
06/03 21:18:04.519 INFO | test_runner_utils:0660| Began logging to /tmp/test_that_results_LaSoju
06/03 21:18:04.519 DEBUG| test_runner_utils:0662| test_that command line was: ['--board', 'gandof', 'chromeos4-row5-rack5-host17.c
ros', '--autotest_dir=../third_party/autotest/files', 'platform_BootPerfServer.bootperf', '--args', '50']
06/04 00:07:27.734 INFO | test_runner_utils:0707| Finished running tests. Results can be found in /tmp/test_that_results_LaSoju or
 /tmp/test_that_latest
Completed a bunch (20) more runs with patch in #c4.  Runtime between 50-75min for most.  The longer (90min) runs appear to be related to delay of rsync being run on dut to sync data back to autotest server (my localhost in test_that case).
@haddowk - Would you be able to provide a max runtime that would be necessary for Crosbolt if we were to use the new test w/o adding further resources?
Cc: tbroch@chromium.org
Labels: -Pri-1 Pri-3
Owner: ----
Status: Available (was: Assigned)
Not going to have time to revisit this beyond the initial analysis.  Making it available for others to pickup.

Comment 8 by autumn@chromium.org, Sep 13 2016

Components: -Infra>Client>ChromeOS
Project Member

Comment 9 by sheriffbot@chromium.org, Sep 14 2017

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available. If you change it back, also remove the "Hotlist-Recharge-Cold" label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Sign in to add a comment