arc.Boot timing out sporadically on edgar-paladin |
||||||||
Issue descriptionThere have been a few recent arc.Boot failures on edgar-paladin: --- R70-10983.0.0-rc1 http://stainless/browse/chromeos-autotest-results/229257815-chromeos-test/ 2018/08/20 02:13:01 Started test arc.Boot 2018/08/20 02:13:01 [02:13:01.607] Restarting ui job 2018/08/20 02:13:04 [02:13:03.964] Waiting for org.chromium.SessionManager D-Bus service 2018/08/20 02:13:04 [02:13:03.987] Asking session_manager to enable Chrome testing 2018/08/20 02:13:04 [02:13:03.989] Waiting for Chrome to write its debugging port to /home/chronos/DevToolsActivePort 2018/08/20 02:13:05 [02:13:05.573] Removing cryptohome for testuser@gmail.com 2018/08/20 02:13:05 [02:13:05.634] Finding OOBE DevTools target 2018/08/20 02:13:05 [02:13:05.890] Connecting to Chrome at ws://127.0.0.1:36842/devtools/page/3FE9262D47055348F6797F752FD90909 2018/08/20 02:13:06 [02:13:06.140] Waiting for OOBE 2018/08/20 02:13:10 [02:13:10.828] Logging in as user "testuser@gmail.com" 2018/08/20 02:13:11 [02:13:11.095] Waiting for cryptohome /home/user/17d0c63761779443dba903f3ddc583101c2635a1 2018/08/20 02:13:14 [02:13:14.589] Waiting for OOBE to be dismissed 2018/08/20 02:13:17 [02:13:17.272] Waiting for Android boot 2018/08/20 02:14:47 [02:14:47.260] Error at boot.go:39: Failed to start ARC: failed waiting for ro.arc.boot_completed=1 (system_server crashed before BOOT_COMPLETED?): context deadline exceeded --- R70-10984.0.0-rc2 http://stainless/browse/chromeos-autotest-results/229416389-chromeos-test/ 2018/08/20 11:01:23 Started test arc.Boot 2018/08/20 11:01:23 [11:01:22.546] Restarting ui job 2018/08/20 11:01:24 [11:01:24.206] Waiting for org.chromium.SessionManager D-Bus service 2018/08/20 11:01:24 [11:01:24.235] Asking session_manager to enable Chrome testing 2018/08/20 11:01:24 [11:01:24.240] Waiting for Chrome to write its debugging port to /home/chronos/DevToolsActivePort 2018/08/20 11:01:26 [11:01:25.431] Removing cryptohome for testuser@gmail.com 2018/08/20 11:01:26 [11:01:25.491] Finding OOBE DevTools target 2018/08/20 11:01:26 [11:01:25.705] Connecting to Chrome at ws://127.0.0.1:41969/devtools/page/42EA0BE5210D9D81B333B30D90F48DDD 2018/08/20 11:01:26 [11:01:25.845] Waiting for OOBE 2018/08/20 11:01:31 [11:01:30.825] Logging in as user "testuser@gmail.com" 2018/08/20 11:01:31 [11:01:31.065] Waiting for cryptohome /home/user/4cdbfcc11f0db7f2af88406419be4aef5753628e 2018/08/20 11:01:34 [11:01:33.714] Waiting for OOBE to be dismissed 2018/08/20 11:01:37 [11:01:36.947] Waiting for Android boot 2018/08/20 11:03:07 [11:03:06.929] Error at boot.go:39: Failed to start ARC: failed waiting for ro.arc.boot_completed=1 (system_server crashed before BOOT_COMPLETED?): context deadline exceeded --- I don't know enough about what arc.log is supposed to look like to know what went wrong here. There's a successful run at http://stainless/browse/chromeos-autotest-results/229342986-chromeos-test/ . It's suspicious that both failures were on edgar. If this system is expected to be slower than others, we may need to increase the timeout for ARC startup.
,
Aug 20
http://stainless/search?view=matrix&row=builder_name&col=build&first_date=2018-08-14&last_date=2018-08-20&builder_name=%5Eedgar-&test=%5Etast%5C.arc%5C.Boot%24&exclude_cts=false&exclude_not_run=true&exclude_non_release=false&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=true shows that there are occasional failures on both edgar-release and edgar-paladin. (Ignore the earlier run of failures caused by issue 875608 , the property rename.)
,
Aug 20
In the passing edgar run at http://stainless/browse/chromeos-autotest-results/229342986-chromeos-test/, the test waited 51 seconds for boot: 2018/08/20 06:28:50 [06:28:48.956] Waiting for Android boot 2018/08/20 06:29:41 Completed test arc.Boot in 1m6.836s with 0 error(s) In the new edgar failure at http://stainless/browse/chromeos-autotest-results/229477610-chromeos-test/, the test waited 90 seconds before timing out: 2018/08/20 15:39:11 [15:39:10.758] Waiting for Android boot 2018/08/20 15:40:41 [15:40:40.751] Error at boot.go:39: Failed to start ARC: failed waiting for ro.arc.boot_completed=1 (system_server crashed before BOOT_COMPLETED?): context deadline exceeded In the successful sentry run at http://stainless/browse/chromeos-autotest-results/229342198-chromeos-test/, the test only had to wait for 28 seconds here: 2018/08/20 06:29:05 [06:29:04.614] Waiting for Android boot 2018/08/20 06:29:33 Completed test arc.Boot in 39.697s with 0 error(s) It seems to me like ARC is either inconsistently slow on edgar or inconsistently failing to boot entirely. Is this a known issue? I could possibly try moving the Tast BVT suites to a different strago board, e.g. cyan (which bvt-arc currently uses).
,
Aug 21
Something that looks very weird is that the failing runs don't have logcat.txt contents at all. The file is there, but is empty. I'm not yet familiar enough with Tast to know why this is happening, but assigning to the current PST constable (sorry, Lev!) and cc-ing yusukes@ to see if he has any ideas.
,
Aug 21
Thanks for spotting the issue. I'll try to investigate this today, but I might be delayed since I'm attending ARC summit. If we can't figure out fix soon, we may want to mark arc.Boot as informational to unblock CQ. That said, it is fishy that all three failed runs on edgar left 0-byte logcat.txt. OTOH, ps.txt shows good Android process trees. My guess is that logcat and getprop are not working well (android-sh?). I'll investigate...
,
Aug 21
,
Aug 21
(Just as an aside: marking the test as informational might cause problems; IIRC "tast run" will exit with an error if the supplied test expressions don't match any tests since that would usually indicate an incorrect expression or missing test bundles. If we want to disable the test, we may need to comment out the suite attributes in server/site_tests/tast/control.mustpass-android instead.)
,
Aug 21
Hmm. It is possible DUT is just slow. For example, chromeos2-row5-rack2-host5 is a DUT where edgar-paladin/3721 failed: https://stainless.corp.google.com/browse/chromeos-autotest-results/229536081-chromeos-test/ I tried running arc.Boot on the DUT (after making sure CQ is not running in parallel). $ tast -verbose run -build=false chromeos2-row5-rack2-host5.cros arc.Boot ... 2018/08/21 14:01:40 Started test arc.Boot ... 2018/08/21 14:03:24 [22:03:23.993] Error at boot.go:39: Failed to start ARC: failed waiting for ro.arc.boot_completed=1 (system_server crashed before BOOT_COMPLETED?): context deadline exceeded ... Even after the failure Android continued to boot on DUT. And actually boot event were emitted (slightly) later: # android-sh -c 'logcat -b events -d' | grep arc_system_event 08-20 22:03:24.160 976 976 I arc_system_event: ArcAppLauncher:started 08-20 22:03:24.455 990 2028 I arc_system_event: ArcIntentHelperService:ready Let's increase the timeout for now and see how it goes.
,
Aug 21
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/tast-tests/+/aba1cd0b57a2cf2f6bf9c77be4223c0f866effbb commit aba1cd0b57a2cf2f6bf9c77be4223c0f866effbb Author: Shuhei Takahashi <nya@chromium.org> Date: Tue Aug 21 05:28:56 2018 arc: Increase ARC test timeout. We observed ARC boot took ~91s on edgar-paladin. BUG= chromium:875952 TEST=fast_build.sh -T TEST=tast -verbose run chromeos2-row5-rack2-host5.cros arc.Boot Change-Id: Ifb014ced5cc77ad4a0aa6a53eca1a6de5b9ae079 Reviewed-on: https://chromium-review.googlesource.com/1182914 Tested-by: Shuhei Takahashi <nya@chromium.org> Trybot-Ready: Shuhei Takahashi <nya@chromium.org> Reviewed-by: Dan Erat <derat@chromium.org> Reviewed-by: Tomasz Figa <tfiga@chromium.org> [modify] https://crrev.com/aba1cd0b57a2cf2f6bf9c77be4223c0f866effbb/src/chromiumos/tast/local/bundles/cros/arc/boot.go [modify] https://crrev.com/aba1cd0b57a2cf2f6bf9c77be4223c0f866effbb/src/chromiumos/tast/local/bundles/cros/arc/downloads.go [modify] https://crrev.com/aba1cd0b57a2cf2f6bf9c77be4223c0f866effbb/src/chromiumos/tast/local/arc/arc.go [modify] https://crrev.com/aba1cd0b57a2cf2f6bf9c77be4223c0f866effbb/src/chromiumos/tast/local/bundles/cros/arcapp/sample.go [modify] https://crrev.com/aba1cd0b57a2cf2f6bf9c77be4223c0f866effbb/src/chromiumos/tast/local/bundles/cros/arc/intent_forward.go
,
Aug 21
CL to increase timeout was chumped with sheriff's approval: https://chromium-review.googlesource.com/c/chromiumos/platform/tast-tests/+/1182914 I'll keep eyes on edgar-paladin runs. It is still not known why logcat.txt was zero byte. I've filed a bug to track this: issue 876191
,
Aug 21
edgar-paladin #3723 passed... while it took as long as 1m45s to start Android :( 2018/08/21 04:56:42 Started test arc.Boot 2018/08/21 04:56:42 [04:56:42.066] Restarting ui job 2018/08/21 04:56:44 [04:56:44.457] Waiting for org.chromium.SessionManager D-Bus service 2018/08/21 04:56:44 [04:56:44.482] Asking session_manager to enable Chrome testing 2018/08/21 04:56:44 [04:56:44.490] Waiting for Chrome to write its debugging port to /home/chronos/DevToolsActivePort 2018/08/21 04:56:47 [04:56:46.603] Removing cryptohome for testuser@gmail.com 2018/08/21 04:56:47 [04:56:46.663] Finding OOBE DevTools target 2018/08/21 04:56:47 [04:56:46.863] Connecting to Chrome at ws://127.0.0.1:44524/devtools/page/B7E87AB6B4BFC4A40714D5A98C053340 2018/08/21 04:56:47 [04:56:47.131] Waiting for OOBE 2018/08/21 04:56:52 [04:56:51.733] Logging in as user "testuser@gmail.com" 2018/08/21 04:56:52 [04:56:51.932] Waiting for cryptohome /home/user/541e36fe6e72df65cd9067b669d00c7ed9f190fb 2018/08/21 04:56:55 [04:56:55.319] Waiting for OOBE to be dismissed 2018/08/21 04:56:59 [04:56:59.011] Waiting for Android boot 2018/08/21 04:58:44 Completed test arc.Boot in 2m1.791s with 0 error(s) https://storage.cloud.google.com/chromeos-autotest-results/229698028-chromeos-test/chromeos2-row5-rack7-host15/tast/results/tests/arc.Boot/log.txt?_ga=2.231169050.-409884876.1534829544 I declare this issue as fixed, but I feel we need to watch duration of arc.Boot across models/boards. Some DUTs of edgar seem slower than other SKUs...
,
Aug 21
Do we track the time taken to reach significant milestones while Android is booting, similar to Chrome OS's bootstat timestamps? It seems important to know which part(s) of boot are being so variable here, as this may affect end users as well. I can file a separate bug to track investigating this if that seems better.
,
Aug 21
We do. Please add khmel@ and yusukes@ to the bug.
,
Aug 21
Thanks, I've done some rudimentary logs analysis and filed issue 876356 to track the root cause. |
||||||||
►
Sign in to add a comment |
||||||||
Comment 1 by derat@chromium.org
, Aug 20