New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 875952 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Aug 21
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

arc.Boot timing out sporadically on edgar-paladin

Project Member Reported by derat@chromium.org, Aug 20

Issue description

There have been a few recent arc.Boot failures on edgar-paladin:

---

R70-10983.0.0-rc1
http://stainless/browse/chromeos-autotest-results/229257815-chromeos-test/

2018/08/20 02:13:01 Started test arc.Boot
2018/08/20 02:13:01 [02:13:01.607] Restarting ui job
2018/08/20 02:13:04 [02:13:03.964] Waiting for org.chromium.SessionManager D-Bus service
2018/08/20 02:13:04 [02:13:03.987] Asking session_manager to enable Chrome testing
2018/08/20 02:13:04 [02:13:03.989] Waiting for Chrome to write its debugging port to /home/chronos/DevToolsActivePort
2018/08/20 02:13:05 [02:13:05.573] Removing cryptohome for testuser@gmail.com
2018/08/20 02:13:05 [02:13:05.634] Finding OOBE DevTools target
2018/08/20 02:13:05 [02:13:05.890] Connecting to Chrome at ws://127.0.0.1:36842/devtools/page/3FE9262D47055348F6797F752FD90909
2018/08/20 02:13:06 [02:13:06.140] Waiting for OOBE
2018/08/20 02:13:10 [02:13:10.828] Logging in as user "testuser@gmail.com"
2018/08/20 02:13:11 [02:13:11.095] Waiting for cryptohome /home/user/17d0c63761779443dba903f3ddc583101c2635a1
2018/08/20 02:13:14 [02:13:14.589] Waiting for OOBE to be dismissed
2018/08/20 02:13:17 [02:13:17.272] Waiting for Android boot
2018/08/20 02:14:47 [02:14:47.260] Error at boot.go:39: Failed to start ARC: failed waiting for ro.arc.boot_completed=1 (system_server crashed before BOOT_COMPLETED?): context deadline exceeded

---

R70-10984.0.0-rc2
http://stainless/browse/chromeos-autotest-results/229416389-chromeos-test/

2018/08/20 11:01:23 Started test arc.Boot
2018/08/20 11:01:23 [11:01:22.546] Restarting ui job
2018/08/20 11:01:24 [11:01:24.206] Waiting for org.chromium.SessionManager D-Bus service
2018/08/20 11:01:24 [11:01:24.235] Asking session_manager to enable Chrome testing
2018/08/20 11:01:24 [11:01:24.240] Waiting for Chrome to write its debugging port to /home/chronos/DevToolsActivePort
2018/08/20 11:01:26 [11:01:25.431] Removing cryptohome for testuser@gmail.com
2018/08/20 11:01:26 [11:01:25.491] Finding OOBE DevTools target
2018/08/20 11:01:26 [11:01:25.705] Connecting to Chrome at ws://127.0.0.1:41969/devtools/page/42EA0BE5210D9D81B333B30D90F48DDD
2018/08/20 11:01:26 [11:01:25.845] Waiting for OOBE
2018/08/20 11:01:31 [11:01:30.825] Logging in as user "testuser@gmail.com"
2018/08/20 11:01:31 [11:01:31.065] Waiting for cryptohome /home/user/4cdbfcc11f0db7f2af88406419be4aef5753628e
2018/08/20 11:01:34 [11:01:33.714] Waiting for OOBE to be dismissed
2018/08/20 11:01:37 [11:01:36.947] Waiting for Android boot
2018/08/20 11:03:07 [11:03:06.929] Error at boot.go:39: Failed to start ARC: failed waiting for ro.arc.boot_completed=1 (system_server crashed before BOOT_COMPLETED?): context deadline exceeded

---

I don't know enough about what arc.log is supposed to look like to know what went wrong here. There's a successful run at http://stainless/browse/chromeos-autotest-results/229342986-chromeos-test/ .

It's suspicious that both failures were on edgar. If this system is expected to be slower than others, we may need to increase the timeout for ARC startup.
 
I also recently committed https://crrev.com/c/1180650 to make this test watch for ro.arc.boot_completed rather than org.chromium.arc.boot after the property was renamed on the ARC side. I wonder if there's any chance that edgar is still using an older build of ARC that sets org.chromium.arc.boot. I think that this is unlikely since there's also been a successful edgar run, but I figured I should mention it.
Cc: jrbarnette@chromium.org dgarr...@chromium.org ihf@chromium.org
In the passing edgar run at http://stainless/browse/chromeos-autotest-results/229342986-chromeos-test/, the test waited 51 seconds for boot:

2018/08/20 06:28:50 [06:28:48.956] Waiting for Android boot
2018/08/20 06:29:41 Completed test arc.Boot in 1m6.836s with 0 error(s)

In the new edgar failure at http://stainless/browse/chromeos-autotest-results/229477610-chromeos-test/, the test waited 90 seconds before timing out:

2018/08/20 15:39:11 [15:39:10.758] Waiting for Android boot
2018/08/20 15:40:41 [15:40:40.751] Error at boot.go:39: Failed to start ARC: failed waiting for ro.arc.boot_completed=1 (system_server crashed before BOOT_COMPLETED?): context deadline exceeded

In the successful sentry run at http://stainless/browse/chromeos-autotest-results/229342198-chromeos-test/, the test only had to wait for 28 seconds here:

2018/08/20 06:29:05 [06:29:04.614] Waiting for Android boot
2018/08/20 06:29:33 Completed test arc.Boot in 39.697s with 0 error(s)

It seems to me like ARC is either inconsistently slow on edgar or inconsistently failing to boot entirely.

Is this a known issue? I could possibly try moving the Tast BVT suites to a different strago board, e.g. cyan (which bvt-arc currently uses).
Cc: lhchavez@chromium.org yusukes@chromium.org
Owner: levarum@chromium.org
Something that looks very weird is that the failing runs don't have logcat.txt contents at all. The file is there, but is empty.

I'm not yet familiar enough with Tast to know why this is happening, but assigning to the current PST constable (sorry, Lev!) and cc-ing yusukes@ to see if he has any ideas.
Owner: nya@chromium.org
Thanks for spotting the issue. I'll try to investigate this today, but I might be delayed since I'm attending ARC summit. If we can't figure out fix soon, we may want to mark arc.Boot as informational to unblock CQ.

That said, it is fishy that all three failed runs on edgar left 0-byte logcat.txt. OTOH, ps.txt shows good Android process trees. My guess is that logcat and getprop are not working well (android-sh?). I'll investigate...

Cc: derat@chromium.org tfiga@chromium.org
 Issue 876188  has been merged into this issue.
(Just as an aside: marking the test as informational might cause problems; IIRC "tast run" will exit with an error if the supplied test expressions don't match any tests since that would usually indicate an incorrect expression or missing test bundles. If we want to disable the test, we may need to comment out the suite attributes in server/site_tests/tast/control.mustpass-android instead.)
Status: Started (was: Assigned)
Hmm. It is possible DUT is just slow.

For example, chromeos2-row5-rack2-host5 is a DUT where edgar-paladin/3721 failed:
https://stainless.corp.google.com/browse/chromeos-autotest-results/229536081-chromeos-test/

I tried running arc.Boot on the DUT (after making sure CQ is not running in parallel).

$ tast -verbose run -build=false chromeos2-row5-rack2-host5.cros arc.Boot
...
2018/08/21 14:01:40 Started test arc.Boot
...
2018/08/21 14:03:24 [22:03:23.993] Error at boot.go:39: Failed to start ARC: failed waiting for ro.arc.boot_completed=1 (system_server crashed before BOOT_COMPLETED?): context deadline exceeded
...

Even after the failure Android continued to boot on DUT. And actually boot event were emitted (slightly) later:

# android-sh -c 'logcat -b events -d' | grep arc_system_event
08-20 22:03:24.160   976   976 I arc_system_event: ArcAppLauncher:started
08-20 22:03:24.455   990  2028 I arc_system_event: ArcIntentHelperService:ready

Let's increase the timeout for now and see how it goes.

CL to increase timeout was chumped with sheriff's approval: https://chromium-review.googlesource.com/c/chromiumos/platform/tast-tests/+/1182914
I'll keep eyes on edgar-paladin runs.

It is still not known why logcat.txt was zero byte. I've filed a bug to track this:  issue 876191 

Status: Fixed (was: Started)
edgar-paladin #3723 passed... while it took as long as 1m45s to start Android :(

2018/08/21 04:56:42 Started test arc.Boot
2018/08/21 04:56:42 [04:56:42.066] Restarting ui job
2018/08/21 04:56:44 [04:56:44.457] Waiting for org.chromium.SessionManager D-Bus service
2018/08/21 04:56:44 [04:56:44.482] Asking session_manager to enable Chrome testing
2018/08/21 04:56:44 [04:56:44.490] Waiting for Chrome to write its debugging port to /home/chronos/DevToolsActivePort
2018/08/21 04:56:47 [04:56:46.603] Removing cryptohome for testuser@gmail.com
2018/08/21 04:56:47 [04:56:46.663] Finding OOBE DevTools target
2018/08/21 04:56:47 [04:56:46.863] Connecting to Chrome at ws://127.0.0.1:44524/devtools/page/B7E87AB6B4BFC4A40714D5A98C053340
2018/08/21 04:56:47 [04:56:47.131] Waiting for OOBE
2018/08/21 04:56:52 [04:56:51.733] Logging in as user "testuser@gmail.com"
2018/08/21 04:56:52 [04:56:51.932] Waiting for cryptohome /home/user/541e36fe6e72df65cd9067b669d00c7ed9f190fb
2018/08/21 04:56:55 [04:56:55.319] Waiting for OOBE to be dismissed
2018/08/21 04:56:59 [04:56:59.011] Waiting for Android boot
2018/08/21 04:58:44 Completed test arc.Boot in 2m1.791s with 0 error(s)

https://storage.cloud.google.com/chromeos-autotest-results/229698028-chromeos-test/chromeos2-row5-rack7-host15/tast/results/tests/arc.Boot/log.txt?_ga=2.231169050.-409884876.1534829544

I declare this issue as fixed, but I feel we need to watch duration of arc.Boot across models/boards. Some DUTs of edgar seem slower than other SKUs...

Cc: elijahtaylor@chromium.org
Do we track the time taken to reach significant milestones while Android is booting, similar to Chrome OS's bootstat timestamps? It seems important to know which part(s) of boot are being so variable here, as this may affect end users as well. I can file a separate bug to track investigating this if that seems better.
Cc: khmel@chromium.org
We do. Please add khmel@ and yusukes@ to the bug.
Thanks, I've done some rudimentary logs analysis and filed issue 876356 to track the root cause.

Sign in to add a comment