New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.
Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Mar 5
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocked on:
issue 816785



Sign in to add a comment

nautilus-release fails on cheets test ( adb is not ready in 60 seconds.)

Project Member Reported by norvez@chromium.org, Feb 21 Back to list

Issue description

It's happened in 3 out of that last 4 builds: https://luci-milo.appspot.com/buildbot/chromeos/nautilus-release/393 (the other build failed with an infra issue)


"

[Test-Logs]: cheets_CTS_N.CtsAccelerationTestCases: ERROR: Command </tmp/autotest-tradefed-install_ulPPTA/fad41a1ed49f9b27075a6dea983c3c70/android-cts-7.1_r14-linux_x86-arm/android-cts/tools/cts-tradefed run commandAndExit cts --module CtsAccelerationTestCases --precondition-arg skip-media-download --disable-reboot --logcat-on-failure> failed, rc=-15, Command(s) did not complete within 360 seconds
[Test-Logs]: cheets_CTS_N.CtsDramTestCases: ERROR: Command </tmp/autotest-tradefed-install_czw4sQ/fad41a1ed49f9b27075a6dea983c3c70/android-cts-7.1_r14-linux_x86-arm/android-cts/tools/cts-tradefed run commandAndExit cts --module CtsDramTestCases --precondition-arg skip-media-download --disable-reboot --logcat-on-failure> failed, rc=-15, Command(s) did not complete within 360 seconds
[Test-Logs]: cheets_CameraOrientation: retry_count: 2, FAIL: adb is not ready in 60 seconds.
[Test-Logs]: cheets_ContainerSmokeTest: FAIL: adb is not ready in 60 seconds.
[Test-Logs]: cheets_NotificationTest: FAIL: adb is not ready in 60 seconds.
"

Assigning to the constable for triage
 
Cc: yawano@chromium.org kinaba@chromium.org
This may have actually started happening a while ago, around build nautilus-release build 312 (Chrome OS 10339.0.0).

I've also been asked to look at an issue which started around the same time where CtsMonkeyTestCases has become much flakier than normal across all boards, which seems like it is going to be reproduce (still passes most of the time). But that doesn't mean that bug is the same as this one.

I don't have a nautilus at my desk, but I'll look into borrowing one, since this bug seems to be happening every time.
Status: Started
nautilus borrowed. Putting it in dev mode and doing a local build now.
I synced to the manifest for R66-10425.0.0 as used by nautilus-release build 394, and built (./build_packages --withdev) and deployed the image.

   test_that ... cheets_CTS_N.7.1_r14.x86.CtsAccelerationTestCases

ran successfully twice.

I'll try running it a few more times, but that will be tomorrow.




Oops. I'll try the arm version. I made a bad assumption that since it was an x86 device I needed to run the "x86" test.
Yeah, as you might have guessed, "arm" is the version that goes through the binary translation that also matters.

Another thing I found is that before 10339 (#312), cheets_* tests look just skipped. So the problem might have been from the beginning.
I kicked off a repeat run yesterday with "test_that --board=nautilus $DUT_IP cheets_CTS_N.7.1_r14.arm.CtsAccelerationTestCases --iterations 100", and looked at the results today.

I found I had a weird cluster of failures. Iterations 1-30 all passed. 31 failed. 32 passed. 33 failed, 34 passed, 35 failed, and then 36-100 passed.

I was then able to look at the files in my chroot /tmp directory, which included ~200 "/tmp/tradefed_global_log_NNNNNNNNN" files with logcat output. Sorting the files in /tmp by modified time to locate the ones I was interested in.

One of the things that immediately jumped out after sorting was that the file /tmp/oauth2_client-tokencache.xxxxx  happened to be modified right before the failures happened. This seems like a pretty big coincidence, and might be what triggered it.

The actual failures are attached, along with a sample "good" log from an earlier run.

The three failures are all similar, and all three indicate an error with the ADB connection to the device:

02-22 21:56:05 D/CommandScheduler: Checking invocations...
02-22 21:56:08 D/AndroidNativeDevice: Device 100.127.25.254:22 state is now NOT_AVAILABLE
02-22 21:56:18 E/DeviceMonitor: Adb connection Error:EOF

which lead to a repeated string of errors trying to reconnect to the device before the run is aborted.


good.tradefed_global_log_5124437662720619620.txt
3.6 KB View Download
first.bad.tradefed_global_log_4232993286939924998.txt
30.9 KB View Download
second.bad.tradefed_global_log_4736322367399684692.txt
30.8 KB View Download
third.bad.tradefed_global_log_7232672786979923296.txt
30.4 KB View Download
Cc: jcliang@chromium.org wuchengli@chromium.org
Components: Platform>ARC
I'm wondering if this is caused by unavailable camera on Nautilus (b/72616688?)




For the "adb is not ready in 60 seconds" errors in non-CTS cheets_ tests,
the test is waiting for ArcAppLauncher to startup for 60 seconds, which is a response to the BOOT_COMPLETED broadcast.

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/179317888-chromeos-test/chromeos2-row8-rack9-host2/cheets_AndroidToChromeIntents/

// test DEBUG log
02/25 14:03:58.438 DEBUG|             utils:0214| Running 'android-sh -c 'logcat -d -b events *:S arc_system_event''
02/25 14:03:58.998 DEBUG|             utils:0214| Running 'android-sh -c 'logcat -d -b events *:S arc_system_event''
...
02/25 14:04:58.407 DEBUG|             utils:0214| Running 'android-sh -c 'logcat -d -b events *:S arc_system_event''
02/25 14:04:58.458 ERROR|             utils:2753| Will raise error TestFail('adb is not ready in 60 seconds.',) due to unexpected return: False

https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/client/common_lib/cros/arc.py?sq&q=arc_system_event+package:%5Echromeos_public$&l=271

    log = _android_shell(
        'logcat -d -b events *:S arc_system_event', ignore_status=True)
    return 'ArcAppLauncher:started' in log

But, the BOOT_COMPLETE event is stuck in com.android.camera2 because the camera service is not available...

// logcat in android-dumpstate.log
02-25 14:04:05.159    69   421 D ActivityManager: Sending BOOT_COMPLETE user #0
...
02-25 14:04:05.175    69   332 I ActivityManager: Start proc 1065:com.android.camera2/u0a43 for broadcast com.android.camera2/com.android.camera.DisableCameraReceiver
02-25 14:04:05.262  1065  1065 I ServiceManager: Waiting for service media.camera...
02-25 14:04:05.365  1065  1065 I ServiceManager: Waiting for service media.camera...
02-25 14:04:05.467  1065  1065 I ServiceManager: Waiting for service media.camera...
02-25 14:04:05.599  1065  1065 I ServiceManager: Waiting for service media.camera...
02-25 14:04:05.711  1065  1065 I ServiceManager: Waiting for service media.camera...
...
02-25 14:05:01.620  1065  1065 W ServiceManager: Service media.camera didn't start. Returning NULL
02-25 14:05:01.620  1065  1065 W CameraBase: CameraService not published, waiting...
02-25 14:05:02.121  1065  1065 I ServiceManager: Waiting for service media.camera...
02-25 14:05:02.221  1065  1065 I ServiceManager: Waiting for service media.camera...
02-25 14:05:02.322  1065  1065 I ServiceManager: Waiting for service media.camera...
02-25 14:05:02.422  1065  1065 I ServiceManager: Waiting for service media.camera...
02-25 14:05:02.523  1065  1065 I ServiceManager: Waiting for service media.camera...
...

and after, coincidentally, the 60 seconds of timeout limit, the ActivityManager gives up and sends the boradcast to other services including ArcAppLauncher,
but apparently it was too late.

...
02-25 14:05:05.170    69   104 W BroadcastQueue: Timeout of broadcast BroadcastRecord{feb38d9 u0 android.intent.action.BOOT_COMPLETED} - receiver=android.os.BinderProxy@34f829e, started 60006ms ago
02-25 14:05:05.170    69   104 W BroadcastQueue: Receiver during timeout: ResolveInfo{6daf17f com.android.camera2/com.android.camera.DisableCameraReceiver m=0x108000}
...
02-25 14:05:05.260   436   436 I ArcAppLauncherStarter: Event received: Intent { act=android.intent.action.BOOT_COMPLETED flg=0x9000010 cmp=org.chromium.arc.applauncher/.AppLauncherStarter (has extras) }.
02-25 14:05:05.263   354   354 I ArcBackupSettingsService: Starting service...
02-25 14:05:05.266   372   372 I ArcCastReceiverService: Starting service...
02-25 14:05:05.269   379   379 I ArcFileSystemService: Starting service...
02-25 14:05:05.274   449   449 I ArcIntentHelperBootCompletedReceiver: Starting the service...
02-25 14:05:05.276   449   449 I ArcIntentHelperService: Starting service...
02-25 14:05:05.279   429   429 I ArcTts  : Starting service...
02-25 14:05:05.305   449   449 I ArcIntentHelperService: Handling android.intent.action.BOOT_COMPLETED...
For a historical reason, CTS tests are waiting for a different event to detect ARC bootup,
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/master/server/cros/tradefed_test.py#229
which is not affected. (In the log, BOOT_COMPLETED broadcast timeout is still observed.)


Even in the passing x86 CTS runs, the log records that the "testCollectDeviceInfo" step had crashed:

https://stainless.corp.google.com/search?exclude_retried=true&first_date=2018-02-01&master_builder_name=&builder_name_number=&shard=&exclude_acts=true&builder_name=&master_builder_name_number=&owner=&retry=&exclude_cts=false&exclude_non_production=true&hostname=&board=nautilus&test=cheets_CTS_N.7&exclude_not_run=false&build=%5ER66%5C-10438%5C.0%5C.0%24&status=GOOD&reason=&waterfall=&suite=&last_date=2018-02-28&exclude_non_release=true&exclude_au=true&model=%5Enautilus%24&view=list

02/25 13:46:45.125 DEBUG|             utils:0282| [stdout] 02-25 13:46:45 I/DeviceInfoCollector: Instrumenting package com.android.compatibility.common.deviceinfo:
02/25 13:53:30.086 DEBUG|             utils:0282| [stdout] 02-25 13:53:30 W/DeviceInfoCollector: Target preparation step testCollectDeviceInfo failed.
02/25 13:53:30.086 DEBUG|             utils:0282| [stdout] Test failed to run to completion. Reason: 'Instrumentation run failed due to 'Process crashed.''. Check device logcat for details

and the "crash" is a forced timeout of a test process waiting for the camera service again:

02-25 21:46:51.339  1557  1689 I TestRunner: started: testCollectDeviceInfo(com.android.cts.deviceinfo.CameraDeviceInfo)
02-25 21:46:51.345  1557  1557 I MonitoringInstr: Activities that are still in CREATED to STOPPED: 0
02-25 21:46:51.348  1557  1689 I ServiceManager: Waiting for service media.camera...
02-25 21:46:56.365  1557  1689 W ServiceManager: Service media.camera didn't start. Returning NULL
02-25 21:46:56.365  1557  1689 W CameraBase: CameraService not published, waiting...
02-25 21:46:56.865  1557  1689 I ServiceManager: Waiting for service media.camera...
02-25 21:46:56.965  1557  1689 I ServiceManager: Waiting for service media.camera...
02-25 21:46:57.066  1557  1689 I ServiceManager: Waiting for service media.camera...
02-25 21:46:57.166  1557  1689 I ServiceManager: Waiting for service media.camera...
02-25 21:46:57.267  1557  1689 I ServiceManager: Waiting for service media.camera...
...
02-25 13:52:44.413  1557  1689 W ServiceManager: Service media.camera didn't start. Returning NULL
02-25 13:52:44.413  1557  1689 W CameraBase: CameraService not published, waiting...
02-25 13:52:44.914  1557  1689 I ServiceManager: Waiting for service media.camera...
02-25 13:52:45.014  1557  1689 I ServiceManager: Waiting for service media.camera...
02-25 13:52:45.115  1557  1689 I ServiceManager: Waiting for service media.camera...
02-25 13:52:45.215  1557  1689 I ServiceManager: Waiting for service media.camera...
02-25 13:52:45.316  1557  1689 I ServiceManager: Waiting for service media.camera...
02-25 13:52:49.938  1557  1689 W ServiceManager: Service media.camera didn't start. Returning NULL
02-25 13:52:49.938  1557  1689 W CameraBase: CameraService not published, waiting...
02-25 13:52:50.439  1557  1689 I ServiceManager: Waiting for service media.camera...
02-25 13:52:50.539  1557  1689 I ServiceManager: Waiting for service media.camera...
02-25 13:52:50.639  1557  1689 I ServiceManager: Waiting for service media.camera...
02-25 13:52:50.740  1557  1689 I ServiceManager: Waiting for service media.camera...
02-25 13:52:51.339  1557  1571 E AndroidTestSuite: Failed to to stop test execution thread, the correctness of the test runner is at risk. Abort all execution!
02-25 13:52:51.340  1557  1571 E AndroidTestSuite: Exception from the execution thread
02-25 13:52:51.340  1557  1571 E AndroidTestSuite: java.util.concurrent.TimeoutException
02-25 13:52:51.340  1557  1571 E AndroidTestSuite: 	at java.util.concurrent.FutureTask.get(FutureTask.java:177)
02-25 13:52:51.340  1557  1571 E AndroidTestSuite: 	at android.support.test.internal.runner.junit3.AndroidTestSuite.runTestWithTimeout(AndroidTestSuite.java:121)
02-25 13:52:51.340  1557  1571 E AndroidTestSuite: 	at android.support.test.internal.runner.junit3.AndroidTestSuite.runTestsWithTimeout(AndroidTestSuite.java:80)
02-25 13:52:51.340  1557  1571 E AndroidTestSuite: 	at android.support.test.internal.runner.junit3.AndroidTestSuite.run(AndroidTestSuite.java:66)
02-25 13:52:51.340  1557  1571 E AndroidTestSuite: 	at android.support.test.internal.runner.junit3.JUnit38ClassRunner.run(JUnit38ClassRunner.java:103)
02-25 13:52:51.340  1557  1571 E AndroidTestSuite: 	at org.junit.runners.Suite.runChild(Suite.java:128)
02-25 13:52:51.340  1557  1571 E AndroidTestSuite: 	at org.junit.runners.Suite.runChild(Suite.java:27)
02-25 13:52:51.340  1557  1571 E AndroidTestSuite: 	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
02-25 13:52:51.340  1557  1571 E AndroidTestSuite: 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
02-25 13:52:51.340  1557  1571 E AndroidTestSuite: 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
02-25 13:52:51.340  1557  1571 E AndroidTestSuite: 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
02-25 13:52:51.340  1557  1571 E AndroidTestSuite: 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
02-25 13:52:51.340  1557  1571 E AndroidTestSuite: 	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
02-25 13:52:51.340  1557  1571 E AndroidTestSuite: 	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
02-25 13:52:51.340  1557  1571 E AndroidTestSuite: 	at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
02-25 13:52:51.340  1557  1571 E AndroidTestSuite: 	at android.support.test.internal.runner.TestExecutor.execute(TestExecutor.java:59)
02-25 13:52:51.340  1557  1571 E AndroidTestSuite: 	at android.support.test.runner.AndroidJUnitRunner.onStart(AndroidJUnitRunner.java:272)
02-25 13:52:51.340  1557  1571 E AndroidTestSuite: 	at android.app.Instrumentation$InstrumentationThread.run(Instrumentation.java:1925)
--------- beginning of crash
02-25 13:52:51.341  1557  2252 E AndroidRuntime: FATAL EXCEPTION: Terminator

I'm not sure what's the difference between ARM and x86 CTSs, though.
Cc: edcourtney@chromium.org djacobo@chromium.org
+constables this week


The camera server process (in Android) is waiting forever at:

"cameraserver" sysTid=59
  #00 pc 00000a69  [vdso:f3607000] (__kernel_vsyscall+9)
  #01 pc 0002efa8  /system/lib/libc.so (syscall+40)
  #02 pc 00090da1  /system/lib/libc.so (pthread_cond_wait+97)
  #03 pc 000f0e37  /system/lib/libchrome.so (_ZN4base17ConditionVariable4WaitEv+39)
  #04 pc 000f1855  /system/lib/libchrome.so (_ZN4base13WaitableEvent14TimedWaitUntilERKNS_9TimeTicksE+341)
  #05 pc 000f16e3  /system/lib/libchrome.so (_ZN4base13WaitableEvent4WaitEv+67)
  #06 pc 0002ba3b  /system/lib/hw/camera.cheets_v3.so (_ZN7android9CameraHal20startCameraModuleIpcEv+1979)
  #07 pc 0002afa8  /system/lib/hw/camera.cheets_v3.so (_ZN7android9CameraHal11initMojoIpcEv+136)
  #08 pc 0002aefb  /system/lib/hw/camera.cheets_v3.so (_ZN7android9CameraHalC1Ev+427)
  #09 pc 0002d707  /system/lib/hw/camera.cheets_v3.so
  #10 pc 000751a9  /system/lib/libcameraservice.so (_ZN7android12CameraModule4initEv+121)
  #11 pc 00056104  /system/lib/libcameraservice.so (_ZN7android13CameraService10onFirstRefEv+260)
  #12 pc 00056d25  /system/lib/libcameraservice.so (_ZTv0_n16_N7android13CameraService10onFirstRefEv+37)
  #13 pc 0000d712  /system/lib/libutils.so (_ZNK7android7RefBase9incStrongEPKv+66)
  #14 pc 00000af6  /system/bin/cameraserver
  #15 pc 000009d5  /system/bin/cameraserver
  #16 pc 00016dba  /system/lib/libc.so (__libc_init+90)
  #17 pc 00000862  /system/bin/cameraserver


So, according to my guess #8, if we slightly increase the timeout of this line
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/master/client/common_lib/cros/arc.py#30
to 80sec etc, non-camera tests will probably pass. But since camera related tests (including CTS) are left affected, it won't unblock the release builder anyway. Real fix will be needed here.
The camera HAL could not find any cameras on the DUT. I checked a nautilus DUT in the lab that failed the CTS test and it has firmware 10300.0.0, which is probably too old and did not have the camera module enabled.

The firmware updater in latest canary image has 10406.0.0 firmware. I've verified on my Nautilus that with 10300.0.0 the camera is not probed, and with 10406.0.0 the camera works. We should update the nautilus DUTs in the lab to firmware 10406.0.0 as well.
Cc: furquan@chromium.org
+furquan@ (as the firmware expert, who helped us Soraka lab firmware uprev (bug 809156)),
from the above reasons I'm thinking about requesting Nautilus firmware uprev in the lab (to 10406.0.0).
Let me know beforehand, if there are any known issues around the recent-ish firmware versions.


Re #11:
OK, thanks. I'll then try to file a firmware uprev ticket (which may be a bit tricky because the usual step
https://sites.google.com/a/google.com/chromeos/for-team-members/infrastructure/firmware-updates
requires us to specify a green BVT image with a necessary firmware built-in... but the chicken&egg here is that exactly because of this bug we have had never passed BVT recently :p. We'll be able to argue though.)
Owner: kinaba@chromium.org
Re#12: It should be okay to uprev the firmware to 10406.0.0. kinaba@ can you please raise a request for the same.
Cc: ejcaruso@chromium.org matthewmwang@chromium.org
Blockedon: 816785
Filed bug 816785 for the firmware uprev request.
Status: Fixed
The firmware roll had happened, and at least it passed the build once:
https://uberchromegw.corp.google.com/i/chromeos/builders/nautilus-release


Still seeing red builds often, but they are marked as
** HWTest did not complete due to infrastructure issues (code 3) **
not the adb timeout issues anymore. Closing the current bug. If the next issue persists, let's file a new bug and start a new investigation.

Sign in to add a comment