nautilus-release fails on cheets test ( adb is not ready in 60 seconds.) |
|||||||||
Issue descriptionIt's happened in 3 out of that last 4 builds: https://luci-milo.appspot.com/buildbot/chromeos/nautilus-release/393 (the other build failed with an infra issue) " [Test-Logs]: cheets_CTS_N.CtsAccelerationTestCases: ERROR: Command </tmp/autotest-tradefed-install_ulPPTA/fad41a1ed49f9b27075a6dea983c3c70/android-cts-7.1_r14-linux_x86-arm/android-cts/tools/cts-tradefed run commandAndExit cts --module CtsAccelerationTestCases --precondition-arg skip-media-download --disable-reboot --logcat-on-failure> failed, rc=-15, Command(s) did not complete within 360 seconds [Test-Logs]: cheets_CTS_N.CtsDramTestCases: ERROR: Command </tmp/autotest-tradefed-install_czw4sQ/fad41a1ed49f9b27075a6dea983c3c70/android-cts-7.1_r14-linux_x86-arm/android-cts/tools/cts-tradefed run commandAndExit cts --module CtsDramTestCases --precondition-arg skip-media-download --disable-reboot --logcat-on-failure> failed, rc=-15, Command(s) did not complete within 360 seconds [Test-Logs]: cheets_CameraOrientation: retry_count: 2, FAIL: adb is not ready in 60 seconds. [Test-Logs]: cheets_ContainerSmokeTest: FAIL: adb is not ready in 60 seconds. [Test-Logs]: cheets_NotificationTest: FAIL: adb is not ready in 60 seconds. " Assigning to the constable for triage
,
Feb 21 2018
nautilus borrowed. Putting it in dev mode and doing a local build now.
,
Feb 22 2018
I synced to the manifest for R66-10425.0.0 as used by nautilus-release build 394, and built (./build_packages --withdev) and deployed the image. test_that ... cheets_CTS_N.7.1_r14.x86.CtsAccelerationTestCases ran successfully twice. I'll try running it a few more times, but that will be tomorrow.
,
Feb 22 2018
Looking at https://stainless.corp.google.com/search?view=matrix&row=build&col=model&first_date=2018-02-01&last_date=2018-02-28&test=cheets_CTS_N.7&board=nautilus&status=GOOD&status=WARN&status=FAIL&status=ERROR&exclude_cts=false&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=true cheets_CTS_N.7.1_r14.x86.* looks stably passing while cheets_CTS_N.7.1_r14.arm.* is more affected
,
Feb 22 2018
Oops. I'll try the arm version. I made a bad assumption that since it was an x86 device I needed to run the "x86" test.
,
Feb 22 2018
Yeah, as you might have guessed, "arm" is the version that goes through the binary translation that also matters. Another thing I found is that before 10339 (#312), cheets_* tests look just skipped. So the problem might have been from the beginning.
,
Feb 23 2018
I kicked off a repeat run yesterday with "test_that --board=nautilus $DUT_IP cheets_CTS_N.7.1_r14.arm.CtsAccelerationTestCases --iterations 100", and looked at the results today. I found I had a weird cluster of failures. Iterations 1-30 all passed. 31 failed. 32 passed. 33 failed, 34 passed, 35 failed, and then 36-100 passed. I was then able to look at the files in my chroot /tmp directory, which included ~200 "/tmp/tradefed_global_log_NNNNNNNNN" files with logcat output. Sorting the files in /tmp by modified time to locate the ones I was interested in. One of the things that immediately jumped out after sorting was that the file /tmp/oauth2_client-tokencache.xxxxx happened to be modified right before the failures happened. This seems like a pretty big coincidence, and might be what triggered it. The actual failures are attached, along with a sample "good" log from an earlier run. The three failures are all similar, and all three indicate an error with the ADB connection to the device: 02-22 21:56:05 D/CommandScheduler: Checking invocations... 02-22 21:56:08 D/AndroidNativeDevice: Device 100.127.25.254:22 state is now NOT_AVAILABLE 02-22 21:56:18 E/DeviceMonitor: Adb connection Error:EOF which lead to a repeated string of errors trying to reconnect to the device before the run is aborted.
,
Feb 26 2018
I'm wondering if this is caused by unavailable camera on Nautilus (b/72616688?) For the "adb is not ready in 60 seconds" errors in non-CTS cheets_ tests, the test is waiting for ArcAppLauncher to startup for 60 seconds, which is a response to the BOOT_COMPLETED broadcast. https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/179317888-chromeos-test/chromeos2-row8-rack9-host2/cheets_AndroidToChromeIntents/ // test DEBUG log 02/25 14:03:58.438 DEBUG| utils:0214| Running 'android-sh -c 'logcat -d -b events *:S arc_system_event'' 02/25 14:03:58.998 DEBUG| utils:0214| Running 'android-sh -c 'logcat -d -b events *:S arc_system_event'' ... 02/25 14:04:58.407 DEBUG| utils:0214| Running 'android-sh -c 'logcat -d -b events *:S arc_system_event'' 02/25 14:04:58.458 ERROR| utils:2753| Will raise error TestFail('adb is not ready in 60 seconds.',) due to unexpected return: False https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/client/common_lib/cros/arc.py?sq&q=arc_system_event+package:%5Echromeos_public$&l=271 log = _android_shell( 'logcat -d -b events *:S arc_system_event', ignore_status=True) return 'ArcAppLauncher:started' in log But, the BOOT_COMPLETE event is stuck in com.android.camera2 because the camera service is not available... // logcat in android-dumpstate.log 02-25 14:04:05.159 69 421 D ActivityManager: Sending BOOT_COMPLETE user #0 ... 02-25 14:04:05.175 69 332 I ActivityManager: Start proc 1065:com.android.camera2/u0a43 for broadcast com.android.camera2/com.android.camera.DisableCameraReceiver 02-25 14:04:05.262 1065 1065 I ServiceManager: Waiting for service media.camera... 02-25 14:04:05.365 1065 1065 I ServiceManager: Waiting for service media.camera... 02-25 14:04:05.467 1065 1065 I ServiceManager: Waiting for service media.camera... 02-25 14:04:05.599 1065 1065 I ServiceManager: Waiting for service media.camera... 02-25 14:04:05.711 1065 1065 I ServiceManager: Waiting for service media.camera... ... 02-25 14:05:01.620 1065 1065 W ServiceManager: Service media.camera didn't start. Returning NULL 02-25 14:05:01.620 1065 1065 W CameraBase: CameraService not published, waiting... 02-25 14:05:02.121 1065 1065 I ServiceManager: Waiting for service media.camera... 02-25 14:05:02.221 1065 1065 I ServiceManager: Waiting for service media.camera... 02-25 14:05:02.322 1065 1065 I ServiceManager: Waiting for service media.camera... 02-25 14:05:02.422 1065 1065 I ServiceManager: Waiting for service media.camera... 02-25 14:05:02.523 1065 1065 I ServiceManager: Waiting for service media.camera... ... and after, coincidentally, the 60 seconds of timeout limit, the ActivityManager gives up and sends the boradcast to other services including ArcAppLauncher, but apparently it was too late. ... 02-25 14:05:05.170 69 104 W BroadcastQueue: Timeout of broadcast BroadcastRecord{feb38d9 u0 android.intent.action.BOOT_COMPLETED} - receiver=android.os.BinderProxy@34f829e, started 60006ms ago 02-25 14:05:05.170 69 104 W BroadcastQueue: Receiver during timeout: ResolveInfo{6daf17f com.android.camera2/com.android.camera.DisableCameraReceiver m=0x108000} ... 02-25 14:05:05.260 436 436 I ArcAppLauncherStarter: Event received: Intent { act=android.intent.action.BOOT_COMPLETED flg=0x9000010 cmp=org.chromium.arc.applauncher/.AppLauncherStarter (has extras) }. 02-25 14:05:05.263 354 354 I ArcBackupSettingsService: Starting service... 02-25 14:05:05.266 372 372 I ArcCastReceiverService: Starting service... 02-25 14:05:05.269 379 379 I ArcFileSystemService: Starting service... 02-25 14:05:05.274 449 449 I ArcIntentHelperBootCompletedReceiver: Starting the service... 02-25 14:05:05.276 449 449 I ArcIntentHelperService: Starting service... 02-25 14:05:05.279 429 429 I ArcTts : Starting service... 02-25 14:05:05.305 449 449 I ArcIntentHelperService: Handling android.intent.action.BOOT_COMPLETED...
,
Feb 26 2018
For a historical reason, CTS tests are waiting for a different event to detect ARC bootup, https://chromium.googlesource.com/chromiumos/third_party/autotest/+/master/server/cros/tradefed_test.py#229 which is not affected. (In the log, BOOT_COMPLETED broadcast timeout is still observed.) Even in the passing x86 CTS runs, the log records that the "testCollectDeviceInfo" step had crashed: https://stainless.corp.google.com/search?exclude_retried=true&first_date=2018-02-01&master_builder_name=&builder_name_number=&shard=&exclude_acts=true&builder_name=&master_builder_name_number=&owner=&retry=&exclude_cts=false&exclude_non_production=true&hostname=&board=nautilus&test=cheets_CTS_N.7&exclude_not_run=false&build=%5ER66%5C-10438%5C.0%5C.0%24&status=GOOD&reason=&waterfall=&suite=&last_date=2018-02-28&exclude_non_release=true&exclude_au=true&model=%5Enautilus%24&view=list 02/25 13:46:45.125 DEBUG| utils:0282| [stdout] 02-25 13:46:45 I/DeviceInfoCollector: Instrumenting package com.android.compatibility.common.deviceinfo: 02/25 13:53:30.086 DEBUG| utils:0282| [stdout] 02-25 13:53:30 W/DeviceInfoCollector: Target preparation step testCollectDeviceInfo failed. 02/25 13:53:30.086 DEBUG| utils:0282| [stdout] Test failed to run to completion. Reason: 'Instrumentation run failed due to 'Process crashed.''. Check device logcat for details and the "crash" is a forced timeout of a test process waiting for the camera service again: 02-25 21:46:51.339 1557 1689 I TestRunner: started: testCollectDeviceInfo(com.android.cts.deviceinfo.CameraDeviceInfo) 02-25 21:46:51.345 1557 1557 I MonitoringInstr: Activities that are still in CREATED to STOPPED: 0 02-25 21:46:51.348 1557 1689 I ServiceManager: Waiting for service media.camera... 02-25 21:46:56.365 1557 1689 W ServiceManager: Service media.camera didn't start. Returning NULL 02-25 21:46:56.365 1557 1689 W CameraBase: CameraService not published, waiting... 02-25 21:46:56.865 1557 1689 I ServiceManager: Waiting for service media.camera... 02-25 21:46:56.965 1557 1689 I ServiceManager: Waiting for service media.camera... 02-25 21:46:57.066 1557 1689 I ServiceManager: Waiting for service media.camera... 02-25 21:46:57.166 1557 1689 I ServiceManager: Waiting for service media.camera... 02-25 21:46:57.267 1557 1689 I ServiceManager: Waiting for service media.camera... ... 02-25 13:52:44.413 1557 1689 W ServiceManager: Service media.camera didn't start. Returning NULL 02-25 13:52:44.413 1557 1689 W CameraBase: CameraService not published, waiting... 02-25 13:52:44.914 1557 1689 I ServiceManager: Waiting for service media.camera... 02-25 13:52:45.014 1557 1689 I ServiceManager: Waiting for service media.camera... 02-25 13:52:45.115 1557 1689 I ServiceManager: Waiting for service media.camera... 02-25 13:52:45.215 1557 1689 I ServiceManager: Waiting for service media.camera... 02-25 13:52:45.316 1557 1689 I ServiceManager: Waiting for service media.camera... 02-25 13:52:49.938 1557 1689 W ServiceManager: Service media.camera didn't start. Returning NULL 02-25 13:52:49.938 1557 1689 W CameraBase: CameraService not published, waiting... 02-25 13:52:50.439 1557 1689 I ServiceManager: Waiting for service media.camera... 02-25 13:52:50.539 1557 1689 I ServiceManager: Waiting for service media.camera... 02-25 13:52:50.639 1557 1689 I ServiceManager: Waiting for service media.camera... 02-25 13:52:50.740 1557 1689 I ServiceManager: Waiting for service media.camera... 02-25 13:52:51.339 1557 1571 E AndroidTestSuite: Failed to to stop test execution thread, the correctness of the test runner is at risk. Abort all execution! 02-25 13:52:51.340 1557 1571 E AndroidTestSuite: Exception from the execution thread 02-25 13:52:51.340 1557 1571 E AndroidTestSuite: java.util.concurrent.TimeoutException 02-25 13:52:51.340 1557 1571 E AndroidTestSuite: at java.util.concurrent.FutureTask.get(FutureTask.java:177) 02-25 13:52:51.340 1557 1571 E AndroidTestSuite: at android.support.test.internal.runner.junit3.AndroidTestSuite.runTestWithTimeout(AndroidTestSuite.java:121) 02-25 13:52:51.340 1557 1571 E AndroidTestSuite: at android.support.test.internal.runner.junit3.AndroidTestSuite.runTestsWithTimeout(AndroidTestSuite.java:80) 02-25 13:52:51.340 1557 1571 E AndroidTestSuite: at android.support.test.internal.runner.junit3.AndroidTestSuite.run(AndroidTestSuite.java:66) 02-25 13:52:51.340 1557 1571 E AndroidTestSuite: at android.support.test.internal.runner.junit3.JUnit38ClassRunner.run(JUnit38ClassRunner.java:103) 02-25 13:52:51.340 1557 1571 E AndroidTestSuite: at org.junit.runners.Suite.runChild(Suite.java:128) 02-25 13:52:51.340 1557 1571 E AndroidTestSuite: at org.junit.runners.Suite.runChild(Suite.java:27) 02-25 13:52:51.340 1557 1571 E AndroidTestSuite: at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) 02-25 13:52:51.340 1557 1571 E AndroidTestSuite: at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) 02-25 13:52:51.340 1557 1571 E AndroidTestSuite: at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) 02-25 13:52:51.340 1557 1571 E AndroidTestSuite: at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) 02-25 13:52:51.340 1557 1571 E AndroidTestSuite: at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) 02-25 13:52:51.340 1557 1571 E AndroidTestSuite: at org.junit.runners.ParentRunner.run(ParentRunner.java:363) 02-25 13:52:51.340 1557 1571 E AndroidTestSuite: at org.junit.runner.JUnitCore.run(JUnitCore.java:137) 02-25 13:52:51.340 1557 1571 E AndroidTestSuite: at org.junit.runner.JUnitCore.run(JUnitCore.java:115) 02-25 13:52:51.340 1557 1571 E AndroidTestSuite: at android.support.test.internal.runner.TestExecutor.execute(TestExecutor.java:59) 02-25 13:52:51.340 1557 1571 E AndroidTestSuite: at android.support.test.runner.AndroidJUnitRunner.onStart(AndroidJUnitRunner.java:272) 02-25 13:52:51.340 1557 1571 E AndroidTestSuite: at android.app.Instrumentation$InstrumentationThread.run(Instrumentation.java:1925) --------- beginning of crash 02-25 13:52:51.341 1557 2252 E AndroidRuntime: FATAL EXCEPTION: Terminator I'm not sure what's the difference between ARM and x86 CTSs, though.
,
Feb 26 2018
+constables this week The camera server process (in Android) is waiting forever at: "cameraserver" sysTid=59 #00 pc 00000a69 [vdso:f3607000] (__kernel_vsyscall+9) #01 pc 0002efa8 /system/lib/libc.so (syscall+40) #02 pc 00090da1 /system/lib/libc.so (pthread_cond_wait+97) #03 pc 000f0e37 /system/lib/libchrome.so (_ZN4base17ConditionVariable4WaitEv+39) #04 pc 000f1855 /system/lib/libchrome.so (_ZN4base13WaitableEvent14TimedWaitUntilERKNS_9TimeTicksE+341) #05 pc 000f16e3 /system/lib/libchrome.so (_ZN4base13WaitableEvent4WaitEv+67) #06 pc 0002ba3b /system/lib/hw/camera.cheets_v3.so (_ZN7android9CameraHal20startCameraModuleIpcEv+1979) #07 pc 0002afa8 /system/lib/hw/camera.cheets_v3.so (_ZN7android9CameraHal11initMojoIpcEv+136) #08 pc 0002aefb /system/lib/hw/camera.cheets_v3.so (_ZN7android9CameraHalC1Ev+427) #09 pc 0002d707 /system/lib/hw/camera.cheets_v3.so #10 pc 000751a9 /system/lib/libcameraservice.so (_ZN7android12CameraModule4initEv+121) #11 pc 00056104 /system/lib/libcameraservice.so (_ZN7android13CameraService10onFirstRefEv+260) #12 pc 00056d25 /system/lib/libcameraservice.so (_ZTv0_n16_N7android13CameraService10onFirstRefEv+37) #13 pc 0000d712 /system/lib/libutils.so (_ZNK7android7RefBase9incStrongEPKv+66) #14 pc 00000af6 /system/bin/cameraserver #15 pc 000009d5 /system/bin/cameraserver #16 pc 00016dba /system/lib/libc.so (__libc_init+90) #17 pc 00000862 /system/bin/cameraserver So, according to my guess #8, if we slightly increase the timeout of this line https://chromium.googlesource.com/chromiumos/third_party/autotest/+/master/client/common_lib/cros/arc.py#30 to 80sec etc, non-camera tests will probably pass. But since camera related tests (including CTS) are left affected, it won't unblock the release builder anyway. Real fix will be needed here.
,
Feb 26 2018
The camera HAL could not find any cameras on the DUT. I checked a nautilus DUT in the lab that failed the CTS test and it has firmware 10300.0.0, which is probably too old and did not have the camera module enabled. The firmware updater in latest canary image has 10406.0.0 firmware. I've verified on my Nautilus that with 10300.0.0 the camera is not probed, and with 10406.0.0 the camera works. We should update the nautilus DUTs in the lab to firmware 10406.0.0 as well.
,
Feb 26 2018
+furquan@ (as the firmware expert, who helped us Soraka lab firmware uprev (bug 809156)), from the above reasons I'm thinking about requesting Nautilus firmware uprev in the lab (to 10406.0.0). Let me know beforehand, if there are any known issues around the recent-ish firmware versions. Re #11: OK, thanks. I'll then try to file a firmware uprev ticket (which may be a bit tricky because the usual step https://sites.google.com/a/google.com/chromeos/for-team-members/infrastructure/firmware-updates requires us to specify a green BVT image with a necessary firmware built-in... but the chicken&egg here is that exactly because of this bug we have had never passed BVT recently :p. We'll be able to argue though.)
,
Feb 26 2018
,
Feb 26 2018
Re#12: It should be okay to uprev the firmware to 10406.0.0. kinaba@ can you please raise a request for the same.
,
Feb 26 2018
,
Feb 27 2018
,
Feb 27 2018
Filed bug 816785 for the firmware uprev request.
,
Mar 5 2018
The firmware roll had happened, and at least it passed the build once: https://uberchromegw.corp.google.com/i/chromeos/builders/nautilus-release Still seeing red builds often, but they are marked as ** HWTest did not complete due to infrastructure issues (code 3) ** not the adb timeout issues anymore. Closing the current bug. If the next issue persists, let's file a new bug and start a new investigation. |
|||||||||
►
Sign in to add a comment |
|||||||||
Comment 1 by lpique@chromium.org
, Feb 21 2018