Nexus 5X bot failing to capture screenshot after switch to android-chromium and Release mode
Issue description
This swarmed Nexus 5X bot: https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20(Nexus%205X) is failing most tests after a couple of configuration changes:
- Running android-chromium instead of android-content-shell
- Running a Release build instead of a Debug build
I'm not sure which one, or whether both, broke things.
Basically all of the tests that capture screenshots are broken. The call from Telemetry to DevTools to capture the screenshot is returning None.
The same tests are passing on other bots, for example the Nexus 5: https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20(Nexus%205) . Therefore screenshots can't be completely broken with ChromePublic.apk.
This is the only Android GPU bot running these tests via Swarming, so it's possible that the deps or data_deps for e.g. src/tools/perf/chrome_telemetry_build/BUILD.gn are wrong. In particular, it's surprising to me that they don't depend on bitmaptools. I would expect better logging if a key component like that were missing and that was why the screenshot was failing.
,
Jun 9 2016
Ken: I think that's because each of these host machines is hooked up to 7 devices.
,
Jun 9 2016
I built the telemetry_gpu_test_run target with the gn args:
  dcheck_always_on = true
  ffmpeg_branding = "Chrome"
  goma_dir = "/b/build/slave/cache/cipd/goma"
  is_component_build = false
  is_debug = false
  proprietary_codecs = true
  symbol_level = 1
  target_cpu = "arm64"
  target_os = "android"
  use_goma = true
from: https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%205X%29/builds/5/steps/generate_build_files/logs/stdio
and ran one of the affected tests:
  ./content/test/gpu/run_gpu_test.py gpu_rasterization --browser=android-chromium
Visibly, the test failed to navigate to the target tab; it was stuck on about:blank forever. I didn't wait 5 minutes for it to time out. Something seems broken with Telemetry's page navigation on this device, though this doesn't look exactly like the failure mode on the bots.
,
Jun 9 2016
Building content_shell_apk with the same GN args and running: ./content/test/gpu/run_gpu_test.py gpu_rasterization --browser=android-content-shell fails too on my device, with Android reporting a couple of times that ContentShell has crashed. Maybe the failure was caused by the switch from Debug to Release+Asserts.
,
Jun 9 2016
On my Nexus 5X "./content/test/gpu/run_gpu_test.py gpu_rasterization --browser=android-chrome" built with the args in #3 passes.
,
Jun 9 2016
Looks like I didn't configure the proper environment for run_gpu_test to find my compiled browser, so it ran with a stock one and didn't offer the "--browser=android-chromium" option. I have now configured it correctly, and the test still passes with --browser=android-chromium.
,
Jun 9 2016
FWIW, "./content/test/gpu/run_gpu_test.py gpu_rasterization --browser=android-content-shell" also passed for me. I think the problem may be caused by old versions of Chrome that were not cleaned off the swarmed devices.
,
Jun 9 2016
./content/test/gpu/run_gpu_test.py maps --browser=android-chromium also passed on my device. It seems there is something different in the swarmed devices' configuration that causes screenshot capture to fail.
,
Jun 9 2016
All right! I got "Failure: Could not capture screenshot" on my device as well! It happens when the screen is off. So, the solution should be to keep swarmed devices awake.
,
Jun 9 2016
In that case, this is likely an issue with how swarming sets up its devices.
,
Jun 9 2016
I'll configure swarming to turn device screens on right before a task. I'll also look into killing any chrome-related processes before a task.
,
Jun 9 2016
Thanks Ben.
,
Jun 9 2016
https://chromium.googlesource.com/chromium/src/+/master/docs/android_test_instructions.md says:
  You MUST ensure that the screen stays on while testing: adb shell svc power stayon usb
  Or do this manually on the device: Settings -> Developer options -> Stay Awake.
The same doc also says, for instrumentation tests:
  In order to run instrumentation tests, you must leave your device screen ON and UNLOCKED. Otherwise, the test will timeout trying to launch an intent. Optionally you can disable screen lock under Settings -> Security -> Screen Lock -> None.
It makes sense to check this as well while we're at it.
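For illustration, the stay-awake command quoted above could be issued from a setup script roughly as follows. This is only a sketch: the helper names stay_awake_command and keep_awake are hypothetical, the optional serial argument is an assumption, and nothing here is taken from the actual swarming setup code.

```python
import subprocess


def stay_awake_command(serial=None):
    """Build the adb argv for 'adb shell svc power stayon usb'.

    'serial' is an optional device serial passed via 'adb -s' (assumption:
    the host has several devices attached, as on the swarming hosts).
    """
    cmd = ["adb"]
    if serial:
        cmd += ["-s", serial]
    return cmd + ["shell", "svc", "power", "stayon", "usb"]


def keep_awake(serial=None, run=subprocess.check_call):
    """Run the stay-awake command; 'run' is injectable so it can be faked."""
    return run(stay_awake_command(serial))
```

Calling keep_awake() on a machine with adb on PATH would run the command against the default device; passing a fake run callable allows a dry run without a device.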
,
Jun 9 2016
Something else I've found in that document is: adb shell setprop debug.assert 1
It makes sense to do this as well, since we now rely on asserts in Release builds.
,
Jun 9 2016
IIRC that theoretically enables java asserts but doesn't actually do anything on ART.
,
Jun 9 2016
Aha! With content-shell screenshot capture succeeds even when screen is off. That explains why switching from content-shell to chromium has triggered the failures. Perhaps it would be worthwhile to be able to do screenshot capture when screen is off in chromium as well. kbr@, I trust you can find an owner for that?
,
Jun 10 2016
Re #15, then I guess it's best not to enable debug.assert. It reported some weird errors for me when I tried it locally.
,
Jun 10 2016
FYI: the reason the Telemetry tests are hanging on my device with --browser=android-chromium seems to be that my device isn't rooted. They're working fine on a rooted Nexus 5X. It would be really helpful if we could force the screens on these devices to be kept awake per #13. This would get our tests green again. Is there a possibility of this being done in the short term? Thanks.
,
Jun 10 2016
John: I recall there may be some devil API to check whether the device's screen is off and turn it on? If so, we can add something to Telemetry to make sure that the device's screen is always on during the test.
,
Jun 10 2016
Non-swarming bots currently handle all of this down in provision_devices.py (or stuff it calls) on the chromium side. The solution here is to port logic from there over to swarming (again). The relevant part in this case is the DETERMINISTIC_DEVICE_SETTINGS logic here: https://chromium.googlesource.com/chromium/src/+/master/build/android/provision_devices.py#271
,
Jun 10 2016
Specifically, these are probably the settings we want here:
- stay_on_while_plugged_in: https://chromium.googlesource.com/chromium/src/+/master/build/android/pylib/device_settings.py#159
- lockscreen_disabled: https://chromium.googlesource.com/chromium/src/+/master/build/android/pylib/device_settings.py#173, https://chromium.googlesource.com/chromium/src/+/master/build/android/pylib/device_settings.py#184 (note that those are in different tables)
- screensaver_enabled: https://chromium.googlesource.com/chromium/src/+/master/build/android/pylib/device_settings.py#175
... though I imagine some of the others may be useful as well.
,
Jun 10 2016
I'll start porting some of those over to swarming's before task hook. I'll need to test around locally to see if these settings can take effect without a device reboot. If not, we may need to just permanently turn on/unlock screens since we can't afford to reboot devices before every task.
,
Jun 10 2016
Yeah, not sure about that. I know some of the things we set need a reboot to take effect, but not all. If they do need a reboot, perhaps we could add the settings wherever we do the periodic device reboots?
,
Jun 10 2016
Looks like we can stop the phone from automatically locking the screen after a period of idleness with lockscreen.disabled, but it still boots into the lockscreen. We can get out of it with 'input keyevent 82', but I'd rather disable it altogether. Time to start diving into these sqlite tables on the phone...
,
Jun 10 2016
Random thought: I wonder if we're only seeing this issue on N5Xs because they've never run provision_devices before, whose effects seem to persist through reboots. Our N5s, on the other hand, were buildbot devices in another life, and so have all had provision run on them many times. Hence their screens are always on/unlocked. If we wanted a super immediate fix, we could isolate provision_devices.py and run it once on all bots with device_type == bullhead. But still, I'd rather add the necessary logic to swarming's setup code.
,
Jun 10 2016
Hm, that could be, and if that's the case a run of provision_devices.py would solve the issue in the short term. Given that we want to occasionally factory-reset or flash devices in the future, we should definitely ensure that swarming handles this, though.
,
Jun 10 2016
Looks like the lockscreen.disabled row in locksettings.db did the trick. (Nuking that entire database also seems to work. I'm looking for a reason not to just 'rm /data/system/locksettings.db' and am having a hard time finding one.) Additionally, these settings need a reboot to take effect, so this'll need to be done at bot_startup, which later reboots all devices.
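As a rough sketch of what setting the lockscreen.disabled row amounts to, the snippet below replaces a row in a local, in-memory SQLite database. The table layout (a locksettings table with name, user and value columns) is an assumption made for illustration and may not match the on-device schema; set_lock_setting is a hypothetical helper, not code from swarming or provision_devices.py.

```python
import sqlite3


def set_lock_setting(conn, name, value, user=0):
    """Replace the row for (name, user) so only one value remains."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "DELETE FROM locksettings WHERE name = ? AND user = ?",
            (name, user),
        )
        conn.execute(
            "INSERT INTO locksettings (name, user, value) VALUES (?, ?, ?)",
            (name, user, value),
        )


# Assumed schema, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE locksettings ("
    "_id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "name TEXT, user INTEGER, value TEXT)"
)
set_lock_setting(conn, "lockscreen.disabled", "1")
```

Deleting before inserting keeps the operation idempotent, so running it at every bot startup would not pile up duplicate rows.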
,
Jun 13 2016
How long will it take to deploy this update to the swarming pool? If it will take more than a couple of days then perhaps we should switch these tests back to using content_shell to get them green again.
,
Jun 13 2016
https://chromereviews.googleplex.com/448847013/ No reason not to get that committed today. I'll do some pinging.
,
Jun 14 2016
Could I please ask for a status update on this? Can these configuration changes land today? If not I want to switch the bots back to content_shell to get them green again.
,
Jun 14 2016
Sorry for the delay. This is fixed with the CL I already mentioned. I'll try to get it landed today in between troopering once I get an owner's lgtm, but if you absolutely can't wait then go ahead and make your test changes.
,
Jun 14 2016
I'd strongly prefer to push forward with your CL https://chromereviews.googleplex.com/448847013/ . Please tell me if you need help getting reviews. Thanks.
,
Jun 15 2016
The following revision refers to this bug: https://chrome-internal.googlesource.com/infra/infra_internal.git/+/6cd163d625fe21578e8b1c96bbf870530510880f commit 6cd163d625fe21578e8b1c96bbf870530510880f Author: bpastene <bpastene@google.com> Date: Wed Jun 15 00:56:25 2016
,
Jun 15 2016
The bot is still red. Is there something else needed (bot restart?) for the change in #33 to be applied?
,
Jun 15 2016
It needs to be deployed to prod. I'll do that now.
,
Jun 15 2016
Screens should be on for good now. Picking a random bot:
  ~/adb -s 00b9d4ce76671554 shell dumpsys input_method | grep mInteractive
  mSystemReady=true mInteractive=true
A consequence of this is that the devices now run warmer and are getting quarantined for overheating: http://shortn/_8V6AD5e18g
I may need to increase the maximum allowed temperature.
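The mInteractive check above could be automated along these lines. This is a minimal sketch that only parses text in the format shown in this comment; is_interactive is a hypothetical helper, and other Android versions may print the flag differently.

```python
import re


def is_interactive(dumpsys_output):
    """Return True/False for mInteractive=true/false, or None if absent."""
    match = re.search(r"mInteractive=(true|false)", dumpsys_output)
    if match is None:
        return None
    return match.group(1) == "true"
```

Returning None when the flag is missing lets a caller distinguish "screen off" from "could not tell".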
,
Jun 15 2016
Thanks a lot! I see that build 172 doesn't have the screenshot capture problem. Tests still fail, though. When I ran it locally, I got a similar failure when the screen was rotated. Could you please configure the devices to stay in portrait view when the device is rotated? As for the temperature problem, maybe setting brightness to the minimum will help?
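One way to pin a device to portrait is to turn off auto-rotation and set the user rotation to 0 through the Android settings shell command. The sketch below only builds the adb command lines; portrait_lock_commands is a hypothetical helper, and this is an assumption about a workable approach rather than what the swarming setup does.

```python
def portrait_lock_commands(serial=None):
    """Build adb argvs that disable auto-rotate and select rotation 0."""
    prefix = ["adb"]
    if serial:
        prefix += ["-s", serial]
    prefix += ["shell", "settings", "put", "system"]
    return [
        prefix + ["accelerometer_rotation", "0"],  # stop following the sensor
        prefix + ["user_rotation", "0"],           # natural (portrait) rotation
    ]
```

Each returned list can be passed to subprocess.check_call on a host that has adb available.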
,
Jun 15 2016
Yeah, it's already at the dimmest I could bring it. I think we just have to bump up the threshold a bit. The battery temperatures are all unaffected, and that's where we really care about temps, so bumping only for non-battery sensors seems fine to me. As for the screen orientation, I'll add that to the setup as well after the temperature issue has been sorted out. Probably just have to play with the accelerometer: https://codesearch.chromium.org/chromium/src/build/android/pylib/device_settings.py?rcl=0&l=180
Note that all the phones in our labs are lying on their side horizontally, so the N5Xs are all probably in landscape mode at the moment. I'll get to that.
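The idea of raising the limit only for non-battery sensors can be sketched as a per-sensor threshold check. The sensor names and temperature limits below are made-up placeholders, not the values used by swarming, and overheated_sensors is a hypothetical helper.

```python
BATTERY_MAX_C = 40.0  # hypothetical battery limit, left unchanged
OTHER_MAX_C = 50.0    # hypothetical raised limit for all other sensors


def overheated_sensors(readings, battery_max=BATTERY_MAX_C, other_max=OTHER_MAX_C):
    """Return the sorted names of sensors whose reading exceeds their limit.

    'readings' maps a sensor name to a temperature in degrees Celsius; the
    name "battery" is assumed to identify the battery sensor.
    """
    hot = []
    for name, temp in readings.items():
        limit = battery_max if name == "battery" else other_max
        if temp > limit:
            hot.append(name)
    return sorted(hot)
```

A device would then be quarantined only when the returned list is non-empty, so a warm CPU sensor below the raised limit no longer trips the check while the battery limit stays strict.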
,
Jun 15 2016
,
Jun 15 2016
Thanks Ben for solving the primary problem with the screens being disabled. The landscape mode issue is still a significant problem. Two of our tests are failing because of it. Could this please be prioritized?
,
Jun 16 2016
Now that I'm no longer troopering, I can spend more time on this.
,
Jun 17 2016
The following revision refers to this bug: https://chrome-internal.googlesource.com/infra/infra_internal.git/+/bbc7b17675046c02c9bb1d39ce4a2a8dd5119edc commit bbc7b17675046c02c9bb1d39ce4a2a8dd5119edc Author: bpastene <bpastene@google.com> Date: Fri Jun 17 19:47:31 2016
,
Jun 17 2016
Screen orientation has been pushed out, and thermal threshold raised. Let me know what the next challenge is :)
,
Jun 17 2016
Thanks, https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%205X%29/builds/214 is mostly green! Only WebglConformance.conformance_textures_misc_tex_image_and_uniform_binding_bugs failed, but I think it's a test problem (it also failed on Nexus 6). I see that some bots are still quarantined (27), but I guess it's not possible to do anything about them, as battery temperature is too high? I think this bug is fixed, and we should open new bugs if more issues pop up.
Comment 1 by kbr@chromium.org
, Jun 9 2016