desktopui_MashLogin / ui.MashLogin failures on nyan boards
Issue description
The test fails about 50% of the time on several nyan variant boards: https://wmatrix.googleplex.com/platform/unfiltered?hide_missing=True&tests=desktopui_MashLogin&days_back=14&releases=tot&platforms=nyan_kitty

Failure reason is: "Autotest client terminated unexpectedly: DUT rebooted during the test run."

Generally the first part of the test (launch chrome and wait for OOBE) succeeds. During the second part (re-launch chrome and log in) the device reboots. Kernel logs (e.g. /var/log/messages) are truncated with zeroes. There are no chrome core or dmp files available.

These machines don't reboot when they run other tests, so our test must be doing something different. CC-ing some people in case they have ideas about how our code might differ (in graphics init) in a way that could cause a device to reboot.
,
May 2 2017
If Chrome is looping, /var/log/messages should contain kernel messages about how it's exiting and session_manager messages about restarting Chrome. /var/log/ui contains output generated by Chrome before it initializes a logfile in /var/log/chrome.
,
May 3 2017
I have a nyan_big and have reproduced it locally as well. The bug reproduces without running any autotests.
1. Boot nyan_big in --mash mode.
2. Log in with a user.
3. The shelf and a Chrome window open for the logged-in user.
4. After a few seconds Chrome crashes without any stack traces and the screen goes black.
5. Chrome restarts and we're back to the logged-in user.
6. Chrome crashes again after a few seconds.
7. Repeat until session manager reboots the device.

The /var/log/chrome/ and /var/log/ui/ logs are size 0 when this happens. There is a bit of information in /var/log/system about the crash:

2017-05-03T11:17:55.819292-04:00 WARNING crash_reporter[4106]: [user] Received crash notification for chrome[1029] sig 11, user 1000 (ignoring call by kernel - chrome crash; waiting for chrome to call us directly)
2017-05-03T11:17:55.875915-04:00 WARNING kernel: [ 265.880567] gk20a 57000000.gk20a: state restore latency exceeded, new value 31727417 ns
2017-05-03T11:17:56.665918-04:00 WARNING kernel: [ 266.661829] gk20a: Power-off latency exceeded, new value 90917 ns
2017-05-03T11:17:56.951270-04:00 INFO session_manager[906]: [INFO:child_exit_handler.cc(77)] Handling 969 exit.
2017-05-03T11:17:56.951754-04:00 ERR session_manager[906]: [ERROR:child_exit_handler.cc(79)] Exited with exit code 1
2017-05-03T11:17:56.952048-04:00 INFO session_manager[906]: [INFO:session_manager_service.cc(274)] Exiting process is chrome.
2017-05-03T11:17:56.952478-04:00 INFO session_manager[906]: [INFO:browser_job.cc(149)] Terminating process group: Ensuring browser processes are gone.
2017-05-03T11:17:56.952747-04:00 INFO session_manager[906]: [INFO:system_utils_impl.cc(110)] Sending 9 to -969 as 1000
2017-05-03T11:17:56.954519-04:00 INFO session_manager[906]: [INFO:browser_job.cc(140)] Running child /opt/google/chrome/chrome --ppapi-flash-path=/opt/google/chrome/pepper/libpepflashplayer.so --ppapi-flash-version=26.0.0.89 --ppapi-flash-args=enable_hw_video_decode=0,enable_hw_video_decode_ave=0 --ui-prioritize-in-gpu-process --use-gl=egl --gpu-sandbox-failures-fatal=yes --enable-logging --log-level=1 --use-cras --enable-wayland-server --user-data-dir=/home/chronos --max-unused-resource-memory-usage-percentage=5 --system-developer-mode --login-profile=user --has-chromeos-keyboard --default-wallpaper-large=/usr/share/chromeos-assets/wallpaper/oem_large.jpg --default-wallpaper-small=/usr/share/chromeos-assets/wallpaper/oem_small.jpg --default-wallpaper-is-oem --guest-wallpaper-large=/usr/share/chromeos-assets/wallpaper/guest_large.jpg --guest-wallpaper-small=/usr/share/chromeos-assets/wallpaper/guest_small.jpg --enable-prefixed-encrypted-media --enable-consumer-kiosk --enterprise-enrollment-initial-modulus=15 --enterprise-enrollment-modulus-limit=19 --mash --login-user=kylechartest@gmail.com --login-profile=d8655581912dc5b6d0e5ba09cd811bd8172f396b --vmodule=*chromeos/login/*=1,auto_enrollment_controller=1,*plugin*=2,*zygote*=1,*/ui/ozone/*=1,*/ui/display/manager/chromeos/*=1,power_button_observer=2,webui_login_view=2,lock_state_controller=2,webui_screen_locker=2,screen_locker=2

Total wild speculation on my part is that maybe we are corrupting something in the kernel?
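For anyone comparing PIDs across a longer log, here is a minimal Python sketch that pulls the crash notifications and the session_manager exit records out of syslog text. The function name `summarize` and the two regular expressions are illustrative assumptions based only on the three sample lines copied from the excerpt above; they are not part of any Chrome OS tool, and other log formats may need different patterns.

```python
import re

# Sample lines copied from the syslog excerpt in this comment.
SAMPLE_LOG = """\
2017-05-03T11:17:55.819292-04:00 WARNING crash_reporter[4106]: [user] Received crash notification for chrome[1029] sig 11, user 1000 (ignoring call by kernel - chrome crash; waiting for chrome to call us directly)
2017-05-03T11:17:56.951270-04:00 INFO session_manager[906]: [INFO:child_exit_handler.cc(77)] Handling 969 exit.
2017-05-03T11:17:56.952048-04:00 INFO session_manager[906]: [INFO:session_manager_service.cc(274)] Exiting process is chrome.
"""

# Hypothetical patterns, written to match the sample lines above.
CRASH_RE = re.compile(r"Received crash notification for (\w+)\[(\d+)\] sig (\d+)")
EXIT_RE = re.compile(r"session_manager\[\d+\]: .*Handling (\d+) exit\.")


def summarize(log_text):
    """Return (crashes, exits) found in the given syslog text.

    crashes: list of (process name, pid, signal number) tuples
    exits:   list of pids whose exit session_manager handled
    """
    crashes = []
    exits = []
    for line in log_text.splitlines():
        match = CRASH_RE.search(line)
        if match:
            crashes.append((match.group(1), int(match.group(2)), int(match.group(3))))
            continue
        match = EXIT_RE.search(line)
        if match:
            exits.append(int(match.group(1)))
    return crashes, exits


crashes, exits = summarize(SAMPLE_LOG)
print("crash notifications:", crashes)
print("exits handled by session_manager:", exits)
```

On the sample lines, the process that received signal 11 (pid 1029) is not the pid whose exit session_manager handles (969), so it may be worth confirming which chrome process actually segfaults.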
,
May 3 2017
That seems unlikely to me. Can you get a symbolized Chrome onto the device and attach gdb to the browser process to get a stack trace from the segfault?
,
May 3 2017
Is it possible to check which configs differ between nyan_* and other boards and might affect ozone/mus?
,
May 3 2017
(I believe nyan was one of the last boards to switch to ozone?)
,
May 3 2017
Also, is this an official build (like a downloaded test image) or a build you built yourself with simplechrome?

sig 11 is segv, so that sounds like a routine chrome crash. Deploying a symbolized build should get you a backtrace to /var/log/ui. You might want to check the pids / command line to make sure it's the browser process that is crashing. Also, I think rockot@ added something recently where command lines show the service name, which might be helpful.

Aside: I usually deploy with some symbols like this:
deploy_chrome --build-dir=out_$SDK_BOARD/Release --to=172.18.37.39 --target-dir=/usr/local/chrome --mount-dir=/opt/google/chrome --strip-flags '-w -K "!*WebCore*"'

Be aware that if you use a non-official build then the in-process stack dumping means chrome will exit(1) after dumping the stack and crash_reporter won't run.

Thanks for digging into this!
,
May 3 2017
I'm getting some weird graphics glitches on nyan_big running classic ash with a test image. This is both with a test image and with a binary I deployed (although what is glitched tends to vary). I'm trying to figure out what is going on and if this could be related to the crashing autotest. I did have a symbolized binary on it and I still didn't get any backtraces in /var/log/ui or /var/log/chrome when it crashes. I'll try attaching gdb to see if that gets anything.
,
May 4 2017
If you don't make significant progress in a day or two of debugging I would put this issue on hold and move on. You can blacklist nyan devices in the autotest python file like we do for alex/zgb/etc. I think spending time on getting us into bvt-cq on peach_pit and adding a test for chrome --mus are more important.
,
May 12 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/03b065c2de97714e696e40ef6ceb40f5c0554a45

commit 03b065c2de97714e696e40ef6ceb40f5c0554a45
Author: kylechar <kylechar@chromium.org>
Date: Fri May 12 05:27:47 2017

autotest: Skip nyan_* boards for mus/mash autotests.

The desktopui_MusLogin and desktopui_MashLogin tests are failing consistently on nyan_* boards. Disable the tests on these boards temporarily to avoid spamming our bug alias.

BUG=chromium:717275
TEST=Checked autotest is skipped on nyan_big.

Change-Id: Ib42c02aea2c3e5dc2598f5b05db4554b44366038
Reviewed-on: https://chromium-review.googlesource.com/501470
Commit-Ready: Kyle Charbonneau <kylechar@chromium.org>
Tested-by: Kyle Charbonneau <kylechar@chromium.org>
Reviewed-by: James Cook <jamescook@chromium.org>

[modify] https://crrev.com/03b065c2de97714e696e40ef6ceb40f5c0554a45/client/site_tests/desktopui_MashLogin/desktopui_MashLogin.py
[modify] https://crrev.com/03b065c2de97714e696e40ef6ceb40f5c0554a45/client/site_tests/desktopui_MusLogin/desktopui_MusLogin.py
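For readers who don't want to open the change, the idea of skipping a board family can be sketched in plain Python as below. This is not the code from the commit: `SKIPPED_BOARD_PREFIXES` and `should_skip_board` are hypothetical names chosen for illustration. In a real autotest the board name would come from the framework, and the test would report itself as not applicable rather than return a boolean.

```python
# Hypothetical list of board-name prefixes to skip (an assumption for
# illustration; the real change lives in the commit linked above).
SKIPPED_BOARD_PREFIXES = ("nyan_",)


def should_skip_board(board, prefixes=SKIPPED_BOARD_PREFIXES):
    """Return True if the board name starts with any skipped prefix."""
    # str.startswith accepts a tuple of candidate prefixes.
    return board.startswith(tuple(prefixes))


for name in ("nyan_big", "nyan_kitty", "peach_pit"):
    print(name, "skip" if should_skip_board(name) else "run")
```

Matching on a prefix rather than listing each variant means nyan_big, nyan_kitty and any other nyan_* board are covered by a single entry.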
,
Jun 13 2017
I'm not actively working on this.
,
Feb 26 2018
,
Apr 19 2018
,
Jan 10
This is now happening on ui.MashLogin (the tast test) on nyan_big. There are chrome dmp files with crashes in the nvidia driver, although the PIDs don't match up with the reported error:
"Chrome login failed: OOBE not dismissed: browser process 5786 exited"
https://stainless.corp.google.com/browse/chromeos-autotest-results/275346224-chromeos-test/

Crash reason: SIGSEGV
Crash address: 0x0
Process uptime: not available

Thread 0 (crashed)
 0 libnvidia-eglcore.so + 0x99b4f6
 r0 = 0xade42204 r1 = 0xb9490b40 r2 = 0x00000000 r3 = 0x00000000
 r4 = 0xb9490b40 r5 = 0xb9992000 r6 = 0xb9490b40 r7 = 0x00000000
 r8 = 0xade42204 r9 = 0xffffffff r10 = 0xb9abe400 r12 = 0xadcf0f40
 fp = 0xb9abe400 sp = 0xbef180f0 lr = 0xad7744e5 pc = 0xad7744f6
 Found by: given as instruction pointer in context
 1 libnvidia-eglcore.so + 0xa2cd11
 sp = 0xbef18108 pc = 0xad805d13
 Found by: stack scanning

Unless someone on the CC list knows what this is, I'm going to skip nyan boards in the tast test too. (We're currently focused on SingleProcessMash, and this is a multi-process test.)
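When several dmp files need comparing, the module and offset of each frame can be extracted with a short script. The Python sketch below assumes the dump text has one frame per line in the form "N module + 0xOFFSET", as in the frames quoted in this comment. `FRAME_RE` and `parse_frames` are hypothetical names, and the sample is trimmed to the lines that matter.

```python
import re

# Frame lines look like " 0 libnvidia-eglcore.so + 0x99b4f6".
# The pattern is an assumption based on the two frames quoted above.
FRAME_RE = re.compile(r"^\s*(\d+)\s+(\S+) \+ (0x[0-9a-fA-F]+)\s*$")

SAMPLE_DUMP = """\
Thread 0 (crashed)
 0 libnvidia-eglcore.so + 0x99b4f6
 sp = 0xbef180f0 lr = 0xad7744e5 pc = 0xad7744f6
 Found by: given as instruction pointer in context
 1 libnvidia-eglcore.so + 0xa2cd11
 sp = 0xbef18108 pc = 0xad805d13
 Found by: stack scanning
"""


def parse_frames(dump_text):
    """Return a list of (frame index, module name, offset) tuples."""
    frames = []
    for line in dump_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append((int(match.group(1)), match.group(2), int(match.group(3), 16)))
    return frames


frames = parse_frames(SAMPLE_DUMP)
modules = {module for _, module, _ in frames}
for index, module, offset in frames:
    print(index, module, hex(offset))
print("modules on the stack:", sorted(modules))
```

Both recovered frames are in libnvidia-eglcore.so. Because the second frame was found by stack scanning, it is a weaker signal than the first.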
Comment 1 by jamescook@chromium.org, May 2 2017
Owner: kylec...@chromium.org
Status: Assigned (was: Untriaged)