Crostini crash on multi-monitor when waking from sleep
Issue description
Chrome version: 68.0.3422.0 (Official Build) unknown (64-bit)
OS: Chrome OS
Repro steps:
1. Install Crostini.
2. Install and open GUI apps (e.g. gnome-terminal).
3. Leave Chrome OS until it sleeps, or close the lid.
4. Log back in.
Expected: Crostini program windows are still open with their previous contents (via VM hibernation).
Actual: Crostini program windows are closed and their state is lost, including unsaved documents.
May 23 2018
Does this always happen or is it only sometimes?
May 24 2018
I can't consistently reproduce the sleep issue, but I have been able to reproduce a related issue that may be its root cause. Repro steps:
1. Install Crostini.
2. Install and open GUI apps (e.g. gnome-terminal).
3. Open a window and move it to a second monitor.
4. Close and reopen the lid of the Chromebook.
This reliably closes the running Crostini apps. I haven't been able to reproduce it by only closing the lid or sleeping the device, so the bug probably only occurs with a multi-monitor setup.
May 29 2018
Jun 1 2018
Jun 4 2018
Since this seems tied to multi-monitor setups, adding that to the description.
Jun 21 2018
Hi David, can you take a look at this stack trace? It shows the path by which the Crostini apps are closed. Note that the exact line numbers in the files might be slightly off.
This crash seems related to multiple displays. It looks like some Wayland connection state is not updated when the lid is closed or reopened.
#4 0x00005aca6da279bf in exo::ShellSurfaceBase::~ShellSurfaceBase() () at ../../components/exo/shell_surface_base.cc:413
#5 0x00005aca6da25ffe in exo::XdgShellSurface::~XdgShellSurface() () at ../../components/exo/shell_surface.cc:83
#6 0x00005aca70f19527 in destroy_resource () at ../../third_party/wayland/src/src/wayland-server.c:677
#7 0x00005aca70f180ac in wl_map_for_each () at ../../third_party/wayland/src/src/wayland-util.c:374
#8 0x00005aca70f1959d in wl_client_destroy () at ../../third_party/wayland/src/src/wayland-server.c:834
#9 0x00005aca70f19050 in wl_client_connection_data () at ../../third_party/wayland/src/src/wayland-server.c:319
#10 0x00005aca70f186a2 in wl_event_loop_dispatch () at ../../third_party/wayland/src/src/event-loop.c:423
#11 0x00005aca6da37099 in exo::wayland::Server::Dispatch(base::TimeDelta) () at ../../components/exo/wayland/server.cc:5317
#12 0x00005aca6da141b4 in ash::WaylandServerController::WaylandWatcher::OnFileCanReadWithoutBlocking(int) () at ../../ash/wayland/wayland_server_controller.cc:35
#13 0x00005aca6ec50705 in base::MessagePumpLibevent::OnLibeventNotification(int, short, void*) () at ../../base/message_loop/message_pump_libevent.cc:90
Jun 21 2018
Did you find a reliable way to repro this? It's hard to see how we can end up with that stack trace.
Jun 21 2018
Yes, I have a reliable way to reproduce it:
1) Connect a second monitor.
2) Open a Crostini app on either display.
3) Close the lid. (All windows, including the Crostini app window, move to the secondary display if they are not already there.)
4) Open the lid.
The Crostini apps then crash. Browser and ARC++ apps don't.
Jun 21 2018
Thanks! I'll figure out what is happening when I'm back in NYC next week, where it's easy to reproduce multi-monitor issues.
Jun 28 2018
I can't reproduce this in the latest dev-channel release. There are some known scaling issues when closing the lid with only an external display connected, but that's a different problem. I'm not seeing any Crostini apps crash.
Jun 29 2018
Comment #8 is a stack trace from Chrome. Is Chrome crashing, or the Crostini apps? Previous comments make it sound like Chrome is not crashing but the Linux apps are, so I'm not sure why you added the stack trace from Chrome.
Jun 29 2018
Chrome did not crash. The Linux apps crashed.
Jun 29 2018
OK, any idea which process crashes on the container side? Sommelier, or the app itself?
Jun 29 2018
Oh, sorry, I didn't look into that.
Jul 7
I can still reproduce the problem. Inside the container, the X sommelier and Xwayland crashed.
Jul 9
Did Chrome also restart? Sommelier and Xwayland are supposed to terminate if Chrome terminates. The container will attempt to restart these instances, but it's a known issue that this currently fails because crosvm doesn't handle Chrome restarts correctly. If Chrome didn't restart, then it sounds like a sommelier issue and a stack trace would be useful.
Jul 10
Chrome didn't restart. What I've observed is that, when I reproduced the problem, the X sommelier and Xwayland processes in the container restarted with new PIDs. I didn't find core dumps for them. I tried to debug with gdb: when I attached to the X sommelier process and ran "continue", the process disappeared and restarted. I'm not sure what about the gdb attachment triggers the process to quit.
Jul 11
Either the client exiting or the connection to the host breaking will cause sommelier to exit.
Jul 19
I'm reassigning this. It looks like a Sommelier bug.
Jul 24
Issue 864263 has been merged into this issue.
Jul 27
@reveman are you able to take a look at this?
Jul 27
No, I don't have a setup where I can reproduce this.
Jul 31
This does not appear to be an M69 regression or a newly introduced issue. The fix needs to be prioritized, but it should not block beta. Changing to RBS.
Aug 1
Release blockers should be P1 or P0.
Aug 1
Relevant journalctl output:
Aug 01 18:50:58 penguin sommelier[1095]: (EE)
Aug 01 18:50:58 penguin sommelier[1095]: Fatal server error:
Aug 01 18:50:58 penguin sommelier[1095]: (EE) failed to read Wayland events: Broken pipe
Aug 01 18:50:58 penguin sommelier[1095]: (EE)
Aug 01 18:50:58 penguin systemd[96]: sommelier-x@0.service: Main process exited, code=killed, status=8/FPE
Aug 01 18:50:58 penguin org.a11y.atspi.Registry[508]: XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
Aug 01 18:50:58 penguin org.a11y.atspi.Registry[508]: after 21 requests (19 known processed) with 0 events remaining.
Aug 01 18:50:58 penguin systemd[96]: sommelier-x@0.service: Unit entered failed state.
Aug 01 18:50:58 penguin systemd[96]: sommelier-x@0.service: Failed with result 'signal'.
Aug 01 18:50:58 penguin systemd[96]: sommelier-x@0.service: Service hold-off time over, scheduling restart.
Aug 01 18:50:58 penguin systemd[96]: Stopped X11 sommelier at display 0.
Aug 01 18:50:58 penguin systemd[96]: Starting X11 sommelier at display 0...
Aug 01 18:50:59 penguin systemd[96]: Started X11 sommelier at display 0.
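The systemd line "code=killed, status=8/FPE" means the X11 sommelier was terminated by signal 8, SIGFPE. Despite the name, on x86 Linux this signal is most often raised by integer division by zero, since IEEE floating-point division by zero does not trap by default. The minimal C sketch below shows how a supervising process decodes such a wait status with the POSIX WIFSIGNALED/WTERMSIG macros. The helper names are hypothetical, and the child sends itself SIGFPE with raise() instead of performing a real division by zero, which would be undefined behavior in C.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that is killed by SIGFPE and return the
 * raw wait status, i.e. what a supervisor such as systemd observes. */
static int run_child_killed_by_sigfpe(void) {
  pid_t pid = fork();
  assert(pid >= 0);
  if (pid == 0) {
    /* Child: disable core dumps so the demo leaves no core file behind,
     * restore the default disposition, then deliver SIGFPE to itself. */
    struct rlimit no_core = {0, 0};
    setrlimit(RLIMIT_CORE, &no_core);
    signal(SIGFPE, SIG_DFL);
    raise(SIGFPE);
    _exit(EXIT_SUCCESS); /* not reached: the default action terminates */
  }
  int status = 0;
  pid_t waited = waitpid(pid, &status, 0);
  assert(waited == pid);
  (void)waited;
  return status;
}

/* Returns the number of the signal that killed the child, or 0 if the
 * child exited normally. */
static int terminating_signal(int status) {
  return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

For example, terminating_signal(run_child_killed_by_sigfpe()) returns 8 on Linux, the same number systemd reports in the log.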
Aug 1
Is it only the X11 sommelier that restarts? Do native Wayland clients and their sommelier instances survive?
Aug 1
Yes, only the X11 sommelier restarts; the native Wayland sommelier survives. In terms of apps it's a little murky: gimp crashes when the lid is closed. gedit survives one lid close/reopen cycle but crashes when the lid is closed again. I'm not sure whether that's because it's not a pure native app.
Aug 1
OK, it looks like the X11 sommelier instance is crashing from a floating-point exception (SIGFPE). Can you attach gdb to the X11 sommelier instance to see where?
Aug 1
I tried attaching gdb, but whenever I "continue" after attaching, the process exits.
Aug 1
You can run a separate instance of sommelier under gdb like this:
$ gdb -ex run --args /opt/google/cros-containers/lib/ld-linux-x86-64.so.2 --library-path /opt/google/cros-containers/lib --inhibit-rpath '' /opt/google/cros-containers/bin/sommelier.elf -X xterm
I'm not sure whether some of that preload magic will prevent gdb from working as expected. If it does, building sommelier from source inside the container might be needed.
Aug 2
Aug 6
https://cs.corp.google.com/chromeos_public/src/platform2/vm_tools/sommelier/sommelier.c?rcl=6d0ebf96f5e1ffdd19761e734e5fb5262deb2482&l=1246
I was able to trace the crash to the above source line. I built sommelier with symbols in my chroot on my workstation and inserted printf statements right before and after that line. I copied the sommelier binary into the container and ran it with this command:
$ gdb -ex run --args /opt/google/cros-containers/lib/ld-linux-x86-64.so.2 --library-path /opt/google/cros-containers/lib --inhibit-rpath '' ./sommelier -X gimp
My Chromebook was connected to an external display. When I closed the lid, the gimp window and the terminal window were moved to the external display and sommelier crashed with signal SIGFPE. The fprintf right before the linked statement printed its output, but the one right after did not. I tried bt, but it only prints the part of the call stack inside the shared libraries and says "Backtrace stopped: previous frame inner to this frame (corrupt stack?)". I verified with "info proc map" that all the stack frames were at shared library addresses.
Aug 8
Hm, are you sure the crash is actually inside wl_display_dispatch? That function dispatches calls into all of sommelier, so most code in sommelier has it on its stack. Try building sommelier inside the container instead so you can avoid all that ld magic. I've never had issues getting stack traces when doing that. Here are some instructions for building sommelier inside the container:
For the virtwl.h kernel header:
$ git clone --depth 1 https://chromium.googlesource.com/chromiumos/third_party/kernel -b chromeos-4.14
$ sudo cp kernel/include/uapi/linux/virtwl.h /usr/include/linux/
Follow the instructions here to install depot_tools (needed for ninja): https://chromium.googlesource.com/chromium/src/+/master/docs/linux_build_instructions.md#install
$ sudo apt-get install gyp
$ git clone https://chromium.googlesource.com/chromiumos/platform2
$ cd platform2/vm_tools/sommelier
$ gyp -Dsysroot="" -Dplatform2_root=../.. -Dpkg-config=pkg-config -Dpeer_cmd_prefix=0 -Dxwayland_path=/usr/bin/Xwayland -I../../common-mk/common.gypi --depth=.
$ ninja -C out/Default sommelier
And to run:
$ gdb --args ./out/Default/sommelier weston-terminal
Aug 8
Correction, the gyp command should be:
$ gyp -Dsysroot="" -Dplatform2_root=../.. -Dpkg-config=pkg-config -Dpeer_cmd_prefix=0 -Dxwayland_path=\"/usr/bin/Xwayland\" -I../../common-mk/common.gypi --depth=.
Aug 8
This should fix the crash in sommelier: https://chromium-review.googlesource.com/c/chromiumos/platform2/+/1167543
However, there are other issues with removing the internal display. Apps seem to get confused and shut down when the output they are assigned to is removed. We need to investigate this more.
1. Chrome needs to implement proper output tracking for surfaces (i.e. send Wayland events to indicate which output a surface appears on, and assign the surface to a new output before removing the old one).
It's possible that some applications will still be confused when this output goes away, in which case we need to always keep it around. Hard to say until we've implemented 1).
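Point 1 can be pictured with a small model. The core Wayland protocol lets a compositor tell a client which outputs a surface is on through wl_surface.enter and wl_surface.leave events, and announce that an output is gone through wl_registry.global_remove. The C sketch below is a hypothetical model, not exo's code: each surface tracks a single output, and events are recorded as text rather than sent over a connection. The ordering it uses (enter on the fallback output, then leave on the old one, then global_remove) is one reasonable design choice under these assumptions, picked so that no surface is ever left without an output.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model: each surface records the single output it is on, and
 * a text log stands in for wl_surface.enter, wl_surface.leave and
 * wl_registry.global_remove events sent to the client. */
struct surface {
  int id;
  int output_id;
};

struct event_log {
  char text[512];
};

/* Append one formatted event to the log, truncating if the buffer fills. */
static void log_event(struct event_log *events, const char *fmt, ...) {
  size_t used = strlen(events->text);
  va_list args;
  va_start(args, fmt);
  vsnprintf(events->text + used, sizeof events->text - used, fmt, args);
  va_end(args);
}

/* Before the output `removed` goes away, move every surface on it to
 * `fallback`: send enter for the new output first so the surface is never
 * left without an output, then leave for the old one, and only afterwards
 * announce that the output itself has been removed. */
static void remove_output(struct surface *surfaces, size_t count, int removed,
                          int fallback, struct event_log *events) {
  assert(removed != fallback);
  for (size_t i = 0; i < count; i++) {
    if (surfaces[i].output_id != removed) continue;
    log_event(events, "enter(s%d,o%d) ", surfaces[i].id, fallback);
    log_event(events, "leave(s%d,o%d) ", surfaces[i].id, removed);
    surfaces[i].output_id = fallback;
  }
  log_event(events, "global_remove(o%d)", removed);
}
```

Removing output 0 while surface 1 is on it and surface 2 is already on output 1 yields "enter(s1,o1) leave(s1,o0) global_remove(o0)"; surface 2 receives no events.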
Aug 8
I think the best short-term solution is for Chrome/exo to always expose the internal display output, even if it's not currently used. That would solve these issues and also remove the current DPI issues we have when turning off the internal display.
Aug 8
oshima@, how hard would it be to have the Wayland exo code behave as if the internal panel is always connected? We currently use display::Screen::GetAllDisplays, but if we could use a list that always included the internal panel, these issues would go away.
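A hedged sketch of the "list that always includes the internal panel" idea follows. The display_info struct and exposed_outputs() function are hypothetical stand-ins, not Chrome's display::Display or display::Screen API, and the sketch is in C only to match the other examples here. Connected displays are copied through unchanged, and the internal panel is appended when it is missing, for example when the lid is closed while an external monitor is attached.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a display entry. */
struct display_info {
  long id;
  int is_internal;
};

/* Copy the currently connected displays into `out`, then append the internal
 * panel if it is not among them, so Wayland clients always see the same
 * internal output. Writes at most `cap` entries and returns how many. */
static size_t exposed_outputs(const struct display_info *connected, size_t n,
                              struct display_info internal,
                              struct display_info *out, size_t cap) {
  assert(connected != NULL || n == 0);
  size_t count = 0;
  int has_internal = 0;
  for (size_t i = 0; i < n && count < cap; i++) {
    out[count++] = connected[i];
    if (connected[i].id == internal.id) has_internal = 1;
  }
  if (!has_internal && count < cap) out[count++] = internal;
  return count;
}
```

With this approach clients keep seeing the internal output even when it is not lit, so a real implementation would still have to decide which output surfaces are reported on.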
Aug 9
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/platform2/+/d95814a86b21bc78043a40f08ff4f34571a10364
commit d95814a86b21bc78043a40f08ff4f34571a10364
Author: Maksim Ivanov <emaxx@chromium.org>
Date: Thu Aug 09 03:46:06 2018
vm_tools: sommelier: fix crash when output is removed
Outputs can be removed while we still have events to be dispatched. This can result in the sl_output instance not being valid. Avoid relying on sl_output instance in host implementation.
BUG= chromium:845776
TEST='sommelier weston-terminal' and remove internal output
Change-Id: Ie68940d93f10200716a5d0a31b94433c1c4ec1cc
Reviewed-on: https://chromium-review.googlesource.com/1167543
Commit-Ready: David Reveman <reveman@chromium.org>
Tested-by: David Reveman <reveman@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>
[modify] https://crrev.com/d95814a86b21bc78043a40f08ff4f34571a10364/vm_tools/sommelier/sommelier-output.c
[modify] https://crrev.com/d95814a86b21bc78043a40f08ff4f34571a10364/vm_tools/sommelier/sommelier.h
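The commit message describes the fix only at a high level. As an illustration of that pattern, not the actual patch, the C sketch below uses hypothetical types: guest_output plays the role of an sl_output that is freed when the output is removed, and host_output keeps its own copies of the fields its event handlers need, so an event dispatched after removal never dereferences the freed object. The zero-scale guard is an extra assumption; the thread does not say which operation raised the SIGFPE.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an sl_output: freed when the output is removed. */
struct guest_output {
  int scale;
};

/* Hypothetical host-side object that queued events still reach. */
struct host_output {
  struct guest_output *guest; /* NULL once the output has been removed */
  int width;                  /* host-owned copies of the state that */
  int scale;                  /* event handlers need */
};

/* Event handler that may run after removal: it reads only host-owned fields,
 * never host->guest, and guards the divisor so a zero scale cannot fault. */
static int host_output_logical_width(const struct host_output *host) {
  int scale = host->scale > 0 ? host->scale : 1;
  return host->width / scale;
}

/* Removal path: free the guest object and clear the pointer, so events that
 * are still queued for this host object remain safe to dispatch. */
static void host_output_remove(struct host_output *host) {
  assert(host != NULL);
  free(host->guest);
  host->guest = NULL;
}
```

The design choice is to copy state at the time it arrives rather than follow a back-pointer at dispatch time, which trades a little duplication for independence from the guest object's lifetime.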
Aug 9
Verified on termina component 10953.0.0. Requesting merge to M69.
Aug 9
This bug requires manual review: M69 has already been promoted to the beta branch, so this requires manual review. Please contact the milestone owner if you have questions.
Owners: amineer@(Android), kariahda@(iOS), cindyb@(ChromeOS), govind@(Desktop)
For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Aug 9
Merge approved, M69.
Aug 9
jopra@, can you elaborate a bit? Note that the internal panel may be turned off in docked mode, even on a Chromebook.
Aug 13
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible! If all merges have been completed, please remove any remaining Merge-Approved labels from this issue. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Aug 13
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/platform2/+/2faedad2fd2bcb3c3543a0840e2841cfd32c85f9
commit 2faedad2fd2bcb3c3543a0840e2841cfd32c85f9
Author: Maksim Ivanov <emaxx@chromium.org>
Date: Mon Aug 13 17:15:45 2018
vm_tools: sommelier: fix crash when output is removed
Outputs can be removed while we still have events to be dispatched. This can result in the sl_output instance not being valid. Avoid relying on sl_output instance in host implementation.
BUG= chromium:845776
TEST='sommelier weston-terminal' and remove internal output
Change-Id: Ie68940d93f10200716a5d0a31b94433c1c4ec1cc
Reviewed-on: https://chromium-review.googlesource.com/1167543
Commit-Ready: David Reveman <reveman@chromium.org>
Tested-by: David Reveman <reveman@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>
(cherry picked from commit d95814a86b21bc78043a40f08ff4f34571a10364)
Reviewed-on: https://chromium-review.googlesource.com/1172704
Commit-Queue: Stephen Barber <smbarber@chromium.org>
Tested-by: Stephen Barber <smbarber@chromium.org>
[modify] https://crrev.com/2faedad2fd2bcb3c3543a0840e2841cfd32c85f9/vm_tools/sommelier/sommelier-output.c
[modify] https://crrev.com/2faedad2fd2bcb3c3543a0840e2841cfd32c85f9/vm_tools/sommelier/sommelier.h
Aug 13
Aug 31
Adding merge approval label for accurate queries.
Sep 4
<triage>Can this issue be closed?</triage>
Sep 4
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible! If all merges have been completed, please remove any remaining Merge-Approved labels from this issue. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Sep 4
Comment 1 by jopra@chromium.org, May 23 2018