Issue 845776 link

Starred by 9 users

Issue metadata

Status: Fixed
Owner:
Closed: Sep 4
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Crostini crash on multi-monitor when waking from sleep

Project Member Reported by jopra@chromium.org, May 23 2018

Issue description

Chrome version: 68.0.3422.0 (Official Build) unknown (64-bit)
OS: Chrome OS

Repro steps:
1. Install Crostini.
2. Install and open GUI apps (e.g. gnome-terminal).
3. Leave Chrome OS until it sleeps or close the lid.
4. Log back in.

Expected: Crostini program windows are open, with their previous contents (via VM hibernation).

Actual: Crostini program windows are closed and their state is lost, including unsaved documents.
 

Comment 1 by jopra@chromium.org, May 23 2018

Does this always happen or is it only sometimes?

Comment 3 by jopra@chromium.org, May 24 2018

I can't consistently reproduce the sleep issue but I have been able to reproduce a related issue that may be the root cause of this problem.

Repro steps:
1. Install Crostini.
2. Install and open GUI apps (e.g. gnome-terminal).
3. Open a window and move it to a second monitor.
4. Close and open the lid of the Chromebook.

This seems to reliably close the running Crostini apps.

I haven't been able to repro with just closing the lid or sleeping the device.
This probably means that the bug only occurs when there is a multi-monitor setup.

Comment 4 by vapier@chromium.org, May 29 2018

Components: OS>Systems>Containers
Cc: timzheng@chromium.org
Cc: reve...@chromium.org
Labels: Hotlist-Crostini-UI
Owner: timzheng@chromium.org
Status: Assigned (was: Untriaged)
Summary: Crostini crash on multi-monitor when waking from sleep (was: Crostini App state is lost after sleep)
If this is tied to multi-monitors, adding that to the description

Comment 7 Deleted

Hi David, can you take a look at this stack trace? This is how the Crostini apps are closed. Please note that the exact lines of the files might be a little off.

This crash feels related to multiple displays. It seems some wayland connection state is not updated when the lid is closed or reopened.

#4  0x00005aca6da279bf in exo::ShellSurfaceBase::~ShellSurfaceBase() () at ../../components/exo/shell_surface_base.cc:413
#5  0x00005aca6da25ffe in exo::XdgShellSurface::~XdgShellSurface() () at ../../components/exo/shell_surface.cc:83
#6  0x00005aca70f19527 in destroy_resource () at ../../third_party/wayland/src/src/wayland-server.c:677
#7  0x00005aca70f180ac in wl_map_for_each () at ../../third_party/wayland/src/src/wayland-util.c:374
#8  0x00005aca70f1959d in wl_client_destroy () at ../../third_party/wayland/src/src/wayland-server.c:834
#9  0x00005aca70f19050 in wl_client_connection_data () at ../../third_party/wayland/src/src/wayland-server.c:319
#10 0x00005aca70f186a2 in wl_event_loop_dispatch () at ../../third_party/wayland/src/src/event-loop.c:423
#11 0x00005aca6da37099 in exo::wayland::Server::Dispatch(base::TimeDelta) () at ../../components/exo/wayland/server.cc:5317
#12 0x00005aca6da141b4 in ash::WaylandServerController::WaylandWatcher::OnFileCanReadWithoutBlocking(int) ()
    at ../../ash/wayland/wayland_server_controller.cc:35
#13 0x00005aca6ec50705 in base::MessagePumpLibevent::OnLibeventNotification(int, short, void*) ()
    at ../../base/message_loop/message_pump_libevent.cc:90

Did you find a reliable way to repro this? It's hard to see how we can end up with that stack trace.
Yes, I have a reliable way to reproduce it.
1) connect a second monitor.
2) open a Crostini app on either display.
3) Close lid. (All windows, including the Crostini app window, move to the secondary display if they are not already there.)
4) Open lid.

Then the Crostini apps crash. Browser and ARC++ apps don't.


Thanks! I'll figure out what is happening when back in NYC next week and it's easy to reproduce multi-monitor issues.
I can't reproduce this in the latest dev-channel release. There are some known scaling issues when closing the lid and just having an external display but that's different. I'm not seeing any crostini apps crash.
#8 is a stack trace from chrome. is chrome crashing or the crostini apps? 

Previous comments make it sound like chrome is not crashing but linux apps are, so I'm not sure why you added the stack trace from chrome.
Chrome did not crash. The linux apps crashed.
Ok, any idea what process crashes on the container side? Sommelier? or the app itself?
oh, I'm sorry that I didn't look into that.
I can still reproduce the problem. Inside the container, the X sommelier and Xwayland crashed.
Did chrome also restart? Sommelier and Xwayland are supposed to terminate if chrome terminates. The container will attempt to restart these instances but it's a known issue that it fails today because crosvm doesn't handle chrome restarts correctly.

If chrome didn't restart then it sounds like a sommelier issue and a stack trace would be useful. 
Chrome didn't restart. What I've observed is that in the container the sommelier X and Xwayland processes restarted with new PIDs when I reproduced the problem.

I didn't find core dumps for them. I tried to debug with gdb, but when I attached to the sommelier X process and tried to "continue", the process disappeared and restarted. I'm not sure what about attaching gdb triggers the process to quit.
Client exiting or connection to host being broken will cause sommelier to exit.
Owner: reve...@chromium.org
I'm reassigning this. It looks like a Sommelier bug.
Cc: rohi...@chromium.org dgreid@chromium.org tbuck...@chromium.org
 Issue 864263  has been merged into this issue.
Labels: ReleaseBlock-Beta M-69
@reveman are you able to take a look at this?
Status: Available (was: Started)
No, I don't have a setup where I can reproduce this.
Labels: -ReleaseBlock-Beta ReleaseBlock-Stable
This does not appear to be a regression newly introduced in M69. The fix needs to be prioritized, but should not block beta. Changing to RBS.
Labels: Pri-1
Release blockers should be P1 or P0
Relevant journalctl outputs.

Aug 01 18:50:58 penguin sommelier[1095]: (EE)
Aug 01 18:50:58 penguin sommelier[1095]: Fatal server error:
Aug 01 18:50:58 penguin sommelier[1095]: (EE) failed to read Wayland events: Broken pipe
Aug 01 18:50:58 penguin sommelier[1095]: (EE)
Aug 01 18:50:58 penguin systemd[96]: sommelier-x@0.service: Main process exited, code=killed, status=8/FPE
Aug 01 18:50:58 penguin org.a11y.atspi.Registry[508]: XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
Aug 01 18:50:58 penguin org.a11y.atspi.Registry[508]:       after 21 requests (19 known processed) with 0 events remaining.
Aug 01 18:50:58 penguin systemd[96]: sommelier-x@0.service: Unit entered failed state.
Aug 01 18:50:58 penguin systemd[96]: sommelier-x@0.service: Failed with result 'signal'.
Aug 01 18:50:58 penguin systemd[96]: sommelier-x@0.service: Service hold-off time over, scheduling restart.
Aug 01 18:50:58 penguin systemd[96]: Stopped X11 sommelier at display 0.
Aug 01 18:50:58 penguin systemd[96]: Starting X11 sommelier at display 0...
Aug 01 18:50:59 penguin systemd[96]: Started X11 sommelier at display 0.
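
For anyone triaging similar journals, the exit line above can be decoded mechanically. Below is a minimal Python sketch that pulls the unit name and the signal out of a systemd "Main process exited" line in the format shown above. The regex and the `parse_exit` helper are illustrative assumptions, not part of any Chrome OS tooling, and only the `code=killed` case is handled.

```python
import re
import signal

# Matches systemd's "Main process exited" line as it appears in the log above.
EXIT_RE = re.compile(
    r"systemd\[\d+\]: (?P<unit>\S+): Main process exited, "
    r"code=(?P<code>\w+), status=(?P<status>\d+)(?:/(?P<label>\w+))?"
)


def parse_exit(line):
    """Return (unit, code, signal_name) for a killed main process, else None."""
    m = EXIT_RE.search(line)
    if m is None or m.group("code") != "killed":
        return None
    # For code=killed, the status field carries the signal number.
    signame = signal.Signals(int(m.group("status"))).name
    return m.group("unit"), m.group("code"), signame


line = (
    "Aug 01 18:50:58 penguin systemd[96]: sommelier-x@0.service: "
    "Main process exited, code=killed, status=8/FPE"
)
other = "Aug 01 18:50:59 penguin systemd[96]: Started X11 sommelier at display 0."
print(parse_exit(line))
print(parse_exit(other))
```

On Linux, signal 8 resolves to SIGFPE, which agrees with the `/FPE` suffix systemd prints.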

Is it only x11 sommelier that restarts? Do native wayland clients and their sommelier instances survive?
Yes, only x11 sommelier restarts. Wayland native sommelier survives.
In terms of apps it's a little murky. gimp crashes when the lid is closed. gedit survives one cycle of lid close and reopen, but it also crashes when the lid is closed again. I'm not sure if that's because it's not a pure native app.
ok, looks like the x11 sommelier instance is crashing from a floating point error. can you attach gdb to the x11 sommelier instance to see where?
I tried to gdb attach but whenever I do "continue" after attach it exits.
You can run a separate instance of sommelier with gdb like this:

$ gdb -ex run --args /opt/google/cros-containers/lib/ld-linux-x86-64.so.2 --library-path /opt/google/cros-containers/lib --inhibit-rpath '' /opt/google/cros-containers/bin/sommelier.elf -X xterm

not sure if some of that preload magic will prevent gdb from working as expected. if it does, then building sommelier from source inside the container might be needed.
Status: Assigned (was: Available)
https://cs.corp.google.com/chromeos_public/src/platform2/vm_tools/sommelier/sommelier.c?rcl=6d0ebf96f5e1ffdd19761e734e5fb5262deb2482&l=1246

I am able to trace the crash to the above source line.

I built sommelier with symbols in the chroot on my workstation and inserted fprintf statements right before and after the above line. I copied the sommelier binary into the container and ran it with this command:
gdb -ex run --args /opt/google/cros-containers/lib/ld-linux-x86-64.so.2 --library-path /opt/google/cros-containers/lib --inhibit-rpath '' ./sommelier -X gimp
My chromebook was connected to an external display. When I closed the lid the gimp window and the terminal window were moved to the external display and sommelier crashed with signal SIGFPE.
The fprintf right before the linked statement printed its output, but the one right after did not.
I tried bt. It only prints the part of the call stack inside the shared libraries, and says "Backtrace stopped: previous frame inner to this frame (corrupt stack?)".
I verified with "info proc map" that all the stack frames were at shared library addresses.
hm, are you sure the crash is actually inside wl_display_dispatch? The function will dispatch calls to all of sommelier. Most code in sommelier will have this function as part of its stack. Try building sommelier inside the container instead so you can avoid all that ld magic. I've never had issues getting stack traces when doing that.

Here are some instructions for building sommelier inside the container:

for the virtwl.h kernel header:
$ git clone --depth 1 https://chromium.googlesource.com/chromiumos/third_party/kernel -b chromeos-4.14
$ sudo cp kernel/include/uapi/linux/virtwl.h /usr/include/linux/

follow instructions here to install depot_tools (needed for ninja): https://chromium.googlesource.com/chromium/src/+/master/docs/linux_build_instructions.md#install

$ sudo apt-get install gyp
$ git clone https://chromium.googlesource.com/chromiumos/platform2
$ cd platform2/vm_tools/sommelier
$ gyp -Dsysroot="" -Dplatform2_root=../.. -Dpkg-config=pkg-config -Dpeer_cmd_prefix=0 -Dxwayland_path=/usr/bin/Xwayland -I../../common-mk/common.gypi --depth=.
$ ninja -C out/Default sommelier

and to run:

$ gdb --args ./out/Default/sommelier weston-terminal
correction, gyp command should be:

gyp -Dsysroot="" -Dplatform2_root=../.. -Dpkg-config=pkg-config -Dpeer_cmd_prefix=0 -Dxwayland_path=\"/usr/bin/Xwayland\" -I../../common-mk/common.gypi --depth=.
This should fix the crash in sommelier: https://chromium-review.googlesource.com/c/chromiumos/platform2/+/1167543

However, there are other issues with removing the internal display. Apps seem to be confused and shut down when the output that they are assigned to is removed. We need to investigate this more.

1. Chrome needs to implement proper output tracking for surfaces (ie. send wayland events to indicate what output a surface is appearing on and assign the surface to a new output before removing it).

It's possible that some applications are still confused when this output goes away and we need to always keep it around. Hard to say until we've implemented 1).
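
To make item 1 concrete, here is a small Python model of "assign the surface to a new output before removing the old one". It is only a sketch under assumptions: `OutputTracker`, its methods, and the `output_removed` tuple are made-up names, and the event tuples merely stand in for the Wayland `wl_surface.enter` / `wl_surface.leave` events. It is not exo code.

```python
class OutputTracker:
    """Toy model: tracks which output each surface is on and logs events."""

    def __init__(self, outputs):
        self.outputs = list(outputs)
        self.surface_output = {}  # surface name -> output name
        self.events = []          # (event, surface, output) tuples

    def place(self, surface, output):
        if output not in self.outputs:
            raise ValueError(f"unknown output: {output}")
        old = self.surface_output.get(surface)
        if old == output:
            return
        # Enter the new output before leaving the old one, so the surface
        # is never left without any output.
        self.events.append(("enter", surface, output))
        if old is not None:
            self.events.append(("leave", surface, old))
        self.surface_output[surface] = output

    def remove_output(self, output, fallback):
        if fallback == output:
            raise ValueError("fallback must differ from the removed output")
        # Move every surface off the output first, then remove it.
        for surface, current in list(self.surface_output.items()):
            if current == output:
                self.place(surface, fallback)
        self.outputs.remove(output)
        self.events.append(("output_removed", None, output))


tracker = OutputTracker(["internal", "external"])
tracker.place("gimp", "internal")
tracker.remove_output("internal", "external")
for event in tracker.events:
    print(event)
```

The point of the ordering is that a client only sees the output disappear after all of its surfaces have already been told they live somewhere else.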
I think the best short term solution is for chrome/exo to always expose the internal display output even if it's not currently used. That would solve these issues and also remove the current DPI issues we have when turning off the internal display.
Cc: osh...@chromium.org
Status: Available (was: Assigned)
oshima@, how hard would it be to have wayland exo code behave as if the internal panel is always connected? We currently use display::Screen::GetAllDisplays but if we could use a list that always included the internal panel then these issues would go away.
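
A rough illustration of the "list that always includes the internal panel" idea, written as a Python sketch rather than Chrome code. The `Display` type, its fields, and `displays_for_wayland` are hypothetical names; the real list comes from display::Screen::GetAllDisplays and may look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Display:
    display_id: int
    internal: bool = False
    active: bool = True


def displays_for_wayland(connected, internal_panel):
    """Return the connected displays, always including the internal panel."""
    result = list(connected)
    if not any(d.display_id == internal_panel.display_id for d in result):
        # Keep advertising the panel, marked inactive, instead of dropping it.
        result.append(
            Display(internal_panel.display_id, internal=True, active=False)
        )
    return result


panel = Display(1, internal=True)
external = Display(2)

lid_closed = displays_for_wayland([external], panel)
lid_open = displays_for_wayland([panel, external], panel)
print(lid_closed)
print(lid_open)
```

With the lid closed, the panel is still present in the list (flagged inactive), so a client never sees its output vanish.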
Project Member

Comment 40 by bugdroid1@chromium.org, Aug 9

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/d95814a86b21bc78043a40f08ff4f34571a10364

commit d95814a86b21bc78043a40f08ff4f34571a10364
Author: Maksim Ivanov <emaxx@chromium.org>
Date: Thu Aug 09 03:46:06 2018

vm_tools: sommelier: fix crash when output is removed

Outputs can be removed while we still have events to be
dispatched. This can result in the sl_output instance not
being valid. Avoid relying on sl_output instance in
host implementation.

BUG= chromium:845776 
TEST='sommelier weston-terminal' and remove internal output

Change-Id: Ie68940d93f10200716a5d0a31b94433c1c4ec1cc
Reviewed-on: https://chromium-review.googlesource.com/1167543
Commit-Ready: David Reveman <reveman@chromium.org>
Tested-by: David Reveman <reveman@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>

[modify] https://crrev.com/d95814a86b21bc78043a40f08ff4f34571a10364/vm_tools/sommelier/sommelier-output.c
[modify] https://crrev.com/d95814a86b21bc78043a40f08ff4f34571a10364/vm_tools/sommelier/sommelier.h
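
The hazard described in the commit message (an output removed while events that refer to it are still queued) can be shown with a toy model. This is not the code from the change above, and sommelier itself is C. It is a Python sketch of one general way to avoid touching stale per-output state: queue only an id and look the output up at dispatch time. `Host` and all of its methods are hypothetical.

```python
from collections import deque


class Host:
    """Toy model: outputs are looked up by id when an event is dispatched."""

    def __init__(self):
        self.outputs = {}       # output id -> scale factor
        self.pending = deque()  # queued (output_id, surface) events
        self.log = []

    def add_output(self, output_id, scale):
        self.outputs[output_id] = scale

    def remove_output(self, output_id):
        # Events that mention this output may still be waiting in the queue.
        self.outputs.pop(output_id, None)

    def queue_enter(self, output_id, surface):
        # Store only the id, not a reference to per-output state.
        self.pending.append((output_id, surface))

    def dispatch(self):
        while self.pending:
            output_id, surface = self.pending.popleft()
            scale = self.outputs.get(output_id)
            if scale is None:
                # The output went away after the event was queued: skip it.
                self.log.append(f"skip {surface}: output {output_id} gone")
                continue
            self.log.append(
                f"{surface} entered output {output_id} at scale {scale}"
            )


host = Host()
host.add_output(1, 2)  # internal panel
host.add_output(2, 1)  # external display
host.queue_enter(1, "gimp")
host.queue_enter(2, "xterm")
host.remove_output(1)  # lid closed before the queue was drained
host.dispatch()
for entry in host.log:
    print(entry)
```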

Labels: Merge-Request-69
Verified on termina component 10953.0.0.

Requesting merge to M69.
Project Member

Comment 42 by sheriffbot@chromium.org, Aug 9

Labels: -Merge-Request-69 Merge-Review-69 Hotlist-Merge-Review
This bug requires manual review: M69 has already been promoted to the beta branch, so this requires manual review
Please contact the milestone owner if you have questions.
Owners: amineer@(Android), kariahda@(iOS), cindyb@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Merge-Review-69 Merge-Approved-69
Merge approved, M69.
jopra@ can you elaborate a bit? Note that the internal panel may be turned off in docked mode even if it's a chromebook.
Project Member

Comment 45 by sheriffbot@chromium.org, Aug 13

Cc: cindyb@chromium.org smbar...@chromium.org
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible!

If all merges have been completed, please remove any remaining Merge-Approved labels from this issue.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Project Member

Comment 46 by bugdroid1@chromium.org, Aug 13

Labels: merge-merged-release-R69-10895.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/2faedad2fd2bcb3c3543a0840e2841cfd32c85f9

commit 2faedad2fd2bcb3c3543a0840e2841cfd32c85f9
Author: Maksim Ivanov <emaxx@chromium.org>
Date: Mon Aug 13 17:15:45 2018

vm_tools: sommelier: fix crash when output is removed

Outputs can be removed while we still have events to be
dispatched. This can result in the sl_output instance not
being valid. Avoid relying on sl_output instance in
host implementation.

BUG= chromium:845776 
TEST='sommelier weston-terminal' and remove internal output

Change-Id: Ie68940d93f10200716a5d0a31b94433c1c4ec1cc
Reviewed-on: https://chromium-review.googlesource.com/1167543
Commit-Ready: David Reveman <reveman@chromium.org>
Tested-by: David Reveman <reveman@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>
(cherry picked from commit d95814a86b21bc78043a40f08ff4f34571a10364)
Reviewed-on: https://chromium-review.googlesource.com/1172704
Commit-Queue: Stephen Barber <smbarber@chromium.org>
Tested-by: Stephen Barber <smbarber@chromium.org>

[modify] https://crrev.com/2faedad2fd2bcb3c3543a0840e2841cfd32c85f9/vm_tools/sommelier/sommelier-output.c
[modify] https://crrev.com/2faedad2fd2bcb3c3543a0840e2841cfd32c85f9/vm_tools/sommelier/sommelier.h

Labels: -Merge-Approved-69
Labels: Merge-Approved-69
Adding merge approval label for accurate queries.
<triage>Can this issue be closed?</triage>
Project Member

Comment 50 by sheriffbot@chromium.org, Sep 4

This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible!

If all merges have been completed, please remove any remaining Merge-Approved labels from this issue.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Merge-Approved-69
Status: Fixed (was: Available)
