
Issue 789872

Starred by 10 users

Issue metadata

Status: Fixed
Owner:
Closed: Dec 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocking:
issue 750554
issue 775789
issue 776072
issue 779190




gpu process hang and crash on guado during video conference (stuck on bsd ring)

Project Member Reported by kcwu@chromium.org, Nov 30 2017

Issue description

We have received many bug reports about GPU process hangs and crashes on guado.
What they have in common is a "stuck on bsd ring" error message.

Customers hit this issue regularly (perhaps 2 or 3 times per day), but unfortunately there are no concrete or quick steps to reproduce it yet. Observations so far:
(I compiled the following information from bug reports; I haven't reproduced the issue myself yet.)

Device: all reports are on guado

Version: the earliest reports are on R59. It has also been reported on 60, 62, and 63, and reports are still coming in.
Bug reports have been coming in since July, so this may be a regression, but we are not sure.

Possible workaround:
One report says that disabling HW acceleration works around this issue (ref: crbug.com/776072).

Symptom:
During multi-user video conferences (some use Hangouts, some use faceme.com or other private apps):
1. The video works well for the first several minutes.
2. Before the crash, video quality drops, green lines or thick horizontal lines appear, and then the video hangs. Audio continues.
3. The device crashes completely (black screen, no UI).

/var/log/messages shows
2017-11-02T12:47:25.536559+13:00 INFO kernel: [  646.041518] [drm] stuck on bsd ring
2017-11-02T12:47:25.536577+13:00 INFO kernel: [  646.041528] [drm] GPU crash dump saved to /sys/class/drm/card0/error
2017-11-02T12:47:25.536579+13:00 INFO kernel: [  646.041536] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
2017-11-02T12:47:25.536582+13:00 INFO kernel: [  646.041546] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
2017-11-02T12:47:25.536584+13:00 INFO kernel: [  646.041558] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
2017-11-02T12:47:25.536586+13:00 INFO kernel: [  646.041570] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
2017-11-02T12:47:25.553929+13:00 INFO crash_reporter[3572]: Consent given - collect udev crash info.
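
When triaging further reports, it may help to scan a collected /var/log/messages for the same signature. The following is only a minimal sketch in Python, assuming the log lines follow exactly the format shown above. The names `find_gpu_hangs` and `GpuHang` are hypothetical helpers introduced for illustration, not part of any Chrome OS tooling.

```python
import re
from typing import Iterable, List, NamedTuple

# Matches kernel log lines in the format quoted above, for example:
# 2017-11-02T12:47:25.536559+13:00 INFO kernel: [  646.041518] [drm] stuck on bsd ring
# The exact field layout is an assumption based on that single sample.
HANG_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+\w+\s+kernel:\s+\[\s*(?P<uptime>\d+\.\d+)\]\s+"
    r"\[drm\] stuck on (?P<ring>.+?) ring\s*$"
)


class GpuHang(NamedTuple):
    timestamp: str   # wall-clock timestamp as written by the logger
    uptime_s: float  # kernel uptime in seconds (the bracketed value)
    ring: str        # name of the stuck ring, e.g. "bsd"


def find_gpu_hangs(lines: Iterable[str]) -> List[GpuHang]:
    """Return one GpuHang per '[drm] stuck on <ring> ring' line."""
    hangs = []
    for line in lines:
        match = HANG_RE.match(line)
        if match:
            hangs.append(
                GpuHang(
                    timestamp=match.group("timestamp"),
                    uptime_s=float(match.group("uptime")),
                    ring=match.group("ring"),
                )
            )
    return hangs
```

A file object can be passed directly, e.g. `with open("/var/log/messages") as f: hangs = find_gpu_hangs(f)`. Counting the results per day would give a rough check of the "2 or 3 times per day" frequency that customers report.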

A GPU crash dump is attached (more are available in crbug.com/779190).

Plan:
Current plan is to disable hardware acceleration on BDW first.
https://chromium-review.googlesource.com/c/chromiumos/overlays/chromiumos-overlay/+/799531
BDW does 1080p VP8 decode at ~23% CPU with 0 dropped frames, so it's not that impactful

Additional info:
guado uses an old kernel (3.14), and we know there are some drm/i915 fixes in 4.x kernels.
https://bugs.freedesktop.org/show_bug.cgi?id=97396
Not sure whether this is relevant.

One customer says "all of these video meetings included a participant from iPad Pro" (crbug.com/779190).
Not sure whether this is relevant.

 
gpu_crashes-2017-11-20T09-41-17.tar.xz (165 KB)

Comment 1 by kcwu@chromium.org, Nov 30 2017

Blocking: 779190

Comment 2 by kcwu@chromium.org, Nov 30 2017

Blocking: 775789

Comment 3 by kcwu@chromium.org, Nov 30 2017

Blocking: 750554

Comment 4 by kcwu@chromium.org, Nov 30 2017

Blocking: 776072
Cc: marc...@chromium.org
> Current plan is to disable hardware acceleration on BDW first.

AFAICT SW fallback for the encoder and decoder kicks in and video recovers; see https://bugs.chromium.org/p/chromium/issues/detail?id=779190#c29 for instance. SW fallback has the same effect as disabling HW.

Comment 7 by egemih@chromium.org, Nov 30 2017

Cc: egemih@chromium.org
crbug.com/779190 was updated with customer feedback in crbug.com/779190#c32, explaining that the video doesn't recover and Chrome eventually crashes. FYI.
@6: I don't think there is a SW fallback path. Instead, the theory is that things get "fixed" after the kernel resets the GPU... until you hit the problem again, of course.

Comment 9 by kotah@chromium.org, Nov 30 2017

Cc: kotah@chromium.org
Labels: Hotlist-Enterprise Proj-Hotrod
Ping - any updates on this?
Cc: harpreet@chromium.org vsu...@chromium.org avkodipelli@chromium.org
Cc: katierh@chromium.org mzhuo@chromium.org satoshi....@gmail.com
Cc: -satoshi....@gmail.com tovep@chromium.org choonc@google.com frankhu@chromium.org mnilsson@chromium.org
Labels: M-64
Are we targeting M-64 with a fix for this problem?
Labels: -M-64
HW decode acceleration is now disabled on BDW devices starting from M64, which should prevent the hang/crash issues.
Thanks, that's great news. I will ask affected customers if they can see the issue resolved in Beta channel once Beta reaches M64.

Stable M64 is still ~2 months away. Is it possible to disable it in M63 too? We have multiple customers affected by this issue, both Cfm and other video conference solutions.
Status: Assigned (was: Untriaged)
Labels: Merge-Request-63
Yes, the CL to disable should be safe to merge.

Comment 20 by sheriffbot@chromium.org, Dec 7 2017

Labels: -Merge-Request-63 Merge-Review-63 Hotlist-Merge-Review
This bug requires manual review: Request affecting a post-stable build
Please contact the milestone owner if you have questions.
Owners: cmasso@(Android), cmasso@(iOS), gkihumba@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 21 by kcwu@chromium.org, Dec 7 2017

Cc: daniel.c...@intel.com
Daniel, could you take a look?

Comment 22 by tovep@chromium.org, Dec 11 2017

Labels: M-63
Owner: posciak@chromium.org
Cc: mpricone@chromium.org cvintila@chromium.org marcore@chromium.org
Issue 775789 has been merged into this issue.
Owner: gkihumba@chromium.org
gkihumba@, what are your thoughts on merging this to M63? AFAIK guado is one of the few devices that still use kernel version 3.14, so the general impact should be low.

CLs that would need approval are:
1. https://chromium-review.googlesource.com/799531
2. https://chromium-review.googlesource.com/799515
3. https://chromium-review.googlesource.com/804922
Labels: -Merge-Review-63 Merge-Rejected-63
Owner: egemih@chromium.org
We currently have ~18 devices on the 3.14 kernel, and the M63 stable rollout starts today. Rejecting the merge, as this is a firmware change that comes quite late and affects multiple devices.
Status: Fixed (was: Assigned)
The fix has already been merged into M64, so it seems there is nothing else to do on this issue. I'm closing this bug as fixed. If we run into this crash again, feel free to reopen.

Comment 28 by tovep@chromium.org, Dec 15 2017

Labels: -M-63 M-64

Comment 29 by kotah@chromium.org, Jan 27 2018

Cc: jorgelo@chromium.org vapier@chromium.org posciak@chromium.org ejcaruso@chromium.org
Issue 794667 has been merged into this issue.
