Application freezing, gpu process crash on Veyron
Reported by
josh@arreya.com,
Jul 18
Issue description
UserAgent: Mozilla/5.0 (X11; CrOS x86_64 10863.0.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3480.0 Safari/537.36
Platform: 10575.58.0 (Official Build) stable-channel veyron_fievel
Example URL:
Steps to reproduce the problem:
1.
2.
3.
What is the expected behavior?
What went wrong?
After 20-30 minutes of running a kiosk application, the application will freeze and no interaction is possible. Multiple times in the GPU log you will see:
: The GPU process hung. Terminating after 10000 ms.
GpuProcessHostUIShim: The GPU process crashed!
Attached device log captures the crash and 'GPU soft-reset' just prior to the crash report.
Did this work before? Yes
Is it a problem with Flash or HTML5? N/A
Does this work in other browsers? N/A
Chrome version: 67.0.3396.99 Channel: stable
OS Version: 10575.58.0
Flash Version:
Contents of chrome://gpu: Attached as veyron-gpu.txt
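For anyone triaging similar device logs, here is a minimal sketch of counting the two messages quoted in the description. The message text is taken from the report; the function name `summarize_gpu_log` and the assumption that each message sits on its own log line are illustrative only, and the real log prefix may differ.

```python
import re

# Message text quoted in the report; the surrounding line format is an assumption.
HANG_RE = re.compile(r"The GPU process hung\. Terminating after (\d+) ms")
CRASH_MARKER = "The GPU process crashed!"


def summarize_gpu_log(lines):
    """Count GPU hang and crash messages and collect the watchdog timeouts (ms)."""
    hangs = 0
    crashes = 0
    timeouts_ms = []
    for line in lines:
        match = HANG_RE.search(line)
        if match:
            hangs += 1
            timeouts_ms.append(int(match.group(1)))
        if CRASH_MARKER in line:
            crashes += 1
    return {"hangs": hangs, "crashes": crashes, "timeouts_ms": timeouts_ms}


# Hypothetical sample built from the two messages in the description.
sample = [
    ": The GPU process hung. Terminating after 10000 ms.",
    "GpuProcessHostUIShim: The GPU process crashed!",
    "unrelated line",
]
summary = summarize_gpu_log(sample)
print(summary)
```

Running the same function over a saved device log (for example `summarize_gpu_log(open(path))`, with `path` pointing at your own log file) gives a quick count of how often the hang recurred before the freeze.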
,
Jul 19
The kiosk application is playing back a video and a slideshow of still images. The application is basically a webview displaying the URL below, so I would guess that the issue is reproducible in desktop mode as well. We're working on a minimum repro. I can see from remote devtools that our 1000ms interval in the background javascript continues to fire, but the video is frozen and interaction isn't possible. https://gcmhfoundation.arreya.com/
,
Jul 19
Does chrome://crashes on this device have any ids?
,
Jul 19
We have been able to reproduce this on devices with the Rockchip CPU - Chromebit, Chromebox Mini and Chromebase Mini. The issue affects multiple clients, sites, and content layouts. Video playback seems to be part of the trigger for the issue - when we remove video elements from content we have not been able to replicate the issue. We have one report that the issue was first seen a few weeks ago, and several reports were filed this week.
What went wrong?
- After about 30 minutes devices will visually freeze but continue to run in the background.
- Visually the content is frozen. The clock is stuck. Slideshows are stuck. Videos are stuck. Touch appears unresponsive due to the visual layout being stuck, but we are able to tell that navigation is still working via devtools inspection/dom elements and the cursor state on screen.
- Inspecting via remote devtools shows that the javascript continues to run and update the proper dom elements with no visual change on the display. The clock appears stuck even though the dom element is updated in the Elements tab of devtools. Slideshows continue to run and update dom elements, but nothing updates visually.
- Remote devtools is unable to display the page preview image during this 'frozen' state.
- Using the Admin Console we are no longer able to issue screenshot commands when the display is frozen.
- Admin console will report an error for the reboot command but eventually reboot. When running normally all Admin Console commands are successful.
- Occasionally interacting with the content during this state the entire screen will go black, and may or may not display a mouse cursor, and after this the device may fully crash and reboot.
We have several devices set up for testing here including debug mode and one customer device that reliably reproduces the issue. Please let me know if you have any steps you would like us to follow.
Here are recent crash IDs from a Chromebox Mini (logs attached) - b8624f34c2f9b3af ec3497d89a61a6d1 2c162418af292b11 1c69d12dcd92db2d 6d86301103ae3c2a 726b7aa1015fd4ac b4dc0832442df92a 608411f4954ddb6e 6310d8f33992ecad 79c881444db2b25a b11d7daa29e5050c 4c2138d66d0d88bb 209ba4cb1d4e2221
,
Jul 19
Looks like the V4L2 driver is hanging during Destroy: https://crash.corp.google.com/browse?stbtiq=ec3497d89a61a6d1 +some other CrOS owners.
,
Jul 20
This appears to be a hang in the GPU driver while waiting for an EGL sync to complete, and is probably issue 845645.
,
Jul 24
Verified the issue still occurs on beta and dev channel. Google Chrome Version 68.0.3440.70 Platform Version 10718.58.0 (Official Build) beta-channel veyron_fievel Firmware Version Google_Veyron_Fievel.6588.237.0 Google Chrome Version 69.0.3494.0 Platform Version 10888.0.0 (Official Build) dev-channel veyron_fievel Firmware Version Google_Veyron_Fievel.6588.237.0
,
Jul 25
dstaessens@ Is this case related to crbug.com/862409 as well? Some of the crash reports point to crbug.com/862409.
,
Jul 26
crbug.com/862409 doesn't look like the issue I'm currently looking at, but I'm not very familiar with this code yet... Does any video playback trigger the issue, or do I have to use a specific codec/video/...? I'm currently trying to reproduce on an RK3399 using crosvideo.appspot.com, but have yet to see the issue. Thanks!
,
Jul 26
The example content all uses H.264 MP4 video, cached via fetch, then stored in and served from IndexedDB as a blob. I do believe video plays a role as I have not seen any crashes on client devices since removing the videos. Scenarios involve digital signage with simple looping videos, as well as interactive content where videos hide/show/destroy as content is changed. For example, the video involved in the initial report/example link is a background video, looping, with several elements on top. We discovered that you can trigger a full crash/reboot on the Chrome device once it is frozen by turning the display off and then back on again. After power cycling the display and waiting a few minutes the device reboots and returns to the content functioning. Without the display power cycle, the device will continue to show the frozen content. Attached is a log from a device that performed this type of crash twice this morning. One was from freezing yesterday afternoon and the monitor being power cycled at night/in the morning. The other was around 10:30AM CST, where we confirmed that the power cycle would cause it to reboot when frozen.
,
Jul 30
Test link moved to https://rkissuetest.arreya.com
Running this content on a Rockchip based device will eventually cause the GPU to hang / visually freeze the display. Confirmed on 3 different models using the Rockchip CPU. To confirm, we ran the same test on an Intel device with no issues.
,
Aug 7
Any news on this issue? We are experiencing the same issues from os64 on up to os69.
,
Aug 9
Sorry for the delay. Had a possible fix ready but needed some more work. Should be fine now once it gets through review. http://crrev.com/c/1133614
,
Aug 15
Any updates on the possible fix for this issue?
,
Aug 16
Sorry for the delay, review has been going back and forth between different approaches. Hope it gets sorted soon. Did a few attempts to reproduce the issue but haven't seen any freezes yet so far.
,
Aug 23
Possibly related to issue #873750: same device family, similar description. Screen visually hangs, cursor state updates (hover), but DOM/rendering does not respond, and the screen goes black after HDMI unplug/plug in.
,
Aug 24
We still experience this issue running signage in both Chrome Sign Builder & Signagelive. It doesn't seem to be specific to just Google software.
,
Aug 27
Submitted http://crrev.com/c/1133614, let me know if this fixes the issue!
,
Sep 5
We're still able to reproduce this issue on 70.0.3538.0 on Veyron. If I'm not mistaken, the above fix should be in that version since it landed in 70.0.3535.0. The issue is still easy to reproduce for us in kiosk mode with our kiosk application; we can provide more detailed instructions to reproduce the issue if needed.
,
Sep 5
I also noticed crrev.com/c/1195225 with similar changes that hasn't been merged, could this affect my testing on 70.0.3538.0 since it's not included?
,
Sep 6
Additional log and some crash IDs. f409e6df76674655 7e8729d9218a67bd 33a0fda9306a181c f1f44b5462e8b7f6 d8ed5b36b3fb6d20
,
Sep 6
+conradlo for additional visibility.
,
Sep 6
The issue is still repeatable with 'Hardware-accelerated video decode' disabled. The log attached to comment #19 is with it enabled and the log in comment #21 is with the flag disabled.
,
Sep 6
Those crashes are all issue 738907 +zmo who closed that one as WontFix due to GPU driver instability -- which hopefully wouldn't be a problem on CrOS.
,
Sep 7
crrev.com/c/1195225 isn't relevant for Veyron. The crash ids listed above seem to be related to a different issue than the one I fixed. The fix I made should address ec3497d89a61a6d1 and 726b7aa1015fd4ac.
,
Sep 7
Hmm, these crash signatures seem suspiciously like the other end of issue 845645.
,
Sep 10
These last crash IDs seem to occur in gpu::gles2::GLES2DecoderImpl, why do you think these are linked to issue 845645? Do you think this is a separate issue from the V4L2SliceVideoDecodeAccelerator::Destroy hanging, and if so who would be the owner of this code?
,
Sep 10
These crashes seem to be hanging inside an EGL sync and on the other issue you comment "This seems to be caused by the decoder thread waiting for an EGL sync that will never come, when queuing an output buffer." So naively they seem like they could be related. I'm just the peanut gallery who noticed some similar terms while triaging though, so I defer to your expert judgement.
,
Sep 19
Can we get an update on this issue? What are the next steps? Any testing we can do or potential workarounds?
,
Sep 20
Fixing issue 845645 seems to have just moved the problem. The real problem might be related to issue 705957. Some calls to "glDeleteFramebuffersEXT" block until the watchdog kicks in.
,
Sep 21
Added marcheu@ and hoegsberg@ from the Chrome OS Graphics team.
,
Sep 21
,
Sep 25
,
Sep 28
,
Oct 4
@arreya, can you try testing the new canary build? There were some changes made, and we were unable to reproduce the issue. Chrome version 71 (11124.0.0)
,
Oct 4
Thanks for the update. Testing it now and will report back.
,
Oct 4
,
Oct 4
,
Oct 4
Chrome version 71 (11124.0.0), kiosk mode (managed):
- https://rkissuetest.arreya.com crashed in about an hour. This test contains video.
- https://rkissuetest2.arreya.com has not crashed so far, similar to the testing on Canary we performed on Oct 2. This test does not contain video, just one image and an embedded Google Slides presentation. It crashes on stable in a couple of minutes.
I'm not sure if these comments from the email chain were relayed or included in the private issue.
"After further testing on the new repro (https://rkissuetest2.arreya.com/) I think it may be a different path to the same result (crashed renderer/blank screen). It looks like this one is failing 100% on stable, but does not repro on Canary. The older repro (https://rkissuetest.arreya.com/) continues to produce the issue on Canary."
Some more info here (desktop restarts renderer and iframe goes black on rkissuetest2, kiosk crashes and stays black) - https://bugs.chromium.org/p/chromium/issues/detail?id=879081#c7
Chrome Sign Builder policies attached
,
Oct 4
Here is the log from the above crash on latest canary (11124.0.0)
,
Oct 5
Are there any additional steps we can perform to report more relevant information from the crashes? Happy to run commands and report back the results. If it helps we can arrange a remote ssh session before/after the crash.
,
Oct 8
@bigo, this issue was discovered in M69. Can you add rationale to M70 RBS label? Can we take this fix in M71 instead, since M70 is nearing Stable checkpoints? Thanks.
,
Oct 10
Removing RBS label. Please feel free to add it back with justification if you feel differently.
,
Oct 15
Apologies for the slow progress, this issue has been very hard to reproduce and track down. So far we've been able to create a somewhat reliable repro. It seems that the GPU is not getting enough power; increasing the voltage slightly seems to fix the issue. We're currently working on determining the exact values required, but hopefully we should be able to provide a fix soon.
,
Nov 14
Fix is in the latest build of M72 and M71, will be merged to M70 soon: https://crrev.com/c/1334450
,
Nov 19
@dstaessens Thank you all for the hard work. We have been unable to reproduce the issue on 72 since the fix was merged. Do you know the version number we should look for in stable that has the fix?
,
Nov 20
First version is R71-11151.31.0: https://crosland.corp.google.com/log/11151.30.0..11151.31.0
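As a rough aid for checking a device against that first fixed build, here is a minimal sketch that compares a three-part platform version with 11151.31.0. The helper names `parse_platform_version` and `has_fix` are hypothetical. A plain numeric comparison is an assumption that only makes sense for the R71 branch and newer builds; it would not detect the separate M70 merge mentioned earlier.

```python
import re

# First fixed platform version, taken from "R71-11151.31.0" in the comment above.
FIRST_FIXED = (11151, 31, 0)

# Matches exactly three dot-separated numbers, so four-part browser versions
# such as "67.0.3396.99" are rejected rather than misread.
_PLATFORM_RE = re.compile(r"(?<![\d.])(\d+)\.(\d+)\.(\d+)(?![\d.])")


def parse_platform_version(text):
    """Extract (major, minor, patch) from text like '10575.58.0 (Official Build)'."""
    match = _PLATFORM_RE.search(text)
    if match is None:
        raise ValueError("no three-part platform version found in: " + repr(text))
    return tuple(int(part) for part in match.groups())


def has_fix(text):
    """Heuristic: True when the platform version is at or after FIRST_FIXED."""
    return parse_platform_version(text) >= FIRST_FIXED


# Version strings that appear in this thread.
for reported in ["10575.58.0 (Official Build) stable-channel veyron_fievel",
                 "11124.0.0",
                 "R71-11151.31.0"]:
    print(reported, "->", has_fix(reported))
```

The platform string can be copied from the device's version page, as in the reports above.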
Comment 1 by dalecur...@chromium.org
, Jul 18