Intermittent sad-chrome crashes in Kiosk App
Reported by
ch...@radiusnetworks.com,
Nov 2 2017
Issue description
UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
Platform: 9592.96.0 (Official Build) stable-channel veyron_mickey
Steps to reproduce the problem:
1. Run Kiosk App
2. Load PWA in a webview that will play webm video on loop
3. Wait for the crash to occur
What is the expected behavior?
Continue to play video
What went wrong?
The webview appears to have crashed and shows the "sad-chrome" screen. Attached are 3 log files that all list crashes over the same period in which we saw this issue. A picture of the "sad-chrome" screen is included as well.
Did this work before? N/A
Does this work in other browsers? N/A
Chrome version: 60.0.3112.114 Channel: stable
OS Version: OS X 60.0.3112.114
Flash Version: Google_Veyron_Mickey.6588.264.0
,
Nov 2 2017
+isandrk fyi, in case you want to peek at the logs
,
Nov 2 2017
Renderer crashed in vp9 decoder. fgalligan@, could you help to triage?
http://go/crash/4ab7802aae049d77
http://go/crash/0ada6dbedfc2b87f
http://go/crash/2202fcb3a62c26b9
http://go/crash/2885063e4a61c584
...
Stack Quality: 96%
0x37321816 (chrome -vp9_detokenize.c:39 ) decode_coefs
0x37321127 (chrome -vp9_detokenize.c:280 ) vp9_decode_block_tokens
0x3731cf95 (chrome -vp9_decodeframe.c:360 ) decode_block
0x3731c2c5 (chrome -vp9_decodeframe.c ) decode_partition
0x3731c33d (chrome -vp9_decodeframe.c:954 ) decode_partition
0x3731be4f (chrome -vp9_decodeframe.c:1539 ) tile_worker_hook
0x37303de1 (chrome + 0x02863de1 ) Execute
0x3731b329 (chrome -vp9_decodeframe.c:1672 ) vp9_decode_frame
0x37320b43 (chrome -vp9_decoder.c:371 ) vp9_receive_compressed_data
0x37302e3f (chrome -vp9_dx_iface.c:278 ) frame_worker_hook
0x37303de1 (chrome + 0x02863de1 ) Execute
0x37302d43 (chrome -vp9_dx_iface.c:443 ) decode_one
0x37301cdf (chrome -vp9_dx_iface.c:622 ) decoder_decode
0x3730310b (chrome -vpx_decoder.c:116 ) vpx_codec_decode
0x37105d6d (chrome -vpx_video_decoder.cc:545 ) media::VpxVideoDecoder::VpxDecode(scoped_refptr<media::DecoderBuffer> const&, scoped_refptr<media::VideoFrame>*)
0x37105c01 (chrome -vpx_video_decoder.cc:403 ) media::VpxVideoDecoder::DecodeBuffer(scoped_refptr<media::DecoderBuffer> const&, base::Callback<void (media::DecodeStatus), (base::internal::CopyMode)1, (base::internal::RepeatMode)1> const&)
0x350fb7b9 (chrome -callback.h:91 ) base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*)
0x350e9193 (chrome -message_loop.cc:409 ) base::MessageLoop::RunTask(base::PendingTask*)
0x350e98c9 (chrome -message_loop.cc:420 ) base::MessageLoop::DoWork()
0x350e9c71 (chrome -message_pump_default.cc:33 ) base::MessagePumpDefault::Run(base::MessagePump::Delegate*)
0x36840f77 (chrome -run_loop.cc:111 ) base::RunLoop::Run()
0x36857b05 (chrome -thread.cc:338 ) base::Thread::ThreadMain()
0x36854df3 (chrome -platform_thread_posix.cc:71 ) base::(anonymous namespace)::ThreadFunc(void*)
,
Nov 2 2017
Hi Chris, can you please upload your app with sample videos?
,
Nov 3 2017
Attached a sample app to reproduce this issue. I opened another issue (#781325) about the video playback, but since this is crashing in the vp9 decoder I suspect it may be the root cause.
,
Nov 3 2017
,
Nov 3 2017
,
Nov 3 2017
How do you Run Kiosk App? Also the readme says it takes 8-12 hours to occur?
,
Nov 3 2017
Think it could happen when running inside a regular user session too. You can load the app in chrome://extensions and run it. Not sure whether it is hardware related though. The original crash happened on veyron_mickey - a chromebit device.
,
Nov 3 2017
OK I'm up and running on my machine. We will see. What happened with the memory while the app was running? Also whoever can reproduce, can you try testing other scenarios to cut down on variables?
1. Test one movie in a loop that has video only (either clouds or turbines)
2. Test small in a loop (video and audio)
3. Test one movie in a loop, that has video only, using the loop attribute.
,
Nov 4 2017
Just if it crashes the same way? Does it keep looping? Just trying to narrow the variables to try and make this easier to debug. Also would be a good idea to see if memory available is decreasing, as my best guess at this point is we are running out of memory. It has been running for over 6 hours at this point.
,
Nov 6 2017
Observed another crash on an Asus Chromebit here. Attached the log. The crash happened in the same way: it had been running for a few hours, and once it crashed it stopped looping through the videos. I didn't get immediate access to see what the memory usage was, and with a Kiosk app it is hard to get any historical data on that (or perhaps there is a CDM trick that I am missing). CDM currently reports memory as 822MB Available
,
Nov 6 2017
I have been running on the Mac since Friday afternoon (~66 hours) without any issues. Memory looks OK. Hovering about 200MB right now.
,
Nov 6 2017
Krishna, any update from running the same test in Chromebit?
,
Nov 13 2017
Encountered this again this morning:
There was a crash reported by this device. This may not have been fatal, but it could help support understand the reason why it failed.
2017-11-13T01:24:48.960480-05:00 NOTICE crash_sender[11965]: Crash report receipt ID [Reason: ALERT!!:Crash reporting detected] 511c2fd8bf6be0d0 [Reason: info:Chrome Crash IDs]
System details:
Google Chrome Version: 60.0.3112.114
Platform Version: 9592.96.0 (Official Build) stable-channel veyron_mickey
Firmware Version: Google_Veyron_Mickey.6588.264.0
,
Nov 14 2017
Reproduced again in our QA environment here. Attached the log.
System details:
Google Chrome Version: 61.0.3163.123
Platform Version: 9765.85.0 (Official Build) stable-channel veyron_mickey
Firmware Version: Google_Veyron_Mickey.6588.264.0
Please let me know if these log files are helpful or if I should be gathering different details.
,
Nov 14 2017
Another day another crash, this time on the latest stable version of Chrome. This time with the "Sad Document" screen.
System Details:
Google Chrome Version: 62.0.3202.82
Platform Version: 9901.66.0 (Official Build) stable-channel veyron_mickey
Firmware Version: Google_Veyron_Mickey.6588.264.0
,
Nov 14 2017
I don't think there is any useful information in the logs (from the video decoder perspective). Has anyone tried any of the suggestions in #10?
,
Nov 14 2017
Can I get a device to test with?
,
Nov 16 2017
Working with Google to get the device for you to test with.
In the meantime we have noticed a different error when reaching the "Sad Document" screen I reported in #18. Now the logs have the following error:
[1043:1043:1115/131433.561178:ERROR:network_metrics_provider.cc(391)] NOTREACHED() hit. [Reason: info:NOTREACHED()]
Since the log analyzer says "This always constitutes a coding error." I wanted to make sure to include it here. I attached the full log.
,
Nov 17 2017
We have successfully reproduced this again, and have set up and started running the three test cases suggested in #10. We will let you know the outcome of those test cases.
Google Chrome Version
62.0.3202.97
Platform Version
9901.77.0 (Official Build) stable-channel veyron_mickey
Firmware Version
Google_Veyron_Mickey.6588.264.0
The reference app for this can be found on github here. We will add the variants in #10 as branches.
https://github.com/RadiusNetworks/chrome-780837
,
Nov 18 2017
I have been running the demo (unmodified) from #5 on the device. It has been about 5 hours. The demo is still running, but one thing I have noticed is that the small clip with audio no longer plays back audio, nor any part of the video beyond the first frame (without audio playback this makes sense). The audio stopped playing back about 4 hours in.
,
Nov 18 2017
I'm 15 hours in. No crash yet. The small clip with the audio is still not rendering audio. Did anyone else reproduce the no audio after a certain time? That should be filed as another issue.
,
Nov 19 2017
OK, I'm over 24 hours. Just want to check on the repro steps.
1. I signed in to the device with a new gmail account.
2. Downloaded the zip file in #5
3. Went to chrome://extensions and loaded the unpacked zip file.
4. Then ran the extension.
,
Nov 19 2017
About 39 hours in, and still going the same as it was four hours in. The version is exactly the same as what is listed in #22:
Google Chrome Version: 62.0.3202.97 (Official Build) (32-bit)
Platform Version: 9901.77.0 (Official Build) stable-channel veyron_mickey
Firmware Version: Google_Veyron_Mickey.6588.264.0
,
Nov 20 2017
Yikes! 40 hours is much longer than it would normally take us to reproduce. We very much appreciate the effort! We have an updated version of the test app that reproduces the crash more consistently. The primary change was adding an `iframe` nested under the `webview`, which is closer to how our production app works. Initially we omitted this to keep the app as simple as possible, but after seeing the problems you were having in reproducing it we added it back in. We have tested with it and can reproduce the issue.
Test Scenarios
--------------
With this app we've set up all the test cases you asked for in #10. We ran all three of these over the weekend, but were only able to reproduce the crash with the original scenario. We will continue to test.
Published GitHub releases for all scenarios: https://github.com/RadiusNetworks/chrome-780837/releases
However, we suggest you test with the master branch (also attached here as a zip, in case that is easier). https://github.com/RadiusNetworks/chrome-780837/
Kiosk App
---------
We do almost all our testing as a Kiosk app, so we are not sure whether that makes a difference. I do know there are material differences when running as Kiosk (you are granted a number of additional permissions, for example). Since Kiosk apps require the Chrome Web Store, we have uploaded each version in case you would like to use them:
- master https://chrome.google.com/webstore/detail/chrome-pwa-video-test/ecfefhpbicedcnpcbmcfoojbmagaahbp
- clouds https://chrome.google.com/webstore/detail/sad-face-clouds/mombobgmhblmmmkdjiidcommkjcmakjh
- small https://chrome.google.com/webstore/detail/sad-face-small/blnikklhfpljkbbjfakapbinopdhekni
- turbines-loop https://chrome.google.com/webstore/detail/sad-face-turbines-loop/fodmfghjcejhihdgkpdnhokkaclngdjk
Questions
---------
A couple of follow-up questions:
Raj, do you think running in Kiosk mode would make a difference?
Is the audio problem related to the "video freeze" issue (781325)?
Any thoughts on the `NOTREACHED() hit` crash? I am not sure how to read the crash to narrow it down to the offending line.
,
Nov 20 2017
I restarted the app at about 50 hours. Then it ran last night for 12 hours without a crash. I did notice the memory does grow. At 40 hours, the browser had gone from about 110K to about 475K and the freeze app from about 10K to 110K. The webview stayed about the same, bouncing between 30K and 50K. I just installed master from the Chrome play store, i.e. https://chrome.google.com/webstore/detail/chrome-pwa-video-test/ecfefhpbicedcnpcbmcfoojbmagaahbp Are the videos on master in a random order?
,
Nov 20 2017
Yes, the videos are randomly rotated. The logic can be seen here: https://github.com/RadiusNetworks/chrome-780837/blob/master/pwa_webapp_code/index.html#L51
,
Nov 21 2017
So we are 23 hours in and the master from the Chrome play store is still going. Some observations:
1. Memory growth (Time, Browser, Chrome PWA Test, Webview):
0 hours, 140K, 8K, 25-45K
2.5 hours, 186K, 15K, 40-60K
5 hours, 231K, 16K, 50-70K
23 hours, 443K, 19K, 65-85K
2. File with audio is still not playing audio or video.
3. The files with video only are dropping a lot more frames.
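To put the browser-process figures above in perspective, the growth rate between samples can be worked out directly. This is only a minimal sketch in Python using the numbers reported in this comment; the units are the "K" values as read off the task manager, and the function names are made up for illustration.

```python
# (hours since start, browser-process memory in "K" as reported above)
BROWSER_SAMPLES = [(0.0, 140), (2.5, 186), (5.0, 231), (23.0, 443)]


def growth_rates(samples):
    """Average growth per hour between each pair of consecutive samples."""
    rates = []
    for (t0, m0), (t1, m1) in zip(samples, samples[1:]):
        rates.append((m1 - m0) / (t1 - t0))
    return rates


def overall_rate(samples):
    """Average growth per hour between the first and last sample."""
    (t0, m0), (t1, m1) = samples[0], samples[-1]
    return (m1 - m0) / (t1 - t0)


if __name__ == "__main__":
    for rate in growth_rates(BROWSER_SAMPLES):
        print(f"{rate:.1f} K/hour")
    print(f"overall: {overall_rate(BROWSER_SAMPLES):.1f} K/hour")
```

On these four samples the growth slows somewhat after the first five hours but does not stop, which is consistent with a slow leak rather than a one-time warm-up cost.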
,
Nov 21 2017
Chris how are the 3 other test cases doing? How reproducible is the crash within 8-12 hours running master on your end? Are there any other extensions (or anything else) running on your devices? I'm just trying to see if there are any other differences in our setups.
,
Nov 26 2017
Sorry for the delay, I was hoping to reproduce this with the other test scenarios. Unfortunately, we have not seen a reproduction with any of the alternative scenarios, only the original test app. Although we've only been running one machine for each case, so the test pool is very small. We also see a similar "sad face" in our production app. We originally wanted to provide the simple test case that could be used to isolate this crash, but it seems like that is taking a long time to reproduce since we can only run these tests on a few units. We recently shipped a unit to Shubha R. in Mountain View that was exhibiting this issue consistently. If this test unit would be helpful for you on this issue I am happy to provide more details on that but would prefer to do that in email. Just let me know.
,
Nov 28 2017
Hi Chris, Np. Was a busy week. So with your test app on the Chromebit device I have, I was able to run 33+ hours, and 50+ hours. Both times Chrome did not crash with a sad face, but the app went away, and Chrome was brought to the foreground. So a little different behavior, but still undesirable. Some observations:
1. Memory is growing. The browser process starts at about 120K and at 50 hours it was at 765K. The test app started at 11K and at 50 hours was 129K.
2. Over time the 1080p clips' playback gets worse, i.e. they start dropping more and more frames.
My recommendation is to fix the audio issue first, as I am worried that what is causing the audio to stop playing might also be causing the video decoder to crash. I will ask around to see if I can find out who would be best to look at the audio issue.
,
Nov 28 2017
,
Nov 28 2017
,
Nov 28 2017
,
Nov 28 2017
fgalligan@ Do you have logs when audio stopped working?
,
Nov 28 2017
I see this error in messages.1 in logs_20171030-1408.zip. Jimmy, is this a real error?
2017-10-30T00:32:24.368034-04:00 ERR cras_server[1115]: fetch err: -32 for 36b0002
,
Nov 28 2017
Seems like there may be an fd leak and the device eventually runs out of descriptors.
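One way to check the fd-leak theory would be to sample the number of open descriptors for the suspect process over time. The following is a minimal sketch in Python, assuming shell access to a Linux-based device with `/proc` mounted; the helper names are hypothetical, and reading another process's `/proc/<pid>/fd` normally requires the same user or root.

```python
import os
import resource


def count_open_fds(pid="self"):
    """Count open file descriptors of a process by listing /proc/<pid>/fd.

    Linux only. pid defaults to the current process.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def soft_fd_limit():
    """Soft limit on open descriptors for the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    print(f"open fds: {count_open_fds()} (soft limit {soft_fd_limit()})")
```

Logging this count every few minutes for the relevant process would show whether it climbs steadily toward the limit before audio stops.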
,
Nov 29 2017
I'm re-running. I should be able to get you the logs tomorrow 11/28.
,
Nov 29 2017
About the fetch err: the CRAS server sends a fetch message to the client by writing to a socket. That fetch error means the client unexpectedly closed the read end of the socket, so CRAS got an EPIPE error. We should get a veyron_mickey device and try to reproduce the issue, and see why the read end of the socket was closed when it happens. Also, video clips played after the issue happens should try to connect new streams to CRAS. We need to figure out why they all failed. There were no further error messages in CRAS. Maybe they were routed to the fallback device in CRAS (we can check that with the CRAS audio thread log), or maybe they were routed to the fallback audio path in Chrome.
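The EPIPE behaviour described above can be reproduced in isolation. This is a minimal sketch in Python, not CRAS code: it uses a local socket pair to stand in for the server and client, and the `b"fetch"` payload is just a placeholder rather than the real CRAS message format. On Linux, EPIPE is errno 32, which matches the `-32` in the logged `fetch err`.

```python
import errno
import socket


def write_after_peer_close():
    """Write to a stream socket whose peer has closed; return the errno."""
    server, client = socket.socketpair()  # connected local stream sockets
    client.close()  # the "client" goes away unexpectedly
    try:
        server.send(b"fetch")  # placeholder payload
    except OSError as exc:
        return exc.errno
    finally:
        server.close()
    return None


if __name__ == "__main__":
    err = write_after_peer_close()
    print(err, errno.errorcode.get(err))
```

The write fails only because the other end is gone, so the open question in the thread remains why the client side closed its end in the first place.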
Comment 1 by sduraisamy@chromium.org, Nov 2 2017
Components: UI>Shell>Kiosk Enterprise
Owner: xiy...@chromium.org
Status: Assigned (was: Unconfirmed)