rk3399, rk3288, audio playback stops after 2~3 hours
Reported by
ch...@radiusnetworks.com,
Nov 3 2017
|
||||||||||||
Issue descriptionUserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36 Platform: 9592.96.0 (Official Build) stable-channel ninja Steps to reproduce the problem: 1. Load attached app as Kiosk App on Chrome OS 2. Run several hours What is the expected behavior? All videos should play back smoothly. What went wrong? After several hours (usually 8-12) the video with sound will freeze when it comes on-screen. The other videos will continue to play properly. Did this work before? N/A Does this work in other browsers? N/A Chrome version: 60.0.3112.114 Channel: stable OS Version: 60.0.3112.114 Flash Version: The attached app is also available in a github repo here: https://github.com/RadiusNetworks/chrome-freezing-video-test After being advised to use webm video containers we now transcode all video that needs to be played to that. But running this test we noticed that some videos would freeze but others would not. After testing many different settings (framerate, bitrate, resolution) we've noticed this only happens on videos with sound. Noticed in the logs for these machines that we will get multiple `MEDIA_ERROR_LOG_ENTRY {"error":"video decode error"}` messages. Not sure if that is related, but I attached an example log.
,
Nov 3 2017
posciak@, could you help to triage? Looks like one of the videos (the one with both video and audio) has problem playback. Might be related to issue 780837 where vp9 decoder crashed. I'll cc you there.
,
Nov 3 2017
,
Nov 6 2017
I have attached two logs from Mickey devices, one of which is from a crash that occurred on Sunday, Nov. 5, 2017 (debug-logs_20171106-105945.tgz), and a second which did not report a crash but the video loop had ceased playing at some point over the weekend (debug-logs_20171106-111732.tgz).
,
Nov 7 2017
Hi Pawel, can you please look into it?
,
Nov 20 2017
Chris if this happens again can you post "chrome://media-internals" logs for that player?
,
Nov 26 2017
Unfortunately I have only been able to reproduce in Kiosk mode (most of the testing is done there, since that is how we run in production). I don't think there is a way to get to the media-internals page while in Kiosk mode. Am I missing something?
,
Nov 27 2017
,
Nov 28 2017
You might not be able too.
,
Nov 28 2017
I actually think it is an audio issue that is causing the video to freeze. I noticed that behavior in crbug.com/780837. When the videos freeze do you hear audio?
,
Nov 28 2017
Avinash, can we try to get a repro for this?
,
Nov 28 2017
+owenlin@ and +dalecurtis@, could this be related to issue 761852? This is VP9 and software decoder is being used.
,
Nov 28 2017
,
Nov 28 2017
There are some audio errors in /var/log/messages. In debug-logs_20171106-105945.tgz (#4), 2017-11-05T21:58:12.294973+00:00 ERR cras_server[1554]: failed to shm_open /cras-1554-stream-1d1f0000: Too many open files 2017-11-05T21:58:12.294994+00:00 ERR cras_server[1554]: failed to setup shm -24 2017-11-05T21:58:12.295559+00:00 ERR chrome[1179]: cras_client: Error Setting up stream -12 2017-11-05T21:58:35.055551+00:00 ERR cras_server[1554]: failed to shm_open /cras-1554-stream-1d200000: Too many open files 2017-11-05T21:58:35.055570+00:00 ERR cras_server[1554]: failed to setup shm -24 2017-11-05T21:58:35.055712+00:00 ERR chrome[1179]: cras_client: Error Setting up stream -12 In debug-logs_20171106-111732.tgz (#4) and logs_20171102-0048.zip (#0): 2017-11-06T18:28:20.978216+00:00 INFO kernel: [182430.362844] dw-hdmi-audio dw-hdmi-audio: startup. 2017-11-06T18:28:20.978219+00:00 INFO kernel: [182430.364718] dw-hdmi-audio dw-hdmi-audio: [codec_dai]: trigger enable. 2017-11-06T18:28:35.968176+00:00 INFO kernel: [182445.351688] dw-hdmi-audio dw-hdmi-audio: [codec_dai]: trigger disable. 2017-11-06T18:28:35.968215+00:00 INFO kernel: [182445.352844] dw-hdmi-audio dw-hdmi-audio: shutdown. 2017-11-06T18:28:43.218191+00:00 INFO kernel: [182452.596769] dw-hdmi-audio dw-hdmi-audio: startup. 2017-11-06T18:28:43.218227+00:00 INFO kernel: [182452.597024] dw-hdmi-audio dw-hdmi-audio: shutdown. 2017-11-06T18:28:43.218232+00:00 INFO kernel: [182452.597525] dw-hdmi-audio dw-hdmi-audio: startup. 2017-11-06T18:28:43.218235+00:00 INFO kernel: [182452.600285] dw-hdmi-audio dw-hdmi-audio: [codec_dai]: trigger enable. 2017-11-06T18:28:58.188199+00:00 INFO kernel: [182467.565622] dw-hdmi-audio dw-hdmi-audio: [codec_dai]: trigger disable. 2017-11-06T18:28:58.188241+00:00 INFO kernel: [182467.565897] dw-hdmi-audio dw-hdmi-audio: shutdown.
,
Nov 28 2017
Jimmy. Have you seen errors in #14 before?
,
Dec 7 2017
,
Dec 7 2017
There was a leak FD bug fixed in R60. https://bugs.chromium.org/p/chromium/issues/detail?id=738023 This might be similar, but leak from a different path. I will check how to repro the issue.
,
Dec 7 2017
In the previous issue https://bugs.chromium.org/p/chromium/issues/detail?id=738023#c9 The client removes stream so early even before the stream is connected. That cause the fd leak. But in the test app https://github.com/RadiusNetworks/chrome-freezing-video-test Only one of the video contains audio, and it last for about seconds. So the audio stream can be added and removed follow the normal schedule. I monitor /proc/[cras pid]/fd and can see the fd of stream be opened and closed. The same fd number 25 is reused. localhost fd # ls -al | grep /shm lrwx------. 1 root root 64 Dec 7 21:41 25 -> /dev/shm/cras-1893-stream-001b0000 lrwx------. 1 root root 64 Dec 7 21:41 7 -> /dev/shm/cras-1893 lr-x------. 1 root root 64 Dec 7 21:41 8 -> /dev/shm/cras-1893 I will let it run overnight and see if number of fd is increased.
,
Dec 8 2017
After 12 hours running the number of opened fd indeed increased. I will check how did that happen
,
Dec 8 2017
Found the bug:
When ftruncate fail, we should close fd.
int cras_shm_open_rw (const char *name, size_t size)
{
int fd;
int rc;
fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
if (fd < 0) {
fd = -errno;
syslog(LOG_ERR, "failed to shm_open %s: %s\n",
name, strerror(-fd));
return fd;
}
rc = ftruncate(fd, size);
if (rc) {
rc = -errno;
syslog(LOG_ERR, "failed to set size of shm %s: %s\n",
name, strerror(-rc));
return rc; <------------------------------------should close here then return.
}
return fd;
}
As for why ftruncate fail, I think it is because of memory leak.
Anyway, we need to fix the fd leak first.
,
Dec 8 2017
Is it possible to add a test to catch this?
,
Dec 8 2017
Actually, the bug in #20 is not the cause. I did not see "failed to set size of shm " error message. So, there must be another path where stream is not removed. When the issue happens, cras_test_clinet --dump_s shows number of stream is 499. That means, cras_rstream_destroy is not called. So I need to check why stream_destroy_cb in stream_list is not called.
,
Dec 11 2017
This time the issue happens after two hours. lrwx------. 1 root root 64 Dec 9 03:27 28 -> /dev/shm/cras-18908-stream-03380000 lrwx------. 1 root root 64 Dec 9 02:02 27 -> /dev/snd/pcmC0D0p lrwx------. 1 root root 64 Dec 9 01:43 24 -> 'socket:[2905249]' lrwx------. 1 root root 64 Dec 9 01:43 25 -> /dev/shm/cras-18908-stream-03370000 shm fd of stream 0x03370000 was not closed. And all the streams after it had the same issue. 2017-12-09T03:26:16.815438+08:00 ERR chrome[10106]: @@ remove stream 0x3350000 from CLIENT_REMOVE_STREAM 2017-12-09T03:26:16.815736+08:00 ERR cras_server[18908]: @@ call delete_streams in stream_list_rm id: 0x3350000 2017-12-09T03:26:16.816087+08:00 ERR cras_server[18908]: @@ call stream_destroy_cb on stream 0x3350000 2017-12-09T03:26:21.816365+08:00 ERR cras_server[18908]: @@ call delete_streams in stream_list_rm_all_client_streams 2017-12-09T03:26:38.943455+08:00 ERR chrome[10106]: @@ remove stream 0x3360000 from CLIENT_REMOVE_STREAM 2017-12-09T03:26:38.944405+08:00 ERR cras_server[18908]: @@ call delete_streams in stream_list_rm id: 0x3360000 2017-12-09T03:26:38.945485+08:00 ERR cras_server[18908]: @@ call stream_destroy_cb on stream 0x3360000 2017-12-09T03:26:43.948281+08:00 ERR cras_server[18908]: @@ call delete_streams in stream_list_rm_all_client_streams 2017-12-09T03:27:01.062580+08:00 ERR chrome[10106]: @@ remove stream 0x3370000 from CLIENT_REMOVE_STREAM 2017-12-09T03:27:01.065028+08:00 ERR cras_server[18908]: @@ call delete_streams in stream_list_rm id: 0x3370000 -------> stream_destroy_cb was not called ! 2017-12-09T03:27:06.067719+08:00 ERR cras_server[18908]: @@ call delete_streams in stream_list_rm_all_client_streams 2017-12-09T03:27:23.211433+08:00 ERR chrome[10106]: @@ remove stream 0x3380000 from CLIENT_REMOVE_STREAM 2017-12-09T03:27:23.211573+08:00 ERR cras_server[18908]: @@ call delete_streams in stream_list_rm id: 0x3380000 2017-12-09T03:27:28.212495+08:00 ERR cras_server[18908]: @@ call delete_streams in stream_list_rm_all_client_streams Need to check why stream_destroy_cb was not called on stream 0x3370000. Although stream_list_rm_all_client_streams removed the timer, it calls delete_stream in the end, so that delete_stream should re-check the drain status of the stream. https://cs.corp.google.com/chromeos_public/src/third_party/adhd/cras/src/server/stream_list.c?type=cs&q=delete_streams+package:%5Echromeos_public$&l=131 I will add more logs to check what was going on.
,
Dec 11 2017
The problem might be in driver. localhost fd # cat /proc/asound/card0/pcm0p/sub0/status state: RUNNING owner_pid : 26579 trigger_time: 314179.005119559 tstamp : 0.000000000 delay : 131072 avail : 0 avail_max : 0 ----- hw_ptr : 512 appl_ptr : 131584 The PCM device is in RUNNING state, but hw_ptr does not move at all. It stuck at 512. There is no available space for new samples to feed into device. So, all the streams that are being drained can not move their remaining samples in the shared memory from shared memory into device. So, they are left in streams_to_delete forever. So the fds of shm are not closed. I am testing on a bob device. I need to check why hw_ptr does not move at all.
,
Dec 14 2017
I ran the test on a jerry device, and can repro this issue. When it happens, dma does not work at all. Every 0.1s: cat /proc/asound/card1/pcm0p/sub0/status localhost: Thu Dec 14 20:36:[0/102] state: RUNNING owner_pid : 3420 trigger_time: 13512.475192107 tstamp : 16727.183939474 delay : 131072 avail : 0 avail_max : 0 ----- hw_ptr : 256 appl_ptr : 131328 hw_ptr remains at a fixed position. number of dma interrupts does not increase localhost ~ # cat /proc/interrupts | grep dma 32: 1358830 0 0 0 GIC 32 ffb20000.dma-controller 33: 0 0 0 0 GIC 33 ffb20000.dma-controller 34: 0 0 0 0 GIC 34 ff250000.dma-controller 35: 0 0 0 0 GIC 35 ff250000.dma-controller I would guess this is an issue in driver on rk3399 (bob) and rk3288 (mickey, jerry). These two families both use pl330, we will need to look into it. On another device (ninja), I could not reproduce this issue after running for more than 10 hours.
,
Dec 14 2017
+rockchip Hi Jeffy and Xing, could you please take a look ? The issue can be reproduced by running the app at https://github.com/RadiusNetworks/chrome-freezing-video-test You can download all the files and put them into /home/chronos/user/Downloads/ and chown -R chronos chronos to change the file owner. After that, you can start the app by chrome://extensions and load unpacked apps. The issue happens after 2~3 hours. PCM is in RUNNING state, but hw_ptr does not move at all. This makes CRAS can not drain the streams. So, the streams can not be removed. The shm fd of those streams will remain open until CRAS can not open any new stream.
,
Dec 29 2017
Tested on RK3399 (bob) following Zhengxing's suggestion. When frequency is set to highest at 1608000, issue can NOT be reproduced. localhost ~ # cd /sys/devices/system/cpu/cpu0/cpufreq/ localhost cpufreq # cat scaling_available_governors ondemand userspace interactive performance localhost cpufreq # cat scaling_available_frequencies 126000 216000 408000 600000 696000 816000 1008000 1200000 1416000 1512000 1608000 1704000 1800000 localhost cpufreq # echo userspace > scaling_governor localhost cpufreq # cat scaling_governor userspace localhost cpufreq # echo 1608000 > scaling_setspeed localhost cpufreq # cat scaling_cur_freq 1608000 Doing more test with lower frequency.
,
Jan 2 2018
Using freq 1608000, the issue can be reproduced. PCM state is running but hw_ptr does not move. state: RUNNING owner_pid : 3420 trigger_time: 34285.369625930 tstamp : 0.000000000 delay : 131072 avail : 0 avail_max : 0 ----- hw_ptr : 512 appl_ptr : 131584
,
Jan 10 2018
When the issue is reproduced, from the xrun debug log, it looks like the DMA lost interrupt: 2018-01-08T23:51:04.305924+08:00 ERR kernel: [ 8150.888123] ALSA: PCM: [Q] Lost interrupts?: (stream=0, delta=512, new_hw_ptr=512, old_hw_ptr=0) during the DMA enter autosuspend. So I try to remove all of pm_runtime_put_autosuspend() in the pl330.c, it looks that the DUT(bob) can be passed whole night (about 14h). I will try to dig more...
,
Jan 10 2018
I upload 2 TEST CLs: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/858720 https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/858721 and then we will enable the xrun_debug, and: echo 3 > /proc/asound/card0/pcm0p/xrun_debug to catch the log for pcm0p(speaker). Thanks
,
Jan 11 2018
Unfortunately and sorry, the issue is still reproduced without autosuspend about 20H... It seems that there are bits problems in pl330.c from our kernel team. I add some debug logs in pl330.c and try to trace more details. Thanks.
,
Jan 19 2018
It looks like that the CPU is blocked at some point, and can not handle any interrupt. I also try to open an issue to trace it: https://partnerissuetracker.corp.google.com/issues/72203134 Thanks. |
||||||||||||
►
Sign in to add a comment |
||||||||||||
Comment 1 by krishna...@chromium.org
, Nov 3 2017