New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 781325 link

Starred by 6 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

rk3399, rk3288, audio playback stops after 2~3 hours

Reported by ch...@radiusnetworks.com, Nov 3 2017

Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
Platform: 9592.96.0 (Official Build) stable-channel ninja

Steps to reproduce the problem:
1. Load attached app as Kiosk App on Chrome OS
2. Run several hours

What is the expected behavior?
All videos should play back smoothly.

What went wrong?
After several hours (usually 8-12) the video with sound will freeze when it comes on-screen. The other videos will continue to play properly.

Did this work before? N/A 

Does this work in other browsers? N/A

Chrome version: 60.0.3112.114  Channel: stable
OS Version: 60.0.3112.114
Flash Version: 

The attached app is also available in a github repo here: https://github.com/RadiusNetworks/chrome-freezing-video-test

After being advised to use webm video containers we now transcode all video that needs to be played to that. But running this test we noticed that some videos would freeze but others would not. After testing many different settings (framerate, bitrate, resolution) we've noticed this only happens on videos with sound. 

Noticed in the logs for these machines that we will get multiple `MEDIA_ERROR_LOG_ENTRY {"error":"video decode error"}` messages. Not sure if that is related, but I attached an example log.
 
chrome-freezing-video-test-master.zip
6.7 MB Download
logs_20171102-0048.zip
320 KB Download
Cc: mlight@chromium.org vsu...@chromium.org krishna...@chromium.org
Owner: posciak@chromium.org
posciak@, could you help to triage? Looks like one of the videos (the one with both video and audio) has problem playback.

Might be related to issue 780837 where vp9 decoder crashed. I'll cc you there.
Components: Internals>Media>Video UI>Shell>Kiosk
I have attached two logs from Mickey devices, one of which is from a crash that occurred on Sunday, Nov. 5, 2017 (debug-logs_20171106-105945.tgz), and a second which did not report a crash but the video loop had ceased playing at some point over the weekend (debug-logs_20171106-111732.tgz).


debug-logs_20171106-105945.tgz
15.3 MB Download
debug-logs_20171106-111732.tgz
2.9 MB Download
Cc: xiy...@chromium.org
Labels: -Pri-2 Pri-1
Status: Assigned (was: Unconfirmed)
Hi Pawel, can you please look into it?
Chris if this happens again can you post "chrome://media-internals" logs for that player? 
Unfortunately I have only been able to reproduce in Kiosk mode (most of the testing is done there, since that is how we run in production). I don't think there is a way to get to the media-internals page while in Kiosk mode. Am I missing something?
Cc: shubhar@chromium.org
You might not be able too.

I actually think it is an audio issue that is causing the video to freeze. I noticed that behavior in crbug.com/780837. 

When the videos freeze do you hear audio?
Cc: avkodipelli@chromium.org
Avinash, can we try to get a repro for this?
Cc: dalecur...@chromium.org
Owner: owenlin@chromium.org
+owenlin@ and +dalecurtis@, could this be related to issue 761852?

This is VP9 and software decoder is being used.
Cc: wuchengli@chromium.org
There are some audio errors in /var/log/messages.

In debug-logs_20171106-105945.tgz (#4),
2017-11-05T21:58:12.294973+00:00 ERR cras_server[1554]: failed to shm_open /cras-1554-stream-1d1f0000: Too many open files
2017-11-05T21:58:12.294994+00:00 ERR cras_server[1554]: failed to setup shm -24
2017-11-05T21:58:12.295559+00:00 ERR chrome[1179]: cras_client: Error Setting up stream -12
2017-11-05T21:58:35.055551+00:00 ERR cras_server[1554]: failed to shm_open /cras-1554-stream-1d200000: Too many open files
2017-11-05T21:58:35.055570+00:00 ERR cras_server[1554]: failed to setup shm -24
2017-11-05T21:58:35.055712+00:00 ERR chrome[1179]: cras_client: Error Setting up stream -12

In debug-logs_20171106-111732.tgz (#4) and logs_20171102-0048.zip (#0):
2017-11-06T18:28:20.978216+00:00 INFO kernel: [182430.362844] dw-hdmi-audio dw-hdmi-audio: startup.
2017-11-06T18:28:20.978219+00:00 INFO kernel: [182430.364718] dw-hdmi-audio dw-hdmi-audio: [codec_dai]: trigger enable.
2017-11-06T18:28:35.968176+00:00 INFO kernel: [182445.351688] dw-hdmi-audio dw-hdmi-audio: [codec_dai]: trigger disable.
2017-11-06T18:28:35.968215+00:00 INFO kernel: [182445.352844] dw-hdmi-audio dw-hdmi-audio: shutdown.
2017-11-06T18:28:43.218191+00:00 INFO kernel: [182452.596769] dw-hdmi-audio dw-hdmi-audio: startup.
2017-11-06T18:28:43.218227+00:00 INFO kernel: [182452.597024] dw-hdmi-audio dw-hdmi-audio: shutdown.
2017-11-06T18:28:43.218232+00:00 INFO kernel: [182452.597525] dw-hdmi-audio dw-hdmi-audio: startup.
2017-11-06T18:28:43.218235+00:00 INFO kernel: [182452.600285] dw-hdmi-audio dw-hdmi-audio: [codec_dai]: trigger enable.
2017-11-06T18:28:58.188199+00:00 INFO kernel: [182467.565622] dw-hdmi-audio dw-hdmi-audio: [codec_dai]: trigger disable.
2017-11-06T18:28:58.188241+00:00 INFO kernel: [182467.565897] dw-hdmi-audio dw-hdmi-audio: shutdown.

Cc: dgreid@chromium.org cychiang@chromium.org
Components: OS>Kernel>Audio
Jimmy. Have you seen errors in #14 before?
Owner: cychiang@chromium.org
There was a leak FD bug fixed in R60. https://bugs.chromium.org/p/chromium/issues/detail?id=738023

This might be similar, but leak from a different path.

I will check how to repro the issue.
In the previous issue https://bugs.chromium.org/p/chromium/issues/detail?id=738023#c9
The client removes stream so early even before the stream is connected.
That cause the fd leak.

But in the test app https://github.com/RadiusNetworks/chrome-freezing-video-test
Only one of the video contains audio, and it last for about seconds.
So the audio stream can be added and removed follow the normal schedule.

I monitor 
/proc/[cras pid]/fd
and can see the fd of stream be opened and closed.
The same fd number 25 is reused.

localhost fd # ls -al | grep /shm
lrwx------. 1 root root 64 Dec  7 21:41 25 -> /dev/shm/cras-1893-stream-001b0000
lrwx------. 1 root root 64 Dec  7 21:41 7 -> /dev/shm/cras-1893
lr-x------. 1 root root 64 Dec  7 21:41 8 -> /dev/shm/cras-1893

I will let it run overnight and see if number of fd is increased.
After 12 hours running the number of opened fd indeed increased.
I will check how did that happen


Found the bug:

When ftruncate fail, we should close fd.

int cras_shm_open_rw (const char *name, size_t size)
{
        int fd;
        int rc;

        fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
        if (fd < 0) {
                fd = -errno;
                syslog(LOG_ERR, "failed to shm_open %s: %s\n",
                       name, strerror(-fd));
                return fd;
        }
        rc = ftruncate(fd, size);
        if (rc) {
                rc = -errno;
                syslog(LOG_ERR, "failed to set size of shm %s: %s\n",
                       name, strerror(-rc));
                return rc;  <------------------------------------should close here then return.
        }
        return fd;
}

As for why ftruncate fail, I think it is because of memory leak.
Anyway, we need to fix the fd leak first.
Is it possible to add a test to catch this?
Actually, the bug in #20 is not the cause.
I did not see "failed to set size of shm " error message.

So, there must be another path where stream is not removed.

When the issue happens,
cras_test_clinet --dump_s shows number of stream is 499.

That means, cras_rstream_destroy is not called.
So I need to check why stream_destroy_cb in stream_list is not called.


This time the issue happens after two hours.

lrwx------. 1 root root 64 Dec  9 03:27 28 -> /dev/shm/cras-18908-stream-03380000
lrwx------. 1 root root 64 Dec  9 02:02 27 -> /dev/snd/pcmC0D0p
lrwx------. 1 root root 64 Dec  9 01:43 24 -> 'socket:[2905249]'
lrwx------. 1 root root 64 Dec  9 01:43 25 -> /dev/shm/cras-18908-stream-03370000

shm fd of stream 0x03370000 was not closed.
And all the streams after it had the same issue.


2017-12-09T03:26:16.815438+08:00 ERR chrome[10106]: @@ remove stream 0x3350000 from CLIENT_REMOVE_STREAM
2017-12-09T03:26:16.815736+08:00 ERR cras_server[18908]: @@ call delete_streams in stream_list_rm id: 0x3350000
2017-12-09T03:26:16.816087+08:00 ERR cras_server[18908]: @@ call stream_destroy_cb on stream 0x3350000
2017-12-09T03:26:21.816365+08:00 ERR cras_server[18908]: @@ call delete_streams in stream_list_rm_all_client_streams

2017-12-09T03:26:38.943455+08:00 ERR chrome[10106]: @@ remove stream 0x3360000 from CLIENT_REMOVE_STREAM
2017-12-09T03:26:38.944405+08:00 ERR cras_server[18908]: @@ call delete_streams in stream_list_rm id: 0x3360000
2017-12-09T03:26:38.945485+08:00 ERR cras_server[18908]: @@ call stream_destroy_cb on stream 0x3360000
2017-12-09T03:26:43.948281+08:00 ERR cras_server[18908]: @@ call delete_streams in stream_list_rm_all_client_streams

2017-12-09T03:27:01.062580+08:00 ERR chrome[10106]: @@ remove stream 0x3370000 from CLIENT_REMOVE_STREAM
2017-12-09T03:27:01.065028+08:00 ERR cras_server[18908]: @@ call delete_streams in stream_list_rm id: 0x3370000            -------> stream_destroy_cb was not called !
2017-12-09T03:27:06.067719+08:00 ERR cras_server[18908]: @@ call delete_streams in stream_list_rm_all_client_streams

2017-12-09T03:27:23.211433+08:00 ERR chrome[10106]: @@ remove stream 0x3380000 from CLIENT_REMOVE_STREAM
2017-12-09T03:27:23.211573+08:00 ERR cras_server[18908]: @@ call delete_streams in stream_list_rm id: 0x3380000
2017-12-09T03:27:28.212495+08:00 ERR cras_server[18908]: @@ call delete_streams in stream_list_rm_all_client_streams

Need to check why stream_destroy_cb was not called on stream 0x3370000.
Although stream_list_rm_all_client_streams removed the timer, it calls delete_stream in the end, so that delete_stream should re-check the drain status of the stream.

https://cs.corp.google.com/chromeos_public/src/third_party/adhd/cras/src/server/stream_list.c?type=cs&q=delete_streams+package:%5Echromeos_public$&l=131

I will add more logs to check what was going on.
The problem might be in driver.

localhost fd # cat /proc/asound/card0/pcm0p/sub0/status 
state: RUNNING
owner_pid   : 26579
trigger_time: 314179.005119559
tstamp      : 0.000000000
delay       : 131072
avail       : 0
avail_max   : 0
-----
hw_ptr      : 512
appl_ptr    : 131584

The PCM device is in RUNNING state, but hw_ptr does not move at all.
It stuck at 512.
There is no available space for new samples to feed into device.

So, all the streams that are being drained can not move their remaining samples in the shared memory from shared memory into device.
So, they are left in streams_to_delete forever. So the fds of shm are not closed.

I am testing on a bob device.
I need to check why hw_ptr does not move at all.
Summary: rk3399, rk3288, audio playback stops after 2~3 hours (was: webm video freezing during playback)
I ran the test on a jerry device, and can repro this issue.
When it happens, dma does not work at all.

Every 0.1s: cat /proc/asound/card1/pcm0p/sub0/status                                                                                                                           localhost: Thu Dec 14 20:36:[0/102]

state: RUNNING
owner_pid   : 3420
trigger_time: 13512.475192107
tstamp      : 16727.183939474
delay       : 131072
avail       : 0
avail_max   : 0
-----
hw_ptr      : 256
appl_ptr    : 131328

hw_ptr remains at a fixed position.

number of dma interrupts does not increase

localhost ~ # cat /proc/interrupts  | grep dma
 32:    1358830          0          0          0       GIC  32  ffb20000.dma-controller
 33:          0          0          0          0       GIC  33  ffb20000.dma-controller
 34:          0          0          0          0       GIC  34  ff250000.dma-controller
 35:          0          0          0          0       GIC  35  ff250000.dma-controller

I would guess this is an issue in driver on rk3399 (bob) and rk3288 (mickey, jerry).
These two families both use pl330, we will need to look into it.

On another device (ninja), I could not reproduce this issue after running for more than 10 hours.


Cc: zhengx...@rock-chips.com jeffy.c...@rock-chips.com
Components: -UI>Shell>Kiosk -Internals>Media>Video
+rockchip

Hi Jeffy and Xing, could you please take a look ?
The issue can be reproduced by running the app  at 

https://github.com/RadiusNetworks/chrome-freezing-video-test

You can download all the files and put them into 

/home/chronos/user/Downloads/

and chown -R chronos chronos
to change the file owner.

After that, you can start the app by

chrome://extensions

and load unpacked apps.

The issue happens after 2~3 hours.

PCM is in RUNNING state, but hw_ptr does not move at all.

This makes CRAS can not drain the streams.
So, the streams can not be removed.
The shm fd of those streams will remain open until CRAS can not open any new stream.

Tested on RK3399 (bob) following Zhengxing's suggestion.

When frequency is set to highest at 1608000, issue can NOT be reproduced.

localhost ~ # cd /sys/devices/system/cpu/cpu0/cpufreq/
localhost cpufreq # cat scaling_available_governors
ondemand userspace interactive performance 
localhost cpufreq # cat scaling_available_frequencies
126000 216000 408000 600000 696000 816000 1008000 1200000 1416000 1512000 1608000 1704000 1800000 
localhost cpufreq # echo userspace > scaling_governor 
localhost cpufreq # cat scaling_governor
userspace
localhost cpufreq # echo 1608000 > scaling_setspeed
localhost cpufreq # cat scaling_cur_freq 
1608000

Doing more test with lower frequency.
Using freq 1608000, the issue can be reproduced.
PCM state is running but hw_ptr does not move.

state: RUNNING
owner_pid   : 3420
trigger_time: 34285.369625930
tstamp      : 0.000000000
delay       : 131072
avail       : 0
avail_max   : 0
-----
hw_ptr      : 512
appl_ptr    : 131584
When the issue is reproduced, from the xrun debug log, it looks like the DMA lost interrupt:

2018-01-08T23:51:04.305924+08:00 ERR kernel: [ 8150.888123] ALSA: PCM: [Q] Lost interrupts?: (stream=0, delta=512, new_hw_ptr=512, old_hw_ptr=0)

during the DMA enter autosuspend.

So I try to remove all of pm_runtime_put_autosuspend() in the pl330.c, it looks that the DUT(bob) can be passed whole night (about 14h).

I will try to dig more...
I upload 2 TEST CLs:

https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/858720
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/858721

and then we will enable the xrun_debug, and:
echo 3 > /proc/asound/card0/pcm0p/xrun_debug

to catch the log for pcm0p(speaker).

Thanks
Unfortunately and sorry, the issue is still reproduced without autosuspend about 20H...

It seems that there are bits problems in pl330.c from our kernel team.
I add some debug logs in pl330.c and try to trace more details.

Thanks.
It looks like that the CPU is blocked at some point, and can not handle any interrupt.

I also try to open an issue to trace it:
https://partnerissuetracker.corp.google.com/issues/72203134

Thanks.

Sign in to add a comment