
Issue 731808

Starred by 7 users

Issue metadata

Status: Verified
Owner:
Closed: Jul 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug-Regression




Memory leak caused by looping videos, play/pause events

Reported by josh@arreya.com, Jun 9 2017

Issue description

UserAgent: Mozilla/5.0 (X11; CrOS x86_64 9334.72.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.140 Safari/537.36
Platform: 9334.72.0 (Official Build) stable-channel buddy

Example URL:
http://run.plnkr.co/4EQ9TEqEPHpRQjgD/

Steps to reproduce the problem:
1. Go to http://run.plnkr.co/4EQ9TEqEPHpRQjgD/
2. Watch memory usage in the task manager.
3. The tab eventually crashes when it runs out of memory.

What is the expected behavior?

What went wrong?
Tested on 58.0.3029.140 and 60.0.3112.20.

Playing videos in a loop causes a memory leak.

Plunker example: http://run.plnkr.co/4EQ9TEqEPHpRQjgD/
The example shows multiple videos in a slideshow format. Notably, the library used calls play() and pause() on the video DOM element as it loops through the videos.

On both OS versions, setting the "Hardware-accelerated video decode" flag to disabled seems to fix the memory leak.

In our testing environment the tab typically crashes after 30 minutes to 1 hour.

The bug is affecting enterprise kiosk app clients.

Did this work before? Yes 56

Is it a problem with Flash or HTML5? HTML5

Does this work in other browsers? N/A

Chrome version: 58.0.3029.140  Channel: stable
OS Version: 9334.72.0
Flash Version: Shockwave Flash 25.0 r0

Contents of chrome://gpu: 
Graphics Feature Status
Canvas: Hardware accelerated
CheckerImaging: Disabled
Flash: Hardware accelerated
Flash Stage3D: Hardware accelerated
Flash Stage3D Baseline profile: Hardware accelerated
Compositing: Hardware accelerated
Multiple Raster Threads: Disabled
Native GpuMemoryBuffers: Hardware accelerated
Panel Fitting: Unavailable
Rasterization: Hardware accelerated
Video Decode: Hardware accelerated
Video Encode: Hardware accelerated
WebGL: Hardware accelerated
WebGL2: Hardware accelerated
Driver Bug Workarounds
clear_uniforms_before_first_program_use
count_all_in_varyings_packing
decode_encode_srgb_for_generatemipmap
disable_discard_framebuffer
disable_framebuffer_cmaa
msaa_is_slow
scalarize_vec_and_mat_constructor_args
Problems Detected
Chrome OS panel fitting is only supported for Intel IVB and SNB Graphics Controllers
Disabled Features: panel_fitting
Framebuffer discarding causes jumpy scrolling on Mali drivers: 301988
Applied Workarounds: disable_discard_framebuffer
Clear uniforms before first program use on all platforms: 124764, 349137
Applied Workarounds: clear_uniforms_before_first_program_use
Mesa drivers in ChromeOS handle varyings without static use incorrectly: 333885
Applied Workarounds: count_all_in_varyings_packing
Always rewrite vec/mat constructors to be consistent: 398694
Applied Workarounds: scalarize_vec_and_mat_constructor_args
On Intel GPUs MSAA performance is not acceptable for GPU rasterization: 527565
Applied Workarounds: msaa_is_slow
Limited enabling of Chromium GL_INTEL_framebuffer_CMAA: 535198
Applied Workarounds: disable_framebuffer_cmaa
Disable KHR_blend_equation_advanced until cc shaders are updated: 661715
Decode and Encode before generateMipmap for srgb format textures on Chromeos Intel: 634519
Applied Workarounds: decode_encode_srgb_for_generatemipmap
Raster is using a single thread.
Disabled Features: multiple_raster_threads
Checker-imaging has been disabled via finch trial or the command line.
Disabled Features: checker_imaging
Version Information

[22737:22737:0609/124637.312186:ERROR:vaapi_video_decode_accelerator.cc(515)] : Decode/Flush request from client in invalid state: 0
[22737:22737:0609/124637.312347:ERROR:vaapi_video_decode_accelerator.cc(287)] : Notifying of error 4
[22737:30596:0609/124638.168016:ERROR:vaapi_video_decode_accelerator.cc(661)] : Error decoding stream
[22737:22737:0609/124638.265136:ERROR:vaapi_video_decode_accelerator.cc(287)] : Notifying of error 4
[22737:22737:0609/124638.284047:ERROR:vaapi_video_decode_accelerator.cc(515)] : Decode/Flush request from client in invalid state: 0
[22737:22737:0609/124638.285249:ERROR:vaapi_video_decode_accelerator.cc(287)] : Notifying of error 4
[22737:22737:0609/124638.675292:ERROR:vaapi_video_decode_accelerator.cc(515)] : Decode/Flush request from client in invalid state: 0
[22737:22737:0609/124638.675562:ERROR:vaapi_video_decode_accelerator.cc(287)] : Notifying of error 4
[22737:30600:0609/124644.162291:ERROR:vaapi_video_decode_accelerator.cc(661)] : Error decoding stream
[22737:22737:0609/124644.179348:ERROR:vaapi_video_decode_accelerator.cc(287)] : Notifying of error 4
[22737:30607:0609/124648.708938:ERROR:vaapi_video_decode_accelerator.cc(661)] : Error decoding stream
[22737:22737:0609/124648.710630:ERROR:vaapi_video_decode_accelerator.cc(287)] : Notifying of error 4
[22737:30614:0609/124656.771650:ERROR:vaapi_video_decode_accelerator.cc(661)] : Error decoding stream
[22737:22737:0609/124656.772817:ERROR:vaapi_video_decode_accelerator.cc(287)] : Notifying of error 4
[22737:22737:0609/124656.774781:ERROR:vaapi_video_decode_accelerator.cc(515)] : Decode/Flush request from client in invalid state: 0
[22737:22737:0609/124656.775027:ERROR:vaapi_video_decode_accelerator.cc(287)] : Notifying of error 4
[22737:30618:0609/124706.377955:ERROR:vaapi_video_decode_accelerator.cc(661)] : Error decoding stream
[22737:22737:0609/124706.378236:ERROR:vaapi_video_decode_accelerator.cc(287)] : Notifying of error 4
 

Comment 1 by josh@arreya.com, Jun 9 2017

debug-logs_20170609-130736.tgz
2.7 MB Download

Comment 2 by josh@arreya.com, Jun 9 2017

debug-logs_20170607-083243.tgz
764 KB Download

Comment 3 by josh@arreya.com, Jun 9 2017

Logs added, correct plunker URL here: http://plnkr.co/edit/nXKFQi5aXXbjnLMIfQOY?p=preview
Cc: posciak@chromium.org
Clearing the src= for videos when you're done with them will help. We have seen this issue on other CrOS devices though, so +posciak.

Comment 5 by josh@arreya.com, Jun 9 2017

In the example plunker I linked in comment 3, the src attribute does get set to null before attempting to load another video and the issue still seems apparent.
Cc: -posciak@chromium.org sduraisamy@chromium.org
Labels: M-60
Owner: posciak@chromium.org
Status: Assigned (was: Unconfirmed)
Pawel, can you please own/triage this bug?

A couple of enterprise kiosk customers are facing this issue. Making it M-60 as it is reported as bug-regression.

Comment 7 by roy...@google.com, Jun 9 2017

Labels: Hotlist-Enterprise
Labels: videoshortlist
dalecurtis@: Could this be issue 700776 ?

Comment 9 by josh@arreya.com, Jun 12 2017

Verified on an AOpen Mini Chromebox; the memory leak seems to accumulate much more quickly on this device.

Version 58.0.3029.140
Platform 9334.72.0 (Official Build) stable-channel veyron_fievel
ARC Version 4015103
Firmware Google_Veyron_Fievel.6588.237.0

Lots of the following line in the GPU log:
[1063:30299:0612/141508.303462:ERROR:v4l2_slice_video_decode_accelerator.cc(1444)] : DecodeBufferTask(): Setting error state:4
It may be related. That bug was not clearing the src=, though, so you could see entries in chrome://media-internals pile up until GC kicked in. Once they started clearing the src= it was fine for them. In this one you can see that WebMediaPlayer is destroyed between each load. So if we're still seeing a leak only on CrOS, it seems specific to the VDA, per the note that disabling the VDA fixes the issue.

Comment 11 by josh@arreya.com, Jun 12 2017

Also verified on an ASUS Chromebit CS10, info below. Memory leak seemed much slower on this device. chrome://gpu tab output attached 

Version 58.0.3029.140
Platform 9334.72.0 (Official Build) stable-channel veyron_mickey
Firmware Google_Veyron_Mickey.6588.197.0


ASUSchromebit58GPUtabLog.txt
9.4 KB View Download

Comment 12 by josh@arreya.com, Jun 12 2017

Couple of other observations:

Watching task manager, file descriptors for tab in question seem to be incrementing by 1 or more each time a video is loaded/played.

Memory leak seems to also occur on Windows, Chrome browser gpu tab text attached
windows10r58memoryleak.txt
10.4 KB View Download

Comment 13 by josh@arreya.com, Jun 12 2017

Just updated original device from 60.0.3112.20 -> 60.0.3112.26

"Hardware-accelerated video decode" flag no longer seems to fix the leak now. Tab memory steadily climbs before crashing.

Comment 14 by wal...@arreya.com, Jun 12 2017

It appears that devices with the GPU decode flag off only climb in memory usage.
Devices with GPU decode flag turned on (default) climb in memory usage and file descriptors.
Can't reproduce on Windows beta-channel (M60). Memory in GPU consistently hovers around 200-220MB throughout a test of 30 minutes.
Josh, is the URL http://run.plnkr.co/4EQ9TEqEPHpRQjgD/ still valid?

Comment 18 by wal...@arreya.com, Jun 13 2017

Another link, this one advances the slideshow every second and typically reproduces the issue faster than the original link - http://plnkr.co/edit/AbItcfZegvsjgpmDiTJc?p=preview

Comment 19 by wal...@arreya.com, Jun 13 2017

We have been able to reproduce the issue on the following platforms/versions -

-Windows 10, Intel, Chrome 59.0.3071.81
-OSX, Intel, Chrome 58.0.3029.86
Chrome OS (various configurations of 58/59/60/gpu on/gpu off) -
  -AOpen Mini Chromebase (most stable so far, with GPU decode off, dev channel)
  -AOpen Mini Chromebox
  -Asus Chromebox CN60
  -Asus Chromebit
  -Toshiba Chromebook 2
  -Acer Chromebase (newer model)

Comment 20 by josh@arreya.com, Jun 13 2017

Updated steps to reproduce:
1. Open new Chrome tab/window
2. Go to plunker test: http://plnkr.co/edit/AbItcfZegvsjgpmDiTJc?p=preview
3. Run plunker
4. Open Chrome devtools (this causes the leak to either start or progress more quickly)
5. Watch memory usage on tab in task manager

@dalecurtis - Could you retry your test?

Oh that's a different issue if you're saying it leaks w/ dev tools open. Possibly that's accumulating a bunch of dev info. Are you only able to reproduce it with dev tools open?
Also, you're dumping a bunch of info to the console.log() so I wouldn't be surprised to see it accumulate w/ dev tools open.

Comment 23 by josh@arreya.com, Jun 13 2017

It's definitely reproducible without devtools open; devtools appeared to cause it to accumulate more quickly, which is why I noted it.

Removing log statements doesn't appear to affect it.
Are the machines you're testing on extension free?
Or at least the Windows/Mac ones. I've had no luck reproing there on any version of Chrome.
Labels: -Pri-2 Pri-1
Pawel, can we please prioritize this?
Cc: mlight@chromium.org krishna...@chromium.org
Krishna/Mike, can you run this in M56 and M59 and compare the behavior? Based on the original report, the video loop-back should work without any issues in M56.
I am probably missing some step in the instructions.

I loaded M59 on a sumo, enterprise-enrolled it, signed in to a user session, then:

1. Open new Chrome tab/window
2. Go to plunker test: http://plnkr.co/edit/AbItcfZegvsjgpmDiTJc?p=preview
3. Run plunker

I get a gray page with "[down-arrow] Plunker" in the upper left, but clicking on it doesn't seem to do anything.  Task manager shows the tab at a constant 59,964K memory usage.



In user-session, you should see the video playing on the right-hand side panel.
I am able to repro the issue in a user session very easily. Sometimes I get "Aw Snap" and sometimes I get a grey screen. I am attaching the logs.


debug-logs_20170614-114322.tgz
954 KB Download
Labels: ReleaseBlock-Stable
Cc: xiy...@chromium.org vidster@chromium.org
Components: UI>Shell>Kiosk
+Xiyuan, fyi
mlight@, can you repro this in M56?
I'm still trying to figure out the chrome "Developer Tools".  I've never used it before, and can't get a match to the screenshot you sent to me.
For some reason the plnkr app would not run on a Sumo with M59.

I'm having better luck with a zako on M56.  I have the looping video now.  Task manager shows the tab gradually growing in size, starting around 150 MB and after five minutes it is up to 187 MB.  

Comment 36 by wal...@arreya.com, Jun 14 2017

Just tested on a Chromebit, out of box without updates, 52.0.2743.116.  Same issue.

Comment 37 by wal...@arreya.com, Jun 14 2017

Update from #36, Chromebit on 52 shows the same memory leak, and fails to play any videos.

[1:1:0614/152915:ERROR:render_media_log.cc(23)] MediaEvent: PIPELINE_ERROR pipeline: decode error
[1:1:0614/152915:WARNING:webmediaplayer_impl.cc(346)] Using MultibufferDataSource
[994:8474:0614/132916:ERROR:v4l2_slice_video_decode_accelerator.cc(1349)] Setting error state:4
[1:26:0614/152916:ERROR:ffmpeg_video_decoder.cc(308)] Error decoding video: timestamp: 83333 duration: 41667 size: 12950 side_data_size: 0 is_key_frame: 0 encrypted: 0 discard_padding (ms): (0, 0)

Comment 38 by roy...@google.com, Jun 14 2017

FYI: 
- I'm running the recommended test on a wolf using guest mode since last night (about 22 hours)
- The usage was about 170mb when it started
- It was about 210mb at 6pm pacific
- The process is at 1.07GB currently

I don't yet know if this is a memory leak, but if someone can help generate a memory dump, I can share on the bug.

I tried M-56 Stable on a Zako, and M-59 Beta on a Tricky.  The plunker app behaved nearly identically on both.

The Zako plunker tab started at ~150 mb memory used, and while it bounced around a lot there was a gradual up-trend.  At 37 minutes the video loop stopped, and tab memory was 264 mb.  Clicking on anything in the tab produced the message "Page Unresponsive [sick screen face] You can wait for it to become responsive or kill it."

On the Tricky, the tab started at 161 mb memory used, and grew to 242 mb before the video loop ceased and the page became unresponsive.

Comment 40 by roy...@google.com, Jun 14 2017

Note that in my case, the app is now hung and "Page unresponsive" dialog has come up. At this point, if this were a real kiosk app, it would be pretty dead I think.

@mlight: Can you please help us dissect this ?

Comment 41 by roy...@google.com, Jun 14 2017

Michael is retesting to confirm that disabling hardware decode reduces the impact of the issue. His tests so far show that the issue happens on 56 as well, so it's not a new issue.

Considering that this may break kiosk use cases, I'm moving to P0 to raise visibility until we have clarity on root cause.
The test run in comment 39 had Hardware-assisted Video Decoding Enabled.  I have now disabled it on both the Zako & Tricky and both systems are nearing one hour of video looping.  Memory usage grew slowly for about 20 minutes, but seemed to reach a peak on both platforms:

Zako (M56): 194 mb
Tricky (M59): 201 mb

I'll let both run overnight to see how they do.
Do we perhaps have a repro outside of the above plnkr.co test case (specifically with no errors reported by HW decoder)?

The errors returned by the hardware decoder are due to the stream in the test case being nonconformant with the H.264 specification: the SPS header specifies codec level 3.0, but the picture size is 1920x1088, which is too big for level 3.0. The hardware codec stack on veyron and Intel CrOS devices mentioned above will see this and return an error. It will not decode any frames in this stream, reporting the error from VDA::Decode() all the way up to the renderer (this is indicated above by "MediaEvent: PIPELINE_ERROR pipeline: decode error").
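The frame-size arithmetic behind that level violation can be sketched as follows. This is only a minimal illustration, not the check in media::H264Decoder: the function names are hypothetical, only three levels are covered, and only the MaxFS (maximum frame size in macroblocks) limit from Table A-1 of the H.264 specification is considered, while a real conformance check looks at other limits too.

```cpp
#include <cassert>
#include <cstdint>

// Maximum frame size in macroblocks (MaxFS) for a few H.264 levels, keyed by
// level_idc (level number times ten, so 30 means level 3.0). Values are from
// Table A-1 of the H.264 specification. Returns 0 for levels this sketch does
// not cover.
inline uint32_t MaxFrameSizeInMbs(int level_idc) {
  switch (level_idc) {
    case 30: return 1620;
    case 31: return 3600;
    case 40: return 8192;
    default: return 0;
  }
}

// Number of 16x16 macroblocks needed to cover a coded picture.
inline uint32_t FrameSizeInMbs(uint32_t width, uint32_t height) {
  assert(width > 0 && height > 0);
  return ((width + 15) / 16) * ((height + 15) / 16);
}

// True if the picture size is within the MaxFS limit of the given level.
// Hypothetical helper; unknown levels are treated as "does not fit".
inline bool PictureFitsLevel(int level_idc, uint32_t width, uint32_t height) {
  const uint32_t max_fs = MaxFrameSizeInMbs(level_idc);
  return max_fs != 0 && FrameSizeInMbs(width, height) <= max_fs;
}
```

For the stream in this test case, FrameSizeInMbs(1920, 1088) is 120 x 68 = 8160 macroblocks, well over the 1620 allowed at level 3.0, so PictureFitsLevel(30, 1920, 1088) is false. The same picture size would fit level 4.0, whose limit is 8192.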


So apart from the possibility of a leak in the HW/GPU decoder stack itself, there is a possibility of a leak in handling decoder errors from VDAs somewhere higher up the stack, which could only manifest on platforms using media::H264Decoder (VaapiVideoDecodeAccelerator on all Intel CrOS devices, as well as V4L2SVDA mentioned above on veyrons and kevin), because other VDAs could relatively easily ignore/correct the invalid value in SPS and keep decoding.


On Windows, only D3D11VideoDecodeAccelerator appears to be using the same H264Decoder class, but only on Win8+ and with the disabled-by-default flag kD3D11VideoDecoding (https://cs.chromium.org/chromium/src/media/gpu/gpu_video_decode_accelerator_factory.cc?l=171). I don't know if DXVA would also fail here, so I don't know if the root cause would be the same there.


One way to debug this would be to simply add an explicit stream error failure in the Decode() call of any VDA on any platform, and re-run the test case.

Comment 44 by josh@arreya.com, Jun 15 2017

@posciak, here is another plunker with the mp4 tags removed (webm only) and no decode errors in the logs. It still crashes the tab after about 5-10 minutes on Panther on 59.

http://plnkr.co/edit/VsFSqvif1ExvBlucgchP

Possibly notable, just before crash console has the following line:

Failed to create temp file 11 : An operation that depends on state cached in an interface object was made but the state had changed since it was read from disk.
After 19 hours with Hardware-assisted Video Decoding Disabled, the Plunker app is still alive on the Zako (M56), with memory usage peaking at 214MB (up from 194), and on the Tricky (M59), with memory usage peaking at 228MB (up from 200).
At some point during the night (27+ hours) the Tricky on M59 plunker video loop crashed, not only the tab but the entire browser disappeared.

The zako on M56 is still running plunker at 42 hours, but the peak memory usage for the tab is at 252mb and it probably will croak later today.

mlight@ any crash id in chrome://crashes for that?
Alas, I need to use the Tricky for some CfM M60 full-release testing, so I wiped the device.  I'll be sure to look for crash files on the zako when it chokes.  Its plunker tab is now at a 259mb peak, so shouldn't be long...
Cc: vsu...@chromium.org avkodipelli@chromium.org
Issue 735643 has been merged into this issue.
Issue 735643 has a minimal repro extracted as an HTML page from a customer. It eats about 200MB every hour on veyron_fievel.

Not sure if it is related, but I saw the following log when the video starts:
[1:11:0621/155111.199116:ERROR:render_media_log.cc(30)] MediaEvent: MEDIA_ERROR_LOG_ENTRY {"error":"FFmpegDemuxer: open context failed"}
[1:1:0621/155111.199913:ERROR:render_media_log.cc(30)] MediaEvent: PIPELINE_ERROR DEMUXER_ERROR_COULD_NOT_OPEN

Hi Pawel, do we have any update on this?
Re 51:
Here's a data point. This is the test result of video_VideoDecodeMemoryUsage in the Chrome performance dashboard. The test plays an H.264 video 70 times in a loop. The leak was 100KB and didn't change much between M54 and M59 on veyron fievel.
https://chromeperf.appspot.com/report?sid=f681a62eced81ccbd0a8c735307b64dffc6184c0d2bb161fa7d9cc9c5dbf7106
Cc: posciak@chromium.org
Owner: owenlin@chromium.org
Have we completely eliminated the possibility of leaking something on error? It appears that there are still some pipeline errors present from #51?

As I mentioned in #43, one way to speed up/simplify repro could be just to always unconditionally return an error on first decode into the stream...

xiyuan@: do we perhaps have a feedback report/logs from the repro case in #51?
dalecurtis@: would you perhaps have any ideas on how we could further minimize/simplify the repro cases and any suggestions for tracking this down please? Thanks!
Re 53: 100KB is the memory increase after playing the video 70 times. Sometimes the memory is not freed immediately after stop, so it doesn't mean we are leaking 100KB. The more important metric is the difference in memory increase between versions.
Re #54: I don't. But I can repro the problem with the mini html extracted from the customer's app in issue 735643. The page loops 4 customer mp4 videos and switches every 1 second (to stress test). I loaded the html in a tab on veyron_fievel and the renderer eventually crashed. During that time, the renderer memory went up and down between 170MB and 700+MB. Devtools does not show leaks in v8 (the delta is minimal between two snapshots taken 8 hours apart).

Comment 58 by wal...@arreya.com, Jun 28 2017

A few notes from our recent tests with Chrome OS devices -

On devices with GPU decode enabled, we see file descriptors climb with each video load, and it does not go back down.  A quick test this morning ran up over 1000 descriptors for a tab running the test below.  Memory appears to climb as well, but not as fast as with GPU decode disabled.

On devices with GPU decode disabled, memory appears to climb faster, but file descriptors do not climb.

Both cases will eventually crash.

Videos do appear to have some effect on the issue.  Included is an MP4 that triggers the issue repeatedly (and quickly) on our test devices.

Drive link (video attachments too large) - Includes minimum repro html, test video, and video showing repro results on Panther 59.0.3071.91.  If you have problems getting the repro to run, try increasing the timeout on the timer.  The call to .load() does not appear to have any effect on the issue.

https://drive.google.com/drive/folders/0BzOyOeeyLV7zYXd3Zmg5VXVrcW8?usp=sharing

In an effort to help rule out network/cache/filesystem issues we have also repro'd using a video fetched/stored in IndexedDB, played using blob url, same result.
videosrctest.html
363 bytes View Download

Comment 59 by wal...@arreya.com, Jun 28 2017

Additional repro case -
Every 500ms it sets the src, begins playback, waits for the play promise, then waits 150ms, clears src, and calls load.  I believe this follows the recommendations for clearing src, calling load, and waiting for the play promise.


videosrctest2.html
609 bytes View Download
Cc: sande...@chromium.org
posciak@ several chromium folk here seem to be able to repro with ChromeOS; is your team not able to reproduce even after following c#57?

walter@ memory climbing higher with software decode isn't unusual so long as it reaches a stable state. I'm surprised to hear it's crashing, as I was not able to repro that on desktop devices. Your attached html files don't work since we don't have the 27406.mp4 file. Can you include it?

Using a video from file:// is the best thing to test. blob:// URLs end up creating shared memory which could be confounding this. Though that would also be interesting to report. See issue 715859.

File descriptor leak is interesting and indicates perhaps some sort of shared memory or related leak.

Comment 61 by wal...@arreya.com, Jun 28 2017

dale@ I updated the permissions on the drive folder, it should be public now.

I will re-run our tests with GPU decode off to verify crash is still occurring in that scenario.
Cc: chinyue@chromium.org dgreid@chromium.org
I am trying to reproduce the issue on Kevin and Cyan (I will borrow a veyron for testing later).
No luck reproducing the issue yet after about 1 hour (following the instructions in c57).

In which process do we observe the fd leak: the tab, GPU, or browser?

The only number I saw steadily increasing is the fd count in the browser process on Cyan. I found that most of them are deleted cras buffers.

lrwx------. 1 chronos chronos 64 Jun 29 09:37 927 -> /dev/shm/cras-1707-stream-000b00de (deleted)
lrwx------. 1 chronos chronos 64 Jun 29 09:37 928 -> /dev/shm/cras-1707-stream-000b00de (deleted)
lrwx------. 1 chronos chronos 64 Jun 29 09:38 932 -> /dev/shm/cras-1707-stream-000b00e1 (deleted)
lrwx------. 1 chronos chronos 64 Jun 29 09:38 933 -> /dev/shm/cras-1707-stream-000b00e1 (deleted)
lrwx------. 1 chronos chronos 64 Jun 29 09:38 934 -> /dev/shm/cras-1707-stream-000b00e2 (deleted)
lrwx------. 1 chronos chronos 64 Jun 29 09:38 935 -> /dev/shm/cras-1707-stream-000b00e2 (deleted)
lrwx------. 1 chronos chronos 64 Jun 29 09:38 936 -> /dev/shm/cras-1707-stream-000b00e3 (deleted)
lrwx------. 1 chronos chronos 64 Jun 29 09:38 937 -> /dev/shm/cras-1707-stream-000b00e3 (deleted)


 #localhost fd # ls -l | grep  deleted | grep cras | wc
      590    7080   57820

But the same behavior is not observed on kevin.

BTW, it increased to 664 while I was typing.

  #localhost fd # ls -l | grep  deleted | grep cras | wc
      752    7968   65083
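For anyone who wants to track this count from a program rather than the shell, the same filter can be sketched in C++. This is a minimal sketch under stated assumptions: the function names are hypothetical, it uses POSIX opendir()/readlink() on a /proc/<pid>/fd directory, and it is stricter than the grep pipeline above because it requires the link target to end in " (deleted)" rather than matching "deleted" anywhere in the line.

```cpp
#include <dirent.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// True if a symlink target looks like an unlinked CRAS shared-memory file,
// e.g. "/dev/shm/cras-1707-stream-000b00de (deleted)".
bool IsDeletedCrasTarget(const std::string& target) {
  static const std::string kSuffix = " (deleted)";
  if (target.size() < kSuffix.size()) return false;
  const bool deleted =
      target.compare(target.size() - kSuffix.size(), kSuffix.size(),
                     kSuffix) == 0;
  return deleted && target.find("cras") != std::string::npos;
}

// Counts the targets that match the filter above.
std::size_t CountDeletedCras(const std::vector<std::string>& targets) {
  std::size_t count = 0;
  for (const std::string& target : targets) {
    if (IsDeletedCrasTarget(target)) ++count;
  }
  return count;
}

// Reads the symlink targets of every entry in fd_dir (for example
// "/proc/1234/fd"). Entries that are not symlinks, such as "." and "..",
// are skipped. Returns an empty vector if the directory cannot be opened.
std::vector<std::string> ReadFdTargets(const std::string& fd_dir) {
  assert(!fd_dir.empty());
  std::vector<std::string> targets;
  DIR* dir = opendir(fd_dir.c_str());
  if (dir == nullptr) return targets;
  while (struct dirent* entry = readdir(dir)) {
    const std::string path = fd_dir + "/" + entry->d_name;
    char buf[4096];
    const ssize_t n = readlink(path.c_str(), buf, sizeof(buf));
    if (n > 0) targets.emplace_back(buf, static_cast<std::size_t>(n));
  }
  closedir(dir);
  return targets;
}
```

Calling CountDeletedCras(ReadFdTargets("/proc/<pid>/fd")) periodically for the browser process should show the same upward trend as the wc output above, assuming the caller has permission to read that directory.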

cc+: dgreid, chinyue to look at the cras issue.

Will also try to simulate a decode error in VDA to see if it helps.

dalecurtis@: could you provide instructions for the repro on Chrome OS you were referring to in #60 please? Thanks!

xiyuan@: after reproducing, would you be able to use "Submit feedback" from the Chrome menu in the top right-hand side of the screen and provide the feedback id for it? Thank you.


In general, may I ask all reporters to submit a feedback report whenever this reproduces, if possible, and let us know here? Thank you!
Status: Started (was: Assigned)
The play starts around 06-21 12:06 and crash happens around 06-22 14:22. Unfortunately, the renderer crash is not picked up. 
forked cras FD issue to #738023
@posick: I was pointing at xiyuan@'s reference to the HTML file attached to issue 735643 that contains a simplified html looping four videos.
@xiyuan: Interesting, I didn't know you were letting it run for so long. I'll retest on desktop using your link and walter's.

Comment 70 by josh@arreya.com, Jun 30 2017

Unable to get a crash ID, but a feedback report was sent after the crash. The feedback report comment has '731808' and 'https://bugs.chromium.org/p/chromium/issues/detail?id=731808' in it.

Repro'd using Walter's test above on 59 panther
While I cannot reproduce the leak on Kevin, I do observe FD leaks in the "browser process" on veyron_minnie.

After running for a few hours, some fds associated with /dev/shm/.chrome.google.Chrome.XXXXX are not released.

Then I tried to simulate the decoder error as Pawel suggested, and I could observe the FD leak in the tab process. (I believe this is the original issue.)

The leaked FDs still point to /dev/shm/.chrome.google.Chrome.XXXXX, and they leak quickly.

Is there any trick to find out who is allocating that shared memory?


I reproduced the crash on edgar.

2017-07-05T16:04:53.525868+08:00 ERR chrome[27769]: cras_client: stream_connected calls wake_aud_thread, id 0x4809b7
2017-07-05T16:04:53.526644+08:00 ERR chrome[27769]: cras_client: stream_connected calls close, id 0x4809b7, ret rc0: 0, rc1: 0
2017-07-05T16:04:53.751810+08:00 WARNING crash_reporter[21003]: Received crash notification for chrome[9359] user 1000 (called directly)
2017-07-05T16:04:53.898942+08:00 INFO kernel: [168929.354099] traps: Media[9370] trap invalid opcode ip:61803eeac6f0 sp:71c41fedd350 error:0 in chrome[61803dc00000+69cb000]
2017-07-05T16:04:53.926626+08:00 WARNING crash_reporter[21024]: [user] Received crash notification for chrome[9359] sig 4, user 1000 (ignoring call by kernel - chrome crash; waiting for chrome to call us directly)
2017-07-05T16:04:53.985261+08:00 WARNING crash_reporter[21026]: Received crash notification for chrome[9359] user 1000 (called directly)
2017-07-05T16:04:54.063192+08:00 ERR chrome[27769]: cras_client: client_thread_rm_stream, id:0x4809b7
2017-07-05T16:04:54.063905+08:00 ERR chrome[27769]: cras_client: stop_aud_thread stream id:0x4809b7, join:1


The line 2017-07-05T16:04:53.898942+08:00 INFO kernel: [168929.354099] traps: Media[9370] trap invalid opcode ip:61803eeac6f0 sp:71c41fedd350 error:0 in chrome[61803dc00000+69cb000] looks very suspicious.


The cras_client message was irrelevant as I was debugging https://bugs.chromium.org/p/chromium/issues/detail?id=738023.
I was using an R61-9663.0.0 edgar image with a locally built CRAS, running videosrctest2.html (with the timeout set to 150 ms instead of 500 ms).

The crash log was uploaded to https://crash.corp.google.com/browse?stbtiq=3dc4fe5738000000#3

messages.crash
5.1 MB Download
videosrctest2.html
610 bytes View Download
I have a fix for the memory leak on streams that hit decode errors:

https://chromium-review.googlesource.com/c/560810/


Project Member

Comment 74 by bugdroid1@chromium.org, Jul 7 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2348a1ee4f7ca0dd3e75a229d955b94a14495aab

commit 2348a1ee4f7ca0dd3e75a229d955b94a14495aab
Author: Owen Lin <owenlin@google.com>
Date: Fri Jul 07 12:37:12 2017

gpu_video_decoder: Use unique_ptr to track the ownership of SHMBuffer

The |shm_buffer| is leaked in NotifyError(), where it remove an entry
from |bitstream_buffers_in_decoder_| without freeing the |shm_buffer|.

Remove the usage of native pointer to have better ownership of the
SHMBuffer.

BUG= 731808 
TEST=Play the mem_leak_loop_mp4.html in the issue and make sure
     no FD leaks.

Change-Id: Ic475d1780ddf5ea32be6290e737f626ed8e4cd09
Reviewed-on: https://chromium-review.googlesource.com/560810
Reviewed-by: Pawel Osciak <posciak@chromium.org>
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Owen Lin <owenlin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#484896}
[modify] https://crrev.com/2348a1ee4f7ca0dd3e75a229d955b94a14495aab/media/filters/gpu_video_decoder.cc
[modify] https://crrev.com/2348a1ee4f7ca0dd3e75a229d955b94a14495aab/media/filters/gpu_video_decoder.h
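
The ownership change described in the commit message can be illustrated with a simplified model. This is not the code from gpu_video_decoder.cc: FakeShmBuffer and DecoderBookkeepingSketch are hypothetical stand-ins, and the map only loosely mirrors |bitstream_buffers_in_decoder_|. The point is that when the map holds std::unique_ptr values, erasing an entry on error also frees the buffer, whereas a map of raw pointers would drop the last reference without deleting it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>

// Hypothetical stand-in for SHMBuffer. It counts live instances so that a
// missing free would be observable.
struct FakeShmBuffer {
  static int live_count;
  FakeShmBuffer() { ++live_count; }
  ~FakeShmBuffer() { --live_count; }
  FakeShmBuffer(const FakeShmBuffer&) = delete;
  FakeShmBuffer& operator=(const FakeShmBuffer&) = delete;
};
int FakeShmBuffer::live_count = 0;

// Simplified model of the bookkeeping (an assumption, not the real
// GpuVideoDecoder). With a std::map<int32_t, FakeShmBuffer*>, the erase() in
// NotifyError() would forget the buffer without deleting it. Storing
// unique_ptr values makes erase() destroy the buffer as well.
class DecoderBookkeepingSketch {
 public:
  // Hands a new buffer to the (imaginary) decoder under the given id.
  void Decode(int32_t bitstream_id) {
    assert(buffers_in_decoder_.count(bitstream_id) == 0);
    buffers_in_decoder_[bitstream_id] = std::make_unique<FakeShmBuffer>();
  }

  // Error path: removing the entry releases the buffer it owns.
  void NotifyError(int32_t bitstream_id) {
    buffers_in_decoder_.erase(bitstream_id);
  }

  std::size_t pending() const { return buffers_in_decoder_.size(); }

 private:
  std::map<int32_t, std::unique_ptr<FakeShmBuffer>> buffers_in_decoder_;
};
```

In this model, FakeShmBuffer::live_count drops by one on each NotifyError() call and returns to zero when the sketch object is destroyed, which is the behavior the TEST= line checks for indirectly by watching for FD leaks.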

Cc: josa...@chromium.org
Should we be merging to M60 given that we are close to stable and this is not a regression?
I recommend merging this fix to M60 even though it is not a regression. At least two enterprise customers raised this issue.
Important to take into account is that this leak should only happen if the video played back is erroneous and fails to play with an error. This was the case in the report and repro cases provided above. It was also small enough that this had to happen in a tight loop repeatedly attempting to play such videos over a longer time period.
Based on the repro that Xiyuan worked on, continuous video playback was fine (the video was playing without any issues; the app was trying to swap videos every 5 seconds or so), but the renderer crash eventually happened.

Related bug is here - https://bugs.chromium.org/p/chromium/issues/detail?id=735643

I believe that is addressed by this fix as well?

Comment 79 by wal...@arreya.com, Jul 7 2017

#77: Can you share a way to see information about the erroneous stream?  Running our test files through ffmpeg/ffprobe/VLC I am not seeing any errors or incorrect sizes?  Is there a tool to validate the stream/format?
Labels: Merge-Request-60
Project Member

Comment 81 by sheriffbot@chromium.org, Jul 10 2017

Labels: -Merge-Request-60 Hotlist-Merge-Review Merge-Review-60
This bug requires manual review: M60 has already been promoted to the beta branch, so this requires manual review
Please contact the milestone owner if you have questions.
Owners: amineer@(Android), cmasso@(iOS), josafat@(ChromeOS), bustamante@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
@79: The H264 reference decoder should show the error in its output, and this can also be confirmed through examining the H264 headers (please see comment #43 for details on the issue).

The H264Decoder class in Chrome will also show the error if debug logs are enabled.
@78: Since I am not able to reproduce the issue (http://crbug.com/735643) on veyron minnie without faking a decoding error in h264 decoder, I am not sure if it is fixed by the CL.


Labels: -Merge-Review-60 Merge-Approved-60
Merge approved for 60.
Project Member

Comment 85 by bugdroid1@chromium.org, Jul 11 2017

Labels: -merge-approved-60 merge-merged-3112
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/351051a805d30014a4bbd18b4c1eca3022e271e6

commit 351051a805d30014a4bbd18b4c1eca3022e271e6
Author: Owen Lin <owenlin@google.com>
Date: Tue Jul 11 05:49:40 2017

gpu_video_decoder: Use unique_ptr to track the ownership of SHMBuffer

The |shm_buffer| is leaked in NotifyError(), where it remove an entry
from |bitstream_buffers_in_decoder_| without freeing the |shm_buffer|.

Remove the usage of native pointer to have better ownership of the
SHMBuffer.

BUG= 731808 
TEST=Play the mem_leak_loop_mp4.html in the issue and make sure
     no FD leaks.

TBR=owenlin@google.com

(cherry picked from commit 2348a1ee4f7ca0dd3e75a229d955b94a14495aab)

Change-Id: Ic475d1780ddf5ea32be6290e737f626ed8e4cd09
Reviewed-on: https://chromium-review.googlesource.com/560810
Reviewed-by: Pawel Osciak <posciak@chromium.org>
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Owen Lin <owenlin@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#484896}
Reviewed-on: https://chromium-review.googlesource.com/566148
Reviewed-by: Owen Lin <owenlin@chromium.org>
Cr-Commit-Position: refs/branch-heads/3112@{#575}
Cr-Branched-From: b6460e24cf59f429d69de255538d0fc7a425ccf9-refs/heads/master@{#474897}
[modify] https://crrev.com/351051a805d30014a4bbd18b4c1eca3022e271e6/media/filters/gpu_video_decoder.cc
[modify] https://crrev.com/351051a805d30014a4bbd18b4c1eca3022e271e6/media/filters/gpu_video_decoder.h

Status: Fixed (was: Started)
Status: Verified (was: Fixed)
verified on 9592.71.0, 60.0.3112.80

Comment 88 by josh@arreya.com, Aug 30 2017

I'm still observing this issue on a Mickey on 60.0.3112.112 though it may be a different decoder. Rapid memory leak which eventually results in app/tab crash.

Device log attached, quite a few of the following error:

[2630:4049:0830/101711.770934:ERROR:v4l2_slice_video_decode_accelerator.cc(1450)] DecodeBufferTask(): Setting error state:4

veyron_mickey_R60.3112.112.tgz
151 KB Download
