Memory leak caused by looping videos, play/pause events
Reported by
josh@arreya.com,
Jun 9 2017
Issue description

UserAgent: Mozilla/5.0 (X11; CrOS x86_64 9334.72.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.140 Safari/537.36
Platform: 9334.72.0 (Official Build) stable-channel buddy

Example URL:
http://run.plnkr.co/4EQ9TEqEPHpRQjgD/

Steps to reproduce the problem:
1. goto http://run.plnkr.co/4EQ9TEqEPHpRQjgD/
2. watch memory usage in task manager
3. Tab eventually crashes when it runs out of memory.

What is the expected behavior?

What went wrong?
Tested on 58.0.3029.140 and 60.0.3112.20. Running videos in a loop situation causes memory leak.
Plunker example: http://run.plnkr.co/4EQ9TEqEPHpRQjgD/
Example has multiple videos in a slideshow format, notably the library used calls play() and pause() on the video DOM element as it loops through the videos.
On both OS versions, setting "Hardware-accelerated video decode" flag to disabled seems to fix memory leak.
In our testing environment tab typically crashes after 30min-1hr.
Bug is affecting enterprise kiosk app clients.

Did this work before? Yes 56
Is it a problem with Flash or HTML5? HTML5
Does this work in other browsers? N/A

Chrome version: 58.0.3029.140 Channel: stable
OS Version: 9334.72.0
Flash Version: Shockwave Flash 25.0 r0

Contents of chrome://gpu:

Graphics Feature Status
Canvas: Hardware accelerated
CheckerImaging: Disabled
Flash: Hardware accelerated
Flash Stage3D: Hardware accelerated
Flash Stage3D Baseline profile: Hardware accelerated
Compositing: Hardware accelerated
Multiple Raster Threads: Disabled
Native GpuMemoryBuffers: Hardware accelerated
Panel Fitting: Unavailable
Rasterization: Hardware accelerated
Video Decode: Hardware accelerated
Video Encode: Hardware accelerated
WebGL: Hardware accelerated
WebGL2: Hardware accelerated

Driver Bug Workarounds
clear_uniforms_before_first_program_use
count_all_in_varyings_packing
decode_encode_srgb_for_generatemipmap
disable_discard_framebuffer
disable_framebuffer_cmaa
msaa_is_slow
scalarize_vec_and_mat_constructor_args

Problems Detected
Chrome OS panel fitting is only supported for Intel IVB and SNB Graphics Controllers
Disabled Features: panel_fitting
Framebuffer discarding causes jumpy scrolling on Mali drivers: 301988
Applied Workarounds: disable_discard_framebuffer
Clear uniforms before first program use on all platforms: 124764, 349137
Applied Workarounds: clear_uniforms_before_first_program_use
Mesa drivers in ChromeOS handle varyings without static use incorrectly: 333885
Applied Workarounds: count_all_in_varyings_packing
Always rewrite vec/mat constructors to be consistent: 398694
Applied Workarounds: scalarize_vec_and_mat_constructor_args
On Intel GPUs MSAA performance is not acceptable for GPU rasterization: 527565
Applied Workarounds: msaa_is_slow
Limited enabling of Chromium GL_INTEL_framebuffer_CMAA: 535198
Applied Workarounds: disable_framebuffer_cmaa
Disable KHR_blend_equation_advanced until cc shaders are updated: 661715
Decode and Encode before generateMipmap for srgb format textures on Chromeos Intel: 634519
Applied Workarounds: decode_encode_srgb_for_generatemipmap
Raster is using a single thread.
Disabled Features: multiple_raster_threads
Checker-imaging has been disabled via finch trial or the command line.
Disabled Features: checker_imaging

Version Information
[22737:22737:0609/124637.312186:ERROR:vaapi_video_decode_accelerator.cc(515)] : Decode/Flush request from client in invalid state: 0
[22737:22737:0609/124637.312347:ERROR:vaapi_video_decode_accelerator.cc(287)] : Notifying of error 4
[22737:30596:0609/124638.168016:ERROR:vaapi_video_decode_accelerator.cc(661)] : Error decoding stream
[22737:22737:0609/124638.265136:ERROR:vaapi_video_decode_accelerator.cc(287)] : Notifying of error 4
[22737:22737:0609/124638.284047:ERROR:vaapi_video_decode_accelerator.cc(515)] : Decode/Flush request from client in invalid state: 0
[22737:22737:0609/124638.285249:ERROR:vaapi_video_decode_accelerator.cc(287)] : Notifying of error 4
[22737:22737:0609/124638.675292:ERROR:vaapi_video_decode_accelerator.cc(515)] : Decode/Flush request from client in invalid state: 0
[22737:22737:0609/124638.675562:ERROR:vaapi_video_decode_accelerator.cc(287)] : Notifying of error 4
[22737:30600:0609/124644.162291:ERROR:vaapi_video_decode_accelerator.cc(661)] : Error decoding stream
[22737:22737:0609/124644.179348:ERROR:vaapi_video_decode_accelerator.cc(287)] : Notifying of error 4
[22737:30607:0609/124648.708938:ERROR:vaapi_video_decode_accelerator.cc(661)] : Error decoding stream
[22737:22737:0609/124648.710630:ERROR:vaapi_video_decode_accelerator.cc(287)] : Notifying of error 4
[22737:30614:0609/124656.771650:ERROR:vaapi_video_decode_accelerator.cc(661)] : Error decoding stream
[22737:22737:0609/124656.772817:ERROR:vaapi_video_decode_accelerator.cc(287)] : Notifying of error 4
[22737:22737:0609/124656.774781:ERROR:vaapi_video_decode_accelerator.cc(515)] : Decode/Flush request from client in invalid state: 0
[22737:22737:0609/124656.775027:ERROR:vaapi_video_decode_accelerator.cc(287)] : Notifying of error 4
[22737:30618:0609/124706.377955:ERROR:vaapi_video_decode_accelerator.cc(661)] : Error decoding stream
[22737:22737:0609/124706.378236:ERROR:vaapi_video_decode_accelerator.cc(287)] : Notifying of error 4
,
Jun 9 2017
,
Jun 9 2017
Logs added, correct plunker URL here: http://plnkr.co/edit/nXKFQi5aXXbjnLMIfQOY?p=preview
,
Jun 9 2017
Clearing the src= for videos when you're done with them will help. We have seen this issue on other CrOS devices though, so +posciak.
,
Jun 9 2017
In the example plunker I linked in comment 3, the src attribute does get set to null before attempting to load another video and the issue still seems apparent.
,
Jun 9 2017
Pawel, can you please own/triage this bug? A couple of enterprise kiosk customers are facing this issue. Making it M-60 as it is reported as bug-regression.
,
Jun 9 2017
,
Jun 12 2017
dalecurtis@: Could this be issue 700776 ?
,
Jun 12 2017
Verified on AOpen Mini Chromebox; the memory leak seems to accumulate much more quickly on this device.
Version 58.0.3029.140
Platform 9334.72.0 (Official Build) stable-channel veyron_fievel
ARC Version 4015103
Firmware Google_Veyron_Fievel.6588.237.0
Lots of the following line in the gpu log:
[1063:30299:0612/141508.303462:ERROR:v4l2_slice_video_decode_accelerator.cc(1444)] : DecodeBufferTask(): Setting error state:4
,
Jun 12 2017
It may be related, that bug was not clearing the src= though so you could see entries in chrome://media-internals pile up until GC kicked in. Once they started clearing the src= it was fine for them. In this one you can see that WebMediaPlayer is destroyed between each load. So if we're still seeing a leak only on CrOS it seems specific to the VDA per the note that disabling the VDA fixes the issue.
,
Jun 12 2017
Also verified on an ASUS Chromebit CS10, info below. Memory leak seemed much slower on this device. chrome://gpu tab output attached.
Version 58.0.3029.140
Platform 9334.72.0 (Official Build) stable-channel veyron_mickey
Firmware Google_Veyron_Mickey.6588.197.0
,
Jun 12 2017
A couple of other observations:
- Watching task manager, file descriptors for the tab in question seem to increment by 1 or more each time a video is loaded/played.
- The memory leak also seems to occur on Windows; Chrome browser gpu tab text attached.
,
Jun 12 2017
Just updated the original device from 60.0.3112.20 -> 60.0.3112.26. Disabling the "Hardware-accelerated video decode" flag no longer seems to fix the leak. Tab memory steadily climbs before crashing.
,
Jun 12 2017
It appears that devices with the GPU decode flag off only climb in memory usage. Devices with the GPU decode flag turned on (default) climb in both memory usage and file descriptors.
,
Jun 12 2017
Can't reproduce on Windows beta-channel (M60). Memory in GPU consistently hovers around 200-220MB throughout a test of 30 minutes.
,
Jun 13 2017
Josh, is the URL http://run.plnkr.co/4EQ9TEqEPHpRQjgD/ still valid?
,
Jun 13 2017
Correct link is in c#3, http://plnkr.co/edit/nXKFQi5aXXbjnLMIfQOY?p=preview
,
Jun 13 2017
Another link, this one advances the slideshow every second and typically reproduces the issue faster than the original link - http://plnkr.co/edit/AbItcfZegvsjgpmDiTJc?p=preview
,
Jun 13 2017
We have been able to reproduce the issue on the following platforms/versions -
- Windows 10, Intel, Chrome 59.0.3071.81
- OSX, Intel, Chrome 58.0.3029.86

Chrome OS (various configurations of 58/59/60/gpu on/gpu off) -
- AOpen Mini Chromebase (most stable so far, with GPU decode off, dev channel)
- AOpen Mini Chromebox
- Asus Chromebox CN60
- Asus Chromebit
- Toshiba Chromebook 2
- Acer Chromebase (newer model)
,
Jun 13 2017
Updated steps to reproduce:
1. Open new Chrome tab/window
2. Go to plunker test: http://plnkr.co/edit/AbItcfZegvsjgpmDiTJc?p=preview
3. Run plunker
4. Open Chrome devtools - causes leak to either start or leak quicker
5. Watch memory usage on tab in task manager

@dalecurtis - Could you retry your test?
,
Jun 13 2017
Oh that's a different issue if you're saying it leaks w/ dev tools open. Possibly that's accumulating a bunch of dev info. Are you only able to reproduce it with dev tools open?
,
Jun 13 2017
Also, you're dumping a bunch of info to the console.log() so I wouldn't be surprised to see it accumulate w/ dev tools open.
,
Jun 13 2017
It's definitely reproducible without devtools open, devtools appeared to cause it to accumulate quicker which is why I noted it. Removing log statements doesn't appear to affect it.
,
Jun 13 2017
Are the machines you're testing on extension free?
,
Jun 13 2017
Or at least the Windows/Mac ones. I've had no luck reproing there on any version of Chrome.
,
Jun 14 2017
Pawel, can we please prioritize this?
,
Jun 14 2017
Krishna/Mike, can you run this in M56 and M59 and compare the behavior? Based on the original report, the video loop-back should work without any issues in M56.
,
Jun 14 2017
I am probably missing some step in the instructions. I loaded M59 on a sumo, enterprise-enrolled it, signed in to a user session, then:
1. Open new Chrome tab/window
2. Go to plunker test: http://plnkr.co/edit/AbItcfZegvsjgpmDiTJc?p=preview
3. Run plunker

I get a gray page with "[down-arrow] Plunker" in the upper left, but clicking on it doesn't seem to do anything. Task manager shows the tab at a constant 59,964K memory usage.
,
Jun 14 2017
In user-session, you should see the video playing on the right-hand side panel.
,
Jun 14 2017
I am able to repro the issue in user-session very easily. Sometimes I get "Aw Snap" and sometimes I get a grey screen. I am attaching the logs.
,
Jun 14 2017
,
Jun 14 2017
+Xiyuan, fyi
,
Jun 14 2017
mlight@, can you repro this in M56?
,
Jun 14 2017
I'm still trying to figure out the chrome "Developer Tools". I've never used it before, and can't get a match to the screenshot you sent to me.
,
Jun 14 2017
For some reason the plnkr app would not run on a Sumo with M59. I'm having better luck with a zako on M56. I have the looping video now. Task manager shows the tab gradually growing in size, starting around 150 MB and after five minutes it is up to 187 MB.
,
Jun 14 2017
Just tested on a Chromebit, out of box without updates, 52.0.2743.116. Same issue.
,
Jun 14 2017
Update from #36, Chromebit on 52 shows the same memory leak, and fails to play any videos.
[1:1:0614/152915:ERROR:render_media_log.cc(23)] MediaEvent: PIPELINE_ERROR pipeline: decode error
[1:1:0614/152915:WARNING:webmediaplayer_impl.cc(346)] Using MultibufferDataSource
[994:8474:0614/132916:ERROR:v4l2_slice_video_decode_accelerator.cc(1349)] Setting error state:4
[1:26:0614/152916:ERROR:ffmpeg_video_decoder.cc(308)] Error decoding video: timestamp: 83333 duration: 41667 size: 12950 side_data_size: 0 is_key_frame: 0 encrypted: 0 discard_padding (ms): (0, 0)
,
Jun 14 2017
FYI:
- I'm running the recommended test on a wolf using guest mode since last night (about 22 hours)
- The usage was about 170mb when it started
- It was about 210mb at 6pm pacific
- The process is at 1.07GB currently

I don't yet know if this is a memory leak, but if someone can help generate a memory dump, I can share on the bug.
,
Jun 14 2017
I tried M-56 Stable on a Zako, and M-59 Beta on a Tricky. The plunker app behaved nearly identically on both. The Zako plunker tab started at ~150 mb memory used, and while it bounced around a lot there was a gradual up-trend. At 37 minutes the video loop stopped, and tab memory was 264 mb. Clicking on anything in the tab produced the message "Page Unresponsive [sick screen face] You can wait for it to become responsive or kill it." On the Tricky, the tab started at 161 mb memory used, and grew to 242 mb before the video loop ceased and the page became unresponsive.
,
Jun 14 2017
Note that in my case, the app is now hung and "Page unresponsive" dialog has come up. At this point, if this were a real kiosk app, it would be pretty dead I think. @mlight: Can you please help us dissect this ?
,
Jun 14 2017
Michael is retesting to confirm that disabling hardware decode reduces the impact of the issue. His tests so far show that the issue happens on 56 as well, so it's not a new issue. Considering that this may break kiosk use cases, I'm moving to P0 to raise visibility until we have clarity on root cause.
,
Jun 14 2017
The test run in comment 39 had Hardware-assisted Video Decoding Enabled. I have now disabled it on both the Zako & Tricky and both systems are nearing one hour of video looping. Memory usage grew slowly for about 20 minutes, but seemed to reach a peak on both platforms: Zako (M56): 194 mb Tricky (M59): 201 mb I'll let both run overnight to see how they do.
,
Jun 15 2017
Do we perhaps have a repro outside of the above plnkr.co test case (specifically with no errors reported by HW decoder)? The errors returned by the hardware decoder are due to the stream in the test case being nonconformant with the H.264 specification: the SPS header specifies codec level 3.0, but the picture size is 1920x1088, which is too big for level 3.0. The hardware codec stack on veyron and Intel CrOS devices mentioned above will see this and return an error. It will not decode any frames in this stream, reporting the error from VDA::Decode() all the way up to the renderer (this is indicated above by "MediaEvent: PIPELINE_ERROR pipeline: decode error"). So apart from the possibility of a leak in the HW/GPU decoder stack itself, there is a possibility of a leak in handling decoder errors from VDAs somewhere higher up the stack, which could only manifest on platforms using media::H264Decoder (VaapiVideoDecodeAccelerator on all Intel CrOS devices, as well as V4L2SVDA mentioned above on veyrons and kevin), because other VDAs could relatively easily ignore/correct the invalid value in SPS and keep decoding. On Windows only D3D11VideoDecodeAccelerator appears to be using the same H264Decoder class, but only on Win8+ and with disabled by default flag kD3D11VideoDecoding (https://cs.chromium.org/chromium/src/media/gpu/gpu_video_decode_accelerator_factory.cc?l=171). I don't know if DXVA would also fail here, so I don't know if the root cause would be the same there. One way to debug this would be to simply add an explicit stream error failure in the Decode() call of any VDA on any platform, and re-run the test case.
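Editorial note: the level mismatch described in this comment can be checked with simple arithmetic. The sketch below assumes the MaxFS (maximum frame size in macroblocks) limits from Table A-1 of the H.264 specification and lists only a few levels; the function and constant names are illustrative and are not taken from Chrome's H264Decoder. MaxFS is only one of several per-level limits, so passing this check alone does not make a stream conformant.

```typescript
// MaxFS (maximum frame size in 16x16 macroblocks) for a few H.264 levels,
// per Table A-1 of the H.264 specification. Illustrative subset only.
const MAX_FS_BY_LEVEL: Record<string, number> = {
  "3.0": 1620,
  "3.1": 3600,
  "4.0": 8192,
};

// Number of 16x16 macroblocks needed to cover a picture of the given size.
function frameSizeInMacroblocks(width: number, height: number): number {
  return Math.ceil(width / 16) * Math.ceil(height / 16);
}

// True if the picture size is within the MaxFS limit of the given level.
// Throws for levels that are not in the illustrative table above.
function fitsLevelFrameSize(level: string, width: number, height: number): boolean {
  const maxFs = MAX_FS_BY_LEVEL[level];
  if (maxFs === undefined) {
    throw new Error("level not in table: " + level);
  }
  return frameSizeInMacroblocks(width, height) <= maxFs;
}

// The stream in the test case: SPS says level 3.0, picture is 1920x1088.
console.log(frameSizeInMacroblocks(1920, 1088)); // 120 * 68 = 8160 macroblocks
console.log(fitsLevelFrameSize("3.0", 1920, 1088)); // false: 8160 > 1620
console.log(fitsLevelFrameSize("4.0", 1920, 1088)); // true: 8160 <= 8192
```

A 1920x1088 picture needs 8160 macroblocks, about five times the level 3.0 limit, which is consistent with a strict decoder rejecting the stream.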
,
Jun 15 2017
@posciak, here is another plunker with the mp4 tags removed (webm only) and no decode errors in the logs. It still crashes the tab after about 5-10 minutes on Panther on 59. http://plnkr.co/edit/VsFSqvif1ExvBlucgchP Possibly notable: just before the crash, the console has the following line: Failed to create temp file 11 : An operation that depends on state cached in an interface object was made but the state had changed since it was read from disk.
,
Jun 15 2017
After 19 hours with Hardware-assisted Video Decoding Disabled, the Plunker app is still alive on the Zako (M56), with memory usage peaking at 214MB (up from 194), and on the Tricky (M59), with memory usage peaking at 228MB (up from 200).
,
Jun 16 2017
At some point during the night (27+ hours) the Tricky on M59 plunker video loop crashed, not only the tab but the entire browser disappeared. The zako on M56 is still running plunker at 42 hours, but the peak memory usage for the tab is at 252mb and it probably will croak later today.
,
Jun 16 2017
mlight@ any crash id in chrome://crashes for that?
,
Jun 16 2017
Alas, I need to use the Tricky for some CfM M60 full-release testing, so I wiped the device. I'll be sure to look for crash files on the zako when it chokes. Its plunker tab is now at a 259mb peak, so shouldn't be long...
,
Jun 16 2017
,
Jun 21 2017
Issue 735643 has been merged into this issue.
,
Jun 21 2017
Issue 735643 has a minimal repro extracted as an HTML page from a customer. It eats about 200MB every hour on veyron_fievel.
Not sure if it is related, but I saw the following log when the video starts:
[1:11:0621/155111.199116:ERROR:render_media_log.cc(30)] MediaEvent: MEDIA_ERROR_LOG_ENTRY {"error":"FFmpegDemuxer: open context failed"}
[1:1:0621/155111.199913:ERROR:render_media_log.cc(30)] MediaEvent: PIPELINE_ERROR DEMUXER_ERROR_COULD_NOT_OPEN
,
Jun 27 2017
Hi Pawel, do we have any update on this?
,
Jun 28 2017
Re 51: Here's a data point. This is the test result of video_VideoDecodeMemoryUsage in chrome performance dashboard. The test plays a h264 video for 70 times in a loop. The leak was 100KB and didn't change much between m54 to m59 on veyron fievel. https://chromeperf.appspot.com/report?sid=f681a62eced81ccbd0a8c735307b64dffc6184c0d2bb161fa7d9cc9c5dbf7106
,
Jun 28 2017
Have we completely eliminated the possibility of leaking something on error? It appears that there are still some pipeline errors present from #51? As I mentioned in #43, one way to speed up/simplify repro could be just to always unconditionally return an error on first decode into the stream... xiyuan@: do we perhaps have a feedback report/logs from the repro case in #51?
,
Jun 28 2017
dalecurtis@: would you perhaps have any ideas on how we could further minimize/simplify the repro cases and any suggestions for tracking this down please? Thanks!
,
Jun 28 2017
Re 53: 100KB is the memory increase after playing the video 70 times. Sometimes the memory is not freed immediately after stop, so it doesn't mean we are leaking 100KB. The more important metric is the difference in memory increase between versions.
,
Jun 28 2017
Re #54: I don't. But I can repro the problem with the mini html extracted from the customer's app in issue 735643. The page loops 4 customer mp4 videos and switches every 1 second (to stress test). I load the html in a tab on veyron_fievel and the renderer eventually crashes. During that time, the renderer memory goes up and down between 170MB and 700+MB. Devtools does not show leaks in v8 (the delta is minimal between two snapshots taken 8 hours apart).
,
Jun 28 2017
A few notes from our recent tests with Chrome OS devices -

On devices with GPU decode enabled, we see file descriptors climb with each video load, and it does not go back down. A quick test this morning ran up over 1000 descriptors for a tab running the test below. Memory appears to climb as well, but not as fast as with GPU decode disabled.

On devices with GPU decode disabled, memory appears to climb faster, but file descriptors do not climb. Both cases will eventually crash.

Videos do appear to have some effect on the issue. Included is an MP4 that triggers the issue repeatedly (and quickly) on our test devices.

Drive link (video attachments too large) - Includes minimum repro html, test video, and video showing repro results on Panther 59.0.3071.91. If you have problems getting the repro to run, try increasing the timeout on the timer. The call to .load() does not appear to have any effect on the issue.
https://drive.google.com/drive/folders/0BzOyOeeyLV7zYXd3Zmg5VXVrcW8?usp=sharing

In an effort to help rule out network/cache/filesystem issues we have also repro'd using a video fetched/stored in IndexedDB, played using blob url, same result.
,
Jun 28 2017
Additional repro case - Every 500ms it sets the src, begins playback, waits for the play promise, then waits 150ms, clears src, and calls load. I believe this follows the recommendations for clearing src, calling load, and waiting for the play promise.
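Editorial note: the cycle described in this comment can be sketched roughly as follows. This is not the attached repro file, only an illustration of the sequence it describes (set src, play, await the play promise, wait, clear src, call load). The `VideoLike` interface, the function names, and the use of `removeAttribute("src")` to clear the source are assumptions; `VideoLike` is a minimal subset of `HTMLVideoElement` so the sketch can also run against a fake object.

```typescript
// Minimal subset of HTMLVideoElement used by this sketch (hypothetical name).
interface VideoLike {
  src: string;
  play(): Promise<void>;
  load(): void;
  removeAttribute(name: string): void;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// One cycle: set src, start playback, wait for the play promise,
// hold for holdMs, then clear src and call load() to release the media.
async function playOnce(video: VideoLike, url: string, holdMs = 150): Promise<void> {
  video.src = url;
  try {
    await video.play();
  } catch {
    // play() can reject (for example on a decode error); still tear down below.
  }
  await sleep(holdMs);
  video.removeAttribute("src");
  video.load();
}

// Repeat the cycle, starting a new one roughly every periodMs milliseconds.
async function runLoop(
  video: VideoLike,
  urls: string[],
  cycles: number,
  periodMs = 500,
): Promise<void> {
  for (let i = 0; i < cycles; i++) {
    const started = Date.now();
    await playOnce(video, urls[i % urls.length]);
    await sleep(Math.max(0, periodMs - (Date.now() - started)));
  }
}
```

In a page this could be started with something like `runLoop(document.querySelector("video")!, ["a.mp4", "b.mp4"], 1000)`, where the file names are placeholders.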
,
Jun 28 2017
posciak@ several chromium folk here seem to be able to repro with ChromeOS; is your team not able to reproduce even after following c#57?

walter@ memory climbing higher with software decode isn't unusual so long as it reaches a stable state. I'm surprised to hear it's crashing as I was not able to repro that on desktop devices. Your attached html files don't work since we don't have the 27406.mp4 file. Can you include it?

Using a video from file:// is the best thing to test. blob:// URLs end up creating shared memory which could be confounding this. Though that would also be interesting to report. See issue 715859.

File descriptor leak is interesting and indicates perhaps some sort of shared memory or related leak.
,
Jun 28 2017
dale@ I updated the permissions on the drive folder, it should be public now. I will re-run our tests with GPU decode off to verify crash is still occurring in that scenario.
,
Jun 29 2017
I am trying to reproduce the issue on Kevin and Cyan. (I will borrow a veyron for testing later.)
No luck reproducing the issue yet after about 1 hour (following the instructions in c57).
In which process did we observe the fd leak? The tab, gpu, or browser?
The only number I saw that keeps increasing is the fd count in the browser process on Cyan. I found that most of them are deleted cras buffers.
lrwx------. 1 chronos chronos 64 Jun 29 09:37 927 -> /dev/shm/cras-1707-stream-000b00de (deleted)
lrwx------. 1 chronos chronos 64 Jun 29 09:37 928 -> /dev/shm/cras-1707-stream-000b00de (deleted)
lrwx------. 1 chronos chronos 64 Jun 29 09:38 932 -> /dev/shm/cras-1707-stream-000b00e1 (deleted)
lrwx------. 1 chronos chronos 64 Jun 29 09:38 933 -> /dev/shm/cras-1707-stream-000b00e1 (deleted)
lrwx------. 1 chronos chronos 64 Jun 29 09:38 934 -> /dev/shm/cras-1707-stream-000b00e2 (deleted)
lrwx------. 1 chronos chronos 64 Jun 29 09:38 935 -> /dev/shm/cras-1707-stream-000b00e2 (deleted)
lrwx------. 1 chronos chronos 64 Jun 29 09:38 936 -> /dev/shm/cras-1707-stream-000b00e3 (deleted)
lrwx------. 1 chronos chronos 64 Jun 29 09:38 937 -> /dev/shm/cras-1707-stream-000b00e3 (deleted)
#localhost fd # ls -l | grep deleted | grep cras | wc
590 7080 57820
But the same behavior is not observed on kevin.
BTW, it has increased to 664 while I am typing.
#localhost fd # ls -l | grep deleted | grep cras | wc
752 7968 65083
cc+: dgreid, chinyue to look at the cras issue.
Will also try to simulate a decode error in VDA to see if it helps.
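Editorial note: the `ls -l | grep deleted | grep cras | wc` count above treats every fd as one leak, but in the listing each deleted cras stream appears under two fds. A small sketch that counts both fds and distinct streams from saved `ls -l` output is shown below; the function name and the regular expression are illustrative assumptions based only on the line format shown in this comment.

```typescript
// Count fds that point at deleted /dev/shm/cras-* buffers in `ls -l` output
// of a /proc/<pid>/fd directory, and how many distinct streams they refer to.
function countDeletedCrasFds(lsOutput: string): { fds: number; streams: number } {
  const streams = new Set<string>();
  let fds = 0;
  for (const line of lsOutput.split("\n")) {
    // Matches e.g. "... 927 -> /dev/shm/cras-1707-stream-000b00de (deleted)"
    const match = line.match(/-> (\/dev\/shm\/cras-\S+) \(deleted\)\s*$/);
    if (match) {
      fds += 1;
      streams.add(match[1]);
    }
  }
  return { fds, streams: streams.size };
}
```

Tracking the stream count separately makes it easier to tell whether the number of leaked streams or the number of fds per stream is growing.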
,
Jun 29 2017
dalecurtis@: could you provide instructions for the repro on Chrome OS you were referring to in #60 please? Thanks!
xiyuan@: after reproducing, would you be able to use "Submit feedback" from the Chrome menu in the top right-hand side of the screen and provide the feedback id for it? Thank you.
In general, may I ask all reporters to submit a feedback report whenever this reproduces, if possible, and let us know here please? Thank you!
,
Jun 29 2017
,
Jun 29 2017
Re #63: Filed a feedback report. https://feedback.corp.google.com/product/208/neutron?lView=rd&lRSort=1&lROrder=2&lRFilter=1&lReportSearch=731808&lReport=67111245539
,
Jun 29 2017
The play starts around 06-21 12:06 and crash happens around 06-22 14:22. Unfortunately, the renderer crash is not picked up.
,
Jun 29 2017
forked cras FD issue to #738023
,
Jun 29 2017
@posciak: I was pointing at xiyuan@'s reference to the HTML file attached to issue 735643 that contains a simplified html looping four videos.
,
Jun 29 2017
@xiyuan: Interesting, I didn't know you were letting it run for so long. I'll retest on desktop using your link and walter's.
,
Jun 30 2017
Unable to get a crash ID, but a feedback report was sent after the crash. The feedback report comment has '731808' and 'https://bugs.chromium.org/p/chromium/issues/detail?id=731808' in it. Repro'd using Walter's test above on 59 panther.
,
Jul 3 2017
While I cannot reproduce the leak on Kevin, I do observe FD leaks in the "browser process" on veyron_minnie. After running for a few hours, some fds associated with /dev/shm/.chrome.google.Chrome.XXXXX are not released. Then I tried to simulate the decoder error as Pawel suggested, and I could observe the FD leak in the tab process. (I believe this is the original issue.) The leaked FDs still point to /dev/shm/.chrome.google.Chrome.XXXXX, and they leak quickly. Is there any trick to find out who is allocating that shared memory?
,
Jul 5 2017
I reproduced the crash on edgar.

2017-07-05T16:04:53.525868+08:00 ERR chrome[27769]: cras_client: stream_connected calls wake_aud_thread, id 0x4809b7
2017-07-05T16:04:53.526644+08:00 ERR chrome[27769]: cras_client: stream_connected calls close, id 0x4809b7, ret rc0: 0, rc1: 0
2017-07-05T16:04:53.751810+08:00 WARNING crash_reporter[21003]: Received crash notification for chrome[9359] user 1000 (called directly)
2017-07-05T16:04:53.898942+08:00 INFO kernel: [168929.354099] traps: Media[9370] trap invalid opcode ip:61803eeac6f0 sp:71c41fedd350 error:0 in chrome[61803dc00000+69cb000]
2017-07-05T16:04:53.926626+08:00 WARNING crash_reporter[21024]: [user] Received crash notification for chrome[9359] sig 4, user 1000 (ignoring call by kernel - chrome crash; waiting for chrome to call us directly)
2017-07-05T16:04:53.985261+08:00 WARNING crash_reporter[21026]: Received crash notification for chrome[9359] user 1000 (called directly)
2017-07-05T16:04:54.063192+08:00 ERR chrome[27769]: cras_client: client_thread_rm_stream, id:0x4809b7
2017-07-05T16:04:54.063905+08:00 ERR chrome[27769]: cras_client: stop_aud_thread stream id:0x4809b7, join:1

The line

2017-07-05T16:04:53.898942+08:00 INFO kernel: [168929.354099] traps: Media[9370] trap invalid opcode ip:61803eeac6f0 sp:71c41fedd350 error:0 in chrome[61803dc00000+69cb000]

looks very suspicious. The cras_client messages are irrelevant, as I was debugging https://bugs.chromium.org/p/chromium/issues/detail?id=738023.

I was using an R61-9663.0.0 edgar image with a locally built CRAS, running videosrctest2.html (with the timeout set to 150 ms instead of 500 ms). The crash log was uploaded to https://crash.corp.google.com/browse?stbtiq=3dc4fe5738000000#3
,
Jul 6 2017
Have a fix for the memory leaking in decoded error streams: https://chromium-review.googlesource.com/c/560810/
,
Jul 7 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/2348a1ee4f7ca0dd3e75a229d955b94a14495aab

commit 2348a1ee4f7ca0dd3e75a229d955b94a14495aab
Author: Owen Lin <owenlin@google.com>
Date: Fri Jul 07 12:37:12 2017

gpu_video_decoder: Use unique_ptr to track the ownership of SHMBuffer

The |shm_buffer| is leaked in NotifyError(), where it remove an entry from |bitstream_buffers_in_decoder_| without freeing the |shm_buffer|.

Remove the usage of native pointer to have better ownership of the SHMBuffer.

BUG= 731808
TEST=Play the mem_leak_loop_mp4.html in the issue and make sure no FD leaks.

Change-Id: Ic475d1780ddf5ea32be6290e737f626ed8e4cd09
Reviewed-on: https://chromium-review.googlesource.com/560810
Reviewed-by: Pawel Osciak <posciak@chromium.org>
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Owen Lin <owenlin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#484896}

[modify] https://crrev.com/2348a1ee4f7ca0dd3e75a229d955b94a14495aab/media/filters/gpu_video_decoder.cc
[modify] https://crrev.com/2348a1ee4f7ca0dd3e75a229d955b94a14495aab/media/filters/gpu_video_decoder.h
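Editorial note: the actual fix is C++ and relies on std::unique_ptr, so the following TypeScript sketch is only an analogy for the bug pattern the commit message describes: a table of in-flight buffers whose error path drops the entries without releasing the buffers. All names (`ShmBufferLike`, `BitstreamTracker`, `notifyErrorLeaky`) are hypothetical and do not correspond to Chromium code.

```typescript
// Stand-in for a buffer that holds an OS resource (e.g. shared memory + fd).
interface ShmBufferLike {
  close(): void;
}

class BitstreamTracker {
  // Buffers currently handed to the decoder, keyed by bitstream buffer id.
  private inDecoder = new Map<number, ShmBufferLike>();

  submit(id: number, buffer: ShmBufferLike): void {
    this.inDecoder.set(id, buffer);
  }

  // Normal completion path: release the buffer and forget the entry.
  onDecodeDone(id: number): void {
    const buffer = this.inDecoder.get(id);
    if (buffer) {
      buffer.close();
      this.inDecoder.delete(id);
    }
  }

  // Leaky error path (the shape of the bug): entries are dropped,
  // but the underlying resources are never released.
  notifyErrorLeaky(): void {
    this.inDecoder.clear();
  }

  // Fixed error path: release every pending buffer before dropping it.
  notifyError(): void {
    this.inDecoder.forEach((buffer) => buffer.close());
    this.inDecoder.clear();
  }

  pending(): number {
    return this.inDecoder.size;
  }
}
```

In C++, storing the buffers as unique_ptr values makes the release automatic when an entry is erased, which is why the commit removes the raw pointer rather than adding explicit cleanup calls.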
,
Jul 7 2017
Should we be merging to M60 given that we are close to stable and this is not a regression?
,
Jul 7 2017
I recommend merging this fix to M60 though it is not a regression. At least two enterprise customers raised this issue.
,
Jul 7 2017
Important to take into account is that this leak should only happen if the video played back is erroneous and fails to play with an error. This was the case in the report and repro cases provided above. It was also small enough that this had to happen in a tight loop repeatedly attempting to play such videos over a longer time period.
,
Jul 7 2017
Based on the repro that Xiyuan worked on, continuous video playback was fine (the video was playing without any issues; the app was trying to swap videos every 5 seconds or so), but the renderer crash eventually happened. Related bug is here - https://bugs.chromium.org/p/chromium/issues/detail?id=735643 I believe that is addressed by this fix as well?
,
Jul 7 2017
#77: Can you share a way to see information about the erroneous stream? Running our test files through ffmpeg/ffprobe/VLC I am not seeing any errors or incorrect sizes? Is there a tool to validate the stream/format?
,
Jul 10 2017
,
Jul 10 2017
This bug requires manual review: M60 has already been promoted to the beta branch, so this requires manual review.
Please contact the milestone owner if you have questions.
Owners: amineer@(Android), cmasso@(iOS), josafat@(ChromeOS), bustamante@(Desktop)
For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Jul 10 2017
@79: The H264 reference decoder should show the error in its output, and this can also be confirmed through examining the H264 headers (please see comment #43 for details on the issue). The H264Decoder class in Chrome will also show the error if debug logs are enabled.
,
Jul 10 2017
@78: Since I am not able to reproduce the issue (http://crbug.com/735643) on veyron minnie without faking a decoding error in h264 decoder, I am not sure if it is fixed by the CL.
,
Jul 10 2017
Merge approved for 60.
,
Jul 11 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/351051a805d30014a4bbd18b4c1eca3022e271e6

commit 351051a805d30014a4bbd18b4c1eca3022e271e6
Author: Owen Lin <owenlin@google.com>
Date: Tue Jul 11 05:49:40 2017

gpu_video_decoder: Use unique_ptr to track the ownership of SHMBuffer

The |shm_buffer| is leaked in NotifyError(), where it remove an entry from |bitstream_buffers_in_decoder_| without freeing the |shm_buffer|.

Remove the usage of native pointer to have better ownership of the SHMBuffer.

BUG= 731808
TEST=Play the mem_leak_loop_mp4.html in the issue and make sure no FD leaks.
TBR=owenlin@google.com

(cherry picked from commit 2348a1ee4f7ca0dd3e75a229d955b94a14495aab)

Change-Id: Ic475d1780ddf5ea32be6290e737f626ed8e4cd09
Reviewed-on: https://chromium-review.googlesource.com/560810
Reviewed-by: Pawel Osciak <posciak@chromium.org>
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Owen Lin <owenlin@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#484896}
Reviewed-on: https://chromium-review.googlesource.com/566148
Reviewed-by: Owen Lin <owenlin@chromium.org>
Cr-Commit-Position: refs/branch-heads/3112@{#575}
Cr-Branched-From: b6460e24cf59f429d69de255538d0fc7a425ccf9-refs/heads/master@{#474897}

[modify] https://crrev.com/351051a805d30014a4bbd18b4c1eca3022e271e6/media/filters/gpu_video_decoder.cc
[modify] https://crrev.com/351051a805d30014a4bbd18b4c1eca3022e271e6/media/filters/gpu_video_decoder.h
,
Jul 11 2017
,
Jul 31 2017
verified on 9592.71.0, 60.0.3112.80
,
Aug 30 2017
I'm still observing this issue on a Mickey on 60.0.3112.112, though it may be a different decoder. Rapid memory leak which eventually results in app/tab crash. Device log attached, quite a few of the following error:
[2630:4049:0830/101711.770934:ERROR:v4l2_slice_video_decode_accelerator.cc(1450)] DecodeBufferTask(): Setting error state:4
,
Sep 4 2017
Let us track the issue at https://bugs.chromium.org/p/chromium/issues/detail?id=761852