Issue metadata
Sign in to add a comment
|
Regression in 10071.0.0/64.0.3251.0 video freezes |
|||||||||||||||||||||||||||
Issue descriptionChromeOS version: 10071.0.0/64.0.3251.0 and higher ChromeOS device model: any Case#: 15453351 Description: Video randomly freezes in app (both in kiosk and chrome OS mode). Disabling GPU doesn't help. Video of the issue (RVG): https://drive.google.com/open?id=1JeTtP1dyMIH7InLSoEqZ8jCbqVyIZ6zl Steps to reproduce (adopted to be able to reproduce at Google): 1. Extract toyota-fixed.zip to Downloads on any chrome device with version 10071.0.0/64.0.3251.0 or higher (65,66 also will suit). https://drive.google.com/open?id=1H2hGkVvCXN8JSgXxtpkwN1kegS5lz6jF 2. Navigate to chrome://extensions, enable developer mode, click on "Load unpacked", point to Downloads/toyota-fixed folder. 3a. Click on Setup-> enter anything 3b. Choose eu-en 3c. Choose "TNO test" 3d. Click "Play videos" 4. Wait about 15 minutes Current Behavior / Reproduction: Video freezes, sometimes after 1-2 minutes, sometimes after 10-30 minutes. Expected Behavior: Video not freezes. Drive link to logs: Attached debug-logs_20180425-122428, frozen around 12:21 It never freezes on versions 10070.0.0 /64.0.3249.0 and earlier. List of changes (please help to triage or choose suspect, maybe video-related, maybe V8-related, (as change "Update V8 to version 6.4.143." is present in bissect list) https://chromium.googlesource.com/chromium/src/+log/64.0.3249.0..64.0.3251.0?n=10000
,
Apr 26 2018
,
Apr 26 2018
Please use Blink>JavaScript component for V8-related issues. Infra>Client>V8 is used to track issues with V8 development infrastructure.
,
Apr 26 2018
Seems unlikely Baileys change would cause this. It's mostly a renaming and changing the meaning of the identifier used to call apps/extensions that implement the file system provider API.
,
May 1 2018
It additionally freezes on Linux, Mac, Windows as well, but never freezes on ARM Chrome OS devices. Maybe this will help to find a candidate in https://chromium.googlesource.com/chromium/src/+log/64.0.3249.0..64.0.3251.0?n=10000 ?
,
May 2 2018
+ atwilson as this is breaking Toyota, a major signage customer and it's preventing them from moving past 63. marchuk@ has already narrowed the issue down to a small list of changes, can we get a resource assigned that might be able to identify which change it is and revert?
,
May 3 2018
Please bisect, for repro instructions please have a look at the summary.
,
May 3 2018
This looks like it might be crbug.com/813218 - does this still happen on M65+?
,
May 3 2018
No, not that one, I just checked again on 65+ and 66, e.g. 66.0.3359.139 which already includes all commits form crbug.com/813218 but still always can reproduce it.
,
May 8 2018
marchuk@ Thanks for the issue. Unable to access the shared video and toyota-fixed.zip links provided in the original comment as it seems to be RVG. Hence adding 'TE-NeedsTriageFromHYD' label and requesting Inhouse team to look into the issue and help in further triaging. Thanks..
,
May 8 2018
Ivan, sounds like the next step here is to come up with a repro, and figure out which of the CLs in https://chromium.googlesource.com/chromium/src/+log/64.0.3249.0..64.0.3251.0?n=10000 is causing this. I'd suspect one of the V8 or RTC rolls.
,
May 9 2018
Hi guys - please can I ask for an update in terms of potential timeline to fix? This has now been escalated up to CEO level at Toyota as a critical issue affecting their showrooms across Europe. They are asking for a f2f to address and update. Any info you can share would be greatly appreciated. Many thanks. Matt
,
May 9 2018
,
May 9 2018
No update yet - we aren't even sure yet where the problem lies. Problem is that given that the repro steps require up to 30 minutes *per iteration* it's going to take a significant amount of time to figure out which CL is to blame so we can find the correct owner. xhwang: is there anyone on the video team that can help directly debug this? That may get us a resolution faster than us trying to bisect, then assign to an owner. In the meantime, best if Toyota stays pinned to M63 - are there any urgent issues that would prevent them from remaining pinned to that version?
,
May 9 2018
Thanks for the update. I appreciate the info I have a call this afternoon regarding this and there is a request from Toyota to have a senior level call in coming days. Anything further you can relay in that time is of course appreciated. Will find out the burning need to move beyond 63 in the short term
,
May 9 2018
Ran the bisect on my Linux server and below is the output. + apacible adding you since Toyota seems to have broken on https://chromium.googlesource.com/chromium/src/+/69e12d5a958299ebc8b5eaabae4b32f831f6d93b any ideas of why that commit would cause video playback to freeze for Toyota's kiosk app? $ python bisect_builds.py -o -a linux64 -g 64.0.3249.0 -b 64.0.3251.0 Bisecting on continuous Chrome builds. If you would like to bisect on release builds, try running with -r option instead. Previous -o options is currently changed to -r option as continous official builds were added for bisect Downloading list of known revisions... (use --use-local-cache to cache and re-use the list of revisions) Total of 98 percent of perf builds available in the range Downloading revision 511684... Bisecting range [511318 (good), 512036 (bad)]. Trying revision 511684... Revision 511684 is [(g)ood/(b)ad/(r)etry/(u)nknown/(s)tdout/(q)uit]: g Downloading revision 511860... Bisecting range [511684 (good), 512036 (bad)]. Trying revision 511860... Revision 511860 is [(g)ood/(b)ad/(r)etry/(u)nknown/(s)tdout/(q)uit]: b Downloading revision 511772... Bisecting range [511684 (good), 511860 (bad)]. Trying revision 511772... Revision 511772 is [(g)ood/(b)ad/(r)etry/(u)nknown/(s)tdout/(q)uit]: b Downloading revision 511728... Bisecting range [511684 (good), 511772 (bad)]. Trying revision 511728... Revision 511728 is [(g)ood/(b)ad/(r)etry/(u)nknown/(s)tdout/(q)uit]: b Downloading revision 511706... Bisecting range [511684 (good), 511728 (bad)]. Trying revision 511706... Revision 511706 is [(g)ood/(b)ad/(r)etry/(u)nknown/(s)tdout/(q)uit]: b Downloading revision 511695... Bisecting range [511684 (good), 511706 (bad)]. Trying revision 511695... Revision 511695 is [(g)ood/(b)ad/(r)etry/(u)nknown/(s)tdout/(q)uit]: g Downloading revision 511700... Bisecting range [511695 (good), 511706 (bad)]. Trying revision 511700... Revision 511700 is [(g)ood/(b)ad/(r)etry/(u)nknown/(s)tdout/(q)uit]: b Downloading revision 511697... Bisecting range [511695 (good), 511700 (bad)]. Trying revision 511697... Revision 511697 is [(g)ood/(b)ad/(r)etry/(u)nknown/(s)tdout/(q)uit]: b Downloading revision 511696... Bisecting range [511695 (good), 511697 (bad)]. Trying revision 511696... Revision 511696 is [(g)ood/(b)ad/(r)etry/(u)nknown/(s)tdout/(q)uit]: b You are probably looking for a change made after 511695 (known good), but no later than 511696 (first known bad). CHANGELOG URL: The script might not always return single CL as suspect as some perf builds might get missing due to failure. https://chromium.googlesource.com/chromium/src/+log/69fc8466a5a1b6ff4e9503c1bf7105c19e12e1f1..69e12d5a958299ebc8b5eaabae4b32f831f6d93b
,
May 9 2018
Thanks, Jay - reassigning to apacible@ for further investigation.
,
May 9 2018
,
May 9 2018
Workaround may be as simple as Toyota specifying: preload: "auto" in their app source code. @apacible: can you confirm that auto is the old default for preload?
,
May 9 2018
I can no longer reproduce the freeze in 66 if I set preload to auto in Toyota's app. In file build/bundle.js, change around line 5241 from: ref: this.setVideo, src: this.state.uri }); to be: ref: this.setVideo, preload: "auto", src: this.state.uri }); and reload. Once this is set I can no longer reproduce the video freeze. @marchuk: can you try this on your end and if confirmed share this workaround with Toyota so they can move up to latest stable?
,
May 9 2018
Yes, "auto" was the old default for preload. They should set preload=auto for their video.
,
May 9 2018
thanks apacible, any ideas why the video would freeze at random with preload=metadata (new default) ?
,
May 9 2018
+dalecurtis for preload= behavior I'm (really) slow to respond this week so adding dalecurtis for follow up questions. :)
,
May 9 2018
Are they keying anything off video events? preload=metadata vs preload=auto will generate a different pattern of loading events. I don't expect any hangs from this implicitly; in fact I'd expect less since preload=metadata closes connections sooner and doesn't hit the 6 connection cap as readily. If you can grab the chrome://media-internals state when it's hung that would be helpful in understanding what the element thinks it's doing. +hubbe in case there's something interesting at the cache level for this.
,
May 10 2018
Not hangs anymore on 66 with 'preload: "auto",' added. Attached media-internals from the state when it hung (no 'preload: "auto",' added).
,
May 10 2018
Unable to reproduce the issue on Windows 10, mac 10.13.3 & Debian Rodate using chrome reported version-64.0.3251.0 , stable-66.0.3359.139 & latest Canary-68.0.3425.0 as per the steps mentioned in C#0. Able to play video successfully without any freeze.Tested this issue for min 10 to 20 mins & no issue observed while playing the video. Please check the issue on latest chrome versions with clean profile & let us know your observations on the same. Please find the attached screencast for reference. Thanks..!
,
May 10 2018
Last line in media-internals is the pipeline underflowing on audio and stopping playback waiting for more data, which it never gets:
{
"time": 1747.6959999999963,
"key": "audio_buffering_state",
"value": "BUFFERING_HAVE_NOTHING"
},
{
"time": 1747.8310000000056,
"key": "pipeline_buffering_state",
"value": "BUFFERING_HAVE_NOTHING"
}
I notice the URLs in the log are things like filesystem:chrome-extension://hdlcifnijdlfeifjfdcmmgkceekdkdmf/persistent/video-cache/C-HR%252071N%2520promo%2520animasjon%2520to%2520biler_tcm-11-1307544.mp4
Which IIUC our buffering code correctly, we are marking as assuming they're fully buffered. So this seems to confirm the findings in c#1 that we're getting blocked on a filesystem read. Are these actually on disk resources or some network loaded resources? I don't see them in the package, so I'm guessing these are being downloaded.
It's it possible the download and the playback are deadlocking on the filesystem?
,
May 11 2018
c #26 - I was unable to reproduce the issue when connected via remote desktop or any sort of VM setup. I could only get it to occur once I tried on gLinux desktop hooked up to a local monitor. Can you try that?
,
May 14 2018
jayhlee@, Thanks for the reply. Tried this issue as per C#28 on Debian machine using reported version-64.0.3251.0,stable-66.0.3359.170 & Canary-68.0.3430.0.Seems behavior is very inconsistent on these version as per C#0. Some times video played successfully till 20 to 30 mins upon reopen the extension and play video & some times its getting freeze. As we are seeing bisect info on C#16 & due to inconsistent behavior removing 'Needs-bisect' & 'TE-NeedsTriageFromHYD' label.
,
May 14 2018
=>hubbe who has the most experience on matters of loading, per jayhlee@ above it looks like this can be reproduced on linux. AFAICT from the logs above loading is hung from the media pipeline's point of view. There's not enough information to tell if we're hung in the Multibuffer stage or somewhere lower though. hubbe@ can you take a look and rule out any multibuffer hangs here? Can someone from the filesystem team take a closer look to see if/why reading is blocked at that level? +some filesystem owners (kinuko, jsbell, tzik)
,
May 15 2018
,
May 16 2018
I'm not 100% yet, but it looks like the problem might be on a lower level than media buffering code. From what I can tell, our code defers loading early on in the file a very short time later it sets deferred to false, but gets no more calls to didReceiveData, even though it's not at the end of the file yet. Since this is pulled from the filesystem, not http, I suspect a bug somewhere in the code that handles deferring for the filesystem, but I'm not sure where that code is yet.
,
May 16 2018
Not sure why, but it seems that I can't trigger this bug if I specify: --disable-features=LoadingWithMojo Will look more tomorrow.
,
May 16 2018
Hi guys. Thanks for your ongoing efforts on this. Toyota are testing as we speak and I expect feedback in the coming days. Will share as soon as I hear. Regards Matt Stevens matthewstevens@google.com Chrome Sales Specialist +447342053921 Google UK Belgrave House 76 Buckingham Palace Road London SW1W 9TQ Registered in England Number: 3977902 This email may be confidential and privileged. If you received this communication by mistake, please don't forward it to anyone else, please erase all copies and attachments, and please let me know that it has gone to the wrong person. The above terms reflect a potential business arrangement, are provided solely as a basis for further discussion, and are not intended to be and do not constitute a legally binding obligation. No legally binding obligations will be created, implied, or inferred until an agreement in final form is executed in writing by all parties involved.
,
May 16 2018
+Blink>Loader per c#33
,
May 16 2018
+cmumford, who did a little work recently with filesystem+extensions+mojo
,
May 16 2018
Assigning to blink loader OWNER. Feel free to assign back to me if I'm wrong about this being a problem at a lower level.
,
May 18 2018
I couldn't reproduce the issue with Chorme 66 on Linux. hubbe@, I don't know anything about video, could you create another reproduction case which is easier to reproduce?
,
May 18 2018
I was able to reproduce it with a debug build on linux, although it did not reproduce every time. Since I have no idea what's actually wrong, I have no idea if I can create a better test case.
,
May 18 2018
The filesystem work that has currently landed is all minor cleanup and refcatoring. The filesystem/mojo work is just about to land (http://crrev.com/c/982752). I have landed some extensions/WebUI work, but I believe that should only be enabled if the network service is explicitly enabled. Happy to help debug though, but it sounds like nobody's got a good repro.
,
May 21 2018
The repro hubbe@ is using seems reliable enough for yhirano@ and cmumford@ to experiment with locally. hubbe@, can you share the details of how you're triggering the reproduction?
,
May 21 2018
I just did a chromium debug build, loaded up the extension as listed in #1. The steps in #1 is missing a step between 2 and 3a though: Open a new tab and go to "apps" I would usually do right-click and select "inspect", and select the network tab as well, as that makes it easier to see when it's hung.
,
May 22 2018
A little further debugging seems to indicate that when we hang, it's because the blink::ResoureLoader::CancelTimerFired() occurs. It doens't seem to be documented, so I have no idea if that is normal or not. Here is a stack trace which kind of shows what's going on: #0 0x7f4523ff5abd base::debug::StackTrace::StackTrace() #1 0x7f4523d1f4bc base::debug::StackTrace::StackTrace() #2 0x7f451ef7917d content::URLResponseBodyConsumer::Cancel() #3 0x7f451ef71eb0 content::URLLoaderClientImpl::~URLLoaderClientImpl() #4 0x7f451ef71f89 content::URLLoaderClientImpl::~URLLoaderClientImpl() #5 0x7f451ef569eb content::ResourceDispatcher::RemovePendingRequest() #6 0x7f451ef56ea5 content::ResourceDispatcher::Cancel() #7 0x7f451ef7b81a content::WebURLLoaderImpl::Context::Cancel() #8 0x7f451ef887b1 content::WebURLLoaderImpl::Cancel() #9 0x7f451ef863b8 content::WebURLLoaderImpl::~WebURLLoaderImpl() #10 0x7f451ef863f9 content::WebURLLoaderImpl::~WebURLLoaderImpl() #11 0x7f450b15cfea blink::ResourceLoader::HandleError() #12 0x7f450b15bafb blink::ResourceLoader::Cancel() #13 0x7f450b15ac56 blink::ResourceLoader::CancelTimerFired() #14 0x7f450b0257e1 blink::TaskRunnerTimer<>::Fired() #15 0x7f450b05d647 blink::TimerBase::RunInternal()
,
May 22 2018
Hmm, it seems the reason the connection is getting cancelled is because the media code is closing the connection, which it probably shouldn't do...
,
May 22 2018
Problem found, fix on the way.
,
May 23 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/86281d8307cde2fde7e303df1460b221701760c2 commit 86281d8307cde2fde7e303df1460b221701760c2 Author: Fredrik Hubinette <hubbe@google.com> Date: Wed May 23 18:16:28 2018 Don't destroy multibuffer reader if read operation is pending In a few cases, we would call "havenoughdata" even when there is a read operation pending. If we destroy the reader, the read operation never completes and video playback hangs. Bug: 836952 Change-Id: Ic66727aa51ddfeb8edcffc72d44457422087a8af Reviewed-on: https://chromium-review.googlesource.com/1069706 Reviewed-by: Dale Curtis <dalecurtis@chromium.org> Commit-Queue: Fredrik Hubinette <hubbe@chromium.org> Cr-Commit-Position: refs/heads/master@{#561162} [modify] https://crrev.com/86281d8307cde2fde7e303df1460b221701760c2/media/blink/multibuffer_data_source.cc
,
May 23 2018
This is now fixed on ToT Not sure if it makes sense to merge back to earlier versions or not.
,
May 23 2018
Probably too close to branch cut and there's a known workaround.
,
May 23 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/563e724aa42bd53435c3b8edfe34d9693ee5371c commit 563e724aa42bd53435c3b8edfe34d9693ee5371c Author: Fredrik Hubinette <hubbe@google.com> Date: Wed May 23 20:43:05 2018 test that multibuffer readers don't get destroyed with outstanding reads Test for https://chromium-review.googlesource.com/c/chromium/src/+/1069706 Bug: 836952 Change-Id: Icf28f1e14ea89bf6a0f56c8069f5a21160e356d2 Reviewed-on: https://chromium-review.googlesource.com/1067768 Commit-Queue: Fredrik Hubinette <hubbe@chromium.org> Reviewed-by: Dale Curtis <dalecurtis@chromium.org> Cr-Commit-Position: refs/heads/master@{#561242} [modify] https://crrev.com/563e724aa42bd53435c3b8edfe34d9693ee5371c/media/blink/multibuffer_data_source_unittest.cc
,
May 24 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/6dd3f322c16a3d1962d7ddade5558f2cccb5db79 commit 6dd3f322c16a3d1962d7ddade5558f2cccb5db79 Author: Fredrik Hubinette <hubbe@google.com> Date: Thu May 24 01:00:51 2018 media: Avoid closing connection if we're going to play soon Bug: 836952 Change-Id: I381042fd107b954b5d47f1dae7e392e5bd3752dc Reviewed-on: https://chromium-review.googlesource.com/1069804 Commit-Queue: Fredrik Hubinette <hubbe@chromium.org> Reviewed-by: Dale Curtis <dalecurtis@chromium.org> Cr-Commit-Position: refs/heads/master@{#561345} [modify] https://crrev.com/6dd3f322c16a3d1962d7ddade5558f2cccb5db79/media/blink/webmediaplayer_impl.cc
,
May 24 2018
Tagging next action to check out UMA metrics on rebuffering next week.
,
May 29 2018
The NextAction date has arrived: 2018-05-29
,
May 29 2018
Eh, no change in rebuffering metrics that I can see, so seems the penalty of rapid open-close-open of the connection didn't harm much at least on canary populations. Probably worth checking again on stable: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=93831e20984a8132c82aff40b651b5a4 |
||||||||||||||||||||||||||||
►
Sign in to add a comment |
||||||||||||||||||||||||||||
Comment 1 by marchuk@chromium.org
, Apr 26 2018++trace where it hung after 10seconds and net-internals where we can see it's actively using File System Provider: filesystem:chrome-extension://dalbkmbnmiaeaginmbbhdnlgmplioacg/persistent/video-cache/C-HR%2520driving%2520in%2520rain_tcm-11-1307546.mp4 and hungs at: t=138733 [st=53] DELEGATE_INFO [dt=28] --> delegate_blocked_by = "MojoAsyncResourceHandler" t=138761 [st=81] URL_REQUEST_JOB_FILTERED_BYTES_READ --> byte_count = 32768 t=138761 [st=81] URL_REQUEST_JOB_FILTERED_BYTES_READ --> byte_count = 32768 t=138761 [st=81] +DELEGATE_INFO [dt=2] --> delegate_blocked_by = "MojoAsyncResourceHandler" t=138763 [st=83] CANCELLED t=138763 [st=83] -DELEGATE_INFO t=138764 [st=84] -REQUEST_ALIVE Therefore ++Bailey to check change https://chromium.googlesource.com/chromium/src/+/9bce89359c786654e07c1b5017c8514d1875c206 "Refactor extension_id to provider_id in File System Provider"134 KB
134 KB Download
1.0 MB
1.0 MB View Download