
Issue 678206

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Feb 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Windows , Chrome , Mac
Pri: 2
Type: Bug

Blocking:
issue 672570




Media loading does not work well when site isolation is enabled

Project Member Reported by zqzh...@chromium.org, Jan 4 2017

Issue description

This is a wider issue with media loading. The issue was originally titled "Cross-origin autoplay does not work well when site isolation is enabled".

Found in: Chrome Linux 57.0.2970.0 64-bit

Steps to reproduce:

1. Open chrome://flags and turn on "Top document isolation"
2. Relaunch Chrome
3. Navigate to http://mounirlamouri.github.io/sandbox/autoplay/test.html and observe.

Expected behavior:
All four videos should start almost simultaneously.

Observed behavior:
The fourth video starts after a long wait.

I suspect this should be an issue with site isolation and autoplay.
 
Labels: Needs-Feedback
Owner: creis@chromium.org
zqzhang@, do you know if the cause is the frame taking long to load? Why do we delay the playback?

Assigning to creis@ for triage, assuming it's likely an OOPIF issue.

Comment 2 by creis@chromium.org, Jan 11 2017

Labels: Proj-TopDocumentIsolation-BlockingLaunch OS-Chrome OS-Mac OS-Windows
Owner: zqzh...@chromium.org
Status: Assigned (was: Untriaged)
This is an important issue for using OOPIFs on the web, and we're considering it a blocker for --top-document-isolation.  (It affects --site-per-process as well.)  The frame does finish loading, but the video doesn't start playing.

zqzhang@: Will you be able to take a look at it this quarter?  I see that you have some initial thoughts posted to  issue 672570 .
Investigated a bit more. It's actually a wider issue in media, not only for autoplay.

The reason the videos are not autoplaying is that some cross-origin iframe videos are not loaded in time. Their readyState is HAVE_NOTHING and networkState is NETWORK_LOADING, so they are not able to play or autoplay.

Here are some observations:
1. If there are multiple videos with the autoplay attribute set, the other videos will not load until the first video finishes or is paused. Pausing the first video causes some of the other videos to load.
2. If there are multiple videos without the autoplay attribute, the videos will load in some order, but some are very slow.

Is there any kind of network/event/computation throttling that may cause this kind of issue?

PS: I will report back shortly on whether we can work on this in Q1.
Summary: Media loading does not work well when site isolation is enabled (was: Cross-origin autoplay does not work well when site isolation is enabled)
Haven't looked, but don't forget there's a 6 connection limit per origin.
Cc: zqzh...@chromium.org
Owner: creis@chromium.org
creis@, I looked into this and I can't find anything specific to autoplay. It seems that the first network request(s) go through and then loading shuts down. I have no idea why, but even pressing play on these videos has no effect and will not trigger or finish loading. There are two x-origin autoplay videos on this test page; usually the first one works and the second breaks. I was able to get the second one to work, but the first one then broke.

Is there anything related to loading/networking that you think could be the root cause here?
Cc: hubbe@chromium.org
Components: Internals>Network
cc:hubbe, +network folk.

Comment 9 by hubbe@chromium.org, Jan 17 2017

Are some/all of these videos using the same URL?
There is a client side cache that allows multiple <video> tags to share one resource in some circumstances. When site isolation is enabled, I suspect that this cache is effectively disabled, leading to more simultaneous http requests.

Comment 10 by nasko@chromium.org, Jan 17 2017

I'd also suggest auditing the code for usage of RenderView/RenderViewHost and their associated routing ids. We have been moving Chromium code to be based on frames, but there is still a non-trivial number of places that need to be converted over.

Comment 11 by creis@chromium.org, Jan 20 2017

Owner: hubbe@chromium.org
hubbe@ or dalecurtis@, would you be able to help investigate this?  I'm not familiar with any of the media code, and I'm not aware of current bugs that would cause this.  (We have seen some recent bugs with sibling OOPIFs, like scrolling not working in  issue 675695  or navigations failing after a crash in  issue 682024 , but these don't seem involved here.)

Including alexmos@ in case he has thoughts on this from the OOPIF side.  We're happy to chat in person if that helps.
Doesn't repro on CrOS M55, will take another look tomorrow.

I just remembered that we had an issue like this in the past,  issue 546255  which is that shown/hidden status is not correctly delivered to iframes. If it's the same issue switching away from the tab and coming back may trigger playback.

I'd guess it's some variation of that issue or the network connection limit as speculated above.
Actually, I guess the top comment is misleading: it wasn't the fourth video that wasn't playing for me, it was the 5th of the 5 videos. It's definitely not the loading issue I mentioned: the video is loading, and we can seek and get a new frame painted, but playback does not progress. Generally that means audio callbacks are not occurring for some reason.
dalecurtis@, indeed, I was able to reproduce on the last video too. I guess it's a race: if the video at the bottom (for some reason) loads before the 2nd, the second will not block. The videos share the same URL, which makes me think that hubbe@ might be onto something :)

dalecurtis@, I wasn't able to get any image of the video though. The controls were staying at a "loading" state. The exact state varied slightly depending on reloads.
The issue is different on my Linux desktop build. I also get no video frames out. Logging indicates MultibufferDataSource::Initialize() is never completing for 2/5 of the videos. Still digging to see why.
Seems to be an issue deeper in the network stack. I've traced it to the end of the media code and we're never getting back a response for the associatedURLLoader() we create:

https://cs.chromium.org/chromium/src/media/blink/resource_multibuffer_data_provider.cc?l=69
https://cs.chromium.org/chromium/src/media/blink/resource_multibuffer_data_provider.cc?l=203

[1:1:0126/124332.490705:ERROR:webmediaplayer_impl.cc(310)] load(0, https://storage.googleapis.com/dalecurtis-shared/buck2.mp4, 0)
[1:1:0126/124332.491508:ERROR:resource_multibuffer_data_provider.cc(71)] Start
[1:1:0126/124332.553032:ERROR:webmediaplayer_impl.cc(310)] load(0, https://storage.googleapis.com/dalecurtis-shared/buck2.mp4, 0)
[1:1:0126/124332.553386:ERROR:resource_multibuffer_data_provider.cc(71)] Start
[1:1:0126/124332.648841:ERROR:webmediaplayer_impl.cc(310)] load(0, https://storage.googleapis.com/dalecurtis-shared/buck2.mp4, 0)
[1:1:0126/124332.649564:ERROR:resource_multibuffer_data_provider.cc(71)] Start
[1:1:0126/124332.651038:ERROR:webmediaplayer_impl.cc(310)] load(0, https://storage.googleapis.com/dalecurtis-shared/buck2.mp4, 0)
[1:1:0126/124332.651317:ERROR:resource_multibuffer_data_provider.cc(71)] Start
[1:1:0126/124332.652250:ERROR:webmediaplayer_impl.cc(310)] load(0, https://storage.googleapis.com/dalecurtis-shared/buck2.mp4, 0)
[1:1:0126/124332.652504:ERROR:resource_multibuffer_data_provider.cc(71)] Start
[1:1:0126/124332.824266:ERROR:resource_multibuffer_data_provider.cc(226)] didReceiveResponse: HTTP/1.1 200
[1:1:0126/124332.825169:ERROR:multibuffer_data_source.cc(461)] StartCallback
[1:1:0126/124332.825690:ERROR:webmediaplayer_impl.cc(1580)] DataSourceInitialized
[1:1:0126/124332.850305:ERROR:resource_multibuffer_data_provider.cc(226)] didReceiveResponse: HTTP/1.1 200
[1:1:0126/124332.852380:ERROR:multibuffer_data_source.cc(461)] StartCallback
[1:1:0126/124332.852832:ERROR:webmediaplayer_impl.cc(1580)] DataSourceInitialized
[1:1:0126/124332.860643:ERROR:resource_multibuffer_data_provider.cc(226)] didReceiveResponse: HTTP/1.1 200
[1:1:0126/124332.863440:ERROR:multibuffer_data_source.cc(461)] StartCallback
[1:1:0126/124332.863776:ERROR:webmediaplayer_impl.cc(1580)] DataSourceInitialized

The logs indicate the last provider never receives a response. Can anyone from the network team take a look? Network inspect only shows two active network requests for the video, so it shouldn't be a locked range issue in the http cache.
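A quick way to check the log above is to tally how many providers started against how many got a response. The following is a minimal sketch, not part of the original investigation: the helper name `tally_events` is hypothetical, and it assumes each log line has the form `[prefix] message` as shown above. The three sample lines are copied verbatim from the log.

```python
from collections import Counter


def tally_events(log_lines, events=("Start", "didReceiveResponse")):
    """Count log messages whose name (text before any ':') is in `events`."""
    counts = Counter()
    for line in log_lines:
        # The message follows the "] " that closes the bracketed log prefix.
        _, _, message = line.partition("] ")
        name = message.split(":", 1)[0].strip()
        if name in events:
            counts[name] += 1
    return counts


sample = [
    "[1:1:0126/124332.491508:ERROR:resource_multibuffer_data_provider.cc(71)] Start",
    "[1:1:0126/124332.824266:ERROR:resource_multibuffer_data_provider.cc(226)] didReceiveResponse: HTTP/1.1 200",
    "[1:1:0126/124332.825169:ERROR:multibuffer_data_source.cc(461)] StartCallback",
]
sample_counts = tally_events(sample)
```

Matching on the exact message name keeps `StartCallback` from being counted as `Start`. Applied to the full log above, the same tally gives 5 `Start` lines and 3 `didReceiveResponse` lines, which matches the earlier note that initialization never completes for 2/5 of the videos.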

Comment 17 by creis@chromium.org, Jan 26 2017

Cc: mmenke@chromium.org
Owner: csharrison@chromium.org
dalecurtis@: Thanks for digging that far!

csharrison@: Can you take a look at the network stack aspect?  I wonder if this is another view/frame issue for supporting OOPIFs in the network stack?
So on Windows Canary, I'm seeing 5 videos (2 in one iframe), and only the 5th (Second one in final iframe) is stuck.  Is that what I'm supposed to see?  Looking at about:Net-internals, I see 5 requests for the same URL, and one of them (the 4th) was canceled as soon as we received headers.
Failure mode seems to vary, but at least one of the last two videos will not play.
Cc: rdsmith@chromium.org
There are a few weird things that I'm seeing. In net-internals, the two cross-origin documents are loaded twice. The first time fails.

I'm also seeing two requests to the video hang in the ResourceScheduler, issued by the out of process frames.

The ResourceScheduler is still keyed off of RVH ids (+rdsmith who is OOO), and we are working to kill it. That could be a potential issue here. Do RVH ids have any special properties in TDI mode?
Oh, and when the other videos complete (yes after waiting 10 mins), the blocked ones (for me, always the muted ones) play, and are unblocked by ResourceScheduler.

Definitely a weird ResourceScheduler issue, but there are enough things broken here that I'm not sure whose fault it is. Why are the cross-origin frames marked as cancelled when their navigations commit, and they load data?

Gonna keep digging.

Comment 22 by creis@chromium.org, Jan 26 2017

Comments 20-21: Could the cancel-then-load-again thing be due to the transfer from the old process to the new process?  (We intend to keep the same network request but transfer it to a different RenderFrameHost / global routing ID.)

TDI is a red herring-- I think this applies in --site-per-process as well, and any OOPIF mode.  Agreed that ResourceScheduler could be to blame if it's keyed on RVHs.  The OOPIFs will have a different RVH than the main frame.
Yeah sorry I misspoke, I did mean "OOPIF in general" rather than TDI mode. That different OOPIFs have different RVH ids is good to know, I had forgotten that.

I think the root cause is that we aren't receiving the OnWillInsertBody IPC, so we are only allowing 1 delayable request to go through the ResourceScheduler per OOPIF (aka per RVH). This IPC is fundamentally broken with the RVH-per-OOPIF model, because we only send the IPC for main frames [1]!

We could just start sending this IPC for all frames in OOPIF mode. I could edit the IPC handler so it is idempotent in case several frames under the same RVH send it. What do people think? I know this code is very hacky, but we are trying to kill this class and do something better. The work is slow but steady.

[1] https://cs.chromium.org/chromium/src/content/renderer/render_frame_impl.cc?rcl=1485447450&l=4520
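The starvation described above can be illustrated with a toy model. This is a sketch under stated assumptions, not Chromium code: `ToyResourceScheduler` and its methods are hypothetical names, throttling state is keyed by an RVH id, and the model places no limit at all on requests once the body-inserted signal has arrived (a simplification).

```python
from collections import defaultdict, deque


class ToyResourceScheduler:
    """Toy model: one throttling state per RVH id (hypothetical, simplified)."""

    def __init__(self):
        self.body_inserted = defaultdict(bool)  # rvh_id -> signal received?
        self.in_flight = defaultdict(set)       # rvh_id -> running delayable requests
        self.pending = defaultdict(deque)       # rvh_id -> queued delayable requests

    def _can_start(self, rvh_id):
        # Before the body-inserted signal, only one delayable request may run.
        return self.body_inserted[rvh_id] or len(self.in_flight[rvh_id]) < 1

    def schedule(self, rvh_id, request):
        """Return True if the request starts now, False if it is queued."""
        if self._can_start(rvh_id):
            self.in_flight[rvh_id].add(request)
            return True
        self.pending[rvh_id].append(request)
        return False

    def on_will_insert_body(self, rvh_id):
        # Lifting the throttle releases everything that was queued.
        self.body_inserted[rvh_id] = True
        while self.pending[rvh_id]:
            self.in_flight[rvh_id].add(self.pending[rvh_id].popleft())


OOPIF_RVH = 2  # hypothetical RVH id for an out-of-process iframe
scheduler = ToyResourceScheduler()
started = [scheduler.schedule(OOPIF_RVH, f"video-{i}") for i in range(3)]
queued_before = len(scheduler.pending[OOPIF_RVH])

scheduler.on_will_insert_body(OOPIF_RVH)
queued_after = len(scheduler.pending[OOPIF_RVH])
```

In this model only the first video request starts; the other two stay queued until `on_will_insert_body` is called for that RVH id. If that signal never arrives and the first request is long-running, the queued requests wait for as long as it runs.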

Comment 24 by creis@chromium.org, Jan 26 2017

Possibly important clarification: there is not one-RVH-per-OOPIF.  Two OOPIFs from the same site on the same page will share the same RVH.

Thus, a page that looks like A(A, B, B(A, C)) will have 3 RVHs: one for site A, one for site B, and one for site C.

The test page for this bug (http://mounirlamouri.github.io/sandbox/autoplay/test.html) has 2 RVHs: one for mounirlamouri.github.io and one for oldworld.fr.
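The counting rule above (one RVH per distinct site on the page) can be checked with a small sketch. The tuple-based tree representation and the function name `sites_in_tree` are assumptions made for illustration only.

```python
def sites_in_tree(frame):
    """frame is (site, [child frames]); return the set of distinct sites."""
    site, children = frame
    sites = {site}
    for child in children:
        sites |= sites_in_tree(child)
    return sites


# The page A(A, B, B(A, C)) from the comment above.
tree = ("A", [("A", []), ("B", []), ("B", [("A", []), ("C", [])])])
rvh_count = len(sites_in_tree(tree))
```

Here `rvh_count` is 3: the six frames collapse to the three sites A, B and C.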
Ah yes, that is what I was seeing. I think I'm just being unclear with my language.

So, I'm pretty sure #23 is correct, but now I'm having trouble even receiving the message in the browser process. How are ViewHostMsgs treated coming from OOPIFs?

I think the "correct" simple solution would be to change this to a Frame message, and pull the render_view_routing_id out of it. Note that OnNavigate is also broken (we "reset" the view's state, but only call it for main frames), but not so badly as this WillInsertBody.

Comment 26 by creis@chromium.org, Jan 27 2017

Cc: nasko@chromium.org
Yes, I think moving it over to be a Frame message makes a lot of sense.  It's almost certainly the case that the View message is getting filtered out because it's coming from a "swapped out" RenderView, and isn't whitelisted in swapped_out_messages.cc.  (We should really find a way to kill that filter if we can, now that we're using proxies.)  Thanks!
Thanks for the confirmation, I can try to get a patch up soon but might not be able to get to it until Monday.

QQ: Is there an easy way to test if we're the top frame of an OOPIF from the renderer?

Comment 28 by nasko@chromium.org, Jan 27 2017

You could, but I'd discourage that. You can iterate the frame tree looking for WebRemoteFrames.
Status: Started (was: Assigned)
CL is up at https://codereview.chromium.org/2655393004/ but I need to add tests. Ideally, the tests should be agnostic about ResourceScheduler or throttling at this layer, so they will still work when Randy replaces it with the network throttler.
Labels: -Needs-Feedback

Comment 31 by bugdroid1@chromium.org, Feb 2 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d86c35bcf47e34f09f295127796ea246746b5ab1

commit d86c35bcf47e34f09f295127796ea246746b5ab1
Author: csharrison <csharrison@chromium.org>
Date: Thu Feb 02 17:41:26 2017

Make ResourceScheduler work in OOPIF

This CL lets all frames send the "will insert body" IPC to the browser
process, rather than just main frames. This enables OOPIFs to get past
the throttling stage in ResourceScheduler.

Note that OnNavigate is still gated on main frame, so navigations within
an OOPIF will not reset the throttling state. This is a definite
performance bug, but is less critical than the current state, which
only allows a single delayable request at a time (so a hung request
will starve all delayable requests).

This CL also adds throttler-agnostic tests, as ResourceScheduler is
going away soon so we don't want to rely on its specific layering, as
it is being migrated to a global throttler in the network layer.

BUG= 678206 

Review-Url: https://codereview.chromium.org/2655393004
Cr-Commit-Position: refs/heads/master@{#447792}

[modify] https://crrev.com/d86c35bcf47e34f09f295127796ea246746b5ab1/content/browser/loader/resource_scheduler.cc
[modify] https://crrev.com/d86c35bcf47e34f09f295127796ea246746b5ab1/content/browser/loader/resource_scheduler_filter.cc
[modify] https://crrev.com/d86c35bcf47e34f09f295127796ea246746b5ab1/content/browser/loader/resource_scheduler_filter.h
[modify] https://crrev.com/d86c35bcf47e34f09f295127796ea246746b5ab1/content/browser/site_per_process_browsertest.cc
[modify] https://crrev.com/d86c35bcf47e34f09f295127796ea246746b5ab1/content/browser/site_per_process_browsertest.h
[modify] https://crrev.com/d86c35bcf47e34f09f295127796ea246746b5ab1/content/common/frame_messages.h
[modify] https://crrev.com/d86c35bcf47e34f09f295127796ea246746b5ab1/content/common/view_messages.h
[modify] https://crrev.com/d86c35bcf47e34f09f295127796ea246746b5ab1/content/renderer/render_frame_impl.cc
[add] https://crrev.com/d86c35bcf47e34f09f295127796ea246746b5ab1/content/test/data/site_isolation/subframe_resources.html
[add] https://crrev.com/d86c35bcf47e34f09f295127796ea246746b5ab1/content/test/data/site_isolation/subframes_with_resources.html

Should be fixed, but there are fundamental problems with keying off of RVH ids in ResourceScheduler under OOPIF mode.

For a scenario with all OOPIFs under a single RVH R, we now have the problem that the "unthrottle" message unthrottles *all* of the OOPIFs, for all time, even after subframe navigation (as long as it doesn't take the frame to another RVH). This is better than the existing behavior (all OOPIFs are always throttled for all time), but it is still weird and spooky.

One middle ground would be to reset this throttle state upon subframe navigation (i.e. when OOPIF A navigates, it re-throttles OOPIF B, if they are under the same RVH). Still weird, but might lead to better behavior. 
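The post-patch behavior and the "middle ground" can be compared in a toy model. This is only a sketch: `ToyThrottleState`, its methods, and the `reset_on_subframe_navigation` switch are hypothetical, and the model assumes a single unthrottled flag per RVH id that a navigation may reset.

```python
class ToyThrottleState:
    """Toy model of per-RVH throttle state after the patch (hypothetical)."""

    def __init__(self):
        self.unthrottled = set()  # RVH ids whose requests are unthrottled

    def on_will_insert_body(self, rvh_id):
        # After the patch, any frame (not just the main frame) sends this.
        self.unthrottled.add(rvh_id)

    def on_navigate(self, rvh_id, is_main_frame,
                    reset_on_subframe_navigation=False):
        # Only main-frame navigations reset the state, unless the
        # hypothetical middle-ground switch is turned on.
        if is_main_frame or reset_on_subframe_navigation:
            self.unthrottled.discard(rvh_id)

    def is_throttled(self, rvh_id):
        return rvh_id not in self.unthrottled


R = 7  # hypothetical RVH id shared by OOPIF A and OOPIF B
state = ToyThrottleState()
throttled_before = state.is_throttled(R)

state.on_will_insert_body(R)               # sent by OOPIF A
throttled_after_body = state.is_throttled(R)

state.on_navigate(R, is_main_frame=False)  # OOPIF A navigates
throttled_after_subframe_nav = state.is_throttled(R)

state.on_navigate(R, is_main_frame=False, reset_on_subframe_navigation=True)
throttled_with_middle_ground = state.is_throttled(R)
```

In this model, once OOPIF A inserts its body, OOPIF B is unthrottled as well, and a subframe navigation leaves that state untouched. With the middle-ground switch on, the same navigation re-throttles every frame under R until the next body-inserted message.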
Status: Fixed (was: Started)
Cc: dcheng@chromium.org
+dcheng
I was just thinking about this. Iframes are always loaded after the main frame has inserted the body, so it seems fine to *always* unthrottle iframe resources.

The current behavior after this patch (throttle until the *first* iframe hosted by an RVH inserts its body) has strictly more throttling than the non-OOPIF case, but it seems desirable.

Am I missing something? I think we don't need anything more complicated.
> I was just thinking about this. Iframes are always loaded after the main frame has inserted the body, so it seems fine to *always* unthrottle iframe resources.

Right, I thought that was the approach we were discussing (bouncing to the UI thread to use the RFHM to get all the relevant RVHs).

> The current behavior after this patch (throttle until the *first* iframe hosted by a RVH inserts its body), has strictly more throttling than non-OOPIF case, but it seems desirable.

I assume the main resource load isn't throttled, so I guess it might be OK if we just keep the behavior as implemented after the patch.
> Right, I thought that was the approach we were discussing (bouncing to the UI thread to use the RFHM to get all the relevant RVHs).

Yeah I think that solution we chatted about would work, but I think I didn't realize that iframes should always be unthrottled. In that case, we wouldn't need to bounce to the UI thread at all, because resources are already annotated with whether they are main frame or not. So non-main frame resources could always be unthrottled :P

But in any case, I think the current behavior is fine and makes slightly more sense performance wise.
