Randy wrote the following comment on the above issue: "I haven't looked at this change in any detail, just watched the comments go by, but I wanted to call out that if the associated keyword was removed purely based on test results, I suspect that was a mistake. If there is a bug, whether or not the lack of "associated" causes problems is going to be dependent on racy behavior between old IPC and mojo. So I would have recommended not removing the "associated" keyword without a careful evaluation of what races might occur and a good argument that they are no longer present. FWIW."
We chatted about this briefly in today's network service sync up. Randy and Matt gave the example of setting a cookie in JS and then making a request. While these interfaces (URLLoaderFactory and RenderFrameMessageFilter) are not associated, I think it's a mojo implementation detail that they won't race. To improve this, we might either want to combine the two interfaces. They'll both be implemented in the network service in the future as well.
Randy suggested that it would be good to audit IPCs and look for ones related to networking/cookies/cache. Yutaka: can you please do this? Since this landed right after branch point, we have 6 weeks to ensure there are no races before next branch :)
John and I chatted about this issue and here is the summary: (Please correct me if I am wrong, John)
Looking at the code, it seems fine to change URLLoaderFactory.CreateLoaderAndStart() to return a non-associated interface. It is the order between "starting URL request" and "setting cookies" that matters. Starting URL request is a message of URLLoaderFactory, which is still associated with the RenderFrameMessageFilter interface in the MojoLoading code path. (Both are channel-associated interfaces.)
In the network service case, we don't use IPC channel between the network process and renderer process. In that case, we could consider using another Factory interface to vend associated URLLoaderFactory and CookieStore interfaces. For example,
interface NetworkFactory {
GetURLLoaderFactory(associated URLLoaderFactory& factory);
GetCookieStore(associated CookieStore& cookie_store);
};
Yes the summary sounds right to me too. Once a request starts any reordering or throttling could happen already in the current code, while if request to starting a loader hits the service earlier than other requests like setting a cookie things could get wrong.
When the network service is fully up, I expect the cookie interface to be part of the larger network service interface (associated at a minimum, but it may just be wrapped in). And similarly for other IPCs, which will have been converted to mojo.
But my basic point still stands: A lack of failing tests doesn't really say much about whether or not there aren't any races, so changes that specifically may open up racy behavior should IMO be validated from a theoretical perspective as well. I offered cookies as a example, but it was to elucidate the problem, not because I thought it was the only possible problem. I'd recommend *some* amount of due diligence to validate the lack of races beyond trusting tests in removing the associated keyword.
Having said all that, I was indeed confusing the interface for a URLLoader with the interface to start a request/create a URLLoader. The URLLoader interface itself seems like there are noticeably fewer race possibilities, just because the process of fetching a URL is so naturally variable because of server response. Glancing at the URLLoader interface, it seems very simple, but I will call out that the "FollowRedirect()" method may allow for cookie races if you get a notification of a redirect, set a cookie, and then follow the redirect. It's a corner case, but it's the type of thing I'd recommend looking for in this kind of situation. The corner cases are the bug that will be the absolute hardest to find when they do come up.
URLLoaderFactory actually have three methods. ChangePriority, FollowRedirect, and Cancel. I've ran tests a few times, and I saw tests failures only with Cancel (at that time the failure was consistent).
Redirect handling is not notified to scripts, at least synchronously.
Changing owner, raising priority to reflect the situation better
(Status: watching some spotty regressions, for now we believe they're not that significant that may require reverting)
Comment 1 by jam@chromium.org
, May 24 2017