Speed: Service Worker-controlled navigations invoke SafeBrowsing on the critical path |
|||||
Issue descriptionWe're seeing slowness in PWA startup because SafeBrowsing is being consulted for these navigations which are, by definition, Service Worker controlled. The attached trace shows ~300ms of SafeBrowsing time for Google Maps Go loading the top-level resource which was returned directly from the Cache Storage API by the Service Worker. To get a Service Worker controlling a navigation, the site must have been previously visited. Presumably, it doesn't make sense for us to continue to check the URL of something that is being returned from local script. Related: https://crbug.com/802818
,
Mar 2 2018
Yeah, we've talked about this before and our current conclusion is that resource loading from SW would still need to go through SafeBrowsing checks. Questions to SB people: would there be a way to somehow detect updates in SB database and only apply checks for the URLs, or could it be okay to lazily run the checks (and evict resource from the cache) after the critical resource load is done?
,
Mar 2 2018
For S13nSW: in the current implementation, doesn't the service worker intercept before safe browsing? Could it make sense for safe browsing to just sit in front of network service, so if the SW responds with a resource from Cache Storage we don't go to SB?
,
Mar 2 2018
This would be solved by implementing local Safe Browsing cache on the client (similar to how desktop Safe Browsing works). This is on the 2018 roadmap of the Safe Browsing team. The 300ms blocking is an outlier though (99.95th %ile): https://uma.googleplex.com/p/chrome/histograms/?endDate=20180228&dayCount=1&histograms=SB2.RemoteCall.CheckDispatchTime&fixupData=true&showMax=true&filters=platform%2Ceq%2CA%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial
,
Mar 2 2018
,
Mar 2 2018
From a security model perspective, Safebrowsing's assumptions just don't make sense in a SW-controlled world. Safebrowsing matches individual URLs: https://developers.google.com/safe-browsing/v4/lookup-api Service Workers operate on *scopes*: https://developers.google.com/web/ilt/pwa/introduction-to-service-worker#registration_and_scope It's meaningful to check a destination URL for uncontrolled pageloads (first visit to a site). It stops making sense for Service Worker controlled page loads because the actor is no longer the server; it's the local script proxy which may have been installed at some point in the past. It *MAY* be useful to (asynchronously) check on the Service Worker (although I don't know that Safebrowsing tracks pwn'd serviceworkers) and uninstall them or perhaps async check the URL (to, again, uninstall the SW and refresh the page), but blocking the navigation is pointless. csharrison@: this trace is on a slow device at Chrome Startup. This might be unusual for Chrome navigations overall, but that doesn't really speak to how painful it makes the "tap the homescreen icon to launch the app" experience. As PWAs become more common, so will this experience.
,
Mar 2 2018
Ah gotcha thanks for the clarification. We should definitely add a histogram for the first check. I think there is another Android bug for something related to startup too... found it, issue 802818. In fact, the Safe Browsing check is _worse_ than 300ms, that's just the task that dispatches the check to GMSCore (I'm not quite sure why it takes so long but I think it is aggressively descheduling the thread). The response comes in at the 4200ms mark, making the total check more like 900ms! However, I haven't stared at the trace for long but is the navigation actually blocked on Safe Browsing? The 300ms time is spent on a thread pool and is mostly descheduled (not consuming a core), so it shouldn't be blocking critical work. I'm not used to analyzing SW traces but I would expect to see a "blocked by delegate: Safe Browsing..." somewhere in the netlog if we're actually paused waiting for SB results. In terms of the security model for SafeBrowsing + SW, I'm not sure I have enough experience of either to make a good judgement call. Maybe SB team can comment.
,
Mar 3 2018
Thanks for digging into that and for the context, csharrison! From a security model perspective, in addition to the mismatch with Safebrowsing when Service Workers are active (generally), consider that the URL being launched is *very* trusted from the user's perspective when invoked from the homescreen. Even if we're not happy removing this check in the common case for SWs (although I argue we should at least be happy to make it very asynchronous and/or skip it), it seems just punitive to punish startup in this specific case.
,
Mar 7 2018
Thinking more about the security model question, I'd like to propose that: - We avoid the synchronous check for SW-controlled navigations - We add a Safebrowsing check when issuing an update check for the Service Worker script: https://w3c.github.io/ServiceWorker/#soft-update-algorithm The rationale for this change: - Checking URLs whose content is being returned by a Service Worker doesn't do what we assume it does. The content is the product of previous navigations, not the current one, so checking it in the moment isn't useful per sae - That said, we *do* issue a background check to the network asynchronously to understand if the SW script is fresh. This fetch cannot be intercepted by the existing Service Worker; this makes it a good time and place to check a URL - Since the Service Worker being checked is the result of a navigation, we can check the URL last navigated to - A failed SafeBrowsing check at SW update time should: - Unregister the Service Worker for the origin and perhaps clobber all state (ala Clear-Site-Data: https://www.w3.org/TR/clear-site-data/) - Replace tab contents for all controlled windows (tabs) with the safebrowsing interstitial Thoughts?
,
Mar 8 2018
+vakh,nparker explicitly: Can you comment on #9?
,
Mar 8 2018
I do not understand the details of how SWs work so the following may not be 100% correct from SW perspective: > We avoid the synchronous check for SW-controlled navigations We do not have any synchronous checks in Android. For a SW, since requests in its scope may be returned locally, the SB check would always lose that race and cause a wait. Is that what you meant? > We add a Safebrowsing check when issuing an update check for the Service Worker script Yes, does it not already happen? If not, that's a bug because it means we are executing code locally without checking its SB reputation. > The content is the product of previous navigations, not the current one, so checking it in the moment isn't useful per se Yes, but as pointed out in #c1: "Even if a site had been visited in the past, issues may have been detected since then." and the SW is caching content that we have since discovered to be bad. We want to evaluate the reputation of each URL as it is requested, irrespective of its past reputation. > This fetch cannot be intercepted by the existing Service Worker; this makes it a good time and place to check a URL. In order to ensure that the user is safe, we must verify that Safe Browsing thinks they are not unsafe -- whether it is at SW install time, or update time. > A failed SafeBrowsing check at SW update time should ... Yes, I agree with both of those proposals.
,
Mar 8 2018
> A failed SafeBrowsing check at SW update time should ... Yes, I agree with both of those proposals but I don't think that would be sufficient since a malicious SW will choose to not update itself at all or for a long period of time if it knows that an update would force an SB check.
,
Mar 8 2018
Service Workers don't get to decide how often the check is performed. We issue the check on their behalf with a maximum lifetime of 24 hours. That is, you can specify whatever caching headers you want on your SW, but if you navigate and we would check for an updated SW, then we only respect cache headers up to a single day.
,
Mar 13 2018
vakh@, csharrison@: does the above make sense? If not, happy to describe the SW lifecycle in more details and/or discuss in a meeting.
,
Mar 14 2018
There could be different types of resource loads: SW scripts and other resources (main resource and sub-resources). For scripts situation looks clearer: * For installed SW scripts that come from internal storage we actually already skip SB checks. * For updating SW scripts they should be checked against SB on update time (I'll check the current behavior again, but the desirable state would be we do the check). * I agree that busting the SW if we find issues in SB check during updates would make sense. For non-script resources: * SW updates may not really load (install) all the resources that are to be used for the subsequent navigations. Skipping all resource loads from controlled page that go to SW doesn't seem to make sense. * If we agree that using the resources that come from Cache Storage is safe we can possibly change the current layering from: [page -> SB -> SW -> SB -> Net] to: [page -> SW -> SB -> Net]. This is NOT easy for non-S13N code path to do this though. I can imagine we can do this for S13N code path in a reasonable timeframe though.
,
Mar 14 2018
Well, for the last case for non-S13N code path case: we can possibly do some filtering for installed resources somewhere not to go through SB, it'd be hacky but sounds possible. I think the biggest question we should have consensus is this: "If we agree that using the resources that come from Cache Storage is safe" This seems a bit sketchy. Or... the proposal is to distinguish resources that are stored during installation/updates and others?
,
Mar 16 2018
,
Aug 14
csharrison: Any update on "This would be solved by implementing local Safe Browsing cache on the client (similar to how desktop Safe Browsing works)."? This looks related to issue 873575 . Safe Browsing is one of several URLLoaderThrottles. It's looking like the plan is to apply throttles before service worker interception, partially because some throttles mutate the request and it would make sense for the SW to see the mutated request.
,
Aug 14
Sorry I don't know, I haven't been active in Safe Browsing for a while. vakh may know.
,
Aug 14
You can follow the updates here on Issue 747428. The current status is that I have the design doc ready and a working version of the CL ready but I am trying to figure out how to measure the memory/performance impact of the local blacklist. |
|||||
►
Sign in to add a comment |
|||||
Comment 1 by cbentzel@chromium.org
, Mar 1 2018