Reduce overhead of subresource_filter safe browsing integration. |
|||||||||||||||
Issue descriptionIt seems like on android this is causing perf issues.
,
Nov 2 2017
So I took some traces and it seems like actual CPU contention could be an issue here, with descheduled time (caused by IPC) partially to blame. I have a CL which will land some additional metrics to help us see.
,
Nov 2 2017
Great news, I think the culprit is basically 100% explained by my guess at #2. Take a look at the histogram: https://uma.googleplex.com/p/chrome/variations/?sid=345a1dc1d714307cd28da84ff70a3525 This is measuring the time a nav starting to when it is requested by the net stack. Since we dispatch the IPC *before* the navigation is handed off to the net stack, I'm guessing we're delaying that from happening. I'm so thankful we found this metric! Solution are a bit harder though. Let me think about it.
,
Nov 2 2017
+Navigation and clamy specifically for ideas here. For context: I think SubresourceFilterSafeBrowsingClientRequest::Start is causing the IO thread to deschedule, since it sends an IPC to GMSCore SafetyNet. This blocks the navigation request from going out, since we run this task in response to NavigationThrottle constructor. Ideally, we want a way to run this *after* the navigation goes out to network, since we don't want to delay at all. We could do something hacky on the IO thread like re-post a task to the end of the queue or post a delayed task. Another idea is to expose another API to NavigationThrottle like DidStart / DidRedirect, etc. which gets called after all throttles are done. Posting a task to the IO thread from DidStart should not block the nav request from going out.
,
Nov 3 2017
I guess more descriptive names would be: 1. StartDidProceed() 2. RedirectDidProcess() These would need to be called 1. After the call to NavigationURLLoader::Create is made. 2. After the call to NavigationURLLoader::FollowRedirect is made clamy: WDYT? Let me know if you see a better solution here.
,
Nov 3 2017
Proof of concept: https://chromium-review.googlesource.com/c/chromium/src/+/753741
,
Nov 3 2017
I chatted with +mmenke offline who confirmed that descheduling the IO thread *after* the resource loader starts the request will still allow network work to be done (in most circumstances). That is, we aren't just gaming the Navigation.TimeToURLJobStart metric.
,
Nov 3 2017
Note that "in most circumstances" really means "in what I think are most circumstances, at least with the simple cache, and no extension that interposes on all URLRequests" - though in either of those cases, we will at least kick off the initial cache lookup / extension check, which is done off thread.
,
Nov 3 2017
Yeah, understood. This problem is only on Android so I think we can assume simple cache / no extensions.
,
Nov 3 2017
Rather than introducing new NavigationThrottle APIs, can't you just POST_TASK on the UI thread the call that will POST_TASK on the IO thread. I know that makes two post tasks, but I think it should work in most cases.
,
Nov 3 2017
The problem with posting the task on the UI thread, and then posting to the IO thread is that we'll potentially be delaying a lot more. I'm looking at a trace where the task (creating and navigating a new tab) finishes 45ms after the request is sent out to the network. In general, double posted tasks or posting delayed tasks could probably work, but I think the results will be inconsistent, or yield worse performance. I'm open to exploring other APIs ideas though, this was just the simplest I could think of that does basically exactly what we want :)
,
Nov 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/78ff41eab058af57f7629b6b1beec3628f7ce719 commit 78ff41eab058af57f7629b6b1beec3628f7ce719 Author: Charles Harrison <csharrison@chromium.org> Date: Fri Nov 03 16:59:32 2017 [subresource_filter] Add metric for total SB check time We have some ideas to make this feature a bit more async, to avoid thread descheds janking navigation code on the IO thread. This metric can help us evaluate that impact. This CL also adds a histogram for the time it took to dispatch the safe browsing check, which could be substantial on Android where we are seeing IPC descheduling. Bug: 779914 Change-Id: I2cabba3c5979a7ea8008d294d02d1bfbf7004ebb Reviewed-on: https://chromium-review.googlesource.com/747441 Commit-Queue: Charlie Harrison <csharrison@chromium.org> Reviewed-by: Shivani Sharma <shivanisha@chromium.org> Reviewed-by: Steven Holte <holte@chromium.org> Cr-Commit-Position: refs/heads/master@{#513807} [modify] https://crrev.com/78ff41eab058af57f7629b6b1beec3628f7ce719/components/subresource_filter/content/browser/subresource_filter_safe_browsing_activation_throttle.cc [modify] https://crrev.com/78ff41eab058af57f7629b6b1beec3628f7ce719/components/subresource_filter/content/browser/subresource_filter_safe_browsing_activation_throttle.h [modify] https://crrev.com/78ff41eab058af57f7629b6b1beec3628f7ce719/components/subresource_filter/content/browser/subresource_filter_safe_browsing_client_request.cc [modify] https://crrev.com/78ff41eab058af57f7629b6b1beec3628f7ce719/tools/metrics/histograms/histograms.xml
,
Nov 3 2017
-rvg since uma links are OK
,
Nov 7 2017
Could you link to the trace? I'd like to have a look at it before commenting on a possible solution. I'm really worried about the added complexity to the navigation API.
,
Nov 7 2017
Oops for some reason I thought I posted it already. Here is one with lots of descehduling. The event that we want to postpone a bit is at 5,203.031 ms.
,
Nov 7 2017
Note that I think the idea to post another task to get to the end of the queue could work, but I suspect it would be better to do it on the IO thread. However, it seems like for redirects, we fail to post after the related navigation task, so we may need to use a delayed task in that case. As I said, my main worry with the post tasking solution is that it is inconsistent (and possibly fragile). I'd need to collect more traces but I'm nervous we'd be delaying sending the IPC to SafetyNet by another 10-20ms. However, I would be willing to try it and measure the results if you aren't happy with hooking navigation further (I totally understand your hesitation).
,
Nov 7 2017
Thanks! I won't have time to look at it today, will try tomorrow.
,
Nov 9 2017
So I've given more thought to the issue, and here's where I am at: - I'm not very happy with the new interface proposed for NavigationThrottles - it looks like observer methods which look weird on a throttle. We could potentially put them on WebContentsObserver (but I would prefer to avoid adding too many methods to WebContentsObserver). - I have been wondering if moving your throttle higher up in the list of throttles that need to execute might make the issue less of a problem. This way we post the task to the IO thread while still having work to do on the UI thread, instead of just before posting the task to start on the IO thread. What do you think?
,
Nov 9 2017
I think your suggestion might work, but I'm not confident. From traces I've seen it seems like actual throttles themselves are reasonably quick, so I'm not sure if things will change dramatically if we are first in line. However, after looking at a recent histogram (SubresourceFilter.SafeBrowsing.CheckDispatchTime) which calculates wall time spent in our IO thread task, it makes me more and more concerned that this task just takes too long for it to be on the IO thread. I have a tentative WIP CL [1] to offload it to a worker thread rather than running it on the IO thread on Android. I think that solution would make me happiest, but needs general approval from SB folks (+vakh). [1]: https://chromium-review.googlesource.com/c/chromium/src/+/760550
,
Nov 10 2017
,
Nov 10 2017
,
Nov 15 2017
Attached are two traces taken on my Pixel C with a custom Chrome build. One (iothread) shows current behavior. Note the subresource_filter message on the IO thread at 1974ms that blocks navigation by ~6ms. Also note that the call to BaseResourceThrottle::CheckUrl descheds and takes ~3.5ms. Note that the time spent in the base resource throttle (afaik) *does* block the URLRequest from going out to the network, since we call it after the URLRequest is created but before "Start()" is called. The new behavior I am proposing is in the "workerthread" trace. In this patch, the safe browsing check is dispatched on a worker thread (using base::PostTask and base::MayBlock()). I think for purposes of explanation the best metric to look at in these traces is the time from the subresource_filter task to the creation to the URL_REQUEST_START_JOB trace in net (mmenke correct me if I'm wrong). In "iothread" trace, this takes ~11ms. In "workerthread" trace, it takes ~1ms. Unfortunately the time spent in the base resource throttle is not accounted for in Navigation.TimeToURLJobStart, but we may be able to see movements in other metrics.
,
Nov 16 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/aa65eacee6f10757a1fa35faefd890553834eafe commit aa65eacee6f10757a1fa35faefd890553834eafe Author: Charles Harrison <csharrison@chromium.org> Date: Thu Nov 16 18:28:44 2017 Add metric to capture the time between request initiation and start This CL adds a metric to measure the time spent from URLRequest creation and when Start() is called on it. The purpose is to measure time spent in various resource handlers before sending the request out to the net stack. Bug: 779914 Change-Id: I8e462a7d9fb541de38589aeb926f81047a735d50 Reviewed-on: https://chromium-review.googlesource.com/771549 Commit-Queue: Charlie Harrison <csharrison@chromium.org> Reviewed-by: Camille Lamy <clamy@chromium.org> Reviewed-by: Matt Menke <mmenke@chromium.org> Reviewed-by: Ilya Sherman <isherman@chromium.org> Cr-Commit-Position: refs/heads/master@{#517131} [modify] https://crrev.com/aa65eacee6f10757a1fa35faefd890553834eafe/content/browser/loader/resource_loader.cc [modify] https://crrev.com/aa65eacee6f10757a1fa35faefd890553834eafe/tools/metrics/histograms/histograms.xml
,
Nov 16 2017
,
Nov 17 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/a18b84fd0a3431feb0112a82f4882404a1c241f4 commit a18b84fd0a3431feb0112a82f4882404a1c241f4 Author: Charles Harrison <csharrison@chromium.org> Date: Fri Nov 17 13:04:55 2017 [safe_browsing] Dispatch SafetyNet API request on a worker thread. Recent data shows that the time it takes to dispatch the IPC to Google Play Services is quite expensive. Android canary data (so probably trends higher end), for the subresource filter check (in navigation critical path): 50th percentile: 6ms 75th percentile: 10ms 95th percentile: 28ms 99th percentile: 100ms Average: 11ms Since this is directly in the critical path of navigation, it makes sense not to hog the thread / deschedule it before the network request goes out. It also looks like we hit some perf issues in other uses of the API too. See the linked bug for some chrome://tracing traces. Bug: 779914 Change-Id: Ibc44d9f199715c3fbf7a2d66604e82ee3ccfc5e4 Reviewed-on: https://chromium-review.googlesource.com/760550 Commit-Queue: Charlie Harrison <csharrison@chromium.org> Reviewed-by: Varun Khaneja <vakh@chromium.org> Reviewed-by: François Doray <fdoray@chromium.org> Cr-Commit-Position: refs/heads/master@{#517389} [modify] https://crrev.com/a18b84fd0a3431feb0112a82f4882404a1c241f4/components/safe_browsing/android/java/src/org/chromium/components/safe_browsing/SafeBrowsingApiBridge.java [modify] https://crrev.com/a18b84fd0a3431feb0112a82f4882404a1c241f4/components/safe_browsing/android/java/src/org/chromium/components/safe_browsing/SafeBrowsingApiHandler.java [modify] https://crrev.com/a18b84fd0a3431feb0112a82f4882404a1c241f4/components/safe_browsing/android/safe_browsing_api_handler_bridge.cc [modify] https://crrev.com/a18b84fd0a3431feb0112a82f4882404a1c241f4/components/safe_browsing/android/safe_browsing_api_handler_bridge.h
,
Nov 20 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/bbfe4e4de4a8ad10a352d1a2aa48c997bfdae42d commit bbfe4e4de4a8ad10a352d1a2aa48c997bfdae42d Author: Charles Harrison <csharrison@chromium.org> Date: Mon Nov 20 15:39:54 2017 Adjust Navigation metrics to account for synchronous time in resource handlers This CL slightly alters the timestamp in Navigation.TimeToURLJobStart to include synchronous time spent before (potentially) starting the request. This change makes the metric more precisely capture the time until the request reached the network stack. Bug: 779914 Change-Id: I0eaf1fe7380cb6a14a2fbe328f431e35bb82efa3 Reviewed-on: https://chromium-review.googlesource.com/774938 Reviewed-by: Camille Lamy <clamy@chromium.org> Commit-Queue: Charlie Harrison <csharrison@chromium.org> Cr-Commit-Position: refs/heads/master@{#517829} [modify] https://crrev.com/bbfe4e4de4a8ad10a352d1a2aa48c997bfdae42d/content/browser/loader/navigation_url_loader_impl_core.cc [modify] https://crrev.com/bbfe4e4de4a8ad10a352d1a2aa48c997bfdae42d/content/browser/loader/navigation_url_loader_impl_core.h [modify] https://crrev.com/bbfe4e4de4a8ad10a352d1a2aa48c997bfdae42d/content/browser/loader/navigation_url_loader_unittest.cc
,
Dec 6 2017
,
Dec 14 2017
,
Dec 14 2017
,
Feb 18 2018
,
Mar 27 2018
|
|||||||||||||||
►
Sign in to add a comment |
|||||||||||||||
Comment 1 by bugdroid1@chromium.org
, Oct 31 2017