Currently we have a number of histograms measuring the timing of service worker FetchEvent handling, but it's been suggested we could add some more measurements. I need to investigate what we currently have and what could be added.
I investigated a bit; here's what we have today:
ServiceWorker.EventDispatchingDelay_FETCH_MAIN_FRAME
ServiceWorker.EventDispatchingDelay_FETCH_SUB_FRAME
ServiceWorker.EventDispatchingDelay_FETCH_SUB_RESOURCE
These measure the time required to dispatch the FetchEvent. They do not include any worker start or JS execution time.
ServiceWorker.FetchEvent.Fallback.Time
ServiceWorker.FetchEvent.HasResponse.Time
ServiceWorker.FetchEvent.WaitUntil.Time
These measure the time between the dispatch of the FetchEvent and the specified action.
ServiceWorker.StartWorker.Time
ServiceWorker.StartWorker.Time_ExistingReadyProcess_FETCH_MAIN_FRAME
ServiceWorker.StartWorker.Time_ExistingReadyProcess
ServiceWorker.StopWorker.Time
Times to start/stop the worker.
There are also many navpreload histograms.
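For context on the shape of the FetchEvent.*.Time deltas above, here's a minimal sketch of how a dispatch-to-action delta can be derived from base::TimeTicks and recorded via UMA. The class and method names are illustrative, not the actual Chromium call sites, and the histogram macro choice is an assumption:

#include "base/metrics/histogram_macros.h"
#include "base/time/time.h"

// Illustrative only: capture a tick when the FetchEvent is dispatched, then
// record the elapsed delta when the handler responds or falls back. The real
// recording sites live in the service worker code; this just shows the shape
// of the measurement.
class FetchEventTimingSketch {
 public:
  void OnFetchEventDispatched() { dispatch_ticks_ = base::TimeTicks::Now(); }

  void OnRespondWith() {
    UMA_HISTOGRAM_MEDIUM_TIMES("ServiceWorker.FetchEvent.HasResponse.Time",
                               base::TimeTicks::Now() - dispatch_ticks_);
  }

  void OnFallbackToNetwork() {
    UMA_HISTOGRAM_MEDIUM_TIMES("ServiceWorker.FetchEvent.Fallback.Time",
                               base::TimeTicks::Now() - dispatch_ticks_);
  }

 private:
  base::TimeTicks dispatch_ticks_;
};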
Note: I saw some discrepancies in chrome://histograms in how some of these histograms are collected.
AFAICT, StartWorker.Time histograms are only collected when a navigation triggers the start of the worker. If a subresource request from a controlled document starts the worker, then the Time histogram is not collected. This is true for both legacy and s13n mode.
The FetchEvent.Fallback, HasResponse, and WaitUntil Time histograms are only getting collected for subresources in legacy mode. In s13n mode these histograms are not collected for subresource requests.
Are these issues intended or expected?
Thanks for looking into this.
BTW the "EventDispatchingDelay" ones are fishy:
- They compare timings from different processes without using IsConsistentAcrossProcesses and IsHighResolution.
- There is (was?) a bug where they record the wrong time when the event is aborted (times out).
- They are using mojo::Time instead of mojo::TimeTicks.
I think it'd be OK to remove them and make new ones that fit the fetch event path better.
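A hedged sketch of the guard the first bullet above implies, in case we keep anything like these metrics: only record when base::TimeTicks is known to be comparable across processes. The function name and the plumbing of the renderer-side timestamp are invented for illustration:

#include "base/metrics/histogram_macros.h"
#include "base/time/time.h"

// Hypothetical recording point: |renderer_dispatch_ticks| was captured in a
// different process, so skip the sample unless ticks are high resolution and
// consistent across processes, rather than polluting the histogram.
void MaybeRecordEventDispatchingDelay(base::TimeTicks renderer_dispatch_ticks) {
  if (!base::TimeTicks::IsHighResolution() ||
      !base::TimeTicks::IsConsistentAcrossProcesses()) {
    return;
  }
  base::TimeDelta delay = base::TimeTicks::Now() - renderer_dispatch_ticks;
  if (delay < base::TimeDelta())
    return;  // Guard against clock skew producing negative samples.
  UMA_HISTOGRAM_TIMES("ServiceWorker.EventDispatchingDelay_FETCH_MAIN_FRAME",
                      delay);
}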
I just saw your comment#2 too. Yes StartWorker.Time is only for the time to start a worker. It's independent of each fetch event.
FetchEvent.Fallback, HasResponse, and WaitUntil are probably not plumbed to s13n mode. As above, we can probably redesign everything to have more consistent timings for the fetch event path specifically.
> I just saw your comment#2 too. Yes StartWorker.Time is only for the time to start a worker. It's independent of each fetch event.
I mean, I was doing this:
1. Stop the worker via the devtools button.
2. In the web console, trigger a subresource load with fetch("/").
3. Verify the worker is running in devtools.
4. Observe that no StartWorker.Time histogram is recorded in chrome://histograms.
Shouldn't it be recorded regardless of how the worker was started?
I read that too quickly. Yes, the expected behavior is to record the histogram, so that's surprising. Just checking: did you reload chrome://histograms after starting the worker?
Yes, I reloaded chrome://histograms. I can double-check again later.
Btw, I was using this site to run the steps in #c4:
https://fetch-event-echo.glitch.me/
It always hits FetchEvent fallback, in case that matters.
Handing this over to the service worker team to pursue since it sounds like they have some more bandwidth now. Happy to look at it again if needed. Thanks.
Actually, let me see what I can do in the short term. I have some ideas I'd like to try, but I don't know if they will meet the SW team's long-term design goals. We can discuss at our next meeting. Sorry for my confusion.
In our meeting Matt mentioned the EventDispatchingDelay metrics are a bit suspect because the code does not verify that the timestamps are comparable across processes before recording them, so older Windows devices may be polluting the data. Something to make sure we avoid here.
It seems to me there are a few different axes we can use to slice service worker loading data:
By destination:
* document (top/frame)
* script (<script>/module/worker/importScripts)
* stylesheet (top/@import)
* image (<img>/background-image/etc)
* others (xhr/fetch/beacon/?)
By Response source:
* Fallback to network stack
  * hitting network
  * hitting http cache
* respondWith() Response
  * produced by Fetch API
    * hitting the network
    * hitting http cache
  * produced by Cache API
  * produced by new Response()
  * maybe consider if clone() was called on the response?
By resource timeline:
* worker start
* FetchEvent dispatch
* FetchEvent.respondWith() called
* Response creation starts (beginning of fetch() or cache.match() depending on source)
* Response creation completes (fetch() or cache.match() settles depending on source)
* FetchEvent.respondWith() promise settles
* Start reading Response body
* Finish reading Response body
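To make the timeline concrete, here's a rough sketch of the per-request ticks such a breakdown implies; each histogram would be a delta between two of these points. The struct and field names are invented for illustration:

#include "base/time/time.h"

// Illustrative only: one TimeTicks per timeline point above. A real
// implementation would fill these in from the browser/renderer fetch event
// plumbing and report deltas between selected points.
struct FetchEventTimelineSketch {
  base::TimeTicks worker_start;
  base::TimeTicks fetch_event_dispatch;
  base::TimeTicks respond_with_called;
  base::TimeTicks response_creation_start;  // fetch()/cache.match() begins.
  base::TimeTicks response_creation_end;    // fetch()/cache.match() settles.
  base::TimeTicks respond_with_settled;     // respondWith() promise settles.
  base::TimeTicks body_read_start;
  base::TimeTicks body_read_end;
};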
If we don't mind creating a ton of histograms, we could simply track every permutation:
<TIMELINE_PART>
<DESTINATION_PART>
<SOURCE_PART>
<TIMELINE_PART>-<DESTINATION_PART>
<TIMELINE_PART>-<SOURCE_PART>
<TIMELINE_PART>-<DESTINATION_PART>-<SOURCE_PART>
<DESTINATION_PART>-<SOURCE_PART>
With 5 destinations, 6 sources, and 8 timeline parts, that would be 300+ histograms. Maybe we could narrow it down a bit.
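If we did go the suffix route, the recording side wouldn't need 300+ macro call sites; names can be composed at runtime and recorded through base/metrics/histogram_functions.h, which accepts non-constant names. A hedged sketch, where the prefix and the suffix strings are placeholders rather than settled names:

#include <string>

#include "base/metrics/histogram_functions.h"
#include "base/time/time.h"

// Illustrative only: compose
// "ServiceWorker.FetchEvent.<TIMELINE_PART>-<DESTINATION_PART>-<SOURCE_PART>"
// at runtime. Unlike the UMA_HISTOGRAM_* macros, the histogram_functions.h
// helpers don't require a literal histogram name.
void RecordFetchEventTiming(const std::string& timeline_part,
                            const std::string& destination_part,
                            const std::string& source_part,
                            base::TimeDelta sample) {
  base::UmaHistogramMediumTimes("ServiceWorker.FetchEvent." + timeline_part +
                                    "-" + destination_part + "-" + source_part,
                                sample);
}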
Thanks for digging into this; these are good ideas. Just to comment on the "ton of histograms" idea: my intuition is that it might be excessive, especially if it's combinatorial, since that explodes quickly. In practice it'd also be hard to analyze. We previously received this kind of feedback from the metrics team:
https://codereview.chromium.org/2218943002#msg72
A possible approach is to first implement the top-level ones, then add targeted combinations as needed based on those results?
Yeah, I agree with that approach. Since we already have Fallback.Time and HasResponse.Time, I was pursuing adding some info about the response source. So we could maybe add HasResponse-CacheStorage.Time, etc. as a next step.
Also, someone suggested that we currently don't have any histograms that measure BlobDataHandle read times. That might be a straightforward thing to add to get a sense of CacheStorage body reading performance.
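For the blob read idea, capturing a tick when the body read starts and recording the elapsed time when it completes would probably be enough; a minimal sketch, assuming hook points exist at whatever layer wraps the BlobDataHandle read. The class, methods, and histogram name are invented for illustration:

#include "base/metrics/histogram_functions.h"
#include "base/time/time.h"

// Hypothetical hook points around a blob body read; the histogram name is a
// placeholder, not a settled one.
class BlobReadTimingSketch {
 public:
  void OnReadStart() { read_start_ = base::TimeTicks::Now(); }

  void OnReadComplete() {
    if (read_start_.is_null())
      return;
    base::UmaHistogramTimes("ServiceWorker.BlobReadTime",
                            base::TimeTicks::Now() - read_start_);
    read_start_ = base::TimeTicks();  // Reset for the next read.
  }

 private:
  base::TimeTicks read_start_;
};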