Sampling profiler periodic samples are biased towards session resume |
|||||
Issue descriptionChrome Version: All OS: All (that sample) I was looking @ https://b204354250-dev-wittman-dot-uma-hrd.googleplex.com/p/chrome/callstacks/?sid=eee55403e48b55843a9c487a6abaa630 and realized that the most notified ObserverList according to periodic samples is one that gets invoked when the connection changes (which should be infrequent but does happen on resume). I then looked specifically at OnResume() stacks : https://b204354250-dev-wittman-dot-uma-hrd.googleplex.com/p/chrome/callstacks/?sid=5715f51661a38041a26df979a9d4fed0 and these should be fast and infrequent events (so even 0.07% is more than expected). I suspect what's happening is that, on resume, the delay to the next sample has often expired while the machine was sleeping, so it gets scheduled right away and always samples resumes (causing a biased sample). I think the sample profiler should be a PowerMonitorObsever and not send samples collected in the same range as a resume (waiting a bit before sending a sample to make sure it doesn't get a racy OnResume() notification right after).
,
Aug 20
Thanks for the bug. I suspected we had bias from samples collected on resume, but hadn't had the chance look into it. PowerObserver seems like an effective solution.
,
Aug 20
FWIW, resume is also a space where our performance is poor. Instead of discarding samples on resume I think we should treat it like a whole new profiling phase (akin to startup). It's just wrong to put them in the "periodic" bucket, but the samples are definitely not worthless.
,
Aug 20
This sounds like something we could be supporting with metadata state to indicate that we're resuming. Do we have a notion in the code of when a resume is complete? If not perhaps we could just tag samples for a dedicated period after a resume.
,
Aug 21
I don't think there's a notion of "completing"; it's just a bunch of fire-and-forget notifications. But yeah, arbitrarily capturing the first 30s after a resume makes sense.
,
Sep 17
I think this issue extends to all remaining samples in the profile, rather than just the next sample. the profiler effectively tries to schedule samples at 300 equal spaced 100ms intervals from the start of profiling. If the computer is suspended, on resume the profiler will try to churn through the backlog of samples that it thinks should have happened in the past. We probably should just skip over any sample times that are more than one interval in the past.
,
Sep 17
,
Sep 17
Raising priority, given that this is causing Chrome to become unusable for periods of time.
,
Sep 19
I will be OOO and unlikely to get to this until mid-October. Given that the performance impact of this issue is only affecting canary and dev channel I think we can reasonably downgrade to Pri-2.
,
Sep 24
|
|||||
►
Sign in to add a comment |
|||||
Comment 1 by gab@chromium.org
, Aug 20