2 minute slow down at startup caused by DeviceStatusListener::StartAfterDelay() |
|||||
Issue descriptionWe collected startup traces from a small % of users. Some traces have really long startup. This bug is filed based on the traces and we do not have repro steps. See report: https://crash.corp.google.com/browse?q=reportid=%27ab9f5630f48ba2e6%27#6 We can see that DeviceStatusListener::StartAfterDelay() took about 2 minutes to complete. We can also observe that the DNS lookup attempt took similar amount of time. The UI thread is clearly unblocked by the DNS loopup attempt, just based on the timing. I do not have proof, but it looks like the DeviceStatusListener::StartAfterDelay() is somehow waiting on UI thread for DNS lookup to finish.
,
Aug 28
+Matt to help understand why UI thread might be blocked by DNS lookup.
,
Aug 29
I have no clue. I don't even see any calls in that code to net methods. The network doesn't implement any explicit locks on activity on other threads, though platform APIs it uses might - but as I'm not seeing any net calls...
,
Aug 29
Thanks for finding this, I'll take a look. We use NetworkStatusListenerAndroid to monitor network types. Maybe this needs to be done in another thread?
,
Aug 29
I am wondering whether this has something to do with taskscheduler. There is no task scheduled during the DNS lookup, not sure if this is normal
,
Aug 29
There could be multiple reasons there are not many tasks in this 2 minutes. 1. We trim the trace events in the file because of upload size. The trimming logic is currently just discarding the latest events after startup. 2. The browser was running for download background service only and there are not other tasks to run. 3. All other tasks on browser are blocked on UI thread to respond. So, no other tasks were able to run. Anyways it seems very hard to find why main thread is waiting. There are lots of things happening in the task. I am going to try to get better data (stack traces for the waiting event). In case we know of a known wait on UI thread, we can fix it. Else let's just wait for stack traces to work.
,
Nov 27
We only have native stack trace samples for the issue. The 2 most common stack samples are: This issue is seen on reports: Report ID, Jank in ms 456dd450fdb20224 ,3940.846 3a3415e126cf9b4a ,3673.432 aa13fec92254d0d4 ,1875.294 7eab233791bef51c ,1741.798 4442f0dcdc0f97ca ,1141.938 b074303d301253d5 ,784.008 40ebf60fc037d924 ,700.323 ad0b74b9f5fb3597 ,673.615 <Skipped java frames> download::android::DownloadTaskScheduler::CancelTask(download::DownloadTaskType) download::ControllerImpl::ActivateMoreDownloads() base::OneShotTimer::RunUserTask() base::internal::TimerBase::RunScheduledTask() base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) base::MessageLoopImpl::RunTask(base::PendingTask*) base::MessageLoopImpl::DoDelayedWork(base::TimeTicks*) base::MessagePumpForUI::OnNonDelayedLooperCallback() base::(anonymous namespace)::NonDelayedLooperCallback(int, int, void*) The issue seen in reports: Report ID, Jank in ms 456dd450fdb20224 ,3940.846 3a3415e126cf9b4a ,3673.432 4442f0dcdc0f97ca ,1141.938 49a156eed76d6ef4 ,571.41 743ed58188664cc8 ,445.892 8a5dcfcc9437ef25 ,429.989 562b63906ab7be6d ,334.948 <Skipped java frames> download::BatteryStatusListenerAndroid::GetBatteryPercentageInternal() download::BatteryStatusListenerImpl::UpdateBatteryPercentage(bool) download::DeviceStatusListener::StartAfterDelay() base::OneShotTimer::RunUserTask() base::internal::TimerBase::RunScheduledTask() base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) base::MessageLoopImpl::RunTask(base::PendingTask*) base::MessageLoopImpl::DoDelayedWork(base::TimeTicks*) base::MessagePumpForUI::OnNonDelayedLooperCallback() base::(anonymous namespace)::NonDelayedLooperCallback(int, int, void*) Each of the traces show some time for both stack traces. The cancel task issue affects 0.17% of users. I cannot estimate the exact user count for battery issue. But, it seems to be more popular.
,
Nov 27
This doesn't happen only in background. In most cases the jank happens during navigation, seen in cases: 49a156eed76d6ef4 4442f0dcdc0f97ca 743ed58188664cc8 In other cases it's called right after navigation delaying input gestures. Can we try to avoid getting battery status? The default network startup delay is set to 5 seconds. I think a lot of cases this is before navigation is committed.
,
Nov 27
Background download system will pause downloads after a navigation happens. And after a while when no navigation detected, resume downloads. Maybe this is slow. The battery logic will eventually calls into Android in BatteryStatusListenerAndroid.getBatteryPercentage. This has to be done in main thread but shouldn't be slow. DeviceStatusListener::StartAfterDelay may eventually notify background download system to start downloads which might do a lot of works.
,
Nov 28
The following data is an estimate based on samples and not accurate. If we want more accurate metrics for each of this case, we can add trace event across these functions to get them. Average duration is calculated based on the number of samples we collected. Percent of users affected is calculated based on number of reports with more than 100ms on the pointed event. There could be overlap between the groups (ie, same user can be affected by multiple of the issues). Avg dur Users affected Issue 67 ms 2.2% users DownloadTaskScheduler::CancelTask 60 ms 1.5% users NetworkStatusListenerAndroid::Start 90 ms 0.9% users BatteryStatusListenerAndroid::GetBatteryPercentageInternal
,
Nov 28
Is the average duration based on the execution time of the base::PendingTask or just the execution time of these functions?
,
Nov 28
I calculated average time based on the number of samples we collected and the sampling duration we set. If you want accurate duration I can add trace events and we will get metrics from next Dev channel.
,
Nov 28
I can add the trace event soon.
,
Nov 29
Thanks for adding events. I noticed that DeviceStatusListener::StartAfterDelay() is always called at chrome startup. I guess downloads could be paused and restarted based on navigations, but the initial startup task is the one causing issues. There is also a 5s delay at startup after which we start the service. Would it be possible to simply set the delay to 1 minute so that we don't affect the initial user actions at startup. Maybe we could use the same mechanism we have to detect navigations, and start the service when we detect no navigations in the past some time?
,
Nov 29
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/04d6dc2f9f44c3a6a7b9b466715130cc605fe7a2 commit 04d6dc2f9f44c3a6a7b9b466715130cc605fe7a2 Author: Xing Liu <xingliu@chromium.org> Date: Thu Nov 29 21:30:31 2018 Background download: Add trace events to debug jank. This CL adds 3 trace event in background download system to debug potential jank reporting through crash report. Bug: 878593 Change-Id: Ibebca6ba331fb00b5b74a327b900c98e8b516151 Reviewed-on: https://chromium-review.googlesource.com/c/1354638 Reviewed-by: ssid <ssid@chromium.org> Reviewed-by: David Trainor <dtrainor@chromium.org> Commit-Queue: Xing Liu <xingliu@chromium.org> Cr-Commit-Position: refs/heads/master@{#612356} [modify] https://crrev.com/04d6dc2f9f44c3a6a7b9b466715130cc605fe7a2/chrome/browser/android/download/service/download_task_scheduler.cc [modify] https://crrev.com/04d6dc2f9f44c3a6a7b9b466715130cc605fe7a2/components/download/internal/background_service/android/battery_status_listener_android.cc [modify] https://crrev.com/04d6dc2f9f44c3a6a7b9b466715130cc605fe7a2/components/download/internal/background_service/android/network_status_listener_android.cc [modify] https://crrev.com/04d6dc2f9f44c3a6a7b9b466715130cc605fe7a2/components/tracing/common/trace_startup_config.cc |
|||||
►
Sign in to add a comment |
|||||
Comment 1 by ssid@chromium.org
, Aug 28