New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 878593 link

Starred by 2 users

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 2
Type: Bug



Sign in to add a comment

2 minute slow down at startup caused by DeviceStatusListener::StartAfterDelay()

Project Member Reported by ssid@chromium.org, Aug 28

Issue description

We collected startup traces from a small % of users. Some traces have really long startup. This bug is filed based on the traces and we do not have repro steps.

See report:
https://crash.corp.google.com/browse?q=reportid=%27ab9f5630f48ba2e6%27#6

We can see that DeviceStatusListener::StartAfterDelay() took about 2 minutes to complete. We can also observe that the DNS lookup attempt took similar amount of time. The UI thread is clearly unblocked by the DNS loopup attempt, just based on the timing. I do not have proof, but it looks like the DeviceStatusListener::StartAfterDelay() is somehow waiting on UI thread for DNS lookup to finish.
 
I understand this is not enough information for spotting the bug accurately. I also do not have the impact of this bug as number of users right now. So, please spend limited amount of time debugging.
Cc: mmenke@chromium.org
+Matt to help understand why UI thread might be blocked by DNS lookup.
I have no clue.  I don't even see any calls in that code to net methods.  The network doesn't implement any explicit locks on activity on other threads, though platform APIs it uses might - but as I'm not seeing any net calls...
Thanks for finding this, I'll take a look. We use NetworkStatusListenerAndroid to monitor network types. 

Maybe this needs to be done in another thread?
I am wondering whether this has something to do with taskscheduler. There is no task scheduled during the DNS lookup, not sure if this is normal
Cc: xingliu@chromium.org
Owner: ----
There could be multiple reasons there are not many tasks in this 2 minutes.
1. We trim the trace events in the file because of upload size. The trimming logic is currently just discarding the latest events after startup.
2. The browser was running for download background service only and there are not other tasks to run.
3. All other tasks on browser are blocked on UI thread to respond. So, no other tasks were able to run.

Anyways it seems very hard to find why main thread is waiting. There are lots of things happening in the task. I am going to try to get better data (stack traces for the waiting event). In case we know of a known wait on UI thread, we can fix it. Else let's just wait for stack traces to work.
Cc: -mmenke@chromium.org
Components: UI>Browser>Downloads
Labels: -Pri-3 Pri-2
We only have native stack trace samples for the issue. The 2 most common stack samples are:

This issue is seen on reports:
Report ID,        Jank in ms
456dd450fdb20224 ,3940.846
3a3415e126cf9b4a ,3673.432
aa13fec92254d0d4 ,1875.294
7eab233791bef51c ,1741.798
4442f0dcdc0f97ca ,1141.938
b074303d301253d5 ,784.008
40ebf60fc037d924 ,700.323
ad0b74b9f5fb3597 ,673.615

<Skipped java frames>
download::android::DownloadTaskScheduler::CancelTask(download::DownloadTaskType)
download::ControllerImpl::ActivateMoreDownloads()
base::OneShotTimer::RunUserTask()
base::internal::TimerBase::RunScheduledTask()
base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*)
base::MessageLoopImpl::RunTask(base::PendingTask*)
base::MessageLoopImpl::DoDelayedWork(base::TimeTicks*)
base::MessagePumpForUI::OnNonDelayedLooperCallback()
base::(anonymous namespace)::NonDelayedLooperCallback(int, int, void*)


The issue seen in reports:
Report ID,        Jank in ms
456dd450fdb20224 ,3940.846
3a3415e126cf9b4a ,3673.432
4442f0dcdc0f97ca ,1141.938
49a156eed76d6ef4 ,571.41
743ed58188664cc8 ,445.892
8a5dcfcc9437ef25 ,429.989
562b63906ab7be6d ,334.948

<Skipped java frames>
download::BatteryStatusListenerAndroid::GetBatteryPercentageInternal()
download::BatteryStatusListenerImpl::UpdateBatteryPercentage(bool)
download::DeviceStatusListener::StartAfterDelay()
base::OneShotTimer::RunUserTask()
base::internal::TimerBase::RunScheduledTask()
base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*)
base::MessageLoopImpl::RunTask(base::PendingTask*)
base::MessageLoopImpl::DoDelayedWork(base::TimeTicks*)
base::MessagePumpForUI::OnNonDelayedLooperCallback()
base::(anonymous namespace)::NonDelayedLooperCallback(int, int, void*)


Each of the traces show some time for both stack traces.
The cancel task issue affects 0.17% of users. I cannot estimate the exact user count for battery issue. But, it seems to be more popular.
Cc: nyquist@chromium.org
This doesn't happen only in background. In most cases the jank happens during navigation, seen in cases:
49a156eed76d6ef4
4442f0dcdc0f97ca
743ed58188664cc8
In other cases it's called right after navigation delaying input gestures.

Can we try to avoid getting battery status?
The default network startup delay is set to 5 seconds. I think a lot of cases this is before navigation is committed.

Background download system will pause downloads after a navigation happens. And after a while when no navigation detected, resume downloads. Maybe this is slow.

The battery logic will eventually calls into Android in BatteryStatusListenerAndroid.getBatteryPercentage. This has to be done in main thread but shouldn't be slow. DeviceStatusListener::StartAfterDelay may eventually notify background download system to start downloads which might do a lot of works.
The following data is an estimate based on samples and not accurate. If we want more accurate metrics for each of this case, we can add trace event across these functions to get them. Average duration is calculated based on the number of samples we collected. Percent of users affected is calculated based on number of reports with more than 100ms on the pointed event. There could be overlap between the groups (ie, same user can be affected by multiple of the issues).

Avg dur     Users affected     Issue    
67 ms       2.2% users         DownloadTaskScheduler::CancelTask
60 ms       1.5% users         NetworkStatusListenerAndroid::Start
90 ms       0.9% users         BatteryStatusListenerAndroid::GetBatteryPercentageInternal
Is the average duration based on the execution time of the base::PendingTask or just the execution time of these functions?
I calculated average time based on the number of samples we collected and the sampling duration we set. If you want accurate duration I can add trace events and we will get metrics from next Dev channel.
I can add the trace event soon.
Thanks for adding events.
I noticed that DeviceStatusListener::StartAfterDelay() is always called at chrome startup. I guess downloads could be paused and restarted based on navigations, but the initial startup task is the one causing issues.

There is also a 5s delay at startup after which we start the service. Would it be possible to simply set the delay to 1 minute so that we don't affect the initial user actions at startup. Maybe we could use the same mechanism we have to detect navigations, and start the service when we detect no navigations in the past some time?

Sign in to add a comment