New issue
Advanced search Search tips

Issue 823087 link

Starred by 4 users

Issue metadata

Status: Assigned
Owner:
Cc:
EstimatedDays: ----
NextAction: ----
OS: Linux , Windows , Chrome , Mac
Pri: 3
Type: Bug



Sign in to add a comment

hang monitor timeout on desktop should match android

Project Member Reported by ojan@chromium.org, Mar 17 2018

Issue description

See https://bugs.chromium.org/p/chromium/issues/detail?id=148136#c19 and https://cs.chromium.org/chromium/src/content/common/content_constants_internal.cc?type=cs&sq=package:chromium&l=16.

Now that flash deprecation is so far along, we should try making desktop match android's 5 second timeout.

Avi: Assigning to you since you kind of volunteered on chat. Feel free to not do though. :)
 

Comment 1 by a...@chromium.org, Mar 19 2018

The history of the hang timers isn't quite so simple :( . To trace it, note that there are actually two timers that mattered. First is, of course, the hung page timeout. Second, though, is the hung "windowed plugin" timeout.

internal svn r9958 introduced the plugin timeout at 2s.
internal svn r16239 bumped the plugin timeout to 5s (as per http://b/923312) due to Flash.
internal svn r34888 introduced the page timeout at 5s.
internal svn r35935 bumped it up to 10s because (as per http://b/1150315) it was "showing up a lot".

At this point we're at public release. initial.commit had a plugin timeout of 5s, and a page timeout of 10s.

r638, f4df93213ea5e482373dae917386ccebd9f5b430, (http://b/1314703) changed the page timeout to 30s, because they kept hitting it.
r665, bafb959be73508825dabe5b9a022c97ad4dd4af2, changed the plugin timeout to 10s and the page timeout to 10s, because they kept hitting the plugin timeout too. Why they changed the page timeout too I don't know.
r671, 783f5be582cb485e42018d1ff948000eeac1bc2f, changed the page timeout to 20s, because apparently the 10s timeout was too low after the last change.
r1411, 9e3ed3b69a5a25affab3acc41921606847c6dc13, changed the plugin timeout to 30s because plugins often hung. The consequence of having the plugin timeout be longer than the page timeout didn't occur to them until much much later.
r141819, https://chromiumcodereview.appspot.com/10536134, changed the plugin timeout to 25s and the page timeout to 30s, because if the plugin timeout is longer than the page timeout, the page timeout will fire first and you'll never see the plugin timeout.
r347229, https://codereview.chromium.org/1299513002, made the Android timeout 5s.
r382136, https://codereview.chromium.org/1815623004, removed the plugin timeout as windowed plugins were gone.

Chrome's page timeout hasn't been 5s since April 2008, though, granted, computers are a lot faster than then. Android's timeout of 5s is very very aggressive, which is appropriate given that you have one page you're looking at, and that page is likely simpler than a desktop page. On the other other hand, if Android devices can do 5s, and their processors are less powerful than desktop ones, surely the desktop can be as aggressive.

I'm about 50/50 feeling like going for 5s vs going for 10s.

Comment 2 by ojan@chromium.org, Mar 19 2018

I'd prefer we try 5s and add a histogram while we're at it to let us reason about this in the future based off actually how often it will trigger.
Cc: dtapu...@chromium.org

Comment 4 by a...@chromium.org, Mar 19 2018

The hang count does show up in stability metrics (see https://cs.chromium.org/search/?q=NOTIFICATION_RENDER_WIDGET_HOST_HANG) though I don't know offhand where to find those.

Comment 5 by ojan@chromium.org, Mar 19 2018

Yeah, the histogram I had in mind is not just hang count, but also to have TimeoutMonitor histogram the durations in Restart so we can see how much more it would trigger if we further tightened it.

Comment 6 by a...@chromium.org, Mar 20 2018

I wouldn't put UMA into TimeoutMonitor. It's used for the hung renderer dialog, the new content rendering timeout, the close timeout, the beforeunload timeout, and the swapout timeout. Bad idea to conflate five different timers.

If I had to UMA, I would tag RenderWidgetHostImpl's StartHangMonitorTimeout()/RestartHangMonitorTimeoutIfNecessary()/StopHangMonitorTimeout()/RendererIsUnresponsive()/OnRenderProcessGone().

Here are the scenarios:
- Hang starts and resolves with no dialog. We start timing in StartHangMonitorTimeout and stop in StopHangMonitorTimeout.
- Hang starts and resolves while dialog up with no user click. We start timing in StartHangMonitorTimeout, it escalates in RendererIsUnresponsive, we stop in StopHangMonitorTimeout.
- Hang starts, dialog shows, user kills process. We start timing in StartHangMonitorTimeout, dialog shows in RendererIsUnresponsive, process killed in OnRenderProcessGone.
- Hang starts, dialog shows, user restarts hang timer. We start timing in StartHangMonitorTimeout, dialog shows in RendererIsUnresponsive, timer restarted in RestartHangMonitorTimeoutIfNecessary, we start all over again.

Those are four metrics. We could treat them all as separate UMAs, or we could join all the data into one big "time from hang timer start to unhang or process death". The first gives us precise data, the second gives us a simple answer to the question of how we'd be affected by adjusting dialog trigger times.

Comment 7 by ojan@chromium.org, Mar 20 2018

I'm fine with whatever set of UMA you think makes sense. I think probably the merged one seems most useful, but you're right that it would be hard to reason about if we need to do any diagnostic work. My intuition is that the combined metric is enough, but I tend to have terrible intuition with metrics. :)

Comment 8 by a...@chromium.org, Mar 20 2018

As per the stability team email, this would require a Finch A/B test so that a "crash" jump could properly be attributed to this change and not cause them to lose track of other concurrent crashes.
Project Member

Comment 9 by bugdroid1@chromium.org, Mar 21 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a82d6decfeb4794681e0c25acc4f9ed1d6a574db

commit a82d6decfeb4794681e0c25acc4f9ed1d6a574db
Author: Avi Drissman <avi@chromium.org>
Date: Wed Mar 21 21:41:17 2018

Make RenderWidgetHostImpl's clock switchable for testing.

BUG=823087
TEST=none yet

Change-Id: I5c3c57fb380b2d7ec863d43b2e317f634ed40e5b
Reviewed-on: https://chromium-review.googlesource.com/973899
Reviewed-by: Will Harris <wfh@chromium.org>
Commit-Queue: Avi Drissman <avi@chromium.org>
Cr-Commit-Position: refs/heads/master@{#544856}
[modify] https://crrev.com/a82d6decfeb4794681e0c25acc4f9ed1d6a574db/content/browser/renderer_host/render_widget_host_impl.cc
[modify] https://crrev.com/a82d6decfeb4794681e0c25acc4f9ed1d6a574db/content/browser/renderer_host/render_widget_host_impl.h

Comment 11 by a...@chromium.org, Apr 7 2018

We have some very rough numbers from this UMA. Roughly 2.5% of page hangs (longer than 5s) are >30s. If we dropped the hung page timeout to 5s, we're talking about a 40× increase in hung pages. If we dropped it to 10s, it would be about an 8× increase.

Comment 12 by ojan@chromium.org, Apr 9 2018

It's hard to reason about whether users will care about a 40x increase without understanding that relative to other numbers like as a percentage of page loads.

If it's 40x of a sufficiently small number, then we can safely say that users won't be more frustrated by constant prompts. Anecdotally, I don't remember the last time I saw a hang monitor prompt, so I'm inclined to think that it's 40x of a sufficiently small number.

With other prompts we look at DPM, which is decisions / thousand page loads I believe. I expect this won't have a statistically meaningful impact on the real world DPM for users (that other prompts hugely drown this out).

It would be ideal to have desktop and mobile match (5s) IMO, but anything shorter than 30s seems worth a try to me.

Sign in to add a comment