Issue 712273

Starred by 4 users

Issue metadata

Status: WontFix
Owner: ----
Closed: May 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug-Regression




M58 Stability : Spike in browser crash rate for Windows.

Project Member Reported by ligim...@chromium.org, Apr 17 2017

Issue description

This bug tracks the spike in the browser crash rate on Windows.

58.0.3029.68 - BETA 
===================

UMA Browser histogram
=====================
https://uma.googleplex.com/timeline_v2?sid=589c19e47093b0f71e94ae3ed6314072

There was a spike in Windows browser crashes in M58, but there is no obvious regression in go/chromecrash.

Version comparison of M58 Vs M57
================================

https://crash.corp.google.com/browse?q=product.name%3D%27Chrome%27AND%20custom_data.ChromeCrashProto.ptype%3D%27browser%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D&compProp=product.Version&v1=58.0.3029.68&v2=57.0.2987.98#stablesignature:1000

Adding to the stability sheriffs' queue for investigation.
 
Summary: M58 Stability : Spike in browser crash rate for Windows. (was: M52 Stability : Spike in browser crash rate for Windows.)
It's hard to pinpoint any one specific signature that's causing the roughly 22% increase in CPM (https://uma.googleplex.com/timeline_v2?sid=9a6089f3d1d2f1ca00b5a8bd16e9f02e). The increase seems to be across all Windows platforms. crbug/707735 caused some of the spike, but it was fixed last week and the fix is reflected in 58.0.3029.68.

Comparing 57 and 58 (latest), nothing really sticks out as a clear culprit for the CPM increase.

A few potential signatures that may be causing the spike (based on crash report comparison):
crbug/614753 - StartupBrowserCreator::ProcessCmdLineImpl
crbug/710420 - views::View::SchedulePaint
crbug/697827 - content::NavigationControllerImpl::RendererDidNavigateToExistingPage
Owner: bjoyce@chromium.org
Status: Assigned (was: Untriaged)
A big part of this might be hangs? The crash reason "Simulated Exception" accounts for 12% of crashes in 58.0.3029.33 but only 1.7% in 58.0.3029.19:

https://crash.corp.google.com/dremel_query_ui?q=SELECT%20product.version%2C%20SUM(hit)%2FCOUNT(*)%0AFROM%20(SELECT%20product.version%2C%20crash.reason%20%3D%20%27Simulated%20Exception%27%20AS%20hit%0AFROM%20crash.prod.latest%0AWHERE%20product.name%20%3D%20%27Chrome%27%20AND%20custom_data.ChromeCrashProto.ptype%20%3D%20%27browser%27%20AND%20custom_data.ChromeCrashProto.channel%20%3D%20%27beta%27%0A)GROUP%20BY%201%0AORDER%20BY%201%20DESC

https://crash.corp.google.com/dremel_query_ui?q=SELECT%20a.x%2C%20a.crash.reason%2C%20a.N%20%2F%20b.Tot%0AFROM%20(SELECT%20product.version%20AS%20x%2C%20crash.reason%2C%20COUNT(*)%20as%20N%0AFROM%20crash.prod.latest%0AWHERE%20product.name%3D%27Chrome%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27browser%27%20AND%20%2757%27%20%3C%20product.Version%20AND%20product.version%20%3C%20%276%27%20AND%20custom_data.ChromeCrashProto.channel%20%3D%20%27beta%27%0AGROUP%20BY%20x%2C%20crash.reason)%20AS%20a%0AJOIN%20(SELECT%20product.version%20AS%20y%2C%20COUNT(*)%20AS%20Tot%0AFROM%20crash.prod.latest%0AWHERE%20product.name%3D%27Chrome%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27browser%27%20AND%20%2757%27%20%3C%20product.Version%20AND%20product.version%20%3C%20%276%27%20AND%20custom_data.ChromeCrashProto.channel%20%3D%20%27beta%27%0AGROUP%20BY%20y)%20AS%20b%0AON%20a.x%20%3D%20b.y%0AORDER%20BY%20a.x%20DESC%2C%20a.crash.reason
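For anyone without access to the query UI, the first query above boils down to SUM(hit)/COUNT(*) per version, i.e. the fraction of reports whose crash reason matches. A minimal Python sketch of the same computation follows. The function name and the toy records are hypothetical and are not real crash data; only the field meanings (product.version, crash.reason) come from the query.

```python
from collections import defaultdict


def reason_share_by_version(reports, reason):
    """Return {version: fraction of reports whose crash reason equals `reason`}.

    `reports` is an iterable of (version, crash_reason) pairs, mirroring
    product.version and crash.reason in the query.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for version, crash_reason in reports:
        totals[version] += 1
        if crash_reason == reason:
            hits[version] += 1
    return {version: hits[version] / totals[version] for version in totals}


# Toy records, invented purely for illustration (NOT real crash data).
toy_reports = [
    ("58.0.3029.33", "Simulated Exception"),
    ("58.0.3029.33", "Simulated Exception"),
    ("58.0.3029.33", "EXCEPTION_BREAKPOINT"),
    ("58.0.3029.33", "EXCEPTION_ACCESS_VIOLATION_READ"),
    ("58.0.3029.19", "Simulated Exception"),
    ("58.0.3029.19", "EXCEPTION_BREAKPOINT"),
    ("58.0.3029.19", "EXCEPTION_ACCESS_VIOLATION_READ"),
    ("58.0.3029.19", "EXCEPTION_ACCESS_VIOLATION_READ"),
]

shares = reason_share_by_version(toy_reports, "Simulated Exception")
for version in sorted(shares, reverse=True):
    print(version, shares[version])
```

The second query does the same thing for every crash reason at once by joining per-reason counts against per-version totals.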

There's a jump in frames w/ purecall in .54 (2% of browser crashes) continuing in .68 (4.7%), which is probably the issue with third-party software in Issue 710420:

https://crash.corp.google.com/dremel_query_ui?q=SELECT%20product.version%2C%20COUNT(*)%0AFROM%20crash.prod.latest%0AWHERE%20product.name%3D%27Chrome%27%20AND%20custom_data.ChromeCrashProto.channel%20%3D%20%27beta%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27browser%27%0AOMIT%20RECORD%20IF%20SUM(CrashedStackTrace.StackFrame.FunctionName%3D%27purecall%27)%20%3D%200%0AGROUP%20BY%201%0AORDER%20BY%201%20DESC
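The query links in this thread are hard to read in percent-encoded form. A small sketch using only the Python standard library can recover the query text from the `q` parameter; the helper name `decode_dremel_query` is made up here, and the URL is the purecall link above copied verbatim.

```python
from urllib.parse import parse_qs, urlsplit


def decode_dremel_query(url):
    """Return the decoded text of the `q` parameter of a dremel_query_ui link."""
    return parse_qs(urlsplit(url).query)["q"][0]


# The purecall query link from this comment, split only for line length.
url = (
    "https://crash.corp.google.com/dremel_query_ui?q="
    "SELECT%20product.version%2C%20COUNT(*)%0A"
    "FROM%20crash.prod.latest%0A"
    "WHERE%20product.name%3D%27Chrome%27%20AND%20"
    "custom_data.ChromeCrashProto.channel%20%3D%20%27beta%27%20AND%20"
    "custom_data.ChromeCrashProto.ptype%3D%27browser%27%0A"
    "OMIT%20RECORD%20IF%20SUM(CrashedStackTrace.StackFrame.FunctionName"
    "%3D%27purecall%27)%20%3D%200%0A"
    "GROUP%20BY%201%0A"
    "ORDER%20BY%201%20DESC"
)

query = decode_dremel_query(url)
print(query)
```

Decoded, the query counts beta-channel browser crash reports per version, keeping only records that have at least one stack frame named purecall.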

That would still leave about 5% unexplained, though.

It looks like 32-bit regressed more markedly than 64-bit between .19 and .33 so slicing by bitness might be useful:

https://uma.googleplex.com/timeline_v2?sid=4c32160add5cff76073cce1b2dca2982
Cc: dominicc@chromium.org

Comment 6 by bjoyce@chromium.org, Apr 18 2017

From 58.0.3029.19 to 58.0.3029.54, 32-bit shows a markedly higher spike than 64-bit.

Renderer crashes are going down on both 64-bit and 32-bit, but browser crashes are increasing.

Comment 7 by mfo...@chromium.org, Apr 19 2017

Bug 710420 is attributed to AV software. It may be that the vendor released an update at this time that exacerbated an existing issue. Unfortunately it looks like there is little to be done on our end.

Bug 614753 (which I have looked at in a previous sheriffing shift) is attributed to third party software, and perhaps additional distribution of it is triggering an increase in frequency.

A partial fix to Bug 697827 was merged into the M58 branch on 3/17.  It looks like for some reason the fix on desktop will be different, so I will try to find an owner.



Comment 8 by mfo...@chromium.org, Apr 19 2017

Looking at comparative crash reports across 58.0.3029.68 and 58.0.3029.19 shows a big spike in base::i18n::InitializeICU (Bug 445616) in .19 which resolved itself in .68.  Third party software interference is also suspected in this bug.

There are also a lot of utility process crashes (Bug 704495), but I believe that these are existing crashes whose stack traces were redistributed by a server-side change in Crash.

I pinged Bug 697827; I'm not sure there is a lot to follow up on otherwise.



Comment 9 by bjoyce@chromium.org, Apr 19 2017

I talked to amineer@, but he is a bit swamped. There is a spike in renderer crashes, but it seems to be stack trace renaming on old bugs. There is also a new bug: https://bugs.chromium.org/p/chromium/issues/detail?id=712969

Comment 10 by w...@chromium.org, Apr 27 2017

Working purely on the data in Stability.Counts UMA, sliced by version rather than channel I see:

57.0.2987.88 -> 57...98 - ~13% increase.
57...98 -> 58.0.3029.19 - ~7% increase.
58...19 -> 58...33 - No significant change.
58...33 -> 58...41 - No significant change.
58...41 -> 58...68 - ~5% reduction.

That seems to fit with the InitializeICU comment describing a spike between ...19 and ...68, but leaves the 13% increase between 57...88 and ...98 to explain.
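To see what the per-version steps above add up to, the changes can be compounded rather than summed. This is only a sketch: it assumes the last step is a 5% reduction (the percent sign is missing in the list), treats "no significant change" as 0%, and takes the "~" figures at face value, so the result is a rough estimate and not a measured number.

```python
def compound_change(percent_changes):
    """Combine successive percentage changes into one net percentage change."""
    factor = 1.0
    for pct in percent_changes:
        factor *= 1.0 + pct / 100.0
    return (factor - 1.0) * 100.0


# Steps as listed in this comment; labels are kept as written there.
steps = [
    ("57.0.2987.88 -> 57...98", 13.0),
    ("57...98 -> 58.0.3029.19", 7.0),
    ("58...19 -> 58...33", 0.0),   # "no significant change" taken as 0%
    ("58...33 -> 58...41", 0.0),   # "no significant change" taken as 0%
    ("58...41 -> 58...68", -5.0),  # assumed to mean a 5% reduction
]

net = compound_change(pct for _, pct in steps)
print(f"net change from 57.0.2987.88 to 58...68: {net:.1f}%")
```

Under those assumptions the net change from 57.0.2987.88 to 58...68 comes out a little under 15%, most of it from the 57...88 to 57...98 step.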

Comment 11 by w...@chromium.org, Apr 27 2017

Cc: roc...@chromium.org scottmg@chromium.org bjoyce@chromium.org
Components: Internals>Mojo
Owner: brucedaw...@chromium.org
Re #10: Hmmm, the numbers fit better with a spike from M57...98->M58...19 once you re-introduce channel, which of course is important due to M57...98 having been promoted to Beta at that time.

The signatures that I see in the top-ten only starting from M58...19 on beta-channel, and still there in M58...81 are:
- StartupBrowserCreator::ProcessCmdLineImpl (see issue 614753, filed for M52; third-party software injecting stuff).
- mojo::edk::Core::WaitManyInternal (see issue 627960, filed for M54; GPU-related hangs being mis-attributed to Mojo).

Bruce, could you take a look at those two signatures to see if there's anything actionable?
For common crashes you can sum the CPM table like this:

https://crash.corp.google.com/dremel_query_ui?q=SELECT%20cpm_info.version_cpms.version%2C%20SUM(cpm_info.version_cpms.cpm)%0AFROM%20FLATTEN((SELECT%20Signature%2C%20cpm_info.version_cpms.version%2C%20cpm_info.version_cpms.cpm%0AFROM%20crash.analysis.prod.latest%0AWHERE%20product%20%3D%20%27Chrome%27%20AND%20cpm_info.channel%20%3D%20%27beta%27%0AAND%20Signature%20IN%20(SELECT%20custom_data.ChromeCrashProto.magic_signature_1.name%20FROM%20crash.prod.latest%20WHERE%20product.name%20%3D%20%27Chrome%27%20AND%20custom_data.ChromeCrashProto.ptype%20%3D%20%27browser%27%20AND%20custom_data.ChromeCrashProto.channel%20%3D%20%27beta%27%20AND%20crash.reason%20%3D%20%27Simulated%20Exception%27%20AND%20product.version%20LIKE%20%2758%25%27)%20AND%20cpm_info.version_cpms.version%20LIKE%20%2758%25%27)%2C%20cpm_info.version_cpms)%0AGROUP%20BY%201%0AORDER%20BY%201%20DESC

In this case it excludes 9,649 reports in 1,153 signatures (these ones
<https://crash.corp.google.com/dremel_query_ui?q=SELECT%20custom_data.ChromeCrashProto.magic_signature_1.name%2C%20COUNT(*)%20FROM%20crash.prod.latest%20WHERE%20product.name%20%3D%20%27Chrome%27%20AND%20custom_data.ChromeCrashProto.ptype%20%3D%20%27browser%27%20AND%20custom_data.ChromeCrashProto.channel%20%3D%20%27beta%27%20AND%20crash.reason%20%3D%20%27Simulated%20Exception%27%20AND%20product.version%20LIKE%20%2758%25%27%0AAND%20custom_data.ChromeCrashProto.magic_signature_1.name%20NOT%20IN%20(SELECT%20Signature%0AFROM%20crash.analysis.prod.latest%0AWHERE%20cpm_info.channel%20%3D%20%27beta%27%20AND%20product%20%3D%20%27Chrome%27%0AAND%20cpm_info.version_cpms.version%20LIKE%20%2758%25%27)%0AGROUP%20BY%201%0AORDER%20BY%202%20DESC%2C%201>)
which don't appear in the analysis table. That's 64% of reports. So you might be able to impute a CPM from that? I guess for 58.0.3029.81 it is around 1.2 CPM.

Note that you can't compare the 1.2 CPM from crash to the UMA CPM because they measure different things: crash measures crash reports per million page loads; UMA measures unclean shutdowns per million page loads.
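To make the distinction concrete, both metrics share a denominator (page loads) but count different events in the numerator. A tiny sketch, with all counts invented for illustration only (they are not measurements from crash or UMA):

```python
def per_million(events, page_loads):
    """Events per million page loads."""
    return events * 1_000_000 / page_loads


# Hypothetical counts for one version; NOT real data.
page_loads = 50_000_000
crash_reports = 60           # what crash counts: uploaded crash reports
unclean_shutdowns = 15_000   # what UMA counts: unclean shutdowns

crash_cpm = per_million(crash_reports, page_loads)
uma_cpm = per_million(unclean_shutdowns, page_loads)
print(f"crash CPM: {crash_cpm}, UMA CPM: {uma_cpm}")
```

Because the numerators differ, the two figures can sit orders of magnitude apart for the same build, so only trends within one source are comparable.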

Comment 15 by creis@chromium.org, May 24 2017

ligimole@: Just checking in as stability sheriff.  It looks like the crash rate on the original link (https://uma.googleplex.com/timeline_v2?sid=589c19e47093b0f71e94ae3ed6314072) has fallen back down again, closer to where it was (though not quite as low as it was in 59.0.3071.19).

Is it worth putting more time into this issue at this point?
Status: WontFix (was: Available)
Currently M58 is in stable and seeing ~300 CPM, which is closer to M57. Looks like the issue is resolved; closing now.

https://uma.googleplex.com/timeline_v2?sid=40ccec39a4579e91695f35c6fa2cebd9
