
Issue 842878

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Closed: May 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 1
Type: Bug




[Stability] : Mac renderer crash rate spikes from 67.0.3396.30 to 67.0.3396.40 due to GPU_DEAD_ON_ARRIVAL.

Project Member Reported by pbomm...@chromium.org, May 14 2018

Issue description

UMA : https://uma.googleplex.com/timeline_v2?sid=28d2e886899cb10a68fbb565fa7a7a06

Chrome Dash : https://chromedash.googleplex.com/dashboard?dashboard=desktop-release-beta


Comparing 66.0.3359.106 with 67.0.3396.40, the following magic signatures have spiked by 1% or more: https://goto.google.com/xbjdh
	
crbug/840446 : Offscreen::getCG 	
crbug/819685 : blink::MarkingVisitor::Visit 	
crbug/820218 : v8::internal::ConcurrentMarkingVisitor::ProcessStrongHeapObject  
 

Comment 1 by gov...@chromium.org, May 14 2018

Cc: abdulsyed@chromium.org
Owner: ellyjo...@chromium.org
Status: Assigned (was: Untriaged)
+ellyjones@ (Mac TL), could you ptal and reassign if needed? Pls note we only have two beta releases left before M67 stable promotion.
I'm not observing a significant change in renderer crash rates on beta according to UMA. Chrome dash asserts that 3396.40 has double the CPM of 3396.30, but none of the crash signatures appear to have changed significantly. The only thing I can think of is more OOMs, potentially due to site isolation?
Screen Shot 2018-05-14 at 5.05.54 PM.png (228 KB)
Screen Shot 2018-05-14 at 5.07.13 PM.png (54.4 KB)

Comment 3 by gov...@chromium.org, May 14 2018

Cc: creis@chromium.org
Adding creis@ for site isolation experiment.
Cc: pbomm...@chromium.org
pbommana - can you clarify what data you're using that shows that the crash rate is elevated [more than expected] for beta renderer channel?

Comment 5 by creis@chromium.org, May 14 2018

Cc: nasko@chromium.org
Comment 2: Site Isolation generally reduces the number of OOMs (and the renderer CPMs on Mac in general), likely because we have more, smaller processes.

Here's a link to the renderer CPMs for Mac Beta with and without Site Isolation:
https://uma.googleplex.com/p/chrome/variations/?sid=8dce10390becb6d6b8a57a453e93ef78
@creis - thanks for the link.

Attaching two graphs of per-renderer and total memory usage comparing .30 and .40 - no significant movement between them.

It's not really clear to me how we could be doubling CPM uniformly across all crash categories. Maybe we've changed how we measure CPM or else Chrome dash has an accounting error?
Screen Shot 2018-05-14 at 5.24.47 PM.png (140 KB)
Screen Shot 2018-05-14 at 5.24.36 PM.png (161 KB)
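
To illustrate the hypothesis above that the way CPM is measured may have changed: a uniform doubling across every crash category is what a change in the normalizing denominator would produce, even with identical crash counts. The sketch below assumes CPM means crash count per million units of some usage denominator, which this thread does not define. All names and numbers are made up for illustration.

```python
# Assumption: "CPM" is a crash count normalized per million units of some
# usage denominator. The thread does not define it; this is only a sketch.

def cpm(crashes: int, denominator: int) -> float:
    """Crash count normalized per million units of the denominator."""
    return crashes * 1_000_000 / denominator

# Made-up per-category crash counts, identical for both builds.
crashes_by_category = {"oom": 120, "signature_a": 80, "signature_b": 40}

# Same crashes, but the second build is credited with half the usage.
before = {k: cpm(v, 2_000_000) for k, v in crashes_by_category.items()}
after = {k: cpm(v, 1_000_000) for k, v in crashes_by_category.items()}

# Every category moves by the same ratio, although no crash count changed.
ratios = {k: after[k] / before[k] for k in crashes_by_category}
print(ratios)
```

A real regression in one signature would instead move that category's ratio while leaving the others near 1.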
@creis - btw, per-process memory usage is lower for OOPIF, but total memory usage is higher:
https://uma.googleplex.com/p/chrome/variations/?sid=4b6242d9e68f96cccf380e10d09c070c
Screen Shot 2018-05-14 at 5.28.45 PM.png (189 KB)

Comment 8 by creis@chromium.org, May 14 2018

Right-- total memory use is expected to go up with Site Isolation, but we haven't seen that affect renderer OOM reports in practice.
Got it - we're seeing a spike of ~500 renderer crash exit codes, most of which are GPU_DEAD_ON_ARRIVAL.
Screen Shot 2018-05-14 at 5.31.41 PM.png (222 KB)
Screen Shot 2018-05-14 at 5.31.54 PM.png (185 KB)
Cc: piman@chromium.org ellyjo...@chromium.org sunn...@chromium.org
Components: Internals>GPU
Owner: ----
Status: Untriaged (was: Assigned)
Summary: [Stability] : Mac renderer crash rate spikes from 67.0.3396.30 to 67.0.3396.40 due to GPU_DEAD_ON_ARRIVAL. (was: [Stability] : Mac renderer crash rate has elevated on M67. )
+ sunnyps, piman. Also adding to GPU triage queue.

Comment 11 by piman@chromium.org, May 14 2018

@#9: renderer exits with GPU_DEAD_ON_ARRIVAL? Sounds like a bucketing issue; the only thing that returns that is the GPU process: https://cs.chromium.org/search/?q=RESULT_CODE_GPU_DEAD_ON_ARRIVAL&sq=package:chromium
This graph shows all UMA-reported renderer crashes. The remainder of the ~500 spike is caused by INVALID_CMDLINE_URL, but that appears to have a natural variation of a couple hundred across beta versions. The spike in GPU_DEAD_ON_ARRIVAL appears to be the only major change between versions.
Screen Shot 2018-05-14 at 5.43.27 PM.png (226 KB)
Owner: wfh@chromium.org
Status: Assigned (was: Untriaged)
This UMA histogram is reporting ChildProcessTerminationInfo::exit_code:

https://cs.chromium.org/chromium/src/chrome/browser/metrics/chrome_stability_metrics_provider.cc?g=0&l=101

The exit code appears to be just that:
https://cs.chromium.org/chromium/src/base/process/kill_posix.cc?type=cs&g=0&l=45

The termination status has some more information: https://cs.chromium.org/chromium/src/base/process/kill_posix.cc?type=cs&g=0&l=56

So it's really not clear to me how this UMA histogram is supposed to use the CrashExitCodes enum. Specifically, we're seeing a spike of exit code 4 from renderers. Maybe this is divide by zero? As piman@ points out, RESULT_CODE_GPU_DEAD_ON_ARRIVAL makes no sense as that's only well defined for the GPU process.

Over to histogram owner wfh@ to make sense of this.
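
The bucketing concern above can be sketched as follows. A raw integer from a renderer is labeled through an enum-style table regardless of which process produced it, so the value 4 shows up under a GPU-only name. The table holds only the one mapping this thread implies (4 as GPU_DEAD_ON_ARRIVAL). The function names and the process-type guard are hypothetical and are not Chromium code.

```python
# Only the single value this thread implies; the real enum has more entries.
CRASH_EXIT_CODE_LABELS = {4: "GPU_DEAD_ON_ARRIVAL"}

def histogram_label(raw_exit_code: int) -> str:
    """Label a bucket the way an enum-keyed histogram view would."""
    return CRASH_EXIT_CODE_LABELS.get(raw_exit_code, f"unlabeled({raw_exit_code})")

def label_with_process_type(raw_exit_code: int, process_type: str) -> str:
    """Hypothetical guard: trust GPU-specific labels only for the GPU process."""
    label = histogram_label(raw_exit_code)
    if label.startswith("GPU_") and process_type != "gpu":
        # Fall back to the raw number instead of a misleading name.
        return f"raw({raw_exit_code})"
    return label

print(histogram_label(4))
print(label_with_process_type(4, "renderer"))
```

With a guard like this, a renderer bucket would surface as the raw value 4, which makes it easier to ask what else that number could mean.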


Comment 14 by wfh@chromium.org, May 16 2018

Owner: ----
Status: Available (was: Assigned)
sorry I know nothing about macOS stability.
Owner: ellyjo...@chromium.org
Status: Assigned (was: Available)
Assigning back to ellyjones@ (Mac TL).

Comment 16 by piman@chromium.org, May 16 2018

If it's due to a crash, I believe the status is the signal number, and 4 would be SIGILL.

Can we check in crash/ ?
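
As a rough illustration of the signal-number reading above: on Linux and macOS SIGILL is signal 4 (SIGFPE, the usual result of an integer divide by zero, is 8). The sketch uses Python's `subprocess` convention of reporting death by signal N as return code -N. `describe_returncode` is a hypothetical helper and this is not how Chromium collects the value.

```python
import signal
import subprocess
import sys

def describe_returncode(returncode: int) -> str:
    """Describe a subprocess-style return code on POSIX.

    Python reports a child killed by signal N as return code -N.
    """
    if returncode < 0:
        sig = signal.Signals(-returncode)
        return f"killed by {sig.name} (signal {sig.value})"
    return f"exited with code {returncode}"

if __name__ == "__main__":
    # The child sends SIGILL to itself to simulate an illegal-instruction crash.
    child = subprocess.run(
        [sys.executable, "-c",
         "import os, signal; os.kill(os.getpid(), signal.SIGILL)"]
    )
    print(describe_returncode(child.returncode))
```

If a metric records only the bare number, a crash from signal 4 and a deliberate exit with code 4 become indistinguishable, which is consistent with the confusion in this thread.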
M67 Stable promotion is coming soon. Your bug is labelled as Stable ReleaseBlock, pls make sure to land the fix and request a merge into the release branch ASAP.

If the fix is ready to be merged by Monday 4:00 PM PT, we can take it in for next week's last M67 beta release. Thank you.
The only place we're seeing this regression is in the UMA metric. crash/ is not showing any movement.

Cc: rkaplow@chromium.org
+rkaplow@, ptal comment #13 and #18. Thank you.

Comment 20 by rkaplow@google.com, May 18 2018

I'm having trouble making sense of this analysis so far. The Mac renderer CPM did seem to go up; however, the signal is very noisy and it has been at this level recently:
https://uma.googleplex.com/timeline_v2?sid=28d2e886899cb10a68fbb565fa7a7a06


Based on https://uma.googleplex.com/timeline_v2?sid=1f763764c6b6589d72ce565e5223ff50
The increase could be the GPU

However, if we look at it on a 1-day basis
https://uma.googleplex.com/timeline_v2?sid=1a0f523cc03ac6a74d696f0bcb942fe0
it was just a one-day spike and isn't recurring. And it's only

So I'm not sure this is worth investigating.
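
The one-day-versus-recurring distinction above can be sketched as a simple check on daily counts. The baseline window, the 1.5x threshold, the function names, and the numbers are all assumptions for illustration. None of it is data from this issue or the logic behind the UMA timeline.

```python
from statistics import median

def elevated_days(daily_counts, baseline_days=3, factor=1.5):
    """Indices of days after the baseline window that exceed factor * baseline."""
    baseline = median(daily_counts[:baseline_days])
    return [
        i for i, count in enumerate(daily_counts)
        if i >= baseline_days and count > factor * baseline
    ]

def is_isolated_spike(daily_counts, baseline_days=3, factor=1.5):
    """True when exactly one day is elevated, i.e. the jump does not recur."""
    return len(elevated_days(daily_counts, baseline_days, factor)) == 1

# Made-up daily counts; not taken from the dashboards linked in this issue.
one_day = [100, 110, 95, 600, 105, 98, 102]
sustained = [100, 110, 95, 600, 580, 610, 590]

print(is_isolated_spike(one_day), is_isolated_spike(sustained))
```

A multi-day aggregate would smear the single elevated day across the window and could look like a sustained rise, which is why the 1-day view is the more telling one here.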
Labels: -ReleaseBlock-Stable
Status: WontFix (was: Assigned)
Thanks rkaplow. You're right, the 1-day view makes it clear that this is just a 1-day spike. Let's WontFix this for now and reopen if there's something more actionable.
