
Issue 800511

Starred by 12 users

Issue metadata

Status: Started
Owner:
Cc:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug

Blocked on:
issue 774288
issue 806661




Renderer CPM higher on Windows Stable starting in M64

Project Member Reported by bustamante@chromium.org, Jan 9 2018

Issue description

Chrome Version: 64.0.3282.39
OS: Windows

Renderer CPM increased significantly around Dec 16th and was still high at the end of the month, jumping from around 120 CPM to roughly 155 CPM, almost a 30% increase.
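For reference, a minimal sketch of the arithmetic behind the "almost 30%" figure, using only the approximate CPM values quoted above. The helper name `percent_increase` is hypothetical and not part of any Chrome tooling.

```python
def percent_increase(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Approximate renderer CPM values quoted in the report.
cpm_before = 120.0
cpm_after = 155.0

increase = percent_increase(cpm_before, cpm_after)
print(f"Renderer CPM increase: {increase:.1f}%")
```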

https://uma.googleplex.com/p/chrome/timeline_v2?sid=12bfb5044a2c0cf51e7462676f878c18


 

Comment 1 by wfh@chromium.org, Jan 9 2018

Cc: abdulsyed@chromium.org wfh@chromium.org
Labels: -Pri-3 Stability OS-Windows Pri-2
Dec 16th would fit with M64 reaching Beta. https://omahaproxy.appspot.com/history?os=win&channel=beta
This renderer spike is an aggregation of all "beta" milestones starting with M62. If you select M64 alone, you see a dip in the renderer CPM.

M64 Beta: https://uma.googleplex.com/p/chrome/timeline_v2?sid=f43b0f1504993caadd4eafbb563ee069

>=M62 Beta: https://uma.googleplex.com/p/chrome/timeline_v2?sid=3bb0f03ee7c90dfccf54b4b5b5f302ea
More analysis here: https://docs.google.com/document/d/1AVlTmcrSxYwk0rxQqjSWD0FQtqiLcPh1a2N_8XOu1qE/edit

The main difference in renderer crashes between M63 and M64 Beta is a spike of about 10% in the "[Out of Memory] base::PartitionBucket::SlowPathAlloc" magic signature (crbug/788293).

Regarding #2: since page loads drop after a version update, I am not sure we can be confident in the data.
https://uma.googleplex.com/timeline_v2?sid=642990975f2ba3e96aaf438a94a58e96 

Is there a way to see which experiments may have been turned on and may be causing the increase in renderer CPM?


Status: Available (was: Untriaged)
Cc: hablich@chromium.org palmer@chromium.org adamk@chromium.org
Labels: -Pri-2 ReleaseBlock-Stable Pri-1
Labels: Stability-Memory

Comment 8 by adamk@chromium.org, Jan 16 2018

Cc: yangguo@chromium.org hpayer@chromium.org
Adding this week's V8 stability and memory sheriffs.

Comment 9 by siggi@chromium.org, Jan 17 2018

Looks like there's a 50% bump in sandbox OOM kills on 64 bits around this time too. https://uma.googleplex.com/p/chrome/timeline_v2/?sid=234d5838a2d54c6b203bc5df37a581e6
Dev also increases around the same time and version: https://uma.googleplex.com/p/chrome/timeline_v2?sid=f71bf72d9456c459b3dbd45d5297e7ce 

65 looks normal: https://uma.googleplex.com/p/chrome/timeline_v2?sid=f1ffdba3a3215fba0f8da093fe628b6d

Long shot: was some Finch experiment rolled out or ramped up on 64 only? Maybe site isolation related?
Cc: bbudge@chromium.org
Cc: haraken@chromium.org
I've prepared a patch to show how to disable address space reservation on 32-bit Windows. It's a speculative fix that we may want to merge to M64 if we can't identify the cause of the OOMs.

It should merge cleanly, and I think it's a safe change.

I would expect this to help allocation failures that don't involve base/allocator/partition_allocator/page_allocator. Note that partition_alloc uses page_allocator (see comment #3).

https://chromium-review.googlesource.com/c/chromium/src/+/874755
Cc: rkaplow@chromium.org
Owner: wfh@chromium.org
I've attached vmmap (available as part of the sysinternals suite at https://docs.microsoft.com/en-us/sysinternals/downloads/sysinternals-suite) recordings of slashdot.org from M63 and M65. The bug is quite obvious once you know what to look for.

If we look at the Private Data line we can see that Private WS and Committed are both lower in M65 compared to M63. However, Size (reserved address space) is higher.

If we select the Private Data line and then sort the bottom data by Size, the huge number of allocations with 960 K as their size and 512 K as their committed/private/Total WS/Private WS becomes obvious. Each of these 960 K allocations also consists of two blocks, one committed and one reserved. On M63 the corresponding allocations have a size of 512 K and a single block.

A vmmap capture of a build with the fix will make it obvious whether the fix worked or not.

I don't know why the private committed data is lower in M65, but that's good.

Attachments:
ChromeM63.mmp (694 KB)
ChromeM65.mmp (672 KB)
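One possible reading of the 960 K figure, sketched under two assumptions that the thread does not state: that each chunk must be aligned to 512 K, and that reservations are made at the usual 64 K Windows allocation granularity. Under those assumptions, a padded request of size + alignment - granularity comes to exactly 960 K, and an untrimmed allocation leaves 448 K of address space reserved but unused. The block count used at the end is purely hypothetical.

```python
KIB = 1024

# Assumptions for illustration only (not stated in the thread).
ALLOCATION_GRANULARITY = 64 * KIB  # typical Windows allocation granularity
ALIGNMENT = 512 * KIB              # assumed alignment requirement per chunk
SIZE = 512 * KIB                   # committed size observed in vmmap


def padded_request(size: int, alignment: int, granularity: int) -> int:
    """Bytes to reserve so that an aligned `size`-byte range always fits inside."""
    return size + alignment - granularity


padded = padded_request(SIZE, ALIGNMENT, ALLOCATION_GRANULARITY)
wasted_per_block = padded - SIZE


def wasted_address_space(block_count: int) -> int:
    """Reserved-but-unused bytes if no padded allocation is ever trimmed."""
    return block_count * wasted_per_block


print(padded // KIB, "K reserved per block")
print(wasted_per_block // KIB, "K left untrimmed per block")
# Hypothetical block count, chosen only to show the scale on a 32-bit process.
print(wasted_address_space(1000) // (KIB * KIB), "MiB wasted across 1000 blocks")
```

On a 32-bit renderer with roughly 2 GB of usable address space, waste on this scale would be enough to make later reservations fail, which would be consistent with the OOM signatures discussed above.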

Comment 16 by wfh@chromium.org, Jan 19 2018

Labels: -Restrict-View-Google

Comment 17 by bugdroid1@chromium.org, Jan 19 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/a9aeaa65e654a1590695f82dc80c1ce9e4c02ef4

commit a9aeaa65e654a1590695f82dc80c1ce9e4c02ef4
Author: Bill Budge <bbudge@chromium.org>
Date: Fri Jan 19 10:17:02 2018

[memory] Change OS::Allocate on Windows to not over allocate.

- Changes OS::Allocate to first try an exact size aligned
  allocation, then padded allocations. All padded allocations should
  be trimmed.

Bug: chromium:800511
Change-Id: Iccab2eddbf2a3b08d2b83b95f96c766c9fad7a82
Reviewed-on: https://chromium-review.googlesource.com/875242
Reviewed-by: Hannes Payer <hpayer@chromium.org>
Commit-Queue: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#50706}
[modify] https://crrev.com/a9aeaa65e654a1590695f82dc80c1ce9e4c02ef4/src/base/platform/platform-cygwin.cc
[modify] https://crrev.com/a9aeaa65e654a1590695f82dc80c1ce9e4c02ef4/src/base/platform/platform-win32.cc
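The strategy described in the commit message (try an exact-size allocation first, fall back to a padded one, and always trim the padding) can be sketched with a toy address-space model. This is not the V8 code: `ToyAddressSpace`, `allocate_aligned`, and the 64 K granularity are hypothetical stand-ins, and trimming is modelled as releasing the padded range and re-reserving only the aligned part, which may differ from what the real change does.

```python
KIB = 1024
GRANULARITY = 64 * KIB  # assumed OS allocation granularity


def round_up(value, multiple):
    return (value + multiple - 1) // multiple * multiple


class ToyAddressSpace:
    """Hypothetical stand-in for the OS: hands out granularity-aligned ranges."""

    def __init__(self, first_free):
        self.next_free = first_free
        self.reserved = {}  # base address -> size in bytes

    def reserve(self, size, at=None):
        base = self.next_free if at is None else at
        end = base + size
        # Refuse requests that overlap an existing reservation.
        for other_base, other_size in self.reserved.items():
            if base < other_base + other_size and other_base < end:
                return None
        self.reserved[base] = size
        self.next_free = max(self.next_free, round_up(end, GRANULARITY))
        return base

    def release(self, base):
        del self.reserved[base]


def allocate_aligned(space, size, alignment):
    # Step 1: exact-size request; keep it if it happens to be aligned.
    base = space.reserve(size)
    if base is not None and base % alignment == 0:
        return base
    if base is not None:
        space.release(base)
    # Step 2: padded request, then trim back to exactly `size` bytes.
    padded = size + alignment - GRANULARITY
    base = space.reserve(padded)
    if base is None:
        return None
    aligned = round_up(base, alignment)
    space.release(base)                     # drop the whole padded range
    return space.reserve(size, at=aligned)  # keep only the aligned part


space = ToyAddressSpace(first_free=64 * KIB)
first = allocate_aligned(space, 512 * KIB, 512 * KIB)
second = allocate_aligned(space, 512 * KIB, 512 * KIB)
print(first // KIB, second // KIB, sorted(space.reserved.values()))
```

In this model the first request falls through to the padded path and the second succeeds on the exact-size path, and in both cases only 512 K stays reserved per allocation.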

Comment 18 by w...@chromium.org, Jan 19 2018

Looks like this was the likely root-cause of issue 799837.
I agree. My "fix" helped by releasing address space, but didn't address the root cause.

Comment 20 by wfh@chromium.org, Jan 19 2018

Owner: bbudge@chromium.org
Status: Started (was: Available)
Looks like we need to check that the fix in #17 looks good in vmmap for the next canary, then merge this to M64. I'm assigning to Bill since it looks like he's best placed to make sure this happens.
Labels: Merge-Request-64
Status: Fixed (was: Started)
I verified allocations which weren't trimmed (512K with 960K reserved) are now trimmed. Browser seemed stable with a lot of clicking around.

V8 team are cc'ed to assist with merging if needed. Thanks everyone for your help fixing this.

Comment 22 by sheriffbot@chromium.org, Jan 20 2018

Labels: -Merge-Request-64 Hotlist-Merge-Review Merge-Review-64
This bug requires manual review: We are only 2 days from stable.
Please contact the milestone owner if you have questions.
Owners: cmasso@(Android), cmasso@(iOS), kbleicher@(ChromeOS), abdulsyed@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 23 by wfh@chromium.org, Jan 20 2018

I think the fix is in v8 6.6.1 which is https://chromium.googlesource.com/v8/v8/+log/refs/heads/6.6.1 and is not yet on Canary but should be in 66.0.3326.0 (and it will also need an M65 merge).

Since we won't be cutting a new Beta until Monday (at the earliest) I vote for giving this 24hrs or so on M66 Canary, then merging to M65 then M64 on Monday morning.
This is already merged to 6.5. 
https://chromium.googlesource.com/v8/v8.git/+log/6.5-lkgr
To clarify, I tested a pre-release Canary build, 65.0.3325.3, 32 bit on Windows.
Labels: -Merge-Review-64 Merge-Approved-64
Approving merge for M64 (conditionally, once it's well tested and verified in Canary with over a weekend's worth of data to ensure no issues, regressions, or stability concerns).
Owner: hpayer@chromium.org
Status: Assigned (was: Fixed)
hpayer@ - can you please merge this to M64 V8 branch? bbudge@ is OOO. (marking it as assigned until fix is landed in 6.4)

Comment 28 by bugdroid1@chromium.org, Jan 22 2018

Labels: merge-merged-6.4
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/66948e96674bfccad51219e29e3650957b714099

commit 66948e96674bfccad51219e29e3650957b714099
Author: Hannes Payer <hpayer@chromium.org>
Date: Mon Jan 22 15:06:58 2018

Merged: [memory] Change OS::Allocate on Windows to not over allocate.

Revision: a9aeaa65e654a1590695f82dc80c1ce9e4c02ef4

BUG=chromium:800511
LOG=N
NOTRY=true
NOPRESUBMIT=true
NOTREECHECKS=true
R=mlippautz@chromium.org

Change-Id: I59b42759b36af19824acc88742c498736725cda5
Reviewed-on: https://chromium-review.googlesource.com/878621
Reviewed-by: Michael Lippautz <mlippautz@chromium.org>
Cr-Commit-Position: refs/branch-heads/6.4@{#80}
Cr-Branched-From: 0407506af3d9d7e2718be1d8759296165b218fcf-refs/heads/6.4.388@{#1}
Cr-Branched-From: a5fc4e085ee543cb608eb11034bc8f147ba388e1-refs/heads/master@{#49724}
[modify] https://crrev.com/66948e96674bfccad51219e29e3650957b714099/src/base/platform/platform-cygwin.cc
[modify] https://crrev.com/66948e96674bfccad51219e29e3650957b714099/src/base/platform/platform-win32.cc

Status: Fixed (was: Assigned)
Labels: -Merge-Approved-64

Comment 31 by bugdroid1@chromium.org, Jan 24 2018

Labels: merge-merged-6.5
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/8fa2c8d2189b7847ce46b05531b52d7091487b96

commit 8fa2c8d2189b7847ce46b05531b52d7091487b96
Author: Michael Hablich <hablich@chromium.org>
Date: Wed Jan 24 15:37:48 2018

Merged: [memory] Change OS::Allocate on Windows to not over allocate.

Revision: a9aeaa65e654a1590695f82dc80c1ce9e4c02ef4

BUG=chromium:800511,805439
LOG=N
NOTRY=true
NOPRESUBMIT=true
NOTREECHECKS=true
R=machenbach@chromium.org

Change-Id: Ic048861e0ce04dea0ee7ec2ee71112c29ff11506
Reviewed-on: https://chromium-review.googlesource.com/883803
Reviewed-by: Michael Hablich <hablich@chromium.org>
Cr-Commit-Position: refs/branch-heads/6.5@{#7}
Cr-Branched-From: 73c55f57fe8506011ff854b15026ca765b669700-refs/heads/6.5.254@{#1}
Cr-Branched-From: 594a1a0b6e551397cfdf50870f6230da34db2dc8-refs/heads/master@{#50664}
[modify] https://crrev.com/8fa2c8d2189b7847ce46b05531b52d7091487b96/src/base/platform/platform-cygwin.cc
[modify] https://crrev.com/8fa2c8d2189b7847ce46b05531b52d7091487b96/src/base/platform/platform-win32.cc

Comment 32 by wfh@chromium.org, Jan 29 2018

Blockedon: 806661
Owner: wfh@chromium.org
Status: Assigned (was: Fixed)
Unfortunately, looking at this more closely, overall M64 beta stability makes it clear to me that the root cause of the increase in CPM is hangs (WAIT_TIMEOUT or RESULT_CODE_HUNG), not OOMs.

https://uma.googleplex.com/p/chrome/timeline_v2?sid=209097b90d18dc30dadc4d0bdb2967f6

These hangs are being investigated in https://bugs.chromium.org/p/chromium/issues/detail?id=806661

It seems the reduction later on in the beta promotion was due to RESULT_CODE_KILLED_BAD_MESSAGE mostly vanishing.

Comment 33 by wfh@chromium.org, Jan 29 2018

Sorry, to clarify: it seems the OOM fix in #17 did fix 32-bit (and helped a ton), but 64-bit is still suffering mostly from hangs...
Labels: M-65
Labels: -Pri-1 Pri-0
wfh@,
Friendly ping to get an update on this issue, as its priority changed to 0 and it is a stable blocker for M65.

Thanks in advance..!
Hi jmukthavaram@ - we're still actively investigating this.
Cc: e...@chromium.org
Cc: pabrai@chromium.org
Mentioned in b/806661, but I looked into the UMA data. Both WAIT_TIMEOUT and STATUS_ACCESS_VIOLATION have serious regressions that could be root causes of the renderer regression.

Comment 42 by wfh@chromium.org, Feb 1 2018

Summary: Renderer CPM higher on Windows Stable M64 (was: Renderer CPM higher on Windows Beta)
This regression has graduated to stable.
Blockedon: 774288
Cc: wolenetz@chromium.org

Comment 45 by wfh@chromium.org, Feb 8 2018

FWIW, looking at renderer_launch_count and comparing with page_load_count, it seems M64 does have an increase in avg number of renderer launches per page load, possibly naturally as a result of site isolation.

Renderer launches per 100 page loads: (http://shortn/_0jurDWftEA)

M61 - 7
M62 - 6.6
M63 - 7.1
(avg for M61-M63 is 6.9)
M64 - 8.6 (24% increase)

This could explain some of the overall increase, but when slicing renderer_crash_count over renderer_launch_count there still appears to be around an 18% overall increase in M64 (http://shortn/_slROUHfQFI)
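A quick check of the launch-rate arithmetic, using only the per-milestone figures listed above. Averaging the three pre-M64 milestones gives the 6.9 baseline, and M64's 8.6 is then roughly 24% to 25% higher. The variable names are hypothetical.

```python
# Renderer launches per 100 page loads, as listed in the comment above.
launches_per_100_page_loads = {"M61": 7.0, "M62": 6.6, "M63": 7.1, "M64": 8.6}

baseline_milestones = ("M61", "M62", "M63")
baseline_avg = sum(launches_per_100_page_loads[m] for m in baseline_milestones) / len(
    baseline_milestones
)

m64 = launches_per_100_page_loads["M64"]
launch_increase_pct = (m64 - baseline_avg) / baseline_avg * 100.0

print(f"baseline average: {baseline_avg:.1f}")
print(f"M64 increase over baseline: {launch_increase_pct:.1f}%")
```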





Comment 46 by sheriffbot@chromium.org, Feb 12 2018

Pri-0 bugs are critical regressions or serious emergencies, and this bug has not been updated in three days. Could you please provide an update, or adjust the priority to a more appropriate level if applicable?

If a fix is in active development, please set the status to Started.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 47 by wfh@chromium.org, Feb 12 2018

Status: Started (was: Assigned)
M65 Stable promotion is coming VERY soon. Your bug is labelled as Stable ReleaseBlock, pls make sure to land the fix and request a merge  into the release branch ASAP. Thank you.

Comment 49 by wfh@chromium.org, Feb 15 2018

Renderer CPM for the latest M64 spin, 64.0.3282.167/168, on 64-bit seems far lower.

63.0.3239.132 - 85 CPM
64.0.3282.140 - 291 CPM
64.0.3282.167/168 - 100 CPM

Also, renderer CPM on M65 beta seems lower too. It seems either a fix landed between .140 and .167 that un-regressed this, or the population changed somehow. There doesn't appear to be a discernible change in crash/ signatures between .140 and .167, which indicates some kind of (large) data gap.
To update on the latest behavior of this issue: the renderer-process CPM observed on Windows for the M64 and M65 channels is listed below.

M64:
----
63.0.3239.132 - 115.954 cpm
64.0.3282.140 - 146.898 cpm
64.0.3282.167 - 132.595 cpm
64.0.3282.168 - 124.136 cpm

M65:
----
65.0.3325.51 - 120.474 cpm
65.0.3325.73 - 117.958 cpm

Thanks!




M65 Stable promotion is coming VERY soon. Your bug is labelled as Stable ReleaseBlock, pls make sure to land the fix and request a merge into the release branch ASAP. Merge has to happen latest by 4:00 PM PT Monday (02/26/18) in order to make it to last M65 beta release next week. Thank you.
Gentle ping !!
It is marked as a stable blocker for M65. Please merge the fix ASAP, as M65 stable promotion is scheduled for tonight.

Thanks..!
Cc: pamg@chromium.org amineer@chromium.org
There is no new fix yet and M65 stable promotion is coming soon next week. Renderer CPM on m65 beta seems lower per comments #49 and #50.  Should we still consider this a blocker for M65?

Comment 54 by wfh@chromium.org, Feb 27 2018

Yes, this should still be a blocker. Looking at 14-day aggregated renderer CPM numbers for milestones (at peak data volume over a sliding 14-day window), I see:

M63: 105.856 (data vol: 3.1 billion)
M64: 122.253 (data vol: 2.8 billion)
M65: 126.721 (data vol: 2.6 billion)

source: https://uma.googleplex.com/p/chrome/timeline_v2?sid=63f098b254ee189c60d693566e019d29

Given that, for the last regression on M64, stable turned out to be worse than beta (in terms of % regression), I do not yet have any data to conclude that this is fixed on M65.
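To put the 14-day aggregated numbers above on a common scale, here is a small sketch that expresses each milestone's CPM as a percentage regression relative to M63. It uses only the three CPM values quoted in this comment; treating M63 as the baseline is an assumption made for illustration.

```python
# 14-day aggregated renderer CPM per milestone, as quoted above.
aggregated_cpm = {"M63": 105.856, "M64": 122.253, "M65": 126.721}

baseline = aggregated_cpm["M63"]  # assumed baseline for the comparison

regression_pct = {
    milestone: (cpm - baseline) / baseline * 100.0
    for milestone, cpm in aggregated_cpm.items()
    if milestone != "M63"
}

for milestone, pct in regression_pct.items():
    print(f"{milestone}: {pct:.1f}% above M63")
```

On these figures M65 sits further above M63 than M64 does, which is the basis for keeping the blocker in place.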


Comment 55 by lfg@chromium.org, Mar 2 2018

I just went through the top 100 renderer crashes. The main new crashes since M63 are issue 793887 and issue 809784.

I commented on issue 793887 and I'm hoping we can reduce the number of crashes reported there. Issue 809784 is still a bit concerning and may need more investigation.

I also have the feeling that the number of OOMs is proportionally increasing. Do we have any metrics that track that specifically?

Comment 56 by lfg@chromium.org, Mar 2 2018

Labels: Stability-Sheriff-Desktop

Comment 57 by wfh@chromium.org, Mar 2 2018

Thanks for looking at this. We do track the OOMs (from renderers) - they are also in CrashExitCodes.Renderer. On 32-bit they are all in the "Out of Memory" bucket, and on 64-bit they are in the sum of the "Out of Memory" and SBOX_FATAL_MEMORY_EXCEEDED buckets.

They can also be tracked via OOM sad tabs which are Tabs.SadTab.OomCreated
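The bucket rule described above can be sketched as a small helper. The two bucket names come from the comment; the function name, the dictionary shape, and the sample counts are hypothetical and do not reflect any real UMA export format or real data.

```python
def renderer_oom_count(exit_code_counts: dict, is_64_bit: bool) -> int:
    """Count renderer OOMs from per-bucket exit-code counts.

    32-bit: only the "Out of Memory" bucket.
    64-bit: "Out of Memory" plus SBOX_FATAL_MEMORY_EXCEEDED.
    """
    oom = exit_code_counts.get("Out of Memory", 0)
    if is_64_bit:
        oom += exit_code_counts.get("SBOX_FATAL_MEMORY_EXCEEDED", 0)
    return oom


# Hypothetical counts, for illustration only.
sample = {"Out of Memory": 40, "SBOX_FATAL_MEMORY_EXCEEDED": 25, "WAIT_TIMEOUT": 300}

print(renderer_oom_count(sample, is_64_bit=False))
print(renderer_oom_count(sample, is_64_bit=True))
```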

There's a ton of data on M64 here -> http://shortn/_kaWYXYeYt0

TL;DR: OOMs actually seem *down* to me, with plain old crashes (exception) up, and WAIT_TIMEOUT up massively. It's interesting that you believe OOMs are going up on crash/. Is it possible that we are losing some large proportion of crashes, so that proportionally the OOMs are increased?
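A sketch of how that hypothesis would play out numerically: if a large share of non-OOM crashes never reaches crash/, the OOM share of reported crashes rises even though the absolute OOM count is unchanged. Every count and the loss fraction below are hypothetical, chosen only to show the effect.

```python
def oom_share(oom_reports: float, other_reports: float) -> float:
    """Fraction of all reported crashes that are OOMs."""
    return oom_reports / (oom_reports + other_reports)


# Hypothetical report counts: OOMs stay constant in both scenarios.
oom_reports = 100
other_reports = 900

# Hypothetical scenario: half of the non-OOM crashes go unreported.
lost_fraction = 0.5
other_reports_after_loss = other_reports * (1 - lost_fraction)

share_before = oom_share(oom_reports, other_reports)
share_after = oom_share(oom_reports, other_reports_after_loss)

print(f"OOM share with full reporting: {share_before:.1%}")
print(f"OOM share with lost reports:   {share_after:.1%}")
```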
Something to note about issue 809784 is that crashes with that magic signature used to be lumped in with "v8::internal::`anonymous namespace'::Invoke", which includes any crash in V8 generated code. So a spike there isn't necessarily "new".

Comment 59 by lfg@chromium.org, Mar 5 2018

Comment #55 should've said issue 793277 instead of 793887.

Comment 61 by wfh@chromium.org, Mar 5 2018

Thanks for that data, lfg.

As part of this bug I have only looked at renderer data and have not looked at browser data at all (I rarely look at browser, I think siggi looks at browser far more than I do).

I'm not even sure of the best way to analyze browser memory usage... or number of OOMs (c.f. issue 667354) - there is an UMA metric Stability.BrowserExitCodes but I am not sure how much I trust this - as looking at http://shortn/_FuuExyJRgM and comparing SUM(anything other than normal exit) and the value of "crashes" in the stability counts shows dramatically different values...

I suggest we track any browser regressions in a separate bug to keep this one about the renderer.
I know of one bug that seems to be hitting quite a few Google employees and probably others, causing crashes that evade crashpad. The problem is tracked in crbug.com/792289 and it is heap corruption inside Microsoft's crypto code. When the heap detects heap corruption it does a fast-fail and no crash is recorded.

The dates don't line up particularly well so it is probably unrelated, but...

If nothing else, heap corruption is one cause of crashes that crashpad cannot catch.

Cc: tommi@chromium.org
Issue 813218 has been affecting WebRTC based apps pretty severely recently. Might be related to this.
Labels: M-66
Friendly ping to get an update on this issue as it is marked as stable blocker.

Thanks..!
[stability sheriff] Should this be a P0 and ReleaseBlock-Stable? If this isn't being actively worked on, we should remove those labels now.

Comment 67 by pam@chromium.org, Mar 21 2018

Labels: -Pri-0 -Stability-Memory -ReleaseBlock-Stable Stability-Crash Pri-1
Summary: Renderer CPM higher on Windows Stable starting in M64 (was: Renderer CPM higher on Windows Stable M64)
The remaining regression is still being worked on, but with careful consideration we have decided not to block releases on it.

Comment 68 by pam@chromium.org, Mar 21 2018

Cc: -pamg@chromium.org pam@chromium.org
Labels: -Stability-Sheriff-Desktop
Cc: -amineer@chromium.org
No longer on the Chrome team, e-mail me @google.com if any attention still required from me here, otherwise good luck!
