New issue
Advanced search Search tips

Issue 683158 link

Starred by 4 users

Issue metadata

Status: Fixed
Owner:
Closed: Feb 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug



Sign in to add a comment

Crashpad handler failing due to out of memory

Project Member Reported by siggi@chromium.org, Jan 20 2017

Issue description

Roughly 75% of Crashpad handler crashes on Windows look to be OOM.
 

Comment 1 by siggi@chromium.org, Jan 20 2017

It's hard to say whether this is the handler doing something silly, or due to a system-wide OOM situation. Adding some metrics capture to the fallback crash handler should allow us to see which it is.

Comment 4 by siggi@chromium.org, Jan 30 2017

Cc: mark@chromium.org scottmg@chromium.org
It looks like a strong majority of the OOM crashes happen when the system is low on available commit. The mode of the distribution has the available system in the range of 4-40 megabytes, which is as near to nothing as makes no difference.

A compounding problem is that when the handler dies in this mode, the crashing process is likely left suspended and hanging forever. If it's a renderer, maybe the browser will offer to shoot it down, but if it's a browser, we're likely leaving a whole Chrome process three hogging all available memory.

Comment 5 by siggi@chromium.org, Jan 30 2017

A common crash site is crashpad::ProcessInfo::BuildHandleVector(void *), where there's a 2MB allocation request. I'm pretty sure an allocation this large goes straight to VirtualAlloc, and so directly makes a >= 2MB commit request.

Maybe it'd be sufficient to do larger allocations with new (nothrow), and handle failure. This would mean either ditching certain levels of detail, grabbing partial detail, or grabbing the detail piece-wise.

It's interesting that we seem to have reasonable success rate in spinning a new process and performing a MinidumpWriteDump under Crasphad handler OOM crashes. I guess this indicates that MDWD is pretty light on memory usage, or perhaps it backs off on the detail it can't allocate memory to grab.

Comment 6 by mark@chromium.org, Jan 30 2017

Ideally we’d just pick up the allocation failure and skip handle collection, and maybe do the same for other things likely to allocate huge chunks of memory. We could also get cute and build a limited handle vector, but I’m not sure how useful it’d be at that point, and I don’t know if we know enough to be able to play the data vs. likelihood of crash tradeoff game well.

Hopefully we’re not getting fatal allocation-failure semantics in crashpad_handler for the time being, and it actually will be possible for us to do a logical “try-allocate” in this and similar cases. Since I’m seeing these show up as EH_EXCEPTION_NUMBER crashes, it seems that this is doable, either with try/catch if we want to allow exception support for allocations such as this, or std::nothrow new as you suggest or even just malloc() otherwise.

Comment 7 by mark@chromium.org, Jan 30 2017

The fallback handler specifies at most MiniDumpWithUnloadedModules | MiniDumpWithProcessThreadData | MiniDumpWithThreadInfo | MiniDumpWithIndirectlyReferencedMemory, so it’s not an apt comparison. Notably, MiniDumpWithHandleData is not present, so MiniDumpWriteDump() is (hopefully) not even trying to grab handle information.

Comment 8 by siggi@chromium.org, Jan 30 2017

The handler should also be a fairy anaemic process, compared to a renderer.
That'll help too.

Comment 9 by mark@chromium.org, Jan 30 2017

What do you mean, just that it won’t be doing a lot of stuff and won’t have run a lot of random wacko Chrome code?

Comment 10 by siggi@chromium.org, Jan 31 2017

Yeah, it should have a lower memory footprint than a Chrome renderer/browser, shouldn't have loaded a whackload of modules (like chrome.dll or chrome_child.dll + a crapton of system stuff), etc.
Project Member

Comment 11 by bugdroid1@chromium.org, Feb 6 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/7a7fbf7c5016a85e4610b9e2fe653a77807e932b

commit 7a7fbf7c5016a85e4610b9e2fe653a77807e932b
Author: siggi <siggi@chromium.org>
Date: Mon Feb 06 20:45:04 2017

Populate allocation size into OOM exception on Windows.

This will allow exposing the attempted allocation size on the crash backend.
Also set up for OOM handling in the crashpad-handler process.

BUG= 683158 

Review-Url: https://codereview.chromium.org/2674253003
Cr-Commit-Position: refs/heads/master@{#448383}

[modify] https://crrev.com/7a7fbf7c5016a85e4610b9e2fe653a77807e932b/base/process/memory_unittest.cc
[modify] https://crrev.com/7a7fbf7c5016a85e4610b9e2fe653a77807e932b/base/process/memory_win.cc
[modify] https://crrev.com/7a7fbf7c5016a85e4610b9e2fe653a77807e932b/components/crash/content/app/run_as_crashpad_handler_win.cc

Comment 13 by siggi@chromium.org, Feb 13 2017

Status: Fixed (was: Started)
Looks like this is ~mostly fixed, though there's perhaps one other allocation site where we could also back off. The only other idea I have is to pad the heap at startup, so that there's some headroom at crash time. Marking fixed until otherwise proven.

Comment 14 by siggi@chromium.org, Feb 13 2017

Issue 682202 has been merged into this issue.
Referring the crashed Builds from issue 682202, the crashes are not seen on M58 after 58.0.3013.3.

58.0.3013.3	0.01%	1	
58.0.3004.4	0.27%	41	
58.0.3004.3	6.07%	923	
58.0.3004.1	0.01%	2	

Can we get this merged into M57 branch before the stable push.
As the crashes are spiking in M57 compared to M56.
57.0.2987.74	8.18%	1243	
57.0.2987.54	15.47%	2352	
57.0.2987.37	17.71%	2692	
57.0.2987.21	19.57%	2975	
57.0.2987.19	2.21%	336	

List of builds where crashes are seen:
https://crash.corp.google.com/browse?q=product.name%3D%27Chrome%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27crashpad%3A%3AProcessInfo%3A%3ABuildHandleVector%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D#samplereports:5,productversion:1000

Comment 16 by siggi@chromium.org, Feb 28 2017

Labels: Merge-Request-57
I believe I can cherry pick the fix in https://codereview.chromium.org/2688583002 onto the branch reasonably easily.

Comment 17 by siggi@chromium.org, Feb 28 2017

Status: Assigned (was: Fixed)
Project Member

Comment 18 by sheriffbot@chromium.org, Feb 28 2017

Labels: -Merge-Request-57 Hotlist-Merge-Review Merge-Review-57
This bug requires manual review: We are only 13 days from stable.
Please contact the milestone owner if you have questions.
Owners: amineer@(clank), cmasso@(bling), ketakid@(cros), govind@(desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Project Member

Comment 19 by sheriffbot@chromium.org, Feb 28 2017

Labels: FoundIn-M-57 Fracas
Users experienced this crash on the following builds:

Win Beta 57.0.2987.74 -  1.54 CPM, 1278 reports, 1040 clients (signature crashpad::ProcessInfo::BuildHandleVector)

If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates.

- Go/Fracas
Labels: -Merge-Review-57 Merge-Approved-57
Approving merge to M57 branch 2987 based on internal mail thread ( 
The fix has been in Canary/Dev for two weeks or so, and has knocked down the crashpad-handler crash rate a lot (60-70-80%).

Comment 21 by siggi@chromium.org, Feb 28 2017

I've tried to land this every which way from Wednesday, and I simply can't. 

The change is uploaded here: 
C:\src\chrome\src>git cl issue
Issue number: 447901 (https://chromium-review.googlesource.com/447901)

But landing it, whether by drover or git cl land both end up here:

C:\src\chrome\src>git cl land -v -v --bypass-hooks
[D2017-02-28 15:02:14,740 11616 17948 subprocess2.py] git symbolic-ref HEAD
[D2017-02-28 15:02:14,776 11616 17948 subprocess2.py] git config branch.b_2987.git-cl-similarity
[D2017-02-28 15:02:14,821 11616 17948 subprocess2.py] git symbolic-ref HEAD
[D2017-02-28 15:02:14,851 11616 17948 subprocess2.py] git config --bool branch.b_2987.git-find-copies
Using 50% similarity for rename/copy detection. Override with --similarity.
[D2017-02-28 15:02:14,923 11616 17948 subprocess2.py] git symbolic-ref HEAD
[D2017-02-28 15:02:14,966 11616 17948 subprocess2.py] git config branch.b_2987.rietveldissue
[D2017-02-28 15:02:15,025 11616 17948 subprocess2.py] git config branch.b_2987.gerritissue
[D2017-02-28 15:02:15,085 11616 17948 subprocess2.py] C:\src\depot_tools\git.bat -c color.ui=never update-index --refresh -q
[D2017-02-28 15:02:16,174 11616 17948 subprocess2.py] C:\src\depot_tools\git.bat -c color.ui=never diff-index --name-status HEAD
[D2017-02-28 15:02:17,611 11616 17948 subprocess2.py] git config branch.b_2987.gerritserver
[D2017-02-28 15:02:17,710 11616 17948 gerrit_util.py] GET https://chromium-review.googlesource.com/a/changes/447901/detail?o=CURRENT_REVISION&o=LABELS&o=CURRENT_COMMIT
[D2017-02-28 15:02:17,726 11616 17948 gerrit_util.py] Authorization: HIDDEN
[D2017-02-28 15:02:18,395 11616 17948 gerrit_util.py] got response 200 for GET /a/changes/447901/detail?o=CURRENT_REVISION&o=LABELS&o=CURRENT_COMMIT

It seems this repository has a Commit Queue, which can test and land changes for you. Are you sure you wish to bypass it?
Press Enter to continue, Ctrl+C to abort.
[D2017-02-28 15:02:20,289 11616 17948 subprocess2.py] git config branch.b_2987.gerritsquashhash
[D2017-02-28 15:02:20,339 11616 17948 subprocess2.py] git diff 623d4c139b3666e23e1fb671d9cd50071759ac6e
[D2017-02-28 15:02:21,769 11616 17948 gerrit_util.py] POST https://chromium-review.googlesource.com/a/changes/447901/submit
[D2017-02-28 15:02:21,786 11616 17948 gerrit_util.py] Content-Type: application/json
[D2017-02-28 15:02:21,796 11616 17948 gerrit_util.py] Authorization: HIDDEN
[D2017-02-28 15:02:21,802 11616 17948 gerrit_util.py] {"wait_for_merge": true}
[D2017-02-28 15:02:22,094 11616 17948 gerrit_util.py] got response 403 for POST /a/changes/447901/submit
Traceback (most recent call last):
  File "C:\src\depot_tools\git_cl.py", line 5333, in <module>
    sys.exit(main(sys.argv[1:]))
  File "C:\src\depot_tools\git_cl.py", line 5315, in main
    return dispatcher.execute(OptionParser(), argv)
  File "C:\src\depot_tools\subcommand.py", line 252, in execute
    return command(parser, args[1:])
  File "C:\src\depot_tools\git_cl.py", line 4363, in CMDland
    options.verbose)
  File "C:\src\depot_tools\git_cl.py", line 2559, in CMDLand
    self.SubmitIssue(wait_for_merge=True)
  File "C:\src\depot_tools\git_cl.py", line 2466, in SubmitIssue
    wait_for_merge=wait_for_merge)
  File "C:\src\depot_tools\gerrit_util.py", line 558, in SubmitChange
    return ReadHttpJsonResponse(conn, ignore_404=False)
  File "C:\src\depot_tools\gerrit_util.py", line 377, in ReadHttpJsonResponse
    conn, expect_status=expect_status, ignore_404=ignore_404)
  File "C:\src\depot_tools\gerrit_util.py", line 370, in ReadHttpResponse
    raise GerritError(response.status, reason)
gerrit_util.GerritError: Forbidden: submit not permitted



How did you end up in Gerrit? Don't you want to just use git drover on https://codereview.chromium.org/2688583002 which will be Rietveld?

Comment 23 by mark@chromium.org, Feb 28 2017

git-cl is talking to Gerrit but I think it should be talking to Rietveld, unless there’s something special about branches.

That happened to me once last week. I was on the Chrome trunk. I worked around it by using git cl issue 0 to clear out the Gerrit association, then git cl upload --rietveld to force it to Rietveld, and then proceeding normally from there.

Theory: this may have to do with the fact that you’re working in a directory with a codereview.settings that specifies Gerrit. git-cl may be finding that instead of Chrome’s top-level one that specifies Rietveld.
Project Member

Comment 24 by bugdroid1@chromium.org, Feb 28 2017

Labels: -merge-approved-57 merge-merged-2987
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2334a333852ba3feadc36f11622f11a9b9201598

commit 2334a333852ba3feadc36f11622f11a9b9201598
Author: Sigurdur Asgeirsson <siggi@chromium.org>
Date: Tue Feb 28 20:17:10 2017

Cherry-pick 6af23a933a0d from Crashpad.

6af23a933a0d Use best-effort allocation in
             ProcessInfo::BuildHandleVector

BUG= 683158 

Review-Url: https://codereview.chromium.org/2688583002
Cr-Commit-Position: refs/heads/master@{#449103}
(cherry picked from commit f161e046ad6ce06ba97c78b9c0618e4bb863eb80)

Change-Id: I2b1160c21c5f4668d8625a709fbc0ab1935927f6
Review-Url: https://codereview.chromium.org/2720903004 .
Cr-Commit-Position: refs/branch-heads/2987@{#721}
Cr-Branched-From: ad51088c0e8776e8dcd963dbe752c4035ba6dab6-refs/heads/master@{#444943}

[modify] https://crrev.com/2334a333852ba3feadc36f11622f11a9b9201598/third_party/crashpad/crashpad/util/win/process_info.cc
[modify] https://crrev.com/2334a333852ba3feadc36f11622f11a9b9201598/third_party/crashpad/crashpad/util/win/process_info.h

Comment 25 by siggi@chromium.org, Feb 28 2017

Thanks guys - that was it. A "git cl upload --rietveld", followed by more fetching and merging and crap, finally allowed me to land it.

Comment 26 by siggi@chromium.org, Feb 28 2017

Status: Fixed (was: Assigned)

Comment 27 by mark@chromium.org, Mar 1 2017

Cc: aga...@chromium.org
I found that once git-cl started wanting to talk to Gerrit (which happens after you try working on a Crashpad-only change in Chrome), that horrible-ness “stuck” in Chrome. This command, run from the Chrome checkout, fixes it:

git config --remove-section gerrit

Sign in to add a comment