Crashpad handler failing due to out of memory |
|||||||||||
Issue descriptionRoughly 75% of Crashpad handler crashes on Windows look to be OOM.
,
Jan 20 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f623c0b50e5dec7eafdfc5699412ea60548471a2 commit f623c0b50e5dec7eafdfc5699412ea60548471a2 Author: siggi <siggi@chromium.org> Date: Fri Jan 20 17:00:19 2017 Grab system and process memory stats into crash keys in fallback handler. R=scottmg@chromium.org BUG= 683158 Review-Url: https://codereview.chromium.org/2643273002 Cr-Commit-Position: refs/heads/master@{#445079} [modify] https://crrev.com/f623c0b50e5dec7eafdfc5699412ea60548471a2/components/crash/content/app/fallback_crash_handler_win.cc [modify] https://crrev.com/f623c0b50e5dec7eafdfc5699412ea60548471a2/components/crash/content/app/fallback_crash_handler_win_unittest.cc
,
Jan 25 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/3175abc1ac46d878e39af3e5956436d48d5518fb commit 3175abc1ac46d878e39af3e5956436d48d5518fb Author: manzagop <manzagop@chromium.org> Date: Wed Jan 25 14:45:14 2017 Grab crash handler's peak commit charge in fallback handler This is a follow up to https://crrev.com/2643273002. BUG= 683158 Review-Url: https://codereview.chromium.org/2653953002 Cr-Commit-Position: refs/heads/master@{#446016} [modify] https://crrev.com/3175abc1ac46d878e39af3e5956436d48d5518fb/components/crash/content/app/fallback_crash_handler_win.cc [modify] https://crrev.com/3175abc1ac46d878e39af3e5956436d48d5518fb/components/crash/content/app/fallback_crash_handler_win_unittest.cc
,
Jan 30 2017
It looks like a strong majority of the OOM crashes happen when the system is low on available commit. The mode of the distribution has the available system in the range of 4-40 megabytes, which is as near to nothing as makes no difference. A compounding problem is that when the handler dies in this mode, the crashing process is likely left suspended and hanging forever. If it's a renderer, maybe the browser will offer to shoot it down, but if it's a browser, we're likely leaving a whole Chrome process three hogging all available memory.
,
Jan 30 2017
A common crash site is crashpad::ProcessInfo::BuildHandleVector(void *), where there's a 2MB allocation request. I'm pretty sure an allocation this large goes straight to VirtualAlloc, and so directly makes a >= 2MB commit request. Maybe it'd be sufficient to do larger allocations with new (nothrow), and handle failure. This would mean either ditching certain levels of detail, grabbing partial detail, or grabbing the detail piece-wise. It's interesting that we seem to have reasonable success rate in spinning a new process and performing a MinidumpWriteDump under Crasphad handler OOM crashes. I guess this indicates that MDWD is pretty light on memory usage, or perhaps it backs off on the detail it can't allocate memory to grab.
,
Jan 30 2017
Ideally we’d just pick up the allocation failure and skip handle collection, and maybe do the same for other things likely to allocate huge chunks of memory. We could also get cute and build a limited handle vector, but I’m not sure how useful it’d be at that point, and I don’t know if we know enough to be able to play the data vs. likelihood of crash tradeoff game well. Hopefully we’re not getting fatal allocation-failure semantics in crashpad_handler for the time being, and it actually will be possible for us to do a logical “try-allocate” in this and similar cases. Since I’m seeing these show up as EH_EXCEPTION_NUMBER crashes, it seems that this is doable, either with try/catch if we want to allow exception support for allocations such as this, or std::nothrow new as you suggest or even just malloc() otherwise.
,
Jan 30 2017
The fallback handler specifies at most MiniDumpWithUnloadedModules | MiniDumpWithProcessThreadData | MiniDumpWithThreadInfo | MiniDumpWithIndirectlyReferencedMemory, so it’s not an apt comparison. Notably, MiniDumpWithHandleData is not present, so MiniDumpWriteDump() is (hopefully) not even trying to grab handle information.
,
Jan 30 2017
The handler should also be a fairy anaemic process, compared to a renderer. That'll help too.
,
Jan 30 2017
What do you mean, just that it won’t be doing a lot of stuff and won’t have run a lot of random wacko Chrome code?
,
Jan 31 2017
Yeah, it should have a lower memory footprint than a Chrome renderer/browser, shouldn't have loaded a whackload of modules (like chrome.dll or chrome_child.dll + a crapton of system stuff), etc.
,
Feb 6 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/7a7fbf7c5016a85e4610b9e2fe653a77807e932b commit 7a7fbf7c5016a85e4610b9e2fe653a77807e932b Author: siggi <siggi@chromium.org> Date: Mon Feb 06 20:45:04 2017 Populate allocation size into OOM exception on Windows. This will allow exposing the attempted allocation size on the crash backend. Also set up for OOM handling in the crashpad-handler process. BUG= 683158 Review-Url: https://codereview.chromium.org/2674253003 Cr-Commit-Position: refs/heads/master@{#448383} [modify] https://crrev.com/7a7fbf7c5016a85e4610b9e2fe653a77807e932b/base/process/memory_unittest.cc [modify] https://crrev.com/7a7fbf7c5016a85e4610b9e2fe653a77807e932b/base/process/memory_win.cc [modify] https://crrev.com/7a7fbf7c5016a85e4610b9e2fe653a77807e932b/components/crash/content/app/run_as_crashpad_handler_win.cc
,
Feb 8 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f161e046ad6ce06ba97c78b9c0618e4bb863eb80 commit f161e046ad6ce06ba97c78b9c0618e4bb863eb80 Author: siggi <siggi@chromium.org> Date: Wed Feb 08 22:11:58 2017 Update Crashpad to 6af23a933a0dc10cda072570cc02efcd03488a90 6af23a933a0d Use best-effort allocation in ProcessInfo::BuildHandleVector BUG= 683158 Review-Url: https://codereview.chromium.org/2688583002 Cr-Commit-Position: refs/heads/master@{#449103} [modify] https://crrev.com/f161e046ad6ce06ba97c78b9c0618e4bb863eb80/third_party/crashpad/README.chromium [modify] https://crrev.com/f161e046ad6ce06ba97c78b9c0618e4bb863eb80/third_party/crashpad/crashpad/DEPS [modify] https://crrev.com/f161e046ad6ce06ba97c78b9c0618e4bb863eb80/third_party/crashpad/crashpad/util/win/process_info.cc [modify] https://crrev.com/f161e046ad6ce06ba97c78b9c0618e4bb863eb80/third_party/crashpad/crashpad/util/win/process_info.h
,
Feb 13 2017
Looks like this is ~mostly fixed, though there's perhaps one other allocation site where we could also back off. The only other idea I have is to pad the heap at startup, so that there's some headroom at crash time. Marking fixed until otherwise proven.
,
Feb 13 2017
Issue 682202 has been merged into this issue.
,
Feb 28 2017
Referring the crashed Builds from issue 682202, the crashes are not seen on M58 after 58.0.3013.3. 58.0.3013.3 0.01% 1 58.0.3004.4 0.27% 41 58.0.3004.3 6.07% 923 58.0.3004.1 0.01% 2 Can we get this merged into M57 branch before the stable push. As the crashes are spiking in M57 compared to M56. 57.0.2987.74 8.18% 1243 57.0.2987.54 15.47% 2352 57.0.2987.37 17.71% 2692 57.0.2987.21 19.57% 2975 57.0.2987.19 2.21% 336 List of builds where crashes are seen: https://crash.corp.google.com/browse?q=product.name%3D%27Chrome%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27crashpad%3A%3AProcessInfo%3A%3ABuildHandleVector%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D#samplereports:5,productversion:1000
,
Feb 28 2017
I believe I can cherry pick the fix in https://codereview.chromium.org/2688583002 onto the branch reasonably easily.
,
Feb 28 2017
,
Feb 28 2017
This bug requires manual review: We are only 13 days from stable. Please contact the milestone owner if you have questions. Owners: amineer@(clank), cmasso@(bling), ketakid@(cros), govind@(desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Feb 28 2017
Users experienced this crash on the following builds: Win Beta 57.0.2987.74 - 1.54 CPM, 1278 reports, 1040 clients (signature crashpad::ProcessInfo::BuildHandleVector) If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates. - Go/Fracas
,
Feb 28 2017
Approving merge to M57 branch 2987 based on internal mail thread ( The fix has been in Canary/Dev for two weeks or so, and has knocked down the crashpad-handler crash rate a lot (60-70-80%).
,
Feb 28 2017
I've tried to land this every which way from Wednesday, and I simply can't. The change is uploaded here: C:\src\chrome\src>git cl issue Issue number: 447901 (https://chromium-review.googlesource.com/447901) But landing it, whether by drover or git cl land both end up here: C:\src\chrome\src>git cl land -v -v --bypass-hooks [D2017-02-28 15:02:14,740 11616 17948 subprocess2.py] git symbolic-ref HEAD [D2017-02-28 15:02:14,776 11616 17948 subprocess2.py] git config branch.b_2987.git-cl-similarity [D2017-02-28 15:02:14,821 11616 17948 subprocess2.py] git symbolic-ref HEAD [D2017-02-28 15:02:14,851 11616 17948 subprocess2.py] git config --bool branch.b_2987.git-find-copies Using 50% similarity for rename/copy detection. Override with --similarity. [D2017-02-28 15:02:14,923 11616 17948 subprocess2.py] git symbolic-ref HEAD [D2017-02-28 15:02:14,966 11616 17948 subprocess2.py] git config branch.b_2987.rietveldissue [D2017-02-28 15:02:15,025 11616 17948 subprocess2.py] git config branch.b_2987.gerritissue [D2017-02-28 15:02:15,085 11616 17948 subprocess2.py] C:\src\depot_tools\git.bat -c color.ui=never update-index --refresh -q [D2017-02-28 15:02:16,174 11616 17948 subprocess2.py] C:\src\depot_tools\git.bat -c color.ui=never diff-index --name-status HEAD [D2017-02-28 15:02:17,611 11616 17948 subprocess2.py] git config branch.b_2987.gerritserver [D2017-02-28 15:02:17,710 11616 17948 gerrit_util.py] GET https://chromium-review.googlesource.com/a/changes/447901/detail?o=CURRENT_REVISION&o=LABELS&o=CURRENT_COMMIT [D2017-02-28 15:02:17,726 11616 17948 gerrit_util.py] Authorization: HIDDEN [D2017-02-28 15:02:18,395 11616 17948 gerrit_util.py] got response 200 for GET /a/changes/447901/detail?o=CURRENT_REVISION&o=LABELS&o=CURRENT_COMMIT It seems this repository has a Commit Queue, which can test and land changes for you. Are you sure you wish to bypass it? Press Enter to continue, Ctrl+C to abort. [D2017-02-28 15:02:20,289 11616 17948 subprocess2.py] git config branch.b_2987.gerritsquashhash [D2017-02-28 15:02:20,339 11616 17948 subprocess2.py] git diff 623d4c139b3666e23e1fb671d9cd50071759ac6e [D2017-02-28 15:02:21,769 11616 17948 gerrit_util.py] POST https://chromium-review.googlesource.com/a/changes/447901/submit [D2017-02-28 15:02:21,786 11616 17948 gerrit_util.py] Content-Type: application/json [D2017-02-28 15:02:21,796 11616 17948 gerrit_util.py] Authorization: HIDDEN [D2017-02-28 15:02:21,802 11616 17948 gerrit_util.py] {"wait_for_merge": true} [D2017-02-28 15:02:22,094 11616 17948 gerrit_util.py] got response 403 for POST /a/changes/447901/submit Traceback (most recent call last): File "C:\src\depot_tools\git_cl.py", line 5333, in <module> sys.exit(main(sys.argv[1:])) File "C:\src\depot_tools\git_cl.py", line 5315, in main return dispatcher.execute(OptionParser(), argv) File "C:\src\depot_tools\subcommand.py", line 252, in execute return command(parser, args[1:]) File "C:\src\depot_tools\git_cl.py", line 4363, in CMDland options.verbose) File "C:\src\depot_tools\git_cl.py", line 2559, in CMDLand self.SubmitIssue(wait_for_merge=True) File "C:\src\depot_tools\git_cl.py", line 2466, in SubmitIssue wait_for_merge=wait_for_merge) File "C:\src\depot_tools\gerrit_util.py", line 558, in SubmitChange return ReadHttpJsonResponse(conn, ignore_404=False) File "C:\src\depot_tools\gerrit_util.py", line 377, in ReadHttpJsonResponse conn, expect_status=expect_status, ignore_404=ignore_404) File "C:\src\depot_tools\gerrit_util.py", line 370, in ReadHttpResponse raise GerritError(response.status, reason) gerrit_util.GerritError: Forbidden: submit not permitted
,
Feb 28 2017
How did you end up in Gerrit? Don't you want to just use git drover on https://codereview.chromium.org/2688583002 which will be Rietveld?
,
Feb 28 2017
git-cl is talking to Gerrit but I think it should be talking to Rietveld, unless there’s something special about branches. That happened to me once last week. I was on the Chrome trunk. I worked around it by using git cl issue 0 to clear out the Gerrit association, then git cl upload --rietveld to force it to Rietveld, and then proceeding normally from there. Theory: this may have to do with the fact that you’re working in a directory with a codereview.settings that specifies Gerrit. git-cl may be finding that instead of Chrome’s top-level one that specifies Rietveld.
,
Feb 28 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/2334a333852ba3feadc36f11622f11a9b9201598 commit 2334a333852ba3feadc36f11622f11a9b9201598 Author: Sigurdur Asgeirsson <siggi@chromium.org> Date: Tue Feb 28 20:17:10 2017 Cherry-pick 6af23a933a0d from Crashpad. 6af23a933a0d Use best-effort allocation in ProcessInfo::BuildHandleVector BUG= 683158 Review-Url: https://codereview.chromium.org/2688583002 Cr-Commit-Position: refs/heads/master@{#449103} (cherry picked from commit f161e046ad6ce06ba97c78b9c0618e4bb863eb80) Change-Id: I2b1160c21c5f4668d8625a709fbc0ab1935927f6 Review-Url: https://codereview.chromium.org/2720903004 . Cr-Commit-Position: refs/branch-heads/2987@{#721} Cr-Branched-From: ad51088c0e8776e8dcd963dbe752c4035ba6dab6-refs/heads/master@{#444943} [modify] https://crrev.com/2334a333852ba3feadc36f11622f11a9b9201598/third_party/crashpad/crashpad/util/win/process_info.cc [modify] https://crrev.com/2334a333852ba3feadc36f11622f11a9b9201598/third_party/crashpad/crashpad/util/win/process_info.h
,
Feb 28 2017
Thanks guys - that was it. A "git cl upload --rietveld", followed by more fetching and merging and crap, finally allowed me to land it.
,
Feb 28 2017
,
Mar 1 2017
I found that once git-cl started wanting to talk to Gerrit (which happens after you try working on a Crashpad-only change in Chrome), that horrible-ness “stuck” in Chrome. This command, run from the Chrome checkout, fixes it: git config --remove-section gerrit |
|||||||||||
►
Sign in to add a comment |
|||||||||||
Comment 1 by siggi@chromium.org
, Jan 20 2017