sad tab with no crash reported
Issue description

Version: I think this started in 55.0.2870.0 - 64-bit. Canary.
OS: Windows 10

What steps will reproduce the problem?
(1) Both times the tab was hosting gmail
(2) Tab crashed with a sad tab
(3) No crash in chrome://crashes

What is the expected output?
A crash in chrome://crashes. An increment of stability renderer_crash_count. A histogram in crashexitcodes.renderer

What do you see instead?
Sad tab page, an increment in Tabs.SadTab.CrashCreated histogram, but no crash reported and the histograms above not incremented

Please use labels and text to provide additional information.
I have no reproduction steps. It happened twice, once on 2016-09-24 with Chrome 55.0.2870.0 and once on 2016-09-25 with 55.0.2871.0
Sep 26 2016
Able to reproduce this crash on 55.0.2871.0 on Windows during navigation in gmail.
Sep 27 2016
Have seen this on Win 10 canary 55.0.2872.0; the tab worked again after refreshing it. I do not have repro steps to triage it further. Added TE-NeedsTriageHelp as it can't be triaged by TE.
Sep 27 2016
The trick here would be to find the code path that causes the sad tab to appear, but not increment CrashExitCodes.Renderer or any of the stability data. Maybe simulating a crash in v8 might do it.
Sep 27 2016
I’ve had intermittent sad tabs in Gmail using the canary over the past couple of days. go/crash/28e79f0300000000 is one of them, bug 649967 is implicated. Maybe the known failure mode will help shed light on why we’re not catching them on Windows? Then again, I see go/crash/1a26405e00000000 which is the same crash reported on Windows, so Will, maybe your crash is different.
Sep 27 2016
The crash that I was experiencing in #0 is fixed, so I'll have to probably introduce a crasher of the same type, or patch the bad CL into a working copy. Either way, I'm not as concerned about crashpad not capturing the crash (as I can imagine situations this might occur in, such as if the exception handler has been completely smashed) - but that a sad tab appeared (so Chrome was aware of it) but the stability UMA data was not incremented.
Sep 27 2016
I'll dig into this and try to at least repro the failure. Knowing this type of issue, I expect I won't be able to repro... :(
Sep 27 2016
I’m still somewhat concerned about Crashpad not catching them. Scott filed bug crashpad:133 which has some promise in that arena. If your theory is correct and the exception handler couldn’t run successfully, then what would the browser see as the renderer’s termination status? And would it register that status as a crash for metrics purposes?
Sep 27 2016
If the exception handler didn't run then the return code of the child process would be the original exception code, i.e. EXCEPTION_ACCESS_VIOLATION or something like that, which would increment the stability counts as this is treated as a renderer crash. The only way the counts would not be incremented is if the exit code is somehow 0. I'm not sure how that could happen; maybe a TerminateProcess is happening and setting this code?
Sep 27 2016
How about a double fault. (You know what I mean.)
Sep 27 2016
hmm - I'm pretty sure a double fault would still return an exit code of non-zero, I would *guess* 0xC0000025 - STATUS_NONCONTINUABLE_EXCEPTION...? Perhaps I can try introducing an exception inside the crashpad exception handler and see what happens...
Oct 28 2016
This happened again today. I had a renderer process crash that was hosting three tabs. All three tabs showed a sad tab. I did not get a crash report in chrome://crashes. However, this time I got an entry in CrashExitCodes.Renderer: 1073740940, which is STATUS_HEAP_CORRUPTION. There is no way to know if this entry was caused by this crash, since the browser had been up for a while. But it warrants investigation into whether STATUS_HEAP_CORRUPTION and terminateonheapcorruption might mean crashpad isn't catching some crashes. Can a test be added to crashpad to verify this?
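For reference, a minimal sketch of how the histogram sample relates to the NTSTATUS value. STATUS_HEAP_CORRUPTION is 0xC0000374; viewed as a signed 32-bit integer that is -1073740940, whose magnitude is the number quoted above. The helper names `AsSigned32` and `Magnitude` are hypothetical, and whether the histogram viewer drops the sign is an assumption on my part, not something confirmed in this thread.

```cpp
#include <cstdint>

// STATUS_HEAP_CORRUPTION as defined in ntstatus.h.
constexpr std::uint32_t kStatusHeapCorruption = 0xC0000374u;

// Hypothetical helper: views a 32-bit NTSTATUS as a signed value.
// Subtracting 2^32 when the top bit is set avoids relying on
// implementation-defined narrowing conversions.
constexpr std::int64_t AsSigned32(std::uint32_t status) {
  return status >= 0x80000000u
             ? static_cast<std::int64_t>(status) - (std::int64_t{1} << 32)
             : static_cast<std::int64_t>(status);
}

// Hypothetical helper: magnitude of the signed view.
constexpr std::int64_t Magnitude(std::uint32_t status) {
  return AsSigned32(status) < 0 ? -AsSigned32(status) : AsSigned32(status);
}

// Compile-time checks of the arithmetic described above.
static_assert(AsSigned32(kStatusHeapCorruption) == -1073740940, "signed view");
static_assert(Magnitude(kStatusHeapCorruption) == 1073740940, "magnitude");
```

Going the other way, 4294967296 - 1073740940 = 3221226356 = 0xC0000374.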
Oct 28 2016
fun fun..
Oct 28 2016
Yeah, this causes no call to UEF(), and if WER is on, a .dmp in LocalDumps with .ecxr 0xC0000374.
---
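#include <windows.h>
#include <stdio.h>
#include <stdlib.h>  // rand()
#include <vector>

// Unhandled exception filter. With heap corruption this is never called.
LONG WINAPI UEF(struct _EXCEPTION_POINTERS* ExceptionInfo) {
  printf("HI\n");
  return EXCEPTION_CONTINUE_SEARCH;  // Same value (0) as a bare "return 0".
}

int main() {
  SetUnhandledExceptionFilter(&UEF);

  std::vector<void*> ptrs;
  for (int i = 0; i < 100000; ++i) {
    void* mem = HeapAlloc(GetProcessHeap(), 0, 10000);
    ptrs.push_back(mem);
  }

  // Free each block at a deliberately wrong offset to corrupt the heap.
  for (void* mem : ptrs) {
    HeapFree(GetProcessHeap(), 0, (char*)mem + (rand() % 100) - 50);
  }
}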
---
Looks like the option in 2013 was to link with nohetoc.obj, which sets
int _NoHeapEnableTerminationOnCorruption = 1;
because https://msdn.microsoft.com/en-us/library/windows/desktop/aa366705(v=vs.85).aspx doesn't allow turning it off once it's on. But I'm not sure if it's the same in 2015.
Oct 28 2016
Wait, we set it anyway. Why? https://cs.chromium.org/chromium/src/base/process/memory_win.cc?rcl=0&l=55
Oct 28 2016
Yeah, so https://cs.chromium.org/chromium/src/content/app/content_main_runner.cc?rcl=0&l=532 Crashpad is not going to be able to catch any heap corruption exceptions. It looks like there was some experimentation here https://bugs.chromium.org/p/chromium/issues/detail?id=394842 related to this. I'm not sure what the outcome of that was.
Oct 28 2016
On the positive side of things, this looks like < 0.3% of renderer exit codes - https://uma.googleplex.com/p/chrome/timeline_v2?sid=19d376abf0ec34069bc016f2869f9530 so it shouldn't be happening too often in the wild.
Oct 28 2016
on the negative side of things, this has just spiked 30x on canary - https://uma.googleplex.com/p/chrome/timeline_v2?sid=8791f4f8f6c3e707b4dde0479fac23f6
Oct 28 2016
pennymac - what exit code does CFG produce if it detects a violation?
Oct 29 2016
> Huh.
>
> https://connect.microsoft.com/VisualStudio/feedback/details/664497/cant-catch-0xc0000374-exception-status-heap-corruption
>
> That might suck.

Sounds like another thing that we might be able to catch if we were using WerRegisterRuntimeExceptionModule() (bug crashpad:133).
Oct 31 2016
There are other classes of errors that unfortunately get handled the same way. Stack cookie failures divert straight to kernel32!UnhandledExceptionFilter as a case in point. Gotta wonder if it's possible to grab some of those with an ingenious enough intercept on kernel32!UEF or ntdll!RtlRaiseException, or some such. Alternatively we might be able to make like a debugger for renderers on Canary, a debugger always gets first-chance exceptions - I don't think this requires any execution in the crashee. Might need to be sampling or some such. WER is also cool, though I'm doubtful that it'll ever be workable for Canary :(. @wfh: Maybe it's worth setting up some chirps on the known-no-crash exit codes?
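A rough sketch of what flagging the "known-no-crash" exit codes could look like. This is not Chrome's actual classification logic: `ExitClass` and `ClassifyExitCode` are hypothetical names, and the set of codes is only the two raised in this thread (heap corruption, and the fast-fail status that a later comment reports for CFG violations). Whether the fast-fail status actually reaches the browser as the renderer's exit code is still an open question here.

```cpp
#include <cstdint>

// NTSTATUS values as defined in ntstatus.h.
constexpr std::uint32_t kStatusHeapCorruption = 0xC0000374u;
constexpr std::uint32_t kStatusStackBufferOverrun = 0xC0000409u;

// Hypothetical buckets for a renderer exit code.
enum class ExitClass {
  kCleanExit,              // Exit code 0.
  kLikelyMissedByHandler,  // Failure modes that bypass the in-process handler.
  kOtherAbnormal,          // Any other non-zero exit code.
};

// Hypothetical classifier; the "missed" list is an assumption based only
// on the failure modes discussed in this thread.
inline ExitClass ClassifyExitCode(std::uint32_t exit_code) {
  if (exit_code == 0) {
    return ExitClass::kCleanExit;
  }
  if (exit_code == kStatusHeapCorruption ||
      exit_code == kStatusStackBufferOverrun) {
    return ExitClass::kLikelyMissedByHandler;
  }
  return ExitClass::kOtherAbnormal;
}
```

An alert could then fire whenever the `kLikelyMissedByHandler` share of renderer exit codes moves sharply on a channel.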
Oct 31 2016
If Windows catches a CFG violation, it throws the following:

ExceptionAddress: 00007ffa003e9ba0 (ntdll!RtlFailFast2)
ExceptionCode: c0000409 (Security check failure or stack buffer overrun)
ExceptionFlags: 00000001
NumberParameters: 1
Parameter[0]: 000000000000000a
Subcode: 0xa FAST_FAIL_GUARD_ICALL_CHECK_FAILURE

I'll also attach a sample call stack from windbg - the top 3 functions of the call stack being the cfg check signature. This is a perfect example of a hook function being called from code that is compiled with CFG - and the hook calls an external function that has not been added to the valid function list.
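As a small illustration of reading that record: when ExceptionCode is c0000409, Parameter[0] carries the fast-fail subcode. The sketch below maps a few subcodes to their winnt.h names. `FastFailName` and `DescribeException` are hypothetical helpers, and the table is deliberately partial.

```cpp
#include <cstdint>
#include <string>

// STATUS_STACK_BUFFER_OVERRUN, the status reported for fast-fail exceptions.
constexpr std::uint32_t kStatusStackBufferOverrun = 0xC0000409u;

// Hypothetical helper: names for a few FAST_FAIL_* subcodes from winnt.h.
inline const char* FastFailName(std::uint64_t subcode) {
  switch (subcode) {
    case 2:
      return "FAST_FAIL_STACK_COOKIE_CHECK_FAILURE";
    case 3:
      return "FAST_FAIL_CORRUPT_LIST_ENTRY";
    case 7:
      return "FAST_FAIL_FATAL_APP_EXIT";
    case 10:
      return "FAST_FAIL_GUARD_ICALL_CHECK_FAILURE";
    default:
      return "unknown fast-fail subcode";
  }
}

// Hypothetical helper: describes an exception record given its code and
// first parameter, as shown in the windbg output above.
inline std::string DescribeException(std::uint32_t code,
                                     std::uint64_t param0) {
  if (code != kStatusStackBufferOverrun) {
    return "not a fast-fail exception";
  }
  return FastFailName(param0);
}
```

With the values from the record above, `DescribeException(0xC0000409u, 0xa)` yields FAST_FAIL_GUARD_ICALL_CHECK_FAILURE. Subcode 2 is the stack cookie case mentioned earlier in the thread.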
Oct 31 2016
penny, can you induce a CFG violation in a Chrome subprocess, and see what you get in chrome://histograms for CrashExitCodes.Renderer, and whether crashpad collects the crash?
Oct 31 2016
Give me some time. Nothing set up right now to quickly test/trigger this in a child process. I'll start setting up a test, but I travel tomorrow. Will report back here!
Feb 28 2017
There is currently a spike in STATUS_HEAP_CORRUPTION on beta; see https://goto.google.com/vxojrchlgk
Comment 1 by wfh@chromium.org
, Sep 25 2016