Issue 650108

Starred by 2 users

Issue metadata

Status: Started
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug-Regression




sad tab with no crash reported

Project Member Reported by wfh@chromium.org, Sep 25 2016

Issue description

Version: I think this started in 55.0.2870.0 - 64-bit. Canary.
OS: Windows 10

What steps will reproduce the problem?
(1) Both times the tab was hosting gmail
(2) Tab crashed with a sad tab
(3) No crash in chrome://crashes

What is the expected output?

A crash in chrome://crashes. An increment of stability renderer_crash_count. A histogram entry in CrashExitCodes.Renderer.

What do you see instead?

Sad tab page, an increment in Tabs.SadTab.CrashCreated histogram, but no crash reported and the histograms above not incremented

Please use labels and text to provide additional information.

I have no reproduction steps. It happened twice, once on 2016-09-24 with Chrome 55.0.2870.0 and once on 2016-09-25 with 55.0.2871.0
 

Comment 1 by wfh@chromium.org, Sep 25 2016

I think the crash is issue 649967, but I'm curious why I get no crash report.
Able to reproduce this crash on 55.0.2871.0 on Windows during navigation in gmail.
Labels: M-55 TE-NeedsTriageHelp
Have seen this on Win 10 canary 55.0.2872.0; the tab worked again after refreshing it. I do not have repro steps to triage it further.

Added TE-NeedsTriageHelp as it can't be triaged by TE.
Cc: mark@chromium.org scottmg@chromium.org

Comment 5 by wfh@chromium.org, Sep 27 2016

The trick here would be to find the code path that causes the sad tab to appear, but not increment CrashExitCodes.Renderer or any of the stability data. Maybe simulating a crash in v8 might do it.

Comment 6 by mark@chromium.org, Sep 27 2016

I’ve had intermittent sad tabs in Gmail using the canary over the past couple of days. go/crash/28e79f0300000000 is one of them, bug 649967 is implicated. Maybe the known failure mode will help shed light on why we’re not catching them on Windows?

Then again, I see go/crash/1a26405e00000000 which is the same crash reported on Windows, so Will, maybe your crash is different.

Comment 7 by wfh@chromium.org, Sep 27 2016

The crash that I was experiencing in #0 is fixed, so I'll probably have to introduce a crasher of the same type, or patch the bad CL into a working copy. Either way, I'm not as concerned about crashpad not capturing the crash (I can imagine situations where this might occur, such as if the exception handler has been completely smashed) as I am that a sad tab appeared (so Chrome was aware of it) but the stability UMA data was not incremented.

Comment 8 by wfh@chromium.org, Sep 27 2016

Owner: wfh@chromium.org
Status: Started (was: Untriaged)
I'll dig into this and try to at least repro the failure. Knowing this type of issue, I expect I won't be able to repro... :(

Comment 9 by mark@chromium.org, Sep 27 2016

I’m still somewhat concerned about Crashpad not catching them. Scott filed bug crashpad:133 which has some promise in that arena.

If your theory is correct and the exception handler couldn’t run successfully, then what would the browser see as the renderer’s termination status? And would it register that status as a crash for metrics purposes?

Comment 10 by wfh@chromium.org, Sep 27 2016

If the exception handler didn't run, then the exit code of the child process would be the original exception code, e.g. EXCEPTION_ACCESS_VIOLATION or something like that, which would increment the stability counts because it is treated as a renderer crash.

The only way this could happen is if the exit code is somehow 0. I'm not sure how that could happen; maybe a TerminateProcess is happening and setting this code?
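
For illustration, the distinction drawn above can be sketched as a small classifier. This is a minimal sketch, not Chrome's actual termination-status logic: `ExitKind` and `ClassifyExitCode` are hypothetical names. It relies only on three facts: an unhandled exception leaves its NTSTATUS code (for example EXCEPTION_ACCESS_VIOLATION, 0xC0000005) as the process exit code, NTSTATUS error codes have both top severity bits set, and TerminateProcess lets the caller choose any exit code, including 0.

```cpp
#include <cstdint>

// Hypothetical categories for a child process exit code (illustration only).
enum class ExitKind {
  kCleanExit,      // Exit code 0: indistinguishable from a normal exit.
  kNtStatusError,  // Top two bits set: NTSTATUS with "error" severity.
  kOtherNonZero,   // Any other non-zero value chosen by the exiting code.
};

// Hypothetical helper: classify an exit code by its NTSTATUS severity bits.
// Bits 31..30 of an NTSTATUS encode severity; 0b11 means "error".
constexpr ExitKind ClassifyExitCode(uint32_t exit_code) {
  return exit_code == 0 ? ExitKind::kCleanExit
         : (exit_code >> 30) == 0x3u ? ExitKind::kNtStatusError
                                     : ExitKind::kOtherNonZero;
}
```

Under these assumptions, `ClassifyExitCode(0xC0000005u)` yields `ExitKind::kNtStatusError`, while an exit code of 0 yields `ExitKind::kCleanExit` and so would not look like a crash to anything that keys off the exit code alone.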

Comment 11 by mark@chromium.org, Sep 27 2016

How about a double fault. (You know what I mean.)

Comment 12 by wfh@chromium.org, Sep 27 2016

hmm - I'm pretty sure a double fault would still return an exit code of non-zero, I would *guess* 0xC0000025 - STATUS_NONCONTINUABLE_EXCEPTION...? Perhaps I can try introducing an exception inside the crashpad exception handler and see what happens...

Comment 13 by wfh@chromium.org, Oct 28 2016

This happened again today. I had a renderer process crash that was hosting three tabs. All three tabs showed a sad tab. I did not get a crash report in chrome://crashes and I did not 

However, this time I got an entry in CrashExitCodes.Renderer 1073740940 which is STATUS_HEAP_CORRUPTION.
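
The number in the histogram can be checked against the NTSTATUS value. The following is a minimal sketch, assuming the histogram sample is the magnitude of the exit code read as a signed 32-bit integer. That assumption is consistent with the value quoted above but is not confirmed in this thread. `AsSigned32` and `HistogramSampleForExitCode` are hypothetical helper names.

```cpp
#include <cstdint>

// NTSTATUS value for STATUS_HEAP_CORRUPTION, as defined in the Windows SDK.
constexpr uint32_t kStatusHeapCorruption = 0xC0000374u;

// Hypothetical helper: interpret a 32-bit exit code as a signed value
// without relying on implementation-defined narrowing conversions.
constexpr int64_t AsSigned32(uint32_t code) {
  return code >= 0x80000000u
             ? static_cast<int64_t>(code) - (INT64_C(1) << 32)
             : static_cast<int64_t>(code);
}

// Hypothetical helper: the assumed histogram sample, i.e. the absolute
// value of the signed exit code.
constexpr int64_t HistogramSampleForExitCode(uint32_t code) {
  return AsSigned32(code) < 0 ? -AsSigned32(code) : AsSigned32(code);
}

// 0xC0000374 read as a signed 32-bit integer is -1073740940.
static_assert(HistogramSampleForExitCode(kStatusHeapCorruption) == 1073740940,
              "matches the CrashExitCodes.Renderer entry quoted above");
```

The same arithmetic maps EXCEPTION_ACCESS_VIOLATION (0xC0000005) to 1073741819, which may help when reading other entries in the same histogram.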

No way to know if this entry was caused by this crash since the browser had been up for a while. But it warrants investigation over whether STATUS_HEAP_CORRUPTION and terminateonheapcorruption might mean crashpad isn't catching some crashes.

I think this specifically warrants some investigation, can a test be added to crashpad to verify this?

Comment 15 by wfh@chromium.org, Oct 28 2016

Cc: siggi@chromium.org
fun fun..
Yeah, this causes no call to UEF(), and if WER is on, a .dmp in LocalDumps with .ecxr 0xC0000374.

---

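#include <windows.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Unhandled exception filter. Per the observation above, it is not called
// when the heap manager detects the corruption, so "HI" is never printed.
LONG WINAPI UEF(EXCEPTION_POINTERS* ExceptionInfo) {
  printf("HI\n");
  return EXCEPTION_CONTINUE_SEARCH;  // Same value (0) as the original return.
}

int main() {
  SetUnhandledExceptionFilter(&UEF);

  // Allocate a large number of blocks from the process heap.
  std::vector<void*> ptrs;
  for (int i = 0; i < 100000; ++i) {
    void* mem = HeapAlloc(GetProcessHeap(), 0, 10000);
    ptrs.push_back(mem);
  }

  // Deliberately free each block through a pointer offset by -50..+49 bytes
  // to provoke STATUS_HEAP_CORRUPTION (0xC0000374).
  for (void* mem : ptrs) {
    HeapFree(GetProcessHeap(), 0, (char*)mem + (rand() % 100) - 50);
  }
  return 0;
}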

---

Looks like the option in 2013 was to link with nohetoc.obj, which sets

int _NoHeapEnableTerminationOnCorruption = 1;

because https://msdn.microsoft.com/en-us/library/windows/desktop/aa366705(v=vs.85).aspx doesn't allow turning it off once it's on. But I'm not sure if it's the same in 2015.
Yeah, so given https://cs.chromium.org/chromium/src/content/app/content_main_runner.cc?rcl=0&l=532, Crashpad is not going to be able to catch any heap corruption exceptions.

It looks like there was some experimentation here https://bugs.chromium.org/p/chromium/issues/detail?id=394842 related to this. I'm not sure what the outcome of that was.

Comment 19 by wfh@chromium.org, Oct 28 2016

On the positive side of things, this looks like < 0.3% of renderer exit codes - https://uma.googleplex.com/p/chrome/timeline_v2?sid=19d376abf0ec34069bc016f2869f9530 so it shouldn't be happening too often in the wild.

Comment 20 by wfh@chromium.org, Oct 28 2016

on the negative side of things, this has just spiked 30x on canary - https://uma.googleplex.com/p/chrome/timeline_v2?sid=8791f4f8f6c3e707b4dde0479fac23f6

Comment 21 by wfh@chromium.org, Oct 28 2016

Cc: penny...@chromium.org
pennymac - what exit code does CFG produce if it detects a violation?

Comment 22 by mark@chromium.org, Oct 29 2016

> Huh.
>
> https://connect.microsoft.com/VisualStudio/feedback/details/664497/cant-catch-0xc0000374-exception-status-heap-corruption
>
> That might suck.

Sounds like another thing that we might be able to catch if we were using WerRegisterRuntimeExceptionModule() (bug crashpad:133).

Comment 23 by siggi@chromium.org, Oct 31 2016

There are other classes of errors that unfortunately get handled the same way. Stack cookie failures divert straight to kernel32!UnhandledExceptionFilter as a case in point. Gotta wonder if it's possible to grab some of those with an ingenious enough intercept on kernel32!UEF or ntdll!RtlRaiseException, or some such.
Alternatively we might be able to make like a debugger for renderers on Canary, a debugger always gets first-chance exceptions - I don't think this requires any execution in the crashee.
Might need to be sampling or some such.

WER is also cool, though I'm doubtful that it'll ever be workable for Canary :(.

@wfh: Maybe it's worth setting up some chirps on the known-no-crash exit codes?
If Windows catches a CFG violation, it throws the following:

ExceptionAddress: 00007ffa003e9ba0 (ntdll!RtlFailFast2)
   ExceptionCode: c0000409 (Security check failure or stack buffer overrun)
  ExceptionFlags: 00000001
NumberParameters: 1
   Parameter[0]: 000000000000000a
Subcode: 0xa FAST_FAIL_GUARD_ICALL_CHECK_FAILURE
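
The exception record above can be decoded mechanically. The following is a minimal sketch that assumes only the fields shown in the debugger output. `ExceptionSummary`, `IsFastFail`, `IsCfgViolation`, and `FastFailName` are hypothetical names, and the constants are a small subset of the `FAST_FAIL_*` values from winnt.h. It needs C++14 or later because of the `switch` inside a `constexpr` function.

```cpp
#include <cstdint>

// STATUS_STACK_BUFFER_OVERRUN, the exception code used by fast-fail.
constexpr uint32_t kStatusStackBufferOverrun = 0xC0000409u;

// Subset of FAST_FAIL_* subcodes from winnt.h.
constexpr uint64_t kFastFailStackCookieCheckFailure = 2;
constexpr uint64_t kFastFailFatalAppExit = 7;
constexpr uint64_t kFastFailGuardICallCheckFailure = 10;  // 0xa

// Hypothetical, simplified view of the fields printed by the debugger.
struct ExceptionSummary {
  uint32_t code;               // ExceptionCode
  uint32_t number_parameters;  // NumberParameters
  uint64_t parameter0;         // Parameter[0], the fast-fail subcode
};

// True when the record is a fast-fail that carries a subcode.
constexpr bool IsFastFail(const ExceptionSummary& e) {
  return e.code == kStatusStackBufferOverrun && e.number_parameters >= 1;
}

// True when the fast-fail subcode identifies a CFG indirect-call failure.
constexpr bool IsCfgViolation(const ExceptionSummary& e) {
  return IsFastFail(e) && e.parameter0 == kFastFailGuardICallCheckFailure;
}

// Name for the subcodes listed above; anything else is reported as unknown.
constexpr const char* FastFailName(uint64_t subcode) {
  switch (subcode) {
    case kFastFailStackCookieCheckFailure:
      return "FAST_FAIL_STACK_COOKIE_CHECK_FAILURE";
    case kFastFailFatalAppExit:
      return "FAST_FAIL_FATAL_APP_EXIT";
    case kFastFailGuardICallCheckFailure:
      return "FAST_FAIL_GUARD_ICALL_CHECK_FAILURE";
    default:
      return "(subcode not in this sketch)";
  }
}
```

For the record shown above, `IsCfgViolation(ExceptionSummary{0xC0000409u, 1, 0xa})` is true and `FastFailName(0xa)` returns `"FAST_FAIL_GUARD_ICALL_CHECK_FAILURE"`.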


I'll also attach a sample call stack from windbg - the top 3 functions of the call stack being the cfg check signature.

This is a perfect example of a hook function being called from code that is compiled with CFG - and the hook calls an external function that has not been added to the valid function list.
Attachment: Capture.PNG (101 KB)

Comment 25 by wfh@chromium.org, Oct 31 2016

penny, can you induce a CFG violation in a Chrome subprocess, and see what you get in chrome://histograms for CrashExitCodes.Renderer, and whether crashpad collects the crash?
Give me some time.  Nothing set up right now to quickly test/trigger this in a child process.  I'll start setting up a test, but I travel tomorrow.  Will report back here!

Comment 27 by wfh@chromium.org, Feb 28 2017

There is currently a spike in STATUS_HEAP_CORRUPTION on beta; see https://goto.google.com/vxojrchlgk
