Starred by 12 users
Status: Fixed
Owner:
Email to this user bounced
Closed: Aug 2011
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Chrome: Crash Report - Stack Signature: -19BBA34
Project Member Reported by lafo...@chromium.org, May 3 2011
Product: Chrome
Stack Signature: -19BBA34
New Signature Label: RunnableFunction<GpuProcessHostUIShim * (*)(int),Tuple1<int> >::Run()
New Signature Hash: b1ffce06_b49643f6_9d2a5c68_06020831_dedb35da

Report link: http://go/crash/reportdetail?reportid=4b8261fef862d40e

Meta information:
Product Name: Chrome
Product Version: 13.0.754.0
Report ID: 4b8261fef862d40e
Report Time: 2011/05/03 19:22:53, Tue
Uptime: 11 sec
Cumulative Uptime: 0 sec
OS Name: Windows NT
OS Version: 5.1.2600 Service Pack 3
CPU Architecture: x86
CPU Info: GenuineIntel family 15 model 4 stepping 9

Thread 9 *CRASHED* ( EXCEPTION_ACCESS_VIOLATION_READ @ 0x00000000 )

0x0254c3c2	 [chrome.dll	 - task.h:459	RunnableFunction<GpuProcessHostUIShim * (*)(int),Tuple1<int> >::Run()
0x021ba1ad	 [chrome.dll	 - message_loop.cc:100	`anonymous namespace'::TaskClosureAdapter::Run()
0x021baa20	 [chrome.dll	 - message_loop.cc:458	MessageLoop::RunTask(MessageLoop::PendingTask const &)
0x021baaa5	 [chrome.dll	 - message_loop.cc:476	MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const &)
0x021bae46	 [chrome.dll	 - message_loop.cc:666	MessageLoop::DoWork()
0x7c802607	 [kernel32.dll	 + 0x00002607]	WaitForSingleObjectEx
0x021bb88e	 [chrome.dll	 - bind_internal.h:1065	base::internal::InvokerStorage1<void ( `anonymous namespace'::TaskClosureAdapter::*)(void),A0x4166a960::TaskClosureAdapter *>::~InvokerStorage1<void ( `anonymous namespace'::TaskClosureAdapter::*)(void),A0x4166a960::TaskClosureAdapter *>()
0x021d1a00	 [chrome.dll	 - utf_string_conversion_utils.cc:101	base::WriteUnicodeCharacter(unsigned int,std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> > *)
0x021d1b23	 [chrome.dll	 - message_pump_default.cc:50	base::MessagePumpDefault::Run(base::MessagePump::Delegate *)
0x01c7492c	 [chrome.dll	 - generic_allocators.cc:16	generic_cpp_alloc
0x021ba8e4	 [chrome.dll	 - message_loop.cc:406	MessageLoop::RunHandler()
0x021c652f	 [chrome.dll	 - thread.cc:128	base::Thread::Run(MessageLoop *)
0x021c6642	 [chrome.dll	 - thread.cc:164	base::Thread::ThreadMain()


 
Cc: vangelis...@gtempaccount.com amarinic...@gtempaccount.com jam@chromium.org
Owner: apatrick...@gmail.com
This crash started showing up in the latest canary (13.0.754.0) which points to a chromium CL range: 83707 -> 83837 . One possible suspect is:

http://codereview.chromium.org/6901146

and the second one (although much less likely):
http://codereview.chromium.org/6901144
This crash actually appeared in 13.0.751.0. It has a different signature in each build, probably because the optimizer is merging template code that is the same.

751 -> RunnableFunction<void (*)(net::URLRequestContextGetter *),Tuple1<scoped_refptr<net::URLRequ ... 
752 -> RunnableFunction<void (*)(void *),Tuple1<Profile *> >::Run() 
753 -> RunnableFunction<void (*)(IOThread *),Tuple1<IOThread *> >::Run()
754 -> RunnableFunction<GpuProcessHostUIShim * (*)(int),Tuple1<int> >::Run() 
755 -> RunnableFunction<void (*)(`anonymous namespace'::PluginsDOMHandler::ListWrapper *),Tuple1< ... 
756 -> RunnableFunction<void (*)(MessageLoop *),Tuple1<MessageLoop *> >::Run() 

In all cases it involves the invocation of a single argument callback function that leads to a null dereference. It always happens in the browser process.
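For readers unfamiliar with the task types named in these signatures, here is a minimal sketch of a one-argument function task. It is not the code in base/task.h; the names Task, SimpleRunnableFunction and MakeRunnableFunction are illustrative assumptions. The point is that every instantiation does the same thing (load a stored function pointer, pass one stored argument, call), which fits the guess above that identical template code is being merged and reported under varying names.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the abstract task interface.
class Task {
 public:
  virtual ~Task() = default;
  virtual void Run() = 0;
};

// Simplified one-argument function task: stores a function pointer and a
// single argument, then calls the function when the task runs.
template <typename Function, typename Arg>
class SimpleRunnableFunction : public Task {
 public:
  SimpleRunnableFunction(Function function, Arg arg)
      : function_(function), arg_(arg) {}

  void Run() override {
    // An indirect call through the stored pointer. If the pointer (or the
    // stack around the call) is corrupted, the crash surfaces here.
    if (function_)
      function_(arg_);
  }

 private:
  Function function_;
  Arg arg_;
};

// Helper that deduces the template arguments.
template <typename Function, typename Arg>
std::unique_ptr<Task> MakeRunnableFunction(Function function, Arg arg) {
  return std::unique_ptr<Task>(
      new SimpleRunnableFunction<Function, Arg>(function, arg));
}

// Example callback used to exercise the sketch.
int g_last_value = 0;
void RecordValue(int value) { g_last_value = value; }
```

Calling MakeRunnableFunction(&RecordValue, 42)->Run() stores 42 in g_last_value.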

The regression range was probably 83287:83481.

I have no leads.
Cc: piman@chromium.org eroman%c...@gtempaccount.com
Looking at the intersection of plugins and GPU, could it be http://src.chromium.org/viewvc/chrome?view=rev&revision=83442?

-----
piman@google.com	
Rework FlushSync to return early if commands have been processed since the last update

BUG= 80480 
I'm guessing the current top browser crasher on the Canary, http://crash/reportdetail?reportid=01feccecee80e026#crashing_thread, is a relative of this issue.
None of the changes in 83442 appear to affect code that runs in the browser process so I think it is unlikely to be that.

I have inspected every change to code that runs on Windows in the probable regression range 83287:83481 and nothing stands out. I particularly looked for code that runs only on Windows and code that invokes callbacks, especially through a variable that might be null.

laforge, would it be possible to push a canary build with the compiler / linker setting that merges identical generated code disabled, so that we can get an accurate signature for the related callback? I expect the binary would be larger than usual.
I landed this at r84869:

http://codereview.chromium.org/6982004

It will make PostTask assert in Release builds if null is passed to any of the variants of PostTask. It is not firing in 13.0.762.0, which is built from r84939. Since r84939 contains the new assert, this bug is not caused by null being passed to PostTask.
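The idea behind that assert can be sketched as follows. This is a simplified illustration, not the change that landed in r84869; TinyTaskQueue and RELEASE_CHECK are hypothetical names. Checking at post time moves a failure from the thread that runs the task to the thread that posted it, so the crash stack would name the offending caller.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <queue>

// Always-on check. Unlike assert(), it is not compiled out by NDEBUG, so it
// also fires in optimized (release) builds.
#define RELEASE_CHECK(condition)                               \
  do {                                                         \
    if (!(condition)) {                                        \
      std::fprintf(stderr, "Check failed: %s\n", #condition);  \
      std::abort();                                            \
    }                                                          \
  } while (0)

// Hypothetical miniature task queue for one-argument function tasks.
class TinyTaskQueue {
 public:
  using Function = void (*)(int);

  void PostTask(Function function, int arg) {
    // Fail at the post site, where the stack still identifies the caller.
    RELEASE_CHECK(function != nullptr);
    pending_.push(Pending{function, arg});
  }

  // Runs every pending task in order and returns how many were run.
  int RunAll() {
    int count = 0;
    while (!pending_.empty()) {
      Pending next = pending_.front();
      pending_.pop();
      next.function(next.arg);
      ++count;
    }
    return count;
  }

 private:
  struct Pending {
    Function function;
    int arg;
  };
  std::queue<Pending> pending_;
};

// Example callback used to exercise the sketch.
int g_total = 0;
void AddToTotal(int value) { g_total += value; }
```

If the check never fires in the field, as reported here, a null pointer at post time can be ruled out and the corruption must happen later.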

In 13.0.762.0, the signature of the crashing function has changed to this:

RunnableFunction<void (*)(int),Tuple1<int> >::Run()

I still have no leads.
Project Member Comment 7 by bugdroid1@chromium.org, May 14 2011
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=85359

------------------------------------------------------------------------
r85359 | apatrick@chromium.org | Fri May 13 18:06:54 PDT 2011

Changed paths:
 M http://src.chromium.org/viewvc/chrome/trunk/src/base/task.h?r1=85359&r2=85358&pathrev=85359

Added release build assert on attempt to create a RunnableFunction for a function pointer with address 1.

This is actually happening. See  http://crbug.com/81449 . The generated code to invoke the callback puts the address of the function in the EAX register before doing CALL EAX. I see 0x00000001 in the EAX register when it crashes in the reported minidumps.

I'll revert this after the next Canary.

TEST=run locally and verify no assertion
BUG= 81449 
Review URL: http://codereview.chromium.org/7013014
------------------------------------------------------------------------
Project Member Comment 8 by bugdroid1@chromium.org, May 16 2011
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=85547

------------------------------------------------------------------------
r85547 | apatrick@chromium.org | Mon May 16 15:21:28 PDT 2011

Changed paths:
 M http://src.chromium.org/viewvc/chrome/trunk/src/base/task.h?r1=85547&r2=85546&pathrev=85547

Revert 85359 because it did not reveal the site where the crashing task was posted.

Original message:

Added release build assert on attempt to create a RunnableFunction for a function pointer with address 1.

This is actually happening. See  http://crbug.com/81449 . The generated code to invoke the callback puts the address of the function in the EAX register before doing CALL EAX. I see 0x00000001 in the EAX register when it crashes in the reported minidumps.

I'll revert this after the next Canary.

TEST=run locally and verify no assertion
BUG= 81449 
Review URL: http://codereview.chromium.org/7013014

TEST=compiles
BUG= 81449 
------------------------------------------------------------------------
I checked in a change with r85991 that will provide more information about the site the crashing task was posted from in minidumps. This will not fix the bug and the crashes will look the same but hopefully once the next canary goes out I will have more information about this bug.
Project Member Comment 10 by bugdroid1@chromium.org, May 20 2011
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=86172

------------------------------------------------------------------------
r86172 | apatrick@chromium.org | Fri May 20 16:29:23 PDT 2011

Changed paths:
 M http://src.chromium.org/viewvc/chrome/trunk/src/base/debug/alias.cc?r1=86172&r2=86171&pathrev=86172

Try another way to alias a variable in optimized builds.

The previous way did not fool LTCG optimization.

I tested that this works by doing an LTCG build without this change and verifying that the compiler strips out the assignment to program_counter in MessageLoop::RunTask. Then I repeated with this change and verified that the compiler did not strip it out.

TEST=compiles plus the above
BUG= 81449 
Review URL: http://codereview.chromium.org/7054025
------------------------------------------------------------------------
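The general technique of aliasing a variable so it survives optimization can be sketched as below. This is not the code from r86172, and it has not been checked against an LTCG build; AliasForDebugging, g_alias_sink and RunWithBreadcrumb are hypothetical names. The assumption is that passing a local's address to a non-inlined function which stores it in a volatile global forces the compiler to keep the local in memory, where a minidump can capture it.

```cpp
#include <cassert>
#include <cstdint>

#if defined(_MSC_VER)
#define SKETCH_NOINLINE __declspec(noinline)
#else
#define SKETCH_NOINLINE __attribute__((noinline))
#endif

// Volatile sink: every store to it must be treated as observable. The stored
// address is only a marker and is never dereferenced.
const void* volatile g_alias_sink = nullptr;

// Takes the address of a local so the optimizer has to materialize the local
// on the stack instead of keeping it in a register or dropping it.
SKETCH_NOINLINE void AliasForDebugging(const void* variable) {
  g_alias_sink = variable;
}

// Records the function address in a stack slot before making the call, so
// the value is present in the frame if the call crashes.
int RunWithBreadcrumb(int (*function)(int), int argument) {
  assert(function != nullptr);
  std::uintptr_t posted_function = reinterpret_cast<std::uintptr_t>(function);
  AliasForDebugging(&posted_function);
  return function(argument);
}

// Example callback used to exercise the sketch.
int Twice(int value) { return value * 2; }
```

Whether a particular whole-program optimizer respects this has to be verified on the generated code, as the commit message above describes.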
Cc: ajwong@chromium.org
I tracked down where the task that crashes is posted. It is here in child_process_launcher.cc:

  void Terminate() {
    if (!process_.handle())
      return;

    // On Posix, EnsureProcessTerminated can lead to 2 seconds of sleep!  So
    // don't do this on the UI/IO threads.
    BrowserThread::PostTask(
        BrowserThread::PROCESS_LAUNCHER, FROM_HERE,
        NewRunnableFunction(
            &ChildProcessLauncher::Context::TerminateInternal,
#if defined(OS_LINUX)
            zygote_,
#endif
            process_.handle()));
    process_.set_handle(base::kNullProcessHandle);
  }

So the good news is this probably isn't causing any harm because it happens at process termination. The bad news is it is not clear why the task calls a "random" address when it crashes since the function is a constant. This code has been around in this form for a long time and there was no problem before.

It is possible the task was destroyed before it was run, perhaps the recent task / closure refactoring is involved.
Project Member Comment 12 by bugdroid1@chromium.org, May 24 2011
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=86447

------------------------------------------------------------------------
r86447 | apatrick@chromium.org | Tue May 24 10:57:46 PDT 2011

Changed paths:
 M http://src.chromium.org/viewvc/chrome/trunk/src/base/debug/alias.h?r1=86447&r2=86446&pathrev=86447
 M http://src.chromium.org/viewvc/chrome/trunk/src/base/task.h?r1=86447&r2=86446&pathrev=86447

Store information about invoked RunnableFunction on stack to aid debugging of canary channel crashes.

TEST=compiles
BUG= 81449 
Review URL: http://codereview.chromium.org/7066006
------------------------------------------------------------------------
Project Member Comment 13 by bugdroid1@chromium.org, May 24 2011
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=86448

------------------------------------------------------------------------
r86448 | apatrick@chromium.org | Tue May 24 10:57:55 PDT 2011

Changed paths:
 M http://src.chromium.org/viewvc/chrome/trunk/src/base/tracked.cc?r1=86448&r2=86447&pathrev=86448

Prevent MSVC from inlining GetProgramCounter for LTCG builds.

This should ensure that it gets the frame where FROM_HERE is used, rather than its caller.

TEST=compiles
BUG= 81449 
Review URL: http://codereview.chromium.org/7067004
------------------------------------------------------------------------
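Why inlining matters here can be shown with a small sketch. It is an illustration under assumptions, not the code in base/tracked.cc; CurrentProgramCounter, Location and POSTED_FROM_HERE are hypothetical names. A function that reports its own return address only identifies the posting site if it is a real call; once inlined, the "return address" belongs to an outer frame.

```cpp
#include <cassert>

#if defined(_MSC_VER)
#include <intrin.h>
#define SKETCH_NOINLINE __declspec(noinline)
#define SKETCH_RETURN_ADDRESS() _ReturnAddress()
#else
#define SKETCH_NOINLINE __attribute__((noinline))
#define SKETCH_RETURN_ADDRESS() __builtin_return_address(0)
#endif

// Returns the address the call will return to, i.e. a code address inside
// the function that invoked it. Must not be inlined, otherwise the value
// would describe the caller's caller instead.
SKETCH_NOINLINE const void* CurrentProgramCounter() {
  return SKETCH_RETURN_ADDRESS();
}

// Hypothetical record of where a task was posted from.
struct Location {
  const char* function_name;
  const void* program_counter;
};

inline Location MakeLocation(const char* function_name,
                             const void* program_counter) {
  assert(function_name != nullptr);
  return Location{function_name, program_counter};
}

// Captures the enclosing function's name and a code address inside it.
#define POSTED_FROM_HERE() MakeLocation(__func__, CurrentProgramCounter())
```

A task queue could store the resulting Location alongside each task so that it is on the stack when the task runs.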
Project Member Comment 14 by bugdroid1@chromium.org, May 27 2011
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=87052

------------------------------------------------------------------------
r87052 | apatrick@chromium.org | Fri May 27 11:30:07 PDT 2011

Changed paths:
 M http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/child_process_launcher.cc?r1=87052&r2=87051&pathrev=87052

Turn off optimization for ChildProcessLauncher::Context::TerminateInternal.

This is to try and get more information about a crash.

BUG= 81449 
Review URL: http://codereview.chromium.org/6976042
------------------------------------------------------------------------
Cc: finnur@chromium.org
I'm fairly certain the crash is in ChildProcessLauncher::Context::TerminateInternal now. This function does not do all that much. I believe the crash results from TerminateInternal making a system call, I think most likely TerminateProcess, which has been hooked by a third party DLL called "utcclb.dll":

0x00af7957	 [utcclb.dll	 + 0x00017957]	
0x00b8b564	 [utcclb.dll	 + 0x000ab564]	
0x00b8b889	 [utcclb.dll	 + 0x000ab889]	
0x5f040009			
0x026a2aab	 [chrome.dll	 - child_process_launcher.cc:256]	ChildProcessLauncher::Context::TerminateInternal(void *)
0x02229d4e	 [chrome.dll	 - task.h:458]	RunnableFunction<void (*)(int),Tuple1<int> >::Run()
0x0213bb66	 [chrome.dll	 - message_loop.cc:367]	MessageLoop::RunTask(Task *)
0x0213bbed	 [chrome.dll	 - message_loop.cc:376]	MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const &)
0x0213bf9a	 [chrome.dll	 - message_loop.cc:569]	MessageLoop::DoWork()
0x02151bc4	 [chrome.dll	 - message_pump_default.cc:50]	base::MessagePumpDefault::Run(base::MessagePump::Delegate *)
0x0213bae7	 [chrome.dll	 - message_loop.cc:342]	MessageLoop::RunInternal()
0x0213ba6c	 [chrome.dll	 - message_loop.cc:315]	MessageLoop::RunHandler()
0x0213b960	 [chrome.dll	 - message_loop.cc:239]	MessageLoop::Run()
0x021505e8	 [chrome.dll	 - thread.cc:128]	base::Thread::Run(MessageLoop *)
0x021506fb	 [chrome.dll	 - thread.cc:164]	base::Thread::ThreadMain()
0x02142aed	 [chrome.dll	 - platform_thread_win.cc:37]	base::`anonymous namespace'::ThreadFunc(void *)
0x7c80b50a	 [kernel32.dll	 + 0x0000b50a]	BaseThreadStart

In this example the DLL is still loaded. I think in other cases the DLL might be unloaded before the crash, which would prevent it from showing up on the call stack or the modules list. The third party DLL is part of "Internet Explorer Security Pro".

http://www.mybestsoft.com/products.html

There are indications that the DLL might be associated with a key logger:

http://www.emsisoft.de/en/malware/Adware.Win32.Parental_Control_Tool_7.2-remove.aspx
http://pchomesoft.com/


 Issue 87619  has been merged into this issue.
Comment 17 by darin@chromium.org, Jul 26 2011
Cc: apatrick@chromium.org
 Issue 85467  has been merged into this issue.
Any updates? This crash is happening in 14.0.835.8 and it's one of the top crashes.
http://crash/reportdetail?reportid=050fbe14293e24dd
I have been unable to determine the cause of the crash. What progress I have made is noted above.
The signature changed again:


0x62baf0ca	 [chrome.dll	 - task.h:474]	RunnableFunction<void (*)(MessageLoop *),Tuple1<MessageLoop *> >::Run()
0x6303b308	 [chrome.dll	 - child_process_launcher.cc:257]	ChildProcessLauncher::Context::SetProcessBackgrounded(bool)
0x62b6c1a7	 [chrome.dll	 - task.cc:57]	base::subtle::TaskClosureAdapter::Run()
0x62b630d1	 [chrome.dll	 - message_loop.cc:486]	MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const &)
0x6303b282	 [chrome.dll	 - child_process_launcher.cc:250]	ChildProcessLauncher::Context::Terminate()
0x62b6344c	 [chrome.dll	 - message_loop.cc:677]	MessageLoop::DoWork()
0x629e6635	 [chrome.dll	 - bind.h:57]	base::Bind<void ( remoting::ChromotingInstance::*)(std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &),base::internal::UnretainedWrapper<remoting::ChromotingInstance>,std::basic_string<char,std::char_traits<char>,std::allocator<char> > >(void ( remoting::ChromotingInstance::*)(std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &),base::internal::UnretainedWrapper<remoting::ChromotingInstance> const &,std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &)
0x6303b282	 [chrome.dll	 - child_process_launcher.cc:250]	ChildProcessLauncher::Context::Terminate()
0x62b7b3bd	 [chrome.dll	 - message_pump_default.cc:42]	base::MessagePumpDefault::Run(base::MessagePump::Delegate *)
0x62b7b411	 [chrome.dll	 - message_pump_default.cc:50]	base::MessagePumpDefault::Run(base::MessagePump::Delegate *)
0x6299bab1	 [chrome.dll	 - allocator_shim.cc:124]	malloc
0x62b62f3c	 [chrome.dll	 - message_loop.cc:410]	MessageLoop::RunHandler()
0x62b73485	 [chrome.dll	 - thread.cc:128]	base::Thread::Run(MessageLoop *)
0x62b73598	 [chrome.dll	 - thread.cc:164]	base::Thread::ThreadMain()

This part is a red herring:

0x629e6635	 [chrome.dll	 - bind.h:57]	base::Bind<void ( remoting::ChromotingInstance::*)(std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &),base::internal::UnretainedWrapper<remoting::ChromotingInstance>,std::basic_string<char,std::char_traits<char>,std::allocator<char> > >(void ( remoting::ChromotingInstance::*)(std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &),base::internal::UnretainedWrapper<remoting::ChromotingInstance> const &,std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &)

It only runs in the plugin process whereas the crash only happens in the browser process. The call stack is not to be believed :(
Cc: davemoore@chromium.org
 Issue 92212  has been merged into this issue.
Digging into this some more, just before crashing, it seems to jump to some code residing on the stack. This might just mean the chain of frame pointers is corrupt though. Call stack:

 	00000000()	
 	034ef79c()	
>	chrome.dll!RunnableFunction<void (__cdecl*)(v8::Persistent<v8::Context>),Tuple1<v8::Persistent<v8::Context> > >::Run()  Line 474 + 0x5 bytes	C++
 	chrome.dll!base::subtle::TaskClosureAdapter::Run()  Line 58	C++
 	chrome.dll!MessageLoop::RunTask(const MessageLoop::PendingTask & pending_task={...})  Line 472	C++
 	chrome.dll!MessageLoop::DeferOrRunPendingTask(const MessageLoop::PendingTask & pending_task={...})  Line 489	C++
 	chrome.dll!MessageLoop::DoWork()  Line 677 + 0xb bytes	C++


The second code address from the top of the call stack is on the stack, as can be seen from the values of the ESP and EBP registers at the time of the crash.

ESP = 034EF78C
EBP = 034EF79C

A little bit of statistical data for the "SetProcessBackgrounded" crashes in 13.0.782.112. Probably the most interesting anomaly is the correlation to "import".

Comment #15 is interesting, I will look at some sample minidumps next and see if I can confirm that theory.

(a) This is the single biggest browser crash signature, accounting for 9.27% of crashes (7.25% if you count by user).

(b) The crash happens *very* quickly. In other words, this isn’t some passive cruft/memory corruption that happens over time. That is good news for debugging ;)
  83.69% of the crashes happen in under 30 seconds
  52.69% of the crashes happen in under 8 seconds
  20.80% of the crashes happen in under 4 seconds
  10.83% of the crashes happen in under 2 seconds
  4.76% of the crashes happen in under 1 second
  1.67% of the crashes happen in under 500 milliseconds

(c) There appears to be a bias towards this happening during import. I can see this in the distribution of command line flags. Specifically, 4.79% of these crashes have the flag --import=*, which indicates it happened during an import process (rather than a regular browser session). That may not sound significant, but note that across all the other browser crashes, --import=* is only seen in 0.47% of the crashes. Stated differently, 49% of all the crashes during import are in "SetProcessBackgrounded". This doesn't prove that it is caused by import, but it does suggest that it is more easily hit during the import codepath.

(d) 8% of these crashes occur during shutdown.

(e) This mainly is happening on Windows XP. Note however, that this isn't very far off from the ordinary distribution of crashes by platform, so I wouldn't read too much into it.
   62.27% on WinXP
   30.05% on Win7
   7.68% on WinVista

(f) It is not extension related (I don’t see any meaningful clustering of chrome extensions; in fact the majority have no extensions).

(g) For some users, this is a highly reproducible, chronic crash.

For instance, looking at top crasher 5A51363FC5294D3ABC1668646E145671, they hit this crash at the following times today.

2011/08/10 19:31:58, Wed	
2011/08/10 19:31:41, Wed	
2011/08/10 19:22:32, Wed	
2011/08/10 19:22:10, Wed	
2011/08/10 18:39:49, Wed	
2011/08/10 18:39:25, Wed	
2011/08/10 18:38:53, Wed	
2011/08/10 18:31:25, Wed	
2011/08/10 18:31:23, Wed	
2011/08/10 18:23:32, Wed	
2011/08/10 18:23:26, Wed	
2011/08/10 18:02:52, Wed	
2011/08/10 17:54:26, Wed	
2011/08/10 17:53:52, Wed	
2011/08/10 17:53:22, Wed	
2011/08/10 17:39:38, Wed	
2011/08/10 17:38:56, Wed	
2011/08/10 17:38:50, Wed	
2011/08/10 17:30:58, Wed	
2011/08/10 17:30:42, Wed	
2011/08/10 17:23:50, Wed	
2011/08/10 17:23:22, Wed	
2011/08/10 17:02:28, Wed	
2011/08/10 16:54:30, Wed	
2011/08/10 16:53:50, Wed	
2011/08/10 16:53:38, Wed	
2011/08/10 16:38:53, Wed	
2011/08/10 16:38:14, Wed	
2011/08/10 16:38:00, Wed	
2011/08/10 15:53:44, Wed	
2011/08/10 15:53:42, Wed	
2011/08/10 15:53:29, Wed	
2011/08/10 14:41:48, Wed	

The timings above are definitely fishy (suggesting this is the most patient user ever, restarting the browser seconds after crashing). In fact many of those timings are impossible, since the process uptime suggests more time elapsed than the start time of the next crash implies. It is important to realize that these timestamps are actually the time when the crash server *processed* the report, and not necessarily when the crash was generated by the client. I opened a couple of the minidumps and explored the PID. On Windows, process IDs are monotonically increasing per session, so this gives a better sense of the relative timings. In fact the order above does not match the PIDs, so clearly this listing is out of order.

Whatever the case, it is clear that some users are hitting this crash very frequently, hence it is highly reproducible for them. We could probably get more help by prompting users that hit the crash (sorta like I did for the cookiemonster memory corruption in the past).
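The "stated differently" figure in point (c) can be roughly reproduced from the percentages in (a) and (c) with Bayes' rule. The sketch below assumes the 9.27% share from (a) and the two --import rates from (c) describe the same population of browser crashes; the function name is made up for illustration. With those inputs the result is about 51%, in line with the quoted 49%; the small gap presumably comes from rounding or slightly different sample sets.

```cpp
#include <cassert>

// Share of all import-time crashes that carry the signature, given:
//   signature_share          P(signature) among all browser crashes
//   import_rate_in_signature P(--import | signature)
//   import_rate_elsewhere    P(--import | any other signature)
double ShareOfImportCrashes(double signature_share,
                            double import_rate_in_signature,
                            double import_rate_elsewhere) {
  assert(signature_share >= 0.0 && signature_share <= 1.0);
  const double import_with_signature =
      signature_share * import_rate_in_signature;
  const double import_without_signature =
      (1.0 - signature_share) * import_rate_elsewhere;
  const double all_import = import_with_signature + import_without_signature;
  if (all_import == 0.0)
    return 0.0;  // No import crashes at all, so the share is undefined.
  return import_with_signature / all_import;
}
```

For example, ShareOfImportCrashes(0.0927, 0.0479, 0.0047) evaluates to roughly 0.51.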
I think I determined what causes the crash.

TerminateInternal calls Process::Terminate. In turn, Process::Terminate calls the Win32 function TerminateProcess, which is stdcall. When TerminateProcess returns, the address in the ESP register is 4 bytes too low and points to the frame pointer for the caller's frame. The RET instruction in Process::Terminate then reads the frame pointer off the stack instead of the address it should return to. It then attempts to jump into the caller's frame, which explains why there is a stack address on the call stack when it crashes.

A possibility is that the entry for TerminateProcess in the dispatch table has been hooked to point to a function in a third party DLL and the implementation of the replacement is buggy. It might, for example, not be stdcall or it might not take two 32-bit arguments.

To determine if this is the case I will check in some code to read the contents of that entry in the dispatch table and record it on the stack prior to the crash. If it does not point to an address in kernel32.dll, that would indicate it has been hooked.
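The mechanism described above can be modelled with a toy stack. This is a simulation under assumptions, not a reconstruction of the real generated code: it assumes the wrapper's epilogue is "pop ebp; ret" with no "mov esp, ebp", and that a buggy replacement removes only 4 of the 8 argument bytes. All constants are illustrative values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative values only.
constexpr std::uint32_t kCallerReturnAddress = 0x02229D4E;  // code address
constexpr std::uint32_t kSavedFramePointer = 0x034EF79C;    // stack address
constexpr std::uint32_t kExitCode = 0x00000001;
constexpr std::uint32_t kProcessHandle = 0x00000ABC;
constexpr std::uint32_t kReturnIntoWrapper = 0x026A2AAB;

// A stack of 32-bit words; the back of the vector is the top of the stack.
class FakeStack {
 public:
  void Push(std::uint32_t value) { words_.push_back(value); }

  std::uint32_t Pop() {
    assert(!words_.empty());
    const std::uint32_t value = words_.back();
    words_.pop_back();
    return value;
  }

  // Models the argument cleanup of "ret N": discards N bytes.
  void Discard(std::size_t bytes) {
    assert(bytes % 4 == 0 && bytes / 4 <= words_.size());
    words_.resize(words_.size() - bytes / 4);
  }

 private:
  std::vector<std::uint32_t> words_;
};

// Returns the address the wrapper's final RET jumps to when the callee
// removes callee_cleanup_bytes of arguments on return.
std::uint32_t SimulateTerminate(std::size_t callee_cleanup_bytes) {
  FakeStack stack;
  stack.Push(kCallerReturnAddress);  // CALL into the wrapper
  stack.Push(kSavedFramePointer);    // wrapper prologue: push ebp
  stack.Push(kExitCode);             // second argument
  stack.Push(kProcessHandle);        // first argument
  stack.Push(kReturnIntoWrapper);    // CALL TerminateProcess

  // Callee returns: pops its return address, then removes its arguments.
  const std::uint32_t resumed_at = stack.Pop();
  assert(resumed_at == kReturnIntoWrapper);
  (void)resumed_at;
  stack.Discard(callee_cleanup_bytes);

  stack.Pop();         // wrapper epilogue: pop ebp
  return stack.Pop();  // wrapper epilogue: ret
}
```

SimulateTerminate(8) models a correct stdcall callee and returns kCallerReturnAddress. SimulateTerminate(4) leaves one argument behind, so the final RET picks up kSavedFramePointer, a stack address.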
@apatrick: thanks! I definitely agree with your analysis that the problem is happening somewhere in TerminateInternal, probably due to hooking of terminate process. Capturing that information would be great.

To add to the great analysis you have already done, I noticed that we usually have the code for one of the mysterious frames on the callstack. It isn't mapped to any of the loaded DLLs, nor any in the list of unloaded DLLs. However it is valid code, albeit hokey-looking. It generally looks the same, something like this:

039bfd6c a0fd9b038a      mov     al,byte ptr ds:[8A039BFDh]
039bfd71 4f              dec     edi
039bfd72 c401            les     eax,fword ptr [ecx]
039bfd74 90              nop
039bfd75 fd              std
039bfd76 9b              wait
039bfd77 030e            add     ecx,dword ptr [esi]
039bfd79 5b              pop     ebx
039bfd7a c401            les     eax,fword ptr [ecx]
039bfd7c f0236c0568      lock and ebp,dword ptr [ebp+eax+68h]
039bfd81 fe              ???
039bfd82 9b              wait
039bfd83 0370fe          add     esi,dword ptr [eax-2]

Lastly I sampled a number of minidumps, and found that they almost always contain a keyboard DLL in the recently unloaded modules list. For instance, these are the top names I saw:

KBDLA.DLL
KBDFR.DLL
KBDSP.DLL
KBDUS.DLL

I don't know enough about windows to know if this is abnormal, but it sounds a bit fishy and I can't explain it.

Cheers.
Comment 26 by k...@google.com, Aug 11 2011
Labels: ReleaseBlock-Stable Mstone-14
This is now our top browser crash on 14.
Comment 27 Deleted
BTW, apatrick's latest instrumentation code is now live on the windows canary. So far there has been just 1 crash report: http://crash/reportdetail?reportid=6e591eaf344c0102

According to that report, the address for TerminateProcess (i.e [chrome.dll!_imp__TerminateProcess]) is in fact pointing to kernel32!TerminateProcess... so no smoking gun yet in terms of hooking.

A couple ideas:

  - It could be that the code for kernel32!TerminateProcess is being patched directly. We could try copying a couple bytes of it into our minidump in case we can spot some re-writing.

  - TerminateProcess is really just a wrapper around NtTerminateProcess. We can similarly instrument NtTerminateProcess in case that is the one getting hooked.

  - We are getting the address for TerminateProcess as part of the thread start, perhaps the hooking is simply happening later on. However I looked at all the threads in the dump, and they all had the same value, so this is less likely.
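The import-entry test used on that report comes down to an address range check. A minimal sketch, assuming the module's base address and image size are already known (on Windows they could be obtained from something like GetModuleInformation); ModuleRange and PointsIntoModule are hypothetical names and the numbers in the usage note are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Address range occupied by a loaded module.
struct ModuleRange {
  std::uintptr_t base;
  std::size_t size;
};

// True if the address lies inside the module's mapped image. An import
// entry that resolves outside the expected module suggests it was redirected.
bool PointsIntoModule(std::uintptr_t address, const ModuleRange& module) {
  return address >= module.base && address - module.base < module.size;
}
```

For example, with a made-up range of {0x75400000, 0x00100000} for kernel32.dll, the stub address 0x75419DE1 is inside it, while 0x00AF7957 (the utcclb.dll frame seen earlier) is not. As noted above, a clean result only rules out hooking of the import entry itself.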
This is the series of jumps and calls between Process::Terminate and entry into the kernel.

chrome.dll!base::Process::Terminate:
...
call        dword ptr [__imp__TerminateProcess@8 (6046E18Ch)]
...

kernel32.dll!_TerminateProcessStub@8:
75419DE1  mov         edi,edi 
75419DE3  push        ebp  
75419DE4  mov         ebp,esp 
75419DE6  pop         ebp  
75419DE7  jmp         _TerminateProcess@8 (754010A2h) 

kernel32.dll!_TerminateProcess@8:
754010A2  jmp         dword ptr [__imp__TerminateProcess@8 (75400874h)] 

KernelBase.dll!_TerminateProcess@8:
7638E804  mov         edi,edi 
7638E806  push        ebp  
7638E807  mov         ebp,esp 
7638E809  cmp         dword ptr [ebp+8],0 
7638E80D  jne         _TerminateProcess@8+15h (7638E819h) 
7638E80F  push        6    
7638E811  call        dword ptr [__imp__RtlSetLastWin32Error@4 (76381044h)] 
7638E817  jmp         _TerminateProcess@8+3Bh (7638E83Fh) 
7638E819  push        dword ptr [ebp+0Ch] 
7638E81C  push        dword ptr [ebp+8] 
7638E81F  call        _RtlReportSilentProcessExit@8 (763B68A4h) 
7638E824  push        dword ptr [ebp+0Ch] 
7638E827  push        dword ptr [ebp+8] 
7638E82A  call        dword ptr [__imp__NtTerminateProcess@8 (763811DCh)] 
7638E830  test        eax,eax 
7638E832  jl          _TerminateProcess@8+35h (7638E839h) 
7638E834  xor         eax,eax 
7638E836  inc         eax  
7638E837  jmp         _TerminateProcess@8+3Dh (7638E841h) 
7638E839  push        eax  
7638E83A  call        _BaseSetLastNTError@4 (763B6CE2h) 
7638E83F  xor         eax,eax 
7638E841  pop         ebp  
7638E842  ret         8    

ntdll.dll!_ZwTerminateProcess@8:
7700FC40  mov         eax,29h 
7700FC45  xor         ecx,ecx 
7700FC47  lea         edx,[esp+4] 
7700FC4B  call        dword ptr fs:[0C0h] 
7700FC52  add         esp,4 
7700FC55  ret         8    

I think TerminateProcess could be hooked in a number of places.

1) The chrome.dll!__imp__TerminateProcess@8 import entry could be hooked. This is relatively easy to check.

2) The code of kernel32.dll!_TerminateProcessStub@8 could be modified. This is also relatively easy to check by copying the six bytes referenced by __imp__TerminateProcess on to the stack or by looking up the entry with GetProcAddress.

3) The code of kernel32.dll!_TerminateProcess@8 could be modified. This could be checked by looking up the entry with GetProcAddress.

4) The kernel32.dll!__imp__TerminateProcess@8 import entry could be hooked. Checking this would involve locating the kernel32.dll import table, which is more complicated.

5) The code of KernelBase.dll!_TerminateProcess@8 could be modified. Again, the code could be copied to the stack with GetProcAddress.

6) The KernelBase.dll!__imp__NtTerminateProcess@8 import entry could be hooked. Same problem as with 4). I think this is less likely though, as the function that appears to be hooked in does not clean up the stack correctly and if that is the case, KernelBase.dll!_TerminateProcess@8 would fail to return.

7) The code of ntdll.dll!_ZwTerminateProcess@8 could be modified. I think this is unlikely for the same reason as 6).

If I were trying to intercept calls to TerminateProcess, I would tend to do 4) or 6) because it makes no assumptions about code layout in different versions of DLLs and because it saves hooking the import table of every loaded DLL; only the import tables of kernel32.dll or KernelBase.dll respectively would need to be hooked. I think it is not 6) because of the symptoms of the crash. 4) seems the more likely candidate to me.

As eroman noted, checking these things in base::ThreadFunc might be too early; the hooking might take place later. The latest time we could collect this information and have it visible on the stack at the time of the crash would be TaskClosureAdapter::Run I think. However, the above is a lot of work to do in a function that is called relatively frequently.

I have tried to do some of these things and I have been unable to prevent the optimizer from stripping out the diagnostic information.
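The byte-copy idea from option 2) above could feed a check like the following sketch. It assumes the function under test normally begins with the "mov edi,edi / push ebp / mov ebp,esp" sequence shown in the disassembly (bytes 8B FF 55 8B EC) and treats a leading JMP opcode as a sign of patching. PrologueState and ClassifyPrologue are hypothetical names. It would not apply to functions that legitimately start with a jump, such as kernel32.dll!_TerminateProcess@8 in the listing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// x86 encoding of: mov edi,edi / push ebp / mov ebp,esp
constexpr std::uint8_t kExpectedPrologue[] = {0x8B, 0xFF, 0x55, 0x8B, 0xEC};

enum class PrologueState { kExpected, kJumpPatched, kUnknown };

// Classifies the first bytes of a function that were copied out for
// inspection (for example into a buffer that ends up in a minidump).
PrologueState ClassifyPrologue(const std::uint8_t* code, std::size_t length) {
  if (code == nullptr || length < sizeof(kExpectedPrologue))
    return PrologueState::kUnknown;
  if (std::memcmp(code, kExpectedPrologue, sizeof(kExpectedPrologue)) == 0)
    return PrologueState::kExpected;
  // 0xE9 is JMP rel32 and 0xEB is JMP rel8, both common in inline hooks.
  if (code[0] == 0xE9 || code[0] == 0xEB)
    return PrologueState::kJumpPatched;
  return PrologueState::kUnknown;
}
```

A kUnknown result is not proof of tampering; it only means the bytes match neither pattern and need a manual look.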
I suspect we may never get much more useful information from these crashes (as to point the user to some solution, other than to scan the computer for malware).

However, it looks like the dump from comment 28 may actually be happening right after returning from CloseHandle, and not TerminateProcess. Of course it is still possible that a hook on TerminateProcess could corrupt the stack enough, but the pattern of this crash is easier to explain if the corruption happens inside CloseHandle, because it looks like the first two stack positions are fine, but there is a null at the third one, and we write that zero at the end of Process::Close():

03d7fd4c  03d7fd64
03d7fd50  01ea6f3a chrome_1c30000!RunnableFunction<void (__cdecl*)
03d7fd54  00000000

So I'd say that we crash attempting to return from Process::Close.

In any case, if we want to gather more data, the place to do that would be TerminateInternal... we could check the IAT for CloseHandle and TerminateProcess, and maybe grab a few bytes from the preamble of the target code... but most likely it will just point to a random address not part of any loaded DLL :(.
I'll see what effect this has, if any. If it has no effect then I think it rules out hooking of either TerminateProcess or CloseHandle.

http://codereview.chromium.org/7640008/

I landed the patch in #31.
Comment 33 by kbr@chromium.org, Aug 15 2011
Cc: kbr@chromium.org
Comment 34 by k...@google.com, Aug 16 2011
Labels: Stability-CodeYellow
Looking at the crashes reported for 15.0.854.0 and 15.0.854.1000, which contain the patch mentioned in #31, there have been 315 and 416 browser process crashes respectively at the time of writing. I don't see any instances of this crash.

If this is indeed fixed, I still don't know whether it is CloseHandle or TerminateProcess that is being hooked but, given the severity of the crash, I think the patch could still potentially be merged into other branches before narrowing it down further.
Labels: Merge-Requested
Merge requested for this:
http://codereview.chromium.org/7640008/
Comment 37 by k...@google.com, Aug 18 2011
Labels: -Merge-Requested Merge-Approved
Status: Fixed
Comment 39 by k...@google.com, Aug 22 2011
Status: Started
I don't see this merged yet, moving back to started.
Comment 40 by k...@google.com, Aug 23 2011
Labels: -Merge-Approved Merge-Merged
Status: Fixed
Never mind, I see it now.
Follow up. Only a potential TerminateProcess intercept is bypassed at this point (r97407). The patch appears to be holding with CloseHandle called in the regular way. I still don't know what is hooking TerminateProcess or why.
Cc: willchan@chromium.org ananta@chromium.org jar@chromium.org wtc@chromium.org darin@chromium.org tommi@chromium.org
 Issue 73215  has been merged into this issue.
Project Member Comment 43 by bugdroid1@chromium.org, Oct 13 2012
Labels: Restrict-AddIssueComment-Commit
This issue has been closed for some time. No one will pay attention to new comments.
If you are seeing this bug or have new data, please click New Issue to start a new bug.
Project Member Comment 44 by bugdroid1@chromium.org, Mar 10 2013
Labels: -Area-Internals -Feature-GPU -Mstone-14 Cr-Internals-GPU Cr-Internals M-14
Project Member Comment 45 by bugdroid1@chromium.org, Mar 13 2013
Labels: -Restrict-AddIssueComment-Commit Restrict-AddIssueComment-EditIssue
Project Member Comment 46 by bugdroid1@chromium.org, Mar 6 2015
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/639696bcf242b514f41bab37ba08f46291ff212d

commit 639696bcf242b514f41bab37ba08f46291ff212d
Author: rvargas <rvargas@chromium.org>
Date: Fri Mar 06 19:15:08 2015

Align base::Process::Terminate with base::KillProcess for Windows.

BUG=417532,  81449 

Review URL: https://codereview.chromium.org/982973003

Cr-Commit-Position: refs/heads/master@{#319475}

[modify] http://crrev.com/639696bcf242b514f41bab37ba08f46291ff212d/base/process/process_win.cc
