Chrome: Crash Report - crashpad::SessionEndWatcher::ThreadMain
Issue description
Product name: Chrome
Magic Signature: crashpad::SessionEndWatcher::ThreadMain
Current link: https://crash.corp.google.com/browse?q=product.name%3D'Chrome'%20AND%20product.version%3D'58.0.3026.3'%20AND%20custom_data.ChromeCrashProto.channel%3D'dev'%20AND%20custom_data.ChromeCrashProto.ptype%3D'crashpad-handler'%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D'crashpad%3A%3ASessionEndWatcher%3A%3AThreadMain'%20AND%20ReportID%3D'845bb8ce20000000'&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D#3
Search properties:
product.name: Chrome
product.version: 58.0.3026.3
custom_data.chromecrashproto.channel: dev
custom_data.chromecrashproto.ptype: crashpad-handler
custom_data.chromecrashproto.magic_signature_1.name: crashpad::SessionEndWatcher::ThreadMain
reportid: 845bb8ce20000000
Metadata:
Product Name: Chrome
Product Version: 58.0.3026.3
Report ID: 845bb8ce20000000
Report Time: Sat, 04 Mar 2017 02:19:49 GMT
Uptime: 0 ms
Cumulative Uptime: 0 ms
User Email:
OS Name: Windows NT
OS Version: 6.1.7601 Service Pack 1
CPU Architecture: x86
CPU Info: GenuineIntel family 6 model 37 stepping 5
Stack Trace:
=================
Thread 1 CRASHED [EXCEPTION_ACCESS_VIOLATION_EXEC @ 0x7785e1e1 ] MAGIC SIGNATURE THREAD
Stack Quality: 84%
0x7785e1e1
0x0041e22e (user32.dll + 0x0000e22e ) __fnHkINDWORD
0x7775702d (ntdll.dll + 0x0004702d ) KiUserCallbackDispatcher
0x77756fdf (ntdll.dll + 0x00046fdf ) KiUserApcDispatcher
0x0041ec53 (user32.dll + 0x0000ec53 ) _CreateWindowEx
0x0041ecae (user32.dll + 0x0000ecae ) CreateWindowExW
0x01285dcc (chrome.exe -session_end_watcher.cc:159 ) crashpad::SessionEndWatcher::ThreadMain()
0x01288bea (chrome.exe -thread_win.cc:38 ) crashpad::Thread::ThreadEntryThunk(void *)
0x7745ee1b (kernel32.dll + 0x0004ee1b ) BaseThreadInitThunk
0x777737ea (ntdll.dll + 0x000637ea ) __RtlUserThreadStart
0x777737bd (ntdll.dll + 0x000637bd ) _RtlUserThreadStart
This crash first started in 58.0.3022.0, with a spike observed in canary 58.0.3028.0. So far there are 20 instances from 20 clients.
Link to the list of builds
===============================================
https://crash.corp.google.com/browse?q=product.name%3D%27Chrome%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27crashpad-handler%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27crashpad%3A%3ASessionEndWatcher%3A%3AThreadMain%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D
Change log
=============================
https://chromium.googlesource.com/chromium/src/+log/58.0.3020.0..58.0.3022.0?pretty=fuller&n=10000
suspect CL: https://codereview.chromium.org/2710663006
Adding the release-block stable label, since it is a recent regression. mark@, could you please look into this issue if it is related to your change? Otherwise, please route it to an appropriate dev. Thank you.
,
Mar 9 2017
Friendly ping to get an update on this.
,
Mar 9 2017
Oh no!
,
Mar 9 2017
Darn. I notice they're all Win7 currently. That code looks uncontroversial to me though.
,
Mar 9 2017
I see --ptype=watcher is crashing in a similar location too, e.g. https://crash.corp.google.com/browse?q=product.name%3D%27Chrome%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27watcher%27%20AND%20product.Version%3D%2759.0.3033.0%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D&stbtiq=&reportid=7616d86480000000&index=0#0, so likely either we've hooked ourselves badly, or someone else has.
,
Mar 9 2017
WinDbg gets a small step further on this. Breakpad missed the CallHookWithSEH frame. (Typical.)
But ugh, there’s no MemoryInfoListStream. This dump would have been produced by MiniDumpWriteDump(), but the fallback crash handler isn’t specifying MiniDumpWithFullMemoryInfo, so we’re not getting this. So I don’t know what’s actually at 0x7785e1e1.
CallHookWithSEH makes me think that some third-party code is tampering with us, and either doing a bad job or just colliding by racing their modification of a page against our execution from it. I checked the modules tab for crashes that have this signature to learn more.
(product.name='Chrome' AND custom_data.ChromeCrashProto.ptype='crashpad-handler' AND custom_data.ChromeCrashProto.magic_signature_1.name='crashpad::SessionEndWatcher::ThreadMain')
There are 124 crashes that match, and I looked at the first 13 (more than 10%), and of those, all showed snxhk.dll was loaded. And that’s Avast?
And it looks like that’s one of the usual suspect third-party modules, and that once we see it, we basically stop, throw our hands up in the air, and say “we just can’t win.” And I don’t know if we can here.
WinDbg log follows for completeness. Scott, Mr. Windows, is this reasoning sound? Or is there something that we really can do here? Because it sucks to have this sort of problem affect crashpad_handler, which should really be shooting for top-notch stability.
0:001> .ecxr
eax=00b1fa14 ebx=00000000 ecx=00b1fa24 edx=00000028 esi=0000c20b edi=0000c20b
eip=7785e1e1 esp=00b1f99c ebp=00b1f9d8 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246
7785e1e1 ?? ???
0:001> k
*** Stack trace for last set context - .thread/.cxr resets it
# ChildEBP RetAddr
WARNING: Frame IP not in any known module. Following frames may be wrong.
00 00b1f998 0041e1a9 0x7785e1e1
01 00b1f9d8 0041e22f user32!CallHookWithSEH+0x21
02 00b1f9fc 7775702e user32!__fnHkINDWORD+0x24
03 00b1fa28 0041eb94 ntdll!KiUserCallbackDispatcher+0x2e
04 00b1fa2c 0041eb28 user32!NtUserCreateWindowEx+0xc
05 00b1fcd0 0041ec54 user32!VerNtUserCreateWindowEx+0x1a3
06 00b1fd7c 0041ecaf user32!_CreateWindowEx+0x201
*** WARNING: Unable to verify timestamp for chrome.exe
*** ERROR: Module load completed but symbols could not be loaded for chrome.exe
07 00b1fdb8 01285dcd user32!CreateWindowExW+0x33
08 00b1ff08 01288beb chrome+0x55dcd
09 00b1ff10 7745ee1c chrome+0x58beb
0a 00b1ff1c 777737eb kernel32!BaseThreadInitThunk+0xe
0b 00b1ff5c 777737be ntdll!__RtlUserThreadStart+0x70
0c 00b1ff74 00000000 ntdll!_RtlUserThreadStart+0x1b
0:001> !address 0x7785e1e1
Mapping file section regions...
Mapping module regions...
Mapping PEB regions...
Mapping TEB and stack regions...
Mapping heap regions...
Mapping page heap regions...
Mapping other regions...
Mapping stack trace database regions...
Mapping activation context regions...
Usage: <unknown>
Base Address: 7784c000
End Address: 77920000
Region Size: 000d4000 ( 848.000 kB)
State: <info not present at the target>
Protect: <info not present at the target>
Type: <info not present at the target>
Allocation Base: <info not present at the target>
Allocation Protect: <info not present at the target>
Content source: 0 (invalid), length: 8777e1f
,
Mar 14 2017
(stability sheriff here)
- I've double checked the numbers: 196/216 reports have snxhk.dll [1].
- It doesn't sound like this is actionable.
- I'll remove the stability sheriff label for now. Let's see what scottmg@ says as per #6.
[1] query suffix: OMIT RECORD IF SUM(regexp(third_party_modules.CodeFile, 'snxhk.dll$')) = 0
,
Mar 20 2017
I didn't check the data, but makes sense to me. Given that this code is "only" to be able to collect UMA data, and that it's affecting --type=crashpad-handler I think we should consider removing this instrumentation, unless we can get more data by adding MiniDumpWithFullMemoryInfo (or heck, MiniDumpWithFullMemory for that matter) and debugging further crashes.
,
Mar 20 2017
I did add MiniDumpWithFullMemoryInfo a couple of weeks ago. https://crrev.com/df6aab609a80
,
Mar 22 2017
The latest crash rates on the current channels are below. More crashes are seen on the latest beta.
59.0.3047.4 0.64% 3
59.0.3047.3 0.21% 1
59.0.3045.1 0.64% 3
58.0.3029.19 28.94% 136
Link to the list of builds: https://crash.corp.google.com/browse?q=product.name%3D%27Chrome%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27crashpad-handler%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27crashpad%3A%3ASessionEndWatcher%3A%3AThreadMain%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D#samplereports:5,productversion:1000
mark@, please look into this stable blocker issue. Thanks.
,
Mar 22 2017
Siggi, do you have any ideas? Our WM_ENDSESSION detection seems to be destabilized by bad hooking by snxhk.dll on Windows 7. We could stop watching for that, but it’d kill off events for Crashpad.HandlerLifetimeMilestone = kTerminated, which is useful to help put kCrashed in perspective.
,
Mar 22 2017
IMHO it's nonsensical to make crashes in the crash handler a release blocker. These crashes have zero user impact. The bugs are informational for us, but should never block releases. This looks like a third party interaction with Avast, I suggest WontFix unless Will or Chris have suggestions otherwise.
,
Mar 27 2017
An update on the latest behaviour of this crash, by channel:
58.0.3029.33 11.63% 77 --- Latest Beta
The issue is seen only in the latest beta, as above.
Link to the builds: https://crash.corp.google.com/browse?q=product.name%3D%27Chrome%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27crashpad-handler%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27crashpad%3A%3ASessionEndWatcher%3A%3AThreadMain%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D#samplereports:5,productversion:1000
mark@, could you please provide an update on this issue? Thanks!
,
Mar 30 2017
I’ve got nothing new. These are affected by third-party taint. See Siggi’s comment 12. It’s not clear that we need to treat this as a release blocker at all. In fact, I can’t really even promise that disabling the code in question wouldn’t just move the crash elsewhere. Still waiting for Will or Chris to weigh in if they have any thoughts, per comment 12.
,
Mar 30 2017
Agree on RBS. Triage team, can you point me at the current set of docs that has info on when to RBS and when not to? Thanks.
,
Mar 31 2017
I found the same suspicious string in dumps from this bug that I did in bug 706393: \Sessions\1\Windows\ApiPortection. Possibly malware, but it definitely smells like third-party junk. More analysis on bug 706393.
,
Apr 5 2017
This is attributable to third-party software messing with things, and is not actionable.
,
May 3 2017
I thought I was done with this, but I looked a little bit more. In every case that I’ve seen, the unmapped memory that it’s trying to execute looks like it should belong to a system library (it’s in the 0x77000000 range) and, indeed, user32.dll has loaded at some weird low address. Sounds like something is assuming that addresses in one process will be valid in ours too, but for whatever reason, user32’s been relocated in our process.
,
May 4 2017
Sure enough, I can reproduce this crash on Windows 7 just by mapping something at user32’s preferred load address before it loads, and then calling CreateWindowEx(). (32-bit OS, Windows 7 RTM. I don’t see the problem in a fully-patched 64-bit Windows 7 SP1 running a 32-bit test process.)
Z:\crashpad>user32reloc
user32.dll @ 0x76520000
ok
Z:\crashpad>user32reloc 0x76520000
alloc @ 0x76520000
user32.dll @ 0xc0000
(crash!)
ModLoad: 00e00000 00e69000 Z:\crashpad\user32reloc.exe
ModLoad: 77ae0000 77c1c000 C:\Windows\SYSTEM32\ntdll.dll
ModLoad: 76440000 76514000 C:\Windows\system32\kernel32.dll
ModLoad: 75d60000 75daa000 C:\Windows\system32\KERNELBASE.dll
ModLoad: 000c0000 00189000 C:\Windows\system32\user32.dll
ModLoad: 77600000 7764e000 C:\Windows\system32\GDI32.dll
ModLoad: 77c20000 77c2a000 C:\Windows\system32\LPK.dll
ModLoad: 77740000 777dd000 C:\Windows\system32\USP10.dll
ModLoad: 75ff0000 7609c000 C:\Windows\system32\msvcrt.dll
ModLoad: 760a0000 760bf000 C:\Windows\system32\IMM32.DLL
ModLoad: 760d0000 7619c000 C:\Windows\system32\MSCTF.dll
ModLoad: 748c0000 74900000 C:\Windows\system32\uxtheme.dll
(590.438): Access violation - code c0000005 (!!! second chance !!!)
eax=002cf9f8 ebx=00000000 ecx=002cfa08 edx=00000028 esi=0000c192 edi=0000c192
eip=7653353f esp=002cf980 ebp=002cf9bc iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246
7653353f ?? ???
0:000> k
ChildEBP RetAddr
WARNING: Frame IP not in any known module. Following frames may be wrong.
002cf97c 000d31eb 0x7653353f
002cf9bc 000d3245 user32!CallHookWithSEH+0x21
002cf9e0 77b2642e user32!__fnHkINDWORD+0x24
002cfa0c 000d0d69 ntdll!KiUserCallbackDispatcher+0x2e
002cfa10 000d0cfd user32!NtUserCreateWindowEx+0xc
002cfcb4 000d0e29 user32!VerNtUserCreateWindowEx+0x1a3
002cfd60 000d0e84 user32!_CreateWindowEx+0x201
*** WARNING: Unable to verify checksum for Z:\crashpad\user32reloc.exe
002cfd9c 00e0679d user32!CreateWindowExW+0x33
002cfe10 00e07763 user32reloc!wmain+0x15d [z:\crashpad\user32reloc.cc @ 45]
002cfe58 76491174 user32reloc!__scrt_common_main_seh+0xf8 [f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl @ 259]
002cfe64 77b3b3f5 kernel32!BaseThreadInitThunk+0xe
002cfea4 77b3b3c8 ntdll!__RtlUserThreadStart+0x70
002cfebc 00000000 ntdll!_RtlUserThreadStart+0x1b
0x7653353f is what something is expecting to be user32.dll + 0x1353f, but with user32.dll at 0xc0000:
0:000> ln 0xd353f
(000d353f) user32!CtfHookProcWorker | (000d3569) user32!DispatchMessageA
Exact matches:
user32!CtfHookProcWorker = <no type information>
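The address arithmetic behind that lookup can be sketched in a few lines of C++. This is only an illustration: `Rebase` and the constant names are hypothetical, and the base addresses are the ones printed by the two user32reloc runs above (0x76520000 for the normal load, 0xc0000 for the relocated one).

```cpp
#include <cstdint>

// Hypothetical helper: translate an address computed against the base where a
// module was expected to load into the matching address at its actual base.
constexpr std::uint32_t Rebase(std::uint32_t address,
                               std::uint32_t expected_base,
                               std::uint32_t actual_base) {
  return address - expected_base + actual_base;
}

// Values taken from the user32reloc runs above.
constexpr std::uint32_t kExpectedUser32Base = 0x76520000;  // first run
constexpr std::uint32_t kActualUser32Base = 0x000c0000;    // second run
constexpr std::uint32_t kCrashAddress = 0x7653353f;        // faulting eip

// Offset into user32.dll that the stale pointer was aiming for.
constexpr std::uint32_t kOffset = kCrashAddress - kExpectedUser32Base;

// Where that same offset lands in the relocated module.
constexpr std::uint32_t kRebased =
    Rebase(kCrashAddress, kExpectedUser32Base, kActualUser32Base);
```

With these inputs, `kOffset` is 0x1353f and `kRebased` is 0xd353f, which is the address passed to `ln` above.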
,
May 4 2017
For <reasons>, chrome.exe is built with chrome_elf as the very first import dependency, and it in turn binds only to kernel32.dll (though it does have some delayloads). This causes chrome_elf's entry point to be called quite early in the process' lifetime, which is - I think - quite unusual. I think most processes end up loading and initializing system DLLs like user32 at bind time. I wonder if something chrome_elf is doing - early on - invites ****ware into the process. Do you typically find the address space user32 would like to occupy totally vacant, or are there infiltrators or VMAllocs in there?
,
May 4 2017
Looks like someone’s trying to patch user32.dll before it’s loaded. go/crash/8e3b67d450000000 (also on Windows 7 RTM) crashed calling 0x778d353f, which means that user32.dll was expected to be at 0x778c0000, but instead it loaded at 0x220000. It’s 0xc9000 bytes long, so it should have ended at 0x77989000. Here’s what’s nearby:
0:000> !address
[…]
  778bb000 778be000 3000 MEM_IMAGE MEM_COMMIT PAGE_READONLY Image [gdi32; "C:\Windows\System32\gdi32.dll"]
+ 778be000 77910000 52000 MEM_FREE PAGE_NOACCESS Free
+ 77910000 77911000 1000 MEM_PRIVATE MEM_COMMIT PAGE_EXECUTE_READ <unknown>
+ 77911000 77a90000 17f000 MEM_FREE PAGE_NOACCESS Free
+ 77a90000 77a91000 1000 MEM_IMAGE MEM_COMMIT PAGE_READONLY Image [kernel32; "C:\Windows\System32\kernel32.dll"]
[…]
So it looks like someone tried to patch something in the user32.dll + 0x50000 page before it was even loaded. The dump didn’t capture that page, but here’s what would have been on it if the module had loaded there:
0:000> x /a user32!*
[…]
0026ffbf user32!DisplayExitWindowsWarnings (<no parameter info>)
0027024b user32!ExitWindowsWorker (<no parameter info>)
0027033c user32!StringCchLengthW (<no parameter info>)
00270378 user32!RecordShutdownReason (<no parameter info>)
002706ef user32!ExitWindowsEx (<no parameter info>)
002707ed user32!ShutdownBlockReasonCreate (<no parameter info>)
002708d9 user32!ShutdownBlockReasonQuery (<no parameter info>)
002709e9 user32!TabTextOut (<no parameter info>)
00270a90 user32!UserLpkTabbedTextOut (<no parameter info>)
00270c4d user32!TabbedTextOutW (<no parameter info>)
00270c7c user32!TabbedTextOutA (<no parameter info>)
00270d12 user32!GetTabbedTextExtentW (<no parameter info>)
00270d3c user32!GetTabbedTextExtentA (<no parameter info>)
00270dd1 user32!PSMTextOut (<no parameter info>)
00270de2 user32!UserLpkPSMTextOut (<no parameter info>)
00270fa0 user32!DdeGetLastError (<no parameter info>)
00270fe1 user32!DdeImpersonateClient (<no parameter info>)
00271043 user32!DumpDDEMessage (<no parameter info>)
[…]
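As a cross-check of the numbers in this comment, here is a small C++ sketch of the range arithmetic. The `Range` type, `Overlaps`, and the constant names are hypothetical. It also assumes that the CtfHookProcWorker offset of 0x1353f from the earlier user32reloc run applies to this dump as well, which is plausible since both are Windows 7 RTM but is not confirmed in the thread.

```cpp
#include <cstdint>

// Half-open address range [begin, end).
struct Range {
  std::uint32_t begin;
  std::uint32_t end;
};

// True if the two half-open ranges share at least one address.
constexpr bool Overlaps(Range a, Range b) {
  return a.begin < b.end && b.begin < a.end;
}

// Assumption: same CtfHookProcWorker offset as in the user32reloc repro.
constexpr std::uint32_t kCtfHookProcWorkerOffset = 0x1353f;
constexpr std::uint32_t kCallTarget = 0x778d353f;  // address the crash jumped to
constexpr std::uint32_t kUser32Size = 0xc9000;     // image size from the comment

// Where user32.dll was expected to load, and the range it would have covered.
constexpr std::uint32_t kExpectedBase = kCallTarget - kCtfHookProcWorkerOffset;
constexpr Range kExpectedImage{kExpectedBase, kExpectedBase + kUser32Size};

// The lone MEM_PRIVATE page from the !address output.
constexpr Range kPrivatePage{0x77910000, 0x77911000};

// Does the private page sit inside the range user32.dll wanted?
constexpr bool kBlocksLoad = Overlaps(kExpectedImage, kPrivatePage);

// Offset of that page relative to the expected base.
constexpr std::uint32_t kPageOffset = kPrivatePage.begin - kExpectedBase;
```

Under those assumptions the expected image range is 0x778c0000 to 0x77989000, the private page falls inside it, and its offset from the expected base is 0x50000, matching the figures quoted in the comment.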
,
May 4 2017
Yups, looks like 77910000 77911000 1000 MEM_PRIVATE MEM_COMMIT PAGE_EXECUTE_READ is someone's misguided, too early patch, which is preventing user32 from loading where it belongs. The question is which piece of @#$! ware is stomping in there, and perhaps how chrome_elf is tickling it. Maybe you can query from these client IDs to browser crashes, and see whether there's a strong association to one of Will's fave AV products? This is almost certainly either AV or corp ****ware, or malware.
,
May 4 2017
They’ve overwhelmingly got snxhk.dll (Avast) loaded per comments 6 and 7. The dumps tend to have the suspicious-looking "\Sessions\1\Windows\ApiPortection" string in them per comment 16. I was thinking that if we find user32.dll loaded in an odd spot, we could write a jmp to the loaded module’s copy of CtfHookProcWorker() in the spot where it should have been, provided that nothing’s already mapped there. But how can we know where user32.dll should have loaded in the ASLR world? And since CtfHookProcWorker() is an unexported internal function, how can we know its offset into the module? For the latter, we could just brute-force it, and provide forwarders for all functions by scanning for the 90 90 90 90 90 8b ff pattern.
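The brute-force scan floated here could look roughly like the following C++ sketch. It is not Crashpad code; `FindPattern`, `MatchesAt`, and the sample bytes are made up for illustration. It relies on the byte sequence being five `nop` padding bytes (90) followed by `mov edi, edi` (8b ff), so a candidate function entry sits five bytes past each match. A real scan would see false positives wherever the same bytes occur in data, and would miss any function built without that prologue.

```cpp
#include <cstddef>
#include <cstdint>

// Five nop padding bytes followed by "mov edi, edi".
constexpr std::uint8_t kPattern[] = {0x90, 0x90, 0x90, 0x90, 0x90, 0x8b, 0xff};
constexpr std::size_t kPatternSize = sizeof(kPattern);
constexpr std::size_t kPaddingSize = 5;  // entry point = match offset + 5

// True if the full pattern appears at data[offset]. The caller guarantees
// that offset + kPatternSize does not run past the end of the buffer.
constexpr bool MatchesAt(const std::uint8_t* data, std::size_t offset) {
  for (std::size_t i = 0; i < kPatternSize; ++i) {
    if (data[offset + i] != kPattern[i]) return false;
  }
  return true;
}

// Returns the offset of the first match at or after `start`, or `size` if
// there is none.
constexpr std::size_t FindPattern(const std::uint8_t* data, std::size_t size,
                                  std::size_t start) {
  for (std::size_t p = start; p + kPatternSize <= size; ++p) {
    if (MatchesAt(data, p)) return p;
  }
  return size;
}

// Made-up bytes standing in for a slice of a module's code section:
// ret, padding + prologue, push ebp / mov ebp, esp / ret, padding + prologue, ret.
constexpr std::uint8_t kSample[] = {
    0xc3, 0x90, 0x90, 0x90, 0x90, 0x90, 0x8b, 0xff, 0x55, 0x8b,
    0xec, 0xc3, 0x90, 0x90, 0x90, 0x90, 0x90, 0x8b, 0xff, 0xc3};
```

Calling `FindPattern` repeatedly, each time restarting one byte past the previous match, walks every candidate; adding `kPaddingSize` to each result gives the offset a forwarder would need to target. This does nothing for the other open question in the comment, namely where user32.dll should have loaded under ASLR.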
,
May 4 2017
K, I don't think it's worth putting more eng time into this. Maybe outreach to Avast would help?
Comment 1 by kkaluri@chromium.org
, Mar 4 2017