"ProcessInfo.OtherProcess" is flaky |
|||||
Issue description"ProcessInfo.OtherProcess" is flaky. This issue was created automatically by the chromium-try-flakes app. Please find the right owner to fix the respective test/step and assign this issue to them. If the step/test is infrastructure-related, please add Infra-Troopers label and change issue status to Untriaged. When done, please remove the issue from Sheriff Bug Queue by removing the Sheriff-Chromium label. We have detected 3 recent flakes. List of all flakes can be found at https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyIwsSBUZsYWtlIhhQcm9jZXNzSW5mby5PdGhlclByb2Nlc3MM. Flaky tests should be disabled within 30 minutes unless culprit CL is found and reverted. Please see more details here: https://sites.google.com/a/chromium.org/dev/developers/tree-sheriffs/sheriffing-bug-queues#triaging-auto-filed-flakiness-bugs
,
Dec 6 2017
Yes. Scott, what do you think of this? In crashpad_util_test ProcessInfo.OtherProcess, the flakes report shows CreateProcess() failing in one of two ways: CreateProcess: The subsystem needed to support the image type is not present. (0x134) CreateProcess: Attempt to access invalid address. (0x1E7) These are ERROR_IMAGE_SUBSYSTEM_NOT_PRESENT and ERROR_INVALID_ADDRESS. This would be while trying to run crashpad_util_test_process_info_test_child, whose funky unique characteristics include being linked with /BASE:0x78000000 /DYNAMICBASE:NO /FIXED. That may be related to the flake.
,
Dec 6 2017
remove 'Sheriff-Chromium' after triage,
,
Dec 6 2017
Yeah, I think you've diagnosed it. The forced base address is probably conflicting with something else that gets ASLRd in there on the bots randomly. I guess we could just exclude that test when in Chromium, though there's no great reason why it shouldn't flake on Crashpad's bot either I guess. Looking at the test https://cs.chromium.org/chromium/src/third_party/crashpad/crashpad/util/win/process_info_test.cc?rcl=5a3ed56a2bd05a2f79bb82f03d4b7f5f98bc091d&l=173 I can't immediately recall why we didn't just search for that module being loaded in the list, rather than trying to force it to be index 0 in Modules(). I guess just trying to be clear about expectations and ordering of modules?
,
Dec 6 2017
It may flake for us too, but if it’s flaking relatively infrequently, we’d be more likely to see it on the much larger volume of Chromium build and try jobs. A desirable property for ProcessSnapshot::Modules() is for the main executable to appear at index 0. Do you know offhand what Windows’ loader would do if we just said /BASE:0x78000000 but didn’t say /DYNAMICBASE:NO or /FIXED? If it’d just treat 0x78000000 as a suggestion but would allow the module to load elsewhere if that range was occupied, it’d probably be a decent compromise.
,
Dec 6 2017
Yes, it'll load but be randomized to wherever. Wouldn't we lose the index 0 property then? Or are you thinking that we'd fix that in Modules()?
,
Dec 6 2017
What we should be doing at https://chromium.googlesource.com/crashpad/crashpad/+/31df2acb12398c5a52cd6dc4153914fcfc5aa560/util/win/process_info.cc#273 is grabbing the executable first, regardless of where it’s loaded, and then grabbing everything else in initialization order. It seems weird that we get the executable as the first item from the InMemoryOrderModuleList, and then pick up everything else from the InInitializationOrderModuleList, but the comments say that the executable isn’t in the latter. If our test is passing with the executable having a high fixed base, then that would indicate that our code is correct, and that the executable is the first in InMemoryOrderModuleList even when it loaded at an address higher than some other module. But the weirdness around how to find the main executable and which of the loader’s module lists it’s in makes it kind of important that we have a test that makes the main executable load at something that wouldn’t actually be the lowest address of all loaded modules. If just /BASE without /DYNAMICBASE:NO and /FIXED results in a random load address, then it shouldn’t kill the index 0 property, but it’d make this test useless for verifying the index 0 property. 0x78000000 was chosen to get the main executable loading above even system libraries, but maybe that range is still fraught. Maybe what we need to do is choose a lower fixed base address below system libraries for our executable, and make sure that another module appears below it in memory by linking it against a DLL of our own that has an even lower fixed base address. There’s a lot more address space at our disposal below 0x70000000 than above it. Do you think that’d work? Do you have a suggestion for a viable range to use? I imagine that the low side, starting just at 0x400000, is reasonably unlikely to have anything camping out, because that’s where you’d expect old fixed-base executables to load. What if we made our executable load at a fixed 0x500000, and had it load our own fixed-base DLL at 0x400000?
,
Dec 7 2017
I was able to reproduce this locally on a Win 7 VM. I get the same two failures (flakily): "CreateProcess: The subsystem needed to support the image type is not present. (0x134)" "CreateProcess: Attempt to access invalid address. (0x1E7)" and it only seems to happen on the Win7 VM, not on the Win10 host. However, if I remove all the ldflags on crashpad_util_test_process_info_test_child, I still get the same behaviour on Win7. So we might be barking up the wrong tree here. I'll disable (directly in the Chromium tree, as it should be temporary) and debug some more.
,
Dec 7 2017
I discovered I was confused in #c8 (was having some VMware problems). It really is the base address causing the problem. I'll debug some more asap.
,
Dec 7 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/6e30e1cf0617df15b628e5519159b195b4941c8c commit 6e30e1cf0617df15b628e5519159b195b4941c8c Author: Scott Graham <scottmg@chromium.org> Date: Thu Dec 07 05:19:46 2017 Disable crashpad_tests:ProcessInfo.OtherProcess, flaking TBR=mark@chromium.org Bug: 792619 Change-Id: Ic1e807fcd70377596737ba06e797cade5c7bd59b Reviewed-on: https://chromium-review.googlesource.com/812453 Commit-Queue: Scott Graham <scottmg@chromium.org> Reviewed-by: Scott Graham <scottmg@chromium.org> Reviewed-by: Mark Mentovai <mark@chromium.org> Cr-Commit-Position: refs/heads/master@{#522347} [modify] https://crrev.com/6e30e1cf0617df15b628e5519159b195b4941c8c/third_party/crashpad/README.chromium [modify] https://crrev.com/6e30e1cf0617df15b628e5519159b195b4941c8c/third_party/crashpad/crashpad/util/win/process_info_test.cc
,
Dec 7 2017
I'm having trouble understanding the (my) logic here. Gone around in circles for a while now, but here's where I'm at.
I was wondering about the definition of PEB_LDR_DATA, and if possibly the fields were misnamed. The current definition matches what dt -t ntdll!_PDB_LDR_DATA says in windbg. But it doesn't really make sense to read the first entry of the InMemoryOrderModuleList to get the exe, since it would be randomized (?). So maybe (since it's working) that's actually InLoadOrderModuleList.
Digging in with some *awful* windbg commands I get this:
0:000> !peb
PEB at 04211000
InheritedAddressSpace: No
ReadImageFileExecOptions: No
BeingDebugged: Yes
ImageBaseAddress: 77f70000
Ldr 77ee7be0
Ldr.Initialized: Yes
Ldr.InInitializationOrderModuleList: 06663f68 . 066650c8
Ldr.InLoadOrderModuleList: 06664060 . 066650b8
Ldr.InMemoryOrderModuleList: 06664068 . 066650c0
Base TimeStamp Module
77f70000 5a298c47 Dec 07 10:45:27 2017 c:\src\cr\src\out\release\crashpad_util_test_process_info_test_child.exe
77dd0000 C:\WINDOWS\SYSTEM32\ntdll.dll
77a90000 744765ce Oct 26 19:56:14 2031 C:\WINDOWS\System32\KERNEL32.DLL
76c60000 2cd1ce3d Oct 29 19:15:25 1993 C:\WINDOWS\System32\KERNELBASE.dll
746e0000 5066dc78 Sep 29 04:33:12 2012 C:\WINDOWS\SYSTEM32\apphelp.dll
SubSystemData: 00000000
ProcessHeap: 06660000
ProcessParameters: 066621a8
CurrentDirectory: 'c:\src\cr\src\'
WindowTitle: 'c:\src\cr\src\out\release\crashpad_util_test_process_info_test_child.exe'
ImageFile: 'c:\src\cr\src\out\release\crashpad_util_test_process_info_test_child.exe'
CommandLine: 'c:\src\cr\src\out\release\crashpad_util_test_process_info_test_child.exe'
...
0:000> !list "-t _LIST_ENTRY.Flink -x \"? @@(#CONTAINING_RECORD(@$extret, ntdll!_LDR_DATA_TABLE_ENTRY, InInitializationOrderLinks));dt ntdll!_LDR_DATA_TABLE_ENTRY FullDllName @@(#CONTAINING_RECORD(@$extret, ntdll!_LDR_DATA_TABLE_ENTRY, InInitializationOrderLinks))\" 06663f68"
Evaluate expression: 107364184 = 06663f58
+0x024 FullDllName : _UNICODE_STRING "C:\WINDOWS\SYSTEM32\ntdll.dll"
Evaluate expression: 107366384 = 066647f0
+0x024 FullDllName : _UNICODE_STRING "C:\WINDOWS\System32\KERNELBASE.dll"
Evaluate expression: 107365440 = 06664440
+0x024 FullDllName : _UNICODE_STRING "C:\WINDOWS\System32\KERNEL32.DLL"
Evaluate expression: 107368632 = 066650b8
+0x024 FullDllName : _UNICODE_STRING "C:\WINDOWS\SYSTEM32\apphelp.dll"
0:000> !list "-t _LIST_ENTRY.Flink -x \"? @@(#CONTAINING_RECORD(@$extret, ntdll!_LDR_DATA_TABLE_ENTRY, InInitializationOrderLinks));dt ntdll!_LDR_DATA_TABLE_ENTRY DllBase @@(#CONTAINING_RECORD(@$extret, ntdll!_LDR_DATA_TABLE_ENTRY, InInitializationOrderLinks))\" 06663f68"
Evaluate expression: 107364184 = 06663f58
+0x018 DllBase : 0x77dd0000 Void
Evaluate expression: 107366384 = 066647f0
+0x018 DllBase : 0x76c60000 Void
Evaluate expression: 107365440 = 06664440
+0x018 DllBase : 0x77a90000 Void
Evaluate expression: 107368632 = 066650b8
+0x018 DllBase : 0x746e0000 Void
0:000> !list "-t _LIST_ENTRY.Flink -x \"? @@(#CONTAINING_RECORD(@$extret, ntdll!_LDR_DATA_TABLE_ENTRY, InLoadOrderLinks));dt ntdll!_LDR_DATA_TABLE_ENTRY FullDllName @@(#CONTAINING_RECORD(@$extret, ntdll!_LDR_DATA_TABLE_ENTRY, InLoadOrderLinks))\" 06664060"
Evaluate expression: 107364448 = 06664060
+0x024 FullDllName : _UNICODE_STRING "c:\src\cr\src\out\release\crashpad_util_test_process_info_test_child.exe"
Evaluate expression: 107364184 = 06663f58
+0x024 FullDllName : _UNICODE_STRING "C:\WINDOWS\SYSTEM32\ntdll.dll"
Evaluate expression: 107365440 = 06664440
+0x024 FullDllName : _UNICODE_STRING "C:\WINDOWS\System32\KERNEL32.DLL"
Evaluate expression: 107366384 = 066647f0
+0x024 FullDllName : _UNICODE_STRING "C:\WINDOWS\System32\KERNELBASE.dll"
Evaluate expression: 107368632 = 066650b8
+0x024 FullDllName : _UNICODE_STRING "C:\WINDOWS\SYSTEM32\apphelp.dll"
0:000> !list "-t _LIST_ENTRY.Flink -x \"? @@(#CONTAINING_RECORD(@$extret, ntdll!_LDR_DATA_TABLE_ENTRY, InMemoryOrderLinks));dt ntdll!_LDR_DATA_TABLE_ENTRY FullDllName @@(#CONTAINING_RECORD(@$extret, ntdll!_LDR_DATA_TABLE_ENTRY, InMemoryOrderLinks))\" 06664068"
Evaluate expression: 107364448 = 06664060
+0x024 FullDllName : _UNICODE_STRING "c:\src\cr\src\out\release\crashpad_util_test_process_info_test_child.exe"
Evaluate expression: 107364184 = 06663f58
+0x024 FullDllName : _UNICODE_STRING "C:\WINDOWS\SYSTEM32\ntdll.dll"
Evaluate expression: 107365440 = 06664440
+0x024 FullDllName : _UNICODE_STRING "C:\WINDOWS\System32\KERNEL32.DLL"
Evaluate expression: 107366384 = 066647f0
+0x024 FullDllName : _UNICODE_STRING "C:\WINDOWS\System32\KERNELBASE.dll"
Evaluate expression: 107368632 = 066650b8
+0x024 FullDllName : _UNICODE_STRING "C:\WINDOWS\SYSTEM32\apphelp.dll"
0:000> !list "-t _LIST_ENTRY.Flink -x \"? @@(#CONTAINING_RECORD(@$extret, ntdll!_LDR_DATA_TABLE_ENTRY, InMemoryOrderLinks));dt ntdll!_LDR_DATA_TABLE_ENTRY DllBase @@(#CONTAINING_RECORD(@$extret, ntdll!_LDR_DATA_TABLE_ENTRY, InMemoryOrderLinks))\" 06664068"
Evaluate expression: 107364448 = 06664060
+0x018 DllBase : 0x77f70000 Void
Evaluate expression: 107364184 = 06663f58
+0x018 DllBase : 0x77dd0000 Void
Evaluate expression: 107365440 = 06664440
+0x018 DllBase : 0x77a90000 Void
Evaluate expression: 107366384 = 066647f0
+0x018 DllBase : 0x76c60000 Void
Evaluate expression: 107368632 = 066650b8
+0x018 DllBase : 0x746e0000 Void
Notable things that I see in that mess are:
1. InitializationOrder really doesn't contain the .exe, so the extra code there was warranted.
2. The LoadOrder list might be basically what we want anyway, so maybe I should just use that (and it contains the exe)
3. In the MemoryOrder appears to go from high to low, so the /IMAGEBASE in the test was doing the exact opposite of what it needed to do -- it should have been forcing a low address, not a high one. So maybe leaving it random is almost as good (though then it could be flaky if Modules() was buggy). Or possibly the test could do a forced high and a forced low version.
,
Dec 7 2017
Huh, though when I remove the IMAGEBASE, this is what MemoryOrder looks like: 0:000> !list "-t _LIST_ENTRY.Flink -x \"? @@(#CONTAINING_RECORD(@$extret, ntdll!_LDR_DATA_TABLE_ENTRY, InMemoryOrderLinks));dt ntdll!_LDR_DATA_TABLE_ENTRY DllBase @@(#CONTAINING_RECORD(@$extret, ntdll!_LDR_DATA_TABLE_ENTRY, InMemoryOrderLinks))\" 07434068" Evaluate expression: 121847904 = 07434060 +0x018 DllBase : 0x01080000 Void Evaluate expression: 121847640 = 07433f58 +0x018 DllBase : 0x77dd0000 Void Evaluate expression: 121848896 = 07434440 +0x018 DllBase : 0x77a90000 Void Evaluate expression: 121849840 = 074347f0 +0x018 DllBase : 0x76c60000 Void Evaluate expression: 121852088 = 074350b8 +0x018 DllBase : 0x746e0000 Void 0:000> !list "-t _LIST_ENTRY.Flink -x \"? @@(#CONTAINING_RECORD(@$extret, ntdll!_LDR_DATA_TABLE_ENTRY, InMemoryOrderLinks));dt ntdll!_LDR_DATA_TABLE_ENTRY FullDllName @@(#CONTAINING_RECORD(@$extret, ntdll!_LDR_DATA_TABLE_ENTRY, InMemoryOrderLinks))\" 07434068" Evaluate expression: 121847904 = 07434060 +0x024 FullDllName : _UNICODE_STRING "c:\src\cr\src\out\release\crashpad_util_test_process_info_test_child.exe" Evaluate expression: 121847640 = 07433f58 +0x024 FullDllName : _UNICODE_STRING "C:\WINDOWS\SYSTEM32\ntdll.dll" Evaluate expression: 121848896 = 07434440 +0x024 FullDllName : _UNICODE_STRING "C:\WINDOWS\System32\KERNEL32.DLL" Evaluate expression: 121849840 = 074347f0 +0x024 FullDllName : _UNICODE_STRING "C:\WINDOWS\System32\KERNELBASE.dll" Evaluate expression: 121852088 = 074350b8 +0x024 FullDllName : _UNICODE_STRING "C:\WINDOWS\SYSTEM32\apphelp.dll" in particular the order is decreasing, but the exe is still first and isn't in the correct location in the list. So... yeah. I don't think the test was doing what it wanted to do with the build flags, but I'm not sure it's possible to concoct a case where the exe would show up elsewhere. So I think I'll just remove the /IMAGEBASE flags which will deflake the test, and also change the implementation to read LoadOrder for the exe, followed by InitializationOrder for the rest (only because that reads as more sensible, it shouldn't actually affect behaviour.)
,
Dec 7 2017
Thanks for digging in! If we’re switching to exe-from-LoadOrder[0]-and-everything-else-from-InitializationOrder, or even to everything-from-LoadOrder, then I don’t think we need to mess with linker flags. Basically, if we’re not touching MemoryOrder at all, there’s no need to try to get things to load in specific places. If LoadOrder[0] always has the executable, then it sounds like a better place than MemoryOrder[0] to get it. Even if you’ve proved that MemoryOrder[0] always has the executable regardless of memory order, it just seems weird (and maybe subject to being “fixed” in the future). If LoadOrder[0] works as well, let’s do that. Also, we may not even need to use this particular executable, if there’s another nearby executable that’ll work in its place.
,
Dec 7 2017
Yeah, switching to everything-from-LoadOrder seems OK, so I'll do that since it's less code. I vaguely wanted to know the order things got into memory and initialized so there might be a guess at who-did-what. But I don't think I've ever actually considered that in looking at a crash anyway, and I don't know that anyone else would know that's the order they're stored in. The child loads lz32.dll as its last thing, and that still seems useful to verify, so I'll stick with using an executable just for this test.
,
Dec 8 2017
Very good, then. I like the code minus. Is the difference between LoadOrder and InitializationOrder supposed to be that when a.dll depends on b.dll and causes it to be loaded, they’ll show up as (a, b) in LoadOrder and (b, a) in InitializationOrder?
,
Dec 9 2017
The following revision refers to this bug: https://chromium.googlesource.com/crashpad/crashpad.git/+/92e8bc4713e12e4083ca6d0736cd8a82009c9f86 commit 92e8bc4713e12e4083ca6d0736cd8a82009c9f86 Author: Scott Graham <scottmg@chromium.org> Date: Sat Dec 09 00:11:31 2017 win: Fix ProcessInfo.OtherProcess test flake Removes the /BASE:N and /FIXED arguments to the child, which weren't actually testing correctly (see bug), and were causing problems at least on Win7 when something collided with that address. Additionally, switches to storing modules in load order, rather than a combination of memory order and initialization order, since that was a bit confusing and there was no great rationale for it. While reviewing, handle the case of a corrupted module name, and if it's unreadable continue emitting "???" as a name. Adds a test for this functionality. Bug: chromium:792619 Change-Id: I2e95a81b02fe4d527868f6a5f980d315604255a6 Reviewed-on: https://chromium-review.googlesource.com/815875 Commit-Queue: Scott Graham <scottmg@chromium.org> Reviewed-by: Mark Mentovai <mark@chromium.org> [modify] https://crrev.com/92e8bc4713e12e4083ca6d0736cd8a82009c9f86/util/BUILD.gn [modify] https://crrev.com/92e8bc4713e12e4083ca6d0736cd8a82009c9f86/util/util_test.gyp [modify] https://crrev.com/92e8bc4713e12e4083ca6d0736cd8a82009c9f86/util/win/process_info.cc [modify] https://crrev.com/92e8bc4713e12e4083ca6d0736cd8a82009c9f86/util/win/process_info_test.cc [modify] https://crrev.com/92e8bc4713e12e4083ca6d0736cd8a82009c9f86/util/win/process_info_test_child.cc
,
Dec 13 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/92f888845760c2d2303e019190938309800cea6b commit 92f888845760c2d2303e019190938309800cea6b Author: Scott Graham <scottmg@chromium.org> Date: Wed Dec 13 00:17:29 2017 Update Crashpad to 0924e567514fc577e725b56cd602808467b317a9 c04c05564db1 linux: Fix gcc -Wparentheses error from 7a0daa6989a9 92e8bc4713e1 win: Fix ProcessInfo.OtherProcess test flake 0cc63d4a4558 android: Make run_tests.py work on Android versions before 7.0 (N) 13d0defbfb45 android, win: Enable colored test output when available 0924e567514f linux: Add PtraceBroker and PtraceClient Re-enables ProcessInfo.OtherProcess. Bug: chromium:792619 Change-Id: Ic1178674e6704b9c4560be52a9197c53930a5a9b Reviewed-on: https://chromium-review.googlesource.com/823218 Reviewed-by: Mark Mentovai <mark@chromium.org> Commit-Queue: Scott Graham <scottmg@chromium.org> Cr-Commit-Position: refs/heads/master@{#523617} [modify] https://crrev.com/92f888845760c2d2303e019190938309800cea6b/third_party/crashpad/README.chromium [modify] https://crrev.com/92f888845760c2d2303e019190938309800cea6b/third_party/crashpad/crashpad/build/run_tests.py [modify] https://crrev.com/92f888845760c2d2303e019190938309800cea6b/third_party/crashpad/crashpad/snapshot/elf/elf_image_reader.cc [modify] https://crrev.com/92f888845760c2d2303e019190938309800cea6b/third_party/crashpad/crashpad/snapshot/linux/process_reader_test.cc [add] https://crrev.com/92f888845760c2d2303e019190938309800cea6b/third_party/crashpad/crashpad/test/linux/get_tls.cc [add] https://crrev.com/92f888845760c2d2303e019190938309800cea6b/third_party/crashpad/crashpad/test/linux/get_tls.h [modify] https://crrev.com/92f888845760c2d2303e019190938309800cea6b/third_party/crashpad/crashpad/test/test.gyp [modify] https://crrev.com/92f888845760c2d2303e019190938309800cea6b/third_party/crashpad/crashpad/util/BUILD.gn [modify] https://crrev.com/92f888845760c2d2303e019190938309800cea6b/third_party/crashpad/crashpad/util/linux/direct_ptrace_connection.cc [add] https://crrev.com/92f888845760c2d2303e019190938309800cea6b/third_party/crashpad/crashpad/util/linux/ptrace_broker.cc [add] https://crrev.com/92f888845760c2d2303e019190938309800cea6b/third_party/crashpad/crashpad/util/linux/ptrace_broker.h [add] https://crrev.com/92f888845760c2d2303e019190938309800cea6b/third_party/crashpad/crashpad/util/linux/ptrace_broker_test.cc [add] https://crrev.com/92f888845760c2d2303e019190938309800cea6b/third_party/crashpad/crashpad/util/linux/ptrace_client.cc [add] https://crrev.com/92f888845760c2d2303e019190938309800cea6b/third_party/crashpad/crashpad/util/linux/ptrace_client.h [modify] https://crrev.com/92f888845760c2d2303e019190938309800cea6b/third_party/crashpad/crashpad/util/linux/ptracer.cc [modify] https://crrev.com/92f888845760c2d2303e019190938309800cea6b/third_party/crashpad/crashpad/util/linux/ptracer.h [modify] https://crrev.com/92f888845760c2d2303e019190938309800cea6b/third_party/crashpad/crashpad/util/linux/ptracer_test.cc [modify] https://crrev.com/92f888845760c2d2303e019190938309800cea6b/third_party/crashpad/crashpad/util/util.gyp [modify] https://crrev.com/92f888845760c2d2303e019190938309800cea6b/third_party/crashpad/crashpad/util/util_test.gyp [modify] https://crrev.com/92f888845760c2d2303e019190938309800cea6b/third_party/crashpad/crashpad/util/win/process_info.cc [modify] https://crrev.com/92f888845760c2d2303e019190938309800cea6b/third_party/crashpad/crashpad/util/win/process_info_test.cc [modify] https://crrev.com/92f888845760c2d2303e019190938309800cea6b/third_party/crashpad/crashpad/util/win/process_info_test_child.cc
,
Dec 13 2017
|
|||||
►
Sign in to add a comment |
|||||
Comment 1 by gogerald@chromium.org
, Dec 6 2017