Navigation fails on Windows 10 14316 unless --no-sandbox set
Issue description
Version: 52.0.2710.0 (Official Build) canary (64-bit); 51.0.2704.7 (Official Build) dev-m (64-bit)
OS: Windows 10 14316.1000
What steps will reproduce the problem?
(1) Launch Chrome
(2) Attempt to navigate anywhere
What is the expected output? Navigation occurs.
What do you see instead? Spinner spins for a while, then the Non-Responsive Page dialog box appears. Clicking "Kill" has no effect. Navigating to chrome:// URLs has the same effect. Passing --no-sandbox causes pages to load without errors. chrome://crashes does not show any crashes this month. I don't have a "%localappdata%\google\chrome\user data\Crash reports" folder; "%localappdata%\google\chrome\user data\crashpad\reports" contains 4 .dmp files created last month. Reinstalling the latest Canary did not resolve the problem.
,
Apr 16 2016
I'm an admin user, but I was not elevated and UAC is on. I've attached a Process Explorer screenshot showing that the primary process is a Medium IL process as expected. Upon further review, I believe this is a duplicate of a problem I previously blogged about (coincidentally, 62 months ago today): https://blogs.msdn.microsoft.com/ieinternals/2011/02/16/ie9-no-reboot-setup-and-the-windows-restart-manager/ Basically, the problem seems to be that when my system restarted, it did so using Windows' Restart API; the one which uses Job Objects. (I had run Lenovo's Firmware update utility which triggers a restart). As a consequence, after restart, the outer Chrome process is running in a Job, as is Windows Explorer. Chrome fails to successfully run its sandboxed render processes if the parent process is run from a job. After performing a full system (non-hybrid) restart using "shutdown.exe -r", Chrome51/52 now work properly. If my suspicions are correct and Chrome does not run properly in this scenario, we may want to investigate reporting the issue to the user if we cannot work around it ourselves. (Notably, Edge worked fine when the system was in this state).
,
Apr 17 2016
Can you post detailed reproduction steps to get Windows into the state you describe above, with Explorer running in a Job object?
,
Apr 17 2016
The specific thing I installed that triggered the reboot was https://download.lenovo.com/pccbbs/mobiles/n1cuj04w.exe Looking at Process Explorer now in the "Working" case, I see both Explorer.exe and Chrome.exe are running in Job Objects, so perhaps this is just something that Windows 10 does now? While the behavior cited in the 2011 blog is the same (top-level window appears, but no tabs can navigate), in that case the Job Object didn't have the "Breakaway ok" flag which is clearly seen in the screenshot of the "broken" case above. Notably, in Windows 10 14295, I do not see the job objects for either Explorer or Chrome.exe's MediumIL process.
,
Apr 18 2016
I tried 14316 and I do not see any processes being put in a Job object. Even if Windows does, as you say, it's okay as long as it's not marked no-breakaway, since Win8+ supports nested job objects and Chrome is fine to nest its sandbox Job objects inside other Jobs. I'd be interested in knowing what configuration causes explorer and Chrome's broker process to end up in a Job. It might be interesting to UMA and see how often this is actually happening.
,
Apr 18 2016
Job objects are used for Connected Standby, so this is likely happening because you have an AOAC compliant device. As said in #5 this is fine for Chrome, as Windows 8+ supports nested job objects and we live happily inside the job. I would be interested if you could come up with some reproduction steps for getting Explorer/Chrome inside a Job object that is marked no-breakaway. This certainly should not happen, as other things are likely to break in the OS if that flag is set on the Job. Until then I think this is WAI.
,
Apr 18 2016
In the affected state, Chrome was not able to load tabs to any origin, including built-in URLs. It's entirely possible that all talk of Job Objects, etc, is unrelated to the problem at hand, but that doesn't mean that Chrome is working as intended, only that my suspicions about the root cause of the bug are unfounded. Are there better debugging steps I can undertake to help root-cause this properly if it were to reproduce again in the future?
,
Apr 18 2016
If it's not possible to provide reproduction steps to help to repro this in the lab, then it'll be down to you to build your own copy of Chromium and then step through the sandbox initialization code in the debugger. You could try using --allow-no-sandbox-job instead of --no-sandbox - if Chrome renderers start fine then it's certainly something to do with the Job object. If --allow-no-sandbox-job still prevents renderers from launching, but --no-sandbox works, then it's something else that has to be debugged. It's certainly working as intended that if Chrome browser process is running inside a Job with no-breakaway or no support for nested jobs, it should not be able to start any renderer processes.
,
Aug 8 2016
https://twitter.com/anatudor/status/762725107810889728 appears to be an in-the-wild report of the same symptoms.
,
Aug 17 2016
I was able to reproduce this on Windows 10 1607 14393.67 (the anniversary update). Windows restarted for an update and after restart, 64-bit Chrome 54.0.2824.0 dev-m and 64-bit 54.0.2831.0 Canary are unable to start render processes unless the --no-sandbox flag is passed. Additionally, the Brave browser is unable to start its renderers, while Vivaldi and Opera have no problem. Further investigation reveals that Vivaldi and Opera are 32bit while Chrome and Brave are 64 bit. If I swap my 64-bit Canary for a 32-bit Canary, it is able to start its render processes without issue. The "broken" 64-bit Chrome does successfully launch two subprocesses, one of type=crashpad-handler and one of type=gpu-process.
,
Aug 17 2016
I've not been able to repro. I let the Anniversary Edition install updates with 54.0.2831.0 running at the time. On reboot the application restarted with no problems. In my case neither Explorer nor Chrome (main medium IL process) is running in a Job. elawrence@ did you try the --allow-no-sandbox-job flag as well?
,
Aug 17 2016
Okay, Chromium 64-bit repros, so I can play with this a bit.
When I run Chromium rather than Chrome, I don't see the GPU process in Task Manager, and I get spew in the system debug console:
[9220] [9220:804:0817/102133:ERROR:browser_gpu_channel_host_factory.cc(113)] Failed to launch GPU process.
If I run chromium --disable-gpu-sandbox, I see the GPU process in Task Manager but pages still fail to load with the same error.
When I run under the debugger, I see debug breaks in InterceptionAgent::OnDllLoad for name=kernel32.dll and user32.dll and gdi32.dll before the process is terminated.
3:033> !peb
PEB at 000000756ba50000
InheritedAddressSpace: No
ReadImageFileExecOptions: No
BeingDebugged: Yes
ImageBaseAddress: 0000000140000000
Ldr 00007fffff9b23a0
Ldr.Initialized: Yes
Ldr.InInitializationOrderModuleList: 0000021027a54c60 . 0000021027a54c60
Ldr.InLoadOrderModuleList: 0000021027a54dd0 . 0000021027a55260
Ldr.InMemoryOrderModuleList: 0000021027a54de0 . 0000021027a55270
Base TimeStamp Module
140000000 57abad51 Aug 10 17:40:17 2016 C:\Users\ericlaw\AppData\Local\Chromium\Application\chrome.exe
7fffff860000 578997b2 Jul 15 21:10:58 2016 C:\WINDOWS\SYSTEM32\ntdll.dll
7ffffced0000 00000000 Dec 31 18:00:00 1969 C:\WINDOWS\System32\KERNEL32.DLL
SubSystemData: 0000000000000000
ProcessHeap: 0000021027a50000
ProcessParameters: 0000021027a51f30
CurrentDirectory: 'C:\Users\ericlaw\AppData\Local\Chromium\Application\54.0.2826.0\'
WindowTitle: 'C:\Users\ericlaw\AppData\Local\Chromium\Application\chrome.exe'
ImageFile: 'C:\Users\ericlaw\AppData\Local\Chromium\Application\chrome.exe'
CommandLine: '"C:\Users\ericlaw\AppData\Local\Chromium\Application\chrome.exe" --type=renderer --enable-features=AutomaticTabDiscarding<AutomaticTabDiscarding,ExpectCTReporting<ExpectCTReporting,IncidentReportingDisableUpload<SafeBrowsingIncidentReportingService,IncidentReportingModuleLoadAnalysis<SafeBrowsingIncidentReportingServiceFeatures,IncidentReportingSuspiciousModuleReporting<SafeBrowsingIncidentReportingServiceFeatures,MainFrameBeforeActivation<MainFrameBeforeActivation,NetworkTimeServiceQuerying<NetworkTimeQueries,NewAudioRenderingMixingStrategy<NewAudioRenderingMixingStrategy,NonValidatingReloadOnNormalReload<NonValidatingReloadOnNormalReload,PassiveDocumentEventListeners<PassiveDocumentEventListeners,PointerEvent<PointerEvent,PreconnectMore<PreconnectMore,UsePasswordSeparatedSigninFlow<PasswordSeparatedSigninFlow,WebRTC-EnableWebRtcEcdsa<WebRTC-EnableWebRtcEcdsa,WebRTC-H264WithOpenH264FFmpeg<WebRTC-H264WithOpenH264FFmpeg,token-binding<TokenBinding,use-new-media-cache<use-new-media-cache --disable-features=DocumentWriteEvaluator<DisallowFetchForDocWrittenScriptsInMainFrame 
--force-fieldtrials=AutofillClassifier/Enabled/AutofillFieldMetadata/Enabled/AutofillProfileOrderByFrecency/EnabledLimitTo3/*AutomaticTabDiscarding/Enabled_Once_10-gen2/BlockSmallPluginContent/Enabled/BrowserBlacklist/Enabled/CaptivePortalInterstitial/Enabled/ChildAccountDetection/Disabled/ChromeDashboard/Enabled/ChromotingQUIC/Enabled/DefaultBrowserInfobar/SettingsTextNotNow/DisallowFetchForDocWrittenScriptsInMainFrame/DocumentWriteScriptBlockGroup/EnableGoogleCachedCopyTextExperiment/Button/*EnableMediaRouter/Enabled/EnableMediaRouterWithCastExtension/Enabled/EnableSessionCrashedBubbleUI/Enabled/ExpectCTReporting/ExpectCTReportingEnabled/ExtensionActionRedesign/Enabled/*ExtensionContentVerification/Enforce/ExtensionInstallVerification/Enforce/GoogleBrandedContextMenu/branded/GoogleNow/Enable/*IconNTP/Default/InstanceID/Enabled/IntelligentSessionRestore/Enabled/MainFrameBeforeActivation/Enabled/MaterialDesignDownloads/Enabled/MojoChannel/Enabled/*NetworkQualityEstimator/Enabled/NetworkTimeQueries/NetworkTimeQueriesEnabled/NewAudioRenderingMixingStrategy/Enabled/*NewProfileManagement/Enabled/NonValidatingReloadOnNormalReload/Enabled/OfferUploadCreditCards/Enabled/OutOfProcessPac/Enabled/*PageRevisitInstrumentation/Enabled/PassiveDocumentEventListeners/Enabled/PasswordBranding/SmartLockBrandingSavePromptOnly/PasswordGeneration/Disabled/*PasswordManagerSettingsMigration/Enable/PasswordSeparatedSigninFlow/Enabled/PasswordSmartBubble/3-Times/PointerEvent/Enabled/PreRead/NoPrefetchArgument2/PreconnectMore/Enabled/*QUIC/Enabled/RefreshTokenDeviceId/Enabled/ReportCertificateErrors/ShowAndPossiblySend/SRTPromptFieldTrial/On/SSLCommonNameMismatchHandling/Enabled/*SafeBrowsingIncidentReportingService/Enabled/SafeBrowsingIncidentReportingServiceFeatures/WithSuspiciousModuleReporting/SafeBrowsingReportPhishingErrorLink/Enabled/SafeBrowsingUpdateFrequency/UpdateTime15m/SafeBrowsingV4LocalDatabaseManagerEnabled/Enabled/SchedulerExpensiveTaskBlocking/Enabled/SdchPersistence/Enable
d/*SettingsEnforcement/enforce_always_with_extensions_and_dse/StrictSecureCookies/Enabled/SyncHttpContentCompression/Enabled/TabSyncByRecency/Enabled/*TokenBinding/TokenBinding/*TriggeredResetFieldTrial/On/V8CacheStrategiesForCacheStorage/default/VarationsServiceControl/Interval_30min/WebFontsInterventionV2/Enabled-slow2g/WebRTC-EnableWebRtcEcdsa/Enabled/WebRTC-H264WithOpenH264FFmpeg/Enabled/WebRTC-LocalIPPermissionCheck/Enabled/WebRTC-PeerConnectionDTLS1.2/Enabled/use-new-media-cache/Enabled/ --primordial-pipe-token=7FB82F75FCC3FC7A46AC4842443F0515 --lang=en-US --instant-process --enable-offline-auto-reload --enable-offline-auto-reload-visible-only --blink-settings=disallowFetchForDocWrittenScriptsInMainFrameOnSlowConnections=true --enable-pinch --device-scale-factor=1 --num-raster-threads=2 --enable-main-frame-before-activation --content-image-texture-target=0,0,3553;0,1,3553;0,2,3553;0,3,3553;0,4,3553;0,5,3553;0,6,3553;0,7,3553;0,8,3553;0,9,3553;0,10,3553;0,11,3553;0,12,3553;0,13,3553;0,14,3553;1,0,3553;1,1,3553;1,2,3553;1,3,3553;1,4,3553;1,5,3553;1,6,3553;1,7,3553;1,8,3553;1,9,3553;1,10,3553;1,11,3553;1,12,3553;1,13,3553;1,14,3553;2,0,3553;2,1,3553;2,2,3553;2,3,3553;2,4,3553;2,5,3553;2,6,3553;2,7,3553;2,8,3553;2,9,3553;2,10,3553;2,11,3553;2,12,3553;2,13,3553;2,14,3553;3,0,3553;3,1,3553;3,2,3553;3,3,3553;3,4,3553;3,5,3553;3,6,3553;3,7,3553;3,8,3553;3,9,3553;3,10,3553;3,11,3553;3,12,3553;3,13,3553;3,14,3553 --mojo-application-channel-token=7FB82F75FCC3FC7A46AC4842443F0515 --channel="6952.0.430078330\1234469475" --mojo-platform-channel-handle=2216 /prefetch:1'
DllPath: '< Name not readable >'
Environment: 0000021027a50dd0
ALLUSERSPROFILE=C:\ProgramData
APPDATA=C:\Users\ericlaw\AppData\Roaming
CHROME_CRASHPAD_PIPE_NAME=\\.\pipe\crashpad_8072_QJNCZEAEQMMDUVVM
CHROME_MAIN_TICKS=12509337120
CHROME_PROBED_PROGRAM_FILES_PATH=C:\Program Files (x86)
CHROME_RESTART=Chromium|Whoa! Chromium has crashed. Relaunch now?|LEFT_TO_RIGHT
CommonProgramFiles=C:\Program Files\Common Files
CommonProgramFiles(x86)=C:\Program Files (x86)\Common Files
CommonProgramW6432=C:\Program Files\Common Files
COMPUTERNAME=T460S
ComSpec=C:\WINDOWS\system32\cmd.exe
configsetroot=C:\WINDOWS\ConfigSetRoot
HOMEDRIVE=C:
HOMEPATH=\Users\ericlaw
LOCALAPPDATA=C:\Users\ericlaw\AppData\Local
LOGONSERVER=\\T460S
NUMBER_OF_PROCESSORS=4
OS=Windows_NT
Path=C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\winext\arcade;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\Users\ericlaw\.dnx\bin;C:\Program Files\Microsoft DNX\Dnvm\;C:\Program Files\Intel\WiFi\bin\;C:\Program Files\Common Files\Intel\WirelessCommon\;C:\Users\ericlaw\AppData\Local\Microsoft\WindowsApps;C:\Program Files\Microsoft SQL Server\130\Tools\Binn\;C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\;C:\Users\ericlaw\AppData\Local\Microsoft\WindowsApps;C:\Program Files (x86)\Microsoft VS Code\bin
PATHEXT=.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC
PROCESSOR_ARCHITECTURE=AMD64
PROCESSOR_IDENTIFIER=Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
PROCESSOR_LEVEL=6
PROCESSOR_REVISION=4e03
ProgramData=C:\ProgramData
ProgramFiles=C:\Program Files
ProgramFiles(x86)=C:\Program Files (x86)
ProgramW6432=C:\Program Files
PSModulePath=C:\Program Files\WindowsPowerShell\Modules;C:\WINDOWS\system32\WindowsPowerShell\v1.0\Modules;C:\Program Files\Microsoft Message Analyzer\PowerShell\
PUBLIC=C:\Users\Public
SystemDrive=C:
SystemRoot=C:\WINDOWS
TEMP=C:\Users\ericlaw\AppData\Local\Temp
TMP=C:\Users\ericlaw\AppData\Local\Temp
TVT=C:\Program Files (x86)\Lenovo
USERDOMAIN=T460S
USERDOMAIN_ROAMINGPROFILE=T460S
USERNAME=ericlaw
USERPROFILE=C:\Users\ericlaw
VS140COMNTOOLS=C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\Tools\
WINDBG_DIR=C:\Program Files (x86)\Windows Kits\10\Debuggers\x64
windir=C:\WINDOWS
3:033> ~kp
# Child-SP RetAddr Call Site
00 00000075`6bcfe910 00000001`4016118c chrome!sandbox::InterceptionAgent::OnDllLoad(struct _UNICODE_STRING * full_path = 0x00000210`27990880 "\Device\HarddiskVolume3\Windows\System32\kernel32.dll", struct _UNICODE_STRING * name = 0x00000210`27990840 "KERNEL32.dll", void * base_address = 0x00007fff`fced0000)+0x12a [c:\src\c\src\sandbox\win\src\interception_agent.cc @ 116]
01 00000075`6bcfe980 00000001`40141ca8 chrome!TargetNtMapViewOfSection(<function> * orig_MapViewOfSection = 0x00000210`278deb20, void * section = 0x00000000`00000080, void * process = 0xffffffff`ffffffff, void ** base = 0x00000210`27a55290, unsigned int64 zero_bits = 0, unsigned int64 commit_size = 0, union _LARGE_INTEGER * offset = 0x00000000`00000000, unsigned int64 * view_size = 0x00000075`6bcfeb78, unsigned long inherit = 1, unsigned long allocation_type = 0x800000, unsigned long protect = 4)+0x1ac [c:\src\c\src\sandbox\win\src\target_interceptions.cc @ 60]
02 00000075`6bcfea10 00007fff`ff86ddc5 chrome!TargetNtMapViewOfSection64(void * section = 0x00000000`00000080, void * process = 0xffffffff`ffffffff, void ** base = 0x00000210`27a55290, unsigned int64 zero_bits = 0, unsigned int64 commit_size = 0, union _LARGE_INTEGER * offset = 0x00000000`00000000, unsigned int64 * view_size = 0x00000075`6bcfeb78, unsigned long inherit = 1, unsigned long allocation_type = 0x800000, unsigned long protect = 4)+0xa8 [c:\src\c\src\sandbox\win\src\interceptors_64.cc @ 34]
03 00000075`6bcfea90 00007fff`ff86da52 ntdll!LdrpMapViewOfSection+0xb5
04 00000075`6bcfeb30 00007fff`ff86d925 ntdll!LdrpMapImage+0x72
05 00000075`6bcfebd0 00007fff`ff86d47e ntdll!LdrpMapDllWithSectionHandle+0x2d
06 00000075`6bcfec10 00007fff`ff86d236 ntdll!LdrpLoadKnownDll+0xe6
07 00000075`6bcfec70 00007fff`ff886a5c ntdll!LdrpFindOrPrepareLoadingModule+0xa6
08 00000075`6bcfecd0 00007fff`ff88651d ntdll!LdrpLoadDllInternal+0x110
09 00000075`6bcfed50 00007fff`ff869efc ntdll!LdrpLoadDll+0xf1
0a 00000075`6bcfeef0 00007fff`ff8f1899 ntdll!LdrLoadDll+0x8c
0b 00000075`6bcfeff0 00007fff`ff927af4 ntdll!LdrpInitializeProcess+0x1669
0c 00000075`6bcff3f0 00007fff`ff8d8d5e ntdll!_LdrpInitialize+0x4ed40
0d 00000075`6bcff470 00000000`00000000 ntdll!LdrInitializeThunk+0xe
,
Aug 17 2016
64-bit canary runs fine on my Windows 10 anniversary update. There must be something specific to this machine configuration. Is it running AV or other 3rd party software? Can you !chkimg -d ntdll to see if anything else is hooking? Are you using any (non standard, set up manually, not default) junction point configuration on your user data dir or program files?
,
Aug 17 2016
With some data from elawrence@ I've at least tracked down what's causing his Chrome to fail. It seems that there's a bug in the hooking code on 64-bit when it tries to allocate a trampoline structure within 4GiB of the DLL's base address. Reference to code is https://cs.chromium.org/chromium/src/sandbox/win/src/sandbox_nt_util.cc?rcl=0&l=26 Basically the problem (in general) occurs if a DLL we want to hook is within 1GiB of the end of the user virtual memory address range. The code in question tries to allocate a buffer using NtAllocateVirtualMemory at an address at least 1GiB above the base address of the DLL. If we're within 1GiB of the end address this _will_ fail. It will then try incrementing the base by 100MiB 40 times, but that will also fail (we're still above the user limit). So after 40 tries it goes to the last-ditch effort and tries to just allocate somewhere from the top down. Unfortunately there are two bugs in this. Firstly, it sets base to NULL but then increments base by 100MiB, meaning that the memory allocation will still be attempted at a fixed base address. This can never work: even if it can allocate at address 100MiB, that is unlikely to be within range of the DLL (from what I can tell, specifying a fixed address overrides the MEM_TOP_DOWN flag). Secondly, it never even tries: the loop bound is attempts < 41 but we set the top-down flag at attempt 40, so it exits immediately after setting the flag but before it actually tries to allocate memory. Ultimately the problem here is twofold. Firstly, core DLLs are loaded at very high addresses by default, so even when a DLL is not within the 1GiB where allocation must fail, it will fail anyway if it can't find a valid address within any of the 100MiB windows (not impossible). Secondly, this isn't easily reproducible because it's effectively random: the base address of any DLL depends on where ASLR puts it on each reboot.
This is why in the repro it starts after a reboot and is fixed by another reboot, as those events randomize the DLL locations. It might be that on Anniversary Edition the DLL locations are more random, or higher up, but this could happen randomly to any user of 64-bit Chrome. On 32-bit, however, the allocation strategy is almost guaranteed to work, which explains the discrepancy. From a fixing PoV I'd suggest we always try top-down allocation (with NULL base address) if the DLL address is within 4GiB of the end; there's no point doing otherwise. Also, 100MiB granularity is probably a bad way to go. It would make more sense to query the allocated virtual memory regions to find the next free region from where we are rather than relying on brute force. This of course might increase startup time, but probably not to a great extent, as you can quickly skip already allocated regions. Also I'm not 100% sure the 4GiB window is correct. I would have to double check the trampoline code, but assuming we're patching in a relative jump, the address must be within +/- 2GiB (as the jmp displacement is signed) and not +4GiB.
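To make the arithmetic in this comment concrete, here is a minimal sketch of the probe sequence as described (first probe 1GiB above the DLL base, then 100MiB steps, 40 attempts). It is a model of the description only, not the Chromium source; the function names are hypothetical, and the top-of-user-space value is the one printed by the tool output later in this thread.

```python
# Model of the brute-force probe sequence as described in this comment.
# All names are hypothetical; this is not the Chromium implementation.
USER_VA_TOP = 0x7FFFFFFFFFFF  # top of user address space, as printed later in this thread
GIB = 1 << 30
MIB = 1 << 20


def probe_addresses(dll_base, attempts=40, first_offset=GIB, step=100 * MIB):
    """Return the fixed addresses a brute-force search would try, in order."""
    return [dll_base + first_offset + i * step for i in range(attempts)]


def has_reachable_probe(dll_base):
    """True if at least one probe address lies inside user address space."""
    return any(addr <= USER_VA_TOP for addr in probe_addresses(dll_base))


if __name__ == "__main__":
    kernel32 = 0x7FFFFCED0000  # KERNEL32.DLL base from the !peb output above
    print(hex(probe_addresses(kernel32)[0]), has_reachable_probe(kernel32))
```

With the KERNEL32.DLL base from the debugger output, even the first probe is already above the user limit, so every later probe is too. A DLL loaded lower down has probes inside user space, although whether any of them is actually free is a separate question.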
,
Aug 20 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/07bf777cebc18782eb83b2a5eda89110d9e58274
commit 07bf777cebc18782eb83b2a5eda89110d9e58274
Author: forshaw <forshaw@chromium.org>
Date: Sat Aug 20 18:25:46 2016
Reimplement AllocateNearTo for 64bit.
This CL reimplements AllocateNearTo on 64bit so that it searches for a free memory range rather than the naive brute force approach we used before.
BUG= 604149
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.win:win10_chromium_x64_rel_ng
Review-Url: https://codereview.chromium.org/2258583002
Cr-Commit-Position: refs/heads/master@{#413344}
[modify] https://crrev.com/07bf777cebc18782eb83b2a5eda89110d9e58274/sandbox/win/src/sandbox_nt_util.cc
[modify] https://crrev.com/07bf777cebc18782eb83b2a5eda89110d9e58274/sandbox/win/src/sandbox_nt_util_unittest.cc
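The commit message says only that the new code searches for a free memory range. As an illustration of that idea (not the CL's code), the sketch below walks a simulated list of allocated regions and returns the lowest free, aligned address above a module. The region list, the function names, and the 64KiB alignment are assumptions made for the example.

```python
# Illustrative free-range search over a simulated address-space map.
# Hypothetical names; the real change lives in sandbox_nt_util.cc.
GRANULARITY = 0x10000  # assumed allocation alignment (64 KiB)


def align_up(addr, granularity=GRANULARITY):
    return (addr + granularity - 1) // granularity * granularity


def find_free_above(base, size, regions, limit, top):
    """Lowest aligned address >= base with `size` free bytes.

    regions: iterable of (start, length) pairs for allocated memory.
    limit:   maximum distance above `base` that is acceptable.
    top:     highest valid user address.
    Returns None when no suitable gap exists.
    """
    end_limit = min(base + limit, top + 1)
    addr = align_up(base)
    for start, length in sorted(regions):
        region_end = start + length
        if region_end <= addr:
            continue  # region lies entirely below the cursor
        if start - addr >= size:
            break  # the gap before this region is large enough
        addr = align_up(max(addr, region_end))  # skip past the region
    return addr if addr + size <= end_limit else None
```

For a module at 0x7ffffa000000 that is 1MiB long, the search returns the page run directly after the module. For a module that ends exactly at the top of user space, it returns None, because there is nothing above it to find.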
,
Aug 23 2016
Verified the fix with Chromium 54.0.2837.0 (64-bit)
Verified the crash in Chromium 54.0.2826.0 (64-bit)
Verified the crash in Chrome 54.0.2832.2 (64-bit)
I wrote a tool (https://bayden.com/dl/printmoduleaddresses.exe) which will reboot Windows until one of the DLLs in question (user32.dll, kernel32.dll, gdi32.dll) is within 100mb of the top of the user virtual address space. Enable automatic login to the user account for this to work; it usually runs for a few hours. Run by creating a shortcut in your startup group pointed at "printmoduleaddress.exe whatever.dll reboot".
= Process is 64-bit ==================
User-addr space top: 0x7fffffffffff
Thunkarea is 0x000006400000
This Executable is at 0x027daf980000
gdi32.dll at 0x7ffffa8c0000; 0x573ffff (87mb) to top of user virtual address space
Candidate thunk 0x800000cc0000
^^^^^^^^^ WARNING: Above user virtual address space!! ^^^^^^^^^^^^^
user32.dll at 0x7ffffb000000; 0x4ffffff (79mb) to top of user virtual address space
Candidate thunk 0x800001400000
^^^^^^^^^ WARNING: Above user virtual address space!! ^^^^^^^^^^^^^
kernel32.dll at 0x7ffffa010000; 0x5feffff (95mb) to top of user virtual address space
Candidate thunk 0x800000410000
^^^^^^^^^ WARNING: Above user virtual address space!! ^^^^^^^^^^^^^
PRIOR TO THE FIX: After 30 failed attempts to grow upward beyond the 0x7fffffffffff top of user-address space, the code would fall back to trying to allocate from the base+100mb, adding 100 for the 9 subsequent attempts. If this failed, the allocator would give up and fail the allocation. Surprisingly, thunks could sometimes succeed with under 100mb between the target and the top of the address space; for instance, with kernel32.dll at 0x7ffffa4b0000, with only 0x5b4ffff bytes remaining, the thunk was successfully placed.
AFTER THE FIX: Everything works great, although I do worry that I'm not aware of any reason why Windows couldn't relocate any of these DLLS to the VERY TOP of the address range, such that there's no free 4KB page after the module to inject our thunks. Would it make sense for the new code to, upon failure to find an open slot, try just a plain AllocateVirtualMemory with the TOP_DOWN flag set?
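The figures in the tool output above can be reproduced with a few lines of arithmetic. The tool's source is not shown in this thread, so the sketch below is only an inference from the printed values: it assumes the "candidate thunk" is the module base plus the printed Thunkarea, and that the "mb" figure is the byte count divided by 2^20 and rounded down.

```python
# Reproduces the arithmetic implied by the tool output in this comment.
# Hypothetical helper; the real tool's logic is inferred, not known.
USER_VA_TOP = 0x7FFFFFFFFFFF  # "User-addr space top" from the output
THUNK_AREA = 0x6400000        # "Thunkarea" from the output (100 MiB)


def describe(name, base):
    """Summarize how close a module sits to the top of user address space."""
    to_top = USER_VA_TOP - base
    candidate = base + THUNK_AREA
    return {
        "name": name,
        "to_top": to_top,
        "to_top_mb": to_top >> 20,  # whole MiB, rounded down
        "candidate": candidate,
        "above_user_space": candidate > USER_VA_TOP,
    }


if __name__ == "__main__":
    for name, base in [
        ("gdi32.dll", 0x7FFFFA8C0000),
        ("user32.dll", 0x7FFFFB000000),
        ("kernel32.dll", 0x7FFFFA010000),
    ]:
        d = describe(name, base)
        print(name, hex(d["to_top"]), d["to_top_mb"], hex(d["candidate"]), d["above_user_space"])
```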
,
Aug 23 2016
Nice elawrence@. Strictly speaking the problem is the original code only checked in 100MiB jumps. So it will only succeed if the free memory block happens to fall on a 100MiB boundary. I was therefore slightly wrong in my assessment in that it could fail even if DLLs are not allocated very high. But normally if Windows hasn't randomized the DLLs up the top of memory then there would almost certainly be a large free area 1GiB above the DLLs so it would generally succeed. Of course what would have probably hidden the issue was the use of the TOP_DOWN flag, but due to a bug that code path never got called anyway. Not sure in this case that calling with the TOP_DOWN flag would help too much as the new code searches for a valid free area somewhere above the source. If it doesn't find a free location TOP_DOWN presumably wouldn't either. It is limited to 2GiB above the base as well, but I'd have to double check with the hooking code whether it can really be up to 4GiB above or (as I suspect) it's really +/- 2GiB.
,
Aug 23 2016
I believe you're right that the limit is +/- 2gb based on the JMP instruction (https://cs.chromium.org/chromium/src/sandbox/win/src/resolver_64.cc?q=JMP+sandbox&sq=package:chromium&dr=CSs&l=23) My concern with the new fix is the scenario where we've got memory layout like so: [TopOfVASpace] [Kernel32.dll] [Free pages (A)] [Other DLLS] [137TB of free pages (B)] In this case, the search for free space between [kernel32] and [TopofVASpace] would return no pages. But if we did a blind allocation with TOP_DOWN, presumably the default allocator would find space within (A) or (B), and potentially that space would be within 2GB (below) the DLL to be thunked.
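For reference, the +/- 2GB figure follows from the signed 32-bit displacement of a relative jump. The sketch below assumes a 5-byte `jmp rel32` (opcode E9 plus a 4-byte displacement measured from the end of the instruction). Whether the sandbox resolver actually emits that form is not established here, and the helper name is hypothetical.

```python
# Range check for a hypothetical 5-byte `jmp rel32` patch.
JMP_REL32_LEN = 5  # opcode E9 followed by a signed 32-bit displacement


def rel32_displacement(jmp_addr, target):
    """Displacement to encode for a jump at jmp_addr, or None if out of range."""
    disp = target - (jmp_addr + JMP_REL32_LEN)
    if -(1 << 31) <= disp <= (1 << 31) - 1:
        return disp
    return None
```

Under this assumption a thunk placed 1GB below the DLL is reachable while one 3GB above is not, which is why an allocation landing in region (A) or (B) of the layout above would only help if it fell inside that window.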
,
Aug 23 2016
After a bit of digging it looks like the original code is correct in that the trampoline is allowed to be up to +4GiB above the source. This is because the DLL patching (outside of inline hooking, which is only used for system calls) hooks the EAT. As the EAT only contains RVAs, the location of the trampoline can be at most +4GiB above the base address. I could change the new code to meet this requirement; it would be a simple change, though being conservative might be a good idea. Unfortunately, if a DLL load hits against the top of VA with no free blocks between them then this code will fail (but we're no worse off than we were before). In that situation I guess we could try and scavenge a memory location from somewhere, such as unallocated data at the end of the .data section or reserved but not committed memory, but that starts to get complicated and also risky. Perhaps all interested parties should sit around and think about whether there's a better way to do this in the future.
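The +4GiB bound for EAT hooking comes from export table entries being unsigned 32-bit RVAs, that is, offsets from the module base. A minimal sketch of that constraint, using a hypothetical helper name:

```python
# An export address table entry stores a 32-bit unsigned offset (RVA)
# from the module base, so a redirect target must satisfy
# module_base <= target < module_base + 4 GiB.
def rva_for(module_base, target):
    """Return the RVA that would point at `target`, or None if not encodable."""
    offset = target - module_base
    if 0 <= offset < (1 << 32):
        return offset
    return None
```

Note the asymmetry with a relative jump: an RVA can reach almost 4GiB above the base but nothing below it, so memory found beneath the DLL is of no use for this kind of hook.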
Comment 1 by scottmg@chromium.org
, Apr 16 2016