
Issue 604149

Starred by 3 users

Issue metadata

Status: Fixed
Owner: ----
Closed: Apr 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug




Navigation fails on Windows 10 14316 unless --no-sandbox set

Project Member Reported by elawrence@chromium.org, Apr 16 2016

Issue description

Version: 
52.0.2710.0 (Official Build) canary (64-bit)
51.0.2704.7 (Official Build) dev-m (64-bit)

OS: Windows 10 14316.1000

What steps will reproduce the problem?
(1) Launch Chrome
(2) Attempt to navigate anywhere

What is the expected output?

Navigation occurs.

What do you see instead?

Spinner spins for a while, then Non-Responsive Page dialog box appears. Clicking "Kill" has no effect. Navigating to chrome:// URLs has the same effect.

Passing --no-sandbox causes pages to load without errors.

chrome://crashes does not show any crashes this month. I don't have a "%localappdata%\google\chrome\user data\Crash reports" folder; "%localappdata%\google\chrome\user data\crashpad\reports" contains 4 .dmp files created last month.

Reinstalling the latest Canary did not resolve the problem.

 
Cc: scottmg@chromium.org
Are you by chance running as Admin/elevated?

Comment 2 by bay...@gmail.com, Apr 16 2016

I'm an admin user, but I was not elevated and UAC is on.

I've attached a Process Explorer screenshot showing that the primary process is a Medium IL process as expected.

Upon further review, I believe this is a duplicate of a problem I previously blogged about (coincidentally, 62 months ago today):

https://blogs.msdn.microsoft.com/ieinternals/2011/02/16/ie9-no-reboot-setup-and-the-windows-restart-manager/

Basically, the problem seems to be that when my system restarted, it did so using Windows' Restart API; the one which uses Job Objects. (I had run Lenovo's Firmware update utility which triggers a restart).

As a consequence, after restart, the outer Chrome process is running in a Job, as is Windows Explorer. Chrome fails to successfully run its sandboxed render processes if the parent process is run from a job.

After performing a full system (non-hybrid) restart using "shutdown.exe -r", Chrome51/52 now work properly.

If my suspicions are correct and Chrome does not run properly in this scenario, we may want to investigate reporting the issue to the user if we cannot work around it ourselves. (Notably, Edge worked fine when the system was in this state).
Attachments:
NonNavigatingChromeProcesses.png (17.5 KB)
ChromeInAJob.png (39.3 KB)
ExplorerInAJob.png (33.7 KB)

Comment 3 by wfh@chromium.org, Apr 17 2016

Can you post detailed reproduction steps for getting Windows into the state you describe above, with Explorer running in a Job object?
The specific thing I installed that triggered the reboot was https://download.lenovo.com/pccbbs/mobiles/n1cuj04w.exe

Looking at Process Explorer now in the "Working" case, I see both Explorer.exe and Chrome.exe are running in Job Objects, so perhaps this is just something that Windows 10 does now? While the behavior cited in the 2011 blog is the same (top-level window appears, but no tabs can navigate), in that case the Job Object didn't have the "Breakaway ok" flag which is clearly seen in the screenshot of the "broken" case above.

Notably, in Windows 10 14295, I do not see the job objects for either Explorer or Chrome.exe's MediumIL process.

Comment 5 by wfh@chromium.org, Apr 18 2016

Cc: forshaw@chromium.org jsc...@chromium.org
I tried 14316 and I do not see any processes being put in a Job object. Even if Windows does, as you say, it's okay as long as it's not marked no-breakaway, since Win8+ supports nested job objects and Chrome is fine to nest its sandbox Job objects inside other Jobs.

I'd be interested in knowing what configuration causes explorer and Chrome's broker process to end up in a Job. It might be interesting to UMA and see how often this is actually happening.

Comment 6 by wfh@chromium.org, Apr 18 2016

Status: WontFix (was: Untriaged)
Job objects are used for Connected Standby, so this is likely happening because you have an AOAC compliant device.

As said in #5 this is fine for Chrome, as Windows 8+ supports nested job objects and we live happily inside the job.

I would be interested if you could come up with some reproduction steps for getting Explorer/Chrome inside a Job object that is marked no-breakaway. This certainly should not happen, as other things are likely to break in the OS if that flag is set on the Job.

Until then I think this is WAI.
In the affected state, Chrome was not able to load tabs to any origin, including built-in URLs.

It's entirely possible that all talk of Job Objects, etc, is unrelated to the problem at hand, but that doesn't mean that Chrome is working as intended, only that my suspicions about the root cause of the bug are unfounded.

Are there better debugging steps I can undertake to help root-cause this properly if it were to reproduce again in the future?

Comment 8 by wfh@chromium.org, Apr 18 2016

If it's not possible to provide reproduction steps to help to repro this in the lab, then it'll be down to you to build your own copy of Chromium and then step through the sandbox initialization code in the debugger.

You could try using --allow-no-sandbox-job instead of --no-sandbox - if Chrome renderers start fine then it's certainly something to do with the Job object. If --allow-no-sandbox-job still prevents renderers from launching, but --no-sandbox works, then it's something else that has to be debugged.

It's certainly working as intended that if Chrome browser process is running inside a Job with no-breakaway or no support for nested jobs, it should not be able to start any renderer processes.
https://twitter.com/anatudor/status/762725107810889728 appears to be an in-the-wild report of the same symptoms.
I was able to reproduce this on Windows 10 1607 14393.67 (the anniversary update).

Windows restarted for an update and after restart, 64-bit Chrome 54.0.2824.0 dev-m and 64-bit 54.0.2831.0 Canary are unable to start render processes unless the --no-sandbox flag is passed. 

Additionally, the Brave browser is unable to start its renderers, while Vivaldi and Opera have no problem. Further investigation reveals that Vivaldi and Opera are 32-bit while Chrome and Brave are 64-bit. If I swap my 64-bit Canary for a 32-bit Canary, it is able to start its render processes without issue.

The "broken" 64-bit Chrome does successfully launch two subprocesses, one of type=crashpad-handler and one of type=gpu-process.
I've not been able to repro. I let the anniversary edition install updates with 54.0.2831.0 running at the time. On reboot the application restarted with no problems. In my case neither Explorer nor Chrome (main medium IL process) is running in a Job. elawrence@, did you try the --allow-no-sandbox-job flag as well?
Okay, Chromium 64-bit repros, so I can play with this a bit.

When I run Chromium rather than Chrome, I don't see the GPU process in Task Manager, and I get spew in the system debug console:

[9220] [9220:804:0817/102133:ERROR:browser_gpu_channel_host_factory.cc(113)] Failed to launch GPU process.

If I run chromium --disable-gpu-sandbox, I see the GPU process in Task Manager but pages still fail to load with the same error.

When I run under the debugger, I see debug breaks in InterceptionAgent::OnDllLoad for name=kernel32.dll and user32.dll and gdi32.dll before the process is terminated.

3:033> !peb
PEB at 000000756ba50000
    InheritedAddressSpace:    No
    ReadImageFileExecOptions: No
    BeingDebugged:            Yes
    ImageBaseAddress:         0000000140000000
    Ldr                       00007fffff9b23a0
    Ldr.Initialized:          Yes
    Ldr.InInitializationOrderModuleList: 0000021027a54c60 . 0000021027a54c60
    Ldr.InLoadOrderModuleList:           0000021027a54dd0 . 0000021027a55260
    Ldr.InMemoryOrderModuleList:         0000021027a54de0 . 0000021027a55270
            Base TimeStamp                     Module
       140000000 57abad51 Aug 10 17:40:17 2016 C:\Users\ericlaw\AppData\Local\Chromium\Application\chrome.exe
    7fffff860000 578997b2 Jul 15 21:10:58 2016 C:\WINDOWS\SYSTEM32\ntdll.dll
    7ffffced0000 00000000 Dec 31 18:00:00 1969 C:\WINDOWS\System32\KERNEL32.DLL
    SubSystemData:     0000000000000000
    ProcessHeap:       0000021027a50000
    ProcessParameters: 0000021027a51f30
    CurrentDirectory:  'C:\Users\ericlaw\AppData\Local\Chromium\Application\54.0.2826.0\'
    WindowTitle:  'C:\Users\ericlaw\AppData\Local\Chromium\Application\chrome.exe'
    ImageFile:    'C:\Users\ericlaw\AppData\Local\Chromium\Application\chrome.exe'
    CommandLine:  '"C:\Users\ericlaw\AppData\Local\Chromium\Application\chrome.exe" --type=renderer --enable-features=AutomaticTabDiscarding<AutomaticTabDiscarding,ExpectCTReporting<ExpectCTReporting,IncidentReportingDisableUpload<SafeBrowsingIncidentReportingService,IncidentReportingModuleLoadAnalysis<SafeBrowsingIncidentReportingServiceFeatures,IncidentReportingSuspiciousModuleReporting<SafeBrowsingIncidentReportingServiceFeatures,MainFrameBeforeActivation<MainFrameBeforeActivation,NetworkTimeServiceQuerying<NetworkTimeQueries,NewAudioRenderingMixingStrategy<NewAudioRenderingMixingStrategy,NonValidatingReloadOnNormalReload<NonValidatingReloadOnNormalReload,PassiveDocumentEventListeners<PassiveDocumentEventListeners,PointerEvent<PointerEvent,PreconnectMore<PreconnectMore,UsePasswordSeparatedSigninFlow<PasswordSeparatedSigninFlow,WebRTC-EnableWebRtcEcdsa<WebRTC-EnableWebRtcEcdsa,WebRTC-H264WithOpenH264FFmpeg<WebRTC-H264WithOpenH264FFmpeg,token-binding<TokenBinding,use-new-media-cache<use-new-media-cache --disable-features=DocumentWriteEvaluator<DisallowFetchForDocWrittenScriptsInMainFrame 
--force-fieldtrials=AutofillClassifier/Enabled/AutofillFieldMetadata/Enabled/AutofillProfileOrderByFrecency/EnabledLimitTo3/*AutomaticTabDiscarding/Enabled_Once_10-gen2/BlockSmallPluginContent/Enabled/BrowserBlacklist/Enabled/CaptivePortalInterstitial/Enabled/ChildAccountDetection/Disabled/ChromeDashboard/Enabled/ChromotingQUIC/Enabled/DefaultBrowserInfobar/SettingsTextNotNow/DisallowFetchForDocWrittenScriptsInMainFrame/DocumentWriteScriptBlockGroup/EnableGoogleCachedCopyTextExperiment/Button/*EnableMediaRouter/Enabled/EnableMediaRouterWithCastExtension/Enabled/EnableSessionCrashedBubbleUI/Enabled/ExpectCTReporting/ExpectCTReportingEnabled/ExtensionActionRedesign/Enabled/*ExtensionContentVerification/Enforce/ExtensionInstallVerification/Enforce/GoogleBrandedContextMenu/branded/GoogleNow/Enable/*IconNTP/Default/InstanceID/Enabled/IntelligentSessionRestore/Enabled/MainFrameBeforeActivation/Enabled/MaterialDesignDownloads/Enabled/MojoChannel/Enabled/*NetworkQualityEstimator/Enabled/NetworkTimeQueries/NetworkTimeQueriesEnabled/NewAudioRenderingMixingStrategy/Enabled/*NewProfileManagement/Enabled/NonValidatingReloadOnNormalReload/Enabled/OfferUploadCreditCards/Enabled/OutOfProcessPac/Enabled/*PageRevisitInstrumentation/Enabled/PassiveDocumentEventListeners/Enabled/PasswordBranding/SmartLockBrandingSavePromptOnly/PasswordGeneration/Disabled/*PasswordManagerSettingsMigration/Enable/PasswordSeparatedSigninFlow/Enabled/PasswordSmartBubble/3-Times/PointerEvent/Enabled/PreRead/NoPrefetchArgument2/PreconnectMore/Enabled/*QUIC/Enabled/RefreshTokenDeviceId/Enabled/ReportCertificateErrors/ShowAndPossiblySend/SRTPromptFieldTrial/On/SSLCommonNameMismatchHandling/Enabled/*SafeBrowsingIncidentReportingService/Enabled/SafeBrowsingIncidentReportingServiceFeatures/WithSuspiciousModuleReporting/SafeBrowsingReportPhishingErrorLink/Enabled/SafeBrowsingUpdateFrequency/UpdateTime15m/SafeBrowsingV4LocalDatabaseManagerEnabled/Enabled/SchedulerExpensiveTaskBlocking/Enabled/SdchPersistence/Enable
d/*SettingsEnforcement/enforce_always_with_extensions_and_dse/StrictSecureCookies/Enabled/SyncHttpContentCompression/Enabled/TabSyncByRecency/Enabled/*TokenBinding/TokenBinding/*TriggeredResetFieldTrial/On/V8CacheStrategiesForCacheStorage/default/VarationsServiceControl/Interval_30min/WebFontsInterventionV2/Enabled-slow2g/WebRTC-EnableWebRtcEcdsa/Enabled/WebRTC-H264WithOpenH264FFmpeg/Enabled/WebRTC-LocalIPPermissionCheck/Enabled/WebRTC-PeerConnectionDTLS1.2/Enabled/use-new-media-cache/Enabled/ --primordial-pipe-token=7FB82F75FCC3FC7A46AC4842443F0515 --lang=en-US --instant-process --enable-offline-auto-reload --enable-offline-auto-reload-visible-only --blink-settings=disallowFetchForDocWrittenScriptsInMainFrameOnSlowConnections=true --enable-pinch --device-scale-factor=1 --num-raster-threads=2 --enable-main-frame-before-activation --content-image-texture-target=0,0,3553;0,1,3553;0,2,3553;0,3,3553;0,4,3553;0,5,3553;0,6,3553;0,7,3553;0,8,3553;0,9,3553;0,10,3553;0,11,3553;0,12,3553;0,13,3553;0,14,3553;1,0,3553;1,1,3553;1,2,3553;1,3,3553;1,4,3553;1,5,3553;1,6,3553;1,7,3553;1,8,3553;1,9,3553;1,10,3553;1,11,3553;1,12,3553;1,13,3553;1,14,3553;2,0,3553;2,1,3553;2,2,3553;2,3,3553;2,4,3553;2,5,3553;2,6,3553;2,7,3553;2,8,3553;2,9,3553;2,10,3553;2,11,3553;2,12,3553;2,13,3553;2,14,3553;3,0,3553;3,1,3553;3,2,3553;3,3,3553;3,4,3553;3,5,3553;3,6,3553;3,7,3553;3,8,3553;3,9,3553;3,10,3553;3,11,3553;3,12,3553;3,13,3553;3,14,3553 --mojo-application-channel-token=7FB82F75FCC3FC7A46AC4842443F0515 --channel="6952.0.430078330\1234469475" --mojo-platform-channel-handle=2216 /prefetch:1'
    DllPath:      '< Name not readable >'
    Environment:  0000021027a50dd0
        ALLUSERSPROFILE=C:\ProgramData
        APPDATA=C:\Users\ericlaw\AppData\Roaming
        CHROME_CRASHPAD_PIPE_NAME=\\.\pipe\crashpad_8072_QJNCZEAEQMMDUVVM
        CHROME_MAIN_TICKS=12509337120
        CHROME_PROBED_PROGRAM_FILES_PATH=C:\Program Files (x86)
        CHROME_RESTART=Chromium|Whoa! Chromium has crashed. Relaunch now?|LEFT_TO_RIGHT
        CommonProgramFiles=C:\Program Files\Common Files
        CommonProgramFiles(x86)=C:\Program Files (x86)\Common Files
        CommonProgramW6432=C:\Program Files\Common Files
        COMPUTERNAME=T460S
        ComSpec=C:\WINDOWS\system32\cmd.exe
        configsetroot=C:\WINDOWS\ConfigSetRoot
        HOMEDRIVE=C:
        HOMEPATH=\Users\ericlaw
        LOCALAPPDATA=C:\Users\ericlaw\AppData\Local
        LOGONSERVER=\\T460S
        NUMBER_OF_PROCESSORS=4
        OS=Windows_NT
        Path=C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\winext\arcade;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\Users\ericlaw\.dnx\bin;C:\Program Files\Microsoft DNX\Dnvm\;C:\Program Files\Intel\WiFi\bin\;C:\Program Files\Common Files\Intel\WirelessCommon\;C:\Users\ericlaw\AppData\Local\Microsoft\WindowsApps;C:\Program Files\Microsoft SQL Server\130\Tools\Binn\;C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\;C:\Users\ericlaw\AppData\Local\Microsoft\WindowsApps;C:\Program Files (x86)\Microsoft VS Code\bin
        PATHEXT=.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC
        PROCESSOR_ARCHITECTURE=AMD64
        PROCESSOR_IDENTIFIER=Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
        PROCESSOR_LEVEL=6
        PROCESSOR_REVISION=4e03
        ProgramData=C:\ProgramData
        ProgramFiles=C:\Program Files
        ProgramFiles(x86)=C:\Program Files (x86)
        ProgramW6432=C:\Program Files
        PSModulePath=C:\Program Files\WindowsPowerShell\Modules;C:\WINDOWS\system32\WindowsPowerShell\v1.0\Modules;C:\Program Files\Microsoft Message Analyzer\PowerShell\
        PUBLIC=C:\Users\Public
        SystemDrive=C:
        SystemRoot=C:\WINDOWS
        TEMP=C:\Users\ericlaw\AppData\Local\Temp
        TMP=C:\Users\ericlaw\AppData\Local\Temp
        TVT=C:\Program Files (x86)\Lenovo
        USERDOMAIN=T460S
        USERDOMAIN_ROAMINGPROFILE=T460S
        USERNAME=ericlaw
        USERPROFILE=C:\Users\ericlaw
        VS140COMNTOOLS=C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\Tools\
        WINDBG_DIR=C:\Program Files (x86)\Windows Kits\10\Debuggers\x64
        windir=C:\WINDOWS
3:033> ~kp
 # Child-SP          RetAddr           Call Site
00 00000075`6bcfe910 00000001`4016118c chrome!sandbox::InterceptionAgent::OnDllLoad(struct _UNICODE_STRING * full_path = 0x00000210`27990880 "\Device\HarddiskVolume3\Windows\System32\kernel32.dll", struct _UNICODE_STRING * name = 0x00000210`27990840 "KERNEL32.dll", void * base_address = 0x00007fff`fced0000)+0x12a [c:\src\c\src\sandbox\win\src\interception_agent.cc @ 116]
01 00000075`6bcfe980 00000001`40141ca8 chrome!TargetNtMapViewOfSection(<function> * orig_MapViewOfSection = 0x00000210`278deb20, void * section = 0x00000000`00000080, void * process = 0xffffffff`ffffffff, void ** base = 0x00000210`27a55290, unsigned int64 zero_bits = 0, unsigned int64 commit_size = 0, union _LARGE_INTEGER * offset = 0x00000000`00000000, unsigned int64 * view_size = 0x00000075`6bcfeb78, unsigned long inherit = 1, unsigned long allocation_type = 0x800000, unsigned long protect = 4)+0x1ac [c:\src\c\src\sandbox\win\src\target_interceptions.cc @ 60]
02 00000075`6bcfea10 00007fff`ff86ddc5 chrome!TargetNtMapViewOfSection64(void * section = 0x00000000`00000080, void * process = 0xffffffff`ffffffff, void ** base = 0x00000210`27a55290, unsigned int64 zero_bits = 0, unsigned int64 commit_size = 0, union _LARGE_INTEGER * offset = 0x00000000`00000000, unsigned int64 * view_size = 0x00000075`6bcfeb78, unsigned long inherit = 1, unsigned long allocation_type = 0x800000, unsigned long protect = 4)+0xa8 [c:\src\c\src\sandbox\win\src\interceptors_64.cc @ 34]
03 00000075`6bcfea90 00007fff`ff86da52 ntdll!LdrpMapViewOfSection+0xb5
04 00000075`6bcfeb30 00007fff`ff86d925 ntdll!LdrpMapImage+0x72
05 00000075`6bcfebd0 00007fff`ff86d47e ntdll!LdrpMapDllWithSectionHandle+0x2d
06 00000075`6bcfec10 00007fff`ff86d236 ntdll!LdrpLoadKnownDll+0xe6
07 00000075`6bcfec70 00007fff`ff886a5c ntdll!LdrpFindOrPrepareLoadingModule+0xa6
08 00000075`6bcfecd0 00007fff`ff88651d ntdll!LdrpLoadDllInternal+0x110
09 00000075`6bcfed50 00007fff`ff869efc ntdll!LdrpLoadDll+0xf1
0a 00000075`6bcfeef0 00007fff`ff8f1899 ntdll!LdrLoadDll+0x8c
0b 00000075`6bcfeff0 00007fff`ff927af4 ntdll!LdrpInitializeProcess+0x1669
0c 00000075`6bcff3f0 00007fff`ff8d8d5e ntdll!_LdrpInitialize+0x4ed40
0d 00000075`6bcff470 00000000`00000000 ntdll!LdrInitializeThunk+0xe

Comment 13 by wfh@chromium.org, Aug 17 2016

Cc: wfh@chromium.org
64-bit canary runs fine on my Windows 10 anniversary update. There must be something specific to this machine configuration. Is it running AV or other 3rd party software? Can you !chkimg -d ntdll to see if anything else is hooking? Are you using any (non standard, set up manually, not default) junction point configuration on your user data dir or program files?
So with some data from elawrence@ I've at least tracked down what's causing his Chrome to fail.

It seems that there's a bug in the hooking code on 64-bit when it tries to allocate a trampoline structure within 4GiB of the DLL's base address. Reference to code is https://cs.chromium.org/chromium/src/sandbox/win/src/sandbox_nt_util.cc?rcl=0&l=26

Basically the problem (in general) occurs if a DLL we want to hook is within 1GiB of the end of the user virtual memory address range. The code in question tries to allocate a buffer using NtAllocateVirtualMemory at an address at least 1GiB above the base address of the DLL. If we're within 1GiB of the end address this _will_ fail. It will then increment the base by 100MiB up to 40 times, but that will also fail (we're still above the user limit). After 40 tries it makes a last-ditch effort and tries to just allocate somewhere from the top down. Unfortunately there are two bugs in this. Firstly, it sets base to NULL but then increments base by 100MiB, meaning that the allocation will still be attempted at a fixed base address. This can never work: even if it can allocate at address 100MiB, that is unlikely to be within range of the DLL (from what I can tell, specifying a fixed address overrides the MEM_TOP_DOWN flag). Secondly, it never even tries: the loop runs only while attempts < 41, but we set the top-down flag at attempt 40, so it exits immediately after setting the flag and before it actually tries to allocate memory.
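
The control flow described above can be modelled in a few lines. This is a toy Python sketch built only from the description in this comment, not the Chromium source: the function names, the `can_allocate_at` callback (a stand-in for NtAllocateVirtualMemory) and the simplified allocator are all hypothetical. It shows that with a DLL near the top of the user address space, every probe fails and the top-down request is never issued.

```python
MIB = 1 << 20
GIB = 1 << 30
USER_TOP = 0x7FFFFFFFFFFF  # top of the 64-bit user address space


def model_old_allocate_near_to(dll_base, can_allocate_at):
    """Toy model of the pre-fix loop as described above, not the real code.

    can_allocate_at(base, top_down) is a hypothetical stand-in for
    NtAllocateVirtualMemory and returns True when the allocation succeeds.
    """
    base = dll_base + GIB
    top_down = False
    tried = []  # every (base, top_down) pair actually passed to the allocator
    attempts = 0
    while attempts < 41:
        tried.append((base, top_down))
        if can_allocate_at(base, top_down):
            return base, tried
        if attempts == 40:
            base = 0         # "NULL": let the OS choose the address...
            top_down = True  # ...searching from the top of the address space
        base += 100 * MIB    # bug 1: a NULL base immediately becomes 100 MiB
        attempts += 1        # bug 2: attempts is now 41, so the loop exits
    return None, tried


def fixed_address_allocator(base, top_down):
    # Hypothetical allocator: a fixed-address request succeeds only if it lies
    # inside the user address space; a NULL-base top-down request succeeds.
    if base == 0:
        return top_down
    return base < USER_TOP


# Example base near the top of the address space (assumed for illustration).
result, tried = model_old_allocate_near_to(0x7FFFFA010000, fixed_address_allocator)
print(result)                                  # None: the allocation failed
print(len(tried))                              # 41 probes were made
print(any(top_down for _, top_down in tried))  # False: top-down never tried
```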

Ultimately the problem here is twofold. Firstly, core DLLs are loaded at very high addresses by default, so even when a DLL is not within 1GiB of the top (where the allocation must fail), the allocation will still fail if no valid address is found at any of the 100MiB steps (not impossible). Secondly, this isn't easily reproducible because it's effectively random: the base address of any DLL depends on where ASLR puts it on each reboot. This is why the repro starts after a reboot and is fixed by another reboot, as those events re-randomize the DLL locations. It might be that on anniversary edition the DLL locations are more random, or higher up, but this could happen randomly to any user of 64-bit Chrome. On 32-bit, however, the allocation strategy is almost guaranteed to work, which explains the discrepancy.

From a fixing PoV I'd suggest we always try top-down allocation (with NULL base address) if the DLL address is within 4GiB of the end; there's no point doing otherwise. Also, 100MiB granularity is probably a bad way to go; it would make more sense to query the virtually allocated regions to find the next free region from where we are rather than relying on brute force. This might of course increase startup time, but probably not by much, as you can quickly skip already allocated regions. Also, I'm not 100% sure the 4GiB window is correct. I would have to double check the trampoline code, but assuming we're patching in a relative jump, the address must be within +/- 2GiB (as the jmp displacement is signed) and not +4GiB.
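
A minimal sketch of the "find the next free region" idea, assuming the allocated regions are already available as a list of (start, length) pairs. On Windows that list would come from walking the process's virtual memory map; here it is hand-written. The function name, the 4GiB default limit and the example layout are assumptions for illustration, and allocation granularity is ignored.

```python
GIB = 1 << 30
USER_TOP = 0x7FFFFFFFFFFF  # top of the 64-bit user address space


def find_free_near(dll_base, dll_size, size, regions, limit=4 * GIB):
    """Return the lowest free address at or above the end of the DLL that can
    hold `size` bytes and still ends within `limit` of dll_base, or None.

    `regions` is a list of (start, length) tuples of allocated ranges.
    """
    candidate = dll_base + dll_size
    for start, length in sorted(regions):
        end = start + length
        if end <= candidate:
            continue         # region lies entirely below the candidate
        if start - candidate >= size:
            break            # the gap before this region is big enough
        candidate = end      # skip past the allocated region
    if candidate + size > min(dll_base + limit, USER_TOP + 1):
        return None          # nothing usable within range of the DLL
    return candidate


# Hypothetical layout: a 1 MiB DLL followed directly by a 128 KiB allocation.
dll_base, dll_size = 0x7FFFFA010000, 0x100000
regions = [(dll_base, dll_size), (dll_base + dll_size, 0x20000)]
found = find_free_near(dll_base, dll_size, 0x1000, regions)
print(hex(found))

# A DLL that ends exactly at the top of user space leaves no room above it.
top_base = 0x7FFFFFF00000
none_found = find_free_near(top_base, 0x100000, 0x1000, [(top_base, 0x100000)])
print(none_found)
```

Skipping whole regions at a time is what keeps the search cheap compared with probing at fixed 100MiB steps.
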
Project Member

Comment 15 by bugdroid1@chromium.org, Aug 20 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/07bf777cebc18782eb83b2a5eda89110d9e58274

commit 07bf777cebc18782eb83b2a5eda89110d9e58274
Author: forshaw <forshaw@chromium.org>
Date: Sat Aug 20 18:25:46 2016

Reimplement AllocateNearTo for 64bit.
This CL reimplements AllocateNearTo on 64bit so that it searches for a
free memory range rather than the naive brute force approach we used
before.

BUG= 604149 
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.win:win10_chromium_x64_rel_ng

Review-Url: https://codereview.chromium.org/2258583002
Cr-Commit-Position: refs/heads/master@{#413344}

[modify] https://crrev.com/07bf777cebc18782eb83b2a5eda89110d9e58274/sandbox/win/src/sandbox_nt_util.cc
[modify] https://crrev.com/07bf777cebc18782eb83b2a5eda89110d9e58274/sandbox/win/src/sandbox_nt_util_unittest.cc

Status: Fixed (was: WontFix)
Verified the fix with Chromium 54.0.2837.0 (64-bit)
Verified the crash in Chromium 54.0.2826.0 (64-bit)
Verified the crash in Chrome 54.0.2832.2 (64-bit)

I wrote a tool (https://bayden.com/dl/printmoduleaddresses.exe) which will reboot Windows until one of the DLLs in question (user32.dll, kernel32.dll, gdi32.dll) is within 100mb of the top of the user virtual address space. Enable automatic login to the user account for this to work; it usually runs for a few hours. Run by creating a shortcut in your startup group pointed at "printmoduleaddress.exe whatever.dll reboot".


= Process is 64-bit ==================
User-addr space top:    0x7fffffffffff
Thunkarea is            0x000006400000
This Executable is at   0x027daf980000

gdi32.dll        at     0x7ffffa8c0000; 0x573ffff (87mb) to top of user virtual address space
Candidate thunk         0x800000cc0000
^^^^^^^^^ WARNING: Above user virtual address space!! ^^^^^^^^^^^^^

user32.dll       at     0x7ffffb000000; 0x4ffffff (79mb) to top of user virtual address space
Candidate thunk         0x800001400000
^^^^^^^^^ WARNING: Above user virtual address space!! ^^^^^^^^^^^^^

kernel32.dll     at     0x7ffffa010000; 0x5feffff (95mb) to top of user virtual address space
Candidate thunk         0x800000410000
^^^^^^^^^ WARNING: Above user virtual address space!! ^^^^^^^^^^^^^
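
The figures in the output above can be re-derived with simple arithmetic. The sketch below assumes, based on the printed values, that the tool computes each candidate thunk as the module base plus the "Thunkarea" value (0x6400000, i.e. 100mb); the function and variable names are illustrative and not taken from the tool.

```python
USER_TOP = 0x7FFFFFFFFFFF  # "User-addr space top" from the output above
THUNK_AREA = 0x6400000     # "Thunkarea" from the output above (100mb)

# Module base addresses copied from the output above.
modules = {
    "gdi32.dll": 0x7FFFFA8C0000,
    "user32.dll": 0x7FFFFB000000,
    "kernel32.dll": 0x7FFFFA010000,
}


def headroom_and_candidate(base):
    """Return (bytes left below the top of user space, candidate thunk address)."""
    return USER_TOP - base, base + THUNK_AREA


for name, base in modules.items():
    headroom, candidate = headroom_and_candidate(base)
    status = "above user space" if candidate > USER_TOP else "ok"
    print(f"{name}: {headroom:#x} ({headroom >> 20}mb) to top, "
          f"candidate {candidate:#x} ({status})")
```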

PRIOR TO THE FIX:
After 30 failed attempts to grow upward beyond the 0x7fffffffffff top of user-address space, the code
would fall back to trying to allocate from the base+100mb, adding 100mb on each of the 9 subsequent attempts.
If this failed, the allocator would give up and fail the allocation.

Surprisingly, thunks could sometimes succeed with under 100mb between the target and the top of the 
address space; for instance, with kernel32.dll at 0x7ffffa4b0000, with only 0x5b4ffff bytes remaining,
the thunk was successfully placed. 

AFTER THE FIX:
Everything works great, although I do have one worry: I'm not aware of any reason why Windows couldn't relocate any of these DLLs to the VERY TOP of the address range, such that there's no free 4KB page after the module in which to inject our thunks.

Would it make sense for the new code to, upon failure to find an open slot, try just a plain AllocateVirtualMemory with the TOP_DOWN flag set?
Nice elawrence@.

Strictly speaking the problem is the original code only checked in 100MiB jumps. So it will only succeed if the free memory block happens to fall on a 100MiB boundary. I was therefore slightly wrong in my assessment in that it could fail even if DLLs are not allocated very high. But normally if Windows hasn't randomized the DLLs up the top of memory then there would almost certainly be a large free area 1GiB above the DLLs so it would generally succeed. 
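
To illustrate the point about 100MiB jumps, here is a small sketch that checks whether any fixed-stride probe lands inside a given free block. The probe schedule (start 1GiB above the DLL, 40 steps of 100MiB) follows the earlier description in this thread; the function names and the example addresses are hypothetical.

```python
MIB = 1 << 20
GIB = 1 << 30


def probe_addresses(dll_base, count=40, stride=100 * MIB):
    """Addresses a fixed-stride search would try, starting 1 GiB above the DLL."""
    return [dll_base + GIB + k * stride for k in range(count)]


def probes_hit_free_block(dll_base, free_start, free_size, alloc_size=0x1000):
    """True if some probe address fits entirely inside the free block."""
    free_end = free_start + free_size
    return any(free_start <= p and p + alloc_size <= free_end
               for p in probe_addresses(dll_base))


dll_base = 0x7FF000000000  # hypothetical DLL base, well below the top

# A 10 MiB free block sitting between two probes is never found...
missed = probes_hit_free_block(dll_base, dll_base + GIB + 50 * MIB, 10 * MIB)
# ...while an equally sized block that straddles a probe address is.
hit = probes_hit_free_block(dll_base, dll_base + GIB + 195 * MIB, 10 * MIB)
print(missed, hit)
```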

Of course what would have probably hidden the issue was the use of the TOP_DOWN flag, but due to a bug that code path never got called anyway. Not sure in this case that calling with the TOP_DOWN flag would help too much as the new code searches for a valid free area somewhere above the source. If it doesn't find a free location TOP_DOWN presumably wouldn't either. It is limited to 2GiB above the base as well, but I'd have to double check with the hooking code whether it can really be up to 4GiB above or (as I suspect) it's really +/- 2GiB.
I believe you're right that the limit is +/- 2gb based on the JMP instruction (https://cs.chromium.org/chromium/src/sandbox/win/src/resolver_64.cc?q=JMP+sandbox&sq=package:chromium&dr=CSs&l=23)

My concern with the new fix is the scenario where we've got memory layout like so:

   [TopOfVASpace]
   [Kernel32.dll]
   [Free pages (A)]
   [Other DLLS]
   [137TB of free pages (B)]

In this case, the search for free space between [kernel32] and [TopofVASpace] would return no pages. But if we did a blind allocation with TOP_DOWN, presumably the default allocator would find space within (A) or (B), and potentially that space would be within 2GB (below) the DLL to be thunked. 
After a bit of digging it looks like the original code is correct in that the trampoline is allowed to be up to +4GiB above the source. This is because the DLL patching (outside of inline hooking, which is only used for system calls) appears to hook the EAT. As the EAT only contains RVAs, the trampoline can be located at most +4GiB above the base address.
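
The two reachability constraints discussed in this thread can be written as range checks. This is a sketch under two assumptions: a relative jump is the 5-byte `jmp rel32` form, whose signed 32-bit displacement is measured from the end of the instruction, and an EAT entry is an unsigned 32-bit RVA relative to the module base. The function names and example addresses are illustrative.

```python
GIB = 1 << 30


def fits_rel32_jmp(jmp_address, target):
    """Can a 5-byte `jmp rel32` placed at jmp_address reach target?"""
    displacement = target - (jmp_address + 5)  # relative to the next instruction
    return -(1 << 31) <= displacement <= (1 << 31) - 1


def fits_eat_rva(module_base, target):
    """Can target be expressed as an unsigned 32-bit RVA from module_base?"""
    return 0 <= target - module_base <= 0xFFFFFFFF


base = 0x7FF000000000  # hypothetical module base

# 3 GiB above the base: reachable as an RVA, but not by a rel32 jump.
print(fits_eat_rva(base, base + 3 * GIB), fits_rel32_jmp(base, base + 3 * GIB))
# One page below the base: reachable by a rel32 jump, but not as an RVA.
print(fits_eat_rva(base, base - 0x1000), fits_rel32_jmp(base, base - 0x1000))
```

The asymmetry is why space below the DLL cannot be used for an EAT hook even when it is close by.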

I could change it to meet this requirement; it would be a simple change, though being conservative might be a good idea. Unfortunately, if a DLL load hits against the top of the VA space with no free blocks in between, this code will fail (but we're no worse off than we were before). In that situation I guess we could try to scavenge a memory location from somewhere, such as unallocated data at the end of the .data section or reserved but not committed memory, but that starts to get complicated and also risky.

Perhaps all interested parties should sit down and think about whether there's a better way to do this in the future.
