consider debugging target process in sandbox |
|||
Issue descriptionRunning sandbox target process under debugger using DEBUG_PROCESS and DEBUG_ONLY_THIS_PROCESS will allow us fine grained control of catching any/all exceptions including those such as fastfail, which we are currently unable to catch (see issue 865632). It might be worth prototypying an option in sandbox to allow this, to measure perf/startup time and also any unexpected interactions (e.g. third party interactions).
,
Aug 15
How do you imagine this should work? Should a sandbox have a skeleton debugger that will provide a way to register callbacks for things like loaded DLL, thrown exception, etc. Or should sandbox just set the proper flags and the debugger will be placed somewhere else?
,
Aug 15
it would be more like the former. The broker would act as the debugger and handle the callbacks.
,
Aug 15
Ok, that's what I thought as well, just wanted to double check what you guys think about it. How about we start with a very basic debugger that won't support callbacks and just ignore all events so that the performance can be measured? Do you have such tests?
,
Aug 15
We don't have specific perf tests for the sandbox integration tests (sbox_integration_tests), but that would certainly be the place to start landing the code and adding functional tests.
,
Aug 16
,
Aug 16
,
Aug 22
I just uploaded, https://chromium-review.googlesource.com/c/chromium/src/+/1184707, a proof of concept cl. At the moment it starts all childs under debugger and doesn't handle any exceptions, but that should be enough for performance testing, right? I had to move the creation of child into a separate thread if we want to use the debugger because only the thread that spawns the child can actually debug it, the reason is that Windows stores debug object that is used by the debug API in TEB. So either you put the debug loop in the thread that spawns the child or attach to the child in a different thread. Problem with the latter is that the child is spawned suspended and attach creates a remote thread that requires ntdll to be present in the child (check ntdll!DbgUiIssueRemoteBreakin) at the time of attach, which is not, and the child crashes because of it. It took me a while to figure this out because if you manage to resume the main thread before the system actually creates the remote thread then everything works as expected. So I was seeing some successful tests and some failed ones as well. There is a way to solve this using ntdll!DbgUiGetThreadDebugObject and ntdll!DbgUiSetThreadDebugObject but those are undocumented and don't think you'd like to use them in Chromium.
,
Aug 22
It sounds like a launcher/debugger thread is the right solution. Do we need one for each child process? That could get unwieldy. I guess we could use this technique on some percentage of users and also on some percentage of child processes - even with small percentages we will capture enough crashes. I would expect that you would have to handle some exceptions in order for things to run. There's the thread naming exception and various others that get thrown during routine behavior.
,
Aug 22
Thanks for #8 as a PoC, much appreciated. Looks like it has potential. I don't think it's landable as-is without an option to turn it on/off. I think also for this to be landable, sandbox should provide a optional delegate interface that callers can provide that implements the various callbacks - since it's probable that different sandbox embedders will have different requirements e.g. some will want to handle exceptions and some won't - e.g. in chromium we might want to plumb the exception callback into crashpad.
,
Aug 22
It never was meant as landable code, I thought this might be used for testing the performance, since every load of dll and new thread creation freezes the child until debugger says it's done processing the event. If that looks ok, I can implement the rest. #9 I'm afraid that one thread can debug only one process, but I'd have to verify this.
,
Aug 22
Hmm if it's one thread per child process that might be something that limits the deployability of this. As Bruce says, perhaps we can enable this for a % of users on a channel like Canary or Dev.
,
Aug 23
And a percentage of processes. Actually, for deciding which processes to debug when this is enabled I'll be that an exponentially decaying probability would work well, so we'd never be debugging more than, say, O(log(n)) processes, thus avoiding crazy numbers of debug threads. Thanks for investigating this. |
|||
►
Sign in to add a comment |
|||
Comment 1 by wfh@chromium.org
, Jul 25