Direct-leak in v8::internal::wasm::ModuleDecoder::DecodeModule
Issue description

Detailed report: https://cluster-fuzz.appspot.com/testcase?key=6336341702082560

Fuzzer: mbarbella_js_mutation
Job Type: linux_asan_d8_dbg
Platform Id: linux
Crash Type: Direct-leak
Crash Address:
Crash State:
  v8::internal::wasm::ModuleDecoder::DecodeModule
  v8::internal::wasm::DecodeWasmModule
  v8::internal::wasm::CreateModuleObjectFromBytes
Sanitizer: address (ASAN)
Regressed: V8: r41703:41704

Minimized Testcase (0.09 Kb): Download: https://cluster-fuzz.appspot.com/download/AMIfv97tyKc-OTnr38R9by09hBokmHe5bCUu3grbVra1utMfvbWXiA_gFry6Dx8BFwpDHWRjYbBDZ4rdc6ZpJEF-NOSjbZlaHsz1LYfAyiH42ileVINO6uh_HFbPAPLWfaa5210X1cL7sFm2CuKkrRgrrFZN4IXa_CiMMoLCssAjZbMzd9UWPPmmPteh1XWCGMYoVcYI6tRvR1GNwxMRDkmcDA6wkj2-IuRJBGB1TjbOEu2vbA3hPFCY0zV7XGJ16g_dVuV7o80YK_OyQRo4gg1nhETgoiHXgmp3RPP_UzvEhgyLDPQa-ZbQiR8OdIaqP4Y8yYcGz95yMhIYtDZowJ1A_Nyut6HTr_G6iNMjDW-mdtdCuChh7JQIcEJpJv9Zwdmu-1rjoiI5PxMdtgUTJ0ZAFd8PgDw9fw?testcase_id=6336341702082560

(function __f_13() {
  "use asm";
  function __f_12() {
  }
  return { __f_12: __f_12 };
})();

Issue filed automatically. See https://dev.chromium.org/Home/chromium-security/bugs/reproducing-clusterfuzz-bugs for more information.
,
Jan 17 2017
I have trouble repro-ing this in tip-of-tree V8. Specific settings:

is_debug = true
v8_optimized_debug = false
use_goma = true
enable_full_stack_frames_for_profiling = true
symbol_level = 2
is_component_build = true
use_debug_fission = false
v8_enable_slow_dchecks = true
is_asan = true

after which (as per the trace in the direct report):

d8 --random-seed=808670098 --invoke-weak-callbacks --omit-quit --cache=code --validate-asm fuzz-01545.js
,
Jan 23 2017
I can reproduce this locally with tip-of-tree (i.e. with the [1] revision). Here is how I built (similar to the instructions at [2]):

$ export GYP_DEFINES="asan=1 use_goma=1 gomadir=/usr/local/google/home/mstarzinger/Tools/goma"
$ gclient runhooks
$ ./gypfiles/gyp_v8
$ ninja -C out/Debug d8
$ ./out/Debug/d8 --random-seed=808670098 --invoke-weak-callbacks --omit-quit --cache=code --validate-asm ~/Downloads/fuzz-01545.js

Note that I didn't yet get the ASAN symbolization to work, so I just get raw traces of the allocation sites and am unable to determine where the leaks come from. But the issue no longer reproduces if I remove the --validate-asm flag from the command line.

[1] https://crrev.com/53a887e4c682008f604a827c6885dcc6cd64eb53
[2] https://g3doc.corp.google.com/company/teams/v8/build_and_test.md#building-v8-with-ninja-and-asan
,
Jan 23 2017
Note that even if ASAN can't give you stack traces, you can just catch it in the debugger and use the 'bt' command.
,
Jan 23 2017
Thanks, I can repro now. The issue provided a link for how to repro: https://dev.chromium.org/Home/chromium-security/bugs/reproducing-clusterfuzz-bugs. Should we instead provide (or link somewhere prominently) link [2] from comment #3? Any idea why gyp and not gn?
,
Jan 24 2017
Here's the problem: WasmModule is currently allocated on the plain C++ heap, but it is wrapped into a Foreign object stored on the JS heap, and has a finalizer via a weak global handle. The finalizer deletes the native WasmModule.

In this bug's setup, d8 calls wasm Compile twice - see CompileForCachedData (in d8.cc). After the first go, d8 calls Dispose() on the isolate and continues with a second one. The main thing here is that no GC collect was performed on the first isolate, hence the leak.

My first thought was "why isn't the finalizer called" - and then I saw https://bugs.chromium.org/p/v8/issues/detail?id=1428 (referenced from GlobalHandles::TearDown). The comment closing 1428 as WontFix makes sense in the general case: the native resources being managed may not be owned by V8 and may have a longer lifetime. In general, we can reasonably expect the embedder to know whether there are native resources to be freed, and thus to make the decision of calling collect when disposing the isolate. The situation here is special because the native resources are a wasm implementation detail.

We could:
1) do (almost) nothing: fix d8.cc to call collect in that case (trivial fix);
2) have a special type of finalizer that is called upon heap teardown (a restricted case of v8:1428);
3) remember allocated Segments in the AccountingAllocator, and also allocate WasmModule on the Zone. Then, at isolate teardown, as part of the allocator's teardown, we free the memory that's still allocated.

I'd prefer option 2, because it follows expected object destruction semantics (destructors are called). Thoughts?
,
Jan 25 2017
Adding GC folks for option 2
,
Jan 26 2017
Crazy idea: how about allocating the WasmModule on the JS heap?
,
Jan 26 2017
It was there, but Ben had a good reason (which I forgot, unfortunately) for having it native. We may need to revisit.
,
Jan 26 2017
(2) is a finalizer type that Java used to have, which caused all sorts of problems. It's not what is expected, as far as I can see, because the object didn't really die. I'd rather not add another finalizer type, as it complicates things a lot (see global handles and all their callers in the GC). Long story short, let's not rush things here :)
,
Jan 26 2017
Allocating it on the V8 heap makes it problematic to access it in parallel threads, especially when we want to do asynchronous compilation. The data structures we use there are all C++ vectors and things, which would be a pain to port to fixed arrays and such. Unless we embedded the C++ data structures into ByteArrays, but that could still be problematic if the GC moves them. Another option would be to not add another finalizer type directly to the GC, but squirrel them away in the isolate somewhere, and destroy them explicitly at isolate shutdown.
,
Jan 26 2017
I think we should tease out in detail how async compilation may work. A design I discussed with Brad would avoid this type of issue, but we should analyze it offline to make sure that's the case. Apart from async compilation, was there another motivation? I believe the interpreter is another consumer of this data; would it be OK with it being on the JS heap? I remember one more motivation - the desire to move wasm off the managed heap. In the meantime, I do think we need some story for embedders, given that wasm is switched on. Perhaps what Ben is suggesting (his last paragraph): remembering the handles to these wasm-specific wrappers on the isolate, and clearing those at Dispose()?
,
Jan 31 2017
After chatting in MTV, mlippautz mentioned that mstarzinger may have details on what happened in Java with a solution like (2). What are the details? We were wondering if they were applicable in the specific case here (memory allocated by V8, inaccessible to embedders, for a v8 implementation detail) Thanks!
,
Jan 31 2017
What I only realized during the meeting is that whatever is allocated and needs to be finalized is allocated within V8, i.e., it is an implementation detail. I agree with titzer that the easiest way to handle this is a list on Isolate that is cleaned up during Isolate::Deinit. We should avoid finalizers and their use wherever possible, and try hard not to add any more types.
,
Jan 31 2017
Note that we have a bunch of other places that use finalizers internally, e.g., https://cs.chromium.org/search/?q=GlobalHandles::MakeWeak+file:runtime-i18n&type=cs
,
Jan 31 2017
Yes, a list in the isolate is, I think, the way to go. It probably needs to be a linked list of weak Foreign (actually Managed<T>) objects. The instance finalizer for wasm modules already deletes the Managed<WasmModule>, but since we cannot guarantee it runs, we should link up the Managed<WasmModule>s into a list in the isolate and delete them upon Isolate::Deinit.
,
Jan 31 2017
Those uses of WeakGlobalHandles are almost all certainly leaks in the isolate shutdown case. LSAN doesn't find them because we probably don't have cctests for them. Perhaps we should make Managed<T> more of a first-class citizen and use it for these use cases as well? The only worry I have is that if we start to pig out on Managed<T> that list hanging off the isolate could start to grow large.
,
Jan 31 2017
That's not true. All code that shuts down isolates first drops all handles, then calls GC, and then shuts down the isolate (except for this CompileForCachedData one).
,
Jan 31 2017
Yeah, we've had LSAN find quite some issues actually. Most of them were fixed by doing what Jochen said: Dropping the handles before shutting down the Isolate.
,
Jan 31 2017
I want to make sure we're on the same page. Is this accurate: comments 20&21 are countering the idea of making Managed<T> more of a first-class citizen (comment 19). Are we OK with the idea of keeping a list of the specific Foreign objects to be cleared at Isolate::Deinit?
,
Feb 7 2017
When would we need to walk the list? Only at tear down?
,
Feb 7 2017
That's correct, and only if GC hasn't already cleaned up the native memory.
,
Feb 8 2017
A thought: an alternative to this list-based design would be to allocate these native values on a Zone attached to the isolate. The only issue is what we do at GC collect time - ideally, when collecting a Managed object (== this special kind of Foreign object this thread is about), we'd free up its memory. We don't currently do that for Zone objects, afaik; it's all or nothing, and letting them leak until TearDown may be suboptimal - these are large-ish (wasm module metadata) objects. Another negative is that we wouldn't run the destructors for these objects when tearing down the isolate, since we'd just clean up the arena wholesale. That'd be different from what happens when the objects are piecemeal destroyed at GC time. Not sure if this amounts to anything actionable - unless we want to explore this Zone-based design further on some merit I'm not seeing. Thoughts?
,
Feb 8 2017
I just had another idea that doesn't require a weak list, inspired by Mircea's above comment. We could use a level of indirection so that Managed<T> points to a slot in a C++-allocated array that hangs off the isolate. The C++ array would just be raw pointers to the C++ objects underneath. We already have a weak callback for Managed<T> objects which deallocates the C++ object. We can keep that weak callback and make sure it clears the slot in the C++ array. Then we add logic to TearDown to just scan this array and delete any entries that are non-null. That would be nice because it doesn't require maintaining a weak list of all Managed<T> objects, just an isolate-wide C++ array into which the Managed<T> objects have pointers.
,
Feb 8 2017
And also, because there is no weak list, there is no impact to GC latency; in fact we can do this with no changes to the GC whatsoever.
,
Feb 20 2017
Issue 694273 has been merged into this issue.
,
Feb 28 2017
ClusterFuzz has detected this issue as fixed in range 43349:43350.

Detailed report: https://cluster-fuzz.appspot.com/testcase?key=6336341702082560

Fuzzer: mbarbella_js_mutation
Job Type: linux_asan_d8_dbg
Platform Id: linux
Crash Type: Direct-leak
Crash Address:
Crash State:
  v8::internal::wasm::ModuleDecoder::DecodeModule
  v8::internal::wasm::DecodeWasmModule
  v8::internal::wasm::CreateModuleObjectFromBytes
Sanitizer: address (ASAN)
Regressed: V8: 41703:41704
Fixed: V8: 43349:43350

Reproducer Testcase: https://cluster-fuzz.appspot.com/download/AMIfv97tyKc-OTnr38R9by09hBokmHe5bCUu3grbVra1utMfvbWXiA_gFry6Dx8BFwpDHWRjYbBDZ4rdc6ZpJEF-NOSjbZlaHsz1LYfAyiH42ileVINO6uh_HFbPAPLWfaa5210X1cL7sFm2CuKkrRgrrFZN4IXa_CiMMoLCssAjZbMzd9UWPPmmPteh1XWCGMYoVcYI6tRvR1GNwxMRDkmcDA6wkj2-IuRJBGB1TjbOEu2vbA3hPFCY0zV7XGJ16g_dVuV7o80YK_OyQRo4gg1nhETgoiHXgmp3RPP_UzvEhgyLDPQa-ZbQiR8OdIaqP4Y8yYcGz95yMhIYtDZowJ1A_Nyut6HTr_G6iNMjDW-mdtdCuChh7JQIcEJpJv9Zwdmu-1rjoiI5PxMdtgUTJ0ZAFd8PgDw9fw?testcase_id=6336341702082560

See https://dev.chromium.org/Home/chromium-security/bugs/reproducing-clusterfuzz-bugs for more information. If you suspect that the result above is incorrect, try re-doing that job on the test case report page.
Comment 1 by msrchandra@chromium.org, Jan 11 2017
Labels: Test-Predator-Wrong-CLs
Owner: mtrofin@chromium.org
Status: Assigned (was: Untriaged)