New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 680065 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: Feb 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 2
Type: Bug



Sign in to add a comment

Direct-leak in v8::internal::wasm::ModuleDecoder::DecodeModule

Project Member Reported by ClusterFuzz, Jan 11 2017

Issue description

Detailed report: https://cluster-fuzz.appspot.com/testcase?key=6336341702082560

Fuzzer: mbarbella_js_mutation
Job Type: linux_asan_d8_dbg
Platform Id: linux

Crash Type: Direct-leak
Crash Address: 
Crash State:
  v8::internal::wasm::ModuleDecoder::DecodeModule
  v8::internal::wasm::DecodeWasmModule
  v8::internal::wasm::CreateModuleObjectFromBytes
  
Sanitizer: address (ASAN)

Regressed: V8: r41703:41704

Minimized Testcase (0.09 Kb):
Download: https://cluster-fuzz.appspot.com/download/AMIfv97tyKc-OTnr38R9by09hBokmHe5bCUu3grbVra1utMfvbWXiA_gFry6Dx8BFwpDHWRjYbBDZ4rdc6ZpJEF-NOSjbZlaHsz1LYfAyiH42ileVINO6uh_HFbPAPLWfaa5210X1cL7sFm2CuKkrRgrrFZN4IXa_CiMMoLCssAjZbMzd9UWPPmmPteh1XWCGMYoVcYI6tRvR1GNwxMRDkmcDA6wkj2-IuRJBGB1TjbOEu2vbA3hPFCY0zV7XGJ16g_dVuV7o80YK_OyQRo4gg1nhETgoiHXgmp3RPP_UzvEhgyLDPQa-ZbQiR8OdIaqP4Y8yYcGz95yMhIYtDZowJ1A_Nyut6HTr_G6iNMjDW-mdtdCuChh7JQIcEJpJv9Zwdmu-1rjoiI5PxMdtgUTJ0ZAFd8PgDw9fw?testcase_id=6336341702082560
(function __f_13() {
  "use asm";
  function __f_12() { }
  return { __f_12: __f_12 };
})();


Issue filed automatically.

See https://dev.chromium.org/Home/chromium-security/bugs/reproducing-clusterfuzz-bugs for more information.
 
Cc: msrchandra@chromium.org
Labels: Test-Predator-Wrong-CLs
Owner: mtrofin@chromium.org
Status: Assigned (was: Untriaged)
Find it did not provide any possible suspects.
Assigning to the concern owner from CL --
https://chromium.googlesource.com/v8/v8/+log/37253381e2bfac3d4acf095191947bc42d7ae6b9..77b50a8e126186488992b14b0760ed043a9d7631?pretty=fuller

@mtrofin -- Could you please look into the issue, kindly re-assign if this is not related to your changes.
Thank You.
Cc: -msrchandra@chromium.org mtrofin@chromium.org
Owner: msrchandra@chromium.org
I have trouble repro-ing this in tip of tree v8. Specific settings:

is_debug = true
v8_optimized_debug = false
use_goma = true
enable_full_stack_frames_for_profiling = true
symbol_level = 2
is_component_build=true
use_debug_fission=false
v8_enable_slow_dchecks = true
is_asan = true

after which (As per trace in the direct report):

d8 --random-seed=808670098 --invoke-weak-callbacks --omit-quit --cache=code --validate-asm fuzz-01545.js

Cc: -mtrofin@chromium.org titzer@chromium.org mstarzinger@chromium.org
Owner: mtrofin@chromium.org
I can reproduce this locally with tip-of-tree (i.e. with the [1] revision). Here is how I built (similar to instructions at [2]) ...

$ export GYP_DEFINES="asan=1 use_goma=1 gomadir=/usr/local/google/home/mstarzinger/Tools/goma"
$ gclient runhooks
$ ./gypfiles/gyp_v8
$ ninja -C out/Debug d8
$ ./out/Debug/d8 --random-seed=808670098 --invoke-weak-callbacks --omit-quit --cache=code --validate-asm ~/Downloads/fuzz-01545.js

Note that I didn't yet get the ASAN symbolization to work, so I just get raw traces of the allocation sites and am unable to determine where the leaks come from. But the issue no longer reproduces if I remove the --validate-asm flag from the command line.

[1] https://crrev.com/53a887e4c682008f604a827c6885dcc6cd64eb53
[2] https://g3doc.corp.google.com/company/teams/v8/build_and_test.md#building-v8-with-ninja-and-asan

Comment 4 by titzer@chromium.org, Jan 23 2017

Note that even if ASAN can't give you stack traces, you can just catch it in the debugger and use the 'bt' command.
Thanks, I can repro now.

The issue provided a link for how-to repro, https://dev.chromium.org/Home/chromium-security/bugs/reproducing-clusterfuzz-bugs. Should we provide instead (or link somewhere prominently) the link [2] from comment#3?

Any idea why gyp and not gn?

Comment 6 Deleted

Here's the problem:


WasmModule is currently allocated on the plain heap, but it is wrapped into a Foreign object and stored on the JS Heap, and has a finalizer via a weak global handle. The finalizer deletes the native WasmModule.


In this bug's setup, d8 calls wasm Compile twice - see CompileForCachedData (in d8.cc). After the first go, d8 calls Dispose() on the isolate, and continues with a second one. Main thing here is, no GC Collect was performed on the first isolate, hence leak.

My first thought was "why isn't the finalizer called" - and then saw https://bugs.chromium.org/p/v8/issues/detail?id=1428 (Referenced from GlobalHandles::TearDown)

The comment closing 1428 as WontFix makes sense in the general case: the native resources being managed may not be owned by V8 and may have a longer lifetime. In general, we can reasonably expect the embedder to know if there are native resources to be freed, and, thus, can make the decision of calling collect when disposing the isolate. 

The situation here is special because the native resources are a wasm implementation detail.

We could:
1) do (almost) nothing. Fix d8.cc to call Collect in that  (trivial fix)

2) have a special type of finalizers that are called upon heap teardown (a restricted case for v8:1428)

3) remember allocated Segments in the AccountingAllocator, and also allocate WasmModule on the Zone. Then, at isolate teardown, as part of the allocator's teardown, we free the memory that's still allocated.

I'd prefer option 2, because it follows expected object destruction semantics (destructors are called).

Thoughts?
Cc: jochen@chromium.org
Cc: hpayer@chromium.org
Adding GC folks for option 2
Cc: mlippautz@chromium.org u...@chromium.org
Crazy idea: how about allocating the WasmModule on the JS heap?
It was there, but Ben had a good reason (which I forgot, unfortunately) for having it native. We may need to revisit.
(2) is a finalizer that Java used to have which caused all sorts of problems (?) It's not what is expected as far as I see, because the object didn't really die.

I'd rather not add another finalizer type as it complicates things a lot (see global handles and all their callers in the GC). Long story short, let's not rush things here :)
Allocating it on the V8 heap makes it problematic to access it in parallel threads, especially when we want to do asynchronous compilation. The data structures we use there are all C++ vectors and things, which would be a pain to port to fixed arrays and such. Unless we embedded the C++ data structures into ByteArrays, but that could still be problematic if the GC moves them.

Another option would be to not add another finalizer type directly to the GC, but squirrel them away in the isolate somewhere, and destroy them explicitly at isolate shutdown.
I think we should tease in detail how async compilation may work. A design I discussed with Brad would avoid this type of issue, but we should analyze it offline to make sure that's the case.

Apart from async compilation, was there another motivation? I believe the interpreter is another consumer of this data; would it be OK with it being on the JS heap?

I remember one more motivation - the desire to move wasm off the managed heap. 

In the meantime, I do think we need some story for embedders, given wasm is switched on. Perhaps what Ben is suggesting (his last para) - remembering the handles to these wasm-specific wrappers on the isolate, and clearing those at Dispose()?
After chatting in MTV, mlippautz mentioned that mstarzinger may have details on what happened in Java with a solution like (2). What are the details? We were wondering if they were applicable in the specific case here (memory allocated by V8, inaccessible to embedders, for a v8 implementation detail)


Thanks!
What I only realized throughout the meeting is that whatever is allocated and needs to be finalized is allocated within V8, i.e., is an implementation detail. I agree with titzer that the easiest way to handle this is a list on Isolate that is cleaned up during Isolate::Deinit. 

We should avoid finalizers and their use wherever possible, and try hard not to add any more types.
note that we have a bunch of other places that use finalizers internally, e.g., https://cs.chromium.org/search/?q=GlobalHandles::MakeWeak+file:runtime-i18n&type=cs
Yes, a list in the isolate I think is the way to go. It probably needs to be a linked list of Foreign objects (actually Managed<T>) objects that are weak. The instance finalizer for WASM modules already delete's the Managed<WasmModule>, but since we cannot guarantee it to run, we should link up the Managed<WasmModule>'s into a list in the isolate and delete them upon Isolate::DeInit.
Those uses of WeakGlobalHandles are almost all certainly leaks in the isolate shutdown case. LSAN doesn't find them because we probably don't have cctests for them.

Perhaps we should make Managed<T> more of a first-class citizen and use it for these use cases as well? The only worry I have is that if we start to pig out on Managed<T> that list hanging off the isolate could start to grow large.
that's not true. All code that shuts down isolates first drops all handles and then calls GC and then shuts down the isolate (except for this CompileForCodeCache one)
Yeah, we've had LSAN find quite some issues actually.

Most of them were fixed by doing what Jochen said: Dropping the handles before shutting down the Isolate.
I want to make sure we're on the same page. Is this accurate: comments 20&21 are countering the idea of making Managed<T> more of a first-class citizen (comment 19). Are we OK with the idea of keeping a list of the specific Foreign objects to be cleared at Isolate::Deinit?
When would we need to walk the list? Only at tear down?
That's  correct, and only if GC hasn't already cleaned up the native memory.
A thought: an alternative to this list - based design would be to allocate these native values on a Zone attached to the isolate. The only issue is, what do we do at GC Collect time - ideally, when collecting a Managed object (==this special kind of Foreign object this thread is about), we'd free up its memory. We don't currently do that for Zone objects, afaik, it's all or nothing; and letting them leak until TearDown may be suboptimal - these are large-ish (wasm module metadata) objects.

Another negative is that we wouldn't run the destructor for these objects when tearing down the Isolate, since we'd just clean up the arena whole-sale. That'd be different from what'd happen when the objects are piecemeal destroyed at GC time.

Not sure if this amounts to anything actionable - unless we wanted to explore further this Zone-based design on some merit I'm not seeing.

Thoughts?
I just had another idea that doesn't require a weak list, inspired by Mircea's above comment.

We could use a level of indirection so that Managed<T> points to a slot in a C++-allocated array that hangs off the isolate. The C++ array would just be raw pointers to the C++ objects underneath. We already have a weak callback for Managed<T> objects which deallocates the C++ object. We can keep that weak callback and make sure it clears the slot in the C++ array. Then we add logic to TearDown to just scan this array and delete any entries that are non-null.

That would be nice because it doesn't require maintaining a weak list of all Managed<T> objects, just an isolate-wide C++ array into which the Managed<T> objects have pointers.
And also, because there is no weak list, there is no impact to GC latency; in fact we can do this with no changes to the GC whatsoever.
 Issue 694273  has been merged into this issue.
Status: Fixed (was: Assigned)
https://codereview.chromium.org/2676513008/ fixes this
Privacy.policy@email.com
Project Member

Comment 31 by ClusterFuzz, Feb 28 2017

ClusterFuzz has detected this issue as fixed in range 43349:43350.

Detailed report: https://cluster-fuzz.appspot.com/testcase?key=6336341702082560

Fuzzer: mbarbella_js_mutation
Job Type: linux_asan_d8_dbg
Platform Id: linux

Crash Type: Direct-leak
Crash Address: 
Crash State:
  v8::internal::wasm::ModuleDecoder::DecodeModule
  v8::internal::wasm::DecodeWasmModule
  v8::internal::wasm::CreateModuleObjectFromBytes
  
Sanitizer: address (ASAN)

Regressed: V8: 41703:41704
Fixed: V8: 43349:43350

Reproducer Testcase: https://cluster-fuzz.appspot.com/download/AMIfv97tyKc-OTnr38R9by09hBokmHe5bCUu3grbVra1utMfvbWXiA_gFry6Dx8BFwpDHWRjYbBDZ4rdc6ZpJEF-NOSjbZlaHsz1LYfAyiH42ileVINO6uh_HFbPAPLWfaa5210X1cL7sFm2CuKkrRgrrFZN4IXa_CiMMoLCssAjZbMzd9UWPPmmPteh1XWCGMYoVcYI6tRvR1GNwxMRDkmcDA6wkj2-IuRJBGB1TjbOEu2vbA3hPFCY0zV7XGJ16g_dVuV7o80YK_OyQRo4gg1nhETgoiHXgmp3RPP_UzvEhgyLDPQa-ZbQiR8OdIaqP4Y8yYcGz95yMhIYtDZowJ1A_Nyut6HTr_G6iNMjDW-mdtdCuChh7JQIcEJpJv9Zwdmu-1rjoiI5PxMdtgUTJ0ZAFd8PgDw9fw?testcase_id=6336341702082560


See https://dev.chromium.org/Home/chromium-security/bugs/reproducing-clusterfuzz-bugs for more information.

If you suspect that the result above is incorrect, try re-doing that job on the test case report page.

Sign in to add a comment