
Issue 783727


Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug




Cache WeakCells for JSFunction and PropertyCells

Project Member Reported by u...@chromium.org, Nov 10 2017

Issue description

Most of the weak cells allocated for JSFunction and PropertyCell objects are duplicates.
Caching them will save around 800KB in Gmail.

Weak cells pointing to JSFunction = 75510, duplicates = 57806
Weak cells pointing to PropertyCell = 53878, duplicates = 36626
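The deduplication idea can be sketched outside the engine using JavaScript's own weak references (a hypothetical model, not V8's internal WeakCell type): keep one cell per distinct target in a weakly-keyed side table, so every duplicate request shares the existing cell instead of allocating a new one.

```javascript
// Illustrative sketch only: a per-target cache so repeated requests for a
// weak reference to the same object share one cell instead of allocating a
// duplicate. getWeakCell and weakCellCache are hypothetical names, and
// WeakRef/WeakMap stand in for V8's internal WeakCell machinery.
const weakCellCache = new WeakMap();

function getWeakCell(target) {
  let cell = weakCellCache.get(target);
  if (cell === undefined) {
    cell = new WeakRef(target);       // allocate once per distinct target
    weakCellCache.set(target, cell);  // WeakMap key doesn't retain target
  }
  return cell;
}

// Every duplicate request now returns the same cell:
const fn = function f() {};
console.log(getWeakCell(fn) === getWeakCell(fn));  // true
```

Note that the cache itself must not keep the target alive, which is why the side table is keyed weakly; the open question in the comments below is where such a cache should live and what the lookup costs.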

 
Owner: marja@chromium.org
Cc: bmeu...@chromium.org

Comment 3 by marja@chromium.org, Nov 21 2017

Update: there was a bug in the calculations: the number of JSFunctions that are not pointed to by any WeakCells wasn't taken into account properly. Thus, extending JSFunction the trivial way consumes more memory than assumed.

-> The first part, caching the WeakCell in JSFunction, doesn't make sense, at least not in the trivial implementation. There might be some other way to still cache WeakCells, but it's an unclear space/time tradeoff (saving space vs. consuming time to find the WeakCell somehow).

The second part, caching WeakCell in PropertyCell, should still make sense.


I was just discussing this with Toon in the morning, because I was surprised that this would pay off. We could probably consider doing something about methods, where we know that they will end up on the descriptor array. On the other hand, it's probably better not to track closures there at all and instead focus on constant field tracking.

Comment 5 by u...@chromium.org, Nov 21 2017

Thanks for finding the accounting bug, Marja.

> The second part, caching WeakCell in PropertyCell, should still make sense.
That would save ~100KB on 64-bit platforms.

Benedikt, would constant field tracking completely remove weak cells pointing to JSFunction?
If so, that would save ~1.2MB.
We'd still need WeakCells for JSFunctions in the CALL_IC until we find a way to treat slots in the FeedbackVector as weak themselves. But we no longer need to point to JSFunctions from DescriptorArrays.

Comment 7 by marja@chromium.org, Nov 22 2017

... the trivial version of caching WeakCell in PropertyCell doesn't make sense either (thanks ulan@ for a debugging session).

The problem is that if we (strongly) put WeakCell in PropertyCell, that ends up retaining more WeakCells than before.

The previous calculations were reasoning about the number of objects at GC time (number of property cells, number of weak cells pointing to property cells, number of duplicates among those) and weren't taking this into account.
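The retention problem can be illustrated with a toy reachability count (a hypothetical object-graph model, not V8's heap): if each PropertyCell strongly holds its cached WeakCell, every live PropertyCell keeps a cell alive, whereas previously only cells still referenced by some user (e.g. optimized code) survived GC.

```javascript
// Toy model: count how many "weak-cell" objects a mark phase would reach.
// Object shapes and names here are illustrative only.
function reachableCells(roots) {
  const seen = new Set();
  const stack = [...roots];
  while (stack.length > 0) {
    const obj = stack.pop();
    if (seen.has(obj)) continue;
    seen.add(obj);
    for (const ref of obj.refs) stack.push(ref);
  }
  return [...seen].filter(o => o.kind === 'weak-cell').length;
}

// 100 property cells; only 10 weak cells still have a live user.
const cells = Array.from({length: 100}, () => ({kind: 'weak-cell', refs: []}));
const propertyCells = cells.map(() => ({kind: 'property-cell', refs: []}));
const users = cells.slice(0, 10).map(c => ({kind: 'code', refs: [c]}));

// Before: cells are reachable only through their users -> 10 survive.
console.log(reachableCells([...propertyCells, ...users]));  // 10

// After: each PropertyCell strongly caches its cell -> all 100 survive.
propertyCells.forEach((p, i) => p.refs.push(cells[i]));
console.log(reachableCells([...propertyCells, ...users]));  // 100
```

In other words, the cache converts garbage WeakCells into live ones, which is exactly the effect the per-object counts at GC time could not show.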

Like in the JSFunction case, it's possible that some other form of caching would help.
I don't think we really need call feedback for anything except context loads, unless we have context specialization. We already typically know the target function from the preceding LoadIC or GlobalLoadIC related to the call. At least this is how we used it in Crankshaft; I suppose you have the same info in TF.

I've long argued for only having CallICs for properties named "Array" since that's the only custom version where we want to track kind transitions; and we probably don't care enough to also track the case where Array is renamed. That would save us lots of feedback vector space and weak cells for those calls; and make the slow path cheaper (don't have to allocate cells, don't need to check whether feedback still matches).
We've been discussing here how to get rid of it for loads. The only case where CallICs help is in the megamorphic case where we don't have maps. We can just have another type of "megamorphic load" state that tracks the constant it has loaded from all maps.

As long as it always loads the same value, the megamorphic load tracks this; and provides this feedback to TF. Once it sees different values loaded, it goes regular megamorphic. This way we save the call IC slot in the feedback vector, save metadata, don't need WeakCells for JSFunctions for loads that are monomorphic and polymorphic; just for the megamorphic case where it's still constant.
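The proposed feedback state could be sketched roughly as follows (the class name, state encoding, and method names are illustrative only, not actual V8 feedback-vector states):

```javascript
// Hypothetical sketch of a "megamorphic load" feedback slot: as long as
// every map loads the same constant value, remember it; once a different
// value is seen, fall back to plain megamorphic.
const UNINITIALIZED = Symbol('uninitialized');
const MEGAMORPHIC = Symbol('megamorphic');

class MegamorphicLoadSlot {
  constructor() { this.state = UNINITIALIZED; }

  recordLoad(value) {
    if (this.state === UNINITIALIZED) {
      this.state = value;         // track the single constant seen so far
    } else if (this.state !== value && this.state !== MEGAMORPHIC) {
      this.state = MEGAMORPHIC;   // a different value: give up tracking
    }
  }

  // Feedback the optimizer could consume: the tracked constant, or null.
  constantOrNull() {
    return this.state === UNINITIALIZED || this.state === MEGAMORPHIC
        ? null : this.state;
  }
}
```

A slot like this only ever holds one extra word of state, which is the claimed saving over a dedicated call IC slot plus per-target WeakCells.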
Cc: yangguo@chromium.org jarin@chromium.org danno@chromium.org mvstan...@chromium.org
We've been actively considering to move in the other direction of collecting even more feedback on calls, like information about the parameters or the receiver for several builtins, and stuff like that.

Also I know from Microsoft people that matching "Array" doesn't work. They used to do the syntactic approach initially for IE9 and had terrible performance cliffs with minified/uglified code.

In addition, we should have hard data on how much we expect to save here compared to other, lower-hanging fruit that doesn't conflict with future performance improvements.
We should have hard data on how much these optimizations help on the real web.
+real node
We should also put the allocation site on the load side, rather than on the call IC. We should just merge current call IC behavior into the LoadIC so we can optimize what we store.

This seems to suggest we need different modes for node and the web. Perhaps we just need something similar to the baseline code generation ID: change the bytecode for functions to track more data once they become hot. Most code doesn't need to track any of this since it never becomes hot, and it's just a waste of space. Especially since we're thinking about limiting how much optimized code we can generate at all.
This would be a huge step backwards in our real world performance story and would encourage people to write Crankshaft^WV8Script again in order to stay on the fast path. I.e. you can no longer trust V8 to get this right

  function foo(f) { return f(1); }

and this is only a trivial example. So the performance advice will then be to write something like

  function foo(f) {
    function callF() { return f(1); }
    return callF();
  }

so that you get the CallIC feedback on the load of f from the context. Even for "property calls" you don't always couple Call and Load; for example, there's the common pattern

  Object(o.foo)(...)

that webpack generates, so it's even outside the control of website authors. There are many more examples like this.

In general we should be careful not to invert our story and make people write code specifically for V8 again. These performance cliffs might be obvious to us, but they are completely non-obvious to developers.

That being said, we might be able to introduce a LoadAndCallProperty bytecode for the somewhat common case of o.foo(...), which combines the LoadIC, call count and target feedback, but there are some difficulties to figure out first, i.e. we need to deopt into the middle of the bytecode then, and we also need to be able to still handle getters properly. Nothing that can't be solved, but requires some work. So if you think that we waste a lot of memory because of LoadIC+CallIC for o.foo(...) then let's try to address that.
Huh? I'm not suggesting to change anything about non-loadIC-based call ICs. Anyway, it's just one of the things we can do and if it turns out it's important for memory we'll investigate. If it's not, we'll leave it alone.
It's great to record these discussions in a bug, thx for doing that.

Could we add an instance type for feedback vector that allows us to selectively treat roots as weak? Based on the IC state, we know what we've got in there.
We've been thinking about reusing failure tags to replace WeakCells altogether for that purpose. Marja will investigate next year as part of the memory work.
Ah, good, sounds awesome. +1.
