The Blink GC infrastructure requires each managed object to provide a trace() method that visits every reference the object holds into the Blink GC heap, by calling trace() on each of them via the incoming |visitor| argument.
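For readers unfamiliar with the pattern, here is a minimal sketch of what such a trace() method looks like. The classes below (GarbageCollected, Visitor, Node) are simplified stand-ins written for this illustration, not the real Blink declarations, and the recursive marking is an assumption made for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical, simplified stand-ins; not the real Blink classes.
class Visitor;

class GarbageCollected {
 public:
  virtual ~GarbageCollected() = default;
  // Each managed object reports its outgoing heap references here.
  virtual void trace(Visitor* visitor) = 0;
};

class Visitor {
 public:
  // Marks |object| and, on the first visit only, traces its references.
  // Recursion keeps the sketch short; a production marker would more
  // likely push work onto an explicit marking stack.
  void trace(GarbageCollected* object) {
    if (!object)
      return;
    if (!marked_.insert(object).second)
      return;  // Already marked, so cycles terminate here.
    object->trace(this);
  }

  bool isMarked(const GarbageCollected* object) const {
    return marked_.count(object) != 0;
  }
  std::size_t markedCount() const { return marked_.size(); }

 private:
  std::unordered_set<const GarbageCollected*> marked_;
};

class Node : public GarbageCollected {
 public:
  // Visit every heap reference this object keeps.
  void trace(Visitor* visitor) override {
    assert(visitor);
    visitor->trace(next_);
    visitor->trace(child_);
  }

  Node* next_ = nullptr;
  Node* child_ = nullptr;
};
```

Calling `visitor.trace(&root)` on a root Node marks everything reachable from it. Objects that are never reached stay unmarked.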
Because program execution is paused during the initial phases of a GC, the marking phase, which visits all live objects by invoking their trace() methods, needs to have minimal overhead. To that end, we've put a fair bit of work into optimizing both the GC marking phase and the code emitted for the trace() methods.
One of those optimizations is the generation of an "inlined marking" specialization of each trace() method (InlinedGlobalMarkingVisitor). By keeping the visitor object stack allocated, it saves an instruction per trace() call it makes (i.e., the |visitor| argument doesn't have to be reloaded into a register before each call). The cost of code specialization is, as always, more code: it doubles the size of the trace-related code per object, as the standard trace() method is also generated.
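A rough sketch of how such a specialization can be structured: the trace body is written once as a template and instantiated for both a pointer-based visitor and a small by-value visitor, which is why the trace code is emitted twice per class. All names below (MarkingState, InlinedVisitor, traceImpl, Object) are hypothetical and chosen for this illustration. The sketch does not claim to reproduce Blink's actual macros or the exact instruction savings.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical shared mark set, used by both visitor flavours.
struct MarkingState {
  std::unordered_set<const void*> marked;
  // Returns true only the first time |object| is seen.
  bool markIfNeeded(const void* object) { return marked.insert(object).second; }
};

class Object;

// Standard visitor, handed to trace() by pointer.
class Visitor {
 public:
  explicit Visitor(MarkingState* state) : state_(state) { assert(state_); }
  void mark(Object* object);

 private:
  MarkingState* state_;
};

// Small visitor handed to trace() by value, so it can stay on the stack
// or in registers across calls.
class InlinedVisitor {
 public:
  explicit InlinedVisitor(MarkingState* state) : state_(state) {
    assert(state_);
  }
  void mark(Object* object);
  // Lets the shared template body use "visitor->mark(...)" for both types.
  InlinedVisitor* operator->() { return this; }

 private:
  MarkingState* state_;
};

class Object {
 public:
  // Two entry points, one per visitor type; both instantiate traceImpl,
  // which is the source of the doubled code size.
  void trace(Visitor* visitor) { traceImpl(visitor); }
  void trace(InlinedVisitor visitor) { traceImpl(visitor); }

  Object* left = nullptr;
  Object* right = nullptr;

 private:
  template <typename VisitorDispatcher>
  void traceImpl(VisitorDispatcher visitor) {
    visitor->mark(left);
    visitor->mark(right);
  }
};

inline void Visitor::mark(Object* object) {
  if (object && state_->markIfNeeded(object))
    object->trace(this);
}

inline void InlinedVisitor::mark(Object* object) {
  if (object && state_->markIfNeeded(object))
    object->trace(*this);  // Copies the visitor and picks the by-value overload.
}
```

Both visitors mark the same set of objects. The only difference is how the visitor reaches each trace() call, which is the trade-off the two questions below are about.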
Two issues worth reconsidering:
- Is InlinedGlobalMarkingVisitor still valuable enough to keep using? (We now have enough data built up on how much overhead tracing adds.)
- Can we optimize the code generated for the standard trace() method to make InlinedGlobalMarkingVisitor redundant?
Comment 1 by sigbjo...@opera.com, Jan 22 2017
Status: Started (was: Untriaged)