New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.
Starred by 10 users
Status: Started
Owner:
Cc:
Components:
NextAction: ----
OS: ----
Pri: 3
Type: Patch



Sign in to add a comment
Patch for reducing vtable count in sigslot
Reported by andrey.s...@gmail.com, Aug 26 2013 Back to list
We are using webrtc/libjingle in our software. Due to some other dependencies and our own code requirements we have to enable RTTI when compiling webrtc/libjingle, but doing so increases the compiled binary size considerably, because the compiler generates more symbols. And the binary size is not the most critical problem - the linker starts to behave weirdly, it starts to flag errors about missing symbols which are actually present in the libraries and it sometimes produces incorrectly linked binaries. For example, in our case it would generate invalid virtual tables for sigslots (i.e. when calling the virtual has_slots_interface::signal_connect method the deleting destructor of the signal is called instead, which results in a crash).

To mitigate the problem I've rewritten sigslot implementation to avoid virtual tables. I've also simplified the design a bit and reduced the number of classes. All this resulted in significant reduction of number of symbols (and binary size). I'm not seeing linking problems with the new version, and the code seems to work as intended. The new version is interface-compatible with the original one, so no changes are required in the rest of the code.

I've attached the patch against revision 4476 of branches/3.37. Since this is a rewrite, the patch is heavy. For convenience I've also attached the final (patched) version of sigslot.h. Feel free to ask any questions about the implementation.

We are running on various Linux distributions (Ubuntu, Debian). We use gcc 4.4, 4.6 and 4.7 and GNU ld. We tried switching to gold but that didn't help.

 
Project Member Comment 1 by braveyao@webrtc.org, Aug 27 2013
Cc: juberti@webrtc.org wu@webrtc.org braveyao@webrtc.org
Owner: pthatcher@google.com
@peter, PTAL!
I've updated the patch and the resulting (patched) header. The patch is against branches/3.41. The updated patch fixes a few problems we discovered during testing, mostly incorrect behavior in signal/slot disconnection at destruction.

Any updates on this issue?
Project Member Comment 3 by juberti@webrtc.org, Dec 16 2014
Labels: Area-PeerConnection
Project Member Comment 4 by juberti@webrtc.org, Dec 17 2014
Labels: -Type-Bug Type-Patch
Project Member Comment 5 by pthatcher@webrtc.org, Jan 6 2015
Labels: EngTriaged IceBox
Status: Assigned
Project Member Comment 6 by pthatcher@webrtc.org, Nov 8 2016
Labels: Pri-3
I'm attaching an updated version of the patch and the resulting files that correspond to the 55 branch.
02_devirtualize_sigslot.patch
93.4 KB Download
sigslot.h
36.2 KB View Download
sigslot.cc
493 bytes View Download
Project Member Comment 8 by deadbeef@webrtc.org, Nov 16 2016
This is a lot to review so I haven't gotten through it all yet. But it looks like the general approach is to replace any call to a virtual method with a call to an opaque function pointer that was stored by a templated constructor.

However... isn't that, in essence, just re-implementing vtables? For example, "has_slots_interface" was previously an abstract class with three virtual methods (if you don't count the destructor, which probably wasn't necessary). Now it's a non-abstract class that stores three function pointers. Effectively, this moves the cost of the vtables from the compiled binary size to memory used at runtime. Is my understanding correct?
Yes, that's correct. The patch basically does the following:

1. Removes virtual specifier from the methods in the threading base classes (single_threaded, multi_threaded_global, multi_threaded_local).
2. Makes methods of POSIX threading base classes inline.
3. Replaces multiple _connection_baseN template classes with virtual functions with one _opaque_connection class with 3 data members (a total size of 4 pointers). Not counting vtable size, it should be 3 pointers larger than _connection_baseN and have the same size as the _connectionN classes that are actually allocated (see connect() methods).
4. Replaces virtual functions in has_slots_interface with 3 function pointers. This makes the class 2 pointers larger.
5. Replaces virtual functions in _signal_base_interface with 2 function pointers. This makes the class 1 pointer larger.

So the net size increase in user classes comes from has_slots<> (+2 pointers) and every signal (each +1 pointer). Connections have the same size.

As a result of this patch, no virtual tables are used in sigslot implementation, while previously there were a vtable per each involved class and template specialization (that includes every combination of signal call arguments multiplied by the number of threading policies used). It also does not impose a vtable on every user's class that derives from has_slots<>. That significantly reduces the number of symbols to link (each vtable is an additional symbol), which I think is what fixed the original problem for us. It also reduces the binary size. Although I didn't do performance comparison, the new implementation should arguably be faster since it is more cache-friendly.

Note that any decent implementation of std::function does not use vtables to implement type erasure for the same reasons that motivated my patch.

PS: Since webrtc code base has moved on to C++11, the implementation could be simplified further by using variadic templates. Let me know if you want that implemented (if the patch is otherwise ok).

Oh, I forgot to mention that the connections_list in signals used to store a pointer to the separately allocated connection. The patched code stores the connection in the list by value, which makes one memory allocation less per connection.

Attached the C++11 version of the patch. Also replaced union-based casts with memcpy, which is more conforming to the C++ standard.
sigslot.h
25.8 KB View Download
sigslot.cc
493 bytes View Download
02_devirtualize_sigslot.patch
82.9 KB Download
Comment 12 by juberti@google.com, Nov 21 2016
Andrey, can you share some stats that would help us understand the overall benefit of your patch? (i.e. code size with and without RTTI)
Here are some data for our build of webrtc on Kubuntu 16.10, x86_64, gcc 6.2:

1. Total number of static libraries built: 77 in both cases. Total size without the patch: 401,685,948 bytes, with the patch: 343,280,258 bytes.

2. Approximate total number of symbols in all static libraries, counted with this script:

```
#!/bin/bash

sum=0

for f in *.a
do
    sym_count=`nm $f | wc -l`
    sum=$(($sum + $sym_count))
done

echo $sum
```

Without the patch: 72507, with the patch: 65104

3. Some select libraries that use sigslots internally (without the patch => with the patch):

librtc_p2p.a: size: 51,922,826 bytes => 33,459,092 bytes, approx. symbols: 7462 => 4929
librtc_pc.a: size: 21,875,536 bytes => 16,101,550 bytes, approx. symbols: 3201 => 2436
librtc_xmpp.a: size: 28,039,400 bytes => 17,696,410 bytes, approx. symbols: 4485 => 3152
librtc_media.a: size: 27,174,016 bytes => 24,184,738 bytes, approx. symbols: 3667 => 3406

Approximate number of symbols calculated with `nm <libname.a> | wc -l`.

I forgot to mention that the stats are for the 56 branch.
Project Member Comment 15 by deadbeef@webrtc.org, Nov 21 2016
This did reduce the size of libjingle_peerconnection.a by 13% for me. Though I think we need to measure how it impacts runtime performance; is it faster or slower, and how much more memory does it use in an average WebRTC call? We can probably use the Chromium performance test infrastructure to check.
I don't have the infrastructure to run performance tests on webrtc, so it will be difficult for me to benchmark it. I could write a test that uses just sigslot alone and bench it, but it'll take some time.
Project Member Comment 17 by deadbeef@webrtc.org, Nov 21 2016
I'll take a look when I have some time, don't worry.
Comment 18 by juberti@google.com, Nov 22 2016
Re #15, looks very promising. Agree we need to run it in a perf test but I would be surprised if we saw a significant perf difference.

Taylor, can you determine the overall effect on AppRTCMobile.apk?
ping Taylor on APK size impact
Project Member Comment 20 by deadbeef@chromium.org, Dec 7 2016
About a 20KB reduction (3.057MB to 3.036MB) for a default configured release build.
OK, that's good to see. Let's proceed, when time permits.
Regarding `virtual ~has_slots_interface()` destructor, it doesn't need to be virtual not only because the new implementation doesn't need it to be, but also because none of webrtc code relies on it being virtual (at least, none did when I last checked in 56). If there is such code now, I would suggest fixing that code instead and leaving `~has_slots_interface()` non-virtual.

Having the destructor virtual still requires a vtable to be generated for `has_slots_interface` and all derived classes.

Project Member Comment 24 by deadbeef@webrtc.org, Feb 20 2017
Owner: deadbeef@webrtc.org
Status: Started
The code doesn't quite need it to be virtual, but any code that looks like this:

class Foo : public sigslot::has_slots<> {
 public:
  ~Foo() override;
};

... Will stop compiling if the destructor stops being virtual. Fixing this code in the webrtc repository is easy enough, but there's also chromium and other downstream applications that subclass from sigslot::has_slots<> directly.

We can fix this in a phase 2 of the CL, but I wanted to first try landing the main chunk of changes.
Sign in to add a comment