Recurring warnings in genl_unbind() under syzkaller on Pixel C
Issue description

I'm seeing tons of the following warnings while fuzzing the Pixel C kernel with syzkaller:

[ 8051.697977] WARNING: CPU: 0 PID: 18847 at /mnt/host/source/src/third_party/kernel/v3.18/net/netlink/genetlink.c:1037 genl_unbind+0x158/0x198()

This warning has been removed in trunk by the following patch:
https://github.com/torvalds/linux/commit/ee1c244219fd652964710a6cc3e4f922e86aa492 (genetlink: synchronize socket closing and family removal)
,
Mar 18 2016
,
Mar 21 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/efc639722012099d21c2b71cb2571555884af207

commit efc639722012099d21c2b71cb2571555884af207
Author: Johannes Berg <johannes.berg@intel.com>
Date: Fri Jan 16 10:37:14 2015

UPSTREAM: genetlink: synchronize socket closing and family removal

In addition to the problem Jeff Layton reported, I looked at the code and reproduced the same warning by subscribing and removing the genl family with a socket still open. This is a fairly tricky race which originates in the fact that generic netlink allows the family to go away while sockets are still open - unlike regular netlink which has a module refcount for every open socket so in general this cannot be triggered.

Trying to resolve this issue by the obvious locking isn't possible as it will result in deadlocks between unregistration and group unbind notification (which incidentally lockdep doesn't find due to the home grown locking in the netlink table.)

To really resolve this, introduce a "closing socket" reference counter (for generic netlink only, as it's the only affected family) in the core netlink code and use that in generic netlink to wait for all the sockets that are being closed at the same time as a generic netlink family is removed.

This fixes the race that when a socket is closed, it will should call the unbind, but if the family is removed at the same time the unbind will not find it, leading to the warning. The real problem though is that in this case the unbind could actually find a new family that is registered to have a multicast group with the same ID, and call its mcast_unbind() leading to confusing.

Also remove the warning since it would still trigger, but is now no longer a problem.

This also moves the code in af_netlink.c to before unreferencing the module to avoid having the same problem in the normal non-genl case.

BUG=chromium:596019
TEST=run syzkaller on KASAN kernel

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ee1c244219fd652964710a6cc3e4f922e86aa492)
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I1100bcdffad45eb30f9be4a36c58dc8744579587
Reviewed-on: https://chromium-review.googlesource.com/333910
Commit-Ready: Alexander Potapenko <glider@chromium.org>
Tested-by: Alexander Potapenko <glider@chromium.org>
Reviewed-by: Grant Grundler <grundler@chromium.org>
Reviewed-by: Nicolas Boichat <drinkcat@chromium.org>

[modify] https://crrev.com/efc639722012099d21c2b71cb2571555884af207/include/linux/genetlink.h
[modify] https://crrev.com/efc639722012099d21c2b71cb2571555884af207/net/netlink/genetlink.c
[modify] https://crrev.com/efc639722012099d21c2b71cb2571555884af207/net/netlink/af_netlink.c
[modify] https://crrev.com/efc639722012099d21c2b71cb2571555884af207/include/net/genetlink.h
[modify] https://crrev.com/efc639722012099d21c2b71cb2571555884af207/net/netlink/af_netlink.h
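
For readers unfamiliar with the "closing socket" counter idea described in the commit message, here is a rough user-space analogue. It is not the kernel patch: it assumes a pthread mutex and condition variable in place of the kernel's atomic counter and wait queue, and every name in it (closing_count, sock_close_begin, sock_close_end, family_remove, run_demo) is hypothetical, chosen only for illustration. The point it tries to show is that removal of the family blocks until every socket that has entered its close path has finished its unbind, so the unbind still finds the family it expects.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these names come from the patch. */
static pthread_mutex_t closing_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t closing_done = PTHREAD_COND_INITIALIZER;
static int closing_count;          /* sockets currently in their close path */
static int family_registered = 1;  /* 1 while the family still exists */
static int unbind_saw_family = -1; /* what the unbind step observed */

/* A socket enters its close path: count it before any unbind work runs. */
void sock_close_begin(void)
{
    pthread_mutex_lock(&closing_lock);
    closing_count++;
    pthread_mutex_unlock(&closing_lock);
}

/* The socket finished unbinding: drop the count and wake any waiter. */
void sock_close_end(void)
{
    pthread_mutex_lock(&closing_lock);
    assert(closing_count > 0);
    if (--closing_count == 0)
        pthread_cond_broadcast(&closing_done);
    pthread_mutex_unlock(&closing_lock);
}

/* Family removal waits until no socket is still in its close path. */
void family_remove(void)
{
    pthread_mutex_lock(&closing_lock);
    while (closing_count > 0)
        pthread_cond_wait(&closing_done, &closing_lock);
    family_registered = 0;
    pthread_mutex_unlock(&closing_lock);
}

/* Closing thread: performs the "unbind" and records whether the family existed. */
static void *closer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&closing_lock);
    unbind_saw_family = family_registered;
    pthread_mutex_unlock(&closing_lock);
    sock_close_end();
    return NULL;
}

/* Runs one close/remove race; returns what the unbind step observed, or -1 on error. */
int run_demo(void)
{
    pthread_t t;

    sock_close_begin(); /* counted before removal starts */
    if (pthread_create(&t, NULL, closer, NULL) != 0)
        return -1;
    family_remove();    /* blocks until closer() calls sock_close_end() */
    pthread_join(t, NULL);
    return unbind_saw_family;
}
```

In this sketch the socket is counted before the removal begins, so family_remove() cannot clear family_registered until the closing thread has done its unbind. Without the counter, the two steps could run in either order, which is the race the commit message describes.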
,
May 27 2016
,
Jan 8 2018
Comment 1 by glider@chromium.org, Mar 18 2016