From 9b4aec647a92a2464337db10507348aecf0f0fd7 Mon Sep 17 00:00:00 2001
From: Linus Lüssing
Date: Tue, 1 Nov 2016 09:44:44 +0100
Subject: batman-adv: fix rare race conditions on interface removal
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In rare cases during shutdown the following general protection fault can
happen:

  general protection fault: 0000 [#1] SMP
  Modules linked in: batman_adv(O-) [...]
  CPU: 3 PID: 1714 Comm: rmmod Tainted: G O 4.6.0-rc6+ #1
  [...]
  Call Trace:
   [] batadv_hardif_disable_interface+0x29a/0x3a6 [batman_adv]
   [] batadv_softif_destroy_netlink+0x4b/0xa4 [batman_adv]
   [] __rtnl_link_unregister+0x48/0x92
   [] rtnl_link_unregister+0xc1/0xdb
   [] ? bit_waitqueue+0x87/0x87
   [] batadv_exit+0x1a/0xf48 [batman_adv]
   [] SyS_delete_module+0x136/0x1b0
   [] entry_SYSCALL_64_fastpath+0x18/0xa8
   [] ? trace_hardirqs_off_caller+0x37/0xa6
  Code: 89 f7 e8 21 bd 0d e1 4d 85 e4 75 0e 31 f6 48 c7 c7 50 d7 3b a0 e8 50 16 f2 e0 49 8b 9c 24 28 01 00 00 48 85 db 0f 84 b2 00 00 00 <48> 8b 03 4d 85 ed 48 89 45 c8 74 09 4c 39 ab f8 00 00 00 75 1c
  RIP [] batadv_purge_outstanding_packets+0x1c8/0x291 [batman_adv]
   RSP
  ---[ end trace 803b9bdc6a4a952b ]---
  Kernel panic - not syncing: Fatal exception in interrupt
  Kernel Offset: disabled
  ---[ end Kernel panic - not syncing: Fatal exception in interrupt

It does not happen often, but may potentially happen when frequently
shutting down and reinitializing an interface. With some carefully
placed msleep()s/mdelay()s it can be reproduced easily.

The issue is that on interface removal, any still running worker thread
of a forwarding packet will race with the interface purging routine to
free a forwarding packet. Temporarily giving up a spin-lock to be able
to sleep in the purging routine is not safe.
Furthermore, there is a potential general protection fault not just for
the purging side shown above, but also on the worker side: Temporarily
removing a forw_packet from the corresponding forw_{bcast,bat}_list
will make it impossible for the purging routine to catch and cancel it.

 # How this patch tries to fix it:

With this patch we split the queue purging into three steps:

Step 1), we remove forward packets from the queue of an interface and
thereby claim them as our responsibility to free.

Step 2), we are either lucky enough to cancel a pending worker before
it starts to run, or, if it is already running, we wait and let it do
its thing, with two exceptions: through the claiming in step 1) we
prevent workers from a) re-arming themselves and b) freeing packets
which we still hold in the interface purging routine.

Finally, step 3), we are sure that no forwarding packets are pending or
even running anymore on the interface to remove. We can then safely
free the claimed forwarding packets.

Signed-off-by: Linus Lüssing
Signed-off-by: Sven Eckelmann
Signed-off-by: Simon Wunderlich
---
 net/batman-adv/send.h | 5 +++++
 1 file changed, 5 insertions(+)

(limited to 'net/batman-adv/send.h')

diff --git a/net/batman-adv/send.h b/net/batman-adv/send.h
index c58019475025..a94e1e8639ca 100644
--- a/net/batman-adv/send.h
+++ b/net/batman-adv/send.h
@@ -21,6 +21,7 @@
 #include "main.h"
 
 #include
+#include
 #include
 
 #include "packet.h"
@@ -34,6 +35,10 @@ batadv_forw_packet_alloc(struct batadv_hard_iface *if_incoming,
 			 struct batadv_hard_iface *if_outgoing,
 			 atomic_t *queue_left,
 			 struct batadv_priv *bat_priv);
+bool batadv_forw_packet_steal(struct batadv_forw_packet *packet, spinlock_t *l);
+void batadv_forw_packet_ogmv1_queue(struct batadv_priv *bat_priv,
+				    struct batadv_forw_packet *forw_packet,
+				    unsigned long send_time);
 
 int batadv_send_skb_to_orig(struct sk_buff *skb,
 			    struct batadv_orig_node *orig_node,
-- 
cgit v1.2.3-59-g8ed1b
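
Note (not part of the patch): the three-step purge described in the
commit message can be illustrated with a small user-space sketch. This
is not the kernel implementation. All names below (fwd_packet,
fwd_queue, worker_steal, purge_claim, purge_free) are hypothetical, a
C11 atomic_flag stands in for the kernel spin-lock, and a "freed" flag
stands in for actually releasing memory. worker_steal() only plays a
role analogous to the batadv_forw_packet_steal() prototype added in the
diff, under the assumption that a false return means the purging
routine already owns the packet.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct batadv_forw_packet. */
struct fwd_packet {
	struct fwd_packet *next;         /* link in the interface queue */
	struct fwd_packet *cleanup_next; /* link in the purger's private list */
	bool claimed;                    /* set once someone owns the free */
	bool freed;                      /* demo only: records the "free" */
};

/* Hypothetical stand-in for one forw_{bcast,bat}_list plus its lock. */
struct fwd_queue {
	atomic_flag lock;
	struct fwd_packet *head;
};

static void queue_lock(struct fwd_queue *q)
{
	while (atomic_flag_test_and_set_explicit(&q->lock,
						 memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void queue_unlock(struct fwd_queue *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Unlink pkt from the queue if present; caller holds the lock. */
static void queue_unlink(struct fwd_queue *q, struct fwd_packet *pkt)
{
	struct fwd_packet **pp = &q->head;

	while (*pp && *pp != pkt)
		pp = &(*pp)->next;
	if (*pp)
		*pp = pkt->next;
	pkt->next = NULL;
}

/* Worker side: try to take ownership of pkt. A false return means the
 * purger claimed it first, so the worker must neither re-arm nor free. */
static bool worker_steal(struct fwd_queue *q, struct fwd_packet *pkt)
{
	bool won;

	queue_lock(q);
	won = !pkt->claimed;
	if (won) {
		pkt->claimed = true;
		queue_unlink(q, pkt);
	}
	queue_unlock(q);
	return won;
}

/* Step 1: move every queued packet onto a private cleanup list. */
static struct fwd_packet *purge_claim(struct fwd_queue *q)
{
	struct fwd_packet *cleanup = NULL, *pkt;

	queue_lock(q);
	while ((pkt = q->head) != NULL) {
		q->head = pkt->next;
		pkt->next = NULL;
		pkt->claimed = true;
		pkt->cleanup_next = cleanup;
		cleanup = pkt;
	}
	queue_unlock(q);
	return cleanup;
}

/* Step 3: release the claimed packets and return how many there were.
 * Step 2 (cancel or wait for every worker) must be done before this. */
static int purge_free(struct fwd_packet *cleanup)
{
	int n = 0;

	while (cleanup) {
		struct fwd_packet *next = cleanup->cleanup_next;

		cleanup->freed = true; /* real code would free the packet */
		cleanup = next;
		n++;
	}
	return n;
}
```

In this sketch the purging side never drops the lock while packets are
still reachable through the shared queue: it first calls purge_claim(),
then waits for workers outside the lock, then calls purge_free(). A
worker that loses the race sees worker_steal() return false and simply
returns without touching the packet.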