datapath: inet: frag: Always orphan skbs inside ip_defrag().
authorJoe Stringer <joe@ovn.org>
Fri, 29 Jan 2016 19:01:56 +0000 (11:01 -0800)
committerJoe Stringer <joe@ovn.org>
Fri, 29 Jan 2016 22:40:44 +0000 (14:40 -0800)
When the linux stack is an endpoint connected to OVS which is performing
IP fragmentation via conntrack actions, it's possible to hit a kernel
BUG. The following upstream commit fixes the issue inside ip_defrag().
For the backport, we provide this inside ip_defrag() for kernels that we
currently backport that function, and also provide just the bugfix for
newer kernels, so we can continue to use upstream functionality as much
as possible.

Upstream commit:
    Later parts of the stack (including fragmentation) expect that there is
    never a socket attached to frag in a frag_list, however this invariant
    was not enforced on all defrag paths. This could lead to the
    BUG_ON(skb->sk) during ip_do_fragment(), as per the call stack at the
    end of this commit message.

    While the call could be added to openvswitch to fix this particular
    error, the head and tail of the frags list are already orphaned
    indirectly inside ip_defrag(), so it seems like the remaining fragments
    should all be orphaned in all circumstances.

    kernel BUG at net/ipv4/ip_output.c:586!
    [...]
    Call Trace:
     <IRQ>
     [<ffffffffa0205270>] ? do_output.isra.29+0x1b0/0x1b0 [openvswitch]
     [<ffffffffa02167a7>] ovs_fragment+0xcc/0x214 [openvswitch]
     [<ffffffff81667830>] ? dst_discard_out+0x20/0x20
     [<ffffffff81667810>] ? dst_ifdown+0x80/0x80
     [<ffffffffa0212072>] ? find_bucket.isra.2+0x62/0x70 [openvswitch]
     [<ffffffff810e0ba5>] ? mod_timer_pending+0x65/0x210
     [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
     [<ffffffffa03205a2>] ? nf_conntrack_in+0x252/0x500 [nf_conntrack]
     [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
     [<ffffffffa02051a3>] do_output.isra.29+0xe3/0x1b0 [openvswitch]
     [<ffffffffa0206411>] do_execute_actions+0xe11/0x11f0 [openvswitch]
     [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
     [<ffffffffa0206822>] ovs_execute_actions+0x32/0xd0 [openvswitch]
     [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch]
     [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
     [<ffffffffa02068a2>] ovs_execute_actions+0xb2/0xd0 [openvswitch]
     [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch]
     [<ffffffffa0215019>] ? ovs_ct_get_labels+0x49/0x80 [openvswitch]
     [<ffffffffa0213a1d>] ovs_vport_receive+0x5d/0xa0 [openvswitch]
     [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
     [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
     [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
     [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch]
     [<ffffffffa02148fc>] internal_dev_xmit+0x6c/0x140 [openvswitch]
     [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch]
     [<ffffffff81660299>] dev_hard_start_xmit+0x2b9/0x5e0
     [<ffffffff8165fc21>] ? netif_skb_features+0xd1/0x1f0
     [<ffffffff81660f20>] __dev_queue_xmit+0x800/0x930
     [<ffffffff81660770>] ? __dev_queue_xmit+0x50/0x930
     [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90
     [<ffffffff81669876>] ? neigh_resolve_output+0x106/0x220
     [<ffffffff81661060>] dev_queue_xmit+0x10/0x20
     [<ffffffff816698e8>] neigh_resolve_output+0x178/0x220
     [<ffffffff816a8e6f>] ? ip_finish_output2+0x1ff/0x590
     [<ffffffff816a8e6f>] ip_finish_output2+0x1ff/0x590
     [<ffffffff816a8cee>] ? ip_finish_output2+0x7e/0x590
     [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0
     [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0
     [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80
     [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340
     [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190
     [<ffffffff816ab4c0>] ip_output+0x70/0x110
     [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80
     [<ffffffff816aa9f9>] ip_local_out+0x39/0x70
     [<ffffffff816abf89>] ip_send_skb+0x19/0x40
     [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40
     [<ffffffff816df21a>] icmp_push_reply+0xea/0x120
     [<ffffffff816df93d>] icmp_reply.constprop.23+0x1ed/0x230
     [<ffffffff816df9ce>] icmp_echo.part.21+0x4e/0x50
     [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
     [<ffffffff810d5f9e>] ? rcu_read_lock_held+0x5e/0x70
     [<ffffffff816dfa06>] icmp_echo+0x36/0x70
     [<ffffffff816e0d11>] icmp_rcv+0x271/0x450
     [<ffffffff816a4ca7>] ip_local_deliver_finish+0x127/0x3a0
     [<ffffffff816a4bc1>] ? ip_local_deliver_finish+0x41/0x3a0
     [<ffffffff816a5160>] ip_local_deliver+0x60/0xd0
     [<ffffffff816a4b80>] ? ip_rcv_finish+0x560/0x560
     [<ffffffff816a46fd>] ip_rcv_finish+0xdd/0x560
     [<ffffffff816a5453>] ip_rcv+0x283/0x3e0
     [<ffffffff810b6302>] ? match_held_lock+0x192/0x200
     [<ffffffff816a4620>] ? inet_del_offload+0x40/0x40
     [<ffffffff8165d062>] __netif_receive_skb_core+0x392/0xae0
     [<ffffffff8165e68e>] ? process_backlog+0x8e/0x230
     [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90
     [<ffffffff8165d7c8>] __netif_receive_skb+0x18/0x60
     [<ffffffff8165e678>] process_backlog+0x78/0x230
     [<ffffffff8165e6dd>] ? process_backlog+0xdd/0x230
     [<ffffffff8165e355>] net_rx_action+0x155/0x400
     [<ffffffff8106b48c>] __do_softirq+0xcc/0x420
     [<ffffffff816a8e87>] ? ip_finish_output2+0x217/0x590
     [<ffffffff8178e78c>] do_softirq_own_stack+0x1c/0x30
     <EOI>
     [<ffffffff8106b88e>] do_softirq+0x4e/0x60
     [<ffffffff8106b948>] __local_bh_enable_ip+0xa8/0xb0
     [<ffffffff816a8eb0>] ip_finish_output2+0x240/0x590
     [<ffffffff816a9a31>] ? ip_do_fragment+0x831/0x8a0
     [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0
     [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0
     [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80
     [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340
     [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190
     [<ffffffff816ab4c0>] ip_output+0x70/0x110
     [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80
     [<ffffffff816aa9f9>] ip_local_out+0x39/0x70
     [<ffffffff816abf89>] ip_send_skb+0x19/0x40
     [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40
     [<ffffffff816d55d3>] raw_sendmsg+0x7d3/0xc30
     [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
     [<ffffffff816e7557>] ? inet_sendmsg+0xc7/0x1d0
     [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
     [<ffffffff816e759a>] inet_sendmsg+0x10a/0x1d0
     [<ffffffff816e7495>] ? inet_sendmsg+0x5/0x1d0
     [<ffffffff8163e398>] sock_sendmsg+0x38/0x50
     [<ffffffff8163ec5f>] ___sys_sendmsg+0x25f/0x270
     [<ffffffff811aadad>] ? handle_mm_fault+0x8dd/0x1320
     [<ffffffff8178c147>] ? _raw_spin_unlock+0x27/0x40
     [<ffffffff810529b2>] ? __do_page_fault+0x1e2/0x460
     [<ffffffff81204886>] ? __fget_light+0x66/0x90
     [<ffffffff8163f8e2>] __sys_sendmsg+0x42/0x80
     [<ffffffff8163f932>] SyS_sendmsg+0x12/0x20
     [<ffffffff8178cb17>] entry_SYSCALL_64_fastpath+0x12/0x6f
    Code: 00 00 44 89 e0 e9 7c fb ff ff 4c 89 ff e8 e7 e7 ff ff 41 8b 9d 80 00 00 00 2b 5d d4 89 d8 c1 f8 03 0f b7 c0 e9 33 ff ff f
     66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48
    RIP  [<ffffffff816a9a92>] ip_do_fragment+0x892/0x8a0
     RSP <ffff88006d603170>

    Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Signed-off-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: 8282f27449bf ("inet: frag: Always orphan skbs inside ip_defrag()")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
datapath/linux/compat/include/net/ip.h
datapath/linux/compat/ip_fragment.c

index 63a3f09..2e33664 100644 (file)
@@ -122,6 +122,19 @@ int rpl_ip_defrag(struct sk_buff *skb, u32 user);
 int __init rpl_ipfrag_init(void);
 void rpl_ipfrag_fini(void);
 #else /* OVS_FRAGMENT_BACKPORT */
+
+/* We have no good way to detect the presence of upstream commit 8282f27449bf
+ * ("inet: frag: Always orphan skbs inside ip_defrag()"), but it should be
+ * always included in kernels 4.5+. */
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4,5,0)
+static inline int rpl_ip_defrag(struct sk_buff *skb, u32 user)
+{
+       skb_orphan(skb);
+       ip_defrag(skb, user);
+}
+#define ip_defrag rpl_ip_defrag
+#endif
+
 static inline int rpl_ipfrag_init(void) { return 0; }
 static inline void rpl_ipfrag_fini(void) { }
 #endif /* OVS_FRAGMENT_BACKPORT */
index 4181349..42514e4 100644 (file)
@@ -682,6 +682,7 @@ int rpl_ip_defrag(struct sk_buff *skb, u32 user)
        struct ipq *qp;
 
        IP_INC_STATS_BH(net, IPSTATS_MIB_REASMREQDS);
+       skb_orphan(skb);
 
        /* Lookup (or create) queue header */
        qp = ip_find(net, ip_hdr(skb), user, vif);