ip_fragment: don't forward defragmented DF packet

We currently always send fragments without DF bit set. Thus, given following setup: mtu1500 - mtu1500:1400 - mtu1400:1280 - mtu1280 A R1 R2 B Where R1 and R2 run linux with netfilter defragmentation/conntrack enabled, then if Host A sent a fragmented packet _with_ DF set to B, R1 will respond with icmp too big error if one of these fragments exceeded 1400 bytes. However, if R1 receives fragment sizes 1200 and 100, it would forward the reassembled packet without refragmenting, i.e. R2 will send an icmp error in response to a packet that was never sent, citing mtu that the original sender never exceeded. The other minor issue is that a refragmentation on R1 will conceal the MTU of R2-B since refragmentation does not set DF bit on the fragments. This modifies ip_fragment so that we track largest fragment size seen both for DF and non-DF packets, and set frag_max_size to the largest value. If the DF fragment size is larger or equal to the non-df one, we will consider the packet a path mtu probe: We set DF bit on the reassembled skb and also tag it with a new IPCB flag to force refragmentation even if skb fits outdev mtu. We will also set DF bit on each fragment in this case. Joint work with Hannes Frederic Sowa. Reported-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
author: Florian Westphal <fw@strlen.de> 2015-05-22 16:32:51 +0200
committer: David S. Miller <davem@davemloft.net> 2015-05-27 13:03:31 -0400
commit: d6b915e29f4adea94bc02ba7675bb4f84e6a1abd (patch)
tree: 7e5f00b4c156f9549b8b3eeea92ce2a12dd1f0da /net/ipv4/ip_output.c
parent: net: ipv4: avoid repeated calls to ip_skb_dst_mtu helper (diff)
download: linux-dev-d6b915e29f4adea94bc02ba7675bb4f84e6a1abd.tar.xz
linux-dev-d6b915e29f4adea94bc02ba7675bb4f84e6a1abd.zip
1 files changed, 10 insertions, 2 deletions
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index d6dd8ba04441..f5f5ef1cebd5 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -278,7 +278,7 @@ static int ip_finish_output(struct sock *sk, struct sk_buff *skb)
 	if (skb_is_gso(skb))
 		return ip_finish_output_gso(sk, skb, mtu);
 
-	if (skb->len > mtu)
+	if (skb->len > mtu || (IPCB(skb)->flags & IPSKB_FRAG_PMTU))
 		return ip_fragment(sk, skb, mtu, ip_finish_output2);
 
 	return ip_finish_output2(sk, skb);
@@ -492,7 +492,10 @@ static int ip_fragment(struct sock *sk, struct sk_buff *skb,
 {
 	struct iphdr *iph = ip_hdr(skb);
 
-	if (unlikely(((iph->frag_off & htons(IP_DF)) && !skb->ignore_df) ||
+	if ((iph->frag_off & htons(IP_DF)) == 0)
+		return ip_do_fragment(sk, skb, output);
+
+	if (unlikely(!skb->ignore_df ||
 		     (IPCB(skb)->frag_max_size &&
 		      IPCB(skb)->frag_max_size > mtu))) {
 		struct rtable *rt = skb_rtable(skb);
@@ -537,6 +540,8 @@ int ip_do_fragment(struct sock *sk, struct sk_buff *skb,
 	iph = ip_hdr(skb);
 
 	mtu = ip_skb_dst_mtu(skb);
+	if (IPCB(skb)->frag_max_size && IPCB(skb)->frag_max_size < mtu)
+		mtu = IPCB(skb)->frag_max_size;
 
 	/*
 	 *	Setup starting values.
@@ -732,6 +737,9 @@ slow_path:
 		iph = ip_hdr(skb2);
 		iph->frag_off = htons((offset >> 3));
 
+		if (IPCB(skb)->flags & IPSKB_FRAG_PMTU)
+			iph->frag_off |= htons(IP_DF);
+
 		/* ANK: dirty, but effective trick. Upgrade options only if
 		 * the segment to be fragmented was THE FIRST (otherwise,
 		 * options are already fixed) and make it ONCE
author	Florian Westphal <fw@strlen.de>	2015-05-22 16:32:51 +0200
committer	David S. Miller <davem@davemloft.net>	2015-05-27 13:03:31 -0400
commit	d6b915e29f4adea94bc02ba7675bb4f84e6a1abd (patch)
tree	7e5f00b4c156f9549b8b3eeea92ce2a12dd1f0da /net/ipv4/ip_output.c
parent	net: ipv4: avoid repeated calls to ip_skb_dst_mtu helper (diff)
download	linux-dev-d6b915e29f4adea94bc02ba7675bb4f84e6a1abd.tar.xz linux-dev-d6b915e29f4adea94bc02ba7675bb4f84e6a1abd.zip