From b0dd663b60944a3ce86430fa35549fb37968bda0 Mon Sep 17 00:00:00 2001 From: Sonic Zhang Date: Wed, 11 Sep 2013 11:31:53 +0800 Subject: netpoll: Should handle ETH_P_ARP other than ETH_P_IP in netpoll_neigh_reply The received ARP request type in the Ethernet packet head is ETH_P_ARP other than ETH_P_IP. [ Bug introduced by commit b7394d2429c198b1da3d46ac39192e891029ec0f ("netpoll: prepare for ipv6") ] Signed-off-by: Sonic Zhang Signed-off-by: David S. Miller --- net/core/netpoll.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/core/netpoll.c b/net/core/netpoll.c index 2c637e9a0b27..c3c7b27c112d 100644 --- a/net/core/netpoll.c +++ b/net/core/netpoll.c @@ -550,7 +550,7 @@ static void netpoll_neigh_reply(struct sk_buff *skb, struct netpoll_info *npinfo return; proto = ntohs(eth_hdr(skb)->h_proto); - if (proto == ETH_P_IP) { + if (proto == ETH_P_ARP) { struct arphdr *arp; unsigned char *arp_ptr; /* No arp on this interface */ -- cgit v1.2.3-59-g8ed1b From 95ee62083cb6453e056562d91f597552021e6ae7 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Wed, 11 Sep 2013 16:58:36 +0200 Subject: net: sctp: fix ipv6 ipsec encryption bug in sctp_v6_xmit Alan Chester reported an issue with IPv6 on SCTP that IPsec traffic is not being encrypted, whereas on IPv4 it is. Setting up an AH + ESP transport does not seem to have the desired effect: SCTP + IPv4: 22:14:20.809645 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto AH (51), length 116) 192.168.0.2 > 192.168.0.5: AH(spi=0x00000042,sumlen=16,seq=0x1): ESP(spi=0x00000044,seq=0x1), length 72 22:14:20.813270 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto AH (51), length 340) 192.168.0.5 > 192.168.0.2: AH(spi=0x00000043,sumlen=16,seq=0x1): SCTP + IPv6: 22:31:19.215029 IP6 (class 0x02, hlim 64, next-header SCTP (132) payload length: 364) fe80::222:15ff:fe87:7fc.3333 > fe80::92e6:baff:fe0d:5a54.36767: sctp 1) [INIT ACK] [init tag: 747759530] [rwnd: 62464] [OS: 10] [MIS: 10] Moreover, Alan says: This problem was seen with both Racoon and Racoon2. Other people have seen this with OpenSwan. When IPsec is configured to encrypt all upper layer protocols the SCTP connection does not initialize. After using Wireshark to follow packets, this is because the SCTP packet leaves Box A unencrypted and Box B believes all upper layer protocols are to be encrypted so it drops this packet, causing the SCTP connection to fail to initialize. When IPsec is configured to encrypt just SCTP, the SCTP packets are observed unencrypted. In fact, using `socat sctp6-listen:3333 -` on one end and transferring "plaintext" string on the other end, results in cleartext on the wire where SCTP eventually does not report any errors, thus in the latter case that Alan reports, the non-paranoid user might think he's communicating over an encrypted transport on SCTP although he's not (tcpdump ... -X): ... 0x0030: 5d70 8e1a 0003 001a 177d eb6c 0000 0000 ]p.......}.l.... 0x0040: 0000 0000 706c 6169 6e74 6578 740a 0000 ....plaintext... Only in /proc/net/xfrm_stat we can see XfrmInTmplMismatch increasing on the receiver side. Initial follow-up analysis from Alan's bug report was done by Alexey Dobriyan. Also thanks to Vlad Yasevich for feedback on this. SCTP has its own implementation of sctp_v6_xmit() not calling inet6_csk_xmit(). This has the implication that it probably never really got updated along with changes in inet6_csk_xmit() and therefore does not seem to invoke xfrm handlers. SCTP's IPv4 xmit however, properly calls ip_queue_xmit() to do the work. Since a call to inet6_csk_xmit() would solve this problem, but result in unecessary route lookups, let us just use the cached flowi6 instead that we got through sctp_v6_get_dst(). Since all SCTP packets are being sent through sctp_packet_transmit(), we do the route lookup / flow caching in sctp_transport_route(), hold it in tp->dst and skb_dst_set() right after that. If we would alter fl6->daddr in sctp_v6_xmit() to np->opt->srcrt, we possibly could run into the same effect of not having xfrm layer pick it up, hence, use fl6_update_dst() in sctp_v6_get_dst() instead to get the correct source routed dst entry, which we assign to the skb. Also source address routing example from 625034113 ("sctp: fix sctp to work with ipv6 source address routing") still works with this patch! Nevertheless, in RFC5095 it is actually 'recommended' to not use that anyway due to traffic amplification [1]. So it seems we're not supposed to do that anyway in sctp_v6_xmit(). Moreover, if we overwrite the flow destination here, the lower IPv6 layer will be unable to put the correct destination address into IP header, as routing header is added in ipv6_push_nfrag_opts() but then probably with wrong final destination. Things aside, result of this patch is that we do not have any XfrmInTmplMismatch increase plus on the wire with this patch it now looks like: SCTP + IPv6: 08:17:47.074080 IP6 2620:52:0:102f:7a2b:cbff:fe27:1b0a > 2620:52:0:102f:213:72ff:fe32:7eba: AH(spi=0x00005fb4,seq=0x1): ESP(spi=0x00005fb5,seq=0x1), length 72 08:17:47.074264 IP6 2620:52:0:102f:213:72ff:fe32:7eba > 2620:52:0:102f:7a2b:cbff:fe27:1b0a: AH(spi=0x00003d54,seq=0x1): ESP(spi=0x00003d55,seq=0x1), length 296 This fixes Kernel Bugzilla 24412. This security issue seems to be present since 2.6.18 kernels. Lets just hope some big passive adversary in the wild didn't have its fun with that. lksctp-tools IPv6 regression test suite passes as well with this patch. [1] http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf Reported-by: Alan Chester Reported-by: Alexey Dobriyan Signed-off-by: Daniel Borkmann Cc: Steffen Klassert Cc: Hannes Frederic Sowa Acked-by: Vlad Yasevich Signed-off-by: David S. Miller --- net/sctp/ipv6.c | 42 +++++++++++++----------------------------- 1 file changed, 13 insertions(+), 29 deletions(-) (limited to 'net') diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c index da613ceae28c..4f52e2ce263d 100644 --- a/net/sctp/ipv6.c +++ b/net/sctp/ipv6.c @@ -204,44 +204,23 @@ out: in6_dev_put(idev); } -/* Based on tcp_v6_xmit() in tcp_ipv6.c. */ static int sctp_v6_xmit(struct sk_buff *skb, struct sctp_transport *transport) { struct sock *sk = skb->sk; struct ipv6_pinfo *np = inet6_sk(sk); - struct flowi6 fl6; - - memset(&fl6, 0, sizeof(fl6)); - - fl6.flowi6_proto = sk->sk_protocol; - - /* Fill in the dest address from the route entry passed with the skb - * and the source address from the transport. - */ - fl6.daddr = transport->ipaddr.v6.sin6_addr; - fl6.saddr = transport->saddr.v6.sin6_addr; - - fl6.flowlabel = np->flow_label; - IP6_ECN_flow_xmit(sk, fl6.flowlabel); - if (ipv6_addr_type(&fl6.saddr) & IPV6_ADDR_LINKLOCAL) - fl6.flowi6_oif = transport->saddr.v6.sin6_scope_id; - else - fl6.flowi6_oif = sk->sk_bound_dev_if; - - if (np->opt && np->opt->srcrt) { - struct rt0_hdr *rt0 = (struct rt0_hdr *) np->opt->srcrt; - fl6.daddr = *rt0->addr; - } + struct flowi6 *fl6 = &transport->fl.u.ip6; pr_debug("%s: skb:%p, len:%d, src:%pI6 dst:%pI6\n", __func__, skb, - skb->len, &fl6.saddr, &fl6.daddr); + skb->len, &fl6->saddr, &fl6->daddr); - SCTP_INC_STATS(sock_net(sk), SCTP_MIB_OUTSCTPPACKS); + IP6_ECN_flow_xmit(sk, fl6->flowlabel); if (!(transport->param_flags & SPP_PMTUD_ENABLE)) skb->local_df = 1; - return ip6_xmit(sk, skb, &fl6, np->opt, np->tclass); + SCTP_INC_STATS(sock_net(sk), SCTP_MIB_OUTSCTPPACKS); + + return ip6_xmit(sk, skb, fl6, np->opt, np->tclass); } /* Returns the dst cache entry for the given source and destination ip @@ -254,10 +233,12 @@ static void sctp_v6_get_dst(struct sctp_transport *t, union sctp_addr *saddr, struct dst_entry *dst = NULL; struct flowi6 *fl6 = &fl->u.ip6; struct sctp_bind_addr *bp; + struct ipv6_pinfo *np = inet6_sk(sk); struct sctp_sockaddr_entry *laddr; union sctp_addr *baddr = NULL; union sctp_addr *daddr = &t->ipaddr; union sctp_addr dst_saddr; + struct in6_addr *final_p, final; __u8 matchlen = 0; __u8 bmatchlen; sctp_scope_t scope; @@ -281,7 +262,8 @@ static void sctp_v6_get_dst(struct sctp_transport *t, union sctp_addr *saddr, pr_debug("src=%pI6 - ", &fl6->saddr); } - dst = ip6_dst_lookup_flow(sk, fl6, NULL, false); + final_p = fl6_update_dst(fl6, np->opt, &final); + dst = ip6_dst_lookup_flow(sk, fl6, final_p, false); if (!asoc || saddr) goto out; @@ -333,10 +315,12 @@ static void sctp_v6_get_dst(struct sctp_transport *t, union sctp_addr *saddr, } } rcu_read_unlock(); + if (baddr) { fl6->saddr = baddr->v6.sin6_addr; fl6->fl6_sport = baddr->v6.sin6_port; - dst = ip6_dst_lookup_flow(sk, fl6, NULL, false); + final_p = fl6_update_dst(fl6, np->opt, &final); + dst = ip6_dst_lookup_flow(sk, fl6, final_p, false); } out: -- cgit v1.2.3-59-g8ed1b From 9a0620133ccce9dd35c00a96405c8d80938c2cc0 Mon Sep 17 00:00:00 2001 From: Chris Healy Date: Wed, 11 Sep 2013 21:37:47 -0700 Subject: resubmit bridge: fix message_age_timer calculation This changes the message_age_timer calculation to use the BPDU's max age as opposed to the local bridge's max age. This is in accordance with section 8.6.2.3.2 Step 2 of the 802.1D-1998 sprecification. With the current implementation, when running with very large bridge diameters, convergance will not always occur even if a root bridge is configured to have a longer max age. Tested successfully on bridge diameters of ~200. Signed-off-by: Chris Healy Signed-off-by: David S. Miller --- net/bridge/br_stp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c index 1c0a50f13229..f1887ba7fc43 100644 --- a/net/bridge/br_stp.c +++ b/net/bridge/br_stp.c @@ -209,7 +209,7 @@ static void br_record_config_information(struct net_bridge_port *p, p->designated_age = jiffies - bpdu->message_age; mod_timer(&p->message_age_timer, jiffies - + (p->br->max_age - bpdu->message_age)); + + (bpdu->max_age - bpdu->message_age)); } /* called under bridge lock */ -- cgit v1.2.3-59-g8ed1b From be4f154d5ef0ca147ab6bcd38857a774133f5450 Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Thu, 12 Sep 2013 17:12:05 +1000 Subject: bridge: Clamp forward_delay when enabling STP At some point limits were added to forward_delay. However, the limits are only enforced when STP is enabled. This created a scenario where you could have a value outside the allowed range while STP is disabled, which then stuck around even after STP is enabled. This patch fixes this by clamping the value when we enable STP. I had to move the locking around a bit to ensure that there is no window where someone could insert a value outside the range while we're in the middle of enabling STP. Signed-off-by: Herbert Xu Cheers, Signed-off-by: David S. Miller --- net/bridge/br_private.h | 1 + net/bridge/br_stp.c | 21 +++++++++++++++------ net/bridge/br_stp_if.c | 12 ++++++++++-- 3 files changed, 26 insertions(+), 8 deletions(-) (limited to 'net') diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h index 598cb0b333c6..cda83158a21c 100644 --- a/net/bridge/br_private.h +++ b/net/bridge/br_private.h @@ -746,6 +746,7 @@ extern struct net_bridge_port *br_get_port(struct net_bridge *br, extern void br_init_port(struct net_bridge_port *p); extern void br_become_designated_port(struct net_bridge_port *p); +extern void __br_set_forward_delay(struct net_bridge *br, unsigned long t); extern int br_set_forward_delay(struct net_bridge *br, unsigned long x); extern int br_set_hello_time(struct net_bridge *br, unsigned long x); extern int br_set_max_age(struct net_bridge *br, unsigned long x); diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c index f1887ba7fc43..3c86f0538cbb 100644 --- a/net/bridge/br_stp.c +++ b/net/bridge/br_stp.c @@ -544,18 +544,27 @@ int br_set_max_age(struct net_bridge *br, unsigned long val) } +void __br_set_forward_delay(struct net_bridge *br, unsigned long t) +{ + br->bridge_forward_delay = t; + if (br_is_root_bridge(br)) + br->forward_delay = br->bridge_forward_delay; +} + int br_set_forward_delay(struct net_bridge *br, unsigned long val) { unsigned long t = clock_t_to_jiffies(val); + int err = -ERANGE; + spin_lock_bh(&br->lock); if (br->stp_enabled != BR_NO_STP && (t < BR_MIN_FORWARD_DELAY || t > BR_MAX_FORWARD_DELAY)) - return -ERANGE; + goto unlock; - spin_lock_bh(&br->lock); - br->bridge_forward_delay = t; - if (br_is_root_bridge(br)) - br->forward_delay = br->bridge_forward_delay; + __br_set_forward_delay(br, t); + err = 0; + +unlock: spin_unlock_bh(&br->lock); - return 0; + return err; } diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c index d45e760141bb..108084a04671 100644 --- a/net/bridge/br_stp_if.c +++ b/net/bridge/br_stp_if.c @@ -129,6 +129,14 @@ static void br_stp_start(struct net_bridge *br) char *envp[] = { NULL }; r = call_usermodehelper(BR_STP_PROG, argv, envp, UMH_WAIT_PROC); + + spin_lock_bh(&br->lock); + + if (br->bridge_forward_delay < BR_MIN_FORWARD_DELAY) + __br_set_forward_delay(br, BR_MIN_FORWARD_DELAY); + else if (br->bridge_forward_delay < BR_MAX_FORWARD_DELAY) + __br_set_forward_delay(br, BR_MAX_FORWARD_DELAY); + if (r == 0) { br->stp_enabled = BR_USER_STP; br_debug(br, "userspace STP started\n"); @@ -137,10 +145,10 @@ static void br_stp_start(struct net_bridge *br) br_debug(br, "using kernel STP\n"); /* To start timers on any ports left in blocking */ - spin_lock_bh(&br->lock); br_port_state_selection(br); - spin_unlock_bh(&br->lock); } + + spin_unlock_bh(&br->lock); } static void br_stp_stop(struct net_bridge *br) -- cgit v1.2.3-59-g8ed1b From d830f0fa1dd7ca447c38aec82cd44230e0b7ca75 Mon Sep 17 00:00:00 2001 From: Phil Oester Date: Thu, 12 Sep 2013 18:04:16 -0700 Subject: netfilter: nf_nat_proto_icmpv6:: fix wrong comparison in icmpv6_manip_pkt In commit 58a317f1 (netfilter: ipv6: add IPv6 NAT support), icmpv6_manip_pkt was added with an incorrect comparison of ICMP codes to types. This causes problems when using NAT rules with the --random option. Correct the comparison. This closes netfilter bugzilla #851, reported by Alexander Neumann. Signed-off-by: Phil Oester Signed-off-by: Pablo Neira Ayuso --- net/ipv6/netfilter/nf_nat_proto_icmpv6.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/ipv6/netfilter/nf_nat_proto_icmpv6.c b/net/ipv6/netfilter/nf_nat_proto_icmpv6.c index 61aaf70f376e..2205e8eeeacf 100644 --- a/net/ipv6/netfilter/nf_nat_proto_icmpv6.c +++ b/net/ipv6/netfilter/nf_nat_proto_icmpv6.c @@ -69,8 +69,8 @@ icmpv6_manip_pkt(struct sk_buff *skb, hdr = (struct icmp6hdr *)(skb->data + hdroff); l3proto->csum_update(skb, iphdroff, &hdr->icmp6_cksum, tuple, maniptype); - if (hdr->icmp6_code == ICMPV6_ECHO_REQUEST || - hdr->icmp6_code == ICMPV6_ECHO_REPLY) { + if (hdr->icmp6_type == ICMPV6_ECHO_REQUEST || + hdr->icmp6_type == ICMPV6_ECHO_REPLY) { inet_proto_csum_replace2(&hdr->icmp6_cksum, skb, hdr->icmp6_identifier, tuple->src.u.icmp.id, 0); -- cgit v1.2.3-59-g8ed1b From 1fb1754a8c70d69ab480763c423e0a74369c4a67 Mon Sep 17 00:00:00 2001 From: Hong Zhiguo Date: Sat, 14 Sep 2013 22:42:27 +0800 Subject: bridge: use br_port_get_rtnl within rtnl lock current br_port_get_rcu is problematic in bridging path (NULL deref). Change these calls in netlink path first. Signed-off-by: Hong Zhiguo Acked-by: Eric Dumazet Signed-off-by: David S. Miller --- net/bridge/br_netlink.c | 4 ++-- net/bridge/br_private.h | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) (limited to 'net') diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c index b9259efa636e..e74ddc1c29a8 100644 --- a/net/bridge/br_netlink.c +++ b/net/bridge/br_netlink.c @@ -207,7 +207,7 @@ int br_getlink(struct sk_buff *skb, u32 pid, u32 seq, struct net_device *dev, u32 filter_mask) { int err = 0; - struct net_bridge_port *port = br_port_get_rcu(dev); + struct net_bridge_port *port = br_port_get_rtnl(dev); /* not a bridge port and */ if (!port && !(filter_mask & RTEXT_FILTER_BRVLAN)) @@ -451,7 +451,7 @@ static size_t br_get_link_af_size(const struct net_device *dev) struct net_port_vlans *pv; if (br_port_exists(dev)) - pv = nbp_get_vlan_info(br_port_get_rcu(dev)); + pv = nbp_get_vlan_info(br_port_get_rtnl(dev)); else if (dev->priv_flags & IFF_EBRIDGE) pv = br_get_vlan_info((struct net_bridge *)netdev_priv(dev)); else diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h index cda83158a21c..dd583177cba4 100644 --- a/net/bridge/br_private.h +++ b/net/bridge/br_private.h @@ -208,7 +208,7 @@ static inline struct net_bridge_port *br_port_get_rcu(const struct net_device *d return br_port_exists(dev) ? port : NULL; } -static inline struct net_bridge_port *br_port_get_rtnl(struct net_device *dev) +static inline struct net_bridge_port *br_port_get_rtnl(const struct net_device *dev) { return br_port_exists(dev) ? rtnl_dereference(dev->rx_handler_data) : NULL; -- cgit v1.2.3-59-g8ed1b From 716ec052d2280d511e10e90ad54a86f5b5d4dcc2 Mon Sep 17 00:00:00 2001 From: Hong Zhiguo Date: Sat, 14 Sep 2013 22:42:28 +0800 Subject: bridge: fix NULL pointer deref of br_port_get_rcu The NULL deref happens when br_handle_frame is called between these 2 lines of del_nbp: dev->priv_flags &= ~IFF_BRIDGE_PORT; /* --> br_handle_frame is called at this time */ netdev_rx_handler_unregister(dev); In br_handle_frame the return of br_port_get_rcu(dev) is dereferenced without check but br_port_get_rcu(dev) returns NULL if: !(dev->priv_flags & IFF_BRIDGE_PORT) Eric Dumazet pointed out the testing of IFF_BRIDGE_PORT is not necessary here since we're in rcu_read_lock and we have synchronize_net() in netdev_rx_handler_unregister. So remove the testing of IFF_BRIDGE_PORT and by the previous patch, make sure br_port_get_rcu is called in bridging code. Signed-off-by: Hong Zhiguo Acked-by: Eric Dumazet Signed-off-by: David S. Miller --- net/bridge/br_private.h | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) (limited to 'net') diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h index dd583177cba4..efb57d911569 100644 --- a/net/bridge/br_private.h +++ b/net/bridge/br_private.h @@ -202,10 +202,7 @@ struct net_bridge_port static inline struct net_bridge_port *br_port_get_rcu(const struct net_device *dev) { - struct net_bridge_port *port = - rcu_dereference_rtnl(dev->rx_handler_data); - - return br_port_exists(dev) ? port : NULL; + return rcu_dereference(dev->rx_handler_data); } static inline struct net_bridge_port *br_port_get_rtnl(const struct net_device *dev) -- cgit v1.2.3-59-g8ed1b From 55524c219aa803887d1c247853842a9566598cba Mon Sep 17 00:00:00 2001 From: Jozsef Kadlecsik Date: Mon, 16 Sep 2013 20:00:08 +0200 Subject: netfilter: ipset: Skip really non-first fragments for IPv6 when getting port/protocol Signed-off-by: Jozsef Kadlecsik --- net/netfilter/ipset/ip_set_getport.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/netfilter/ipset/ip_set_getport.c b/net/netfilter/ipset/ip_set_getport.c index 6fdf88ae2353..dac156f819ac 100644 --- a/net/netfilter/ipset/ip_set_getport.c +++ b/net/netfilter/ipset/ip_set_getport.c @@ -116,12 +116,12 @@ ip_set_get_ip6_port(const struct sk_buff *skb, bool src, { int protoff; u8 nexthdr; - __be16 frag_off; + __be16 frag_off = 0; nexthdr = ipv6_hdr(skb)->nexthdr; protoff = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &nexthdr, &frag_off); - if (protoff < 0) + if (protoff < 0 || (frag_off & htons(~0x7)) != 0) return false; return get_port(skb, nexthdr, protoff, src, port, proto); -- cgit v1.2.3-59-g8ed1b From 0f1799ba1a5db4c48b72ac2da2dc70d8c190a73d Mon Sep 17 00:00:00 2001 From: Jozsef Kadlecsik Date: Mon, 16 Sep 2013 20:04:53 +0200 Subject: netfilter: ipset: Consistent userspace testing with nomatch flag The "nomatch" commandline flag should invert the matching at testing, similarly to the --return-nomatch flag of the "set" match of iptables. Until now it worked with the elements with "nomatch" flag only. From now on it works with elements without the flag too, i.e: # ipset n test hash:net # ipset a test 10.0.0.0/24 nomatch # ipset t test 10.0.0.1 10.0.0.1 is NOT in set test. # ipset t test 10.0.0.1 nomatch 10.0.0.1 is in set test. # ipset a test 192.168.0.0/24 # ipset t test 192.168.0.1 192.168.0.1 is in set test. # ipset t test 192.168.0.1 nomatch 192.168.0.1 is NOT in set test. Before the patch the results were ... # ipset t test 192.168.0.1 192.168.0.1 is in set test. # ipset t test 192.168.0.1 nomatch 192.168.0.1 is in set test. Signed-off-by: Jozsef Kadlecsik --- include/linux/netfilter/ipset/ip_set.h | 6 ++++-- net/netfilter/ipset/ip_set_core.c | 3 +-- net/netfilter/ipset/ip_set_hash_ipportnet.c | 4 ++-- net/netfilter/ipset/ip_set_hash_net.c | 4 ++-- net/netfilter/ipset/ip_set_hash_netiface.c | 4 ++-- net/netfilter/ipset/ip_set_hash_netport.c | 4 ++-- 6 files changed, 13 insertions(+), 12 deletions(-) (limited to 'net') diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/linux/netfilter/ipset/ip_set.h index d80e2753847c..9ac9fbde7b61 100644 --- a/include/linux/netfilter/ipset/ip_set.h +++ b/include/linux/netfilter/ipset/ip_set.h @@ -296,10 +296,12 @@ ip_set_eexist(int ret, u32 flags) /* Match elements marked with nomatch */ static inline bool -ip_set_enomatch(int ret, u32 flags, enum ipset_adt adt) +ip_set_enomatch(int ret, u32 flags, enum ipset_adt adt, struct ip_set *set) { return adt == IPSET_TEST && - ret == -ENOTEMPTY && ((flags >> 16) & IPSET_FLAG_NOMATCH); + (set->type->features & IPSET_TYPE_NOMATCH) && + ((flags >> 16) & IPSET_FLAG_NOMATCH) && + (ret > 0 || ret == -ENOTEMPTY); } /* Check the NLA_F_NET_BYTEORDER flag */ diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c index f77139007983..c8c303c3386f 100644 --- a/net/netfilter/ipset/ip_set_core.c +++ b/net/netfilter/ipset/ip_set_core.c @@ -1489,8 +1489,7 @@ ip_set_utest(struct sock *ctnl, struct sk_buff *skb, if (ret == -EAGAIN) ret = 1; - return (ret < 0 && ret != -ENOTEMPTY) ? ret : - ret > 0 ? 0 : -IPSET_ERR_EXIST; + return ret > 0 ? 0 : -IPSET_ERR_EXIST; } /* Get headed data of a set */ diff --git a/net/netfilter/ipset/ip_set_hash_ipportnet.c b/net/netfilter/ipset/ip_set_hash_ipportnet.c index c6a525373be4..f15f3e28b9c3 100644 --- a/net/netfilter/ipset/ip_set_hash_ipportnet.c +++ b/net/netfilter/ipset/ip_set_hash_ipportnet.c @@ -260,7 +260,7 @@ hash_ipportnet4_uadt(struct ip_set *set, struct nlattr *tb[], e.ip = htonl(ip); e.ip2 = htonl(ip2_from & ip_set_hostmask(e.cidr + 1)); ret = adtfn(set, &e, &ext, &ext, flags); - return ip_set_enomatch(ret, flags, adt) ? 1 : + return ip_set_enomatch(ret, flags, adt, set) ? -ret : ip_set_eexist(ret, flags) ? 0 : ret; } @@ -544,7 +544,7 @@ hash_ipportnet6_uadt(struct ip_set *set, struct nlattr *tb[], if (adt == IPSET_TEST || !with_ports || !tb[IPSET_ATTR_PORT_TO]) { ret = adtfn(set, &e, &ext, &ext, flags); - return ip_set_enomatch(ret, flags, adt) ? 1 : + return ip_set_enomatch(ret, flags, adt, set) ? -ret : ip_set_eexist(ret, flags) ? 0 : ret; } diff --git a/net/netfilter/ipset/ip_set_hash_net.c b/net/netfilter/ipset/ip_set_hash_net.c index da740ceb56ae..223e9f546d0f 100644 --- a/net/netfilter/ipset/ip_set_hash_net.c +++ b/net/netfilter/ipset/ip_set_hash_net.c @@ -199,7 +199,7 @@ hash_net4_uadt(struct ip_set *set, struct nlattr *tb[], if (adt == IPSET_TEST || !tb[IPSET_ATTR_IP_TO]) { e.ip = htonl(ip & ip_set_hostmask(e.cidr)); ret = adtfn(set, &e, &ext, &ext, flags); - return ip_set_enomatch(ret, flags, adt) ? 1 : + return ip_set_enomatch(ret, flags, adt, set) ? -ret: ip_set_eexist(ret, flags) ? 0 : ret; } @@ -396,7 +396,7 @@ hash_net6_uadt(struct ip_set *set, struct nlattr *tb[], ret = adtfn(set, &e, &ext, &ext, flags); - return ip_set_enomatch(ret, flags, adt) ? 1 : + return ip_set_enomatch(ret, flags, adt, set) ? -ret : ip_set_eexist(ret, flags) ? 0 : ret; } diff --git a/net/netfilter/ipset/ip_set_hash_netiface.c b/net/netfilter/ipset/ip_set_hash_netiface.c index 84ae6f6ce624..7d798d5d5cd3 100644 --- a/net/netfilter/ipset/ip_set_hash_netiface.c +++ b/net/netfilter/ipset/ip_set_hash_netiface.c @@ -368,7 +368,7 @@ hash_netiface4_uadt(struct ip_set *set, struct nlattr *tb[], if (adt == IPSET_TEST || !tb[IPSET_ATTR_IP_TO]) { e.ip = htonl(ip & ip_set_hostmask(e.cidr)); ret = adtfn(set, &e, &ext, &ext, flags); - return ip_set_enomatch(ret, flags, adt) ? 1 : + return ip_set_enomatch(ret, flags, adt, set) ? -ret : ip_set_eexist(ret, flags) ? 0 : ret; } @@ -634,7 +634,7 @@ hash_netiface6_uadt(struct ip_set *set, struct nlattr *tb[], ret = adtfn(set, &e, &ext, &ext, flags); - return ip_set_enomatch(ret, flags, adt) ? 1 : + return ip_set_enomatch(ret, flags, adt, set) ? -ret : ip_set_eexist(ret, flags) ? 0 : ret; } diff --git a/net/netfilter/ipset/ip_set_hash_netport.c b/net/netfilter/ipset/ip_set_hash_netport.c index 9a0869853be5..09d6690bee6f 100644 --- a/net/netfilter/ipset/ip_set_hash_netport.c +++ b/net/netfilter/ipset/ip_set_hash_netport.c @@ -244,7 +244,7 @@ hash_netport4_uadt(struct ip_set *set, struct nlattr *tb[], if (adt == IPSET_TEST || !(with_ports || tb[IPSET_ATTR_IP_TO])) { e.ip = htonl(ip & ip_set_hostmask(e.cidr + 1)); ret = adtfn(set, &e, &ext, &ext, flags); - return ip_set_enomatch(ret, flags, adt) ? 1 : + return ip_set_enomatch(ret, flags, adt, set) ? -ret : ip_set_eexist(ret, flags) ? 0 : ret; } @@ -489,7 +489,7 @@ hash_netport6_uadt(struct ip_set *set, struct nlattr *tb[], if (adt == IPSET_TEST || !with_ports || !tb[IPSET_ATTR_PORT_TO]) { ret = adtfn(set, &e, &ext, &ext, flags); - return ip_set_enomatch(ret, flags, adt) ? 1 : + return ip_set_enomatch(ret, flags, adt, set) ? -ret : ip_set_eexist(ret, flags) ? 0 : ret; } -- cgit v1.2.3-59-g8ed1b From 169faa2e19478b02027df04582ec7543dba1dd16 Mon Sep 17 00:00:00 2001 From: Jozsef Kadlecsik Date: Mon, 16 Sep 2013 20:07:35 +0200 Subject: netfilter: ipset: Validate the set family and not the set type family at swapping This closes netfilter bugzilla #843, reported by Quentin Armitage. Signed-off-by: Jozsef Kadlecsik --- net/netfilter/ipset/ip_set_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c index c8c303c3386f..f2e30fb31e78 100644 --- a/net/netfilter/ipset/ip_set_core.c +++ b/net/netfilter/ipset/ip_set_core.c @@ -1052,7 +1052,7 @@ ip_set_swap(struct sock *ctnl, struct sk_buff *skb, * Not an artificial restriction anymore, as we must prevent * possible loops created by swapping in setlist type of sets. */ if (!(from->type->features == to->type->features && - from->type->family == to->type->family)) + from->family == to->family)) return -IPSET_ERR_TYPE_MISMATCH; strncpy(from_name, from->name, IPSET_MAXNAMELEN); -- cgit v1.2.3-59-g8ed1b From 2cf55125c64d64cc106e204d53b107094762dfdf Mon Sep 17 00:00:00 2001 From: Oliver Smith Date: Mon, 16 Sep 2013 20:30:57 +0200 Subject: netfilter: ipset: Fix serious failure in CIDR tracking This fixes a serious bug affecting all hash types with a net element - specifically, if a CIDR value is deleted such that none of the same size exist any more, all larger (less-specific) values will then fail to match. Adding back any prefix with a CIDR equal to or more specific than the one deleted will fix it. Steps to reproduce: ipset -N test hash:net ipset -A test 1.1.0.0/16 ipset -A test 2.2.2.0/24 ipset -T test 1.1.1.1 #1.1.1.1 IS in set ipset -D test 2.2.2.0/24 ipset -T test 1.1.1.1 #1.1.1.1 IS NOT in set This is due to the fact that the nets counter was unconditionally decremented prior to the iteration that shifts up the entries. Now, we first check if there is a proceeding entry and if not, decrement it and return. Otherwise, we proceed to iterate and then zero the last element, which, in most cases, will already be zero. Signed-off-by: Oliver Smith Signed-off-by: Jozsef Kadlecsik --- net/netfilter/ipset/ip_set_hash_gen.h | 28 ++++++++++++++++------------ 1 file changed, 16 insertions(+), 12 deletions(-) (limited to 'net') diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h index 57beb1762b2d..707bc520d629 100644 --- a/net/netfilter/ipset/ip_set_hash_gen.h +++ b/net/netfilter/ipset/ip_set_hash_gen.h @@ -325,18 +325,22 @@ mtype_add_cidr(struct htype *h, u8 cidr, u8 nets_length) static void mtype_del_cidr(struct htype *h, u8 cidr, u8 nets_length) { - u8 i, j; - - for (i = 0; i < nets_length - 1 && h->nets[i].cidr != cidr; i++) - ; - h->nets[i].nets--; - - if (h->nets[i].nets != 0) - return; - - for (j = i; j < nets_length - 1 && h->nets[j].nets; j++) { - h->nets[j].cidr = h->nets[j + 1].cidr; - h->nets[j].nets = h->nets[j + 1].nets; + u8 i, j, net_end = nets_length - 1; + + for (i = 0; i < nets_length; i++) { + if (h->nets[i].cidr != cidr) + continue; + if (h->nets[i].nets > 1 || i == net_end || + h->nets[i + 1].nets == 0) { + h->nets[i].nets--; + return; + } + for (j = i; j < net_end && h->nets[j].nets; j++) { + h->nets[j].cidr = h->nets[j + 1].cidr; + h->nets[j].nets = h->nets[j + 1].nets; + } + h->nets[j].nets = 0; + return; } } #endif -- cgit v1.2.3-59-g8ed1b From 0d2ede929f61783aebfb9228e4d32a0546ee4d23 Mon Sep 17 00:00:00 2001 From: Ding Zhi Date: Mon, 16 Sep 2013 11:31:15 +0200 Subject: ip6_tunnels: raddr and laddr are inverted in nl msg IFLA_IPTUN_LOCAL and IFLA_IPTUN_REMOTE were inverted. Introduced by c075b13098b3 (ip6tnl: advertise tunnel param via rtnl). Signed-off-by: Ding Zhi Signed-off-by: Nicolas Dichtel Signed-off-by: David S. Miller --- net/ipv6/ip6_tunnel.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index 61355f7f4da5..2d8f4829575b 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -1656,9 +1656,9 @@ static int ip6_tnl_fill_info(struct sk_buff *skb, const struct net_device *dev) if (nla_put_u32(skb, IFLA_IPTUN_LINK, parm->link) || nla_put(skb, IFLA_IPTUN_LOCAL, sizeof(struct in6_addr), - &parm->raddr) || - nla_put(skb, IFLA_IPTUN_REMOTE, sizeof(struct in6_addr), &parm->laddr) || + nla_put(skb, IFLA_IPTUN_REMOTE, sizeof(struct in6_addr), + &parm->raddr) || nla_put_u8(skb, IFLA_IPTUN_TTL, parm->hop_limit) || nla_put_u8(skb, IFLA_IPTUN_ENCAP_LIMIT, parm->encap_limit) || nla_put_be32(skb, IFLA_IPTUN_FLOWINFO, parm->flowinfo) || -- cgit v1.2.3-59-g8ed1b From 3f96a532113131d5a65ac9e00fc83cfa31b0295f Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Mon, 16 Sep 2013 12:36:02 +0200 Subject: net: sctp: rfc4443: do not report ICMP redirects to user space Adapt the same behaviour for SCTP as present in TCP for ICMP redirect messages. For IPv6, RFC4443, section 2.4. says: ... (e) An ICMPv6 error message MUST NOT be originated as a result of receiving the following: ... (e.2) An ICMPv6 redirect message [IPv6-DISC]. ... Therefore, do not report an error to user space, just invoke dst's redirect callback and leave, same for IPv4 as done in TCP as well. The implication w/o having this patch could be that the reception of such packets would generate a poll notification and in worst case it could even tear down the whole connection. Therefore, stop updating sk_err on redirects. Reported-by: Duan Jiong Reported-by: Hannes Frederic Sowa Suggested-by: Vlad Yasevich Signed-off-by: Daniel Borkmann Signed-off-by: David S. Miller --- net/sctp/input.c | 3 +-- net/sctp/ipv6.c | 2 +- 2 files changed, 2 insertions(+), 3 deletions(-) (limited to 'net') diff --git a/net/sctp/input.c b/net/sctp/input.c index 5f2068679f83..98b69bbecdd9 100644 --- a/net/sctp/input.c +++ b/net/sctp/input.c @@ -634,8 +634,7 @@ void sctp_v4_err(struct sk_buff *skb, __u32 info) break; case ICMP_REDIRECT: sctp_icmp_redirect(sk, transport, skb); - err = 0; - break; + /* Fall through to out_unlock. */ default: goto out_unlock; } diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c index 4f52e2ce263d..e7b2d4fe2b6a 100644 --- a/net/sctp/ipv6.c +++ b/net/sctp/ipv6.c @@ -183,7 +183,7 @@ static void sctp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, break; case NDISC_REDIRECT: sctp_icmp_redirect(sk, transport, skb); - break; + goto out_unlock; default: break; } -- cgit v1.2.3-59-g8ed1b From 0a0d80eb39aa465b7bdf6f7754d0ba687eb3d2a7 Mon Sep 17 00:00:00 2001 From: Gao feng Date: Tue, 17 Sep 2013 13:03:47 +0200 Subject: netfilter: nfnetlink_queue: use network skb for sequence adjustment Instead of the netlink skb. Signed-off-by: Gao feng Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nfnetlink_queue_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c index 95a98c8c1da6..ae2e5c11d01a 100644 --- a/net/netfilter/nfnetlink_queue_core.c +++ b/net/netfilter/nfnetlink_queue_core.c @@ -1009,7 +1009,7 @@ nfqnl_recv_verdict(struct sock *ctnl, struct sk_buff *skb, verdict = NF_DROP; if (ct) - nfqnl_ct_seq_adjust(skb, ct, ctinfo, diff); + nfqnl_ct_seq_adjust(entry->skb, ct, ctinfo, diff); } if (nfqa[NFQA_MARK]) -- cgit v1.2.3-59-g8ed1b From 4c18c425b2d228415b635e97a64737d7f27c5536 Mon Sep 17 00:00:00 2001 From: Antonio Quartulli Date: Wed, 11 Sep 2013 19:14:44 +0200 Subject: batman-adv: set the TAG flag for the vid passed to BLA When receiving or sending a packet a packet on a VLAN, the vid has to be marked with the TAG flag in order to make any component in batman-adv understand that the packet is coming from a really tagged network. This fix the Bridge Loop Avoidance behaviour which was not able to send announces over VLAN interfaces. Introduced by 0b1da1765fdb00ca5d53bc95c9abc70dfc9aae5b ("batman-adv: change VID semantic in the BLA code") Signed-off-by: Antonio Quartulli Acked-by: Simon Wunderlich Signed-off-by: Marek Lindner --- net/batman-adv/soft-interface.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'net') diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c index 4493913f0d5c..813db4e64602 100644 --- a/net/batman-adv/soft-interface.c +++ b/net/batman-adv/soft-interface.c @@ -168,6 +168,7 @@ static int batadv_interface_tx(struct sk_buff *skb, case ETH_P_8021Q: vhdr = (struct vlan_ethhdr *)skb->data; vid = ntohs(vhdr->h_vlan_TCI) & VLAN_VID_MASK; + vid |= BATADV_VLAN_HAS_TAG; if (vhdr->h_vlan_encapsulated_proto != ethertype) break; @@ -331,6 +332,7 @@ void batadv_interface_rx(struct net_device *soft_iface, case ETH_P_8021Q: vhdr = (struct vlan_ethhdr *)skb->data; vid = ntohs(vhdr->h_vlan_TCI) & VLAN_VID_MASK; + vid |= BATADV_VLAN_HAS_TAG; if (vhdr->h_vlan_encapsulated_proto != ethertype) break; -- cgit v1.2.3-59-g8ed1b From 269aa759b474570fa642452742741525cfc226a9 Mon Sep 17 00:00:00 2001 From: Neal Cardwell Date: Mon, 16 Sep 2013 21:44:20 -0400 Subject: tcp: fix RTO calculated from cached RTT Commit 1b7fdd2ab5852 ("tcp: do not use cached RTT for RTT estimation") did not correctly account for the fact that crtt is the RTT shifted left 3 bits. Fix the calculation to consistently reflect this fact. Signed-off-by: Neal Cardwell Cc: Eric Dumazet Cc: Yuchung Cheng Acked-by: Eric Dumazet Acked-By: Yuchung Cheng Signed-off-by: David S. Miller --- net/ipv4/tcp_metrics.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c index 4a22f3e715df..52f3c6b971d2 100644 --- a/net/ipv4/tcp_metrics.c +++ b/net/ipv4/tcp_metrics.c @@ -502,7 +502,9 @@ reset: * ACKs, wait for troubles. */ if (crtt > tp->srtt) { - inet_csk(sk)->icsk_rto = crtt + max(crtt >> 2, tcp_rto_min(sk)); + /* Set RTO like tcp_rtt_estimator(), but from cached RTT. */ + crtt >>= 3; + inet_csk(sk)->icsk_rto = crtt + max(2 * crtt, tcp_rto_min(sk)); } else if (tp->srtt == 0) { /* RFC6298: 5.7 We've failed to get a valid RTT sample from * 3WHS. This is most likely due to retransmission, -- cgit v1.2.3-59-g8ed1b From bd784a140712fd06674f2240eecfc4ccae421129 Mon Sep 17 00:00:00 2001 From: Duan Jiong Date: Wed, 18 Sep 2013 20:03:27 +0800 Subject: net:dccp: do not report ICMP redirects to user space DCCP shouldn't be setting sk_err on redirects as it isn't an error condition. it should be doing exactly what tcp is doing and leaving the error handler without touching the socket. Signed-off-by: Duan Jiong Signed-off-by: David S. Miller --- net/dccp/ipv6.c | 1 + 1 file changed, 1 insertion(+) (limited to 'net') diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index 9c61f9c02fdb..6cf9f7782ad4 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -135,6 +135,7 @@ static void dccp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, if (dst) dst->ops->redirect(dst, sk, skb); + goto out; } if (type == ICMPV6_PKT_TOOBIG) { -- cgit v1.2.3-59-g8ed1b From 749154aa56b57652a282cbde57a57abc278d1205 Mon Sep 17 00:00:00 2001 From: Ansis Atteka Date: Wed, 18 Sep 2013 15:29:52 -0700 Subject: ip: use ip_hdr() in __ip_make_skb() to retrieve IP header skb->data already points to IP header, but for the sake of consistency we can also use ip_hdr() to retrieve it. Signed-off-by: Ansis Atteka Signed-off-by: David S. Miller --- net/ipv4/ip_output.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 9ee17e3d11c3..eae2e262fbe5 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1316,7 +1316,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk, else ttl = ip_select_ttl(inet, &rt->dst); - iph = (struct iphdr *)skb->data; + iph = ip_hdr(skb); iph->version = 4; iph->ihl = 5; iph->tos = inet->tos; -- cgit v1.2.3-59-g8ed1b From 703133de331a7a7df47f31fb9de51dc6f68a9de8 Mon Sep 17 00:00:00 2001 From: Ansis Atteka Date: Wed, 18 Sep 2013 15:29:53 -0700 Subject: ip: generate unique IP identificator if local fragmentation is allowed If local fragmentation is allowed, then ip_select_ident() and ip_select_ident_more() need to generate unique IDs to ensure correct defragmentation on the peer. For example, if IPsec (tunnel mode) has to encrypt large skbs that have local_df bit set, then all IP fragments that belonged to different ESP datagrams would have used the same identificator. If one of these IP fragments would get lost or reordered, then peer could possibly stitch together wrong IP fragments that did not belong to the same datagram. This would lead to a packet loss or data corruption. Signed-off-by: Ansis Atteka Signed-off-by: David S. Miller --- drivers/net/ppp/pptp.c | 2 +- include/net/ip.h | 12 ++++++++---- net/ipv4/igmp.c | 4 ++-- net/ipv4/inetpeer.c | 4 ++-- net/ipv4/ip_output.c | 6 +++--- net/ipv4/ipmr.c | 2 +- net/ipv4/raw.c | 2 +- net/ipv4/xfrm4_mode_tunnel.c | 2 +- net/netfilter/ipvs/ip_vs_xmit.c | 2 +- 9 files changed, 20 insertions(+), 16 deletions(-) (limited to 'net') diff --git a/drivers/net/ppp/pptp.c b/drivers/net/ppp/pptp.c index 6fa5ae00039f..01805319e1e0 100644 --- a/drivers/net/ppp/pptp.c +++ b/drivers/net/ppp/pptp.c @@ -281,7 +281,7 @@ static int pptp_xmit(struct ppp_channel *chan, struct sk_buff *skb) nf_reset(skb); skb->ip_summed = CHECKSUM_NONE; - ip_select_ident(iph, &rt->dst, NULL); + ip_select_ident(skb, &rt->dst, NULL); ip_send_check(iph); ip_local_out(skb); diff --git a/include/net/ip.h b/include/net/ip.h index 48f55979d842..5e5268807a1c 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -264,9 +264,11 @@ int ip_dont_fragment(struct sock *sk, struct dst_entry *dst) extern void __ip_select_ident(struct iphdr *iph, struct dst_entry *dst, int more); -static inline void ip_select_ident(struct iphdr *iph, struct dst_entry *dst, struct sock *sk) +static inline void ip_select_ident(struct sk_buff *skb, struct dst_entry *dst, struct sock *sk) { - if (iph->frag_off & htons(IP_DF)) { + struct iphdr *iph = ip_hdr(skb); + + if ((iph->frag_off & htons(IP_DF)) && !skb->local_df) { /* This is only to work around buggy Windows95/2000 * VJ compression implementations. If the ID field * does not change, they drop every other packet in @@ -278,9 +280,11 @@ static inline void ip_select_ident(struct iphdr *iph, struct dst_entry *dst, str __ip_select_ident(iph, dst, 0); } -static inline void ip_select_ident_more(struct iphdr *iph, struct dst_entry *dst, struct sock *sk, int more) +static inline void ip_select_ident_more(struct sk_buff *skb, struct dst_entry *dst, struct sock *sk, int more) { - if (iph->frag_off & htons(IP_DF)) { + struct iphdr *iph = ip_hdr(skb); + + if ((iph->frag_off & htons(IP_DF)) && !skb->local_df) { if (sk && inet_sk(sk)->inet_daddr) { iph->id = htons(inet_sk(sk)->inet_id); inet_sk(sk)->inet_id += 1 + more; diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c index d6c0e64ec97f..dace87f06e5f 100644 --- a/net/ipv4/igmp.c +++ b/net/ipv4/igmp.c @@ -369,7 +369,7 @@ static struct sk_buff *igmpv3_newpack(struct net_device *dev, int size) pip->saddr = fl4.saddr; pip->protocol = IPPROTO_IGMP; pip->tot_len = 0; /* filled in later */ - ip_select_ident(pip, &rt->dst, NULL); + ip_select_ident(skb, &rt->dst, NULL); ((u8 *)&pip[1])[0] = IPOPT_RA; ((u8 *)&pip[1])[1] = 4; ((u8 *)&pip[1])[2] = 0; @@ -714,7 +714,7 @@ static int igmp_send_report(struct in_device *in_dev, struct ip_mc_list *pmc, iph->daddr = dst; iph->saddr = fl4.saddr; iph->protocol = IPPROTO_IGMP; - ip_select_ident(iph, &rt->dst, NULL); + ip_select_ident(skb, &rt->dst, NULL); ((u8 *)&iph[1])[0] = IPOPT_RA; ((u8 *)&iph[1])[1] = 4; ((u8 *)&iph[1])[2] = 0; diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c index 000e3d239d64..33d5537881ed 100644 --- a/net/ipv4/inetpeer.c +++ b/net/ipv4/inetpeer.c @@ -32,8 +32,8 @@ * At the moment of writing this notes identifier of IP packets is generated * to be unpredictable using this code only for packets subjected * (actually or potentially) to defragmentation. I.e. DF packets less than - * PMTU in size uses a constant ID and do not use this code (see - * ip_select_ident() in include/net/ip.h). + * PMTU in size when local fragmentation is disabled use a constant ID and do + * not use this code (see ip_select_ident() in include/net/ip.h). * * Route cache entries hold references to our nodes. * New cache entries get references via lookup by destination IP address in diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index eae2e262fbe5..a04d872c54f9 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -148,7 +148,7 @@ int ip_build_and_send_pkt(struct sk_buff *skb, struct sock *sk, iph->daddr = (opt && opt->opt.srr ? opt->opt.faddr : daddr); iph->saddr = saddr; iph->protocol = sk->sk_protocol; - ip_select_ident(iph, &rt->dst, sk); + ip_select_ident(skb, &rt->dst, sk); if (opt && opt->opt.optlen) { iph->ihl += opt->opt.optlen>>2; @@ -386,7 +386,7 @@ packet_routed: ip_options_build(skb, &inet_opt->opt, inet->inet_daddr, rt, 0); } - ip_select_ident_more(iph, &rt->dst, sk, + ip_select_ident_more(skb, &rt->dst, sk, (skb_shinfo(skb)->gso_segs ?: 1) - 1); skb->priority = sk->sk_priority; @@ -1324,7 +1324,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk, iph->ttl = ttl; iph->protocol = sk->sk_protocol; ip_copy_addrs(iph, fl4); - ip_select_ident(iph, &rt->dst, sk); + ip_select_ident(skb, &rt->dst, sk); if (opt) { iph->ihl += opt->optlen>>2; diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c index 9ae54b09254f..62212c772a4b 100644 --- a/net/ipv4/ipmr.c +++ b/net/ipv4/ipmr.c @@ -1658,7 +1658,7 @@ static void ip_encap(struct sk_buff *skb, __be32 saddr, __be32 daddr) iph->protocol = IPPROTO_IPIP; iph->ihl = 5; iph->tot_len = htons(skb->len); - ip_select_ident(iph, skb_dst(skb), NULL); + ip_select_ident(skb, skb_dst(skb), NULL); ip_send_check(iph); memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt)); diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index a86c7ae71881..bfec521c717f 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -387,7 +387,7 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4, iph->check = 0; iph->tot_len = htons(length); if (!iph->id) - ip_select_ident(iph, &rt->dst, NULL); + ip_select_ident(skb, &rt->dst, NULL); iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl); } diff --git a/net/ipv4/xfrm4_mode_tunnel.c b/net/ipv4/xfrm4_mode_tunnel.c index eb1dd4d643f2..b5663c37f089 100644 --- a/net/ipv4/xfrm4_mode_tunnel.c +++ b/net/ipv4/xfrm4_mode_tunnel.c @@ -117,7 +117,7 @@ static int xfrm4_mode_tunnel_output(struct xfrm_state *x, struct sk_buff *skb) top_iph->frag_off = (flags & XFRM_STATE_NOPMTUDISC) ? 0 : (XFRM_MODE_SKB_CB(skb)->frag_off & htons(IP_DF)); - ip_select_ident(top_iph, dst->child, NULL); + ip_select_ident(skb, dst->child, NULL); top_iph->ttl = ip4_dst_hoplimit(dst->child); diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c index b75ff6429a04..c47444e4cf8c 100644 --- a/net/netfilter/ipvs/ip_vs_xmit.c +++ b/net/netfilter/ipvs/ip_vs_xmit.c @@ -883,7 +883,7 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp, iph->daddr = cp->daddr.ip; iph->saddr = saddr; iph->ttl = old_iph->ttl; - ip_select_ident(iph, &rt->dst, NULL); + ip_select_ident(skb, &rt->dst, NULL); /* Another hack: avoid icmp_send in ip_fragment */ skb->local_df = 1; -- cgit v1.2.3-59-g8ed1b From d0fe8c888b1fd1a2f84b9962cabcb98a70988aec Mon Sep 17 00:00:00 2001 From: Nikolay Aleksandrov Date: Thu, 19 Sep 2013 15:02:35 +0200 Subject: netpoll: fix NULL pointer dereference in netpoll_cleanup I've been hitting a NULL ptr deref while using netconsole because the np->dev check and the pointer manipulation in netpoll_cleanup are done without rtnl and the following sequence happens when having a netconsole over a vlan and we remove the vlan while disabling the netconsole: CPU 1 CPU2 removes vlan and calls the notifier enters store_enabled(), calls netdev_cleanup which checks np->dev and then waits for rtnl executes the netconsole netdev release notifier making np->dev == NULL and releases rtnl continues to dereference a member of np->dev which at this point is == NULL Signed-off-by: Nikolay Aleksandrov Signed-off-by: David S. Miller --- net/core/netpoll.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) (limited to 'net') diff --git a/net/core/netpoll.c b/net/core/netpoll.c index c3c7b27c112d..fc75c9e461b8 100644 --- a/net/core/netpoll.c +++ b/net/core/netpoll.c @@ -1284,15 +1284,14 @@ EXPORT_SYMBOL_GPL(__netpoll_free_async); void netpoll_cleanup(struct netpoll *np) { - if (!np->dev) - return; - rtnl_lock(); + if (!np->dev) + goto out; __netpoll_cleanup(np); - rtnl_unlock(); - dev_put(np->dev); np->dev = NULL; +out: + rtnl_unlock(); } EXPORT_SYMBOL(netpoll_cleanup); -- cgit v1.2.3-59-g8ed1b