aboutsummaryrefslogtreecommitdiffstats
path: root/net (follow)
AgeCommit message (Collapse)AuthorFilesLines
2013-10-19batman-adv: use vid when computing local and global TT CRCAntonio Quartulli1-4/+31
now that each TT entry is characterised by a VLAN ID, the latter has to be taken into consideration when computing the local/global table CRC as it would be theoretically possible to have the same client in two different VLANs Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-19batman-adv: add the VLAN ID attribute to the TT entryAntonio Quartulli13-135/+312
To make the translation table code VLAN-aware, each entry must carry the VLAN ID which it belongs to. This patch adds such attribute to the related TT structures. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-19batman-adv: update email address for Marek LindnerMarek Lindner1-1/+1
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2013-10-19batman-adv: update email address for Simon WunderlichSimon Wunderlich1-1/+1
My university will stop email service for alumni in january 2014, please use my new e-mail address instead. Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2013-10-19batman-adv: check skb preparation return valueAntonio Quartulli1-3/+6
Fix bogus merge conflict resolution by checking the return values of the skb preparation routines. Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2013-10-18em_ipset: use dev_net() accessorstephen hemminger1-2/+2
Randy found that if network namespace not enabled then nd_net does not exist and would cause compilation failure. This is handled correctly by using the dev_net() macro. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18tcp: remove redundant code in __tcp_retransmit_skb()Neal Cardwell1-15/+0
Remove the specialized code in __tcp_retransmit_skb() that tries to trim any ACKed payload preceding a FIN before we retransmit (this was added in 1999 in v2.2.3pre3). This trimming code was made unreachable by the more general code added above it that uses tcp_trim_head() to trim any ACKed payload, with or without a FIN (this was added in "[NET]: Add segmentation offload support to TCP." in 2002 circa v2.5.33). Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18bridge: Fix updating FDB entries when the PVID is appliedToshiaki Makita1-0/+1
We currently set the value that variable vid is pointing, which will be used in FDB later, to 0 at br_allowed_ingress() when we receive untagged or priority-tagged frames, even though the PVID is valid. This leads to FDB updates in such a wrong way that they are learned with VID 0. Update the value to that of PVID if the PVID is applied. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Reviewed-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18bridge: Fix the way the PVID is referencedToshiaki Makita1-3/+1
We are using the VLAN_TAG_PRESENT bit to detect whether the PVID is set or not at br_get_pvid(), while we don't care about the bit in adding/deleting the PVID, which makes it impossible to forward any incomming untagged frame with vlan_filtering enabled. Since vid 0 cannot be used for the PVID, we can use vid 0 to indicate that the PVID is not set, which is slightly more efficient than using the VLAN_TAG_PRESENT. Fix the problem by getting rid of using the VLAN_TAG_PRESENT. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Reviewed-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18bridge: Apply the PVID to priority-tagged framesToshiaki Makita1-7/+20
IEEE 802.1Q says that when we receive priority-tagged (VID 0) frames use the PVID for the port as its VID. (See IEEE 802.1Q-2011 6.9.1 and Table 9-2) Apply the PVID to not only untagged frames but also priority-tagged frames. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Reviewed-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18bridge: Don't use VID 0 and 4095 in vlan filteringToshiaki Makita3-54/+49
IEEE 802.1Q says that: - VID 0 shall not be configured as a PVID, or configured in any Filtering Database entry. - VID 4095 shall not be configured as a PVID, or transmitted in a tag header. This VID value may be used to indicate a wildcard match for the VID in management operations or Filtering Database entries. (See IEEE 802.1Q-2011 6.9.1 and Table 9-2) Don't accept adding these VIDs in the vlan_filtering implementation. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Reviewed-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18fib: Use const struct nl_info * in rtmsg_fibJoe Perches2-2/+2
The rtmsg_fib function doesn't modify this argument so mark it const. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18ax25: cleanup a range testDan Carpenter1-1/+1
The current test works fine in practice. The "amount" variable is actually used as a boolean so negative values or any non-zero values count as "true". However since we don't allow numbers greater than one, let's not allow negative numbers either. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18fib_trie: remove duplicated rcu lockbaker.zhang1-2/+0
fib_table_lookup has included the rcu lock protection. Signed-off-by: baker.zhang <baker.kernel@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-nextDavid S. Miller5-2/+1139
Steffen Klassert says: ==================== 1) Don't use a wildcard SA if a more precise one is in acquire state, from Fan Du. 2) Simplify the SA lookup when using wildcard source. We need to check only the destination in this case, from Fan Du. 3) Add a receive path hook for IPsec virtual tunnel interfaces to xfrm6_mode_tunnel. 4) Add support for IPsec virtual tunnel interfaces to ipv6. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18tcp: rename tcp_tso_segment()Eric Dumazet2-4/+4
Rename tcp_tso_segment() to tcp_gso_segment(), to better reflect what is going on, and ease grep games. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18tipc: simplify the link lookup routineErik Hugne1-97/+13
When checking statistics or changing parameters on a link, the link_find_link function is used to locate the link with a given name. The complex method of deconstructing the name into local and remote address/interface is error prone and may fail if the interface names contains special characters. We change the lookup method to iterate over the list of nodes and compare the link names. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18tipc: correct return value of link_cmd_set_value routineYing Xue1-9/+19
link_cmd_set_value() takes commands for link, bearer and media related configuration. Genereally the function returns 0 when a command is recognized, and -EINVAL when it is not. However, in the switch for link related commands it returns 0 even when the command is unrecognized. This will sometimes make it look as if a failed configuration command has been successful, but has otherwise no negative effects. We remove this anomaly by returning -EINVAL even for link commands. We also rework all three switches to make them conforming to common kernel coding style. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18tipc: correct return value of recv_msg routineYing Xue2-6/+6
Currently, rcv_msg() always returns zero on a packet delivery upcall from net_device. To make its behavior more compliant with the way this API should be used, we change this to let it return NET_RX_SUCCESS (which is zero anyway) when it is able to handle the packet, and NET_RX_DROP otherwise. The latter does not imply any functional change, it only enables the driver to keep more accurate statistics about the fate of delivered packets. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18tipc: avoid unnecessary lookup for tipc bearer instanceYing Xue4-18/+10
tipc_block_bearer() currently takes a bearer name (const char*) as argument. This requires the function to make a lookup to find the pointer to the corresponding bearer struct. In the current code base this is not necessary, since the only two callers (tipc_continue(),recv_notification()) already have validated copies of this pointer, and hence can pass it directly in the function call. We change tipc_block_bearer() to directly take struct tipc_bearer* as argument instead. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18tipc: make bearer and media naming consistentYing Xue4-57/+57
TIPC 'bearer' exists as an abstract concept, while 'media' is deemed a specific implementation of a bearer, such as Ethernet or Infiniband media. When a component inside TIPC wants to control a specific media, it only needs to access the generic bearer API to achieve this. However, in the current media implementations, the 'bearer' name is also extensively used in media specific function and variable names. This may create confusion, so we choose to replace the term 'bearer' with 'media' in all function names, variable names, and prefixes where this is what really is meant. Note that this change is cosmetic only, and no runtime behaviour changes are made here. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18tipc: silence sparse warningsYing Xue2-5/+5
Eliminate below sparse warnings: net/tipc/link.c:1210:37: warning: cast removes address space of expression net/tipc/link.c:1218:59: warning: incorrect type in argument 2 (different address spaces) net/tipc/link.c:1218:59: expected void const [noderef] <asn:1>*from net/tipc/link.c:1218:59: got unsigned char const [usertype] *[assigned] sect_crs net/tipc/socket.c:341:49: warning: Using plain integer as NULL pointer net/tipc/socket.c:1371:36: warning: Using plain integer as NULL pointer net/tipc/socket.c:1694:57: warning: Using plain integer as NULL pointer Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Andreas Bofjäll <andreas.bofjall@ericsson.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18tipc: remove iovec length parameter from all sending functionsYing Xue7-78/+49
tipc_msg_build() now copies message data from iovec to skb_buff using memcpy_fromiovecend(), which doesn't need to be passed the iovec length to perform the copying. So we remove the parameter indicating iovec length in all functions where TIPC messages are built and sent. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18tipc: don't use memcpy to copy from user spaceYing Xue1-13/+9
tipc_msg_build() calls skb_copy_to_linear_data_offset() to copy data from user space to kernel space. However, the latter function does in its turn call memcpy() to perform the actual copying. This poses an obvious security and robustness risk, since memcpy() never makes any validity check on the pointer it is copying from. To correct this, we the replace the offending function call with a call to memcpy_fromiovecend(), which uses copy_from_user() to perform the copying. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18net: refactor sk_page_frag_refill()Eric Dumazet1-4/+23
While working on virtio_net new allocation strategy to increase payload/truesize ratio, we found that refactoring sk_page_frag_refill() was needed. This patch splits sk_page_frag_refill() into two parts, adding skb_page_frag_refill() which can be used without a socket. While we are at it, add a minimum frag size of 32 for sk_page_frag_refill() Michael will either use netdev_alloc_frag() from softirq context, or skb_page_frag_refill() from process context in refill_work() (GFP_KERNEL allocations) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Michael Dalton <mwdalton@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-nextDavid S. Miller26-542/+1237
John W. Linville says: ==================== This is a batch of updates intended for the 3.13 stream... The biggest item of interest in here is wcn36xx, the new mac80211 driver for Qualcomm WCN3660/WCN3680 hardware. Regarding the mac80211 bits, Johannes says: "We have an assortment of cleanups and new features, of which the biggest one is probably the channel-switch support in IBSS. Nothing else really stands out much." On top of that, the ath9k and rt2x00 get a lot of update action from Felix Fietkau and Gabor Juhos, respectively. There are a handful of updates to other drivers here and there as well. Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17bridge: Correctly clamp MAX forward_delay when enabling STPVlad Yasevich1-1/+1
Commit be4f154d5ef0ca147ab6bcd38857a774133f5450 bridge: Clamp forward_delay when enabling STP had a typo when attempting to clamp maximum forward delay. It is possible to set bridge_forward_delay to be higher then permitted maximum when STP is off. When turning STP on, the higher then allowed delay has to be clamed down to max value. CC: Herbert Xu <herbert@gondor.apana.org.au> CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Reviewed-by: Veaceslav Falico <vfalico@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17ipv4: shrink rt_cache_statEric Dumazet1-8/+8
Half of the rt_cache_stat fields are no longer used after IP route cache removal, lets shrink this per cpu area. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17tcp: remove the sk_can_gso() check from tcp_set_skb_tso_segs()Eric Dumazet1-2/+1
sk_can_gso() should only be used as a hint in tcp_sendmsg() to build GSO packets in the first place. (As a performance hint) Once we have GSO packets in write queue, we can not decide they are no longer GSO only because flow now uses a route which doesn't handle TSO/GSO. Core networking stack handles the case very well for us, all we need is keeping track of packet counts in MSS terms, regardless of segmentation done later (in GSO or hardware) Right now, if tcp_fragment() splits a GSO packet in two parts, @left and @right, and route changed through a non GSO device, both @left and @right have pcount set to 1, which is wrong, and leads to incorrect packet_count tracking. This problem was added in commit d5ac99a648 ("[TCP]: skb pcount with MTU discovery") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Reported-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17tcp: must unclone packets before mangling themEric Dumazet1-3/+6
TCP stack should make sure it owns skbs before mangling them. We had various crashes using bnx2x, and it turned out gso_size was cleared right before bnx2x driver was populating TC descriptor of the _previous_ packet send. TCP stack can sometime retransmit packets that are still in Qdisc. Of course we could make bnx2x driver more robust (using ACCESS_ONCE(shinfo->gso_size) for example), but the bug is TCP stack. We have identified two points where skb_unclone() was needed. This patch adds a WARN_ON_ONCE() to warn us if we missed another fix of this kind. Kudos to Neal for finding the root cause of this bug. Its visible using small MSS. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wirelessDavid S. Miller10-5/+43
John W. Linville says: ==================== Please pull this batch of fixes intended for the 3.12 stream! For the mac80211 bits, Johannes says: "Jouni fixes a remain-on-channel vs. scan bug, and Felix fixes client TX probing on VLANs." And also: "This time I have two fixes from Emmanuel for RF-kill issues, and fixed two issues reported by Evan Huus and Thomas Lindroth respectively." On top of those... Avinash Patil adds a couple of mwifiex fixes to properly inform cfg80211 about some different types of disconnects, avoiding WARNINGs. Mark Cave-Ayland corrects a pointer arithmetic problem in rtlwifi, avoiding incorrect automatic gain calculations. Solomon Peachy sends a cw1200 fix for locking around calls to cw1200_irq_handler, addressing "lost interrupt" problems. Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17tcp: fix incorrect ca_state in tail loss probeYuchung Cheng1-1/+1
On receiving an ACK that covers the loss probe sequence, TLP immediately sets the congestion state to Open, even though some packets are not recovered and retransmisssion are on the way. The later ACks may trigger a WARN_ON check in step D of tcp_fastretrans_alert(), e.g., https://bugzilla.redhat.com/show_bug.cgi?id=989251 The fix is to follow the similar procedure in recovery by calling tcp_try_keep_open(). The sender switches to Open state if no packets are retransmissted. Otherwise it goes to Disorder and let subsequent ACKs move the state to Recovery or Open. Reported-By: Michael Sterrett <michael@sterretts.net> Tested-By: Dormando <dormando@rydia.net> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17sctp: Perform software checksum if packet has to be fragmented.Vlad Yasevich1-1/+1
IP/IPv6 fragmentation knows how to compute only TCP/UDP checksum. This causes problems if SCTP packets has to be fragmented and ipsummed has been set to PARTIAL due to checksum offload support. This condition can happen when retransmitting after MTU discover, or when INIT or other control chunks are larger then MTU. Check for the rare fragmentation condition in SCTP and use software checksum calculation in this case. CC: Fan Du <fan.du@windriver.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17sctp: Use software crc32 checksum when xfrm transform will happen.Fan Du1-1/+2
igb/ixgbe have hardware sctp checksum support, when this feature is enabled and also IPsec is armed to protect sctp traffic, ugly things happened as xfrm_output checks CHECKSUM_PARTIAL to do checksum operation(sum every thing up and pack the 16bits result in the checksum field). The result is fail establishment of sctp communication. Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17Merge branch 'net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftablesDavid S. Miller63-120/+8785
Pablo Neira Ayuso says: ==================== netfilter updates: nf_tables pull request The following patchset contains the current original nf_tables tree condensed in 17 patches. I have organized them by chronogical order since the original nf_tables code was released in 2009 and by dependencies between the different patches. The patches are: 1) Adapt all existing hooks in the tree to pass hook ops to the hook callback function, required by nf_tables, from Patrick McHardy. 2) Move alloc_null_binding to nf_nat_core, as it is now also needed by nf_tables and ip_tables, original patch from Patrick McHardy but required major changes to adapt it to the current tree that I made. 3) Add nf_tables core, including the netlink API, the packet filtering engine, expressions and built-in tables, from Patrick McHardy. This patch includes accumulated fixes since 2009 and minor enhancements. The patch description contains a list of references to the original patches for the record. For those that are not familiar to the original work, see [1], [2] and [3]. 4) Add netlink set API, this replaces the original set infrastructure to introduce a netlink API to add/delete sets and to add/delete set elements. This includes two set types: the hash and the rb-tree sets (used for interval based matching). The main difference with ipset is that this infrastructure is data type agnostic. Patch from Patrick McHardy. 5) Allow expression operation overload, this API change allows us to provide define expression subtypes depending on the configuration that is received from user-space via Netlink. It is used by follow up patches to provide optimized versions of the payload and cmp expressions and the x_tables compatibility layer, from Patrick McHardy. 6) Add optimized data comparison operation, it requires the previous patch, from Patrick McHardy. 7) Add optimized payload implementation, it requires patch 5, from Patrick McHardy. 8) Convert built-in tables to chain types. Each chain type have special semantics (filter, route and nat) that are used by userspace to configure the chain behaviour. The main chain regarding iptables is that tables become containers of chain, with no specific semantics. However, you may still configure your tables and chains to retain iptables like semantics, patch from me. 9) Add compatibility layer for x_tables. This patch adds support to use all existing x_tables extensions from nf_tables, this is used to provide a userspace utility that accepts iptables syntax but used internally the nf_tables kernel core. This patch includes missing features in the nf_tables core such as the per-chain stats, default chain policy and number of chain references, which are required by the iptables compatibility userspace tool. Patch from me. 10) Fix transport protocol matching, this fix is a side effect of the x_tables compatibility layer, which now provides a pointer to the transport header, from me. 11) Add support for dormant tables, this feature allows you to disable all chains and rules that are contained in one table, from me. 12) Add IPv6 NAT support. At the time nf_tables was made, there was no NAT IPv6 support yet, from Tomasz Bursztyka. 13) Complete net namespace support. This patch register the protocol family per net namespace, so tables (thus, other objects contained in tables such as sets, chains and rules) are only visible from the corresponding net namespace, from me. 14) Add the insert operation to the nf_tables netlink API, this requires adding a new position attribute that allow us to locate where in the ruleset a rule needs to be inserted, from Eric Leblond. 15) Add rule batching support, including atomic rule-set updates by using rule-set generations. This patch includes a change to nfnetlink to include two new control messages to indicate the beginning and the end of a batch. The end message is interpreted as the commit message, if it's missing, then the rule-set updates contained in the batch are aborted, from me. 16) Add trace support to the nf_tables packet filtering core, from me. 17) Add ARP filtering support, original patch from Patrick McHardy, but adapted to fit into the chain type infrastructure. This was recovered to be used by nft userspace tool and our compatibility arptables userspace tool. There is still work to do to fully replace x_tables [4] [5] but that can be done incrementally by extending our netlink API. Moreover, looking at netfilter-devel and the amount of contributions to nf_tables we've been getting, I think it would be good to have it mainstream to avoid accumulating large patchsets skip continuous rebases. I tried to provide a reasonable patchset, we have more than 100 accumulated patches in the original nf_tables tree, so I collapsed many of the small fixes to the main patch we had since 2009 and provide a small batch for review to netdev, while trying to retain part of the history. For those who didn't give a try to nf_tables yet, there's a quick howto available from Eric Leblond that describes how to get things working [6]. Comments/reviews welcome. Thanks! [1] http://lwn.net/Articles/324251/ [2] http://workshop.netfilter.org/2013/wiki/images/e/ee/Nftables-osd-2013-developer.pdf [3] http://lwn.net/Articles/564095/ [4] http://people.netfilter.org/pablo/map-pending-work.txt [4] http://people.netfilter.org/pablo/nftables-todo.txt [5] https://home.regit.org/netfilter-en/nftables-quick-howto/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17inet_diag: use sock_gen_put()Eric Dumazet1-6/+3
TCP listener refactoring, part 6 : Use sock_gen_put() from inet_diag_dump_one_icsk() for future SYN_RECV support. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davemJohn W. Linville26-542/+1237
2013-10-15Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davemJohn W. Linville10-5/+43
2013-10-14Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211John W. Linville6-3/+35
2013-10-14netfilter: nf_tables: add ARP filtering supportPablo Neira Ayuso3-0/+107
This patch registers the ARP family and he filter chain type for this family. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: nf_tables: add trace supportPablo Neira Ayuso2-0/+58
This patch adds support for tracing the packet travel through the ruleset, in a similar fashion to x_tables. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: nfnetlink: add batch support and use it from nf_tablesPablo Neira Ayuso3-22/+365
This patch adds a batch support to nfnetlink. Basically, it adds two new control messages: * NFNL_MSG_BATCH_BEGIN, that indicates the beginning of a batch, the nfgenmsg->res_id indicates the nfnetlink subsystem ID. * NFNL_MSG_BATCH_END, that results in the invocation of the ss->commit callback function. If not specified or an error ocurred in the batch, the ss->abort function is invoked instead. The end message represents the commit operation in nftables, the lack of end message results in an abort. This patch also adds the .call_batch function that is only called from the batch receival path. This patch adds atomic rule updates and dumps based on bitmask generations. This allows to atomically commit a set of rule-set updates incrementally without altering the internal state of existing nf_tables expressions/matches/targets. The idea consists of using a generation cursor of 1 bit and a bitmask of 2 bits per rule. Assuming the gencursor is 0, then the genmask (expressed as a bitmask) can be interpreted as: 00 active in the present, will be active in the next generation. 01 inactive in the present, will be active in the next generation. 10 active in the present, will be deleted in the next generation. ^ gencursor Once you invoke the transition to the next generation, the global gencursor is updated: 00 active in the present, will be active in the next generation. 01 active in the present, needs to zero its future, it becomes 00. 10 inactive in the present, delete now. ^ gencursor If a dump is in progress and nf_tables enters a new generation, the dump will stop and return -EBUSY to let userspace know that it has to retry again. In order to invalidate dumps, a global genctr counter is increased everytime nf_tables enters a new generation. This new operation can be used from the user-space utility that controls the firewall, eg. nft -f restore The rule updates contained in `file' will be applied atomically. cat file ----- add filter INPUT ip saddr 1.1.1.1 counter accept #1 del filter INPUT ip daddr 2.2.2.2 counter drop #2 -EOF- Note that the rule 1 will be inactive until the transition to the next generation, the rule 2 will be evicted in the next generation. There is a penalty during the rule update due to the branch misprediction in the packet matching framework. But that should be quickly resolved once the iteration over the commit list that contain rules that require updates is finished. Event notification happens once the rule-set update has been committed. So we skip notifications is case the rule-set update is aborted, which can happen in case that the rule-set is tested to apply correctly. This patch squashed the following patches from Pablo: * nf_tables: atomic rule updates and dumps * nf_tables: get rid of per rule list_head for commits * nf_tables: use per netns commit list * nfnetlink: add batch support and use it from nf_tables * nf_tables: all rule updates are transactional * nf_tables: attach replacement rule after stale one * nf_tables: do not allow deletion/replacement of stale rules * nf_tables: remove unused NFTA_RULE_FLAGS Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: nf_tables: add insert operationEric Leblond1-6/+32
This patch adds a new rule attribute NFTA_RULE_POSITION which is used to store the position of a rule relatively to the others. By providing the create command and specifying the position, the rule is inserted after the rule with the handle equal to the provided position. Regarding notification, the position attribute specifies the handle of the previous rule to make sure we don't point to any stale rule in notifications coming from the commit path. This patch includes the following fix from Pablo: * nf_tables: fix rule deletion event reporting Signed-off-by: Eric Leblond <eric@regit.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: nf_tables: complete net namespace supportPablo Neira Ayuso4-34/+146
Register family per netnamespace to ensure that sets are only visible in its approapriate namespace. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: nf_tables: Add support for IPv6 NATTomasz Bursztyka8-154/+447
This patch generalizes the NAT expression to support both IPv4 and IPv6 using the existing IPv4/IPv6 NAT infrastructure. This also adds the NAT chain type for IPv6. This patch collapses the following patches that were posted to the netfilter-devel mailing list, from Tomasz: * nf_tables: Change NFTA_NAT_ attributes to better semantic significance * nf_tables: Split IPv4 NAT into NAT expression and IPv4 NAT chain * nf_tables: Add support for IPv6 NAT expression * nf_tables: Add support for IPv6 NAT chain * nf_tables: Fix up build issue on IPv6 NAT support And, from Pablo Neira Ayuso: * fix missing dependencies in nft_chain_nat Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: nf_tables: add support for dormant tablesPablo Neira Ayuso1-7/+90
This patch allows you to temporarily disable an entire table. You can change the state of a dormant table via NFT_MSG_NEWTABLE messages. Using this operation you can wake up a table, so their chains are registered. This provides atomicity at chain level. Thus, the rule-set of one chain is applied at once, avoiding any possible intermediate state in every chain. Still, the chains that belongs to a table are registered consecutively. This also allows you to have inactive tables in the kernel. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: nf_tables: nft_payload: fix transport header basePablo Neira Ayuso2-2/+2
We cannot use skb->transport_header since it's unset, use pkt->xt.thoff instead. Now possible using information made available through the x_tables compatibility layer. Reported-by: Eric Leblond <eric@regit.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: nf_tables: add compatibility layer for x_tablesPablo Neira Ayuso13-70/+1078
This patch adds the x_tables compatibility layer. This allows you to use existing x_tables matches and targets from nf_tables. This compatibility later allows us to use existing matches/targets for features that are still missing in nf_tables. We can progressively replace them with native nf_tables extensions. It also provides the userspace compatibility software that allows you to express the rule-set using the iptables syntax but using the nf_tables kernel components. In order to get this compatibility layer working, I've done the following things: * add NFNL_SUBSYS_NFT_COMPAT: this new nfnetlink subsystem is used to query the x_tables match/target revision, so we don't need to use the native x_table getsockopt interface. * emulate xt structures: this required extending the struct nft_pktinfo to include the fragment offset, which is already obtained from ip[6]_tables and that is used by some matches/targets. * add support for default policy to base chains, required to emulate x_tables. * add NFTA_CHAIN_USE attribute to obtain the number of references to chains, required by x_tables emulation. * add chain packet/byte counters using per-cpu. * support 32-64 bits compat. For historical reasons, this patch includes the following patches that were posted in the netfilter-devel mailing list. From Pablo Neira Ayuso: * nf_tables: add default policy to base chains * netfilter: nf_tables: add NFTA_CHAIN_USE attribute * nf_tables: nft_compat: private data of target and matches in contiguous area * nf_tables: validate hooks for compat match/target * nf_tables: nft_compat: release cached matches/targets * nf_tables: x_tables support as a compile time option * nf_tables: fix alias for xtables over nftables module * nf_tables: add packet and byte counters per chain * nf_tables: fix per-chain counter stats if no counters are passed * nf_tables: don't bump chain stats * nf_tables: add protocol and flags for xtables over nf_tables * nf_tables: add ip[6]t_entry emulation * nf_tables: move specific layer 3 compat code to nf_tables_ipv[4|6] * nf_tables: support 32bits-64bits x_tables compat * nf_tables: fix compilation if CONFIG_COMPAT is disabled From Patrick McHardy: * nf_tables: move policy to struct nft_base_chain * nf_tables: send notifications for base chain policy changes From Alexander Primak: * nf_tables: remove the duplicate NF_INET_LOCAL_OUT From Nicolas Dichtel: * nf_tables: fix compilation when nf-netlink is a module Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: nf_tables: convert built-in tables/chains to chain typesPablo Neira Ayuso10-265/+197
This patch converts built-in tables/chains to chain types that allows you to deploy customized table and chain configurations from userspace. After this patch, you have to specify the chain type when creating a new chain: add chain ip filter output { type filter hook input priority 0; } ^^^^ ------ The existing chain types after this patch are: filter, route and nat. Note that tables are just containers of chains with no specific semantics, which is a significant change with regards to iptables. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: nft_payload: add optimized payload implementation for small loadsPatrick McHardy2-28/+72
Add an optimized payload expression implementation for small (up to 4 bytes) aligned data loads from the linear packet area. This patch also includes original Patrick McHardy's entitled (nf_tables: inline nft_payload_fast_eval() into main evaluation loop). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>