aboutsummaryrefslogtreecommitdiffstats
path: root/net (follow)
AgeCommit message (Collapse)AuthorFilesLines
2013-03-20Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davemJohn W. Linville46-1824/+2799
2013-03-20dynticks: avoid flow_cache_flush() interrupting every coreChris Metcalf1-3/+39
Previously, if you did an "ifconfig down" or similar on one core, and the kernel had CONFIG_XFRM enabled, every core would be interrupted to check its percpu flow list for items that could be garbage collected. With this change, we generate a mask of cores that actually have any percpu items, and only interrupt those cores. When we are trying to isolate a set of cpus from interrupts, this is important to do. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20filter: add ANC_PAY_OFFSET instruction for loading payload start offsetDaniel Borkmann1-0/+5
It is very useful to do dynamic truncation of packets. In particular, we're interested to push the necessary header bytes to the user space and cut off user payload that should probably not be transferred for some reasons (e.g. privacy, speed, or others). With the ancillary extension PAY_OFFSET, we can load it into the accumulator, and return it. E.g. in bpfc syntax ... ld #poff ; { 0x20, 0, 0, 0xfffff034 }, ret a ; { 0x16, 0, 0, 0x00000000 }, ... as a filter will accomplish this without having to do a big hackery in a BPF filter itself. Follow-up JIT implementations are welcome. Thanks to Eric Dumazet for suggesting and discussing this during the Netfilter Workshop in Copenhagen. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20net: flow_dissector: add __skb_get_poff to get a start offset to payloadDaniel Borkmann1-0/+57
__skb_get_poff() returns the offset to the payload as far as it could be dissected. The main user is currently BPF, so that we can dynamically truncate packets without needing to push actual payload to the user space and instead can analyze headers only. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller48-373/+450
Pull in the 'net' tree to get Daniel Borkmann's flow dissector infrastructure change. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20net/irda: add missing error path release_sock callKees Cook1-2/+4
This makes sure that release_sock is called for all error conditions in irda_getsockopt. Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Brad Spengler <spender@grsecurity.net> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20ipconfig: Fix newline handling in log message.Martin Fuzzey1-1/+2
When using ipconfig the logs currently look like: Single name server: [ 3.467270] IP-Config: Complete: [ 3.470613] device=eth0, hwaddr=ac:de:48:00:00:01, ipaddr=172.16.42.2, mask=255.255.255.0, gw=172.16.42.1 [ 3.480670] host=infigo-1, domain=, nis-domain=(none) [ 3.486166] bootserver=172.16.42.1, rootserver=172.16.42.1, rootpath= [ 3.492910] nameserver0=172.16.42.1[ 3.496853] ALSA device list: Three name servers: [ 3.496949] IP-Config: Complete: [ 3.500293] device=eth0, hwaddr=ac:de:48:00:00:01, ipaddr=172.16.42.2, mask=255.255.255.0, gw=172.16.42.1 [ 3.510367] host=infigo-1, domain=, nis-domain=(none) [ 3.515864] bootserver=172.16.42.1, rootserver=172.16.42.1, rootpath= [ 3.522635] nameserver0=172.16.42.1, nameserver1=172.16.42.100 [ 3.529149] , nameserver2=172.16.42.200 Fix newline handling for these cases Signed-off-by: Martin Fuzzey <mfuzzey@parkeon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20flow_keys: include thoff into flow_keys for later usageDaniel Borkmann1-0/+2
In skb_flow_dissect(), we perform a dissection of a skbuff. Since we're doing the work here anyway, also store thoff for a later usage, e.g. in the BPF filter. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20l2tp: unhash l2tp sessions on delete, not on freeTom Parkin3-50/+38
If we postpone unhashing of l2tp sessions until the structure is freed, we risk: 1. further packets arriving and getting queued while the pseudowire is being closed down 2. the recv path hitting "scheduling while atomic" errors in the case that recv drops the last reference to a session and calls l2tp_session_free while in atomic context As such, l2tp sessions should be unhashed from l2tp_core data structures early in the teardown process prior to calling pseudowire close. For pseudowires like l2tp_ppp which have multiple shutdown codepaths, provide an unhash hook. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20l2tp: avoid deadlock in l2tp stats updateTom Parkin5-147/+93
l2tp's u64_stats writers were incorrectly synchronised, making it possible to deadlock a 64bit machine running a 32bit kernel simply by sending the l2tp code netlink commands while passing data through l2tp sessions. Previous discussion on netdev determined that alternative solutions such as spinlock writer synchronisation or per-cpu data would bring unjustified overhead, given that most users interested in high volume traffic will likely be running 64bit kernels on 64bit hardware. As such, this patch replaces l2tp's use of u64_stats with atomic_long_t, thereby avoiding the deadlock. Ref: http://marc.info/?l=linux-netdev&m=134029167910731&w=2 http://marc.info/?l=linux-netdev&m=134079868111131&w=2 Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20l2tp: push all ppp pseudowire shutdown through .release handlerTom Parkin1-43/+10
If userspace deletes a ppp pseudowire using the netlink API, either by directly deleting the session or by deleting the tunnel that contains the session, we need to tear down the corresponding pppox channel. Rather than trying to manage two pppox unbind codepaths, switch the netlink and l2tp_core session_close handlers to close via. the l2tp_ppp socket .release handler. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20l2tp: purge session reorder queue on deleteTom Parkin1-0/+4
Add calls to l2tp_session_queue_purge as a part of l2tp_tunnel_closeall and l2tp_session_delete. Pseudowire implementations which are deleted only via. l2tp_core l2tp_session_delete calls can dispense with their own code for flushing the reorder queue. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20l2tp: add session reorder queue purge function to coreTom Parkin2-0/+18
If an l2tp session is deleted, it is necessary to delete skbs in-flight on the session's reorder queue before taking it down. Rather than having each pseudowire implementation reaching into the l2tp_session struct to handle this itself, provide a function in l2tp_core to purge the session queue. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20l2tp: don't BUG_ON sk_socket being NULLTom Parkin1-8/+10
It is valid for an existing struct sock object to have a NULL sk_socket pointer, so don't BUG_ON in l2tp_tunnel_del_work if that should occur. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20l2tp: take a reference for kernel sockets in l2tp_tunnel_sock_lookupTom Parkin1-0/+2
When looking up the tunnel socket in struct l2tp_tunnel, hold a reference whether the socket was created by the kernel or by userspace. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20l2tp: close sessions before initiating tunnel deleteTom Parkin1-0/+1
When a user deletes a tunnel using netlink, all the sessions in the tunnel should also be deleted. Since running sessions will pin the tunnel socket with the references they hold, have the l2tp_tunnel_delete close all sessions in a tunnel before finally closing the tunnel socket. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20l2tp: close sessions in ip socket destroy callbackTom Parkin2-0/+13
l2tp_core hooks UDP's .destroy handler to gain advance warning of a tunnel socket being closed from userspace. We need to do the same thing for IP-encapsulation sockets. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20l2tp: export l2tp_tunnel_closeallTom Parkin2-2/+3
l2tp_core internally uses l2tp_tunnel_closeall to close all sessions in a tunnel when a UDP-encapsulation socket is destroyed. We need to do something similar for IP-encapsulation sockets. Export l2tp_tunnel_closeall as a GPL symbol to enable l2tp_ip and l2tp_ip6 to call it from their .destroy handlers. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20l2tp: add udp encap socket destroy handlerTom Parkin1-0/+14
L2TP sessions hold a reference to the tunnel socket to prevent it going away while sessions are still active. However, since tunnel destruction is handled by the sock sk_destruct callback there is a catch-22: a tunnel with sessions cannot be deleted since each session holds a reference to the tunnel socket. If userspace closes a managed tunnel socket, or dies, the tunnel will persist and it will be neccessary to individually delete the sessions using netlink commands. This is ugly. To prevent this occuring, this patch leverages the udp encapsulation socket destroy callback to gain early notification when the tunnel socket is closed. This allows us to safely close the sessions running in the tunnel, dropping the tunnel socket references in the process. The tunnel socket is then destroyed as normal, and the tunnel resources deallocated in sk_destruct. While we're at it, ensure that l2tp_tunnel_closeall correctly drops session references to allow the sessions to be deleted rather than leaking. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20udp: add encap_destroy callbackTom Parkin2-0/+15
Users of udp encapsulation currently have an encap_rcv callback which they can use to hook into the udp receive path. In situations where a encapsulation user allocates resources associated with a udp encap socket, it may be convenient to be able to also hook the proto .destroy operation. For example, if an encap user holds a reference to the udp socket, the destroy hook might be used to relinquish this reference. This patch adds a socket destroy hook into udp, which is set and enabled in the same way as the existing encap_rcv hook. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20genetlink: trigger BUG_ON if a group name is too longMasatake YAMATO1-0/+1
Trigger BUG_ON if a group name is longer than GENL_NAMSIZ. Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20Merge branch 'master' of git://1984.lsi.us.es/nfDavid S. Miller10-51/+51
Pablo Neira Ayuso says: ==================== The following patchset contains 7 Netfilter/IPVS fixes for 3.9-rc, they are: * Restrict IPv6 stateless NPT targets to the mangle table. Many users are complaining that this target does not work in the nat table, which is the wrong table for it, from Florian Westphal. * Fix possible use before initialization in the netns init path of several conntrack protocol trackers (introduced recently while improving conntrack netns support), from Gao Feng. * Fix incorrect initialization of copy_range in nfnetlink_queue, spotted by Eric Dumazet during the NFWS2013, patch from myself. * Fix wrong calculation of next SCTP chunk in IPVS, from Julian Anastasov. * Remove rcu_read_lock section in IPVS while calling ipv4_update_pmtu not required anymore after change introduced in 3.7, again from Julian. * Fix SYN looping in IPVS state sync if the backup is used a real server in DR/TUN modes, this required a new /proc entry to disable the director function when acting as backup, also from Julian. * Remove leftover IP_NF_QUEUE Kconfig after ip_queue removal, noted by Paul Bolle. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20netfilter: remove unused "config IP_NF_QUEUE"Paul Bolle1-13/+0
Kconfig symbol IP_NF_QUEUE is unused since commit d16cf20e2f2f13411eece7f7fb72c17d141c4a84 ("netfilter: remove ip_queue support"). Let's remove it too. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-19packet: packet fanout rollover during socket overloadWillem de Bruijn2-24/+88
Changes: v3->v2: rebase (no other changes) passes selftest v2->v1: read f->num_members only once fix bug: test rollover mode + flag Minimize packet drop in a fanout group. If one socket is full, roll over packets to another from the group. Maintain flow affinity during normal load using an rxhash fanout policy, while dispersing unexpected traffic storms that hit a single cpu, such as spoofed-source DoS flows. Rollover breaks affinity for flows arriving at saturated sockets during those conditions. The patch adds a fanout policy ROLLOVER that rotates between sockets, filling each socket before moving to the next. It also adds a fanout flag ROLLOVER. If passed along with any other fanout policy, the primary policy is applied until the chosen socket is full. Then, rollover selects another socket, to delay packet drop until the entire system is saturated. Probing sockets is not free. Selecting the last used socket, as rollover does, is a greedy approach that maximizes chance of success, at the cost of extreme load imbalance. In practice, with sufficiently long queues to absorb bursts, sockets are drained in parallel and load balance looks uniform in `top`. To avoid contention, scales counters with number of sockets and accesses them lockfree. Values are bounds checked to ensure correctness. Tested using an application with 9 threads pinned to CPUs, one socket per thread and sufficient busywork per packet operation to limits each thread to handling 32 Kpps. When sent 500 Kpps single UDP stream packets, a FANOUT_CPU setup processes 32 Kpps in total without this patch, 270 Kpps with the patch. Tested with read() and with a packet ring (V1). Also, passes psock_fanout.c unit test added to selftests. Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds23-56/+128
Pull networking fixes from David Miller: 1) Fix ARM BPF JIT handling of negative 'k' values, from Chen Gang. 2) Insufficient space reserved for bridge netlink values, fix from Stephen Hemminger. 3) Some dst_neigh_lookup*() callers don't interpret error pointer correctly, fix from Zhouyi Zhou. 4) Fix transport match in SCTP active_path loops, from Xugeng Zhang. 5) Fix qeth driver handling of multi-order SKB frags, from Frank Blaschka. 6) fec driver is missing napi_disable() call, resulting in crashes on unload, from Georg Hofmann. 7) Don't try to handle PMTU events on a listening socket, fix from Eric Dumazet. 8) Fix timestamp location calculations in IP option processing, from David Ward. 9) FIB_TABLE_HASHSZ setting is not controlled by the correct kconfig tests, from Denis V Lunev. 10) Fix TX descriptor push handling in SFC driver, from Ben Hutchings. 11) Fix isdn/hisax and tulip/de4x5 kconfig dependencies, from Arnd Bergmann. 12) bnx2x statistics don't handle 4GB rollover correctly, fix from Maciej Żenczykowski. 13) Openvswitch bug fixes for vport del/new error reporting, missing genlmsg_end() call in netlink processing, and mis-parsing of LLC/SNAP ethernet types. From Rich Lane. 14) SKB pfmemalloc state should only be propagated from the head page of a compound page, fix from Pavel Emelyanov. 15) Fix link handling in tg3 driver for 5715 chips when autonegotation is disabled. From Nithin Sujir. 16) Fix inverted test of cpdma_check_free_tx_desc return value in davinci_emac driver, from Mugunthan V N. 17) vlan_depth is incorrectly calculated in skb_network_protocol(), from Li RongQing. 18) Fix probing of Gobi 1K devices in qmi_wwan driver, and fix NCM device mode backwards compat in cdc_ncm driver. From Bjørn Mork. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits) inet: limit length of fragment queue hash table bucket lists qeth: Fix scatter-gather regression qeth: Fix invalid router settings handling qeth: delay feature trace tcp: dont handle MTU reduction on LISTEN socket bnx2x: fix occasional statistics off-by-4GB error vhost/net: fix heads usage of ubuf_info bridge: Add support for setting BR_ROOT_BLOCK flag. bnx2x: add missing napi deletion in error path drivers: net: ethernet: ti: davinci_emac: fix usage of cpdma_check_free_tx_desc() ethernet/tulip: DE4x5 needs VIRT_TO_BUS isdn: hisax: netjet requires VIRT_TO_BUS net: cdc_ncm, cdc_mbim: allow user to prefer NCM for backwards compatibility rtnetlink: Mask the rta_type when range checking Revert "ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionally" Fix dst_neigh_lookup/dst_neigh_lookup_skb return value handling bug smsc75xx: configuration help incorrectly mentions smsc95xx net: fec: fix missing napi_disable call net: fec: restart the FEC when PHY speed changes skb: Propagate pfmemalloc on skb from head page only ...
2013-03-19xfrm: use xfrm direction when lookup policyBaker Zhang1-2/+21
because xfrm policy direction has same value with corresponding flow direction, so this problem is covered. In xfrm_lookup and __xfrm_policy_check, flow_cache_lookup is used to accelerate the lookup. Flow direction is given to flow_cache_lookup by policy_to_flow_dir. When the flow cache is mismatched, callback 'resolver' is called. 'resolver' requires xfrm direction, so convert direction back to xfrm direction. Signed-off-by: Baker Zhang <baker.zhang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-19inet: limit length of fragment queue hash table bucket listsHannes Frederic Sowa4-16/+35
This patch introduces a constant limit of the fragment queue hash table bucket list lengths. Currently the limit 128 is choosen somewhat arbitrary and just ensures that we can fill up the fragment cache with empty packets up to the default ip_frag_high_thresh limits. It should just protect from list iteration eating considerable amounts of cpu. If we reach the maximum length in one hash bucket a warning is printed. This is implemented on the caller side of inet_frag_find to distinguish between the different users of inet_fragment.c. I dropped the out of memory warning in the ipv4 fragment lookup path, because we already get a warning by the slab allocator. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jesper Dangaard Brouer <jbrouer@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-19can: dump stack on protocol bugsOliver Hartkopp1-7/+4
The rework of the kernel hlist implementation "hlist: drop the node parameter from iterators" (b67bfe0d42cac56c512dd5da4b1b347a23f4b70a) created some fallout in the form of non matching comments and obsolete code. Additionally to the cleanup this patch adds a WARN() statement to catch the caller of the wrong filter removal request. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-19ipvs: remove extra rcu lockJulian Anastasov1-2/+0
In 3.7 we added code that uses ipv4_update_pmtu but after commit c5ae7d4192 (ipv4: must use rcu protection while calling fib_lookup) the RCU lock is not needed. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-03-19ipvs: add backup_only flag to avoid loopsJulian Anastasov2-4/+15
Dmitry Akindinov is reporting for a problem where SYNs are looping between the master and backup server when the backup server is used as real server in DR mode and has IPVS rules to function as director. Even when the backup function is enabled we continue to forward traffic and schedule new connections when the current master is using the backup server as real server. While this is not a problem for NAT, for DR and TUN method the backup server can not determine if a request comes from client or from director. To avoid such loops add new sysctl flag backup_only. It can be needed for DR/TUN setups that do not need backup and director function at the same time. When the backup function is enabled we stop any forwarding and pass the traffic to the local stack (real server mode). The flag disables the director function when the backup function is enabled. For setups that enable backup function for some virtual services and director function for other virtual services there should be another more complex solution to support DR/TUN mode, may be to assign per-virtual service syncid value, so that we can differentiate the requests. Reported-by: Dmitry Akindinov <dimak@stalker.com> Tested-by: German Myzovsky <lawyer@sipnet.ru> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-03-19ipvs: fix sctp chunk length orderJulian Anastasov1-7/+9
Fix wrong but non-fatal access to chunk length. sch->length should be in network order, next chunk should be aligned to 4 bytes. Problem noticed in sparse output. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-03-18tcp: dont handle MTU reduction on LISTEN socketEric Dumazet2-7/+14
When an ICMP ICMP_FRAG_NEEDED (or ICMPV6_PKT_TOOBIG) message finds a LISTEN socket, and this socket is currently owned by the user, we set TCP_MTU_REDUCED_DEFERRED flag in listener tsq_flags. This is bad because if we clone the parent before it had a chance to clear the flag, the child inherits the tsq_flags value, and next tcp_release_cb() on the child will decrement sk_refcnt. Result is that we might free a live TCP socket, as reported by Dormando. IPv4: Attempt to release TCP socket in state 1 Fix this issue by testing sk_state against TCP_LISTEN early, so that we set TCP_MTU_REDUCED_DEFERRED on appropriate sockets (not a LISTEN one) This bug was introduced in commit 563d34d05786 (tcp: dont drop MTU reduction indications) Reported-by: dormando <dormando@rydia.net> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-18net: Fix a comment typoKusanagi Kouichi1-1/+1
Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-18Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wirelessJohn W. Linville21-95/+232
Conflicts: net/nfc/llcp/llcp.c
2013-03-17tcp: Remove TCPCTChristoph Paasch12-701/+35
TCPCT uses option-number 253, reserved for experimental use and should not be used in production environments. Further, TCPCT does not fully implement RFC 6013. As a nice side-effect, removing TCPCT increases TCP's performance for very short flows: Doing an apache-benchmark with -c 100 -n 100000, sending HTTP-requests for files of 1KB size. before this patch: average (among 7 runs) of 20845.5 Requests/Second after: average (among 7 runs) of 21403.6 Requests/Second Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-17Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitchDavid S. Miller4-20/+12
Conflicts: net/openvswitch/vport-internal_dev.c Jesse Gross says: ==================== A couple of minor enhancements for net-next/3.10. The largest is an extension to allow variable length metadata to be passed to userspace with packets. There is a merge conflict in net/openvswitch/vport-internal_dev.c: A existing commit modifies internal_dev_mac_addr() and a new commit deletes it. The new one is correct, so you can just remove that function. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-17netpoll: use DEFINE_STATIC_SRCU() to define netpoll_srcuLai Jiangshan1-2/+1
DEFINE_STATIC_SRCU() defines srcu struct and do init at build time. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-17bridge: Add support for setting BR_ROOT_BLOCK flag.Vlad Yasevich1-0/+1
Most of the support was already there. The only thing that was missing was the call to set the flag. Add this call. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-17Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wirelessDavid S. Miller2-9/+55
John W. Linville says: ==================== On the NFC bits, Samuel says: "With this one we have: - A fix for properly decreasing socket ack log. - A timer and works cleanup upon NFC device removal. - A monitoroing socket cleanup round from llcp_socket_release. - A proper error report to pending sockets upon NFC device removal." Regarding the Bluetooth bits, Gustavo says: "I have these two patches for 3.9, these add support for two more devices to the bluetooth drivers." Along with those, we have a few wireless driver fixes... Bing Zhao provides an mwifiex to prevent an out-of-bounds memory access. John Crispin offers a Kconfig fix to enable some otherwise dead code in rt2x00. The correct symbols were added in -rc1 through a different tree, but the symbols for enabling the wireless driver didn't match. Larry Finger brings an rtlwifi fix for a scheduling while atomic bug, and another fix for a reassociation problem caused by failing to clear the BSSID after a disconnect. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-17vxlan: generalize forwarding tablesDavid Stevens1-1/+1
This patch generalizes VXLAN forwarding table entries allowing an administrator to: 1) specify multiple destinations for a given MAC 2) specify alternate vni's in the VXLAN header 3) specify alternate destination UDP ports 4) use multicast MAC addresses as fdb lookup keys 5) specify multicast destinations 6) specify the outgoing interface for forwarded packets The combination allows configuration of more complex topologies using VXLAN encapsulation. Changes since v1: rebase to 3.9.0-rc2 Signed-Off-By: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-17rtnetlink: Mask the rta_type when range checkingVlad Yasevich1-1/+1
Range/validity checks on rta_type in rtnetlink_rcv_msg() do not account for flags that may be set. This causes the function to return -EINVAL when flags are set on the type (for example NLA_F_NESTED). Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-16Revert "ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionally"Timo Teräs1-4/+1
This reverts commit 412ed94744d16806fbec3bd250fd94e71cde5a1f. The commit is wrong as tiph points to the outer IPv4 header which is installed at ipgre_header() and not the inner one which is protocol dependant. This commit broke succesfully opennhrp which use PF_PACKET socket with ETH_P_NHRP protocol. Additionally ssl_addr is set to the link-layer IPv4 address. This address is written by ipgre_header() to the skb earlier, and this is the IPv4 header tiph should point to - regardless of the inner protocol payload. Signed-off-by: Timo Teräs <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-15Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davemJohn W. Linville2-9/+55
2013-03-15ipv4: replace ip_fast_csum with csum_replace2Li RongQing1-3/+2
replace ip_fast_csum with csum_replace2 to save cpu cycles Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-15Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitchDavid S. Miller5-7/+12
Jesse Gross says: ==================== A few different bug fixes, including several for issues with userspace communication that have gone unnoticed up until now. These are intended for net/3.9. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-15VSOCK: Support VM sockets connected to the hypervisor.Reilly Grant2-3/+16
The resource ID used for VM socket control packets (0) is already used for the VMCI_GET_CONTEXT_ID hypercall so a new ID (15) must be used when the guest sends these datagrams to the hypervisor. The hypervisor context ID must also be removed from the internal blacklist. Signed-off-by: Reilly Grant <grantr@vmware.com> Acked-by: Andy King <acking@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-15netfilter: ip6t_NPT: restrict to mangle tableFlorian Westphal1-0/+2
As the translation is stateless, using it in nat table doesn't work (only initial packet is translated). filter table OUTPUT works but won't re-route the packet after translation. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-15netfilter: nfnetlink_queue: fix incorrect initialization of copy range fieldPablo Neira Ayuso1-1/+1
2^16 = 0xffff, not 0xfffff (note the extra 'f'). Not dangerous since you adjust it to min_t(data_len, skb->len) just after on. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-15netfilter: nf_conntrack: register pernet subsystem before register L4 protoGao feng4-24/+24
In (c296bb4 netfilter: nf_conntrack: refactor l4proto support for netns) the l4proto gre/dccp/udplite/sctp registration happened before the pernet subsystem, which is wrong. Register pernet subsystem before register L4proto since after register L4proto, init_conntrack may try to access the resources which allocated in register_pernet_subsys. Reported-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-14tcp: fix skb_availroom()Eric Dumazet2-2/+1
Chrome OS team reported a crash on a Pixel ChromeBook in TCP stack : https://code.google.com/p/chromium/issues/detail?id=182056 commit a21d45726acac (tcp: avoid order-1 allocations on wifi and tx path) did a poor choice adding an 'avail_size' field to skb, while what we really needed was a 'reserved_tailroom' one. It would have avoided commit 22b4a4f22da (tcp: fix retransmit of partially acked frames) and this commit. Crash occurs because skb_split() is not aware of the 'avail_size' management (and should not be aware) Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Mukesh Agrawal <quiche@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>