aboutsummaryrefslogtreecommitdiffstats
path: root/net (follow)
AgeCommit message (Collapse)AuthorFilesLines
2014-08-23tipc: remove port_lockJon Paul Maloy4-149/+108
In previous commits we have reduced usage of port_lock to a minimum, and complemented it with usage of bh_lock_sock() at the remaining locations. The purpose has been to remove this lock altogether, since it largely duplicates the role of bh_lock_sock. We are now ready to do this. However, we still need to protect the BH callers from inadvertent release of the socket while they hold a reference to it. We do this by replacing port_lock by a combination of a rw-lock protecting the reference table as such, and updating the socket reference counter while the socket is referenced from BH. This technique is more standard and comprehensible than the previous approach, and turns out to have a positive effect on overall performance. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tipc: replace port pointer with socket pointer in registryJon Paul Maloy2-16/+11
In order to make tipc_sock the only entity referencable from other parts of the stack, we add a tipc_sock pointer instead of a tipc_port pointer to the registry. As a consequence, we also let the function tipc_port_lock() return a pointer to a tipc_sock instead of a tipc_port. We keep the function's name for now, since the lock still is owned by the port. This is another step in the direction of eliminating port_lock, replacing its usage with lock_sock() and bh_lock_sock(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tipc: use registry when scanning socketsJon Paul Maloy8-107/+133
The functions tipc_port_get_ports() and tipc_port_reinit() scan over all sockets/ports to access each of them. This is done by using a dedicated linked list, 'tipc_socks' where all sockets are members. The list is in turn protected by a spinlock, 'port_list_lock', while each socket is locked by using port_lock at the moment of access. In order to reduce complexity and risk of deadlock, we want to get rid of the linked list and the accompanying spinlock. This is what we do in this commit. Instead of the linked list, we use the port registry to scan across the sockets. We also add usage of bh_lock_sock() inside the scope of port_lock in both functions, as a preparation for the complete removal of port_lock. Finally, we move the functions from port.c to socket.c, and rename them to tipc_sk_sock_show() and tipc_sk_reinit() repectively. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tipc: eliminate functions tipc_port_init and tipc_port_destroyJon Paul Maloy3-88/+42
After the latest changes to the socket/port layer the existence of the functions tipc_port_init() and tipc_port_destroy() cannot be justified. They are both called only once, from tipc_sk_create() and tipc_sk_delete() respectively, and their functionality can better be merged into the latter two functions. This also entails that all remaining references to port_lock now are made from inside socket.c, something that will make it easier to remove this lock. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tipc: redefine message acknowledge functionJon Paul Maloy3-26/+20
The function tipc_acknowledge() is a remnant from the obsolete native API. Currently, it grabs port_lock, before building an acknowledge message and sending it to the peer. Since all access to socket members now is protected by the socket lock, it has become unnecessary to grab port_lock here. In this commit, we remove the usage of port_lock, simplify the function, and move it to socket.c, renaming it to tipc_sk_send_ack(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tipc: eliminate port_connect()/port_disconnect() functionsJon Paul Maloy3-132/+37
tipc_port_connect()/tipc_port_disconnect() are remnants of the obsolete native API. Their only task is to grab port_lock and call the functions __tipc_port_connect()/__tipc_port_disconnect() respectively, which will perform the actual state change. Since socket/port exection now is single-threaded the use of port_lock is not needed any more, so we can safely replace the two functions with their lock-free counterparts. In this commit, we remove the two functions. Furthermore, the contents of __tipc_port_disconnect() is so trivial that we choose to eliminate that function too, expanding its functionality into tipc_shutdown(). __tipc_port_connect() is simplified, moved to socket.c, and given the more correct name tipc_sk_finish_conn(). Finally, we eliminate the function auto_connect(), and expand its contents into filter_connect(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tipc: eliminate function tipc_port_shutdown()Jon Paul Maloy3-30/+11
tipc_port_shutdown() is a remnant from the now obsolete native interface. As such it grabs port_lock in order to protect itself from concurrent BH processing. However, after the recent changes to the port/socket upcalls, sockets are now basically single-threaded, and all execution, except the read-only tipc_sk_timer(), is executing within the protection of lock_sock(). So the use of port_lock is not needed here. In this commit we eliminate the whole function, and merge it into its only caller, tipc_shutdown(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tipc: clean up socket timer functionJon Paul Maloy2-103/+69
The last remaining BH upcall to the socket, apart for the message reception function tipc_sk_rcv(), is the timer function. We prefer to let this function continue executing in BH, since it only does read-acces to semi-permanent data, but we make three changes to it: 1) We introduce a bh_lock_sock()/bh_unlock_sock() inside the scope of port_lock. This is a preparation for replacing port_lock with bh_lock_sock() at the locations where it is still used. 2) We move the function from port.c to socket.c, as a further step of eliminating the port code level altogether. 3) We let it make use of the newly introduced tipc_msg_create() function. This enables us to get rid of three context specific functions (port_create_self_abort_msg() etc.) in port.c Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tipc: use message to abort connections when losing contact to nodeJon Paul Maloy3-25/+85
In the current implementation, each 'struct tipc_node' instance keeps a linked list of those ports/sockets that are connected to the node represented by that struct. The purpose of this is to let the node object know which sockets to alert when it loses contact with its peer node, i.e., which sockets need to have their connections aborted. This entails an unwanted direct reference from the node structure back to the port/socket structure, and a need to grab port_lock when we have to make an upcall to the port. We want to get rid of this unecessary BH entry point into the socket, and also eliminate its use of port_lock. In this commit, we instead let the node struct keep list of "connected socket" structs, which each represents a connected socket, but is allocated independently by the node at the moment of connection. If the node loses contact with its peer node, the list is traversed, and a "connection abort" message is created for each entry in the list. The message is sent to it respective connected socket using the ordinary data path, and the receiving socket aborts its connections upon reception of the message. This enables us to get rid of the direct reference from 'struct node' to ´struct port', and another unwanted BH access point to the latter. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tipc: use pseudo message to wake up sockets after link congestionJon Paul Maloy12-91/+99
The current link implementation keeps a linked list of blocked ports/ sockets that is populated when there is link congestion. The purpose of this is to let the link know which users to wake up when the congestion abates. This adds unnecessary complexity to the data structure and the code, since it forces us to involve the link each time we want to delete a socket. It also forces us to grab the spinlock port_lock within the scope of node_lock. We want to get rid of this direct dependence, as well as the deadlock hazard resulting from the usage of port_lock. In this commit, we instead let the link keep list of a "wakeup" pseudo messages for use in such situations. Those messages are sent to the pending sockets via the ordinary message reception path, and wake up the socket's owner when they are received. This enables us to get rid of the 'waiting_ports' linked lists in struct tipc_port that manifest this direct reference. As a consequence, we can eliminate another BH entry into the socket, and hence the need to grab port_lock. This is a further step in our effort to remove port_lock altogether. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tipc: introduce new function tipc_msg_create()Jon Paul Maloy2-2/+33
The function tipc_msg_init() has turned out to be of limited value in many cases. It take too few parameters to be usable for creating a complete message, it makes too many assumptions about what the message should be used for, and it does not allocate any buffer to be returned to the caller. Therefore, we now introduce the new function tipc_msg_create(), which takes all the parameters needed to create a full message, and returns a buffer of the requested size. The new function will be very useful for the changes we will be doing in later commits in this series. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller11-46/+52
Pulling to get some TIPC fixes that a net-next series depends upon. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22tcp: improve undo on timeoutYuchung Cheng1-15/+11
Upon timeout, undo (via both timestamps/Eifel and DSACKs) was disabled if any retransmits were still in flight. The concern was perhaps that spurious retransmission sent in a previous recovery episode may trigger DSACKs to falsely undo the current recovery. However, this inadvertently misses undo opportunities (using either TCP timestamps or DSACKs) when timeout occurs during a loss episode, i.e. recurring timeouts or timeout during fast recovery. In these cases some retransmissions will be in flight but we should allow undo. Furthermore, we should only reset undo_marker and undo_retrans upon timeout if we are starting a new recovery episode. Finally, when we do reset our undo state, we now do so in a manner similar to tcp_enter_recovery(), so that we require a DSACK for each of the outstsanding retransmissions. This will achieve the original goal by requiring that we receive the same number of DSACKs as retransmissions. This patch increases the undo events by 50% on Google servers. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22net: remove dead code after sk_data_ready changeEric Dumazet2-16/+1
As a followup to commit 676d23690fb ("net: Fix use after free by removing length arg from sk_data_ready callbacks"), we can remove some useless code in sock_queue_rcv_skb() and rxrpc_queue_rcv_skb() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22net: use ktime_get_ns() and ktime_get_real_ns() helpersEric Dumazet8-16/+16
ktime_get_ns() replaces ktime_to_ns(ktime_get()) ktime_get_real_ns() replaces ktime_to_ns(ktime_get_real()) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22mac80211: fix channel switch for chanctx-based driversMichal Kazior1-1/+1
The new_ctx pointer is set only for non-chanctx drivers. This yielded a crash for chanctx-based drivers during channel switch finalization: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 IP: ieee80211_vif_use_reserved_switch+0x71c/0xb00 [mac80211] Use an adequate chanctx pointer to fix this. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-22af_decnet: Use time_after_eqHimangi Saraogi1-1/+2
The functions time_before, time_before_eq, time_after, and time_after_eq are more robust for comparing jiffies against other values. A simplified version of the Coccinelle semantic patch making this change is as follows: @change@ expression E1,E2,E3; @@ - jiffies - E1 >= (E2*E3) + time_after_eq(jiffies, E1+E2*E3) Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22decnet: Use time_after_eqHimangi Saraogi1-1/+2
The functions time_before, time_before_eq, time_after, and time_after_eq are more robust for comparing jiffies against other values. A simplified version of the Coccinelle semantic patch making this change is as follows: @change@ expression E1,E2; @@ - (jiffies - E1) >= E2 + time_after_eq(jiffies, E1+E2) Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22ipconfig: Use time_beforeHimangi Saraogi1-1/+2
The functions time_before, time_before_eq, time_after, and time_after_eq are more robust for comparing jiffies against other values. A simplified version of the Coccinelle semantic patch making this change is as follows: @change@ expression E1,E2; @@ - jiffies - E1 < E2 + time_before(jiffies, E1+E2) Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22dn_dev: Use time_beforeHimangi Saraogi1-1/+2
The functions time_before, time_before_eq, time_after, and time_after_eq are more robust for comparing jiffies against other values. A simplified version of the Coccinelle semantic patch making this change is as follows: @change@ expression E1,E2; @@ ( - (jiffies - E1) < E2 + time_before(jiffies, E1+E2) ) Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22br_multicast: Replace rcu_assign_pointer() with RCU_INIT_POINTER()Andreea-Cristina Bernat1-1/+1
The use of "rcu_assign_pointer()" is NULLing out the pointer. According to RCU_INIT_POINTER()'s block comment: "1. This use of RCU_INIT_POINTER() is NULLing out the pointer" it is better to use it instead of rcu_assign_pointer() because it has a smaller overhead. The following Coccinelle semantic patch was used: @@ @@ - rcu_assign_pointer + RCU_INIT_POINTER (..., NULL) Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22net/openvswitch/flow.c: Replace rcu_dereference() with rcu_access_pointer()Andreea-Cristina Bernat1-1/+1
The "rcu_dereference()" call is used directly in a condition. Since its return value is never dereferenced it is recommended to use "rcu_access_pointer()" instead of "rcu_dereference()". Therefore, this patch makes the replacement. The following Coccinelle semantic patch was used: @@ @@ ( if( (<+... - rcu_dereference + rcu_access_pointer (...) ...+>)) {...} | while( (<+... - rcu_dereference + rcu_access_pointer (...) ...+>)) {...} ) Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22net/ipv4/igmp.c: Replace rcu_dereference() with rcu_access_pointer()Andreea-Cristina Bernat1-1/+1
The "rcu_dereference()" call is used directly in a condition. Since its return value is never dereferenced it is recommended to use "rcu_access_pointer()" instead of "rcu_dereference()". Therefore, this patch makes the replacement. The following Coccinelle semantic patch was used: @@ @@ ( if( (<+... - rcu_dereference + rcu_access_pointer (...) ...+>)) {...} | while( (<+... - rcu_dereference + rcu_access_pointer (...) ...+>)) {...} ) Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22ipv4: Restore accept_local behaviour in fib_validate_source()Sébastien Barré1-6/+8
Commit 7a9bc9b81a5b ("ipv4: Elide fib_validate_source() completely when possible.") introduced a short-circuit to avoid calling fib_validate_source when not needed. That change took rp_filter into account, but not accept_local. This resulted in a change of behaviour: with rp_filter and accept_local off, incoming packets with a local address in the source field should be dropped. Here is how to reproduce the change pre/post 7a9bc9b81a5b commit: -configure the same IPv4 address on hosts A and B. -try to send an ARP request from B to A. -The ARP request will be dropped before that commit, but accepted and answered after that commit. This adds a check for ACCEPT_LOCAL, to maintain full fib validation in case it is 0. We also leave __fib_validate_source() earlier when possible, based on the same check as fib_validate_source(), once the accept_local stuff is verified. Cc: Gregory Detal <gregory.detal@uclouvain.be> Cc: Christoph Paasch <christoph.paasch@uclouvain.be> Cc: Hannes Frederic Sowa <hannes@redhat.com> Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22net: sctp: fix suboptimal edge-case on non-active active/retrans path selectionDaniel Borkmann1-6/+3
In SCTP, selection of active (T.ACT) and retransmission (T.RET) transports is being done whenever transport control operations (UP, DOWN, PF, ...) are engaged through sctp_assoc_control_transport(). Commits 4c47af4d5eb2 ("net: sctp: rework multihoming retransmission path selection to rfc4960") and a7288c4dd509 ("net: sctp: improve sctp_select_active_and_retran_path selection") have both improved it towards a more fine-grained and optimal path selection. Currently, the selection algorithm for T.ACT and T.RET is as follows: 1) Elect the two most recently used ACTIVE transports T1, T2 for T.ACT, T.RET, where T.ACT<-T1 and T1 is most recently used 2) In case primary path T.PRI not in {T1, T2} but ACTIVE, set T.ACT<-T.PRI and T.RET<-T1 3) If only T1 is ACTIVE from the set, set T.ACT<-T1 and T.RET<-T1 4) If none is ACTIVE, set T.ACT<-best(T.PRI, T.RET, T3) where T3 is the most recently used (if avail) in PF, set T.RET<-T.PRI Prior to above commits, 4) was simply a camp on T.ACT<-T.PRI and T.RET<-T.PRI, ignoring possible paths in PF. Camping on T.PRI is still slightly suboptimal as it can lead to the following scenario: Setup: <A> <B> T1: p1p1 (10.0.10.10) <==> .'`) <==> p1p1 (10.0.10.12) <= T.PRI T2: p1p2 (10.0.10.20) <==> (_ . ) <==> p1p2 (10.0.10.22) net.sctp.rto_min = 1000 net.sctp.path_max_retrans = 2 net.sctp.pf_retrans = 0 net.sctp.hb_interval = 1000 T.PRI is permanently down, T2 is put briefly into PF state (e.g. due to link flapping). Here, the first time transmission is sent over PF path T2 as it's the only non-INACTIVE path, but the retransmitted data-chunks are sent over the INACTIVE path T1 (T.PRI), which is not good. After the patch, it's choosing better transports in both cases by modifying step 4): 4) If none is ACTIVE, set T.ACT_new<-best(T.ACT_old, T3) where T3 is the most recently used (if avail) in PF, set T.RET<-T.ACT_new This will still select a best possible path in PF if available (which can also include T.PRI/T.RET), and set both T.ACT/T.RET to it. In case sctp_assoc_control_transport() *just* put T.ACT_old into INACTIVE as it transitioned from ACTIVE->PF->INACTIVE and stays in INACTIVE just for a very short while before going back ACTIVE, it will guarantee that this path will be reselected for T.ACT/T.RET since T3 (PF) is not available. Previously, this was not possible, as we would only select between T.PRI and T.RET, and a possible T3 would be NULL due to the fact that we have just transitioned T3 in sctp_assoc_control_transport() from PF->INACTIVE and would select a suboptimal path when T.PRI/T.RET have worse properties. In the case that T.ACT_old permanently went to INACTIVE during this transition and there's no PF path available, plus T.PRI and T.RET are INACTIVE as well, we would now camp on T.ACT_old, but if everything is being INACTIVE there's really not much we can do except hoping for a successful HB to bring one of the transports back up again and, thus cause a new selection through sctp_assoc_control_transport(). Now both tests work fine: Case 1: 1. T1 S(ACTIVE) T.ACT T2 S(ACTIVE) T.RET 2. T1 S(ACTIVE) T.ACT, T.RET T2 S(PF) 3. T1 S(ACTIVE) T.ACT, T.RET T2 S(INACTIVE) 5. T1 S(PF) T.ACT, T.RET T2 S(INACTIVE) [ 5.1 T1 S(INACTIVE) T.ACT, T.RET T2 S(INACTIVE) ] 6. T1 S(ACTIVE) T.ACT, T.RET T2 S(INACTIVE) 7. T1 S(ACTIVE) T.ACT T2 S(ACTIVE) T.RET Case 2: 1. T1 S(ACTIVE) T.ACT T2 S(ACTIVE) T.RET 2. T1 S(PF) T2 S(ACTIVE) T.ACT, T.RET 3. T1 S(INACTIVE) T2 S(ACTIVE) T.ACT, T.RET 5. T1 S(INACTIVE) T2 S(PF) T.ACT, T.RET [ 5.1 T1 S(INACTIVE) T2 S(INACTIVE) T.ACT, T.RET ] 6. T1 S(INACTIVE) T2 S(ACTIVE) T.ACT, T.RET 7. T1 S(ACTIVE) T.ACT T2 S(ACTIVE) T.RET Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22net: sctp: spare unnecessary comparison in sctp_trans_elect_bestDaniel Borkmann1-1/+1
When both transports are the same, we don't have to go down that road only to realize that we will return the very same transport. We are guaranteed that curr is always non-NULL. Therefore, just short-circuit this special case. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22openvswitch: fix panic with multiple vlan headersJiri Benc1-0/+5
When there are multiple vlan headers present in a received frame, the first one is put into vlan_tci and protocol is set to ETH_P_8021Q. Anything in the skb beyond the VLAN TPID may be still non-linear, including the inner TCI and ethertype. While ovs_flow_extract takes care of IP and IPv6 headers, it does nothing with ETH_P_8021Q. Later, if OVS_ACTION_ATTR_POP_VLAN is executed, __pop_vlan_tci pulls the next vlan header into vlan_tci. This leads to two things: 1. Part of the resulting ethernet header is in the non-linear part of the skb. When eth_type_trans is called later as the result of OVS_ACTION_ATTR_OUTPUT, kernel BUGs in __skb_pull. Also, __pop_vlan_tci is in fact accessing random data when it reads past the TPID. 2. network_header points into the ethernet header instead of behind it. mac_len is set to a wrong value (10), too. Reported-by: Yulong Pei <ypei@redhat.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22net: ipv6: fib: don't sleep inside atomic lockBenjamin Block1-1/+1
The function fib6_commit_metrics() allocates a piece of memory in mode GFP_KERNEL while holding an atomic lock from higher up in the stack, in the function __ip6_ins_rt(). This produces the following BUG: > BUG: sleeping function called from invalid context at mm/slub.c:1250 > in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: dhcpcd > 2 locks held by dhcpcd/2909: > #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff81978e67>] rtnl_lock+0x17/0x20 > #1: (&tb->tb6_lock){++--+.}, at: [<ffffffff81a6951a>] ip6_route_add+0x65a/0x800 > CPU: 1 PID: 2909 Comm: dhcpcd Not tainted 3.17.0-rc1 #1 > Hardware name: ASUS All Series/Q87T, BIOS 0216 10/16/2013 > 0000000000000008 ffff8800c8f13858 ffffffff81af135a 0000000000000000 > ffff880212202430 ffff8800c8f13878 ffffffff810f8d3a ffff880212202c98 > 0000000000000010 ffff8800c8f138c8 ffffffff8121ad0e 0000000000000001 > Call Trace: > [<ffffffff81af135a>] dump_stack+0x4e/0x68 > [<ffffffff810f8d3a>] __might_sleep+0x10a/0x120 > [<ffffffff8121ad0e>] kmem_cache_alloc_trace+0x4e/0x190 > [<ffffffff81a6bcd6>] ? fib6_commit_metrics+0x66/0x110 > [<ffffffff81a6bcd6>] fib6_commit_metrics+0x66/0x110 > [<ffffffff81a6cbf3>] fib6_add+0x883/0xa80 > [<ffffffff81a6951a>] ? ip6_route_add+0x65a/0x800 > [<ffffffff81a69535>] ip6_route_add+0x675/0x800 > [<ffffffff81a68f2a>] ? ip6_route_add+0x6a/0x800 > [<ffffffff81a6990c>] inet6_rtm_newroute+0x5c/0x80 > [<ffffffff8197cf01>] rtnetlink_rcv_msg+0x211/0x260 > [<ffffffff81978e67>] ? rtnl_lock+0x17/0x20 > [<ffffffff81119708>] ? lock_release_holdtime+0x28/0x180 > [<ffffffff81978e67>] ? rtnl_lock+0x17/0x20 > [<ffffffff8197ccf0>] ? __rtnl_unlock+0x20/0x20 > [<ffffffff819a989e>] netlink_rcv_skb+0x6e/0xd0 > [<ffffffff81978ee5>] rtnetlink_rcv+0x25/0x40 > [<ffffffff819a8e59>] netlink_unicast+0xd9/0x180 > [<ffffffff819a9600>] netlink_sendmsg+0x700/0x770 > [<ffffffff81103735>] ? local_clock+0x25/0x30 > [<ffffffff8194e83c>] sock_sendmsg+0x6c/0x90 > [<ffffffff811f98e3>] ? might_fault+0xa3/0xb0 > [<ffffffff8195ca6d>] ? verify_iovec+0x7d/0xf0 > [<ffffffff8194ec3e>] ___sys_sendmsg+0x37e/0x3b0 > [<ffffffff8111ef15>] ? trace_hardirqs_on_caller+0x185/0x220 > [<ffffffff81af979e>] ? mutex_unlock+0xe/0x10 > [<ffffffff819a55ec>] ? netlink_insert+0xbc/0xe0 > [<ffffffff819a65e5>] ? netlink_autobind.isra.30+0x125/0x150 > [<ffffffff819a6520>] ? netlink_autobind.isra.30+0x60/0x150 > [<ffffffff819a84f9>] ? netlink_bind+0x159/0x230 > [<ffffffff811f989a>] ? might_fault+0x5a/0xb0 > [<ffffffff8194f25e>] ? SYSC_bind+0x7e/0xd0 > [<ffffffff8194f8cd>] __sys_sendmsg+0x4d/0x80 > [<ffffffff8194f912>] SyS_sendmsg+0x12/0x20 > [<ffffffff81afc692>] system_call_fastpath+0x16/0x1b Fixing this by replacing the mode GFP_KERNEL with GFP_ATOMIC. Signed-off-by: Benjamin Block <bebl@mageta.org> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-21sctp: not send SCTP_PEER_ADDR_CHANGE notifications with failed probezhuyj1-0/+1
Since the transport has always been in state SCTP_UNCONFIRMED, it therefore wasn't active before and hasn't been used before, and it always has been, so it is unnecessary to bug the user with a notification. Reported-by: Deepak Khandelwal <khandelwal.deepak.1987@gmail.com> Suggested-by: Vlad Yasevich <vyasevich@gmail.com> Suggested-by: Michael Tuexen <tuexen@fh-muenster.de> Suggested-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Zhu Yanjun <Yanjun.Zhu@windriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-21packet: handle too big packets for PACKET_V3Eric Dumazet2-0/+18
af_packet can currently overwrite kernel memory by out of bound accesses, because it assumed a [new] block can always hold one frame. This is not generally the case, even if most existing tools do it right. This patch clamps too long frames as API permits, and issue a one time error on syslog. [ 394.357639] tpacket_rcv: packet too big, clamped from 5042 to 3966. macoff=82 In this example, packet header tp_snaplen was set to 3966, and tp_len was set to 5042 (skb->len) Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.") Acked-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-21lec: Use rtnl lock/unlock when updating MTUchas williams - CONTRACTOR1-0/+2
The LECS response contains the MTU that should be used. Correctly synchronize with other layers when updating. Signed-off-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-19tipc: Fix build.David S. Miller1-1/+2
Missing semicolon in range check fix. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-19cbq: now_rt removalVasily Averin1-10/+1
Now q->now_rt is identical to q->now and is not required anymore. Signed-off-by: Vasily Averin <vvs@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-19cbq: incorrectly low bandwidth setting blocks limited trafficVasily Averin1-24/+13
Mainstream commit f0f6ee1f70c4 ("cbq: incorrect processing of high limits") have side effect: if cbq bandwidth setting is less than real interface throughput non-limited traffic can delay limited traffic for a very long time. This happen because of q->now changes incorrectly in cbq_dequeue(): in described scenario L2T is much greater than real time delay, and q->now gets an extra boost for each transmitted packet. Accumulated boost prevents update q->now, and blocked class can wait very long time until (q->now >= cl->undertime) will be true again. To fix the problem the patch updates q->now on each cbq_update() call. L2T-related pre-modification q->now was moved to cbq_update(). My testing confirmed that it fixes the problem and did not discover any side-effects Fixes: f0f6ee1f70c4 ("cbq: incorrect processing of high limits") Signed-off-by: Vasily Averin <vvs@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-16tipc: fix message importance range checkErik Hugne2-2/+4
Commit 3b4f302d8578 ("tipc: eliminate redundant locking") introduced a bug by removing the sanity check for message importance, allowing programs to assign any value to the msg_user field. This will mess up the packet reception logic and may cause random link resets. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-16batman-adv: Fix parameter order of hlist_add_behindSven Eckelmann1-1/+1
1d023284c31a4e40a94d5bbcb7dbb7a35ee0bcbc ("list: fix order of arguments for hlist_add_after(_rcu)") was incorrectly rebased on top of d9124268d84a836f14a6ead54ff9d8eee4c43be5 ("batman-adv: Fix out-of-order fragmentation support"). The parameter order change of the rebased patch was not re-applied as expected. This causes a memory leak and can cause crashes when out-of-order packets are received. hlist_add_behind will try to access the uninitalized list pointers of frag_entry_new to find the previous/next entry and may modify/read random memory locations. Signed-off-by: Sven Eckelmann <sven@narfation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-14netlink: Annotate RCU locking for seq_file walkerThomas Graf1-0/+2
Silences the following sparse warnings: net/netlink/af_netlink.c:2926:21: warning: context imbalance in 'netlink_seq_start' - wrong count at exit net/netlink/af_netlink.c:2972:13: warning: context imbalance in 'netlink_seq_stop' - unexpected unlock Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-14tcp: fix ssthresh and undo for consecutive short FRTO episodesNeal Cardwell1-5/+3
Fix TCP FRTO logic so that it always notices when snd_una advances, indicating that any RTO after that point will be a new and distinct loss episode. Previously there was a very specific sequence that could cause FRTO to fail to notice a new loss episode had started: (1) RTO timer fires, enter FRTO and retransmit packet 1 in write queue (2) receiver ACKs packet 1 (3) FRTO sends 2 more packets (4) RTO timer fires again (should start a new loss episode) The problem was in step (3) above, where tcp_process_loss() returned early (in the spot marked "Step 2.b"), so that it never got to the logic to clear icsk_retransmits. Thus icsk_retransmits stayed non-zero. Thus in step (4) tcp_enter_loss() would see the non-zero icsk_retransmits, decide that this RTO is not a new episode, and decide not to cut ssthresh and remember the current cwnd and ssthresh for undo. There were two main consequences to the bug that we have observed. First, ssthresh was not decreased in step (4). Second, when there was a series of such FRTO (1-4) sequences that happened to be followed by an FRTO undo, we would restore the cwnd and ssthresh from before the entire series started (instead of the cwnd and ssthresh from before the most recent RTO). This could result in cwnd and ssthresh being restored to values much bigger than the proper values. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Fixes: e33099f96d99c ("tcp: implement RFC5682 F-RTO") Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-14tcp: don't allow syn packets without timestamps to pass tcp_tw_recycle logicHannes Frederic Sowa2-5/+10
tcp_tw_recycle heavily relies on tcp timestamps to build a per-host ordering of incoming connections and teardowns without the need to hold state on a specific quadruple for TCP_TIMEWAIT_LEN, but only for the last measured RTO. To do so, we keep the last seen timestamp in a per-host indexed data structure and verify if the incoming timestamp in a connection request is strictly greater than the saved one during last connection teardown. Thus we can verify later on that no old data packets will be accepted by the new connection. During moving a socket to time-wait state we already verify if timestamps where seen on a connection. Only if that was the case we let the time-wait socket expire after the RTO, otherwise normal TCP_TIMEWAIT_LEN will be used. But we don't verify this on incoming SYN packets. If a connection teardown was less than TCP_PAWS_MSL seconds in the past we cannot guarantee to not accept data packets from an old connection if no timestamps are present. We should drop this SYN packet. This patch closes this loophole. Please note, this patch does not make tcp_tw_recycle in any way more usable but only adds another safety check: Sporadic drops of SYN packets because of reordering in the network or in the socket backlog queues can happen. Users behing NAT trying to connect to a tcp_tw_recycle enabled server can get caught in blackholes and their connection requests may regullary get dropped because hosts behind an address translator don't have synchronized tcp timestamp clocks. tcp_tw_recycle cannot work if peers don't have tcp timestamps enabled. In general, use of tcp_tw_recycle is disadvised. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-14tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced()Neal Cardwell3-4/+6
Make sure we use the correct address-family-specific function for handling MTU reductions from within tcp_release_cb(). Previously AF_INET6 sockets were incorrectly always using the IPv6 code path when sometimes they were handling IPv4 traffic and thus had an IPv4 dst. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Diagnosed-by: Willem de Bruijn <willemb@google.com> Fixes: 563d34d057862 ("tcp: dont drop MTU reduction indications") Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-14sit: Fix ipip6_tunnel_lookup device matching criteriaShmulik Ladkani1-3/+3
As of 4fddbf5d78 ("sit: strictly restrict incoming traffic to tunnel link device"), when looking up a tunnel, tunnel's underlying interface (t->parms.link) is verified to match incoming traffic's ingress device. However the comparison was incorrectly based on skb->dev->iflink. Instead, dev->ifindex should be used, which correctly represents the interface from which the IP stack hands the ipip6 packets. This allows setting up sit tunnels bound to vlan interfaces (otherwise incoming ipip6 traffic on the vlan interface was dropped due to ipip6_tunnel_lookup match failure). Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-14tcp: don't use timestamp from repaired skb-s to calculate RTT (v2)Andrey Vagin2-8/+11
We don't know right timestamp for repaired skb-s. Wrong RTT estimations isn't good, because some congestion modules heavily depends on it. This patch adds the TCPCB_REPAIRED flag, which is included in TCPCB_RETRANS. Thanks to Eric for the advice how to fix this issue. This patch fixes the warning: [ 879.562947] WARNING: CPU: 0 PID: 2825 at net/ipv4/tcp_input.c:3078 tcp_ack+0x11f5/0x1380() [ 879.567253] CPU: 0 PID: 2825 Comm: socket-tcpbuf-l Not tainted 3.16.0-next-20140811 #1 [ 879.567829] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 879.568177] 0000000000000000 00000000c532680c ffff880039643d00 ffffffff817aa2d2 [ 879.568776] 0000000000000000 ffff880039643d38 ffffffff8109afbd ffff880039d6ba80 [ 879.569386] ffff88003a449800 000000002983d6bd 0000000000000000 000000002983d6bc [ 879.569982] Call Trace: [ 879.570264] [<ffffffff817aa2d2>] dump_stack+0x4d/0x66 [ 879.570599] [<ffffffff8109afbd>] warn_slowpath_common+0x7d/0xa0 [ 879.570935] [<ffffffff8109b0ea>] warn_slowpath_null+0x1a/0x20 [ 879.571292] [<ffffffff816d0a05>] tcp_ack+0x11f5/0x1380 [ 879.571614] [<ffffffff816d10bd>] tcp_rcv_established+0x1ed/0x710 [ 879.571958] [<ffffffff816dc9da>] tcp_v4_do_rcv+0x10a/0x370 [ 879.572315] [<ffffffff81657459>] release_sock+0x89/0x1d0 [ 879.572642] [<ffffffff816c81a0>] do_tcp_setsockopt.isra.36+0x120/0x860 [ 879.573000] [<ffffffff8110a52e>] ? rcu_read_lock_held+0x6e/0x80 [ 879.573352] [<ffffffff816c8912>] tcp_setsockopt+0x32/0x40 [ 879.573678] [<ffffffff81654ac4>] sock_common_setsockopt+0x14/0x20 [ 879.574031] [<ffffffff816537b0>] SyS_setsockopt+0x80/0xf0 [ 879.574393] [<ffffffff817b40a9>] system_call_fastpath+0x16/0x1b [ 879.574730] ---[ end trace a17cbc38eb8c5c00 ]--- v2: moving setting of skb->when for repaired skb-s in tcp_write_xmit, where it's set for other skb-s. Fixes: 431a91242d8d ("tcp: timestamp SYN+DATA messages") Fixes: 740b0f1841f6 ("tcp: switch rtt estimations to usec resolution") Cc: Eric Dumazet <edumazet@google.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-13net-timestamp: fix missing tcp fragmentation casesWillem de Bruijn1-0/+17
Bytestream timestamps are correlated with a single byte in the skbuff, recorded in skb_shinfo(skb)->tskey. When fragmenting skbuffs, ensure that the tskey is set for the fragment in which the tskey falls (seqno <= tskey < end_seqno). The original implementation did not address fragmentation in tcp_fragment or tso_fragment. Add code to inspect the sequence numbers and move both tskey and the relevant tx_flags if necessary. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-13net-timestamp: fix missing ACK timestampWillem de Bruijn1-5/+5
ACK timestamps are generated in tcp_clean_rtx_queue. The TSO datapath can break out early, causing the timestamp code to be skipped. Move the code up before the break. Reported-by: David S. Miller <davem@davemloft.net> Also fix a boundary condition: tp->snd_una is the next unacknowledged byte and between tests inclusive (a <= b <= c), so generate a an ACK timestamp if (prior_snd_una <= tskey <= tp->snd_una - 1). Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-13irda: Fix rd_frame control field initialization in irlap_send_rd_frame()Maks Naumov1-1/+1
Signed-off-by: Maks Naumov <maksqwe1@ukr.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-13lec: Fix bug introduced by b67bfe0d42cac56c512dd5da4b1b347a23f4b70achas williams - CONTRACTOR1-4/+1
b67bfe0d42cac56c512dd5da4b1b347a23f4b70a (hlist: drop the node parameter from iterators) dropped the node parameter from iterators which lec_tbl_walk() was using to iterate the list. Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-13atm/svc: Fix blocking in wait loopchas williams - CONTRACTOR1-28/+32
One should not call blocking primitives inside a wait loop, since both require task_struct::state to sleep, so the inner will destroy the outer state. sigd_enq() will possibly sleep for alloc_skb(). Move sigd_enq() before prepare_to_wait() to avoid sleeping while waiting interruptibly. You do not actually need to call sigd_enq() after the initial prepare_to_wait() because we test the termination condition before calling schedule(). Based on suggestions from Peter Zijlstra. Signed-off-by: Chas Williams <chas@cmf.n4rl.navy.mil> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-13openvswitch: Fix memory leak in ovs_vport_alloc() error pathChristoph Jaeger1-1/+3
ovs_vport_alloc() bails out without freeing the memory 'vport' points to. Picked up by Coverity - CID 1230503. Fixes: 5cd667b0a4 ("openvswitch: Allow each vport to have an array of 'port_id's.") Signed-off-by: Christoph Jaeger <cj@linux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds16-149/+98
Pull networking fixes from David Miller: "Several networking final fixes and tidies for the merge window: 1) Changes during the merge window unintentionally took away the ability to build bluetooth modular, fix from Geert Uytterhoeven. 2) Several phy_node reference count bug fixes from Uwe Kleine-König. 3) Fix ucc_geth build failures, also from Uwe Kleine-König. 4) Fix klog false positivies when netlink messages go to network taps, by properly resetting the network header. Fix from Daniel Borkmann. 5) Sizing estimate of VF netlink messages is too small, from Jiri Benc. 6) New APM X-Gene SoC ethernet driver, from Iyappan Subramanian. 7) VLAN untagging is erroneously dependent upon whether the VLAN module is loaded or not, but there are generic dependencies that matter wrt what can be expected as the SKB enters the stack. Make the basic untagging generic code, and do it unconditionally. From Vlad Yasevich. 8) xen-netfront only has so many slots in it's transmit queue so linearize packets that have too many frags. From Zoltan Kiss. 9) Fix suspend/resume PHY handling in bcmgenet driver, from Florian Fainelli" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (55 commits) net: bcmgenet: correctly resume adapter from Wake-on-LAN net: bcmgenet: update UMAC_CMD only when link is detected net: bcmgenet: correctly suspend and resume PHY device net: bcmgenet: request and enable main clock earlier net: ethernet: myricom: myri10ge: myri10ge.c: Cleaning up missing null-terminate after strncpy call xen-netfront: Fix handling packets on compound pages with skb_linearize net: fec: Support phys probed from devicetree and fixed-link smsc: replace WARN_ON() with WARN_ON_SMP() xen-netback: Don't deschedule NAPI when carrier off net: ethernet: qlogic: qlcnic: Remove duplicate object file from Makefile wan: wanxl: Remove typedefs from struct names m68k/atari: EtherNEC - ethernet support (ne) net: ethernet: ti: cpmac.c: Cleaning up missing null-terminate after strncpy call hdlc: Remove typedefs from struct names airo_cs: Remove typedef local_info_t atmel: Remove typedef atmel_priv_ioctl com20020_cs: Remove typedef com20020_dev_t ethernet: amd: Remove typedef local_info_t net: Always untag vlan-tagged traffic on input. drivers: net: Add APM X-Gene SoC ethernet driver support. ...
2014-08-13Merge tag 'nfs-for-3.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds16-383/+809
Pull NFS client updates from Trond Myklebust: "Highlights include: - stable fix for a bug in nfs3_list_one_acl() - speed up NFS path walks by supporting LOOKUP_RCU - more read/write code cleanups - pNFS fixes for layout return on close - fixes for the RCU handling in the rpcsec_gss code - more NFS/RDMA fixes" * tag 'nfs-for-3.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits) nfs: reject changes to resvport and sharecache during remount NFS: Avoid infinite loop when RELEASE_LOCKOWNER getting expired error SUNRPC: remove all refcounting of groupinfo from rpcauth_lookupcred NFS: fix two problems in lookup_revalidate in RCU-walk NFS: allow lockless access to access_cache NFS: teach nfs_lookup_verify_inode to handle LOOKUP_RCU NFS: teach nfs_neg_need_reval to understand LOOKUP_RCU NFS: support RCU_WALK in nfs_permission() sunrpc/auth: allow lockless (rcu) lookup of credential cache. NFS: prepare for RCU-walk support but pushing tests later in code. NFS: nfs4_lookup_revalidate: only evaluate parent if it will be used. NFS: add checks for returned value of try_module_get() nfs: clear_request_commit while holding i_lock pnfs: add pnfs_put_lseg_async pnfs: find swapped pages on pnfs commit lists too nfs: fix comment and add warn_on for PG_INODE_REF nfs: check wait_on_bit_lock err in page_group_lock sunrpc: remove "ec" argument from encrypt_v2 operation sunrpc: clean up sparse endianness warnings in gss_krb5_wrap.c sunrpc: clean up sparse endianness warnings in gss_krb5_seal.c ...