2014-07-21 af_iucv: avoid path quiesce of severed path in shutdown() (Ursula Braun; 1 file, -1/+2)
An af_iucv stress test showed -EPIPE results for sendmsg() calls. They are caused by quiescing a path even though it has already been severed by the peer. For IUCV transport, shutdown() consists of two steps: (1) sending the shutdown message to the peer, and (2) quiescing the iucv path. If the iucv path is severed between these two steps because the peer closes the path, the quiesce step is no longer needed. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Reported-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-15 af_iucv: remove unnecessary break after goto (Fabian Frederick; 1 file, -1/+0)
Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-30 af_iucv: correct cleanup if listen backlog is full (Ursula Braun; 1 file, -2/+1)
In case of transport HIPER a sock struct is allocated for an incoming connect request. If the backlog queue is full this socket is not needed, but is left in the list of af_iucv sockets. The final socket release then posts the console message "Attempt to release alive iucv socket". This patch makes sure the newly created socket is cleaned up correctly if the backlog queue is full. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Reported-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-30 af_iucv: Add automatic (source) iucv_name to bind (Philipp Hachtmann; 1 file, -11/+18)
If a socket is bound to an address using before calling connect it is usual to leave it to the network system to choose an appropriate outgoing application name respective port address. af_iucv on VM uses a counter and uses simple numbers as unique identifiers. This behaviour was missing when af_iucv is used with HiperSockets. This patch contains a simple approach to harmonize af_iucv's behaviour. Signed-off-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com> Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-14 af_iucv: wrong mapping of sent and confirmed skbs (Ursula Braun; 1 file, -1/+1)
When sending data through IUCV a MESSAGE COMPLETE interrupt signals that sent data memory can be freed or reused again. With commit f9c41a62bba3f3f7ef3541b2a025e3371bcbba97 "af_iucv: fix recvmsg by replacing skb_pull() function" the MESSAGE COMPLETE callback iucv_callback_txdone() identifies the wrong skb as being confirmed, which leads to data corruption. This patch fixes the skb mapping logic in iucv_callback_txdone(). Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-11 net: Fix use after free by removing length arg from sk_data_ready callbacks. (David S. Miller; 1 file, -2/+2)
Several spots in the kernel perform a sequence like: skb_queue_tail(&sk->sk_receive_queue, skb); sk->sk_data_ready(sk, skb->len); But at the moment we place the SKB onto the socket receive queue it can be consumed and freed up. So this skb->len access is potentially to freed up memory. Furthermore, the skb->len can be modified by the consumer so it is possible that the value isn't accurate. And finally, no actual implementation of this callback actually uses the length argument. And since nobody actually cared about its value, lots of call sites pass arbitrary values in such as '0' and even '1'. So just remove the length argument from the callback, that way there is no confusion whatsoever and all of these use-after-free cases get fixed as a side effect. Based upon a patch by Eric Dumazet and his suggestion to audit this issue tree-wide. Signed-off-by: David S. Miller <davem@davemloft.net>
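The hazard described above can be illustrated with a small user-space mock. This is only a sketch: `struct sock`, `struct sk_buff`, `queue_tail()`, `mock_data_ready()` and `deliver()` are hypothetical stand-ins written for this example, not the kernel definitions. The point it shows is that a callback without a length argument leaves the producer nothing to read from the buffer once it has been queued.

```c
#include <stddef.h>

/* Hypothetical user-space stand-ins; not the kernel structures. */
struct sk_buff {
    struct sk_buff *next;
    unsigned int len;
};

struct sock {
    struct sk_buff *head, *tail;             /* mock receive queue */
    unsigned int wakeups;                    /* counts data-ready calls */
    void (*sk_data_ready)(struct sock *sk);  /* no length argument */
};

static void mock_data_ready(struct sock *sk)
{
    sk->wakeups++;
}

/* Append a buffer to the mock receive queue. */
static void queue_tail(struct sock *sk, struct sk_buff *skb)
{
    skb->next = NULL;
    if (sk->tail)
        sk->tail->next = skb;
    else
        sk->head = skb;
    sk->tail = skb;
}

/*
 * Once the buffer is on the queue a consumer may free it, so the
 * producer must not touch skb->len afterwards. With the length-free
 * callback there is nothing left to read from the buffer.
 */
static void deliver(struct sock *sk, struct sk_buff *skb)
{
    queue_tail(sk, skb);
    sk->sk_data_ready(sk);   /* old form: sk->sk_data_ready(sk, skb->len) */
}
```

In the old form, the `skb->len` read happened after `queue_tail()`, which is exactly the window in which a concurrent consumer could already have released the buffer.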
2014-03-20 af_iucv: recvmsg problem for SOCK_STREAM sockets (Ursula Braun; 1 file, -0/+1)
Commit f9c41a62bba3f3f7ef3541b2a025e3371bcbba97 introduced a problem for SOCK_STREAM sockets, when only part of the incoming iucv message is received by user space. In this case the remaining data of the iucv message is lost. This patch makes sure an incompletely received iucv message is queued back to the receive queue. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Reported-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-20 net: rework recvmsg handler msg_name and msg_namelen logic (Hannes Frederic Sowa; 1 file, -2/+0)
This patch now always passes msg->msg_namelen as 0. recvmsg handlers must set msg_namelen to the proper size <= sizeof(struct sockaddr_storage) to return msg_name to the user. This prevents numerous uninitialized memory leaks we had in the recvmsg handlers and makes it harder for new code to accidentally leak uninitialized memory. Optimize for the case recvfrom is called with NULL as address. We don't need to copy the address at all, so set it to NULL before invoking the recvmsg handler. We can do so, because all the recvmsg handlers must cope with the case a plain read() is called on them. read() also sets msg_name to NULL. Also document these changes in include/linux/net.h as suggested by David Miller. Changes since RFC: Set msg->msg_name = NULL if user specified a NULL in msg_name but had a non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't affect sendto as it would bail out earlier while trying to copy-in the address. It also more naturally reflects the logic by the callers of verify_iovec. With this change in place I could remove " if (!uaddr || msg_sys->msg_namelen == 0) msg->msg_name = NULL ". This change does not alter the user visible error logic as we ignore msg_namelen as long as msg_name is NULL. Also remove two unnecessary curly brackets in ___sys_recvmsg and change comments to netdev style. Cc: David Miller <davem@davemloft.net> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-28 net: pass info struct via netdevice notifier (Jiri Pirko; 1 file, -1/+1)
So far, only net_device * could be passed along with netdevice notifier event. This patch provides a possibility to pass custom structure able to provide info that event listener needs to know. Signed-off-by: Jiri Pirko <jiri@resnulli.us> v2->v3: fix typo on simeth shortened dev_getter shortened notifier_info struct name v1->v2: fix notifier_call parameter in call_netdevice_notifier() Signed-off-by: David S. Miller <davem@davemloft.net>
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts: drivers/net/ethernet/emulex/benet/be_main.c drivers/net/ethernet/intel/igb/igb_main.c drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c include/net/scm.h net/batman-adv/routing.c net/ipv4/tcp_input.c The e{uid,gid} --> {uid,gid} credentials fix conflicted with the cleanup in net-next to now pass cred structs around. The be2net driver had a bug fix in 'net' that overlapped with the VLAN interface changes by Patrick McHardy in net-next. An IGB conflict existed because in 'net' the build_skb() support was reverted, and in 'net-next' there was a comment style fix within that code. Several batman-adv conflicts were resolved by making sure that all calls to batadv_is_my_mac() are changed to have a new bat_priv first argument. Eric Dumazet's TS ECR fix in TCP in 'net' conflicted with the F-RTO rewrite in 'net-next', mostly overlapping changes. Thanks to Stephen Rothwell and Antonio Quartulli for help with several of these merge resolutions. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-08 af_iucv: fix recvmsg by replacing skb_pull() function (Ursula Braun; 1 file, -18/+16)
When receiving data messages, the "BUG_ON(skb->len < skb->data_len)" in the skb_pull() function triggers a kernel panic. Replace the skb_pull logic by a per skb offset as advised by Eric Dumazet. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
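The idea of tracking a per-buffer offset instead of pulling data off the front can be sketched in user space as follows. This is an illustration under stated assumptions, not the af_iucv code: `struct msg_buf`, `msg_read()` and `msg_done()` are hypothetical names, and the real patch works on `struct sk_buff`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical message buffer; 'offset' plays the role of the per-skb offset. */
struct msg_buf {
    const char *data;
    size_t len;      /* total payload length, never modified by reads */
    size_t offset;   /* bytes already handed to the reader */
};

/*
 * Copy up to 'want' bytes into 'dst' and advance the offset instead of
 * moving the data pointer. Returns the number of bytes copied.
 */
static size_t msg_read(struct msg_buf *m, char *dst, size_t want)
{
    size_t rest = m->len - m->offset;
    size_t n = want < rest ? want : rest;

    memcpy(dst, m->data + m->offset, n);
    m->offset += n;
    return n;
}

/* A message is fully consumed once the offset reaches its length. */
static int msg_done(const struct msg_buf *m)
{
    return m->offset == m->len;
}
```

A partially read message keeps its remaining bytes addressable through `offset`, so a stream reader can put it back at the head of its queue and continue later.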
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts: drivers/nfc/microread/mei.c net/netfilter/nfnetlink_queue_core.c Pull in 'net' to get Eric Biederman's AF_UNIX fix, upon which some cleanups are going to go on-top. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 iucv: Fix missing msg_namelen update in iucv_sock_recvmsg() (Mathias Krause; 1 file, -0/+2)
The current code does not fill the msg_name member in case it is set. It also does not set the msg_namelen member to 0 and therefore makes net/socket.c leak the local, uninitialized sockaddr_storage variable to userland -- 128 bytes of kernel stack memory. Fix that by simply setting msg_namelen to 0 as obviously nobody cared about iucv_sock_recvmsg() not filling the msg_name in case it was set. Cc: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-02 net: fix smatch warnings inside datagram_poll (Jacob Keller; 1 file, -1/+1)
Commit 7d4c04fc170087119727119074e72445f2bb192b ("net: add option to enable error queue packets waking select") has an issue due to operator precedence causing the bit-wise OR to bind to the sock_flags call instead of the result of the ternary conditional. This fixes the *_poll functions to work properly. The old code results in "mask |= POLLPRI" instead of what was intended, which is to only include POLLPRI when the socket option is enabled. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
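The precedence problem can be reproduced in a few lines of plain C. The flag values and function names below are made up for the demonstration; only the shape of the expression follows the commit message. The buggy variant is written with the parentheses the compiler effectively applies, so that it compiles without precedence warnings.

```c
/* Hypothetical flag values, for this demonstration only. */
#define DEMO_POLLERR 0x008u
#define DEMO_POLLPRI 0x002u

/*
 * Old form, as written in the source:
 *     mask |= POLLERR | flag ? POLLPRI : 0;
 * Because '|' binds tighter than '?:', it is parsed like this:
 */
static unsigned int poll_mask_buggy(unsigned int flag)
{
    unsigned int mask = 0;

    mask |= (DEMO_POLLERR | flag) ? DEMO_POLLPRI : 0;
    return mask;   /* always DEMO_POLLPRI, and DEMO_POLLERR is lost */
}

/* Intended form: the conditional is evaluated first, then OR-ed in. */
static unsigned int poll_mask_fixed(unsigned int flag)
{
    unsigned int mask = 0;

    mask |= DEMO_POLLERR | (flag ? DEMO_POLLPRI : 0);
    return mask;
}
```

With the fixed form, `DEMO_POLLPRI` appears only when `flag` is non-zero, which matches the behaviour the commit describes as intended.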
2013-03-31 net: add option to enable error queue packets waking select (Keller, Jacob E; 1 file, -1/+2)
Currently, when a socket receives something on the error queue it only wakes up the socket on select if it is in the "read" list, that is the socket has something to read. It is useful also to wake the socket if it is in the error list, which would enable software to wait on error queue packets without waking up for regular data on the socket. The main use case is for receiving timestamped transmit packets which return the timestamp to the socket via the error queue. This enables an application to select on the socket for the error queue only instead of for the regular traffic. -v2- * Added the SO_SELECT_ERR_QUEUE socket option to every architecture specific file * Modified every socket poll function that checks error queue Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Cc: Jeffrey Kirsher <jeffrey.t.kirsher@intel.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Matthew Vick <matthew.vick@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-27 hlist: drop the node parameter from iterators (Sasha Levin; 1 file, -14/+7)
I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... 
when != b ( hlist_for_each_entry(a, - b, c, d) S | hlist_for_each_entry_continue(a, - b, c) S | hlist_for_each_entry_from(a, - b, c) S | hlist_for_each_entry_rcu(a, - b, c, d) S | hlist_for_each_entry_rcu_bh(a, - b, c, d) S | hlist_for_each_entry_continue_rcu_bh(a, - b, c) S | for_each_busy_worker(a, c, - b, d) S | ax25_uid_for_each(a, - b, c) S | ax25_for_each(a, - b, c) S | inet_bind_bucket_for_each(a, - b, c) S | sctp_for_each_hentry(a, - b, c) S | sk_for_each(a, - b, c) S | sk_for_each_rcu(a, - b, c) S | sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S | sk_for_each_safe(a, - b, c, d) S | sk_for_each_bound(a, - b, c) S | hlist_for_each_entry_safe(a, - b, c, d, e) S | hlist_for_each_entry_continue_rcu(a, - b, c) S | nr_neigh_for_each(a, - b, c) S | nr_neigh_for_each_safe(a, - b, c, d) S | nr_node_for_each(a, - b, c) S | nr_node_for_each_safe(a, - b, c, d) S | - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S | - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S | for_each_host(a, - b, c) S | for_each_host_safe(a, - b, c, d) S | for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
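A three-argument hlist iterator of the kind this commit introduces can be sketched in user space. This is a simplified model, not the code from linux/list.h: the node has no `pprev` pointer, all `demo_*` names and `struct item` are hypothetical, and `__typeof__` is a GCC/Clang extension.

```c
#include <stddef.h>

/* Simplified singly linked hash-list node (the kernel node also has pprev). */
struct hlist_node { struct hlist_node *next; };
struct hlist_head { struct hlist_node *first; };

#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* NULL-safe conversion from node pointer to containing entry. */
#define demo_entry_safe(ptr, type, member) \
    ((ptr) ? demo_container_of(ptr, type, member) : (type *)0)

/* The cursor is the entry itself; no separate node parameter is needed. */
#define demo_hlist_for_each_entry(pos, head, member)                        \
    for (pos = demo_entry_safe((head)->first, __typeof__(*(pos)), member);  \
         pos;                                                               \
         pos = demo_entry_safe((pos)->member.next, __typeof__(*(pos)), member))

struct item {
    int value;
    struct hlist_node node;
};

static void demo_hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    n->next = h->first;
    h->first = n;
}

/* Walk the list with the three-argument iterator and sum the values. */
static int sum_items(struct hlist_head *h)
{
    struct item *it;
    int sum = 0;

    demo_hlist_for_each_entry(it, h, node)
        sum += it->value;
    return sum;
}
```

Callers that previously used the extra node cursor can reach the same node through the entry, for example `&it->node`, which corresponds to the 'obj->member' rewrite mentioned in the commit message.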
2012-06-15 net: remove skb_orphan_try() (Eric Dumazet; 1 file, -1/+0)
Orphaning skb in dev_hard_start_xmit() makes bonding behavior unfriendly for applications sending big UDP bursts: once packets pass the bonding device and come to the real device, they might hit a full qdisc and be dropped. Without orphaning, the sender is automatically throttled because sk->sk_wmem_alloc reaches sk->sk_sndbuf (assuming sk_sndbuf is not too big). We could try to defer the orphaning by adding another test in dev_hard_start_xmit(), but all this seems of little gain, now that BQL tends to make packets more likely to be parked in Qdisc queues instead of the NIC TX ring, in cases where performance matters. Reverts commits fc6055a5ba31 (net: Introduce skb_orphan_try()) and 87fd308cfc6b (net: skb_tx_hash() fix relative to skb_orphan_try()), and removes the SKBTX_DRV_NEEDS_SK_REF flag. Reported-and-bisected-by: Jean-Michel Hautbois <jhautbois@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-07 af_iucv: add shutdown for HS transport (Ursula Braun; 1 file, -27/+52)
AF_IUCV sockets offer a shutdown function. This patch makes sure shutdown works for HS transport as well. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-07 af_iucv: handle netdev events (Ursula Braun; 1 file, -44/+62)
In case of transport through HiperSockets the underlying network interface may switch to DOWN state or the underlying network device may recover. In both cases the socket must change to IUCV_DISCONN state. If the interface goes down, af_iucv has a chance to notify its connection peer in addition. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-08 af_iucv: allow retrieval of maximum message size (Ursula Braun; 1 file, -1/+9)
For HS transport the maximum message size depends on the MTU-size of the HS-device bound to the AF_IUCV socket. This patch adds a getsockopt option MSGSIZE returning the maximum message size that can be handled for this AF_IUCV socket. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
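From user space, querying this option might look like the sketch below. Several things here are assumptions rather than facts taken from the commit message: the numeric values of `SOL_IUCV` and `SO_MSGSIZE` (they should be checked against the kernel headers on the target system), the use of an `int` as the option value, and the helper name `iucv_max_msgsize()`, which is hypothetical.

```c
#include <sys/socket.h>

/*
 * Assumed values; verify against include/linux/socket.h and
 * include/net/iucv/af_iucv.h before relying on them.
 */
#ifndef SOL_IUCV
#define SOL_IUCV 277
#endif
#ifndef SO_MSGSIZE
#define SO_MSGSIZE 0x0800
#endif

/*
 * Hypothetical helper: returns the maximum message size reported for
 * the AF_IUCV socket 'fd', or -1 if getsockopt() fails (errno is set).
 */
static int iucv_max_msgsize(int fd)
{
    int size = 0;
    socklen_t len = sizeof(size);

    if (getsockopt(fd, SOL_IUCV, SO_MSGSIZE, &size, &len) < 0)
        return -1;
    return size;
}
```

An application would typically call such a helper after the socket has been bound or connected and size its send buffers accordingly.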
2012-02-08 af_iucv: change net_device handling for HS transport (Ursula Braun; 1 file, -57/+62)
This patch saves the net_device in the iucv_sock structure during bind in order to speed up skb sending. In addition some other small improvements are made for HS transport:
- error checking when sending skbs
- locking changes in afiucv_hs_callback_txnotify
- skb freeing in afiucv_hs_callback_txnotify
And finally it contains code cleanup to get rid of iucv_skb_queue_purge. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-08 af_iucv: block writing if msg limit is exceeded (Ursula Braun; 1 file, -1/+1)
When polling on an AF_IUCV socket, writing should be blocked if the number of pending messages exceeds a defined limit. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-08 af_iucv: remove IUCV-pathes completely (Ursula Braun; 1 file, -34/+37)
A SEVER is missing in the callback that handles a received SEVERED. This may prevent z/VM from removing the corresponding IUCV path completely. This patch adds a SEVER in iucv_callback_connrej (together with additional locking). Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-20 af_iucv: get rid of state IUCV_SEVERED (Ursula Braun; 1 file, -27/+8)
af_iucv distinguishes unnecessarily between the states IUCV_SEVERED and IUCV_DISCONN. This patch removes state IUCV_SEVERED. While simplifying af_iucv, this patch removes the second invocation of cpcmd as well. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-20 af_iucv: remove unused timer infrastructure (Ursula Braun; 1 file, -22/+0)
af_iucv contains timer infrastructure which is not exploited. This patch removes the timer related code parts. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-20 af_iucv: release reference to HS device (Ursula Braun; 1 file, -13/+24)
For HiperSockets transport skbs sent are bound to one of the available HiperSockets devices. Add missing release of reference to a HiperSockets device before freeing an skb. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-20 af_iucv: accelerate close for HS transport (Ursula Braun; 1 file, -0/+7)
Closing an af_iucv socket may wait for confirmation of outstanding send requests. This patch adds confirmation code for the new HiperSockets transport. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-20 af_iucv: support ancillary data with HS transport (Ursula Braun; 1 file, -0/+2)
The AF_IUCV address family offers support for ancillary data. This patch enables usage of ancillary data with the new HiperSockets transport. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-13 net: more accurate skb truesize (Eric Dumazet; 1 file, -1/+1)
skb truesize currently accounts for sk_buff struct and part of skb head. kmalloc() roundings are also ignored. Considering that skb_shared_info is larger than sk_buff, it's time to take it into account for better memory accounting. This patch introduces SKB_TRUESIZE(X) macro to centralize various assumptions into a single place. At skb alloc phase, we put skb_shared_info struct at the exact end of skb head, to allow a better use of memory (lowering number of reallocations), since kmalloc() gives us power-of-two memory blocks. Unless SLUB/SLUB debug is active, both skb->head and skb_shared_info are aligned to cache lines, as before. Note: This patch might trigger performance regressions because of misconfigured protocol stacks, hitting per socket or global memory limits that were previously not reached. But it's a necessary step for a more accurate memory accounting. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Andi Kleen <ak@linux.intel.com> CC: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-13 af_iucv: add HiperSockets transport (Ursula Braun; 1 file, -72/+677)
The current transport mechanism for af_iucv is the z/VM offered communications facility IUCV. To provide equivalent support when running Linux in an LPAR, HiperSockets transport is added to the AF_IUCV address family. It requires explicit binding of an AF_IUCV socket to a HiperSockets device. A new packet_type ETH_P_AF_IUCV is announced. An af_iucv specific transport header is defined preceding the skb data. A small protocol is implemented for connecting and for flow control/congestion management. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-13 af_iucv: cleanup - use iucv_sk(sk) early (Ursula Braun; 1 file, -21/+23)
Code cleanup making use of a local variable for struct iucv_sock. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-13 af_iucv: use loadable iucv interface (Frank Blaschka; 1 file, -45/+74)
For future af_iucv extensions the module should be able to run in LPAR mode too. For this we use the new dynamic loading iucv interface. Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13 af_iucv: get rid of compile warning (Ursula Braun; 1 file, -7/+2)
-Wunused-but-set-variable generates compile warnings. The affected variables are removed. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-31 Fix common misspellings (Lucas De Marchi; 1 file, -1/+1)
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2010-05-26 net/iucv: Add missing spin_unlock (Julia Lawall; 1 file, -1/+1)
Add a spin_unlock missing on the error path. There seems to be no reason why the lock should continue to be held if the kzalloc fails. The semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression E1; @@ * spin_lock(E1,...); <+... when != E1 if (...) { ... when != E1 * return ...; } ...+> * spin_unlock(E1,...); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
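The pattern the semantic match looks for, an early return between lock and unlock, can be modelled in user space with a pthread mutex standing in for the spinlock. This is only an analogy under stated assumptions: `demo_lock` and `demo_add_entry()` are hypothetical, and the `fail` parameter simulates kzalloc() returning NULL.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical worker that allocates while holding the lock.
 * Returns 0 on success and -1 if the allocation fails.
 */
static int demo_add_entry(int fail, void **out)
{
    void *p;

    pthread_mutex_lock(&demo_lock);
    p = fail ? NULL : calloc(1, 16);
    if (!p) {
        /* Without this unlock the error path returns with the lock held. */
        pthread_mutex_unlock(&demo_lock);
        return -1;
    }
    *out = p;
    pthread_mutex_unlock(&demo_lock);
    return 0;
}
```

After a failed call the lock can be taken again immediately, which is the property the one-line fix restores.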
2010-05-17 net: Remove unnecessary returns from void function()s (Joe Perches; 1 file, -1/+0)
This patch removes from net/ (but not any netfilter files) all the unnecessary return; statements that precede the last closing brace of void functions. It does not remove the returns that are immediately preceded by a label as gcc doesn't like that. Done via: $ grep -rP --include=*.[ch] -l "return;\n}" net/ | \ xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }' Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-01 net: sock_def_readable() and friends RCU conversion (Eric Dumazet; 1 file, -4/+7)
sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we need two atomic operations (and associated dirtying) per incoming packet. RCU conversion is pretty much needed:
1) Add a new structure, called "struct socket_wq" to hold all fields that will need rcu_read_lock() protection (currently: a wait_queue_head_t and a struct fasync_struct pointer). [Future patch will add a list anchor for wakeup coalescing]
2) Attach one of such structure to each "struct socket" created in sock_alloc_inode().
3) Respect RCU grace period when freeing a "struct socket_wq"
4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct socket_wq"
5) Change sk_sleep() function to use new sk->sk_wq instead of sk->sk_sleep
6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside a rcu_read_lock() section.
7) Change all sk_has_sleeper() callers to:
   - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
   - Use wq_has_sleeper() to eventually wakeup tasks.
   - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)
8) sock_wake_async() is modified to use rcu protection as well.
9) Exceptions: macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq" instead of dynamically allocated ones. They don't need rcu freeing.
Some cleanups or followups are probably needed, (possible sk_callback_lock conversion to a spinlock for example...). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-20 net: sk_sleep() helper (Eric Dumazet; 1 file, -6/+6)
Define a new function to return the waitqueue of a "struct sock". static inline wait_queue_head_t *sk_sleep(struct sock *sk) { return sk->sk_sleep; } Change all read occurrences of sk_sleep by a call to this function. Needed for a future RCU conversion. sk_sleep won't be a field directly available. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-15 const: constify remaining dev_pm_ops (Alexey Dobriyan; 1 file, -1/+1)
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-05 net: pass kern to net_proto_family create function (Eric Paris; 1 file, -1/+2)
The generic __sock_create function has a kern argument which allows the security system to make decisions based on if a socket is being created by the kernel or by userspace. This patch passes that flag to the net_proto_family specific create function, so it can do the same thing. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-17 af_iucv: remove duplicate sock_set_flag (Ursula Braun; 1 file, -1/+0)
Remove duplicate sock_set_flag(sk, SOCK_ZAPPED) in iucv_sock_close, which has been overlooked in September-commit 7514bab04e567c9408fe0facbde4277f09d5eb74. Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-17 af_iucv: use sk functions to modify sk->sk_ack_backlog (Hendrik Brueckner; 1 file, -2/+2)
Instead of modifying sk->sk_ack_backlog directly, use respective socket functions. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 net: mark net_proto_ops as const (Stephen Hemminger; 1 file, -1/+1)
All usages of structure net_proto_ops should be declared const. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-30 net: Make setsockopt() optlen be unsigned. (David S. Miller; 1 file, -1/+1)
This provides safety against negative optlen at the type level instead of depending upon (sometimes non-trivial) checks against this sprinkled all over the place, in each and every implementation. Based upon work done by Arjan van de Ven and feedback from Linus Torvalds. Signed-off-by: David S. Miller <davem@davemloft.net>
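Why a signed length is fragile can be shown with two hypothetical validation helpers. Neither function comes from the kernel; they only model a handler that checks an upper bound on the option length and forgets the lower bound.

```c
#include <stddef.h>

/*
 * Hypothetical check with a signed length: only the upper bound is
 * tested, so a negative value such as -1 is accepted.
 */
static int optlen_ok_signed(int optlen)
{
    return optlen <= (int)sizeof(int);
}

/*
 * Same check with an unsigned length: a value that started out as -1
 * is now a very large number and is rejected by the upper bound alone.
 */
static int optlen_ok_unsigned(unsigned int optlen)
{
    return optlen <= sizeof(int);
}
```

With the unsigned parameter there is no negative case left for an implementation to forget, which is the type-level safety the commit message refers to.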
2009-09-16 af_iucv: fix race when queueing skbs on the backlog queue (Hendrik Brueckner; 1 file, -2/+14)
iucv_sock_recvmsg() and iucv_process_message()/iucv_fragment_skb race for dequeuing an skb from the backlog queue. If iucv_sock_recvmsg() dequeues first, iucv_process_message() calls sock_queue_rcv_skb() with an skb that is NULL. This results in the following kernel panic: <1>Unable to handle kernel pointer dereference at virtual kernel address (null) <4>Oops: 0004 [#1] PREEMPT SMP DEBUG_PAGEALLOC <4>Modules linked in: af_iucv sunrpc qeth_l3 dm_multipath dm_mod vmur qeth ccwgroup <4>CPU: 0 Not tainted 2.6.30 #4 <4>Process client-iucv (pid: 4787, task: 0000000034e75940, ksp: 00000000353e3710) <4>Krnl PSW : 0704000180000000 000000000043ebca (sock_queue_rcv_skb+0x7a/0x138) <4> R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3 <4>Krnl GPRS: 0052900000000000 000003e0016e0fe8 0000000000000000 0000000000000000 <4> 000000000043eba8 0000000000000002 0000000000000001 00000000341aa7f0 <4> 0000000000000000 0000000000007800 0000000000000000 0000000000000000 <4> 00000000341aa7f0 0000000000594650 000000000043eba8 000000003fc2fb28 <4>Krnl Code: 000000000043ebbe: a7840006 brc 8,43ebca <4> 000000000043ebc2: 5930c23c c %r3,572(%r12) <4> 000000000043ebc6: a724004c brc 2,43ec5e <4> >000000000043ebca: e3c0b0100024 stg %r12,16(%r11) <4> 000000000043ebd0: a7190000 lghi %r1,0 <4> 000000000043ebd4: e310b0200024 stg %r1,32(%r11) <4> 000000000043ebda: c010ffffdce9 larl %r1,43a5ac <4> 000000000043ebe0: e310b0800024 stg %r1,128(%r11) <4>Call Trace: <4>([<000000000043eba8>] sock_queue_rcv_skb+0x58/0x138) <4> [<000003e0016bcf2a>] iucv_process_message+0x112/0x3cc [af_iucv] <4> [<000003e0016bd3d4>] iucv_callback_rx+0x1f0/0x274 [af_iucv] <4> [<000000000053a21a>] iucv_message_pending+0xa2/0x120 <4> [<000000000053b5a6>] iucv_tasklet_fn+0x176/0x1b8 <4> [<000000000014fa82>] tasklet_action+0xfe/0x1f4 <4> [<0000000000150a56>] __do_softirq+0x116/0x284 <4> [<0000000000111058>] do_softirq+0xe4/0xe8 <4> [<00000000001504ba>] irq_exit+0xba/0xd8 <4> [<000000000010e0b2>] do_extint+0x146/0x190 <4> 
[<00000000001184b6>] ext_no_vtime+0x1e/0x22 <4> [<00000000001fbf4e>] kfree+0x202/0x28c <4>([<00000000001fbf44>] kfree+0x1f8/0x28c) <4> [<000000000044205a>] __kfree_skb+0x32/0x124 <4> [<000003e0016bd8b2>] iucv_sock_recvmsg+0x236/0x41c [af_iucv] <4> [<0000000000437042>] sock_aio_read+0x136/0x160 <4> [<0000000000205e50>] do_sync_read+0xe4/0x13c <4> [<0000000000206dce>] vfs_read+0x152/0x15c <4> [<0000000000206ed0>] SyS_read+0x54/0xac <4> [<0000000000117c8e>] sysc_noemu+0x10/0x16 <4> [<00000042ff8def3c>] 0x42ff8def3c Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-16 af_iucv: do not call iucv_sock_kill() twice (Hendrik Brueckner; 1 file, -5/+5)
For non-accepted sockets on the accept queue, iucv_sock_kill() is called twice (in iucv_sock_close() and iucv_sock_cleanup_listen()). This typically results in a kernel oops as shown below.

Remove the duplicate call to iucv_sock_kill() and set the SOCK_ZAPPED flag in iucv_sock_close() only.

The iucv_sock_kill() function frees a socket only if the socket is zapped and orphaned (sk->sk_socket == NULL):
- Non-accepted sockets are always orphaned and, thus, iucv_sock_kill() frees the socket twice.
- For accepted sockets or sockets created with iucv_sock_create(), sk->sk_socket is initialized. This caused the first call to iucv_sock_kill() to return immediately. To free these sockets, iucv_sock_release() uses sock_orphan() before calling iucv_sock_kill().

<1>Unable to handle kernel pointer dereference at virtual kernel address 000000003edd3000
<4>Oops: 0011 [#1] PREEMPT SMP DEBUG_PAGEALLOC
<4>Modules linked in: af_iucv sunrpc qeth_l3 dm_multipath dm_mod qeth vmur ccwgroup
<4>CPU: 0 Not tainted 2.6.30 #4
<4>Process iucv_sock_close (pid: 2486, task: 000000003aea4340, ksp: 000000003b75bc68)
<4>Krnl PSW : 0704200180000000 000003e00168e23a (iucv_sock_kill+0x2e/0xcc [af_iucv])
<4> R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3
<4>Krnl GPRS: 0000000000000000 000000003b75c000 000000003edd37f0 0000000000000001
<4> 000003e00168ec62 000000003988d960 0000000000000000 000003e0016b0608
<4> 000000003fe81b20 000000003839bb58 00000000399977f0 000000003edd37f0
<4> 000003e00168b000 000003e00168f138 000000003b75bcd0 000000003b75bc98
<4>Krnl Code: 000003e00168e22a: c0c0ffffe6eb larl %r12,3e00168b000
<4> 000003e00168e230: b90400b2 lgr %r11,%r2
<4> 000003e00168e234: e3e0f0980024 stg %r14,152(%r15)
<4> >000003e00168e23a: e310225e0090 llgc %r1,606(%r2)
<4> 000003e00168e240: a7110001 tmll %r1,1
<4> 000003e00168e244: a7840007 brc 8,3e00168e252
<4> 000003e00168e248: d507d00023c8 clc 0(8,%r13),968(%r2)
<4> 000003e00168e24e: a7840009 brc 8,3e00168e260
<4>Call Trace:
<4>([<000003e0016b0608>] afiucv_dbf+0x0/0xfffffffffffdea20 [af_iucv])
<4> [<000003e00168ec6c>] iucv_sock_close+0x130/0x368 [af_iucv]
<4> [<000003e00168ef02>] iucv_sock_release+0x5e/0xe4 [af_iucv]
<4> [<0000000000438e6c>] sock_release+0x44/0x104
<4> [<0000000000438f5e>] sock_close+0x32/0x50
<4> [<0000000000207898>] __fput+0xf4/0x250
<4> [<00000000002038aa>] filp_close+0x7a/0xa8
<4> [<00000000002039ba>] SyS_close+0xe2/0x148
<4> [<0000000000117c8e>] sysc_noemu+0x10/0x16
<4> [<00000042ff8deeac>] 0x42ff8deeac

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
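The "zapped and orphaned" rule described in this commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct model_sock`, `model_sock_kill()` and `model_sock_orphan()` are hypothetical stand-ins invented for illustration and are not the kernel's `struct sock`, `iucv_sock_kill()` or `sock_orphan()`. Instead of really freeing memory, the model counts how often a free would happen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not the kernel's struct sock. */
struct model_sock {
    bool  zapped;      /* models the SOCK_ZAPPED flag */
    void *socket;      /* models sk->sk_socket; NULL means orphaned */
    int   free_count;  /* how often the model would have freed the socket */
};

/*
 * Models the rule from the commit text: free the socket only if it is
 * zapped and orphaned. Returns true if this call "freed" the socket.
 */
static bool model_sock_kill(struct model_sock *sk)
{
    assert(sk != NULL);
    if (!sk->zapped || sk->socket != NULL)
        return false;       /* still attached or not zapped: do nothing */
    sk->free_count++;       /* a real implementation would free here */
    return true;
}

/* Models orphaning: detach the socket from its owner. */
static void model_sock_orphan(struct model_sock *sk)
{
    assert(sk != NULL);
    sk->socket = NULL;
}
```

In this model, a zapped socket whose `socket` field is already NULL (the non-accepted case) is "freed" by every call to `model_sock_kill()`, so two calls give a `free_count` of 2, which corresponds to the double free behind the oops. A socket with a non-NULL `socket` field is skipped until `model_sock_orphan()` has been called, and is then freed exactly once.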
2009-09-16af_iucv: handle non-accepted sockets after resuming from suspendHendrik Brueckner1-0/+1
After resuming from suspend, all af_iucv sockets are disconnected. Ensure that iucv_accept_dequeue() can handle disconnected sockets which are not yet accepted. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-16af_iucv: fix race in __iucv_sock_wait()Hendrik Brueckner1-1/+1
Move prepare_to_wait() before the condition check to avoid a race between schedule_timeout() and the wake-up. The race can appear during iucv_sock_connect() and iucv_callback_connack(). Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
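Why the ordering matters can be shown with a deterministic, single-threaded model of the two orderings. This is a sketch under assumptions, not the kernel macro: `struct wait_model` and all function names below are hypothetical, and the racing wake-up is injected by hand at the critical point instead of coming from another CPU. The model assumes that a wake-up only reaches waiters that are already on the wait queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one waiter and the condition it waits for. */
struct wait_model {
    bool condition;  /* the state the waiter is waiting for */
    bool queued;     /* waiter is registered on the wait queue */
    bool woken;      /* a wake-up was delivered to the waiter */
};

/* Models the waker: set the condition, then wake only queued waiters. */
static void model_wake_up(struct wait_model *w)
{
    assert(w != NULL);
    w->condition = true;
    if (w->queued)
        w->woken = true;
}

/* Racy order: check, (wake-up arrives), register. True = wake-up missed. */
static bool check_then_prepare_misses(struct wait_model *w)
{
    bool cond = w->condition;  /* 1. condition is still false */
    model_wake_up(w);          /* wake-up fires, nobody is queued yet */
    w->queued = true;          /* 2. registration comes too late */
    return !cond && !w->woken; /* waiter sleeps although condition is set */
}

/* Fixed order: register, (wake-up arrives), check. True = wake-up missed. */
static bool prepare_then_check_misses(struct wait_model *w)
{
    w->queued = true;          /* 1. register before looking */
    model_wake_up(w);          /* wake-up now finds a queued waiter */
    bool cond = w->condition;  /* 2. the check also sees the new state */
    return !cond && !w->woken;
}
```

Starting from a zero-initialized `struct wait_model`, the first function reports a missed wake-up and the second does not. In the model, a missed wake-up means the waiter goes to sleep even though the condition already holds; with a timeout it would only recover once the timeout expires.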
2009-09-14net: constify remaining proto_opsAlexey Dobriyan1-2/+2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-09net: adding memory barrier to the poll and receive callbacksJiri Olsa1-2/+2
Add a memory barrier after the poll_wait function, paired with the receive callbacks. Add the functions sock_poll_wait and sk_has_sleeper to wrap the memory barrier.

Without the memory barrier, the following race can happen. The race fires when the following code paths meet and the tp->rcv_nxt and __add_wait_queue updates stay in CPU caches.

    CPU1                         CPU2

    sys_select                   receive packet
      ...                        ...
      __add_wait_queue           update tp->rcv_nxt
      ...                        ...
      tp->rcv_nxt check          sock_def_readable
      ...                        {
      schedule                      ...
                                    if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
                                        wake_up_interruptible(sk->sk_sleep)
                                    ...
                                 }

If there was no cache the code would work ok, since the wait_queue and rcv_nxt are opposite to each other. Meaning that once tp->rcv_nxt is updated by CPU2, the CPU1 either already passed the tp->rcv_nxt check and sleeps, or will get the new value for tp->rcv_nxt and will return with new data mask. In both cases the process (CPU1) is being added to the wait queue, so the waitqueue_active (CPU2) call cannot miss and will wake up CPU1.

The bad case is when the __add_wait_queue changes done by CPU1 stay in its cache, and so does the tp->rcv_nxt update on CPU2 side. The CPU1 will then end up calling schedule and sleep forever if there are no more data on the socket.

Calls to poll_wait in the following modules were omitted:
    net/bluetooth/af_bluetooth.c
    net/irda/af_irda.c
    net/irda/irnet/irnet_ppp.c
    net/mac80211/rc80211_pid_debugfs.c
    net/phonet/socket.c
    net/rds/af_rds.c
    net/rfkill/core.c
    net/sunrpc/cache.c
    net/sunrpc/rpc_pipe.c
    net/tipc/socket.c

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
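The pairing described above (each side writes its own variable, issues a full barrier, then reads the other side's variable) can be sketched in userspace with C11 atomics. This is an analogy under stated assumptions, not the kernel code: `struct model_sock` and the `model_*` functions are hypothetical names, `has_sleeper` stands in for a non-empty wait queue, and `atomic_thread_fence(memory_order_seq_cst)` plays the role of the kernel's full memory barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the wait queue and tp->rcv_nxt. */
struct model_sock {
    atomic_bool has_sleeper;  /* models "wait queue is not empty" */
    atomic_uint rcv_nxt;      /* models tp->rcv_nxt */
};

static void model_sock_init(struct model_sock *sk)
{
    assert(sk != NULL);
    atomic_init(&sk->has_sleeper, false);
    atomic_init(&sk->rcv_nxt, 0u);
}

/*
 * CPU1 side: register as sleeper, full barrier, then check for data.
 * Returns true if no new data was seen, i.e. the caller may sleep.
 */
static bool model_poll_may_sleep(struct model_sock *sk, unsigned int seen)
{
    atomic_store_explicit(&sk->has_sleeper, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&sk->rcv_nxt, memory_order_relaxed) == seen;
}

/*
 * CPU2 side: publish new data, full barrier, then check for a sleeper.
 * Returns true if a wake-up has to be issued.
 */
static bool model_receive_needs_wakeup(struct model_sock *sk, unsigned int next)
{
    atomic_store_explicit(&sk->rcv_nxt, next, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&sk->has_sleeper, memory_order_relaxed);
}
```

With both fences in place, the two sides cannot both miss each other's store: either the poller sees the new `rcv_nxt` and does not sleep, or the receiver sees the sleeper and issues the wake-up. Dropping either fence reintroduces the window that the commit message describes.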