aboutsummaryrefslogtreecommitdiffstats
path: root/net (follow)
AgeCommit message (Collapse)AuthorFilesLines
2016-01-07phy: Centralise print about attached phyAndrew Lunn1-3/+2
Many Ethernet drivers contain the same netdev_info() print statement about the attached phy. Move it into the phy device code. Additionally add a varargs function which can be used to append additional information. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller7-33/+62
2016-01-06tcp: fix zero cwnd in tcp_cwnd_reductionYuchung Cheng1-0/+3
Patch 3759824da87b ("tcp: PRR uses CRB mode by default and SS mode conditionally") introduced a bug that cwnd may become 0 when both inflight and sndcnt are 0 (cwnd = inflight + sndcnt). This may lead to a div-by-zero if the connection starts another cwnd reduction phase by setting tp->prior_cwnd to the current cwnd (0) in tcp_init_cwnd_reduction(). To prevent this we skip PRR operation when nothing is acked or sacked. Then cwnd must be positive in all cases as long as ssthresh is positive: 1) The proportional reduction mode inflight > ssthresh > 0 2) The reduction bound mode a) inflight == ssthresh > 0 b) inflight < ssthresh sndcnt > 0 since newly_acked_sacked > 0 and inflight < ssthresh Therefore in all cases inflight and sndcnt can not both be 0. We check invalid tp->prior_cwnd to avoid potential div0 bugs. In reality this bug is triggered only with a sequence of less common events. For example, the connection is terminating an ECN-triggered cwnd reduction with an inflight 0, then it receives reordered/old ACKs or DSACKs from prior transmission (which acks nothing). Or the connection is in fast recovery stage that marks everything lost, but fails to retransmit due to local issues, then receives data packets from other end which acks nothing. Fixes: 3759824da87b ("tcp: PRR uses CRB mode by default and SS mode conditionally") Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06net: Add eth_platform_get_mac_address() helper.David S. Miller1-0/+31
A repeating pattern in drivers has become to use OF node information and, if not found, platform specific host information to extract the ethernet address for a given device. Currently this is done with a call to of_get_mac_address() and then some ifdef'd stuff for SPARC. Consolidate this into a portable routine, and provide the arch_get_platform_mac_address() weak function hook for all architectures to implement if they want. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06net: possible use after free in dst_releaseFrancesco Ruggeri1-1/+2
dst_release should not access dst->flags after decrementing __refcnt to 0. The dst_entry may be in dst_busy_list and dst_gc_task may dst_destroy it before dst_release gets a chance to access dst->flags. Fixes: d69bbf88c8d0 ("net: fix a race in dst_release()") Fixes: 27b75c95f10d ("net: avoid RCU for NOCACHE dst") Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06bridge: add vlan filtering change for new bridged deviceElad Raz1-0/+11
Notifying hardware about newly bridged port vlan-aware changes. Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06bridge: add vlan filtering change notificationElad Raz1-2/+16
Notifying hardware about bridge vlan-aware changes. Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06bridge: Propagate vlan add failure to userElad Raz1-1/+10
Disallow adding interfaces to a bridge when vlan filtering operation failed. Send the failure code to the user. Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06net: sched: fix missing free per cpu on qstatsJohn Fastabend1-1/+3
When a qdisc is using per cpu stats (currently just the ingress qdisc) only the bstats are being freed. This also free's the qstats. Fixes: b0ab6f92752b9f9d8 ("net: sched: enable per cpu qstats") Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06soreuseport: change consume_skb to kfree_skb in error caseCraig Gallek1-1/+1
Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF") Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Craig Gallek <kraig@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06soreuseport: pass skb to secondary UDP socket lookupCraig Gallek2-8/+12
This socket-lookup path did not pass along the skb in question in my original BPF-based socket selection patch. The skb in the udpN_lib_lookup2 path can be used for BPF-based socket selection just like it is in the 'traditional' udpN_lib_lookup path. udpN_lib_lookup2 kicks in when there are greater than 10 sockets in the same hlist slot. Coincidentally, I chose 10 sockets per reuseport group in my functional test, so the lookup2 path was not excersised. This adds an additional set of tests with 20 sockets. Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF") Fixes: 3ca8e4029969 ("soreuseport: BPF selection functional test") Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Craig Gallek <kraig@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05inet: kill unused skb_free opFlorian Westphal4-12/+1
The only user was removed in commit 029f7f3b8701cc7a ("netfilter: ipv6: nf_defrag: avoid/free clone operations"). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05bridge: Only call /sbin/bridge-stp for the initial network namespaceHannes Frederic Sowa1-1/+4
[I stole this patch from Eric Biederman. He wrote:] > There is no defined mechanism to pass network namespace information > into /sbin/bridge-stp therefore don't even try to invoke it except > for bridge devices in the initial network namespace. > > It is possible for unprivileged users to cause /sbin/bridge-stp to be > invoked for any network device name which if /sbin/bridge-stp does not > guard against unreasonable arguments or being invoked twice on the > same network device could cause problems. [Hannes: changed patch using netns_eq] Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05sctp: remove the local_bh_disable/enable in sctp_endpoint_lookup_assocXin Long1-16/+1
sctp_endpoint_lookup_assoc is called in the protection of sock lock there is no need to call local_bh_disable in this function. so remove them. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05sctp: drop the old assoc hashtable of sctpXin Long4-96/+3
transport hashtable will replace the association hashtable, so association hashtable is not used in sctp any more, so drop the codes about that. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05sctp: apply rhashtable api to sctp procfsXin Long1-143/+173
Traversal the transport rhashtable, get the association only once through the condition assoc->peer.primary_path != transport. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05sctp: apply rhashtable api to send/recv pathXin Long4-56/+29
apply lookup apis to two functions, for __sctp_endpoint_lookup_assoc and __sctp_lookup_association, it's invoked in the protection of sock lock, it will be safe, but sctp_lookup_association need to call rcu_read_lock() and to detect the t->dead to protect it. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05sctp: add the rhashtable apis for sctp global transport hashtableXin Long1-0/+131
tranport hashtbale will replace the association hashtable to do the lookup for transport, and then get association by t->assoc, rhashtable apis will be used because of it's resizable, scalable and using rcu. lport + rport + paddr will be the base hashkey to locate the chain, with net to protect one netns from another, then plus the laddr to compare to get the target. this patch will provider the lookup functions: - sctp_epaddr_lookup_transport - sctp_addrs_lookup_transport hash/unhash functions: - sctp_hash_transport - sctp_unhash_transport init/destroy functions: - sctp_transport_hashtable_init - sctp_transport_hashtable_destroy Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04af_unix: Fix splice-bind deadlockRainer Weikusat1-26/+40
On 2015/11/06, Dmitry Vyukov reported a deadlock involving the splice system call and AF_UNIX sockets, http://lists.openwall.net/netdev/2015/11/06/24 The situation was analyzed as (a while ago) A: socketpair() B: splice() from a pipe to /mnt/regular_file does sb_start_write() on /mnt C: try to freeze /mnt wait for B to finish with /mnt A: bind() try to bind our socket to /mnt/new_socket_name lock our socket, see it not bound yet decide that it needs to create something in /mnt try to do sb_start_write() on /mnt, block (it's waiting for C). D: splice() from the same pipe to our socket lock the pipe, see that socket is connected try to lock the socket, block waiting for A B: get around to actually feeding a chunk from pipe to file, try to lock the pipe. Deadlock. on 2015/11/10 by Al Viro, http://lists.openwall.net/netdev/2015/11/10/4 The patch fixes this by removing the kern_path_create related code from unix_mknod and executing it as part of unix_bind prior acquiring the readlock of the socket in question. This means that A (as used above) will sb_start_write on /mnt before it acquires the readlock, hence, it won't indirectly block B which first did a sb_start_write and then waited for a thread trying to acquire the readlock. Consequently, A being blocked by C waiting for B won't cause a deadlock anymore (effectively, both A and B acquire two locks in opposite order in the situation described above). Dmitry Vyukov(<dvyukov@google.com>) tested the original patch. Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04net: Propagate lookup failure in l3mdev_get_saddr to callerDavid Ahern2-4/+10
Commands run in a vrf context are not failing as expected on a route lookup: root@kenny:~# ip ro ls table vrf-red unreachable default root@kenny:~# ping -I vrf-red -c1 -w1 10.100.1.254 ping: Warning: source address might be selected on device other than vrf-red. PING 10.100.1.254 (10.100.1.254) from 0.0.0.0 vrf-red: 56(84) bytes of data. --- 10.100.1.254 ping statistics --- 2 packets transmitted, 0 received, 100% packet loss, time 999ms Since the vrf table does not have a route for 10.100.1.254 the ping should have failed. The saddr lookup causes a full VRF table lookup. Propogating a lookup failure to the user allows the command to fail as expected: root@kenny:~# ping -I vrf-red -c1 -w1 10.100.1.254 connect: No route to host Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPFCraig Gallek6-41/+229
Expose socket options for setting a classic or extended BPF program for use when selecting sockets in an SO_REUSEPORT group. These options can be used on the first socket to belong to a group before bind or on any socket in the group after bind. This change includes refactoring of the existing sk_filter code to allow reuse of the existing BPF filter validation checks. Signed-off-by: Craig Gallek <kraig@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04soreuseport: fast reuseport UDP socket selectionCraig Gallek3-33/+138
Include a struct sock_reuseport instance when a UDP socket binds to a specific address for the first time with the reuseport flag set. When selecting a socket for an incoming UDP packet, use the information available in sock_reuseport if present. This required adding an additional field to the UDP source address equality function to differentiate between exact and wildcard matches. The original use case allowed wildcard matches when checking for existing port uses during bind. The new use case of adding a socket to a reuseport group requires exact address matching. Performance test (using a machine with 2 CPU sockets and a total of 48 cores): Create reuseport groups of varying size. Use one socket from this group per user thread (pinning each thread to a different core) calling recvmmsg in a tight loop. Record number of messages received per second while saturating a 10G link. 10 sockets: 18% increase (~2.8M -> 3.3M pkts/s) 20 sockets: 14% increase (~2.9M -> 3.3M pkts/s) 40 sockets: 13% increase (~3.0M -> 3.4M pkts/s) This work is based off a similar implementation written by Ying Cai <ycai@google.com> for implementing policy-based reuseport selection. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04soreuseport: define reuseport groupsCraig Gallek2-1/+174
struct sock_reuseport is an optional shared structure referenced by each socket belonging to a reuseport group. When a socket is bound to an address/port not yet in use and the reuseport flag has been set, the structure will be allocated and attached to the newly bound socket. When subsequent calls to bind are made for the same address/port, the shared structure will be updated to include the new socket and the newly bound socket will reference the group structure. Usually, when an incoming packet was destined for a reuseport group, all sockets in the same group needed to be considered before a dispatching decision was made. With this structure, an appropriate socket can be found after looking up just one socket in the group. This shared structure will also allow for more complicated decisions to be made when selecting a socket (eg a BPF filter). This work is based off a similar implementation written by Ying Cai <ycai@google.com> for implementing policy-based reuseport selection. Signed-off-by: Craig Gallek <kraig@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04Merge tag 'nfc-next-4.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-nextDavid S. Miller6-5/+57
Samuel Ortiz says: ==================== NFC 4.5 pull request This is the first NFC pull request for 4.5 and it brings: - A new driver for the STMicroelectronics ST95HF NFC chipset. The ST95HF is an NFC digital transceiver with an embedded analog front-end and as such relies on the Linux NFC digital implementation. This is the 3rd user of the NFC digital stack. - ACPI support for the ST st-nci and st21nfca drivers. - A small improvement for the nfcsim driver, as we can now tune the Rx delay through sysfs. - A bunch of minor cleanups and small fixes from Christophe Ricard, for a few drivers and the NFC core code. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04udp: properly support MSG_PEEK with truncated buffersEric Dumazet2-4/+8
Backport of this upstream commit into stable kernels : 89c22d8c3b27 ("net: Fix skb csum races when peeking") exposed a bug in udp stack vs MSG_PEEK support, when user provides a buffer smaller than skb payload. In this case, skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov); returns -EFAULT. This bug does not happen in upstream kernels since Al Viro did a great job to replace this into : skb_copy_and_csum_datagram_msg(skb, sizeof(struct udphdr), msg); This variant is safe vs short buffers. For the time being, instead reverting Herbert Xu patch and add back skb->ip_summed invalid changes, simply store the result of udp_lib_checksum_complete() so that we avoid computing the checksum a second time, and avoid the problematic skb_copy_and_csum_datagram_iovec() call. This patch can be applied on recent kernels as it avoids a double checksumming, then backported to stable kernels as a bug fix. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04l2tp: rely on ppp layer for skb scrubbingGuillaume Nault1-16/+1
Since 79c441ae505c ("ppp: implement x-netns support"), the PPP layer calls skb_scrub_packet() whenever the skb is received on the PPP device. Manually resetting packet meta-data in the L2TP layer is thus redundant. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller15-87/+107
2015-12-31Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-nextDavid S. Miller5-52/+28
Johan Hedberg says: ==================== pull request: bluetooth-next 2015-12-31 Here's (probably) the last bluetooth-next pull request for the 4.5 kernel: - Add support for BCM2E65 ACPI ID - Minor fixes/cleanups in the bcm203x & bfusb drivers - Minor debugfs related fix in 6lowpan code Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-31ethtool: Add phy statisticsAndrew Lunn1-1/+80
Ethernet PHYs can maintain statistics, for example errors while idle and receive errors. Add an ethtool mechanism to retrieve these statistics, using the same model as MAC statistics. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-30sctp: sctp should release assoc when sctp_make_abort_user return NULL in sctp_closeXin Long2-4/+5
In sctp_close, sctp_make_abort_user may return NULL because of memory allocation failure. If this happens, it will bypass any state change and never free the assoc. The assoc has no chance to be freed and it will be kept in memory with the state it had even after the socket is closed by sctp_close(). So if sctp_make_abort_user fails to allocate memory, we should abort the asoc via sctp_primitive_ABORT as well. Just like the annotation in sctp_sf_cookie_wait_prm_abort and sctp_sf_do_9_1_prm_abort said, "Even if we can't send the ABORT due to low memory delete the TCB. This is a departure from our typical NOMEM handling". But then the chunk is NULL (low memory) and the SCTP_CMD_REPLY cmd would dereference the chunk pointer, and system crash. So we should add SCTP_CMD_REPLY cmd only when the chunk is not NULL, just like other places where it adds SCTP_CMD_REPLY cmd. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-30net, socket, socket_wq: fix missing initialization of flagsNicolai Stange1-0/+1
Commit ceb5d58b2170 ("net: fix sock_wake_async() rcu protection") from the current 4.4 release cycle introduced a new flags member in struct socket_wq and moved SOCKWQ_ASYNC_NOSPACE and SOCKWQ_ASYNC_WAITDATA from struct socket's flags member into that new place. Unfortunately, the new flags field is never initialized properly, at least not for the struct socket_wq instance created in sock_alloc_inode(). One particular issue I encountered because of this is that my GNU Emacs failed to draw anything on my desktop -- i.e. what I got is a transparent window, including the title bar. Bisection lead to the commit mentioned above and further investigation by means of strace told me that Emacs is indeed speaking to my Xorg through an O_ASYNC AF_UNIX socket. This is reproducible 100% of times and the fact that properly initializing the struct socket_wq ->flags fixes the issue leads me to the conclusion that somehow SOCKWQ_ASYNC_WAITDATA got set in the uninitialized ->flags, preventing my Emacs from receiving any SIGIO's due to data becoming available and it got stuck. Make sock_alloc_inode() set the newly created struct socket_wq's ->flags member to zero. Fixes: ceb5d58b2170 ("net: fix sock_wake_async() rcu protection") Signed-off-by: Nicolai Stange <nicstange@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-29openvswitch: Fix template leak in error cases.Joe Stringer1-2/+4
Commit 5b48bb8506c5 ("openvswitch: Fix helper reference leak") fixed a reference leak on helper objects, but inadvertently introduced a leak on the ct template. Previously, ct_info.ct->general.use was initialized to 0 by nf_ct_tmpl_alloc() and only incremented when ovs_ct_copy_action() returned successful. If an error occurred while adding the helper or adding the action to the actions buffer, the __ovs_ct_free_action() cleanup would use nf_ct_put() to free the entry; However, this relies on atomic_dec_and_test(ct_info.ct->general.use). This reference must be incremented first, or nf_ct_put() will never free it. Fix the issue by acquiring a reference to the template immediately after allocation. Fixes: cae3a2627520 ("openvswitch: Allow attaching helpers to ct action") Fixes: 5b48bb8506c5 ("openvswitch: Fix helper reference leak") Signed-off-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-29NFC: nci: memory leak in nci_core_conn_create()Dan Carpenter1-3/+3
I've moved the check for "number_destination_params" forward a few lines to avoid leaking "cmd". Fixes: caa575a86ec1 ('NFC: nci: fix possible crash in nci_core_conn_create') Acked-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-12-29nfc: netlink: HCI event connectivity implementationChristophe Ricard3-0/+51
Add support for missing HCI event EVT_CONNECTIVITY and forward it to userspace. Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-12-29NFC: nci: Fix error check of nci_hci_create_pipe() resultChristophe Ricard1-1/+1
net/nfc/nci/hci.c: In function nci_hci_connect_gate : net/nfc/nci/hci.c:679: warning: comparison is always false due to limited range of data type In case of error, nci_hci_create_pipe() returns NCI_HCI_INVALID_PIPE, and not a negative error code. Correct the check to fix this. Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-12-29NFC: digital: Add Type4A tags supportShikha Singh1-1/+2
The definition of DIGITAL_PROTO_NFCA_RF_TECH is modified to support ISO14443 Type4A tags. Without this change it is not possible to start polling for ISO14443 Type4A tags from the initiator side. Signed-off-by: Shikha Singh <shikha.singh@st.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-12-28sctp: label accepted/peeled off socketsMarcelo Ricardo Leitner1-0/+2
Accepted or peeled off sockets were missing a security label (e.g. SELinux) which means that socket was in "unlabeled" state. This patch clones the sock's label from the parent sock and resolves the issue (similar to AF_BLUETOOTH protocol family). Cc: Paul Moore <pmoore@redhat.com> Cc: David Teigland <teigland@redhat.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-28sctp: use GFP_USER for user-controlled kmallocMarcelo Ricardo Leitner1-3/+6
Commit cacc06215271 ("sctp: use GFP_USER for user-controlled kmalloc") missed two other spots. For connectx, as it's more likely to be used by kernel users of the API, it detects if GFP_USER should be used or not. Fixes: cacc06215271 ("sctp: use GFP_USER for user-controlled kmalloc") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-25ip_tunnel: Move stats update to iptunnel_xmit()Pravin B Shelar7-34/+21
By moving stats update into iptunnel_xmit(), we can simplify iptunnel_xmit() usage. With this change there is no need to call another function (iptunnel_xmit_stats()) to update stats in tunnel xmit code path. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-23bridge: use kobj_to_dev instead of to_devGeliang Tang1-2/+1
kobj_to_dev has been defined in linux/device.h, so I replace to_dev with it. Signed-off-by: Geliang Tang <geliangtang@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-23ipv6: honor ifindex in case we receive ll addresses in router advertisementsHannes Frederic Sowa1-2/+2
Marc Haber reported we don't honor interface indexes when we receive link local router addresses in router advertisements. Luckily the non-strict version of ipv6_chk_addr already does the correct job here, so we can simply use it to lighten the checks and use those addresses by default without any configuration change. Link: <http://permalink.gmane.org/gmane.linux.network/391348> Reported-by: Marc Haber <mh+netdev@zugschlus.de> Cc: Marc Haber <mh+netdev@zugschlus.de> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-22tcp: honour SO_BINDTODEVICE for TW_RST case tooFlorian Westphal3-10/+15
Hannes points out that when we generate tcp reset for timewait sockets we pretend we found no socket and pass NULL sk to tcp_vX_send_reset(). Make it cope with inet tw sockets and then provide tw sk. This makes RSTs appear on correct interface when SO_BINDTODEVICE is used. Packetdrill test case: // want default route to be used, we rely on BINDTODEVICE `ip route del 192.0.2.0/24 via 192.168.0.2 dev tun0` 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 // test case still works due to BINDTODEVICE 0.001 setsockopt(3, SOL_SOCKET, SO_BINDTODEVICE, "tun0", 4) = 0 0.100...0.200 connect(3, ..., ...) = 0 0.100 > S 0:0(0) <mss 1460,sackOK,nop,nop> 0.200 < S. 0:0(0) ack 1 win 32792 <mss 1460,sackOK,nop,nop> 0.200 > . 1:1(0) ack 1 0.210 close(3) = 0 0.210 > F. 1:1(0) ack 1 win 29200 0.300 < . 1:1(0) ack 2 win 46 // more data while in FIN_WAIT2, expect RST 1.300 < P. 1:1001(1000) ack 1 win 46 // fails without this change -- default route is used 1.301 > R 1:1(0) win 0 Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-22tcp: send_reset: test for non-NULL sk firstFlorian Westphal2-9/+8
tcp_md5_do_lookup requires a full socket, so once we extend _send_reset() to also accept timewait socket we would have to change if (!sk && hash_location) to something like if ((!sk || !sk_fullsock(sk)) && hash_location) { ... } else { (sk && sk_fullsock(sk)) tcp_md5_do_lookup() } Switch the two branches: check if we have a socket first, then fall back to a listener lookup if we saw a md5 option (hash_location). Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-22addrconf: always initialize sysctl table dataWANG Cong1-7/+4
When sysctl performs restrict writes, it allows to write from a middle position of a sysctl file, which requires us to initialize the table data before calling proc_dostring() for the write case. Fixes: 3d1bec99320d ("ipv6: introduce secret_stable to ipv6_devconf") Reported-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Tested-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-22Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsecDavid S. Miller3-62/+75
Steffen Klassert says: ==================== pull request (net): ipsec 2015-12-22 Just one patch to fix dst_entries_init with multiple namespaces. From Dan Streetman. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-22net: tcp: deal with listen sockets properly in tcp_abort.Lorenzo Colitti1-0/+5
When closing a listen socket, tcp_abort currently calls tcp_done without clearing the request queue. If the socket has a child socket that is established but not yet accepted, the child socket is then left without a parent, causing a leak. Fix this by setting the socket state to TCP_CLOSE and calling inet_csk_listen_stop with the socket lock held, like tcp_close does. Tested using net_test. With this patch, calling SOCK_DESTROY on a listen socket that has an established but not yet accepted child socket results in the parent and the child being closed, such that they no longer appear in sock_diag dumps. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-22ipv6/addrlabel: fix ip6addrlbl_get()Andrey Ryabinin1-1/+1
ip6addrlbl_get() has never worked. If ip6addrlbl_hold() succeeded, ip6addrlbl_get() will exit with '-ESRCH'. If ip6addrlbl_hold() failed, ip6addrlbl_get() will use about to be free ip6addrlbl_entry pointer. Fix this by inverting ip6addrlbl_hold() check. Fixes: 2a8cc6c89039 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.") Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Cong Wang <cwang@twopensource.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-22switchdev: bridge: Pass ageing time as clock_t instead of jiffiesIdo Schimmel1-1/+1
The bridge's ageing time is offloaded to hardware when: 1) A port joins a bridge 2) The ageing time of the bridge is changed In the first case the ageing time is offloaded as jiffies, but in the second case it's offloaded as clock_t, which is what existing switchdev drivers expect to receive. Fixes: 6ac311ae8bfb ("Adding switchdev ageing notification on port bridged") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-22RDS: don't pretend to use cpu notifiersSebastian Andrzej Siewior1-25/+6
It looks like an attempt to use CPU notifier here which was never completed. Nobody tried to wire it up completely since 2k9. So I unwind this code and get rid of everything not required. Oh look! 19 lines were removed while code still does the same thing. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Tested-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-22net-sysfs: use to_net_dev in net_namespace()Geliang Tang1-2/+2
Use to_net_dev() instead of open-coding it. Signed-off-by: Geliang Tang <geliangtang@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>