path: root/tools/perf/scripts/python/call-graph-from-postgresql.py
2015-11-20  dl2k: Handle memory allocation errors in alloc_list  (Ondrej Zary, 1 file, -85/+97)
If memory allocation fails in alloc_list(), free the already allocated memory and return -ENOMEM. In rio_open(), call alloc_list() first and abort if it fails. Move the HW access (setting RFDListPtr) out of alloc_list().

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
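The shape of that fix, as a minimal sketch in C; the ring and field names (RING_SIZE, rx_skbs, rx_buf_sz) are illustrative stand-ins, not the dl2k driver's actual symbols:

static int alloc_list_sketch(struct netdev_private *np)
{
	int i;

	for (i = 0; i < RING_SIZE; i++) {
		np->rx_skbs[i] = netdev_alloc_skb_ip_align(np->dev, np->rx_buf_sz);
		if (!np->rx_skbs[i])
			goto err_free;		/* unwind instead of leaking */
	}
	return 0;

err_free:
	while (--i >= 0) {
		dev_kfree_skb(np->rx_skbs[i]);
		np->rx_skbs[i] = NULL;
	}
	return -ENOMEM;				/* rio_open() aborts on this */
}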
2015-11-20  tipc: eliminate remnants of hungarian notation  (Jon Paul Maloy, 6 files, -133/+133)
The number of variables with Hungarian notation (l_ptr, n_ptr etc.) has been significantly reduced over the last couple of years. We now root out the last traces of this practice. There are no functional changes in this commit.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20  tipc: narrow down interface towards struct tipc_link  (Jon Paul Maloy, 7 files, -345/+415)
We move the definition of struct tipc_link from link.h to link.c in order to minimize its exposure to the rest of the code. When needed, we define new functions to make it possible for external entities to access and set data in the link. Apart from the above, there are no functional changes.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20  tipc: narrow down exposure of struct tipc_node  (Jon Paul Maloy, 7 files, -461/+462)
In our effort to have less code and include dependencies between entities such as node, link and bearer, we try to narrow down the exposed interface towards the node as much as possible. In this commit, we move the definition of struct tipc_node, along with many of its associated function declarations, from node.h to node.c. We also move some function definitions from link.c and name_distr.c to node.c, since they access fields in struct tipc_node that should not be externally visible. The moved functions are renamed according to the new location, and made static whenever possible. There are no functional changes in this commit.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20  tipc: convert node lock to rwlock  (Jon Paul Maloy, 3 files, -133/+136)
According to the node FSM, a node in state SELF_UP_PEER_UP cannot change state inside a lock context, except when a TUNNEL_PROTOCOL (SYNCH or FAILOVER) packet arrives. However, the node's individual links may still change state. Since each link now is protected by its own spinlock, we finally have the conditions in place to convert the node spinlock to an rwlock_t. If the node state and arriving packet type are right, we can let the link directly receive the packet under protection of its own spinlock and the node lock in read mode. In all other cases we use the node lock in write mode. This enables full concurrent execution between parallel links during steady-state traffic situations, i.e., 99+% of the time. This commit implements this change.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
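The locking pattern this describes, as a simplified sketch; fast_path_ok(), link_rcv() and the link array layout are hypothetical stand-ins for the TIPC internals:

static void node_rcv_sketch(struct tipc_node *n, struct sk_buff *skb, int bearer_id)
{
	if (fast_path_ok(n, skb)) {
		/* steady state: node lock taken for reading, so parallel
		 * links can receive concurrently */
		read_lock_bh(&n->lock);
		spin_lock_bh(&n->links[bearer_id].lock);
		link_rcv(&n->links[bearer_id], skb);
		spin_unlock_bh(&n->links[bearer_id].lock);
		read_unlock_bh(&n->lock);
	} else {
		/* node state may change: take the node lock for writing */
		write_lock_bh(&n->lock);
		link_rcv(&n->links[bearer_id], skb);
		write_unlock_bh(&n->lock);
	}
}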
2015-11-20  tipc: introduce per-link spinlock  (Jon Paul Maloy, 3 files, -26/+25)
As a preparation to allow parallel links to work more independently from each other, we introduce a per-link spinlock, to be stored in the node struct's link entry area. Since the node lock still is a regular spinlock, there is no increase in parallelism at this stage.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20  tipc: reduce code dependency between binding table and node layer  (Jon Paul Maloy, 6 files, -67/+74)
The file name_distr.c currently contains three functions, named_cluster_distribute(), tipc_publ_subscribe() and tipc_publ_unsubscribe(), that all directly access fields in struct tipc_node. We want to eliminate such dependencies, so we move those functions to the file node.c and rename them to tipc_node_broadcast(), tipc_node_subscribe() and tipc_node_unsubscribe() respectively.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20  tipc: small cleanup of function tipc_node_check_state()  (Jon Paul Maloy, 1 file, -3/+2)
The function tipc_node_check_state() contains the core logic for handling link synchronization and failover. For this reason, it is important to keep it as comprehensible as possible. In this commit, we make three small cleanups.

1) If the node is in state SELF_DOWN_PEER_LEAVING and the received packet confirms that the peer has lost contact, there will be no further action in this function. To make this clearer, we return from the function directly after the state change.

2) Since commit 0f8b8e28fb3241f9fd ("tipc: eliminate risk of stalled link synchronization"), only the logically first TUNNEL_PROTO/SYNCH packet can alter the link state and set the synch point, independently of arrival order. Hence, there is no longer any need to adjust the synch value in case such packets arrive out of order. We remove this adjustment.

3) It is the intention that any message arriving on any of the links may trigger a check for, and possible termination of, a node SYNCH state. A redundant and unnoticed check for tipc_link_is_synching() defeats this purpose, with the effect that only packets arriving on the synching link may currently end the synch state. We remove this check. This change will further shorten the synchronization period between parallel links.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20  tipc: move linearization of buffers to generic code  (Jon Paul Maloy, 3 files, -5/+3)
In commit 5cbb28a4bf65c7e4 ("tipc: linearize arriving NAME_DISTR and LINK_PROTO buffers") we added linearization of NAME_DISTRIBUTOR, LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE to the function tipc_udp_recv(). The location of the change was selected in order to make the commit easy to apply to 'net' and 'stable'. We now move this linearization to where it should be done, in the functions tipc_named_rcv() and tipc_link_proto_rcv() respectively.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20  bnx2x: Show port statistics in Multi-function  (Yuval Mintz, 1 file, -3/+1)
Today, port statistics are presented when using `ethtool -S' only for single-function devices, but there are some port statistics which are crucial for analyzing bottlenecks, e.g., HW Rx discards due to lack of buffer space [when the device isn't handling ingress traffic fast enough]. Judging the pros and cons, it was decided that in order to better support automatic dump-gathering tools, bnx2x should no longer hide those stats. This leaves only VFs lacking the port statistics.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20  bnx2x: Add new SW stat 'tx_exhaustion_events'  (Yuval Mintz, 1 file, -2/+3)
The driver already has an internal counter for the number of times a given queue had to be stopped due to Tx ring exhaustion. This adds the counter to the statistics presented by the driver, e.g., via `ethtool -S'.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20  ppp: remove PPPOX_ZOMBIE socket state  (Guillaume Nault, 3 files, -4/+3)
PPPOX_ZOMBIE is never set anymore.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20  ppp: don't set sk_state to PPPOX_ZOMBIE in pppoe_disc_rcv()  (Guillaume Nault, 1 file, -20/+2)
Since 287f3a943fef ("pppoe: Use workqueue to die properly when a PADT is received"), pppoe_disc_rcv() disconnects the socket by scheduling pppoe_unbind_sock_work(). This is enough to stop socket transmission and makes the PPPOX_ZOMBIE state unnecessary.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20  mlxsw: spectrum: Add error paths to __mlxsw_sp_port_vlans_add  (Ido Schimmel, 1 file, -15/+46)
The operation of adding VLANs on a port via switchdev ops can fail and we need to be prepared for it. If we do not rollback hardware operations following a failure, hardware and software will remain in an inconsistent state. Solve that by adding suitable error paths to __mlxsw_sp_port_vlans_add.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
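The usual shape of such rollback paths, sketched with hypothetical helpers (the driver's actual functions differ):

static int port_vlans_add_sketch(struct mlxsw_sp_port *p, u16 vid_begin, u16 vid_end)
{
	int err;

	err = port_fid_map(p, vid_begin, vid_end);
	if (err)
		return err;

	err = port_vid_filter_set(p, vid_begin, vid_end, true);
	if (err)
		goto err_vid_filter;

	err = port_pvid_set(p, vid_begin);
	if (err)
		goto err_pvid;

	return 0;

err_pvid:
	port_vid_filter_set(p, vid_begin, vid_end, false);	/* roll back */
err_vid_filter:
	port_fid_unmap(p, vid_begin, vid_end);			/* roll back */
	return err;
}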
2015-11-20  mlxsw: spectrum: Unify setting of HW VLAN filters  (Ido Schimmel, 1 file, -24/+34)
When adding or deleting VLANs from a bridged port, HW VLAN filters must be set accordingly. Instead of having the same code in both add and delete functions, just wrap it in a function and call it with the appropriate parameters.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20  mlxsw: spectrum: Use correct PVID value when removing VLANs  (Ido Schimmel, 1 file, -10/+8)
When removing a range of VLANs in which PVID is a member we should use the correct PVID value instead of some VLAN in the range. Also, change two print statements to use 'dev' instead of 'mlxsw_sp_port->dev', as it's already used in other print statements in the function.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20  bpf: add show_fdinfo handler for maps  (Daniel Borkmann, 1 file, -1/+21)
Add a handler for show_fdinfo() to be used by the anon-inodes backend for eBPF maps, and dump the map specification there. Not only is this useful for admins, it also provides a minimal way to compare specs from ELF vs. pinned object.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
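A sketch consistent with that description; the exact field list and formatting in the merged patch may differ:

#ifdef CONFIG_PROC_FS
static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
{
	const struct bpf_map *map = filp->private_data;

	/* dumped into /proc/<pid>/fdinfo/<fd> by the anon-inode backend */
	seq_printf(m,
		   "map_type:\t%u\n"
		   "key_size:\t%u\n"
		   "value_size:\t%u\n"
		   "max_entries:\t%u\n",
		   map->map_type,
		   map->key_size,
		   map->value_size,
		   map->max_entries);
}
#endif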
2015-11-20  net: encx24j600: move rev announcement to probe function  (Jon Ringle, 1 file, -12/+12)
When encx24j600 is opened and closed many times because userspace polls the interface, the log gets noisy with this log message. Move it to the encx24j600_spi_probe() function, where it belongs.

Signed-off-by: Jon Ringle <jringle@gridpoint.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18  net: provide generic busy polling to all NAPI drivers  (Eric Dumazet, 14 files, -20/+16)
NAPI drivers no longer need to observe a particular protocol to benefit from busy polling (CONFIG_NET_RX_BUSY_POLL=y). napi_hash_add() and napi_hash_del() are automatically called from the core networking stack, respectively from netif_napi_add() and netif_napi_del().

This patch depends on free_netdev() and netif_napi_del() being called from process context, which seems to be the norm. Drivers might still prefer to call napi_hash_del() on their own, since they might combine all the RCU grace periods into a single one, knowing their NAPI structures' lifetime, while the core networking stack has no idea of a possible combining. Once this patch proves to not bring serious regressions, we will clean up drivers to either remove napi_hash_del() or provide appropriate RCU grace period combining.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18  net: napi_hash_del() returns a boolean status  (Eric Dumazet, 2 files, -5/+10)
napi_hash_del() will soon be used from both drivers (if they want) and the core networking stack. Callers are responsible for ensuring an RCU grace period is respected before freeing the napi structure; napi_hash_del() can signal whether this RCU grace period is needed or not.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
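The intended calling convention, as a sketch (the teardown function and priv layout are hypothetical):

static void my_teardown_sketch(struct my_priv *priv)
{
	/* true means the napi was hashed: an RCU grace period must pass
	 * before the memory holding it may be freed */
	if (napi_hash_del(&priv->napi))
		synchronize_net();
	kfree(priv);	/* safe: concurrent busy-poll lookups have drained */
}

A driver with many napi structures can test all of them first and issue a single synchronize_net() to combine the grace periods.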
2015-11-18  net: move napi_hash[] into read mostly section  (Eric Dumazet, 2 files, -1/+5)
We do not often add/delete a napi context. Moving napi_hash[] into the read-mostly section avoids potential false sharing.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18  net: add netif_tx_napi_add()  (Eric Dumazet, 12 files, -16/+38)
netif_tx_napi_add() is a variant of netif_napi_add(). It should be used by drivers that use a napi structure exclusively to poll TX. We do not want this kind of napi added to napi_hash[] in the following patches, which add generic busy polling to all NAPI drivers.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
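Hypothetical driver setup showing the intended split (the ring layout and poll functions are made up):

static void my_napi_setup_sketch(struct net_device *dev, struct my_ring *ring)
{
	/* RX napi: normal registration, hashed and busy-pollable */
	netif_napi_add(dev, &ring->rx_napi, my_rx_poll, NAPI_POLL_WEIGHT);
	/* TX-only napi: the new variant keeps it out of napi_hash[] */
	netif_tx_napi_add(dev, &ring->tx_napi, my_tx_poll, NAPI_POLL_WEIGHT);
}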
2015-11-18  net: move skb_mark_napi_id() into core networking stack  (Eric Dumazet, 13 files, -16/+4)
We would like to automatically provide busy polling support to all NAPI drivers, without them having to implement anything. skb_mark_napi_id() can be called from napi_gro_receive() and napi_get_frags(). A few drivers still call skb_mark_napi_id() because they use netif_receive_skb(). They should eventually call napi_gro_receive() instead. I will leave this to the drivers' maintainers.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18  mlx4: remove mlx4_en_low_latency_recv()  (Eric Dumazet, 4 files, -198/+2)
Busy polling can now be handled in the generic NAPI poll infrastructure. This removes complexity and fast path overhead: mlx4 used two spin_lock()/spin_unlock() pairs per napi->poll() call, in mlx4_en_cq_lock_napi()/mlx4_en_cq_unlock_napi().

Tested:

Without busy polling:

lpaa23:~# echo 0 >/proc/sys/net/core/busy_read
lpaa24:~# echo 0 >/proc/sys/net/core/busy_read
lpaa23:~# ./netperf -H lpaa24 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa24.prod.google.com () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec
16384  87380  1        1       10.00    47330.78

With busy polling:

lpaa23:~# echo 70 >/proc/sys/net/core/busy_read
lpaa24:~# echo 70 >/proc/sys/net/core/busy_read
lpaa23:~# ./netperf -H lpaa24 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa24.prod.google.com () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec
16384  87380  1        1       10.00    97643.55

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18  bnx2x: remove bnx2x_low_latency_recv() support  (Eric Dumazet, 4 files, -167/+2)
Switch to native NAPI polling, as this reduces overhead and complexity. The normal path is faster, since one cmpxchg() is no longer required, and busy polling through the NAPI polling has the same performance.

Tested:

lpk50:~# cat /proc/sys/net/core/busy_read
70
lpk50:~# nstat >/dev/null;./netperf -H lpk55 -t TCP_RR;nstat
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpk55.prod.google.com () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec
16384  87380  1        1       10.00    40095.07
16384  87380
IpInReceives                    401062             0.0
IpInDelivers                    401062             0.0
IpOutRequests                   401079             0.0
TcpActiveOpens                  7                  0.0
TcpPassiveOpens                 3                  0.0
TcpAttemptFails                 3                  0.0
TcpEstabResets                  5                  0.0
TcpInSegs                       401036             0.0
TcpOutSegs                      401052             0.0
TcpOutRsts                      38                 0.0
UdpInDatagrams                  26                 0.0
UdpOutDatagrams                 27                 0.0
Ip6OutNoRoutes                  1                  0.0
TcpExtDelayedACKs               1                  0.0
TcpExtTCPPrequeued              98                 0.0
TcpExtTCPDirectCopyFromPrequeue 98                 0.0
TcpExtTCPHPHits                 4                  0.0
TcpExtTCPHPHitsToUser           98                 0.0
TcpExtTCPPureAcks               5                  0.0
TcpExtTCPHPAcks                 101                0.0
TcpExtTCPAbortOnData            6                  0.0
TcpExtBusyPollRxPackets         400832             0.0
TcpExtTCPOrigDataSent           400983             0.0
IpExtInOctets                   21273867           0.0
IpExtOutOctets                  21261254           0.0
IpExtInNoECTPkts                401064             0.0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18  mlx5: support napi_complete_done()  (Eric Dumazet, 3 files, -14/+13)
A NAPI poll handler should return the number of RX packets processed, instead of 0 / budget. This allows proper busy poll accounting through the LINUX_MIB_BUSYPOLLRXPACKETS SNMP counter. napi_complete_done() allows /sys/class/net/ethX/gro_flush_timeout to be used for finer GRO aggregation control.

Tested: Enabled busy polling, and checked that the TcpExtBusyPollRxPackets counter is increasing.

echo 70 >/proc/sys/net/core/busy_read
nstat >/dev/null
netperf -H target -t TCP_RR >/dev/null
nstat | grep TcpExtBusyPollRxPackets
TcpExtBusyPollRxPackets         490958             0.0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
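The convention in sketch form; my_process_rx() is a hypothetical helper returning the number of packets it handled:

static int my_napi_poll_sketch(struct napi_struct *napi, int budget)
{
	int work_done = my_process_rx(napi, budget);

	if (work_done < budget)
		napi_complete_done(napi, work_done);	/* honors gro_flush_timeout */

	return work_done;	/* RX packets processed, not 0 / budget */
}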
2015-11-18  mlx5: add busy polling support  (Eric Dumazet, 2 files, -0/+8)
It is now easy to add busy polling support to a NAPI driver, with very little impact on the normal input path. This patch serves as a reference implementation.

Note: A followup patch will add proper napi_complete_done() support in mlx5, so that the LINUX_MIB_BUSYPOLLRXPACKETS snmp counter is properly handled.

Tested:

Normal TCP_RR results without busy polling:

lpk51:~# echo 0 >/proc/sys/net/core/busy_read
lpk52:~# echo 0 >/proc/sys/net/core/busy_read
lpk51:~# ./netperf -H 192.168.4.52 -t TCP_RR -l 10
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.4.52 () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec
16384  87380  1        1       10.00    53509.49
16384  87380

Now enable busy polling:

lpk51:~# echo 70 >/proc/sys/net/core/busy_read
lpk52:~# echo 70 >/proc/sys/net/core/busy_read
lpk51:~# ./netperf -H 192.168.4.52 -t TCP_RR -l 10
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.4.52 () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec
16384  87380  1        1       10.00    97530.92
16384  87380

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18  net: network drivers no longer need to implement ndo_busy_poll()  (Eric Dumazet, 1 file, -5/+20)
Instead of having to implement a complex ndo_busy_poll() method, drivers can simply rely on NAPI poll logic. Busy polling gains come mainly from the polling itself, not from the exact details of how we poll the device.

ndo_busy_poll(), if implemented, can avoid touching napi state, but it adds extra synchronization between normal napi->poll() and the busy poll handler, slowing down the common path (non busy polling) with extra atomic operations. In practice few drivers ever got busy poll because of the complexity.

We could go one step further, and make busy polling available for all NAPI drivers, but this would require that all netif_napi_del() calls are done in process context so that we can call synchronize_rcu(). A full audit would be required. Before this is done, a driver still needs to, as shown in the followup patches:

- call skb_mark_napi_id() for each skb provided to the stack.
- call napi_hash_add() and napi_hash_del() to allocate a napi_id per napi struct.
- make sure an RCU grace period is respected after napi_hash_del() before the memory containing the napi structure is freed.

A followup patch implements busy poll for the mlx5 driver as an example.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18  net: allow BH servicing in sk_busy_loop()  (Eric Dumazet, 1 file, -11/+7)
Instead of blocking BH in the whole of sk_busy_loop(), block them only around the ->ndo_busy_poll() calls. This has many benefits:

1) It allows tunneled traffic to use busy poll as well as native traffic. Tunnel handlers usually call netif_rx() and depend on net_rx_action() being run (from the softirq handler).

2) It allows RFS/RPS to be used (sending IPIs to other cpus if needed).

3) It uses the 'let's burn cpu cycles' budget to do useful work (like TX completions, timers, RCU callbacks...).

4) It reduces BH latencies, making busy poll a better citizen.

Tested: Tested with a SIT tunnel.

lpaa5:~# echo 0 >/proc/sys/net/core/busy_read
lpaa5:~# ./netperf -H 2002:af6:786::1 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:786::1 () port 0 AF_INET6 : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec
16384  87380  1        1       10.00    37373.93
16384  87380

Now enable busy poll on both hosts:

lpaa5:~# echo 70 >/proc/sys/net/core/busy_read
lpaa6:~# echo 70 >/proc/sys/net/core/busy_read
lpaa5:~# ./netperf -H 2002:af6:786::1 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:786::1 () port 0 AF_INET6 : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec
16384  87380  1        1       10.00    58314.77
16384  87380

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
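Roughly where the BH-disable section moves to, as a simplified sketch of the loop (the real sk_busy_loop() has more bookkeeping and error handling):

static void busy_loop_sketch(struct sock *sk, struct napi_struct *napi,
			     const struct net_device_ops *ops)
{
	unsigned long end_time = busy_loop_end_time();

	do {
		local_bh_disable();
		ops->ndo_busy_poll(napi);	/* BH off only around the poll */
		local_bh_enable();		/* pending softirqs run here:
						 * tunnels via netif_rx(), RPS/RFS
						 * IPIs, timers, RCU callbacks */
	} while (skb_queue_empty(&sk->sk_receive_queue) &&
		 !need_resched() && !busy_loop_timeout(end_time));
}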
2015-11-18  net: un-inline sk_busy_loop()  (Eric Dumazet, 3 files, -55/+49)
There is really little gain from inlining this big function. We'll soon make it even bigger in the following patches. This also means we no longer need to export napi_by_id().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18  mlx4: mlx4_en_low_latency_recv() called with BH disabled  (Eric Dumazet, 1 file, -5/+7)
mlx4_en_low_latency_recv() is called with BH disabled, like other ndo_busy_poll() methods. There is no need for spin_lock_bh()/spin_unlock_bh().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18  net: better skb->sender_cpu and skb->napi_id cohabitation  (Eric Dumazet, 2 files, -20/+16)
skb->sender_cpu and skb->napi_id share common storage, and we have had various bugs because of this. We had to call skb_sender_cpu_clear() in some places so as not to leave a prior skb->napi_id that would fool netdev_pick_tx(). As suggested by Alexei, we can split the space so that these errors cannot happen. With the 0 value reserved as the common (not initialized) value, let's reserve the [1 .. NR_CPUS] range for valid sender_cpu values, and [NR_CPUS+1 .. ~0U] for valid napi_id values. This will allow proper busy polling support over tunnels.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
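The partition, as a sketch (the helper name is illustrative, not a kernel API):

/* skb->napi_id and skb->sender_cpu share one 32-bit field:
 *   0                  => not initialized
 *   [1 .. NR_CPUS]     => valid sender_cpu
 *   [NR_CPUS+1 .. ~0U] => valid napi_id
 */
static inline bool field_is_napi_id(unsigned int v)
{
	return v > NR_CPUS;	/* can never collide with a sender_cpu */
}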
2015-11-18  be2net: remove local variable 'status'  (Ivan Vecera, 1 file, -4/+2)
lancer_cmd_get_file_len() uses lancer_cmd_read_object() to get the current size of the registers for the ethtool registers dump. The returned status value is stored but never checked. The check itself is not necessary, as the data_read output variable is initialized to 0, so the status variable can be removed.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18  net: hisilicon: fix binding document of mdio  (huangdaode, 1 file, -1/+6)
This patch explains the occasions for "hisilicon,mdio" and "hisilicon,hns-mdio" according to Arnd's comments, and reformats the document according to comments from Rob <robh@kernel.org>.

Signed-off-by: huangdaode <huangdaode@hisilicon.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18  net ipv4: use preferred log methods  (Bastian Stender, 4 files, -59/+44)
Replace printk calls with the preferred unconditional log method calls to keep kernel messages clean. Also add a newline to the "too small MTU" message.

Signed-off-by: Bastian Stender <bst@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-17  MAINTAINERS: Update Mellanox's Eth NIC driver entries  (Or Gerlitz, 1 file, -1/+9)
Eugenia (Jenny) Emantayev is replacing Amir Vadai as the mlx4 Ethernet driver maintainer. Saeed Mahameed is assigned to maintain mlx5 Eth functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-17  net/core: revert "net: fix __netdev_update_features return.." and add comment  (Nikolay Aleksandrov, 1 file, -1/+4)
This reverts commit 00ee59271777 ("net: fix __netdev_update_features return on ndo_set_features failure") and adds a comment explaining why it's okay to return a value other than 0 upon error. Some drivers might actually change flags and return an error, so it's better to fire a spurious notification than to miss these.

CC: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-17  af_unix: take receive queue lock while appending new skb  (Hannes Frederic Sowa, 1 file, -1/+4)
While in the future we possibly won't need to use sk_buff_head.lock at all, removing it is a rather large change, as it affects the af_unix fd garbage collector, diag and socket cleanups. This is too much for a stable patch. For the time being, grab sk_buff_head.lock without disabling bh and irqs, i.e. don't use the locked skb_queue_tail().

Fixes: 869e7c62486e ("net: af_unix: implement stream sendpage support")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reported-by: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-17  rtnetlink: fix frame size warning in rtnl_fill_ifinfo  (Hannes Frederic Sowa, 1 file, -122/+152)
Fix the following warning:

  CC      net/core/rtnetlink.o
net/core/rtnetlink.c: In function ‘rtnl_fill_ifinfo’:
net/core/rtnetlink.c:1308:1: warning: the frame size of 2864 bytes is larger than 2048 bytes [-Wframe-larger-than=]
 }
 ^

by splitting up the huge rtnl_fill_ifinfo into some smaller ones, so we don't have the huge frame allocations at the same time.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-17  net: use skb_clone to avoid alloc_pages failure.  (Martin Zhang, 1 file, -1/+1)
1. The new skb only needs the dst and IP addresses (v4 or v6).
2. skb_copy may need high-order pages, which are very rare on a long-running server.

Signed-off-by: Junwei Zhang <linggao.zjw@alibaba-inc.com>
Signed-off-by: Martin Zhang <martinbj2008@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-17  packet: Use PAGE_ALIGNED macro  (Tobias Klauser, 1 file, -1/+1)
Use PAGE_ALIGNED(...) instead of open-coding it.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
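The kind of substitution this makes, in sketch form (the exact expressions and exit path in af_packet.c may differ):

/* before: open-coded alignment test */
if (req->tp_block_size & (PAGE_SIZE - 1))
	return -EINVAL;

/* after: the macro from <linux/mm.h> says what it means */
if (!PAGE_ALIGNED(req->tp_block_size))
	return -EINVAL;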
2015-11-17  packet: Don't check frames_per_block against negative values  (Tobias Klauser, 1 file, -2/+2)
rb->frames_per_block is an unsigned int, thus can never be negative. Also fix spacing in the calculation of frames_per_block.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
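In sketch form, why the comparison changes (error handling shown as a plain return; the function's real exit path may differ):

/* before: the "< 0" half of "<= 0" is dead code for an unsigned value */
if (unlikely(rb->frames_per_block <= 0))
	return -EINVAL;

/* after: test what is actually testable */
if (unlikely(rb->frames_per_block == 0))
	return -EINVAL;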
2015-11-17  net: phy: Use interrupts when available in NOLINK state  (Andrew Lunn, 1 file, -0/+3)
The NOLINK state will poll the phy once a second to see if the link has come up. If the phy has an interrupt line, this polling can be skipped, since the phy should interrupt when the link returns.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
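Plausibly a guard of this shape in the PHY state machine's NOLINK case (a sketch, not the exact hunk):

static void phy_nolink_sketch(struct phy_device *phydev)
{
	switch (phydev->state) {
	case PHY_NOLINK:
		if (phy_interrupt_is_valid(phydev))
			break;	/* the PHY will interrupt on link-up;
				 * skip the once-per-second poll */
		/* ... existing polling logic ... */
		break;
	default:
		break;
	}
}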
2015-11-17  phy: marvell: Add support for 88E1540 PHY  (Andrew Lunn, 2 files, -0/+17)
The 88E1540 can be found embedded in the Marvell 88E6352 switch. It is compatible with the 88E1510, so add support for it, using the 88E1510 specific functions.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-17  arm64: bpf: make BPF prologue and epilogue align with ARM64 AAPCS  (Yang Shi, 1 file, -5/+39)
Save and restore FP/LR in the BPF prog prologue and epilogue, and save SP to FP in the prologue, in order to get a correct stack backtrace.

However, the ARM64 JIT used FP (x29) as the eBPF fp register. FP is subject to change during function calls, so it may cause the BPF prog stack base address to change too. Use x25 to replace FP as the BPF stack base register (fp). Since x25 is a callee-saved register, it will stay intact across function calls. It is initialized in the BPF prog prologue each time the BPF prog starts to run. Save and restore x25/x26 in the BPF prologue and epilogue to keep them intact for the code outside of BPF. Actually, x26 is unnecessary, but SP requires 16-byte alignment.

So, the BPF stack layout looks like:

                                 high
         original A64_SP =>   0:+-----+ BPF prologue
                                |FP/LR|
         current A64_FP =>  -16:+-----+
                                | ... | callee saved registers
                                +-----+
                                |     | x25/x26
         BPF fp register => -80:+-----+
                                |     |
                                | ... | BPF prog stack
                                |     |
                                |     |
         current A64_SP =>      +-----+
                                |     |
                                | ... | Function call stack
                                |     |
                                +-----+
                                 low

CC: Zi Shen Lim <zlim.lnx@gmail.com>
CC: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Acked-by: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-17  macvlan: fix leak in macvlan_handle_frame  (Sabrina Dubroca, 1 file, -0/+2)
Reset pskb in macvlan_handle_frame in case skb_share_check returned a clone.

Fixes: 8a4eb5734e8d ("net: introduce rx_handler results and logic around that")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-17  ipvlan: fix use after free of skb  (Sabrina Dubroca, 1 file, -1/+1)
ipvlan_handle_frame is an rx_handler; when it returns a value other than RX_HANDLER_CONSUMED (here, NET_RX_DROP, aka RX_HANDLER_ANOTHER), __netif_receive_skb_core expects that the skb still exists and will process it further, but we have just freed it.

Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-17  ipvlan: fix leak in ipvlan_rcv_frame  (Sabrina Dubroca, 1 file, -5/+7)
Pass a **skb to ipvlan_rcv_frame so that if skb_share_check returns a new skb, we actually use it during further processing. It's safe to ignore the new skb in the ipvlan_xmit_* functions, because they call ipvlan_rcv_frame with local == true, so that dev_forward_skb is called and always takes ownership of the skb.

Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
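The **skb pattern, sketched (simplified relative to the real ipvlan_rcv_frame):

static rx_handler_result_t rcv_frame_sketch(struct sk_buff **pskb)
{
	struct sk_buff *skb = skb_share_check(*pskb, GFP_ATOMIC);

	if (!skb)
		return RX_HANDLER_CONSUMED;	/* clone allocation failed */
	*pskb = skb;	/* caller must see the clone, not the original */

	/* ... continue processing skb ... */
	return RX_HANDLER_ANOTHER;
}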
2015-11-17  vlan: Do not put vlan headers back on bridge and macvlan ports  (Vlad Yasevich, 2 files, -1/+8)
When a vlan is configured with REORDER_HEADER set to 0, the vlan header is put back into the packet, making it appear that the vlan header is still there even after it's been processed. This poses a problem for bridge and macvlan ports. The packets passed to those devices may be forwarded, and at the time of the forward, vlan headers end up being unexpectedly present. With this patch, we make sure that we do not put the vlan header back (when REORDER_HEADER is 0) if a bridge or macvlan has been configured on top of the vlan device.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-17  vlan: Fix untag operations of stacked vlans with REORDER_HEADER off  (Vlad Yasevich, 1 file, -1/+2)
When we have multiple stacked vlan devices, all of which have the REORDER_HEADER flag turned off, the untag operation does not locate the ethernet addresses correctly for nested vlans. The reason is that with the REORDER_HEADER flag off, the outer vlan headers are put back and mac_len is adjusted to account for the presence of the header. The subsequent untag operation, for the next level vlan, then always uses VLAN_ETH_HLEN to locate the beginning of the ethernet header, and that ends up being a multiple of 4 bytes short of the actual beginning of the mac header (the multiple depending on how many vlan encapsulations there are). As a result, if there are multiple levels of vlan devices with REORDER_HEADER turned off, the received packets end up being dropped.

To solve this, we use skb->mac_len as the offset. The value is always set on the receive path and starts out as ETH_HLEN. The value is also updated when the vlan header manipulations occur, so we know it will be correct.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>