linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2016-09-10	net/mlx5: Organize device list API in one place	Mohamad Haj Yahia	5	-296/+362
	Hide the exposed (external) mlx5_dev_list and mlx5_intf_mutex and expose an organized modular API to manage and manipulate mlx5 devices list. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10	net/mlx5e: Restore vlan filter after seamless reset	Mohamad Haj Yahia	1	-6/+32
	When detaching the mlx5e interface clear all the vlans rules from the vlan flow table. When attaching it back restore all the active vlans rules to the HW. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10	net/mlx5e: Implement mlx5e interface attach/detach callbacks	Mohamad Haj Yahia	3	-63/+183
	Needed to support seamless and lightweight PCI/Internal error recovery. Implement the attach/detach interface callbacks. In attach callback we only allocate HW resources. In detach callback we only deallocate HW resources. All SW/kernel objects initialzing/destroying is kept in add/remove callbacks. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10	net/mlx5: Implement vports admin state backup/restore	Mohamad Haj Yahia	2	-141/+124
	Save the user configuration in the vport sturct. Restore saved old configuration upon vport enable. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10	net/mlx5: Align sriov/eswitch modules with the new load/unload flow.	Mohamad Haj Yahia	3	-14/+34
	Init/cleanup sriov/eswitch in the core software context init/cleanup flows. Attach/detach sriov/eswitch in the core load/unload flows. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10	net/mlx5: Implement eswitch attach/detach flows	Mohamad Haj Yahia	2	-3/+23
	Needed for lightweight and modular internal/pci error handling. Implement eswitch attach function which allocates/starts hw related resources. Implement eswitch detach function which releases/stops hw related resources. Init/cleanup function only handle eswitch software context allocation and destruction. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10	net/mlx5: Implement SRIOV attach/detach flows	Mohamad Haj Yahia	2	-8/+23
	Needed for lightweight and modular internal/pci error handling. Implement sriov attach function which enables pre-saved number of vfs on the device side. Implement sriov detach function which disable the current vfs on the device side. Init/cleanup function only handles sriov software context allocation and destruction. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10	net/mlx5: Split the load/unload flow into hardware and software flows	Mohamad Haj Yahia	1	-64/+107
	Gather all software context creating/destroying in one function and call it once in the first load and in the last unload. load/unload functions will now receive indication if we need to create/destroy the software contexts. In internal/pci error do the unload/load flows without releasing the software objects. In this way we perserve the sw core state and it help us restoring old driver state after PCI error/shutdown. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10	net/mlx5: Introduce attach/detach to interface API	Mohamad Haj Yahia	2	-20/+131
	Add attach/detach callbacks to interface API. This is crucial for implementing seamless reset flow which releases the hardware and it's resources upon detach while keeping software structures and state (e.g netdev) then reset and reallocate the hardware needed resources upon attach. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10	net/mlx5: SRIOV core code refactoring	Mohamad Haj Yahia	4	-131/+101
	Simplify the code and makes it look modular and symmetric. Split sriov enable/disable to two levels: device level and pci level. When user enable/disable sriov (via sriov_configure driver callback) we will enable/disable both device and pci sriov. When driver load/unload we will enable/disable (on demand) only device sriov while keeping the PCI sriov enabled for next driver load. On internal/pci error, VFs will be kept enabled on PCI and the reset is done only in device level. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10	net/mlx5: Skip waiting for vf pages in internal error	Mohamad Haj Yahia	1	-1/+12
	In case of device in internal error state there is no need to wait for vf pages since they will be reclaimed manually later in the unload flow. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10	xfrm: use IS_ENABLED() instead of checking for built-in or module	Javier Martinez Canillas	1	-1/+1
	The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either built-in or as a module, use that macro instead of open coding the same. Using the macro makes the code more readable by helping abstract away some of the Kconfig built-in and module enable details. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10	sctp: use IS_ENABLED() instead of checking for built-in or module	Javier Martinez Canillas	1	-1/+1
	The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either built-in or as a module, use that macro instead of open coding the same. Using the macro makes the code more readable by helping abstract away some of the Kconfig built-in and module enable details. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10	net: sched: use IS_ENABLED() instead of checking for built-in or module	Javier Martinez Canillas	1	-3/+3
	The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either built-in or as a module, use that macro instead of open coding the same. Using the macro makes the code more readable by helping abstract away some of the Kconfig built-in and module enable details. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10	l2tp: use IS_ENABLED() instead of checking for built-in or module	Javier Martinez Canillas	3	-5/+5
	The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either built-in or as a module, use that macro instead of open coding the same. Using the macro makes the code more readable by helping abstract away some of the Kconfig built-in and module enable details. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10	ipv4: use IS_ENABLED() instead of checking for built-in or module	Javier Martinez Canillas	1	-1/+1
	The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either built-in or as a module, use that macro instead of open coding the same. Using the macro makes the code more readable by helping abstract away some of the Kconfig built-in and module enable details. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10	net: use IS_ENABLED() instead of checking for built-in or module	Javier Martinez Canillas	1	-2/+1
	The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either built-in or as a module, use that macro instead of open coding the same. Using the macro makes the code more readable by helping abstract away some of the Kconfig built-in and module enable details. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10	lec: use IS_ENABLED() instead of checking for built-in or module	Javier Martinez Canillas	1	-6/+6
	The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either built-in or as a module, use that macro instead of open coding the same. Using the macro makes the code more readable by helping abstract away some of the Kconfig built-in and module enable details. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10	appletalk: use IS_ENABLED() instead of checking for built-in or module	Javier Martinez Canillas	1	-1/+1
	The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either built-in or as a module, use that macro instead of open coding the same. Using the macro makes the code more readable by helping abstract away some of the Kconfig built-in and module enable details. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10	net: fs_enet: make rx_copybreak value configurable	Christophe Leroy	1	-0/+40
	Measurement shows that on a MPC8xx running at 132MHz, the optimal limit is 112: * 114 bytes packets are processed in 147 TB ticks with higher copybreak * 114 bytes packets are processed in 148 TB ticks with lower copybreak * 128 bytes packets are processed in 154 TB ticks with higher copybreak * 128 bytes packets are processed in 148 TB ticks with lower copybreak * 238 bytes packets are processed in 172 TB ticks with higher copybreak * 238 bytes packets are processed in 148 TB ticks with lower copybreak However it might be different on other processors and/or frequencies. So it is useful to make it configurable. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10	net: fs_enet: don't unmap DMA when packet len is below copybreak	Christophe Leroy	1	-16/+20
	When the length of the packet is below the defined copybreak limit, the received packet is copied into a newly allocated skb in order to reuse the skb. This is only interesting if it allow us to avoid a new DMA mapping. We shall therefore not DMA unmap and remap the skb->data. Instead, we invalidate the cache with dma_sync_single_for_cpu() once the received data has been copied into the new skb. The following measures have been obtained on a mpc885 running at 132Mhz. Measurement is done using the timebase with packets sent to the target with 'ping -s 1' (packet len is 60): * Without this patch: 182 TB ticks * With this patch: 143 TB ticks As a comparison, if we set the copybreak limit to 0, then we get 148 TB ticks. It means that without this patch, duration is even worse when copying received data to a new skb instead of allocating a new skb for next packet to be received Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10	net: fs_enet: merge NAPI RX and NAPI TX	Christophe Leroy	5	-295/+160
	Initially, a NAPI TX routine has been implemented separately from NAPI RX, as done on the freescale/gianfar driver. By merging NAPI RX and NAPI TX, we reduce the amount of TX completion interrupts. Handling of the budget in association with TX interrupts is based on indications provided at https://wiki.linuxfoundation.org/networking/napi We never proceed more than the complete TX ring on a single run. At the same time, we fix an issue in the handling of fep->tx_free: It is only when fep->tx_free goes up to MAX_SKB_FRAGS that we need to wake up the queue. There is no need to call netif_wake_queue() at every packet successfully transmitted. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10	net/sched: Introduce act_tunnel_key	Amir Vadai	5	-0/+434
	This action could be used before redirecting packets to a shared tunnel device, or when redirecting packets arriving from a such a device. The action will release the metadata created by the tunnel device (decap), or set the metadata with the specified values for encap operation. For example, the following flower filter will forward all ICMP packets destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before redirecting, a metadata for the vxlan tunnel is created using the tunnel_key action and it's arguments: $ tc filter add dev net0 protocol ip parent ffff: \ flower \ ip_proto 1 \ dst_ip 11.11.11.2 \ action tunnel_key set \ src_ip 11.11.0.1 \ dst_ip 11.11.0.2 \ id 11 \ action mirred egress redirect dev vxlan0 Signed-off-by: Amir Vadai <amir@vadai.me> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10	net/sched: cls_flower: Classify packet in ip tunnels	Amir Vadai	2	-1/+110
	Introduce classifying by metadata extracted by the tunnel device. Outer header fields - source/dest ip and tunnel id, are extracted from the metadata when classifying. For example, the following will add a filter on the ingress Qdisc of shared vxlan device named 'vxlan0'. To forward packets with outer src ip 11.11.0.2, dst ip 11.11.0.1 and tunnel id 11. The packets will be forwarded to tap device 'vnet0' (after metadata is released): $ tc filter add dev vxlan0 protocol ip parent ffff: \ flower \ enc_src_ip 11.11.0.2 \ enc_dst_ip 11.11.0.1 \ enc_key_id 11 \ dst_ip 11.11.11.1 \ action tunnel_key release \ action mirred egress redirect dev vnet0 The action tunnel_key, will be introduced in the next patch in this series. Signed-off-by: Amir Vadai <amir@vadai.me> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10	net/dst: Utility functions to build dst_metadata without supplying an skb	Amir Vadai	1	-13/+39
	Extract __ip_tun_set_dst() and __ipv6_tun_set_dst() out of ip_tun_rx_dst() and ipv6_tun_rx_dst(), to be used without supplying an skb. Signed-off-by: Amir Vadai <amir@vadai.me> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10	net/ip_tunnels: Introduce tunnel_id_to_key32() and key32_to_tunnel_id()	Amir Vadai	4	-41/+23
	Add utility functions to convert a 32 bits key into a 64 bits tunnel and vice versa. These functions will be used instead of cloning code in GRE and VXLAN, and in tc act_iptunnel which will be introduced in a following patch in this patchset. Signed-off-by: Amir Vadai <amir@vadai.me> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Acked-by: Jiri Benc <jbenc@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09	ATM-iphase: Use kmalloc_array() in tx_init()	Markus Elfring	1	-4/+9
	* Multiplications for the size determination of memory allocations indicated that array data structures should be processed. Thus use the corresponding function "kmalloc_array". This issue was detected by using the Coccinelle software. * Replace the specification of data types by pointer dereferences to make the corresponding size determination a bit safer according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09	alx: add module parameter to enable msi-x support	Tobias Regnery	1	-1/+4
	msi-x support is default disabled in the alx driver. In order to test msi-x interrupts for regressions add a module parameter to the driver. Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09	alx: add msi-x support	Tobias Regnery	4	-8/+184
	Add msi-x support to the alx driver. This is in preparation for multi queue support. msi-x interrupts are disabled by default because without multi queue support there is no advantage over msi interrupts. The performance numbers observed with iperf stay the same. Based on information of the downstream driver at github.com/qca/alx Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09	alx: factor out part of the interrupt handler	Tobias Regnery	1	-14/+20
	Factor out the handling of misc interrupts into a new function. This function can be reused later for msi-x interrupts. Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09	alx: refactor msi enablement and disablement	Tobias Regnery	2	-9/+26
	Introduce a new flag field for the advanced interrupt capatibilities and add new functions to enable and disable msi interrupts. These functions will be extended later to cover msi-x interrupts. We enable msi interrupts earlier in alx_init_intr because with msi-x and multi queue support the number of queues must be set before we allocate resources for the rx and tx paths. Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09	qed: mark symbols static where possible	Baoyou Xie	4	-28/+35
	We get a few warnings when building kernel with W=1: drivers/net/ethernet/qlogic/qed/qed_l2.c:112:5: warning: no previous prototype for 'qed_sp_vport_start' [-Wmissing-prototypes] drivers/net/ethernet/qlogic/qed/qed_sriov.c:110:6: warning: no previous prototype for 'qed_iov_is_valid_vfid' [-Wmissing-prototypes] drivers/net/ethernet/qlogic/qed/qed_sriov.c:188:5: warning: no previous prototype for 'qed_iov_post_vf_bulletin' [-Wmissing-prototypes] drivers/net/ethernet/qlogic/qed/qed_sriov.c:578:6: warning: no previous prototype for 'qed_iov_set_vfs_to_disable' [-Wmissing-prototypes] drivers/net/ethernet/qlogic/qed/qed_sriov.c:1135:28: warning: no previous prototype for 'qed_iov_get_public_vf_info' [-Wmissing-prototypes] drivers/net/ethernet/qlogic/qed/qed_sriov.c:1148:6: warning: no previous prototype for 'qed_iov_clean_vf' [-Wmissing-prototypes] drivers/net/ethernet/qlogic/qed/qed_sriov.c:2444:5: warning: no previous prototype for 'qed_iov_chk_ucast' [-Wmissing-prototypes] drivers/net/ethernet/qlogic/qed/qed_sriov.c:2762:5: warning: no previous prototype for 'qed_iov_vf_flr_cleanup' [-Wmissing-prototypes] .... In fact, these functions are only used in the file in which they are declared and don't need a declaration, but can be made static. so this patch marks these functions with 'static'. Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09	bpf: add BPF_CALL_x macros for declaring helpers	Daniel Borkmann	6	-158/+149
	This work adds BPF_CALL_<n>() macros and converts all the eBPF helper functions to use them, in a similar fashion like we do with SYSCALL_DEFINE<n>() macros that are used today. Motivation for this is to hide all the register handling and all necessary casts from the user, so that it is done automatically in the background when adding a BPF_CALL_<n>() call. This makes current helpers easier to review, eases to write future helpers, avoids getting the casting mess wrong, and allows for extending all helpers at once (f.e. build time checks, etc). It also helps detecting more easily in code reviews that unused registers are not instrumented in the code by accident, breaking compatibility with existing programs. BPF_CALL_<n>() internals are quite similar to SYSCALL_DEFINE<n>() ones with some fundamental differences, for example, for generating the actual helper function that carries all u64 regs, we need to fill unused regs, so that we always end up with 5 u64 regs as an argument. I reviewed several 0-5 generated BPF_CALL_<n>() variants of the .i results and they look all as expected. No sparse issue spotted. We let this also sit for a few days with Fengguang's kbuild test robot, and there were no issues seen. On s390, it barked on the "uses dynamic stack allocation" notice, which is an old one from bpf_perf_event_output{,_tp}() reappearing here due to the conversion to the call wrapper, just telling that the perf raw record/frag sits on stack (gcc with s390's -mwarn-dynamicstack), but that's all. Did various runtime tests and they were fine as well. All eBPF helpers are now converted to use these macros, getting rid of a good chunk of all the raw castings. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09	bpf: add own ctx rewriter on ifindex for clsact progs	Daniel Borkmann	1	-6/+31
	When fetching ifindex, we don't need to test dev for being NULL since we're always guaranteed to have a valid dev for clsact programs. Thus, avoid this test in fast path. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09	bpf: add BPF_SIZEOF and BPF_FIELD_SIZEOF macros	Daniel Borkmann	3	-14/+27
	Add BPF_SIZEOF() and BPF_FIELD_SIZEOF() macros to improve the code a bit which otherwise often result in overly long bytes_to_bpf_size(sizeof()) and bytes_to_bpf_size(FIELD_SIZEOF()) lines. So place them into a macro helper instead. Moreover, we currently have a BUILD_BUG_ON(BPF_FIELD_SIZEOF()) check in convert_bpf_extensions(), but we should rather make that generic as well and add a BUILD_BUG_ON() test in all BPF_SIZEOF()/BPF_FIELD_SIZEOF() users to detect any rewriter size issues at compile time. Note, there are currently none, but we want to assert that it stays this way. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09	bpf: minor cleanups in helpers	Daniel Borkmann	2	-7/+6
	Some minor misc cleanups, f.e. use sizeof(__u32) instead of hardcoding and in __bpf_skb_max_len(), I missed that we always have skb->dev valid anyway, so we can drop the unneeded test for dev; also few more other misc bits addressed here. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09	ip_tunnel: do not clear l4 hashes	Eric Dumazet	1	-1/+1
	If skb has a valid l4 hash, there is no point clearing hash and force a further flow dissection when a tunnel encapsulation is added. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09	ATM-ForeRunnerHE: Use kmalloc_array() in he_init_group()	Markus Elfring	1	-4/+6
	* Multiplications for the size determination of memory allocations indicated that array data structures should be processed. Thus use the corresponding function "kmalloc_array". This issue was detected by using the Coccinelle software. * Replace the specification of data types by pointer dereferences to make the corresponding size determination a bit safer according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09	ATM-ENI: Use kmalloc_array() in eni_start()	Markus Elfring	1	-2/+3
	* A multiplication for the size determination of a memory allocation indicated that an array data structure should be processed. Thus use the corresponding function "kmalloc_array". This issue was detected by using the Coccinelle software. * Replace the specification of a data structure by a pointer dereference to make the corresponding size determination a bit safer according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09	via-velocity: remove null pointer check on array tdinfo->skb_dma	Colin Ian King	1	-12/+9
	tdinfo->skb_dma is a 7 element array of dma_addr_t hence cannot be null, so the pull pointer check on tdinfo->skb_dma is redundant. Remove it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09	qede: mark qede_set_features() static	Baoyou Xie	1	-1/+1
	We get 1 warning when building kernel with W=1: drivers/net/ethernet/qlogic/qede/qede_main.c:2113:5: warning: no previous prototype for 'qede_set_features' [-Wmissing-prototypes] In fact, this function is only used in the file in which it is declared and don't need a declaration, but can be made static. so this patch marks this function with 'static'. Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org> Acked-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09	net: phy: Fixed checkpatch errors for Microsemi PHYs.	Raju Lakkaraju	2	-92/+92
	The existing VSC85xx PHY driver did not follow the coding style and caused "checkpatch" to complain. This commit fixes this. Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09	net: x25: remove null checks on arrays calling_ae and called_ae	Colin Ian King	1	-4/+0
	dtefacs.calling_ae and called_ae are both 20 element __u8 arrays and cannot be null and hence are redundant checks. Remove these. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09	macsec: set network devtype	stephen hemminger	1	-0/+1
	The netdevice type structure for macsec was being defined but never used. To set the network device type the macro SET_NETDEV_DEVTYPE must be called. Compile tested only, I don't use macsec. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09	rtnetlink: remove unused ifla_stats_policy	stephen hemminger	1	-4/+0
	This structure is defined but never used. Flagged with W=1 Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09	ipv6: report NLM_F_CREATE and NLM_F_EXCL flags in RTM_NEWROUTE events	Guillaume Nault	1	-1/+5
	Since commit 37a1d3611c12 ("ipv6: include NLM_F_REPLACE in route replace notifications"), RTM_NEWROUTE notifications have their NLM_F_REPLACE flag set if the new route replaced a preexisting one. However, other flags aren't set. This patch reports the missing NLM_F_CREATE and NLM_F_EXCL flag bits. NLM_F_APPEND is not reported, because in ipv6 a NLM_F_CREATE request is interpreted as an append request (contrary to ipv4, "prepend" is not supported, so if NLM_F_EXCL is not set then NLM_F_APPEND is implicit). As a result, the possible flag combination can now be reported (iproute2's terminology into parentheses): * NLM_F_CREATE \| NLM_F_EXCL: route didn't exist, exclusive creation ("add"). * NLM_F_CREATE: route did already exist, new route added after preexisting ones ("append"). * NLM_F_REPLACE: route did already exist, new route replaced the first preexisting one ("change"). Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09	ipv4: fix value of ->nlmsg_flags reported in RTM_NEWROUTE events	Guillaume Nault	1	-3/+7
	fib_table_insert() inconsistently fills the nlmsg_flags field in its notification messages. Since commit b8f558313506 ("[RTNETLINK]: Fix sending netlink message when replace route."), the netlink message has its nlmsg_flags set to NLM_F_REPLACE if the route replaced a preexisting one. Then commit a2bb6d7d6f42 ("ipv4: include NLM_F_APPEND flag in append route notifications") started setting nlmsg_flags to NLM_F_APPEND if the route matched a preexisting one but was appended. In other cases (exclusive creation or prepend), nlmsg_flags is 0. This patch sets ->nlmsg_flags in all situations, preserving the semantic of the NLM_F_* bits: * NLM_F_CREATE: a new fib entry has been created for this route. * NLM_F_EXCL: no other fib entry existed for this route. * NLM_F_REPLACE: this route has overwritten a preexisting fib entry. * NLM_F_APPEND: the new fib entry was added after other entries for the same route. As a result, the possible flag combination can now be reported (iproute2's terminology into parentheses): * NLM_F_CREATE \| NLM_F_EXCL: route didn't exist, exclusive creation ("add"). * NLM_F_CREATE \| NLM_F_APPEND: route did already exist, new route added after preexisting ones ("append"). * NLM_F_CREATE: route did already exist, new route added before preexisting ones ("prepend"). * NLM_F_REPLACE: route did already exist, new route replaced the first preexisting one ("change"). Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08	ipv4: accept u8 in IP_TOS ancillary data	Eric Dumazet	1	-2/+5
	In commit f02db315b8d8 ("ipv4: IP_TOS and IP_TTL can be specified as ancillary data") Francesco added IP_TOS values specified as integer. However, kernel sends to userspace (at recvmsg() time) an IP_TOS value in a single byte, when IP_RECVTOS is set on the socket. It can be very useful to reflect all ancillary options as given by the kernel in a subsequent sendmsg(), instead of aborting the sendmsg() with EINVAL after Francesco patch. So this patch extends IP_TOS ancillary to accept an u8, so that an UDP server can simply reuse same ancillary block without having to mangle it. Jesper can then augment https://github.com/netoptimizer/network-testing/blob/master/src/udp_example02.c to add TOS reflection ;) Fixes: f02db315b8d8 ("ipv4: IP_TOS and IP_TTL can be specified as ancillary data") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Francesco Fusco <ffusco@redhat.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08	bpf: fix range propagation on direct packet access	Daniel Borkmann	2	-15/+142
	LLVM can generate code that tests for direct packet access via skb->data/data_end in a way that currently gets rejected by the verifier, example: [...] 7: (61) r3 = (u32 )(r6 +80) 8: (61) r9 = (u32 )(r6 +76) 9: (bf) r2 = r9 10: (07) r2 += 54 11: (3d) if r3 >= r2 goto pc+12 R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx R9=pkt(id=0,off=0,r=0) R10=fp 12: (18) r4 = 0xffffff7a 14: (05) goto pc+430 [...] from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx R9=pkt(id=0,off=0,r=0) R10=fp 24: (7b) (u64 )(r10 -40) = r1 25: (b7) r1 = 0 26: (63) (u32 )(r6 +56) = r1 27: (b7) r2 = 40 28: (71) r8 = (u8 )(r9 +20) invalid access to packet, off=20 size=1, R9(id=0,off=0,r=0) The reason why this gets rejected despite a proper test is that we currently call find_good_pkt_pointers() only in case where we detect tests like rX > pkt_end, where rX is of type pkt(id=Y,off=Z,r=0) and derived, for example, from a register of type pkt(id=Y,off=0,r=0) pointing to skb->data. find_good_pkt_pointers() then fills the range in the current branch to pkt(id=Y,off=0,r=Z) on success. For above case, we need to extend that to recognize pkt_end >= rX pattern and mark the other branch that is taken on success with the appropriate pkt(id=Y,off=0,r=Z) type via find_good_pkt_pointers(). Since eBPF operates on BPF_JGT (>) and BPF_JGE (>=), these are the only two practical options to test for from what LLVM could have generated, since there's no such thing as BPF_JLT (<) or BPF_JLE (<=) that we would need to take into account as well. After the fix: [...] 7: (61) r3 = (u32 )(r6 +80) 8: (61) r9 = (u32 )(r6 +76) 9: (bf) r2 = r9 10: (07) r2 += 54 11: (3d) if r3 >= r2 goto pc+12 R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx R9=pkt(id=0,off=0,r=0) R10=fp 12: (18) r4 = 0xffffff7a 14: (05) goto pc+430 [...] from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=54) R3=pkt_end R4=inv R6=ctx R9=pkt(id=0,off=0,r=54) R10=fp 24: (7b) (u64 )(r10 -40) = r1 25: (b7) r1 = 0 26: (63) (u32 )(r6 +56) = r1 27: (b7) r2 = 40 28: (71) r8 = (u8 )(r9 +20) 29: (bf) r1 = r8 30: (25) if r8 > 0x3c goto pc+47 R1=inv56 R2=imm40 R3=pkt_end R4=inv R6=ctx R8=inv56 R9=pkt(id=0,off=0,r=54) R10=fp 31: (b7) r1 = 1 [...] Verifier test cases are also added in this work, one that demonstrates the mentioned example here and one that tries a bad packet access for the current/fall-through branch (the one with types pkt(id=X,off=Y,r=0), pkt(id=X,off=0,r=0)), then a case with good and bad accesses, and two with both test variants (>, >=). Fixes: 969bf05eb3ce ("bpf: direct packet access") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08	tcp: use an RB tree for ooo receive queue	Yaogong Wang	8	-149/+218
	Over the years, TCP BDP has increased by several orders of magnitude, and some people are considering to reach the 2 Gbytes limit. Even with current window scale limit of 14, ~1 Gbytes maps to ~740,000 MSS. In presence of packet losses (or reorders), TCP stores incoming packets into an out of order queue, and number of skbs sitting there waiting for the missing packets to be received can be in the 10^5 range. Most packets are appended to the tail of this queue, and when packets can finally be transferred to receive queue, we scan the queue from its head. However, in presence of heavy losses, we might have to find an arbitrary point in this queue, involving a linear scan for every incoming packet, throwing away cpu caches. This patch converts it to a RB tree, to get bounded latencies. Yaogong wrote a preliminary patch about 2 years ago. Eric did the rebase, added ofo_last_skb cache, polishing and tests. Tested with network dropping between 1 and 10 % packets, with good success (about 30 % increase of throughput in stress tests) Next step would be to also use an RB tree for the write queue at sender side ;) Signed-off-by: Yaogong Wang <wygivan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Acked-By: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>