linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2015-12-03	openvswitch: fix hangup on vxlan/gre/geneve device deletion	Paolo Abeni	2	-3/+7
	Each openvswitch tunnel vport (vxlan,gre,geneve) holds a reference to the underlying tunnel device, but never released it when such device is deleted. Deleting the underlying device via the ip tool cause the kernel to hangup in the netdev_wait_allrefs() loop. This commit ensure that on device unregistration dp_detach_port_notify() is called for all vports that hold the device reference, properly releasing it. Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device") Fixes: b2acd1dc3949 ("openvswitch: Use regular GRE net_device instead of vport") Fixes: 6b001e682e90 ("openvswitch: Use Geneve device.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03	ipv4: igmp: Allow removing groups from a removed interface	Andrew Lunn	1	-2/+3
	When a multicast group is joined on a socket, a struct ip_mc_socklist is appended to the sockets mc_list containing information about the joined group. If the interface is hot unplugged, this entry becomes stale. Prior to commit 52ad353a5344f ("igmp: fix the problem when mc leave group") it was possible to remove the stale entry by performing a IP_DROP_MEMBERSHIP, passing either the old ifindex or ip address on the interface. However, this fix enforces that the interface must still exist. Thus with time, the number of stale entries grows, until sysctl_igmp_max_memberships is reached and then it is not possible to join and more groups. The previous patch fixes an issue where a IP_DROP_MEMBERSHIP is performed without specifying the interface, either by ifindex or ip address. However here we do supply one of these. So loosen the restriction on device existence to only apply when the interface has not been specified. This then restores the ability to clean up the stale entries. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Fixes: 52ad353a5344f "(igmp: fix the problem when mc leave group") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03	ipv6: sctp: implement sctp_v6_destroy_sock()	Eric Dumazet	1	-1/+8
	Dmitry Vyukov reported a memory leak using IPV6 SCTP sockets. We need to call inet6_destroy_sock() to properly release inet6 specific fields. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03	arm64: bpf: add 'store immediate' instruction	Yang Shi	1	-1/+19
	aarch64 doesn't have native store immediate instruction, such operation has to be implemented by the below instruction sequence: Load immediate to register Store register Signed-off-by: Yang Shi <yang.shi@linaro.org> CC: Zi Shen Lim <zlim.lnx@gmail.com> CC: Xi Wang <xi.wang@gmail.com> Reviewed-by: Zi Shen Lim <zlim.lnx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03	ipv6: kill sk_dst_lock	Eric Dumazet	8	-46/+12
	While testing the np->opt RCU conversion, I found that UDP/IPv6 was using a mixture of xchg() and sk_dst_lock to protect concurrent changes to sk->sk_dst_cache, leading to possible corruptions and crashes. ip6_sk_dst_lookup_flow() uses sk_dst_check() anyway, so the simplest way to fix the mess is to remove sk_dst_lock completely, as we did for IPv4. __ip6_dst_store() and ip6_dst_store() share same implementation. sk_setup_caps() being called with socket lock being held or not, we have to use sk_dst_set() instead of __sk_dst_set() Note that I had to move the "np->dst_cookie = rt6_get_cookie(rt);" in ip6_dst_store() before the sk_setup_caps(sk, dst) call. This is because ip6_dst_store() can be called from process context, without any lock held. As soon as the dst is installed in sk->sk_dst_cache, dst can be freed from another cpu doing a concurrent ip6_dst_store() Doing the dst dereference before doing the install is needed to make sure no use after free would trigger. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03	ipv6: sctp: add rcu protection around np->opt	Eric Dumazet	1	-3/+10
	This patch completes the work I did in commit 45f6fad84cc3 ("ipv6: add complete rcu protection around np->opt"), as I missed sctp part. This simply makes sure np->opt is used with proper RCU locking and accessors. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03	net/neighbour: fix crash at dumping device-agnostic proxy entries	Konstantin Khlebnikov	1	-2/+2
	Proxy entries could have null pointer to net-device. Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Fixes: 84920c1420e2 ("net: Allow ipv6 proxies and arp proxies be shown with iproute2") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02	sctp: use GFP_USER for user-controlled kmalloc	Marcelo Ricardo Leitner	1	-2/+2
	Dmitry Vyukov reported that the user could trigger a kernel warning by using a large len value for getsockopt SCTP_GET_LOCAL_ADDRS, as that value directly affects the value used as a kmalloc() parameter. This patch thus switches the allocation flags from all user-controllable kmalloc size to GFP_USER to put some more restrictions on it and also disables the warn, as they are not necessary. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02	sctp: convert sack_needed and sack_generation to bits	Marcelo Ricardo Leitner	1	-8/+8
	They don't need to be any bigger than that and with this we start a new bitfield for tracking association runtime stuff, like zero window situation. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02	ipv6: add complete rcu protection around np->opt	Eric Dumazet	13	-52/+122
	This patch addresses multiple problems : UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions while socket is not locked : Other threads can change np->opt concurrently. Dmitry posted a syzkaller (http://github.com/google/syzkaller) program desmonstrating use-after-free. Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock() and dccp_v6_request_recv_sock() also need to use RCU protection to dereference np->opt once (before calling ipv6_dup_options()) This patch adds full RCU protection to np->opt Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02	bpf: fix allocation warnings in bpf maps and integer overflow	Alexei Starovoitov	3	-12/+34
	For large map->value_size the user space can trigger memory allocation warnings like: WARNING: CPU: 2 PID: 11122 at mm/page_alloc.c:2989 __alloc_pages_nodemask+0x695/0x14e0() Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [<ffffffff82743b56>] dump_stack+0x68/0x92 lib/dump_stack.c:50 [<ffffffff81244ec9>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:460 [<ffffffff812450f9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:493 [< inline >] __alloc_pages_slowpath mm/page_alloc.c:2989 [<ffffffff81554e95>] __alloc_pages_nodemask+0x695/0x14e0 mm/page_alloc.c:3235 [<ffffffff816188fe>] alloc_pages_current+0xee/0x340 mm/mempolicy.c:2055 [< inline >] alloc_pages include/linux/gfp.h:451 [<ffffffff81550706>] alloc_kmem_pages+0x16/0xf0 mm/page_alloc.c:3414 [<ffffffff815a1c89>] kmalloc_order+0x19/0x60 mm/slab_common.c:1007 [<ffffffff815a1cef>] kmalloc_order_trace+0x1f/0xa0 mm/slab_common.c:1018 [< inline >] kmalloc_large include/linux/slab.h:390 [<ffffffff81627784>] __kmalloc+0x234/0x250 mm/slub.c:3525 [< inline >] kmalloc include/linux/slab.h:463 [< inline >] map_update_elem kernel/bpf/syscall.c:288 [< inline >] SYSC_bpf kernel/bpf/syscall.c:744 To avoid never succeeding kmalloc with order >= MAX_ORDER check that elem->value_size and computed elem_size are within limits for both hash and array type maps. Also add __GFP_NOWARN to kmalloc(value_size \| elem_size) to avoid OOM warnings. Note kmalloc(key_size) is highly unlikely to trigger OOM, since key_size <= 512, so keep those kmalloc-s as-is. Large value_size can cause integer overflows in elem_size and map.pages formulas, so check for that as well. Fixes: aaac3ba95e4c ("bpf: charge user for creation of BPF maps and programs") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02	mvebu: dts: enable IP checksum with jumbo frames for Armada 38x on Port0	Marcin Wojtas	1	-0/+1
	The Ethernet controller found in the Armada 38x SoC's family support TCP/IP checksumming with frame sizes larger than 1600 bytes, however only on port 0. This commit enables it by setting 'tx-csum-limit' to 9800B in 'ethernet@70000' node. Signed-off-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02	net: mvneta: enable setting custom TX IP checksum limit	Marcin Wojtas	2	-2/+23
	Since Armada 38x SoC can support IP checksum for jumbo frames only on a single port, it means that this feature should be enabled per-port, rather than for the whole SoC. This patch enables setting custom TX IP checksum limit by adding new optional property to the mvneta device tree node. If not used, by default 1600B is set for "marvell,armada-370-neta" and 9800B for other strings, which ensures backward compatibility. Binding documentation is updated accordingly. Signed-off-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02	net: mvneta: fix error path for building skb	Marcin Wojtas	1	-2/+6
	In the actual RX processing, there is same error path for both descriptor ring refilling and building skb fails. This is not correct, because after successful refill, the ring is already updated with newly allocated buffer. Then, in case of build_skb() fail, hitherto code left the original buffer unmapped. This patch fixes above situation by swapping error check of skb build with DMA-unmap of original buffer. Signed-off-by: Marcin Wojtas <mw@semihalf.com> Acked-by: Simon Guinot <simon.guinot@sequanux.org> Cc: <stable@vger.kernel.org> # v4.2+ Fixes a84e32894191 ("net: mvneta: fix refilling for Rx DMA buffers") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02	net: mvneta: fix bit assignment for RX packet irq enable	Marcin Wojtas	1	-1/+1
	A value originally defined in the driver was inappropriate. Even though the ingress was somehow working, writing MVNETA_RXQ_INTR_ENABLE_ALL_MASK to MVNETA_INTR_ENABLE didn't make any effect, because the bits [31:16] are reserved and read-only. This commit updates MVNETA_RXQ_INTR_ENABLE_ALL_MASK to be compliant with the controller's documentation. Signed-off-by: Marcin Wojtas <mw@semihalf.com> Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02	net: mvneta: fix bit assignment in MVNETA_RXQ_CONFIG_REG	Marcin Wojtas	1	-1/+1
	MVNETA_RXQ_HW_BUF_ALLOC bit which controls enabling hardware buffer allocation was mistakenly set as BIT(1). This commit fixes the assignment. Signed-off-by: Marcin Wojtas <mw@semihalf.com> Reviewed-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02	net: mvneta: add configuration for MBUS windows access protection	Marcin Wojtas	1	-0/+2
	This commit adds missing configuration of MBUS windows access protection in mvneta_conf_mbus_windows function - a dedicated variable for that purpose remained there unused since v3.8 initial mvneta support. Because of that the register contents were inherited from the bootloader. Signed-off-by: Marcin Wojtas <mw@semihalf.com> Reviewed-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02	net: thunderx: Enable BGX LMAC's RX/TX only after VF is up	Sunil Goutham	3	-3/+38
	Enable or disable BGX LMAC's RX/TX based on corresponding VF's status. If otherwise, when multiple LMAC's physical link is up then packets from all LMAC's whose corresponding VF is not yet initialized will get forwarded to VF0. This is due to VNIC's default configuration where CPI, RSSI e.t.c point to VF0/QSET0/RQ0. This patch will prevent multiple copies of packets on VF0. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02	net: thunderx: Switchon carrier only upon interface link up	Sunil Goutham	3	-4/+18
	Call netif_carrier_on() only if interface's link is up. Switching this on upon IFF_UP by default, is causing issues with ethernet channel bonding in LACP mode. Initial NETDEV_CHANGE notification was being skipped. Also fixed some issues with link/speed/duplex reporting via ethtool. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02	net: thunderx: Set CQ timer threshold properly	Sunil Goutham	3	-5/+4
	Properly set CQ timer threshold and also set it to 2us. With previous incorrect settings it was set to 0.5us which is too less. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02	net: thunderx: Wait for delayed work to finish before destroying it	Thanneeru Srinivasulu	2	-4/+2
	While VNIC or BGX driver teardown, wait for already scheduled delayed work to finish before destroying it. Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@caviumnetworks.com> Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02	net: thunderx: Force to load octeon-mdio before bgx driver.	Thanneeru Srinivasulu	2	-0/+4
	Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@caviumnetworks.com> Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02	openvswitch: properly refcount vport-vxlan module	Paolo Abeni	4	-5/+9
	After 614732eaa12d, no refcount is maintained for the vport-vxlan module. This allows the userspace to remove such module while vport-vxlan devices still exist, which leads to later oops. v1 -> v2: - move vport 'owner' initialization in ovs_vport_ops_register() and make such function a macro Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-01	bpf, array: fix heap out-of-bounds access when updating elements	Daniel Borkmann	1	-1/+1
	During own review but also reported by Dmitry's syzkaller [1] it has been noticed that we trigger a heap out-of-bounds access on eBPF array maps when updating elements. This happens with each map whose map->value_size (specified during map creation time) is not multiple of 8 bytes. In array_map_alloc(), elem_size is round_up(attr->value_size, 8) and used to align array map slots for faster access. However, in function array_map_update_elem(), we update the element as ... memcpy(array->value + array->elem_size * index, value, array->elem_size); ... where we access 'value' out-of-bounds, since it was allocated from map_update_elem() from syscall side as kmalloc(map->value_size, GFP_USER) and later on copied through copy_from_user(value, uvalue, map->value_size). Thus, up to 7 bytes, we can access out-of-bounds. Same could happen from within an eBPF program, where in worst case we access beyond an eBPF program's designated stack. Since 1be7f75d1668 ("bpf: enable non-root eBPF programs") didn't hit an official release yet, it only affects priviledged users. In case of array_map_lookup_elem(), the verifier prevents eBPF programs from accessing beyond map->value_size through check_map_access(). Also from syscall side map_lookup_elem() only copies map->value_size back to user, so nothing could leak. [1] http://github.com/google/syzkaller Fixes: 28fbcfa08d8e ("bpf: add array type of eBPF maps") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-01	net: fix sock_wake_async() rcu protection	Eric Dumazet	5	-33/+44
	Dmitry provided a syzkaller (http://github.com/google/syzkaller) triggering a fault in sock_wake_async() when async IO is requested. Said program stressed af_unix sockets, but the issue is generic and should be addressed in core networking stack. The problem is that by the time sock_wake_async() is called, we should not access the @flags field of 'struct socket', as the inode containing this socket might be freed without further notice, and without RCU grace period. We already maintain an RCU protected structure, "struct socket_wq" so moving SOCKWQ_ASYNC_NOSPACE & SOCKWQ_ASYNC_WAITDATA into it is the safe route. It also reduces number of cache lines needing dirtying, so might provide a performance improvement anyway. In followup patches, we might move remaining flags (SOCK_NOSPACE, SOCK_PASSCRED, SOCK_PASSSEC) to save 8 bytes and let 'struct socket' being mostly read and let it being shared between cpus. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-01	net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA	Eric Dumazet	22	-52/+60
	This patch is a cleanup to make following patch easier to review. Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA from (struct socket)->flags to a (struct socket_wq)->flags to benefit from RCU protection in sock_wake_async() To ease backports, we rename both constants. Two new helpers, sk_set_bit(int nr, struct sock sk) and sk_clear_bit(int net, struct sock sk) are added so that following patch can change their implementation. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-01	vmxnet3: fix checks for dma mapping errors	Alexey Khoroshilov	1	-11/+60
	vmxnet3_drv does not check dma_addr with dma_mapping_error() after mapping dma memory. The patch adds the checks and tries to handle failures. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Acked-by: Shrikrishna Khare <skhare@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-01	wan/x25: Fix use-after-free in x25_asy_open_tty()	Peter Hurley	1	-5/+1
	The N_X25 line discipline may access the previous line discipline's closed and already-freed private data on open [1]. The tty->disc_data field _never_ refers to valid data on entry to the line discipline's open() method. Rather, the ldisc is expected to initialize that field for its own use for the lifetime of the instance (ie. from open() to close() only). [1] [ 634.336761] ================================================================== [ 634.338226] BUG: KASAN: use-after-free in x25_asy_open_tty+0x13d/0x490 at addr ffff8800a743efd0 [ 634.339558] Read of size 4 by task syzkaller_execu/8981 [ 634.340359] ============================================================================= [ 634.341598] BUG kmalloc-512 (Not tainted): kasan: bad access detected ... [ 634.405018] Call Trace: [ 634.405277] dump_stack (lib/dump_stack.c:52) [ 634.405775] print_trailer (mm/slub.c:655) [ 634.406361] object_err (mm/slub.c:662) [ 634.406824] kasan_report_error (mm/kasan/report.c:138 mm/kasan/report.c:236) [ 634.409581] __asan_report_load4_noabort (mm/kasan/report.c:279) [ 634.411355] x25_asy_open_tty (drivers/net/wan/x25_asy.c:559 (discriminator 1)) [ 634.413997] tty_ldisc_open.isra.2 (drivers/tty/tty_ldisc.c:447) [ 634.414549] tty_set_ldisc (drivers/tty/tty_ldisc.c:567) [ 634.415057] tty_ioctl (drivers/tty/tty_io.c:2646 drivers/tty/tty_io.c:2879) [ 634.423524] do_vfs_ioctl (fs/ioctl.c:43 fs/ioctl.c:607) [ 634.427491] SyS_ioctl (fs/ioctl.c:622 fs/ioctl.c:613) [ 634.427945] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:188) Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-01	Revert "ipv6: ndisc: inherit metadata dst when creating ndisc requests"	Nicolas Dichtel	4	-11/+6
	This reverts commit ab450605b35caa768ca33e86db9403229bf42be4. In IPv6, we cannot inherit the dst of the original dst. ndisc packets are IPv6 packets and may take another route than the original packet. This patch breaks the following scenario: a packet comes from eth0 and is forwarded through vxlan1. The encapsulated packet triggers an NS which cannot be sent because of the wrong route. CC: Jiri Benc <jbenc@redhat.com> CC: Thomas Graf <tgraf@suug.ch> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-30	tcp: initialize tp->copied_seq in case of cross SYN connection	Eric Dumazet	1	-0/+1
	Dmitry provided a syzkaller (http://github.com/google/syzkaller) generated program that triggers the WARNING at net/ipv4/tcp.c:1729 in tcp_recvmsg() : WARN_ON(tp->copied_seq != tp->rcv_nxt && !(flags & (MSG_PEEK \| MSG_TRUNC))); His program is specifically attempting a Cross SYN TCP exchange, that we support (for the pleasure of hackers ?), but it looks we lack proper tcp->copied_seq initialization. Thanks again Dmitry for your report and testings. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-30	net: fsl: Fix error checking for platform_get_irq()	Mark Brown	1	-1/+1
	The gianfar driver has recently been enabled on arm64 but fails to build since it check the return value of platform_get_irq() against NO_IRQ. Fix this by instead checking for a negative error code. Even on ARM where this code was previously being built this check was incorrect since platform_get_irq() returns a negative error code which may not be exactly the (unsigned int)(-1) that NO_IRQ is defined to be. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-30	net: fsl: Don't use NO_IRQ to check return value of irq_of_parse_and_map()	Mark Brown	1	-3/+3
	This driver can be built on arm64 but relies on NO_IRQ to check the return value of irq_of_parse_and_map() which fails to build on arm64 because the architecture does not provide a NO_IRQ. Fix this to correctly check the return value of irq_of_parse_and_map(). Even on ARM systems where the driver was previously used the check was broken since on ARM NO_IRQ is -1 but irq_of_parse_and_map() returns 0 on error. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-30	af-unix: passcred support for sendpage	Hannes Frederic Sowa	1	-15/+64
	sendpage did not care about credentials at all. This could lead to situations in which because of fd passing between processes we could append data to skbs with different scm data. It is illegal to splice those skbs together. Instead we have to allocate a new skb and if requested fill out the scm details. Fixes: 869e7c62486ec ("net: af_unix: implement stream sendpage support") Reported-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-30	stmmac: fix oversized frame reception	Giuseppe CAVALLARO	1	-0/+6
	The receive skb buffers can be preallocated when the link is opened according to mtu size. While testing on a network environment with not standard MTU (e.g. 3000), a panic occurred if an incoming packet had a length greater than rx skb buffer size. This is because the HW is programmed to copy, from the DMA, an Jumbo frame and the Sw must check if the allocated buffer is enough to store the frame. Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-30	stmmac: fix PHY reset during resume	Giuseppe CAVALLARO	1	-15/+13
	When stmmac_mdio_reset, was called from stmmac_resume, it was not resetting the PHY due to which MAC was not getting reset properly and hence ethernet interface not was resumed properly. The issue was currently only reproducible on stih301-b2204. Signed-off-by: Pankaj Dev <pankaj.dev@st.com> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-30	stmmac: dwmac-sti: fix st,tx-retime-src check	Giuseppe CAVALLARO	1	-6/+7
	In case of the st,tx-retime-src is missing from device-tree (it's an optional field) the driver will invoke the strcasecmp to check which clock has been selected and this is a bug; the else condition is needed. In the dwmac_setup, the "rs" variable, passed to the strcasecmp, was not initialized and the compiler, depending on the options adopted, could take it in some different part of the stack generating the hang in such configuration. Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-30	stmmac: fix csr clock divisor for 300MHz	Giuseppe CAVALLARO	1	-1/+1
	This patch is to fix the csr clock in case of 300MHz is provided. Reported-by: Kent Borg <Kent.Borg@csr.com> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-30	stmmac: fix a filter problem after resuming.	Giuseppe CAVALLARO	1	-0/+1
	When resume the HW is re-configured but some settings can be lost. For example, the MAC Address_X High/Low Registers used for VLAN tagging.. So, while resuming, the set_filter callback needs to be invoked to re-program perfect and hash-table registers. Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-29	drivers: net: xgene: fix possible use after free	Eric Dumazet	1	-1/+1
	Once TX has been enabled on a NIC, it is illegal to access skb, as this skb might have been freed by another cpu, from TX completion handler. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Cc: Iyappan Subramanian <isubramanian@apm.com> Acked-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-29	packet: Allow packets with only a header (but no payload)	Martin Blumenstingl	2	-3/+4
	Commit 9c7077622dd91 ("packet: make packet_snd fail on len smaller than l2 header") added validation for the packet size in packet_snd. This change enforces that every packet needs a header (with at least hard_header_len bytes) plus a payload with at least one byte. Before this change the payload was optional. This fixes PPPoE connections which do not have a "Service" or "Host-Uniq" configured (which is violating the spec, but is still widely used in real-world setups). Those are currently failing with the following message: "pppd: packet size is too short (24 <= 24)" Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-25	bpf: fix clearing on persistent program array maps	Daniel Borkmann	4	-17/+33
	Currently, when having map file descriptors pointing to program arrays, there's still the issue that we unconditionally flush program array contents via bpf_fd_array_map_clear() in bpf_map_release(). This happens when such a file descriptor is released and is independent of the map's refcount. Having this flush independent of the refcount is for a reason: there can be arbitrary complex dependency chains among tail calls, also circular ones (direct or indirect, nesting limit determined during runtime), and we need to make sure that the map drops all references to eBPF programs it holds, so that the map's refcount can eventually drop to zero and initiate its freeing. Btw, a walk of the whole dependency graph would not be possible for various reasons, one being complexity and another one inconsistency, i.e. new programs can be added to parts of the graph at any time, so there's no guaranteed consistent state for the time of such a walk. Now, the program array pinning itself works, but the issue is that each derived file descriptor on close would nevertheless call unconditionally into bpf_fd_array_map_clear(). Instead, keep track of users and postpone this flush until the last reference to a user is dropped. As this only concerns a subset of references (f.e. a prog array could hold a program that itself has reference on the prog array holding it, etc), we need to track them separately. Short analysis on the refcounting: on map creation time usercnt will be one, so there's no change in behaviour for bpf_map_release(), if unpinned. If we already fail in map_create(), we are immediately freed, and no file descriptor has been made public yet. In bpf_obj_pin_user(), we need to probe for a possible map in bpf_fd_probe_obj() already with a usercnt reference, so before we drop the reference on the fd with fdput(). Therefore, if actual pinning fails, we need to drop that reference again in bpf_any_put(), otherwise we keep holding it. When last reference drops on the inode, the bpf_any_put() in bpf_evict_inode() will take care of dropping the usercnt again. In the bpf_obj_get_user() case, the bpf_any_get() will grab a reference on the usercnt, still at a time when we have the reference on the path. Should we later on fail to grab a new file descriptor, bpf_any_put() will drop it, otherwise we hold it until bpf_map_release() time. Joint work with Alexei. Fixes: b2197755b263 ("bpf: add support for persistent maps/progs") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-25	isdn: Partially revert debug format string usage clean up	Christoph Biedl	4	-6/+6
	Commit 35a4a57 ("isdn: clean up debug format string usage") introduced a safeguard to avoid accidential format string interpolation of data when calling debugl1 or HiSax_putstatus. This did however not take into account VHiSax_putstatus (called by HiSax_putstatus) does not call vsprintf if the head parameter is NULL - the format string is treated as plain text then instead. As a result, the string "%s" is processed literally, and the actual information is lost. This affects the isdnlog userspace program which stopped logging information since that commit. So revert the HiSax_putstatus invocations to the previous state. Fixes: 35a4a5733b0a ("isdn: clean up debug format string usage") Cc: Kees Cook <keescook@chromium.org> Cc: Karsten Keil <isdn@linux-pingi.de> Signed-off-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-24	RDS: fix race condition when sending a message on unbound socket	Quentin Casasnovas	2	-7/+3
	Sasha's found a NULL pointer dereference in the RDS connection code when sending a message to an apparently unbound socket. The problem is caused by the code checking if the socket is bound in rds_sendmsg(), which checks the rs_bound_addr field without taking a lock on the socket. This opens a race where rs_bound_addr is temporarily set but where the transport is not in rds_bind(), leading to a NULL pointer dereference when trying to dereference 'trans' in __rds_conn_create(). Vegard wrote a reproducer for this issue, so kindly ask him to share if you're interested. I cannot reproduce the NULL pointer dereference using Vegard's reproducer with this patch, whereas I could without. Complete earlier incomplete fix to CVE-2015-6937: 74e98eb08588 ("RDS: verify the underlying transport exists before creating a connection") Cc: David S. Miller <davem@davemloft.net> Cc: stable@vger.kernel.org Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com> Reviewed-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-24	net: openvswitch: Remove invalid comment	Aaron Conole	1	-2/+2
	During pre-upstream development, the openvswitch datapath used a custom hashtable to store vports that could fail on delete due to lack of memory. However, prior to upstream submission, this code was reworked to use an hlist based hastable with flexible-array based buckets. As such the failure condition was eliminated from the vport_del path, rendering this comment invalid. Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-24	net: ipmr, ip6mr: fix vif/tunnel failure race condition	Nikolay Aleksandrov	2	-12/+0
	Since (at least) commit b17a7c179dd3 ("[NET]: Do sysfs registration as part of register_netdevice."), netdev_run_todo() deals only with unregistration, so we don't need to do the rtnl_unlock/lock cycle to finish registration when failing pimreg or dvmrp device creation. In fact that opens a race condition where someone can delete the device while rtnl is unlocked because it's fully registered. The problem gets worse when netlink support is introduced as there are more points of entry that can cause it and it also makes reusing that code correctly impossible. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-24	rxrpc: Correctly handle ack at end of client call transmit phase	David Howells	1	-1/+3
	Normally, the transmit phase of a client call is implicitly ack'd by the reception of the first data packet of the response being received. However, if a security negotiation happens, the transmit phase, if it is entirely contained in a single packet, may get an ack packet in response and then may get aborted due to security negotiation failure. Because the client has shifted state to RXRPC_CALL_CLIENT_AWAIT_REPLY due to having transmitted all the data, the code that handles processing of the received ack packet doesn't note the hard ack the data packet. The following abort packet in the case of security negotiation failure then incurs an assertion failure when it tries to drain the Tx queue because the hard ack state is out of sync (hard ack means the packets have been processed and can be discarded by the sender; a soft ack means that the packets are received but could still be discarded and rerequested by the receiver). To fix this, we should record the hard ack we received for the ack packet. The assertion failure looks like: RxRPC: Assertion failed 1 <= 0 is false 0x1 <= 0x0 is false ------------[ cut here ]------------ kernel BUG at ../net/rxrpc/ar-ack.c:431! ... RIP: 0010:[<ffffffffa006857b>] [<ffffffffa006857b>] rxrpc_rotate_tx_window+0xbc/0x131 [af_rxrpc] ... Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-24	ipv6: distinguish frag queues by device for multicast and link-local packets	Michal Kubeček	3	-5/+11
	If a fragmented multicast packet is received on an ethernet device which has an active macvlan on top of it, each fragment is duplicated and received both on the underlying device and the macvlan. If some fragments for macvlan are processed before the whole packet for the underlying device is reassembled, the "overlapping fragments" test in ip6_frag_queue() discards the whole fragment queue. To resolve this, add device ifindex to the search key and require it to match reassembling multicast packets and packets to link-local addresses. Note: similar patch has been already submitted by Yoshifuji Hideaki in http://patchwork.ozlabs.org/patch/220979/ but got lost and forgotten for some reason. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-24	drivers: net: xgene: fix: ifconfig up/down crash	Iyappan Subramanian	1	-13/+16
	Fixing kernel crash when doing ifconfig down and up in a loop, [ 124.028237] Call trace: [ 124.030670] [<ffffffc000367ce0>] memcpy+0x20/0x180 [ 124.035436] [<ffffffc00053c250>] skb_clone+0x3c/0xa8 [ 124.040374] [<ffffffc00053ffa4>] __skb_tstamp_tx+0xc0/0x118 [ 124.045918] [<ffffffc00054000c>] skb_tstamp_tx+0x10/0x1c [ 124.051203] [<ffffffc00049bc84>] xgene_enet_start_xmit+0x2e4/0x33c [ 124.057352] [<ffffffc00054fc20>] dev_hard_start_xmit+0x2e8/0x400 [ 124.063327] [<ffffffc00056cb14>] sch_direct_xmit+0x90/0x1d4 [ 124.068870] [<ffffffc000550100>] __dev_queue_xmit+0x28c/0x498 [ 124.074585] [<ffffffc00055031c>] dev_queue_xmit_sk+0x10/0x1c [ 124.080216] [<ffffffc0005c3f14>] ip_finish_output2+0x3d0/0x438 [ 124.086017] [<ffffffc0005c5794>] ip_finish_output+0x198/0x1ac [ 124.091732] [<ffffffc0005c61d4>] ip_output+0xec/0x164 [ 124.096755] [<ffffffc0005c5910>] ip_local_out_sk+0x38/0x48 [ 124.102211] [<ffffffc0005c5d84>] ip_queue_xmit+0x288/0x330 [ 124.107668] [<ffffffc0005da8bc>] tcp_transmit_skb+0x908/0x964 [ 124.113383] [<ffffffc0005dc0d4>] tcp_send_ack+0x128/0x138 [ 124.118753] [<ffffffc0005d1580>] __tcp_ack_snd_check+0x5c/0x94 [ 124.124555] [<ffffffc0005d7a0c>] tcp_rcv_established+0x554/0x68c [ 124.130530] [<ffffffc0005df0d4>] tcp_v4_do_rcv+0xa4/0x37c [ 124.135900] [<ffffffc000539430>] release_sock+0xb4/0x150 [ 124.141184] [<ffffffc0005cdf88>] tcp_recvmsg+0x448/0x9e0 [ 124.146468] [<ffffffc0005f2f3c>] inet_recvmsg+0xa0/0xc0 [ 124.151666] [<ffffffc000533660>] sock_recvmsg+0x10/0x1c [ 124.156863] [<ffffffc0005370d4>] SyS_recvfrom+0xa4/0xf8 [ 124.162061] Code: f2400c84 540001c0 cb040042 36000064 (38401423) [ 124.168133] ---[ end trace 7ab2550372e8a65b ]--- The fix was to reorder napi_enable, napi_disable, request_irq and free_irq calls, move register_netdev after dma_coerce_mask_and_coherent. Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Tested-by: Khuong Dinh <kdinh@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-24	net: cdc_ncm: fix NULL pointer deref in cdc_ncm_bind_common	Bjørn Mork	1	-4/+4
	Commit 77b0a099674a ("cdc-ncm: use common parser") added a dangerous new trust in the CDC functional descriptors presented by the device, unconditionally assuming that any device handled by the driver has a CDC Union descriptor. This descriptor is required by the NCM and MBIM specs, but crashing on non-compliant devices is still unacceptable. Not only will that allow malicious devices to crash the kernel, but in this case it is also well known that there are non-compliant real devices on the market - as shown by the comment accompanying the IAD workaround in the same function. The Sierra Wireless EM7305 is an example of such device, having a CDC header and a CDC MBIM descriptor but no CDC Union: Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 12 bAlternateSetting 0 bNumEndpoints 1 bInterfaceClass 2 Communications bInterfaceSubClass 14 bInterfaceProtocol 0 iInterface 0 CDC Header: bcdCDC 1.10 CDC MBIM: bcdMBIMVersion 1.00 wMaxControlMessage 4096 bNumberFilters 16 bMaxFilterSize 128 wMaxSegmentSize 4064 bmNetworkCapabilities 0x20 8-byte ntb input size Endpoint Descriptor: .. The conversion to a common parser also left the local cdc_union variable untouched. This caused the IAD workaround code to be applied to all devices with an IAD descriptor, which was never intended. Finish the conversion by testing for hdr.usb_cdc_union_desc instead. Cc: Oliver Neukum <oneukum@suse.com> Fixes: 77b0a099674a ("cdc-ncm: use common parser") Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-24	tipc: fix error handling of expanding buffer headroom	Ying Xue	1	-2/+5
	Coverity says: *** CID 1338065: Error handling issues (CHECKED_RETURN) /net/tipc/udp_media.c: 162 in tipc_udp_send_msg() 156 struct udp_media_addr dst = (struct udp_media_addr )&dest->value; 157 struct udp_media_addr src = (struct udp_media_addr )&b->addr.value; 158 struct sk_buff clone; 159 struct rtable rt; 160 161 if (skb_headroom(skb) < UDP_MIN_HEADROOM) >>> CID 1338065: Error handling issues (CHECKED_RETURN) >>> Calling "pskb_expand_head" without checking return value (as is done elsewhere 51 out of 56 times). 162 pskb_expand_head(skb, UDP_MIN_HEADROOM, 0, GFP_ATOMIC); 163 164 clone = skb_clone(skb, GFP_ATOMIC); 165 skb_set_inner_protocol(clone, htons(ETH_P_TIPC)); 166 ub = rcu_dereference_rtnl(b->media_ptr); 167 if (!ub) { When expanding buffer headroom over udp tunnel with pskb_expand_head(), it's unfortunate that we don't check its return value. As a result, if the function returns an error code due to the lack of memory, it may cause unpredictable consequence as we unconditionally consider that it's always successful. Fixes: e53567948f82 ("tipc: conditionally expand buffer headroom over udp tunnel") Reported-by: <scan-admin@coverity.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>