path: root/tools/perf/scripts/python/call-graph-from-postgresql.py
Age  Commit message  (Author, files changed, lines -/+)
2015-12-04  rhashtable: Use __vmalloc with GFP_ATOMIC for table allocation  (Herbert Xu, 1 file, -2/+3)
When an rhashtable user pounds rhashtable hard with back-to-back insertions we may end up growing the table in GFP_ATOMIC context. Unfortunately when the table reaches a certain size this often fails because we don't have enough physically contiguous pages to hold the new table. Eric Dumazet suggested (and in fact wrote this patch) using __vmalloc instead which can be used in GFP_ATOMIC context. Reported-by: Phil Sutter <phil@nwl.cc> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
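A minimal sketch of the allocation strategy described above, not the actual patch; it assumes the three-argument __vmalloc() signature of kernels from this era and ignores the real bucket_table layout:

    #include <linux/slab.h>
    #include <linux/vmalloc.h>

    /* Sketch: try physically contiguous memory first; if that fails, fall
     * back to vmalloc space.  __vmalloc() with GFP_ATOMIC is what makes the
     * fallback usable when the table is grown from atomic context. */
    static void *table_alloc(size_t size, gfp_t gfp)
    {
            void *tbl;

            tbl = kzalloc(size, gfp | __GFP_NOWARN | __GFP_NORETRY);
            if (!tbl)
                    tbl = __vmalloc(size, gfp | __GFP_HIGHMEM | __GFP_ZERO,
                                    PAGE_KERNEL);
            return tbl;
    }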
2015-12-04  gre6: allow to update all parameters via rtnl  (Nicolas Dichtel, 1 file, -5/+3)
Parameters were updated only if the kernel was unable to find the tunnel with the new parameters, i.e. only if core parameters were updated (keys, addr, link, type). Now it's possible to update ttl, hoplimit, flowinfo and flags. Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-04  pppoe: fix memory corruption in padt work structure  (Guillaume Nault, 1 file, -4/+10)
pppoe_connect() mustn't touch the padt_work field of pppoe sockets because that work could be already pending. [ 21.473147] BUG: unable to handle kernel NULL pointer dereference at 00000004 [ 21.474523] IP: [<c1043177>] process_one_work+0x29/0x31c [ 21.475164] *pde = 00000000 [ 21.475513] Oops: 0000 [#1] SMP [ 21.475910] Modules linked in: pppoe pppox ppp_generic slhc crc32c_intel aesni_intel virtio_net xts aes_i586 lrw gf128mul ablk_helper cryptd evdev acpi_cpufreq processor serio_raw button ext4 crc16 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio [ 21.476168] CPU: 2 PID: 164 Comm: kworker/2:2 Not tainted 4.4.0-rc1 #1 [ 21.476168] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014 [ 21.476168] task: f5f83c00 ti: f5e28000 task.ti: f5e28000 [ 21.476168] EIP: 0060:[<c1043177>] EFLAGS: 00010046 CPU: 2 [ 21.476168] EIP is at process_one_work+0x29/0x31c [ 21.484082] EAX: 00000000 EBX: f678b2a0 ECX: 00000004 EDX: 00000000 [ 21.484082] ESI: f6c69940 EDI: f5e29ef0 EBP: f5e29f0c ESP: f5e29edc [ 21.484082] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 [ 21.484082] CR0: 80050033 CR2: 000000a4 CR3: 317ad000 CR4: 00040690 [ 21.484082] Stack: [ 21.484082] 00000000 f6c69950 00000000 f6c69940 c0042338 f5e29f0c c1327945 00000000 [ 21.484082] 00000008 f678b2a0 f6c69940 f678b2b8 f5e29f30 c1043984 f5f83c00 f6c69970 [ 21.484082] f678b2a0 c10437d3 f6775e80 f678b2a0 c10437d3 f5e29fac c1047059 f5e29f74 [ 21.484082] Call Trace: [ 21.484082] [<c1327945>] ? _raw_spin_lock_irq+0x28/0x30 [ 21.484082] [<c1043984>] worker_thread+0x1b1/0x244 [ 21.484082] [<c10437d3>] ? rescuer_thread+0x229/0x229 [ 21.484082] [<c10437d3>] ? rescuer_thread+0x229/0x229 [ 21.484082] [<c1047059>] kthread+0x8f/0x94 [ 21.484082] [<c1327a32>] ? _raw_spin_unlock_irq+0x22/0x26 [ 21.484082] [<c1327ee9>] ret_from_kernel_thread+0x21/0x38 [ 21.484082] [<c1046fca>] ? kthread_parkme+0x19/0x19 [ 21.496082] Code: 5d c3 55 89 e5 57 56 53 89 c3 83 ec 24 89 d0 89 55 e0 8d 7d e4 e8 6c d8 ff ff b9 04 00 00 00 89 45 d8 8b 43 24 89 45 dc 8b 45 d8 <8b> 40 04 8b 80 e0 00 00 00 c1 e8 05 24 01 88 45 d7 8b 45 e0 8d [ 21.496082] EIP: [<c1043177>] process_one_work+0x29/0x31c SS:ESP 0068:f5e29edc [ 21.496082] CR2: 0000000000000004 [ 21.496082] ---[ end trace e362cc9cf10dae89 ]--- Reported-by: Andrew <nitr0@seti.kr.ua> Fixes: 287f3a943fef ("pppoe: Use workqueue to die properly when a PADT is received") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
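An illustrative sketch of the rule behind the fix (generic pattern with simplified structures, not the pppoe code itself): a work item is initialized once, at object creation, and never re-initialized on a path where it may already be queued:

    #include <linux/workqueue.h>
    #include <linux/slab.h>

    struct conn {
            struct work_struct padt_work;   /* may be queued at any time */
            int state;
    };

    static void padt_handler(struct work_struct *w)
    {
            struct conn *c = container_of(w, struct conn, padt_work);

            c->state = 0;                   /* tear the connection down */
    }

    static struct conn *conn_create(void)
    {
            struct conn *c = kzalloc(sizeof(*c), GFP_KERNEL);

            if (c)
                    INIT_WORK(&c->padt_work, padt_handler); /* once, here */
            return c;
    }

    static void conn_connect(struct conn *c)
    {
            /* No INIT_WORK() here: padt_work may already be pending, and
             * re-initializing it corrupts the state that
             * process_one_work() walks (the oops above). */
            c->state = 1;
    }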
2015-12-04  net: mvpp2: fix refilling BM pools in RX path  (Marcin Wojtas, 1 file, -12/+16)
In the current code, in case of an RX buffer allocation error during refill, the original buffer is pushed to the network stack, but the number of available buffer pointers in the BM pool is decreased. This commit fixes the situation by moving the refill call before skb_put(), and returning the original buffer pointer to the pool in case of an error. Signed-off-by: Marcin Wojtas <mw@semihalf.com> Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit") Cc: <stable@vger.kernel.org> # v3.18+ Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-04  net: mvpp2: fix buffers' DMA handling on RX path  (Marcin Wojtas, 1 file, -5/+16)
Each allocated buffer whose pointer is put into the BM pool is DMA-mapped, hence it should be properly unmapped after use or when removing buffers from the pool. This commit fixes DMA handling on the RX path by adding dma_unmap_single() in mvpp2_rx() and in mvpp2_bufs_free(). The latter function's argument count had to be increased for this purpose. Signed-off-by: Marcin Wojtas <mw@semihalf.com> Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit") Cc: <stable@vger.kernel.org> # v3.18+ Signed-off-by: David S. Miller <davem@davemloft.net>
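A hedged sketch of the unmap rule being applied here (generic code with hypothetical names, not the mvpp2 functions):

    #include <linux/dma-mapping.h>
    #include <linux/slab.h>

    /* Every buffer handed to the hardware pool was mapped with
     * dma_map_single(); it must be unmapped before the CPU uses the data
     * (RX completion) and before the buffer is freed (pool teardown). */
    static void pool_buf_free(struct device *dev, void *buf,
                              dma_addr_t dma, size_t size)
    {
            dma_unmap_single(dev, dma, size, DMA_FROM_DEVICE);
            kfree(buf);
    }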
2015-12-04  net: mvpp2: fix missing DMA region unmap in egress processing  (Marcin Wojtas, 1 file, -3/+2)
The Tx descriptor release code currently calls dma_unmap_single() and dev_kfree_skb_any() if the descriptor is associated with a non-NULL skb. This condition is true only for the last fragment of the packet. Since every descriptor's buffer is DMA-mapped it has to be properly unmapped. Signed-off-by: Marcin Wojtas <mw@semihalf.com> Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit") Cc: <stable@vger.kernel.org> # v3.18+ Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-04  rhashtable: Prevent spurious EBUSY errors on insertion  (Herbert Xu, 2 files, -22/+41)
Thomas and Phil observed that under stress rhashtable insertion sometimes failed with EBUSY, even though this error should only ever be seen when we're under attack and our hash chain length has grown to an unacceptable level, even after a rehash. It turns out that the logic for detecting whether there is an existing rehash is faulty. In particular, when two threads both try to grow the same table at the same time, one of them may see the newly grown table and thus erroneously conclude that it had been rehashed. This is what leads to the EBUSY error. This patch fixes this by remembering the current last table we used during insertion so that rhashtable_insert_rehash can detect when another thread has also done a resize/rehash. When this is detected we will give up our resize/rehash and simply retry the insertion with the new table. Reported-by: Thomas Graf <tgraf@suug.ch> Reported-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03  net: phy: reset only targeted phy  (Jérôme Pouiller, 1 file, -1/+2)
It is possible to address another chip on the same MDIO bus. That case is correctly handled for media advertising: it is taken into account only if mii_data->phy_id == phydev->addr. However, this condition was missing for the reset case. Signed-off-by: Jérôme Pouiller <jezz@sysmic.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03  bnxt_en: Setup uc_list mac filters after resetting the chip.  (Michael Chan, 1 file, -5/+8)
Call bnxt_cfg_rx_mode() in bnxt_init_chip() to set up the uc_list and mc_list MAC address filters. Before this patch, uc_list was not set up again after a chip reset (such as an ethtool ring size change), and macvlans no longer worked after that. Modify bnxt_cfg_rx_mode() to return error codes appropriately so that the init chip sequence can detect any failures. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03  bnxt_en: enforce proper storing of MAC address  (Jeffrey Huang, 2 files, -11/+16)
For the PF, bp->pf.mac_addr always holds the permanent MAC address assigned by the HW. For a VF, bp->vf.mac_addr always holds the administrator-assigned VF MAC address. A randomly generated VF MAC address should never get stored in bp->vf.mac_addr. This way, when the VF wants to change the MAC address, we can tell whether the administrator has already set it and disallow the VF from changing it. v2: Fix compile error if CONFIG_BNXT_SRIOV is not set. Signed-off-by: Jeffrey Huang <huangjw@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03  bnxt_en: Fixed incorrect implementation of ndo_set_mac_address  (Jeffrey Huang, 1 file, -1/+10)
The existing ndo_set_mac_address only copies the new MAC address and doesn't program it into the HW. The correct way is to delete the existing default MAC filter from the HW and add the new one. Because RFS filters also depend on the default MAC filter's L2 context, the driver must go through close_nic() to delete the default MAC and RFS filters, then open_nic() to program the new default MAC address into the HW. Signed-off-by: Jeffrey Huang <huangjw@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03  net: lpc_eth: remove irq > NR_IRQS check from probe()  (Vladimir Zapolskiy, 1 file, -1/+1)
If the driver is used on an ARM platform with SPARSE_IRQ defined, the semantics of NR_IRQS are different (it is the minimal number of virtual irqs) and by default it is set to 16, see arch/arm/include/asm/irq.h. This value may be less than the actual number of virtual irqs, which may break driver initialization. Removing the check allows the driver to be used on such a platform, and, if the irq controller driver works correctly, the check is not needed on legacy platforms either. Fixes a runtime problem: lpc-eth 31060000.ethernet: error getting resources. lpc_eth: lpc-eth: not found (-6). Signed-off-by: Vladimir Zapolskiy <vz@mleia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03  net_sched: fix qdisc_tree_decrease_qlen() races  (Eric Dumazet, 5 files, -14/+26)
qdisc_tree_decrease_qlen() suffers from two problems on multiqueue devices. One problem is that it updates sch->q.qlen and sch->qstats.drops on the mq/mqprio root qdisc, while it should not : Daniele reported underflows errors : [ 681.774821] PAX: sch->q.qlen: 0 n: 1 [ 681.774825] PAX: size overflow detected in function qdisc_tree_decrease_qlen net/sched/sch_api.c:769 cicus.693_49 min, count: 72, decl: qlen; num: 0; context: sk_buff_head; [ 681.774954] CPU: 2 PID: 19 Comm: ksoftirqd/2 Tainted: G O 4.2.6.201511282239-1-grsec #1 [ 681.774955] Hardware name: ASUSTeK COMPUTER INC. X302LJ/X302LJ, BIOS X302LJ.202 03/05/2015 [ 681.774956] ffffffffa9a04863 0000000000000000 0000000000000000 ffffffffa990ff7c [ 681.774959] ffffc90000d3bc38 ffffffffa95d2810 0000000000000007 ffffffffa991002b [ 681.774960] ffffc90000d3bc68 ffffffffa91a44f4 0000000000000001 0000000000000001 [ 681.774962] Call Trace: [ 681.774967] [<ffffffffa95d2810>] dump_stack+0x4c/0x7f [ 681.774970] [<ffffffffa91a44f4>] report_size_overflow+0x34/0x50 [ 681.774972] [<ffffffffa94d17e2>] qdisc_tree_decrease_qlen+0x152/0x160 [ 681.774976] [<ffffffffc02694b1>] fq_codel_dequeue+0x7b1/0x820 [sch_fq_codel] [ 681.774978] [<ffffffffc02680a0>] ? qdisc_peek_dequeued+0xa0/0xa0 [sch_fq_codel] [ 681.774980] [<ffffffffa94cd92d>] __qdisc_run+0x4d/0x1d0 [ 681.774983] [<ffffffffa949b2b2>] net_tx_action+0xc2/0x160 [ 681.774985] [<ffffffffa90664c1>] __do_softirq+0xf1/0x200 [ 681.774987] [<ffffffffa90665ee>] run_ksoftirqd+0x1e/0x30 [ 681.774989] [<ffffffffa90896b0>] smpboot_thread_fn+0x150/0x260 [ 681.774991] [<ffffffffa9089560>] ? sort_range+0x40/0x40 [ 681.774992] [<ffffffffa9085fe4>] kthread+0xe4/0x100 [ 681.774994] [<ffffffffa9085f00>] ? kthread_worker_fn+0x170/0x170 [ 681.774995] [<ffffffffa95d8d1e>] ret_from_fork+0x3e/0x70 mq/mqprio have their own ways to report qlen/drops by folding stats on all their queues, with appropriate locking. A second problem is that qdisc_tree_decrease_qlen() calls qdisc_lookup() without proper locking : concurrent qdisc updates could corrupt the list that qdisc_match_from_root() parses to find a qdisc given its handle. Fix first problem adding a TCQ_F_NOPARENT qdisc flag that qdisc_tree_decrease_qlen() can use to abort its tree traversal, as soon as it meets a mq/mqprio qdisc children. Second problem can be fixed by RCU protection. Qdisc are already freed after RCU grace period, so qdisc_list_add() and qdisc_list_del() simply have to use appropriate rcu list variants. A future patch will add a per struct netdev_queue list anchor, so that qdisc_tree_decrease_qlen() can have more efficient lookups. Reported-by: Daniele Fucini <dfucini@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <cwang@twopensource.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
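A rough sketch of the first fix's idea (heavily simplified; the real qdisc_tree_decrease_qlen() also adjusts drop counters and notifies parent classes): the upward walk stops as soon as it reaches a qdisc whose "parent" is really a device queue, i.e. an mq/mqprio child flagged TCQ_F_NOPARENT:

    #include <net/sch_generic.h>
    #include <net/pkt_sched.h>

    /* Sketch only: decrease qlen on the ancestors of 'sch', but never walk
     * onto an mq/mqprio root, whose stats are folded from its queues. */
    static void tree_decrease_qlen_sketch(struct Qdisc *sch, unsigned int n)
    {
            while (sch->parent != TC_H_ROOT) {
                    if (sch->flags & TCQ_F_NOPARENT)
                            return; /* attached to mq/mqprio: stop here */
                    sch = qdisc_lookup(qdisc_dev(sch), TC_H_MAJ(sch->parent));
                    if (!sch)
                            return;
                    sch->q.qlen -= n;
            }
    }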
2015-12-03  openvswitch: fix hangup on vxlan/gre/geneve device deletion  (Paolo Abeni, 2 files, -3/+7)
Each openvswitch tunnel vport (vxlan, gre, geneve) holds a reference to the underlying tunnel device but never releases it when that device is deleted. Deleting the underlying device via the ip tool causes the kernel to hang in the netdev_wait_allrefs() loop. This commit ensures that on device unregistration dp_detach_port_notify() is called for all vports that hold the device reference, properly releasing it. Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device") Fixes: b2acd1dc3949 ("openvswitch: Use regular GRE net_device instead of vport") Fixes: 6b001e682e90 ("openvswitch: Use Geneve device.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03  ipv4: igmp: Allow removing groups from a removed interface  (Andrew Lunn, 1 file, -2/+3)
When a multicast group is joined on a socket, a struct ip_mc_socklist is appended to the socket's mc_list containing information about the joined group. If the interface is hot-unplugged, this entry becomes stale. Prior to commit 52ad353a5344f ("igmp: fix the problem when mc leave group") it was possible to remove the stale entry by performing an IP_DROP_MEMBERSHIP, passing either the old ifindex or the ip address on the interface. However, this fix enforces that the interface must still exist. Thus with time the number of stale entries grows, until sysctl_igmp_max_memberships is reached and then it is not possible to join any more groups. The previous patch fixes an issue where an IP_DROP_MEMBERSHIP is performed without specifying the interface, either by ifindex or by ip address. However here we do supply one of these. So loosen the restriction on device existence to only apply when the interface has not been specified. This then restores the ability to clean up the stale entries. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Fixes: 52ad353a5344f ("igmp: fix the problem when mc leave group") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03  ipv6: sctp: implement sctp_v6_destroy_sock()  (Eric Dumazet, 1 file, -1/+8)
Dmitry Vyukov reported a memory leak using IPV6 SCTP sockets. We need to call inet6_destroy_sock() to properly release inet6 specific fields. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03  arm64: bpf: add 'store immediate' instruction  (Yang Shi, 1 file, -1/+19)
aarch64 doesn't have a native store-immediate instruction, so such an operation has to be implemented with the following sequence: load the immediate into a register, then store that register. Signed-off-by: Yang Shi <yang.shi@linaro.org> CC: Zi Shen Lim <zlim.lnx@gmail.com> CC: Xi Wang <xi.wang@gmail.com> Reviewed-by: Zi Shen Lim <zlim.lnx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03  ipv6: kill sk_dst_lock  (Eric Dumazet, 8 files, -46/+12)
While testing the np->opt RCU conversion, I found that UDP/IPv6 was using a mixture of xchg() and sk_dst_lock to protect concurrent changes to sk->sk_dst_cache, leading to possible corruptions and crashes. ip6_sk_dst_lookup_flow() uses sk_dst_check() anyway, so the simplest way to fix the mess is to remove sk_dst_lock completely, as we did for IPv4. __ip6_dst_store() and ip6_dst_store() share the same implementation. Since sk_setup_caps() may be called with or without the socket lock held, we have to use sk_dst_set() instead of __sk_dst_set(). Note that I had to move the "np->dst_cookie = rt6_get_cookie(rt);" in ip6_dst_store() before the sk_setup_caps(sk, dst) call. This is because ip6_dst_store() can be called from process context, without any lock held. As soon as the dst is installed in sk->sk_dst_cache, the dst can be freed by another cpu doing a concurrent ip6_dst_store(). Doing the dst dereference before doing the install is needed to make sure no use-after-free would trigger. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03  ipv6: sctp: add rcu protection around np->opt  (Eric Dumazet, 1 file, -3/+10)
This patch completes the work I did in commit 45f6fad84cc3 ("ipv6: add complete rcu protection around np->opt"), as I missed sctp part. This simply makes sure np->opt is used with proper RCU locking and accessors. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03  net/neighbour: fix crash at dumping device-agnostic proxy entries  (Konstantin Khlebnikov, 1 file, -2/+2)
Proxy entries can have a null net-device pointer. Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Fixes: 84920c1420e2 ("net: Allow ipv6 proxies and arp proxies be shown with iproute2") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02  sctp: use GFP_USER for user-controlled kmalloc  (Marcelo Ricardo Leitner, 1 file, -2/+2)
Dmitry Vyukov reported that the user could trigger a kernel warning by using a large len value for getsockopt SCTP_GET_LOCAL_ADDRS, as that value directly affects the value used as a kmalloc() parameter. This patch thus switches the allocation flags to GFP_USER for all kmallocs whose size is user-controllable, to put some more restrictions on them, and also disables the warning, as it is not necessary. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
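A minimal sketch of the allocation-flag change described (illustrative, not the exact sctp hunk):

    #include <linux/slab.h>

    /* 'len' comes straight from userspace (getsockopt optlen), so use
     * GFP_USER, and add __GFP_NOWARN so an absurd size fails quietly
     * instead of triggering the large-allocation warning. */
    static void *getaddrs_buf_alloc(size_t len)
    {
            return kmalloc(len, GFP_USER | __GFP_NOWARN);
    }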
2015-12-02  sctp: convert sack_needed and sack_generation to bits  (Marcelo Ricardo Leitner, 1 file, -8/+8)
They don't need to be any bigger than a bit each, and with this we start a new bitfield for tracking association runtime state, such as the zero-window situation. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02  ipv6: add complete rcu protection around np->opt  (Eric Dumazet, 13 files, -52/+122)
This patch addresses multiple problems: UDP/RAW sendmsg() needs to get a stable struct ipv6_txoptions while the socket is not locked: other threads can change np->opt concurrently. Dmitry posted a syzkaller (http://github.com/google/syzkaller) program demonstrating use-after-free. Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock() and dccp_v6_request_recv_sock() also need to use RCU protection to dereference np->opt once (before calling ipv6_dup_options()). This patch adds full RCU protection to np->opt. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
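A simplified sketch of the reader-side pattern this implies (the real patch adds dedicated refcounted accessors; ipv6_dup_options() is used here only to illustrate taking a private copy under the read lock):

    #include <net/ipv6.h>

    /* Sketch: never use np->opt directly on an unlocked path; dereference
     * it under rcu_read_lock() and take a private copy (or a reference)
     * before dropping the lock, so a concurrent setsockopt() freeing the
     * old ipv6_txoptions cannot cause a use-after-free. */
    static struct ipv6_txoptions *txopt_copy(struct sock *sk,
                                             struct ipv6_pinfo *np)
    {
            struct ipv6_txoptions *opt;

            rcu_read_lock();
            opt = rcu_dereference(np->opt);
            if (opt)
                    opt = ipv6_dup_options(sk, opt);
            rcu_read_unlock();

            return opt;     /* caller owns and frees the copy */
    }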
2015-12-02  bpf: fix allocation warnings in bpf maps and integer overflow  (Alexei Starovoitov, 3 files, -12/+34)
For large map->value_size the user space can trigger memory allocation warnings like: WARNING: CPU: 2 PID: 11122 at mm/page_alloc.c:2989 __alloc_pages_nodemask+0x695/0x14e0() Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [<ffffffff82743b56>] dump_stack+0x68/0x92 lib/dump_stack.c:50 [<ffffffff81244ec9>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:460 [<ffffffff812450f9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:493 [< inline >] __alloc_pages_slowpath mm/page_alloc.c:2989 [<ffffffff81554e95>] __alloc_pages_nodemask+0x695/0x14e0 mm/page_alloc.c:3235 [<ffffffff816188fe>] alloc_pages_current+0xee/0x340 mm/mempolicy.c:2055 [< inline >] alloc_pages include/linux/gfp.h:451 [<ffffffff81550706>] alloc_kmem_pages+0x16/0xf0 mm/page_alloc.c:3414 [<ffffffff815a1c89>] kmalloc_order+0x19/0x60 mm/slab_common.c:1007 [<ffffffff815a1cef>] kmalloc_order_trace+0x1f/0xa0 mm/slab_common.c:1018 [< inline >] kmalloc_large include/linux/slab.h:390 [<ffffffff81627784>] __kmalloc+0x234/0x250 mm/slub.c:3525 [< inline >] kmalloc include/linux/slab.h:463 [< inline >] map_update_elem kernel/bpf/syscall.c:288 [< inline >] SYSC_bpf kernel/bpf/syscall.c:744 To avoid never succeeding kmalloc with order >= MAX_ORDER check that elem->value_size and computed elem_size are within limits for both hash and array type maps. Also add __GFP_NOWARN to kmalloc(value_size | elem_size) to avoid OOM warnings. Note kmalloc(key_size) is highly unlikely to trigger OOM, since key_size <= 512, so keep those kmalloc-s as-is. Large value_size can cause integer overflows in elem_size and map.pages formulas, so check for that as well. Fixes: aaac3ba95e4c ("bpf: charge user for creation of BPF maps and programs") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
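A hedged sketch of the kind of bounds checking described (field names and exact limits simplified; not the literal patch):

    #include <linux/kernel.h>
    #include <linux/slab.h>
    #include <linux/errno.h>

    /* Reject sizes that kmalloc() could never satisfy (order >= MAX_ORDER)
     * and make sure elem_size * max_entries cannot overflow. */
    static int map_size_check(u32 value_size, u32 max_entries)
    {
            u64 elem_size, total;

            if (value_size >= KMALLOC_MAX_SIZE)
                    return -E2BIG;

            elem_size = round_up(value_size, 8);
            total = elem_size * max_entries;        /* 64-bit arithmetic */
            if (total >= U32_MAX)
                    return -E2BIG;

            return 0;
    }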
2015-12-02  mvebu: dts: enable IP checksum with jumbo frames for Armada 38x on Port0  (Marcin Wojtas, 1 file, -0/+1)
The Ethernet controller found in the Armada 38x SoC family supports TCP/IP checksumming with frame sizes larger than 1600 bytes, however only on port 0. This commit enables it by setting 'tx-csum-limit' to 9800B in the 'ethernet@70000' node. Signed-off-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02  net: mvneta: enable setting custom TX IP checksum limit  (Marcin Wojtas, 2 files, -2/+23)
Since the Armada 38x SoC can support IP checksum for jumbo frames only on a single port, this feature should be enabled per port rather than for the whole SoC. This patch enables setting a custom TX IP checksum limit by adding a new optional property to the mvneta device tree node. If not used, 1600B is set by default for "marvell,armada-370-neta" and 9800B for other strings, which ensures backward compatibility. Binding documentation is updated accordingly. Signed-off-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02  net: mvneta: fix error path for building skb  (Marcin Wojtas, 1 file, -2/+6)
In the current RX processing, the same error path is used both when refilling the descriptor ring fails and when building the skb fails. This is not correct, because after a successful refill the ring is already updated with the newly allocated buffer. Then, in case of a build_skb() failure, the existing code left the original buffer unmapped. This patch fixes the situation by swapping the error check of the skb build with the DMA unmap of the original buffer. Signed-off-by: Marcin Wojtas <mw@semihalf.com> Acked-by: Simon Guinot <simon.guinot@sequanux.org> Cc: <stable@vger.kernel.org> # v4.2+ Fixes: a84e32894191 ("net: mvneta: fix refilling for Rx DMA buffers") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02  net: mvneta: fix bit assignment for RX packet irq enable  (Marcin Wojtas, 1 file, -1/+1)
The value originally defined in the driver was inappropriate. Even though ingress was somehow working, writing MVNETA_RXQ_INTR_ENABLE_ALL_MASK to MVNETA_INTR_ENABLE had no effect, because bits [31:16] are reserved and read-only. This commit updates MVNETA_RXQ_INTR_ENABLE_ALL_MASK to be compliant with the controller's documentation. Signed-off-by: Marcin Wojtas <mw@semihalf.com> Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02  net: mvneta: fix bit assignment in MVNETA_RXQ_CONFIG_REG  (Marcin Wojtas, 1 file, -1/+1)
MVNETA_RXQ_HW_BUF_ALLOC bit which controls enabling hardware buffer allocation was mistakenly set as BIT(1). This commit fixes the assignment. Signed-off-by: Marcin Wojtas <mw@semihalf.com> Reviewed-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02  net: mvneta: add configuration for MBUS windows access protection  (Marcin Wojtas, 1 file, -0/+2)
This commit adds the missing configuration of MBUS windows access protection in the mvneta_conf_mbus_windows() function - a dedicated variable for that purpose has remained unused since the initial mvneta support in v3.8. Because of that, the register contents were inherited from the bootloader. Signed-off-by: Marcin Wojtas <mw@semihalf.com> Reviewed-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02  mac80211: fix off-channel mgmt-tx uninitialized variable usage  (Johannes Berg, 1 file, -2/+6)
In the last change here, I neglected to update the cookie in one code path: when a mgmt-tx has no real cookie sent to userspace as it doesn't wait for a response, but is off-channel. The original code used the SKB pointer as the cookie and always assigned the cookie to the TX SKB in ieee80211_start_roc_work(), but my change turned this around and made the code rely on a valid cookie being passed in. Unfortunately, the off-channel no-wait TX path wasn't assigning one at all, resulting in an uninitialized stack value being used. This wasn't handed back to userspace as a cookie (since in the no-wait case there isn't a cookie), but it was tested for non-zero to distinguish between mgmt-tx and off-channel. Fix this by assigning a dummy non-zero cookie unconditionally, and get rid of a misleading comment and some dead code while at it. I'll clean up the ACK SKB handling separately later. Fixes: 3b79af973cf4 ("mac80211: stop using pointers as userspace cookies") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-02  mac80211: do not actively scan DFS channels  (Antonio Quartulli, 1 file, -4/+5)
DFS channels should not be actively scanned, as we can't be sure whether we are allowed to. If the current channel is in the DFS band, an active scan might be performed after CSA, but we have no guarantee about other channels, therefore it is safer to prevent active scanning at all. Cc: stable@vger.kernel.org Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-02  mac80211: don't teardown sdata on sdata stop  (Eliad Peller, 1 file, -1/+1)
Interfaces are being initialized (setup) on addition, and torn down on removal. However, p2p device is being torn down when stopped, resulting in the next p2p start operation being done on uninitialized interface. Solve it by calling ieee80211_teardown_sdata() only on interface removal (for the non-netdev case). Signed-off-by: Eliad Peller <eliadx.peller@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> [squashed in fix to call teardown after unregister] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-02  net: thunderx: Enable BGX LMAC's RX/TX only after VF is up  (Sunil Goutham, 3 files, -3/+38)
Enable or disable a BGX LMAC's RX/TX based on the corresponding VF's status. Otherwise, when the physical links of multiple LMACs are up, packets from any LMAC whose corresponding VF is not yet initialized will get forwarded to VF0. This is due to VNIC's default configuration, where CPI, RSSI etc. point to VF0/QSET0/RQ0. This patch prevents multiple copies of packets on VF0. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02  net: thunderx: Switchon carrier only upon interface link up  (Sunil Goutham, 3 files, -4/+18)
Call netif_carrier_on() only if the interface's link is up. Switching this on upon IFF_UP by default was causing issues with ethernet channel bonding in LACP mode: the initial NETDEV_CHANGE notification was being skipped. Also fixed some issues with link/speed/duplex reporting via ethtool. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02  net: thunderx: Set CQ timer threshold properly  (Sunil Goutham, 3 files, -5/+4)
Properly set the CQ timer threshold and set it to 2us. With the previous incorrect settings it was set to 0.5us, which is too low. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02  net: thunderx: Wait for delayed work to finish before destroying it  (Thanneeru Srinivasulu, 2 files, -4/+2)
During VNIC or BGX driver teardown, wait for already scheduled delayed work to finish before destroying it. Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@caviumnetworks.com> Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02  net: thunderx: Force to load octeon-mdio before bgx driver.  (Thanneeru Srinivasulu, 2 files, -0/+4)
Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@caviumnetworks.com> Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02  openvswitch: properly refcount vport-vxlan module  (Paolo Abeni, 4 files, -5/+9)
After 614732eaa12d, no refcount is maintained for the vport-vxlan module. This allows userspace to remove the module while vport-vxlan devices still exist, which leads to a later oops. v1 -> v2: - move the vport 'owner' initialization into ovs_vport_ops_register() and make that function a macro Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02  HID: lg: restrict filtering out of first interface to G29 only  (Benjamin Tissoires, 1 file, -2/+3)
Looks like 29fae1c85 ("HID: logitech: Add support for G29") was a little bit aggressive and broke other devices. Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=108121 Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-12-01  bpf, array: fix heap out-of-bounds access when updating elements  (Daniel Borkmann, 1 file, -1/+1)
During our own review, but also reported by Dmitry's syzkaller [1], it has been noticed that we trigger a heap out-of-bounds access on eBPF array maps when updating elements. This happens with each map whose map->value_size (specified at map creation time) is not a multiple of 8 bytes. In array_map_alloc(), elem_size is round_up(attr->value_size, 8) and is used to align array map slots for faster access. However, in array_map_update_elem(), we update the element as ... memcpy(array->value + array->elem_size * index, value, array->elem_size); ... where we access 'value' out of bounds, since it was allocated from map_update_elem() on the syscall side as kmalloc(map->value_size, GFP_USER) and later copied through copy_from_user(value, uvalue, map->value_size). Thus, we can access up to 7 bytes out of bounds. The same could happen from within an eBPF program, where in the worst case we access beyond an eBPF program's designated stack. Since 1be7f75d1668 ("bpf: enable non-root eBPF programs") didn't hit an official release yet, this only affects privileged users. In the case of array_map_lookup_elem(), the verifier prevents eBPF programs from accessing beyond map->value_size through check_map_access(). Also, from the syscall side, map_lookup_elem() only copies map->value_size back to the user, so nothing could leak. [1] http://github.com/google/syzkaller Fixes: 28fbcfa08d8e ("bpf: add array type of eBPF maps") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
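A self-contained sketch of the out-of-bounds pattern and the one-line style of fix (generic code, not the kernel file itself):

    #include <linux/types.h>
    #include <linux/string.h>

    /* Slots are padded to elem_size = round_up(value_size, 8) for alignment,
     * but the source buffer holds only value_size bytes, so value_size is
     * the most that may be copied from it. */
    static void array_slot_update(char *slots, u32 elem_size, u32 value_size,
                                  u32 index, const void *value)
    {
            /* buggy: memcpy(slots + elem_size * index, value, elem_size); */
            memcpy(slots + (size_t)elem_size * index, value, value_size);
    }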
2015-12-01  tracing: Add sched_wakeup_new and sched_waking tracepoints for pid filter  (Steven Rostedt (Red Hat), 1 file, -0/+16)
The set_event_pid filter relies on attaching to the sched_switch and sched_wakeup tracepoints to see if it should filter the tracing on schedule tracepoints. By adding the callbacks to sched_wakeup, pids in the set_event_pid file will trace the wakeups of those tasks with those pids. But sched_wakeup_new and sched_waking were missed. These two should also be traced. Luckily, these tracepoints share the same class as sched_wakeup which means they can use the same pre and post callbacks as sched_wakeup does. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-12-01  net: fix sock_wake_async() rcu protection  (Eric Dumazet, 5 files, -33/+44)
Dmitry provided a syzkaller (http://github.com/google/syzkaller) triggering a fault in sock_wake_async() when async IO is requested. Said program stressed af_unix sockets, but the issue is generic and should be addressed in core networking stack. The problem is that by the time sock_wake_async() is called, we should not access the @flags field of 'struct socket', as the inode containing this socket might be freed without further notice, and without RCU grace period. We already maintain an RCU protected structure, "struct socket_wq" so moving SOCKWQ_ASYNC_NOSPACE & SOCKWQ_ASYNC_WAITDATA into it is the safe route. It also reduces number of cache lines needing dirtying, so might provide a performance improvement anyway. In followup patches, we might move remaining flags (SOCK_NOSPACE, SOCK_PASSCRED, SOCK_PASSSEC) to save 8 bytes and let 'struct socket' being mostly read and let it being shared between cpus. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-01  net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA  (Eric Dumazet, 22 files, -52/+60)
This patch is a cleanup to make following patch easier to review. Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA from (struct socket)->flags to a (struct socket_wq)->flags to benefit from RCU protection in sock_wake_async() To ease backports, we rename both constants. Two new helpers, sk_set_bit(int nr, struct sock *sk) and sk_clear_bit(int net, struct sock *sk) are added so that following patch can change their implementation. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
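At this stage the helpers are presumably thin wrappers, something like the sketch below (illustrative; the point is that the follow-up patch only has to change this one place to switch from the socket flags to the RCU-protected socket_wq):

    #include <net/sock.h>

    /* Sketch: callers stop touching sk->sk_socket->flags directly and go
     * through these helpers, so a later patch can move the bits into the
     * RCU-protected struct socket_wq in a single spot. */
    static inline void sk_set_bit(int nr, struct sock *sk)
    {
            set_bit(nr, &sk->sk_socket->flags);
    }

    static inline void sk_clear_bit(int nr, struct sock *sk)
    {
            clear_bit(nr, &sk->sk_socket->flags);
    }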
2015-12-01  nvme: temporary fix for Apple controller reset  (Stephan Günther, 1 file, -0/+12)
Recent patches added basic support for the Apple NVMe controller but still cause resets and data corruption on that particular controller when a specific pattern of read/flush commands occurs. Limiting the queue depth to 2 works around that issue. This patch enforces that limit only for the Apple controller and is considered a temporary fix until we find the root source of that problem. Signed-off-by: Stephan Günther <guenther@tum.de> Signed-off-by: Maurice Leclaire <leclaire@in.tum.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-01  vmxnet3: fix checks for dma mapping errors  (Alexey Khoroshilov, 1 file, -11/+60)
vmxnet3_drv does not check dma_addr with dma_mapping_error() after mapping dma memory. The patch adds the checks and tries to handle failures. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Acked-by: Shrikrishna Khare <skhare@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
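An illustrative sketch of the check being added (generic helper, not the vmxnet3 code paths themselves):

    #include <linux/dma-mapping.h>
    #include <linux/errno.h>

    /* Every dma_map_single() must be followed by dma_mapping_error();
     * on failure the descriptor must not be handed to the device and the
     * caller has to unwind (drop the skb, free the buffer, ...). */
    static int map_buf(struct device *dev, void *buf, size_t len,
                       enum dma_data_direction dir, dma_addr_t *dma)
    {
            *dma = dma_map_single(dev, buf, len, dir);
            if (dma_mapping_error(dev, *dma))
                    return -EFAULT;
            return 0;
    }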
2015-12-01  wan/x25: Fix use-after-free in x25_asy_open_tty()  (Peter Hurley, 1 file, -5/+1)
The N_X25 line discipline may access the previous line discipline's closed and already-freed private data on open [1]. The tty->disc_data field _never_ refers to valid data on entry to the line discipline's open() method. Rather, the ldisc is expected to initialize that field for its own use for the lifetime of the instance (ie. from open() to close() only). [1] [ 634.336761] ================================================================== [ 634.338226] BUG: KASAN: use-after-free in x25_asy_open_tty+0x13d/0x490 at addr ffff8800a743efd0 [ 634.339558] Read of size 4 by task syzkaller_execu/8981 [ 634.340359] ============================================================================= [ 634.341598] BUG kmalloc-512 (Not tainted): kasan: bad access detected ... [ 634.405018] Call Trace: [ 634.405277] dump_stack (lib/dump_stack.c:52) [ 634.405775] print_trailer (mm/slub.c:655) [ 634.406361] object_err (mm/slub.c:662) [ 634.406824] kasan_report_error (mm/kasan/report.c:138 mm/kasan/report.c:236) [ 634.409581] __asan_report_load4_noabort (mm/kasan/report.c:279) [ 634.411355] x25_asy_open_tty (drivers/net/wan/x25_asy.c:559 (discriminator 1)) [ 634.413997] tty_ldisc_open.isra.2 (drivers/tty/tty_ldisc.c:447) [ 634.414549] tty_set_ldisc (drivers/tty/tty_ldisc.c:567) [ 634.415057] tty_ioctl (drivers/tty/tty_io.c:2646 drivers/tty/tty_io.c:2879) [ 634.423524] do_vfs_ioctl (fs/ioctl.c:43 fs/ioctl.c:607) [ 634.427491] SyS_ioctl (fs/ioctl.c:622 fs/ioctl.c:613) [ 634.427945] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:188) Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
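A sketch of the rule stated above (skeleton line discipline with hypothetical names): open() owns tty->disc_data and must set it, never read the previous ldisc's value:

    #include <linux/tty.h>
    #include <linux/slab.h>

    struct my_ldisc_state { int connected; };   /* hypothetical private data */

    static int my_ldisc_open(struct tty_struct *tty)
    {
            struct my_ldisc_state *st;

            /* tty->disc_data is NOT valid here; whatever the previous line
             * discipline left behind may already be freed. */
            st = kzalloc(sizeof(*st), GFP_KERNEL);
            if (!st)
                    return -ENOMEM;

            tty->disc_data = st;            /* initialize for our lifetime */
            return 0;
    }

    static void my_ldisc_close(struct tty_struct *tty)
    {
            kfree(tty->disc_data);
            tty->disc_data = NULL;
    }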
2015-12-01  Revert "ipv6: ndisc: inherit metadata dst when creating ndisc requests"  (Nicolas Dichtel, 4 files, -11/+6)
This reverts commit ab450605b35caa768ca33e86db9403229bf42be4. In IPv6, we cannot inherit the dst of the original dst. ndisc packets are IPv6 packets and may take another route than the original packet. This patch breaks the following scenario: a packet comes from eth0 and is forwarded through vxlan1. The encapsulated packet triggers an NS which cannot be sent because of the wrong route. CC: Jiri Benc <jbenc@redhat.com> CC: Thomas Graf <tgraf@suug.ch> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-01  null_blk: change type of completion_nsec to unsigned long  (Arianna Avanzini, 1 file, -2/+2)
This commit at least doubles the maximum value for completion_nsec. This helps in special cases where one wants/needs to emulate an extremely slow I/O (for example to spot bugs). Signed-off-by: Paolo Valente <paolo.valente@unimore.it> Signed-off-by: Arianna Avanzini <avanzini@google.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-01  null_blk: guarantee device restart in all irq modes  (Arianna Avanzini, 1 file, -12/+15)
In single-queue (block layer) mode, the function null_rq_prep_fn stops the device if alloc_cmd fails. Then, once stopped, the device must be restarted on the next command completion, so that the request(s) for which alloc_cmd failed can be requeued. Otherwise the device hangs. Unfortunately, device restart is currently performed only for delayed completions, i.e., in irqmode==2. This fact causes hangs, for the above reasons, with the other irqmodes in combination with the single-queue block layer. This commit addresses this issue by making sure that, if stopped, the device is properly restarted for all irqmodes on completions. Signed-off-by: Paolo Valente <paolo.valente@unimore.it> Signed-off-by: Arianna Avanzini <avanzini@google.com> Signed-off-by: Jens Axboe <axboe@fb.com>