linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2017-05-10	blk-stat: don't use this_cpu_ptr() in a preemptable section	Jens Axboe	1	-7/+10
	If PREEMPT_RCU is enabled, rcu_read_lock() isn't strong enough for us to use this_cpu_ptr() in that section. Use the safer get/put_cpu_ptr() variants instead. Reported-by: Mike Galbraith <efault@gmx.de> Fixes: 34dbad5d26e2 ("blk-stat: convert to callback-based statistics reporting") Signed-off-by: Jens Axboe <axboe@fb.com>
2017-05-10	elevator: remove redundant warnings on IO scheduler switch	Jens Axboe	1	-4/+1
	We warn twice for switching to a scheduler, if that switch fails. As we also report the failure in the return value to the sysfs write, remove the dmesg induced failures. Keep the failure print for warning to switch to the kconfig selected IO scheduler, as we can't report errors for that in any other way. Signed-off-by: Jens Axboe <axboe@fb.com>
2017-05-10	block, bfq: stress that low_latency must be off to get max throughput	Paolo Valente	2	-1/+21
	The introduction of the BFQ and Kyber I/O schedulers has triggered a new wave of I/O benchmarks. Unfortunately, comments and discussions on these benchmarks confirm that there is still little awareness that it is very hard to achieve, at the same time, a low latency and a high throughput. In particular, virtually all benchmarks measure throughput, or throughput-related figures of merit, but, for BFQ, they use the scheduler in its default configuration. This configuration is geared, instead, toward a low latency. This is evidently a sign that BFQ documentation is still too unclear on this important aspect. This commit addresses this issue by stressing how BFQ configuration must be (easily) changed if the only goal is maximum throughput. Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-05-10	block, bfq: use pointer entity->sched_data only if set	Paolo Valente	1	-2/+11
	In the function __bfq_deactivate_entity, the pointer entity->sched_data could happen to be used before being properly initialized. This led to a NULL pointer dereference. This commit fixes this bug by just using this pointer only where it is safe to do so. Reported-by: Tom Harrison <l12436.tw@gmail.com> Tested-by: Tom Harrison <l12436.tw@gmail.com> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-05-10	nvme: lightnvm: fix memory leak	Rakesh Pandit	1	-1/+2
	Free up kmalloc allocated memory if failure happens while handling L2P table transfer in nvme_nvm_get_l2p_tbl. Fixes: 8e79b5cb ("lightnvm: move block provisioning to targets") Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-05-10	ALSA: hda: Fix cpu lockup when stopping the cmd dmas	Jeeja KP	1	-0/+4
	Using jiffies in hdac_wait_for_cmd_dmas() to determine when to time out when interrupts are off (snd_hdac_bus_stop_cmd_io()/spin_lock_irq()) causes hard lockup so unlock while waiting using jiffies. ---<-snip->--- <0>[ 1211.603046] NMI watchdog: Watchdog detected hard LOCKUP on cpu 3 <4>[ 1211.603047] Modules linked in: snd_hda_intel i915 vgem <4>[ 1211.603053] irq event stamp: 13366 <4>[ 1211.603053] hardirqs last enabled at (13365): ... <4>[ 1211.603059] Call Trace: <4>[ 1211.603059] ? delay_tsc+0x3d/0xc0 <4>[ 1211.603059] __delay+0xa/0x10 <4>[ 1211.603060] __const_udelay+0x31/0x40 <4>[ 1211.603060] snd_hdac_bus_stop_cmd_io+0x96/0xe0 [snd_hda_core] <4>[ 1211.603060] ? azx_dev_disconnect+0x20/0x20 [snd_hda_intel] <4>[ 1211.603061] snd_hdac_bus_stop_chip+0xb1/0x100 [snd_hda_core] <4>[ 1211.603061] azx_stop_chip+0x9/0x10 [snd_hda_codec] <4>[ 1211.603061] azx_suspend+0x72/0x220 [snd_hda_intel] <4>[ 1211.603061] pci_pm_suspend+0x71/0x140 <4>[ 1211.603062] dpm_run_callback+0x6f/0x330 <4>[ 1211.603062] ? pci_pm_freeze+0xe0/0xe0 <4>[ 1211.603062] __device_suspend+0xf9/0x370 <4>[ 1211.603062] ? dpm_watchdog_set+0x60/0x60 <4>[ 1211.603063] async_suspend+0x1a/0x90 <4>[ 1211.603063] async_run_entry_fn+0x34/0x160 <4>[ 1211.603063] process_one_work+0x1f4/0x6d0 <4>[ 1211.603063] ? process_one_work+0x16e/0x6d0 <4>[ 1211.603064] worker_thread+0x49/0x4a0 <4>[ 1211.603064] kthread+0x107/0x140 <4>[ 1211.603064] ? process_one_work+0x6d0/0x6d0 <4>[ 1211.603065] ? kthread_create_on_node+0x40/0x40 <4>[ 1211.603065] ret_from_fork+0x2e/0x40 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100419 Fixes: 38b19ed7f81ec ("ALSA: hda: fix to wait for RIRB & CORB DMA to set") Reported-by: Marta Lofstedt <marta.lofstedt@intel.com> Suggested-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jeeja KP <jeeja.kp@intel.com> Acked-by: Vinod Koul <vinod.koul@intel.com> CC: stable <stable@vger.kernel.org> # 4.7 Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-05-10	perf/callchain: Force USER_DS when invoking perf_callchain_user()	Will Deacon	1	-0/+6
	Perf can generate and record a user callchain in response to a synchronous request, such as a tracepoint firing. If this happens under set_fs(KERNEL_DS), then we can end up walking the user stack (and dereferencing/saving whatever we find there) without the protections usually afforded by checks such as access_ok. Rather than play whack-a-mole with each architecture's stack unwinding implementation, fix the root of the problem by ensuring that we force USER_DS when invoking perf_callchain_user from the perf core. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-09	nfsd: fix undefined behavior in nfsd4_layout_verify	Ari Kauppi	1	-1/+2
	UBSAN: Undefined behaviour in fs/nfsd/nfs4proc.c:1262:34 shift exponent 128 is too large for 32-bit type 'int' Depending on compiler+architecture, this may cause the check for layout_type to succeed for overly large values (which seems to be the case with amd64). The large value will be later used in de-referencing nfsd4_layout_ops for function pointers. Reported-by: Jani Tuovila <tuovila@synopsys.com> Signed-off-by: Ari Kauppi <ari@synopsys.com> [colin.king@canonical.com: use LAYOUT_TYPE_MAX instead of 32] Cc: stable@vger.kernel.org Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-05-09	pNFS/flexfiles: Always attempt to call layoutstats when flexfiles is enabled	Trond Myklebust	1	-0/+11
	Layoutstats is always desirable when using the flexfiles driver, so we should enable it if that driver is being loaded. It is safe to do so, because even when the mount specifies NFSv4.1, we will turn it off if the server tells us it is unsupported. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-09	NFSv4.1: Work around a Linux server bug...	Trond Myklebust	1	-0/+6
	It turns out the Linux server has a bug in its implementation of supattr_exclcreat; it returns the set of all attributes, whether or not they are supported by minor version 1. In order to avoid a regression, we therefore apply the supported_attrs as a mask on top of whatever the server sent us. Reported-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-09	docs: update references to the device io book	Helmut Grohne	2	-4/+4
	While converting the deviceiobook from DocBook to RST, dangling references were left behind. This commit updates all remaining references to the new location. SeongJae Park improved the ko_KR translation. Fixes: 8a8a602fdb83 ("docs: Convert the deviceio template to RST") Signed-off-by: Helmut Grohne <h.grohne@intenta.de> Signed-off-by: SeongJae Park <sj38.park@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2017-05-09	Documentation: earlycon: fix Marvell Armada 3700 UART name	Andre Przywara	1	-1/+1
	The Marvell Armada 3700 UART uses "ar3700_uart" for its earlycon name. Adjust documentation to match the code. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2017-05-09	docs-rst: add input docs at main index and use kernel-figure	Mauro Carvalho Chehab	2	-2/+3
	The input subsystem documentation got converted into ReST. Add it to the main documentation index and use kernel-figure for the two svg images there. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2017-05-09	dccp/tcp: do not inherit mc_list from parent	Eric Dumazet	1	-0/+2
	syzkaller found a way to trigger double frees from ip_mc_drop_socket() It turns out that leave a copy of parent mc_list at accept() time, which is very bad. Very similar to commit 8b485ce69876 ("tcp: do not inherit fastopen_req from parent") Initial report from Pray3r, completed by Andrey one. Thanks a lot to them ! Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Pray3r <pray3r.z@gmail.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09	sparc64: fix fault handling in NGbzero.S and GENbzero.S	Dave Aldridge	3	-2/+8
	When any of the functions contained in NGbzero.S and GENbzero.S vector through *bzero_from_clear_user, we may end up taking a fault when executing one of the store alternate address space instructions. If this happens, the exception handler does not restore the %asi register. This commit fixes the issue by introducing a new exception handler that ensures the %asi register is restored when a fault is handled. Orabug: 25577560 Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com> Reviewed-by: Rob Gardner <rob.gardner@oracle.com> Reviewed-by: Babu Moger <babu.moger@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09	sparc: use memdup_user_nul in sun4m LED driver	Geliang Tang	1	-10/+3
	Use memdup_user_nul() helper instead of open-coding to simplify the code. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09	x86, pmem: Fix cache flushing for iovec write < 8 bytes	Ben Hutchings	1	-1/+1
	Commit 11e63f6d920d added cache flushing for unaligned writes from an iovec, covering the first and last cache line of a >= 8 byte write and the first cache line of a < 8 byte write. But an unaligned write of 2-7 bytes can still cover two cache lines, so make sure we flush both in that case. Cc: <stable@vger.kernel.org> Fixes: 11e63f6d920d ("x86, pmem: fix broken __copy_user_nocache ...") Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-05-09	arm64: uaccess: suppress spurious clang warning	Mark Rutland	1	-2/+2
	Clang tries to warn when there's a mismatch between an operand's size, and the size of the register it is held in, as this may indicate a bug. Specifically, clang warns when the operand's type is less than 64 bits wide, and the register is used unqualified (i.e. %N rather than %xN or %wN). Unfortunately clang can generate these warnings for unreachable code. For example, for code like: do { \ typeof((ptr)) __v = (v); \ switch(sizeof((ptr))) { \ case 1: \ // assume __v is 1 byte wide \ asm ("{op}b %w0" : : "r" (v)); \ break; \ case 8: \ // assume __v is 8 bytes wide \ asm ("{op} %0" : : "r" (v)); \ break; \ } while (0) ... if op() were passed a char value and pointer to char, clang may produce a warning for the unreachable case where sizeof((ptr)) is 8. For the same reasons, clang produces warnings when __put_user_err() is used for types that are less than 64 bits wide. We could avoid this with a cast to a fixed-width type in each of the cases. However, GCC will then warn that pointer types are being cast to mismatched integer sizes (in unreachable paths). Another option would be to use the same union trickery as we do for __smp_store_release() and __smp_load_acquire(), but this is fairly invasive. Instead, this patch suppresses the clang warning by using an x modifier in the assembly for the 8 byte case of __put_user_err(). No additional work is necessary as the value has been cast to typeof((ptr)), so the compiler will have performed any necessary extension for the reachable case. For consistency, __get_user_err() is also updated to use the x modifier for its 8 byte case. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reported-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-05-09	arm64: atomic_lse: match asm register sizes	Mark Rutland	1	-2/+2
	The LSE atomic code uses asm register variables to ensure that parameters are allocated in specific registers. In the majority of cases we specifically ask for an x register when using 64-bit values, but in a couple of cases we use a w regsiter for a 64-bit value. For asm register variables, the compiler only cares about the register index, with wN and xN having the same meaning. The compiler determines the register size to use based on the type of the variable. Thus, this inconsistency is merely confusing, and not harmful to code generation. For consistency, this patch updates those cases to use the x register alias. There should be no functional change as a result of this patch. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-05-09	arm64: armv8_deprecated: ensure extension of addr	Mark Rutland	1	-1/+2
	Our compat swp emulation holds the compat user address in an unsigned int, which it passes to __user_swpX_asm(). When a 32-bit value is passed in a register, the upper 32 bits of the register are unknown, and we must extend the value to 64 bits before we can use it as a base address. This patch casts the address to unsigned long to ensure it has been suitably extended, avoiding the potential issue, and silencing a related warning from clang. Fixes: bd35a4adc413 ("arm64: Port SWP/SWPB emulation support from arm") Cc: <stable@vger.kernel.org> # 3.19.x- Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-05-09	arm64: uaccess: ensure extension of access_ok() addr	Mark Rutland	1	-1/+2
	Our access_ok() simply hands its arguments over to __range_ok(), which implicitly assummes that the addr parameter is 64 bits wide. This isn't necessarily true for compat code, which might pass down a 32-bit address parameter. In these cases, we don't have a guarantee that the address has been zero extended to 64 bits, and the upper bits of the register may contain unknown values, potentially resulting in a suprious failure. Avoid this by explicitly casting the addr parameter to an unsigned long (as is done on other architectures), ensuring that the parameter is widened appropriately. Fixes: 0aea86a2176c ("arm64: User access library functions") Cc: <stable@vger.kernel.org> # 3.7.x- Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-05-09	arm64: ensure extension of smp_store_release value	Mark Rutland	1	-5/+15
	When an inline assembly operand's type is narrower than the register it is allocated to, the least significant bits of the register (up to the operand type's width) are valid, and any other bits are permitted to contain any arbitrary value. This aligns with the AAPCS64 parameter passing rules. Our __smp_store_release() implementation does not account for this, and implicitly assumes that operands have been zero-extended to the width of the type being stored to. Thus, we may store unknown values to memory when the value type is narrower than the pointer type (e.g. when storing a char to a long). This patch fixes the issue by casting the value operand to the same width as the pointer operand in all cases, which ensures that the value is zero-extended as we expect. We use the same union trickery as __smp_load_acquire and {READ,WRITE}_ONCE() to avoid GCC complaining that pointers are potentially cast to narrower width integers in unreachable paths. A whitespace issue at the top of __smp_store_release() is also corrected. No changes are necessary for __smp_load_acquire(). Load instructions implicitly clear any upper bits of the register, and the compiler will only consider the least significant bits of the register as valid regardless. Fixes: 47933ad41a86 ("arch: Introduce smp_load_acquire(), smp_store_release()") Fixes: 878a84d5a8a1 ("arm64: add missing data types in smp_load_acquire/smp_store_release") Cc: <stable@vger.kernel.org> # 3.14.x- Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-05-09	arm64: xchg: hazard against entire exchange variable	Mark Rutland	1	-1/+1
	The inline assembly in __XCHG_CASE() uses a +Q constraint to hazard against other accesses to the memory location being exchanged. However, the pointer passed to the constraint is a u8 pointer, and thus the hazard only applies to the first byte of the location. GCC can take advantage of this, assuming that other portions of the location are unchanged, as demonstrated with the following test case: union u { unsigned long l; unsigned int i[2]; }; unsigned long update_char_hazard(union u u) { unsigned int a, b; a = u->i[1]; asm ("str %1, %0" : "+Q" ((char )&u->l) : "r" (0UL)); b = u->i[1]; return a ^ b; } unsigned long update_long_hazard(union u u) { unsigned int a, b; a = u->i[1]; asm ("str %1, %0" : "+Q" ((long )&u->l) : "r" (0UL)); b = u->i[1]; return a ^ b; } The linaro 15.08 GCC 5.1.1 toolchain compiles the above as follows when using -O2 or above: 0000000000000000 <update_char_hazard>: 0: d2800001 mov x1, #0x0 // #0 4: f9000001 str x1, [x0] 8: d2800000 mov x0, #0x0 // #0 c: d65f03c0 ret 0000000000000010 <update_long_hazard>: 10: b9400401 ldr w1, [x0,#4] 14: d2800002 mov x2, #0x0 // #0 18: f9000002 str x2, [x0] 1c: b9400400 ldr w0, [x0,#4] 20: 4a000020 eor w0, w1, w0 24: d65f03c0 ret This patch fixes the issue by passing an unsigned long pointer into the +Q constraint, as we do for our cmpxchg code. This may hazard against more than is necessary, but this is better than missing a necessary hazard. Fixes: 305d454aaa29 ("arm64: atomics: implement native {relaxed, acquire, release} atomics") Cc: <stable@vger.kernel.org> # 4.4.x- Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-05-09	arm64: documentation: document tagged pointer stack constraints	Kristina Martsenko	1	-15/+47
	Some kernel features don't currently work if a task puts a non-zero address tag in its stack pointer, frame pointer, or frame record entries (FP, LR). For example, with a tagged stack pointer, the kernel can't deliver signals to the process, and the task is killed instead. As another example, with a tagged frame pointer or frame records, perf fails to generate call graphs or resolve symbols. For now, just document these limitations, instead of finding and fixing everything that doesn't work, as it's not known if anyone needs to use tags in these places anyway. In addition, as requested by Dave Martin, generalize the limitations into a general kernel address tag policy, and refactor tagged-pointers.txt to include it. Fixes: d50240a5f6ce ("arm64: mm: permit use of tagged pointers at EL0") Cc: <stable@vger.kernel.org> # 3.12.x- Reviewed-by: Dave Martin <Dave.Martin@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-05-09	arm64: entry: improve data abort handling of tagged pointers	Kristina Martsenko	2	-2/+12
	When handling a data abort from EL0, we currently zero the top byte of the faulting address, as we assume the address is a TTBR0 address, which may contain a non-zero address tag. However, the address may be a TTBR1 address, in which case we should not zero the top byte. This patch fixes that. The effect is that the full TTBR1 address is passed to the task's signal handler (or printed out in the kernel log). When handling a data abort from EL1, we leave the faulting address intact, as we assume it's either a TTBR1 address or a TTBR0 address with tag 0x00. This is true as far as I'm aware, we don't seem to access a tagged TTBR0 address anywhere in the kernel. Regardless, it's easy to forget about address tags, and code added in the future may not always remember to remove tags from addresses before accessing them. So add tag handling to the EL1 data abort handler as well. This also makes it consistent with the EL0 data abort handler. Fixes: d50240a5f6ce ("arm64: mm: permit use of tagged pointers at EL0") Cc: <stable@vger.kernel.org> # 3.12.x- Reviewed-by: Dave Martin <Dave.Martin@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-05-09	arm64: hw_breakpoint: fix watchpoint matching for tagged pointers	Kristina Martsenko	2	-3/+6
	When we take a watchpoint exception, the address that triggered the watchpoint is found in FAR_EL1. We compare it to the address of each configured watchpoint to see which one was hit. The configured watchpoint addresses are untagged, while the address in FAR_EL1 will have an address tag if the data access was done using a tagged address. The tag needs to be removed to compare the address to the watchpoints. Currently we don't remove it, and as a result can report the wrong watchpoint as being hit (specifically, always either the highest TTBR0 watchpoint or lowest TTBR1 watchpoint). This patch removes the tag. Fixes: d50240a5f6ce ("arm64: mm: permit use of tagged pointers at EL0") Cc: <stable@vger.kernel.org> # 3.12.x- Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-05-09	arm64: traps: fix userspace cache maintenance emulation on a tagged pointer	Kristina Martsenko	1	-2/+2
	When we emulate userspace cache maintenance in the kernel, we can currently send the task a SIGSEGV even though the maintenance was done on a valid address. This happens if the address has a non-zero address tag, and happens to not be mapped in. When we get the address from a user register, we don't currently remove the address tag before performing cache maintenance on it. If the maintenance faults, we end up in either __do_page_fault, where find_vma can't find the VMA if the address has a tag, or in do_translation_fault, where the tagged address will appear to be above TASK_SIZE. In both cases, the address is not mapped in, and the task is sent a SIGSEGV. This patch removes the tag from the address before using it. With this patch, the fault is handled correctly, the address gets mapped in, and the cache maintenance succeeds. As a second bug, if cache maintenance (correctly) fails on an invalid tagged address, the address gets passed into arm64_notify_segfault, where find_vma fails to find the VMA due to the tag, and the wrong si_code may be sent as part of the siginfo_t of the segfault. With this patch, the correct si_code is sent. Fixes: 7dd01aef0557 ("arm64: trap userspace "dc cvau" cache operation on errata-affected core") Cc: <stable@vger.kernel.org> # 4.8.x- Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-05-09	device-dax: kill NR_DEV_DAX	Dan Williams	2	-13/+3
	There is no point to ask how many device-dax instances the kernel should support. Since we are already using a dynamic major number, just allow the max number of minors by default and be done. This also fixes the fact that the proposed max for the NR_DEV_DAX range was larger than what could be supported by alloc_chrdev_region(). Fixes: ba09c01d2fa8 ("dax: convert to the cdev api") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-05-09	proc: try to remove use of FOLL_FORCE entirely	Linus Torvalds	1	-4/+1
	We fixed the bugs in it, but it's still an ugly interface, so let's see if anybody actually depends on it. It's entirely possible that nothing actually requires the whole "punch through read-only mappings" semantics. For example, gdb definitely uses the /proc/<pid>/mem interface, but it looks like it mainly does it for regular reads of the target (that don't need FOLL_FORCE), and looking at the gdb source code seems to fall back on the traditional ptrace(PTRACE_POKEDATA) interface if it needs to. If this breaks something, I do have a (more complex) version that only enables FOLL_FORCE when somebody has PTRACE_ATTACH'ed to the target, like the comment here used to say ("Maybe we should limit FOLL_FORCE to actual ptrace users?"). Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-09	qede: Split PF/VF ndos.	Mintz, Yuval	2	-6/+21
	PFs and VFs share the same structure of NDOs today, and the VFs explicitly fails the ndo_xdp() callback stating it doesn't support XDP. This results in lots of: [qede_xdp:1032(enp131s2)]VFs don't support XDP ------------[ cut here ]------------ WARNING: CPU: 4 PID: 1426 at net/core/rtnetlink.c:1637 rtnl_dump_ifinfo+0x354/0x3c0 ... Call Trace: ? __alloc_skb+0x9b/0x1d0 netlink_dump+0x122/0x290 netlink_recvmsg+0x27d/0x430 sock_recvmsg+0x3d/0x50 ... As every dump request for the VF interface info would fail due to rtnl_xdp_fill() returning an error code. To resolve this, introduce a subset of the NDOs meant for the VF in a seperate structure and register that one instead for VFs, and omit the ndo_xdp initialization. Fixes: 40b8c45492ef ("qede: Prevent VFs from using XDP") Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09	qed: Correct doorbell configuration for !4Kb pages	Ram Amrani	1	-1/+1
	When configuring the doorbell DPI address, driver aligns the start address to 4KB [HW-pages] instead of host PAGE_SIZE. As a result, RoCE applications might receive addresses which are unaligned to pages [when PAGE_SIZE > 4KB], which is a security risk. Fixes: 51ff17251c9c ("qed: Add support for RoCE hw init") Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09	qed: Tell QM the number of tasks	Mintz, Yuval	1	-0/+1
	Driver doesn't pass the number of tasks to the QM init logic which would cause back-pressure in scenarios requiring many tasks [E.g., using max MRs] and thus reduced performance. Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09	qed: Fix VF removal sequence	Mintz, Yuval	1	-2/+4
	After previos changes in HW-stop scheme, VFs stopped sending CLOSE messages to their PFs when they unload. Fixes: 1226337ad98f ("qed: Correct HW stop flow") Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09	qede: Fix XDP memory leak on unload	Suddarsana Reddy Kalluru	1	-0/+3
	When (re\|un)loading, Tx-queues belonging to XDP would not get freed. Fixes: cb6aeb079294 ("qede: Add support for XDP_TX") Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09	net/mlx4_core: Reduce harmless SRIOV error message to debug level	Jack Morgenstein	2	-4/+12
	Under SRIOV resource management, extra counters are allocated to VFs from a free pool. If that pool is empty, the ALLOC_RES command for a counter resource fails -- and this generates a misleading error message in the message log. Under SRIOV, each VF is allocated (i.e., guaranteed) 2 counters -- one counter per port. For ETH ports, the RoCE driver requests an additional counter (above the guaranteed counters). If that request fails, the VF RoCE driver simply uses the default (i.e., guaranteed) counter for that port. Thus, failing to allocate an additional counter does not constitute a problem, and the error message on the PF when this occurs should be reduced to debug level. Finally, to identify the situation that the reason for the failure is that no resources are available to grant to the VF, we modified the error returned by mlx4_grant_resource to -EDQUOT (Quota exceeded), which more accurately describes the error. Fixes: c3abb51bdb0e ("IB/mlx4: Add RoCE/IB dedicated counters") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09	net/mlx4_en: Avoid adding steering rules with invalid ring	Talat Batheesh	1	-0/+5
	Inserting steering rules with illegal ring is an invalid operation, block it. Fixes: 820672812f82 ('net/mlx4_en: Manage flow steering rules with ethtool') Signed-off-by: Talat Batheesh <talatb@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09	net/mlx4_en: Change the error print to debug print	Kamal Heib	1	-1/+2
	The error print within mlx4_en_calc_rx_buf() should be a debug print. Fixes: 51151a16a60f ('mlx4: allow order-0 memory allocations in RX path') Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09	s390/virtio: change maintainership	Christian Borntraeger	1	-1/+1
	Halil is doing a lot more work in the virtio area on s390 than I do. Let's reflect the reality in the maintainers file. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Halil Pasic <pasic@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-09	tools/virtio: fix spelling mistake: "wakeus" -> "wakeups"	Colin Ian King	1	-1/+1
	trivial fix to spelling mistake in an error message. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2017-05-09	virtio_net: tidy a couple debug statements	Dan Carpenter	1	-2/+2
	We are printing a decimal value for truesize so we shouldn't use an "0x" prefix. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-09	ptr_ring: support testing different batching sizes	Michael S. Tsirkin	1	-0/+3
	Use the param flag for that. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-09	ringtest: support test specific parameters	Michael S. Tsirkin	2	-0/+15
	Add a new flag for passing test-specific parameters. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-09	ptr_ring: batch ring zeroing	Michael S. Tsirkin	1	-9/+54
	A known weakness in ptr_ring design is that it does not handle well the situation when ring is almost full: as entries are consumed they are immediately used again by the producer, so consumer and producer are writing to a shared cache line. To fix this, add batching to consume calls: as entries are consumed do not write NULL into the ring until we get a multiple (in current implementation 2x) of cache lines away from the producer. At that point, write them all out. We do the write out in the reverse order to keep producer from sharing cache with consumer for as long as possible. Writeout also triggers when ring wraps around - there's no special reason to do this but it helps keep the code a bit simpler. What should we do if getting away from producer by 2 cache lines would mean we are keeping the ring moe than half empty? Maybe we should reduce the batching in this case, current patch simply reduces the batching. Notes: - it is no longer true that a call to consume guarantees that the following call to produce will succeed. No users seem to assume that. - batching can also in theory reduce the signalling rate: users that would previously send interrups to the producer to wake it up after consuming each entry would now only need to do this once in a batch. Doing this would be easy by returning a flag to the caller. No users seem to do signalling on consume yet so this was not implemented yet. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2017-05-09	virtio: virtio_driver doc	Cornelia Huck	1	-0/+4
	Add comments for the virtio_driver members that were not documented. Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-09	virtio_net: don't reset twice on XDP on/off	Michael S. Tsirkin	1	-1/+0
	We already do a reset once in remove_vq_common - there appears to be no point in doing another one when we add/remove XDP. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-09	virtio_net: fix support for small rings	Michael S. Tsirkin	1	-4/+26
	When ring size is small (<32 entries) making buffers smaller means a full ring might not be able to hold enough buffers to fit a single large packet. Make sure a ring full of buffers is large enough to allow at least one packet of max size. Fixes: 2613af0ed18a ("virtio_net: migrate mergeable rx buffers to page frag allocators") Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-09	virtio_net: reduce alignment for buffers	Michael S. Tsirkin	1	-12/+1
	We don't need to align length to any particular value anymore. Aligning to L1 cache size probably sill makes sense to reduce false sharing. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-09	virtio_net: rework mergeable buffer handling	Michael S. Tsirkin	1	-46/+43
	Use the new _ctx virtio API to maintain true length for each buffer. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-09	virtio_net: allow specifying context for rx	Michael S. Tsirkin	1	-1/+14
	With mergeable buffers we never use s/g for rx, so allow specifying context in that case. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-09	powerpc/64s: Support new device tree binding for discovering CPU features	Nicholas Piggin	12	-16/+1398
	The ibm,powerpc-cpu-features device tree binding describes CPU features with ASCII names and extensible compatibility, privilege, and enablement metadata that allows improved flexibility and compatibility with new hardware. The interface is described in detail in ibm,powerpc-cpu-features.txt in this patch. Currently this code is not enabled by default, and there are no released firmwares that provide the binding. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>