aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/include (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-06-06rseq: Introduce restartable sequences system callMathieu Desnoyers4-1/+327
Expose a new system call allowing each thread to register one userspace memory area to be used as an ABI between kernel and user-space for two purposes: user-space restartable sequences and quick access to read the current CPU number value from user-space. * Restartable sequences (per-cpu atomics) Restartables sequences allow user-space to perform update operations on per-cpu data without requiring heavy-weight atomic operations. The restartable critical sections (percpu atomics) work has been started by Paul Turner and Andrew Hunter. It lets the kernel handle restart of critical sections. [1] [2] The re-implementation proposed here brings a few simplifications to the ABI which facilitates porting to other architectures and speeds up the user-space fast path. Here are benchmarks of various rseq use-cases. Test hardware: arm32: ARMv7 Processor rev 4 (v7l) "Cubietruck", 2-core x86-64: Intel E5-2630 v3@2.40GHz, 16-core, hyperthreading The following benchmarks were all performed on a single thread. * Per-CPU statistic counter increment getcpu+atomic (ns/op) rseq (ns/op) speedup arm32: 344.0 31.4 11.0 x86-64: 15.3 2.0 7.7 * LTTng-UST: write event 32-bit header, 32-bit payload into tracer per-cpu buffer getcpu+atomic (ns/op) rseq (ns/op) speedup arm32: 2502.0 2250.0 1.1 x86-64: 117.4 98.0 1.2 * liburcu percpu: lock-unlock pair, dereference, read/compare word getcpu+atomic (ns/op) rseq (ns/op) speedup arm32: 751.0 128.5 5.8 x86-64: 53.4 28.6 1.9 * jemalloc memory allocator adapted to use rseq Using rseq with per-cpu memory pools in jemalloc at Facebook (based on rseq 2016 implementation): The production workload response-time has 1-2% gain avg. latency, and the P99 overall latency drops by 2-3%. * Reading the current CPU number Speeding up reading the current CPU number on which the caller thread is running is done by keeping the current CPU number up do date within the cpu_id field of the memory area registered by the thread. This is done by making scheduler preemption set the TIF_NOTIFY_RESUME flag on the current thread. Upon return to user-space, a notify-resume handler updates the current CPU value within the registered user-space memory area. User-space can then read the current CPU number directly from memory. Keeping the current cpu id in a memory area shared between kernel and user-space is an improvement over current mechanisms available to read the current CPU number, which has the following benefits over alternative approaches: - 35x speedup on ARM vs system call through glibc - 20x speedup on x86 compared to calling glibc, which calls vdso executing a "lsl" instruction, - 14x speedup on x86 compared to inlined "lsl" instruction, - Unlike vdso approaches, this cpu_id value can be read from an inline assembly, which makes it a useful building block for restartable sequences. - The approach of reading the cpu id through memory mapping shared between kernel and user-space is portable (e.g. ARM), which is not the case for the lsl-based x86 vdso. On x86, yet another possible approach would be to use the gs segment selector to point to user-space per-cpu data. This approach performs similarly to the cpu id cache, but it has two disadvantages: it is not portable, and it is incompatible with existing applications already using the gs segment selector for other purposes. Benchmarking various approaches for reading the current CPU number: ARMv7 Processor rev 4 (v7l) Machine model: Cubietruck - Baseline (empty loop): 8.4 ns - Read CPU from rseq cpu_id: 16.7 ns - Read CPU from rseq cpu_id (lazy register): 19.8 ns - glibc 2.19-0ubuntu6.6 getcpu: 301.8 ns - getcpu system call: 234.9 ns x86-64 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz: - Baseline (empty loop): 0.8 ns - Read CPU from rseq cpu_id: 0.8 ns - Read CPU from rseq cpu_id (lazy register): 0.8 ns - Read using gs segment selector: 0.8 ns - "lsl" inline assembly: 13.0 ns - glibc 2.19-0ubuntu6 getcpu: 16.6 ns - getcpu system call: 53.9 ns - Speed (benchmark taken on v8 of patchset) Running 10 runs of hackbench -l 100000 seems to indicate, contrary to expectations, that enabling CONFIG_RSEQ slightly accelerates the scheduler: Configuration: 2 sockets * 8-core Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz (directly on hardware, hyperthreading disabled in BIOS, energy saving disabled in BIOS, turboboost disabled in BIOS, cpuidle.off=1 kernel parameter), with a Linux v4.6 defconfig+localyesconfig, restartable sequences series applied. * CONFIG_RSEQ=n avg.: 41.37 s std.dev.: 0.36 s * CONFIG_RSEQ=y avg.: 40.46 s std.dev.: 0.33 s - Size On x86-64, between CONFIG_RSEQ=n/y, the text size increase of vmlinux is 567 bytes, and the data size increase of vmlinux is 5696 bytes. [1] https://lwn.net/Articles/650333/ [2] http://www.linuxplumbersconf.org/2013/ocw/system/presentations/1695/original/LPC%20-%20PerCpu%20Atomics.pdf Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Joel Fernandes <joelaf@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Watson <davejwatson@fb.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Chris Lameter <cl@linux.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Andrew Hunter <ahh@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Maurer <bmaurer@fb.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-api@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20151027235635.16059.11630.stgit@pjt-glaptop.roam.corp.google.com Link: http://lkml.kernel.org/r/20150624222609.6116.86035.stgit@kitami.mtv.corp.google.com Link: https://lkml.kernel.org/r/20180602124408.8430-3-mathieu.desnoyers@efficios.com
2018-06-06uapi/headers: Provide types_32_64.hMathieu Desnoyers1-0/+50
Provide helper macros for fields which represent pointers in kernel-userspace ABI. This facilitates handling of 32-bit user-space by 64-bit kernels by defining those fields as 32-bit 0-padding and 32-bit integer on 32-bit architectures, which allows the kernel to treat those as 64-bit integers. The order of padding and 32-bit integer depends on the endianness. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Joel Fernandes <joelaf@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Watson <davejwatson@fb.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Chris Lameter <cl@linux.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Andrew Hunter <ahh@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Maurer <bmaurer@fb.com> Cc: linux-api@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20180602124408.8430-2-mathieu.desnoyers@efficios.com
2018-06-04Merge branch 'timers-2038-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds8-144/+142
Pull time/Y2038 updates from Thomas Gleixner: - Consolidate SySV IPC UAPI headers - Convert SySV IPC to the new COMPAT_32BIT_TIME mechanism - Cleanup the core interfaces and standardize on the ktime_get_* naming convention. - Convert the X86 platform ops to timespec64 - Remove the ugly temporary timespec64 hack * 'timers-2038-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits) x86: Convert x86_platform_ops to timespec64 timekeeping: Add more coarse clocktai/boottime interfaces timekeeping: Add ktime_get_coarse_with_offset timekeeping: Standardize on ktime_get_*() naming timekeeping: Clean up ktime_get_real_ts64 timekeeping: Remove timespec64 hack y2038: ipc: Redirect ipc(SEMTIMEDOP, ...) to compat_ksys_semtimedop y2038: ipc: Enable COMPAT_32BIT_TIME y2038: ipc: Use __kernel_timespec y2038: ipc: Report long times to user space y2038: ipc: Use ktime_get_real_seconds consistently y2038: xtensa: Extend sysvipc data structures y2038: powerpc: Extend sysvipc data structures y2038: sparc: Extend sysvipc data structures y2038: parisc: Extend sysvipc data structures y2038: mips: Extend sysvipc data structures y2038: arm64: Extend sysvipc compat data structures y2038: s390: Remove unneeded ipc uapi header files y2038: ia64: Remove unneeded ipc uapi header files y2038: alpha: Remove unneeded ipc uapi header files ...
2018-06-04Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds9-18/+62
Pull timers and timekeeping updates from Thomas Gleixner: - Core infrastucture work for Y2038 to address the COMPAT interfaces: + Add a new Y2038 safe __kernel_timespec and use it in the core code + Introduce config switches which allow to control the various compat mechanisms + Use the new config switch in the posix timer code to control the 32bit compat syscall implementation. - Prevent bogus selection of CPU local clocksources which causes an endless reselection loop - Remove the extra kthread in the clocksource code which has no value and just adds another level of indirection - The usual bunch of trivial updates, cleanups and fixlets all over the place - More SPDX conversions * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) clocksource/drivers/mxs_timer: Switch to SPDX identifier clocksource/drivers/timer-imx-tpm: Switch to SPDX identifier clocksource/drivers/timer-imx-gpt: Switch to SPDX identifier clocksource/drivers/timer-imx-gpt: Remove outdated file path clocksource/drivers/arc_timer: Add comments about locking while read GFRC clocksource/drivers/mips-gic-timer: Add pr_fmt and reword pr_* messages clocksource/drivers/sprd: Fix Kconfig dependency clocksource: Move inline keyword to the beginning of function declarations timer_list: Remove unused function pointer typedef timers: Adjust a kernel-doc comment tick: Prefer a lower rating device only if it's CPU local device clocksource: Remove kthread time: Change nanosleep to safe __kernel_* types time: Change types to new y2038 safe __kernel_* types time: Fix get_timespec64() for y2038 safe compat interfaces time: Add new y2038 safe __kernel_timespec posix-timers: Make compat syscalls depend on CONFIG_COMPAT_32BIT_TIME time: Introduce CONFIG_COMPAT_32BIT_TIME time: Introduce CONFIG_64BIT_TIME in architectures compat: Enable compat_get/put_timespec64 always ...
2018-06-04Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds8-19/+30
Pull irq updates from Thomas Gleixner: - Consolidation of softirq pending: The softirq mask and its accessors/mutators have many implementations scattered around many architectures. Most do the same things consisting in a field in a per-cpu struct (often irq_cpustat_t) accessed through per-cpu ops. We can provide instead a generic efficient version that most of them can use. In fact s390 is the only exception because the field is stored in lowcore. - Support for level!?! triggered MSI (ARM) Over the past couple of years, we've seen some SoCs coming up with ways of signalling level interrupts using a new flavor of MSIs, where the MSI controller uses two distinct messages: one that raises a virtual line, and one that lowers it. The target MSI controller is in charge of maintaining the state of the line. This allows for a much simplified HW signal routing (no need to have hundreds of discrete lines to signal level interrupts if you already have a memory bus), but results in a departure from the current idea the kernel has of MSIs. - Support for Meson-AXG GPIO irqchip - Large stm32 irqchip rework (suspend/resume, hierarchical domains) - More SPDX conversions * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) ARM: dts: stm32: Add exti support to stm32mp157 pinctrl ARM: dts: stm32: Add exti support for stm32mp157c pinctrl/stm32: Add irq_eoi for stm32gpio irqchip irqchip/stm32: Add suspend/resume support for hierarchy domain irqchip/stm32: Add stm32mp1 support with hierarchy domain irqchip/stm32: Prepare common functions irqchip/stm32: Add host and driver data structures irqchip/stm32: Add suspend support irqchip/stm32: Add falling pending register support irqchip/stm32: Checkpatch fix irqchip/stm32: Optimizes and cleans up stm32-exti irq_domain irqchip/meson-gpio: Add support for Meson-AXG SoCs dt-bindings: interrupt-controller: New binding for Meson-AXG SoC dt-bindings: interrupt-controller: Fix the double quotes softirq/s390: Move default mutators of overwritten softirq mask to s390 softirq/x86: Switch to generic local_softirq_pending() implementation softirq/sparc: Switch to generic local_softirq_pending() implementation softirq/powerpc: Switch to generic local_softirq_pending() implementation softirq/parisc: Switch to generic local_softirq_pending() implementation softirq/ia64: Switch to generic local_softirq_pending() implementation ...
2018-06-04Merge branch 'x86-dax-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-2/+17
Pull x86 dax updates from Ingo Molnar: "This contains x86 memcpy_mcsafe() fault handling improvements the nvdimm tree would like to make more use of" * 'x86-dax-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe() x86/asm/memcpy_mcsafe: Add write-protection-fault handling x86/asm/memcpy_mcsafe: Return bytes remaining x86/asm/memcpy_mcsafe: Add labels for __memcpy_mcsafe() write fault handling x86/asm/memcpy_mcsafe: Remove loop unrolling
2018-06-04Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-0/+2
Pull scheduler updates from Ingo Molnar: - power-aware scheduling improvements (Patrick Bellasi) - NUMA balancing improvements (Mel Gorman) - vCPU scheduling fixes (Rohit Jain) * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Update util_est before updating schedutil sched/cpufreq: Modify aggregate utilization to always include blocked FAIR utilization sched/deadline/Documentation: Add overrun signal and GRUB-PA documentation sched/core: Distinguish between idle_cpu() calls based on desired effect, introduce available_idle_cpu() sched/wait: Include <linux/wait.h> in <linux/swait.h> sched/numa: Stagger NUMA balancing scan periods for new threads sched/core: Don't schedule threads on pre-empted vCPUs sched/fair: Avoid calling sync_entity_load_avg() unnecessarily sched/fair: Rearrange select_task_rq_fair() to optimize it
2018-06-04Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+9
Pull perf updates from Ingo Molnar: "Kernel side changes: - x86 Intel uncore driver cleanups and enhancements (Kan Liang) - group scheduling and other fixes (Song Liu - store frame pointer in the sample traces for better profiling (Alexey Budankov) - compat fixes/enhancements (Eugene Syromiatnikov) Tooling side changes, which you can build and install in a single step via: make -C tools/perf clean install perf annotate: - Support 'perf annotate --group' for non-explicit recorded event "groups", showing multiple columns, one for each event, just like when dealing with explicit event groups (those enclosed with {}) (Jin Yao) - Record min/max LBR cycles (>= Skylake) and add 'perf annotate' TUI hotkey to show it (c) (Jin Yao) perf bpf: - Add infrastructure to help in writing eBPF C programs to be used with '-e name.c' type events in tools such as 'record' and 'trace', with headers for common constructs and an examples directory that will get populated as we add more such helpers and the 'perf bpf' (Arnaldo Carvalho de Melo) perf stat: - Display time in precision based on std deviation (Jiri Olsa) - Add --table option to display time of each run (Jiri Olsa) - Display length strings of each run for --table option (Jiri Olsa) perf buildid-cache: - Add --list and --purge-all options (Ravi Bangoria) perf test: - Let 'perf test list' display subtests (Hendrik Brueckner) perf pti: - Create extra kernel maps to help in decoding samples in x86 PTI entry trampolines (Adrian Hunter) - Copy x86 PTI entry trampoline sections in the kcore copy used for annotation and intel_pt CPU traces decoding (Adrian Hunter) ... and a lot of other fixes, enhancements and cleanups I did not list, see the shortlog and git log for details" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (111 commits) perf/x86/intel/uncore: Clean up client IMC uncore perf/x86/intel/uncore: Expose uncore_pmu_event*() functions perf/x86/intel/uncore: Support IIO free-running counters on SKX perf/x86/intel/uncore: Add infrastructure for free running counters perf/x86/intel/uncore: Add new data structures for free running counters perf/x86/intel/uncore: Correct fixed counter index check in generic code perf/x86/intel/uncore: Correct fixed counter index check for NHM perf/x86/intel/uncore: Introduce customized event_read() for client IMC uncore perf/x86: Store user space frame-pointer value on a sample perf/core: Wire up compat PERF_EVENT_IOC_QUERY_BPF, PERF_EVENT_IOC_MODIFY_ATTRIBUTES perf/core: Fix bad use of igrab() perf/core: Fix group scheduling with mixed hw and sw events perf kcore_copy: Amend the offset of sections that remap kernel text perf kcore_copy: Copy x86 PTI entry trampoline sections perf kcore_copy: Get rid of kernel_map perf kcore_copy: Iterate phdrs perf kcore_copy: Layout sections perf kcore_copy: Calculate offset from phnum perf kcore_copy: Keep a count of phdrs perf kcore_copy: Keep phdr data in a list ...
2018-06-04Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds8-15/+92
Pull locking updates from Ingo Molnar: - Lots of tidying up changes all across the map for Linux's formal memory/locking-model tooling, by Alan Stern, Akira Yokosawa, Andrea Parri, Paul E. McKenney and SeongJae Park. Notable changes beyond an overall update in the tooling itself is the tidying up of spin_is_locked() semantics, which spills over into the kernel proper as well. - qspinlock improvements: the locking algorithm now guarantees forward progress whereas the previous implementation in mainline could starve threads indefinitely in cmpxchg() loops. Also other related cleanups to the qspinlock code (Will Deacon) - misc smaller improvements, cleanups and fixes all across the locking subsystem * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits) locking/rwsem: Simplify the is-owner-spinnable checks tools/memory-model: Add reference for 'Simplifying ARM concurrency' tools/memory-model: Update ASPLOS information MAINTAINERS, tools/memory-model: Update e-mail address for Andrea Parri tools/memory-model: Fix coding style in 'lock.cat' tools/memory-model: Remove out-of-date comments and code from lock.cat tools/memory-model: Improve mixed-access checking in lock.cat tools/memory-model: Improve comments in lock.cat tools/memory-model: Remove duplicated code from lock.cat tools/memory-model: Flag "cumulativity" and "propagation" tests tools/memory-model: Add model support for spin_is_locked() tools/memory-model: Add scripts to test memory model tools/memory-model: Fix coding style in 'linux-kernel.def' tools/memory-model: Model 'smp_store_mb()' tools/memory-order: Update the cheat-sheet to show that smp_mb__after_atomic() orders later RMW operations tools/memory-order: Improve key for SELF and SV tools/memory-model: Fix cheat sheet typo tools/memory-model: Update required version of herdtools7 tools/memory-model: Redefine rb in terms of rcu-fence tools/memory-model: Rename link and rcu-path to rcu-link and rb ...
2018-06-04Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-4/+6
Pull EFI updates from Ingo Molnar: - decode x86 CPER data (Yazen Ghannam) - ignore unrealistically large option ROMs (Hans de Goede) - initialize UEFI secure boot state during Xen dom0 boot (Daniel Kiper) - additional minor tweaks and fixes. * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/capsule-loader: Don't output reset log when reset flags are not set efi/x86: Ignore unrealistically large option ROMs efi/x86: Fold __setup_efi_pci32() and __setup_efi_pci64() into one function efi: Align efi_pci_io_protocol typedefs to type naming convention efi/libstub/tpm: Make function efi_retrieve_tpm2_eventlog_1_2() static efi: Decode IA32/X64 Context Info structure efi: Decode IA32/X64 MS Check structure efi: Decode additional IA32/X64 Bus Check fields efi: Decode IA32/X64 Cache, TLB, and Bus Check structures efi: Decode UEFI-defined IA32/X64 Error Structure GUIDs efi: Decode IA32/X64 Processor Error Info Structure efi: Decode IA32/X64 Processor Error Section efi: Fix IA32/X64 Processor Error Record definition efi/cper: Remove the INDENT_SP silliness x86/xen/efi: Initialize UEFI secure boot state during dom0 boot
2018-06-04Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds6-18/+47
Pull RCU updates from Ingo Molnar: - updates to the handling of expedited grace periods - updates to reduce lock contention in the rcu_node combining tree [ These are in preparation for the consolidation of RCU-bh, RCU-preempt, and RCU-sched into a single flavor, which was requested by Linus in response to a security flaw whose root cause included confusion between the multiple flavors of RCU ] - torture-test updates that save their users some time and effort - miscellaneous fixes * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) rcu/x86: Provide early rcu_cpu_starting() callback torture: Make kvm-find-errors.sh find build warnings rcutorture: Abbreviate kvm.sh summary lines rcutorture: Print end-of-test state in kvm.sh summary rcutorture: Print end-of-test state torture: Fold parse-torture.sh into parse-console.sh torture: Add a script to edit output from failed runs rcu: Update list of rcu_future_grace_period() trace events rcu: Drop early GP request check from rcu_gp_kthread() rcu: Simplify and inline cpu_needs_another_gp() rcu: The rcu_gp_cleanup() function does not need cpu_needs_another_gp() rcu: Make rcu_start_this_gp() check for out-of-range requests rcu: Add funnel locking to rcu_start_this_gp() rcu: Make rcu_start_future_gp() caller select grace period rcu: Inline rcu_start_gp_advanced() into rcu_start_future_gp() rcu: Clear request other than RCU_GP_FLAG_INIT at GP end rcu: Cleanup, don't put ->completed into an int rcu: Switch __rcu_process_callbacks() to rcu_accelerate_cbs() rcu: Avoid __call_rcu_core() root rcu_node ->lock acquisition rcu: Make rcu_migrate_callbacks wake GP kthread when needed ...
2018-06-04Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespaceLinus Torvalds5-3/+11
Pull siginfo updates from Eric Biederman: "This set of changes close the known issues with setting si_code to an invalid value, and with not fully initializing struct siginfo. There remains work to do on nds32, arc, unicore32, powerpc, arm, arm64, ia64 and x86 to get the code that generates siginfo into a simpler and more maintainable state. Most of that work involves refactoring the signal handling code and thus careful code review. Also not included is the work to shrink the in kernel version of struct siginfo. That depends on getting the number of places that directly manipulate struct siginfo under control, as it requires the introduction of struct kernel_siginfo for the in kernel things. Overall this set of changes looks like it is making good progress, and with a little luck I will be wrapping up the siginfo work next development cycle" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits) signal/sh: Stop gcc warning about an impossible case in do_divide_error signal/mips: Report FPE_FLTUNK for undiagnosed floating point exceptions signal/um: More carefully relay signals in relay_signal. signal: Extend siginfo_layout with SIL_FAULT_{MCEERR|BNDERR|PKUERR} signal: Remove unncessary #ifdef SEGV_PKUERR in 32bit compat code signal/signalfd: Add support for SIGSYS signal/signalfd: Remove __put_user from signalfd_copyinfo signal/xtensa: Use force_sig_fault where appropriate signal/xtensa: Consistenly use SIGBUS in do_unaligned_user signal/um: Use force_sig_fault where appropriate signal/sparc: Use force_sig_fault where appropriate signal/sparc: Use send_sig_fault where appropriate signal/sh: Use force_sig_fault where appropriate signal/s390: Use force_sig_fault where appropriate signal/riscv: Replace do_trap_siginfo with force_sig_fault signal/riscv: Use force_sig_fault where appropriate signal/parisc: Use force_sig_fault where appropriate signal/parisc: Use force_sig_mceerr where appropriate signal/openrisc: Use force_sig_fault where appropriate signal/nios2: Use force_sig_fault where appropriate ...
2018-06-04Merge tag 'for-4.18-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds2-139/+281
Pull btrfs updates from David Sterba: "User visible features: - added support for the ioctl FS_IOC_FSGETXATTR, per-inode flags, successor of GET/SETFLAGS; now supports only existing flags: append, immutable, noatime, nodump, sync - 3 new unprivileged ioctls to allow users to enumerate subvolumes - dedupe syscall implementation does not restrict the range to 16MiB, though it still splits the whole range to 16MiB chunks - on user demand, rmdir() is able to delete an empty subvolume, export the capability in sysfs - fix inode number types in tracepoints, other cleanups - send: improved speed when dealing with a large removed directory, measurements show decrease from 2000 minutes to 2 minutes on a directory with 2 million entries - pre-commit check of superblock to detect a mysterious in-memory corruption - log message updates Other changes: - orphan inode cleanup improved, does no keep long-standing reservations that could lead up to early ENOSPC in some cases - slight improvement of handling snapshotted NOCOW files by avoiding some unnecessary tree searches - avoid OOM when dealing with many unmergeable small extents at flush time - speedup conversion of free space tree representations from/to bitmap/tree - code refactoring, deletion, cleanups: + delayed refs + delayed iput + redundant argument removals + memory barrier cleanups + remove a redundant mutex supposedly excluding several ioctls to run in parallel - new tracepoints for blockgroup manipulation - more sanity checks of compressed headers" * tag 'for-4.18-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (183 commits) btrfs: Add unprivileged version of ino_lookup ioctl btrfs: Add unprivileged ioctl which returns subvolume's ROOT_REF btrfs: Add unprivileged ioctl which returns subvolume information Btrfs: clean up error handling in btrfs_truncate() btrfs: Factor out write portion of btrfs_get_blocks_direct btrfs: Factor out read portion of btrfs_get_blocks_direct btrfs: return ENOMEM if path allocation fails in btrfs_cross_ref_exist btrfs: raid56: Remove VLA usage btrfs: return error value if create_io_em failed in cow_file_range btrfs: drop useless member qgroup_reserved of btrfs_pending_snapshot btrfs: drop unused parameter qgroup_reserved btrfs: balance dirty metadata pages in btrfs_finish_ordered_io btrfs: lift some btrfs_cross_ref_exist checks in nocow path btrfs: Remove fs_info argument from btrfs_uuid_tree_rem btrfs: Remove fs_info argument from btrfs_uuid_tree_add Btrfs: remove unused check of skip_locking Btrfs: remove always true check in unlock_up Btrfs: grab write lock directly if write_lock_level is the max level Btrfs: move get root out of btrfs_search_slot to a helper Btrfs: use more straightforward extent_buffer_uptodate check ...
2018-06-04Merge branch 'work.aio-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds18-27/+60
Pull aio updates from Al Viro: "Majority of AIO stuff this cycle. aio-fsync and aio-poll, mostly. The only thing I'm holding back for a day or so is Adam's aio ioprio - his last-minute fixup is trivial (missing stub in !CONFIG_BLOCK case), but let it sit in -next for decency sake..." * 'work.aio-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits) aio: sanitize the limit checking in io_submit(2) aio: fold do_io_submit() into callers aio: shift copyin of iocb into io_submit_one() aio_read_events_ring(): make a bit more readable aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way aio: take list removal to (some) callers of aio_complete() aio: add missing break for the IOCB_CMD_FDSYNC case random: convert to ->poll_mask timerfd: convert to ->poll_mask eventfd: switch to ->poll_mask pipe: convert to ->poll_mask crypto: af_alg: convert to ->poll_mask net/rxrpc: convert to ->poll_mask net/iucv: convert to ->poll_mask net/phonet: convert to ->poll_mask net/nfc: convert to ->poll_mask net/caif: convert to ->poll_mask net/bluetooth: convert to ->poll_mask net/sctp: convert to ->poll_mask net/tipc: convert to ->poll_mask ...
2018-06-04Merge tag 'locks-v4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linuxLinus Torvalds1-1/+1
Pull fasync fix from Jeff Layton: "Just a single fix for a deadlock in the fasync handling code that Kirill observed while testing. The fix is to change the fa_lock to be rwlock_t, and use a read lock in kill_fasync_rcu" * tag 'locks-v4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: fasync: Fix deadlock between task-context and interrupt-context kill_fasync()
2018-06-04Merge tag 'docs-4.18' of git://git.lwn.net/linuxLinus Torvalds5-7/+45
Pull documentation updates from Jonathan Corbet: "There's been a fair amount of work in the docs tree this time around, including: - Extensive RST conversions and organizational work in the memory-management docs thanks to Mike Rapoport. - An update of Documentation/features from Andrea Parri and a script to keep it updated. - Various LICENSES updates from Thomas, along with a script to check SPDX tags. - Work to fix dangling references to documentation files; this involved a fair number of one-liner comment changes outside of Documentation/ ... and the usual list of documentation improvements, typo fixes, etc" * tag 'docs-4.18' of git://git.lwn.net/linux: (103 commits) Documentation: document hung_task_panic kernel parameter docs/admin-guide/mm: add high level concepts overview docs/vm: move ksm and transhuge from "user" to "internals" section. docs: Use the kerneldoc comments for memalloc_no*() doc: document scope NOFS, NOIO APIs docs: update kernel versions and dates in tables docs/vm: transhuge: split userspace bits to admin-guide/mm/transhuge docs/vm: transhuge: minor updates docs/vm: transhuge: change sections order Documentation: arm: clean up Marvell Berlin family info Documentation: gpio: driver: Fix a typo and some odd grammar docs: ranoops.rst: fix location of ramoops.txt scripts/documentation-file-ref-check: rewrite it in perl with auto-fix mode docs: uio-howto.rst: use a code block to solve a warning mm, THP, doc: Add document for thp_swpout/thp_swpout_fallback w1: w1_io.c: fix a kernel-doc warning Documentation/process/posting: wrap text at 80 cols docs: admin-guide: add cgroup-v2 documentation Revert "Documentation/features/vm: Remove arch support status file for 'pte_special'" Documentation: refcount-vs-atomic: Update reference to LKMM doc. ...
2018-06-04swait: strengthen language to discourage useLinus Torvalds1-1/+13
We already earlier discouraged people from using this interface in commit 88796e7e5c45 ("sched/swait: Document it clearly that the swait facilities are special and shouldn't be used"), but I just got a pull request with a new broken user. So make the comment *really* clear. The swait interfaces are bad, and should not be used unless you have some *very* strong reasons that include tons of hard performance numbers on just why you want to use them, and you show that you actually understand that they aren't at all like the normal wait/wakeup interfaces. So far, every single user has been suspect. The main user is KVM, which is completely pointless (there is only ever one waiter, which avoids the interface subtleties, but also means that having a queue instead of a pointer is counter-productive and certainly not an "optimization"). So make the comments much stronger. Not that anybody likely reads them anyway, but there's always some slight hope that it will cause somebody to think twice. I'd like to remove this interface entirely, but there is the theoretical possibility that it's actually the right thing to use in some situation, most likely some deep RT use. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-04Merge tag 'regmap-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmapLinus Torvalds1-1/+18
Pull regmap updates from Mark Brown: "This is another quiet release for regmap, there's one minor feature improvement for the recently added slimbus support and a few minor fixes and cleanups" * tag 'regmap-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: slimbus: allow register offsets up to 16 bits regmap: add missing prototype for devm_init_slimbus regmap: Skip clk_put for attached clocks when freeing context regmap: include <linux/ktime.h> from include/linux/regmap.h
2018-06-04Merge tag 'spi-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spiLinus Torvalds2-53/+256
Pull spi updates from Mark Brown: "Quite a busy release for SPI, mainly as a result of Boris Brezillon's work on improving the integration with MTD for accelerated SPI flash controllers. He's added a new spi_mem interface which works a lot better with general hardware and converted the users over to it, as a result of this work we've got some MTD changes in here as well. Other highlights include: - Lots of spring cleaning for the s3c64xx driver. - Removal of the bcm53xx, the hardware is also supported by the mspi driver but SoC naming had caused people to miss the duplication. - Conversion of the pxa2xx driver to use the standard message processing loop rather than open coding. - A bunch of improvements to the runtime PM of the OMAP McSPI driver" * tag 'spi-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (47 commits) spi: Fix typo on SPI_MEM help text spi: sh-msiof: Fix setting SIRMDR1.SYNCAC to match SITMDR1.SYNCAC mtd: devices: m25p80: Use spi_mem_set_drvdata() instead of spi_set_drvdata() spi: omap2-mcspi: Remove unnecessary pm_runtime_force_suspend() spi: Add missing pm_runtime_put_noidle() after failed get spi: ti-qspi: Make sure res_mmap != NULL before dereferencing it spi: spi-s3c64xx: Fix system resume support spi: bcm-qspi: Fix build failure caused by spi_flash_read() API removal spi: Get rid of the spi_flash_read() API mtd: spi-nor: Use the spi_mem_xx() API spi: ti-qspi: Implement the spi_mem interface spi: bcm-qspi: Implement the spi_mem interface spi: Make support for regular transfers optional when ->mem_ops != NULL spi: Extend the core to ease integration of SPI memory controllers spi: remove forgotten CONFIG_SPI_BCM53XX spi: remove the older/duplicated bcm53xx driver spi: pxa2xx: check clk_prepare_enable() return value spi: lpspi: Switch to SPDX identifier spi: mxs: Switch to SPDX identifier spi: imx: Switch to SPDX identifier ...
2018-06-04Merge tag 'chrome-platform-for-linus-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platformLinus Torvalds1-0/+2
Pull chrome platform updates from Benson Leung: - further changes from Dmitry related to the removal of platform data from atmel_mxt_ts and chromeos_laptop. This time, we have some changes that teach chromeos_laptop how to supply acpi properties for some input devices so that the peripheral driver doesn't have to do dmi matching on some Chromebook platforms. - new Chromebook Tablet switch driver, which is useful for x86 convertible Chromebooks. - other misc cleanup * tag 'chrome-platform-for-linus-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform: platform/chrome: Use to_cros_ec_dev more broadly platform/chrome: chromeos_laptop: fix touchpad button mapping on Celes platform: chrome: Add input dependency for tablet switch driver platform/chrome: chromeos_laptop - supply properties for ACPI devices platform/chrome: chromeos_tbmc - add SPDX identifier platform: chrome: Add Tablet Switch ACPI driver platform/chrome: cros_ec_lpc: do not try DMI match when ACPI device found
2018-06-04Merge tag 'hwmon-for-linus-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-stagingLinus Torvalds1-0/+2
Pull hwmon updates from Guenter Roeck: - asus_atk0110 driver modified to use new API - k10temp supports new CPUs and reports both Tctl and Tdie - minor fixes in gpio-fan, ltc2990, fschmd, and mc13783 drivers * tag 'hwmon-for-linus-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (asus_atk0110) Make use of device managed memory hwmon: (asus_atk0110) Replace deprecated device register call hwmon: (k10temp) Make function get_raw_temp static hwmon: (gpio-fan) Fix "#cooling-cells" property name in bindings MAINTAINERS: hwmon: Add Documentation/devicetree/bindings/hwmon hwmon: (ltc2990) support all measurement modes hwmon: (ltc2990) add devicetree binding hwmon: (ltc2990) Fix incorrect conversion of negative temperatures hwmon: (core) check parent dev != NULL when chip != NULL hwmon: (fschmd) fix typo 'can by' to 'can be' hwmon: (k10temp) Display both Tctl and Tdie hwmon: (k10temp) Add support for Stoney Ridge and Bristol Ridge CPUs hwmon: MC13783: Add uid and die temperature sensor inputs
2018-06-04Merge tag 'dma-mapping-4.18' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds13-95/+92
Pull dma-mapping updates from Christoph Hellwig: - replace the force_dma flag with a dma_configure bus method. (Nipun Gupta, although one patch is іncorrectly attributed to me due to a git rebase bug) - use GFP_DMA32 more agressively in dma-direct. (Takashi Iwai) - remove PCI_DMA_BUS_IS_PHYS and rely on the dma-mapping API to do the right thing for bounce buffering. - move dma-debug initialization to common code, and apply a few cleanups to the dma-debug code. - cleanup the Kconfig mess around swiotlb selection - swiotlb comment fixup (Yisheng Xie) - a trivial swiotlb fix. (Dan Carpenter) - support swiotlb on RISC-V. (based on a patch from Palmer Dabbelt) - add a new generic dma-noncoherent dma_map_ops implementation and use it for arc, c6x and nds32. - improve scatterlist validity checking in dma-debug. (Robin Murphy) - add a struct device quirk to limit the dma-mask to 32-bit due to bridge/system issues, and switch x86 to use it instead of a local hack for VIA bridges. - handle devices without a dma_mask more gracefully in the dma-direct code. * tag 'dma-mapping-4.18' of git://git.infradead.org/users/hch/dma-mapping: (48 commits) dma-direct: don't crash on device without dma_mask nds32: use generic dma_noncoherent_ops nds32: implement the unmap_sg DMA operation nds32: consolidate DMA cache maintainance routines x86/pci-dma: switch the VIA 32-bit DMA quirk to use the struct device flag x86/pci-dma: remove the explicit nodac and allowdac option x86/pci-dma: remove the experimental forcesac boot option Documentation/x86: remove a stray reference to pci-nommu.c core, dma-direct: add a flag 32-bit dma limits dma-mapping: remove unused gfp_t parameter to arch_dma_alloc_attrs dma-debug: check scatterlist segments c6x: use generic dma_noncoherent_ops arc: use generic dma_noncoherent_ops arc: fix arc_dma_{map,unmap}_page arc: fix arc_dma_sync_sg_for_{cpu,device} arc: simplify arc_dma_sync_single_for_{cpu,device} dma-mapping: provide a generic dma-noncoherent implementation dma-mapping: simplify Kconfig dependencies riscv: add swiotlb support riscv: only enable ZONE_DMA32 for 64-bit ...
2018-06-04Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds3-4/+1
Pull misc vfs updates from Al Viro: "Misc bits and pieces not fitting into anything more specific" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: delete unnecessary assignment in vfs_listxattr Documentation: filesystems: update filesystem locking documentation vfs: namei: use path_equal() in follow_dotdot() fs.h: fix outdated comment about file flags __inode_security_revalidate() never gets NULL opt_dentry make xattr_getsecurity() static vfat: simplify checks in vfat_lookup() get rid of dead code in d_find_alias() it's SB_BORN, not MS_BORN... msdos_rmdir(): kill BS comment remove rpc_rmdir() fs: avoid fdput() after failed fdget() in vfs_dedupe_file_range()
2018-06-04Merge branch 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds17-70/+96
Pull procfs updates from Al Viro: "Christoph's proc_create_... cleanups series" * 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (44 commits) xfs, proc: hide unused xfs procfs helpers isdn/gigaset: add back gigaset_procinfo assignment proc: update SIZEOF_PDE_INLINE_NAME for the new pde fields tty: replace ->proc_fops with ->proc_show ide: replace ->proc_fops with ->proc_show ide: remove ide_driver_proc_write isdn: replace ->proc_fops with ->proc_show atm: switch to proc_create_seq_private atm: simplify procfs code bluetooth: switch to proc_create_seq_data netfilter/x_tables: switch to proc_create_seq_private netfilter/xt_hashlimit: switch to proc_create_{seq,single}_data neigh: switch to proc_create_seq_data hostap: switch to proc_create_{seq,single}_data bonding: switch to proc_create_seq_data rtc/proc: switch to proc_create_single_data drbd: switch to proc_create_single resource: switch to proc_create_seq_data staging/rtl8192u: simplify procfs code jfs: simplify procfs code ...
2018-06-04Merge tag 'for-4.18/block-20180603' of git://git.kernel.dk/linux-blockLinus Torvalds16-117/+202
Pull block updates from Jens Axboe: - clean up how we pass around gfp_t and blk_mq_req_flags_t (Christoph) - prepare us to defer scheduler attach (Christoph) - clean up drivers handling of bounce buffers (Christoph) - fix timeout handling corner cases (Christoph/Bart/Keith) - bcache fixes (Coly) - prep work for bcachefs and some block layer optimizations (Kent). - convert users of bio_sets to using embedded structs (Kent). - fixes for the BFQ io scheduler (Paolo/Davide/Filippo) - lightnvm fixes and improvements (Matias, with contributions from Hans and Javier) - adding discard throttling to blk-wbt (me) - sbitmap blk-mq-tag handling (me/Omar/Ming). - remove the sparc jsflash block driver, acked by DaveM. - Kyber scheduler improvement from Jianchao, making it more friendly wrt merging. - conversion of symbolic proc permissions to octal, from Joe Perches. Previously the block parts were a mix of both. - nbd fixes (Josef and Kevin Vigor) - unify how we handle the various kinds of timestamps that the block core and utility code uses (Omar) - three NVMe pull requests from Keith and Christoph, bringing AEN to feature completeness, file backed namespaces, cq/sq lock split, and various fixes - various little fixes and improvements all over the map * tag 'for-4.18/block-20180603' of git://git.kernel.dk/linux-block: (196 commits) blk-mq: update nr_requests when switching to 'none' scheduler block: don't use blocking queue entered for recursive bio submits dm-crypt: fix warning in shutdown path lightnvm: pblk: take bitmap alloc. out of critical section lightnvm: pblk: kick writer on new flush points lightnvm: pblk: only try to recover lines with written smeta lightnvm: pblk: remove unnecessary bio_get/put lightnvm: pblk: add possibility to set write buffer size manually lightnvm: fix partial read error path lightnvm: proper error handling for pblk_bio_add_pages lightnvm: pblk: fix smeta write error path lightnvm: pblk: garbage collect lines with failed writes lightnvm: pblk: rework write error recovery path lightnvm: pblk: remove dead function lightnvm: pass flag on graceful teardown to targets lightnvm: pblk: check for chunk size before allocating it lightnvm: pblk: remove unnecessary argument lightnvm: pblk: remove unnecessary indirection lightnvm: pblk: return NVM_ error on failed submission lightnvm: pblk: warn in case of corrupted write buffer ...
2018-06-02block: don't use blocking queue entered for recursive bio submitsJens Axboe1-0/+2
If we end up splitting a bio and the queue goes away between the initial submission and the later split submission, then we can block forever in blk_queue_enter() waiting for the reference to drop to zero. This will never happen, since we already hold a reference. Mark a split bio as already having entered the queue, so we can just use the live non-blocking queue enter variant. Thanks to Tetsuo Handa for the analysis. Reported-by: syzbot+c4f9cebf9d651f6e54de@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-0/+2
Pull networking fixes from David Miller: 1) Infinite loop in _decode_session6(), from Eric Dumazet. 2) Pass correct argument to nla_strlcpy() in netfilter, also from Eric Dumazet. 3) Out of bounds memory access in ipv6 srh code, from Mathieu Xhonneux. 4) NULL deref in XDP_REDIRECT handling of tun driver, from Toshiaki Makita. 5) Incorrect idr release in cls_flower, from Paul Blakey. 6) Probe error handling fix in davinci_emac, from Dan Carpenter. 7) Memory leak in XPS configuration, from Alexander Duyck. 8) Use after free with cloned sockets in kcm, from Kirill Tkhai. 9) MTU handling fixes fo ip_tunnel and ip6_tunnel, from Nicolas Dichtel. 10) Fix UAPI hole in bpf data structure for 32-bit compat applications, from Daniel Borkmann. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits) bpf: fix uapi hole for 32 bit compat applications net: usb: cdc_mbim: add flag FLAG_SEND_ZLP ip6_tunnel: remove magic mtu value 0xFFF8 ip_tunnel: restore binding to ifaces with a large mtu net: dsa: b53: Add BCM5389 support kcm: Fix use-after-free caused by clonned sockets net-sysfs: Fix memory leak in XPS configuration ixgbe: fix parsing of TC actions for HW offload net: ethernet: davinci_emac: fix error handling in probe() net/ncsi: Fix array size in dumpit handler cls_flower: Fix incorrect idr release when failing to modify rule net/sonic: Use dma_mapping_error() xfrm Fix potential error pointer dereference in xfrm_bundle_create. vhost_net: flush batched heads before trying to busy polling tun: Fix NULL pointer dereference in XDP redirect be2net: Fix error detection logic for BE3 net: qmi_wwan: Add Netgear Aircard 779S mlxsw: spectrum: Forbid creation of VLAN 1 over port/LAG atm: zatm: fix memcmp casting iwlwifi: pcie: compare with number of IRQs requested for, not number of CPUs ...
2018-06-02Merge tag 'drm-fixes-for-v4.17-rc8' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds1-1/+1
Pull drm fixes from Dave Airlie: "A few final fixes: i915: - fix for potential Spectre vector in the new query uAPI - fix NULL pointer deref (FDO #106559) - DMI fix to hide LVDS for Radiant P845 (FDO #105468) amdgpu: - suspend/resume DC regression fix - underscan flicker fix on fiji - gamma setting fix after dpms omap: - fix oops regression core: - fix PSR timing dw-hdmi: - fix oops regression" * tag 'drm-fixes-for-v4.17-rc8' of git://people.freedesktop.org/~airlied/linux: drm/amd/display: Update color props when modeset is required drm/amd/display: Make atomic-check validate underscan changes drm/bridge/synopsys: dw-hdmi: fix dw_hdmi_setup_rx_sense drm/amd/display: Fix BUG_ON during CRTC atomic check update drm/i915/query: nospec expects no more than an unsigned long drm/i915/query: Protect tainted function pointer lookup drm/i915/lvds: Move acpi lid notification registration to registration phase drm/i915: Disable LVDS on Radiant P845 drm/omap: fix NULL deref crash with SDI displays drm/psr: Fix missed entry in PSR setup time table.
2018-06-02Merge tag 'staging-4.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/stagingLinus Torvalds1-3/+3
Pull IIO driver fixes from Greg KH: "Here are some old IIO driver fixes that were sitting in my tree for a few weeks. Sorry about not getting them to you sooner. They fix a number of small IIO driver issues that have been reported. All of these have been in linux-next for a while with no reported problems" * tag 'staging-4.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: iio: adc: select buffer for at91-sama5d2_adc iio: hid-sensor-trigger: Fix sometimes not powering up the sensor after resume iio: adc: at91-sama5d2_adc: fix channel configuration for differential channels iio:kfifo_buf: check for uint overflow iio:buffer: make length types match kfifo types iio: adc: stm32-dfsdm: fix sample rate for div2 spi clock iio: adc: stm32-dfsdm: fix successive oversampling settings iio: ad7793: implement IIO_CHAN_INFO_SAMP_FREQ
2018-06-01bpf: fix uapi hole for 32 bit compat applicationsDaniel Borkmann1-0/+2
In 64 bit, we have a 4 byte hole between ifindex and netns_dev in the case of struct bpf_map_info but also struct bpf_prog_info. In net-next commit b85fab0e67b ("bpf: Add gpl_compatible flag to struct bpf_prog_info") added a bitfield into it to expose some flags related to programs. Thus, add an unnamed __u32 bitfield for both so that alignment keeps the same in both 32 and 64 bit cases, and can be naturally extended from there as in b85fab0e67b. Before: # file test.o test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped # pahole test.o struct bpf_map_info { __u32 type; /* 0 4 */ __u32 id; /* 4 4 */ __u32 key_size; /* 8 4 */ __u32 value_size; /* 12 4 */ __u32 max_entries; /* 16 4 */ __u32 map_flags; /* 20 4 */ char name[16]; /* 24 16 */ __u32 ifindex; /* 40 4 */ __u64 netns_dev; /* 44 8 */ __u64 netns_ino; /* 52 8 */ /* size: 64, cachelines: 1, members: 10 */ /* padding: 4 */ }; After (same as on 64 bit): # file test.o test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped # pahole test.o struct bpf_map_info { __u32 type; /* 0 4 */ __u32 id; /* 4 4 */ __u32 key_size; /* 8 4 */ __u32 value_size; /* 12 4 */ __u32 max_entries; /* 16 4 */ __u32 map_flags; /* 20 4 */ char name[16]; /* 24 16 */ __u32 ifindex; /* 40 4 */ /* XXX 4 bytes hole, try to pack */ __u64 netns_dev; /* 48 8 */ __u64 netns_ino; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ /* size: 64, cachelines: 1, members: 10 */ /* sum members: 60, holes: 1, sum holes: 4 */ }; Reported-by: Dmitry V. Levin <ldv@altlinux.org> Reported-by: Eugene Syromiatnikov <esyr@redhat.com> Fixes: 52775b33bb507 ("bpf: offload: report device information about offloaded maps") Fixes: 675fc275a3a2d ("bpf: offload: report device information for offloaded programs") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-06-01lightnvm: pass flag on graceful teardown to targetsJavier González1-1/+1
If the namespace is unregistered before the LightNVM target is removed (e.g., on hot unplug) it is too late for the target to store any metadata on the device - any attempt to write to the device will fail. In this case, pass on a "gracefull teardown" flag to the target to let it know when this happens. In the case of pblk, we pad the open line (close all open chunks) to improve data retention. In the event of an ungraceful shutdown, avoid this part and just clean up. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-01Merge branch 'nvme-4.18' of git://git.infradead.org/nvme into for-4.18/blockJens Axboe1-2/+14
Pull NVMe changes from Christoph: "Below is another set of NVMe updates for 4.18. Besides the usual bug fixes this includes more feature completness in terms of AEN and log page handling on the target." * 'nvme-4.18' of git://git.infradead.org/nvme: nvme: use the changed namespaces list log to clear ns data changed AENs nvme: mark nvme_queue_scan static nvme: submit AEN event configuration on startup nvmet: mask pending AENs nvmet: add AEN configuration support nvmet: implement the changed namespaces log nvmet: split log page implementation nvmet: add a new nvmet_zero_sgl helper nvme.h: add AEN configuration symbols nvme.h: add the changed namespace list log nvme.h: untangle AEN notice definitions nvmet: fix error return code in nvmet_file_ns_enable() nvmet: fix a typo in nvmet_file_ns_enable() nvme-fabrics: allow internal passthrough command on deleting controllers nvme-loop: add support for multiple ports nvme-pci: simplify __nvme_submit_cmd nvme-pci: Rate limit the nvme timeout warnings nvme: allow duplicate controller if prior controller being deleted
2018-06-01block: unexport elevator_init/exitChristoph Hellwig1-2/+0
These are only used by the block core. Also move the declarations to block/blk.h. Reported-by: Damien Le Moal <Damien.LeMoal@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Tested-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-01nvme.h: add AEN configuration symbolsHannes Reinecke1-0/+5
Signed-off-by: Hannes Reinecke <hare@suse.com> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2018-05-31nvme.h: add the changed namespace list logChristoph Hellwig1-0/+3
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2018-05-31nvme.h: untangle AEN notice definitionsChristoph Hellwig1-2/+6
Stop including the event type in the definitions for the notice type. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2018-05-31Merge branch 'linus' into perf/core, to pick up fixesIngo Molnar7-8/+14
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-31btrfs: Add unprivileged version of ino_lookup ioctlTomohiro Misono1-0/+17
Add unprivileged version of ino_lookup ioctl BTRFS_IOC_INO_LOOKUP_USER to allow normal users to call "btrfs subvolume list/show" etc. in combination with BTRFS_IOC_GET_SUBVOL_INFO/BTRFS_IOC_GET_SUBVOL_ROOTREF. This can be used like BTRFS_IOC_INO_LOOKUP but the argument is different. This is because it always searches the fs/file tree correspoinding to the fd with which this ioctl is called and also returns the name of bottom subvolume. The main differences from original ino_lookup ioctl are: 1. Read + Exec permission will be checked using inode_permission() during path construction. -EACCES will be returned in case of failure. 2. Path construction will be stopped at the inode number which corresponds to the fd with which this ioctl is called. If constructed path does not exist under fd's inode, -EACCES will be returned. 3. The name of bottom subvolume is also searched and filled. Note that the maximum length of path is shorter 256 (BTRFS_VOL_NAME_MAX+1) bytes than ino_lookup ioctl because of space of subvolume's name. Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Tested-by: Gu Jinxiang <gujx@cn.fujitsu.com> Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com> [ style fixes ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-31btrfs: Add unprivileged ioctl which returns subvolume's ROOT_REFTomohiro Misono1-0/+18
Add unprivileged ioctl BTRFS_IOC_GET_SUBVOL_ROOTREF which returns ROOT_REF information of the subvolume containing this inode except the subvolume name (this is because to prevent potential name leak). The subvolume name will be gained by user version of ino_lookup ioctl (BTRFS_IOC_INO_LOOKUP_USER) which also performs permission check. The min id of root ref's subvolume to be searched is specified by @min_id in struct btrfs_ioctl_get_subvol_rootref_args. After the search ends, @min_id is set to the last searched root ref's subvolid + 1. Also, if there are more root refs than BTRFS_MAX_ROOTREF_BUFFER_NUM, -EOVERFLOW is returned. Therefore the caller can just call this ioctl again without changing the argument to continue search. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com> Tested-by: Gu Jinxiang <gujx@cn.fujitsu.com> Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com> [ style fixes and struct item renames ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-31btrfs: Add unprivileged ioctl which returns subvolume informationTomohiro Misono1-0/+62
Add new unprivileged ioctl BTRFS_IOC_GET_SUBVOL_INFO which returns the information of subvolume containing this inode. (i.e. returns the information in ROOT_ITEM and ROOT_BACKREF.) Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com> Tested-by: Gu Jinxiang <gujx@cn.fujitsu.com> Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com> [ minor style fixes, update struct comments ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-31Merge tag 'drm-misc-fixes-2018-05-30' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixesDave Airlie1-1/+1
dw-hdmi: Fix Oops regression from rc1 (Neil) Cc: Neil Armstrong <narmstrong@baylibre.com> * tag 'drm-misc-fixes-2018-05-30' of git://anongit.freedesktop.org/drm/drm-misc: drm/bridge/synopsys: dw-hdmi: fix dw_hdmi_setup_rx_sense
2018-05-30block: Drop bioset_create()Kent Overstreet1-4/+2
All users have been converted to bioset_init(), kill off the old API. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-30pktcdvd: convert to bioset_init()/mempool_init()Kent Overstreet1-1/+1
Convert pktcdvd to embedded bio sets. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-30block: convert bounce, q->bio_split to bioset_init()/mempool_init()Kent Overstreet2-1/+6
Convert the core block functionality to embedded bio sets. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-30platform/chrome: Use to_cros_ec_dev more broadlyGwendal Grignou1-0/+2
Move to_cros_ec_dev macro to cros_ec.h and use it when the private ec object is needed from device object. Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Reviewed-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Signed-off-by: Benson Leung <bleung@chromium.org>
2018-05-30drm/bridge/synopsys: dw-hdmi: fix dw_hdmi_setup_rx_senseNeil Armstrong1-1/+1
The dw_hdmi_setup_rx_sense exported function should not use struct device to recover the dw-hdmi context using drvdata, but take struct dw_hdmi directly like other exported functions. This caused a regression using Meson DRM on S905X since v4.17-rc1 : Internal error: Oops: 96000007 [#1] PREEMPT SMP [...] CPU: 0 PID: 124 Comm: irq/32-dw_hdmi_ Not tainted 4.17.0-rc7 #2 Hardware name: Libre Technology CC (DT) [...] pc : osq_lock+0x54/0x188 lr : __mutex_lock.isra.0+0x74/0x530 [...] Process irq/32-dw_hdmi_ (pid: 124, stack limit = 0x00000000adf418cb) Call trace: osq_lock+0x54/0x188 __mutex_lock_slowpath+0x10/0x18 mutex_lock+0x30/0x38 __dw_hdmi_setup_rx_sense+0x28/0x98 dw_hdmi_setup_rx_sense+0x10/0x18 dw_hdmi_top_thread_irq+0x2c/0x50 irq_thread_fn+0x28/0x68 irq_thread+0x10c/0x1a0 kthread+0x128/0x130 ret_from_fork+0x10/0x18 Code: 34000964 d00050a2 51000484 9135c042 (f864d844) ---[ end trace 945641e1fbbc07da ]--- note: irq/32-dw_hdmi_[124] exited with preempt_count 1 genirq: exiting task "irq/32-dw_hdmi_" (124) is an active IRQ thread (irq 32) Fixes: eea034af90c6 ("drm/bridge/synopsys: dw-hdmi: don't clobber drvdata") Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Tested-by: Koen Kooi <koen@dominion.thruhere.net> Signed-off-by: Sean Paul <seanpaul@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/1527673438-20643-1-git-send-email-narmstrong@baylibre.com
2018-05-30blk-mq: abstract out blk-mq-sched rq list iteration bio merge helperJens Axboe1-1/+2
No functional changes in this patch, just a prep patch for utilizing this in an IO scheduler. Signed-off-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Omar Sandoval <osandov@fb.com>
2018-05-30Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcuIngo Molnar3-1/+2
Pull RCU fix from Paul E. McKenney: "This additional v4.18 pull request contains a single commit that fell through the cracks: Provide early rcu_cpu_starting() callback for the benefit of the x86/mtrr code, which needs RCU to be available on incoming CPUs earlier than has been the case in the past." Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-29block: remove parent device reference from struct bsg_class_deviceChristoph Hellwig2-7/+2
Bsg holding a reference to the parent device may result in a crash if a bsg file handle is closed after the parent device driver has unloaded. Holding a reference is not really needed: the parent device must exist between bsg_register_queue and bsg_unregister_queue. Before the device goes away the caller does blk_cleanup_queue so that all in-flight requests to the device are gone and all new requests cannot pass beyond the queue. The queue itself is a refcounted object and it will stay alive with a bsg file. Based on analysis, previous patch and changelog from Anatoliy Glagolev. Reported-by: Anatoliy Glagolev <glagolig@gmail.com> Reviewed-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-29block: don't print a message when the device went awayChristoph Hellwig1-1/+1
The information about a size change in this case just creates confusion. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>