aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/include (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2019-04-16selftests/harness: Add 30 second timeout per testKees Cook1-0/+2
In order to keep tests from hanging forever, this adds an alarm signal to each test run. This assumes an individual test doesn't take longer than 30 seconds. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2019-04-16selftests/seccomp: Handle namespace failures gracefullyKees Cook1-20/+23
When running without USERNS or PIDNS the seccomp test would hang since it was waiting forever for the child to trigger the user notification since it seems the glibc() abort handler makes a call to getpid(), which would trap again. This changes the getpid filter to getppid, and makes sure ASSERTs execute to stop from spawning the listener. Reported-by: Shuah Khan <shuah@kernel.org> Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace") Cc: stable@vger.kernel.org # > 5.0 Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Tycho Andersen <tycho@tycho.ws> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2019-04-08selftests: cgroup: fix cleanup path in test_memcg_subtree_control()Roman Gushchin1-17/+21
Dan reported, that cleanup path in test_memcg_subtree_control() triggers a static checker warning: ./tools/testing/selftests/cgroup/test_memcontrol.c:76 \ test_memcg_subtree_control() error: uninitialized symbol 'child2'. Fix this by initializing child2 and parent2 variables and split the cleanup path into few stages. Signed-off-by: Roman Gushchin <guro@fb.com> Fixes: 84092dbcf901 ("selftests: cgroup: add memory controller self-tests") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Shuah Khan (Samsung OSG) <shuah@kernel.org> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Shuah Khan <shuah@kernel.org>
2019-04-08selftests: efivarfs: remove the test_create_read file if it was existZhangXiaoxu1-0/+4
After the first run, the test case 'test_create_read' will always fail because the file is exist and file's attr is 'S_IMMUTABLE', open with 'O_RDWR' will always return -EPERM. Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Shuah Khan <shuah@kernel.org>
2019-04-08rseq/selftests: Adapt number of threads to the number of detected cpusMathieu Desnoyers1-2/+5
On smaller systems, running a test with 200 threads can take a long time on machines with smaller number of CPUs. Detect the number of online cpus at test runtime, and multiply that by 6 to have 6 rseq threads per cpu preempting each other. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Joel Fernandes <joelaf@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Watson <davejwatson@fb.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: linux-kselftest@vger.kernel.org Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Chris Lameter <cl@linux.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Maurer <bmaurer@fb.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Shuah Khan <shuah@kernel.org>
2019-04-08lib: Add test module for strscpy_padTobin C. Harding6-1/+159
Add a test module for the new strscpy_pad() function. Tie it into the kselftest infrastructure for lib/ tests. Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tobin C. Harding <tobin@kernel.org> Signed-off-by: Shuah Khan <shuah@kernel.org>
2019-04-08lib/string: Add strscpy_pad() functionTobin C. Harding2-7/+44
We have a function to copy strings safely and we have a function to copy strings and zero the tail of the destination (if source string is shorter than destination buffer) but we do not have a function to do both at once. This means developers must write this themselves if they desire this functionality. This is a chore, and also leaves us open to off by one errors unnecessarily. Add a function that calls strscpy() then memset()s the tail to zero if the source string is shorter than the destination buffer. Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tobin C. Harding <tobin@kernel.org> Signed-off-by: Shuah Khan <shuah@kernel.org>
2019-04-08lib: Use new kselftest headerTobin C. Harding2-34/+9
We just added a new C header file for use with test modules that are intended to be run with kselftest. We can reduce code duplication by using this header. Use new kselftest header to reduce code duplication in test_printf and test_bitmap test modules. Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tobin C. Harding <tobin@kernel.org> Signed-off-by: Shuah Khan <shuah@kernel.org>
2019-04-08kselftest: Add test module framework headerTobin C. Harding2-2/+140
kselftest runs as a userspace process. Sometimes we need to test things from kernel space. One way of doing this is by creating a test module. Currently doing so requires developers to write a bunch of boiler plate in the module if kselftest is to be used to run the tests. This means we currently have a load of duplicate code to achieve these ends. If we have a uniform method for implementing test modules then we can reduce code duplication, ensure uniformity in the test framework, ease code maintenance, and reduce the work required to create tests. This all helps to encourage developers to write and run tests. Add a C header file that can be included in test modules. This provides a single point for common test functions/macros. Implement a few macros that make up the start of the test framework. Add documentation for new kselftest header to kselftest documentation. Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tobin C. Harding <tobin@kernel.org> Signed-off-by: Shuah Khan <shuah@kernel.org>
2019-04-08kselftest: Add test runner creation scriptTobin C. Harding4-50/+88
Currently if we wish to use kselftest to run tests within a kernel module we write a small script to load/unload and do error reporting. There are a bunch of these under tools/testing/selftests/lib/ that are all identical except for the test name. We can reduce code duplication and improve maintainability if we have one version of this. However kselftest requires an executable for each test. We can move all the script logic to a central script then have each individual test script call the main script. Oneliner to call kselftest_module.sh courtesy of Kees, thanks! Add test runner creation script. Convert tools/testing/selftests/lib/*.sh to use new test creation script. Testing ------- Configure kselftests for lib/ then build and boot kernel. Then run kselftests as follows: $ cd /path/to/kernel/tree $ sudo make O=$output_path -C tools/testing/selftests TARGETS="lib" run_tests and also $ cd /path/to/kernel/tree $ cd tools/testing/selftests $ sudo make O=$output_path TARGETS="lib" run_tests and also $ cd /path/to/kernel/tree $ cd tools/testing/selftests $ sudo make TARGETS="lib" run_tests Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tobin C. Harding <tobin@kernel.org> Signed-off-by: Shuah Khan <shuah@kernel.org>
2019-04-08lib/test_printf: Add empty module_exit functionTobin C. Harding1-0/+6
Currently the test_printf module does not have an exit function, this prevents the module from being unloaded. If we cannot unload the module we cannot run the tests a second time. Add an empty exit function. Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tobin C. Harding <tobin@kernel.org> Signed-off-by: Shuah Khan <shuah@kernel.org>
2019-04-08selftest/gpio: Remove duplicate headerSabyasachi Gupta1-1/+0
Remove duplicate header which are included twice. Signed-off-by: Sabyasachi Gupta <sabyasachi.linux@gmail.com> Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Signed-off-by: Shuah Khan <shuah@kernel.org>
2019-04-08selftest/rseq: Remove duplicate headerSabyasachi Gupta1-1/+0
Remove duplicate header which is included twice Signed-off-by: Sabyasachi Gupta <sabyasachi.linux@gmail.com> Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Shuah Khan <shuah@kernel.org>
2019-04-08selftest/timers: Remove duplicate headerSabyasachi Gupta1-1/+0
Remove duplicate header which is included twice. Signed-off-by: Sabyasachi Gupta <sabyasachi.linux@gmail.com> Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Signed-off-by: Shuah Khan <shuah@kernel.org>
2019-04-08selftest/x86/mpx-dig.c: Remove duplicate headerSabyasachi Gupta1-2/+0
Remove duplicate header which is included twice. Signed-off-by: Sabyasachi Gupta <sabyasachi.linux@gmail.com> Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Signed-off-by: Shuah Khan <shuah@kernel.org>
2019-04-07Linux 5.1-rc4Linus Torvalds1-1/+1
2019-04-07ARM: milbeaut: fix build with !CONFIG_HOTPLUG_CPUArnd Bergmann1-0/+4
When HOTPLUG_CPU is disabled, some fields in the smp operations are not available or needed: arch/arm/mach-milbeaut/platsmp.c:90:3: error: field designator 'cpu_die' does not refer to any field in type 'struct smp_operations' .cpu_die = m10v_cpu_die, ^ arch/arm/mach-milbeaut/platsmp.c:91:3: error: field designator 'cpu_kill' does not refer to any field in type 'struct smp_operations' .cpu_kill = m10v_cpu_kill, ^ Hide them in an #ifdef like the other platforms do. Fixes: 9fb29c734f9e ("ARM: milbeaut: Add basic support for Milbeaut m10v SoC") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Olof Johansson <olof@lixom.net>
2019-04-07ARM: iop: don't use using 64-bit DMA masksArnd Bergmann3-12/+12
clang warns about statically defined DMA masks from the DMA_BIT_MASK macro with length 64: arch/arm/mach-iop13xx/setup.c:303:35: error: shift count >= width of type [-Werror,-Wshift-count-overflow] static u64 iop13xx_adma_dmamask = DMA_BIT_MASK(64); ^~~~~~~~~~~~~~~~ include/linux/dma-mapping.h:141:54: note: expanded from macro 'DMA_BIT_MASK' #define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1)) ^ ~~~ The ones in iop shouldn't really be 64 bit masks, so changing them to what the driver can support avoids the warning. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Olof Johansson <olof@lixom.net>
2019-04-07ARM: orion: don't use using 64-bit DMA masksArnd Bergmann1-2/+2
clang warns about statically defined DMA masks from the DMA_BIT_MASK macro with length 64: arch/arm/plat-orion/common.c:625:29: error: shift count >= width of type [-Werror,-Wshift-count-overflow] .coherent_dma_mask = DMA_BIT_MASK(64), ^~~~~~~~~~~~~~~~ include/linux/dma-mapping.h:141:54: note: expanded from macro 'DMA_BIT_MASK' #define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1)) The ones in orion shouldn't really be 64 bit masks, so changing them to what the driver can support avoids the warning. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Olof Johansson <olof@lixom.net>
2019-04-07Revert "ARM: dts: nomadik: Fix polarity of SPI CS"Olof Johansson1-5/+4
This reverts commit fa9463564e77067df81b0b8dec91adbbbc47bfb4. Per Linus Walleij: Dear ARM SoC maintainers, can you please revert this patch. It was the wrong solution to the wrong problem, and I must have acted in stress. Andrey fixed the real bug in a proper way in these commits: commit e5545c94e43b8f6599ffc01df8d1aedf18ee912a "gpio: of: Check propname before applying "cs-gpios" quirks" commit 7ce40277bf848391705011ba37eac2e377cbd9e6 "gpio: of: Check for "spi-cs-high" in child instead of parent node" Signed-off-by: Olof Johansson <olof@lixom.net>
2019-04-07dt-bindings: cpu: Fix JSON schemaMaxime Ripard1-1/+1
Commit fd73403a4862 ("dt-bindings: arm: Add SMP enable-method for Milbeaut") added support for a new cpu enable-method, but did so using tabulations to ident. This is however invalid in the syntax, and resulted in a failure when trying to use that schemas for validation. Use spaces instead of tabs to indent to fix this. Fixes: fd73403a4862 ("dt-bindings: arm: Add SMP enable-method for Milbeaut") Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Reviewed-by: Rob Herring <robh@kernel.org> Acked-by: Sugaya Taichi <sugaya.taichi@socionext.com> Signed-off-by: Olof Johansson <olof@lixom.net>
2019-04-06parisc: Detect QEMU earlier in boot processHelge Deller2-6/+3
While adding LASI support to QEMU, I noticed that the QEMU detection in the kernel happens much too late. For example, when a LASI chip is found by the kernel, it registers the LASI LED driver as well. But when we run on QEMU it makes sense to avoid spending unnecessary CPU cycles, so we need to access the running_on_QEMU flag earlier than before. This patch now makes the QEMU detection the fist task of the Linux kernel by moving it to where the kernel enters the C-coding. Fixes: 310d82784fb4 ("parisc: qemu idle sleep support") Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org # v4.14+
2019-04-06parisc: also set iaoq_b in instruction_pointer_set()Sven Schnelle1-1/+2
When setting the instruction pointer on PA-RISC we also need to set the back of the instruction queue to the new offset, otherwise we will execute on instruction from the new location, and jumping back to the old location stored in iaoq_b. Signed-off-by: Sven Schnelle <svens@stackframe.org> Signed-off-by: Helge Deller <deller@gmx.de> Fixes: 75ebedf1d263 ("parisc: Add HAVE_REGS_AND_STACK_ACCESS_API feature") Cc: stable@vger.kernel.org # 4.19+
2019-04-06parisc: regs_return_value() should return gpr28Sven Schnelle1-1/+1
While working on kretprobes for PA-RISC I was wondering while the kprobes sanity test always fails on kretprobes. This is caused by returning gpr20 instead of gpr28. Signed-off-by: Sven Schnelle <svens@stackframe.org> Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org # 4.14+
2019-04-06Revert: parisc: Use F_EXTEND() macro in iosapic codeHelge Deller1-1/+5
Revert parts of commit 97d7e2e3fd8a ("parisc: Use F_EXTEND() macro in iosapic code"). It breaks booting the 32-bit kernel on some machines. Reported-by: Sven Schnelle <svens@stackframe.org> Tested-by: Sven Schnelle <svens@stackframe.org> Fixes: 97d7e2e3fd8a ("parisc: Use F_EXTEND() macro in iosapic code") Signed-off-by: Helge Deller <deller@gmx.de>
2019-04-06fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlockKirill Smelkov5-5/+389
Commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") added locking for file.f_pos access and in particular made concurrent read and write not possible - now both those functions take f_pos lock for the whole run, and so if e.g. a read is blocked waiting for data, write will deadlock waiting for that read to complete. This caused regression for stream-like files where previously read and write could run simultaneously, but after that patch could not do so anymore. See e.g. commit 581d21a2d02a ("xenbus: fix deadlock on writes to /proc/xen/xenbus") which fixes such regression for particular case of /proc/xen/xenbus. The patch that added f_pos lock in 2014 did so to guarantee POSIX thread safety for read/write/lseek and added the locking to file descriptors of all regular files. In 2014 that thread-safety problem was not new as it was already discussed earlier in 2006. However even though 2006'th version of Linus's patch was adding f_pos locking "only for files that are marked seekable with FMODE_LSEEK (thus avoiding the stream-like objects like pipes and sockets)", the 2014 version - the one that actually made it into the tree as 9c225f2655e3 - is doing so irregardless of whether a file is seekable or not. See https://lore.kernel.org/lkml/53022DB1.4070805@gmail.com/ https://lwn.net/Articles/180387 https://lwn.net/Articles/180396 for historic context. The reason that it did so is, probably, that there are many files that are marked non-seekable, but e.g. their read implementation actually depends on knowing current position to correctly handle the read. Some examples: kernel/power/user.c snapshot_read fs/debugfs/file.c u32_array_read fs/fuse/control.c fuse_conn_waiting_read + ... drivers/hwmon/asus_atk0110.c atk_debugfs_ggrp_read arch/s390/hypfs/inode.c hypfs_read_iter ... Despite that, many nonseekable_open users implement read and write with pure stream semantics - they don't depend on passed ppos at all. And for those cases where read could wait for something inside, it creates a situation similar to xenbus - the write could be never made to go until read is done, and read is waiting for some, potentially external, event, for potentially unbounded time -> deadlock. Besides xenbus, there are 14 such places in the kernel that I've found with semantic patch (see below): drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write() drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write() drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write() drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write() net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write() drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write() drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write() drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write() net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write() drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write() drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write() drivers/input/misc/uinput.c:400:1-17: ERROR: uinput_fops: .read() can deadlock .write() drivers/infiniband/core/user_mad.c:985:7-23: ERROR: umad_fops: .read() can deadlock .write() drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write() In addition to the cases above another regression caused by f_pos locking is that now FUSE filesystems that implement open with FOPEN_NONSEEKABLE flag, can no longer implement bidirectional stream-like files - for the same reason as above e.g. read can deadlock write locking on file.f_pos in the kernel. FUSE's FOPEN_NONSEEKABLE was added in 2008 in a7c1b990f715 ("fuse: implement nonseekable open") to support OSSPD. OSSPD implements /dev/dsp in userspace with FOPEN_NONSEEKABLE flag, with corresponding read and write routines not depending on current position at all, and with both read and write being potentially blocking operations: See https://github.com/libfuse/osspd https://lwn.net/Articles/308445 https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1406 https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1438-L1477 https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1479-L1510 Corresponding libfuse example/test also describes FOPEN_NONSEEKABLE as "somewhat pipe-like files ..." with read handler not using offset. However that test implements only read without write and cannot exercise the deadlock scenario: https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L124-L131 https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L146-L163 https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L209-L216 I've actually hit the read vs write deadlock for real while implementing my FUSE filesystem where there is /head/watch file, for which open creates separate bidirectional socket-like stream in between filesystem and its user with both read and write being later performed simultaneously. And there it is semantically not easy to split the stream into two separate read-only and write-only channels: https://lab.nexedi.com/kirr/wendelin.core/blob/f13aa600/wcfs/wcfs.go#L88-169 Let's fix this regression. The plan is: 1. We can't change nonseekable_open to include &~FMODE_ATOMIC_POS - doing so would break many in-kernel nonseekable_open users which actually use ppos in read/write handlers. 2. Add stream_open() to kernel to open stream-like non-seekable file descriptors. Read and write on such file descriptors would never use nor change ppos. And with that property on stream-like files read and write will be running without taking f_pos lock - i.e. read and write could be running simultaneously. 3. With semantic patch search and convert to stream_open all in-kernel nonseekable_open users for which read and write actually do not depend on ppos and where there is no other methods in file_operations which assume @offset access. 4. Add FOPEN_STREAM to fs/fuse/ and open in-kernel file-descriptors via steam_open if that bit is present in filesystem open reply. It was tempting to change fs/fuse/ open handler to use stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE, and in particular GVFS which actually uses offset in its read and write handlers https://codesearch.debian.net/search?q=-%3Enonseekable+%3D https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481 so if we would do such a change it will break a real user. 5. Add stream_open and FOPEN_STREAM handling to stable kernels starting from v3.14+ (the kernel where 9c225f2655 first appeared). This will allow to patch OSSPD and other FUSE filesystems that provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE in their open handler and this way avoid the deadlock on all kernel versions. This should work because fs/fuse/ ignores unknown open flags returned from a filesystem and so passing FOPEN_STREAM to a kernel that is not aware of this flag cannot hurt. In turn the kernel that is not aware of FOPEN_STREAM will be < v3.14 where just FOPEN_NONSEEKABLE is sufficient to implement streams without read vs write deadlock. This patch adds stream_open, converts /proc/xen/xenbus to it and adds semantic patch to automatically locate in-kernel places that are either required to be converted due to read vs write deadlock, or that are just safe to be converted because read and write do not use ppos and there are no other funky methods in file_operations. Regarding semantic patch I've verified each generated change manually - that it is correct to convert - and each other nonseekable_open instance left - that it is either not correct to convert there, or that it is not converted due to current stream_open.cocci limitations. The script also does not convert files that should be valid to convert, but that currently have .llseek = noop_llseek or generic_file_llseek for unknown reason despite file being opened with nonseekable_open (e.g. drivers/input/mousedev.c) Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yongzhi Pan <panyongzhi@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Juergen Gross <jgross@suse.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Tejun Heo <tj@kernel.org> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Nikolaus Rath <Nikolaus@rath.org> Cc: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-06xsysace: Fix error handling in ace_setupGuenter Roeck1-0/+2
If xace hardware reports a bad version number, the error handling code in ace_setup() calls put_disk(), followed by queue cleanup. However, since the disk data structure has the queue pointer set, put_disk() also cleans and releases the queue. This results in blk_cleanup_queue() accessing an already released data structure, which in turn may result in a crash such as the following. [ 10.681671] BUG: Kernel NULL pointer dereference at 0x00000040 [ 10.681826] Faulting instruction address: 0xc0431480 [ 10.682072] Oops: Kernel access of bad area, sig: 11 [#1] [ 10.682251] BE PAGE_SIZE=4K PREEMPT Xilinx Virtex440 [ 10.682387] Modules linked in: [ 10.682528] CPU: 0 PID: 1 Comm: swapper Tainted: G W 5.0.0-rc6-next-20190218+ #2 [ 10.682733] NIP: c0431480 LR: c043147c CTR: c0422ad8 [ 10.682863] REGS: cf82fbe0 TRAP: 0300 Tainted: G W (5.0.0-rc6-next-20190218+) [ 10.683065] MSR: 00029000 <CE,EE,ME> CR: 22000222 XER: 00000000 [ 10.683236] DEAR: 00000040 ESR: 00000000 [ 10.683236] GPR00: c043147c cf82fc90 cf82ccc0 00000000 00000000 00000000 00000002 00000000 [ 10.683236] GPR08: 00000000 00000000 c04310bc 00000000 22000222 00000000 c0002c54 00000000 [ 10.683236] GPR16: 00000000 00000001 c09aa39c c09021b0 c09021dc 00000007 c0a68c08 00000000 [ 10.683236] GPR24: 00000001 ced6d400 ced6dcf0 c0815d9c 00000000 00000000 00000000 cedf0800 [ 10.684331] NIP [c0431480] blk_mq_run_hw_queue+0x28/0x114 [ 10.684473] LR [c043147c] blk_mq_run_hw_queue+0x24/0x114 [ 10.684602] Call Trace: [ 10.684671] [cf82fc90] [c043147c] blk_mq_run_hw_queue+0x24/0x114 (unreliable) [ 10.684854] [cf82fcc0] [c04315bc] blk_mq_run_hw_queues+0x50/0x7c [ 10.685002] [cf82fce0] [c0422b24] blk_set_queue_dying+0x30/0x68 [ 10.685154] [cf82fcf0] [c0423ec0] blk_cleanup_queue+0x34/0x14c [ 10.685306] [cf82fd10] [c054d73c] ace_probe+0x3dc/0x508 [ 10.685445] [cf82fd50] [c052d740] platform_drv_probe+0x4c/0xb8 [ 10.685592] [cf82fd70] [c052abb0] really_probe+0x20c/0x32c [ 10.685728] [cf82fda0] [c052ae58] driver_probe_device+0x68/0x464 [ 10.685877] [cf82fdc0] [c052b500] device_driver_attach+0xb4/0xe4 [ 10.686024] [cf82fde0] [c052b5dc] __driver_attach+0xac/0xfc [ 10.686161] [cf82fe00] [c0528428] bus_for_each_dev+0x80/0xc0 [ 10.686314] [cf82fe30] [c0529b3c] bus_add_driver+0x144/0x234 [ 10.686457] [cf82fe50] [c052c46c] driver_register+0x88/0x15c [ 10.686610] [cf82fe60] [c09de288] ace_init+0x4c/0xac [ 10.686742] [cf82fe80] [c0002730] do_one_initcall+0xac/0x330 [ 10.686888] [cf82fee0] [c09aafd0] kernel_init_freeable+0x34c/0x478 [ 10.687043] [cf82ff30] [c0002c6c] kernel_init+0x18/0x114 [ 10.687188] [cf82ff40] [c000f2f0] ret_from_kernel_thread+0x14/0x1c [ 10.687349] Instruction dump: [ 10.687435] 3863ffd4 4bfffd70 9421ffd0 7c0802a6 93c10028 7c9e2378 93e1002c 38810008 [ 10.687637] 7c7f1b78 90010034 4bfffc25 813f008c <81290040> 75290100 4182002c 80810008 [ 10.688056] ---[ end trace 13c9ff51d41b9d40 ]--- Fix the problem by setting the disk queue pointer to NULL before calling put_disk(). A more comprehensive fix might be to rearrange the code to check the hardware version before initializing data structures, but I don't know if this would have undesirable side effects, and it would increase the complexity of backporting the fix to older kernels. Fixes: 74489a91dd43a ("Add support for Xilinx SystemACE CompactFlash interface") Acked-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-06null_blk: prevent crash from bad home_node valueJohn Pittman1-0/+5
At module load, if the selected home_node value is greater than the available numa nodes, the system will crash in __alloc_pages_nodemask() due to a bad paging request. Prevent this user error crash by detecting the bad value, logging an error, and setting g_home_node back to the default of NUMA_NO_NODE. Signed-off-by: John Pittman <jpittman@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-06i2c: imx: don't leak the i2c adapter on errorLaurentiu Tudor1-1/+3
Make sure to free the i2c adapter on the error exit path. Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Fixes: e1ab9a468e3b ("i2c: imx: improve the error handling in i2c_imx_dma_request()") Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2019-04-05kernel/sysctl.c: fix out-of-bounds access when setting file-maxWill Deacon1-1/+2
Commit 32a5ad9c2285 ("sysctl: handle overflow for file-max") hooked up min/max values for the file-max sysctl parameter via the .extra1 and .extra2 fields in the corresponding struct ctl_table entry. Unfortunately, the minimum value points at the global 'zero' variable, which is an int. This results in a KASAN splat when accessed as a long by proc_doulongvec_minmax on 64-bit architectures: | BUG: KASAN: global-out-of-bounds in __do_proc_doulongvec_minmax+0x5d8/0x6a0 | Read of size 8 at addr ffff2000133d1c20 by task systemd/1 | | CPU: 0 PID: 1 Comm: systemd Not tainted 5.1.0-rc3-00012-g40b114779944 #2 | Hardware name: linux,dummy-virt (DT) | Call trace: | dump_backtrace+0x0/0x228 | show_stack+0x14/0x20 | dump_stack+0xe8/0x124 | print_address_description+0x60/0x258 | kasan_report+0x140/0x1a0 | __asan_report_load8_noabort+0x18/0x20 | __do_proc_doulongvec_minmax+0x5d8/0x6a0 | proc_doulongvec_minmax+0x4c/0x78 | proc_sys_call_handler.isra.19+0x144/0x1d8 | proc_sys_write+0x34/0x58 | __vfs_write+0x54/0xe8 | vfs_write+0x124/0x3c0 | ksys_write+0xbc/0x168 | __arm64_sys_write+0x68/0x98 | el0_svc_common+0x100/0x258 | el0_svc_handler+0x48/0xc0 | el0_svc+0x8/0xc | | The buggy address belongs to the variable: | zero+0x0/0x40 | | Memory state around the buggy address: | ffff2000133d1b00: 00 00 00 00 00 00 00 00 fa fa fa fa 04 fa fa fa | ffff2000133d1b80: fa fa fa fa 04 fa fa fa fa fa fa fa 04 fa fa fa | >ffff2000133d1c00: fa fa fa fa 04 fa fa fa fa fa fa fa 00 00 00 00 | ^ | ffff2000133d1c80: fa fa fa fa 00 fa fa fa fa fa fa fa 00 00 00 00 | ffff2000133d1d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Fix the splat by introducing a unsigned long 'zero_ul' and using that instead. Link: http://lkml.kernel.org/r/20190403153409.17307-1-will.deacon@arm.com Fixes: 32a5ad9c2285 ("sysctl: handle overflow for file-max") Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Christian Brauner <christian@brauner.io> Cc: Kees Cook <keescook@chromium.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Matteo Croce <mcroce@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05mm/util.c: fix strndup_user() commentAndrew Morton1-1/+1
The kerneldoc misdescribes strndup_user()'s return value. Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Timur Tabi <timur@freescale.com> Cc: Mihai Caraman <mihai.caraman@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05sh: fix multiple function definition build errorsRandy Dunlap1-2/+2
Many of the sh CPU-types have their own plat_irq_setup() and arch_init_clk_ops() functions, so these same (empty) functions in arch/sh/boards/of-generic.c are not needed and cause build errors. If there is some case where these empty functions are needed, they can be retained by marking them as "__weak" while at the same time making builds that do not need them succeed. Fixes these build errors: arch/sh/boards/of-generic.o: In function `plat_irq_setup': (.init.text+0x134): multiple definition of `plat_irq_setup' arch/sh/kernel/cpu/sh2/setup-sh7619.o:(.init.text+0x30): first defined here arch/sh/boards/of-generic.o: In function `arch_init_clk_ops': (.init.text+0x118): multiple definition of `arch_init_clk_ops' arch/sh/kernel/cpu/sh2/clock-sh7619.o:(.init.text+0x0): first defined here Link: http://lkml.kernel.org/r/9ee4e0c5-f100-86a2-bd4d-1d3287ceab31@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kbuild test robot <lkp@intel.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05MAINTAINERS: add maintainer and replacing reviewer ARM/NUVOTON NPCMTomer Maimon1-1/+2
Add Tali Perry as Nuvoton NPCM maintainer, replace Brendan Higgins Nuvoton NPCM reviewer with Benjamin Fair. Link: http://lkml.kernel.org/r/20190328235752.334462-2-tmaimon77@gmail.com Signed-off-by: Tomer Maimon <tmaimon77@gmail.com> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Reviewed-by: Benjamin Fair <benjaminfair@google.com> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Cc: Joe Perches <joe@perches.com> Cc: Avi Fishman <avifishman70@gmail.com> Cc: Patrick Venture <venture@google.com> Cc: Nancy Yuen <yuenn@google.com> Cc: Tali Perry <tali.perry1@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05MAINTAINERS: fix bad pattern in ARM/NUVOTON NPCMTomer Maimon1-1/+1
In the process of upstreaming architecture support for ARM/NUVOTON NPCM include/dt-bindings/clock/nuvoton,npcm7xx-clks.h was renamed include/dt-bindings/clock/nuvoton,npcm7xx-clock.h without updating MAINTAINERS. This updates the MAINTAINERS pattern to match the new name of this file. Link: http://lkml.kernel.org/r/20190328235752.334462-1-tmaimon77@gmail.com Fixes: 6a498e06ba22 ("MAINTAINERS: Add entry for the Nuvoton NPCM architecture") Signed-off-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Tomer Maimon <tmaimon77@gmail.com> Reported-by: Joe Perches <joe@perches.com> Reviewed-by: Benjamin Fair <benjaminfair@google.com> Cc: Avi Fishman <avifishman70@gmail.com> Cc: Mukesh Ojha <mojha@codeaurora.org> Cc: Nancy Yuen <yuenn@google.com> Cc: Patrick Venture <venture@google.com> Cc: Tali Perry <tali.perry1@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05mm: writeback: use exact memcg dirty countsGreg Thelen2-3/+22
Since commit a983b5ebee57 ("mm: memcontrol: fix excessive complexity in memory.stat reporting") memcg dirty and writeback counters are managed as: 1) per-memcg per-cpu values in range of [-32..32] 2) per-memcg atomic counter When a per-cpu counter cannot fit in [-32..32] it's flushed to the atomic. Stat readers only check the atomic. Thus readers such as balance_dirty_pages() may see a nontrivial error margin: 32 pages per cpu. Assuming 100 cpus: 4k x86 page_size: 13 MiB error per memcg 64k ppc page_size: 200 MiB error per memcg Considering that dirty+writeback are used together for some decisions the errors double. This inaccuracy can lead to undeserved oom kills. One nasty case is when all per-cpu counters hold positive values offsetting an atomic negative value (i.e. per_cpu[*]=32, atomic=n_cpu*-32). balance_dirty_pages() only consults the atomic and does not consider throttling the next n_cpu*32 dirty pages. If the file_lru is in the 13..200 MiB range then there's absolutely no dirty throttling, which burdens vmscan with only dirty+writeback pages thus resorting to oom kill. It could be argued that tiny containers are not supported, but it's more subtle. It's the amount the space available for file lru that matters. If a container has memory.max-200MiB of non reclaimable memory, then it will also suffer such oom kills on a 100 cpu machine. The following test reliably ooms without this patch. This patch avoids oom kills. $ cat test mount -t cgroup2 none /dev/cgroup cd /dev/cgroup echo +io +memory > cgroup.subtree_control mkdir test cd test echo 10M > memory.max (echo $BASHPID > cgroup.procs && exec /memcg-writeback-stress /foo) (echo $BASHPID > cgroup.procs && exec dd if=/dev/zero of=/foo bs=2M count=100) $ cat memcg-writeback-stress.c /* * Dirty pages from all but one cpu. * Clean pages from the non dirtying cpu. * This is to stress per cpu counter imbalance. * On a 100 cpu machine: * - per memcg per cpu dirty count is 32 pages for each of 99 cpus * - per memcg atomic is -99*32 pages * - thus the complete dirty limit: sum of all counters 0 * - balance_dirty_pages() only sees atomic count -99*32 pages, which * it max()s to 0. * - So a workload can dirty -99*32 pages before balance_dirty_pages() * cares. */ #define _GNU_SOURCE #include <err.h> #include <fcntl.h> #include <sched.h> #include <stdlib.h> #include <stdio.h> #include <sys/stat.h> #include <sys/sysinfo.h> #include <sys/types.h> #include <unistd.h> static char *buf; static int bufSize; static void set_affinity(int cpu) { cpu_set_t affinity; CPU_ZERO(&affinity); CPU_SET(cpu, &affinity); if (sched_setaffinity(0, sizeof(affinity), &affinity)) err(1, "sched_setaffinity"); } static void dirty_on(int output_fd, int cpu) { int i, wrote; set_affinity(cpu); for (i = 0; i < 32; i++) { for (wrote = 0; wrote < bufSize; ) { int ret = write(output_fd, buf+wrote, bufSize-wrote); if (ret == -1) err(1, "write"); wrote += ret; } } } int main(int argc, char **argv) { int cpu, flush_cpu = 1, output_fd; const char *output; if (argc != 2) errx(1, "usage: output_file"); output = argv[1]; bufSize = getpagesize(); buf = malloc(getpagesize()); if (buf == NULL) errx(1, "malloc failed"); output_fd = open(output, O_CREAT|O_RDWR); if (output_fd == -1) err(1, "open(%s)", output); for (cpu = 0; cpu < get_nprocs(); cpu++) { if (cpu != flush_cpu) dirty_on(output_fd, cpu); } set_affinity(flush_cpu); if (fsync(output_fd)) err(1, "fsync(%s)", output); if (close(output_fd)) err(1, "close(%s)", output); free(buf); } Make balance_dirty_pages() and wb_over_bg_thresh() work harder to collect exact per memcg counters. This avoids the aforementioned oom kills. This does not affect the overhead of memory.stat, which still reads the single atomic counter. Why not use percpu_counter? memcg already handles cpus going offline, so no need for that overhead from percpu_counter. And the percpu_counter spinlocks are more heavyweight than is required. It probably also makes sense to use exact dirty and writeback counters in memcg oom reports. But that is saved for later. Link: http://lkml.kernel.org/r/20190329174609.164344-1-gthelen@google.com Signed-off-by: Greg Thelen <gthelen@google.com> Reviewed-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> [4.16+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05psi: clarify the units used in pressure filesWaiman Long1-6/+6
The output of the PSI files show a bunch of numbers with no unit. The psi.txt documentation file also does not indicate what units are used. One can only find out by looking at the source code. The units are percentage for the averages and useconds for the total. Make the information easier to find by documenting the units in psi.txt. Link: http://lkml.kernel.org/r/20190402193810.3450-1-longman@redhat.com Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd()Aneesh Kumar K.V1-0/+36
With some architectures like ppc64, set_pmd_at() cannot cope with a situation where there is already some (different) valid entry present. Use pmdp_set_access_flags() instead to modify the pfn which is built to deal with modifying existing PMD entries. This is similar to commit cae85cb8add3 ("mm/memory.c: fix modifying of page protection by insert_pfn()") We also do similar update w.r.t insert_pfn_pud eventhough ppc64 don't support pud pfn entries now. Without this patch we also see the below message in kernel log "BUG: non-zero pgtables_bytes on freeing mm:" Link: http://lkml.kernel.org/r/20190402115125.18803-1-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reported-by: Chandan Rajendra <chandan@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05hugetlbfs: fix memory leak for resv_mapMike Kravetz1-6/+14
When mknod is used to create a block special file in hugetlbfs, it will allocate an inode and kmalloc a 'struct resv_map' via resv_map_alloc(). inode->i_mapping->private_data will point the newly allocated resv_map. However, when the device special file is opened bd_acquire() will set inode->i_mapping to bd_inode->i_mapping. Thus the pointer to the allocated resv_map is lost and the structure is leaked. Programs to reproduce: mount -t hugetlbfs nodev hugetlbfs mknod hugetlbfs/dev b 0 0 exec 30<> hugetlbfs/dev umount hugetlbfs/ resv_map structures are only needed for inodes which can have associated page allocations. To fix the leak, only allocate resv_map for those inodes which could possibly be associated with page allocations. Link: http://lkml.kernel.org/r/20190401213101.16476-1-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reported-by: Yufen Yu <yuyufen@huawei.com> Suggested-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05mm: fix vm_fault_t cast in VM_FAULT_GET_HINDEX()Jann Horn1-1/+1
Symmetrically to VM_FAULT_SET_HINDEX(), we need a force-cast in VM_FAULT_GET_HINDEX() to tell sparse that this is intentional. Sparse complains about the current code when building a kernel with CONFIG_MEMORY_FAILURE: arch/x86/mm/fault.c:1058:53: warning: restricted vm_fault_t degrades to integer Link: http://lkml.kernel.org/r/20190327204117.35215-1-jannh@google.com Fixes: 3d3539018d2c ("mm: create the new vm_fault_t type") Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Souptick Joarder <jrdr.linux@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05lib/lzo: fix bugs for very short or empty inputDave Rodgman3-9/+12
For very short input data (0 - 1 bytes), lzo-rle was not behaving correctly. Fix this behaviour and update documentation accordingly. For zero-length input, lzo v0 outputs an end-of-stream marker only, which was misinterpreted by lzo-rle as a bitstream version number. Ensure bitstream versions > 0 require a minimum stream length of 5. Also fixes a bug in handling the tail for very short inputs when a bitstream version is present. Link: http://lkml.kernel.org/r/20190326165857.34613-1-dave.rodgman@arm.com Signed-off-by: Dave Rodgman <dave.rodgman@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05include/linux/bitrev.h: fix constant bitrevArnd Bergmann1-23/+23
clang points out with hundreds of warnings that the bitrev macros have a problem with constant input: drivers/hwmon/sht15.c:187:11: error: variable '__x' is uninitialized when used within its own initialization [-Werror,-Wuninitialized] u8 crc = bitrev8(data->val_status & 0x0F); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/bitrev.h:102:21: note: expanded from macro 'bitrev8' __constant_bitrev8(__x) : \ ~~~~~~~~~~~~~~~~~~~^~~~ include/linux/bitrev.h:67:11: note: expanded from macro '__constant_bitrev8' u8 __x = x; \ ~~~ ^ Both the bitrev and the __constant_bitrev macros use an internal variable named __x, which goes horribly wrong when passing one to the other. The obvious fix is to rename one of the variables, so this adds an extra '_'. It seems we got away with this because - there are only a few drivers using bitrev macros - usually there are no constant arguments to those - when they are constant, they tend to be either 0 or (unsigned)-1 (drivers/isdn/i4l/isdnhdlc.o, drivers/iio/amplifiers/ad8366.c) and give the correct result by pure chance. In fact, the only driver that I could find that gets different results with this is drivers/net/wan/slic_ds26522.c, which in turn is a driver for fairly rare hardware (adding the maintainer to Cc for testing). Link: http://lkml.kernel.org/r/20190322140503.123580-1-arnd@arndb.de Fixes: 556d2f055bf6 ("ARM: 8187/1: add CONFIG_HAVE_ARCH_BITREVERSE to support rbit instruction") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Cc: Zhao Qiang <qiang.zhao@nxp.com> Cc: Yalin Wang <yalin.wang@sonymobile.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05kmemleak: powerpc: skip scanning holes in the .bss sectionCatalin Marinas2-5/+18
Commit 2d4f567103ff ("KVM: PPC: Introduce kvm_tmp framework") adds kvm_tmp[] into the .bss section and then free the rest of unused spaces back to the page allocator. kernel_init kvm_guest_init kvm_free_tmp free_reserved_area free_unref_page free_unref_page_prepare With DEBUG_PAGEALLOC=y, it will unmap those pages from kernel. As the result, kmemleak scan will trigger a panic when it scans the .bss section with unmapped pages. This patch creates dedicated kmemleak objects for the .data, .bss and potentially .data..ro_after_init sections to allow partial freeing via the kmemleak_free_part() in the powerpc kvm_free_tmp() function. Link: http://lkml.kernel.org/r/20190321171917.62049-1-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Qian Cai <cai@lca.pw> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Tested-by: Qian Cai <cai@lca.pw> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Avi Kivity <avi@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05lib/string.c: implement a basic bcmpNick Desaulniers2-0/+23
A recent optimization in Clang (r355672) lowers comparisons of the return value of memcmp against zero to comparisons of the return value of bcmp against zero. This helps some platforms that implement bcmp more efficiently than memcmp. glibc simply aliases bcmp to memcmp, but an optimized implementation is in the works. This results in linkage failures for all targets with Clang due to the undefined symbol. For now, just implement bcmp as a tailcail to memcmp to unbreak the build. This routine can be further optimized in the future. Other ideas discussed: * A weak alias was discussed, but breaks for architectures that define their own implementations of memcmp since aliases to declarations are not permitted (only definitions). Arch-specific memcmp implementations typically declare memcmp in C headers, but implement them in assembly. * -ffreestanding also is used sporadically throughout the kernel. * -fno-builtin-bcmp doesn't work when doing LTO. Link: https://bugs.llvm.org/show_bug.cgi?id=41035 Link: https://code.woboq.org/userspace/glibc/string/memcmp.c.html#bcmp Link: https://github.com/llvm/llvm-project/commit/8e16d73346f8091461319a7dfc4ddd18eedcff13 Link: https://github.com/ClangBuiltLinux/linux/issues/416 Link: http://lkml.kernel.org/r/20190313211335.165605-1-ndesaulniers@google.com Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reported-by: Nathan Chancellor <natechancellor@gmail.com> Reported-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Suggested-by: Arnd Bergmann <arnd@arndb.de> Suggested-by: James Y Knight <jyknight@google.com> Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com> Suggested-by: Nathan Chancellor <natechancellor@gmail.com> Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05dm integrity: fix deadlock with overlapping I/OMikulas Patocka1-3/+1
dm-integrity will deadlock if overlapping I/O is issued to it, the bug was introduced by commit 724376a04d1a ("dm integrity: implement fair range locks"). Users rarely use overlapping I/O so this bug went undetected until now. Fix this bug by correcting, likely cut-n-paste, typos in ranges_overlap() and also remove a flawed ranges_overlap() check in remove_range_unlocked(). This condition could leave unprocessed bios hanging on wait_list forever. Cc: stable@vger.kernel.org # v4.19+ Fixes: 724376a04d1a ("dm integrity: implement fair range locks") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-04-05KVM: x86: nVMX: fix x2APIC VTPR read interceptMarc Orr1-1/+1
Referring to the "VIRTUALIZING MSR-BASED APIC ACCESSES" chapter of the SDM, when "virtualize x2APIC mode" is 1 and "APIC-register virtualization" is 0, a RDMSR of 808H should return the VTPR from the virtual APIC page. However, for nested, KVM currently fails to disable the read intercept for this MSR. This means that a RDMSR exit takes precedence over "virtualize x2APIC mode", and KVM passes through L1's TPR to L2, instead of sourcing the value from L2's virtual APIC page. This patch fixes the issue by disabling the read intercept, in VMCS02, for the VTPR when "APIC-register virtualization" is 0. The issue described above and fix prescribed here, were verified with a related patch in kvm-unit-tests titled "Test VMX's virtualize x2APIC mode w/ nested". Signed-off-by: Marc Orr <marcorr@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Fixes: c992384bde84f ("KVM: vmx: speed up MSR bitmap merge") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-05KVM: x86: nVMX: close leak of L0's x2APIC MSRs (CVE-2019-3887)Marc Orr1-28/+44
The nested_vmx_prepare_msr_bitmap() function doesn't directly guard the x2APIC MSR intercepts with the "virtualize x2APIC mode" MSR. As a result, we discovered the potential for a buggy or malicious L1 to get access to L0's x2APIC MSRs, via an L2, as follows. 1. L1 executes WRMSR(IA32_SPEC_CTRL, 1). This causes the spec_ctrl variable, in nested_vmx_prepare_msr_bitmap() to become true. 2. L1 disables "virtualize x2APIC mode" in VMCS12. 3. L1 enables "APIC-register virtualization" in VMCS12. Now, KVM will set VMCS02's x2APIC MSR intercepts from VMCS12, and then set "virtualize x2APIC mode" to 0 in VMCS02. Oops. This patch closes the leak by explicitly guarding VMCS02's x2APIC MSR intercepts with VMCS12's "virtualize x2APIC mode" control. The scenario outlined above and fix prescribed here, were verified with a related patch in kvm-unit-tests titled "Add leak scenario to virt_x2apic_mode_test". Note, it looks like this issue may have been introduced inadvertently during a merge---see 15303ba5d1cd. Signed-off-by: Marc Orr <marcorr@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-05KVM: SVM: prevent DBG_DECRYPT and DBG_ENCRYPT overflowDavid Rientjes1-3/+9
This ensures that the address and length provided to DBG_DECRYPT and DBG_ENCRYPT do not cause an overflow. At the same time, pass the actual number of pages pinned in memory to sev_unpin_memory() as a cleanup. Reported-by: Cfir Cohen <cfir@google.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-05kvm: svm: fix potential get_num_contig_pages overflowDavid Rientjes1-5/+5
get_num_contig_pages() could potentially overflow int so make its type consistent with its usage. Reported-by: Cfir Cohen <cfir@google.com> Cc: stable@vger.kernel.org Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-05tty: mark Siemens R3964 line discipline as BROKENGreg Kroah-Hartman1-1/+1
The n_r3964 line discipline driver was written in a different time, when SMP machines were rare, and users were trusted to do the right thing. Since then, the world has moved on but not this code, it has stayed rooted in the past with its lovely hand-crafted list structures and loads of "interesting" race conditions all over the place. After attempting to clean up most of the issues, I just gave up and am now marking the driver as BROKEN so that hopefully someone who has this hardware will show up out of the woodwork (I know you are out there!) and will help with debugging a raft of changes that I had laying around for the code, but was too afraid to commit as odds are they would break things. Many thanks to Jann and Linus for pointing out the initial problems in this codebase, as well as many reviews of my attempts to fix the issues. It was a case of whack-a-mole, and as you can see, the mole won. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05block: Revert v5.0 blk_mq_request_issue_directly() changesBart Van Assche4-69/+71
blk_mq_try_issue_directly() can return BLK_STS*_RESOURCE for requests that have been queued. If that happens when blk_mq_try_issue_directly() is called by the dm-mpath driver then dm-mpath will try to resubmit a request that is already queued and a kernel crash follows. Since it is nontrivial to fix blk_mq_request_issue_directly(), revert the blk_mq_request_issue_directly() changes that went into kernel v5.0. This patch reverts the following commits: * d6a51a97c0b2 ("blk-mq: replace and kill blk_mq_request_issue_directly") # v5.0. * 5b7a6f128aad ("blk-mq: issue directly with bypass 'false' in blk_mq_sched_insert_requests") # v5.0. * 7f556a44e61d ("blk-mq: refactor the code of issue request directly") # v5.0. Cc: Christoph Hellwig <hch@infradead.org> Cc: Ming Lei <ming.lei@redhat.com> Cc: Jianchao Wang <jianchao.w.wang@oracle.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: James Smart <james.smart@broadcom.com> Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: Laurence Oberman <loberman@redhat.com> Cc: <stable@vger.kernel.org> Reported-by: Laurence Oberman <loberman@redhat.com> Tested-by: Laurence Oberman <loberman@redhat.com> Fixes: 7f556a44e61d ("blk-mq: refactor the code of issue request directly") # v5.0. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>