|
Commit 001dde9400d5 ("mfd: cros ec: spi: Fix "in progress" error
signaling") pointed out some bad code, but its analysis and conclusion
were not 100% correct.
It *is* correct that we should not propagate result==EC_RES_IN_PROGRESS
for transport errors, because this has a special meaning -- that we
should follow up with EC_CMD_GET_COMMS_STATUS until the EC is no longer
busy. This is definitely the wrong thing for many commands, because
among other problems, EC_CMD_GET_COMMS_STATUS doesn't actually retrieve
any RX data from the EC, so commands that expected some data back will
instead start processing junk.
For such commands, the right answer is to either propagate the error
(and return that error to the caller) or resend the original command
(*not* EC_CMD_GET_COMMS_STATUS).
Unfortunately, commit 001dde9400d5 forgets a crucial point: that for
some long-running operations, the EC physically cannot respond to
commands any more. For example, with EC_CMD_FLASH_ERASE, the EC may be
re-flashing its own code regions, so it can't respond to SPI interrupts.
Instead, the EC prepares us ahead of time for being busy for a "long"
time, and fills its hardware buffer with EC_SPI_PAST_END. Thus, we
expect to see several "transport" errors (or, messages filled with
EC_SPI_PAST_END). So we should really translate that to a retryable
error (-EAGAIN) and continue sending EC_CMD_GET_COMMS_STATUS until we
get a ready status.
IOW, it is actually important to treat some of these "junk" values as
retryable errors.
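As a rough illustration (a sketch, not the actual cros_ec_spi.c diff; the
helper name check_response() is hypothetical, the EC_SPI_* constants are the
status bytes discussed above), the translation amounts to:
static int check_response(u8 status_byte)
{
	switch (status_byte) {
	case EC_SPI_PAST_END:		/* EC busy with a long operation */
	case EC_SPI_RX_BAD_DATA:	/* transport-level garbage */
		return -EAGAIN;		/* retryable; caller may poll EC_CMD_GET_COMMS_STATUS */
	default:
		return 0;		/* let normal result handling proceed */
	}
}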
Together with commit 001dde9400d5, this resolves bugs like the
following:
1. EC_CMD_FLASH_ERASE now works again (with commit 001dde9400d5, we
would abort the first time we saw EC_SPI_PAST_END)
2. Before commit 001dde9400d5, transport errors (e.g.,
EC_SPI_RX_BAD_DATA) seen in other commands (e.g.,
EC_CMD_RTC_GET_VALUE) used to yield junk data in the RX buffer; they
will now yield -EAGAIN return values, and tools like 'hwclock' will
simply fail instead of retrieving and re-programming undefined time
values
Fixes: 001dde9400d5 ("mfd: cros ec: spi: Fix "in progress" error signaling")
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
|
|
memory-barriers.txt has been updated with the following requirement.
"When using writel(), a prior wmb() is not needed to guarantee that the
cache coherent memory writes have completed before writing to the MMIO
region."
The current writeX() and iowriteX() implementations on alpha do not
satisfy this requirement, as the barrier comes after the register write.
Move mb() in writeX() and iowriteX() functions to guarantee that HW
observes memory changes before performing register operations.
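Conceptually the change looks like this (a simplified sketch of the ordering
only, not the exact alpha implementation, which goes through the machine
vector indirection):
static inline void example_writel(u32 value, volatile void __iomem *addr)
{
	mb();					/* order prior coherent-memory writes first */
	*(volatile u32 __force *)addr = value;	/* then perform the MMIO register write */
}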
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Matt Turner <mattst88@gmail.com>
|
|
Remove the dma_ops indirection.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Matt Turner <mattst88@gmail.com>
|
|
The generic dma_direct implementation does the same thing as the alpha
pci-noop implementation, just with more bells and whistles. And unlike
the current code it at least has a theoretical chance to actually compile.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Matt Turner <mattst88@gmail.com>
|
|
Commit 08810a4119aa (PM / core: Add NEVER_SKIP and SMART_PREPARE
driver flags) inadvertently prevented the power.direct_complete flag
from being set for devices without PM callbacks and with disabled
runtime PM, which also prevents power.direct_complete from being set
for their parents. That led to problems including a resume crash on
HP ZBook 14u.
Restore the previous behavior by causing power.direct_complete to be
set for those devices again, but do that in a more direct way to
avoid overlooking that case in the future.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199693
Fixes: 08810a4119aa (PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags)
Reported-by: Thomas Martitz <kugel@rockbox.org>
Tested-by: Thomas Martitz <kugel@rockbox.org>
Cc: 4.15+ <stable@vger.kernel.org> # 4.15+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Johan Hovold <johan@kernel.org>
|
|
On some CPUs we can prevent a vulnerability related to store-to-load
forwarding by preventing store forwarding between privilege domains,
by inserting a barrier in kernel entry and exit paths.
This is known to be the case on at least Power7, Power8 and Power9
powerpc CPUs.
Barriers must be inserted generally before the first load after moving
to a higher privilege, and after the last store before moving to a
lower privilege. Both HV and PR privilege transitions must be protected.
Barriers are added as patch sections, with all kernel/hypervisor entry
points patched, and the exit points to lower privilege levels patched
similarly to the RFI flush patching.
Firmware advertisement is not implemented yet, so CPU flush types
are hard coded.
Thanks to Michal Suchánek for bug fixes and review.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michal Suchánek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When a loop block device encounters a writeback error, that error will
get propagated to the bd_inode's wb_err field. If we then detach the
backing file from it, attach another and fsync it, we'll get back the
writeback error that we had from the previous backing file.
This is a bit of a grey area as POSIX doesn't cover loop devices, but it
is somewhat counterintuitive.
If we detach a backing file from the loopdev while there are still
unreported errors, take it as a sign that we're no longer interested in
the previous file, and clear out the wb_err in the loop blockdev.
Reported-and-Tested-by: Theodore Y. Ts'o <tytso@mit.edu>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
kill_ioctx() used to have an explicit RCU delay between removing the
reference from ->ioctx_table and percpu_ref_kill() dropping the refcount.
At some point that delay had been removed, on the theory that
percpu_ref_kill() itself contained an RCU delay. Unfortunately, that was
the wrong kind of RCU delay and it didn't care about rcu_read_lock() used
by lookup_ioctx(). As the result, we could get ctx freed right under
lookup_ioctx(). Tejun has fixed that in a6d7cff472e ("fs/aio: Add explicit
RCU grace period when freeing kioctx"); however, that fix is not enough.
Suppose io_destroy() from one thread races with e.g. io_setup() from another;
CPU1 removes the reference from current->mm->ioctx_table[...] just as CPU2
has picked it (under rcu_read_lock()). Then CPU1 proceeds to drop the
refcount, getting it to 0 and triggering a call of free_ioctx_users(),
which proceeds to drop the secondary refcount and once that reaches zero
calls free_ioctx_reqs(). That does
INIT_RCU_WORK(&ctx->free_rwork, free_ioctx);
queue_rcu_work(system_wq, &ctx->free_rwork);
and schedules freeing the whole thing after RCU delay.
In the meanwhile CPU2 has gotten around to percpu_ref_get(), bumping the
refcount from 0 to 1 and returned the reference to io_setup().
Tejun's fix (that queue_rcu_work() in there) guarantees that ctx won't get
freed until after percpu_ref_get(). Sure, we'd increment the counter before
ctx can be freed. Now we are out of rcu_read_lock() and there's nothing to
stop freeing of the whole thing. Unfortunately, CPU2 assumes that since it
has grabbed the reference, ctx is *NOT* going away until it gets around to
dropping that reference.
The fix is obvious - use percpu_ref_tryget_live() and treat failure as miss.
It's not costlier than what we currently do in normal case, it's safe to
call since freeing *is* delayed and it closes the race window - either
lookup_ioctx() comes before percpu_ref_kill() (in which case ctx->users
won't reach 0 until the caller of lookup_ioctx() drops it) or lookup_ioctx()
fails, ctx->users is unaffected and caller of lookup_ioctx() doesn't see
the object in question at all.
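For reference, the lookup side ends up looking roughly like this (trimmed
fragment; see fs/aio.c for the full lookup_ioctx(), including the table
bounds checks around it):
	rcu_read_lock();
	ctx = rcu_dereference(table->table[id]);
	if (ctx && ctx->user_id == ctx_id) {
		/* Fails once percpu_ref_kill() has run, so a dying/dead ctx
		 * is treated as a lookup miss instead of being resurrected. */
		if (percpu_ref_tryget_live(&ctx->users))
			ret = ctx;
	}
	rcu_read_unlock();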
Cc: stable@kernel.org
Fixes: a6d7cff472e "fs/aio: Add explicit RCU grace period when freeing kioctx"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Open a file, unlink it, then use ioctl(2) to make it immutable or
append-only. Now close it and watch the blocks *not* being freed...
Immutable/append-only checks belong in ->setattr().
Note: the bug is old and backport to anything prior to 737f2e93b972
("ext2: convert to use the new truncate convention") will need
these checks lifted into ext2_setattr().
Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
That can (and does, on some filesystems) happen - ->mkdir() (and thus
vfs_mkdir()) can legitimately leave its argument negative and just
unhash it, counting upon the lookup to pick the object we'd created
next time we try to look at that name.
Some vfs_mkdir() callers forget about that possibility...
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
That can (and does, on some filesystems) happen - ->mkdir() (and thus
vfs_mkdir()) can legitimately leave its argument negative and just
unhash it, counting upon the lookup to pick the object we'd created
next time we try to look at that name.
Some vfs_mkdir() callers forget about that possibility...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
new_sb is left uninitialized in case of early failures in kernfs_mount_ns(),
and while IS_ERR(root) is true in all such cases, using IS_ERR(root) || !new_sb
is not a solution - IS_ERR(root) is true in some cases when new_sb is true.
Make sure new_sb is initialized (and matches the reality) in all cases and
fix the condition for dropping kobj reference - we want it done precisely
in those situations where the reference has not been transferred into a new
super_block instance.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
make sure that info->node is initialized early, so that kernfs_kill_sb()
can list_del() it safely.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
There's an extra C here...
Fixes: 99c18ce580c6 ("cramfs: direct memory access support")
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
RTFS(Documentation/filesystems/nfs/Exporting) if you try to make
something exportable.
Fixes: ac632f5b6301 "befs: add NFS export support"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Making something exportable takes more than providing ->s_export_ops.
In particular, ->lookup() *MUST* use d_splice_alias() instead of
d_add().
Reading Documentation/filesystems/nfs/Exporting would've been a good idea;
as it is, exporting AFFS is badly (and exploitably) broken.
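The pattern being asked for in ->lookup() is roughly the following
(illustrative sketch; example_iget() stands in for the filesystem's own
directory-lookup helper and is not a real function):
static struct dentry *example_lookup(struct inode *dir, struct dentry *dentry,
				     unsigned int flags)
{
	struct inode *inode = example_iget(dir, &dentry->d_name); /* NULL or ERR_PTR allowed */

	/*
	 * d_splice_alias() reuses an existing (possibly disconnected) alias,
	 * e.g. one created by the NFS export code from a file handle,
	 * instead of blindly attaching a second dentry like d_add() would.
	 */
	return d_splice_alias(inode, dentry);
}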
Partially-Fixes: ed4433d72394 "fs/affs: make affs exportable"
Acked-by: David Sterba <dsterba@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
we unlock the directory hash too early - if we are looking at secondary
link and primary (in another directory) gets removed just as we unlock,
we could have the old primary moved in place of the secondary, leaving
us to look into freed entry (and leaving our dentry with ->d_fsdata
pointing to a freed entry).
Cc: stable@vger.kernel.org # 2.4.4+
Acked-by: David Sterba <dsterba@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
We're casting the CDROM layer request_sense to the SCSI sense
buffer, but the former is 64 bytes and the latter is 96 bytes.
As we generally allocate these on the stack, we end up blowing
up the stack.
Fix this by wrapping the scsi_execute() call with a properly
sized sense buffer, and copying back the bits for the CDROM
layer.
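Roughly, the shape of the fix is the following (a hedged sketch; sdev, cgc
and retries come from the surrounding cdrom/sr ioctl code and details of the
real path are omitted):
	unsigned char sense[SCSI_SENSE_BUFFERSIZE];	/* 96 bytes, what the SCSI layer expects */

	result = scsi_execute(sdev, cgc->cmd, cgc->data_direction,
			      cgc->buffer, cgc->buflen, sense, NULL,
			      cgc->timeout, retries, 0, 0, NULL);

	if (cgc->sense)
		memcpy(cgc->sense, sense, sizeof(*cgc->sense));	/* copy back the 64-byte CDROM part */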
Cc: stable@vger.kernel.org
Reported-by: Piotr Gabriel Kosinski <pg.kosinski@gmail.com>
Reported-by: Daniel Shapira <daniel@twistlock.com>
Tested-by: Kees Cook <keescook@chromium.org>
Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
While whitelisting Micron M500DC drives, the tweaked blacklist entry
also enabled queued TRIM for M500IT variants, but these do not support
queued TRIM. While using those SSDs with the latest kernel we have
seen errors and even the partition table getting corrupted.
Some part from the dmesg:
[ 6.727384] ata1.00: ATA-9: Micron_M500IT_MTFDDAK060MBD, MU01, max UDMA/133
[ 6.727390] ata1.00: 117231408 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[ 6.741026] ata1.00: supports DRM functions and may not be fully accessible
[ 6.759887] ata1.00: configured for UDMA/133
[ 6.762256] scsi 0:0:0:0: Direct-Access ATA Micron_M500IT_MT MU01 PQ: 0 ANSI: 5
and then for the error:
[ 120.860334] ata1.00: exception Emask 0x1 SAct 0x7ffc0007 SErr 0x0 action 0x6 frozen
[ 120.860338] ata1.00: irq_stat 0x40000008
[ 120.860342] ata1.00: failed command: SEND FPDMA QUEUED
[ 120.860351] ata1.00: cmd 64/01:00:00:00:00/00:00:00:00:00/a0 tag 0 ncq dma 512 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x5 (timeout)
[ 120.860353] ata1.00: status: { DRDY }
[ 120.860543] ata1: hard resetting link
[ 121.166128] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 121.166376] ata1.00: supports DRM functions and may not be fully accessible
[ 121.186238] ata1.00: supports DRM functions and may not be fully accessible
[ 121.204445] ata1.00: configured for UDMA/133
[ 121.204454] ata1.00: device reported invalid CHS sector 0
[ 121.204541] sd 0:0:0:0: [sda] tag#18 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[ 121.204546] sd 0:0:0:0: [sda] tag#18 Sense Key : 0x5 [current]
[ 121.204550] sd 0:0:0:0: [sda] tag#18 ASC=0x21 ASCQ=0x4
[ 121.204555] sd 0:0:0:0: [sda] tag#18 CDB: opcode=0x93 93 08 00 00 00 00 00 04 28 80 00 00 00 30 00 00
[ 121.204559] print_req_error: I/O error, dev sda, sector 272512
After a few reboots with these errors, the SSD is corrupted.
After blacklisting it, the errors are not seen and the SSD does not get
corrupted any more.
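The blacklist addition is presumably along these lines (the model and
firmware strings are taken from the dmesg above; the exact horkage flags are
an assumption modeled on the existing M500/M550 handling, not the actual
diff):
	/* drivers/ata/libata-core.c, ata_device_blacklist[] (sketch) */
	{ "Micron_M500IT_*",	"MU01",	ATA_HORKAGE_NO_NCQ_TRIM |
					ATA_HORKAGE_ZERO_AFTER_TRIM, },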
Fixes: 243918be6393 ("libata: Do not blacklist Micron M500DC")
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Currently ip6gre and ip6erspan share single metadata mode device,
using 'collect_md_tun'. Thus, when doing:
ip link add dev ip6gre11 type ip6gretap external
ip link add dev ip6erspan12 type ip6erspan external
RTNETLINK answers: File exists
simply fails because the second command tries to create the same
collect_md_tun.
The patch fixes it by adding a separate collect md tunnel device
for the ip6erspan, 'collect_md_tun_erspan'. As a result, a couple
of places need to be refactored/split up in order to distinguish
ip6gre and ip6erspan.
First, move the collect_md check in ip6gre_tunnel_{unlink,link} and
create separate functions {ip6gre,ip6erspan}_tunnel_{link_md,unlink_md}.
Then before link/unlink, make sure link_md/unlink_md is called.
Finally, a separate ndo_uninit is created for ip6erspan. Tested
using samples/bpf/test_tunnel_bpf.sh.
Fixes: ef7baf5e083c ("ip6_gre: add ip6 erspan collect_md mode")
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Detect code patterns where malicious 'speculative store bypass' can be used
and sanitize such patterns.
39: (bf) r3 = r10
40: (07) r3 += -216
41: (79) r8 = *(u64 *)(r7 +0) // slow read
42: (7a) *(u64 *)(r10 -72) = 0 // verifier inserts this instruction
43: (7b) *(u64 *)(r8 +0) = r3 // this store becomes slow due to r8
44: (79) r1 = *(u64 *)(r6 +0) // cpu speculatively executes this load
45: (71) r2 = *(u8 *)(r1 +0) // speculatively arbitrary 'load byte'
// is now sanitized
Above code after x86 JIT becomes:
e5: mov %rbp,%rdx
e8: add $0xffffffffffffff28,%rdx
ef: mov 0x0(%r13),%r14
f3: movq $0x0,-0x48(%rbp)
fb: mov %rdx,0x0(%r14)
ff: mov 0x0(%rbx),%rdi
103: movzbq 0x0(%rdi),%rsi
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Commit 7771c6645700 ("signal/arm: Document conflicts with SI_USER and
SIGFPE") broke the siginfo structure for userspace triggered signals,
causing the strace testsuite to regress. Fix this by eliminating
the FPE_FIXME definition (which is at the root of the breakage) and
use FPE_FLTINV instead for the case where the hardware appears to be
reporting nonsense.
Fixes: 7771c6645700 ("signal/arm: Document conflicts with SI_USER and SIGFPE")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|
|
Commit be83bbf80682 ("mmap: introduce sane default mmap limits") was
introduced to catch problems in various ad-hoc character device drivers
doing mmap and getting the size limits wrong. In the process, it used
"known good" limits for the normal cases of mapping regular files and
block device drivers.
It turns out that the "s_maxbytes" limit was less "known good" than I
thought. In particular, /proc doesn't set it, but exposes one regular
file to mmap: /proc/vmcore. As a result, that file got limited to the
default MAX_INT s_maxbytes value.
This went unnoticed for a while, because apparently the only thing that
needs it is the s390 kernel zfcpdump, but there might be other tools
that use this too.
Vasily suggested just changing s_maxbytes for all of /proc, which isn't
wrong, but makes me nervous at this stage. So instead, just make the
new mmap limit always be MAX_LFS_FILESIZE for regular files, which won't
affect anything else. It wasn't the regular file case I was worried
about.
I'd really prefer for maxsize to have been per-inode, but that is not
how things are today.
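The change boils down to something like this in the mmap size-limit helper
(simplified sketch; only the regular-file branch is shown, the other cases
from be83bbf80682 are unchanged):
	/* Regular files: don't trust s_maxbytes (e.g. /proc/vmcore never sets it) */
	if (S_ISREG(inode->i_mode))
		return MAX_LFS_FILESIZE;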
Fixes: be83bbf80682 ("mmap: introduce sane default mmap limits")
Reported-by: Vasily Gorbik <gor@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
... into a global, two-dimensional array and service subsequent reads from
that cache to avoid rdmsr_on_cpu() calls during CPU hotplug (IPIs with IRQs
disabled).
In addition, this fixes a KASAN slab-out-of-bounds read due to wrong usage
of the bank->blocks pointer.
Fixes: 27bd59502702 ("x86/mce/AMD: Get address from already initialized block")
Reported-by: Johannes Hirte <johannes.hirte@datenkhaos.de>
Tested-by: Johannes Hirte <johannes.hirte@datenkhaos.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20180414004230.GA2033@probook
|
|
Since do_undefinstr() uses get_user to get the undefined
instruction, it can be called before the kprobes recursion
check runs. This can cause an infinite recursive
exception.
Prohibit probing on get_user functions.
Fixes: 24ba613c9d6c ("ARM kprobes: core code")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|
|
Prohibit kprobes on do_undefinstr because kprobes on
ARM is implemented with undefined instructions. This means
that if we probe do_undefinstr(), it can cause an infinite
recursive exception.
Fixes: 24ba613c9d6c ("ARM kprobes: core code")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|
|
Prohibit probing on optimized_callback() because
it is called from kprobes itself. If we put a kprobe
on it, that will cause a recursive call loop.
Mark it NOKPROBE_SYMBOL.
Fixes: 0dc016dbd820 ("ARM: kprobes: enable OPTPROBES for ARM 32")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|
|
Since get_kprobe_ctlblk() uses smp_processor_id() to access
per-cpu variable, it hits smp_processor_id sanity check as below.
[ 7.006928] BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
[ 7.007859] caller is debug_smp_processor_id+0x20/0x24
[ 7.008438] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc1-00192-g4eb17253e4b5 #1
[ 7.008890] Hardware name: Generic DT based system
[ 7.009917] [<c0313f0c>] (unwind_backtrace) from [<c030e6d8>] (show_stack+0x20/0x24)
[ 7.010473] [<c030e6d8>] (show_stack) from [<c0c64694>] (dump_stack+0x84/0x98)
[ 7.010990] [<c0c64694>] (dump_stack) from [<c071ca5c>] (check_preemption_disabled+0x138/0x13c)
[ 7.011592] [<c071ca5c>] (check_preemption_disabled) from [<c071ca80>] (debug_smp_processor_id+0x20/0x24)
[ 7.012214] [<c071ca80>] (debug_smp_processor_id) from [<c03335e0>] (optimized_callback+0x2c/0xe4)
[ 7.013077] [<c03335e0>] (optimized_callback) from [<bf0021b0>] (0xbf0021b0)
To fix this issue, call get_kprobe_ctlblk() right after
disabling IRQs, since that also disables preemption.
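In other words, the ordering in optimized_callback() becomes (sketch):
	unsigned long flags;
	struct kprobe_ctlblk *kcb;

	local_irq_save(flags);		/* also prevents preemption on this CPU */
	kcb = get_kprobe_ctlblk();	/* smp_processor_id() is now stable */
	/* ... run the probe handler ... */
	local_irq_restore(flags);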
Fixes: 0dc016dbd820 ("ARM: kprobes: enable OPTPROBES for ARM 32")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|
|
You can build a kernel in a cross compiling environment that doesn't
have perl in the $PATH. Commit 429f7a062e3b broke that for 32-bit
ARM. Fix it.
As reported by Stephen Rothwell, it appears that the symbols can be
either part of the BSS section or absolute symbols depending on the
binutils version. When they're an absolute symbol, the $(( ))
operator errors out and the build fails. Fix this as well.
Fixes: 429f7a062e3b ("ARM: decompressor: fix BSS size calculation")
Reported-by: Rob Landley <rob@landley.net>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Rob Landley <rob@landley.net>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|
|
How we got to machine_crash_nonpanic_core() (iow, from an IPI, etc) is
not interesting for debugging a crash. The more interesting context
is the parent context prior to the IPI being received.
Record the parent context register state rather than the register state
in machine_crash_nonpanic_core(), which is more relevant to the failing
condition.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|
|
When a panic() occurs, the kexec code uses smp_send_stop() to stop
the other CPUs, but this results in the CPU register state not being
saved, and gdb is unable to inspect the state of other CPUs.
Commit 0ee59413c967 ("x86/panic: replace smp_send_stop() with kdump
friendly version in panic path") addressed the issue on x86, but
ignored other architectures. Address the issue on ARM by splitting
out the crash stop implementation to crash_smp_send_stop() and
adding the necessary protection.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|
|
The hypervisor setup before __enter_kernel destroys the value
stored in r1. The value needs to be restored just before the jump.
Fixes: 6b52f7bdb888 ("ARM: hyp-stub: Use r1 for the soft-restart address")
Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|
|
In commit 639da5ee374b ("ARM: add an extra temp register to the low
level debugging addruart macro") an additional temporary register was
added to the addruart macro, but the decompressor code wasn't updated.
Fixes: 639da5ee374b ("ARM: add an extra temp register to the low level debugging addruart macro")
Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|
|
The x86 mmap() code selects the mmap base for an allocation depending on
the bitness of the syscall. For 64-bit syscalls it selects mm->mmap_base and
for 32-bit syscalls mm->mmap_compat_base.
exec() calls mmap() which in turn uses in_compat_syscall() to check whether
the mapping is for a 32bit or a 64bit task. The decision is made on the
following criteria:
ia32     child->thread.status & TS_COMPAT
x32      child->pt_regs.orig_ax & __X32_SYSCALL_BIT
64-bit   !ia32 && !x32
__set_personality_x32() was dropping the TS_COMPAT flag, but
set_personality_64bit() kept the compat syscall flag, making
in_compat_syscall() return true during the first exec() syscall.
This in turn has user-visible effects, as mentioned by Alexey:
1) It breaks ASAN
$ gcc -fsanitize=address wrap.c -o wrap-asan
$ ./wrap32 ./wrap-asan true
==1217==Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING.
==1217==ASan shadow was supposed to be located in the [0x00007fff7000-0x10007fff7fff] range.
==1217==Process memory map follows:
0x000000400000-0x000000401000 /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
0x000000600000-0x000000601000 /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
0x000000601000-0x000000602000 /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
0x0000f7dbd000-0x0000f7de2000 /lib64/ld-2.27.so
0x0000f7fe2000-0x0000f7fe3000 /lib64/ld-2.27.so
0x0000f7fe3000-0x0000f7fe4000 /lib64/ld-2.27.so
0x0000f7fe4000-0x0000f7fe5000
0x7fed9abff000-0x7fed9af54000
0x7fed9af54000-0x7fed9af6b000 /lib64/libgcc_s.so.1
[snip]
2) It doesn't seem to be great for security if an attacker always knows
that ld.so is going to be mapped into the first 4GB in this case
(the same thing happens for PIEs as well).
The testcase:
$ cat wrap.c
int main(int argc, char *argv[]) {
execvp(argv[1], &argv[1]);
return 127;
}
$ gcc wrap.c -o wrap
$ LD_SHOW_AUXV=1 ./wrap ./wrap true |& grep AT_BASE
AT_BASE: 0x7f63b8309000
AT_BASE: 0x7faec143c000
AT_BASE: 0x7fbdb25fa000
$ gcc -m32 wrap.c -o wrap32
$ LD_SHOW_AUXV=1 ./wrap32 ./wrap true |& grep AT_BASE
AT_BASE: 0xf7eff000
AT_BASE: 0xf7cee000
AT_BASE: 0x7f8b9774e000
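The direction of the fix, then, is to make the native 64-bit path drop the
inherited compat state as well, roughly (a sketch under the assumption that
clearing TS_COMPAT in set_personality_64bit() is the essential part; the
actual patch may also adjust the x32 bit in orig_ax):
void set_personality_64bit(void)
{
	/* ... existing 64-bit personality setup ... */

	/* Don't let a 32-bit parent's compat state leak across exec(). */
	current->thread.status &= ~TS_COMPAT;
}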
Fixes: 1b028f784e8c ("x86/mm: Introduce mmap_compat_base() for 32-bit mmap()")
Fixes: ada26481dfe6 ("x86/mm: Make in_compat_syscall() work during exec")
Reported-by: Alexey Izbyshev <izbyshev@ispras.ru>
Bisected-by: Alexander Monakov <amonakov@ispras.ru>
Investigated-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Alexander Monakov <amonakov@ispras.ru>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: stable@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/20180517233510.24996-1-dima@arista.com
|
|
With the following commit:
fd35c88b7417 ("objtool: Support GCC 8 switch tables")
I added a "can't find switch jump table" warning, to stop covering up
silent failures if add_switch_table() can't find anything.
That warning found yet another bug in the objtool switch table detection
logic. For cases 1 and 2 (as described in the comments of
find_switch_table()), the find_symbol_containing() check doesn't adjust
the offset for RIP-relative switch jumps.
Incidentally, this bug was already fixed for case 3 with:
6f5ec2993b1f ("objtool: Detect RIP-relative switch table references")
However, that commit missed the fix for cases 1 and 2.
The different cases are now starting to look more and more alike. So
fix the bug by consolidating them into a single case, by checking the
original dynamic jump instruction in the case 3 loop.
This also simplifies the code and makes it more robust against future
switch table detection issues -- of which I'm sure there will be many...
Switch table detection has been the most fragile area of objtool, by
far. I long for the day when we'll have a GCC plugin for annotating
switch tables. Linus asked me to delay such a plugin due to the
flakiness of the plugin infrastructure in older versions of GCC, so this
rickety code is what we're stuck with for now. At least the code is now
a little simpler than it was.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f400541613d45689086329432f3095119ffbc328.1526674218.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
When CONFIG_RANDOMIZE_TEXT_OFFSET=y, TEXT_OFFSET is an arbitrary
multiple of PAGE_SIZE in the interval [0, 2MB).
The EFI stub does not account for the potential misalignment of
TEXT_OFFSET relative to EFI_KIMG_ALIGN, and produces a randomized
physical offset which is always a round multiple of EFI_KIMG_ALIGN.
This may result in statically allocated objects whose alignment exceeds
PAGE_SIZE appearing misaligned in memory. This has been observed to
result in spurious stack overflow reports and failure to make use of
the IRQ stacks, and theoretically could result in a number of other
issues.
We can OR in the low bits of TEXT_OFFSET to ensure that we have the
necessary offset (and hence preserve the misalignment of TEXT_OFFSET
relative to EFI_KIMG_ALIGN), so let's do that.
Reported-by: Kim Phillips <kim.phillips@arm.com>
Tested-by: Kim Phillips <kim.phillips@arm.com>
[ardb: clarify comment and commit log, drop unneeded parens]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Fixes: 6f26b3671184c36d ("arm64: kaslr: increase randomization granularity")
Link: http://lkml.kernel.org/r/20180518140841.9731-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
syzbot is reporting ODEBUG messages at hfsplus_fill_super() [1]. This
is because hfsplus_fill_super() forgot to call cancel_delayed_work_sync().
As far as I can see, it is hfsplus_mark_mdb_dirty() from
hfsplus_new_inode() in hfsplus_fill_super() that calls
queue_delayed_work(). Therefore, I assume that hfsplus_new_inode() does
not fail if queue_delayed_work() was called, and the out_put_hidden_dir
label is the appropriate location to call cancel_delayed_work_sync().
[1] https://syzkaller.appspot.com/bug?id=a66f45e96fdbeb76b796bf46eb25ea878c42a6c9
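The resulting error path is then roughly (sketch; sbi->sync_work is assumed
to be the delayed work that hfsplus_mark_mdb_dirty() queues):
out_put_hidden_dir:
	/* Queued by hfsplus_mark_mdb_dirty() via hfsplus_new_inode() above. */
	cancel_delayed_work_sync(&sbi->sync_work);
	iput(sbi->hidden_dir);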
Link: http://lkml.kernel.org/r/964a8b27-cd69-357c-fe78-76b066056201@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+4f2e5f086147d543ab03@syzkaller.appspotmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Ernesto A. Fernandez <ernesto.mnd.fernandez@gmail.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
It is unsafe to do virtual to physical translations before mm_init() is
called if struct page is needed in order to determine the memory section
number (see SECTION_IN_PAGE_FLAGS). This is because only in mm_init()
do we initialize struct pages for all the allocated memory when deferred
struct pages are used.
My recent fix in commit c9e97a1997 ("mm: initialize pages on demand
during boot") exposed this problem, because it greatly reduced number of
pages that are initialized before mm_init(), but the problem existed
even before my fix, as Fengguang Wu found.
Below is a more detailed explanation of the problem.
We initialize struct pages in four places:
1. Early in boot a small set of struct pages is initialized to fill the
first section, and lower zones.
2. During mm_init() we initialize "struct pages" for all the memory that
is allocated, i.e. reserved in memblock.
3. Using on-demand logic when pages are allocated after mm_init call
(when memblock is finished)
4. After smp_init(), when the rest of the deferred pages are initialized.
The problem occurs if we try to do va to phys translation of memory
between steps 1 and 2. Because we have not yet initialized struct pages
for all the reserved pages, it is inherently unsafe to do va to phys if
the translation itself requires access to "struct page", as in the case
of this combination: CONFIG_SPARSEMEM && !CONFIG_SPARSEMEM_VMEMMAP
The following path exposes the problem:
start_kernel()
trap_init()
setup_cpu_entry_areas()
setup_cpu_entry_area(cpu)
get_cpu_gdt_paddr(cpu)
per_cpu_ptr_to_phys(addr)
pcpu_addr_to_page(addr)
virt_to_page(addr)
pfn_to_page(__pa(addr) >> PAGE_SHIFT)
We disable this path by not allowing NEED_PER_CPU_KM with deferred
struct pages feature.
The problems are discussed in these threads:
http://lkml.kernel.org/r/20180418135300.inazvpxjxowogyge@wfg-t540p.sh.intel.com
http://lkml.kernel.org/r/20180419013128.iurzouiqxvcnpbvz@wfg-t540p.sh.intel.com
http://lkml.kernel.org/r/20180426202619.2768-1-pasha.tatashin@oracle.com
Link: http://lkml.kernel.org/r/20180515175124.1770-1-pasha.tatashin@oracle.com
Fixes: 3a80a7fa7989 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Dennis Zhou <dennisszhou@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
A new patchwork project is created to track kselftest patches. Update
the kselftest entry in the MAINTAINERS file adding 'Q:' entry:
https://patchwork.kernel.org/project/linux-kselftest/list/
Link: http://lkml.kernel.org/r/20180515164427.12201-1-shuah@kernel.org
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix a race in the multi-order iteration code which causes the kernel to
hit a GP fault. This was first seen with a production v4.15 based
kernel (4.15.6-300.fc27.x86_64) utilizing a DAX workload which used
order 9 PMD DAX entries.
The race has to do with how we tear down multi-order sibling entries
when we are removing an item from the tree. Remember for example that
an order 2 entry looks like this:
struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
where 'entry' is in some slot in the struct radix_tree_node, and the
three slots following 'entry' contain sibling pointers which point back
to 'entry.'
When we delete 'entry' from the tree, we call :
radix_tree_delete()
radix_tree_delete_item()
__radix_tree_delete()
replace_slot()
replace_slot() first removes the siblings in order from the first to the
last, and then replaces 'entry' with NULL. This means that for a
brief period of time we end up with one or more of the siblings removed,
so:
struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
This causes an issue if you have a reader iterating over the slots in
the tree via radix_tree_for_each_slot() while only under
rcu_read_lock()/rcu_read_unlock() protection. This is a common case in
mm/filemap.c.
The issue is that when __radix_tree_next_slot() => skip_siblings() tries
to skip over the sibling entries in the slots, it currently does so with
an exact match on the slot directly preceding our current slot.
Normally this works:
V preceding slot
struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
^ current slot
This lets you find the first sibling, and you skip them all in order.
But in the case where one of the siblings is NULL, that slot is skipped
and then our sibling detection is interrupted:
V preceding slot
struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
^ current slot
This means that the sibling pointers aren't recognized since they point
all the way back to 'entry', so we think that they are normal internal
radix tree pointers. This causes us to think we need to walk down to a
struct radix_tree_node starting at the address of 'entry'.
In a real running kernel this will crash the thread with a GP fault when
you try and dereference the slots in your broken node starting at
'entry'.
We fix this race by fixing the way that skip_siblings() detects sibling
nodes. Instead of testing against the preceding slot we instead look
for siblings via is_sibling_entry() which compares against the position
of the struct radix_tree_node.slots[] array. This ensures that sibling
entries are properly identified, even if they are no longer contiguous
with the 'entry' they point to.
Link: http://lkml.kernel.org/r/20180503192430.7582-6-ross.zwisler@linux.intel.com
Fixes: 148deab223b2 ("radix-tree: improve multiorder iterators")
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: CR, Sapthagirish <sapthagirish.cr@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add a test which shows a race in the multi-order iteration code. This
test reliably hits the race in under a second on my machine, and is the
result of a real bug report against kernel a production v4.15 based
kernel (4.15.6-300.fc27.x86_64). With a real kernel this issue is hit
when using order 9 PMD DAX radix tree entries.
The race has to do with how we tear down multi-order sibling entries
when we are removing an item from the tree. Remember that an order 2
entry looks like this:
struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
where 'entry' is in some slot in the struct radix_tree_node, and the
three slots following 'entry' contain sibling pointers which point back
to 'entry.'
When we delete 'entry' from the tree, we call :
radix_tree_delete()
radix_tree_delete_item()
__radix_tree_delete()
replace_slot()
replace_slot() first removes the siblings in order from the first to the
last, and then replaces 'entry' with NULL. This means that for a
brief period of time we end up with one or more of the siblings removed,
so:
struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
This causes an issue if you have a reader iterating over the slots in
the tree via radix_tree_for_each_slot() while only under
rcu_read_lock()/rcu_read_unlock() protection. This is a common case in
mm/filemap.c.
The issue is that when __radix_tree_next_slot() => skip_siblings() tries
to skip over the sibling entries in the slots, it currently does so with
an exact match on the slot directly preceding our current slot.
Normally this works:
V preceding slot
struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
^ current slot
This lets you find the first sibling, and you skip them all in order.
But in the case where one of the siblings is NULL, that slot is skipped
and then our sibling detection is interrupted:
V preceding slot
struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
^ current slot
This means that the sibling pointers aren't recognized since they point
all the way back to 'entry', so we think that they are normal internal
radix tree pointers. This causes us to think we need to walk down to a
struct radix_tree_node starting at the address of 'entry'.
In a real running kernel this will crash the thread with a GP fault when
you try and dereference the slots in your broken node starting at
'entry'.
In the radix tree test suite this will be caught by the address
sanitizer:
==27063==ERROR: AddressSanitizer: heap-buffer-overflow on address
0x60c0008ae400 at pc 0x00000040ce4f bp 0x7fa89b8fcad0 sp 0x7fa89b8fcac0
READ of size 8 at 0x60c0008ae400 thread T3
#0 0x40ce4e in __radix_tree_next_slot /home/rzwisler/project/linux/tools/testing/radix-tree/radix-tree.c:1660
#1 0x4022cc in radix_tree_next_slot linux/../../../../include/linux/radix-tree.h:567
#2 0x4022cc in iterator_func /home/rzwisler/project/linux/tools/testing/radix-tree/multiorder.c:655
#3 0x7fa8a088d50a in start_thread (/lib64/libpthread.so.0+0x750a)
#4 0x7fa8a03bd16e in clone (/lib64/libc.so.6+0xf516e)
Link: http://lkml.kernel.org/r/20180503192430.7582-5-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: CR, Sapthagirish <sapthagirish.cr@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Currently the lifetimes of "struct item" entries in the radix tree are
not controlled by RCU; instead they are deleted inline as they are
removed from the tree.
In the following patches we add a test which has threads iterating over
items pulled from the tree and verifying them in an
rcu_read_lock()/rcu_read_unlock() section. This means that though an
item has been removed from the tree it could still be being worked on by
other threads until the RCU grace period expires. So, we need to
actually free the "struct item" structures at the end of the grace
period, just as we do with "struct radix_tree_node" items.
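Concretely, this means giving "struct item" an rcu_head and deferring the
free, along these lines (a sketch of the test-suite change; the field and
helper names here are illustrative):
struct item {
	struct rcu_head	rcu_head;
	unsigned long	index;
	unsigned int	order;
};

static void item_rcu_free(struct rcu_head *head)
{
	free(container_of(head, struct item, rcu_head));
}

void item_free(struct item *item, unsigned long index)
{
	assert(item->index == index);
	call_rcu(&item->rcu_head, item_rcu_free);	/* was: free(item) */
}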
Link: http://lkml.kernel.org/r/20180503192430.7582-4-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: CR, Sapthagirish <sapthagirish.cr@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pulled from a patch from Matthew Wilcox entitled "xarray: Add definition
of struct xarray":
> From: Matthew Wilcox <mawilcox@microsoft.com>
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
https://patchwork.kernel.org/patch/10341249/
These defines fix this compilation error:
In file included from ./linux/radix-tree.h:6:0,
from ./linux/../../../../include/linux/idr.h:15,
from ./linux/idr.h:1,
from idr.c:4:
./linux/../../../../include/linux/idr.h: In function `idr_init_base':
./linux/../../../../include/linux/radix-tree.h:129:2: warning: implicit declaration of function `spin_lock_init'; did you mean `spinlock_t'? [-Wimplicit-function-declaration]
spin_lock_init(&(root)->xa_lock); \
^
./linux/../../../../include/linux/idr.h:126:2: note: in expansion of macro `INIT_RADIX_TREE'
INIT_RADIX_TREE(&idr->idr_rt, IDR_RT_MARKER);
^~~~~~~~~~~~~~~
by providing a spin_lock_init() wrapper for the v4.17-rc* version of the
radix tree test suite.
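The wrapper itself is a one-liner of roughly this shape (an assumption based
on the test suite backing kernel spinlocks with pthread mutexes):
#define spin_lock_init(x)	pthread_mutex_init(x, NULL)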
Link: http://lkml.kernel.org/r/20180503192430.7582-3-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: CR, Sapthagirish <sapthagirish.cr@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit c6ce3e2fe3da ("radix tree test suite: Add config option for map
shift") introduced a phony makefile target called 'mapshift' that ends
up generating the file generated/map-shift.h. This phony target was
then added as a dependency of the top level 'targets' build target,
which is what is run when you go to tools/testing/radix-tree and just
type 'make'.
Unfortunately, this phony target doesn't actually work as a dependency,
so you end up getting:
$ make
make: *** No rule to make target 'generated/map-shift.h', needed by 'main.o'. Stop.
make: *** Waiting for unfinished jobs....
Fix this by making the file generated/map-shift.h our real makefile
target, and add it as a dependency of the top-level build target.
Link: http://lkml.kernel.org/r/20180503192430.7582-2-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: CR, Sapthagirish <sapthagirish.cr@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In many places in drivers and file systems, errors were handled in a
common way like below:
ret = (ret == -ENOMEM) ? VM_FAULT_OOM : VM_FAULT_SIGBUS;
vmf_error() will replace this and return vm_fault_t type err.
A lot of drivers and filesystems currently have a rather complex mapping
of errno-to-VM_FAULT code. We have been able to eliminate a lot of it
by just returning VM_FAULT codes directly from functions which are
called exclusively from the fault handling path.
Some functions can be called both from the fault handler and other
context which are expecting an errno, so they have to continue to return
an errno. Some users still need to choose different behaviour for
different errnos, but vmf_error() captures the essential error
translation that's common to all users, and those that need to handle
additional errors can handle them first.
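Based on the description above, the helper is essentially this translation:
static inline vm_fault_t vmf_error(int err)
{
	if (err == -ENOMEM)
		return VM_FAULT_OOM;
	return VM_FAULT_SIGBUS;
}
A caller that used to open-code the ?: expression can then simply do
"return vmf_error(ret);".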
Link: http://lkml.kernel.org/r/20180510174826.GA14268@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
I had neglected to increment the error counter when the tests failed,
which made the tests noisy when they failed, but did not actually
make them return an error code.
Link: http://lkml.kernel.org/r/20180509114328.9887-1-mpe@ellerman.id.au
Fixes: 3cc78125a081 ("lib/test_bitmap.c: add optimisation tests")
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Yury Norov <ynorov@caviumnetworks.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If DELL_WMI "select"s DELL_SMBIOS, the DELL_SMBIOS dependencies are
ignored and it is still possible to end up with unmet direct
dependencies.
Change the select to a depends on.
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
|
|
Correct the indirect register offsets used when collecting TX rate limit
info in UP CIM logs.
Also, T5 doesn't support these indirect register offsets, so remove
them from the collection logic.
Fixes: be6e36d916b1 ("cxgb4: collect TX rate limit info in UP CIM logs")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Hangbin reported an Oops triggered by the syzkaller qdisc rules:
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN PTI
Modules linked in: sch_red
CPU: 0 PID: 28699 Comm: syz-executor5 Not tainted 4.17.0-rc4.kcov #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:qdisc_hash_add+0x26/0xa0
RSP: 0018:ffff8800589cf470 EFLAGS: 00010203
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff824ad971
RDX: 0000000000000007 RSI: ffffc9000ce9f000 RDI: 000000000000003c
RBP: 0000000000000001 R08: ffffed000b139ea2 R09: ffff8800589cf4f0
R10: ffff8800589cf50f R11: ffffed000b139ea2 R12: ffff880054019fc0
R13: ffff880054019fb4 R14: ffff88005c0af600 R15: ffff880054019fb0
FS: 00007fa6edcb1700(0000) GS:ffff88005ce00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000740 CR3: 000000000fc16000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
red_change+0x2d2/0xed0 [sch_red]
qdisc_create+0x57e/0xef0
tc_modify_qdisc+0x47f/0x14e0
rtnetlink_rcv_msg+0x6a8/0x920
netlink_rcv_skb+0x2a2/0x3c0
netlink_unicast+0x511/0x740
netlink_sendmsg+0x825/0xc30
sock_sendmsg+0xc5/0x100
___sys_sendmsg+0x778/0x8e0
__sys_sendmsg+0xf5/0x1b0
do_syscall_64+0xbd/0x3b0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x450869
RSP: 002b:00007fa6edcb0c48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fa6edcb16b4 RCX: 0000000000450869
RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000013
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000008778 R14: 0000000000702838 R15: 00007fa6edcb1700
Code: e9 0b fe ff ff 0f 1f 44 00 00 55 53 48 89 fb 89 f5 e8 3f 07 f3 fe 48 8d 7b 3c 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 51
RIP: qdisc_hash_add+0x26/0xa0 RSP: ffff8800589cf470
When a red qdisc is updated with a 0 limit, the child qdisc is left
unmodified, no additional scheduler is created in red_change(),
the 'child' local variable is rightfully NULL, and we must not add it
to the hash table.
This change addresses the above issue moving qdisc_hash_add() right
after the child qdisc creation. It additionally removes unneeded checks
for noop_qdisc.
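The relevant part of red_change() then looks roughly like this (simplified
sketch):
	if (ctl->limit > 0) {
		child = fifo_create_dflt(sch, &bfifo_qdisc_ops, ctl->limit,
					 extack);
		if (IS_ERR(child))
			return PTR_ERR(child);

		/* child is a real fifo here, so hashing it is always valid */
		qdisc_hash_add(child, true);
	}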
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Fixes: 49b499718fa1 ("net: sched: make default fifo qdiscs appear in the dump")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|