wireguard-linux - WireGuard for the Linux kernel

Age	Commit message	Author	Files	Lines
2018-05-26	aio: simplify KIOCB_KEY handling	Christoph Hellwig	2	-9/+7
	No need to pass the key field to lookup_iocb to compare it with KIOCB_KEY, as we can do that right after retrieving it from userspace. Also move the KIOCB_KEY definition to aio.c as it is an internal value not used by any other place in the kernel. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-26	fs: introduce new ->get_poll_head and ->poll_mask methods	Christoph Hellwig	5	-7/+50
	->get_poll_head returns the waitqueue that the poll operation is going to sleep on. Note that this means we can only use a single waitqueue for the poll, unlike some current drivers that use two waitqueues for different events. But now that we have keyed wakeups and heavily use those for poll there aren't that many good reasons left to keep the multiple waitqueues, and if there are any, ->poll is still around; the driver just won't support aio poll. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-26	fs: add new vfs_poll and file_can_poll helpers	Christoph Hellwig	9	-38/+32
	These abstract out calls to the poll method in preparation for changes in how we poll. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-26	fs: update documentation to mention __poll_t and match the code	Christoph Hellwig	2	-2/+2
	Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26	fs: cleanup do_pollfd	Christoph Hellwig	1	-25/+23
	Use straightline code with failure handling gotos instead of a lot of nested conditionals. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
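The "straightline code with failure handling gotos" style mentioned above can be illustrated with a small userspace sketch. This is not the do_pollfd() code; `dup_twice` is a hypothetical helper invented for illustration. Each failure jumps to a label that undoes only what has been set up so far, so the success path reads top to bottom without nesting.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from the kernel): copy src into two freshly
 * allocated buffers. Returns 0 on success, -1 on failure. On failure
 * nothing is leaked and *a / *b are left untouched.
 */
int dup_twice(const char *src, char **a, char **b)
{
	size_t len;
	char *first, *second;

	if (!src)
		goto fail;
	len = strlen(src) + 1;

	first = malloc(len);
	if (!first)
		goto fail;

	second = malloc(len);
	if (!second)
		goto free_first;

	memcpy(first, src, len);
	memcpy(second, src, len);
	*a = first;
	*b = second;
	return 0;

free_first:
	/* Unwind in reverse order of setup. */
	free(first);
fail:
	return -1;
}
```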
2018-05-26	fs: unexport poll_schedule_timeout	Christoph Hellwig	2	-4/+1
	No users outside of select.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-26	uapi: turn __poll_t sparse checks on by default	Christoph Hellwig	1	-4/+0
	Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-23	fix io_destroy()/aio_complete() race	Al Viro	1	-2/+1
	If io_destroy() gets to cancelling everything that can be cancelled and gets to kiocb_cancel() calling the function driver has left in ->ki_cancel, it becomes vulnerable to a race with IO completion. At that point req is already taken off the list and aio_complete() does NOT spin until we (in free_ioctx_users()) release ->ctx_lock. As a result, it proceeds to kiocb_free(), freeing req just as it gets passed to ->ki_cancel(). Fix is simple - remove from the list after the call of kiocb_cancel(). All instances of ->ki_cancel() already have to cope with being called with iocb still on list - that's what happens in io_cancel(2). Cc: stable@kernel.org Fixes: 0460fef2a921 "aio: use cancellation list lazily" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-21	aio: fix io_destroy(2) vs. lookup_ioctx() race	Al Viro	1	-2/+2
	kill_ioctx() used to have an explicit RCU delay between removing the reference from ->ioctx_table and percpu_ref_kill() dropping the refcount. At some point that delay had been removed, on the theory that percpu_ref_kill() itself contained an RCU delay. Unfortunately, that was the wrong kind of RCU delay and it didn't care about rcu_read_lock() used by lookup_ioctx(). As the result, we could get ctx freed right under lookup_ioctx(). Tejun has fixed that in a6d7cff472e ("fs/aio: Add explicit RCU grace period when freeing kioctx"); however, that fix is not enough. Suppose io_destroy() from one thread races with e.g. io_setup() from another; CPU1 removes the reference from current->mm->ioctx_table[...] just as CPU2 has picked it (under rcu_read_lock()). Then CPU1 proceeds to drop the refcount, getting it to 0 and triggering a call of free_ioctx_users(), which proceeds to drop the secondary refcount and once that reaches zero calls free_ioctx_reqs(). That does INIT_RCU_WORK(&ctx->free_rwork, free_ioctx); queue_rcu_work(system_wq, &ctx->free_rwork); and schedules freeing the whole thing after RCU delay. In the meanwhile CPU2 has gotten around to percpu_ref_get(), bumping the refcount from 0 to 1 and returned the reference to io_setup(). Tejun's fix (that queue_rcu_work() in there) guarantees that ctx won't get freed until after percpu_ref_get(). Sure, we'd increment the counter before ctx can be freed. Now we are out of rcu_read_lock() and there's nothing to stop freeing of the whole thing. Unfortunately, CPU2 assumes that since it has grabbed the reference, ctx is NOT going away until it gets around to dropping that reference. The fix is obvious - use percpu_ref_tryget_live() and treat failure as miss. 
It's not costlier than what we currently do in normal case, it's safe to call since freeing is delayed and it closes the race window - either lookup_ioctx() comes before percpu_ref_kill() (in which case ctx->users won't reach 0 until the caller of lookup_ioctx() drops it) or lookup_ioctx() fails, ctx->users is unaffected and caller of lookup_ioctx() doesn't see the object in question at all. Cc: stable@kernel.org Fixes: a6d7cff472e "fs/aio: Add explicit RCU grace period when freeing kioctx" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
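The difference between an unconditional get and a "tryget live" can be sketched in userspace with C11 atomics. This is a deliberately simplified model, not the kernel's percpu_ref implementation: `struct ref` and all `ref_*` names are hypothetical, and a single atomic counter plus a `dead` flag stand in for the per-CPU machinery. It only shows why a lookup that refuses to take a reference on a killed object closes the window described above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a killable reference count. */
struct ref {
	atomic_long count;	/* number of holders */
	atomic_bool dead;	/* set once teardown has started */
};

void ref_init(struct ref *r)
{
	atomic_init(&r->count, 1);
	atomic_init(&r->dead, false);
}

/* Unconditional get: happily bumps 0 -> 1, resurrecting a dying object. */
void ref_get(struct ref *r)
{
	atomic_fetch_add(&r->count, 1);
}

void ref_put(struct ref *r)
{
	atomic_fetch_sub(&r->count, 1);
}

/* Start teardown: mark dead, then drop the initial reference. */
void ref_kill(struct ref *r)
{
	atomic_store(&r->dead, true);
	atomic_fetch_sub(&r->count, 1);
}

/* Take a reference only if the object is still live; failure means "miss". */
bool ref_tryget_live(struct ref *r)
{
	long old = atomic_load(&r->count);

	if (atomic_load(&r->dead))
		return false;
	while (old > 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
	}
	return false;
}
```

A lookup built on `ref_tryget_live()` would treat a `false` return exactly like "object not found", which mirrors the "treat failure as miss" wording of the commit message.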
2018-05-21	ext2: fix a block leak	Al Viro	1	-10/+0
	open file, unlink it, then use ioctl(2) to make it immutable or append only. Now close it and watch the blocks not freed... Immutable/append-only checks belong in ->setattr(). Note: the bug is old and backport to anything prior to 737f2e93b972 ("ext2: convert to use the new truncate convention") will need these checks lifted into ext2_setattr(). Cc: stable@kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-21	nfsd: vfs_mkdir() might succeed leaving dentry negative unhashed	Al Viro	1	-0/+22
	That can (and does, on some filesystems) happen - ->mkdir() (and thus vfs_mkdir()) can legitimately leave its argument negative and just unhash it, counting upon the lookup to pick the object we'd created next time we try to look at that name. Some vfs_mkdir() callers forget about that possibility... Acked-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-21	cachefiles: vfs_mkdir() might succeed leaving dentry negative unhashed	Al Viro	1	-0/+10
	That can (and does, on some filesystems) happen - ->mkdir() (and thus vfs_mkdir()) can legitimately leave its argument negative and just unhash it, counting upon the lookup to pick the object we'd created next time we try to look at that name. Some vfs_mkdir() callers forget about that possibility... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-21	unfuck sysfs_mount()	Al Viro	1	-3/+3
	new_sb is left uninitialized in case of early failures in kernfs_mount_ns(), and while IS_ERR(root) is true in all such cases, using IS_ERR(root) || !new_sb is not a solution - IS_ERR(root) is true in some cases when new_sb is true. Make sure new_sb is initialized (and matches the reality) in all cases and fix the condition for dropping kobj reference - we want it done precisely in those situations where the reference has not been transferred into a new super_block instance. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-21	kernfs: deal with kernfs_fill_super() failures	Al Viro	1	-0/+1
	make sure that info->node is initialized early, so that kernfs_kill_sb() can list_del() it safely. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-21	cramfs: Fix IS_ENABLED typo	Joe Perches	1	-1/+1
	There's an extra C here... Fixes: 99c18ce580c6 ("cramfs: direct memory access support") Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-21	befs_lookup(): use d_splice_alias()	Al Viro	1	-12/+5
	RTFS(Documentation/filesystems/nfs/Exporting) if you try to make something exportable. Fixes: ac632f5b6301 "befs: add NFS export support" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-21	affs_lookup: switch to d_splice_alias()	Al Viro	1	-6/+5
	Making something exportable takes more than providing ->s_export_ops. In particular, ->lookup() MUST use d_splice_alias() instead of d_add(). Reading Documentation/filesystems/nfs/Exporting would've been a good idea; as it is, exporting AFFS is badly (and exploitably) broken. Partially-Fixes: ed4433d72394 "fs/affs: make affs exportable" Acked-by: David Sterba <dsterba@suse.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-21	affs_lookup(): close a race with affs_remove_link()	Al Viro	1	-3/+7
	we unlock the directory hash too early - if we are looking at secondary link and primary (in another directory) gets removed just as we unlock, we could have the old primary moved in place of the secondary, leaving us to look into freed entry (and leaving our dentry with ->d_fsdata pointing to a freed entry). Cc: stable@vger.kernel.org # 2.4.4+ Acked-by: David Sterba <dsterba@suse.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-13	fix breakage caused by d_find_alias() semantics change	Al Viro	1	-4/+16
	"VFS: don't keep disconnected dentries on d_anon" had a non-trivial side-effect - d_unhashed() now returns true for those dentries, making d_find_alias() skip them altogether. For most of its callers that's fine - we really want a connected alias there. However, there is a codepath where we relied upon picking such aliases if nothing else could be found - selinux delayed initialization of contexts for inodes on already mounted filesystems used to rely upon that. Cc: stable@kernel.org # f1ee616214cb "VFS: don't keep disconnected dentries on d_anon" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-11	fs: don't scan the inode cache before SB_BORN is set	Dave Chinner	1	-6/+24
	We recently had an oops reported on a 4.14 kernel in xfs_reclaim_inodes_count() where sb->s_fs_info pointed to garbage and so the m_perag_tree lookup walked into lala land. It produces an oops down this path during the failed mount: radix_tree_gang_lookup_tag+0xc4/0x130 xfs_perag_get_tag+0x37/0xf0 xfs_reclaim_inodes_count+0x32/0x40 xfs_fs_nr_cached_objects+0x11/0x20 super_cache_count+0x35/0xc0 shrink_slab.part.66+0xb1/0x370 shrink_node+0x7e/0x1a0 try_to_free_pages+0x199/0x470 __alloc_pages_slowpath+0x3a1/0xd20 __alloc_pages_nodemask+0x1c3/0x200 cache_grow_begin+0x20b/0x2e0 fallback_alloc+0x160/0x200 kmem_cache_alloc+0x111/0x4e0 The problem is that the superblock shrinker is running before the filesystem structures it depends on have been fully set up, i.e. the shrinker is registered in sget(), before ->fill_super() has been called, and the shrinker can call into the filesystem before fill_super() does its setup work. Essentially we are exposed to both use-after-free and use-before-initialisation bugs here. To fix this, add a check for the SB_BORN flag in super_cache_count. In general, this flag is not set until ->fs_mount() completes successfully, so we know that it is set after the filesystem setup has completed. This matches the trylock_super() behaviour which will not let super_cache_scan() run if SB_BORN is not set, and hence will not allow the superblock shrinker to enter the filesystem while it is being set up or after it has failed setup and is being torn down. Cc: stable@kernel.org Signed-Off-By: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
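The idea of gating a cache-count callback on a "fully set up" flag can be sketched as follows. This is a userspace illustration only: `struct fake_sb`, `SB_BORN_FLAG`, `count_objects` and `cache_count` are hypothetical names, the flag value is arbitrary, and any memory-ordering concerns a real kernel implementation would have are ignored here.

```c
#include <stddef.h>

#define SB_BORN_FLAG (1UL << 29)	/* illustrative value, an assumption */

/* Hypothetical stand-in for a superblock with a filesystem callback. */
struct fake_sb {
	unsigned long flags;
	long (*nr_cached)(struct fake_sb *sb);
	void *fs_info;		/* filesystem-private data, NULL until set up */
};

/* Example callback: dereferences fs_info, so it must not run too early. */
long count_objects(struct fake_sb *sb)
{
	return *(long *)sb->fs_info;
}

/* Shrinker-style count: report nothing until setup has completed. */
long cache_count(struct fake_sb *sb)
{
	if (!(sb->flags & SB_BORN_FLAG))
		return 0;
	return sb->nr_cached(sb);
}
```

With the flag clear, `cache_count()` never touches `fs_info`, which is the property the fix relies on.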
2018-05-11	do d_instantiate/unlock_new_inode combinations safely	Al Viro	14	-72/+57
	For anything NFS-exported we do _not_ want to unlock new inode before it has grown an alias; original set of fixes got the ordering right, but missed the nasty complication in case of lockdep being enabled - unlock_new_inode() does lockdep_annotate_inode_mutex_key(inode) which can only be done before anyone gets a chance to touch ->i_mutex. Unfortunately, flipping the order and doing unlock_new_inode() before d_instantiate() opens a window when mkdir can race with open-by-fhandle on a guessed fhandle, leading to multiple aliases for a directory inode and all the breakage that follows from that. Correct solution: a new primitive (d_instantiate_new()) combining these two in the right order - lockdep annotate, then d_instantiate(), then the rest of unlock_new_inode(). All combinations of d_instantiate() with unlock_new_inode() should be converted to that. Cc: stable@kernel.org # 2.6.29 and later Tested-by: Mike Marshall <hubcap@omnibond.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-02	iov_iter: fix memory leak in pipe_get_pages_alloc()	Ilya Dryomov	1	-1/+1
	Make n signed to avoid leaking the pages array if __pipe_get_pages() fails to allocate any pages. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
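Why the signedness of `n` matters can be shown with a small sketch. The functions below are hypothetical stand-ins, not the iov_iter code: a helper returns either a positive count or a negative error code, and the caller decides whether to keep or free its array based on `n > 0`. Stored in an unsigned variable, a negative error turns into a huge positive value and the failure branch is never taken.

```c
#include <stddef.h>
#include <sys/types.h>	/* ssize_t (POSIX) */

/* Hypothetical helper: returns a count, or a negative errno-style value. */
ssize_t get_pages_stub(int fail)
{
	return fail ? -14 /* like -EFAULT */ : 3;
}

/* Buggy pattern: the error is converted to a large unsigned value. */
int looks_ok_unsigned(int fail)
{
	size_t n = get_pages_stub(fail);

	return n > 0;	/* true even on failure, so cleanup would be skipped */
}

/* Fixed pattern: a signed n keeps the error negative. */
int looks_ok_signed(int fail)
{
	ssize_t n = get_pages_stub(fail);

	return n > 0;
}
```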
2018-05-02	iov_iter: fix return type of __pipe_get_pages()	Ilya Dryomov	1	-1/+1
	It returns -EFAULT and happens to be a helper for pipe_get_pages() whose return type is ssize_t. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-02	aio: implement io_pgetevents	Christoph Hellwig	8	-11/+130
	This is the io_getevents equivalent of ppoll/pselect and allows to properly mix signals and aio completions (especially with IOCB_CMD_POLL) and atomically executes the following sequence: sigset_t origmask; pthread_sigmask(SIG_SETMASK, &sigmask, &origmask); ret = io_getevents(ctx, min_nr, nr, events, timeout); pthread_sigmask(SIG_SETMASK, &origmask, NULL); Note that unlike many other signal related calls we do not pass a sigmask size, as that would get us to 7 arguments, which aren't easily supported by the syscall infrastructure. It seems a lot less painful to just add a new syscall variant in the unlikely case we're going to increase the sigset size. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-02	aio: implement IOCB_CMD_FSYNC and IOCB_CMD_FDSYNC	Christoph Hellwig	1	-0/+43
	Simple workqueue offload for now, but prepared for adding a real aio_fsync method if the need arises. Based on an earlier patch from Dave Chinner. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-02	aio: refactor read/write iocb setup	Christoph Hellwig	1	-69/+92
	Don't reference the kiocb structure from the common aio code, and move any use of it into helper specific to the read/write path. This is in preparation for aio_poll support that wants to use the space for different fields. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-02	aio: remove the extra get_file/fput pair in io_submit_one	Christoph Hellwig	1	-9/+16
	If we release the lockdep write protection token before calling into ->write_iter and thus never access the file pointer after an -EIOCBQUEUED return from ->write_iter or ->read_iter we don't need this extra reference. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-02	aio: sanitize ki_list handling	Christoph Hellwig	1	-7/+6
	Instead of handcoded non-null checks always initialize ki_list to an empty list and use list_empty / list_empty_careful on it. While we're at it also error out on a double call to kiocb_set_cancel_fn instead of ignoring it. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
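The "always initialize to an empty list" convention can be sketched with a minimal userspace doubly linked list. The names below mirror the kernel's list helpers in spirit but are reimplemented here as an assumption-labelled illustration, not copied from <linux/list.h>. A node that points at itself is "empty", so membership can be tested without any NULL checks.

```c
/* Minimal circular doubly linked list, modelled loosely on kernel lists. */
struct list_head {
	struct list_head *next, *prev;
};

/* An initialized node points at itself and counts as empty. */
void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

int list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Insert n right after head. */
void list_add_head(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink n and leave it in the empty state, so it is safe to test again. */
void list_del_init_node(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	init_list_head(n);
}
```

Because a removed node is reinitialized, calling `list_del_init_node()` on a node that is not on any list is harmless in this sketch.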
2018-05-02	aio: remove an outdated BUG_ON and comment in aio_complete	Christoph Hellwig	1	-9/+0
	These days we don't treat sync iocbs special in the aio completion code as they never use it. Remove the old comment and BUG_ON given that the current definition of is_sync_kiocb makes it impossible to hit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-02	aio: don't print the page size at boot time	Christoph Hellwig	1	-3/+0
	The page size is in no way related to the aio code, and printing it in the (debug) dmesg at every boot serves no purpose. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-04-29	Linux v4.17-rc3	Linus Torvalds	1	-1/+1

2018-04-28	<linux/stringhash.h>: fix end_name_hash() for 64bit long	Amir Goldstein	1	-2/+2
	The comment claims that this helper will try not to lose bits, but for 64bit long it loses the high bits before hashing 64bit long into 32bit int. Use the helper hash_long() to do the right thing for 64bit long. For 32bit long, there is no change. All the callers of end_name_hash() either assign the result to qstr->hash, which is u32 or return the result as an int value (e.g. full_name_hash()). Change the helper return type to int to conform to its users. [ It took me a while to apply this, because my initial reaction to it was - incorrectly - that it could make for slower code. After having looked more at it, I take back all my complaints about the patch, Amir was right and I was mis-reading things or just being stupid. I also don't worry too much about the possible performance impact of this on 64-bit, since most architectures that actually care about performance end up not using this very much (the dcache code is the most performance-critical, but the word-at-a-time case uses its own hashing anyway). So this ends up being mostly used for filesystems that do their own degraded hashing (usually because they want a case-insensitive comparison function). A _tiny_ worry remains, in that not everybody uses DCACHE_WORD_ACCESS, and then this potentially makes things more expensive on 64-bit architectures with slow or lacking multipliers even for the normal case. That said, realistically the only such architecture I can think of is PA-RISC. Nobody really cares about performance on that, it's more of a "look ma, I've got warts^W an odd machine" platform. So the patch is fine, and all my initial worries were just misplaced from not looking at this properly. - Linus ] Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
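The effect of truncating a 64-bit hash versus folding it can be sketched in userspace. The functions below are hypothetical illustrations, not the kernel's end_name_hash() or hash_long(); the multiplier is assumed to be the 64-bit golden-ratio constant used by multiplicative hashing. Truncation discards the high 32 bits entirely, while multiplying and keeping the top 32 bits lets every input bit influence the result.

```c
#include <stdint.h>

/* Assumed 64-bit golden-ratio multiplier for multiplicative hashing. */
#define GOLDEN_RATIO_64 0x61C8864680B583EBull

/* Old behaviour (sketch): the high 32 bits are simply dropped. */
uint32_t end_hash_truncate(uint64_t h)
{
	return (uint32_t)h;
}

/* Folded behaviour (sketch): multiply, then keep the top 32 bits. */
uint32_t end_hash_fold(uint64_t h)
{
	return (uint32_t)((h * GOLDEN_RATIO_64) >> 32);
}
```

Two inputs that differ only above bit 31, such as `1` and `0x100000001`, collide under truncation but are separated by the folded variant.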
2018-04-28	MAINTAINERS: add myself as maintainer of AFFS	David Sterba	1	-1/+2
	The AFFS filesystem is still in use by m68k community (Link #2), but as there was no code activity and no maintainer, the filesystem appeared on the list of candidates for staging/removal (Link #1). I volunteer to act as a maintainer of AFFS to collect any fixes that might show up and to guard fs/affs/ against another spring cleaning. Link: https://lkml.kernel.org/r/20180425154602.GA8546@bombadil.infradead.org Link: https://lkml.kernel.org/r/1613268.lKBQxPXt8J@merkaba CC: Martin Steigerwald <martin@lichtvoll.de> CC: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-27	x86/headers/UAPI: Move DISABLE_EXITS KVM capability bits to the UAPI	KarimAllah Ahmed	2	-7/+7
	Move DISABLE_EXITS KVM capability bits to the UAPI just like the rest of capabilities. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-04-27	kvm: apic: Flush TLB after APIC mode/address change if VPIDs are in use	Junaid Shahid	1	-10/+4
	Currently, KVM flushes the TLB after a change to the APIC access page address or the APIC mode when EPT mode is enabled. However, even in shadow paging mode, a TLB flush is needed if VPIDs are being used, as specified in the Intel SDM Section 29.4.5. So replace vmx_flush_tlb_ept_only() with vmx_flush_tlb(), which will flush if either EPT or VPIDs are in use. Signed-off-by: Junaid Shahid <junaids@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-04-27	x86/entry/64/compat: Preserve r8-r11 in int $0x80	Andy Lutomirski	2	-18/+25
	32-bit user code that uses int $0x80 doesn't care about r8-r11. There is, however, some 64-bit user code that intentionally uses int $0x80 to invoke 32-bit system calls. From what I've seen, basically all such code assumes that r8-r15 are all preserved, but the kernel clobbers r8-r11. Since I doubt that there's any code that depends on int $0x80 zeroing r8-r11, change the kernel to preserve them. I suspect that very little user code is broken by the old clobber, since r8-r11 are only rarely allocated by gcc, and they're clobbered by function calls, so the only way we'd see a problem is if the same function that invokes int $0x80 also spills something important to one of these registers. The current behavior seems to date back to the historical commit "[PATCH] x86-64 merge for 2.6.4". Before that, all regs were preserved. I can't find any explanation of why this change was made. Update the test_syscall_vdso_32 testcase as well to verify the new behavior, and it strengthens the test to make sure that the kernel doesn't accidentally permute r8..r15. Suggested-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Link: https://lkml.kernel.org/r/d4c4d9985fbe64f8c9e19291886453914b48caee.1523975710.git.luto@kernel.org
2018-04-27	x86/ipc: Fix x32 version of shmid64_ds and msqid64_ds	Arnd Bergmann	2	-0/+73
	A bugfix broke the x32 shmid64_ds and msqid64_ds data structure layout (as seen from user space) a few years ago: Originally, __BITS_PER_LONG was defined as 64 on x32, so we did not have padding after the 64-bit __kernel_time_t fields. After __BITS_PER_LONG got changed to 32, applications would observe extra padding. In other parts of the uapi headers we seem to have a mix of those expecting either 32 or 64 on x32 applications, so we can't easily revert the patch that broke these two structures. Instead, this patch decouples x32 from the other architectures and moves it back into arch specific headers, partially reverting the even older commit 73a2d096fdf2 ("x86: remove all now-duplicate header files"). It's not clear whether this ever made any difference, since at least glibc carries its own (correct) copy of both of these header files, so possibly no application has ever observed the definitions here. Based on a suggestion from H.J. Lu, I tried out the tool from https://github.com/hjl-tools/linux-header to find other such bugs, which pointed out the same bug in statfs(), which also has a separate (correct) copy in glibc. Fixes: f4b4aae18288 ("x86/headers/uapi: Fix __BITS_PER_LONG value for x32 builds") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H . J . Lu" <hjl.tools@gmail.com> Cc: Jeffrey Walton <noloader@gmail.com> Cc: stable@vger.kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/20180424212013.3967461-1-arnd@arndb.de
2018-04-27	x86/setup: Do not reserve a crash kernel region if booted on Xen PV	Petr Tesarik	1	-0/+6
	Xen PV domains cannot shut down and start a crash kernel. Instead, the crashing kernel makes a SCHEDOP_shutdown hypercall with the reason code SHUTDOWN_crash, cf. xen_crash_shutdown() machine op in arch/x86/xen/enlighten_pv.c. A crash kernel reservation is merely a waste of RAM in this case. It may also confuse users of kexec_load(2) and/or kexec_file_load(2). When flags include KEXEC_ON_CRASH or KEXEC_FILE_ON_CRASH, respectively, these syscalls return success, which is technically correct, but the crash kexec image will never be actually used. Signed-off-by: Petr Tesarik <ptesarik@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: xen-devel@lists.xenproject.org Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@suse.de> Cc: Jean Delvare <jdelvare@suse.de> Link: https://lkml.kernel.org/r/20180425120835.23cef60c@ezekiel.suse.cz
2018-04-27	i2c: sprd: Fix the i2c count issue	Baolin Wang	1	-4/+2
	We found that the I2C controller count register is sometimes unreliable, which can cause I2C to lose data. Thus we read the data count from 'i2c_dev->count' instead of the I2C controller count register. Signed-off-by: Baolin Wang <baolin.wang@linaro.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-04-27	i2c: sprd: Prevent i2c accesses after suspend is called	Baolin Wang	1	-0/+16
	Add one flag to indicate if the i2c controller has been in suspend state, which can prevent i2c accesses after i2c controller is suspended following system suspend. Signed-off-by: Baolin Wang <baolin.wang@linaro.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-04-27	i2c: dev: prevent ZERO_SIZE_PTR deref in i2cdev_ioctl_rdwr()	Alexander Popov	1	-1/+1
	i2cdev_ioctl_rdwr() allocates i2c_msg.buf using memdup_user(), which returns ZERO_SIZE_PTR if i2c_msg.len is zero. Currently i2cdev_ioctl_rdwr() always dereferences the buf pointer in case of I2C_M_RD | I2C_M_RECV_LEN transfer. That causes a kernel oops in case of zero len. Let's check the len against zero before dereferencing buf pointer. This issue was triggered by syzkaller. Signed-off-by: Alexander Popov <alex.popov@linux.com> Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> [wsa: use '< 1' instead of '!' for easier readability] Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
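The ordering "check the length, then dereference" can be sketched in userspace. Everything below is a hypothetical illustration, not the i2c-dev code: `struct msg`, `first_byte_checked` and `ZERO_SIZE_SENTINEL` are invented names, and the sentinel merely mimics an allocator handing back a non-NULL but unusable pointer for a zero-length request. Relying on short-circuit evaluation, the buffer is only read when the length says there is at least one byte.

```c
#include <stddef.h>
#include <stdint.h>

#define ERR_INVAL (-22)	/* errno-style value, used here for illustration */

/* Mimics a non-NULL pointer that must never be dereferenced (assumption). */
#define ZERO_SIZE_SENTINEL ((uint8_t *)16)

/* Hypothetical message descriptor. */
struct msg {
	uint16_t len;
	uint8_t *buf;
};

/*
 * Return the first payload byte, or ERR_INVAL if the message is empty
 * or that byte is zero. The len test comes first, so buf[0] is never
 * read for a zero-length message.
 */
int first_byte_checked(const struct msg *m)
{
	if (m->len < 1 || m->buf[0] < 1)
		return ERR_INVAL;
	return m->buf[0];
}
```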
2018-04-27	arm64: avoid instrumenting atomic_ll_sc.o	Mark Rutland	1	-0/+4
	Our out-of-line atomics are built with a special calling convention, preventing pointless stack spilling, and allowing us to patch call sites with ARMv8.1 atomic instructions. Instrumentation inserted by the compiler may result in calls to functions not following this special calling convention, resulting in registers being unexpectedly clobbered, and various problems resulting from this. For example, if a kernel is built with KCOV and ARM64_LSE_ATOMICS, the compiler inserts calls to __sanitizer_cov_trace_pc in the prologues of the atomic functions. This has been observed to result in spurious cmpxchg failures, leading to a hang early on in the boot process. This patch avoids such issues by preventing instrumentation of our out-of-line atomics. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-04-27	powerpc/kvm/booke: Fix altivec related build break	Laurentiu Tudor	1	-0/+7
	Add missing "altivec unavailable" interrupt injection helper thus fixing the linker error below: arch/powerpc/kvm/emulate_loadstore.o: In function `kvmppc_check_altivec_disabled': arch/powerpc/kvm/emulate_loadstore.c: undefined reference to `.kvmppc_core_queue_vec_unavail' Fixes: 09f984961c137c4b ("KVM: PPC: Book3S: Add MMIO emulation for VMX instructions") Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-27	powerpc: Fix deadlock with multiple calls to smp_send_stop	Nicholas Piggin	1	-16/+39
	smp_send_stop can lock up the IPI path for any subsequent calls, because the receiving CPUs spin in their handler function. This started becoming a problem with the addition of an smp_send_stop call in the reboot path, because panics can reboot after doing their own smp_send_stop. The NMI IPI variant was fixed with ac61c11566 ("powerpc: Fix smp_send_stop NMI IPI handling"), which leaves the smp_call_function variant. This is fixed by having smp_send_stop only ever do the smp_call_function once. This is a bit less robust than the NMI IPI fix, because any other call to smp_call_function after smp_send_stop could deadlock, but that has always been the case, and it has not been a problem before. Fixes: f2748bdfe1573 ("powerpc/powernv: Always stop secondaries before reboot/shutdown") Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-27	cpufreq: powernv: Fix hardlockup due to synchronous smp_call in timer interrupt	Shilpasri G Bhat	1	-3/+11
	gpstate_timer_handler() uses synchronous smp_call to set the pstate on the requested core. This causes the below hard lockup: smp_call_function_single+0x110/0x180 (unreliable) smp_call_function_any+0x180/0x250 gpstate_timer_handler+0x1e8/0x580 call_timer_fn+0x50/0x1c0 expire_timers+0x138/0x1f0 run_timer_softirq+0x1e8/0x270 __do_softirq+0x158/0x3e4 irq_exit+0xe8/0x120 timer_interrupt+0x9c/0xe0 decrementer_common+0x114/0x120 -- interrupt: 901 at doorbell_global_ipi+0x34/0x50 LR = arch_send_call_function_ipi_mask+0x120/0x130 arch_send_call_function_ipi_mask+0x4c/0x130 smp_call_function_many+0x340/0x450 pmdp_invalidate+0x98/0xe0 change_huge_pmd+0xe0/0x270 change_protection_range+0xb88/0xe40 mprotect_fixup+0x140/0x340 SyS_mprotect+0x1b4/0x350 system_call+0x58/0x6c One way to avoid this is removing the smp-call. We can ensure that the timer always runs on one of the policy-cpus. If the timer gets migrated to a cpu outside the policy then re-queue it back on the policy->cpus. This way we can get rid of the smp-call which was being used to set the pstate on the policy->cpus. Fixes: 7bc54b652f13 ("timers, cpufreq/powernv: Initialize the gpstate timer as pinned") Cc: stable@vger.kernel.org # v4.8+ Reported-by: Nicholas Piggin <npiggin@gmail.com> Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Acked-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-26	x86/cpu/intel: Add missing TLB cpuid values	jacek.tomaka@poczta.fm	1	-0/+3
	Make the kernel print the correct number of TLB entries on Intel Xeon Phi 7210 (and others). Before: [ 0.320005] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0 After: [ 0.320005] Last level dTLB entries: 4KB 256, 2MB 128, 4MB 128, 1GB 16 The entries do exist in the official Intel SDM but the type column there is incorrect (states "Cache" where it should read "TLB"), but the entries for the values 0x6B, 0x6C and 0x6D are correctly described as 'Data TLB'. Signed-off-by: Jacek Tomaka <jacek.tomaka@poczta.fm> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20180423161425.24366-1-jacekt@dugeo.com
2018-04-26	mtd: rawnand: marvell: fix the chip-select DT parsing logic	Miquel Raynal	1	-17/+8
	The block responsible for parsing the DT for the number of chip-select lines uses an 'if/else if/else if' block. The content of the second and third 'else if' conditions are: 1/ the actual condition to enter the sub-block and 2/ the operation to do in this sub-block. [...] else if (condition1_to_enter && action1() == failed) raise_error(); else if (condition2_to_enter && action2() == failed) raise_error(); [...] In case of failure, the sub-block is entered and an error raised. Otherwise, in case of success, the code would continue erroneously in the next 'else if' statement because it did not fail (and did not enter the first 'else if' sub-block). The first 'else if' refers to legacy bindings while the second 'else if' refers to new bindings. The second 'else if', which is entered erroneously, checks for the 'reg' property, which, for old bindings, does not mean anything because it would not be the number of CS available, but the regular register map of almost any DT node. This being said, the content of the 'reg' property being the register map offset and length, it has '2' values, so the number of CS in this situation is assumed to be '2'. When running nand_scan_ident() with 2 CS, the core will check for an array of chips. It will first issue a RESET and then a READ_ID. Of course this will trigger two timeouts because there is no chip in front of the second CS: [ 1.367460] marvell-nfc f2720000.nand: Timeout on CMDD (NDSR: 0x00000080) [ 1.474292] marvell-nfc f2720000.nand: Timeout on CMDD (NDSR: 0x00000280) Indeed, this is harmless and the core will then assume there is only one valid CS. Fix the logic in the whole block by entering each sub-block just on the 'is legacy' condition, doing the action inside the sub-block. This way, when the action succeeds, the whole block is left. 
Furthermore, for both the old bindings and the new bindings the same logic was applied to retrieve the number of CS lines: using of_get_property() to get a size in bytes, converted into the actual number of lines by dividing it by sizeof(u32) (4 bytes). This is fine for the 'reg' property, which is a list of the CS IDs, but not for the 'num-cs' property, which directly holds the number of CS. Anyway, no existing DT uses a value other than 'num-cs = <1>' and no other value has ever been supported by the old driver (pxa3xx_nand.c). Remove this condition and apply a number of 1 CS anyway, as already described in the bindings. Finally, the 'reg' property of a 'nand' node (with the new bindings) gives the IDs of each CS line in use. The marvell_nand.c driver first looks at the number of CS lines that are present in this property. It is better to use of_property_count_elems_of_size() than to divide by 4 the number of bytes returned by of_get_property(). Fixes: 02f26ecf8c772 ("mtd: nand: add reworked Marvell NAND controller driver") Cc: stable@vger.kernel.org Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Tested-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
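The control-flow problem described in this commit message can be reproduced outside the kernel. The following is a minimal userspace sketch, not the driver code: `struct fake_node`, `parse_legacy()`, `parse_reg()`, `parse_buggy()` and `parse_fixed()` are hypothetical names invented for illustration. It shows how fusing "should I enter this branch" with "did the action fail" into one condition lets a successful legacy parse fall through to the next 'else if'.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a device-tree node; not a kernel structure. */
struct fake_node {
	bool legacy;     /* node uses the old bindings */
	bool legacy_ok;  /* parsing the legacy property succeeds */
	bool reg_ok;     /* parsing the 'reg' property succeeds */
};

/* Bitmask recording which parse steps ran: bit 0 legacy, bit 1 'reg'. */
static int steps_run;

static bool parse_legacy(const struct fake_node *n)
{
	steps_run |= 1;
	return n->legacy_ok;
}

static bool parse_reg(const struct fake_node *n)
{
	steps_run |= 2;
	return n->reg_ok;
}

/*
 * Buggy shape: the entry condition and the action share one test.
 * When the legacy action succeeds the first test is false, so the
 * chain moves on and also runs the 'reg' action.
 */
static int parse_buggy(const struct fake_node *n)
{
	steps_run = 0;
	if (n->legacy && !parse_legacy(n))
		return -1;
	else if (!parse_reg(n))
		return -1;
	return 0;
}

/*
 * Fixed shape: branch only on the 'is legacy' condition and perform
 * the action inside the sub-block, so success leaves the whole block.
 */
static int parse_fixed(const struct fake_node *n)
{
	steps_run = 0;
	if (n->legacy) {
		if (!parse_legacy(n))
			return -1;
	} else {
		if (!parse_reg(n))
			return -1;
	}
	return 0;
}
```

With a legacy node whose parse succeeds, `parse_buggy()` runs both steps while `parse_fixed()` runs only the legacy one, which mirrors the spurious second chip-select described above.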
2018-04-26	KVM: arm/arm64: vgic: fix possible spectre-v1 in vgic_mmio_read_apr()	Mark Rutland	1	-0/+5
	It's possible for userspace to control n. Sanitize n when using it as an array index. Note that while it appears that n must be bound to the interval [0,3] due to the way it is extracted from addr, we cannot guarantee that compiler transformations (and/or future refactoring) will ensure this is the case, and given this is a slow path it's better to always perform the masking. Found by smatch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Christoffer Dall <christoffer.dall@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: kvmarm@lists.cs.columbia.edu Signed-off-by: Will Deacon <will.deacon@arm.com>
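The masking mentioned in this commit message can be illustrated with a small userspace sketch. The kernel provides array_index_nospec() in <linux/nospec.h>; the code below is not that implementation but an approximation of the branch-free mask idea, and the names `index_mask_nospec()`, `clamp_index_nospec()`, `fake_apr` and `read_fake_apr()` are hypothetical. It assumes index and size both fit in a signed long and that right-shifting a negative value is an arithmetic shift (true for GCC and Clang). A plain C compiler is free to turn this back into a branch, which the kernel guards against with extra barriers not shown here.

```c
#include <limits.h>

/*
 * Returns all ones when index < size and zero otherwise, without a
 * conditional branch. Assumes index and size are <= LONG_MAX.
 */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	return ~(long)(index | (size - 1UL - index)) >> (sizeof(long) * CHAR_BIT - 1);
}

/* Clamp an out-of-bounds index to 0; in-bounds indexes pass through. */
static unsigned long clamp_index_nospec(unsigned long index, unsigned long size)
{
	return index & index_mask_nospec(index, size);
}

/* Hypothetical 4-entry register file standing in for the APR array. */
static const unsigned int fake_apr[4] = { 0x10, 0x11, 0x12, 0x13 };

static unsigned int read_fake_apr(unsigned long n)
{
	/* Architectural bounds check. */
	if (n >= 4)
		return 0;
	/* Sanitize n again so a mispredicted check cannot index past the array. */
	n = clamp_index_nospec(n, 4);
	return fake_apr[n];
}
```

The bounds check alone is correct architecturally; the extra clamp is what keeps a speculatively executed load inside the array.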
2018-04-26	KVM: arm/arm64: vgic: fix possible spectre-v1 in vgic_get_irq()	Mark Rutland	1	-4/+10
	It's possible for userspace to control intid. Sanitize intid when using it as an array index. At the same time, sort the includes when adding <linux/nospec.h>. Found by smatch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Christoffer Dall <christoffer.dall@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: kvmarm@lists.cs.columbia.edu Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-04-26	arm64: fix possible spectre-v1 in ptrace_hbp_get_event()	Mark Rutland	1	-4/+10
	It's possible for userspace to control idx. Sanitize idx when using it as an array index. Found by smatch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>