wireguard-linux - WireGuard for the Linux kernel

Age	Commit message (Collapse)	Author	Files	Lines
2024-07-24	inode: clarify what's locked	Christian Brauner	1	-11/+11
	In __wait_on_freeing_inode() we warn in case the inode_hash_lock is held but the inode is unhashed. We then release the inode_lock. So using "locked" as parameter name is confusing. Use is_inode_hash_locked as parameter name instead. Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24	vfs: Fix potential circular locking through setxattr() and removexattr()	David Howells	1	-43/+48
	When using cachefiles, lockdep may emit something similar to the circular locking dependency notice below. The problem appears to stem from the following: (1) Cachefiles manipulates xattrs on the files in its cache when called from ->writepages(). (2) The setxattr() and removexattr() system call handlers get the name (and value) from userspace after taking the sb_writers lock, putting accesses of the vma->vm_lock and mm->mmap_lock inside of that. (3) The afs filesystem uses a per-inode lock to prevent multiple revalidation RPCs and in writeback vs truncate to prevent parallel operations from deadlocking against the server on one side and local page locks on the other. Fix this by moving the getting of the name and value in {get,remove}xattr() outside of the sb_writers lock. This also has the minor benefits that we don't need to reget these in the event of a retry and we never try to take the sb_writers lock in the event we can't pull the name and value into the kernel. Alternative approaches that might fix this include moving the dispatch of a write to the cache off to a workqueue or trying to do without the validation lock in afs. Note that this might also affect other filesystems that use netfslib and/or cachefiles. ====================================================== WARNING: possible circular locking dependency detected 6.10.0-build2+ #956 Not tainted ------------------------------------------------------ fsstress/6050 is trying to acquire lock: ffff888138fd82f0 (mapping.invalidate_lock#3){++++}-{3:3}, at: filemap_fault+0x26e/0x8b0 but task is already holding lock: ffff888113f26d18 (&vma->vm_lock->lock){++++}-{3:3}, at: lock_vma_under_rcu+0x165/0x250 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #4 (&vma->vm_lock->lock){++++}-{3:3}: __lock_acquire+0xaf0/0xd80 lock_acquire.part.0+0x103/0x280 down_write+0x3b/0x50 vma_start_write+0x6b/0xa0 vma_link+0xcc/0x140 insert_vm_struct+0xb7/0xf0 alloc_bprm+0x2c1/0x390 kernel_execve+0x65/0x1a0 call_usermodehelper_exec_async+0x14d/0x190 ret_from_fork+0x24/0x40 ret_from_fork_asm+0x1a/0x30 -> #3 (&mm->mmap_lock){++++}-{3:3}: __lock_acquire+0xaf0/0xd80 lock_acquire.part.0+0x103/0x280 __might_fault+0x7c/0xb0 strncpy_from_user+0x25/0x160 removexattr+0x7f/0x100 __do_sys_fremovexattr+0x7e/0xb0 do_syscall_64+0x9f/0x100 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #2 (sb_writers#14){.+.+}-{0:0}: __lock_acquire+0xaf0/0xd80 lock_acquire.part.0+0x103/0x280 percpu_down_read+0x3c/0x90 vfs_iocb_iter_write+0xe9/0x1d0 __cachefiles_write+0x367/0x430 cachefiles_issue_write+0x299/0x2f0 netfs_advance_write+0x117/0x140 netfs_write_folio.isra.0+0x5ca/0x6e0 netfs_writepages+0x230/0x2f0 afs_writepages+0x4d/0x70 do_writepages+0x1e8/0x3e0 filemap_fdatawrite_wbc+0x84/0xa0 __filemap_fdatawrite_range+0xa8/0xf0 file_write_and_wait_range+0x59/0x90 afs_release+0x10f/0x270 __fput+0x25f/0x3d0 __do_sys_close+0x43/0x70 do_syscall_64+0x9f/0x100 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #1 (&vnode->validate_lock){++++}-{3:3}: __lock_acquire+0xaf0/0xd80 lock_acquire.part.0+0x103/0x280 down_read+0x95/0x200 afs_writepages+0x37/0x70 do_writepages+0x1e8/0x3e0 filemap_fdatawrite_wbc+0x84/0xa0 filemap_invalidate_inode+0x167/0x1e0 netfs_unbuffered_write_iter+0x1bd/0x2d0 vfs_write+0x22e/0x320 ksys_write+0xbc/0x130 do_syscall_64+0x9f/0x100 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #0 (mapping.invalidate_lock#3){++++}-{3:3}: check_noncircular+0x119/0x160 check_prev_add+0x195/0x430 __lock_acquire+0xaf0/0xd80 lock_acquire.part.0+0x103/0x280 down_read+0x95/0x200 filemap_fault+0x26e/0x8b0 __do_fault+0x57/0xd0 do_pte_missing+0x23b/0x320 __handle_mm_fault+0x2d4/0x320 handle_mm_fault+0x14f/0x260 do_user_addr_fault+0x2a2/0x500 exc_page_fault+0x71/0x90 asm_exc_page_fault+0x22/0x30 other info that might help us debug this: Chain exists of: mapping.invalidate_lock#3 --> &mm->mmap_lock --> &vma->vm_lock->lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- rlock(&vma->vm_lock->lock); lock(&mm->mmap_lock); lock(&vma->vm_lock->lock); rlock(mapping.invalidate_lock#3); * DEADLOCK * 1 lock held by fsstress/6050: #0: ffff888113f26d18 (&vma->vm_lock->lock){++++}-{3:3}, at: lock_vma_under_rcu+0x165/0x250 stack backtrace: CPU: 0 PID: 6050 Comm: fsstress Not tainted 6.10.0-build2+ #956 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Call Trace: <TASK> dump_stack_lvl+0x57/0x80 check_noncircular+0x119/0x160 ? queued_spin_lock_slowpath+0x4be/0x510 ? __pfx_check_noncircular+0x10/0x10 ? __pfx_queued_spin_lock_slowpath+0x10/0x10 ? mark_lock+0x47/0x160 ? init_chain_block+0x9c/0xc0 ? add_chain_block+0x84/0xf0 check_prev_add+0x195/0x430 __lock_acquire+0xaf0/0xd80 ? __pfx___lock_acquire+0x10/0x10 ? __lock_release.isra.0+0x13b/0x230 lock_acquire.part.0+0x103/0x280 ? filemap_fault+0x26e/0x8b0 ? __pfx_lock_acquire.part.0+0x10/0x10 ? rcu_is_watching+0x34/0x60 ? lock_acquire+0xd7/0x120 down_read+0x95/0x200 ? filemap_fault+0x26e/0x8b0 ? __pfx_down_read+0x10/0x10 ? __filemap_get_folio+0x25/0x1a0 filemap_fault+0x26e/0x8b0 ? __pfx_filemap_fault+0x10/0x10 ? find_held_lock+0x7c/0x90 ? __pfx___lock_release.isra.0+0x10/0x10 ? __pte_offset_map+0x99/0x110 __do_fault+0x57/0xd0 do_pte_missing+0x23b/0x320 __handle_mm_fault+0x2d4/0x320 ? __pfx___handle_mm_fault+0x10/0x10 handle_mm_fault+0x14f/0x260 do_user_addr_fault+0x2a2/0x500 exc_page_fault+0x71/0x90 asm_exc_page_fault+0x22/0x30 Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/2136178.1721725194@warthog.procyon.org.uk cc: Alexander Viro <viro@zeniv.linux.org.uk> cc: Christian Brauner <brauner@kernel.org> cc: Jan Kara <jack@suse.cz> cc: Jeff Layton <jlayton@kernel.org> cc: Gao Xiang <xiang@kernel.org> cc: Matthew Wilcox <willy@infradead.org> cc: netfs@lists.linux.dev cc: linux-erofs@lists.ozlabs.org cc: linux-fsdevel@vger.kernel.org [brauner: fix minor issues] Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24	filelock: Fix fcntl/close race recovery compat path	Jann Horn	1	-5/+4
	When I wrote commit 3cad1bc01041 ("filelock: Remove locks reliably when fcntl/close race is detected"), I missed that there are two copies of the code I was patching: The normal version, and the version for 64-bit offsets on 32-bit kernels. Thanks to Greg KH for stumbling over this while doing the stable backport... Apply exactly the same fix to the compat path for 32-bit kernels. Fixes: c293621bbf67 ("[PATCH] stale POSIX lock handling") Cc: stable@kernel.org Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2563 Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20240723-fs-lock-recover-compatfix-v1-1-148096719529@google.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24	fs: use all available ids	Christian Brauner	1	-1/+1
	The counter is unconditionally incremented for each mount allocation. If we set it to 1ULL << 32 we're losing 4294967296 as the first valid non-32 bit mount id. Link: https://lore.kernel.org/r/20240719-work-mount-namespace-v1-1-834113cab0d2@kernel.org Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24	cachefiles: Set the max subreq size for cache writes to MAX_RW_COUNT	David Howells	1	-1/+1
	Set the maximum size of a subrequest that writes to cachefiles to be MAX_RW_COUNT so that we don't overrun the maximum write we can make to the backing filesystem. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/1599005.1721398742@warthog.procyon.org.uk cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24	netfs: Fix writeback that needs to go to both server and cache	David Howells	1	-0/+1
	When netfslib is performing writeback (ie. ->writepages), it maintains two parallel streams of writes, one to the server and one to the cache, but it doesn't mark either stream of writes as active until it gets some data that needs to be written to that stream. This is done because some folios will only be written to the cache (e.g. copying to the cache on read is done by marking the folios and letting writeback do the actual work) and sometimes we'll only be writing to the server (e.g. if there's no cache). Now, since we don't actually dispatch uploads and cache writes in parallel, but rather flip between the streams, depending on which has the lowest so-far-issued offset, and don't wait for the subreqs to finish before flipping, we can end up in a situation where, say, we issue a write to the server and this completes before we start the write to the cache. But because we only activate a stream when we first add a subreq to it, the result collection code may run before we manage to activate the stream - resulting in the folio being cleaned and having the writeback-in-progress mark removed. At this point, the folio no longer belongs to us. This is only really a problem for folios that need to be written to both streams - and in that case, the upload to the server is started first, followed by the write to the cache - and the cache write may see a bad folio. Fix this by activating the cache stream up front if there's a cache available. If there's a cache, then all data is going to be written to it. Fixes: 288ace2f57c9 ("netfs: New writeback implementation") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/1599053.1721398818@warthog.procyon.org.uk cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24	pidfs: add selftests for new namespace ioctls	Christian Brauner	1	-31/+227
	Add selftests to verify that deriving namespace file descriptors from pidfd file descriptors works correctly. Link: https://lore.kernel.org/r/20240722-work-pidfs-69dbea91edab@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24	pidfs: handle kernels without namespaces cleanly	Christian Brauner	1	-23/+42
	The nsproxy structure contains nearly all of the namespaces associated with a task. When a given namespace type is not supported by this kernel the rules whether the corresponding pointer in struct nsproxy is NULL or always init_<ns_type>_ns differ per namespace. Ideally, that wouldn't be the case and for all namespace types we'd always set it to init_<ns_type>_ns when the corresponding namespace type isn't supported. Make sure we handle all namespaces where the pointer in struct nsproxy can be NULL when the namespace type isn't supported. Link: https://lore.kernel.org/r/20240722-work-pidfs-e6a83030f63e@brauner Fixes: 5b08bd408534 ("pidfs: allow retrieval of namespace file descriptors") # mainline only Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24	pidfs: when time ns disabled add check for ioctl	Edward Adam Davis	1	-0/+2
	syzbot call pidfd_ioctl() with cmd "PIDFD_GET_TIME_NAMESPACE" and disabled CONFIG_TIME_NS, since time_ns is NULL, it will make NULL ponter deref in open_namespace. Fixes: 5b08bd408534 ("pidfs: allow retrieval of namespace file descriptors") # mainline only Reported-and-tested-by: syzbot+34a0ee986f61f15da35d@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=34a0ee986f61f15da35d Signed-off-by: Edward Adam Davis <eadavis@qq.com> Link: https://lore.kernel.org/r/tencent_7FAE8DB725EE0DD69236DDABDDDE195E4F07@qq.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24	vfs: correct the comments of vfs_*() helpers	Congjie Zhou	1	-13/+13
	correct the comments of vfs_*() helpers in fs/namei.c, including: 1. vfs_create() 2. vfs_mknod() 3. vfs_mkdir() 4. vfs_rmdir() 5. vfs_symlink() All of them come from the same commit: 6521f8917082 "namei: prepare for idmapped mounts" The @dentry is actually the dentry of child directory rather than base directory(parent directory), and thus the @dir has to be modified due to the change of @dentry. Signed-off-by: Congjie Zhou <zcjie0802@qq.com> Link: https://lore.kernel.org/r/tencent_2FCF6CC9E10DC8A27AE58A5A0FE4FCE96D0A@qq.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24	vfs: handle __wait_on_freeing_inode() and evict() race	Mateusz Guzik	1	-0/+20
	Lockless hash lookup can find and lock the inode after it gets the I_FREEING flag set, at which point it blocks waiting for teardown in evict() to finish. However, the flag is still set even after evict() wakes up all waiters. This results in a race where if the inode lock is taken late enough, it can happen after both hash removal and wakeups, meaning there is nobody to wake the racing thread up. This worked prior to RCU-based lookup because the entire ordeal was synchronized with the inode hash lock. Since unhashing requires the inode lock, we can safely check whether it happened after acquiring it. Link: https://lore.kernel.org/v9fs/20240717102458.649b60be@kernel.org/ Reported-by: Dominique Martinet <asmadeus@codewreck.org> Fixes: 7180f8d91fcb ("vfs: add rcu-based find_inode variants for iget ops") Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20240718151838.611807-1-mjguzik@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24	netfs: Rename CONFIG_FSCACHE_DEBUG to CONFIG_NETFS_DEBUG	David Howells	1	-10/+8
	CONFIG_FSCACHE_DEBUG should have been renamed to CONFIG_NETFS_DEBUG, so do that now. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/1410796.1721333406@warthog.procyon.org.uk cc: Uwe Kleine-König <ukleinek@kernel.org> cc: Christian Brauner <brauner@kernel.org> cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24	netfs: Revert "netfs: Switch debug logging to pr_debug()"	David Howells	15	-78/+113
	Revert commit 163eae0fb0d4c610c59a8de38040f8e12f89fd43 to get back the original operation of the debugging macros. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20240608151352.22860-2-ukleinek@kernel.org Link: https://lore.kernel.org/r/1410685.1721333252@warthog.procyon.org.uk cc: Uwe Kleine-König <ukleinek@kernel.org> cc: Christian Brauner <brauner@kernel.org> cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24	kbuild: doc: gcc to CC change	Ivan Davydov	1	-3/+3
	In this part of the documentation, $(CC) is meant, but gcc is written. Signed-off-by: Ivan Davydov <davydoff33@yandex.ru> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-07-22	execve: Move KUnit tests to tests/ subdirectory	Kees Cook	5	-3/+4
	Move the exec KUnit tests into a separate directory to avoid polluting the local directory namespace. Additionally update MAINTAINERS for the new files. Reviewed-by: David Gow <davidgow@google.com> Reviewed-by: SeongJae Park <sj@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20240720170310.it.942-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
2024-07-22	bcachefs: Fix printbuf usage while atomic	Kent Overstreet	1	-0/+1
	Reported-by: syzbot+f765e51170cf13493f0b@syzkaller.appspotmail.com Fixes: f12410bb7ddd ("bcachefs: Add an error message for insufficient rw journal devs") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-22	bcachefs: More informative error message in reattach_inode()	Kent Overstreet	1	-2/+5
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-22	kallsyms: change sym_entry::percpu_absolute to bool type	Masahiro Yamada	1	-4/+4
	This field is boolean. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-07-22	kallsyms: unify seq and start_pos fields of struct sym_entry	Masahiro Yamada	1	-3/+2
	The struct sym_entry uses the 'seq' and 'start_pos' fields to remember the index in the symbol table. They serve the same purpose and are not used simultaneously. Unify them. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-07-22	kallsyms: add more original symbol type/name in comment lines	Masahiro Yamada	1	-10/+9
	Commit bea5b7450474 ("kallsyms: expand symbol name into comment for debugging") added the uncompressed type/name in the comment lines of kallsyms_offsets. It would be useful to do the same for kallsyms_names and kallsyms_seqs_of_names. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-07-22	kallsyms: use \t instead of a tab in printf()	Masahiro Yamada	1	-1/+1
	This string literal uses a mixture of \t escape sequences and a tab. Use \t consistently. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-07-22	kallsyms: avoid repeated calculation of array size for markers	Masahiro Yamada	1	-3/+4
	Introduce the markers_cnt variable for readability. No functional change intended. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-07-22	kbuild: add script and target to generate pacman package	Thomas Weißschuh	5	-1/+136
	pacman is the package manager used by Arch Linux and its derivates. Creating native packages from the kernel tree has multiple advantages: * The package triggers the correct hooks for initramfs generation and bootloader configuration * Uninstallation is complete and also invokes the relevant hooks * New UAPI headers can be installed without any manual bookkeeping The PKGBUILD file is a modified version of the one used for the downstream Arch Linux "linux" package. Extra steps that should not be necessary for a development kernel have been removed and an UAPI header package has been added. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-07-21	modpost: use generic macros for hash table implementation	Masahiro Yamada	1	-13/+5
	Use macros provided by hashtable.h Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-07-21	kbuild: move some helper headers from scripts/kconfig/ to scripts/include/	Masahiro Yamada	18	-226/+19
	Move array_size.h, hashtable.h, list.h, list_types.h from scripts/kconfig/ to scripts/include/. These headers will be useful for other host programs. Remove scripts/mod/list.h. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-07-20	cifs: Fix missing fscache invalidation	David Howells	1	-0/+6
	A network filesystem needs to implement a netfslib hook to invalidate fscache if it's to be able to use the cache. Fix cifs to implement the cache invalidation hook. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Cc: stable@vger.kernel.org Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib") Signed-off-by: Steve French <stfrench@microsoft.com>
2024-07-20	io_uring: fix error pbuf checking	Pavel Begunkov	1	-1/+3
	Syz reports a problem, which boils down to NULL vs IS_ERR inconsistent error handling in io_alloc_pbuf_ring(). KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] RIP: 0010:__io_remove_buffers+0xac/0x700 io_uring/kbuf.c:341 Call Trace: <TASK> io_put_bl io_uring/kbuf.c:378 [inline] io_destroy_buffers+0x14e/0x490 io_uring/kbuf.c:392 io_ring_ctx_free+0xa00/0x1070 io_uring/io_uring.c:2613 io_ring_exit_work+0x80f/0x8a0 io_uring/io_uring.c:2844 process_one_work kernel/workqueue.c:3231 [inline] process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3312 worker_thread+0x86d/0xd40 kernel/workqueue.c:3390 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Cc: stable@vger.kernel.org Reported-by: syzbot+2074b1a3d447915c6f1c@syzkaller.appspotmail.com Fixes: 87585b05757dc ("io_uring/kbuf: use vm_insert_pages() for mmap'ed pbuf ring") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c5f9df20560bd9830401e8e48abc029e7cfd9f5e.1721329239.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-20	io_uring: fix lost getsockopt completions	Pavel Begunkov	1	-1/+1
	There is a report that iowq executed getsockopt never completes. The reason being that io_uring_cmd_sock() can return a positive result, and io_uring_cmd() propagates it back to core io_uring, instead of IOU_OK. In case of io_wq_submit_work(), the request will be dropped without completing it. The offending code was introduced by a hack in a9c3eda7eada9 ("io_uring: fix submission-failure handling for uring-cmd"), however it was fine until getsockopt was introduced and started returning positive results. The right solution is to always return IOU_OK, since e0b23d9953b0c ("io_uring: optimise ltimeout for inline execution"), we should be able to do it without problems, however for the sake of backporting and minimising side effects, let's keep returning negative return codes and otherwise do IOU_OK. Link: https://github.com/axboe/liburing/issues/1181 Cc: stable@vger.kernel.org Fixes: 8e9fad0e70b7b ("io_uring: Add io_uring command support for sockets") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/ff349cf0654018189b6077e85feed935f0f8839e.1721149870.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-20	LoongArch: Make the users of larch_insn_gen_break() constant	Oleg Nesterov	3	-4/+7
	LoongArch defines UPROBE_SWBP_INSN as a function call and this breaks arch_uprobe_trampoline() which uses it to initialize a static variable. Add the new "__builtin_constant_p" helper, __emit_break(), and redefine the current users of larch_insn_gen_break() to use it. Fixes: ff474a78cef5 ("uprobe: Add uretprobe syscall to speed up return probe") Reported-by: Nathan Chancellor <nathan@kernel.org> Closes: https://lore.kernel.org/all/20240614174822.GA1185149@thelio-3990X/ Suggested-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20	LoongArch: Check TIF_LOAD_WATCH to enable user space watchpoint	Tiezhu Yang	2	-1/+4
	Currently, there are some places to set CSR.PRMD.PWE, the first one is in hw_breakpoint_thread_switch() to enable user space singlestep via checking TIF_SINGLESTEP, the second one is in hw_breakpoint_control() to enable user space watchpoint. For the latter case, it should also check TIF_LOAD_WATCH to make the logic correct and clear. Fixes: c8e57ab0995c ("LoongArch: Trigger user-space watchpoints correctly") Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20	LoongArch: Use rustc option -Zdirect-access-external-data	WANG Rui	1	-1/+2
	-Zdirect-access-external-data is a new Rust compiler option added in Rust 1.78, which we use to optimize the access of external data in the Linux kernel's Rust code. This patch modifies the Rust code in vmlinux to directly access externa data, using PC-REL instead of GOT. However, Rust code whithin modules is constrained by the PC-REL addressing range and is explicitly set to use an indirect method. Acked-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: WANG Rui <wangrui@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20	LoongArch: Add support for relocating the kernel with RELR relocation	Xi Ruoyao	5	-0/+33
	RELR as a relocation packing format for relative relocations for reducing the size of relative relocation records. In a position independent executable there are often many relative relocation records, and our vmlinux is a PIE. The LLD linker (since 17.0.0) and the BFD linker (since 2.43) supports packing the relocations in the RELR format for LoongArch, with the flag -z pack-relative-relocs. Commits 5cf896fb6be3eff ("arm64: Add support for relocating the kernel with RELR relocations") and ccb2d173b983984bfa ("Makefile: use -z pack-relative-relocs") have already added the framework to use RELR. We just need to wire it up and process the RELR relocation records in relocate_relative() in addition to the RELA relocation records. A ".p2align 3" directive is added to la_abs macro or the BFD linker cannot pack the relocation records against the .la_abs section (the ". = ALIGN(8);" directive in vmlinux.lds.S is too late in the linking process). With defconfig and CONFIG_RELR vmlinux.efi is 2.1 MiB (6%) smaller, and vmlinuz.efi (using gzip compression) is 384 KiB (2.8%) smaller. Link: https://groups.google.com/d/topic/generic-abi/bX460iggiKg Link: https://reviews.llvm.org/D138135#4531389 Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d89ecf33ab6d Signed-off-by: Xi Ruoyao <xry111@xry111.site> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20	LoongArch: Remove a redundant checking in relocator	Xi Ruoyao	1	-3/+1
	With our linker script "relocated_addr >= VMLINUX_LOAD_ADDRESS" should be always true. Signed-off-by: Xi Ruoyao <xry111@xry111.site> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20	LoongArch: Use correct API to map cmdline in relocate_kernel()	Huacai Chen	1	-1/+3
	fw_arg1 is in memory space rather than I/O space, so we should use early_memremap_ro() instead of early_ioremap() to map the cmdline. Moreover, we should unmap it after using. Suggested-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20	LoongArch: Automatically disable KASLR for hibernation	Huacai Chen	1	-0/+26
	Hibernation assumes the memory layout after resume be the same as that before sleep, so it expects the kernel is loaded at the same position. To achieve this goal we automatically disable KASLR if user explicitly requests hibernation via the "resume=" command line. Since "nohibernate" and "noresume" have higher priorities than "resume=", we only disable KASLR if there is no "nohibernate" and "noresume". Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20	LoongArch: Add ACPI standard hardware register based S3 support	Jiaxun Yang	2	-6/+18
	Most LoongArch 64 machines are using custom "SADR" ACPI extension to perform ACPI S3 sleep. However the standard ACPI way to perform sleep is to write a value to ACPI PM1/SLEEP_CTL register, and this is never supported properly in kernel. Add standard S3 sleep by providing a default DoSuspend function which calls ACPI's acpi_enter_sleep_state() routine when SADR is not provided by the firmware. Also fix suspend assembly code so that ra is set properly before go into sleep routine. (Previously linked address of jirl was set to a0, some firmware do require return address in a0 but it's already set with la.pcrel before). Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20	LoongArch: Add architectural preparation for CPUFreq	Huacai Chen	3	-0/+19
	Add architectural preparation for CPUFreq driver, including: Kconfig, register definition and platform device registration. Some of LoongArch processors support DVFS, their IOCSR.FEATURES has IOCSRF_FREQSCALE set. And they has a micro-core in the package called SMC (System Management Controller) to scale frequency, voltage, etc. Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20	LoongArch: Add writecombine support for DMW-based ioremap()	Huacai Chen	7	-17/+37
	Currently, only TLB-based ioremap() support writecombine, so add the counterpart for DMW-based ioremap() with help of DMW2. The base address (WRITECOMBINE_BASE) is configured as 0xa000000000000000. DMW3 is unused by kernel now, however firmware may leave garbage in them and interfere kernel's address mapping. So clear it as necessary. BTW, centralize the DMW configuration to macro SETUP_DMWINS. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20	LoongArch: Add ARCH_HAS_DEBUG_VM_PGTABLE support	Huacai Chen	2	-1/+2
	Add ARCH_HAS_DEBUG_VM_PGTABLE selection in Kconfig, in order to make corresponding vm debug features usable on LoongArch. Also update the corresponding arch-support.txt document. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20	LoongArch: Add ARCH_HAS_PTE_DEVMAP support	Huacai Chen	3	-2/+24
	In order for things like get_user_pages() to work on ZONE_DEVICE memory, we need a software PTE bit to identify device-backed PFNs. Hook this up along with the relevant helpers to join in with ARCH_HAS_PTE_DEVMAP. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20	LoongArch: Add RANDOMIZE_KSTACK_OFFSET support	Jinjie Ruan	2	-1/+22
	Add support of kernel stack offset randomization while handling syscall, the offset is defaultly limited by KSTACK_OFFSET_MAX(). In order to avoid triggering stack canaries (due to __builtin_alloca()) and slowing down the entry path, use __no_stack_protector attribute to disable stack protector for do_syscall() at function level. With this patch, the REPORT_STACK test show that: `loongarch64 bits of stack entropy: 7` Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20	LoongArch: Add irq_work support via self IPIs	Huacai Chen	5	-1/+34
	Add irq_work support for LoongArch via self IPIs. This make it possible to run works in hardware interrupt context, which is a prerequisite for NOHZ_FULL. Implement: - arch_irq_work_raise() - arch_irq_work_has_interrupt() Reviewed-by: Guo Ren <guoren@kernel.org> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20	LoongArch: Always enumerate MADT and setup logical-physical CPU mapping	Huacai Chen	3	-11/+22
	Some drivers want to use cpu_logical_map(), early_cpu_to_node() and some other CPU mapping APIs, even if we use "nr_cpus=1" to hard limit the CPU number. This is strongly required for the multi-bridges machines. Currently, we stop parsing the MADT if the nr_cpus limit is reached, but to achieve the above goal we should always enumerate the MADT table and setup logical-physical CPU mapping whether there is a nr_cpus limit. Rework the MADT enumeration: 1. Define a flag "cpu_enumerated" to distinguish the first enumeration (cpu_enumerated=0) and the physical hotplug case (cpu_enumerated=1) for set_processor_mask(). 2. If cpu_enumerated=0, stop parsing only when NR_CPUS limit is reached, so we can setup logical-physical CPU mapping; if cpu_enumerated=1, stop parsing when nr_cpu_ids limit is reached, so we can avoid some runtime bugs. Once logical-physical CPU mapping is setup, we will let cpu_enumerated=1. 3. Use find_first_zero_bit() instead of cpumask_next_zero() to find the next zero bit (free logical CPU id) in the cpu_present_mask, because cpumask_next_zero() will stop at nr_cpu_ids. 4. Only touch cpu_possible_mask if cpu_enumerated=0, this is in order to avoid some potential crashes, because cpu_possible_mask is marked as __ro_after_init. 5. In prefill_possible_map(), clear cpu_present_mask bits greater than nr_cpu_ids, in order to avoid a CPU be "present" but not "possible". Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20	LoongArch: Define __ARCH_WANT_NEW_STAT in unistd.h	Huacai Chen	2	-2/+2
	Chromium sandbox apparently wants to deny statx [1] so it could properly inspect arguments after the sandboxed process later falls back to fstat. Because there's currently not a "fd-only" version of statx, so that the sandbox has no way to ensure the path argument is empty without being able to peek into the sandboxed process's memory. For architectures able to do newfstatat though, glibc falls back to newfstatat after getting -ENOSYS for statx, then the respective SIGSYS handler [2] takes care of inspecting the path argument, transforming allowed newfstatat's into fstat instead which is allowed and has the same type of return value. But, as LoongArch is the first architecture to not have fstat nor newfstatat, the LoongArch glibc does not attempt falling back at all when it gets -ENOSYS for statx -- and you see the problem there! Actually, back when the LoongArch port was under review, people were aware of the same problem with sandboxing clone3 [3], so clone was eventually kept. Unfortunately it seemed at that time no one had noticed statx, so besides restoring fstat/newfstatat to LoongArch uapi (and postponing the problem further), it seems inevitable that we would need to tackle seccomp deep argument inspection. However, this is obviously a decision that shouldn't be taken lightly, so we just restore fstat/newfstatat by defining __ARCH_WANT_NEW_STAT in unistd.h. This is the simplest solution for now, and so we hope the community will tackle the long-standing problem of seccomp deep argument inspection in the future [4][5]. Also add "newstat" to syscall_abis_64 in Makefile.syscalls due to upstream asm-generic changes. More infomation please reading this thread [6]. [1] https://chromium-review.googlesource.com/c/chromium/src/+/2823150 [2] https://chromium.googlesource.com/chromium/src/sandbox/+/c085b51940bd/linux/seccomp-bpf-helpers/sigsys_handlers.cc#355 [3] https://lore.kernel.org/linux-arch/20220511211231.GG7074@brightrain.aerifal.cx/ [4] https://lwn.net/Articles/799557/ [5] https://lpc.events/event/4/contributions/560/attachments/397/640/deep-arg-inspection.pdf [6] https://lore.kernel.org/loongarch/20240226-granit-seilschaft-eccc2433014d@brauner/T/#t Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20	i2c: header: improve kdoc for i2c_algorithm	Wolfram Sang	1	-8/+7
	Reword the explanation of @xfer, the old one was confusing and mixing up terminology. Other than that, capitalize some words correctly and use full line length. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2024-07-20	i2c: header: remove unneeded stuff regarding i2c_algorithm	Wolfram Sang	1	-6/+0
	The forward declaration is not needed anymore. The sentence about "following structs" became obsolete when struct i2c_algorithm became a kdoc. The paragraph about return values can go because we have this information in kdoc already. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2024-07-20	i2c: piix4: Register SPDs	Thomas Weißschuh	2	-0/+10
	The piix4 I2C bus can carry SPDs, register them if present. Only look on bus 0, as this is where the SPDs seem to be located. Only the first 8 slots are supported. If the system has more, then these will not be visible. The AUX bus can not be probed as on some platforms it reports all devices present and all reads return "0". This would allow the ee1004 to be probed incorrectly. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-07-20	Makefile: add comment to discourage tools/* addition for kernel builds	Masahiro Yamada	1	-0/+6
	Kbuild provides scripts/Makefile.host to build host programs used for building the kernel. Unfortunately, there are two exceptions that opt out of Kbuild. The build system under tools/ is a cheesy replica, and cause issues. I was recently poked about a problem in the tools build system, which I do not maintain (and nobody maintains). [1] Without a comment, people might believe this is the right location because that is where objtool lives, even if a more robust Kbuild syntax satisfies their needs. [2] [1]: https://lore.kernel.org/linux-kbuild/ZnIYWBgrJ-IJtqK8@google.com/T/#m8ece130dd0e23c6f2395ed89070161948dee8457 [2]: https://lore.kernel.org/all/20240618200501.GA1611012@google.com/ Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Nicolas Schier <nicolas@fjasle.eu> Reviewed-by: Brian Norris <briannorris@chromium.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
2024-07-20	kbuild: clean up scripts/remove-stale-files	Masahiro Yamada	1	-18/+0
	These lines have been here for more than a year. Remove them. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-07-20	kconfig: recursive checks drop file/lineno	HONG Yifan	2	-48/+30
	This prevents segfault when getting filename and lineno in recursive checks. If the following snippet is found in Kconfig: [Test code 1] config FOO bool depends on BAR select BAR ... without BAR defined; then there is a segfault. Kconfig:34:error: recursive dependency detected! Kconfig:34: symbol FOO depends on BAR make[4]: * [scripts/kconfig/Makefile:85: allnoconfig] Segmentation fault This is because of the following. BAR is a fake entry created by sym_lookup() with prop being NULL. In the recursive check, there is a NULL check for prop to fall back to stack->sym->prop if stack->prop is NULL. However, in this case, stack->sym points to the fake BAR entry created by sym_lookup(), so prop is still NULL. prop was then referenced without additional NULL checks, causing segfault. As the previous email thread suggests, the file and lineno for select is also wrong: [Test code 2] config FOO bool config BAR bool config FOO bool "FOO" depends on BAR select BAR $ make defconfig * Default configuration is based on 'x86_64_defconfig' Kconfig:1:error: recursive dependency detected! Kconfig:1: symbol FOO depends on BAR Kconfig:4: symbol BAR is selected by FOO [...] Kconfig:4 should be Kconfig:10. This patch deletes the wrong and segfault-prone filename/lineno inference completely. With this patch, Test code 1 yields: error: recursive dependency detected! symbol FOO depends on BAR symbol BAR is selected by FOO Signed-off-by: HONG Yifan <elsk@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>