wireguard-linux - WireGuard for the Linux kernel

Age	Commit message (Collapse)	Author	Files	Lines
2023-07-28	tpm: Switch i2c drivers back to use .probe()	Uwe Kleine-König	6	-6/+6
	After commit b8a1a4cd5a98 ("i2c: Provide a temporary .probe_new() call-back type"), all drivers being converted to .probe_new() and then 03c835f498b5 ("i2c: Switch .probe() to not take an id parameter") convert back to (the new) .probe() to be able to eventually drop .probe_new() from struct i2c_driver. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-07-28	security: keys: perform capable check only on privileged operations	Christian Göttsche	1	-3/+8
	If the current task fails the check for the queried capability via `capable(CAP_SYS_ADMIN)` LSMs like SELinux generate a denial message. Issuing such denial messages unnecessarily can lead to a policy author granting more privileges to a subject than needed to silence them. Reorder CAP_SYS_ADMIN checks after the check whether the operation is actually privileged. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-07-28	Revert "mm,memblock: reset memblock.reserved to system init state to prevent UAF"	Mike Rapoport (IBM)	1	-4/+0
	This reverts commit 9e46e4dcd9d6cd88342b028dbfa5f4fb7483d39c. kbuild reports a warning in memblock_remove_region() because of a false positive caused by partial reset of the memblock state. Doing the full reset will remove the false positives, but will allow late use of memblock_free() to go unnoticed, so it is better to revert the offending commit. WARNING: CPU: 0 PID: 1 at mm/memblock.c:352 memblock_remove_region (kbuild/src/x86_64/mm/memblock.c:352 (discriminator 1)) Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc3-00001-g9e46e4dcd9d6 #2 RIP: 0010:memblock_remove_region (kbuild/src/x86_64/mm/memblock.c:352 (discriminator 1)) Call Trace: memblock_discard (kbuild/src/x86_64/mm/memblock.c:383) page_alloc_init_late (kbuild/src/x86_64/include/linux/find.h:208 kbuild/src/x86_64/include/linux/nodemask.h:266 kbuild/src/x86_64/mm/mm_init.c:2405) kernel_init_freeable (kbuild/src/x86_64/init/main.c:1325 kbuild/src/x86_64/init/main.c:1546) kernel_init (kbuild/src/x86_64/init/main.c:1439) ret_from_fork (kbuild/src/x86_64/arch/x86/kernel/process.c:145) ret_from_fork_asm (kbuild/src/x86_64/arch/x86/entry/entry_64.S:298) Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202307271656.447aa17e-oliver.sang@intel.com Signed-off-by: "Mike Rapoport (IBM)" <rppt@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-07-28	mm/mempolicy: Take VMA lock before replacing policy	Jann Horn	1	-1/+14
	mbind() calls down into vma_replace_policy() without taking the per-VMA locks, replaces the VMA's vma->vm_policy pointer, and frees the old policy. That's bad; a concurrent page fault might still be using the old policy (in vma_alloc_folio()), resulting in use-after-free. Normally this will manifest as a use-after-free read first, but it can result in memory corruption, including because vma_alloc_folio() can call mpol_cond_put() on the freed policy, which conditionally changes the policy's refcount member. This bug is specific to CONFIG_NUMA, but it does also affect non-NUMA systems as long as the kernel was built with CONFIG_NUMA. Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it") Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-07-28	ACPI/IORT: Remove erroneous id_count check in iort_node_get_rmr_info()	Guanghui Feng	1	-3/+0
	According to the ARM IORT specifications DEN 0049 issue E, the "Number of IDs" field in the ID mapping format reports the number of IDs in the mapping range minus one. In iort_node_get_rmr_info(), we erroneously skip ID mappings whose "Number of IDs" equal to 0, resulting in valid mapping nodes with a single ID to map being skipped, which is wrong. Fix iort_node_get_rmr_info() by removing the bogus id_count check. Fixes: 491cf4a6735a ("ACPI/IORT: Add support to retrieve IORT RMR reserved regions") Signed-off-by: Guanghui Feng <guanghuifeng@linux.alibaba.com> Cc: <stable@vger.kernel.org> # 6.0.x Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Tested-by: Hanjun Guo <guohanjun@huawei.com> Link: https://lore.kernel.org/r/1689593625-45213-1-git-send-email-guanghuifeng@linux.alibaba.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-07-28	ata: libata-core: fix when to fetch sense data for successful commands	Niklas Cassel	1	-2/+2
	The condition to fetch sense data was supposed to be: ATA_SENSE set AND either 1) Command was NCQ and ATA_DFLAG_CDL_ENABLED flag set (flag ATA_DFLAG_CDL_ENABLED will only be set if the Successful NCQ command sense data supported bit is set); or 2) Command was non-NCQ and regular sense data reporting is enabled. However the check in 2) accidentally had the negation at the wrong place, causing it to try to fetch sense data if it was a non-NCQ command _or_ if regular sense data reporting was _not_ enabled. Fix this by removing the extra parentheses that should not be there, such that only the correct return (ata_is_ncq()) is negated. Fixes: 18bd7718b5c4 ("scsi: ata: libata: Handle completion of CDL commands using policy 0xD") Reported-by: Borislav Petkov <bp@alien8.de> Closes: https://lore.kernel.org/linux-ide/20230722155621.GIZLv8JbURKzHtKvQE@fat_crate.local/ Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2023-07-28	ata: pata_ns87415: mark ns87560_tf_read static	Arnd Bergmann	1	-1/+1
	The global function triggers a warning because of the missing prototype drivers/ata/pata_ns87415.c:263:6: warning: no previous prototype for 'ns87560_tf_read' [-Wmissing-prototypes] 263 \| void ns87560_tf_read(struct ata_port ap, struct ata_taskfile tf) There are no other references to this, so just make it static. Fixes: c4b5b7b6c4423 ("pata_ns87415: Initial cut at 87415/87560 IDE support") Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2023-07-27	mm/memory-failure: fix hardware poison check in unpoison_memory()	Sidhartha Kumar	1	-1/+1
	It was pointed out[1] that using folio_test_hwpoison() is wrong as we need to check the indiviual page that has poison. folio_test_hwpoison() only checks the head page so go back to using PageHWPoison(). User-visible effects include existing hwpoison-inject tests possibly failing as unpoisoning a single subpage could lead to unpoisoning an entire folio. Memory unpoisoning could also not work as expected as the function will break early due to only checking the head page and not the actually poisoned subpage. [1]: https://lore.kernel.org/lkml/ZLIbZygG7LqSI9xe@casper.infradead.org/ Link: https://lkml.kernel.org/r/20230717181812.167757-1-sidhartha.kumar@oracle.com Fixes: a6fddef49eef ("mm/memory-failure: convert unpoison_memory() to folios") Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27	proc/vmcore: fix signedness bug in read_from_oldmem()	Dan Carpenter	1	-1/+1
	The bug is the error handling: if (tmp < nr_bytes) { "tmp" can hold negative error codes but because "nr_bytes" is type size_t the negative error codes are treated as very high positive values (success). Fix this by changing "nr_bytes" to type ssize_t. The "nr_bytes" variable is used to store values between 1 and PAGE_SIZE and they can fit in ssize_t without any issue. Link: https://lkml.kernel.org/r/b55f7eed-1c65-4adc-95d1-6c7c65a54a6e@moroto.mountain Fixes: 5d8de293c224 ("vmcore: convert copy_oldmem_page() to take an iov_iter") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27	mailmap: update remaining active codeaurora.org email addresses	Bjorn Andersson	1	-1/+96
	The lack of mailmap updates for @codeaurora.org addresses reduces the usefulness of tools such as get_maintainer.pl. Some recent (and welcome!) additions has been made to improve the situation, this concludes the effort. Link: https://lkml.kernel.org/r/20230720210256.1296567-1-quic_bjorande@quicinc.com Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27	mm: lock VMA in dup_anon_vma() before setting ->anon_vma	Jann Horn	1	-0/+1
	When VMAs are merged, dup_anon_vma() is called with `dst` pointing to the VMA that is being expanded to cover the area previously occupied by another VMA. This currently happens while `dst` is not write-locked. This means that, in the `src->anon_vma && !dst->anon_vma` case, as soon as the assignment `dst->anon_vma = src->anon_vma` has happened, concurrent page faults can happen on `dst` under the per-VMA lock. This is already icky in itself, since such page faults can now install pages into `dst` that are attached to an `anon_vma` that is not yet tied back to the `anon_vma` with an `anon_vma_chain`. But if `anon_vma_clone()` fails due to an out-of-memory error, things get much worse: `anon_vma_clone()` then reverts `dst->anon_vma` back to NULL, and `dst` remains completely unconnected to the `anon_vma`, even though we can have pages in the area covered by `dst` that point to the `anon_vma`. This means the `anon_vma` of such pages can be freed while the pages are still mapped into userspace, which leads to UAF when a helper like folio_lock_anon_vma_read() tries to look up the anon_vma of such a page. This theoretically is a security bug, but I believe it is really hard to actually trigger as an unprivileged user because it requires that you can make an order-0 GFP_KERNEL allocation fail, and the page allocator tries pretty hard to prevent that. I think doing the vma_start_write() call inside dup_anon_vma() is the most straightforward fix for now. For a kernel-assisted reproducer, see the notes section of the patch mail. Link: https://lkml.kernel.org/r/20230721034643.616851-1-jannh@google.com Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it") Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27	mm: fix memory ordering for mm_lock_seq and vm_lock_seq	Jann Horn	3	-8/+59
	mm->mm_lock_seq effectively functions as a read/write lock; therefore it must be used with acquire/release semantics. A specific example is the interaction between userfaultfd_register() and lock_vma_under_rcu(). userfaultfd_register() does the following from the point where it changes a VMA's flags to the point where concurrent readers are permitted again (in a simple scenario where only a single private VMA is accessed and no merging/splitting is involved): userfaultfd_register userfaultfd_set_vm_flags vm_flags_reset vma_start_write down_write(&vma->vm_lock->lock) vma->vm_lock_seq = mm_lock_seq [marks VMA as busy] up_write(&vma->vm_lock->lock) vm_flags_init [sets VM_UFFD_* in __vm_flags] vma->vm_userfaultfd_ctx.ctx = ctx mmap_write_unlock vma_end_write_all WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1) [unlocks VMA] There are no memory barriers in between the __vm_flags update and the mm->mm_lock_seq update that unlocks the VMA, so the unlock can be reordered to above the `vm_flags_init()` call, which means from the perspective of a concurrent reader, a VMA can be marked as a userfaultfd VMA while it is not VMA-locked. That's bad, we definitely need a store-release for the unlock operation. The non-atomic write to vma->vm_lock_seq in vma_start_write() is mostly fine because all accesses to vma->vm_lock_seq that matter are always protected by the VMA lock. There is a racy read in vma_start_read() though that can tolerate false-positives, so we should be using WRITE_ONCE() to keep things tidy and data-race-free (including for KCSAN). On the other side, lock_vma_under_rcu() works as follows in the relevant region for locking and userfaultfd check: lock_vma_under_rcu vma_start_read vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [early bailout] down_read_trylock(&vma->vm_lock->lock) vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [main check] userfaultfd_armed checks vma->vm_flags & __VM_UFFD_FLAGS Here, the interesting aspect is how far down the mm->mm_lock_seq read can be reordered - if this read is reordered down below the vma->vm_flags access, this could cause lock_vma_under_rcu() to partly operate on information that was read while the VMA was supposed to be locked. To prevent this kind of downwards bleeding of the mm->mm_lock_seq read, we need to read it with a load-acquire. Some of the comment wording is based on suggestions by Suren. BACKPORT WARNING: One of the functions changed by this patch (which I've written against Linus' tree) is vma_try_start_write(), but this function no longer exists in mm/mm-everything. I don't know whether the merged version of this patch will be ordered before or after the patch that removes vma_try_start_write(). If you're backporting this patch to a tree with vma_try_start_write(), make sure this patch changes that function. Link: https://lkml.kernel.org/r/20230721225107.942336-1-jannh@google.com Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it") Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27	scripts/spelling.txt: remove 'thead' as a typo	Drew Fustini	1	-1/+0
	T-Head is a vendor of processor core IP, and they have recently introduced the RISC-V TH1520 SoC. Remove 'thead' as a typo of 'thread' to avoid checkpatch incorrectly warning that 'thead' is typo in patches that add support for T-Head designs in the kernel. Link: https://lkml.kernel.org/r/20230723010329.674186-1-dfustini@baylibre.com Link: https://www.t-head.cn/ Signed-off-by: Drew Fustini <dfustini@baylibre.com> Acked-by: Guo Ren <guoren@kernel.org> Cc: Conor Dooley <conor@kernel.org> Cc: Jisheng Zhang <jszhang@kernel.org> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Diederik de Haas <didi.debian@cknow.org> Cc: Ian Rogers <irogers@google.com> Cc: Luca Ceresoli <luca.ceresoli@bootlin.com> # versaclock5 Cc: Randy Dunlap <rdunlap@infradead.org> Cc: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27	mm/pagewalk: fix EFI_PGT_DUMP of espfix area	Hugh Dickins	1	-1/+4
	Booting x86_64 with CONFIG_EFI_PGT_DUMP=y shows messages of the form "mm/pgtable-generic.c:53: bad pmd (____ptrval____)(8000000100077061)". EFI_PGT_DUMP dumps all of efi_mm, including the espfix area, which is set up with pmd entries which fit the pmd_bad() check: so 0d940a9b270b warns and clears those entries, which would ruin running Win16 binaries. The failing pte_offset_map() stopped such a kernel from even booting, until a few commits later be872f83bf57 changed the pagewalk to tolerate that: but it needs to be even more careful, to not spoil those entries. I might have preferred to change init_espfix_ap() not to use "bad" pmd entries; or to leave them out of the efi_mm dump. But there is great value in staying away from there, and a pagewalk check of address against TASK_SIZE may protect from other such aberrations too. Link: https://lkml.kernel.org/r/22bca736-4cab-9ee5-6a52-73a3b2bbe865@google.com Closes: https://lore.kernel.org/linux-mm/CABXGCsN3JqXckWO=V7p=FhPU1tK03RE1w9UE6xL5Y86SMk209w@mail.gmail.com/ Fixes: 0d940a9b270b ("mm/pgtable: allow pte_offset_map[_lock]() to fail") Fixes: be872f83bf57 ("mm/pagewalk: walk_pte_range() allow for pte_offset_map()") Signed-off-by: Hugh Dickins <hughd@google.com> Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Laura Abbott <labbott@fedoraproject.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27	shmem: minor fixes to splice-read implementation	Hugh Dickins	1	-3/+6
	HWPoison: my reading of folio_test_hwpoison() is that it only tests the head page of a large folio, whereas splice_folio_into_pipe() will splice as much of the folio as it can: so for safety we should also check the has_hwpoisoned flag, set if any of the folio's pages are hwpoisoned. (Perhaps that ugliness can be improved at the mm end later.) The call to splice_zeropage_into_pipe() risked overrunning past EOF: ask it for "part" not "len". Link: https://lkml.kernel.org/r/32c72c9c-72a8-115f-407d-f0148f368@google.com Fixes: bd194b187115 ("shmem: Implement splice-read") Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: David Howells <dhowells@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27	tmpfs: fix Documentation of noswap and huge mount options	Hugh Dickins	1	-27/+20
	The noswap mount option is surely not one of the three options for sizing: move its description down. The huge= mount option does not accept numeric values: those are just in an internal enum. Delete those numbers, and follow the manpage text more closely (but there's not yet any fadvise() or fcntl() which applies here). /sys/kernel/mm/transparent_hugepage/shmem_enabled is hard to describe, and barely relevant to mounting a tmpfs: just refer to transhuge.rst (while still using the words deny and force, to help as informal reminders). [rdunlap@infradead.org: fixup Docs table for huge mount options] Link: https://lkml.kernel.org/r/20230725052333.26857-1-rdunlap@infradead.org Link: https://lkml.kernel.org/r/986cb0bf-9780-354-9bb-4bf57aadbab@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Fixes: d0f5a85442d1 ("shmem: update documentation") Fixes: 2c6efe9cf2d7 ("shmem: add support to ignore swap") Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27	Revert "um: Use swap() to make code cleaner"	Andy Shevchenko	1	-3/+4
	This reverts commit 9b0da3f22307af693be80f5d3a89dc4c7f360a85. The sigio.c is clearly user space code which is handled by arch/um/scripts/Makefile.rules (see USER_OBJS rule). The above mentioned commit simply broke this agreement, we may not use Linux kernel internal headers in them without thorough thinking. Hence, revert the wrong commit. Link: https://lkml.kernel.org/r/20230724143131.30090-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202307212304.cH79zJp1-lkp@intel.com/ Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Herve Codina <herve.codina@bootlin.com> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Richard Weinberger <richard@nod.at> Cc: Yang Guang <yang.guang5@zte.com.cn> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27	mm/damon/core-test: initialise context before test in damon_test_set_attrs()	Feng Tang	1	-5/+5
	Running kunit test for 6.5-rc1 hits one bug: ok 10 damon_test_update_monitoring_result general protection fault, probably for non-canonical address 0x1bffa5c419cfb81: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 110 Comm: kunit_try_catch Tainted: G N 6.5.0-rc2 #15 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:damon_set_attrs+0xb9/0x120 Code: f8 00 00 00 4c 8d 58 e0 48 39 c3 74 ba 41 ba 59 17 b7 d1 49 8b 43 10 4d 8d 4b 10 48 8d 70 e0 49 39 c1 74 50 49 8b 40 08 31 d2 <69> 4e 18 10 27 00 00 49 f7 30 31 d2 48 89 c5 89 c8 f7 f5 31 d2 89 RSP: 0000:ffffc900005bfd40 EFLAGS: 00010246 RAX: ffffffff81159fc0 RBX: ffffc900005bfeb8 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 01bffa5c419cfb69 RDI: ffffc900005bfd70 RBP: ffffc90000013c10 R08: ffffc900005bfdc0 R09: ffffffff81ff10ed R10: 00000000d1b71759 R11: ffffffff81ff10dd R12: ffffc90000013a78 R13: ffff88810eb78180 R14: ffffffff818297c0 R15: ffffc90000013c28 FS: 0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000002a1c001 CR4: 0000000000370ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> damon_test_set_attrs+0x63/0x1f0 kunit_generic_run_threadfn_adapter+0x17/0x30 kthread+0xfd/0x130 The problem seems to be related with the damon_ctx was used without being initialized. Fix it by adding the initialization. Link: https://lkml.kernel.org/r/20230718052811.1065173-1-feng.tang@intel.com Fixes: aa13779be6b7 ("mm/damon/core-test: add a test for damon_set_attrs()") Signed-off-by: Feng Tang <feng.tang@intel.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27	mm: lock_vma_under_rcu() must check vma->anon_vma under vma lock	Jann Horn	1	-12/+16
	lock_vma_under_rcu() tries to guarantee that __anon_vma_prepare() can't be called in the VMA-locked page fault path by ensuring that vma->anon_vma is set. However, this check happens before the VMA is locked, which means a concurrent move_vma() can concurrently call unlink_anon_vmas(), which disassociates the VMA's anon_vma. This means we can get UAF in the following scenario: THREAD 1 THREAD 2 ======== ======== <page fault> lock_vma_under_rcu() rcu_read_lock() mas_walk() check vma->anon_vma mremap() syscall move_vma() vma_start_write() unlink_anon_vmas() <syscall end> handle_mm_fault() __handle_mm_fault() handle_pte_fault() do_pte_missing() do_anonymous_page() anon_vma_prepare() __anon_vma_prepare() find_mergeable_anon_vma() mas_walk() [looks up VMA X] munmap() syscall (deletes VMA X) reusable_anon_vma() [called on freed VMA X] This is a security bug if you can hit it, although an attacker would have to win two races at once where the first race window is only a few instructions wide. This patch is based on some previous discussion with Linus Torvalds on the security list. Cc: stable@vger.kernel.org Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-07-27	hwmon: (k10temp) Enable AMD3255 Proc to show negative temperature	Baskaran Kannan	1	-2/+15
	Industrial processor i3255 supports temperatures -40 deg celcius to 105 deg Celcius. The current implementation of k10temp_read_temp rounds off any negative temperatures to '0'. To fix this, the following changes have been made. A flag 'disp_negative' is added to struct k10temp_data to support AMD i3255 processors. Flag 'disp_negative' is set if 3255 processor is found during k10temp_probe. Flag 'disp_negative' is used to determine whether to round off negative temperatures to '0' in k10temp_read_temp. Signed-off-by: Baskaran Kannan <Baski.Kannan@amd.com> Link: https://lore.kernel.org/r/20230727162159.1056136-1-Baski.Kannan@amd.com Fixes: aef17ca12719 ("hwmon: (k10temp) Only apply temperature offset if result is positive") Cc: stable@vger.kernel.org [groeck: Fixed multi-line comment] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2023-07-27	iommufd: Set end correctly when doing batch carry	Jason Gunthorpe	1	-1/+1
	Even though the test suite covers this it somehow became obscured that this wasn't working. The test iommufd_ioas.mock_domain.access_domain_destory would blow up rarely. end should be set to 1 because this just pushed an item, the carry, to the pfns list. Sometimes the test would blow up with: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP CPU: 5 PID: 584 Comm: iommufd Not tainted 6.5.0-rc1-dirty #1236 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:batch_unpin+0xa2/0x100 [iommufd] Code: 17 48 81 fe ff ff 07 00 77 70 48 8b 15 b7 be 97 e2 48 85 d2 74 14 48 8b 14 fa 48 85 d2 74 0b 40 0f b6 f6 48 c1 e6 04 48 01 f2 <48> 8b 3a 48 c1 e0 06 89 ca 48 89 de 48 83 e7 f0 48 01 c7 e8 96 dc RSP: 0018:ffffc90001677a58 EFLAGS: 00010246 RAX: 00007f7e2646f000 RBX: 0000000000000000 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 00000000fefc4c8d RDI: 0000000000fefc4c RBP: ffffc90001677a80 R08: 0000000000000048 R09: 0000000000000200 R10: 0000000000030b98 R11: ffffffff81f3bb40 R12: 0000000000000001 R13: ffff888101f75800 R14: ffffc90001677ad0 R15: 00000000000001fe FS: 00007f9323679740(0000) GS:ffff8881ba540000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000105ede003 CR4: 00000000003706a0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? show_regs+0x5c/0x70 ? __die+0x1f/0x60 ? page_fault_oops+0x15d/0x440 ? lock_release+0xbc/0x240 ? exc_page_fault+0x4a4/0x970 ? asm_exc_page_fault+0x27/0x30 ? batch_unpin+0xa2/0x100 [iommufd] ? batch_unpin+0xba/0x100 [iommufd] __iopt_area_unfill_domain+0x198/0x430 [iommufd] ? __mutex_lock+0x8c/0xb80 ? __mutex_lock+0x6aa/0xb80 ? xa_erase+0x28/0x30 ? iopt_table_remove_domain+0x162/0x320 [iommufd] ? lock_release+0xbc/0x240 iopt_area_unfill_domain+0xd/0x10 [iommufd] iopt_table_remove_domain+0x195/0x320 [iommufd] iommufd_hw_pagetable_destroy+0xb3/0x110 [iommufd] iommufd_object_destroy_user+0x8e/0xf0 [iommufd] iommufd_device_detach+0xc5/0x140 [iommufd] iommufd_selftest_destroy+0x1f/0x70 [iommufd] iommufd_object_destroy_user+0x8e/0xf0 [iommufd] iommufd_destroy+0x3a/0x50 [iommufd] iommufd_fops_ioctl+0xfb/0x170 [iommufd] __x64_sys_ioctl+0x40d/0x9a0 do_syscall_64+0x3c/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Link: https://lore.kernel.org/r/3-v1-85aacb2af554+bc-iommufd_syz3_jgg@nvidia.com Cc: <stable@vger.kernel.org> Fixes: f394576eb11d ("iommufd: PFN handling for iopt_pages") Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reported-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-07-27	iommufd: IOMMUFD_DESTROY should not increase the refcount	Jason Gunthorpe	3	-30/+75
	syzkaller found a race where IOMMUFD_DESTROY increments the refcount: obj = iommufd_get_object(ucmd->ictx, cmd->id, IOMMUFD_OBJ_ANY); if (IS_ERR(obj)) return PTR_ERR(obj); iommufd_ref_to_users(obj); /* See iommufd_ref_to_users() */ if (!iommufd_object_destroy_user(ucmd->ictx, obj)) As part of the sequence to join the two existing primitives together. Allowing the refcount the be elevated without holding the destroy_rwsem violates the assumption that all temporary refcount elevations are protected by destroy_rwsem. Racing IOMMUFD_DESTROY with iommufd_object_destroy_user() will cause spurious failures: WARNING: CPU: 0 PID: 3076 at drivers/iommu/iommufd/device.c:477 iommufd_access_destroy+0x18/0x20 drivers/iommu/iommufd/device.c:478 Modules linked in: CPU: 0 PID: 3076 Comm: syz-executor.0 Not tainted 6.3.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/03/2023 RIP: 0010:iommufd_access_destroy+0x18/0x20 drivers/iommu/iommufd/device.c:477 Code: e8 3d 4e 00 00 84 c0 74 01 c3 0f 0b c3 0f 1f 44 00 00 f3 0f 1e fa 48 89 fe 48 8b bf a8 00 00 00 e8 1d 4e 00 00 84 c0 74 01 c3 <0f> 0b c3 0f 1f 44 00 00 41 57 41 56 41 55 4c 8d ae d0 00 00 00 41 RSP: 0018:ffffc90003067e08 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff888109ea0300 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000ffffffff RBP: 0000000000000004 R08: 0000000000000000 R09: ffff88810bbb3500 R10: ffff88810bbb3e48 R11: 0000000000000000 R12: ffffc90003067e88 R13: ffffc90003067ea8 R14: ffff888101249800 R15: 00000000fffffffe FS: 00007ff7254fe6c0(0000) GS:ffff888237c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000555557262da8 CR3: 000000010a6fd000 CR4: 0000000000350ef0 Call Trace: <TASK> iommufd_test_create_access drivers/iommu/iommufd/selftest.c:596 [inline] iommufd_test+0x71c/0xcf0 drivers/iommu/iommufd/selftest.c:813 iommufd_fops_ioctl+0x10f/0x1b0 drivers/iommu/iommufd/main.c:337 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl fs/ioctl.c:856 [inline] __x64_sys_ioctl+0x84/0xc0 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0x80 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd The solution is to not increment the refcount on the IOMMUFD_DESTROY path at all. Instead use the xa_lock to serialize everything. The refcount check == 1 and xa_erase can be done under a single critical region. This avoids the need for any refcount incrementing. It has the downside that if userspace races destroy with other operations it will get an EBUSY instead of waiting, but this is kind of racing is already dangerous. Fixes: 2ff4bed7fee7 ("iommufd: File descriptor, context, kconfig and makefiles") Link: https://lore.kernel.org/r/2-v1-85aacb2af554+bc-iommufd_syz3_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reported-by: syzbot+7574ebfe589049630608@syzkaller.appspotmail.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-07-27	ublk: return -EINTR if breaking from waiting for existed users in DEL_DEV	Ming Lei	1	-2/+2
	If user interrupts wait_event_interruptible() in ublk_ctrl_del_dev(), return -EINTR and let user know what happens. Fixes: 0abe39dec065 ("block: ublk: improve handling device deletion") Reported-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20230726144502.566785-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-27	ublk: fail to recover device if queue setup is interrupted	Ming Lei	1	-1/+3
	In ublk_ctrl_end_recovery(), if wait_for_completion_interruptible() is interrupted by signal, queues aren't setup successfully yet, so we have to fail UBLK_CMD_END_USER_RECOVERY, otherwise kernel oops can be triggered. Fixes: c732a852b419 ("ublk_drv: add START_USER_RECOVERY and END_USER_RECOVERY support") Reported-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20230726144502.566785-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-27	ublk: fail to start device if queue setup is interrupted	Ming Lei	1	-1/+2
	In ublk_ctrl_start_dev(), if wait_for_completion_interruptible() is interrupted by signal, queues aren't setup successfully yet, so we have to fail UBLK_CMD_START_DEV, otherwise kernel oops can be triggered. Reported by German when working on qemu-storage-deamon which requires single thread ublk daemon. Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver") Reported-by: German Maglione <gmaglione@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230726144502.566785-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-27	tipc: stop tipc crypto on failure in tipc_node_create	Fedor Pchelkin	1	-1/+1
	If tipc_link_bc_create() fails inside tipc_node_create() for a newly allocated tipc node then we should stop its tipc crypto and free the resources allocated with a call to tipc_crypto_start(). As the node ref is initialized to one to that point, just put the ref on tipc_link_bc_create() error case that would lead to tipc_node_free() be eventually executed and properly clean the node and its crypto resources. Found by Linux Verification Center (linuxtesting.org). Fixes: cb8092d70a6f ("tipc: move bc link creation back to tipc_node_create") Suggested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20230725214628.25246-1-pchelkin@ispras.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-27	af_unix: Terminate sun_path when bind()ing pathname socket.	Kuniyuki Iwashima	1	-5/+16
	kernel test robot reported slab-out-of-bounds access in strlen(). [0] Commit 06d4c8a80836 ("af_unix: Fix fortify_panic() in unix_bind_bsd().") removed unix_mkname_bsd() call in unix_bind_bsd(). If sunaddr->sun_path is not terminated by user and we don't enable CONFIG_INIT_STACK_ALL_ZERO=y, strlen() will do the out-of-bounds access during file creation. Let's go back to strlen()-with-sockaddr_storage way and pack all 108 trickiness into unix_mkname_bsd() with bold comments. [0]: BUG: KASAN: slab-out-of-bounds in strlen (lib/string.c:?) Read of size 1 at addr ffff000015492777 by task fortify_strlen_/168 CPU: 0 PID: 168 Comm: fortify_strlen_ Not tainted 6.5.0-rc1-00333-g3329b603ebba #16 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace (arch/arm64/kernel/stacktrace.c:235) show_stack (arch/arm64/kernel/stacktrace.c:242) dump_stack_lvl (lib/dump_stack.c:107) print_report (mm/kasan/report.c:365 mm/kasan/report.c:475) kasan_report (mm/kasan/report.c:590) __asan_report_load1_noabort (mm/kasan/report_generic.c:378) strlen (lib/string.c:?) getname_kernel (./include/linux/fortify-string.h:? fs/namei.c:226) kern_path_create (fs/namei.c:3926) unix_bind (net/unix/af_unix.c:1221 net/unix/af_unix.c:1324) __sys_bind (net/socket.c:1792) __arm64_sys_bind (net/socket.c:1801) invoke_syscall (arch/arm64/kernel/syscall.c:? arch/arm64/kernel/syscall.c:52) el0_svc_common (./include/linux/thread_info.h:127 arch/arm64/kernel/syscall.c:147) do_el0_svc (arch/arm64/kernel/syscall.c:189) el0_svc (./arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:144 arch/arm64/kernel/entry-common.c:648) el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:?) el0t_64_sync (arch/arm64/kernel/entry.S:591) Allocated by task 168: kasan_set_track (mm/kasan/common.c:45 mm/kasan/common.c:52) kasan_save_alloc_info (mm/kasan/generic.c:512) __kasan_kmalloc (mm/kasan/common.c:383) __kmalloc (mm/slab_common.c:? mm/slab_common.c:998) unix_bind (net/unix/af_unix.c:257 net/unix/af_unix.c:1213 net/unix/af_unix.c:1324) __sys_bind (net/socket.c:1792) __arm64_sys_bind (net/socket.c:1801) invoke_syscall (arch/arm64/kernel/syscall.c:? arch/arm64/kernel/syscall.c:52) el0_svc_common (./include/linux/thread_info.h:127 arch/arm64/kernel/syscall.c:147) do_el0_svc (arch/arm64/kernel/syscall.c:189) el0_svc (./arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:144 arch/arm64/kernel/entry-common.c:648) el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:?) el0t_64_sync (arch/arm64/kernel/entry.S:591) The buggy address belongs to the object at ffff000015492700 which belongs to the cache kmalloc-128 of size 128 The buggy address is located 0 bytes to the right of allocated 119-byte region [ffff000015492700, ffff000015492777) The buggy address belongs to the physical page: page:00000000aeab52ba refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x55492 anon flags: 0x3fffc0000000200(slab\|node=0\|zone=0\|lastcpupid=0xffff) page_type: 0xffffffff() raw: 03fffc0000000200 ffff0000084018c0 fffffc00003d0e00 0000000000000005 raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff000015492600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff000015492680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff000015492700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 07 fc ^ ffff000015492780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff000015492800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: 06d4c8a80836 ("af_unix: Fix fortify_panic() in unix_bind_bsd().") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/netdev/202307262110.659e5e8-oliver.sang@intel.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230726190828.47874-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-27	tipc: check return value of pskb_trim()	Yuanjun Gong	1	-1/+2
	goto free_skb if an unexpected result is returned by pskb_tirm() in tipc_crypto_rcv_complete(). Fixes: fc1b6d6de220 ("tipc: introduce TIPC encryption & authentication") Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com> Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Link: https://lore.kernel.org/r/20230725064810.5820-1-ruc_gongyuanjun@163.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-27	benet: fix return value check in be_lancer_xmit_workarounds()	Yuanjun Gong	1	-1/+2
	in be_lancer_xmit_workarounds(), it should go to label 'tx_drop' if an unexpected value is returned by pskb_trim(). Fixes: 93040ae5cc8d ("be2net: Fix to trim skb for padded vlan packets to workaround an ASIC Bug") Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com> Link: https://lore.kernel.org/r/20230725032726.15002-1-ruc_gongyuanjun@163.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-27	ALSA: hda/realtek: Support ASUS G713PV laptop	Pavel Asyutchenko	1	-0/+1
	This laptop has CS35L41 amp connected via I2C. With this patch speakers begin to work if the missing _DSD properties are added to ACPI tables. Signed-off-by: Pavel Asyutchenko <svenpavel@gmail.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20230726223732.20775-1-svenpavel@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-07-27	xen: speed up grant-table reclaim	Demi Marie Obenour	2	-11/+40
	When a grant entry is still in use by the remote domain, Linux must put it on a deferred list. Normally, this list is very short, because the PV network and block protocols expect the backend to unmap the grant first. However, Qubes OS's GUI protocol is subject to the constraints of the X Window System, and as such winds up with the frontend unmapping the window first. As a result, the list can grow very large, resulting in a massive memory leak and eventual VM freeze. To partially solve this problem, make the number of entries that the VM will attempt to free at each iteration tunable. The default is still 10, but it can be overridden via a module parameter. This is Cc: stable because (when combined with appropriate userspace changes) it fixes a severe performance and stability problem for Qubes OS users. Cc: stable@vger.kernel.org Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230726165354.1252-1-demi@invisiblethingslab.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-07-26	virtio-net: fix race between set queues and probe	Jason Wang	1	-2/+2
	A race were found where set_channels could be called after registering but before virtnet_set_queues() in virtnet_probe(). Fixing this by moving the virtnet_set_queues() before netdevice registering. While at it, use _virtnet_set_queues() to avoid holding rtnl as the device is not even registered at that time. Cc: stable@vger.kernel.org Fixes: a220871be66f ("virtio-net: correctly enable multiqueue") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20230725072049.617289-1-jasowang@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-26	net/sched: mqprio: Add length check for TCA_MQPRIO_{MAX/MIN}_RATE64	Lin Ma	1	-0/+14
	The nla_for_each_nested parsing in function mqprio_parse_nlattr() does not check the length of the nested attribute. This can lead to an out-of-attribute read and allow a malformed nlattr (e.g., length 0) to be viewed as 8 byte integer and passed to priv->max_rate/min_rate. This patch adds the check based on nla_len() when check the nla_type(), which ensures that the length of these two attribute must equals sizeof(u64). Fixes: 4e8b86c06269 ("mqprio: Introduce new hardware offload mode and shaper in mqprio") Reviewed-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Lin Ma <linma@zju.edu.cn> Link: https://lore.kernel.org/r/20230725024227.426561-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-26	splice, net: Fix splice_to_socket() for O_NONBLOCK socket	Jan Stancek	1	-0/+2
	LTP sendfile07 [1], which expects sendfile() to return EAGAIN when transferring data from regular file to a "full" O_NONBLOCK socket, started failing after commit 2dc334f1a63a ("splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage()"). sendfile() no longer immediately returns, but now blocks. Removed sock_sendpage() handled this case by setting a MSG_DONTWAIT flag, fix new splice_to_socket() to do the same for O_NONBLOCK sockets. [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/sendfile/sendfile07.c Fixes: 2dc334f1a63a ("splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage()") Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Jan Stancek <jstancek@redhat.com> Tested-by: Xi Ruoyao <xry111@xry111.site> Link: https://lore.kernel.org/r/023c0e21e595e00b93903a813bc0bfb9a5d7e368.1690219914.git.jstancek@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-26	net: fec: tx processing does not call XDP APIs if budget is 0	Wei Fang	1	-4/+12
	According to the clarification [1] in the latest napi.rst, the tx processing cannot call any XDP (or page pool) APIs if the "budget" is 0. Because NAPI is called with the budget of 0 (such as netpoll) indicates we may be in an IRQ context, however, we cannot use the page pool from IRQ context. [1] https://lore.kernel.org/all/20230720161323.2025379-1-kuba@kernel.org/ Fixes: 20f797399035 ("net: fec: recycle pages for transmitted XDP frames") Signed-off-by: Wei Fang <wei.fang@nxp.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230725074148.2936402-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-26	hwmon: (pmbus_core) Fix Deadlock in pmbus_regulator_get_status	Guenter Roeck	1	-1/+2
	pmbus_regulator_get_status() acquires update_lock. pmbus_regulator_get_error_flags() acquires it again, resulting in an immediate deadlock. Call _pmbus_get_flags() from pmbus_regulator_get_status() directly to avoid the problem. Reported-by: Patrick Rudolph <patrick.rudolph@9elements.com> Closes: https://lore.kernel.org/linux-hwmon/b7a3ad85-aab4-4718-a001-1d8b1c0eef36@roeck-us.net/T/#u Cc: Naresh Solanki <Naresh.Solanki@9elements.com> Cc: stable@vger.kernel.org # v6.2+ Fixes: c05f477c4ba3 ("hwmon: (pmbus/core) Implement regulator get_status") Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2023-07-26	mptcp: more accurate NL event generation	Paolo Abeni	1	-2/+1
	Currently the mptcp code generate a "new listener" event even if the actual listen() syscall fails. Address the issue moving the event generation call under the successful branch. Cc: stable@vger.kernel.org Fixes: f8c9dfbd875b ("mptcp: add pm listener events") Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230725-send-net-20230725-v1-2-6f60fe7137a9@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-26	selftests: mptcp: join: only check for ip6tables if needed	Matthieu Baerts	1	-3/+1
	If 'iptables-legacy' is available, 'ip6tables-legacy' command will be used instead of 'ip6tables'. So no need to look if 'ip6tables' is available in this case. Cc: stable@vger.kernel.org Fixes: 0c4cd3f86a40 ("selftests: mptcp: join: use 'iptables-legacy' if available") Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230725-send-net-20230725-v1-1-6f60fe7137a9@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-26	tools: ynl-gen: fix parse multi-attr enum attribute	Arkadiusz Kubalewski	1	-6/+6
	When attribute is enum type and marked as multi-attr, the netlink respond is not parsed, fails with stack trace: Traceback (most recent call last): File "/net-next/tools/net/ynl/./test.py", line 520, in <module> main() File "/net-next/tools/net/ynl/./test.py", line 488, in main dplls=dplls_get(282574471561216) File "/net-next/tools/net/ynl/./test.py", line 48, in dplls_get reply=act(args) File "/net-next/tools/net/ynl/./test.py", line 41, in act reply = ynl.dump(args.dump, attrs) File "/net-next/tools/net/ynl/lib/ynl.py", line 598, in dump return self._op(method, vals, dump=True) File "/net-next/tools/net/ynl/lib/ynl.py", line 584, in _op rsp_msg = self._decode(gm.raw_attrs, op.attr_set.name) File "/net-next/tools/net/ynl/lib/ynl.py", line 451, in _decode self._decode_enum(rsp, attr_spec) File "/net-next/tools/net/ynl/lib/ynl.py", line 408, in _decode_enum value = enum.entries_by_val[raw].name TypeError: unhashable type: 'list' error: 1 Redesign _decode_enum(..) to take a enum int value and translate it to either a bitmask or enum name as expected. Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20230725101642.267248-3-arkadiusz.kubalewski@intel.com Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-26	tools: ynl-gen: fix enum index in _decode_enum(..)	Arkadiusz Kubalewski	1	-2/+2
	Remove wrong index adjustment, which is leftover from adding support for sparse enums. enum.entries_by_val() function shall not subtract the start-value, as it is indexed with real enum value. Fixes: c311aaa74ca1 ("tools: ynl: fix enum-as-flags in the generic CLI") Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20230725101642.267248-2-arkadiusz.kubalewski@intel.com Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-26	ALSA: usb-audio: Update for native DSD support quirks	Jussi Laako	1	-6/+28
	Maintenance patch for native DSD support. Remove incorrect T+A device quirks. Move set of device quirks to vendor quirks. Add set of missing device and vendor quirks. Signed-off-by: Jussi Laako <jussi@sonarnerd.net> Link: https://lore.kernel.org/r/20230726165645.404311-1-jussi@sonarnerd.net Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-07-26	mm: suppress mm fault logging if fatal signal already pending	Linus Torvalds	1	-0/+4
	Commit eda0047296a1 ("mm: make the page fault mmap locking killable") intentionally made it much easier to trigger the "page fault fails because a fatal signal is pending" situation, by having the mmap locking fail early in that case. We have long aborted page faults in other fatal cases when the actual IO for a page is interrupted by SIGKILL - which is particularly useful for the traditional case of NFS hanging due to network issues, but local filesystems could cause it too if you happened to get the SIGKILL while waiting for a page to be faulted in (eg lock_folio_maybe_drop_mmap()). So aborting the page fault wasn't a new condition - but it now triggers earlier, before we even get to 'handle_mm_fault()'. And as a result the error doesn't go through our 'fault_signal_pending()' logic, and doesn't get filtered away there. Normally you'd never even notice, because if a fatal signal is pending, the new SIGSEGV we send ends up being ignored anyway. But it turns out that there is one very noticeable exception: if you enable 'show_unhandled_signals', the aborted page fault will be logged in the kernel messages, and you'll get a scary line looking something like this in your logs: pverados[2183248]: segfault at 55e5a00f9ae0 ip 000055e5a00f9ae0 sp 00007ffc0720bea8 error 14 in perl[55e5a00d4000+195000] likely on CPU 10 (core 4, socket 0) which is rather misleading. It's not really a segfault at all, it's just "the thread was killed before the page fault completed, so we aborted the page fault". Fix this by just making it clear that a pending fatal signal means that any new signal coming in after that is implicitly handled. This will avoid the misleading logging, since now the signal isn't 'unhandled' any more. Reported-and-tested-by: Fiona Ebner <f.ebner@proxmox.com> Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com> Link: https://lore.kernel.org/lkml/8d063a26-43f5-0bb7-3203-c6a04dc159f8@proxmox.com/ Acked-by: Oleg Nesterov <oleg@redhat.com> Fixes: eda0047296a1 ("mm: make the page fault mmap locking killable") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-07-26	drm/msm: Disallow submit with fence id 0	Rob Clark	1	-1/+1
	A fence id of zero is expected to be invalid, and is not removed from the fence_idr table. If userspace is requesting to specify the fence id with the FENCE_SN_IN flag, we need to reject a zero fence id value. Fixes: 17154addc5c1 ("drm/msm: Add MSM_SUBMIT_FENCE_SN_IN") Signed-off-by: Rob Clark <robdclark@chromium.org> Patchwork: https://patchwork.freedesktop.org/patch/549180/
2023-07-26	arm64/sme: Set new vector length before reallocating	Mark Brown	1	-2/+2
	As part of fixing the allocation of the buffer for SVE state when changing SME vector length we introduced an immediate reallocation of the SVE state, this is also done when changing the SVE vector length for consistency. Unfortunately this reallocation is done prior to writing the new vector length to the task struct, meaning the allocation is done with the old vector length and can lead to memory corruption due to an undersized buffer being used. Move the update of the vector length before the allocation to ensure that the new vector length is taken into account. For some reason this isn't triggering any problems when running tests on the arm64 fixes branch (even after repeated tries) but is triggering issues very often after merge into mainline. Fixes: d4d5be94a878 ("arm64/fpsimd: Ensure SME storage is allocated after SVE VL changes") Signed-off-by: Mark Brown <broonie@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20230726-arm64-fix-sme-fix-v1-1-7752ec58af27@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-07-26	arm64/fpsimd: Don't flush SME register hardware state along with thread	Mark Brown	1	-1/+0
	We recently changed the fpsimd thread flush to flush the physical SME state as well as the thread state for the current thread. Unfortunately this leads to intermittent corruption in interaction with the lazy FPSIMD register switching. When under heavy load such as can be triggered by the startup phase of fp-stress it is possible that the current thread may not be scheduled prior to returning to userspace, and indeed we may end up returning to the last thread that was scheduled on the PE without ever exiting the kernel to any other task. If that happens then we will not reload the register state from memory, leading to loss of any SME register state. Since this was purely an attempt to defensively close off potential problems revert the change. Fixes: af3215fd0230 ("arm64/fpsimd: Exit streaming mode when flushing tasks") Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20230724-arm64-dont-flush-smstate-v1-1-9a8b637ace6c@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-07-26	netfilter: nf_tables: disallow rule addition to bound chain via NFTA_RULE_CHAIN_ID	Pablo Neira Ayuso	1	-2/+3
	Bail out with EOPNOTSUPP when adding rule to bound chain via NFTA_RULE_CHAIN_ID. The following warning splat is shown when adding a rule to a deleted bound chain: WARNING: CPU: 2 PID: 13692 at net/netfilter/nf_tables_api.c:2013 nf_tables_chain_destroy+0x1f7/0x210 [nf_tables] CPU: 2 PID: 13692 Comm: chain-bound-rul Not tainted 6.1.39 #1 RIP: 0010:nf_tables_chain_destroy+0x1f7/0x210 [nf_tables] Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Reported-by: Kevin Rich <kevinrich1337@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-07-26	netfilter: nf_tables: skip immediate deactivate in _PREPARE_ERROR	Pablo Neira Ayuso	1	-9/+18
	On error when building the rule, the immediate expression unbinds the chain, hence objects can be deactivated by the transaction records. Otherwise, it is possible to trigger the following warning: WARNING: CPU: 3 PID: 915 at net/netfilter/nf_tables_api.c:2013 nf_tables_chain_destroy+0x1f7/0x210 [nf_tables] CPU: 3 PID: 915 Comm: chain-bind-err- Not tainted 6.1.39 #1 RIP: 0010:nf_tables_chain_destroy+0x1f7/0x210 [nf_tables] Fixes: 4bedf9eee016 ("netfilter: nf_tables: fix chain binding transaction logic") Reported-by: Kevin Rich <kevinrich1337@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-07-26	netfilter: nft_set_rbtree: fix overlap expiration walk	Florian Westphal	1	-6/+14
	The lazy gc on insert that should remove timed-out entries fails to release the other half of the interval, if any. Can be reproduced with tests/shell/testcases/sets/0044interval_overlap_0 in nftables.git and kmemleak enabled kernel. Second bug is the use of rbe_prev vs. prev pointer. If rbe_prev() returns NULL after at least one iteration, rbe_prev points to element that is not an end interval, hence it should not be removed. Lastly, check the genmask of the end interval if this is active in the current generation. Fixes: c9e6978e2725 ("netfilter: nft_set_rbtree: Switch to node list walk for overlap detection") Signed-off-by: Florian Westphal <fw@strlen.de>
2023-07-26	rbd: retrieve and check lock owner twice before blocklisting	Ilya Dryomov	1	-2/+23
	An attempt to acquire exclusive lock can race with the current lock owner closing the image: 1. lock is held by client123, rbd_lock() returns -EBUSY 2. get_lock_owner_info() returns client123 instance details 3. client123 closes the image, lock is released 4. find_watcher() returns 0 as there is no matching watcher anymore 5. client123 instance gets erroneously blocklisted Particularly impacted is mirror snapshot scheduler in snapshot-based mirroring since it happens to open and close images a lot (images are opened only for as long as it takes to take the next mirror snapshot, the same client instance is used for all images). To reduce the potential for erroneous blocklisting, retrieve the lock owner again after find_watcher() returns 0. If it's still there, make sure it matches the previously detected lock owner. Cc: stable@vger.kernel.org # f38cb9d9c204: rbd: make get_lock_owner_info() return a single locker or NULL Cc: stable@vger.kernel.org # 8ff2c64c9765: rbd: harden get_lock_owner_info() a bit Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
2023-07-26	rbd: harden get_lock_owner_info() a bit	Ilya Dryomov	2	-6/+16
	- we want the exclusive lock type, so test for it directly - use sscanf() to actually parse the lock cookie and avoid admitting invalid handles - bail if locker has a blank address Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>