aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/export-to-sqlite.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2023-08-04mm: compaction: fix endless looping over same migrate blockJohannes Weiner1-3/+5
During stress testing, the following situation was observed: 70 root 39 19 0 0 0 R 100.0 0.0 959:29.92 khugepaged 310936 root 20 0 84416 25620 512 R 99.7 1.5 642:37.22 hugealloc Tracing shows isolate_migratepages_block() endlessly looping over the first block in the DMA zone: hugealloc-310936 [001] ..... 237297.415718: mm_compaction_finished: node=0 zone=DMA order=9 ret=no_suitable_page hugealloc-310936 [001] ..... 237297.415718: mm_compaction_isolate_migratepages: range=(0x1 ~ 0x400) nr_scanned=513 nr_taken=0 hugealloc-310936 [001] ..... 237297.415718: mm_compaction_finished: node=0 zone=DMA order=9 ret=no_suitable_page hugealloc-310936 [001] ..... 237297.415718: mm_compaction_isolate_migratepages: range=(0x1 ~ 0x400) nr_scanned=513 nr_taken=0 hugealloc-310936 [001] ..... 237297.415718: mm_compaction_finished: node=0 zone=DMA order=9 ret=no_suitable_page hugealloc-310936 [001] ..... 237297.415718: mm_compaction_isolate_migratepages: range=(0x1 ~ 0x400) nr_scanned=513 nr_taken=0 hugealloc-310936 [001] ..... 237297.415718: mm_compaction_finished: node=0 zone=DMA order=9 ret=no_suitable_page hugealloc-310936 [001] ..... 237297.415718: mm_compaction_isolate_migratepages: range=(0x1 ~ 0x400) nr_scanned=513 nr_taken=0 The problem is that the functions tries to test and set the skip bit once on the block, to avoid skipping on its own skip-set, using pageblock_aligned() on the pfn as a test. But because this is the DMA zone which starts at pfn 1, this is never true for the first block, and the skip bit isn't set or tested at all. As a result, fast_find_migrateblock() returns the same pageblock over and over. If the pfn isn't pageblock-aligned, also check if it's the start of the zone to ensure test-and-set-exactly-once on unaligned ranges. Thanks to Vlastimil Babka for the help in debugging this. Link: https://lkml.kernel.org/r/20230731172450.1632195-1-hannes@cmpxchg.org Fixes: 90ed667c03fe ("Revert "Revert "mm/compaction: fix set skip in fast_find_migrateblock""") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-04selftests: mm: ksm: fix incorrect evaluation of parameterAyush Jain1-0/+1
A missing break in kms_tests leads to kselftest hang when the parameter -s is used. In current code flow because of missing break in -s, -t parses args spilled from -s and as -t accepts only valid values as 0,1 so any arg in -s >1 or <0, gets in ksm_test failure This went undetected since, before the addition of option -t, the next case -M would immediately break out of the switch statement but that is no longer the case Add the missing break statement. ----Before---- ./ksm_tests -H -s 100 Invalid merge type ----After---- ./ksm_tests -H -s 100 Number of normal pages: 0 Number of huge pages: 50 Total size: 100 MiB Total time: 0.401732682 s Average speed: 248.922 MiB/s Link: https://lkml.kernel.org/r/20230728163952.4634-1-ayush.jain3@amd.com Fixes: 07115fcc15b4 ("selftests/mm: add new selftests for KSM") Signed-off-by: Ayush Jain <ayush.jain3@amd.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Stefan Roesch <shr@devkernel.io> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-04hugetlb: do not clear hugetlb dtor until allocating vmemmapMike Kravetz1-24/+51
Patch series "Fix hugetlb free path race with memory errors". In the discussion of Jiaqi Yan's series "Improve hugetlbfs read on HWPOISON hugepages" the race window was discovered. https://lore.kernel.org/linux-mm/20230616233447.GB7371@monkey/ Freeing a hugetlb page back to low level memory allocators is performed in two steps. 1) Under hugetlb lock, remove page from hugetlb lists and clear destructor 2) Outside lock, allocate vmemmap if necessary and call low level free Between these two steps, the hugetlb page will appear as a normal compound page. However, vmemmap for tail pages could be missing. If a memory error occurs at this time, we could try to update page flags non-existant page structs. A much more detailed description is in the first patch. The first patch addresses the race window. However, it adds a hugetlb_lock lock/unlock cycle to every vmemmap optimized hugetlb page free operation. This could lead to slowdowns if one is freeing a large number of hugetlb pages. The second path optimizes the update_and_free_pages_bulk routine to only take the lock once in bulk operations. The second patch is technically not a bug fix, but includes a Fixes tag and Cc stable to avoid a performance regression. It can be combined with the first, but was done separately make reviewing easier. This patch (of 2): Freeing a hugetlb page and releasing base pages back to the underlying allocator such as buddy or cma is performed in two steps: - remove_hugetlb_folio() is called to remove the folio from hugetlb lists, get a ref on the page and remove hugetlb destructor. This all must be done under the hugetlb lock. After this call, the page can be treated as a normal compound page or a collection of base size pages. - update_and_free_hugetlb_folio() is called to allocate vmemmap if needed and the free routine of the underlying allocator is called on the resulting page. We can not hold the hugetlb lock here. One issue with this scheme is that a memory error could occur between these two steps. In this case, the memory error handling code treats the old hugetlb page as a normal compound page or collection of base pages. It will then try to SetPageHWPoison(page) on the page with an error. If the page with error is a tail page without vmemmap, a write error will occur when trying to set the flag. Address this issue by modifying remove_hugetlb_folio() and update_and_free_hugetlb_folio() such that the hugetlb destructor is not cleared until after allocating vmemmap. Since clearing the destructor requires holding the hugetlb lock, the clearing is done in remove_hugetlb_folio() if the vmemmap is present. This saves a lock/unlock cycle. Otherwise, destructor is cleared in update_and_free_hugetlb_folio() after allocating vmemmap. Note that this will leave hugetlb pages in a state where they are marked free (by hugetlb specific page flag) and have a ref count. This is not a normal state. The only code that would notice is the memory error code, and it is set up to retry in such a case. A subsequent patch will create a routine to do bulk processing of vmemmap allocation. This will eliminate a lock/unlock cycle for each hugetlb page in the case where we are freeing a large number of pages. Link: https://lkml.kernel.org/r/20230711220942.43706-1-mike.kravetz@oracle.com Link: https://lkml.kernel.org/r/20230711220942.43706-2-mike.kravetz@oracle.com Fixes: ad2fa3717b74 ("mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Tested-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: James Houghton <jthoughton@google.com> Cc: Jiaqi Yan <jiaqiyan@google.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-04mm: memory-failure: avoid false hwpoison page mapped error infoMiaohe Lin1-3/+7
folio->_mapcount is overloaded in SLAB, so folio_mapped() has to be done after folio_test_slab() is checked. Otherwise slab folio might be treated as a mapped folio leading to false 'Someone maps the hwpoison page' error info. Link: https://lkml.kernel.org/r/20230727115643.639741-4-linmiaohe@huawei.com Fixes: 230ac719c500 ("mm/hwpoison: don't try to unpoison containment-failed pages") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-04mm: memory-failure: fix potential unexpected return value from unpoison_memory()Miaohe Lin1-10/+9
If unpoison_memory() fails to clear page hwpoisoned flag, return value ret is expected to be -EBUSY. But when get_hwpoison_page() returns 1 and fails to clear page hwpoisoned flag due to races, return value will be unexpected 1 leading to users being confused. And there's a code smell that the variable "ret" is used not only to save the return value of unpoison_memory(), but also the return value from get_hwpoison_page(). Make a further cleanup by using another auto-variable solely to save the return value of get_hwpoison_page() as suggested by Naoya. Link: https://lkml.kernel.org/r/20230727115643.639741-3-linmiaohe@huawei.com Fixes: bf181c582588 ("mm/hwpoison: fix unpoison_memory()") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-04mm/swapfile: fix wrong swap entry type for hwpoisoned swapcache pageMiaohe Lin2-4/+6
Patch series "A few fixup patches for mm", v2. This series contains a few fixup patches to fix potential unexpected return value, fix wrong swap entry type for hwpoisoned swapcache page and so on. More details can be found in the respective changelogs. This patch (of 3): Hwpoisoned dirty swap cache page is kept in the swap cache and there's simple interception code in do_swap_page() to catch it. But when trying to swapoff, unuse_pte() will wrongly install a general sense of "future accesses are invalid" swap entry for hwpoisoned swap cache page due to unaware of such type of page. The user will receive SIGBUS signal without expected BUS_MCEERR_AR payload. BTW, typo 'hwposioned' is fixed. Link: https://lkml.kernel.org/r/20230727115643.639741-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20230727115643.639741-2-linmiaohe@huawei.com Fixes: 6b970599e807 ("mm: hwpoison: support recovery from ksm_might_need_to_copy()") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-04radix tree test suite: fix incorrect allocation size for pthreadsColin Ian King1-1/+1
Currently the pthread allocation for each array item is based on the size of a pthread_t pointer and should be the size of the pthread_t structure, so the allocation is under-allocating the correct size. Fix this by using the size of each element in the pthreads array. Static analysis cppcheck reported: tools/testing/radix-tree/regression1.c:180:2: warning: Size of pointer 'threads' used instead of size of its data. [pointerSize] Link: https://lkml.kernel.org/r/20230727160930.632674-1-colin.i.king@gmail.com Fixes: 1366c37ed84b ("radix tree test harness") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-04crypto, cifs: fix error handling in extract_iter_to_sg()David Howells1-1/+1
Fix error handling in extract_iter_to_sg(). Pages need to be unpinned, not put in extract_user_to_sg() when handling IOVEC/UBUF sources. The bug may result in a warning like the following: WARNING: CPU: 1 PID: 20384 at mm/gup.c:229 __lse_atomic_add arch/arm64/include/asm/atomic_lse.h:27 [inline] WARNING: CPU: 1 PID: 20384 at mm/gup.c:229 arch_atomic_add arch/arm64/include/asm/atomic.h:28 [inline] WARNING: CPU: 1 PID: 20384 at mm/gup.c:229 raw_atomic_add include/linux/atomic/atomic-arch-fallback.h:537 [inline] WARNING: CPU: 1 PID: 20384 at mm/gup.c:229 atomic_add include/linux/atomic/atomic-instrumented.h:105 [inline] WARNING: CPU: 1 PID: 20384 at mm/gup.c:229 try_grab_page+0x108/0x160 mm/gup.c:252 ... pc : try_grab_page+0x108/0x160 mm/gup.c:229 lr : follow_page_pte+0x174/0x3e4 mm/gup.c:651 ... Call trace: __lse_atomic_add arch/arm64/include/asm/atomic_lse.h:27 [inline] arch_atomic_add arch/arm64/include/asm/atomic.h:28 [inline] raw_atomic_add include/linux/atomic/atomic-arch-fallback.h:537 [inline] atomic_add include/linux/atomic/atomic-instrumented.h:105 [inline] try_grab_page+0x108/0x160 mm/gup.c:252 follow_pmd_mask mm/gup.c:734 [inline] follow_pud_mask mm/gup.c:765 [inline] follow_p4d_mask mm/gup.c:782 [inline] follow_page_mask+0x12c/0x2e4 mm/gup.c:839 __get_user_pages+0x174/0x30c mm/gup.c:1217 __get_user_pages_locked mm/gup.c:1448 [inline] __gup_longterm_locked+0x94/0x8f4 mm/gup.c:2142 internal_get_user_pages_fast+0x970/0xb60 mm/gup.c:3140 pin_user_pages_fast+0x4c/0x60 mm/gup.c:3246 iov_iter_extract_user_pages lib/iov_iter.c:1768 [inline] iov_iter_extract_pages+0xc8/0x54c lib/iov_iter.c:1831 extract_user_to_sg lib/scatterlist.c:1123 [inline] extract_iter_to_sg lib/scatterlist.c:1349 [inline] extract_iter_to_sg+0x26c/0x6fc lib/scatterlist.c:1339 hash_sendmsg+0xc0/0x43c crypto/algif_hash.c:117 sock_sendmsg_nosec net/socket.c:725 [inline] sock_sendmsg+0x54/0x60 net/socket.c:748 ____sys_sendmsg+0x270/0x2ac net/socket.c:2494 ___sys_sendmsg+0x80/0xdc net/socket.c:2548 __sys_sendmsg+0x68/0xc4 net/socket.c:2577 __do_sys_sendmsg net/socket.c:2586 [inline] __se_sys_sendmsg net/socket.c:2584 [inline] __arm64_sys_sendmsg+0x24/0x30 net/socket.c:2584 __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline] invoke_syscall+0x48/0x114 arch/arm64/kernel/syscall.c:52 el0_svc_common.constprop.0+0x44/0xe4 arch/arm64/kernel/syscall.c:142 do_el0_svc+0x38/0xa4 arch/arm64/kernel/syscall.c:191 el0_svc+0x2c/0xb0 arch/arm64/kernel/entry-common.c:647 el0t_64_sync_handler+0xc0/0xc4 arch/arm64/kernel/entry-common.c:665 el0t_64_sync+0x19c/0x1a0 arch/arm64/kernel/entry.S:591 Link: https://lkml.kernel.org/r/20571.1690369076@warthog.procyon.org.uk Fixes: 018584697533 ("netfs: Add a function to extract an iterator into a scatterlist") Reported-by: syzbot+9b82859567f2e50c123e@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-mm/000000000000273d0105ff97bf56@google.com/ Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Steve French <stfrench@microsoft.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jeff Layton <jlayton@kernel.org> Cc: Shyam Prasad N <nspmangalore@gmail.com> Cc: Rohith Surabattula <rohiths.msft@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-04zsmalloc: fix races between modifications of fullness and isolatedAndrew Yang1-5/+9
We encountered many kernel exceptions of VM_BUG_ON(zspage->isolated == 0) in dec_zspage_isolation() and BUG_ON(!pages[1]) in zs_unmap_object() lately. This issue only occurs when migration and reclamation occur at the same time. With our memory stress test, we can reproduce this issue several times a day. We have no idea why no one else encountered this issue. BTW, we switched to the new kernel version with this defect a few months ago. Since fullness and isolated share the same unsigned int, modifications of them should be protected by the same lock. [andrew.yang@mediatek.com: move comment] Link: https://lkml.kernel.org/r/20230727062910.6337-1-andrew.yang@mediatek.com Link: https://lkml.kernel.org/r/20230721063705.11455-1-andrew.yang@mediatek.com Fixes: c4549b871102 ("zsmalloc: remove zspage isolation for migration") Signed-off-by: Andrew Yang <andrew.yang@mediatek.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-30Linux 6.5-rc4Linus Torvalds1-1/+1
2023-07-29arch/*/configs/*defconfig: Replace AUTOFS4_FS by AUTOFS_FSSven Joachim64-75/+63
Commit a2225d931f75 ("autofs: remove left-over autofs4 stubs") promised the removal of the fs/autofs/Kconfig fragment for AUTOFS4_FS within a couple of releases, but five years later this still has not happened yet, and AUTOFS4_FS is still enabled in 63 defconfigs. Get rid of it mechanically: git grep -l CONFIG_AUTOFS4_FS -- '*defconfig' | xargs sed -i 's/AUTOFS4_FS/AUTOFS_FS/' Also just remove the AUTOFS4_FS config option stub. Anybody who hasn't regenerated their config file in the last five years will need to just get the new name right when they do. Signed-off-by: Sven Joachim <svenjoac@gmx.de> Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-07-29KVM: selftests: Expand x86's sregs test to cover illegal CR0 valuesSean Christopherson1-31/+39
Add coverage to x86's set_sregs_test to verify KVM rejects vendor-agnostic illegal CR0 values, i.e. CR0 values whose legality doesn't depend on the current VMX mode. KVM historically has neglected to reject bad CR0s from userspace, i.e. would happily accept a completely bogus CR0 via KVM_SET_SREGS{2}. Punt VMX specific subtests to future work, as they would require quite a bit more effort, and KVM gets coverage for CR0 checks in general through other means, e.g. KVM-Unit-Tests. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230613203037.1968489-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29KVM: VMX: Don't fudge CR0 and CR4 for restricted L2 guestSean Christopherson1-4/+9
Stuff CR0 and/or CR4 to be compliant with a restricted guest if and only if KVM itself is not configured to utilize unrestricted guests, i.e. don't stuff CR0/CR4 for a restricted L2 that is running as the guest of an unrestricted L1. Any attempt to VM-Enter a restricted guest with invalid CR0/CR4 values should fail, i.e. in a nested scenario, KVM (as L0) should never observe a restricted L2 with incompatible CR0/CR4, since nested VM-Enter from L1 should have failed. And if KVM does observe an active, restricted L2 with incompatible state, e.g. due to a KVM bug, fudging CR0/CR4 instead of letting VM-Enter fail does more harm than good, as KVM will often neglect to undo the side effects, e.g. won't clear rmode.vm86_active on nested VM-Exit, and thus the damage can easily spill over to L1. On the other hand, letting VM-Enter fail due to bad guest state is more likely to contain the damage to L2 as KVM relies on hardware to perform most guest state consistency checks, i.e. KVM needs to be able to reflect a failed nested VM-Enter into L1 irrespective of (un)restricted guest behavior. Cc: Jim Mattson <jmattson@google.com> Cc: stable@vger.kernel.org Fixes: bddd82d19e2e ("KVM: nVMX: KVM needs to unset "unrestricted guest" VM-execution control in vmcs02 if vmcs12 doesn't set it") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230613203037.1968489-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29KVM: x86: Disallow KVM_SET_SREGS{2} if incoming CR0 is invalidSean Christopherson5-20/+52
Reject KVM_SET_SREGS{2} with -EINVAL if the incoming CR0 is invalid, e.g. due to setting bits 63:32, illegal combinations, or to a value that isn't allowed in VMX (non-)root mode. The VMX checks in particular are "fun" as failure to disallow Real Mode for an L2 that is configured with unrestricted guest disabled, when KVM itself has unrestricted guest enabled, will result in KVM forcing VM86 mode to virtual Real Mode for L2, but then fail to unwind the related metadata when synthesizing a nested VM-Exit back to L1 (which has unrestricted guest enabled). Opportunistically fix a benign typo in the prototype for is_valid_cr4(). Cc: stable@vger.kernel.org Reported-by: syzbot+5feef0b9ee9c8e9e5689@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000f316b705fdf6e2b4@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230613203037.1968489-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29Revert "debugfs, coccinelle: check for obsolete DEFINE_SIMPLE_ATTRIBUTE() usage"Sean Christopherson1-68/+0
Remove coccinelle's recommendation to use DEFINE_DEBUGFS_ATTRIBUTE() instead of DEFINE_SIMPLE_ATTRIBUTE(). Regardless of whether or not the "significant overhead" incurred by debugfs_create_file() is actually meaningful, warnings from the script have led to a rash of low-quality patches that have sowed confusion and consumed maintainer time for little to no benefit. There have been no less than four attempts to "fix" KVM, and a quick search on lore shows that KVM is not alone. This reverts commit 5103068eaca290f890a30aae70085fac44cecaf6. Link: https://lore.kernel.org/all/87tu2nbnz3.fsf@mpe.ellerman.id.au Link: https://lore.kernel.org/all/c0b98151-16b6-6d8f-1765-0f7d46682d60@redhat.com Link: https://lkml.kernel.org/r/20230706072954.4881-1-duminjie%40vivo.com Link: https://lore.kernel.org/all/Y2FsbufV00jbyF0B@google.com Link: https://lore.kernel.org/all/Y2ENJJ1YiSg5oHiy@orome Link: https://lore.kernel.org/all/7560b350e7b23786ce712118a9a504356ff1cca4.camel@kernel.org Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230726202920.507756-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29KVM: selftests: Verify stats fd is usable after VM fd has been closedSean Christopherson1-2/+8
Verify that VM and vCPU binary stats files are usable even after userspace has put its last direct reference to the VM. This is a regression test for a UAF bug where KVM didn't gift the stats files a reference to the VM. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230711230131.648752-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29KVM: selftests: Verify stats fd can be dup()'d and readSean Christopherson1-1/+7
Expand the binary stats test to verify that a stats fd can be dup()'d and read, to (very) roughly simulate userspace passing around the file. Adding the dup() test is primarily an intermediate step towards verifying that userspace can read VM/vCPU stats before _and_ after userspace closes its copy of the VM fd; the dup() test itself is only mildly interesting. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230711230131.648752-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29KVM: selftests: Verify userspace can create "redundant" binary stats filesSean Christopherson1-2/+23
Verify that KVM doesn't artificially limit KVM_GET_STATS_FD to a single file per VM/vCPU. There's no known use case for getting multiple stats fds, but it should work, and more importantly creating multiple files will make it easier to test that KVM correct manages VM refcounts for stats files. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230711230131.648752-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29KVM: selftests: Explicitly free vcpus array in binary stats testSean Christopherson1-0/+1
Explicitly free the all-encompassing vcpus array in the binary stats test so that the test is consistent with respect to freeing all dynamically allocated resources (versus letting them be freed on exit). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230711230131.648752-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29KVM: selftests: Clean up stats fd in common stats_test() helperSean Christopherson1-18/+4
Move the stats fd cleanup code into stats_test() and drop the superfluous vm_stats_test() and vcpu_stats_test() helpers in order to decouple creation of the stats file from consuming/testing the file (deduping code is a bonus). This will make it easier to test various edge cases related to stats, e.g. that userspace can dup() a stats fd, that userspace can have multiple stats files for a singleVM/vCPU, etc. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230711230131.648752-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29KVM: selftests: Use pread() to read binary stats headerSean Christopherson2-4/+8
Use pread() with an explicit offset when reading the header and the header name for a binary stats fd so that the common helper and the binary stats test don't subtly rely on the file effectively being untouched, e.g. to allow multiple reads of the header, name, etc. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230711230131.648752-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29KVM: Grab a reference to KVM for VM and vCPU stats file descriptorsSean Christopherson1-0/+24
Grab a reference to KVM prior to installing VM and vCPU stats file descriptors to ensure the underlying VM and vCPU objects are not freed until the last reference to any and all stats fds are dropped. Note, the stats paths manually invoke fd_install() and so don't need to grab a reference before creating the file. Fixes: ce55c049459c ("KVM: stats: Support binary stats retrieval for a VCPU") Fixes: fcfe1baeddbf ("KVM: stats: Support binary stats retrieval for a VM") Reported-by: Zheng Zhang <zheng.zhang@email.ucr.edu> Closes: https://lore.kernel.org/all/CAC_GQSr3xzZaeZt85k_RCBd5kfiOve8qXo7a81Cq53LuVQ5r=Q@mail.gmail.com Cc: stable@vger.kernel.org Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Message-Id: <20230711230131.648752-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29selftests/rseq: Play nice with binaries statically linked against glibc 2.35+Sean Christopherson1-6/+22
To allow running rseq and KVM's rseq selftests as statically linked binaries, initialize the various "trampoline" pointers to point directly at the expect glibc symbols, and skip the dlysm() lookups if the rseq size is non-zero, i.e. the binary is statically linked *and* the libc registered its own rseq. Define weak versions of the symbols so as not to break linking against libc versions that don't support rseq in any capacity. The KVM selftests in particular are often statically linked so that they can be run on targets with very limited runtime environments, i.e. test machines. Fixes: 233e667e1ae3 ("selftests/rseq: Uplift rseq selftests for compatibility with glibc-2.35") Cc: Aaron Lewis <aaronlewis@google.com> Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230721223352.2333911-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29Revert "KVM: SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid"Sean Christopherson1-8/+2
Now that handle_fastpath_set_msr_irqoff() acquires kvm->srcu, i.e. allows dereferencing memslots during WRMSR emulation, drop the requirement that "next RIP" is valid. In hindsight, acquiring kvm->srcu would have been a better fix than avoiding the pastpath, but at the time it was thought that accessing SRCU-protected data in the fastpath was a one-off edge case. This reverts commit 5c30e8101e8d5d020b1d7119117889756a6ed713. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230721224337.2335137-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29KVM: x86: Acquire SRCU read lock when handling fastpath MSR writesSean Christopherson1-0/+4
Temporarily acquire kvm->srcu for read when potentially emulating WRMSR in the VM-Exit fastpath handler, as several of the common helpers used during emulation expect the caller to provide SRCU protection. E.g. if the guest is counting instructions retired, KVM will query the PMU event filter when stepping over the WRMSR. dump_stack+0x85/0xdf lockdep_rcu_suspicious+0x109/0x120 pmc_event_is_allowed+0x165/0x170 kvm_pmu_trigger_event+0xa5/0x190 handle_fastpath_set_msr_irqoff+0xca/0x1e0 svm_vcpu_run+0x5c3/0x7b0 [kvm_amd] vcpu_enter_guest+0x2108/0x2580 Alternatively, check_pmu_event_filter() could acquire kvm->srcu, but this isn't the first bug of this nature, e.g. see commit 5c30e8101e8d ("KVM: SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid"). Providing protection for the entirety of WRMSR emulation will allow reverting the aforementioned commit, and will avoid having to play whack-a-mole when new uses of SRCU-protected structures are inevitably added in common emulation helpers. Fixes: dfdeda67ea2d ("KVM: x86/pmu: Prevent the PMU from counting disallowed events") Reported-by: Greg Thelen <gthelen@google.com> Reported-by: Aaron Lewis <aaronlewis@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230721224337.2335137-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29KVM: VMX: Use vmread_error() to report VM-Fail in "goto" pathSean Christopherson1-2/+1
Use vmread_error() to report VM-Fail on VMREAD for the "asm goto" case, now that trampoline case has yet another wrapper around vmread_error() to play nice with instrumentation. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230721235637.2345403-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29KVM: VMX: Make VMREAD error path play nice with noinstrSean Christopherson3-9/+26
Mark vmread_error_trampoline() as noinstr, and add a second trampoline for the CONFIG_CC_HAS_ASM_GOTO_OUTPUT=n case to enable instrumentation when handling VM-Fail on VMREAD. VMREAD is used in various noinstr flows, e.g. immediately after VM-Exit, and objtool rightly complains that the call to the error trampoline leaves a no-instrumentation section without annotating that it's safe to do so. vmlinux.o: warning: objtool: vmx_vcpu_enter_exit+0xc9: call to vmread_error_trampoline() leaves .noinstr.text section Note, strictly speaking, enabling instrumentation in the VM-Fail path isn't exactly safe, but if VMREAD fails the kernel/system is likely hosed anyways, and logging that there is a fatal error is more important than *maybe* encountering slightly unsafe instrumentation. Reported-by: Su Hui <suhui@nfschina.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230721235637.2345403-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29KVM: x86/irq: Conditionally register IRQ bypass consumer againLike Xu1-1/+1
As was attempted commit 14717e203186 ("kvm: Conditionally register IRQ bypass consumer"): "if we don't support a mechanism for bypassing IRQs, don't register as a consumer. Initially this applied to AMD processors, but when AVIC support was implemented for assigned devices, kvm_arch_has_irq_bypass() was always returning true. We can still skip registering the consumer where enable_apicv or posted-interrupts capability is unsupported or globally disabled. This eliminates meaningless dev_info()s when the connect fails between producer and consumer", such as on Linux hosts where enable_apicv or posted-interrupts capability is unsupported or globally disabled. Cc: Alex Williamson <alex.williamson@redhat.com> Reported-by: Yong He <alexyonghe@tencent.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217379 Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20230724111236.76570-1-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29KVM: X86: Use GFP_KERNEL_ACCOUNT for pid_table in ipivPeng Hao1-1/+2
The pid_table of ipiv is the persistent memory allocated by per-vcpu, which should be counted into the memory cgroup. Signed-off-by: Peng Hao <flyingpeng@tencent.com> Message-Id: <CAPm50aLxCQ3TQP2Lhc0PX3y00iTRg+mniLBqNDOC=t9CLxMwwA@mail.gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29KVM: x86: check the kvm_cpu_get_interrupt result before using itMaxim Levitsky1-3/+7
The code was blindly assuming that kvm_cpu_get_interrupt never returns -1 when there is a pending interrupt. While this should be true, a bug in KVM can still cause this. If -1 is returned, the code before this patch was converting it to 0xFF, and 0xFF interrupt was injected to the guest, which results in an issue which was hard to debug. Add WARN_ON_ONCE to catch this case and skip the injection if this happens again. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230726135945.260841-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29KVM: x86: VMX: set irr_pending in kvm_apic_update_irrMaxim Levitsky1-1/+4
When the APICv is inhibited, the irr_pending optimization is used. Therefore, when kvm_apic_update_irr sets bits in the IRR, it must set irr_pending to true as well. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230726135945.260841-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29KVM: x86: VMX: __kvm_apic_update_irr must update the IRR atomicallyMaxim Levitsky1-7/+13
If APICv is inhibited, then IPIs from peer vCPUs are done by atomically setting bits in IRR. This means, that when __kvm_apic_update_irr copies PIR to IRR, it has to modify IRR atomically as well. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230726135945.260841-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29kprobes: Prohibit probing on CFI preamble symbolMasami Hiramatsu (Google)1-1/+13
Do not allow to probe on "__cfi_" or "__pfx_" started symbol, because those are used for CFI and not executed. Probing it will break the CFI. Link: https://lore.kernel.org/all/168904024679.116016.18089228029322008512.stgit@devnote2/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-07-28tracing: Fix warning in trace_buffered_event_disable()Zheng Yejian1-10/+4
Warning happened in trace_buffered_event_disable() at WARN_ON_ONCE(!trace_buffered_event_ref) Call Trace: ? __warn+0xa5/0x1b0 ? trace_buffered_event_disable+0x189/0x1b0 __ftrace_event_enable_disable+0x19e/0x3e0 free_probe_data+0x3b/0xa0 unregister_ftrace_function_probe_func+0x6b8/0x800 event_enable_func+0x2f0/0x3d0 ftrace_process_regex.isra.0+0x12d/0x1b0 ftrace_filter_write+0xe6/0x140 vfs_write+0x1c9/0x6f0 [...] The cause of the warning is in __ftrace_event_enable_disable(), trace_buffered_event_enable() was called once while trace_buffered_event_disable() was called twice. Reproduction script show as below, for analysis, see the comments: ``` #!/bin/bash cd /sys/kernel/tracing/ # 1. Register a 'disable_event' command, then: # 1) SOFT_DISABLED_BIT was set; # 2) trace_buffered_event_enable() was called first time; echo 'cmdline_proc_show:disable_event:initcall:initcall_finish' > \ set_ftrace_filter # 2. Enable the event registered, then: # 1) SOFT_DISABLED_BIT was cleared; # 2) trace_buffered_event_disable() was called first time; echo 1 > events/initcall/initcall_finish/enable # 3. Try to call into cmdline_proc_show(), then SOFT_DISABLED_BIT was # set again!!! cat /proc/cmdline # 4. Unregister the 'disable_event' command, then: # 1) SOFT_DISABLED_BIT was cleared again; # 2) trace_buffered_event_disable() was called second time!!! echo '!cmdline_proc_show:disable_event:initcall:initcall_finish' > \ set_ftrace_filter ``` To fix it, IIUC, we can change to call trace_buffered_event_enable() at fist time soft-mode enabled, and call trace_buffered_event_disable() at last time soft-mode disabled. Link: https://lore.kernel.org/linux-trace-kernel/20230726095804.920457-1-zhengyejian1@huawei.com Cc: <mhiramat@kernel.org> Fixes: 0fc1b09ff1ff ("tracing: Use temp buffer when filtering events") Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-07-28ftrace: Remove unused extern declarationsYueHaibing1-4/+0
commit 6a9c981b1e96 ("ftrace: Remove unused function ftrace_arch_read_dyn_info()") left ftrace_arch_read_dyn_info() extern declaration. And commit 1d74f2a0f64b ("ftrace: remove ftrace_ip_converted()") leave ftrace_ip_converted() declaration. Link: https://lore.kernel.org/linux-trace-kernel/20230725134808.9716-1-yuehaibing@huawei.com Cc: <mhiramat@kernel.org> Cc: <mark.rutland@arm.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-07-28tracing: Fix kernel-doc warnings in trace_seq.cGaosheng Cui1-0/+1
Fix kernel-doc warning: kernel/trace/trace_seq.c:142: warning: Function parameter or member 'args' not described in 'trace_seq_vprintf' Link: https://lkml.kernel.org/r/20230724140827.1023266-5-cuigaosheng1@huawei.com Cc: <mhiramat@kernel.org> Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-07-28tracing: Fix kernel-doc warnings in trace_events_trigger.cGaosheng Cui1-0/+2
Fix kernel-doc warnings: kernel/trace/trace_events_trigger.c:59: warning: Function parameter or member 'buffer' not described in 'event_triggers_call' kernel/trace/trace_events_trigger.c:59: warning: Function parameter or member 'event' not described in 'event_triggers_call' Link: https://lkml.kernel.org/r/20230724140827.1023266-4-cuigaosheng1@huawei.com Cc: <mhiramat@kernel.org> Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-07-28tracing/synthetic: Fix kernel-doc warnings in trace_events_synth.cGaosheng Cui1-0/+1
Fix kernel-doc warning: kernel/trace/trace_events_synth.c:1257: warning: Function parameter or member 'mod' not described in 'synth_event_gen_cmd_array_start' Link: https://lkml.kernel.org/r/20230724140827.1023266-3-cuigaosheng1@huawei.com Cc: <mhiramat@kernel.org> Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-07-28ring-buffer: Fix kernel-doc warnings in ring_buffer.cGaosheng Cui1-2/+1
Fix kernel-doc warnings: kernel/trace/ring_buffer.c:954: warning: Function parameter or member 'cpu' not described in 'ring_buffer_wake_waiters' kernel/trace/ring_buffer.c:3383: warning: Excess function parameter 'event' description in 'ring_buffer_unlock_commit' kernel/trace/ring_buffer.c:5359: warning: Excess function parameter 'cpu' description in 'ring_buffer_reset_online_cpus' Link: https://lkml.kernel.org/r/20230724140827.1023266-2-cuigaosheng1@huawei.com Cc: <mhiramat@kernel.org> Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-07-28ring-buffer: Fix wrong stat of cpu_buffer->readZheng Yejian1-10/+12
When pages are removed in rb_remove_pages(), 'cpu_buffer->read' is set to 0 in order to make sure any read iterators reset themselves. However, this will mess 'entries' stating, see following steps: # cd /sys/kernel/tracing/ # 1. Enlarge ring buffer prepare for later reducing: # echo 20 > per_cpu/cpu0/buffer_size_kb # 2. Write a log into ring buffer of cpu0: # taskset -c 0 echo "hello1" > trace_marker # 3. Read the log: # cat per_cpu/cpu0/trace_pipe <...>-332 [000] ..... 62.406844: tracing_mark_write: hello1 # 4. Stop reading and see the stats, now 0 entries, and 1 event readed: # cat per_cpu/cpu0/stats entries: 0 [...] read events: 1 # 5. Reduce the ring buffer # echo 7 > per_cpu/cpu0/buffer_size_kb # 6. Now entries became unexpected 1 because actually no entries!!! # cat per_cpu/cpu0/stats entries: 1 [...] read events: 0 To fix it, introduce 'page_removed' field to count total removed pages since last reset, then use it to let read iterators reset themselves instead of changing the 'read' pointer. Link: https://lore.kernel.org/linux-trace-kernel/20230724054040.3489499-1-zhengyejian1@huawei.com Cc: <mhiramat@kernel.org> Cc: <vnagarnaik@google.com> Fixes: 83f40318dab0 ("ring-buffer: Make removal of ring buffer pages atomic") Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-07-28tpm_tis: Explicitly check for error codeAlexander Steffen1-2/+7
recv_data either returns the number of received bytes, or a negative value representing an error code. Adding the return value directly to the total number of received bytes therefore looks a little weird, since it might add a negative error code to a sum of bytes. The following check for size < expected usually makes the function return ETIME in that case, so it does not cause too many problems in practice. But to make the code look cleaner and because the caller might still be interested in the original error code, explicitly check for the presence of an error code and pass that through. Cc: stable@vger.kernel.org Fixes: cb5354253af2 ("[PATCH] tpm: spacing cleanups 2") Signed-off-by: Alexander Steffen <Alexander.Steffen@infineon.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-07-28tpm: Switch i2c drivers back to use .probe()Uwe Kleine-König6-6/+6
After commit b8a1a4cd5a98 ("i2c: Provide a temporary .probe_new() call-back type"), all drivers being converted to .probe_new() and then 03c835f498b5 ("i2c: Switch .probe() to not take an id parameter") convert back to (the new) .probe() to be able to eventually drop .probe_new() from struct i2c_driver. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-07-28security: keys: perform capable check only on privileged operationsChristian Göttsche1-3/+8
If the current task fails the check for the queried capability via `capable(CAP_SYS_ADMIN)` LSMs like SELinux generate a denial message. Issuing such denial messages unnecessarily can lead to a policy author granting more privileges to a subject than needed to silence them. Reorder CAP_SYS_ADMIN checks after the check whether the operation is actually privileged. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-07-28Revert "mm,memblock: reset memblock.reserved to system init state to prevent UAF"Mike Rapoport (IBM)1-4/+0
This reverts commit 9e46e4dcd9d6cd88342b028dbfa5f4fb7483d39c. kbuild reports a warning in memblock_remove_region() because of a false positive caused by partial reset of the memblock state. Doing the full reset will remove the false positives, but will allow late use of memblock_free() to go unnoticed, so it is better to revert the offending commit. WARNING: CPU: 0 PID: 1 at mm/memblock.c:352 memblock_remove_region (kbuild/src/x86_64/mm/memblock.c:352 (discriminator 1)) Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc3-00001-g9e46e4dcd9d6 #2 RIP: 0010:memblock_remove_region (kbuild/src/x86_64/mm/memblock.c:352 (discriminator 1)) Call Trace: memblock_discard (kbuild/src/x86_64/mm/memblock.c:383) page_alloc_init_late (kbuild/src/x86_64/include/linux/find.h:208 kbuild/src/x86_64/include/linux/nodemask.h:266 kbuild/src/x86_64/mm/mm_init.c:2405) kernel_init_freeable (kbuild/src/x86_64/init/main.c:1325 kbuild/src/x86_64/init/main.c:1546) kernel_init (kbuild/src/x86_64/init/main.c:1439) ret_from_fork (kbuild/src/x86_64/arch/x86/kernel/process.c:145) ret_from_fork_asm (kbuild/src/x86_64/arch/x86/entry/entry_64.S:298) Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202307271656.447aa17e-oliver.sang@intel.com Signed-off-by: "Mike Rapoport (IBM)" <rppt@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-07-28mm/mempolicy: Take VMA lock before replacing policyJann Horn1-1/+14
mbind() calls down into vma_replace_policy() without taking the per-VMA locks, replaces the VMA's vma->vm_policy pointer, and frees the old policy. That's bad; a concurrent page fault might still be using the old policy (in vma_alloc_folio()), resulting in use-after-free. Normally this will manifest as a use-after-free read first, but it can result in memory corruption, including because vma_alloc_folio() can call mpol_cond_put() on the freed policy, which conditionally changes the policy's refcount member. This bug is specific to CONFIG_NUMA, but it does also affect non-NUMA systems as long as the kernel was built with CONFIG_NUMA. Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it") Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-07-28ACPI/IORT: Remove erroneous id_count check in iort_node_get_rmr_info()Guanghui Feng1-3/+0
According to the ARM IORT specifications DEN 0049 issue E, the "Number of IDs" field in the ID mapping format reports the number of IDs in the mapping range minus one. In iort_node_get_rmr_info(), we erroneously skip ID mappings whose "Number of IDs" equal to 0, resulting in valid mapping nodes with a single ID to map being skipped, which is wrong. Fix iort_node_get_rmr_info() by removing the bogus id_count check. Fixes: 491cf4a6735a ("ACPI/IORT: Add support to retrieve IORT RMR reserved regions") Signed-off-by: Guanghui Feng <guanghuifeng@linux.alibaba.com> Cc: <stable@vger.kernel.org> # 6.0.x Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Tested-by: Hanjun Guo <guohanjun@huawei.com> Link: https://lore.kernel.org/r/1689593625-45213-1-git-send-email-guanghuifeng@linux.alibaba.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-07-28LoongArch: Cleanup __builtin_constant_p() checking for cpu_has_*Huacai Chen1-11/+4
In the current configuration, cpu_has_lsx and cpu_has_lasx cannot be constants. So cleanup the __builtin_constant_p() checking to reduce the complexity. Reviewed-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-07-28LoongArch: BPF: Fix check condition to call lu32id in move_imm()Tiezhu Yang1-1/+1
As the code comment says, the initial aim is to reduce one instruction in some corner cases, if bit[51:31] is all 0 or all 1, no need to call lu32id. That is to say, it should call lu32id only if bit[51:31] is not all 0 and not all 1. The current code always call lu32id, the result is right but the logic is unexpected and wrong, fix it. Cc: stable@vger.kernel.org # 6.1 Fixes: 5dc615520c4d ("LoongArch: Add BPF JIT support") Reported-by: Colin King (gmail) <colin.i.king@gmail.com> Closes: https://lore.kernel.org/all/bcf97046-e336-712a-ac68-7fd194f2953e@gmail.com/ Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-07-28LoongArch: BPF: Enable bpf_probe_read{, str}() on LoongArchChenguang Zhao1-0/+1
Currently nettrace does not work on LoongArch due to missing bpf_probe_read{,str}() support, with the error message: ERROR: failed to load kprobe-based eBPF ERROR: failed to load kprobe-based bpf According to commit 0ebeea8ca8a4d1d ("bpf: Restrict bpf_probe_read{, str}() only to archs where they work"), we only need to select CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE to add said support, because LoongArch does have non-overlapping address ranges for kernel and userspace. Cc: stable@vger.kernel.org # 6.1 Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-07-28LoongArch: Fix return value underflow in exception pathWANG Rui2-2/+4
This patch fixes an underflow issue in the return value within the exception path, specifically at .Llt8 when the remaining length is less than 8 bytes. Cc: stable@vger.kernel.org Fixes: 8941e93ca590 ("LoongArch: Optimize memory ops (memset/memcpy/memmove)") Reported-by: Weihao Li <liweihao@loongson.cn> Signed-off-by: WANG Rui <wangrui@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>