aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2021-08-21lockd: lockd server-side shouldn't set fl_opsJ. Bruce Fields1-18/+12
Locks have two sets of op arrays, fl_lmops for the lock manager (lockd or nfsd), fl_ops for the filesystem. The server-side lockd code has been setting its own fl_ops, which leads to confusion (and crashes) in the reexport case, where the filesystem expects to be the only one setting fl_ops. And there's no reason for it that I can see-the lm_get/put_owner ops do the same job. Reported-by: Daire Byrne <daire@dneg.com> Tested-by: Daire Byrne <daire@dneg.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-20SUNRPC: Add documentation for the fail_sunrpc/ directoryChuck Lever1-0/+18
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-20SUNRPC: Server-side disconnect injectionChuck Lever3-0/+13
Disconnect injection stress-tests the ability for both client and server implementations to behave resiliently in the face of network instability. A file called /sys/kernel/debug/fail_sunrpc/ignore-server-disconnect enables administrators to turn off server-side disconnect injection while allowing other types of sunrpc errors to be injected. The default setting is that server-side disconnect injection is enabled (ignore=false). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-20SUNRPC: Move client-side disconnect injectionChuck Lever4-82/+30
Disconnect injection stress-tests the ability for both client and server implementations to behave resiliently in the face of network instability. Convert the existing client-side disconnect injection infrastructure to use the kernel's generic error injection facility. The generic facility has a richer set of injection criteria. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-20SUNRPC: Add a /sys/kernel/debug/fail_sunrpc/ directoryChuck Lever3-0/+42
This directory will contain a set of administrative controls for enabling error injection for kernel RPC consumers. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-19svcrdma: xpt_bc_xprt is already clear in __svc_rdma_free()Chuck Lever1-7/+0
svc_xprt_free() already "puts" the bc_xprt before calling the transport's "free" method. No need to do it twice. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-17nfsd4: Fix forced-expiry lockingJ. Bruce Fields1-2/+2
This should use the network-namespace-wide client_lock, not the per-client cl_lock. You shouldn't see any bugs unless you're actually using the forced-expiry interface introduced by 89c905beccbb. Fixes: 89c905beccbb "nfsd: allow forced expiration of NFSv4 clients" Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-17rpc: fix gss_svc_init cleanup on failureJ. Bruce Fields1-1/+1
The failure case here should be rare, but it's obviously wrong. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-17SUNRPC: Add RPC_AUTH_TLS protocol numbersChuck Lever2-0/+2
Shared by client and server. See: https://www.iana.org/assignments/rpc-authentication-numbers/rpc-authentication-numbers.xhtml Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-17lockd: change the proc_handler for nsm_use_hostnamesJia He1-1/+1
nsm_use_hostnames is a module parameter and it will be exported to sysctl procfs. This is to let user sometimes change it from userspace. But the minimal unit for sysctl procfs read/write it sizeof(int). In big endian system, the converting from/to bool to/from int will cause error for proc items. This patch use a new proc_handler proc_dobool to fix it. Signed-off-by: Jia He <hejianet@gmail.com> Reviewed-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> [thuth: Fix typo in commit message] Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-17sysctl: introduce new proc handler proc_doboolJia He2-0/+44
This is to let bool variable could be correctly displayed in big/little endian sysctl procfs. sizeof(bool) is arch dependent, proc_dobool should work in all arches. Suggested-by: Pan Xinhui <xinhui@linux.vnet.ibm.com> Signed-off-by: Jia He <hejianet@gmail.com> [thuth: rebased the patch to the current kernel version] Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-17SUNRPC: Fix a NULL pointer deref in trace_svc_stats_latency()Chuck Lever3-4/+20
Some paths through svc_process() leave rqst->rq_procinfo set to NULL, which triggers a crash if tracing happens to be enabled. Fixes: 89ff87494c6e ("SUNRPC: Display RPC procedure names instead of proc numbers") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-17NFSD: remove vanity commentsNeilBrown2-2/+0
Including one's name in copyright claims is appropriate. Including it in random comments is just vanity. After 2 decades, it is time for these to be gone. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-17svcrdma: Convert rdma->sc_rw_ctxts to llistChuck Lever3-16/+37
Relieve contention on sc_rw_ctxt_lock by converting rdma->sc_rw_ctxts to an llist. The goal is to reduce the average overhead of Send completions, because a transport's completion handlers are single-threaded on one CPU core. This change reduces CPU utilization of each Send completion by 2-3% on my server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-By: Tom Talpey <tom@talpey.com>
2021-08-17svcrdma: Relieve contention on sc_send_lock.Chuck Lever3-18/+11
/proc/lock_stat indicates the the sc_send_lock is heavily contended when the server is under load from a single client. To address this, convert the send_ctxt free list to an llist. Returning an item to the send_ctxt cache is now waitless, which reduces the instruction path length in the single-threaded Send handler (svc_rdma_wc_send). The goal is to enable the ib_comp_wq worker to handle a higher RPC/RDMA Send completion rate given the same CPU resources. This change reduces CPU utilization of Send completion by 2-3% on my server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-By: Tom Talpey <tom@talpey.com>
2021-08-17svcrdma: Fewer calls to wake_up() in Send completion handlerChuck Lever3-8/+18
Because wake_up() takes an IRQ-safe lock, it can be expensive, especially to call inside of a single-threaded completion handler. What's more, the Send wait queue almost never has waiters, so most of the time, this is an expensive no-op. As always, the goal is to reduce the average overhead of each completion, because a transport's completion handlers are single- threaded on one CPU core. This change reduces CPU utilization of the Send completion thread by 2-3% on my server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-By: Tom Talpey <tom@talpey.com>
2021-08-17lockd: Fix invalid lockowner cast after vfs_test_lockBenjamin Coddington1-1/+1
After calling vfs_test_lock() the pointer to a conflicting lock can be returned, and that lock is not guarunteed to be owned by nlm. In that case, we cannot cast it to struct nlm_lockowner. Instead return the pid of that conflicting lock. Fixes: 646d73e91b42 ("lockd: Show pid of lockd for remote locks") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-17NFSD: Use new __string_len C macros for nfsd_clid_classChuck Lever1-3/+2
Clean up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-17NFSD: Use new __string_len C macros for the nfs_dirent tracepointChuck Lever1-7/+5
Clean up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-17tracing: Add trace_event helper macros __string_len() and __assign_str_len()Steven Rostedt (VMware)2-0/+49
There's a few cases that a string that is to be recorded in a trace event, does not have a terminating 'nul' character, and instead, the tracepoint passes in the length of the string to record. Add two helper macros to the trace event code that lets this work easier, than tricks with "%.*s" logic. __string_len() which is similar to __string() for declaration, but takes a length argument. __assign_str_len() which is similar to __assign_str() for assiging the string, but it too takes a length argument. Note, the TRACE_EVENT() macro will allocate the location on the ring buffer to 'len + 1', that will be used to store the string into. It is a requirement that the 'len' used for this is a most the length of the string being recorded. This string can still use __get_str() just like strings created with __string() can use to retrieve the string. Link: https://lore.kernel.org/linux-nfs/20210513105018.7539996a@gandalf.local.home/ Tested-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-17NFSD: Batch release pages during splice readChuck Lever1-7/+2
Large splice reads call put_page() repeatedly. put_page() is relatively expensive to call, so replace it with the new svc_rqst_replace_page() helper to help amortize that cost. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: NeilBrown <neilb@suse.de>
2021-08-17SUNRPC: Add svc_rqst_replace_page() APIChuck Lever3-0/+28
Replacing a page in rq_pages[] requires a get_page(), which is a bus-locked operation, and a put_page(), which can be even more costly. To reduce the cost of replacing a page in rq_pages[], batch the put_page() operations by collecting "freed" pages in a pagevec, and then release those pages when the pagevec is full. This pagevec is also emptied when each RPC completes. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-17NFSD: Clean up splice actorChuck Lever1-8/+3
A few useful observations: - The value in @size is never modified. - splice_desc.len is an unsigned int, and so is xdr_buf.page_len. An implicit cast to size_t is unnecessary. - The computation of .page_len is the same in all three arms of the "if" statement, so hoist it out to make it clear that the operation is an unconditional invariant. The resulting function is 18 bytes shorter on my system (-Os). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: NeilBrown <neilb@suse.de>
2021-08-15Linux 5.14-rc6Linus Torvalds1-1/+1
2021-08-13lib: use PFN_PHYS() in devmem_is_allowed()Liang Wang1-1/+1
The physical address may exceed 32 bits on 32-bit systems with more than 32 bits of physcial address. Use PFN_PHYS() in devmem_is_allowed(), or the physical address may overflow and be truncated. We found this bug when mapping a high addresses through devmem tool, when CONFIG_STRICT_DEVMEM is enabled on the ARM with ARM_LPAE and devmem is used to map a high address that is not in the iomem address range, an unexpected error indicating no permission is returned. This bug was initially introduced from v2.6.37, and the function was moved to lib in v5.11. Link: https://lkml.kernel.org/r/20210731025057.78825-1-wangliang101@huawei.com Fixes: 087aaffcdf9c ("ARM: implement CONFIG_STRICT_DEVMEM by disabling access to RAM via /dev/mem") Fixes: 527701eda5f1 ("lib: Add a generic version of devmem_is_allowed()") Signed-off-by: Liang Wang <wangliang101@huawei.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Palmer Dabbelt <palmerdabbelt@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Liang Wang <wangliang101@huawei.com> Cc: Xiaoming Ni <nixiaoming@huawei.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: <stable@vger.kernel.org> [2.6.37+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-13mm/memcg: fix incorrect flushing of lruvec data in obj_stockWaiman Long1-2/+4
When mod_objcg_state() is called with a pgdat that is different from that in the obj_stock, the old lruvec data cached in obj_stock are flushed out. Unfortunately, they were flushed to the new pgdat and so the data go to the wrong node. This will screw up the slab data reported in /sys/devices/system/node/node*/meminfo. Fix that by flushing the data to the cached pgdat instead. Link: https://lkml.kernel.org/r/20210802143834.30578-1-longman@redhat.com Fixes: 68ac5b3c8db2 ("mm/memcg: cache vmstat data in percpu memcg_stock_pcp") Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Chris Down <chris@chrisdown.name> Cc: Yafang Shao <laoar.shao@gmail.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Waiman Long <longman@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-13mm/madvise: report SIGBUS as -EFAULT for MADV_POPULATE_(READ|WRITE)David Hildenbrand2-3/+8
Doing some extended tests and polishing the man page update for MADV_POPULATE_(READ|WRITE), I realized that we end up converting also SIGBUS (via -EFAULT) to -EINVAL, making it look like yet another madvise() user error. We want to report only problematic mappings and permission problems that the user could have know as -EINVAL. Let's not convert -EFAULT arising due to SIGBUS (or SIGSEGV) to -EINVAL, but instead indicate -EFAULT to user space. While we could also convert it to -ENOMEM, using -EFAULT looks more helpful when user space might want to troubleshoot what's going wrong: MADV_POPULATE_(READ|WRITE) is not part of an final Linux release and we can still adjust the behavior. Link: https://lkml.kernel.org/r/20210726154932.102880-1-david@redhat.com Fixes: 4ca9b3859dac ("mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables") Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@surriel.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Chris Zankel <chris@zankel.net> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rolf Eike Beer <eike-kernel@sf-tec.de> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-13mm: slub: fix slub_debug disabling for list of slabsVlastimil Babka1-5/+8
Vijayanand Jitta reports: Consider the scenario where CONFIG_SLUB_DEBUG_ON is set and we would want to disable slub_debug for few slabs. Using boot parameter with slub_debug=-,slab_name syntax doesn't work as expected i.e; only disabling debugging for the specified list of slabs. Instead it disables debugging for all slabs, which is wrong. This patch fixes it by delaying the moment when the global slub_debug flags variable is updated. In case a "slub_debug=-,slab_name" has been passed, the global flags remain as initialized (depending on CONFIG_SLUB_DEBUG_ON enabled or disabled) and are not simply reset to 0. Link: https://lkml.kernel.org/r/8a3d992a-473a-467b-28a0-4ad2ff60ab82@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Vijayanand Jitta <vjitta@codeaurora.org> Reviewed-by: Vijayanand Jitta <vjitta@codeaurora.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-13slub: fix kmalloc_pagealloc_invalid_free unit testShakeel Butt1-4/+4
The unit test kmalloc_pagealloc_invalid_free makes sure that for the higher order slub allocation which goes to page allocator, the free is called with the correct address i.e. the virtual address of the head page. Commit f227f0faf63b ("slub: fix unreclaimable slab stat for bulk free") unified the free code paths for page allocator based slub allocations but instead of using the address passed by the caller, it extracted the address from the page. Thus making the unit test kmalloc_pagealloc_invalid_free moot. So, fix this by using the address passed by the caller. Should we fix this? I think yes because dev expect kasan to catch these type of programming bugs. Link: https://lkml.kernel.org/r/20210802180819.1110165-1-shakeelb@google.com Fixes: f227f0faf63b ("slub: fix unreclaimable slab stat for bulk free") Signed-off-by: Shakeel Butt <shakeelb@google.com> Reported-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Roman Gushchin <guro@fb.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-13kasan, slub: reset tag when printing addressKuan-Ying Lee1-2/+2
The address still includes the tags when it is printed. With hardware tag-based kasan enabled, we will get a false positive KASAN issue when we access metadata. Reset the tag before we access the metadata. Link: https://lkml.kernel.org/r/20210804090957.12393-3-Kuan-Ying.Lee@mediatek.com Fixes: aa1ef4d7b3f6 ("kasan, mm: reset tags when accessing metadata") Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Nicholas Tang <nicholas.tang@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-13kasan, kmemleak: reset tags when scanning blockKuan-Ying Lee1-3/+3
Patch series "kasan, slub: reset tag when printing address", v3. With hardware tag-based kasan enabled, we reset the tag when we access metadata to avoid from false alarm. This patch (of 2): Kmemleak needs to scan kernel memory to check memory leak. With hardware tag-based kasan enabled, when it scans on the invalid slab and dereference, the issue will occur as below. Hardware tag-based KASAN doesn't use compiler instrumentation, we can not use kasan_disable_current() to ignore tag check. Based on the below report, there are 11 0xf7 granules, which amounts to 176 bytes, and the object is allocated from the kmalloc-256 cache. So when kmemleak accesses the last 256-176 bytes, it causes faults, as those are marked with KASAN_KMALLOC_REDZONE == KASAN_TAG_INVALID == 0xfe. Thus, we reset tags before accessing metadata to avoid from false positives. BUG: KASAN: out-of-bounds in scan_block+0x58/0x170 Read at addr f7ff0000c0074eb0 by task kmemleak/138 Pointer tag: [f7], memory tag: [fe] CPU: 7 PID: 138 Comm: kmemleak Not tainted 5.14.0-rc2-00001-g8cae8cd89f05-dirty #134 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x1b0 show_stack+0x1c/0x30 dump_stack_lvl+0x68/0x84 print_address_description+0x7c/0x2b4 kasan_report+0x138/0x38c __do_kernel_fault+0x190/0x1c4 do_tag_check_fault+0x78/0x90 do_mem_abort+0x44/0xb4 el1_abort+0x40/0x60 el1h_64_sync_handler+0xb4/0xd0 el1h_64_sync+0x78/0x7c scan_block+0x58/0x170 scan_gray_list+0xdc/0x1a0 kmemleak_scan+0x2ac/0x560 kmemleak_scan_thread+0xb0/0xe0 kthread+0x154/0x160 ret_from_fork+0x10/0x18 Allocated by task 0: kasan_save_stack+0x2c/0x60 __kasan_kmalloc+0xec/0x104 __kmalloc+0x224/0x3c4 __register_sysctl_paths+0x200/0x290 register_sysctl_table+0x2c/0x40 sysctl_init+0x20/0x34 proc_sys_init+0x3c/0x48 proc_root_init+0x80/0x9c start_kernel+0x648/0x6a4 __primary_switched+0xc0/0xc8 Freed by task 0: kasan_save_stack+0x2c/0x60 kasan_set_track+0x2c/0x40 kasan_set_free_info+0x44/0x54 ____kasan_slab_free.constprop.0+0x150/0x1b0 __kasan_slab_free+0x14/0x20 slab_free_freelist_hook+0xa4/0x1fc kfree+0x1e8/0x30c put_fs_context+0x124/0x220 vfs_kern_mount.part.0+0x60/0xd4 kern_mount+0x24/0x4c bdev_cache_init+0x70/0x9c vfs_caches_init+0xdc/0xf4 start_kernel+0x638/0x6a4 __primary_switched+0xc0/0xc8 The buggy address belongs to the object at ffff0000c0074e00 which belongs to the cache kmalloc-256 of size 256 The buggy address is located 176 bytes inside of 256-byte region [ffff0000c0074e00, ffff0000c0074f00) The buggy address belongs to the page: page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x100074 head:(____ptrval____) order:2 compound_mapcount:0 compound_pincount:0 flags: 0xbfffc0000010200(slab|head|node=0|zone=2|lastcpupid=0xffff|kasantag=0x0) raw: 0bfffc0000010200 0000000000000000 dead000000000122 f5ff0000c0002300 raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff0000c0074c00: f0 f0 f0 f0 f0 f0 f0 f0 f0 fe fe fe fe fe fe fe ffff0000c0074d00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe >ffff0000c0074e00: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 fe fe fe fe fe ^ ffff0000c0074f00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe ffff0000c0075000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Disabling lock debugging due to kernel taint kmemleak: 181 new suspected memory leaks (see /sys/kernel/debug/kmemleak) Link: https://lkml.kernel.org/r/20210804090957.12393-1-Kuan-Ying.Lee@mediatek.com Link: https://lkml.kernel.org/r/20210804090957.12393-2-Kuan-Ying.Lee@mediatek.com Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Marco Elver <elver@google.com> Cc: Nicholas Tang <nicholas.tang@mediatek.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-13nbd: Aovid double completion of a requestXie Yongji1-3/+11
There is a race between iterating over requests in nbd_clear_que() and completing requests in recv_work(), which can lead to double completion of a request. To fix it, flush the recv worker before iterating over the requests and don't abort the completed request while iterating. Fixes: 96d97e17828f ("nbd: clear_sock on netlink disconnect") Reported-by: Jiang Yadong <jiangyadong@bytedance.com> Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20210813151330.96-1-xieyongji@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-13tools/io_uring/io_uring-cp: sync with liburing exampleJens Axboe1-4/+27
This example is missing a few fixes that are in the liburing version, synchronize with the upstream version. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-13blk-mq: clear active_queues before clearing BLK_MQ_F_TAG_QUEUE_SHAREDYu Kuai1-2/+4
We run a test that delete and recover devcies frequently(two devices on the same host), and we found that 'active_queues' is super big after a period of time. If device a and device b share a tag set, and a is deleted, then blk_mq_exit_queue() will clear BLK_MQ_F_TAG_QUEUE_SHARED because there is only one queue that are using the tag set. However, if b is still active, the active_queues of b might never be cleared even if b is deleted. Thus clear active_queues before BLK_MQ_F_TAG_QUEUE_SHARED is cleared. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210731062130.1533893-1-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-13driver core: Add missing kernel doc for device::msi_lockThomas Gleixner1-0/+1
Fixes: 77e89afc25f3 ("PCI/MSI: Protect msi_desc::masked for multi-MSI") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2021-08-13KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlockSean Christopherson3-4/+39
Add yet another spinlock for the TDP MMU and take it when marking indirect shadow pages unsync. When using the TDP MMU and L1 is running L2(s) with nested TDP, KVM may encounter shadow pages for the TDP entries managed by L1 (controlling L2) when handling a TDP MMU page fault. The unsync logic is not thread safe, e.g. the kvm_mmu_page fields are not atomic, and misbehaves when a shadow page is marked unsync via a TDP MMU page fault, which runs with mmu_lock held for read, not write. Lack of a critical section manifests most visibly as an underflow of unsync_children in clear_unsync_child_bit() due to unsync_children being corrupted when multiple CPUs write it without a critical section and without atomic operations. But underflow is the best case scenario. The worst case scenario is that unsync_children prematurely hits '0' and leads to guest memory corruption due to KVM neglecting to properly sync shadow pages. Use an entirely new spinlock even though piggybacking tdp_mmu_pages_lock would functionally be ok. Usurping the lock could degrade performance when building upper level page tables on different vCPUs, especially since the unsync flow could hold the lock for a comparatively long time depending on the number of indirect shadow pages and the depth of the paging tree. For simplicity, take the lock for all MMUs, even though KVM could fairly easily know that mmu_lock is held for write. If mmu_lock is held for write, there cannot be contention for the inner spinlock, and marking shadow pages unsync across multiple vCPUs will be slow enough that bouncing the kvm_arch cacheline should be in the noise. Note, even though L2 could theoretically be given access to its own EPT entries, a nested MMU must hold mmu_lock for write and thus cannot race against a TDP MMU page fault. I.e. the additional spinlock only _needs_ to be taken by the TDP MMU, as opposed to being taken by any MMU for a VM that is running with the TDP MMU enabled. Holding mmu_lock for read also prevents the indirect shadow page from being freed. But as above, keep it simple and always take the lock. Alternative #1, the TDP MMU could simply pass "false" for can_unsync and effectively disable unsync behavior for nested TDP. Write protecting leaf shadow pages is unlikely to noticeably impact traditional L1 VMMs, as such VMMs typically don't modify TDP entries, but the same may not hold true for non-standard use cases and/or VMMs that are migrating physical pages (from L1's perspective). Alternative #2, the unsync logic could be made thread safe. In theory, simply converting all relevant kvm_mmu_page fields to atomics and using atomic bitops for the bitmap would suffice. However, (a) an in-depth audit would be required, (b) the code churn would be substantial, and (c) legacy shadow paging would incur additional atomic operations in performance sensitive paths for no benefit (to legacy shadow paging). Fixes: a2855afc7ee8 ("KVM: x86/mmu: Allow parallel page faults for the TDP MMU") Cc: stable@vger.kernel.org Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210812181815.3378104-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13KVM: x86/mmu: Don't step down in the TDP iterator when zapping all SPTEsSean Christopherson1-1/+8
Set the min_level for the TDP iterator at the root level when zapping all SPTEs to optimize the iterator's try_step_down(). Zapping a non-leaf SPTE will recursively zap all its children, thus there is no need for the iterator to attempt to step down. This avoids rereading the top-level SPTEs after they are zapped by causing try_step_down() to short-circuit. In most cases, optimizing try_step_down() will be in the noise as the cost of zapping SPTEs completely dominates the overall time. The optimization is however helpful if the zap occurs with relatively few SPTEs, e.g. if KVM is zapping in response to multiple memslot updates when userspace is adding and removing read-only memslots for option ROMs. In that case, the task doing the zapping likely isn't a vCPU thread, but it still holds mmu_lock for read and thus can be a noisy neighbor of sorts. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210812181414.3376143-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEsSean Christopherson1-10/+16
Pass "all ones" as the end GFN to signal "zap all" for the TDP MMU and really zap all SPTEs in this case. As is, zap_gfn_range() skips non-leaf SPTEs whose range exceeds the range to be zapped. If shadow_phys_bits is not aligned to the range size of top-level SPTEs, e.g. 512gb with 4-level paging, the "zap all" flows will skip top-level SPTEs whose range extends beyond shadow_phys_bits and leak their SPs when the VM is destroyed. Use the current upper bound (based on host.MAXPHYADDR) to detect that the caller wants to zap all SPTEs, e.g. instead of using the max theoretical gfn, 1 << (52 - 12). The more precise upper bound allows the TDP iterator to terminate its walk earlier when running on hosts with MAXPHYADDR < 52. Add a WARN on kmv->arch.tdp_mmu_pages when the TDP MMU is destroyed to help future debuggers should KVM decide to leak SPTEs again. The bug is most easily reproduced by running (and unloading!) KVM in a VM whose host.MAXPHYADDR < 39, as the SPTE for gfn=0 will be skipped. ============================================================================= BUG kvm_mmu_page_header (Not tainted): Objects remaining in kvm_mmu_page_header on __kmem_cache_shutdown() ----------------------------------------------------------------------------- Slab 0x000000004d8f7af1 objects=22 used=2 fp=0x00000000624d29ac flags=0x4000000000000200(slab|zone=1) CPU: 0 PID: 1582 Comm: rmmod Not tainted 5.14.0-rc2+ #420 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: dump_stack_lvl+0x45/0x59 slab_err+0x95/0xc9 __kmem_cache_shutdown.cold+0x3c/0x158 kmem_cache_destroy+0x3d/0xf0 kvm_mmu_module_exit+0xa/0x30 [kvm] kvm_arch_exit+0x5d/0x90 [kvm] kvm_exit+0x78/0x90 [kvm] vmx_exit+0x1a/0x50 [kvm_intel] __x64_sys_delete_module+0x13f/0x220 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: faaf05b00aec ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU") Cc: stable@vger.kernel.org Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210812181414.3376143-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13KVM: nVMX: Use vmx_need_pf_intercept() when deciding if L0 wants a #PFSean Christopherson1-1/+2
Use vmx_need_pf_intercept() when determining if L0 wants to handle a #PF in L2 or if the VM-Exit should be forwarded to L1. The current logic fails to account for the case where #PF is intercepted to handle guest.MAXPHYADDR < host.MAXPHYADDR and ends up reflecting all #PFs into L1. At best, L1 will complain and inject the #PF back into L2. At worst, L1 will eat the unexpected fault and cause L2 to hang on infinite page faults. Note, while the bug was technically introduced by the commit that added support for the MAXPHYADDR madness, the shame is all on commit a0c134347baf ("KVM: VMX: introduce vmx_need_pf_intercept"). Fixes: 1dbf5d68af6f ("KVM: VMX: Add guest physical address check in EPT violation and misconfig") Cc: stable@vger.kernel.org Cc: Peter Shier <pshier@google.com> Cc: Oliver Upton <oupton@google.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210812045615.3167686-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13kvm: vmx: Sync all matching EPTPs when injecting nested EPT faultJunaid Shahid1-12/+41
When a nested EPT violation/misconfig is injected into the guest, the shadow EPT PTEs associated with that address need to be synced. This is done by kvm_inject_emulated_page_fault() before it calls nested_ept_inject_page_fault(). However, that will only sync the shadow EPT PTE associated with the current L1 EPTP. Since the ASID is based on EP4TA rather than the full EPTP, so syncing the current EPTP is not enough. The SPTEs associated with any other L1 EPTPs in the prev_roots cache with the same EP4TA also need to be synced. Signed-off-by: Junaid Shahid <junaids@google.com> Message-Id: <20210806222229.1645356-1-junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13KVM: x86: remove dead initializationPaolo Bonzini1-1/+1
hv_vcpu is initialized again a dozen lines below, and at this point vcpu->arch.hyperv is not valid. Remove the initializer. Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13KVM: x86: Allow guest to set EFER.NX=1 on non-PAE 32-bit kernelsSean Christopherson1-27/+1
Remove an ancient restriction that disallowed exposing EFER.NX to the guest if EFER.NX=0 on the host, even if NX is fully supported by the CPU. The motivation of the check, added by commit 2cc51560aed0 ("KVM: VMX: Avoid saving and restoring msr_efer on lightweight vmexit"), was to rule out the case of host.EFER.NX=0 and guest.EFER.NX=1 so that KVM could run the guest with the host's EFER.NX and thus avoid context switching EFER if the only divergence was the NX bit. Fast forward to today, and KVM has long since stopped running the guest with the host's EFER.NX. Not only does KVM context switch EFER if host.EFER.NX=1 && guest.EFER.NX=0, KVM also forces host.EFER.NX=0 && guest.EFER.NX=1 when using shadow paging (to emulate SMEP). Furthermore, the entire motivation for the restriction was made obsolete over a decade ago when Intel added dedicated host and guest EFER fields in the VMCS (Nehalem timeframe), which reduced the overhead of context switching EFER from 400+ cycles (2 * WRMSR + 1 * RDMSR) to a mere ~2 cycles. In practice, the removed restriction only affects non-PAE 32-bit kernels, as EFER.NX is set during boot if NX is supported and the kernel will use PAE paging (32-bit or 64-bit), regardless of whether or not the kernel will actually use NX itself (mark PTEs non-executable). Alternatively and/or complementarily, startup_32_smp() in head_32.S could be modified to set EFER.NX=1 regardless of paging mode, thus eliminating the scenario where NX is supported but not enabled. However, that runs the risk of breaking non-KVM non-PAE kernels (though the risk is very, very low as there are no known EFER.NX errata), and also eliminates an easy-to-use mechanism for stressing KVM's handling of guest vs. host EFER across nested virtualization transitions. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210805183804.1221554-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-12x86/resctrl: Fix default monitoring groups reportingBabu Moger1-14/+13
Creating a new sub monitoring group in the root /sys/fs/resctrl leads to getting the "Unavailable" value for mbm_total_bytes and mbm_local_bytes on the entire filesystem. Steps to reproduce: 1. mount -t resctrl resctrl /sys/fs/resctrl/ 2. cd /sys/fs/resctrl/ 3. cat mon_data/mon_L3_00/mbm_total_bytes 23189832 4. Create sub monitor group: mkdir mon_groups/test1 5. cat mon_data/mon_L3_00/mbm_total_bytes Unavailable When a new monitoring group is created, a new RMID is assigned to the new group. But the RMID is not active yet. When the events are read on the new RMID, it is expected to report the status as "Unavailable". When the user reads the events on the default monitoring group with multiple subgroups, the events on all subgroups are consolidated together. Currently, if any of the RMID reads report as "Unavailable", then everything will be reported as "Unavailable". Fix the issue by discarding the "Unavailable" reads and reporting all the successful RMID reads. This is not a problem on Intel systems as Intel reports 0 on Inactive RMIDs. Fixes: d89b7379015f ("x86/intel_rdt/cqm: Add mon_data") Reported-by: Paweł Szulik <pawel.szulik@intel.com> Signed-off-by: Babu Moger <Babu.Moger@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=213311 Link: https://lkml.kernel.org/r/162793309296.9224.15871659871696482080.stgit@bmoger-ubuntu
2021-08-12vsock/virtio: avoid potential deadlock when vsock device removeLongpeng(Mike)1-2/+5
There's a potential deadlock case when remove the vsock device or process the RESET event: vsock_for_each_connected_socket: spin_lock_bh(&vsock_table_lock) ----------- (1) ... virtio_vsock_reset_sock: lock_sock(sk) --------------------- (2) ... spin_unlock_bh(&vsock_table_lock) lock_sock() may do initiative schedule when the 'sk' is owned by other thread at the same time, we would receivce a warning message that "scheduling while atomic". Even worse, if the next task (selected by the scheduler) try to release a 'sk', it need to request vsock_table_lock and the deadlock occur, cause the system into softlockup state. Call trace: queued_spin_lock_slowpath vsock_remove_bound vsock_remove_sock virtio_transport_release __vsock_release vsock_release __sock_release sock_close __fput ____fput So we should not require sk_lock in this case, just like the behavior in vhost_vsock or vmci. Fixes: 0ea9e1d3a9e3 ("VSOCK: Introduce virtio_transport.ko") Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20210812053056.1699-1-longpeng2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-12wwan: core: Avoid returning NULL from wwan_create_dev()Andy Shevchenko1-4/+8
Make wwan_create_dev() to return either valid or error pointer, In some cases it may return NULL. Prevent this by converting it to the respective error pointer. Fixes: 9a44c1cc6388 ("net: Add a WWAN subsystem") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Reviewed-by: Loic Poulain <loic.poulain@linaro.org> Link: https://lore.kernel.org/r/20210811124845.10955-1-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-12cifs: Call close synchronously during unlink/rename/lease break.Rohith Surabattula3-30/+56
During unlink/rename/lease break, deferred work for close is scheduled immediately but in an asynchronous manner which might lead to race with actual(unlink/rename) commands. This change will schedule close synchronously which will avoid the race conditions with other commands. Signed-off-by: Rohith Surabattula <rohiths@microsoft.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Cc: stable@vger.kernel.org # 5.13 Signed-off-by: Steve French <stfrench@microsoft.com>
2021-08-12cifs: Handle race conditions during renameRohith Surabattula2-7/+28
When rename is executed on directory which has files for which close is deferred, then rename will fail with EACCES. This patch will try to close all deferred files when EACCES is received and retry rename on a directory. Signed-off-by: Rohith Surabattula <rohiths@microsoft.com> Cc: stable@vger.kernel.org # 5.13 Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-08-12xen/events: Fix race in set_evtchn_to_irqMaximilian Heyne1-6/+14
There is a TOCTOU issue in set_evtchn_to_irq. Rows in the evtchn_to_irq mapping are lazily allocated in this function. The check whether the row is already present and the row initialization is not synchronized. Two threads can at the same time allocate a new row for evtchn_to_irq and add the irq mapping to the their newly allocated row. One thread will overwrite what the other has set for evtchn_to_irq[row] and therefore the irq mapping is lost. This will trigger a BUG_ON later in bind_evtchn_to_cpu: INFO: pci 0000:1a:15.4: [1d0f:8061] type 00 class 0x010802 INFO: nvme 0000:1a:12.1: enabling device (0000 -> 0002) INFO: nvme nvme77: 1/0/0 default/read/poll queues CRIT: kernel BUG at drivers/xen/events/events_base.c:427! WARN: invalid opcode: 0000 [#1] SMP NOPTI WARN: Workqueue: nvme-reset-wq nvme_reset_work [nvme] WARN: RIP: e030:bind_evtchn_to_cpu+0xc2/0xd0 WARN: Call Trace: WARN: set_affinity_irq+0x121/0x150 WARN: irq_do_set_affinity+0x37/0xe0 WARN: irq_setup_affinity+0xf6/0x170 WARN: irq_startup+0x64/0xe0 WARN: __setup_irq+0x69e/0x740 WARN: ? request_threaded_irq+0xad/0x160 WARN: request_threaded_irq+0xf5/0x160 WARN: ? nvme_timeout+0x2f0/0x2f0 [nvme] WARN: pci_request_irq+0xa9/0xf0 WARN: ? pci_alloc_irq_vectors_affinity+0xbb/0x130 WARN: queue_request_irq+0x4c/0x70 [nvme] WARN: nvme_reset_work+0x82d/0x1550 [nvme] WARN: ? check_preempt_wakeup+0x14f/0x230 WARN: ? check_preempt_curr+0x29/0x80 WARN: ? nvme_irq_check+0x30/0x30 [nvme] WARN: process_one_work+0x18e/0x3c0 WARN: worker_thread+0x30/0x3a0 WARN: ? process_one_work+0x3c0/0x3c0 WARN: kthread+0x113/0x130 WARN: ? kthread_park+0x90/0x90 WARN: ret_from_fork+0x3a/0x50 This patch sets evtchn_to_irq rows via a cmpxchg operation so that they will be set only once. The row is now cleared before writing it to evtchn_to_irq in order to not create a race once the row is visible for other threads. While at it, do not require the page to be zeroed, because it will be overwritten with -1's in clear_evtchn_to_irq_row anyway. Signed-off-by: Maximilian Heyne <mheyne@amazon.de> Fixes: d0b075ffeede ("xen/events: Refactor evtchn_to_irq array to be dynamically allocated") Link: https://lore.kernel.org/r/20210812130930.127134-1-mheyne@amazon.de Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-08-12x86/tools: Fix objdump version check againRandy Dunlap1-0/+1
Skip (omit) any version string info that is parenthesized. Warning: objdump version 15) is older than 2.19 Warning: Skipping posttest. where 'objdump -v' says: GNU objdump (GNU Binutils; SUSE Linux Enterprise 15) 2.35.1.20201123-7.18 Fixes: 8bee738bb1979 ("x86: Fix objdump version check in chkobjdump.awk for different formats.") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20210731000146.2720-1-rdunlap@infradead.org
2021-08-12riscv: Fix comment regarding kernel mapping overlapping with IS_ERR_VALUEAlexandre Ghiti1-1/+1
The current comment states that we check if the 64-bit kernel mapping overlaps with the last 4K of the address space that is reserved to error values in create_kernel_page_table, which is not the case since it is done in setup_vm. But anyway, remove the reference to any function and simply note that in 64-bit kernel, the check should be done as soon as the kernel mapping base address is known. Fixes: db6b84a368b4 ("riscv: Make sure the kernel mapping does not overlap with IS_ERR_VALUE") Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>