From 0c555a3c1bc9114ad91422b941dcd29e02490687 Mon Sep 17 00:00:00 2001 From: Andrii Nakryiko Date: Mon, 27 Jan 2025 14:21:14 -0800 Subject: mm,procfs: allow read-only remote mm access under CAP_PERFMON It's very common for various tracing and profiling toolis to need to access /proc/PID/maps contents for stack symbolization needs to learn which shared libraries are mapped in memory, at which file offset, etc. Currently, access to /proc/PID/maps requires CAP_SYS_PTRACE (unless we are looking at data for our own process, which is a trivial case not too relevant for profilers use cases). Unfortunately, CAP_SYS_PTRACE implies way more than just ability to discover memory layout of another process: it allows to fully control arbitrary other processes. This is problematic from security POV for applications that only need read-only /proc/PID/maps (and other similar read-only data) access, and in large production settings CAP_SYS_PTRACE is frowned upon even for the system-wide profilers. On the other hand, it's already possible to access similar kind of information (and more) with just CAP_PERFMON capability. E.g., setting up PERF_RECORD_MMAP collection through perf_event_open() would give one similar information to what /proc/PID/maps provides. CAP_PERFMON, together with CAP_BPF, is already a very common combination for system-wide profiling and observability application. As such, it's reasonable and convenient to be able to access /proc/PID/maps with CAP_PERFMON capabilities instead of CAP_SYS_PTRACE. For procfs, these permissions are checked through common mm_access() helper, and so we augment that with cap_perfmon() check *only* if requested mode is PTRACE_MODE_READ. I.e., PTRACE_MODE_ATTACH wouldn't be permitted by CAP_PERFMON. So /proc/PID/mem, which uses PTRACE_MODE_ATTACH, won't be permitted by CAP_PERFMON, but /proc/PID/maps, /proc/PID/environ, and a bunch of other read-only contents will be allowable under CAP_PERFMON. Besides procfs itself, mm_access() is used by process_madvise() and process_vm_{readv,writev}() syscalls. The former one uses PTRACE_MODE_READ to avoid leaking ASLR metadata, and as such CAP_PERFMON seems like a meaningful allowable capability as well. process_vm_{readv,writev} currently assume PTRACE_MODE_ATTACH level of permissions (though for readv PTRACE_MODE_READ seems more reasonable, but that's outside the scope of this change), and as such won't be affected by this patch. Link: https://lkml.kernel.org/r/20250127222114.1132392-1-andrii@kernel.org Signed-off-by: Andrii Nakryiko Reviewed-by: Shakeel Butt Acked-by: Ingo Molnar Cc: Al Viro Cc: Christian Brauner Cc: Jann Horn Cc: Kees Cook Cc: Liam Howlett Cc: "Mike Rapoport (IBM)" Cc: Peter Zijlstra (Intel) Cc: Suren Baghdasaryan Signed-off-by: Andrew Morton --- kernel/fork.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/kernel/fork.c b/kernel/fork.c index 735405a9c5f3..7757e74ebce4 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1559,6 +1559,17 @@ struct mm_struct *get_task_mm(struct task_struct *task) } EXPORT_SYMBOL_GPL(get_task_mm); +static bool may_access_mm(struct mm_struct *mm, struct task_struct *task, unsigned int mode) +{ + if (mm == current->mm) + return true; + if (ptrace_may_access(task, mode)) + return true; + if ((mode & PTRACE_MODE_READ) && perfmon_capable()) + return true; + return false; +} + struct mm_struct *mm_access(struct task_struct *task, unsigned int mode) { struct mm_struct *mm; @@ -1571,7 +1582,7 @@ struct mm_struct *mm_access(struct task_struct *task, unsigned int mode) mm = get_task_mm(task); if (!mm) { mm = ERR_PTR(-ESRCH); - } else if (mm != current->mm && !ptrace_may_access(task, mode)) { + } else if (!may_access_mm(mm, task, mode)) { mmput(mm); mm = ERR_PTR(-EACCES); } -- cgit v1.2.3-59-g8ed1b From fc0d9d9afcd304e5402a73f6244926915e6de8e6 Mon Sep 17 00:00:00 2001 From: Yang Yang Date: Fri, 17 Jan 2025 22:20:13 +0800 Subject: MAINTAINERS: add Yang Yang as a co-maintainer of PER-TASK DELAY ACCOUNTING Balbir Singh is the unique maintainer of PER-TASK DELAY ACCOUNTING, and he had started work on cgroupstats a long time back, this subsystem then is not growing at a very rapid pace. With their excellent work delay accounting is still very useful for observing and optimizing system delay, but still needs continuous improvement. Yang Yang with his team had worked for most of the recent patches of the subsystem, and he has a strong willing to help, Balbir Singh is glad to see that, so add him as a co-maintainer. Link: https://lkml.kernel.org/r/20250117222013817zWHgBaSigRI_eRJt1hqnu@zte.com.cn Signed-off-by: Yang Yang Cc: Balbir Singh Signed-off-by: Andrew Morton --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index ed7aa6867674..a666d7e34fd3 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -18513,6 +18513,7 @@ F: mm/percpu*.c PER-TASK DELAY ACCOUNTING M: Balbir Singh +M: Yang Yang S: Maintained F: include/linux/delayacct.h F: kernel/delayacct.c -- cgit v1.2.3-59-g8ed1b From 35dac71cff8f68f065a6719631db919d0640d5e5 Mon Sep 17 00:00:00 2001 From: "Guilherme G. Piccoli" Date: Mon, 20 Jan 2025 16:04:26 -0300 Subject: scripts: add script to extract built-in firmware blobs There is currently no tool to extract a firmware blob that is built-in on vmlinux to the best of my knowledge. So if we have a kernel image containing the blobs, and we want to rebuild the kernel with some debug patches for example (and given that the image also has IKCONFIG=y), we currently can't do that for the same versions for all the firmware blobs, _unless_ we have exact commits of linux-firmware for the specific versions for each firmware included. Through the options CONFIG_EXTRA_FIRMWARE{_DIR} one is able to build a kernel including firmware blobs in a built-in fashion. This is usually the case of built-in drivers that require some blobs in order to work properly, for example, like in non-initrd based systems. Add hereby a script to extract these blobs from a non-stripped vmlinux, similar to the idea of "extract-ikconfig". The firmware loader interface saves such built-in blobs as rodata entries, having a field for the FW name as "_fw___bin"; the tool extracts files named "_" for each rodata firmware entry detected. It makes use of awk, bash, dd and readelf, pretty standard tooling for Linux development. With this tool, we can blindly extract the FWs and easily re-add them in the new debug kernel build, allowing a more deterministic testing without the burden of "hunting down" the proper version of each firmware binary. Link: https://lkml.kernel.org/r/20250120190436.127578-1-gpiccoli@igalia.com Signed-off-by: Guilherme G. Piccoli Suggested-by: Thadeu Lima de Souza Cascardo Reviewed-by: Thadeu Lima de Souza Cascardo Cc: Danilo Krummrich Cc: Greg Kroah-Hartman Cc: Luis Chamberalin Cc: Masahiro Yamada Cc: Nathan Chancellor Cc: Nicolas Schier Cc: "Rafael J. Wysocki" Cc: Russ Weight Signed-off-by: Andrew Morton --- scripts/extract-fwblobs | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) create mode 100755 scripts/extract-fwblobs diff --git a/scripts/extract-fwblobs b/scripts/extract-fwblobs new file mode 100755 index 000000000000..53729124e5a0 --- /dev/null +++ b/scripts/extract-fwblobs @@ -0,0 +1,30 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# ----------------------------------------------------------------------------- +# Extracts the vmlinux built-in firmware blobs - requires a non-stripped image +# ----------------------------------------------------------------------------- + +if [ -z "$1" ]; then + echo "Must provide a non-stripped vmlinux as argument" + exit 1 +fi + +read -r RD_ADDR_HEX RD_OFF_HEX <<< "$( readelf -SW "$1" |\ +grep -w rodata | awk '{print "0x"$5" 0x"$6}' )" + +FW_SYMS="$(readelf -sW "$1" |\ +awk -n '/fw_end/ { end=$2 ; print name " 0x" start " 0x" end; } { start=$2; name=$8; }')" + +while IFS= read -r entry; do + read -r FW_NAME FW_ADDR_ST_HEX FW_ADDR_END_HEX <<< "$entry" + + # Notice kernel prepends _fw_ and appends _bin to the FW name + # in rodata; hence we hereby filter that out. + FW_NAME=${FW_NAME:4:-4} + + FW_OFFSET="$(printf "%d" $((FW_ADDR_ST_HEX - RD_ADDR_HEX + RD_OFF_HEX)))" + FW_SIZE="$(printf "%d" $((FW_ADDR_END_HEX - FW_ADDR_ST_HEX)))" + + dd if="$1" of="./${FW_NAME}" bs="${FW_SIZE}" count=1 iflag=skip_bytes skip="${FW_OFFSET}" +done <<< "${FW_SYMS}" -- cgit v1.2.3-59-g8ed1b From 541da9f87ddbc68b183a6345284d55441e6d18ca Mon Sep 17 00:00:00 2001 From: Carlos Bilbao Date: Tue, 28 Jan 2025 19:34:30 -0600 Subject: .mailmap: remove redundant mappings of emails Remove two redundant mappings: changbin.du@intel.com -> changbin.du@intel.com viresh.kumar@linaro.org -> viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20250129013430.1117720-1-carlos.bilbao@kernel.org Signed-off-by: Carlos Bilbao Cc: Jonathan Corbet Cc: Matthew Wilcox Signed-off-by: Andrew Morton --- .mailmap | 2 -- 1 file changed, 2 deletions(-) diff --git a/.mailmap b/.mailmap index 72bf830e1e58..0e373e743495 100644 --- a/.mailmap +++ b/.mailmap @@ -153,7 +153,6 @@ Carlos Bilbao Carlos Bilbao Carlos Bilbao Changbin Du -Changbin Du Chao Yu Chao Yu Chester Lin @@ -757,7 +756,6 @@ Vinod Koul Viresh Kumar Viresh Kumar Viresh Kumar -Viresh Kumar Viresh Kumar Vishnu Dasa Vivek Aknurwar -- cgit v1.2.3-59-g8ed1b From 87ad827a2780b8abe08e64a03553e4898a06b9fc Mon Sep 17 00:00:00 2001 From: Andrii Nakryiko Date: Tue, 28 Jan 2025 16:17:47 -0800 Subject: docs,procfs: document /proc/PID/* access permission checks Add a paragraph explaining what sort of capabilities a process would need to read procfs data for some other process. Also mention that reading data for its own process doesn't require any extra permissions. Link: https://lkml.kernel.org/r/20250129001747.759990-1-andrii@kernel.org Signed-off-by: Andrii Nakryiko Reviewed-by: Shakeel Butt Cc: Al Viro Cc: Christian Brauner Cc: Steven Rostedt (VMware) Cc: Ingo Molnar Cc: Jann Horn Cc: Kees Cook Cc: Liam Howlett Cc: "Mike Rapoport (IBM)" Cc: Peter Zijlstra (Intel) Cc: Suren Baghdasaryan Signed-off-by: Andrew Morton --- Documentation/filesystems/proc.rst | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst index 09f0aed5a08b..0e7825bb1e3a 100644 --- a/Documentation/filesystems/proc.rst +++ b/Documentation/filesystems/proc.rst @@ -128,6 +128,16 @@ process running on the system, which is named after the process ID (PID). The link 'self' points to the process reading the file system. Each process subdirectory has the entries listed in Table 1-1. +A process can read its own information from /proc/PID/* with no extra +permissions. When reading /proc/PID/* information for other processes, reading +process is required to have either CAP_SYS_PTRACE capability with +PTRACE_MODE_READ access permissions, or, alternatively, CAP_PERFMON +capability. This applies to all read-only information like `maps`, `environ`, +`pagemap`, etc. The only exception is `mem` file due to its read-write nature, +which requires CAP_SYS_PTRACE capabilities with more elevated +PTRACE_MODE_ATTACH permissions; CAP_PERFMON capability does not grant access +to /proc/PID/mem for other processes. + Note that an open file descriptor to /proc/ or to any of its contained files or subdirectories does not prevent being reused for some other process in the event that exits. Operations on -- cgit v1.2.3-59-g8ed1b From 95d4b3450ebe2e28a87980b7f7407a8d8d75d633 Mon Sep 17 00:00:00 2001 From: I Hsin Cheng Date: Sun, 19 Jan 2025 14:24:08 +0800 Subject: lib/plist.c: add shortcut for plist_requeue() In the operation of plist_requeue(), "node" is deleted from the list before queueing it back to the list again, which involves looping to find the tail of same-prio entries. If "node" is the head of same-prio entries which means its prio_list is on the priority list, then "node_next" can be retrieve immediately by the next entry of prio_list, instead of looping nodes on node_list. The shortcut implementation can benefit plist_requeue() running the below test, and the test result is shown in the following table. One can observe from the test result that when the number of nodes of same-prio entries is smaller, then the probability of hitting the shortcut can be bigger, thus the benefit can be more significant. While it tends to behave almost the same for long same-prio entries, since the probability of taking the shortcut is much smaller. ----------------------------------------------------------------------- | Test size | 200 | 400 | 600 | 800 | 1000 | ----------------------------------------------------------------------- | new_plist_requeue | 271521| 1007913| 2148033| 4346792| 12200940| ----------------------------------------------------------------------- | old_plist_requeue | 301395| 1105544| 2488301| 4632980| 12217275| ----------------------------------------------------------------------- The test is done on x86_64 architecture with v6.9 kernel and Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz. Test script( executed in kernel module mode ): int init_module(void) { unsigned int test_data[test_size]; /* Split the list into 10 different priority * , when test_size is larger, the number of * nodes within each priority is larger. */ for (i = 0; i < ARRAY_SIZE(test_data); i++) { test_data[i] = i % 10; } ktime_t start, end, time_elapsed = 0; plist_head_init(&test_head_local); for (i = 0; i < ARRAY_SIZE(test_node_local); i++) { plist_node_init(test_node_local + i, 0); test_node_local[i].prio = test_data[i]; } for (i = 0; i < ARRAY_SIZE(test_node_local); i++) { if (plist_node_empty(test_node_local + i)) { plist_add(test_node_local + i, &test_head_local); } } for (i = 0; i < ARRAY_SIZE(test_node_local); i += 1) { start = ktime_get(); plist_requeue(test_node_local + i, &test_head_local); end = ktime_get(); time_elapsed += (end - start); } pr_info("plist_requeue() elapsed time : %lld, size %d\n", time_elapsed, test_size); return 0; } [akpm@linux-foundation.org: tweak comment and code layout] Link: https://lkml.kernel.org/r/20250119062408.77638-1-richard120310@gmail.com Signed-off-by: I Hsin Cheng Cc: Ching-Chun (Jim) Huang Signed-off-by: Andrew Morton --- lib/plist.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/lib/plist.c b/lib/plist.c index c6bce1226874..330febb4bd7d 100644 --- a/lib/plist.c +++ b/lib/plist.c @@ -171,12 +171,24 @@ void plist_requeue(struct plist_node *node, struct plist_head *head) plist_del(node, head); + /* + * After plist_del(), iter is the replacement of the node. If the node + * was on prio_list, take shortcut to find node_next instead of looping. + */ + if (!list_empty(&iter->prio_list)) { + iter = list_entry(iter->prio_list.next, struct plist_node, + prio_list); + node_next = &iter->node_list; + goto queue; + } + plist_for_each_continue(iter, head) { if (node->prio != iter->prio) { node_next = &iter->node_list; break; } } +queue: list_add_tail(&node->node_list, node_next); plist_check_head(head); -- cgit v1.2.3-59-g8ed1b From 9986fb5164c8b21f6439cfd45ba36d8cc80c9710 Mon Sep 17 00:00:00 2001 From: Sourabh Jain Date: Fri, 31 Jan 2025 17:08:24 +0530 Subject: kexec: initialize ELF lowest address to ULONG_MAX Patch series "powerpc/crash: use generic crashkernel reservation", v3. Commit 0ab97169aa05 ("crash_core: add generic function to do reservation") added a generic function to reserve crashkernel memory. So let's use the same function on powerpc and remove the architecture-specific code that essentially does the same thing. The generic crashkernel reservation also provides a way to split the crashkernel reservation into high and low memory reservations, which can be enabled for powerpc in the future. Additionally move powerpc to use generic APIs to locate memory hole for kexec segments while loading kdump kernel. This patch (of 7): kexec_elf_load() loads an ELF executable and sets the address of the lowest PT_LOAD section to the address held by the lowest_load_addr function argument. To determine the lowest PT_LOAD address, a local variable lowest_addr (type unsigned long) is initialized to UINT_MAX. After loading each PT_LOAD, its address is compared to lowest_addr. If a loaded PT_LOAD address is lower, lowest_addr is updated. However, setting lowest_addr to UINT_MAX won't work when the kernel image is loaded above 4G, as the returned lowest PT_LOAD address would be invalid. This is resolved by initializing lowest_addr to ULONG_MAX instead. This issue was discovered while implementing crashkernel high/low reservation on the PowerPC architecture. Link: https://lkml.kernel.org/r/20250131113830.925179-1-sourabhjain@linux.ibm.com Link: https://lkml.kernel.org/r/20250131113830.925179-2-sourabhjain@linux.ibm.com Fixes: a0458284f062 ("powerpc: Add support code for kexec_file_load()") Signed-off-by: Sourabh Jain Acked-by: Hari Bathini Acked-by: Baoquan He Cc: Madhavan Srinivasan Cc: Mahesh Salgaonkar Cc: Michael Ellerman Signed-off-by: Andrew Morton --- kernel/kexec_elf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/kexec_elf.c b/kernel/kexec_elf.c index d3689632e8b9..3a5c25b2adc9 100644 --- a/kernel/kexec_elf.c +++ b/kernel/kexec_elf.c @@ -390,7 +390,7 @@ int kexec_elf_load(struct kimage *image, struct elfhdr *ehdr, struct kexec_buf *kbuf, unsigned long *lowest_load_addr) { - unsigned long lowest_addr = UINT_MAX; + unsigned long lowest_addr = ULONG_MAX; int ret; size_t i; -- cgit v1.2.3-59-g8ed1b From 7b54a96f30dec4b812841d179a26e88a7887dc06 Mon Sep 17 00:00:00 2001 From: Sourabh Jain Date: Fri, 31 Jan 2025 17:08:25 +0530 Subject: crash: remove an unused argument from reserve_crashkernel_generic() cmdline argument is not used in reserve_crashkernel_generic() so remove it. Correspondingly, all the callers have been updated as well. No functional change intended. Link: https://lkml.kernel.org/r/20250131113830.925179-3-sourabhjain@linux.ibm.com Signed-off-by: Sourabh Jain Acked-by: Hari Bathini Acked-by: Baoquan He Cc: Madhavan Srinivasan Cc: Mahesh Salgaonkar Cc: Michael Ellerman Signed-off-by: Andrew Morton --- arch/arm64/mm/init.c | 6 ++---- arch/loongarch/kernel/setup.c | 5 ++--- arch/riscv/mm/init.c | 6 ++---- arch/x86/kernel/setup.c | 6 ++---- include/linux/crash_reserve.h | 11 +++++------ kernel/crash_reserve.c | 9 ++++----- 6 files changed, 17 insertions(+), 26 deletions(-) diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index ccdef53872a0..2e496209af80 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -98,21 +98,19 @@ static void __init arch_reserve_crashkernel(void) { unsigned long long low_size = 0; unsigned long long crash_base, crash_size; - char *cmdline = boot_command_line; bool high = false; int ret; if (!IS_ENABLED(CONFIG_CRASH_RESERVE)) return; - ret = parse_crashkernel(cmdline, memblock_phys_mem_size(), + ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), &crash_size, &crash_base, &low_size, &high); if (ret) return; - reserve_crashkernel_generic(cmdline, crash_size, crash_base, - low_size, high); + reserve_crashkernel_generic(crash_size, crash_base, low_size, high); } static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit) diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c index 90cb3ca96f08..b99fbb388fe0 100644 --- a/arch/loongarch/kernel/setup.c +++ b/arch/loongarch/kernel/setup.c @@ -259,18 +259,17 @@ static void __init arch_reserve_crashkernel(void) int ret; unsigned long long low_size = 0; unsigned long long crash_base, crash_size; - char *cmdline = boot_command_line; bool high = false; if (!IS_ENABLED(CONFIG_CRASH_RESERVE)) return; - ret = parse_crashkernel(cmdline, memblock_phys_mem_size(), + ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), &crash_size, &crash_base, &low_size, &high); if (ret) return; - reserve_crashkernel_generic(cmdline, crash_size, crash_base, low_size, high); + reserve_crashkernel_generic(crash_size, crash_base, low_size, high); } static void __init fdt_setup(void) diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c index 15b2eda4c364..3ec9bfaa088a 100644 --- a/arch/riscv/mm/init.c +++ b/arch/riscv/mm/init.c @@ -1396,21 +1396,19 @@ static void __init arch_reserve_crashkernel(void) { unsigned long long low_size = 0; unsigned long long crash_base, crash_size; - char *cmdline = boot_command_line; bool high = false; int ret; if (!IS_ENABLED(CONFIG_CRASH_RESERVE)) return; - ret = parse_crashkernel(cmdline, memblock_phys_mem_size(), + ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), &crash_size, &crash_base, &low_size, &high); if (ret) return; - reserve_crashkernel_generic(cmdline, crash_size, crash_base, - low_size, high); + reserve_crashkernel_generic(crash_size, crash_base, low_size, high); } void __init paging_init(void) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index cebee310e200..b8ca14d29b44 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -472,14 +472,13 @@ static void __init memblock_x86_reserve_range_setup_data(void) static void __init arch_reserve_crashkernel(void) { unsigned long long crash_base, crash_size, low_size = 0; - char *cmdline = boot_command_line; bool high = false; int ret; if (!IS_ENABLED(CONFIG_CRASH_RESERVE)) return; - ret = parse_crashkernel(cmdline, memblock_phys_mem_size(), + ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), &crash_size, &crash_base, &low_size, &high); if (ret) @@ -490,8 +489,7 @@ static void __init arch_reserve_crashkernel(void) return; } - reserve_crashkernel_generic(cmdline, crash_size, crash_base, - low_size, high); + reserve_crashkernel_generic(crash_size, crash_base, low_size, high); } static struct resource standard_io_resources[] = { diff --git a/include/linux/crash_reserve.h b/include/linux/crash_reserve.h index 5a9df944fb80..1fe7e7d1b214 100644 --- a/include/linux/crash_reserve.h +++ b/include/linux/crash_reserve.h @@ -32,13 +32,12 @@ int __init parse_crashkernel(char *cmdline, unsigned long long system_ram, #define CRASH_ADDR_HIGH_MAX memblock_end_of_DRAM() #endif -void __init reserve_crashkernel_generic(char *cmdline, - unsigned long long crash_size, - unsigned long long crash_base, - unsigned long long crash_low_size, - bool high); +void __init reserve_crashkernel_generic(unsigned long long crash_size, + unsigned long long crash_base, + unsigned long long crash_low_size, + bool high); #else -static inline void __init reserve_crashkernel_generic(char *cmdline, +static inline void __init reserve_crashkernel_generic( unsigned long long crash_size, unsigned long long crash_base, unsigned long long crash_low_size, diff --git a/kernel/crash_reserve.c b/kernel/crash_reserve.c index a620fb4b2116..aff7c0fdbefa 100644 --- a/kernel/crash_reserve.c +++ b/kernel/crash_reserve.c @@ -375,11 +375,10 @@ static int __init reserve_crashkernel_low(unsigned long long low_size) return 0; } -void __init reserve_crashkernel_generic(char *cmdline, - unsigned long long crash_size, - unsigned long long crash_base, - unsigned long long crash_low_size, - bool high) +void __init reserve_crashkernel_generic(unsigned long long crash_size, + unsigned long long crash_base, + unsigned long long crash_low_size, + bool high) { unsigned long long search_end = CRASH_ADDR_LOW_MAX, search_base = 0; bool fixed_base = false; -- cgit v1.2.3-59-g8ed1b From 9f0552c978f8950b4661f1222290fb57af811a8b Mon Sep 17 00:00:00 2001 From: Sourabh Jain Date: Fri, 31 Jan 2025 17:08:26 +0530 Subject: crash: let arch decide usable memory range in reserved area Although the crashkernel area is reserved, on architectures like PowerPC, it is possible for the crashkernel reserved area to contain components like RTAS, TCE, OPAL, etc. To avoid placing kexec segments over these components, PowerPC has its own set of APIs to locate holes in the crashkernel reserved area. Add an arch hook in the generic locate mem hole APIs so that architectures can handle such special regions in the crashkernel area while locating memory holes for kexec segments using generic APIs. With this, a lot of redundant arch-specific code can be removed, as it performs the exact same job as the generic APIs. To keep the generic and arch-specific changes separate, the changes related to moving PowerPC to use the generic APIs and the removal of PowerPC-specific APIs for memory hole allocation are done in a subsequent patch titled "powerpc/crash: Use generic APIs to locate memory hole for kdump. Link: https://lkml.kernel.org/r/20250131113830.925179-4-sourabhjain@linux.ibm.com Signed-off-by: Sourabh Jain Acked-by: Baoquan He Cc: Hari Bathini Cc: Madhavan Srinivasan Cc: Mahesh Salgaonkar Cc: Michael Ellerman Signed-off-by: Andrew Morton --- include/linux/kexec.h | 9 +++++++++ kernel/kexec_file.c | 12 ++++++++++++ 2 files changed, 21 insertions(+) diff --git a/include/linux/kexec.h b/include/linux/kexec.h index f0e9f8eda7a3..407f8b0346aa 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -205,6 +205,15 @@ static inline int arch_kimage_file_post_load_cleanup(struct kimage *image) } #endif +#ifndef arch_check_excluded_range +static inline int arch_check_excluded_range(struct kimage *image, + unsigned long start, + unsigned long end) +{ + return 0; +} +#endif + #ifdef CONFIG_KEXEC_SIG #ifdef CONFIG_SIGNED_PE_FILE_VERIFICATION int kexec_kernel_verify_pe_sig(const char *kernel, unsigned long kernel_len); diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index 3eedb8c226ad..fba686487e3b 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -464,6 +464,12 @@ static int locate_mem_hole_top_down(unsigned long start, unsigned long end, continue; } + /* Make sure this does not conflict with exclude range */ + if (arch_check_excluded_range(image, temp_start, temp_end)) { + temp_start = temp_start - PAGE_SIZE; + continue; + } + /* We found a suitable memory range */ break; } while (1); @@ -498,6 +504,12 @@ static int locate_mem_hole_bottom_up(unsigned long start, unsigned long end, continue; } + /* Make sure this does not conflict with exclude range */ + if (arch_check_excluded_range(image, temp_start, temp_end)) { + temp_start = temp_start + PAGE_SIZE; + continue; + } + /* We found a suitable memory range */ break; } while (1); -- cgit v1.2.3-59-g8ed1b From 6e5250eaa665fac681926251a8d7f5f418103399 Mon Sep 17 00:00:00 2001 From: Sourabh Jain Date: Fri, 31 Jan 2025 17:08:27 +0530 Subject: powerpc/crash: use generic APIs to locate memory hole for kdump On PowerPC, the memory reserved for the crashkernel can contain components like RTAS, TCE, OPAL, etc., which should be avoided when loading kexec segments into crashkernel memory. Due to these special components, PowerPC has its own set of APIs to locate holes in the crashkernel memory for loading kexec segments for kdump. However, for loading kexec segments in the kexec case, PowerPC already uses generic APIs to locate holes. The previous patch in this series, titled "crash: Let arch decide usable memory range in reserved area," introduced arch-specific hook to handle such special regions in the crashkernel area. So, switch PowerPC to use the generic APIs to locate memory holes for kdump and remove the redundant PowerPC-specific APIs. Link: https://lkml.kernel.org/r/20250131113830.925179-5-sourabhjain@linux.ibm.com Signed-off-by: Sourabh Jain Cc: Baoquan he Cc: Hari Bathini Cc: Madhavan Srinivasan Cc: Mahesh Salgaonkar Cc: Michael Ellerman Signed-off-by: Andrew Morton --- arch/powerpc/include/asm/kexec.h | 6 +- arch/powerpc/kexec/file_load_64.c | 259 ++------------------------------------ 2 files changed, 13 insertions(+), 252 deletions(-) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index 601e569303e1..1b800df5eb30 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -94,8 +94,10 @@ int arch_kexec_kernel_image_probe(struct kimage *image, void *buf, unsigned long int arch_kimage_file_post_load_cleanup(struct kimage *image); #define arch_kimage_file_post_load_cleanup arch_kimage_file_post_load_cleanup -int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf); -#define arch_kexec_locate_mem_hole arch_kexec_locate_mem_hole +int arch_check_excluded_range(struct kimage *image, unsigned long start, + unsigned long end); +#define arch_check_excluded_range arch_check_excluded_range + int load_crashdump_segments_ppc64(struct kimage *image, struct kexec_buf *kbuf); diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c index dc65c1391157..e7ef8b2a2554 100644 --- a/arch/powerpc/kexec/file_load_64.c +++ b/arch/powerpc/kexec/file_load_64.c @@ -49,201 +49,18 @@ const struct kexec_file_ops * const kexec_file_loaders[] = { NULL }; -/** - * __locate_mem_hole_top_down - Looks top down for a large enough memory hole - * in the memory regions between buf_min & buf_max - * for the buffer. If found, sets kbuf->mem. - * @kbuf: Buffer contents and memory parameters. - * @buf_min: Minimum address for the buffer. - * @buf_max: Maximum address for the buffer. - * - * Returns 0 on success, negative errno on error. - */ -static int __locate_mem_hole_top_down(struct kexec_buf *kbuf, - u64 buf_min, u64 buf_max) -{ - int ret = -EADDRNOTAVAIL; - phys_addr_t start, end; - u64 i; - - for_each_mem_range_rev(i, &start, &end) { - /* - * memblock uses [start, end) convention while it is - * [start, end] here. Fix the off-by-one to have the - * same convention. - */ - end -= 1; - - if (start > buf_max) - continue; - - /* Memory hole not found */ - if (end < buf_min) - break; - - /* Adjust memory region based on the given range */ - if (start < buf_min) - start = buf_min; - if (end > buf_max) - end = buf_max; - - start = ALIGN(start, kbuf->buf_align); - if (start < end && (end - start + 1) >= kbuf->memsz) { - /* Suitable memory range found. Set kbuf->mem */ - kbuf->mem = ALIGN_DOWN(end - kbuf->memsz + 1, - kbuf->buf_align); - ret = 0; - break; - } - } - - return ret; -} - -/** - * locate_mem_hole_top_down_ppc64 - Skip special memory regions to find a - * suitable buffer with top down approach. - * @kbuf: Buffer contents and memory parameters. - * @buf_min: Minimum address for the buffer. - * @buf_max: Maximum address for the buffer. - * @emem: Exclude memory ranges. - * - * Returns 0 on success, negative errno on error. - */ -static int locate_mem_hole_top_down_ppc64(struct kexec_buf *kbuf, - u64 buf_min, u64 buf_max, - const struct crash_mem *emem) +int arch_check_excluded_range(struct kimage *image, unsigned long start, + unsigned long end) { - int i, ret = 0, err = -EADDRNOTAVAIL; - u64 start, end, tmin, tmax; - - tmax = buf_max; - for (i = (emem->nr_ranges - 1); i >= 0; i--) { - start = emem->ranges[i].start; - end = emem->ranges[i].end; - - if (start > tmax) - continue; - - if (end < tmax) { - tmin = (end < buf_min ? buf_min : end + 1); - ret = __locate_mem_hole_top_down(kbuf, tmin, tmax); - if (!ret) - return 0; - } - - tmax = start - 1; - - if (tmax < buf_min) { - ret = err; - break; - } - ret = 0; - } - - if (!ret) { - tmin = buf_min; - ret = __locate_mem_hole_top_down(kbuf, tmin, tmax); - } - return ret; -} - -/** - * __locate_mem_hole_bottom_up - Looks bottom up for a large enough memory hole - * in the memory regions between buf_min & buf_max - * for the buffer. If found, sets kbuf->mem. - * @kbuf: Buffer contents and memory parameters. - * @buf_min: Minimum address for the buffer. - * @buf_max: Maximum address for the buffer. - * - * Returns 0 on success, negative errno on error. - */ -static int __locate_mem_hole_bottom_up(struct kexec_buf *kbuf, - u64 buf_min, u64 buf_max) -{ - int ret = -EADDRNOTAVAIL; - phys_addr_t start, end; - u64 i; - - for_each_mem_range(i, &start, &end) { - /* - * memblock uses [start, end) convention while it is - * [start, end] here. Fix the off-by-one to have the - * same convention. - */ - end -= 1; - - if (end < buf_min) - continue; - - /* Memory hole not found */ - if (start > buf_max) - break; - - /* Adjust memory region based on the given range */ - if (start < buf_min) - start = buf_min; - if (end > buf_max) - end = buf_max; - - start = ALIGN(start, kbuf->buf_align); - if (start < end && (end - start + 1) >= kbuf->memsz) { - /* Suitable memory range found. Set kbuf->mem */ - kbuf->mem = start; - ret = 0; - break; - } - } - - return ret; -} - -/** - * locate_mem_hole_bottom_up_ppc64 - Skip special memory regions to find a - * suitable buffer with bottom up approach. - * @kbuf: Buffer contents and memory parameters. - * @buf_min: Minimum address for the buffer. - * @buf_max: Maximum address for the buffer. - * @emem: Exclude memory ranges. - * - * Returns 0 on success, negative errno on error. - */ -static int locate_mem_hole_bottom_up_ppc64(struct kexec_buf *kbuf, - u64 buf_min, u64 buf_max, - const struct crash_mem *emem) -{ - int i, ret = 0, err = -EADDRNOTAVAIL; - u64 start, end, tmin, tmax; - - tmin = buf_min; - for (i = 0; i < emem->nr_ranges; i++) { - start = emem->ranges[i].start; - end = emem->ranges[i].end; - - if (end < tmin) - continue; - - if (start > tmin) { - tmax = (start > buf_max ? buf_max : start - 1); - ret = __locate_mem_hole_bottom_up(kbuf, tmin, tmax); - if (!ret) - return 0; - } - - tmin = end + 1; + struct crash_mem *emem; + int i; - if (tmin > buf_max) { - ret = err; - break; - } - ret = 0; - } + emem = image->arch.exclude_ranges; + for (i = 0; i < emem->nr_ranges; i++) + if (start < emem->ranges[i].end && end > emem->ranges[i].start) + return 1; - if (!ret) { - tmax = buf_max; - ret = __locate_mem_hole_bottom_up(kbuf, tmin, tmax); - } - return ret; + return 0; } #ifdef CONFIG_CRASH_DUMP @@ -1004,64 +821,6 @@ out: return ret; } -/** - * arch_kexec_locate_mem_hole - Skip special memory regions like rtas, opal, - * tce-table, reserved-ranges & such (exclude - * memory ranges) as they can't be used for kexec - * segment buffer. Sets kbuf->mem when a suitable - * memory hole is found. - * @kbuf: Buffer contents and memory parameters. - * - * Assumes minimum of PAGE_SIZE alignment for kbuf->memsz & kbuf->buf_align. - * - * Returns 0 on success, negative errno on error. - */ -int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf) -{ - struct crash_mem **emem; - u64 buf_min, buf_max; - int ret; - - /* Look up the exclude ranges list while locating the memory hole */ - emem = &(kbuf->image->arch.exclude_ranges); - if (!(*emem) || ((*emem)->nr_ranges == 0)) { - pr_warn("No exclude range list. Using the default locate mem hole method\n"); - return kexec_locate_mem_hole(kbuf); - } - - buf_min = kbuf->buf_min; - buf_max = kbuf->buf_max; - /* Segments for kdump kernel should be within crashkernel region */ - if (IS_ENABLED(CONFIG_CRASH_DUMP) && kbuf->image->type == KEXEC_TYPE_CRASH) { - buf_min = (buf_min < crashk_res.start ? - crashk_res.start : buf_min); - buf_max = (buf_max > crashk_res.end ? - crashk_res.end : buf_max); - } - - if (buf_min > buf_max) { - pr_err("Invalid buffer min and/or max values\n"); - return -EINVAL; - } - - if (kbuf->top_down) - ret = locate_mem_hole_top_down_ppc64(kbuf, buf_min, buf_max, - *emem); - else - ret = locate_mem_hole_bottom_up_ppc64(kbuf, buf_min, buf_max, - *emem); - - /* Add the buffer allocated to the exclude list for the next lookup */ - if (!ret) { - add_mem_range(emem, kbuf->mem, kbuf->memsz); - sort_memory_ranges(*emem, true); - } else { - pr_err("Failed to locate memory buffer of size %lu\n", - kbuf->memsz); - } - return ret; -} - /** * arch_kexec_kernel_image_probe - Does additional handling needed to setup * kexec segments. -- cgit v1.2.3-59-g8ed1b From 021b5b303c5e9bc6634bc4fc4607586904dc7775 Mon Sep 17 00:00:00 2001 From: Sourabh Jain Date: Fri, 31 Jan 2025 17:08:28 +0530 Subject: powerpc/crash: preserve user-specified memory limit Commit 59d58189f3d9 ("crash: fix crash memory reserve exceed system memory bug") fails crashkernel parsing if the crash size is found to be higher than system RAM, which makes the memory_limit adjustment code ineffective due to an early exit from reserve_crashkernel(). Regardless lets not violate the user-specified memory limit by adjusting it. Remove this adjustment to ensure all reservations stay within the limit. Commit f94f5ac07983 ("powerpc/fadump: Don't update the user-specified memory limit") did the same for fadump. Link: https://lkml.kernel.org/r/20250131113830.925179-6-sourabhjain@linux.ibm.com Signed-off-by: Sourabh Jain Reviewed-by: Mahesh Salgaonkar Acked-by: Hari Bathini Cc: Baoquan he Cc: Madhavan Srinivasan Cc: Michael Ellerman Signed-off-by: Andrew Morton --- arch/powerpc/kexec/core.c | 8 -------- 1 file changed, 8 deletions(-) diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c index 58a930a47422..f9be7bb5aa08 100644 --- a/arch/powerpc/kexec/core.c +++ b/arch/powerpc/kexec/core.c @@ -128,14 +128,6 @@ void __init reserve_crashkernel(void) return; } - /* Crash kernel trumps memory limit */ - if (memory_limit && memory_limit <= crashk_res.end) { - memory_limit = crashk_res.end + 1; - total_mem_sz = memory_limit; - printk("Adjusted memory limit for crashkernel, now 0x%llx\n", - memory_limit); - } - printk(KERN_INFO "Reserving %ldMB of memory at %ldMB " "for crashkernel (System RAM: %ldMB)\n", (unsigned long)(crash_size >> 20), -- cgit v1.2.3-59-g8ed1b From bce074bdbc36e747b9e0783fe642f7b634cfb536 Mon Sep 17 00:00:00 2001 From: Sourabh Jain Date: Fri, 31 Jan 2025 17:08:29 +0530 Subject: powerpc: insert System RAM resource to prevent crashkernel conflict The next patch in the series with title "powerpc/crash: use generic crashkernel reservation" enables powerpc to use generic crashkernel reservation instead of custom implementation. This leads to exporting of `Crash Kernel` memory to iomem_resource (/proc/iomem) via insert_crashkernel_resources():kernel/crash_reserve.c or at another place in the same file if HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY is set. The add_system_ram_resources():arch/powerpc/mm/mem.c adds `System RAM` to iomem_resource using request_resource(). This creates a conflict with `Crash Kernel`, which is added by the generic crashkernel reservation code. As a result, the kernel ultimately fails to add `System RAM` to iomem_resource. Consequently, it does not appear in /proc/iomem. There are multiple approches tried to avoid this: 1. Don't add Crash Kernel to iomem_resource: - This has two issues. First, it requires adding an architecture-specific hook in the generic code. There are already two code paths to choose when to add `Crash Kernel` to iomem_resource. This adds one more code path to skip it. Second, what if `Crash Kernel` is required in /proc/iomem in the future? Many architectures do export it. 2. Don't add `System RAM` to iomem_resource by reverting commit c40dd2f766440 ("powerpc: Add System RAM to /proc/iomem"): - It's not ideal to export `System RAM` via /proc/iomem, but since it already done ealier and userspace tools like kdump and kdump-utils rely on `System RAM` from /proc/iomem, removing it will break userspace. 3. Add Crash Kernel along with System RAM to /proc/iomem: This patch takes the third approach by updating add_system_ram_resources() to use insert_resource() instead of the request_resource() API to add the `System RAM` resource to iomem_resource. insert_resource() allows inserting resources even if they overlap with existing ones. Since `Crash Kernel` and `System RAM` resources are added to iomem_resource early in the boot, any other conflict is not expected. With the changes introduced here and in the next patch, "powerpc/crash: use generic crashkernel reservation," /proc/iomem now exports `System RAM` and `Crash Kernel` as shown below: $ cat /proc/iomem 00000000-3ffffffff : System RAM 10000000-4fffffff : Crash kernel The kdump script is capable of handling `System RAM` and `Crash Kernel` in the above format. The same format is used in other architectures. Link: https://lkml.kernel.org/r/20250131113830.925179-7-sourabhjain@linux.ibm.com Signed-off-by: Sourabh Jain Cc: Baoquan he Cc: Hari Bathini Cc: Madhavan Srinivasan Cc: Michael Ellerman Cc: Mahesh Salgaonkar Signed-off-by: Andrew Morton --- arch/powerpc/mm/mem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c index c7708c8fad29..615966d71425 100644 --- a/arch/powerpc/mm/mem.c +++ b/arch/powerpc/mm/mem.c @@ -376,7 +376,7 @@ static int __init add_system_ram_resources(void) */ res->end = end - 1; res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; - WARN_ON(request_resource(&iomem_resource, res) < 0); + WARN_ON(insert_resource(&iomem_resource, res) < 0); } } -- cgit v1.2.3-59-g8ed1b From e3185ee438c28ee926cb3ef26f3bfb0aae510606 Mon Sep 17 00:00:00 2001 From: Sourabh Jain Date: Fri, 31 Jan 2025 17:08:30 +0530 Subject: powerpc/crash: use generic crashkernel reservation Commit 0ab97169aa05 ("crash_core: add generic function to do reservation") added a generic function to reserve crashkernel memory. So let's use the same function on powerpc and remove the architecture-specific code that essentially does the same thing. The generic crashkernel reservation also provides a way to split the crashkernel reservation into high and low memory reservations, which can be enabled for powerpc in the future. Along with moving to the generic crashkernel reservation, the code related to finding the base address for the crashkernel has been separated into its own function name get_crash_base() for better readability and maintainability. Link: https://lkml.kernel.org/r/20250131113830.925179-8-sourabhjain@linux.ibm.com Signed-off-by: Sourabh Jain Reviewed-by: Mahesh Salgaonkar Acked-by: Hari Bathini Cc: Baoquan he Cc: Madhavan Srinivasan Cc: Michael Ellerman Signed-off-by: Andrew Morton --- arch/powerpc/Kconfig | 3 ++ arch/powerpc/include/asm/crash_reserve.h | 8 +++ arch/powerpc/include/asm/kexec.h | 4 +- arch/powerpc/kernel/prom.c | 2 +- arch/powerpc/kexec/core.c | 90 ++++++++++++++------------------ 5 files changed, 53 insertions(+), 54 deletions(-) create mode 100644 arch/powerpc/include/asm/crash_reserve.h diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 424f188e62d9..b7e00bb8e556 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -721,6 +721,9 @@ config ARCH_SUPPORTS_CRASH_HOTPLUG def_bool y depends on PPC64 +config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION + def_bool CRASH_RESERVE + config FA_DUMP bool "Firmware-assisted dump" depends on CRASH_DUMP && PPC64 && (PPC_RTAS || PPC_POWERNV) diff --git a/arch/powerpc/include/asm/crash_reserve.h b/arch/powerpc/include/asm/crash_reserve.h new file mode 100644 index 000000000000..6467ce29b1fa --- /dev/null +++ b/arch/powerpc/include/asm/crash_reserve.h @@ -0,0 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_POWERPC_CRASH_RESERVE_H +#define _ASM_POWERPC_CRASH_RESERVE_H + +/* crash kernel regions are Page size agliged */ +#define CRASH_ALIGN PAGE_SIZE + +#endif /* _ASM_POWERPC_CRASH_RESERVE_H */ diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index 1b800df5eb30..70f2f0517509 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -114,9 +114,9 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt, struct crash_mem #ifdef CONFIG_CRASH_RESERVE int __init overlaps_crashkernel(unsigned long start, unsigned long size); -extern void reserve_crashkernel(void); +extern void arch_reserve_crashkernel(void); #else -static inline void reserve_crashkernel(void) {} +static inline void arch_reserve_crashkernel(void) {} static inline int overlaps_crashkernel(unsigned long start, unsigned long size) { return 0; } #endif diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c index e0059842a1c6..9ed9dde7d231 100644 --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -860,7 +860,7 @@ void __init early_init_devtree(void *params) */ if (fadump_reserve_mem() == 0) #endif - reserve_crashkernel(); + arch_reserve_crashkernel(); early_reserve_mem(); if (memory_limit > memblock_phys_mem_size()) diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c index f9be7bb5aa08..00e9c267b912 100644 --- a/arch/powerpc/kexec/core.c +++ b/arch/powerpc/kexec/core.c @@ -58,38 +58,20 @@ void machine_kexec(struct kimage *image) } #ifdef CONFIG_CRASH_RESERVE -void __init reserve_crashkernel(void) -{ - unsigned long long crash_size, crash_base, total_mem_sz; - int ret; - total_mem_sz = memory_limit ? memory_limit : memblock_phys_mem_size(); - /* use common parsing */ - ret = parse_crashkernel(boot_command_line, total_mem_sz, - &crash_size, &crash_base, NULL, NULL); - if (ret == 0 && crash_size > 0) { - crashk_res.start = crash_base; - crashk_res.end = crash_base + crash_size - 1; - } - - if (crashk_res.end == crashk_res.start) { - crashk_res.start = crashk_res.end = 0; - return; - } - - /* We might have got these values via the command line or the - * device tree, either way sanitise them now. */ - - crash_size = resource_size(&crashk_res); +static unsigned long long __init get_crash_base(unsigned long long crash_base) +{ #ifndef CONFIG_NONSTATIC_KERNEL - if (crashk_res.start != KDUMP_KERNELBASE) + if (crash_base != KDUMP_KERNELBASE) printk("Crash kernel location must be 0x%x\n", KDUMP_KERNELBASE); - crashk_res.start = KDUMP_KERNELBASE; + return KDUMP_KERNELBASE; #else - if (!crashk_res.start) { + unsigned long long crash_base_align; + + if (!crash_base) { #ifdef CONFIG_PPC64 /* * On the LPAR platform place the crash kernel to mid of @@ -101,45 +83,51 @@ void __init reserve_crashkernel(void) * kernel starts at 128MB offset on other platforms. */ if (firmware_has_feature(FW_FEATURE_LPAR)) - crashk_res.start = min_t(u64, ppc64_rma_size / 2, SZ_512M); + crash_base = min_t(u64, ppc64_rma_size / 2, SZ_512M); else - crashk_res.start = min_t(u64, ppc64_rma_size / 2, SZ_128M); + crash_base = min_t(u64, ppc64_rma_size / 2, SZ_128M); #else - crashk_res.start = KDUMP_KERNELBASE; + crash_base = KDUMP_KERNELBASE; #endif } - crash_base = PAGE_ALIGN(crashk_res.start); - if (crash_base != crashk_res.start) { - printk("Crash kernel base must be aligned to 0x%lx\n", - PAGE_SIZE); - crashk_res.start = crash_base; - } + crash_base_align = PAGE_ALIGN(crash_base); + if (crash_base != crash_base_align) + pr_warn("Crash kernel base must be aligned to 0x%lx\n", PAGE_SIZE); + return crash_base_align; #endif - crash_size = PAGE_ALIGN(crash_size); - crashk_res.end = crashk_res.start + crash_size - 1; +} - /* The crash region must not overlap the current kernel */ - if (overlaps_crashkernel(__pa(_stext), _end - _stext)) { - printk(KERN_WARNING - "Crash kernel can not overlap current kernel\n"); - crashk_res.start = crashk_res.end = 0; +void __init arch_reserve_crashkernel(void) +{ + unsigned long long crash_size, crash_base, crash_end; + unsigned long long kernel_start, kernel_size; + unsigned long long total_mem_sz; + int ret; + + total_mem_sz = memory_limit ? memory_limit : memblock_phys_mem_size(); + + /* use common parsing */ + ret = parse_crashkernel(boot_command_line, total_mem_sz, &crash_size, + &crash_base, NULL, NULL); + + if (ret) return; - } - printk(KERN_INFO "Reserving %ldMB of memory at %ldMB " - "for crashkernel (System RAM: %ldMB)\n", - (unsigned long)(crash_size >> 20), - (unsigned long)(crashk_res.start >> 20), - (unsigned long)(total_mem_sz >> 20)); + crash_base = get_crash_base(crash_base); + crash_end = crash_base + crash_size - 1; - if (!memblock_is_region_memory(crashk_res.start, crash_size) || - memblock_reserve(crashk_res.start, crash_size)) { - pr_err("Failed to reserve memory for crashkernel!\n"); - crashk_res.start = crashk_res.end = 0; + kernel_start = __pa(_stext); + kernel_size = _end - _stext; + + /* The crash region must not overlap the current kernel */ + if ((kernel_start + kernel_size > crash_base) && (kernel_start <= crash_end)) { + pr_warn("Crash kernel can not overlap current kernel\n"); return; } + + reserve_crashkernel_generic(crash_size, crash_base, 0, false); } int __init overlaps_crashkernel(unsigned long start, unsigned long size) -- cgit v1.2.3-59-g8ed1b From 9ad18c855518161d0a80f6a7de3ed495f746cb5d Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Mon, 3 Feb 2025 12:13:16 +0100 Subject: get_maintainer: add --substatus for reporting subsystem status Patch series "get_maintainer: report subsystem status separately", v2. The subsystem status (S: field) can inform a patch submitter if the subsystem is well maintained or e.g. maintainers are missing. In get_maintainer, it is currently reported with --role(stats) by adjusting the maintainer role for any status different from Maintained. This has two downsides: - if a subsystem has only reviewers or mailing lists and no maintainers, the status is not reported. For example Orphan subsystems typically have no maintainers so there's nobody to report as orphan minder. - the Supported status means that someone is paid for maintaining, but it is reported as "supporter" for all the maintainers, which can be incorrect (only some of them may be paid). People (including myself) have been also confused about what "supporter" means. The second point has been brought up in 2022 and the discussion in the end resulted in adjusting documentation only [1]. I however agree with Ted's points that it's misleading to take the subsystem status and apply it to all maintainers [2]. The attempt to modify get_maintainer output was retracted after Joe objected that the status becomes not reported at all [3]. This series addresses that concern by reporting the status (unless it's the most common Maintained one) on separate lines that follow the reported emails, using a new --substatus parameter. Care is taken to reduce the noise to minimum by not reporting the most common Maintained status, by default require no opt-in that would need the users to discover the new parameter, and at the same time not to break existing git --cc-cmd usage. [1] https://lore.kernel.org/all/20221006162413.858527-1-bryan.odonoghue@linaro.org/ [2] https://lore.kernel.org/all/Yzen4X1Na0MKXHs9@mit.edu/ [3] https://lore.kernel.org/all/30776fe75061951777da8fa6618ae89bea7a8ce4.camel@perches.com/ This patch (of 2): The subsystem status is currently reported with --role(stats) by adjusting the maintainer role for any status different from Maintained. This has two downsides: - if a subsystem has only reviewers or mailing lists and no maintainers, the status is not reported (i.e. typically, Orphan subsystems have no maintainers) - the Supported status means that someone is paid for maintaining, but it is reported as "supporter" for all the maintainers, which can be incorrect. People have been also confused about what "supporter" means. This patch introduces a new --substatus option and functionality aimed to report the subsystem status separately, without adjusting the reported maintainer role. After the e-mails are output, the status of subsystems will follow, for example: ... linux-kernel@vger.kernel.org (open list:LIBRARY CODE) LIBRARY CODE status: Supported In order to allow replacing the role rewriting seamlessly, the new option works as follows: - it is automatically enabled when --email and --role are enabled (the defaults include --email and --rolestats which implies --role) - usages with --norolestats e.g. for git's --cc-cmd will thus need no adjustments - the most common Maintained status is not reported at all, to reduce unnecessary noise - THE REST catch-all section (contains lkml) status is not reported - the existing --subsystem and --status options are unaffected so their users will need no adjustments [vbabka@suse.cz: require that script output goes to a terminal] Link: https://lkml.kernel.org/r/66c2bf7a-9119-4850-b6b8-ac8f426966e1@suse.cz Link: https://lkml.kernel.org/r/20250203-b4-get_maintainer-v2-0-83ba008b491f@suse.cz Link: https://lkml.kernel.org/r/20250203-b4-get_maintainer-v2-1-83ba008b491f@suse.cz Fixes: c1565b6f7b53 ("get_maintainer: add --substatus for reporting subsystem status") Closes: https://lore.kernel.org/all/7aodxv46lj6rthjo4i5zhhx2lybrhb4uknpej2dyz3e7im5w3w@w23bz6fx3jnn/ Signed-off-by: Vlastimil Babka Acked-by: Lorenzo Stoakes Tested-by: Geert Uytterhoeven Tested-by: Uwe Kleine-K=F6nig Cc: Bryan O'Donoghue Cc: Joe Perches Cc: Kees Cook Cc: Ted Ts'o Cc: Thorsten Leemhuis Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- scripts/get_maintainer.pl | 28 +++++++++++++++++++++++++--- 1 file changed, 25 insertions(+), 3 deletions(-) diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl index 5ac02e198737..3eb922b4d22c 100755 --- a/scripts/get_maintainer.pl +++ b/scripts/get_maintainer.pl @@ -50,6 +50,7 @@ my $output_multiline = 1; my $output_separator = ", "; my $output_roles = 0; my $output_rolestats = 1; +my $output_substatus = undef; my $output_section_maxlen = 50; my $scm = 0; my $tree = 1; @@ -269,6 +270,7 @@ if (!GetOptions( 'separator=s' => \$output_separator, 'subsystem!' => \$subsystem, 'status!' => \$status, + 'substatus!' => \$output_substatus, 'scm!' => \$scm, 'tree!' => \$tree, 'web!' => \$web, @@ -314,6 +316,10 @@ $output_multiline = 0 if ($output_separator ne ", "); $output_rolestats = 1 if ($interactive); $output_roles = 1 if ($output_rolestats); +if (!defined $output_substatus) { + $output_substatus = $email && $output_roles && -t STDOUT; +} + if ($sections || $letters ne "") { $sections = 1; $email = 0; @@ -637,6 +643,7 @@ my @web = (); my @bug = (); my @subsystem = (); my @status = (); +my @substatus = (); my %deduplicate_name_hash = (); my %deduplicate_address_hash = (); @@ -651,6 +658,11 @@ if ($scm) { output(@scm); } +if ($output_substatus) { + @substatus = uniq(@substatus); + output(@substatus); +} + if ($status) { @status = uniq(@status); output(@status); @@ -859,6 +871,7 @@ sub get_maintainers { @bug = (); @subsystem = (); @status = (); + @substatus = (); %deduplicate_name_hash = (); %deduplicate_address_hash = (); if ($email_git_all_signature_types) { @@ -1073,6 +1086,7 @@ MAINTAINER field selection options: --remove-duplicates => minimize duplicate email names/addresses --roles => show roles (status:subsystem, git-signer, list, etc...) --rolestats => show roles and statistics (commits/total_commits, %) + --substatus => show subsystem status if not Maintained (default: match --roles when output is tty)" --file-emails => add email addresses found in -f file (default: 0 (off)) --fixes => for patches, add signatures of commits with 'Fixes: ' (default: 1 (on)) --scm => print SCM tree(s) if any @@ -1335,7 +1349,9 @@ sub add_categories { my $start = find_starting_index($index); my $end = find_ending_index($index); - push(@subsystem, $typevalue[$start]); + my $subsystem = $typevalue[$start]; + push(@subsystem, $subsystem); + my $status = "Unknown"; for ($i = $start + 1; $i < $end; $i++) { my $tv = $typevalue[$i]; @@ -1386,8 +1402,8 @@ sub add_categories { } } elsif ($ptype eq "R") { if ($email_reviewer) { - my $subsystem = get_subsystem_name($i); - push_email_addresses($pvalue, "reviewer:$subsystem" . $suffix); + my $subs = get_subsystem_name($i); + push_email_addresses($pvalue, "reviewer:$subs" . $suffix); } } elsif ($ptype eq "T") { push(@scm, $pvalue . $suffix); @@ -1397,9 +1413,14 @@ sub add_categories { push(@bug, $pvalue . $suffix); } elsif ($ptype eq "S") { push(@status, $pvalue . $suffix); + $status = $pvalue; } } } + + if ($subsystem ne "THE REST" and $status ne "Maintained") { + push(@substatus, $subsystem . " status: " . $status . $suffix) + } } sub email_inuse { @@ -1903,6 +1924,7 @@ EOT $done = 1; $output_rolestats = 0; $output_roles = 0; + $output_substatus = 0; last; } elsif ($nr =~ /^\d+$/ && $nr > 0 && $nr <= $count) { $selected{$nr - 1} = !$selected{$nr - 1}; -- cgit v1.2.3-59-g8ed1b From 6ba31721aeef048922672d49d48a7f3826005f7d Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Mon, 3 Feb 2025 12:13:17 +0100 Subject: get_maintainer: stop reporting subsystem status as maintainer role After introducing the --substatus option, we can stop adjusting the reported maintainer role by the subsystem's status. For compatibility with the --git-chief-penguins option, keep the "chief penguin" role. Link: https://lkml.kernel.org/r/20250203-b4-get_maintainer-v2-2-83ba008b491f@suse.cz Signed-off-by: Vlastimil Babka Reviewed-by: Lorenzo Stoakes Tested-by: Lorenzo Stoakes Cc: Bryan O'Donoghue Cc: Joe Perches Cc: Kees Cook Cc: Ted Ts'o Cc: Thorsten Leemhuis Signed-off-by: Andrew Morton --- scripts/get_maintainer.pl | 21 ++++++--------------- 1 file changed, 6 insertions(+), 15 deletions(-) diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl index 3eb922b4d22c..4414194bedcf 100755 --- a/scripts/get_maintainer.pl +++ b/scripts/get_maintainer.pl @@ -1084,7 +1084,7 @@ MAINTAINER field selection options: --moderated => include moderated lists(s) if any (default: true) --s => include subscriber only list(s) if any (default: false) --remove-duplicates => minimize duplicate email names/addresses - --roles => show roles (status:subsystem, git-signer, list, etc...) + --roles => show roles (role:subsystem, git-signer, list, etc...) --rolestats => show roles and statistics (commits/total_commits, %) --substatus => show subsystem status if not Maintained (default: match --roles when output is tty)" --file-emails => add email addresses found in -f file (default: 0 (off)) @@ -1298,8 +1298,9 @@ sub get_maintainer_role { my $start = find_starting_index($index); my $end = find_ending_index($index); - my $role = "unknown"; + my $role = "maintainer"; my $subsystem = get_subsystem_name($index); + my $status = "unknown"; for ($i = $start + 1; $i < $end; $i++) { my $tv = $typevalue[$i]; @@ -1307,23 +1308,13 @@ sub get_maintainer_role { my $ptype = $1; my $pvalue = $2; if ($ptype eq "S") { - $role = $pvalue; + $status = $pvalue; } } } - $role = lc($role); - if ($role eq "supported") { - $role = "supporter"; - } elsif ($role eq "maintained") { - $role = "maintainer"; - } elsif ($role eq "odd fixes") { - $role = "odd fixer"; - } elsif ($role eq "orphan") { - $role = "orphan minder"; - } elsif ($role eq "obsolete") { - $role = "obsolete minder"; - } elsif ($role eq "buried alive in reporters") { + $status = lc($status); + if ($status eq "buried alive in reporters") { $role = "chief penguin"; } -- cgit v1.2.3-59-g8ed1b From b115b6eccdf5a09909b1db6f1c37074923337f5c Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Wed, 5 Feb 2025 16:29:32 -0500 Subject: lib/zlib: drop EQUAL macro The macro is prehistoric, and only exists to help those readers who don't know what memcmp() returns if memory areas differ. This is pretty well documented, so the macro looks excessive. Now that the only user of the macro depends on DEBUG_ZLIB config, GCC warns about unused macro if the library is built with W=2 against defconfig. So drop it for good. Link: https://lkml.kernel.org/r/20250205212933.68695-1-yury.norov@gmail.com Signed-off-by: Yury Norov Reviewed-by: Sergey Senozhatsky Reviewed-by: Mikhail Zaslonko Cc: Heiko Carsten Cc: Ilya Leoshkevich Cc: Joe Perches Signed-off-by: Andrew Morton --- lib/zlib_deflate/deflate.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/lib/zlib_deflate/deflate.c b/lib/zlib_deflate/deflate.c index 3a1d8d34182e..8fb2a3e17c0e 100644 --- a/lib/zlib_deflate/deflate.c +++ b/lib/zlib_deflate/deflate.c @@ -151,9 +151,6 @@ static const config configuration_table[10] = { * meaning. */ -#define EQUAL 0 -/* result of memcmp for equal strings */ - /* =========================================================================== * Update a hash value with the given input byte * IN assertion: all calls to UPDATE_HASH are made with consecutive @@ -713,8 +710,7 @@ static void check_match( ) { /* check that the match is indeed a match */ - if (memcmp((char *)s->window + match, - (char *)s->window + start, length) != EQUAL) { + if (memcmp((char *)s->window + match, (char *)s->window + start, length)) { fprintf(stderr, " start %u, match %u, length %d\n", start, match, length); do { -- cgit v1.2.3-59-g8ed1b From 8c6bbda879b62f16bb03321a84554b4f63415c55 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Mon, 3 Feb 2025 16:05:22 +0100 Subject: rcu: provide a static initializer for hlist_nulls_head Patch series "ucount: Simplify refcounting with rcuref_t". I noticed that the atomic_dec_and_lock_irqsave() in put_ucounts() loops sometimes even during boot. Something like 2-3 iterations but still. This series replaces the refcounting with rcuref_t and adds a RCU lookup. This allows a lockless lookup in alloc_ucounts() if the entry is available and a cmpxchg()less put of the item. This patch (of 4): Provide a static initializer for hlist_nulls_head so that it can be used in statically defined data structures. Link: https://lkml.kernel.org/r/20250203150525.456525-1-bigeasy@linutronix.de Link: https://lkml.kernel.org/r/20250203150525.456525-2-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior Reviewed-by: Paul E. McKenney Cc: Thomas Gleixner Cc: Boqun Feng Cc: Steven Rostedt Cc: Joel Fernandes Cc: Josh Triplett Cc: Lai jiangshan Cc: Mathieu Desnoyers Cc: Mengen Sun Cc: "Paul E . McKenney" Cc: "Uladzislau Rezki (Sony)" Cc: YueHong Wu Cc: Zqiang Signed-off-by: Andrew Morton --- include/linux/list_nulls.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/list_nulls.h b/include/linux/list_nulls.h index fa6e8471bd22..248db9b77ee2 100644 --- a/include/linux/list_nulls.h +++ b/include/linux/list_nulls.h @@ -28,6 +28,7 @@ struct hlist_nulls_node { #define NULLS_MARKER(value) (1UL | (((long)value) << 1)) #define INIT_HLIST_NULLS_HEAD(ptr, nulls) \ ((ptr)->first = (struct hlist_nulls_node *) NULLS_MARKER(nulls)) +#define HLIST_NULLS_HEAD_INIT(nulls) {.first = (struct hlist_nulls_node *)NULLS_MARKER(nulls)} #define hlist_nulls_entry(ptr, type, member) container_of(ptr,type,member) -- cgit v1.2.3-59-g8ed1b From 328152e6774d9d801ad1d90af557b9113647b379 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Mon, 3 Feb 2025 16:05:23 +0100 Subject: ucount: replace get_ucounts_or_wrap() with atomic_inc_not_zero() get_ucounts_or_wrap() increments the counter and if the counter is negative then it decrements it again in order to reset the previous increment. This statement can be replaced with atomic_inc_not_zero() to only increment the counter if it is not yet 0. This simplifies the get function because the put (if the get failed) can be removed. atomic_inc_not_zero() is implement as a cmpxchg() loop which can be repeated several times if another get/put is performed in parallel. This will be optimized later. Increment the reference counter only if not yet dropped to zero. Link: https://lkml.kernel.org/r/20250203150525.456525-3-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior Reviewed-by: Paul E. McKenney Cc: Thomas Gleixner Cc: Boqun Feng Cc: Joel Fernandes Cc: Josh Triplett Cc: Lai jiangshan Cc: Mathieu Desnoyers Cc: Mengen Sun Cc: Steven Rostedt Cc: "Uladzislau Rezki (Sony)" Cc: YueHong Wu Cc: Zqiang Signed-off-by: Andrew Morton --- kernel/ucount.c | 24 ++++++------------------ 1 file changed, 6 insertions(+), 18 deletions(-) diff --git a/kernel/ucount.c b/kernel/ucount.c index 86c5f1c0bad9..4aa501153825 100644 --- a/kernel/ucount.c +++ b/kernel/ucount.c @@ -146,25 +146,16 @@ static void hlist_add_ucounts(struct ucounts *ucounts) spin_unlock_irq(&ucounts_lock); } -static inline bool get_ucounts_or_wrap(struct ucounts *ucounts) -{ - /* Returns true on a successful get, false if the count wraps. */ - return !atomic_add_negative(1, &ucounts->count); -} - struct ucounts *get_ucounts(struct ucounts *ucounts) { - if (!get_ucounts_or_wrap(ucounts)) { - put_ucounts(ucounts); - ucounts = NULL; - } - return ucounts; + if (atomic_inc_not_zero(&ucounts->count)) + return ucounts; + return NULL; } struct ucounts *alloc_ucounts(struct user_namespace *ns, kuid_t uid) { struct hlist_head *hashent = ucounts_hashentry(ns, uid); - bool wrapped; struct ucounts *ucounts, *new = NULL; spin_lock_irq(&ucounts_lock); @@ -189,14 +180,11 @@ struct ucounts *alloc_ucounts(struct user_namespace *ns, kuid_t uid) return new; } } - - wrapped = !get_ucounts_or_wrap(ucounts); + if (!atomic_inc_not_zero(&ucounts->count)) + ucounts = NULL; spin_unlock_irq(&ucounts_lock); kfree(new); - if (wrapped) { - put_ucounts(ucounts); - return NULL; - } + return ucounts; } -- cgit v1.2.3-59-g8ed1b From 5f01a22c5b231dd590f61a2591b3090665733bcb Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Mon, 3 Feb 2025 16:05:24 +0100 Subject: ucount: use RCU for ucounts lookups The ucounts element is looked up under ucounts_lock. This can be optimized by using RCU for a lockless lookup and return and element if the reference can be obtained. Replace hlist_head with hlist_nulls_head which is RCU compatible. Let find_ucounts() search for the required item within a RCU section and return the item if a reference could be obtained. This means alloc_ucounts() will always return an element (unless the memory allocation failed). Let put_ucounts() RCU free the element if the reference counter dropped to zero. Link: https://lkml.kernel.org/r/20250203150525.456525-4-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior Reviewed-by: Paul E. McKenney Cc: Thomas Gleixner Cc: Boqun Feng Cc: Joel Fernandes Cc: Josh Triplett Cc: Lai jiangshan Cc: Mathieu Desnoyers Cc: Mengen Sun Cc: Steven Rostedt Cc: "Uladzislau Rezki (Sony)" Cc: YueHong Wu Cc: Zqiang Signed-off-by: Andrew Morton --- include/linux/user_namespace.h | 4 ++- kernel/ucount.c | 75 ++++++++++++++++++++++-------------------- 2 files changed, 43 insertions(+), 36 deletions(-) diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index 7183e5aca282..ad4dbef92597 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -5,6 +5,7 @@ #include #include #include +#include #include #include #include @@ -115,9 +116,10 @@ struct user_namespace { } __randomize_layout; struct ucounts { - struct hlist_node node; + struct hlist_nulls_node node; struct user_namespace *ns; kuid_t uid; + struct rcu_head rcu; atomic_t count; atomic_long_t ucount[UCOUNT_COUNTS]; atomic_long_t rlimit[UCOUNT_RLIMIT_COUNTS]; diff --git a/kernel/ucount.c b/kernel/ucount.c index 4aa501153825..b6abaf68cdcc 100644 --- a/kernel/ucount.c +++ b/kernel/ucount.c @@ -15,7 +15,10 @@ struct ucounts init_ucounts = { }; #define UCOUNTS_HASHTABLE_BITS 10 -static struct hlist_head ucounts_hashtable[(1 << UCOUNTS_HASHTABLE_BITS)]; +#define UCOUNTS_HASHTABLE_ENTRIES (1 << UCOUNTS_HASHTABLE_BITS) +static struct hlist_nulls_head ucounts_hashtable[UCOUNTS_HASHTABLE_ENTRIES] = { + [0 ... UCOUNTS_HASHTABLE_ENTRIES - 1] = HLIST_NULLS_HEAD_INIT(0) +}; static DEFINE_SPINLOCK(ucounts_lock); #define ucounts_hashfn(ns, uid) \ @@ -24,7 +27,6 @@ static DEFINE_SPINLOCK(ucounts_lock); #define ucounts_hashentry(ns, uid) \ (ucounts_hashtable + ucounts_hashfn(ns, uid)) - #ifdef CONFIG_SYSCTL static struct ctl_table_set * set_lookup(struct ctl_table_root *root) @@ -127,22 +129,28 @@ void retire_userns_sysctls(struct user_namespace *ns) #endif } -static struct ucounts *find_ucounts(struct user_namespace *ns, kuid_t uid, struct hlist_head *hashent) +static struct ucounts *find_ucounts(struct user_namespace *ns, kuid_t uid, + struct hlist_nulls_head *hashent) { struct ucounts *ucounts; + struct hlist_nulls_node *pos; - hlist_for_each_entry(ucounts, hashent, node) { - if (uid_eq(ucounts->uid, uid) && (ucounts->ns == ns)) - return ucounts; + guard(rcu)(); + hlist_nulls_for_each_entry_rcu(ucounts, pos, hashent, node) { + if (uid_eq(ucounts->uid, uid) && (ucounts->ns == ns)) { + if (atomic_inc_not_zero(&ucounts->count)) + return ucounts; + } } return NULL; } static void hlist_add_ucounts(struct ucounts *ucounts) { - struct hlist_head *hashent = ucounts_hashentry(ucounts->ns, ucounts->uid); + struct hlist_nulls_head *hashent = ucounts_hashentry(ucounts->ns, ucounts->uid); + spin_lock_irq(&ucounts_lock); - hlist_add_head(&ucounts->node, hashent); + hlist_nulls_add_head_rcu(&ucounts->node, hashent); spin_unlock_irq(&ucounts_lock); } @@ -155,37 +163,33 @@ struct ucounts *get_ucounts(struct ucounts *ucounts) struct ucounts *alloc_ucounts(struct user_namespace *ns, kuid_t uid) { - struct hlist_head *hashent = ucounts_hashentry(ns, uid); - struct ucounts *ucounts, *new = NULL; + struct hlist_nulls_head *hashent = ucounts_hashentry(ns, uid); + struct ucounts *ucounts, *new; + + ucounts = find_ucounts(ns, uid, hashent); + if (ucounts) + return ucounts; + + new = kzalloc(sizeof(*new), GFP_KERNEL); + if (!new) + return NULL; + + new->ns = ns; + new->uid = uid; + atomic_set(&new->count, 1); spin_lock_irq(&ucounts_lock); ucounts = find_ucounts(ns, uid, hashent); - if (!ucounts) { + if (ucounts) { spin_unlock_irq(&ucounts_lock); - - new = kzalloc(sizeof(*new), GFP_KERNEL); - if (!new) - return NULL; - - new->ns = ns; - new->uid = uid; - atomic_set(&new->count, 1); - - spin_lock_irq(&ucounts_lock); - ucounts = find_ucounts(ns, uid, hashent); - if (!ucounts) { - hlist_add_head(&new->node, hashent); - get_user_ns(new->ns); - spin_unlock_irq(&ucounts_lock); - return new; - } + kfree(new); + return ucounts; } - if (!atomic_inc_not_zero(&ucounts->count)) - ucounts = NULL; - spin_unlock_irq(&ucounts_lock); - kfree(new); - return ucounts; + hlist_nulls_add_head_rcu(&new->node, hashent); + get_user_ns(new->ns); + spin_unlock_irq(&ucounts_lock); + return new; } void put_ucounts(struct ucounts *ucounts) @@ -193,10 +197,11 @@ void put_ucounts(struct ucounts *ucounts) unsigned long flags; if (atomic_dec_and_lock_irqsave(&ucounts->count, &ucounts_lock, flags)) { - hlist_del_init(&ucounts->node); + hlist_nulls_del_rcu(&ucounts->node); spin_unlock_irqrestore(&ucounts_lock, flags); + put_user_ns(ucounts->ns); - kfree(ucounts); + kfree_rcu(ucounts, rcu); } } -- cgit v1.2.3-59-g8ed1b From b4dc0bee2a749083028afba346910e198653f42a Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Mon, 3 Feb 2025 16:05:25 +0100 Subject: ucount: use rcuref_t for reference counting Use rcuref_t for reference counting. This eliminates the cmpxchg loop in the get and put path. This also eliminates the need to acquire the lock in the put path because once the final user returns the reference, it can no longer be obtained anymore. Use rcuref_t for reference counting. Link: https://lkml.kernel.org/r/20250203150525.456525-5-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior Reviewed-by: Paul E. McKenney Cc: Thomas Gleixner Cc: Boqun Feng Cc: Joel Fernandes Cc: Josh Triplett Cc: Lai jiangshan Cc: Mathieu Desnoyers Cc: Mengen Sun Cc: Steven Rostedt Cc: "Uladzislau Rezki (Sony)" Cc: YueHong Wu Cc: Zqiang Signed-off-by: Andrew Morton --- include/linux/user_namespace.h | 11 +++++++++-- kernel/ucount.c | 16 +++++----------- 2 files changed, 14 insertions(+), 13 deletions(-) diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index ad4dbef92597..a0bb6d012137 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -8,6 +8,7 @@ #include #include #include +#include #include #include #include @@ -120,7 +121,7 @@ struct ucounts { struct user_namespace *ns; kuid_t uid; struct rcu_head rcu; - atomic_t count; + rcuref_t count; atomic_long_t ucount[UCOUNT_COUNTS]; atomic_long_t rlimit[UCOUNT_RLIMIT_COUNTS]; }; @@ -133,9 +134,15 @@ void retire_userns_sysctls(struct user_namespace *ns); struct ucounts *inc_ucount(struct user_namespace *ns, kuid_t uid, enum ucount_type type); void dec_ucount(struct ucounts *ucounts, enum ucount_type type); struct ucounts *alloc_ucounts(struct user_namespace *ns, kuid_t uid); -struct ucounts * __must_check get_ucounts(struct ucounts *ucounts); void put_ucounts(struct ucounts *ucounts); +static inline struct ucounts * __must_check get_ucounts(struct ucounts *ucounts) +{ + if (rcuref_get(&ucounts->count)) + return ucounts; + return NULL; +} + static inline long get_rlimit_value(struct ucounts *ucounts, enum rlimit_type type) { return atomic_long_read(&ucounts->rlimit[type]); diff --git a/kernel/ucount.c b/kernel/ucount.c index b6abaf68cdcc..8686e329b8f2 100644 --- a/kernel/ucount.c +++ b/kernel/ucount.c @@ -11,7 +11,7 @@ struct ucounts init_ucounts = { .ns = &init_user_ns, .uid = GLOBAL_ROOT_UID, - .count = ATOMIC_INIT(1), + .count = RCUREF_INIT(1), }; #define UCOUNTS_HASHTABLE_BITS 10 @@ -138,7 +138,7 @@ static struct ucounts *find_ucounts(struct user_namespace *ns, kuid_t uid, guard(rcu)(); hlist_nulls_for_each_entry_rcu(ucounts, pos, hashent, node) { if (uid_eq(ucounts->uid, uid) && (ucounts->ns == ns)) { - if (atomic_inc_not_zero(&ucounts->count)) + if (rcuref_get(&ucounts->count)) return ucounts; } } @@ -154,13 +154,6 @@ static void hlist_add_ucounts(struct ucounts *ucounts) spin_unlock_irq(&ucounts_lock); } -struct ucounts *get_ucounts(struct ucounts *ucounts) -{ - if (atomic_inc_not_zero(&ucounts->count)) - return ucounts; - return NULL; -} - struct ucounts *alloc_ucounts(struct user_namespace *ns, kuid_t uid) { struct hlist_nulls_head *hashent = ucounts_hashentry(ns, uid); @@ -176,7 +169,7 @@ struct ucounts *alloc_ucounts(struct user_namespace *ns, kuid_t uid) new->ns = ns; new->uid = uid; - atomic_set(&new->count, 1); + rcuref_init(&new->count, 1); spin_lock_irq(&ucounts_lock); ucounts = find_ucounts(ns, uid, hashent); @@ -196,7 +189,8 @@ void put_ucounts(struct ucounts *ucounts) { unsigned long flags; - if (atomic_dec_and_lock_irqsave(&ucounts->count, &ucounts_lock, flags)) { + if (rcuref_put(&ucounts->count)) { + spin_lock_irqsave(&ucounts_lock, flags); hlist_nulls_del_rcu(&ucounts->node); spin_unlock_irqrestore(&ucounts_lock, flags); -- cgit v1.2.3-59-g8ed1b From a406aff8c05115119127c962cbbbbd202e1973ef Mon Sep 17 00:00:00 2001 From: Vasiliy Kovalev Date: Fri, 14 Feb 2025 11:49:08 +0300 Subject: ocfs2: validate l_tree_depth to avoid out-of-bounds access The l_tree_depth field is 16-bit (__le16), but the actual maximum depth is limited to OCFS2_MAX_PATH_DEPTH. Add a check to prevent out-of-bounds access if l_tree_depth has an invalid value, which may occur when reading from a corrupted mounted disk [1]. Link: https://lkml.kernel.org/r/20250214084908.736528-1-kovalev@altlinux.org Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem") Signed-off-by: Vasiliy Kovalev Reported-by: syzbot+66c146268dc88f4341fd@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=66c146268dc88f4341fd [1] Reviewed-by: Joseph Qi Cc: Joel Becker Cc: Junxiao Bi Cc: Changwei Ge Cc: Jun Piao Cc: Kurt Hackel Cc: Mark Fasheh Cc: Vasiliy Kovalev Signed-off-by: Andrew Morton --- fs/ocfs2/alloc.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c index 4414743b638e..b8ac85b548c7 100644 --- a/fs/ocfs2/alloc.c +++ b/fs/ocfs2/alloc.c @@ -1803,6 +1803,14 @@ static int __ocfs2_find_path(struct ocfs2_caching_info *ci, el = root_el; while (el->l_tree_depth) { + if (unlikely(le16_to_cpu(el->l_tree_depth) >= OCFS2_MAX_PATH_DEPTH)) { + ocfs2_error(ocfs2_metadata_cache_get_super(ci), + "Owner %llu has invalid tree depth %u in extent list\n", + (unsigned long long)ocfs2_metadata_cache_owner(ci), + le16_to_cpu(el->l_tree_depth)); + ret = -EROFS; + goto out; + } if (le16_to_cpu(el->l_next_free_rec) == 0) { ocfs2_error(ocfs2_metadata_cache_get_super(ci), "Owner %llu has empty extent list at depth %u\n", -- cgit v1.2.3-59-g8ed1b From dbc3b6320eb3f7d775d4da41de693109060856d6 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Thu, 13 Feb 2025 21:45:30 +0000 Subject: ocfs2: use memcpy_to_folio() in ocfs2_symlink_get_block() Replace use of kmap_atomic() with the higher-level construct memcpy_to_folio(). This removes a use of b_page and supports large folios as well as being easier to understand. It also removes the check for kmap_atomic() failing (because it can't). Link: https://lkml.kernel.org/r/20250213214533.2242224-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Acked-by: Joseph Qi Cc: Mark Tinguely Cc: Mark Fasheh Cc: Joel Becker Cc: Junxiao Bi Cc: Changwei Ge Cc: Jun Piao Signed-off-by: Andrew Morton --- fs/ocfs2/aops.c | 17 +++++------------ 1 file changed, 5 insertions(+), 12 deletions(-) diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c index 5bbeb6fbb1ac..ccf2930cd2e6 100644 --- a/fs/ocfs2/aops.c +++ b/fs/ocfs2/aops.c @@ -46,7 +46,6 @@ static int ocfs2_symlink_get_block(struct inode *inode, sector_t iblock, struct buffer_head *bh = NULL; struct buffer_head *buffer_cache_bh = NULL; struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); - void *kaddr; trace_ocfs2_symlink_get_block( (unsigned long long)OCFS2_I(inode)->ip_blkno, @@ -91,17 +90,11 @@ static int ocfs2_symlink_get_block(struct inode *inode, sector_t iblock, * could've happened. Since we've got a reference on * the bh, even if it commits while we're doing the * copy, the data is still good. */ - if (buffer_jbd(buffer_cache_bh) - && ocfs2_inode_is_new(inode)) { - kaddr = kmap_atomic(bh_result->b_page); - if (!kaddr) { - mlog(ML_ERROR, "couldn't kmap!\n"); - goto bail; - } - memcpy(kaddr + (bh_result->b_size * iblock), - buffer_cache_bh->b_data, - bh_result->b_size); - kunmap_atomic(kaddr); + if (buffer_jbd(buffer_cache_bh) && ocfs2_inode_is_new(inode)) { + memcpy_to_folio(bh_result->b_folio, + bh_result->b_size * iblock, + buffer_cache_bh->b_data, + bh_result->b_size); set_buffer_uptodate(bh_result); } brelse(buffer_cache_bh); -- cgit v1.2.3-59-g8ed1b From bd0ee47da40afc73221de8d5490375931b1f93ee Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Thu, 13 Feb 2025 21:45:31 +0000 Subject: ocfs2: remove reference to bh->b_page Buffer heads are attached to folios, not to pages. Also flush_dcache_page() is now deprecated in favour of flush_dcache_folio(). Link: https://lkml.kernel.org/r/20250213214533.2242224-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Acked-by: Joseph Qi Cc: Changwei Ge Cc: Joel Becker Cc: Jun Piao Cc: Junxiao Bi Cc: Mark Fasheh Cc: Mark Tinguely Signed-off-by: Andrew Morton --- fs/ocfs2/quota_global.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ocfs2/quota_global.c b/fs/ocfs2/quota_global.c index 15d9acd456ec..e85b1ccf81be 100644 --- a/fs/ocfs2/quota_global.c +++ b/fs/ocfs2/quota_global.c @@ -273,7 +273,7 @@ ssize_t ocfs2_quota_write(struct super_block *sb, int type, if (new) memset(bh->b_data, 0, sb->s_blocksize); memcpy(bh->b_data + offset, data, len); - flush_dcache_page(bh->b_page); + flush_dcache_folio(bh->b_folio); set_buffer_uptodate(bh); unlock_buffer(bh); ocfs2_set_buffer_uptodate(INODE_CACHE(gqinode), bh); -- cgit v1.2.3-59-g8ed1b From 318f05a057157c400a92f17c5f2fa3dd96f57c7e Mon Sep 17 00:00:00 2001 From: Ahmad Fatoum Date: Mon, 17 Feb 2025 21:39:41 +0100 Subject: reboot: replace __hw_protection_shutdown bool action parameter with an enum Patch series "reboot: support runtime configuration of emergency hw_protection action", v3. We currently leave the decision of whether to shutdown or reboot to protect hardware in an emergency situation to the individual drivers. This works out in some cases, where the driver detecting the critical failure has inside knowledge: It binds to the system management controller for example or is guided by hardware description that defines what to do. This is inadequate in the general case though as a driver reporting e.g. an imminent power failure can't know whether a shutdown or a reboot would be more appropriate for a given hardware platform. To address this, this series adds a hw_protection kernel parameter and sysfs toggle that can be used to change the action from the shutdown default to reboot. A new hw_protection_trigger API then makes use of this default action. My particular use case is unattended embedded systems that don't have support for shutdown and that power on automatically when power is supplied: - A brief power cycle gets detected by the driver - The kernel powers down the system and SoC goes into shutdown mode - Power is restored - The system remains oblivious to the restored power - System needs to be manually power cycled for a duration long enough to drain the capacitors With this series, such systems can configure the kernel with hw_protection=reboot to have the boot firmware worry about critical conditions. This patch (of 12): Currently __hw_protection_shutdown() either reboots or shuts down the system according to its shutdown argument. To make the logic easier to follow, both inside __hw_protection_shutdown and at caller sites, lets replace the bool parameter with an enum. This will be extra useful, when in a later commit, a third action is added to the enumeration. No functional change. Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-0-e1c09b090c0c@pengutronix.de Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-1-e1c09b090c0c@pengutronix.de Signed-off-by: Ahmad Fatoum Reviewed-by: Tzung-Bi Shih Cc: Benson Leung Cc: Mark Brown Cc: Daniel Lezcano Cc: Fabio Estevam Cc: Guenter Roeck Cc: Jonathan Corbet Cc: Liam Girdwood Cc: Lukasz Luba Cc: Matteo Croce Cc: Matti Vaittinen Cc: "Rafael J. Wysocki" Cc: Rob Herring Cc: Rui Zhang Cc: Sascha Hauer Cc: "Serge E. Hallyn" Signed-off-by: Andrew Morton --- include/linux/reboot.h | 18 +++++++++++++++--- kernel/reboot.c | 14 ++++++-------- 2 files changed, 21 insertions(+), 11 deletions(-) diff --git a/include/linux/reboot.h b/include/linux/reboot.h index abcdde4df697..e97f6b8e8586 100644 --- a/include/linux/reboot.h +++ b/include/linux/reboot.h @@ -177,16 +177,28 @@ void ctrl_alt_del(void); extern void orderly_poweroff(bool force); extern void orderly_reboot(void); -void __hw_protection_shutdown(const char *reason, int ms_until_forced, bool shutdown); + +/** + * enum hw_protection_action - Hardware protection action + * + * @HWPROT_ACT_SHUTDOWN: + * The system should be shut down (powered off) for HW protection. + * @HWPROT_ACT_REBOOT: + * The system should be rebooted for HW protection. + */ +enum hw_protection_action { HWPROT_ACT_SHUTDOWN, HWPROT_ACT_REBOOT }; + +void __hw_protection_shutdown(const char *reason, int ms_until_forced, + enum hw_protection_action action); static inline void hw_protection_reboot(const char *reason, int ms_until_forced) { - __hw_protection_shutdown(reason, ms_until_forced, false); + __hw_protection_shutdown(reason, ms_until_forced, HWPROT_ACT_REBOOT); } static inline void hw_protection_shutdown(const char *reason, int ms_until_forced) { - __hw_protection_shutdown(reason, ms_until_forced, true); + __hw_protection_shutdown(reason, ms_until_forced, HWPROT_ACT_SHUTDOWN); } /* diff --git a/kernel/reboot.c b/kernel/reboot.c index b5a8569e5d81..b20b53f08648 100644 --- a/kernel/reboot.c +++ b/kernel/reboot.c @@ -983,10 +983,7 @@ static void hw_failure_emergency_poweroff(int poweroff_delay_ms) * @ms_until_forced: Time to wait for orderly shutdown or reboot before * triggering it. Negative value disables the forced * shutdown or reboot. - * @shutdown: If true, indicates that a shutdown will happen - * after the critical tempeature is reached. - * If false, indicates that a reboot will happen - * after the critical tempeature is reached. + * @action: The hardware protection action to be taken. * * Initiate an emergency system shutdown or reboot in order to protect * hardware from further damage. Usage examples include a thermal protection. @@ -994,7 +991,8 @@ static void hw_failure_emergency_poweroff(int poweroff_delay_ms) * pending even if the previous request has given a large timeout for forced * shutdown/reboot. */ -void __hw_protection_shutdown(const char *reason, int ms_until_forced, bool shutdown) +void __hw_protection_shutdown(const char *reason, int ms_until_forced, + enum hw_protection_action action) { static atomic_t allow_proceed = ATOMIC_INIT(1); @@ -1009,10 +1007,10 @@ void __hw_protection_shutdown(const char *reason, int ms_until_forced, bool shut * orderly_poweroff failure */ hw_failure_emergency_poweroff(ms_until_forced); - if (shutdown) - orderly_poweroff(true); - else + if (action == HWPROT_ACT_REBOOT) orderly_reboot(); + else + orderly_poweroff(true); } EXPORT_SYMBOL_GPL(__hw_protection_shutdown); -- cgit v1.2.3-59-g8ed1b From bbf0ec4f57e0a34983ca25f00ec90f4765572d78 Mon Sep 17 00:00:00 2001 From: Ahmad Fatoum Date: Mon, 17 Feb 2025 21:39:42 +0100 Subject: reboot: reboot, not shutdown, on hw_protection_reboot timeout hw_protection_shutdown() will kick off an orderly shutdown and if that takes longer than a configurable amount of time, an emergency shutdown will occur. Recently, hw_protection_reboot() was added for those systems that don't implement a proper shutdown and are better served by rebooting and having the boot firmware worry about doing something about the critical condition. On timeout of the orderly reboot of hw_protection_reboot(), the system would go into shutdown, instead of reboot. This is not a good idea, as going into shutdown was explicitly not asked for. Fix this by always doing an emergency reboot if hw_protection_reboot() is called and the orderly reboot takes too long. Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-2-e1c09b090c0c@pengutronix.de Fixes: 79fa723ba84c ("reboot: Introduce thermal_zone_device_critical_reboot()") Signed-off-by: Ahmad Fatoum Reviewed-by: Tzung-Bi Shih Reviewed-by: Matti Vaittinen Cc: Benson Leung Cc: Daniel Lezcano Cc: Fabio Estevam Cc: Guenter Roeck Cc: Jonathan Corbet Cc: Liam Girdwood Cc: Lukasz Luba Cc: Mark Brown Cc: Matteo Croce Cc: "Rafael J. Wysocki" Cc: Rob Herring (Arm) Cc: Rui Zhang Cc: Sascha Hauer Cc: "Serge E. Hallyn" Signed-off-by: Andrew Morton --- kernel/reboot.c | 70 ++++++++++++++++++++++++++++++++++++++++----------------- 1 file changed, 49 insertions(+), 21 deletions(-) diff --git a/kernel/reboot.c b/kernel/reboot.c index b20b53f08648..f348f1ba9e22 100644 --- a/kernel/reboot.c +++ b/kernel/reboot.c @@ -932,48 +932,76 @@ void orderly_reboot(void) } EXPORT_SYMBOL_GPL(orderly_reboot); +static const char *hw_protection_action_str(enum hw_protection_action action) +{ + switch (action) { + case HWPROT_ACT_SHUTDOWN: + return "shutdown"; + case HWPROT_ACT_REBOOT: + return "reboot"; + default: + return "undefined"; + } +} + +static enum hw_protection_action hw_failure_emergency_action; + /** - * hw_failure_emergency_poweroff_func - emergency poweroff work after a known delay - * @work: work_struct associated with the emergency poweroff function + * hw_failure_emergency_action_func - emergency action work after a known delay + * @work: work_struct associated with the emergency action function * * This function is called in very critical situations to force - * a kernel poweroff after a configurable timeout value. + * a kernel poweroff or reboot after a configurable timeout value. */ -static void hw_failure_emergency_poweroff_func(struct work_struct *work) +static void hw_failure_emergency_action_func(struct work_struct *work) { + const char *action_str = hw_protection_action_str(hw_failure_emergency_action); + + pr_emerg("Hardware protection timed-out. Trying forced %s\n", + action_str); + /* - * We have reached here after the emergency shutdown waiting period has - * expired. This means orderly_poweroff has not been able to shut off - * the system for some reason. + * We have reached here after the emergency action waiting period has + * expired. This means orderly_poweroff/reboot has not been able to + * shut off the system for some reason. * - * Try to shut down the system immediately using kernel_power_off - * if populated + * Try to shut off the system immediately if possible */ - pr_emerg("Hardware protection timed-out. Trying forced poweroff\n"); - kernel_power_off(); + + if (hw_failure_emergency_action == HWPROT_ACT_REBOOT) + kernel_restart(NULL); + else + kernel_power_off(); /* * Worst of the worst case trigger emergency restart */ - pr_emerg("Hardware protection shutdown failed. Trying emergency restart\n"); + pr_emerg("Hardware protection %s failed. Trying emergency restart\n", + action_str); emergency_restart(); } -static DECLARE_DELAYED_WORK(hw_failure_emergency_poweroff_work, - hw_failure_emergency_poweroff_func); +static DECLARE_DELAYED_WORK(hw_failure_emergency_action_work, + hw_failure_emergency_action_func); /** - * hw_failure_emergency_poweroff - Trigger an emergency system poweroff + * hw_failure_emergency_schedule - Schedule an emergency system shutdown or reboot + * + * @action: The hardware protection action to be taken + * @action_delay_ms: Time in milliseconds to elapse before triggering action * * This may be called from any critical situation to trigger a system shutdown - * after a given period of time. If time is negative this is not scheduled. + * or reboot after a given period of time. + * If time is negative this is not scheduled. */ -static void hw_failure_emergency_poweroff(int poweroff_delay_ms) +static void hw_failure_emergency_schedule(enum hw_protection_action action, + int action_delay_ms) { - if (poweroff_delay_ms <= 0) + if (action_delay_ms <= 0) return; - schedule_delayed_work(&hw_failure_emergency_poweroff_work, - msecs_to_jiffies(poweroff_delay_ms)); + hw_failure_emergency_action = action; + schedule_delayed_work(&hw_failure_emergency_action_work, + msecs_to_jiffies(action_delay_ms)); } /** @@ -1006,7 +1034,7 @@ void __hw_protection_shutdown(const char *reason, int ms_until_forced, * Queue a backup emergency shutdown in the event of * orderly_poweroff failure */ - hw_failure_emergency_poweroff(ms_until_forced); + hw_failure_emergency_schedule(action, ms_until_forced); if (action == HWPROT_ACT_REBOOT) orderly_reboot(); else -- cgit v1.2.3-59-g8ed1b From 06eaeaee5b780106c3578bfb9ff2babc19ccffb7 Mon Sep 17 00:00:00 2001 From: Ahmad Fatoum Date: Mon, 17 Feb 2025 21:39:43 +0100 Subject: docs: thermal: sync hardware protection doc with code Originally, the thermal framework's only hardware protection action was to trigger a shutdown. This has been changed a little over a year ago to also support rebooting as alternative hardware protection action. Update the documentation to reflect this. Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-3-e1c09b090c0c@pengutronix.de Fixes: 62e79e38b257 ("thermal/thermal_of: Allow rebooting after critical temp") Signed-off-by: Ahmad Fatoum Reviewed-by: Tzung-Bi Shih Reviewed-by: Matti Vaittinen Cc: Benson Leung Cc: Daniel Lezcano Cc: Fabio Estevam Cc: Guenter Roeck Cc: Jonathan Corbet Cc: Liam Girdwood Cc: Lukasz Luba Cc: Mark Brown Cc: Matteo Croce Cc: "Rafael J. Wysocki" Cc: Rob Herring (Arm) Cc: Rui Zhang Cc: Sascha Hauer Cc: "Serge E. Hallyn" Signed-off-by: Andrew Morton --- Documentation/driver-api/thermal/sysfs-api.rst | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/Documentation/driver-api/thermal/sysfs-api.rst b/Documentation/driver-api/thermal/sysfs-api.rst index c803b89b7248..f73de211bdce 100644 --- a/Documentation/driver-api/thermal/sysfs-api.rst +++ b/Documentation/driver-api/thermal/sysfs-api.rst @@ -413,18 +413,21 @@ This function serves as an arbitrator to set the state of a cooling device. It sets the cooling device to the deepest cooling state if possible. -5. thermal_emergency_poweroff -============================= +5. Critical Events +================== + +On an event of critical trip temperature crossing, the thermal framework +will trigger a hardware protection power-off (shutdown) or reboot, +depending on configuration. -On an event of critical trip temperature crossing the thermal framework -shuts down the system by calling hw_protection_shutdown(). The -hw_protection_shutdown() first attempts to perform an orderly shutdown -but accepts a delay after which it proceeds doing a forced power-off -or as last resort an emergency_restart. +At first, the kernel will attempt an orderly power-off or reboot, but +accepts a delay after which it proceeds to do a forced power-off or +reboot, respectively. If this fails, ``emergency_restart()`` is invoked +as last resort. The delay should be carefully profiled so as to give adequate time for -orderly poweroff. +orderly power-off or reboot. -If the delay is set to 0 emergency poweroff will not be supported. So a -carefully profiled non-zero positive value is a must for emergency -poweroff to be triggered. +If the delay is set to 0, the emergency action will not be supported. So a +carefully profiled non-zero positive value is a must for the emergency +action to be triggered. -- cgit v1.2.3-59-g8ed1b From aafb1245b2feaeab50d5e20d5a76f7c00bc736a0 Mon Sep 17 00:00:00 2001 From: Ahmad Fatoum Date: Mon, 17 Feb 2025 21:39:44 +0100 Subject: reboot: describe do_kernel_restart's cmd argument in kernel-doc A W=1 build rightfully complains about the function's kernel-doc being incomplete. Describe its single parameter to fix this. Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-4-e1c09b090c0c@pengutronix.de Signed-off-by: Ahmad Fatoum Reviewed-by: Tzung-Bi Shih Cc: Benson Leung Cc: Daniel Lezcano Cc: Fabio Estevam Cc: Guenter Roeck Cc: Jonathan Corbet Cc: Liam Girdwood Cc: Lukasz Luba Cc: Mark Brown Cc: Matteo Croce Cc: Matti Vaittinen Cc: "Rafael J. Wysocki" Cc: Rob Herring (Arm) Cc: Rui Zhang Cc: Sascha Hauer Cc: "Serge E. Hallyn" Signed-off-by: Andrew Morton --- kernel/reboot.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/kernel/reboot.c b/kernel/reboot.c index f348f1ba9e22..6185cfe5d4ee 100644 --- a/kernel/reboot.c +++ b/kernel/reboot.c @@ -229,6 +229,9 @@ EXPORT_SYMBOL(unregister_restart_handler); /** * do_kernel_restart - Execute kernel restart handler call chain * + * @cmd: pointer to buffer containing command to execute for restart + * or %NULL + * * Calls functions registered with register_restart_handler. * * Expected to be called from machine_restart as last step of the restart -- cgit v1.2.3-59-g8ed1b From 81cab0f94ab864f13e29e561382d722daec6a55a Mon Sep 17 00:00:00 2001 From: Ahmad Fatoum Date: Mon, 17 Feb 2025 21:39:45 +0100 Subject: reboot: rename now misleading __hw_protection_shutdown symbols The __hw_protection_shutdown function name has become misleading since it can cause either a shutdown (poweroff) or a reboot depending on its argument. To avoid further confusion, let's rename it, so it doesn't suggest that a poweroff is all it can do. Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-5-e1c09b090c0c@pengutronix.de Signed-off-by: Ahmad Fatoum Reviewed-by: Tzung-Bi Shih Cc: Benson Leung Cc: Daniel Lezcano Cc: Fabio Estevam Cc: Guenter Roeck Cc: Jonathan Corbet Cc: Liam Girdwood Cc: Lukasz Luba Cc: Mark Brown Cc: Matteo Croce Cc: Matti Vaittinen Cc: "Rafael J. Wysocki" Cc: Rob Herring (Arm) Cc: Rui Zhang Cc: Sascha Hauer Cc: "Serge E. Hallyn" Signed-off-by: Andrew Morton --- include/linux/reboot.h | 8 ++++---- kernel/reboot.c | 8 ++++---- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/include/linux/reboot.h b/include/linux/reboot.h index e97f6b8e8586..53c64e31b3cf 100644 --- a/include/linux/reboot.h +++ b/include/linux/reboot.h @@ -188,17 +188,17 @@ extern void orderly_reboot(void); */ enum hw_protection_action { HWPROT_ACT_SHUTDOWN, HWPROT_ACT_REBOOT }; -void __hw_protection_shutdown(const char *reason, int ms_until_forced, - enum hw_protection_action action); +void __hw_protection_trigger(const char *reason, int ms_until_forced, + enum hw_protection_action action); static inline void hw_protection_reboot(const char *reason, int ms_until_forced) { - __hw_protection_shutdown(reason, ms_until_forced, HWPROT_ACT_REBOOT); + __hw_protection_trigger(reason, ms_until_forced, HWPROT_ACT_REBOOT); } static inline void hw_protection_shutdown(const char *reason, int ms_until_forced) { - __hw_protection_shutdown(reason, ms_until_forced, HWPROT_ACT_SHUTDOWN); + __hw_protection_trigger(reason, ms_until_forced, HWPROT_ACT_SHUTDOWN); } /* diff --git a/kernel/reboot.c b/kernel/reboot.c index 6185cfe5d4ee..c1f11d5e085e 100644 --- a/kernel/reboot.c +++ b/kernel/reboot.c @@ -1008,7 +1008,7 @@ static void hw_failure_emergency_schedule(enum hw_protection_action action, } /** - * __hw_protection_shutdown - Trigger an emergency system shutdown or reboot + * __hw_protection_trigger - Trigger an emergency system shutdown or reboot * * @reason: Reason of emergency shutdown or reboot to be printed. * @ms_until_forced: Time to wait for orderly shutdown or reboot before @@ -1022,8 +1022,8 @@ static void hw_failure_emergency_schedule(enum hw_protection_action action, * pending even if the previous request has given a large timeout for forced * shutdown/reboot. */ -void __hw_protection_shutdown(const char *reason, int ms_until_forced, - enum hw_protection_action action) +void __hw_protection_trigger(const char *reason, int ms_until_forced, + enum hw_protection_action action) { static atomic_t allow_proceed = ATOMIC_INIT(1); @@ -1043,7 +1043,7 @@ void __hw_protection_shutdown(const char *reason, int ms_until_forced, else orderly_poweroff(true); } -EXPORT_SYMBOL_GPL(__hw_protection_shutdown); +EXPORT_SYMBOL_GPL(__hw_protection_trigger); static int __init reboot_setup(char *str) { -- cgit v1.2.3-59-g8ed1b From 96201a8a984658169dfa23507b6e75311c9975a8 Mon Sep 17 00:00:00 2001 From: Ahmad Fatoum Date: Mon, 17 Feb 2025 21:39:46 +0100 Subject: reboot: indicate whether it is a HARDWARE PROTECTION reboot or shutdown It currently depends on the caller, whether we attempt a hardware protection shutdown (poweroff) or a reboot. A follow-up commit will make this partially user-configurable, so it's a good idea to have the emergency message clearly state whether the kernel is going for a reboot or a shutdown. Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-6-e1c09b090c0c@pengutronix.de Signed-off-by: Ahmad Fatoum Reviewed-by: Tzung-Bi Shih Cc: Benson Leung Cc: Daniel Lezcano Cc: Fabio Estevam Cc: Guenter Roeck Cc: Jonathan Corbet Cc: Liam Girdwood Cc: Lukasz Luba Cc: Mark Brown Cc: Matteo Croce Cc: Matti Vaittinen Cc: "Rafael J. Wysocki" Cc: Rob Herring (Arm) Cc: Rui Zhang Cc: Sascha Hauer Cc: "Serge E. Hallyn" Signed-off-by: Andrew Morton --- kernel/reboot.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/reboot.c b/kernel/reboot.c index c1f11d5e085e..faf1ff422634 100644 --- a/kernel/reboot.c +++ b/kernel/reboot.c @@ -1027,7 +1027,8 @@ void __hw_protection_trigger(const char *reason, int ms_until_forced, { static atomic_t allow_proceed = ATOMIC_INIT(1); - pr_emerg("HARDWARE PROTECTION shutdown (%s)\n", reason); + pr_emerg("HARDWARE PROTECTION %s (%s)\n", + hw_protection_action_str(action), reason); /* Shutdown should be initiated only once. */ if (!atomic_dec_and_test(&allow_proceed)) -- cgit v1.2.3-59-g8ed1b From e016173f656b89f5f71b7d45dfc599ada8107eef Mon Sep 17 00:00:00 2001 From: Ahmad Fatoum Date: Mon, 17 Feb 2025 21:39:47 +0100 Subject: reboot: add support for configuring emergency hardware protection action We currently leave the decision of whether to shutdown or reboot to protect hardware in an emergency situation to the individual drivers. This works out in some cases, where the driver detecting the critical failure has inside knowledge: It binds to the system management controller for example or is guided by hardware description that defines what to do. In the general case, however, the driver detecting the issue can't know what the appropriate course of action is and shouldn't be dictating the policy of dealing with it. Therefore, add a global hw_protection toggle that allows the user to specify whether shutdown or reboot should be the default action when the driver doesn't set policy. This introduces no functional change yet as hw_protection_trigger() has no callers, but these will be added in subsequent commits. [arnd@arndb.de: hide unused hw_protection_attr] Link: https://lkml.kernel.org/r/20250224141849.1546019-1-arnd@kernel.org Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-7-e1c09b090c0c@pengutronix.de Signed-off-by: Ahmad Fatoum Reviewed-by: Tzung-Bi Shih Cc: Benson Leung Cc: Daniel Lezcano Cc: Fabio Estevam Cc: Guenter Roeck Cc: Jonathan Corbet Cc: Liam Girdwood Cc: Lukasz Luba Cc: Mark Brown Cc: Matteo Croce Cc: Matti Vaittinen Cc: "Rafael J. Wysocki" Cc: Rob Herring (Arm) Cc: Rui Zhang Cc: Sascha Hauer Cc: "Serge E. Hallyn" Signed-off-by: Andrew Morton --- Documentation/ABI/testing/sysfs-kernel-reboot | 8 +++++ Documentation/admin-guide/kernel-parameters.txt | 6 ++++ include/linux/reboot.h | 22 +++++++++++- include/uapi/linux/capability.h | 1 + kernel/reboot.c | 48 +++++++++++++++++++++++++ 5 files changed, 84 insertions(+), 1 deletion(-) diff --git a/Documentation/ABI/testing/sysfs-kernel-reboot b/Documentation/ABI/testing/sysfs-kernel-reboot index 837330fb2511..e117aba46be0 100644 --- a/Documentation/ABI/testing/sysfs-kernel-reboot +++ b/Documentation/ABI/testing/sysfs-kernel-reboot @@ -30,3 +30,11 @@ KernelVersion: 5.11 Contact: Matteo Croce Description: Don't wait for any other CPUs on reboot and avoid anything that could hang. + +What: /sys/kernel/reboot/hw_protection +Date: April 2025 +KernelVersion: 6.15 +Contact: Ahmad Fatoum +Description: Hardware protection action taken on critical events like + overtemperature or imminent voltage loss. + Valid values are: reboot shutdown diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index fb8752b42ec8..b2f04967876f 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1933,6 +1933,12 @@ which allow the hypervisor to 'idle' the guest on lock contention. + hw_protection= [HW] + Format: reboot | shutdown + + Hardware protection action taken on critical events like + overtemperature or imminent voltage loss. + i2c_bus= [HW] Override the default board specific I2C bus speed or register an additional I2C bus that is not registered from board initialization code. diff --git a/include/linux/reboot.h b/include/linux/reboot.h index 53c64e31b3cf..79e02876f2ba 100644 --- a/include/linux/reboot.h +++ b/include/linux/reboot.h @@ -181,16 +181,36 @@ extern void orderly_reboot(void); /** * enum hw_protection_action - Hardware protection action * + * @HWPROT_ACT_DEFAULT: + * The default action should be taken. This is HWPROT_ACT_SHUTDOWN + * by default, but can be overridden. * @HWPROT_ACT_SHUTDOWN: * The system should be shut down (powered off) for HW protection. * @HWPROT_ACT_REBOOT: * The system should be rebooted for HW protection. */ -enum hw_protection_action { HWPROT_ACT_SHUTDOWN, HWPROT_ACT_REBOOT }; +enum hw_protection_action { HWPROT_ACT_DEFAULT, HWPROT_ACT_SHUTDOWN, HWPROT_ACT_REBOOT }; void __hw_protection_trigger(const char *reason, int ms_until_forced, enum hw_protection_action action); +/** + * hw_protection_trigger - Trigger default emergency system hardware protection action + * + * @reason: Reason of emergency shutdown or reboot to be printed. + * @ms_until_forced: Time to wait for orderly shutdown or reboot before + * triggering it. Negative value disables the forced + * shutdown or reboot. + * + * Initiate an emergency system shutdown or reboot in order to protect + * hardware from further damage. The exact action taken is controllable at + * runtime and defaults to shutdown. + */ +static inline void hw_protection_trigger(const char *reason, int ms_until_forced) +{ + __hw_protection_trigger(reason, ms_until_forced, HWPROT_ACT_DEFAULT); +} + static inline void hw_protection_reboot(const char *reason, int ms_until_forced) { __hw_protection_trigger(reason, ms_until_forced, HWPROT_ACT_REBOOT); diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h index 5bb906098697..2e21b5594f81 100644 --- a/include/uapi/linux/capability.h +++ b/include/uapi/linux/capability.h @@ -275,6 +275,7 @@ struct vfs_ns_cap_data { /* Allow setting encryption key on loopback filesystem */ /* Allow setting zone reclaim policy */ /* Allow everything under CAP_BPF and CAP_PERFMON for backward compatibility */ +/* Allow setting hardware protection emergency action */ #define CAP_SYS_ADMIN 21 diff --git a/kernel/reboot.c b/kernel/reboot.c index faf1ff422634..2d6a06fe6c66 100644 --- a/kernel/reboot.c +++ b/kernel/reboot.c @@ -36,6 +36,8 @@ enum reboot_mode reboot_mode DEFAULT_REBOOT_MODE; EXPORT_SYMBOL_GPL(reboot_mode); enum reboot_mode panic_reboot_mode = REBOOT_UNDEFINED; +static enum hw_protection_action hw_protection_action = HWPROT_ACT_SHUTDOWN; + /* * This variable is used privately to keep track of whether or not * reboot_type is still set to its default value (i.e., reboot= hasn't @@ -1027,6 +1029,9 @@ void __hw_protection_trigger(const char *reason, int ms_until_forced, { static atomic_t allow_proceed = ATOMIC_INIT(1); + if (action == HWPROT_ACT_DEFAULT) + action = hw_protection_action; + pr_emerg("HARDWARE PROTECTION %s (%s)\n", hw_protection_action_str(action), reason); @@ -1046,6 +1051,48 @@ void __hw_protection_trigger(const char *reason, int ms_until_forced, } EXPORT_SYMBOL_GPL(__hw_protection_trigger); +static bool hw_protection_action_parse(const char *str, + enum hw_protection_action *action) +{ + if (sysfs_streq(str, "shutdown")) + *action = HWPROT_ACT_SHUTDOWN; + else if (sysfs_streq(str, "reboot")) + *action = HWPROT_ACT_REBOOT; + else + return false; + + return true; +} + +static int __init hw_protection_setup(char *str) +{ + hw_protection_action_parse(str, &hw_protection_action); + return 1; +} +__setup("hw_protection=", hw_protection_setup); + +#ifdef CONFIG_SYSFS +static ssize_t hw_protection_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "%s\n", + hw_protection_action_str(hw_protection_action)); +} +static ssize_t hw_protection_store(struct kobject *kobj, + struct kobj_attribute *attr, const char *buf, + size_t count) +{ + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (!hw_protection_action_parse(buf, &hw_protection_action)) + return -EINVAL; + + return count; +} +static struct kobj_attribute hw_protection_attr = __ATTR_RW(hw_protection); +#endif + static int __init reboot_setup(char *str) { for (;;) { @@ -1305,6 +1352,7 @@ static struct kobj_attribute reboot_cpu_attr = __ATTR_RW(cpu); #endif static struct attribute *reboot_attrs[] = { + &hw_protection_attr.attr, &reboot_mode_attr.attr, #ifdef CONFIG_X86 &reboot_force_attr.attr, -- cgit v1.2.3-59-g8ed1b From b15388a2f0195c2921d80cb16eab91b8802af2b8 Mon Sep 17 00:00:00 2001 From: Ahmad Fatoum Date: Mon, 17 Feb 2025 21:39:48 +0100 Subject: regulator: allow user configuration of hardware protection action When the core detects permanent regulator hardware failure or imminent power failure of a critical supply, it will call hw_protection_shutdown in an attempt to do a limited orderly shutdown followed by powering off the system. This doesn't work out well for many unattended embedded systems that don't have support for shutdown and that power on automatically when power is supplied: - A brief power cycle gets detected by the driver - The kernel powers down the system and SoC goes into shutdown mode - Power is restored - The system remains oblivious to the restored power - System needs to be manually power cycled for a duration long enough to drain the capacitors Allow users to fix this by calling the newly introduced hw_protection_trigger() instead: This way the hw_protection commandline or sysfs parameter is used to dictate the policy of dealing with the regulator fault. Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-8-e1c09b090c0c@pengutronix.de Signed-off-by: Ahmad Fatoum Reviewed-by: Tzung-Bi Shih Reviewed-by: Matti Vaittinen Cc: Benson Leung Cc: Daniel Lezcano Cc: Fabio Estevam Cc: Guenter Roeck Cc: Jonathan Corbet Cc: Liam Girdwood Cc: Lukasz Luba Cc: Mark Brown Cc: Matteo Croce Cc: "Rafael J. Wysocki" Cc: Rob Herring (Arm) Cc: Rui Zhang Cc: Sascha Hauer Cc: "Serge E. Hallyn" Signed-off-by: Andrew Morton --- drivers/regulator/core.c | 4 ++-- drivers/regulator/irq_helpers.c | 16 ++++++++-------- 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c index 4ddf0efead68..280559509dcf 100644 --- a/drivers/regulator/core.c +++ b/drivers/regulator/core.c @@ -5262,8 +5262,8 @@ static void regulator_handle_critical(struct regulator_dev *rdev, if (!reason) return; - hw_protection_shutdown(reason, - rdev->constraints->uv_less_critical_window_ms); + hw_protection_trigger(reason, + rdev->constraints->uv_less_critical_window_ms); } /** diff --git a/drivers/regulator/irq_helpers.c b/drivers/regulator/irq_helpers.c index 0aa188b2bbb2..5742faee8071 100644 --- a/drivers/regulator/irq_helpers.c +++ b/drivers/regulator/irq_helpers.c @@ -64,16 +64,16 @@ static void regulator_notifier_isr_work(struct work_struct *work) reread: if (d->fatal_cnt && h->retry_cnt > d->fatal_cnt) { if (!d->die) - return hw_protection_shutdown("Regulator HW failure? - no IC recovery", - REGULATOR_FORCED_SAFETY_SHUTDOWN_WAIT_MS); + return hw_protection_trigger("Regulator HW failure? - no IC recovery", + REGULATOR_FORCED_SAFETY_SHUTDOWN_WAIT_MS); ret = d->die(rid); /* * If the 'last resort' IC recovery failed we will have * nothing else left to do... */ if (ret) - return hw_protection_shutdown("Regulator HW failure. IC recovery failed", - REGULATOR_FORCED_SAFETY_SHUTDOWN_WAIT_MS); + return hw_protection_trigger("Regulator HW failure. IC recovery failed", + REGULATOR_FORCED_SAFETY_SHUTDOWN_WAIT_MS); /* * If h->die() was implemented we assume recovery has been @@ -263,14 +263,14 @@ fail_out: if (d->fatal_cnt && h->retry_cnt > d->fatal_cnt) { /* If we have no recovery, just try shut down straight away */ if (!d->die) { - hw_protection_shutdown("Regulator failure. Retry count exceeded", - REGULATOR_FORCED_SAFETY_SHUTDOWN_WAIT_MS); + hw_protection_trigger("Regulator failure. Retry count exceeded", + REGULATOR_FORCED_SAFETY_SHUTDOWN_WAIT_MS); } else { ret = d->die(rid); /* If die() failed shut down as a last attempt to save the HW */ if (ret) - hw_protection_shutdown("Regulator failure. Recovery failed", - REGULATOR_FORCED_SAFETY_SHUTDOWN_WAIT_MS); + hw_protection_trigger("Regulator failure. Recovery failed", + REGULATOR_FORCED_SAFETY_SHUTDOWN_WAIT_MS); } } -- cgit v1.2.3-59-g8ed1b From cd9beb9eb395a0df631e5b1d1314b4f7f1e8579f Mon Sep 17 00:00:00 2001 From: Ahmad Fatoum Date: Mon, 17 Feb 2025 21:39:49 +0100 Subject: platform/chrome: cros_ec_lpc: prepare for hw_protection_shutdown removal In the general case, a driver doesn't know which of system shutdown or reboot is the better action to take to protect hardware in an emergency situation. For this reason, hw_protection_shutdown is going to be removed in favor of hw_protection_trigger, which defaults to shutdown, but may be configured at kernel runtime to be a reboot instead. The ChromeOS EC situation is different as we do know that shutdown is the correct action as the EC is programmed to force reset after the short period, thus replace hw_protection_shutdown with __hw_protection_trigger with HWPROT_ACT_SHUTDOWN as argument to maintain the same behavior. No functional change. Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-9-e1c09b090c0c@pengutronix.de Signed-off-by: Ahmad Fatoum Acked-by: Tzung-Bi Shih Cc: Benson Leung Cc: Daniel Lezcano Cc: Fabio Estevam Cc: Guenter Roeck Cc: Jonathan Corbet Cc: Liam Girdwood Cc: Lukasz Luba Cc: Mark Brown Cc: Matteo Croce Cc: Matti Vaittinen Cc: "Rafael J. Wysocki" Cc: Rob Herring (Arm) Cc: Rui Zhang Cc: Sascha Hauer Cc: "Serge E. Hallyn" Signed-off-by: Andrew Morton --- drivers/platform/chrome/cros_ec_lpc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/platform/chrome/cros_ec_lpc.c b/drivers/platform/chrome/cros_ec_lpc.c index 5a2f1d98b350..0b723c1e435c 100644 --- a/drivers/platform/chrome/cros_ec_lpc.c +++ b/drivers/platform/chrome/cros_ec_lpc.c @@ -454,7 +454,7 @@ static void cros_ec_lpc_acpi_notify(acpi_handle device, u32 value, void *data) blocking_notifier_call_chain(&ec_dev->panic_notifier, 0, ec_dev); kobject_uevent_env(&ec_dev->dev->kobj, KOBJ_CHANGE, (char **)env); /* Begin orderly shutdown. EC will force reset after a short period. */ - hw_protection_shutdown("CrOS EC Panic", -1); + __hw_protection_trigger("CrOS EC Panic", -1, HWPROT_ACT_SHUTDOWN); /* Do not query for other events after a panic is reported */ return; } -- cgit v1.2.3-59-g8ed1b From 738b7856933de7b9775ea7bf6a72a1002dbd2e12 Mon Sep 17 00:00:00 2001 From: Ahmad Fatoum Date: Mon, 17 Feb 2025 21:39:50 +0100 Subject: dt-bindings: thermal: give OS some leeway in absence of critical-action An operating system may allow its user to configure the action to be undertaken on critical overtemperature events. However, the bindings currently mandate an absence of the critical-action property to be equal to critical-action = "shutdown", which would mean any differing user configuration would violate the bindings. Resolve this by documenting the absence of the property to mean that the OS gets to decide. Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-10-e1c09b090c0c@pengutronix.de Signed-off-by: Ahmad Fatoum Acked-by: Rob Herring (Arm) Cc: Benson Leung Cc: Daniel Lezcano Cc: Fabio Estevam Cc: Guenter Roeck Cc: Jonathan Corbet Cc: Liam Girdwood Cc: Lukasz Luba Cc: Mark Brown Cc: Matteo Croce Cc: Matti Vaittinen Cc: "Rafael J. Wysocki" Cc: Rui Zhang Cc: Sascha Hauer Cc: "Serge E. Hallyn" Cc: Tzung-Bi Shih Signed-off-by: Andrew Morton --- Documentation/devicetree/bindings/thermal/thermal-zones.yaml | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml index 0f435be1dbd8..0de0a9757ccc 100644 --- a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml +++ b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml @@ -82,9 +82,8 @@ patternProperties: $ref: /schemas/types.yaml#/definitions/string description: | The action the OS should perform after the critical temperature is reached. - By default the system will shutdown as a safe action to prevent damage - to the hardware, if the property is not set. - The shutdown action should be always the default and preferred one. + If the property is not set, it is up to the system to select the correct + action. The recommended and preferred default is shutdown. Choose 'reboot' with care, as the hardware may be in thermal stress, thus leading to infinite reboots that may cause damage to the hardware. Make sure the firmware/bootloader will act as the last resort and take -- cgit v1.2.3-59-g8ed1b From 941a07cad2fdfe79e768936fb0f2652e4deb1766 Mon Sep 17 00:00:00 2001 From: Ahmad Fatoum Date: Mon, 17 Feb 2025 21:39:51 +0100 Subject: thermal: core: allow user configuration of hardware protection action In the general case, we don't know which of system shutdown or reboot is the better action to take to protect hardware in an emergency situation. We thus allow the policy to come from the device-tree in the form of an optional critical-action OF property, but so far there was no way for the end user to configure this. With recent addition of the hw_protection parameter, the user can now choose a default action for the case, where the driver isn't fully sure what's the better course of action. Let's make use of this by passing HWPROT_ACT_DEFAULT in absence of the critical-action OF property. As HWPROT_ACT_DEFAULT is shutdown by default, this introduces no functional change for users, unless they start using the new parameter. Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-11-e1c09b090c0c@pengutronix.de Signed-off-by: Ahmad Fatoum Reviewed-by: Tzung-Bi Shih Cc: Benson Leung Cc: Daniel Lezcano Cc: Fabio Estevam Cc: Guenter Roeck Cc: Jonathan Corbet Cc: Liam Girdwood Cc: Lukasz Luba Cc: Mark Brown Cc: Matteo Croce Cc: Matti Vaittinen Cc: "Rafael J. Wysocki" Cc: Rob Herring (Arm) Cc: Rui Zhang Cc: Sascha Hauer Cc: "Serge E. Hallyn" Signed-off-by: Andrew Morton --- drivers/thermal/thermal_core.c | 17 ++++++++++------- drivers/thermal/thermal_core.h | 1 + drivers/thermal/thermal_of.c | 7 +++++-- 3 files changed, 16 insertions(+), 9 deletions(-) diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c index 2328ac0d8561..5847729419f2 100644 --- a/drivers/thermal/thermal_core.c +++ b/drivers/thermal/thermal_core.c @@ -369,7 +369,8 @@ void thermal_governor_update_tz(struct thermal_zone_device *tz, tz->governor->update_tz(tz, reason); } -static void thermal_zone_device_halt(struct thermal_zone_device *tz, bool shutdown) +static void thermal_zone_device_halt(struct thermal_zone_device *tz, + enum hw_protection_action action) { /* * poweroff_delay_ms must be a carefully profiled positive value. @@ -380,21 +381,23 @@ static void thermal_zone_device_halt(struct thermal_zone_device *tz, bool shutdo dev_emerg(&tz->device, "%s: critical temperature reached\n", tz->type); - if (shutdown) - hw_protection_shutdown(msg, poweroff_delay_ms); - else - hw_protection_reboot(msg, poweroff_delay_ms); + __hw_protection_trigger(msg, poweroff_delay_ms, action); } void thermal_zone_device_critical(struct thermal_zone_device *tz) { - thermal_zone_device_halt(tz, true); + thermal_zone_device_halt(tz, HWPROT_ACT_DEFAULT); } EXPORT_SYMBOL(thermal_zone_device_critical); +void thermal_zone_device_critical_shutdown(struct thermal_zone_device *tz) +{ + thermal_zone_device_halt(tz, HWPROT_ACT_SHUTDOWN); +} + void thermal_zone_device_critical_reboot(struct thermal_zone_device *tz) { - thermal_zone_device_halt(tz, false); + thermal_zone_device_halt(tz, HWPROT_ACT_REBOOT); } static void handle_critical_trips(struct thermal_zone_device *tz, diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h index 09866f0ce765..bdadd141aa24 100644 --- a/drivers/thermal/thermal_core.h +++ b/drivers/thermal/thermal_core.h @@ -262,6 +262,7 @@ int thermal_build_list_of_policies(char *buf); void __thermal_zone_device_update(struct thermal_zone_device *tz, enum thermal_notify_event event); void thermal_zone_device_critical_reboot(struct thermal_zone_device *tz); +void thermal_zone_device_critical_shutdown(struct thermal_zone_device *tz); void thermal_governor_update_tz(struct thermal_zone_device *tz, enum thermal_notify_event reason); diff --git a/drivers/thermal/thermal_of.c b/drivers/thermal/thermal_of.c index 5401f03d6b6c..ce9260cc30f7 100644 --- a/drivers/thermal/thermal_of.c +++ b/drivers/thermal/thermal_of.c @@ -405,9 +405,12 @@ static struct thermal_zone_device *thermal_of_zone_register(struct device_node * of_ops.should_bind = thermal_of_should_bind; ret = of_property_read_string(np, "critical-action", &action); - if (!ret) - if (!of_ops.critical && !strcasecmp(action, "reboot")) + if (!ret && !of_ops.critical) { + if (!strcasecmp(action, "reboot")) of_ops.critical = thermal_zone_device_critical_reboot; + else if (!strcasecmp(action, "shutdown")) + of_ops.critical = thermal_zone_device_critical_shutdown; + } tz = thermal_zone_device_register_with_trips(np->name, trips, ntrips, data, &of_ops, &tzp, -- cgit v1.2.3-59-g8ed1b From 5e40d0d196595874c3f0606f0642021e6ade8054 Mon Sep 17 00:00:00 2001 From: Ahmad Fatoum Date: Mon, 17 Feb 2025 21:39:52 +0100 Subject: reboot: retire hw_protection_reboot and hw_protection_shutdown helpers The hw_protection_reboot and hw_protection_shutdown functions mix mechanism with policy: They let the driver requesting an emergency action for hardware protection also decide how to deal with it. This is inadequate in the general case as a driver reporting e.g. an imminent power failure can't know whether a shutdown or a reboot would be more appropriate for a given hardware platform. With the addition of the hw_protection parameter, it's now possible to configure at runtime the default emergency action and drivers are expected to use hw_protection_trigger to have this parameter dictate policy. As no current users of either hw_protection_shutdown or hw_protection_shutdown helpers remain, remove them, as not to tempt driver authors to call them. Existing users now either defer to hw_protection_trigger or call __hw_protection_trigger with a suitable argument directly when they have inside knowledge on whether a reboot or shutdown would be more appropriate. Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-12-e1c09b090c0c@pengutronix.de Signed-off-by: Ahmad Fatoum Reviewed-by: Tzung-Bi Shih Cc: Benson Leung Cc: Daniel Lezcano Cc: Fabio Estevam Cc: Guenter Roeck Cc: Jonathan Corbet Cc: Liam Girdwood Cc: Lukasz Luba Cc: Mark Brown Cc: Matteo Croce Cc: Matti Vaittinen Cc: "Rafael J. Wysocki" Cc: Rob Herring (Arm) Cc: Rui Zhang Cc: Sascha Hauer Cc: "Serge E. Hallyn" Signed-off-by: Andrew Morton --- include/linux/reboot.h | 10 ---------- 1 file changed, 10 deletions(-) diff --git a/include/linux/reboot.h b/include/linux/reboot.h index 79e02876f2ba..aa08c3bbbf59 100644 --- a/include/linux/reboot.h +++ b/include/linux/reboot.h @@ -211,16 +211,6 @@ static inline void hw_protection_trigger(const char *reason, int ms_until_forced __hw_protection_trigger(reason, ms_until_forced, HWPROT_ACT_DEFAULT); } -static inline void hw_protection_reboot(const char *reason, int ms_until_forced) -{ - __hw_protection_trigger(reason, ms_until_forced, HWPROT_ACT_REBOOT); -} - -static inline void hw_protection_shutdown(const char *reason, int ms_until_forced) -{ - __hw_protection_trigger(reason, ms_until_forced, HWPROT_ACT_SHUTDOWN); -} - /* * Emergency restart, callable from an interrupt handler. */ -- cgit v1.2.3-59-g8ed1b From 0ac451ecec6dd516fdac1ca064467e753ef6ae1a Mon Sep 17 00:00:00 2001 From: Kuan-Wei Chiu Date: Sun, 16 Feb 2025 00:56:18 +0800 Subject: lib min_heap: use size_t for array size and index variables Replace the int type with size_t for variables representing array sizes and indices in the min-heap implementation. Using size_t aligns with standard practices for size-related variables and avoids potential issues on platforms where int may be insufficient to represent all valid sizes or indices. Link: https://lkml.kernel.org/r/20250215165618.1757219-1-visitorckw@gmail.com Signed-off-by: Kuan-Wei Chiu Cc: Ching-Chun (Jim) Huang Cc: Yu-Chun Lin Signed-off-by: Andrew Morton --- include/linux/min_heap.h | 12 ++++++------ lib/min_heap.c | 4 ++-- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/include/linux/min_heap.h b/include/linux/min_heap.h index 1160bed6579e..79ddc0adbf2b 100644 --- a/include/linux/min_heap.h +++ b/include/linux/min_heap.h @@ -218,7 +218,7 @@ static size_t parent(size_t i, unsigned int lsbit, size_t size) /* Initialize a min-heap. */ static __always_inline -void __min_heap_init_inline(min_heap_char *heap, void *data, int size) +void __min_heap_init_inline(min_heap_char *heap, void *data, size_t size) { heap->nr = 0; heap->size = size; @@ -254,7 +254,7 @@ bool __min_heap_full_inline(min_heap_char *heap) /* Sift the element at pos down the heap. */ static __always_inline -void __min_heap_sift_down_inline(min_heap_char *heap, int pos, size_t elem_size, +void __min_heap_sift_down_inline(min_heap_char *heap, size_t pos, size_t elem_size, const struct min_heap_callbacks *func, void *args) { const unsigned long lsbit = elem_size & -elem_size; @@ -324,7 +324,7 @@ static __always_inline void __min_heapify_all_inline(min_heap_char *heap, size_t elem_size, const struct min_heap_callbacks *func, void *args) { - int i; + ssize_t i; for (i = heap->nr / 2 - 1; i >= 0; i--) __min_heap_sift_down_inline(heap, i, elem_size, func, args); @@ -379,7 +379,7 @@ bool __min_heap_push_inline(min_heap_char *heap, const void *element, size_t ele const struct min_heap_callbacks *func, void *args) { void *data = heap->data; - int pos; + size_t pos; if (WARN_ONCE(heap->nr >= heap->size, "Pushing on a full heap")) return false; @@ -428,10 +428,10 @@ bool __min_heap_del_inline(min_heap_char *heap, size_t elem_size, size_t idx, __min_heap_del_inline(container_of(&(_heap)->nr, min_heap_char, nr), \ __minheap_obj_size(_heap), _idx, _func, _args) -void __min_heap_init(min_heap_char *heap, void *data, int size); +void __min_heap_init(min_heap_char *heap, void *data, size_t size); void *__min_heap_peek(struct min_heap_char *heap); bool __min_heap_full(min_heap_char *heap); -void __min_heap_sift_down(min_heap_char *heap, int pos, size_t elem_size, +void __min_heap_sift_down(min_heap_char *heap, size_t pos, size_t elem_size, const struct min_heap_callbacks *func, void *args); void __min_heap_sift_up(min_heap_char *heap, size_t elem_size, size_t idx, const struct min_heap_callbacks *func, void *args); diff --git a/lib/min_heap.c b/lib/min_heap.c index 4485372ff3b1..96f01a4c5fb6 100644 --- a/lib/min_heap.c +++ b/lib/min_heap.c @@ -2,7 +2,7 @@ #include #include -void __min_heap_init(min_heap_char *heap, void *data, int size) +void __min_heap_init(min_heap_char *heap, void *data, size_t size) { __min_heap_init_inline(heap, data, size); } @@ -20,7 +20,7 @@ bool __min_heap_full(min_heap_char *heap) } EXPORT_SYMBOL(__min_heap_full); -void __min_heap_sift_down(min_heap_char *heap, int pos, size_t elem_size, +void __min_heap_sift_down(min_heap_char *heap, size_t pos, size_t elem_size, const struct min_heap_callbacks *func, void *args) { __min_heap_sift_down_inline(heap, pos, elem_size, func, args); -- cgit v1.2.3-59-g8ed1b From fa34ce0ad9417705e718343462ce010d85d70c9c Mon Sep 17 00:00:00 2001 From: Jarkko Sakkinen Date: Tue, 18 Feb 2025 19:05:55 +0200 Subject: mailmap: remove never used @parity.io email As this employment lasted only four months and was never used over here, let's just rip it off for good. Link: https://lkml.kernel.org/r/20250218170557.68371-1-jarkko@kernel.org Signed-off-by: Jarkko Sakkinen Cc: Alex Elder Cc: Antonio Quartulli Cc: Eugen Hristev Cc: Greg Kroah-Hartman Cc: Kees Cook Cc: Mathieu Othacehe Cc: Naoya Horiguchi Cc: Quentin Monnet Cc: Satya Priya Kakitapalli Cc: Simon Wunderlich Signed-off-by: Andrew Morton --- .mailmap | 1 - 1 file changed, 1 deletion(-) diff --git a/.mailmap b/.mailmap index 0e373e743495..87e42ee58685 100644 --- a/.mailmap +++ b/.mailmap @@ -302,7 +302,6 @@ Jan Glauber Jan Kuliga Jarkko Sakkinen Jarkko Sakkinen -Jarkko Sakkinen Jason Gunthorpe Jason Gunthorpe Jason Gunthorpe -- cgit v1.2.3-59-g8ed1b From ee87af982405baec02a188f41f1a141eb3fbdf01 Mon Sep 17 00:00:00 2001 From: Hyeonggon Yoo <42.hyeyoo@gmail.com> Date: Wed, 19 Feb 2025 16:17:02 +0900 Subject: MAINTAINERS: mailmap: update Hyeonggon's name and email address Update my Korean name and personal email address to my English name and Oracle email address. Link: https://lkml.kernel.org/r/20250219071702.964344-1-42.hyeyoo@gmail.com Signed-off-by: Hyeonggon Yoo (Oracle) <42.hyeyoo@gmail.com> Acked-by: Lorenzo Stoakes Cc: Dev Jain Signed-off-by: Andrew Morton --- .mailmap | 1 + MAINTAINERS | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/.mailmap b/.mailmap index 87e42ee58685..f88e5e28ce00 100644 --- a/.mailmap +++ b/.mailmap @@ -269,6 +269,7 @@ Hamza Mahfooz Hanjun Guo Hans Verkuil Hans Verkuil +Harry Yoo <42.hyeyoo@gmail.com> Heiko Carstens Heiko Carstens Heiko Stuebner diff --git a/MAINTAINERS b/MAINTAINERS index a666d7e34fd3..7cd3650ca466 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -21812,7 +21812,7 @@ M: Joonsoo Kim M: Andrew Morton M: Vlastimil Babka R: Roman Gushchin -R: Hyeonggon Yoo <42.hyeyoo@gmail.com> +R: Harry Yoo L: linux-mm@kvack.org S: Maintained T: git git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git -- cgit v1.2.3-59-g8ed1b From 4022918876f9a28fe5379e2d7b7840e84f7b56ed Mon Sep 17 00:00:00 2001 From: Brendan Jackman Date: Thu, 20 Feb 2025 12:23:40 +0000 Subject: scripts/gdb: add $lx_per_cpu_ptr() We currently have $lx_per_cpu() which works fine for stuff that kernel code would access via per_cpu(). But this doesn't work for stuff that kernel code accesses via per_cpu_ptr(): (gdb) p $lx_per_cpu(node_data[1].node_zones[2]->per_cpu_pageset) Cannot access memory at address 0xffff11105fbd6c28 This is because we take the address of the pointer and use that as the offset, instead of using the stored value. Add a GDB version that mirrors the kernel API, which uses the pointer value. To be consistent with per_cpu_ptr(), we need to return the pointer value instead of dereferencing it for the user. Therefore, move the existing dereference out of the per_cpu() Python helper and do that only in the $lx_per_cpu() implementation. Link: https://lkml.kernel.org/r/20250220-lx-per-cpu-ptr-v2-1-945dee8d8d38@google.com Signed-off-by: Brendan Jackman Reviewed-by: Jan Kiszka Cc: Florian Rommel Cc: Kieran Bingham Signed-off-by: Andrew Morton --- scripts/gdb/linux/cpus.py | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/scripts/gdb/linux/cpus.py b/scripts/gdb/linux/cpus.py index 13eb8b3901b8..1a50a4195def 100644 --- a/scripts/gdb/linux/cpus.py +++ b/scripts/gdb/linux/cpus.py @@ -46,7 +46,7 @@ def per_cpu(var_ptr, cpu): # !CONFIG_SMP case offset = 0 pointer = var_ptr.cast(utils.get_long_type()) + offset - return pointer.cast(var_ptr.type).dereference() + return pointer.cast(var_ptr.type) cpu_mask = {} @@ -149,11 +149,29 @@ Note that VAR has to be quoted as string.""" super(PerCpu, self).__init__("lx_per_cpu") def invoke(self, var, cpu=-1): - return per_cpu(var.address, cpu) + return per_cpu(var.address, cpu).dereference() PerCpu() + +class PerCpuPtr(gdb.Function): + """Return per-cpu pointer. + +$lx_per_cpu_ptr("VAR"[, CPU]): Return the per-cpu pointer called VAR for the +given CPU number. If CPU is omitted, the CPU of the current context is used. +Note that VAR has to be quoted as string.""" + + def __init__(self): + super(PerCpuPtr, self).__init__("lx_per_cpu_ptr") + + def invoke(self, var, cpu=-1): + return per_cpu(var, cpu) + + +PerCpuPtr() + + def get_current_task(cpu): task_ptr_type = task_type.get_type().pointer() -- cgit v1.2.3-59-g8ed1b From 758502918e77323bf999a3eddd71466bf6135173 Mon Sep 17 00:00:00 2001 From: Zijun Hu Date: Fri, 21 Feb 2025 05:02:21 -0800 Subject: rhashtable: remove needless return in three void APIs Remove needless 'return' in the following void APIs: rhltable_walk_enter() rhltable_free_and_destroy() rhltable_destroy() Since both the API and callee involved are void functions. Link: https://lkml.kernel.org/r/20250221-rmv_return-v1-16-cc8dff275827@quicinc.com Signed-off-by: Zijun Hu Signed-off-by: Andrew Morton --- include/linux/rhashtable.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h index 8463a128e2f4..6c85b28ea30b 100644 --- a/include/linux/rhashtable.h +++ b/include/linux/rhashtable.h @@ -1259,7 +1259,7 @@ static inline int rhashtable_replace_fast( static inline void rhltable_walk_enter(struct rhltable *hlt, struct rhashtable_iter *iter) { - return rhashtable_walk_enter(&hlt->ht, iter); + rhashtable_walk_enter(&hlt->ht, iter); } /** @@ -1275,12 +1275,12 @@ static inline void rhltable_free_and_destroy(struct rhltable *hlt, void *arg), void *arg) { - return rhashtable_free_and_destroy(&hlt->ht, free_fn, arg); + rhashtable_free_and_destroy(&hlt->ht, free_fn, arg); } static inline void rhltable_destroy(struct rhltable *hlt) { - return rhltable_free_and_destroy(hlt, NULL, NULL); + rhltable_free_and_destroy(hlt, NULL, NULL); } #endif /* _LINUX_RHASHTABLE_H */ -- cgit v1.2.3-59-g8ed1b From ede7cd607a1d206b9579264a7405d78316b2d7fd Mon Sep 17 00:00:00 2001 From: Zijun Hu Date: Fri, 21 Feb 2025 05:02:07 -0800 Subject: cpu: remove needless return in void API suspend_enable_secondary_cpus() Remove needless 'return' in void API suspend_enable_secondary_cpus() since both the API and thaw_secondary_cpus() are void functions. Link: https://lkml.kernel.org/r/20250221-rmv_return-v1-2-cc8dff275827@quicinc.com Signed-off-by: Zijun Hu Signed-off-by: Andrew Morton --- include/linux/cpu.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 6a0a8f1c7c90..e3049543008b 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -148,7 +148,7 @@ static inline int suspend_disable_secondary_cpus(void) } static inline void suspend_enable_secondary_cpus(void) { - return thaw_secondary_cpus(); + thaw_secondary_cpus(); } #else /* !CONFIG_PM_SLEEP_SMP */ -- cgit v1.2.3-59-g8ed1b From 15c200eaf569f804ba684cefbae437a6c731ba95 Mon Sep 17 00:00:00 2001 From: Easwar Hariharan Date: Tue, 25 Feb 2025 20:17:15 +0000 Subject: coccinelle: misc: secs_to_jiffies: Patch expressions too Patch series "Converge on using secs_to_jiffies() part two", v3. This is the second series that converts users of msecs_to_jiffies() that either use the multiply pattern of either of: - msecs_to_jiffies(N*1000) or - msecs_to_jiffies(N*MSEC_PER_SEC) where N is a constant or an expression, to avoid the multiplication. The conversion is made with Coccinelle with the secs_to_jiffies() script in scripts/coccinelle/misc. Attention is paid to what the best change can be rather than restricting to what the tool provides. This patch (of 16): Teach the script to suggest conversions for timeout patterns where the arguments to msecs_to_jiffies() are expressions as well. Link: https://lkml.kernel.org/r/20250225-converge-secs-to-jiffies-part-two-v3-0-a43967e36c88@linux.microsoft.com Link: https://lkml.kernel.org/r/20250225-converge-secs-to-jiffies-part-two-v3-1-a43967e36c88@linux.microsoft.com Signed-off-by: Easwar Hariharan Cc: Mark Brown Cc: Carlos Maiolino Cc: Chris Mason Cc: Christoph Hellwig Cc: Damien Le Maol Cc: "Darrick J. Wong" Cc: David Sterba Cc: Dick Kennedy Cc: Dongsheng Yang Cc: Fabio Estevam Cc: Frank Li Cc: Hans de Goede Cc: Henrique de Moraes Holschuh Cc: Ilpo Jarvinen Cc: Ilya Dryomov Cc: James Bottomley Cc: James Smart Cc: Jaroslav Kysela Cc: Jason Gunthorpe Cc: Jens Axboe Cc: Josef Bacik Cc: Julia Lawall Cc: Kalesh Anakkur Purayil Cc: Keith Busch Cc: Leon Romanovsky Cc: "Martin K. Petersen" Cc: Nicolas Palix Cc: Niklas Cassel Cc: Oded Gabbay Cc: Takashi Iwai Cc: Sagi Grimberg Cc: Sascha Hauer Cc: Sebastian Reichel Cc: Selvin Thyparampil Xavier Cc: Shawn Guo Cc: Shyam-sundar S-k Cc: Takashi Iwai Cc: Xiubo Li Cc: Carlos Maiolino Cc: Marc Kleine-Budde Signed-off-by: Andrew Morton --- scripts/coccinelle/misc/secs_to_jiffies.cocci | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/scripts/coccinelle/misc/secs_to_jiffies.cocci b/scripts/coccinelle/misc/secs_to_jiffies.cocci index 8bbb2884ea5d..416f348174ca 100644 --- a/scripts/coccinelle/misc/secs_to_jiffies.cocci +++ b/scripts/coccinelle/misc/secs_to_jiffies.cocci @@ -20,3 +20,13 @@ virtual patch - msecs_to_jiffies(C * MSEC_PER_SEC) + secs_to_jiffies(C) + +@depends on patch@ expression E; @@ + +- msecs_to_jiffies(E * 1000) ++ secs_to_jiffies(E) + +@depends on patch@ expression E; @@ + +- msecs_to_jiffies(E * MSEC_PER_SEC) ++ secs_to_jiffies(E) -- cgit v1.2.3-59-g8ed1b From 84c34d0105877836a1c000b97bd6025d20f390e8 Mon Sep 17 00:00:00 2001 From: Easwar Hariharan Date: Tue, 25 Feb 2025 20:17:16 +0000 Subject: scsi: lpfc: convert timeouts to secs_to_jiffies() Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced secs_to_jiffies(). As the value here is a multiple of 1000, use secs_to_jiffies() instead of msecs_to_jiffies() to avoid the multiplication This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with the following Coccinelle rules: @depends on patch@ expression E; @@ -msecs_to_jiffies(E * 1000) +secs_to_jiffies(E) -msecs_to_jiffies(E * MSEC_PER_SEC) +secs_to_jiffies(E) While here, convert some timeouts that are denominated in seconds manually. [akpm@linux-foundation.org: fix build] Link: https://lkml.kernel.org/r/20250225-converge-secs-to-jiffies-part-two-v3-2-a43967e36c88@linux.microsoft.com Signed-off-by: Easwar Hariharan Cc: Carlos Maiolino Cc: Carlos Maiolino Cc: Chris Mason Cc: Christoph Hellwig Cc: Damien Le Maol Cc: "Darrick J. Wong" Cc: David Sterba Cc: Dick Kennedy Cc: Dongsheng Yang Cc: Fabio Estevam Cc: Frank Li Cc: Hans de Goede Cc: Henrique de Moraes Holschuh Cc: Ilpo Jarvinen Cc: Ilya Dryomov Cc: James Bottomley Cc: James Smart Cc: Jaroslav Kysela Cc: Jason Gunthorpe Cc: Jens Axboe Cc: Josef Bacik Cc: Julia Lawall Cc: Kalesh Anakkur Purayil Cc: Keith Busch Cc: Leon Romanovsky Cc: Marc Kleine-Budde Cc: Mark Brown Cc: "Martin K. Petersen" Cc: Nicolas Palix Cc: Niklas Cassel Cc: Oded Gabbay Cc: Sagi Grimberg Cc: Sascha Hauer Cc: Sebastian Reichel Cc: Selvin Thyparampil Xavier Cc: Shawn Guo Cc: Shyam-sundar S-k Cc: Takashi Iwai Cc: Takashi Iwai Cc: Xiubo Li Signed-off-by: Andrew Morton --- drivers/scsi/lpfc/lpfc.h | 3 +-- drivers/scsi/lpfc/lpfc_els.c | 11 +++++------ drivers/scsi/lpfc/lpfc_hbadisc.c | 2 +- drivers/scsi/lpfc/lpfc_init.c | 10 +++++----- drivers/scsi/lpfc/lpfc_scsi.c | 12 +++++------- drivers/scsi/lpfc/lpfc_sli.c | 41 ++++++++++++++++------------------------ drivers/scsi/lpfc/lpfc_vport.c | 2 +- 7 files changed, 34 insertions(+), 47 deletions(-) diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h index e5a9c5a323f8..19e227f8b398 100644 --- a/drivers/scsi/lpfc/lpfc.h +++ b/drivers/scsi/lpfc/lpfc.h @@ -74,8 +74,7 @@ struct lpfc_sli2_slim; * queue depths when there are driver resource error or Firmware * resource error. */ -/* 1 Second */ -#define QUEUE_RAMP_DOWN_INTERVAL (msecs_to_jiffies(1000 * 1)) +#define QUEUE_RAMP_DOWN_INTERVAL (secs_to_jiffies(1)) /* Number of exchanges reserved for discovery to complete */ #define LPFC_DISC_IOCB_BUFF_COUNT 20 diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c index 1d7db49a8fe4..fcf8186ad5e3 100644 --- a/drivers/scsi/lpfc/lpfc_els.c +++ b/drivers/scsi/lpfc/lpfc_els.c @@ -8045,8 +8045,7 @@ lpfc_els_rcv_rscn(struct lpfc_vport *vport, struct lpfc_iocbq *cmdiocb, if (test_bit(FC_DISC_TMO, &vport->fc_flag)) { tmo = ((phba->fc_ratov * 3) + 3); mod_timer(&vport->fc_disctmo, - jiffies + - msecs_to_jiffies(1000 * tmo)); + jiffies + secs_to_jiffies(tmo)); } return 0; } @@ -8081,7 +8080,7 @@ lpfc_els_rcv_rscn(struct lpfc_vport *vport, struct lpfc_iocbq *cmdiocb, if (test_bit(FC_DISC_TMO, &vport->fc_flag)) { tmo = ((phba->fc_ratov * 3) + 3); mod_timer(&vport->fc_disctmo, - jiffies + msecs_to_jiffies(1000 * tmo)); + jiffies + secs_to_jiffies(tmo)); } if ((rscn_cnt < FC_MAX_HOLD_RSCN) && !test_bit(FC_RSCN_DISCOVERY, &vport->fc_flag)) { @@ -9511,7 +9510,7 @@ lpfc_els_timeout_handler(struct lpfc_vport *vport) if (!list_empty(&pring->txcmplq)) if (!test_bit(FC_UNLOADING, &phba->pport->load_flag)) mod_timer(&vport->els_tmofunc, - jiffies + msecs_to_jiffies(1000 * timeout)); + jiffies + secs_to_jiffies(timeout)); } /** @@ -10899,7 +10898,7 @@ lpfc_do_scr_ns_plogi(struct lpfc_hba *phba, struct lpfc_vport *vport) "3334 Delay fc port discovery for %d secs\n", phba->fc_ratov); mod_timer(&vport->delayed_disc_tmo, - jiffies + msecs_to_jiffies(1000 * phba->fc_ratov)); + jiffies + secs_to_jiffies(phba->fc_ratov)); return; } @@ -11156,7 +11155,7 @@ lpfc_retry_pport_discovery(struct lpfc_hba *phba) if (!ndlp) return; - mod_timer(&ndlp->nlp_delayfunc, jiffies + msecs_to_jiffies(1000)); + mod_timer(&ndlp->nlp_delayfunc, jiffies + secs_to_jiffies(1)); set_bit(NLP_DELAY_TMO, &ndlp->nlp_flag); ndlp->nlp_last_elscmd = ELS_CMD_FLOGI; phba->pport->port_state = LPFC_FLOGI; diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c index 36e66df36a18..0c390d703ee3 100644 --- a/drivers/scsi/lpfc/lpfc_hbadisc.c +++ b/drivers/scsi/lpfc/lpfc_hbadisc.c @@ -4973,7 +4973,7 @@ lpfc_set_disctmo(struct lpfc_vport *vport) tmo, vport->port_state, vport->fc_flag); } - mod_timer(&vport->fc_disctmo, jiffies + msecs_to_jiffies(1000 * tmo)); + mod_timer(&vport->fc_disctmo, jiffies + secs_to_jiffies(tmo)); set_bit(FC_DISC_TMO, &vport->fc_flag); /* Start Discovery Timer state */ diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c index bcadf11414c8..aaf4004454ab 100644 --- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -595,7 +595,7 @@ lpfc_config_port_post(struct lpfc_hba *phba) /* Set up ring-0 (ELS) timer */ timeout = phba->fc_ratov * 2; mod_timer(&vport->els_tmofunc, - jiffies + msecs_to_jiffies(1000 * timeout)); + jiffies + secs_to_jiffies(timeout)); /* Set up heart beat (HB) timer */ mod_timer(&phba->hb_tmofunc, jiffies + secs_to_jiffies(LPFC_HB_MBOX_INTERVAL)); @@ -604,7 +604,7 @@ lpfc_config_port_post(struct lpfc_hba *phba) phba->last_completion_time = jiffies; /* Set up error attention (ERATT) polling timer */ mod_timer(&phba->eratt_poll, - jiffies + msecs_to_jiffies(1000 * phba->eratt_poll_interval)); + jiffies + secs_to_jiffies(phba->eratt_poll_interval)); if (test_bit(LINK_DISABLED, &phba->hba_flag)) { lpfc_printf_log(phba, KERN_ERR, LOG_TRACE_EVENT, @@ -3361,8 +3361,8 @@ lpfc_block_mgmt_io(struct lpfc_hba *phba, int mbx_action) /* Determine how long we might wait for the active mailbox * command to be gracefully completed by firmware. */ - timeout = msecs_to_jiffies(lpfc_mbox_tmo_val(phba, - phba->sli.mbox_active) * 1000) + jiffies; + timeout = secs_to_jiffies(lpfc_mbox_tmo_val(phba, + phba->sli.mbox_active)) + jiffies; } spin_unlock_irqrestore(&phba->hbalock, iflag); @@ -6909,7 +6909,7 @@ lpfc_sli4_async_fip_evt(struct lpfc_hba *phba, * re-instantiate the Vlink using FDISC. */ mod_timer(&ndlp->nlp_delayfunc, - jiffies + msecs_to_jiffies(1000)); + jiffies + secs_to_jiffies(1)); set_bit(NLP_DELAY_TMO, &ndlp->nlp_flag); ndlp->nlp_last_elscmd = ELS_CMD_FDISC; vport->port_state = LPFC_FDISC; diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c index 055ed632c14d..f0158fc00f78 100644 --- a/drivers/scsi/lpfc/lpfc_scsi.c +++ b/drivers/scsi/lpfc/lpfc_scsi.c @@ -5645,9 +5645,8 @@ wait_for_cmpl: * cmd_flag is set to LPFC_DRIVER_ABORTED before we wait * for abort to complete. */ - wait_event_timeout(waitq, - (lpfc_cmd->pCmd != cmnd), - msecs_to_jiffies(2*vport->cfg_devloss_tmo*1000)); + wait_event_timeout(waitq, (lpfc_cmd->pCmd != cmnd), + secs_to_jiffies(2*vport->cfg_devloss_tmo)); spin_lock(&lpfc_cmd->buf_lock); @@ -5911,7 +5910,7 @@ lpfc_chk_tgt_mapped(struct lpfc_vport *vport, struct fc_rport *rport) * If target is not in a MAPPED state, delay until * target is rediscovered or devloss timeout expires. */ - later = msecs_to_jiffies(2 * vport->cfg_devloss_tmo * 1000) + jiffies; + later = secs_to_jiffies(2 * vport->cfg_devloss_tmo) + jiffies; while (time_after(later, jiffies)) { if (!pnode) return FAILED; @@ -5957,7 +5956,7 @@ lpfc_reset_flush_io_context(struct lpfc_vport *vport, uint16_t tgt_id, lpfc_sli_abort_taskmgmt(vport, &phba->sli.sli3_ring[LPFC_FCP_RING], tgt_id, lun_id, context); - later = msecs_to_jiffies(2 * vport->cfg_devloss_tmo * 1000) + jiffies; + later = secs_to_jiffies(2 * vport->cfg_devloss_tmo) + jiffies; while (time_after(later, jiffies) && cnt) { schedule_timeout_uninterruptible(msecs_to_jiffies(20)); cnt = lpfc_sli_sum_iocb(vport, tgt_id, lun_id, context); @@ -6137,8 +6136,7 @@ lpfc_target_reset_handler(struct scsi_cmnd *cmnd) wait_event_timeout(waitq, !test_bit(NLP_WAIT_FOR_LOGO, &pnode->save_flags), - msecs_to_jiffies(dev_loss_tmo * - 1000)); + secs_to_jiffies(dev_loss_tmo)); if (test_and_clear_bit(NLP_WAIT_FOR_LOGO, &pnode->save_flags)) diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c index 3fd9723cd271..af11b4ef13df 100644 --- a/drivers/scsi/lpfc/lpfc_sli.c +++ b/drivers/scsi/lpfc/lpfc_sli.c @@ -1025,7 +1025,7 @@ lpfc_handle_rrq_active(struct lpfc_hba *phba) LIST_HEAD(send_rrq); clear_bit(HBA_RRQ_ACTIVE, &phba->hba_flag); - next_time = jiffies + msecs_to_jiffies(1000 * (phba->fc_ratov + 1)); + next_time = jiffies + secs_to_jiffies(phba->fc_ratov + 1); spin_lock_irqsave(&phba->rrq_list_lock, iflags); list_for_each_entry_safe(rrq, nextrrq, &phba->active_rrq_list, list) { @@ -1208,8 +1208,7 @@ lpfc_set_rrq_active(struct lpfc_hba *phba, struct lpfc_nodelist *ndlp, else rrq->send_rrq = 0; rrq->xritag = xritag; - rrq->rrq_stop_time = jiffies + - msecs_to_jiffies(1000 * (phba->fc_ratov + 1)); + rrq->rrq_stop_time = jiffies + secs_to_jiffies(phba->fc_ratov + 1); rrq->nlp_DID = ndlp->nlp_DID; rrq->vport = ndlp->vport; rrq->rxid = rxid; @@ -1736,8 +1735,7 @@ lpfc_sli_ringtxcmpl_put(struct lpfc_hba *phba, struct lpfc_sli_ring *pring, BUG_ON(!piocb->vport); if (!test_bit(FC_UNLOADING, &piocb->vport->load_flag)) mod_timer(&piocb->vport->els_tmofunc, - jiffies + - msecs_to_jiffies(1000 * (phba->fc_ratov << 1))); + jiffies + secs_to_jiffies(phba->fc_ratov << 1)); } return 0; @@ -3956,8 +3954,7 @@ void lpfc_poll_eratt(struct timer_list *t) else /* Restart the timer for next eratt poll */ mod_timer(&phba->eratt_poll, - jiffies + - msecs_to_jiffies(1000 * phba->eratt_poll_interval)); + jiffies + secs_to_jiffies(phba->eratt_poll_interval)); return; } @@ -9008,7 +9005,7 @@ lpfc_sli4_hba_setup(struct lpfc_hba *phba) /* Start the ELS watchdog timer */ mod_timer(&vport->els_tmofunc, - jiffies + msecs_to_jiffies(1000 * (phba->fc_ratov * 2))); + jiffies + secs_to_jiffies(phba->fc_ratov * 2)); /* Start heart beat timer */ mod_timer(&phba->hb_tmofunc, @@ -9027,7 +9024,7 @@ lpfc_sli4_hba_setup(struct lpfc_hba *phba) /* Start error attention (ERATT) polling timer */ mod_timer(&phba->eratt_poll, - jiffies + msecs_to_jiffies(1000 * phba->eratt_poll_interval)); + jiffies + secs_to_jiffies(phba->eratt_poll_interval)); /* * The port is ready, set the host's link state to LINK_DOWN @@ -9504,8 +9501,7 @@ lpfc_sli_issue_mbox_s3(struct lpfc_hba *phba, LPFC_MBOXQ_t *pmbox, goto out_not_finished; } /* timeout active mbox command */ - timeout = msecs_to_jiffies(lpfc_mbox_tmo_val(phba, pmbox) * - 1000); + timeout = secs_to_jiffies(lpfc_mbox_tmo_val(phba, pmbox)); mod_timer(&psli->mbox_tmo, jiffies + timeout); } @@ -9629,8 +9625,7 @@ lpfc_sli_issue_mbox_s3(struct lpfc_hba *phba, LPFC_MBOXQ_t *pmbox, drvr_flag); goto out_not_finished; } - timeout = msecs_to_jiffies(lpfc_mbox_tmo_val(phba, pmbox) * - 1000) + jiffies; + timeout = secs_to_jiffies(lpfc_mbox_tmo_val(phba, pmbox)) + jiffies; i = 0; /* Wait for command to complete */ while (((word0 & OWN_CHIP) == OWN_CHIP) || @@ -9756,9 +9751,8 @@ lpfc_sli4_async_mbox_block(struct lpfc_hba *phba) * command to be gracefully completed by firmware. */ if (phba->sli.mbox_active) - timeout = msecs_to_jiffies(lpfc_mbox_tmo_val(phba, - phba->sli.mbox_active) * - 1000) + jiffies; + timeout = secs_to_jiffies(lpfc_mbox_tmo_val(phba, + phba->sli.mbox_active)) + jiffies; spin_unlock_irq(&phba->hbalock); /* Make sure the mailbox is really active */ @@ -9881,8 +9875,7 @@ lpfc_sli4_wait_bmbx_ready(struct lpfc_hba *phba, LPFC_MBOXQ_t *mboxq) } } - timeout = msecs_to_jiffies(lpfc_mbox_tmo_val(phba, mboxq) - * 1000) + jiffies; + timeout = secs_to_jiffies(lpfc_mbox_tmo_val(phba, mboxq)) + jiffies; do { bmbx_reg.word0 = readl(phba->sli4_hba.BMBXregaddr); @@ -10230,7 +10223,7 @@ lpfc_sli4_post_async_mbox(struct lpfc_hba *phba) /* Start timer for the mbox_tmo and log some mailbox post messages */ mod_timer(&psli->mbox_tmo, (jiffies + - msecs_to_jiffies(1000 * lpfc_mbox_tmo_val(phba, mboxq)))); + secs_to_jiffies(lpfc_mbox_tmo_val(phba, mboxq)))); lpfc_printf_log(phba, KERN_INFO, LOG_MBOX | LOG_SLI, "(%d):0355 Mailbox cmd x%x (x%x/x%x) issue Data: " @@ -13159,7 +13152,7 @@ lpfc_sli_issue_iocb_wait(struct lpfc_hba *phba, retval = lpfc_sli_issue_iocb(phba, ring_number, piocb, SLI_IOCB_RET_IOCB); if (retval == IOCB_SUCCESS) { - timeout_req = msecs_to_jiffies(timeout * 1000); + timeout_req = secs_to_jiffies(timeout); timeleft = wait_event_timeout(done_q, lpfc_chk_iocb_flg(phba, piocb, LPFC_IO_WAKE), timeout_req); @@ -13275,8 +13268,7 @@ lpfc_sli_issue_mbox_wait(struct lpfc_hba *phba, LPFC_MBOXQ_t *pmboxq, /* now issue the command */ retval = lpfc_sli_issue_mbox(phba, pmboxq, MBX_NOWAIT); if (retval == MBX_BUSY || retval == MBX_SUCCESS) { - wait_for_completion_timeout(&mbox_done, - msecs_to_jiffies(timeout * 1000)); + wait_for_completion_timeout(&mbox_done, secs_to_jiffies(timeout)); spin_lock_irqsave(&phba->hbalock, flag); pmboxq->ctx_u.mbox_wait = NULL; @@ -13336,9 +13328,8 @@ lpfc_sli_mbox_sys_shutdown(struct lpfc_hba *phba, int mbx_action) * command to be gracefully completed by firmware. */ if (phba->sli.mbox_active) - timeout = msecs_to_jiffies(lpfc_mbox_tmo_val(phba, - phba->sli.mbox_active) * - 1000) + jiffies; + timeout = secs_to_jiffies(lpfc_mbox_tmo_val(phba, + phba->sli.mbox_active)) + jiffies; spin_unlock_irq(&phba->hbalock); /* Enable softirqs again, done with phba->hbalock */ diff --git a/drivers/scsi/lpfc/lpfc_vport.c b/drivers/scsi/lpfc/lpfc_vport.c index 3d70cc517573..cc56a7334319 100644 --- a/drivers/scsi/lpfc/lpfc_vport.c +++ b/drivers/scsi/lpfc/lpfc_vport.c @@ -246,7 +246,7 @@ static void lpfc_discovery_wait(struct lpfc_vport *vport) * fabric RA_TOV value and dev_loss tmo. The driver's * devloss_tmo is 10 giving this loop a 3x multiplier minimally. */ - wait_time_max = msecs_to_jiffies(((phba->fc_ratov * 3) + 3) * 1000); + wait_time_max = secs_to_jiffies((phba->fc_ratov * 3) + 3); wait_time_max += jiffies; start_time = jiffies; while (time_before(jiffies, wait_time_max)) { -- cgit v1.2.3-59-g8ed1b From 78cf56f8832a932ade20b8340a029ace14ac0e98 Mon Sep 17 00:00:00 2001 From: Easwar Hariharan Date: Tue, 25 Feb 2025 20:17:17 +0000 Subject: accel/habanalabs: convert timeouts to secs_to_jiffies() Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced secs_to_jiffies(). As the value here is a multiple of 1000, use secs_to_jiffies() instead of msecs_to_jiffies() to avoid the multiplication This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with the following Coccinelle rules: @depends on patch@ expression E; @@ -msecs_to_jiffies +secs_to_jiffies (E - * \( 1000 \| MSEC_PER_SEC \) ) Link: https://lkml.kernel.org/r/20250225-converge-secs-to-jiffies-part-two-v3-3-a43967e36c88@linux.microsoft.com Signed-off-by: Easwar Hariharan Cc: Carlos Maiolino Cc: Carlos Maiolino Cc: Chris Mason Cc: Christoph Hellwig Cc: Damien Le Maol Cc: "Darrick J. Wong" Cc: David Sterba Cc: Dick Kennedy Cc: Dongsheng Yang Cc: Fabio Estevam Cc: Frank Li Cc: Hans de Goede Cc: Henrique de Moraes Holschuh Cc: Ilpo Jarvinen Cc: Ilya Dryomov Cc: James Bottomley Cc: James Smart Cc: Jaroslav Kysela Cc: Jason Gunthorpe Cc: Jens Axboe Cc: Josef Bacik Cc: Julia Lawall Cc: Kalesh Anakkur Purayil Cc: Keith Busch Cc: Leon Romanovsky Cc: Marc Kleine-Budde Cc: Mark Brown Cc: "Martin K. Petersen" Cc: Nicolas Palix Cc: Niklas Cassel Cc: Oded Gabbay Cc: Sagi Grimberg Cc: Sascha Hauer Cc: Sebastian Reichel Cc: Selvin Thyparampil Xavier Cc: Shawn Guo Cc: Shyam-sundar S-k Cc: Takashi Iwai Cc: Takashi Iwai Cc: Xiubo Li Signed-off-by: Andrew Morton --- drivers/accel/habanalabs/common/command_submission.c | 2 +- drivers/accel/habanalabs/common/debugfs.c | 2 +- drivers/accel/habanalabs/common/device.c | 2 +- drivers/accel/habanalabs/common/habanalabs_drv.c | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/accel/habanalabs/common/command_submission.c b/drivers/accel/habanalabs/common/command_submission.c index 59823e3c3bf7..dee487724918 100644 --- a/drivers/accel/habanalabs/common/command_submission.c +++ b/drivers/accel/habanalabs/common/command_submission.c @@ -2586,7 +2586,7 @@ int hl_cs_ioctl(struct drm_device *ddev, void *data, struct drm_file *file_priv) cs_seq = args->in.seq; timeout = flags & HL_CS_FLAGS_CUSTOM_TIMEOUT - ? msecs_to_jiffies(args->in.timeout * 1000) + ? secs_to_jiffies(args->in.timeout) : hpriv->hdev->timeout_jiffies; switch (cs_type) { diff --git a/drivers/accel/habanalabs/common/debugfs.c b/drivers/accel/habanalabs/common/debugfs.c index ca7677293a55..4b391807e5f2 100644 --- a/drivers/accel/habanalabs/common/debugfs.c +++ b/drivers/accel/habanalabs/common/debugfs.c @@ -1403,7 +1403,7 @@ static ssize_t hl_timeout_locked_write(struct file *f, const char __user *buf, return rc; if (value) - hdev->timeout_jiffies = msecs_to_jiffies(value * 1000); + hdev->timeout_jiffies = secs_to_jiffies(value); else hdev->timeout_jiffies = MAX_SCHEDULE_TIMEOUT; diff --git a/drivers/accel/habanalabs/common/device.c b/drivers/accel/habanalabs/common/device.c index 30277ae410d4..68eebed3b050 100644 --- a/drivers/accel/habanalabs/common/device.c +++ b/drivers/accel/habanalabs/common/device.c @@ -2091,7 +2091,7 @@ int hl_device_cond_reset(struct hl_device *hdev, u32 flags, u64 event_mask) dev_dbg(hdev->dev, "Device is going to be hard-reset in %u sec unless being released\n", hdev->device_release_watchdog_timeout_sec); schedule_delayed_work(&hdev->device_release_watchdog_work.reset_work, - msecs_to_jiffies(hdev->device_release_watchdog_timeout_sec * 1000)); + secs_to_jiffies(hdev->device_release_watchdog_timeout_sec)); hdev->reset_info.watchdog_active = 1; out: spin_unlock(&hdev->reset_info.lock); diff --git a/drivers/accel/habanalabs/common/habanalabs_drv.c b/drivers/accel/habanalabs/common/habanalabs_drv.c index 596c52e8aa26..0035748f3228 100644 --- a/drivers/accel/habanalabs/common/habanalabs_drv.c +++ b/drivers/accel/habanalabs/common/habanalabs_drv.c @@ -386,7 +386,7 @@ static int fixup_device_params(struct hl_device *hdev) hdev->fw_comms_poll_interval_usec = HL_FW_STATUS_POLL_INTERVAL_USEC; if (tmp_timeout) - hdev->timeout_jiffies = msecs_to_jiffies(tmp_timeout * MSEC_PER_SEC); + hdev->timeout_jiffies = secs_to_jiffies(tmp_timeout); else hdev->timeout_jiffies = MAX_SCHEDULE_TIMEOUT; -- cgit v1.2.3-59-g8ed1b From 344825e17cfa9a7506ad4fd4fc62df6181162c4a Mon Sep 17 00:00:00 2001 From: Easwar Hariharan Date: Tue, 25 Feb 2025 20:17:18 +0000 Subject: ALSA: ac97: convert timeouts to secs_to_jiffies() Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced secs_to_jiffies(). As the value here is a multiple of 1000, use secs_to_jiffies() instead of msecs_to_jiffies() to avoid the multiplication This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with the following Coccinelle rules: @depends on patch@ expression E; @@ -msecs_to_jiffies +secs_to_jiffies (E - * \( 1000 \| MSEC_PER_SEC \) ) Link: https://lkml.kernel.org/r/20250225-converge-secs-to-jiffies-part-two-v3-4-a43967e36c88@linux.microsoft.com Signed-off-by: Easwar Hariharan Acked-by: Takashi Iwai Cc: Carlos Maiolino Cc: Carlos Maiolino Cc: Chris Mason Cc: Christoph Hellwig Cc: Damien Le Maol Cc: "Darrick J. Wong" Cc: David Sterba Cc: Dick Kennedy Cc: Dongsheng Yang Cc: Fabio Estevam Cc: Frank Li Cc: Hans de Goede Cc: Henrique de Moraes Holschuh Cc: Ilpo Jarvinen Cc: Ilya Dryomov Cc: James Bottomley Cc: James Smart Cc: Jaroslav Kysela Cc: Jason Gunthorpe Cc: Jens Axboe Cc: Josef Bacik Cc: Julia Lawall Cc: Kalesh Anakkur Purayil Cc: Keith Busch Cc: Leon Romanovsky Cc: Marc Kleine-Budde Cc: Mark Brown Cc: "Martin K. Petersen" Cc: Nicolas Palix Cc: Niklas Cassel Cc: Oded Gabbay Cc: Sagi Grimberg Cc: Sascha Hauer Cc: Sebastian Reichel Cc: Selvin Thyparampil Xavier Cc: Shawn Guo Cc: Shyam-sundar S-k Cc: Takashi Iwai Cc: Xiubo Li Signed-off-by: Andrew Morton --- sound/pci/ac97/ac97_codec.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/sound/pci/ac97/ac97_codec.c b/sound/pci/ac97/ac97_codec.c index 6e710dce5c60..88ac37739b76 100644 --- a/sound/pci/ac97/ac97_codec.c +++ b/sound/pci/ac97/ac97_codec.c @@ -2461,8 +2461,7 @@ int snd_ac97_update_power(struct snd_ac97 *ac97, int reg, int powerup) * (for avoiding loud click noises for many (OSS) apps * that open/close frequently) */ - schedule_delayed_work(&ac97->power_work, - msecs_to_jiffies(power_save * 1000)); + schedule_delayed_work(&ac97->power_work, secs_to_jiffies(power_save)); else { cancel_delayed_work(&ac97->power_work); update_power_regs(ac97); -- cgit v1.2.3-59-g8ed1b From 8ef9019ed2acdf74dce7c8b5618bf2bdabb47004 Mon Sep 17 00:00:00 2001 From: Easwar Hariharan Date: Tue, 25 Feb 2025 20:17:19 +0000 Subject: btrfs: convert timeouts to secs_to_jiffies() Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced secs_to_jiffies(). As the value here is a multiple of 1000, use secs_to_jiffies() instead of msecs_to_jiffies() to avoid the multiplication This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with the following Coccinelle rules: @depends on patch@ expression E; @@ -msecs_to_jiffies +secs_to_jiffies (E - * \( 1000 \| MSEC_PER_SEC \) ) Link: https://lkml.kernel.org/r/20250225-converge-secs-to-jiffies-part-two-v3-5-a43967e36c88@linux.microsoft.com Signed-off-by: Easwar Hariharan Acked-by: David Sterba Cc: Carlos Maiolino Cc: Carlos Maiolino Cc: Chris Mason Cc: Christoph Hellwig Cc: Damien Le Maol Cc: "Darrick J. Wong" Cc: Dick Kennedy Cc: Dongsheng Yang Cc: Fabio Estevam Cc: Frank Li Cc: Hans de Goede Cc: Henrique de Moraes Holschuh Cc: Ilpo Jarvinen Cc: Ilya Dryomov Cc: James Bottomley Cc: James Smart Cc: Jaroslav Kysela Cc: Jason Gunthorpe Cc: Jens Axboe Cc: Josef Bacik Cc: Julia Lawall Cc: Kalesh Anakkur Purayil Cc: Keith Busch Cc: Leon Romanovsky Cc: Marc Kleine-Budde Cc: Mark Brown Cc: "Martin K. Petersen" Cc: Nicolas Palix Cc: Niklas Cassel Cc: Oded Gabbay Cc: Sagi Grimberg Cc: Sascha Hauer Cc: Sebastian Reichel Cc: Selvin Thyparampil Xavier Cc: Shawn Guo Cc: Shyam-sundar S-k Cc: Takashi Iwai Cc: Takashi Iwai Cc: Xiubo Li Signed-off-by: Andrew Morton --- fs/btrfs/disk-io.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index f09db62e61a1..ed2772d27919 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1568,7 +1568,7 @@ static int transaction_kthread(void *arg) do { cannot_commit = false; - delay = msecs_to_jiffies(fs_info->commit_interval * 1000); + delay = secs_to_jiffies(fs_info->commit_interval); mutex_lock(&fs_info->transaction_kthread_mutex); spin_lock(&fs_info->trans_lock); @@ -1583,9 +1583,9 @@ static int transaction_kthread(void *arg) cur->state < TRANS_STATE_COMMIT_PREP && delta < fs_info->commit_interval) { spin_unlock(&fs_info->trans_lock); - delay -= msecs_to_jiffies((delta - 1) * 1000); + delay -= secs_to_jiffies(delta - 1); delay = min(delay, - msecs_to_jiffies(fs_info->commit_interval * 1000)); + secs_to_jiffies(fs_info->commit_interval)); goto sleep; } transid = cur->transid; -- cgit v1.2.3-59-g8ed1b From d5834614a4d97b2c88be83f736602b7c3dbc3a4d Mon Sep 17 00:00:00 2001 From: Easwar Hariharan Date: Tue, 25 Feb 2025 20:17:22 +0000 Subject: ata: libata-zpodd: convert timeouts to secs_to_jiffies() Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced secs_to_jiffies(). As the value here is a multiple of 1000, use secs_to_jiffies() instead of msecs_to_jiffies() to avoid the multiplication This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with the following Coccinelle rules: @depends on patch@ expression E; @@ -msecs_to_jiffies +secs_to_jiffies (E - * \( 1000 \| MSEC_PER_SEC \) ) Link: https://lkml.kernel.org/r/20250225-converge-secs-to-jiffies-part-two-v3-8-a43967e36c88@linux.microsoft.com Signed-off-by: Easwar Hariharan Acked-by: Damien Le Moal Cc: Carlos Maiolino Cc: Carlos Maiolino Cc: Chris Mason Cc: Christoph Hellwig Cc: "Darrick J. Wong" Cc: David Sterba Cc: Dick Kennedy Cc: Dongsheng Yang Cc: Fabio Estevam Cc: Frank Li Cc: Hans de Goede Cc: Henrique de Moraes Holschuh Cc: Ilpo Jarvinen Cc: Ilya Dryomov Cc: James Bottomley Cc: James Smart Cc: Jaroslav Kysela Cc: Jason Gunthorpe Cc: Jens Axboe Cc: Josef Bacik Cc: Julia Lawall Cc: Kalesh Anakkur Purayil Cc: Keith Busch Cc: Leon Romanovsky Cc: Marc Kleine-Budde Cc: Mark Brown Cc: "Martin K. Petersen" Cc: Nicolas Palix Cc: Niklas Cassel Cc: Oded Gabbay Cc: Sagi Grimberg Cc: Sascha Hauer Cc: Sebastian Reichel Cc: Selvin Thyparampil Xavier Cc: Shawn Guo Cc: Shyam-sundar S-k Cc: Takashi Iwai Cc: Takashi Iwai Cc: Xiubo Li Signed-off-by: Andrew Morton --- drivers/ata/libata-zpodd.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/ata/libata-zpodd.c b/drivers/ata/libata-zpodd.c index 4b83b517caec..799531218ea2 100644 --- a/drivers/ata/libata-zpodd.c +++ b/drivers/ata/libata-zpodd.c @@ -160,8 +160,7 @@ void zpodd_on_suspend(struct ata_device *dev) return; } - expires = zpodd->last_ready + - msecs_to_jiffies(zpodd_poweroff_delay * 1000); + expires = zpodd->last_ready + secs_to_jiffies(zpodd_poweroff_delay); if (time_before(jiffies, expires)) return; -- cgit v1.2.3-59-g8ed1b From a877cd2a6487f9f3ba8aa701d4d780efdc034596 Mon Sep 17 00:00:00 2001 From: Easwar Hariharan Date: Tue, 25 Feb 2025 20:17:23 +0000 Subject: xfs: convert timeouts to secs_to_jiffies() Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced secs_to_jiffies(). As the value here is a multiple of 1000, use secs_to_jiffies() instead of msecs_to_jiffies() to avoid the multiplication This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with the following Coccinelle rules: @depends on patch@ expression E; @@ -msecs_to_jiffies +secs_to_jiffies (E - * \( 1000 \| MSEC_PER_SEC \) ) Link: https://lkml.kernel.org/r/20250225-converge-secs-to-jiffies-part-two-v3-9-a43967e36c88@linux.microsoft.com Signed-off-by: Easwar Hariharan Acked-by: Darrick J. Wong Reviewed-by: Carlos Maiolino Cc: Carlos Maiolino Cc: Chris Mason Cc: Christoph Hellwig Cc: Damien Le Maol Cc: David Sterba Cc: Dick Kennedy Cc: Dongsheng Yang Cc: Fabio Estevam Cc: Frank Li Cc: Hans de Goede Cc: Henrique de Moraes Holschuh Cc: Ilpo Jarvinen Cc: Ilya Dryomov Cc: James Bottomley Cc: James Smart Cc: Jaroslav Kysela Cc: Jason Gunthorpe Cc: Jens Axboe Cc: Josef Bacik Cc: Julia Lawall Cc: Kalesh Anakkur Purayil Cc: Keith Busch Cc: Leon Romanovsky Cc: Marc Kleine-Budde Cc: Mark Brown Cc: "Martin K. Petersen" Cc: Nicolas Palix Cc: Niklas Cassel Cc: Oded Gabbay Cc: Sagi Grimberg Cc: Sascha Hauer Cc: Sebastian Reichel Cc: Selvin Thyparampil Xavier Cc: Shawn Guo Cc: Shyam-sundar S-k Cc: Takashi Iwai Cc: Takashi Iwai Cc: Xiubo Li Signed-off-by: Andrew Morton --- fs/xfs/xfs_icache.c | 2 +- fs/xfs/xfs_sysfs.c | 8 ++++---- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index 7b6c026d01a1..7a1feb8dc21f 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -230,7 +230,7 @@ xfs_blockgc_queue( rcu_read_lock(); if (radix_tree_tagged(&pag->pag_ici_root, XFS_ICI_BLOCKGC_TAG)) queue_delayed_work(mp->m_blockgc_wq, &pag->pag_blockgc_work, - msecs_to_jiffies(xfs_blockgc_secs * 1000)); + secs_to_jiffies(xfs_blockgc_secs)); rcu_read_unlock(); } diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c index 60cb5318fdae..c9ca8cd8a2f2 100644 --- a/fs/xfs/xfs_sysfs.c +++ b/fs/xfs/xfs_sysfs.c @@ -568,8 +568,8 @@ retry_timeout_seconds_store( if (val == -1) cfg->retry_timeout = XFS_ERR_RETRY_FOREVER; else { - cfg->retry_timeout = msecs_to_jiffies(val * MSEC_PER_SEC); - ASSERT(msecs_to_jiffies(val * MSEC_PER_SEC) < LONG_MAX); + cfg->retry_timeout = secs_to_jiffies(val); + ASSERT(secs_to_jiffies(val) < LONG_MAX); } return count; } @@ -686,8 +686,8 @@ xfs_error_sysfs_init_class( if (init[i].retry_timeout == XFS_ERR_RETRY_FOREVER) cfg->retry_timeout = XFS_ERR_RETRY_FOREVER; else - cfg->retry_timeout = msecs_to_jiffies( - init[i].retry_timeout * MSEC_PER_SEC); + cfg->retry_timeout = + secs_to_jiffies(init[i].retry_timeout); } return 0; -- cgit v1.2.3-59-g8ed1b From 2c54b24fefa8481e1c2e2330db636e5781acc229 Mon Sep 17 00:00:00 2001 From: Easwar Hariharan Date: Tue, 25 Feb 2025 20:17:24 +0000 Subject: power: supply: da9030: convert timeouts to secs_to_jiffies() Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced secs_to_jiffies(). As the value here is a multiple of 1000, use secs_to_jiffies() instead of msecs_to_jiffies() to avoid the multiplication This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with the following Coccinelle rules: @depends on patch@ expression E; @@ -msecs_to_jiffies +secs_to_jiffies (E - * \( 1000 \| MSEC_PER_SEC \) ) Link: https://lkml.kernel.org/r/20250225-converge-secs-to-jiffies-part-two-v3-10-a43967e36c88@linux.microsoft.com Signed-off-by: Easwar Hariharan Cc: Carlos Maiolino Cc: Carlos Maiolino Cc: Chris Mason Cc: Christoph Hellwig Cc: Damien Le Maol Cc: "Darrick J. Wong" Cc: David Sterba Cc: Dick Kennedy Cc: Dongsheng Yang Cc: Fabio Estevam Cc: Frank Li Cc: Hans de Goede Cc: Henrique de Moraes Holschuh Cc: Ilpo Jarvinen Cc: Ilya Dryomov Cc: James Bottomley Cc: James Smart Cc: Jaroslav Kysela Cc: Jason Gunthorpe Cc: Jens Axboe Cc: Josef Bacik Cc: Julia Lawall Cc: Kalesh Anakkur Purayil Cc: Keith Busch Cc: Leon Romanovsky Cc: Marc Kleine-Budde Cc: Mark Brown Cc: "Martin K. Petersen" Cc: Nicolas Palix Cc: Niklas Cassel Cc: Oded Gabbay Cc: Sagi Grimberg Cc: Sascha Hauer Cc: Sebastian Reichel Cc: Selvin Thyparampil Xavier Cc: Shawn Guo Cc: Shyam-sundar S-k Cc: Takashi Iwai Cc: Takashi Iwai Cc: Xiubo Li Signed-off-by: Andrew Morton --- drivers/power/supply/da9030_battery.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/power/supply/da9030_battery.c b/drivers/power/supply/da9030_battery.c index ac2e319e9517..d25279c26030 100644 --- a/drivers/power/supply/da9030_battery.c +++ b/drivers/power/supply/da9030_battery.c @@ -502,8 +502,7 @@ static int da9030_battery_probe(struct platform_device *pdev) /* 10 seconds between monitor runs unless platform defines other interval */ - charger->interval = msecs_to_jiffies( - (pdata->batmon_interval ? : 10) * 1000); + charger->interval = secs_to_jiffies(pdata->batmon_interval ? : 10); charger->charge_milliamp = pdata->charge_milliamp; charger->charge_millivolt = pdata->charge_millivolt; -- cgit v1.2.3-59-g8ed1b From 6cf828d2fbebd41d26f8233624bab88e5afc323b Mon Sep 17 00:00:00 2001 From: Easwar Hariharan Date: Tue, 25 Feb 2025 20:17:25 +0000 Subject: nvme: convert timeouts to secs_to_jiffies() Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced secs_to_jiffies(). As the value here is a multiple of 1000, use secs_to_jiffies() instead of msecs_to_jiffies() to avoid the multiplication This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with the following Coccinelle rules: @depends on patch@ expression E; @@ -msecs_to_jiffies +secs_to_jiffies (E - * \( 1000 \| MSEC_PER_SEC \) ) Link: https://lkml.kernel.org/r/20250225-converge-secs-to-jiffies-part-two-v3-11-a43967e36c88@linux.microsoft.com Signed-off-by: Easwar Hariharan Acked-by: Keith Busch Cc: Carlos Maiolino Cc: Carlos Maiolino Cc: Chris Mason Cc: Christoph Hellwig Cc: Damien Le Maol Cc: "Darrick J. Wong" Cc: David Sterba Cc: Dick Kennedy Cc: Dongsheng Yang Cc: Fabio Estevam Cc: Frank Li Cc: Hans de Goede Cc: Henrique de Moraes Holschuh Cc: Ilpo Jarvinen Cc: Ilya Dryomov Cc: James Bottomley Cc: James Smart Cc: Jaroslav Kysela Cc: Jason Gunthorpe Cc: Jens Axboe Cc: Josef Bacik Cc: Julia Lawall Cc: Kalesh Anakkur Purayil Cc: Leon Romanovsky Cc: Marc Kleine-Budde Cc: Mark Brown Cc: "Martin K. Petersen" Cc: Nicolas Palix Cc: Niklas Cassel Cc: Oded Gabbay Cc: Sagi Grimberg Cc: Sascha Hauer Cc: Sebastian Reichel Cc: Selvin Thyparampil Xavier Cc: Shawn Guo Cc: Shyam-sundar S-k Cc: Takashi Iwai Cc: Takashi Iwai Cc: Xiubo Li Signed-off-by: Andrew Morton --- drivers/nvme/host/core.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index f028913e2e62..24c3e1765d49 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -4466,11 +4466,9 @@ static void nvme_fw_act_work(struct work_struct *work) nvme_auth_stop(ctrl); if (ctrl->mtfa) - fw_act_timeout = jiffies + - msecs_to_jiffies(ctrl->mtfa * 100); + fw_act_timeout = jiffies + msecs_to_jiffies(ctrl->mtfa * 100); else - fw_act_timeout = jiffies + - msecs_to_jiffies(admin_timeout * 1000); + fw_act_timeout = jiffies + secs_to_jiffies(admin_timeout); nvme_quiesce_io_queues(ctrl); while (nvme_ctrl_pp_status(ctrl)) { -- cgit v1.2.3-59-g8ed1b From eee6af2319f81497bf02389a2036c14e0cf01e38 Mon Sep 17 00:00:00 2001 From: Easwar Hariharan Date: Tue, 25 Feb 2025 20:17:26 +0000 Subject: spi: spi-fsl-lpspi: convert timeouts to secs_to_jiffies() Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced secs_to_jiffies(). As the value here is a multiple of 1000, use secs_to_jiffies() instead of msecs_to_jiffies() to avoid the multiplication This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with the following Coccinelle rules: @depends on patch@ expression E; @@ -msecs_to_jiffies +secs_to_jiffies (E - * \( 1000 \| MSEC_PER_SEC \) ) Link: https://lkml.kernel.org/r/20250225-converge-secs-to-jiffies-part-two-v3-12-a43967e36c88@linux.microsoft.com Signed-off-by: Easwar Hariharan Acked-by: Mark Brown Cc: Carlos Maiolino Cc: Carlos Maiolino Cc: Chris Mason Cc: Christoph Hellwig Cc: Damien Le Maol Cc: "Darrick J. Wong" Cc: David Sterba Cc: Dick Kennedy Cc: Dongsheng Yang Cc: Fabio Estevam Cc: Frank Li Cc: Hans de Goede Cc: Henrique de Moraes Holschuh Cc: Ilpo Jarvinen Cc: Ilya Dryomov Cc: James Bottomley Cc: James Smart Cc: Jaroslav Kysela Cc: Jason Gunthorpe Cc: Jens Axboe Cc: Josef Bacik Cc: Julia Lawall Cc: Kalesh Anakkur Purayil Cc: Keith Busch Cc: Leon Romanovsky Cc: Marc Kleine-Budde Cc: "Martin K. Petersen" Cc: Nicolas Palix Cc: Niklas Cassel Cc: Oded Gabbay Cc: Sagi Grimberg Cc: Sascha Hauer Cc: Sebastian Reichel Cc: Selvin Thyparampil Xavier Cc: Shawn Guo Cc: Shyam-sundar S-k Cc: Takashi Iwai Cc: Takashi Iwai Cc: Xiubo Li Signed-off-by: Andrew Morton --- drivers/spi/spi-fsl-lpspi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/spi/spi-fsl-lpspi.c b/drivers/spi/spi-fsl-lpspi.c index 40f5c8fdba76..5e3818445234 100644 --- a/drivers/spi/spi-fsl-lpspi.c +++ b/drivers/spi/spi-fsl-lpspi.c @@ -572,7 +572,7 @@ static int fsl_lpspi_calculate_timeout(struct fsl_lpspi_data *fsl_lpspi, timeout += 1; /* Double calculated timeout */ - return msecs_to_jiffies(2 * timeout * MSEC_PER_SEC); + return secs_to_jiffies(2 * timeout); } static int fsl_lpspi_dma_transfer(struct spi_controller *controller, -- cgit v1.2.3-59-g8ed1b From 7c9a8c57143c2163407967790fd46ab9545f51d4 Mon Sep 17 00:00:00 2001 From: Easwar Hariharan Date: Tue, 25 Feb 2025 20:17:27 +0000 Subject: spi: spi-imx: convert timeouts to secs_to_jiffies() Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced secs_to_jiffies(). As the value here is a multiple of 1000, use secs_to_jiffies() instead of msecs_to_jiffies() to avoid the multiplication This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with the following Coccinelle rules: @depends on patch@ expression E; @@ -msecs_to_jiffies +secs_to_jiffies (E - * \( 1000 \| MSEC_PER_SEC \) ) Link: https://lkml.kernel.org/r/20250225-converge-secs-to-jiffies-part-two-v3-13-a43967e36c88@linux.microsoft.com Signed-off-by: Easwar Hariharan Acked-by: Mark Brown Acked-by: Marc Kleine-Budde Cc: Carlos Maiolino Cc: Carlos Maiolino Cc: Chris Mason Cc: Christoph Hellwig Cc: Damien Le Maol Cc: "Darrick J. Wong" Cc: David Sterba Cc: Dick Kennedy Cc: Dongsheng Yang Cc: Fabio Estevam Cc: Frank Li Cc: Hans de Goede Cc: Henrique de Moraes Holschuh Cc: Ilpo Jarvinen Cc: Ilya Dryomov Cc: James Bottomley Cc: James Smart Cc: Jaroslav Kysela Cc: Jason Gunthorpe Cc: Jens Axboe Cc: Josef Bacik Cc: Julia Lawall Cc: Kalesh Anakkur Purayil Cc: Keith Busch Cc: Leon Romanovsky Cc: "Martin K. Petersen" Cc: Nicolas Palix Cc: Niklas Cassel Cc: Oded Gabbay Cc: Sagi Grimberg Cc: Sascha Hauer Cc: Sebastian Reichel Cc: Selvin Thyparampil Xavier Cc: Shawn Guo Cc: Shyam-sundar S-k Cc: Takashi Iwai Cc: Takashi Iwai Cc: Xiubo Li Signed-off-by: Andrew Morton --- drivers/spi/spi-imx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c index eeb7d082c247..832d6e9009eb 100644 --- a/drivers/spi/spi-imx.c +++ b/drivers/spi/spi-imx.c @@ -1449,7 +1449,7 @@ static int spi_imx_calculate_timeout(struct spi_imx_data *spi_imx, int size) timeout += 1; /* Double calculated timeout */ - return msecs_to_jiffies(2 * timeout * MSEC_PER_SEC); + return secs_to_jiffies(2 * timeout); } static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx, -- cgit v1.2.3-59-g8ed1b From 8ba1b428cf1a0905c260b5da4e44502a8457edae Mon Sep 17 00:00:00 2001 From: Easwar Hariharan Date: Tue, 25 Feb 2025 20:17:28 +0000 Subject: platform/x86/amd/pmf: convert timeouts to secs_to_jiffies() Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced secs_to_jiffies(). As the value here is a multiple of 1000, use secs_to_jiffies() instead of msecs_to_jiffies() to avoid the multiplication This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with the following Coccinelle rules: @depends on patch@ expression E; @@ -msecs_to_jiffies +secs_to_jiffies (E - * \( 1000 \| MSEC_PER_SEC \) ) Link: https://lkml.kernel.org/r/20250225-converge-secs-to-jiffies-part-two-v3-14-a43967e36c88@linux.microsoft.com Signed-off-by: Easwar Hariharan Cc: Carlos Maiolino Cc: Carlos Maiolino Cc: Chris Mason Cc: Christoph Hellwig Cc: Damien Le Maol Cc: "Darrick J. Wong" Cc: David Sterba Cc: Dick Kennedy Cc: Dongsheng Yang Cc: Fabio Estevam Cc: Frank Li Cc: Hans de Goede Cc: Henrique de Moraes Holschuh Cc: Ilpo Jarvinen Cc: Ilya Dryomov Cc: James Bottomley Cc: James Smart Cc: Jaroslav Kysela Cc: Jason Gunthorpe Cc: Jens Axboe Cc: Josef Bacik Cc: Julia Lawall Cc: Kalesh Anakkur Purayil Cc: Keith Busch Cc: Leon Romanovsky Cc: Marc Kleine-Budde Cc: Mark Brown Cc: "Martin K. Petersen" Cc: Nicolas Palix Cc: Niklas Cassel Cc: Oded Gabbay Cc: Sagi Grimberg Cc: Sascha Hauer Cc: Sebastian Reichel Cc: Selvin Thyparampil Xavier Cc: Shawn Guo Cc: Shyam-sundar S-k Cc: Takashi Iwai Cc: Takashi Iwai Cc: Xiubo Li Signed-off-by: Andrew Morton --- drivers/platform/x86/amd/pmf/acpi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/platform/x86/amd/pmf/acpi.c b/drivers/platform/x86/amd/pmf/acpi.c index dd5780a1d06e..f75f7ecd8cd9 100644 --- a/drivers/platform/x86/amd/pmf/acpi.c +++ b/drivers/platform/x86/amd/pmf/acpi.c @@ -220,7 +220,7 @@ static void apmf_sbios_heartbeat_notify(struct work_struct *work) if (!info) return; - schedule_delayed_work(&dev->heart_beat, msecs_to_jiffies(dev->hb_interval * 1000)); + schedule_delayed_work(&dev->heart_beat, secs_to_jiffies(dev->hb_interval)); kfree(info); } -- cgit v1.2.3-59-g8ed1b From 66644d80a4f9835f3dae0bfee6c6e529f6b082fe Mon Sep 17 00:00:00 2001 From: Easwar Hariharan Date: Tue, 25 Feb 2025 20:17:29 +0000 Subject: platform/x86: thinkpad_acpi: convert timeouts to secs_to_jiffies() Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced secs_to_jiffies(). As the value here is a multiple of 1000, use secs_to_jiffies() instead of msecs_to_jiffies() to avoid the multiplication This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with the following Coccinelle rules: @depends on patch@ expression E; @@ -msecs_to_jiffies +secs_to_jiffies (E - * \( 1000 \| MSEC_PER_SEC \) ) Link: https://lkml.kernel.org/r/20250225-converge-secs-to-jiffies-part-two-v3-15-a43967e36c88@linux.microsoft.com Signed-off-by: Easwar Hariharan Cc: Carlos Maiolino Cc: Carlos Maiolino Cc: Chris Mason Cc: Christoph Hellwig Cc: Damien Le Maol Cc: "Darrick J. Wong" Cc: David Sterba Cc: Dick Kennedy Cc: Dongsheng Yang Cc: Fabio Estevam Cc: Frank Li Cc: Hans de Goede Cc: Henrique de Moraes Holschuh Cc: Ilpo Jarvinen Cc: Ilya Dryomov Cc: James Bottomley Cc: James Smart Cc: Jaroslav Kysela Cc: Jason Gunthorpe Cc: Jens Axboe Cc: Josef Bacik Cc: Julia Lawall Cc: Kalesh Anakkur Purayil Cc: Keith Busch Cc: Leon Romanovsky Cc: Marc Kleine-Budde Cc: Mark Brown Cc: "Martin K. Petersen" Cc: Nicolas Palix Cc: Niklas Cassel Cc: Oded Gabbay Cc: Sagi Grimberg Cc: Sascha Hauer Cc: Sebastian Reichel Cc: Selvin Thyparampil Xavier Cc: Shawn Guo Cc: Shyam-sundar S-k Cc: Takashi Iwai Cc: Takashi Iwai Cc: Xiubo Li Signed-off-by: Andrew Morton --- drivers/platform/x86/thinkpad_acpi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/platform/x86/thinkpad_acpi.c b/drivers/platform/x86/thinkpad_acpi.c index 1cc91173e012..76b1f7a00be3 100644 --- a/drivers/platform/x86/thinkpad_acpi.c +++ b/drivers/platform/x86/thinkpad_acpi.c @@ -8513,7 +8513,7 @@ static void fan_watchdog_reset(void) if (fan_watchdog_maxinterval > 0 && tpacpi_lifecycle != TPACPI_LIFE_EXITING) mod_delayed_work(tpacpi_wq, &fan_watchdog_task, - msecs_to_jiffies(fan_watchdog_maxinterval * 1000)); + secs_to_jiffies(fan_watchdog_maxinterval)); else cancel_delayed_work(&fan_watchdog_task); } -- cgit v1.2.3-59-g8ed1b From f873136416293b786e7611d36226c9f5a8f6d20b Mon Sep 17 00:00:00 2001 From: Easwar Hariharan Date: Tue, 25 Feb 2025 20:17:30 +0000 Subject: RDMA/bnxt_re: convert timeouts to secs_to_jiffies() Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced secs_to_jiffies(). As the value here is a multiple of 1000, use secs_to_jiffies() instead of msecs_to_jiffies() to avoid the multiplication This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with the following Coccinelle rules: @depends on patch@ expression E; @@ -msecs_to_jiffies +secs_to_jiffies (E - * \( 1000 \| MSEC_PER_SEC \) ) Link: https://lkml.kernel.org/r/20250225-converge-secs-to-jiffies-part-two-v3-16-a43967e36c88@linux.microsoft.com Signed-off-by: Easwar Hariharan Cc: Carlos Maiolino Cc: Carlos Maiolino Cc: Chris Mason Cc: Christoph Hellwig Cc: Damien Le Maol Cc: "Darrick J. Wong" Cc: David Sterba Cc: Dick Kennedy Cc: Dongsheng Yang Cc: Fabio Estevam Cc: Frank Li Cc: Hans de Goede Cc: Henrique de Moraes Holschuh Cc: Ilpo Jarvinen Cc: Ilya Dryomov Cc: James Bottomley Cc: James Smart Cc: Jaroslav Kysela Cc: Jason Gunthorpe Cc: Jens Axboe Cc: Josef Bacik Cc: Julia Lawall Cc: Kalesh Anakkur Purayil Cc: Keith Busch Cc: Leon Romanovsky Cc: Marc Kleine-Budde Cc: Mark Brown Cc: "Martin K. Petersen" Cc: Nicolas Palix Cc: Niklas Cassel Cc: Oded Gabbay Cc: Sagi Grimberg Cc: Sascha Hauer Cc: Sebastian Reichel Cc: Selvin Thyparampil Xavier Cc: Shawn Guo Cc: Shyam-sundar S-k Cc: Takashi Iwai Cc: Takashi Iwai Cc: Xiubo Li Signed-off-by: Andrew Morton --- drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c index 17e62f22683b..b1a18c9cb7f6 100644 --- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c @@ -160,7 +160,7 @@ static int __wait_for_resp(struct bnxt_qplib_rcfw *rcfw, u16 cookie) wait_event_timeout(cmdq->waitq, !crsqe->is_in_used || test_bit(ERR_DEVICE_DETACHED, &cmdq->flags), - msecs_to_jiffies(rcfw->max_timeout * 1000)); + secs_to_jiffies(rcfw->max_timeout)); if (!crsqe->is_in_used) return 0; -- cgit v1.2.3-59-g8ed1b From e0349c46cb4fbbb507fa34476bd70f9c82bad359 Mon Sep 17 00:00:00 2001 From: Antonio Quartulli Date: Fri, 21 Feb 2025 21:40:34 +0100 Subject: scripts/gdb/linux/symbols.py: address changes to module_sect_attrs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When loading symbols from kernel modules we used to iterate from 0 to module_sect_attrs::nsections, in order to retrieve their name and address. However module_sect_attrs::nsections has been removed from the struct by a previous commit. Re-arrange the iteration by accessing all items in module_sect_attrs::grp::bin_attrs[] until NULL is found (it's a NULL terminated array). At the same time the symbol address cannot be extracted from module_sect_attrs::attrs[]::address anymore because it has also been deleted. Fetch it from module_sect_attrs::grp::bin_attrs[]::private as described in 4b2c11e4aaf7. Link: https://lkml.kernel.org/r/20250221204034.4430-1-antonio@mandelbit.com Fixes: d8959b947a8d ("module: sysfs: Drop member 'module_sect_attrs::nsections'") Fixes: 4b2c11e4aaf7 ("module: sysfs: Drop member 'module_sect_attr::address'") Signed-off-by: Antonio Quartulli Reviewed-by: Jan Kiszka Cc: Thomas Weißschuh Cc: Kieran Bingham Signed-off-by: Andrew Morton --- scripts/gdb/linux/symbols.py | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/scripts/gdb/linux/symbols.py b/scripts/gdb/linux/symbols.py index f6c1b063775a..15d76f7d8ebc 100644 --- a/scripts/gdb/linux/symbols.py +++ b/scripts/gdb/linux/symbols.py @@ -15,6 +15,7 @@ import gdb import os import re +from itertools import count from linux import modules, utils, constants @@ -95,10 +96,14 @@ lx-symbols command.""" except gdb.error: return str(module_addr) - attrs = sect_attrs['attrs'] - section_name_to_address = { - attrs[n]['battr']['attr']['name'].string(): attrs[n]['address'] - for n in range(int(sect_attrs['nsections']))} + section_name_to_address = {} + for i in count(): + # this is a NULL terminated array + if sect_attrs['grp']['bin_attrs'][i] == 0x0: + break + + attr = sect_attrs['grp']['bin_attrs'][i].dereference() + section_name_to_address[attr['attr']['name'].string()] = attr['private'] textaddr = section_name_to_address.get(".text", module_addr) args = [] -- cgit v1.2.3-59-g8ed1b From 8a56f26607418ab517476de4a27e38be51f6fbca Mon Sep 17 00:00:00 2001 From: Mateusz Guzik Date: Mon, 3 Mar 2025 14:49:08 +0100 Subject: signal: avoid clearing TIF_SIGPENDING in recalc_sigpending() if unset Clearing is an atomic op and the flag is not set most of the time. When creating and destroying threads in the same process with the pthread family, the primary bottleneck is calls to sigprocmask which take the process-wide sighand lock. Avoiding the atomic gives me a 2% bump in start/teardown rate at 24-core scale. [akpm@linux-foundation.org: add unlikely() as well] Link: https://lkml.kernel.org/r/20250303134908.423242-1-mjguzik@gmail.com Signed-off-by: Mateusz Guzik Acked-by: Oleg Nesterov Signed-off-by: Andrew Morton --- kernel/signal.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/kernel/signal.c b/kernel/signal.c index 875e97f6205a..ce3fae2f1fb5 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -176,9 +176,10 @@ static bool recalc_sigpending_tsk(struct task_struct *t) void recalc_sigpending(void) { - if (!recalc_sigpending_tsk(current) && !freezing(current)) - clear_thread_flag(TIF_SIGPENDING); - + if (!recalc_sigpending_tsk(current) && !freezing(current)) { + if (unlikely(test_thread_flag(TIF_SIGPENDING))) + clear_thread_flag(TIF_SIGPENDING); + } } EXPORT_SYMBOL(recalc_sigpending); -- cgit v1.2.3-59-g8ed1b From 28939c3e9925735166e7b6e42f9e1dc828093510 Mon Sep 17 00:00:00 2001 From: Ilya Leoshkevich Date: Mon, 3 Mar 2025 12:03:58 +0100 Subject: scripts/gdb/symbols: determine KASLR offset on s390 Use QEMU's qemu.PhyMemMode [1] functionality to read vmcore from the physical memory the same way the existing dump tooling does this. Gracefully handle non-QEMU targets, early boot, and memory corruptions; print a warning if such situation is detected. [1] https://qemu-project.gitlab.io/qemu/system/gdb.html#examining-physical-memory Link: https://lkml.kernel.org/r/20250303110437.79070-1-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich Acked-by: Jan Kiszka Cc: Alexander Gordeev Cc: Andrew Donnellan Cc: Christian Borntraeger Cc: Heiko Carstens Cc: Kieran Bingham Cc: Nina Schoetterl-Glausch Cc: Vasily Gorbik Signed-off-by: Andrew Morton --- scripts/gdb/linux/symbols.py | 31 ++++++++++++++++++++++++++++++- scripts/gdb/linux/utils.py | 35 +++++++++++++++++++++++++++++++++++ 2 files changed, 65 insertions(+), 1 deletion(-) diff --git a/scripts/gdb/linux/symbols.py b/scripts/gdb/linux/symbols.py index 15d76f7d8ebc..b255177301e9 100644 --- a/scripts/gdb/linux/symbols.py +++ b/scripts/gdb/linux/symbols.py @@ -14,6 +14,7 @@ import gdb import os import re +import struct from itertools import count from linux import modules, utils, constants @@ -54,6 +55,29 @@ if hasattr(gdb, 'Breakpoint'): return False +def get_vmcore_s390(): + with utils.qemu_phy_mem_mode(): + vmcore_info = 0x0e0c + paddr_vmcoreinfo_note = gdb.parse_and_eval("*(unsigned long long *)" + + hex(vmcore_info)) + inferior = gdb.selected_inferior() + elf_note = inferior.read_memory(paddr_vmcoreinfo_note, 12) + n_namesz, n_descsz, n_type = struct.unpack(">III", elf_note) + desc_paddr = paddr_vmcoreinfo_note + len(elf_note) + n_namesz + 1 + return gdb.parse_and_eval("(char *)" + hex(desc_paddr)).string() + + +def get_kerneloffset(): + if utils.is_target_arch('s390'): + try: + vmcore_str = get_vmcore_s390() + except gdb.error as e: + gdb.write("{}\n".format(e)) + return None + return utils.parse_vmcore(vmcore_str).kerneloffset + return None + + class LxSymbols(gdb.Command): """(Re-)load symbols of Linux kernel and currently loaded modules. @@ -160,7 +184,12 @@ lx-symbols command.""" obj.filename.endswith('vmlinux.debug')): orig_vmlinux = obj.filename gdb.execute("symbol-file", to_string=True) - gdb.execute("symbol-file {0}".format(orig_vmlinux)) + kerneloffset = get_kerneloffset() + if kerneloffset is None: + offset_arg = "" + else: + offset_arg = " -o " + hex(kerneloffset) + gdb.execute("symbol-file {0}{1}".format(orig_vmlinux, offset_arg)) self.loaded_modules = [] module_list = modules.module_list() diff --git a/scripts/gdb/linux/utils.py b/scripts/gdb/linux/utils.py index 245ab297ea84..03ebdccf5f69 100644 --- a/scripts/gdb/linux/utils.py +++ b/scripts/gdb/linux/utils.py @@ -11,6 +11,11 @@ # This work is licensed under the terms of the GNU GPL version 2. # +import contextlib +import dataclasses +import re +import typing + import gdb @@ -216,3 +221,33 @@ def gdb_eval_or_none(expresssion): return gdb.parse_and_eval(expresssion) except gdb.error: return None + + +@contextlib.contextmanager +def qemu_phy_mem_mode(): + connection = gdb.selected_inferior().connection + orig = connection.send_packet("qqemu.PhyMemMode") + if orig not in b"01": + raise gdb.error("Unexpected qemu.PhyMemMode") + orig = orig.decode() + if connection.send_packet("Qqemu.PhyMemMode:1") != b"OK": + raise gdb.error("Failed to set qemu.PhyMemMode") + try: + yield + finally: + if connection.send_packet("Qqemu.PhyMemMode:" + orig) != b"OK": + raise gdb.error("Failed to restore qemu.PhyMemMode") + + +@dataclasses.dataclass +class VmCore: + kerneloffset: typing.Optional[int] + + +def parse_vmcore(s): + match = re.search(r"KERNELOFFSET=([0-9a-f]+)", s) + if match is None: + kerneloffset = None + else: + kerneloffset = int(match.group(1), 16) + return VmCore(kerneloffset=kerneloffset) -- cgit v1.2.3-59-g8ed1b From bc2f19d65373bc166c18c486e2ec2611f5a12d8f Mon Sep 17 00:00:00 2001 From: Philipp Hahn Date: Mon, 10 Mar 2025 12:22:27 +0100 Subject: checkpatch: describe --min-conf-desc-length Neither the warning nor the help message gives any hint on the unit for length: Could be meters, inches, bytes, characters or ... lines. Extend the output of `--help` to name the unit "lines" and the default: - --min-conf-desc-length=n set the min description length, if shorter, warn + --min-conf-desc-length=n set the minimum description length for config symbols + in lines, if shorter, warn (default 4) Include the minimum number of lines as other error messages already do: - WARNING: please write a help paragraph that fully describes the config symbol + WARNING: please write a help paragraph that fully describes the config symbol with at least 4 lines Link: https://lkml.kernel.org/r/c71c170c90eba26265951e248adfedd3245fe575.1741605695.git.p.hahn@avm.de Signed-off-by: Philipp Hahn Cc: Andy Whitcroft Cc: Joe Perches Cc: Dwaipayan Ray Cc: Lukas Bulwahn Signed-off-by: Andrew Morton --- scripts/checkpatch.pl | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 7b28ad331742..784912f570e9 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -113,7 +113,8 @@ Options: --max-line-length=n set the maximum line length, (default $max_line_length) if exceeded, warn on patches requires --strict for use with --file - --min-conf-desc-length=n set the min description length, if shorter, warn + --min-conf-desc-length=n set the minimum description length for config symbols + in lines, if shorter, warn (default $min_conf_desc_length) --tab-size=n set the number of spaces for tab (default $tabsize) --root=PATH PATH to the kernel tree root --no-summary suppress the per-file summary @@ -3645,7 +3646,7 @@ sub process { $help_length < $min_conf_desc_length) { my $stat_real = get_stat_real($linenr, $ln - 1); WARN("CONFIG_DESCRIPTION", - "please write a help paragraph that fully describes the config symbol\n" . "$here\n$stat_real\n"); + "please write a help paragraph that fully describes the config symbol with at least $min_conf_desc_length lines\n" . "$here\n$stat_real\n"); } } -- cgit v1.2.3-59-g8ed1b From 4164e1525d37d463bbd0a808709fd75abcfc89a5 Mon Sep 17 00:00:00 2001 From: Wei Yang Date: Mon, 10 Mar 2025 07:49:32 +0000 Subject: lib/rbtree: enable userland test suite for rbtree related data structure Patch series "lib/interval_tree: add some test cases and cleanup", v2. Since rbtree/augmented tree/interval tree share similar data structure, besides new cases for interval tree, this patch set also does cleanup for others. This patch (of 7): Currently we have some tests for rbtree related data structure, e.g. rbtree, augmented rbtree, interval tree, in lib/ as kernel module. To facilitate the test and debug for those fundamental data structure, this patch enable those tests in userland. Link: https://lkml.kernel.org/r/20250310074938.26756-1-richard.weiyang@gmail.com Link: https://lkml.kernel.org/r/20250310074938.26756-2-richard.weiyang@gmail.com Signed-off-by: Wei Yang Cc: Matthew Wilcox Cc: Michel Lespinasse Cc: Jason Gunthorpe Signed-off-by: Andrew Morton --- tools/include/asm/timex.h | 13 ++++++ tools/include/linux/container_of.h | 18 ++++++++ tools/include/linux/kernel.h | 14 +----- tools/include/linux/math64.h | 5 ++ tools/include/linux/moduleparam.h | 7 +++ tools/include/linux/prandom.h | 51 +++++++++++++++++++++ tools/include/linux/slab.h | 1 + tools/lib/slab.c | 16 +++++++ tools/testing/rbtree/Makefile | 31 +++++++++++++ tools/testing/rbtree/interval_tree_test.c | 53 ++++++++++++++++++++++ tools/testing/rbtree/rbtree_test.c | 45 ++++++++++++++++++ tools/testing/rbtree/test.h | 4 ++ tools/testing/shared/interval_tree-shim.c | 5 ++ tools/testing/shared/linux/interval_tree.h | 7 +++ tools/testing/shared/linux/interval_tree_generic.h | 2 + tools/testing/shared/linux/rbtree.h | 8 ++++ tools/testing/shared/linux/rbtree_augmented.h | 7 +++ tools/testing/shared/linux/rbtree_types.h | 8 ++++ tools/testing/shared/rbtree-shim.c | 6 +++ 19 files changed, 288 insertions(+), 13 deletions(-) create mode 100644 tools/include/asm/timex.h create mode 100644 tools/include/linux/container_of.h create mode 100644 tools/include/linux/moduleparam.h create mode 100644 tools/include/linux/prandom.h create mode 100644 tools/testing/rbtree/Makefile create mode 100644 tools/testing/rbtree/interval_tree_test.c create mode 100644 tools/testing/rbtree/rbtree_test.c create mode 100644 tools/testing/rbtree/test.h create mode 100644 tools/testing/shared/interval_tree-shim.c create mode 100644 tools/testing/shared/linux/interval_tree.h create mode 100644 tools/testing/shared/linux/interval_tree_generic.h create mode 100644 tools/testing/shared/linux/rbtree.h create mode 100644 tools/testing/shared/linux/rbtree_augmented.h create mode 100644 tools/testing/shared/linux/rbtree_types.h create mode 100644 tools/testing/shared/rbtree-shim.c diff --git a/tools/include/asm/timex.h b/tools/include/asm/timex.h new file mode 100644 index 000000000000..5adfe3c6d326 --- /dev/null +++ b/tools/include/asm/timex.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __TOOLS_LINUX_ASM_TIMEX_H +#define __TOOLS_LINUX_ASM_TIMEX_H + +#include + +#define cycles_t clock_t + +static inline cycles_t get_cycles(void) +{ + return clock(); +} +#endif // __TOOLS_LINUX_ASM_TIMEX_H diff --git a/tools/include/linux/container_of.h b/tools/include/linux/container_of.h new file mode 100644 index 000000000000..c879e14c3dd6 --- /dev/null +++ b/tools/include/linux/container_of.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _TOOLS_LINUX_CONTAINER_OF_H +#define _TOOLS_LINUX_CONTAINER_OF_H + +#ifndef container_of +/** + * container_of - cast a member of a structure out to the containing structure + * @ptr: the pointer to the member. + * @type: the type of the container struct this is embedded in. + * @member: the name of the member within the struct. + * + */ +#define container_of(ptr, type, member) ({ \ + const typeof(((type *)0)->member) * __mptr = (ptr); \ + (type *)((char *)__mptr - offsetof(type, member)); }) +#endif + +#endif /* _TOOLS_LINUX_CONTAINER_OF_H */ diff --git a/tools/include/linux/kernel.h b/tools/include/linux/kernel.h index 07cfad817d53..c8c18d3908a9 100644 --- a/tools/include/linux/kernel.h +++ b/tools/include/linux/kernel.h @@ -11,6 +11,7 @@ #include #include #include +#include #ifndef UINT_MAX #define UINT_MAX (~0U) @@ -25,19 +26,6 @@ #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER) #endif -#ifndef container_of -/** - * container_of - cast a member of a structure out to the containing structure - * @ptr: the pointer to the member. - * @type: the type of the container struct this is embedded in. - * @member: the name of the member within the struct. - * - */ -#define container_of(ptr, type, member) ({ \ - const typeof(((type *)0)->member) * __mptr = (ptr); \ - (type *)((char *)__mptr - offsetof(type, member)); }) -#endif - #ifndef max #define max(x, y) ({ \ typeof(x) _max1 = (x); \ diff --git a/tools/include/linux/math64.h b/tools/include/linux/math64.h index 4ad45d5943dc..8a67d478bf19 100644 --- a/tools/include/linux/math64.h +++ b/tools/include/linux/math64.h @@ -72,4 +72,9 @@ static inline u64 mul_u64_u64_div64(u64 a, u64 b, u64 c) } #endif +static inline u64 div_u64(u64 dividend, u32 divisor) +{ + return dividend / divisor; +} + #endif /* _LINUX_MATH64_H */ diff --git a/tools/include/linux/moduleparam.h b/tools/include/linux/moduleparam.h new file mode 100644 index 000000000000..4c4d05bef0cb --- /dev/null +++ b/tools/include/linux/moduleparam.h @@ -0,0 +1,7 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _TOOLS_LINUX_MODULE_PARAMS_H +#define _TOOLS_LINUX_MODULE_PARAMS_H + +#define MODULE_PARM_DESC(parm, desc) + +#endif // _TOOLS_LINUX_MODULE_PARAMS_H diff --git a/tools/include/linux/prandom.h b/tools/include/linux/prandom.h new file mode 100644 index 000000000000..b745041ccd6a --- /dev/null +++ b/tools/include/linux/prandom.h @@ -0,0 +1,51 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __TOOLS_LINUX_PRANDOM_H +#define __TOOLS_LINUX_PRANDOM_H + +#include + +struct rnd_state { + __u32 s1, s2, s3, s4; +}; + +/* + * Handle minimum values for seeds + */ +static inline u32 __seed(u32 x, u32 m) +{ + return (x < m) ? x + m : x; +} + +/** + * prandom_seed_state - set seed for prandom_u32_state(). + * @state: pointer to state structure to receive the seed. + * @seed: arbitrary 64-bit value to use as a seed. + */ +static inline void prandom_seed_state(struct rnd_state *state, u64 seed) +{ + u32 i = ((seed >> 32) ^ (seed << 10) ^ seed) & 0xffffffffUL; + + state->s1 = __seed(i, 2U); + state->s2 = __seed(i, 8U); + state->s3 = __seed(i, 16U); + state->s4 = __seed(i, 128U); +} + +/** + * prandom_u32_state - seeded pseudo-random number generator. + * @state: pointer to state structure holding seeded state. + * + * This is used for pseudo-randomness with no outside seeding. + * For more random results, use get_random_u32(). + */ +static inline u32 prandom_u32_state(struct rnd_state *state) +{ +#define TAUSWORTHE(s, a, b, c, d) (((s & c) << d) ^ (((s << a) ^ s) >> b)) + state->s1 = TAUSWORTHE(state->s1, 6U, 13U, 4294967294U, 18U); + state->s2 = TAUSWORTHE(state->s2, 2U, 27U, 4294967288U, 2U); + state->s3 = TAUSWORTHE(state->s3, 13U, 21U, 4294967280U, 7U); + state->s4 = TAUSWORTHE(state->s4, 3U, 12U, 4294967168U, 13U); + + return (state->s1 ^ state->s2 ^ state->s3 ^ state->s4); +} +#endif // __TOOLS_LINUX_PRANDOM_H diff --git a/tools/include/linux/slab.h b/tools/include/linux/slab.h index 51b25e9c4ec7..c87051e2b26f 100644 --- a/tools/include/linux/slab.h +++ b/tools/include/linux/slab.h @@ -12,6 +12,7 @@ void *kmalloc(size_t size, gfp_t gfp); void kfree(void *p); +void *kmalloc_array(size_t n, size_t size, gfp_t gfp); bool slab_is_available(void); diff --git a/tools/lib/slab.c b/tools/lib/slab.c index 959997fb0652..981a21404f32 100644 --- a/tools/lib/slab.c +++ b/tools/lib/slab.c @@ -36,3 +36,19 @@ void kfree(void *p) printf("Freeing %p to malloc\n", p); free(p); } + +void *kmalloc_array(size_t n, size_t size, gfp_t gfp) +{ + void *ret; + + if (!(gfp & __GFP_DIRECT_RECLAIM)) + return NULL; + + ret = calloc(n, size); + uatomic_inc(&kmalloc_nr_allocated); + if (kmalloc_verbose) + printf("Allocating %p from calloc\n", ret); + if (gfp & __GFP_ZERO) + memset(ret, 0, n * size); + return ret; +} diff --git a/tools/testing/rbtree/Makefile b/tools/testing/rbtree/Makefile new file mode 100644 index 000000000000..bac6931b499d --- /dev/null +++ b/tools/testing/rbtree/Makefile @@ -0,0 +1,31 @@ +# SPDX-License-Identifier: GPL-2.0 + +.PHONY: clean + +TARGETS = rbtree_test interval_tree_test +OFILES = $(LIBS) rbtree-shim.o interval_tree-shim.o +DEPS = ../../../include/linux/rbtree.h \ + ../../../include/linux/rbtree_types.h \ + ../../../include/linux/rbtree_augmented.h \ + ../../../include/linux/interval_tree.h \ + ../../../include/linux/interval_tree_generic.h \ + ../../../lib/rbtree.c \ + ../../../lib/interval_tree.c + +targets: $(TARGETS) + +include ../shared/shared.mk + +ifeq ($(DEBUG), 1) + CFLAGS += -g +endif + +$(TARGETS): $(OFILES) + +rbtree-shim.o: $(DEPS) +rbtree_test.o: ../../../lib/rbtree_test.c +interval_tree-shim.o: $(DEPS) +interval_tree_test.o: ../../../lib/interval_tree_test.c + +clean: + $(RM) $(TARGETS) *.o generated/* diff --git a/tools/testing/rbtree/interval_tree_test.c b/tools/testing/rbtree/interval_tree_test.c new file mode 100644 index 000000000000..f1c41f5e28ba --- /dev/null +++ b/tools/testing/rbtree/interval_tree_test.c @@ -0,0 +1,53 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * interval_tree.c: Userspace Interval Tree test-suite + * Copyright (c) 2025 Wei Yang + */ +#include +#include +#include "shared.h" + +#include "../../../lib/interval_tree_test.c" + +int usage(void) +{ + fprintf(stderr, "Userland interval tree test cases\n"); + fprintf(stderr, " -n: Number of nodes in the interval tree\n"); + fprintf(stderr, " -p: Number of iterations modifying the tree\n"); + fprintf(stderr, " -q: Number of searches to the interval tree\n"); + fprintf(stderr, " -s: Number of iterations searching the tree\n"); + fprintf(stderr, " -a: Searches will iterate all nodes in the tree\n"); + fprintf(stderr, " -m: Largest value for the interval's endpoint\n"); + exit(-1); +} + +void interval_tree_tests(void) +{ + interval_tree_test_init(); + interval_tree_test_exit(); +} + +int main(int argc, char **argv) +{ + int opt; + + while ((opt = getopt(argc, argv, "n:p:q:s:am:")) != -1) { + if (opt == 'n') + nnodes = strtoul(optarg, NULL, 0); + else if (opt == 'p') + perf_loops = strtoul(optarg, NULL, 0); + else if (opt == 'q') + nsearches = strtoul(optarg, NULL, 0); + else if (opt == 's') + search_loops = strtoul(optarg, NULL, 0); + else if (opt == 'a') + search_all = true; + else if (opt == 'm') + max_endpoint = strtoul(optarg, NULL, 0); + else + usage(); + } + + interval_tree_tests(); + return 0; +} diff --git a/tools/testing/rbtree/rbtree_test.c b/tools/testing/rbtree/rbtree_test.c new file mode 100644 index 000000000000..c723e751b9a9 --- /dev/null +++ b/tools/testing/rbtree/rbtree_test.c @@ -0,0 +1,45 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * rbtree_test.c: Userspace Red Black Tree test-suite + * Copyright (c) 2025 Wei Yang + */ +#include +#include +#include +#include "shared.h" + +#include "../../../lib/rbtree_test.c" + +int usage(void) +{ + fprintf(stderr, "Userland rbtree test cases\n"); + fprintf(stderr, " -n: Number of nodes in the rb-tree\n"); + fprintf(stderr, " -p: Number of iterations modifying the rb-tree\n"); + fprintf(stderr, " -c: Number of iterations modifying and verifying the rb-tree\n"); + exit(-1); +} + +void rbtree_tests(void) +{ + rbtree_test_init(); + rbtree_test_exit(); +} + +int main(int argc, char **argv) +{ + int opt; + + while ((opt = getopt(argc, argv, "n:p:c:")) != -1) { + if (opt == 'n') + nnodes = strtoul(optarg, NULL, 0); + else if (opt == 'p') + perf_loops = strtoul(optarg, NULL, 0); + else if (opt == 'c') + check_loops = strtoul(optarg, NULL, 0); + else + usage(); + } + + rbtree_tests(); + return 0; +} diff --git a/tools/testing/rbtree/test.h b/tools/testing/rbtree/test.h new file mode 100644 index 000000000000..f1f1b545b55a --- /dev/null +++ b/tools/testing/rbtree/test.h @@ -0,0 +1,4 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +void rbtree_tests(void); +void interval_tree_tests(void); diff --git a/tools/testing/shared/interval_tree-shim.c b/tools/testing/shared/interval_tree-shim.c new file mode 100644 index 000000000000..122e74756571 --- /dev/null +++ b/tools/testing/shared/interval_tree-shim.c @@ -0,0 +1,5 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* Very simple shim around the interval tree. */ + +#include "../../../lib/interval_tree.c" diff --git a/tools/testing/shared/linux/interval_tree.h b/tools/testing/shared/linux/interval_tree.h new file mode 100644 index 000000000000..129faf9f1d0a --- /dev/null +++ b/tools/testing/shared/linux/interval_tree.h @@ -0,0 +1,7 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _TEST_INTERVAL_TREE_H +#define _TEST_INTERVAL_TREE_H + +#include "../../../../include/linux/interval_tree.h" + +#endif /* _TEST_INTERVAL_TREE_H */ diff --git a/tools/testing/shared/linux/interval_tree_generic.h b/tools/testing/shared/linux/interval_tree_generic.h new file mode 100644 index 000000000000..34cd654bee61 --- /dev/null +++ b/tools/testing/shared/linux/interval_tree_generic.h @@ -0,0 +1,2 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#include "../../../../include/linux/interval_tree_generic.h" diff --git a/tools/testing/shared/linux/rbtree.h b/tools/testing/shared/linux/rbtree.h new file mode 100644 index 000000000000..d644bb7360bf --- /dev/null +++ b/tools/testing/shared/linux/rbtree.h @@ -0,0 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _TEST_RBTREE_H +#define _TEST_RBTREE_H + +#include +#include "../../../../include/linux/rbtree.h" + +#endif /* _TEST_RBTREE_H */ diff --git a/tools/testing/shared/linux/rbtree_augmented.h b/tools/testing/shared/linux/rbtree_augmented.h new file mode 100644 index 000000000000..ad138fcf6652 --- /dev/null +++ b/tools/testing/shared/linux/rbtree_augmented.h @@ -0,0 +1,7 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _TEST_RBTREE_AUGMENTED_H +#define _TEST_RBTREE_AUGMENTED_H + +#include "../../../../include/linux/rbtree_augmented.h" + +#endif /* _TEST_RBTREE_AUGMENTED_H */ diff --git a/tools/testing/shared/linux/rbtree_types.h b/tools/testing/shared/linux/rbtree_types.h new file mode 100644 index 000000000000..194194a5bf92 --- /dev/null +++ b/tools/testing/shared/linux/rbtree_types.h @@ -0,0 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _TEST_RBTREE_TYPES_H +#define _TEST_RBTREE_TYPES_H + +#include "../../../../include/linux/rbtree_types.h" + +#endif /* _TEST_RBTREE_TYPES_H */ + diff --git a/tools/testing/shared/rbtree-shim.c b/tools/testing/shared/rbtree-shim.c new file mode 100644 index 000000000000..7692a993e5f1 --- /dev/null +++ b/tools/testing/shared/rbtree-shim.c @@ -0,0 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* Very simple shim around the rbtree. */ + +#include "../../../lib/rbtree.c" + -- cgit v1.2.3-59-g8ed1b From 3e1d58cd5dae95de731dc3128a7bcffa7c5e7ed9 Mon Sep 17 00:00:00 2001 From: Wei Yang Date: Mon, 10 Mar 2025 07:49:33 +0000 Subject: lib/rbtree: split tests Current tests are gathered in one big function. Split tests into its own function for better understanding and also it is a preparation for introducing new test cases. Link: https://lkml.kernel.org/r/20250310074938.26756-3-richard.weiyang@gmail.com Signed-off-by: Wei Yang Cc: Matthew Wilcox Cc: Michel Lespinasse Cc: Jason Gunthorpe Signed-off-by: Andrew Morton --- lib/interval_tree_test.c | 50 ++++++++++++++++++++++++++++++++++-------------- lib/rbtree_test.c | 29 ++++++++++++++++++++++------ 2 files changed, 59 insertions(+), 20 deletions(-) diff --git a/lib/interval_tree_test.c b/lib/interval_tree_test.c index 837064b83a6c..12880d772945 100644 --- a/lib/interval_tree_test.c +++ b/lib/interval_tree_test.c @@ -59,26 +59,13 @@ static void init(void) queries[i] = (prandom_u32_state(&rnd) >> 4) % max_endpoint; } -static int interval_tree_test_init(void) +static int basic_check(void) { int i, j; - unsigned long results; cycles_t time1, time2, time; - nodes = kmalloc_array(nnodes, sizeof(struct interval_tree_node), - GFP_KERNEL); - if (!nodes) - return -ENOMEM; - - queries = kmalloc_array(nsearches, sizeof(int), GFP_KERNEL); - if (!queries) { - kfree(nodes); - return -ENOMEM; - } - printk(KERN_ALERT "interval tree insert/remove"); - prandom_seed_state(&rnd, 3141592653589793238ULL); init(); time1 = get_cycles(); @@ -96,8 +83,19 @@ static int interval_tree_test_init(void) time = div_u64(time, perf_loops); printk(" -> %llu cycles\n", (unsigned long long)time); + return 0; +} + +static int search_check(void) +{ + int i, j; + unsigned long results; + cycles_t time1, time2, time; + printk(KERN_ALERT "interval tree search"); + init(); + for (j = 0; j < nnodes; j++) interval_tree_insert(nodes + j, &root); @@ -120,6 +118,30 @@ static int interval_tree_test_init(void) printk(" -> %llu cycles (%lu results)\n", (unsigned long long)time, results); + for (j = 0; j < nnodes; j++) + interval_tree_remove(nodes + j, &root); + + return 0; +} + +static int interval_tree_test_init(void) +{ + nodes = kmalloc_array(nnodes, sizeof(struct interval_tree_node), + GFP_KERNEL); + if (!nodes) + return -ENOMEM; + + queries = kmalloc_array(nsearches, sizeof(int), GFP_KERNEL); + if (!queries) { + kfree(nodes); + return -ENOMEM; + } + + prandom_seed_state(&rnd, 3141592653589793238ULL); + + basic_check(); + search_check(); + kfree(queries); kfree(nodes); diff --git a/lib/rbtree_test.c b/lib/rbtree_test.c index 8655a76d29a1..b0e0b26506cb 100644 --- a/lib/rbtree_test.c +++ b/lib/rbtree_test.c @@ -239,19 +239,14 @@ static void check_augmented(int nr_nodes) } } -static int __init rbtree_test_init(void) +static int basic_check(void) { int i, j; cycles_t time1, time2, time; struct rb_node *node; - nodes = kmalloc_array(nnodes, sizeof(*nodes), GFP_KERNEL); - if (!nodes) - return -ENOMEM; - printk(KERN_ALERT "rbtree testing"); - prandom_seed_state(&rnd, 3141592653589793238ULL); init(); time1 = get_cycles(); @@ -343,6 +338,14 @@ static int __init rbtree_test_init(void) check(0); } + return 0; +} + +static int augmented_check(void) +{ + int i, j; + cycles_t time1, time2, time; + printk(KERN_ALERT "augmented rbtree testing"); init(); @@ -390,6 +393,20 @@ static int __init rbtree_test_init(void) check_augmented(0); } + return 0; +} + +static int __init rbtree_test_init(void) +{ + nodes = kmalloc_array(nnodes, sizeof(*nodes), GFP_KERNEL); + if (!nodes) + return -ENOMEM; + + prandom_seed_state(&rnd, 3141592653589793238ULL); + + basic_check(); + augmented_check(); + kfree(nodes); return -EAGAIN; /* Fail will directly unload the module */ -- cgit v1.2.3-59-g8ed1b From 16b1936ae6d1858252413bf4bae8bcf247eb4b4c Mon Sep 17 00:00:00 2001 From: Wei Yang Date: Mon, 10 Mar 2025 07:49:34 +0000 Subject: lib/rbtree: add random seed Current test use pseudo rand function with fixed seed, which means the test data is the same pattern each time. Add random seed parameter to randomize the test. Link: https://lkml.kernel.org/r/20250310074938.26756-4-richard.weiyang@gmail.com Signed-off-by: Wei Yang Cc: Matthew Wilcox Cc: Michel Lespinasse Cc: Jason Gunthorpe Signed-off-by: Andrew Morton --- include/linux/types.h | 1 + lib/interval_tree_test.c | 3 ++- lib/rbtree_test.c | 3 ++- tools/include/linux/types.h | 2 ++ tools/testing/rbtree/interval_tree_test.c | 5 ++++- tools/testing/rbtree/rbtree_test.c | 5 ++++- 6 files changed, 15 insertions(+), 4 deletions(-) diff --git a/include/linux/types.h b/include/linux/types.h index 1c509ce8f7f6..864ab701f297 100644 --- a/include/linux/types.h +++ b/include/linux/types.h @@ -92,6 +92,7 @@ typedef unsigned char unchar; typedef unsigned short ushort; typedef unsigned int uint; typedef unsigned long ulong; +typedef unsigned long long ullong; #ifndef __BIT_TYPES_DEFINED__ #define __BIT_TYPES_DEFINED__ diff --git a/lib/interval_tree_test.c b/lib/interval_tree_test.c index 12880d772945..51863077d4ec 100644 --- a/lib/interval_tree_test.c +++ b/lib/interval_tree_test.c @@ -19,6 +19,7 @@ __param(int, search_loops, 1000, "Number of iterations searching the tree"); __param(bool, search_all, false, "Searches will iterate all nodes in the tree"); __param(uint, max_endpoint, ~0, "Largest value for the interval's endpoint"); +__param(ullong, seed, 3141592653589793238ULL, "Random seed"); static struct rb_root_cached root = RB_ROOT_CACHED; static struct interval_tree_node *nodes = NULL; @@ -137,7 +138,7 @@ static int interval_tree_test_init(void) return -ENOMEM; } - prandom_seed_state(&rnd, 3141592653589793238ULL); + prandom_seed_state(&rnd, seed); basic_check(); search_check(); diff --git a/lib/rbtree_test.c b/lib/rbtree_test.c index b0e0b26506cb..690cede46ac2 100644 --- a/lib/rbtree_test.c +++ b/lib/rbtree_test.c @@ -14,6 +14,7 @@ __param(int, nnodes, 100, "Number of nodes in the rb-tree"); __param(int, perf_loops, 1000, "Number of iterations modifying the rb-tree"); __param(int, check_loops, 100, "Number of iterations modifying and verifying the rb-tree"); +__param(ullong, seed, 3141592653589793238ULL, "Random seed"); struct test_node { u32 key; @@ -402,7 +403,7 @@ static int __init rbtree_test_init(void) if (!nodes) return -ENOMEM; - prandom_seed_state(&rnd, 3141592653589793238ULL); + prandom_seed_state(&rnd, seed); basic_check(); augmented_check(); diff --git a/tools/include/linux/types.h b/tools/include/linux/types.h index 8519386acd23..4928e33d44ac 100644 --- a/tools/include/linux/types.h +++ b/tools/include/linux/types.h @@ -42,6 +42,8 @@ typedef __s16 s16; typedef __u8 u8; typedef __s8 s8; +typedef unsigned long long ullong; + #ifdef __CHECKER__ #define __bitwise __attribute__((bitwise)) #else diff --git a/tools/testing/rbtree/interval_tree_test.c b/tools/testing/rbtree/interval_tree_test.c index f1c41f5e28ba..63775b831c1c 100644 --- a/tools/testing/rbtree/interval_tree_test.c +++ b/tools/testing/rbtree/interval_tree_test.c @@ -18,6 +18,7 @@ int usage(void) fprintf(stderr, " -s: Number of iterations searching the tree\n"); fprintf(stderr, " -a: Searches will iterate all nodes in the tree\n"); fprintf(stderr, " -m: Largest value for the interval's endpoint\n"); + fprintf(stderr, " -r: Random seed\n"); exit(-1); } @@ -31,7 +32,7 @@ int main(int argc, char **argv) { int opt; - while ((opt = getopt(argc, argv, "n:p:q:s:am:")) != -1) { + while ((opt = getopt(argc, argv, "n:p:q:s:am:r:")) != -1) { if (opt == 'n') nnodes = strtoul(optarg, NULL, 0); else if (opt == 'p') @@ -44,6 +45,8 @@ int main(int argc, char **argv) search_all = true; else if (opt == 'm') max_endpoint = strtoul(optarg, NULL, 0); + else if (opt == 'r') + seed = strtoul(optarg, NULL, 0); else usage(); } diff --git a/tools/testing/rbtree/rbtree_test.c b/tools/testing/rbtree/rbtree_test.c index c723e751b9a9..585c970f679e 100644 --- a/tools/testing/rbtree/rbtree_test.c +++ b/tools/testing/rbtree/rbtree_test.c @@ -16,6 +16,7 @@ int usage(void) fprintf(stderr, " -n: Number of nodes in the rb-tree\n"); fprintf(stderr, " -p: Number of iterations modifying the rb-tree\n"); fprintf(stderr, " -c: Number of iterations modifying and verifying the rb-tree\n"); + fprintf(stderr, " -r: Random seed\n"); exit(-1); } @@ -29,13 +30,15 @@ int main(int argc, char **argv) { int opt; - while ((opt = getopt(argc, argv, "n:p:c:")) != -1) { + while ((opt = getopt(argc, argv, "n:p:c:r:")) != -1) { if (opt == 'n') nnodes = strtoul(optarg, NULL, 0); else if (opt == 'p') perf_loops = strtoul(optarg, NULL, 0); else if (opt == 'c') check_loops = strtoul(optarg, NULL, 0); + else if (opt == 'r') + seed = strtoul(optarg, NULL, 0); else usage(); } -- cgit v1.2.3-59-g8ed1b From 82114e45131ff8006435ce40f4275c9c8910b404 Mon Sep 17 00:00:00 2001 From: Wei Yang Date: Mon, 10 Mar 2025 07:49:35 +0000 Subject: lib/interval_tree: add test case for interval_tree_iter_xxx() helpers Verify interval_tree_iter_xxx() helpers could find intersection ranges as expected. [sfr@canb.auug.org.au: some of tools/ uses -Wno-unused-parameter] Link: https://lkml.kernel.org/r/20250312113612.31ac808e@canb.auug.org.au Link: https://lkml.kernel.org/r/20250310074938.26756-5-richard.weiyang@gmail.com Signed-off-by: Wei Yang Cc: Matthew Wilcox Cc: Michel Lespinasse Cc: Jason Gunthorpe Signed-off-by: Andrew Morton --- lib/interval_tree_test.c | 69 ++++++++++++++++++++++++++++++++++++++++++++ tools/include/linux/bitmap.h | 21 ++++++++++++++ tools/lib/bitmap.c | 20 +++++++++++++ 3 files changed, 110 insertions(+) diff --git a/lib/interval_tree_test.c b/lib/interval_tree_test.c index 51863077d4ec..7821379e2c21 100644 --- a/lib/interval_tree_test.c +++ b/lib/interval_tree_test.c @@ -5,6 +5,7 @@ #include #include #include +#include #define __param(type, name, init, msg) \ static type name = init; \ @@ -125,6 +126,73 @@ static int search_check(void) return 0; } +static int intersection_range_check(void) +{ + int i, j, k; + unsigned long start, last; + struct interval_tree_node *node; + unsigned long *intxn1; + unsigned long *intxn2; + + printk(KERN_ALERT "interval tree iteration\n"); + + intxn1 = bitmap_alloc(nnodes, GFP_KERNEL); + if (!intxn1) { + WARN_ON_ONCE("Failed to allocate intxn1\n"); + return -ENOMEM; + } + + intxn2 = bitmap_alloc(nnodes, GFP_KERNEL); + if (!intxn2) { + WARN_ON_ONCE("Failed to allocate intxn2\n"); + bitmap_free(intxn1); + return -ENOMEM; + } + + for (i = 0; i < search_loops; i++) { + /* Initialize interval tree for each round */ + init(); + for (j = 0; j < nnodes; j++) + interval_tree_insert(nodes + j, &root); + + /* Let's try nsearches different ranges */ + for (k = 0; k < nsearches; k++) { + /* Try whole range once */ + if (!k) { + start = 0UL; + last = ULONG_MAX; + } else { + last = (prandom_u32_state(&rnd) >> 4) % max_endpoint; + start = (prandom_u32_state(&rnd) >> 4) % last; + } + + /* Walk nodes to mark intersection nodes */ + bitmap_zero(intxn1, nnodes); + for (j = 0; j < nnodes; j++) { + node = nodes + j; + + if (start <= node->last && last >= node->start) + bitmap_set(intxn1, j, 1); + } + + /* Iterate tree to clear intersection nodes */ + bitmap_zero(intxn2, nnodes); + for (node = interval_tree_iter_first(&root, start, last); node; + node = interval_tree_iter_next(node, start, last)) + bitmap_set(intxn2, node - nodes, 1); + + WARN_ON_ONCE(!bitmap_equal(intxn1, intxn2, nnodes)); + } + + for (j = 0; j < nnodes; j++) + interval_tree_remove(nodes + j, &root); + } + + bitmap_free(intxn1); + bitmap_free(intxn2); + return 0; +} + static int interval_tree_test_init(void) { nodes = kmalloc_array(nnodes, sizeof(struct interval_tree_node), @@ -142,6 +210,7 @@ static int interval_tree_test_init(void) basic_check(); search_check(); + intersection_range_check(); kfree(queries); kfree(nodes); diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h index 2a7f260ef9dc..d4d300040d01 100644 --- a/tools/include/linux/bitmap.h +++ b/tools/include/linux/bitmap.h @@ -19,6 +19,7 @@ bool __bitmap_and(unsigned long *dst, const unsigned long *bitmap1, const unsigned long *bitmap2, unsigned int bits); bool __bitmap_equal(const unsigned long *bitmap1, const unsigned long *bitmap2, unsigned int bits); +void __bitmap_set(unsigned long *map, unsigned int start, int len); void __bitmap_clear(unsigned long *map, unsigned int start, int len); bool __bitmap_intersects(const unsigned long *bitmap1, const unsigned long *bitmap2, unsigned int bits); @@ -79,6 +80,11 @@ static inline void bitmap_or(unsigned long *dst, const unsigned long *src1, __bitmap_or(dst, src1, src2, nbits); } +static inline unsigned long *bitmap_alloc(unsigned int nbits, gfp_t flags __maybe_unused) +{ + return malloc(bitmap_size(nbits)); +} + /** * bitmap_zalloc - Allocate bitmap * @nbits: Number of bits @@ -150,6 +156,21 @@ static inline bool bitmap_intersects(const unsigned long *src1, return __bitmap_intersects(src1, src2, nbits); } +static inline void bitmap_set(unsigned long *map, unsigned int start, unsigned int nbits) +{ + if (__builtin_constant_p(nbits) && nbits == 1) + __set_bit(start, map); + else if (small_const_nbits(start + nbits)) + *map |= GENMASK(start + nbits - 1, start); + else if (__builtin_constant_p(start & BITMAP_MEM_MASK) && + IS_ALIGNED(start, BITMAP_MEM_ALIGNMENT) && + __builtin_constant_p(nbits & BITMAP_MEM_MASK) && + IS_ALIGNED(nbits, BITMAP_MEM_ALIGNMENT)) + memset((char *)map + start / 8, 0xff, nbits / 8); + else + __bitmap_set(map, start, nbits); +} + static inline void bitmap_clear(unsigned long *map, unsigned int start, unsigned int nbits) { diff --git a/tools/lib/bitmap.c b/tools/lib/bitmap.c index 2178862bb114..51255c69754d 100644 --- a/tools/lib/bitmap.c +++ b/tools/lib/bitmap.c @@ -101,6 +101,26 @@ bool __bitmap_intersects(const unsigned long *bitmap1, return false; } +void __bitmap_set(unsigned long *map, unsigned int start, int len) +{ + unsigned long *p = map + BIT_WORD(start); + const unsigned int size = start + len; + int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG); + unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start); + + while (len - bits_to_set >= 0) { + *p |= mask_to_set; + len -= bits_to_set; + bits_to_set = BITS_PER_LONG; + mask_to_set = ~0UL; + p++; + } + if (len) { + mask_to_set &= BITMAP_LAST_WORD_MASK(size); + *p |= mask_to_set; + } +} + void __bitmap_clear(unsigned long *map, unsigned int start, int len) { unsigned long *p = map + BIT_WORD(start); -- cgit v1.2.3-59-g8ed1b From ccaf3efceefc2c3abc6c23277d5a93238efa5c58 Mon Sep 17 00:00:00 2001 From: Wei Yang Date: Mon, 10 Mar 2025 07:49:36 +0000 Subject: lib/interval_tree: add test case for span iteration Verify interval_tree_span_iter_xxx() helpers works as expected. Link: https://lkml.kernel.org/r/20250310074938.26756-6-richard.weiyang@gmail.com Signed-off-by: Wei Yang Cc: Matthew Wilcox Cc: Michel Lespinasse Cc: Jason Gunthorpe Signed-off-by: Andrew Morton --- lib/interval_tree_test.c | 117 ++++++++++++++++++++++++++++++ tools/testing/rbtree/Makefile | 6 +- tools/testing/rbtree/interval_tree_test.c | 2 + 3 files changed, 123 insertions(+), 2 deletions(-) diff --git a/lib/interval_tree_test.c b/lib/interval_tree_test.c index 7821379e2c21..5fd62656f42e 100644 --- a/lib/interval_tree_test.c +++ b/lib/interval_tree_test.c @@ -6,6 +6,7 @@ #include #include #include +#include #define __param(type, name, init, msg) \ static type name = init; \ @@ -193,6 +194,121 @@ static int intersection_range_check(void) return 0; } +#ifdef CONFIG_INTERVAL_TREE_SPAN_ITER +/* + * Helper function to get span of current position from maple tree point of + * view. + */ +static void mas_cur_span(struct ma_state *mas, struct interval_tree_span_iter *state) +{ + unsigned long cur_start; + unsigned long cur_last; + int is_hole; + + if (mas->status == ma_overflow) + return; + + /* walk to current position */ + state->is_hole = mas_walk(mas) ? 0 : 1; + + cur_start = mas->index < state->first_index ? + state->first_index : mas->index; + + /* whether we have followers */ + do { + + cur_last = mas->last > state->last_index ? + state->last_index : mas->last; + + is_hole = mas_next_range(mas, state->last_index) ? 0 : 1; + + } while (mas->status != ma_overflow && is_hole == state->is_hole); + + if (state->is_hole) { + state->start_hole = cur_start; + state->last_hole = cur_last; + } else { + state->start_used = cur_start; + state->last_used = cur_last; + } + + /* advance position for next round */ + if (mas->status != ma_overflow) + mas_set(mas, cur_last + 1); +} + +static int span_iteration_check(void) +{ + int i, j, k; + unsigned long start, last; + struct interval_tree_span_iter span, mas_span; + + DEFINE_MTREE(tree); + + MA_STATE(mas, &tree, 0, 0); + + printk(KERN_ALERT "interval tree span iteration\n"); + + for (i = 0; i < search_loops; i++) { + /* Initialize interval tree for each round */ + init(); + for (j = 0; j < nnodes; j++) + interval_tree_insert(nodes + j, &root); + + /* Put all the range into maple tree */ + mt_init_flags(&tree, MT_FLAGS_ALLOC_RANGE); + mt_set_in_rcu(&tree); + + for (j = 0; j < nnodes; j++) + WARN_ON_ONCE(mtree_store_range(&tree, nodes[j].start, + nodes[j].last, nodes + j, GFP_KERNEL)); + + /* Let's try nsearches different ranges */ + for (k = 0; k < nsearches; k++) { + /* Try whole range once */ + if (!k) { + start = 0UL; + last = ULONG_MAX; + } else { + last = (prandom_u32_state(&rnd) >> 4) % max_endpoint; + start = (prandom_u32_state(&rnd) >> 4) % last; + } + + mas_span.first_index = start; + mas_span.last_index = last; + mas_span.is_hole = -1; + mas_set(&mas, start); + + interval_tree_for_each_span(&span, &root, start, last) { + mas_cur_span(&mas, &mas_span); + + WARN_ON_ONCE(span.is_hole != mas_span.is_hole); + + if (span.is_hole) { + WARN_ON_ONCE(span.start_hole != mas_span.start_hole); + WARN_ON_ONCE(span.last_hole != mas_span.last_hole); + } else { + WARN_ON_ONCE(span.start_used != mas_span.start_used); + WARN_ON_ONCE(span.last_used != mas_span.last_used); + } + } + + } + + WARN_ON_ONCE(mas.status != ma_overflow); + + /* Cleanup maple tree for each round */ + mtree_destroy(&tree); + /* Cleanup interval tree for each round */ + for (j = 0; j < nnodes; j++) + interval_tree_remove(nodes + j, &root); + } + return 0; +} +#else +static inline int span_iteration_check(void) {return 0; } +#endif + static int interval_tree_test_init(void) { nodes = kmalloc_array(nnodes, sizeof(struct interval_tree_node), @@ -211,6 +327,7 @@ static int interval_tree_test_init(void) basic_check(); search_check(); intersection_range_check(); + span_iteration_check(); kfree(queries); kfree(nodes); diff --git a/tools/testing/rbtree/Makefile b/tools/testing/rbtree/Makefile index bac6931b499d..d7bbae2af4c7 100644 --- a/tools/testing/rbtree/Makefile +++ b/tools/testing/rbtree/Makefile @@ -3,7 +3,7 @@ .PHONY: clean TARGETS = rbtree_test interval_tree_test -OFILES = $(LIBS) rbtree-shim.o interval_tree-shim.o +OFILES = $(SHARED_OFILES) rbtree-shim.o interval_tree-shim.o maple-shim.o DEPS = ../../../include/linux/rbtree.h \ ../../../include/linux/rbtree_types.h \ ../../../include/linux/rbtree_augmented.h \ @@ -25,7 +25,9 @@ $(TARGETS): $(OFILES) rbtree-shim.o: $(DEPS) rbtree_test.o: ../../../lib/rbtree_test.c interval_tree-shim.o: $(DEPS) +interval_tree-shim.o: CFLAGS += -DCONFIG_INTERVAL_TREE_SPAN_ITER interval_tree_test.o: ../../../lib/interval_tree_test.c +interval_tree_test.o: CFLAGS += -DCONFIG_INTERVAL_TREE_SPAN_ITER clean: - $(RM) $(TARGETS) *.o generated/* + $(RM) $(TARGETS) *.o radix-tree.c idr.c generated/* diff --git a/tools/testing/rbtree/interval_tree_test.c b/tools/testing/rbtree/interval_tree_test.c index 63775b831c1c..49bc5b534330 100644 --- a/tools/testing/rbtree/interval_tree_test.c +++ b/tools/testing/rbtree/interval_tree_test.c @@ -6,6 +6,7 @@ #include #include #include "shared.h" +#include "maple-shared.h" #include "../../../lib/interval_tree_test.c" @@ -51,6 +52,7 @@ int main(int argc, char **argv) usage(); } + maple_tree_init(); interval_tree_tests(); return 0; } -- cgit v1.2.3-59-g8ed1b From 19811285784f7f3d4abaa6e94623f2e7d10219d0 Mon Sep 17 00:00:00 2001 From: Wei Yang Date: Mon, 10 Mar 2025 07:49:37 +0000 Subject: lib/interval_tree: skip the check before go to the right subtree The interval_tree_subtree_search() holds the loop invariant: start <= node->ITSUBTREE Let's say we have a following tree: node / \ left right So we know node->ITSUBTREE is contributed by one of the following: * left->ITSUBTREE * ITLAST(node) * right->ITSUBTREE When we come to the right node, we are sure the first two don't contribute to node->ITSUBTREE and it must be the right node does the job. So skip the check before go to the right subtree. Link: https://lkml.kernel.org/r/20250310074938.26756-7-richard.weiyang@gmail.com Signed-off-by: Wei Yang Cc: Matthew Wilcox Cc: Michel Lespinasse Cc: Jason Gunthorpe Signed-off-by: Andrew Morton --- include/linux/interval_tree_generic.h | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/include/linux/interval_tree_generic.h b/include/linux/interval_tree_generic.h index aaa8a0767aa3..1b400f26f63d 100644 --- a/include/linux/interval_tree_generic.h +++ b/include/linux/interval_tree_generic.h @@ -104,12 +104,8 @@ ITPREFIX ## _subtree_search(ITSTRUCT *node, ITTYPE start, ITTYPE last) \ if (ITSTART(node) <= last) { /* Cond1 */ \ if (start <= ITLAST(node)) /* Cond2 */ \ return node; /* node is leftmost match */ \ - if (node->ITRB.rb_right) { \ - node = rb_entry(node->ITRB.rb_right, \ - ITSTRUCT, ITRB); \ - if (start <= node->ITSUBTREE) \ - continue; \ - } \ + node = rb_entry(node->ITRB.rb_right, ITSTRUCT, ITRB); \ + continue; \ } \ return NULL; /* No match */ \ } \ -- cgit v1.2.3-59-g8ed1b From ceb08ee965a20dd1eecc4cdcaa0328fc7c03b1e5 Mon Sep 17 00:00:00 2001 From: Wei Yang Date: Mon, 10 Mar 2025 07:49:38 +0000 Subject: lib/interval_tree: fix the comment of interval_tree_span_iter_next_gap() The comment of interval_tree_span_iter_next_gap() is not exact, nodes[1] is not always !NULL. There are threes cases here. If there is an interior hole, the statement is correct. If there is a tailing hole or the contiguous used range span to the end, nodes[1] is NULL. Link: https://lkml.kernel.org/r/20250310074938.26756-8-richard.weiyang@gmail.com Signed-off-by: Wei Yang Reviewed-by: Jason Gunthorpe Cc: Matthew Wilcox Cc: Michel Lespinasse Signed-off-by: Andrew Morton --- lib/interval_tree.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/lib/interval_tree.c b/lib/interval_tree.c index 3412737ff365..324766e9bf63 100644 --- a/lib/interval_tree.c +++ b/lib/interval_tree.c @@ -20,9 +20,15 @@ EXPORT_SYMBOL_GPL(interval_tree_iter_next); /* * Roll nodes[1] into nodes[0] by advancing nodes[1] to the end of a contiguous * span of nodes. This makes nodes[0]->last the end of that contiguous used span - * indexes that started at the original nodes[1]->start. nodes[1] is now the - * first node starting the next used span. A hole span is between nodes[0]->last - * and nodes[1]->start. nodes[1] must be !NULL. + * of indexes that started at the original nodes[1]->start. + * + * If there is an interior hole, nodes[1] is now the first node starting the + * next used span. A hole span is between nodes[0]->last and nodes[1]->start. + * + * If there is a tailing hole, nodes[1] is now NULL. A hole span is between + * nodes[0]->last and last_index. + * + * If the contiguous used range span to last_index, nodes[1] is set to NULL. */ static void interval_tree_span_iter_next_gap(struct interval_tree_span_iter *state) -- cgit v1.2.3-59-g8ed1b From 6164be01f1799458b5c6f59a547d6241e4a4b4d6 Mon Sep 17 00:00:00 2001 From: Thorsten Blum Date: Thu, 13 Mar 2025 14:30:02 +0100 Subject: watchdog/perf: optimize bytes copied and remove manual NUL-termination Currently, up to 23 bytes of the source string are copied to the destination buffer (including the comma and anything after it), only to then manually NUL-terminate the destination buffer again at index 'len' (where the comma was found). Fix this by calling strscpy() with 'len' instead of the destination buffer size to copy only as many bytes from the source string as needed. Change the length check to allow 'len' to be less than or equal to the destination buffer size to fill the whole buffer if needed. Remove the if-check for the return value of strscpy(), because calling strscpy() with 'len' always truncates the source string at the comma as expected and NUL-terminates the destination buffer at the corresponding index instead. Remove the manual NUL-termination. No functional changes intended. Link: https://lkml.kernel.org/r/20250313133004.36406-2-thorsten.blum@linux.dev Signed-off-by: Thorsten Blum Cc: Song Liu Cc: Thomas Gleinxer Signed-off-by: Andrew Morton --- kernel/watchdog_perf.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c index 59c1d86a73a2..b81167cb0dfc 100644 --- a/kernel/watchdog_perf.c +++ b/kernel/watchdog_perf.c @@ -294,12 +294,10 @@ void __init hardlockup_config_perf_event(const char *str) } else { unsigned int len = comma - str; - if (len >= sizeof(buf)) + if (len > sizeof(buf)) return; - if (strscpy(buf, str, sizeof(buf)) < 0) - return; - buf[len] = 0; + strscpy(buf, str, len); if (kstrtoull(buf, 16, &config)) return; } -- cgit v1.2.3-59-g8ed1b From caeb8ba598f0238b86ee967a77c54173e036469c Mon Sep 17 00:00:00 2001 From: Yan Zhao Date: Fri, 7 Mar 2025 10:44:11 +0200 Subject: kexec_core: accept unaccepted kexec segments' destination addresses The UEFI Specification version 2.9 introduces the concept of memory acceptance: some Virtual Machine platforms, such as Intel TDX or AMD SEV-SNP, require memory to be accepted before it can be used by the guest. Accepting memory is expensive. The memory must be allocated by the VMM and then brought to a known safe state: cache must be flushed, memory must be zeroed with the guest's encryption key, and associated metadata must be manipulated. These operations must be performed from a trusted environment (firmware or TDX module). Switching context to and from it also takes time. This cost adds up. On large confidential VMs, memory acceptance alone can take minutes. It is better to delay memory acceptance until the memory is actually needed. The kernel accepts memory when it is allocated from buddy allocator for the first time. This reduces boot time and decreases memory overhead as the VMM can allocate memory as needed. It does not work when the guest attempts to kexec into a new kernel. The kexec segments' destination addresses are not allocated by the buddy allocator. Instead, they are searched from normal system RAM (top-down or bottom-up) and exclude driver-managed memory, ACPI, persistent, and reserved memory. Unaccepted memory is normal system RAM from kernel point of view and kexec can place segments there. Kexec bypasses the code path in buddy allocator where memory gets accepted and it leads to a crash when kexec accesses segments' memory. Accept the destination addresses during the kexec load, immediately after they pass sanity checks. This ensures the code is located in a common place shared by both the kexec_load and kexec_file_load system calls. This will not conflict with the accounting in try_to_accept_memory_one() since the accounting is set during kernel boot and decremented when pages are moved to the freelists. There is no harm in invoking accept_memory() on a page before making it available to the buddy allocator. No need to worry about re-accepting memory since accept_memory() checks the unaccepted bitmap before accepting a memory page. Although a user may perform kexec loading without ever triggering the jump, it doesn't impact much since kexec loading is not in a performance-critical path. Additionally, the destination addresses are always searched and found in the same location on a given system. Changes to the destination address searching logic to locate only memory in either unaccepted or accepted status are unnecessary and complicated. [kirill.shutemov@linux.intel.com: update the commit message] Link: https://lkml.kernel.org/r/20250307084411.2150367-1-kirill.shutemov@linux.intel.com Signed-off-by: Yan Zhao Signed-off-by: Kirill A. Shutemov Acked-by: Dave Hansen Cc: "Eric W. Biederman" Cc: Ashish Kalra Cc: Baoquan He Cc: Jianxiong Gao Signed-off-by: Andrew Morton --- kernel/kexec_core.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index c0bdc1686154..9a2095216f4f 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -210,6 +210,16 @@ int sanity_check_segment_list(struct kimage *image) } #endif + /* + * The destination addresses are searched from system RAM rather than + * being allocated from the buddy allocator, so they are not guaranteed + * to be accepted by the current kernel. Accept the destination + * addresses before kexec swaps their content with the segments' source + * pages to avoid accessing memory before it is accepted. + */ + for (i = 0; i < nr_segments; i++) + accept_memory(image->segment[i].mem, image->segment[i].memsz); + return 0; } -- cgit v1.2.3-59-g8ed1b From 3cf67d61ff98672120f6ad07528afa165df12588 Mon Sep 17 00:00:00 2001 From: "Masami Hiramatsu (Google)" Date: Tue, 25 Feb 2025 16:02:34 +0900 Subject: hung_task: show the blocker task if the task is hung on mutex Patch series "hung_task: Dump the blocking task stacktrace", v4. The hung_task detector is very useful for detecting the lockup. However, since it only dumps the blocked (uninterruptible sleep) processes, it is not enough to identify the root cause of that lockup. For example, if a process holds a mutex and sleep an event in interruptible state long time, the other processes will wait on the mutex in uninterruptible state. In this case, the waiter processes are dumped, but the blocker process is not shown because it is sleep in interruptible state. This adds a feature to dump the blocker task which holds a mutex when detecting a hung task. e.g. INFO: task cat:115 blocked for more than 122 seconds. Not tainted 6.14.0-rc3-00003-ga8946be3de00 #156 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:cat state:D stack:13432 pid:115 tgid:115 ppid:106 task_flags:0x400100 flags:0x00000002 Call Trace: __schedule+0x731/0x960 ? schedule_preempt_disabled+0x54/0xa0 schedule+0xb7/0x140 ? __mutex_lock+0x51b/0xa60 ? __mutex_lock+0x51b/0xa60 schedule_preempt_disabled+0x54/0xa0 __mutex_lock+0x51b/0xa60 read_dummy+0x23/0x70 full_proxy_read+0x6a/0xc0 vfs_read+0xc2/0x340 ? __pfx_direct_file_splice_eof+0x10/0x10 ? do_sendfile+0x1bd/0x2e0 ksys_read+0x76/0xe0 do_syscall_64+0xe3/0x1c0 ? exc_page_fault+0xa9/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x4840cd RSP: 002b:00007ffe99071828 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004840cd RDX: 0000000000001000 RSI: 00007ffe99071870 RDI: 0000000000000003 RBP: 00007ffe99071870 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000001000 R13: 00000000132fd3a0 R14: 0000000000000001 R15: ffffffffffffffff INFO: task cat:115 is blocked on a mutex likely owned by task cat:114. task:cat state:S stack:13432 pid:114 tgid:114 ppid:106 task_flags:0x400100 flags:0x00000002 Call Trace: __schedule+0x731/0x960 ? schedule_timeout+0xa8/0x120 schedule+0xb7/0x140 schedule_timeout+0xa8/0x120 ? __pfx_process_timeout+0x10/0x10 msleep_interruptible+0x3e/0x60 read_dummy+0x2d/0x70 full_proxy_read+0x6a/0xc0 vfs_read+0xc2/0x340 ? __pfx_direct_file_splice_eof+0x10/0x10 ? do_sendfile+0x1bd/0x2e0 ksys_read+0x76/0xe0 do_syscall_64+0xe3/0x1c0 ? exc_page_fault+0xa9/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x4840cd RSP: 002b:00007ffe3e0147b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004840cd RDX: 0000000000001000 RSI: 00007ffe3e014800 RDI: 0000000000000003 RBP: 00007ffe3e014800 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000001000 R13: 000000001a0a93a0 R14: 0000000000000001 R15: ffffffffffffffff TBD: We can extend this feature to cover other locks like rwsem and rt_mutex, but rwsem requires to dump all the tasks which acquire and wait that rwsem. We can follow the waiter link but the output will be a bit different compared with mutex case. This patch (of 2): The "hung_task" shows a long-time uninterruptible slept task, but most often, it's blocked on a mutex acquired by another task. Without dumping such a task, investigating the root cause of the hung task problem is very difficult. This introduce task_struct::blocker_mutex to point the mutex lock which this task is waiting for. Since the mutex has "owner" information, we can find the owner task and dump it with hung tasks. Note: the owner can be changed while dumping the owner task, so this is "likely" the owner of the mutex. With this change, the hung task shows blocker task's info like below; INFO: task cat:115 blocked for more than 122 seconds. Not tainted 6.14.0-rc3-00003-ga8946be3de00 #156 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:cat state:D stack:13432 pid:115 tgid:115 ppid:106 task_flags:0x400100 flags:0x00000002 Call Trace: __schedule+0x731/0x960 ? schedule_preempt_disabled+0x54/0xa0 schedule+0xb7/0x140 ? __mutex_lock+0x51b/0xa60 ? __mutex_lock+0x51b/0xa60 schedule_preempt_disabled+0x54/0xa0 __mutex_lock+0x51b/0xa60 read_dummy+0x23/0x70 full_proxy_read+0x6a/0xc0 vfs_read+0xc2/0x340 ? __pfx_direct_file_splice_eof+0x10/0x10 ? do_sendfile+0x1bd/0x2e0 ksys_read+0x76/0xe0 do_syscall_64+0xe3/0x1c0 ? exc_page_fault+0xa9/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x4840cd RSP: 002b:00007ffe99071828 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004840cd RDX: 0000000000001000 RSI: 00007ffe99071870 RDI: 0000000000000003 RBP: 00007ffe99071870 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000001000 R13: 00000000132fd3a0 R14: 0000000000000001 R15: ffffffffffffffff INFO: task cat:115 is blocked on a mutex likely owned by task cat:114. task:cat state:S stack:13432 pid:114 tgid:114 ppid:106 task_flags:0x400100 flags:0x00000002 Call Trace: __schedule+0x731/0x960 ? schedule_timeout+0xa8/0x120 schedule+0xb7/0x140 schedule_timeout+0xa8/0x120 ? __pfx_process_timeout+0x10/0x10 msleep_interruptible+0x3e/0x60 read_dummy+0x2d/0x70 full_proxy_read+0x6a/0xc0 vfs_read+0xc2/0x340 ? __pfx_direct_file_splice_eof+0x10/0x10 ? do_sendfile+0x1bd/0x2e0 ksys_read+0x76/0xe0 do_syscall_64+0xe3/0x1c0 ? exc_page_fault+0xa9/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x4840cd RSP: 002b:00007ffe3e0147b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004840cd RDX: 0000000000001000 RSI: 00007ffe3e014800 RDI: 0000000000000003 RBP: 00007ffe3e014800 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000001000 R13: 000000001a0a93a0 R14: 0000000000000001 R15: ffffffffffffffff [akpm@linux-foundation.org: implement debug_show_blocker() in C rather than in CPP] Link: https://lkml.kernel.org/r/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com Link: https://lkml.kernel.org/r/174046695384.2194069.16796289525958195643.stgit@mhiramat.tok.corp.google.com Signed-off-by: Masami Hiramatsu (Google) Reviewed-by: Waiman Long Reviewed-by: Lance Yang Reviewed-by: Sergey Senozhatsky Cc: Anna Schumaker Cc: Boqun Feng Cc: Ingo Molnar Cc: Joel Granados Cc: Kent Overstreet Cc: Steven Rostedt Cc: Tomasz Figa Cc: Will Deacon Cc: Yongliang Gao Signed-off-by: Andrew Morton --- include/linux/mutex.h | 2 ++ include/linux/sched.h | 4 ++++ kernel/hung_task.c | 38 ++++++++++++++++++++++++++++++++++++++ kernel/locking/mutex.c | 14 ++++++++++++++ lib/Kconfig.debug | 11 +++++++++++ 5 files changed, 69 insertions(+) diff --git a/include/linux/mutex.h b/include/linux/mutex.h index 2bf91b57591b..2143d05116be 100644 --- a/include/linux/mutex.h +++ b/include/linux/mutex.h @@ -202,4 +202,6 @@ DEFINE_GUARD(mutex, struct mutex *, mutex_lock(_T), mutex_unlock(_T)) DEFINE_GUARD_COND(mutex, _try, mutex_trylock(_T)) DEFINE_GUARD_COND(mutex, _intr, mutex_lock_interruptible(_T) == 0) +extern unsigned long mutex_get_owner(struct mutex *lock); + #endif /* __LINUX_MUTEX_H */ diff --git a/include/linux/sched.h b/include/linux/sched.h index 9c15365a30c0..1419d94c8e87 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1217,6 +1217,10 @@ struct task_struct { struct mutex_waiter *blocked_on; #endif +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER + struct mutex *blocker_mutex; +#endif + #ifdef CONFIG_DEBUG_ATOMIC_SLEEP int non_block_count; #endif diff --git a/kernel/hung_task.c b/kernel/hung_task.c index 04efa7a6e69b..dc898ec93463 100644 --- a/kernel/hung_task.c +++ b/kernel/hung_task.c @@ -93,6 +93,43 @@ static struct notifier_block panic_block = { .notifier_call = hung_task_panic, }; + +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER +static void debug_show_blocker(struct task_struct *task) +{ + struct task_struct *g, *t; + unsigned long owner; + struct mutex *lock; + + RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "No rcu lock held"); + + lock = READ_ONCE(task->blocker_mutex); + if (!lock) + return; + + owner = mutex_get_owner(lock); + if (unlikely(!owner)) { + pr_err("INFO: task %s:%d is blocked on a mutex, but the owner is not found.\n", + task->comm, task->pid); + return; + } + + /* Ensure the owner information is correct. */ + for_each_process_thread(g, t) { + if ((unsigned long)t == owner) { + pr_err("INFO: task %s:%d is blocked on a mutex likely owned by task %s:%d.\n", + task->comm, task->pid, t->comm, t->pid); + sched_show_task(t); + return; + } + } +} +#else +static inline void debug_show_blocker(struct task_struct *task) +{ +} +#endif + static void check_hung_task(struct task_struct *t, unsigned long timeout) { unsigned long switch_count = t->nvcsw + t->nivcsw; @@ -152,6 +189,7 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout) pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\"" " disables this message.\n"); sched_show_task(t); + debug_show_blocker(t); hung_task_show_lock = true; if (sysctl_hung_task_all_cpu_backtrace) diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index b36f23de48f1..6a543c204a14 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -72,6 +72,14 @@ static inline unsigned long __owner_flags(unsigned long owner) return owner & MUTEX_FLAGS; } +/* Do not use the return value as a pointer directly. */ +unsigned long mutex_get_owner(struct mutex *lock) +{ + unsigned long owner = atomic_long_read(&lock->owner); + + return (unsigned long)__owner_task(owner); +} + /* * Returns: __mutex_owner(lock) on failure or NULL on success. */ @@ -180,6 +188,9 @@ static void __mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter, struct list_head *list) { +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER + WRITE_ONCE(current->blocker_mutex, lock); +#endif debug_mutex_add_waiter(lock, waiter, current); list_add_tail(&waiter->list, list); @@ -195,6 +206,9 @@ __mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *waiter) __mutex_clear_flag(lock, MUTEX_FLAGS); debug_mutex_remove_waiter(lock, waiter, current); +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER + WRITE_ONCE(current->blocker_mutex, NULL); +#endif } /* diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 35796c290ca3..1f5943f6be75 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1260,6 +1260,17 @@ config BOOTPARAM_HUNG_TASK_PANIC Say N if unsure. +config DETECT_HUNG_TASK_BLOCKER + bool "Dump Hung Tasks Blocker" + depends on DETECT_HUNG_TASK + depends on !PREEMPT_RT + default y + help + Say Y here to show the blocker task's stacktrace who acquires + the mutex lock which "hung tasks" are waiting. + This will add overhead a bit but shows suspicious tasks and + call trace if it comes from waiting a mutex. + config WQ_WATCHDOG bool "Detect Workqueue Stalls" depends on DEBUG_KERNEL -- cgit v1.2.3-59-g8ed1b From 2158599a4b6d735e1a57c8320723ca74c42dd7ae Mon Sep 17 00:00:00 2001 From: "Masami Hiramatsu (Google)" Date: Tue, 25 Feb 2025 16:02:43 +0900 Subject: samples: add hung_task detector mutex blocking sample Add a hung_task detector mutex blocking test sample code. This module will create a dummy file on the debugfs. That file will cause the read process to sleep for enough long time (256 seconds) while holding a mutex. As a result, the second process will wait on the mutex for a prolonged duration and be detected by the hung_task detector. Usage is; > cd /sys/kernel/debug/hung_task > cat mutex & cat mutex and wait for hung_task message. [akpm@linux-foundation.org: make `hung_task_dir' static] Closes: https://lore.kernel.org/oe-kbuild-all/202503180827.4StpuFrD-lkp@intel.com/ Link: https://lkml.kernel.org/r/174046696281.2194069.4567490148001547311.stgit@mhiramat.tok.corp.google.com Signed-off-by: Masami Hiramatsu (Google) Cc: Anna Schumaker Cc: Boqun Feng Cc: Ingo Molnar Cc: Joel Granados Cc: Kent Overstreet Cc: Lance Yang Cc: Sergey Senozhatsky Cc: Steven Rostedt Cc: Tomasz Figa Cc: Waiman Long Cc: Will Deacon Cc: Yongliang Gao Signed-off-by: Andrew Morton --- samples/Kconfig | 9 +++++ samples/Makefile | 1 + samples/hung_task/Makefile | 2 ++ samples/hung_task/hung_task_mutex.c | 66 +++++++++++++++++++++++++++++++++++++ 4 files changed, 78 insertions(+) create mode 100644 samples/hung_task/Makefile create mode 100644 samples/hung_task/hung_task_mutex.c diff --git a/samples/Kconfig b/samples/Kconfig index 820e00b2ed68..09011be2391a 100644 --- a/samples/Kconfig +++ b/samples/Kconfig @@ -300,6 +300,15 @@ config SAMPLE_CHECK_EXEC demonstrate how they should be used with execveat(2) + AT_EXECVE_CHECK. +config SAMPLE_HUNG_TASK + tristate "Hung task detector test code" + depends on DETECT_HUNG_TASK && DEBUG_FS + help + Build a module which provide a simple debugfs file. If user reads + the file, it will sleep long time (256 seconds) with holding a + mutex. Thus if there are 2 or more processes read this file, it + will be detected by the hung_task watchdog. + source "samples/rust/Kconfig" source "samples/damon/Kconfig" diff --git a/samples/Makefile b/samples/Makefile index f24cd0d72dd0..bf6e6fca5410 100644 --- a/samples/Makefile +++ b/samples/Makefile @@ -42,3 +42,4 @@ obj-$(CONFIG_SAMPLE_FPROBE) += fprobe/ obj-$(CONFIG_SAMPLES_RUST) += rust/ obj-$(CONFIG_SAMPLE_DAMON_WSSE) += damon/ obj-$(CONFIG_SAMPLE_DAMON_PRCL) += damon/ +obj-$(CONFIG_SAMPLE_HUNG_TASK) += hung_task/ diff --git a/samples/hung_task/Makefile b/samples/hung_task/Makefile new file mode 100644 index 000000000000..f4d6ab563488 --- /dev/null +++ b/samples/hung_task/Makefile @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0-only +obj-$(CONFIG_SAMPLE_HUNG_TASK) += hung_task_mutex.o diff --git a/samples/hung_task/hung_task_mutex.c b/samples/hung_task/hung_task_mutex.c new file mode 100644 index 000000000000..47ed38239ea3 --- /dev/null +++ b/samples/hung_task/hung_task_mutex.c @@ -0,0 +1,66 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * hung_task_mutex.c - Sample code which causes hung task by mutex + * + * Usage: load this module and read `/hung_task/mutex` + * by 2 or more processes. + * + * This is for testing kernel hung_task error message. + * Note that this will make your system freeze and maybe + * cause panic. So do not use this except for the test. + */ + +#include +#include +#include +#include +#include + +#define HUNG_TASK_DIR "hung_task" +#define HUNG_TASK_FILE "mutex" +#define SLEEP_SECOND 256 + +static const char dummy_string[] = "This is a dummy string."; +static DEFINE_MUTEX(dummy_mutex); +static struct dentry *hung_task_dir; + +static ssize_t read_dummy(struct file *file, char __user *user_buf, + size_t count, loff_t *ppos) +{ + /* If the second task waits on the lock, it is uninterruptible sleep. */ + guard(mutex)(&dummy_mutex); + + /* When the first task sleep here, it is interruptible. */ + msleep_interruptible(SLEEP_SECOND * 1000); + + return simple_read_from_buffer(user_buf, count, ppos, + dummy_string, sizeof(dummy_string)); +} + +static const struct file_operations hung_task_fops = { + .read = read_dummy, +}; + +static int __init hung_task_sample_init(void) +{ + hung_task_dir = debugfs_create_dir(HUNG_TASK_DIR, NULL); + if (IS_ERR(hung_task_dir)) + return PTR_ERR(hung_task_dir); + + debugfs_create_file(HUNG_TASK_FILE, 0400, hung_task_dir, + NULL, &hung_task_fops); + + return 0; +} + +static void __exit hung_task_sample_exit(void) +{ + debugfs_remove_recursive(hung_task_dir); +} + +module_init(hung_task_sample_init); +module_exit(hung_task_sample_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Masami Hiramatsu"); +MODULE_DESCRIPTION("Simple sleep under mutex file for testing hung task"); -- cgit v1.2.3-59-g8ed1b From 8ee065a6fdd285387d0eca5e1139ffb182a6e626 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 17 Mar 2025 20:11:10 +0200 Subject: resource: split DEFINE_RES_NAMED_DESC() out of DEFINE_RES_NAMED() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Patch series "resource: Split and use DEFINE_RES*() macros", v2. Replace open coded variants of DEFINE_RES*() macros. Note, there are many more possibilities over the kernel and even in reources.c, however the latter contains not so trivial leftovers. That's why the examples cover only straightforward conversions. This patch (of 4): In some cases it would be useful to supply predefined descriptor of the resource. For this, introduce DEFINE_RES_NAMED_DESC() macro. While at it, provide DEFINE_RES() that takes only start, size, and flags. Link: https://lkml.kernel.org/r/20250317181412.1560630-1-andriy.shevchenko@linux.intel.com Link: https://lkml.kernel.org/r/20250317181412.1560630-2-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko Cc: Ilpo Järvinen Signed-off-by: Andrew Morton --- include/linux/ioport.h | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/include/linux/ioport.h b/include/linux/ioport.h index 5385349f0b8a..e8b2d6aa4013 100644 --- a/include/linux/ioport.h +++ b/include/linux/ioport.h @@ -154,15 +154,20 @@ enum { }; /* helpers to define resources */ -#define DEFINE_RES_NAMED(_start, _size, _name, _flags) \ +#define DEFINE_RES_NAMED_DESC(_start, _size, _name, _flags, _desc) \ (struct resource) { \ .start = (_start), \ .end = (_start) + (_size) - 1, \ .name = (_name), \ .flags = (_flags), \ - .desc = IORES_DESC_NONE, \ + .desc = (_desc), \ } +#define DEFINE_RES_NAMED(_start, _size, _name, _flags) \ + DEFINE_RES_NAMED_DESC(_start, _size, _name, _flags, IORES_DESC_NONE) +#define DEFINE_RES(_start, _size, _flags) \ + DEFINE_RES_NAMED(_start, _size, NULL, _flags) + #define DEFINE_RES_IO_NAMED(_start, _size, _name) \ DEFINE_RES_NAMED((_start), (_size), (_name), IORESOURCE_IO) #define DEFINE_RES_IO(_start, _size) \ -- cgit v1.2.3-59-g8ed1b From 76709e0a3f3b04dfba588f7985c1f3b302092418 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 17 Mar 2025 20:11:11 +0200 Subject: resource: replace open coded variant of DEFINE_RES_NAMED_DESC() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replace open coded variant of DEFINE_RES_NAMED_DESC(). Link: https://lkml.kernel.org/r/20250317181412.1560630-3-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko Cc: Ilpo Järvinen Signed-off-by: Andrew Morton --- kernel/resource.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/kernel/resource.c b/kernel/resource.c index 12004452d999..3464d6a38424 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -1975,11 +1975,7 @@ get_free_mem_region(struct device *dev, struct resource *base, */ revoke_iomem(res); } else { - res->start = addr; - res->end = addr + size - 1; - res->name = name; - res->desc = desc; - res->flags = IORESOURCE_MEM; + *res = DEFINE_RES_NAMED_DESC(addr, size, name, IORESOURCE_MEM, desc); /* * Only succeed if the resource hosts an exclusive -- cgit v1.2.3-59-g8ed1b From 1af56ff09e676899c9a37468098f4a0d71fef848 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 17 Mar 2025 20:11:12 +0200 Subject: resource: replace open coded variants of DEFINE_RES_*_NAMED() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replace open coded variants of DEFINE_RES_*_NAMED(). Link: https://lkml.kernel.org/r/20250317181412.1560630-4-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko Cc: Ilpo Järvinen Signed-off-by: Andrew Morton --- kernel/resource.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/kernel/resource.c b/kernel/resource.c index 3464d6a38424..8f3652a41ea4 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -1714,18 +1714,13 @@ static int __init reserve_setup(char *str) * I/O port space; otherwise assume it's memory. */ if (io_start < 0x10000) { - res->flags = IORESOURCE_IO; + *res = DEFINE_RES_IO_NAMED(io_start, io_num, "reserved"); parent = &ioport_resource; } else { - res->flags = IORESOURCE_MEM; + *res = DEFINE_RES_MEM_NAMED(io_start, io_num, "reserved"); parent = &iomem_resource; } - res->name = "reserved"; - res->start = io_start; - res->end = io_start + io_num - 1; res->flags |= IORESOURCE_BUSY; - res->desc = IORES_DESC_NONE; - res->child = NULL; if (request_resource(parent, res) == 0) reserved = x+1; } -- cgit v1.2.3-59-g8ed1b From 48376a4fa6af8642ab5e19016c38b072d73772c1 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 17 Mar 2025 20:11:13 +0200 Subject: resource: replace open coded variant of DEFINE_RES() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replace open coded variant of DEFINE_RES(). No functional changes intended. Link: https://lkml.kernel.org/r/20250317181412.1560630-5-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko Cc: Ilpo Järvinen Signed-off-by: Andrew Morton --- kernel/resource.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/kernel/resource.c b/kernel/resource.c index 8f3652a41ea4..8d3e6ed0bdc1 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -561,8 +561,7 @@ static int __region_intersects(struct resource *parent, resource_size_t start, struct resource res, o; bool covered; - res.start = start; - res.end = start + size - 1; + res = DEFINE_RES(start, size, 0); for (p = parent->child; p ; p = p->sibling) { if (!resource_intersection(p, &res, &o)) -- cgit v1.2.3-59-g8ed1b From 81ca2970b770d967f64803eff6e7016a68802da3 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 17 Mar 2025 23:29:47 +0200 Subject: relay: use kasprintf() instead of fixed buffer formatting Improve readability and maintainability by replacing a hard coded string allocation and formatting by using the kasprintf() helper. It also eliminates the GCC compiler warning (with CONFIG_WERROR=y, which is default, it becomes an error: kernel/relay.c:357:42: error: `snprintf' output may be truncated before the last format character [-Werror=format-truncation=] Link: https://lkml.kernel.org/r/20250317212948.1811176-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko Signed-off-by: Andrew Morton --- kernel/relay.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/kernel/relay.c b/kernel/relay.c index a8ae436dc77e..5ac7e711e4b6 100644 --- a/kernel/relay.c +++ b/kernel/relay.c @@ -351,10 +351,9 @@ static struct dentry *relay_create_buf_file(struct rchan *chan, struct dentry *dentry; char *tmpname; - tmpname = kzalloc(NAME_MAX + 1, GFP_KERNEL); + tmpname = kasprintf(GFP_KERNEL, "%s%d", chan->base_filename, cpu); if (!tmpname) return NULL; - snprintf(tmpname, NAME_MAX, "%s%d", chan->base_filename, cpu); /* Create file in fs */ dentry = chan->cb->create_buf_file(tmpname, chan->parent, -- cgit v1.2.3-59-g8ed1b From 6287fbad1cd91f0c25cdc3a580499060828a8f30 Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Wed, 19 Mar 2025 14:02:22 -0700 Subject: fs/procfs: fix the comment above proc_pid_wchan() proc_pid_wchan() used to report kernel addresses to user space but that is no longer the case today. Bring the comment above proc_pid_wchan() in sync with the implementation. Link: https://lkml.kernel.org/r/20250319210222.1518771-1-bvanassche@acm.org Fixes: b2f73922d119 ("fs/proc, core/debug: Don't expose absolute kernel addresses via wchan") Signed-off-by: Bart Van Assche Cc: Kees Cook Cc: Eric W. Biederman Cc: Alexey Dobriyan Signed-off-by: Andrew Morton --- fs/proc/base.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/proc/base.c b/fs/proc/base.c index cd89e956c322..7feb8f41aa25 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -416,7 +416,7 @@ static const struct file_operations proc_pid_cmdline_ops = { #ifdef CONFIG_KALLSYMS /* * Provides a wchan file via kallsyms in a proper one-value-per-file format. - * Returns the resolved symbol. If that fails, simply return the address. + * Returns the resolved symbol to user space. */ static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task) -- cgit v1.2.3-59-g8ed1b From 434333dd3f66f9d1ad387dabd2a565182a823f31 Mon Sep 17 00:00:00 2001 From: Alexander Sverdlin Date: Wed, 19 Mar 2025 09:52:46 +0100 Subject: mailmap: consolidate email addresses of Alexander Sverdlin Alias all the addresses used in the past and currently to the single contact address. Link: https://lkml.kernel.org/r/20250319085251.3335678-1-alexander.sverdlin@siemens.com Signed-off-by: Alexander Sverdlin Signed-off-by: Andrew Morton --- .mailmap | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/.mailmap b/.mailmap index f88e5e28ce00..71074abd4699 100644 --- a/.mailmap +++ b/.mailmap @@ -31,6 +31,13 @@ Alexander Lobakin Alexander Lobakin Alexander Mikhalitsyn Alexander Mikhalitsyn +Alexander Sverdlin +Alexander Sverdlin +Alexander Sverdlin +Alexander Sverdlin +Alexander Sverdlin +Alexander Sverdlin +Alexander Sverdlin Alexandre Belloni Alexandre Ghiti Alexei Avshalom Lazar -- cgit v1.2.3-59-g8ed1b