aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/export-to-sqlite.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2025-01-27tools/power turbostat: Add fixed RAPL PSYS divisor for SPRPatryk Wlazlyn1-2/+9
Intel Sapphire Rapids is an exception and has fixed divisor for RAPL PSYS counter set to 1.0. Add a platform bit and enable it for SPR. Reported-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-01-14tools/power turbostat: Fix PMT mmaped file size roundingPatryk Wlazlyn1-1/+3
This (the old code) is just not how you round up to a page size. Noticed on a recent Intel platform. Previous ones must have been reporting sizes already aligned to a page and so the bug was missed when testing. Fixes: f0e4ed752fda ("tools/power turbostat: Add early support for PMT counters") Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-01-14tools/power turbostat: Remove SysWatt from DISABLED_BY_DEFAULTPatryk Wlazlyn2-3/+3
The counter is present on most supporting Intel platforms and provides useful data to the user. There is no reason to disable the counter by default. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-01-14tools/power turbostat: Add an NMI columnLen Brown1-6/+48
Add an NMI column, a proper sub-set of the IRQ column. It would be preferable if the kernel exported /sys/kernel/irq/NMI/per_cpu_count. But since we are already forced to parse /proc/interrupts, noticing which row is the NMI is simple enough. Suggested-by: Artem Bityutskiy <artem.bityutskiy@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-01-14tools/power turbostat: add Busy% to "show idle"Len Brown1-1/+1
Suggested-by: Artem Bityutskiy <artem.bityutskiy@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-12-03tools/power turbostat: Introduce --force parameterZhang Rui1-2/+15
Turbostat currently exits under the following conditions: 1. When running on non-Intel/AMD/Hygon x86 vendors. 2. When running on Intel models that lack specific platform features. Introduce a new `--force` parameter that allows turbostat to run on these unsupported platforms with minimal default feature support. This provides users with the flexibility to gather basic information even on unsupported systems. [lenb: updated warning message text] Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-12-03tools/power turbostat: Improve --help outputZhang Rui1-15/+26
Improve the `--help` output of turbostat by standardizing the format and enhancing readability. The following changes are made to ensure consistency and clarity in the help message: 1. Use a consistent pattern for each parameter's help message: - Display the parameter and its input (if any) on the same line, separated by a space. - Provide the detailed description on a separate line. 2. Ensure that the first character of each description is in lower-case. These changes make the help output more uniform and easier to read, helping users quickly understand the available options and their usage. No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-12-03tools/power turbostat: Exit on unsupported VendorsZhang Rui1-2/+6
Turbostat currently supports x86 processors from Intel, AMD, and Hygon. The behavior of turbostat on CPUs from other vendors has not been evaluated and may lead to incorrect or undefined behavior. Enhance turbostat to exit by default when running on an unsupported CPU vendor. This ensures that users are aware that their CPU is not currently supported by turbostat, guiding them to seek support for their specific hardware through future patches. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-12-03tools/power turbostat: Exit on unsupported Intel modelsZhang Rui1-0/+4
Turbostat requires per-platform enabling for Intel CPU models due to platform-specific features. When running on unsupported Intel CPU models, turbostat currently operates with limited default features, which can lead to users unknowingly using an outdated version of the tool. Enhance turbostat to exit by default when run on unsupported Intel CPU models, with a clear message to users, informing them that their CPU model is not supported and advising them to update to the latest version of turbostat for full functionality. [lenb: updated error message wording] Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-12-03tools/power turbostat: update turbostat(8)Len Brown1-1/+27
Clarify how to get the latest version. Signed-off-by: Len Brown <len.brown@intel.com>
2024-12-03tools/power turbostat: Add initial support for ClearwaterForestZhang Rui1-0/+1
Add initial support for ClearwaterForest. It shares the same features with SierraForest. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-12-03tools/power turbostat: Add initial support for PantherLakeZhang Rui1-0/+1
Add initial support for PantherLake. It shares the same features with Lunarlake. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: 2024.11.30Len Brown2-2/+2
since 2024.07.26: assorted minor bug fixes assorted platform specific tweaks initial RAPL PSYS (SysWatt) support Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Add RAPL psys as a built-in counterPatryk Wlazlyn2-10/+85
Introduce the counter as a part of global, platform counters structure. We open the counter for only one cpu, but otherwise treat it as an ordinary RAPL counter, allowing for grouped perf read. The counter is disabled by default, because it's interpretation may require additional, platform specific information, making it unsuitable for general use. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Fix child's argument forwardingPatryk Wlazlyn1-1/+1
Add '+' to optstring when early scanning for --no-msr and --no-perf. It causes option processing to stop as soon as a nonoption argument is encountered, effectively skipping child's arguments. Fixes: 3e4048466c39 ("tools/power turbostat: Add --no-msr option") Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Force --no-perf in --dump modePatryk Wlazlyn1-0/+6
Force the --no-perf early to prevent using it as a source. User asks for raw values, but perf returns them relative to the opening of the file descriptor. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Add support for /sys/class/drm/card1Zhang Rui1-9/+29
On some machines, the graphics device is enumerated as /sys/class/drm/card1 instead of /sys/class/drm/card0. The current implementation does not handle this scenario, resulting in the loss of graphics C6 residency and frequency information. Add support for /sys/class/drm/card1, ensuring that turbostat can retrieve and display the graphics columns for these platforms. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Cache graphics sysfs file descriptors during probeZhang Rui1-50/+32
Snapshots of the graphics sysfs knobs are taken based on file descriptors. To optimize this process, open the files and cache the file descriptors during the graphics probe phase. As a result, the previously cached pathnames become redundant and are removed. This change aims to streamline the code without altering its functionality. No functional change intended. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Consolidate graphics sysfs accessZhang Rui1-9/+6
Currently, there is an inconsistency in how graphics sysfs knobs are accessed: graphics residency sysfs knobs are opened and closed for each read, while graphics frequency sysfs knobs are opened once and remain open until turbostat exits. This inconsistency is confusing and adds unnecessary code complexity. Consolidate the access method by opening the sysfs files once and reusing the file pointers for subsequent accesses. This approach simplifies the code and ensures a consistent method for accessing graphics sysfs knobs. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Remove unnecessary fflush() callZhang Rui1-4/+3
The graphics sysfs knobs are read-only, making the use of fflush() before reading them redundant. Remove the unnecessary fflush() call. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Enhance platform divergence descriptionZhang Rui1-28/+30
In various generations, platforms often share a majority of features, diverging only in a few specific aspects. The current approach of using hardcoded values in 'platform_features' structure fails to effectively represent these divergences. To improve the description of platform divergence: 1. Each newly introduced 'platform_features' structure must have a base, typically derived from the previous generation. 2. Platform feature values should be inherited from the base structure rather than being hardcoded. This approach ensures a more accurate and maintainable representation of platform-specific features across different generations. Converts `adl_features` and `lnl_features` to follow this new scheme. No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Add initial support for GraniteRapids-DZhang Rui1-0/+1
Add initial support for GraniteRapids-D. It shares the same features with SapphireRapids. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Remove PC3 support on LunarlakeZhang Rui1-1/+1
Lunarlake supports CC1/CC6/CC7/PC2/PC6/PC10. Remove PC3 support on Lunarlake. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Rename arl_features to lnl_featuresZhang Rui1-2/+2
As ARL shares the same features with ADL/RPL/MTL, now 'arl_features' is used by Lunarlake platform only. Rename 'arl_features' to 'lnl_features'. No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Add back PC8 support on ArrowlakeZhang Rui1-3/+3
Similar to ADL/RPL/MTL, ARL supports CC1/CC6/CC7/PC2/PC3/PC6/PC8/PC10. Add back PC8 support on Arrowlake. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Remove PC7/PC9 support on MTLZhang Rui1-2/+2
Similar to ADL/RPL, MTL support CC1/CC6/CC7/PC2/PC3/PC6/PC8/CP10. Remove PC7/PC9 support on MTL. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Honor --show CPU, even when even when num_cpus=1Patryk Wlazlyn1-2/+2
Honor --show CPU and --show Core when "topo.num_cpus == 1". Previously turbostat assumed that on a 1-CPU system, these columns should never appear. Honoring these flags makes it easier for several programs that parse turbostat output. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Fix trailing '\n' parsingZhang Rui1-0/+3
parse_cpu_string() parses the string input either from command line or from /sys/fs/cgroup/cpuset.cpus.effective to get a list of CPUs that turbostat can run with. The cpu string returned by /sys/fs/cgroup/cpuset.cpus.effective contains a trailing '\n', but strtoul() fails to treat this as an error. That says, for the code below val = ("\n", NULL, 10); val returns 0, and errno is also not set. As a result, CPU0 is erroneously considered as allowed CPU and this causes failures when turbostat tries to run on CPU0. get_counters: Could not migrate to CPU 0 ... turbostat: re-initialized with num_cpus 8, allowed_cpus 5 get_counters: Could not migrate to CPU 0 Add a check to return immediately if '\n' or '\0' is detected. Fixes: 8c3dd2c9e542 ("tools/power/turbostat: Abstrct function for parsing cpu string") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Allow using cpu device in perf counters on hybrid platformsPatryk Wlazlyn2-7/+123
Intel hybrid platforms expose different perf devices for P and E cores. Instead of one, "/sys/bus/event_source/devices/cpu" device, there are "/sys/bus/event_source/devices/{cpu_core,cpu_atom}". This, however makes it more complicated for the user, because most of the counters are available on both and had to be handled manually. This patch allows users to use "virtual" cpu device that is seemingly translated to cpu_core and cpu_atom perf devices, depending on the type of a CPU we are opening the counter for. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Fix column printing for PMT xtal_time countersPatryk Wlazlyn1-3/+3
If the very first printed column was for a PMT counter of type xtal_time we would misalign the column header, because we were always printing the delimiter. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: fix GCC9 build regressionTodd Brandt1-9/+6
Fix build regression seen when using old gcc-9 compiler. Signed-off-by: Todd Brandt <todd.e.brandt@intel.com> Reviewed-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-17Linux 6.12Linus Torvalds1-1/+1
2024-11-16mm: revert "mm: shmem: fix data-race in shmem_getattr()"Andrew Morton1-2/+0
Revert d949d1d14fa2 ("mm: shmem: fix data-race in shmem_getattr()") as suggested by Chuck [1]. It is causing deadlocks when accessing tmpfs over NFS. As Hugh commented, "added just to silence a syzbot sanitizer splat: added where there has never been any practical problem". Link: https://lkml.kernel.org/r/ZzdxKF39VEmXSSyN@tissot.1015granger.net [1] Fixes: d949d1d14fa2 ("mm: shmem: fix data-race in shmem_getattr()") Acked-by: Hugh Dickins <hughd@google.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Jeongjun Park <aha310510@gmail.com> Cc: Yu Zhao <yuzhao@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-16Revert "drm/amd/pm: correct the workload setting"Alex Deucher12-84/+36
This reverts commit 74e1006430a5377228e49310f6d915628609929e. This causes a regression in the workload selection. A more extensive fix is being worked on. For now, revert. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3618 Fixes: 74e1006430a5 ("drm/amd/pm: correct the workload setting") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-14ocfs2: uncache inode which has failed entering the groupDmitry Antipov1-0/+2
Syzbot has reported the following BUG: kernel BUG at fs/ocfs2/uptodate.c:509! ... Call Trace: <TASK> ? __die_body+0x5f/0xb0 ? die+0x9e/0xc0 ? do_trap+0x15a/0x3a0 ? ocfs2_set_new_buffer_uptodate+0x145/0x160 ? do_error_trap+0x1dc/0x2c0 ? ocfs2_set_new_buffer_uptodate+0x145/0x160 ? __pfx_do_error_trap+0x10/0x10 ? handle_invalid_op+0x34/0x40 ? ocfs2_set_new_buffer_uptodate+0x145/0x160 ? exc_invalid_op+0x38/0x50 ? asm_exc_invalid_op+0x1a/0x20 ? ocfs2_set_new_buffer_uptodate+0x2e/0x160 ? ocfs2_set_new_buffer_uptodate+0x144/0x160 ? ocfs2_set_new_buffer_uptodate+0x145/0x160 ocfs2_group_add+0x39f/0x15a0 ? __pfx_ocfs2_group_add+0x10/0x10 ? __pfx_lock_acquire+0x10/0x10 ? mnt_get_write_access+0x68/0x2b0 ? __pfx_lock_release+0x10/0x10 ? rcu_read_lock_any_held+0xb7/0x160 ? __pfx_rcu_read_lock_any_held+0x10/0x10 ? smack_log+0x123/0x540 ? mnt_get_write_access+0x68/0x2b0 ? mnt_get_write_access+0x68/0x2b0 ? mnt_get_write_access+0x226/0x2b0 ocfs2_ioctl+0x65e/0x7d0 ? __pfx_ocfs2_ioctl+0x10/0x10 ? smack_file_ioctl+0x29e/0x3a0 ? __pfx_smack_file_ioctl+0x10/0x10 ? lockdep_hardirqs_on_prepare+0x43d/0x780 ? __pfx_lockdep_hardirqs_on_prepare+0x10/0x10 ? __pfx_ocfs2_ioctl+0x10/0x10 __se_sys_ioctl+0xfb/0x170 do_syscall_64+0xf3/0x230 entry_SYSCALL_64_after_hwframe+0x77/0x7f ... </TASK> When 'ioctl(OCFS2_IOC_GROUP_ADD, ...)' has failed for the particular inode in 'ocfs2_verify_group_and_input()', corresponding buffer head remains cached and subsequent call to the same 'ioctl()' for the same inode issues the BUG() in 'ocfs2_set_new_buffer_uptodate()' (trying to cache the same buffer head of that inode). Fix this by uncaching the buffer head with 'ocfs2_remove_from_cache()' on error path in 'ocfs2_group_add()'. Link: https://lkml.kernel.org/r/20241114043844.111847-1-dmantipov@yandex.ru Fixes: 7909f2bf8353 ("[PATCH 2/2] ocfs2: Implement group add for online resize") Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reported-by: syzbot+453873f1588c2d75b447@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=453873f1588c2d75b447 Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Dmitry Antipov <dmantipov@yandex.ru> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-14mm: fix NULL pointer dereference in alloc_pages_bulk_noprofJinjiang Tu1-1/+2
We triggered a NULL pointer dereference for ac.preferred_zoneref->zone in alloc_pages_bulk_noprof() when the task is migrated between cpusets. When cpuset is enabled, in prepare_alloc_pages(), ac->nodemask may be &current->mems_allowed. when first_zones_zonelist() is called to find preferred_zoneref, the ac->nodemask may be modified concurrently if the task is migrated between different cpusets. Assuming we have 2 NUMA Node, when traversing Node1 in ac->zonelist, the nodemask is 2, and when traversing Node2 in ac->zonelist, the nodemask is 1. As a result, the ac->preferred_zoneref points to NULL zone. In alloc_pages_bulk_noprof(), for_each_zone_zonelist_nodemask() finds a allowable zone and calls zonelist_node_idx(ac.preferred_zoneref), leading to NULL pointer dereference. __alloc_pages_noprof() fixes this issue by checking NULL pointer in commit ea57485af8f4 ("mm, page_alloc: fix check for NULL preferred_zone") and commit df76cee6bbeb ("mm, page_alloc: remove redundant checks from alloc fastpath"). To fix it, check NULL pointer for preferred_zoneref->zone. Link: https://lkml.kernel.org/r/20241113083235.166798-1-tujinjiang@huawei.com Fixes: 387ba26fb1cb ("mm/page_alloc: add a bulk page allocator") Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alexander Lobakin <alobakin@pm.me> Cc: David Hildenbrand <david@redhat.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Nanyong Sun <sunnanyong@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-14mm, doc: update read_ahead_kb for MADV_HUGEPAGEYafang Shao1-0/+3
MADV_HUGEPAGE is a new addition to readahead with behavior distinct from normal pages. To prevent confusion, we should update the documentation accordingly. Link: https://lkml.kernel.org/r/20241113150711.1685-1-laoar.shao@gmail.com Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-14fs/proc/task_mmu: prevent integer overflow in pagemap_scan_get_args()Dan Carpenter1-1/+3
The "arg->vec_len" variable is a u64 that comes from the user at the start of the function. The "arg->vec_len * sizeof(struct page_region))" multiplication can lead to integer wrapping. Use size_mul() to avoid that. Also the size_add/mul() functions work on unsigned long so for 32bit systems we need to ensure that "arg->vec_len" fits in an unsigned long. Link: https://lkml.kernel.org/r/39d41335-dd4d-48ed-8a7f-402c57d8ea84@stanley.mountain Fixes: 52526ca7fdb9 ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: Andrei Vagin <avagin@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michał Mirosław <mirq-linux@rere.qmqm.pl> Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-14sched/task_stack: fix object_is_on_stack() for KASAN tagged pointersQun-Wei Lin1-0/+2
When CONFIG_KASAN_SW_TAGS and CONFIG_KASAN_STACK are enabled, the object_is_on_stack() function may produce incorrect results due to the presence of tags in the obj pointer, while the stack pointer does not have tags. This discrepancy can lead to incorrect stack object detection and subsequently trigger warnings if CONFIG_DEBUG_OBJECTS is also enabled. Example of the warning: ODEBUG: object 3eff800082ea7bb0 is NOT on stack ffff800082ea0000, but annotated. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:557 __debug_object_init+0x330/0x364 Modules linked in: CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0-rc5 #4 Hardware name: linux,dummy-virt (DT) pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __debug_object_init+0x330/0x364 lr : __debug_object_init+0x330/0x364 sp : ffff800082ea7b40 x29: ffff800082ea7b40 x28: 98ff0000c0164518 x27: 98ff0000c0164534 x26: ffff800082d93ec8 x25: 0000000000000001 x24: 1cff0000c00172a0 x23: 0000000000000000 x22: ffff800082d93ed0 x21: ffff800081a24418 x20: 3eff800082ea7bb0 x19: efff800000000000 x18: 0000000000000000 x17: 00000000000000ff x16: 0000000000000047 x15: 206b63617473206e x14: 0000000000000018 x13: ffff800082ea7780 x12: 0ffff800082ea78e x11: 0ffff800082ea790 x10: 0ffff800082ea79d x9 : 34d77febe173e800 x8 : 34d77febe173e800 x7 : 0000000000000001 x6 : 0000000000000001 x5 : feff800082ea74b8 x4 : ffff800082870a90 x3 : ffff80008018d3c4 x2 : 0000000000000001 x1 : ffff800082858810 x0 : 0000000000000050 Call trace: __debug_object_init+0x330/0x364 debug_object_init_on_stack+0x30/0x3c schedule_hrtimeout_range_clock+0xac/0x26c schedule_hrtimeout+0x1c/0x30 wait_task_inactive+0x1d4/0x25c kthread_bind_mask+0x28/0x98 init_rescuer+0x1e8/0x280 workqueue_init+0x1a0/0x3cc kernel_init_freeable+0x118/0x200 kernel_init+0x28/0x1f0 ret_from_fork+0x10/0x20 ---[ end trace 0000000000000000 ]--- ODEBUG: object 3eff800082ea7bb0 is NOT on stack ffff800082ea0000, but annotated. ------------[ cut here ]------------ Link: https://lkml.kernel.org/r/20241113042544.19095-1-qun-wei.lin@mediatek.com Signed-off-by: Qun-Wei Lin <qun-wei.lin@mediatek.com> Cc: Andrew Yang <andrew.yang@mediatek.com> Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Casper Li <casper.li@mediatek.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-14crash, powerpc: default to CRASH_DUMP=n on PPC_BOOK3S_32Dave Vasilevsky10-1/+29
Fixes boot failures on 6.9 on PPC_BOOK3S_32 machines using Open Firmware. On these machines, the kernel refuses to boot from non-zero PHYSICAL_START, which occurs when CRASH_DUMP is on. Since most PPC_BOOK3S_32 machines boot via Open Firmware, it should default to off for them. Users booting via some other mechanism can still turn it on explicitly. Does not change the default on any other architectures for the time being. Link: https://lkml.kernel.org/r/20240917163720.1644584-1-dave@vasilevsky.ca Fixes: 75bc255a7444 ("crash: clean up kdump related config items") Signed-off-by: Dave Vasilevsky <dave@vasilevsky.ca> Reported-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de> Closes: https://lists.debian.org/debian-powerpc/2024/07/msg00001.html Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: Baoquan He <bhe@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Reimar Döffinger <Reimar.Doeffinger@gmx.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-14mm/mremap: fix address wraparound in move_page_tables()Jann Horn1-1/+1
On 32-bit platforms, it is possible for the expression `len + old_addr < old_end` to be false-positive if `len + old_addr` wraps around. `old_addr` is the cursor in the old range up to which page table entries have been moved; so if the operation succeeded, `old_addr` is the *end* of the old region, and adding `len` to it can wrap. The overflow causes mremap() to mistakenly believe that PTEs have been copied; the consequence is that mremap() bails out, but doesn't move the PTEs back before the new VMA is unmapped, causing anonymous pages in the region to be lost. So basically if userspace tries to mremap() a private-anon region and hits this bug, mremap() will return an error and the private-anon region's contents appear to have been zeroed. The idea of this check is that `old_end - len` is the original start address, and writing the check that way also makes it easier to read; so fix the check by rearranging the comparison accordingly. (An alternate fix would be to refactor this function by introducing an "orig_old_start" variable or such.) Tested in a VM with a 32-bit X86 kernel; without the patch: ``` user@horn:~/big_mremap$ cat test.c #define _GNU_SOURCE #include <stdlib.h> #include <stdio.h> #include <err.h> #include <sys/mman.h> #define ADDR1 ((void*)0x60000000) #define ADDR2 ((void*)0x10000000) #define SIZE 0x50000000uL int main(void) { unsigned char *p1 = mmap(ADDR1, SIZE, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED_NOREPLACE, -1, 0); if (p1 == MAP_FAILED) err(1, "mmap 1"); unsigned char *p2 = mmap(ADDR2, SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED_NOREPLACE, -1, 0); if (p2 == MAP_FAILED) err(1, "mmap 2"); *p1 = 0x41; printf("first char is 0x%02hhx\n", *p1); unsigned char *p3 = mremap(p1, SIZE, SIZE, MREMAP_MAYMOVE|MREMAP_FIXED, p2); if (p3 == MAP_FAILED) { printf("mremap() failed; first char is 0x%02hhx\n", *p1); } else { printf("mremap() succeeded; first char is 0x%02hhx\n", *p3); } } user@horn:~/big_mremap$ gcc -static -o test test.c user@horn:~/big_mremap$ setarch -R ./test first char is 0x41 mremap() failed; first char is 0x00 ``` With the patch: ``` user@horn:~/big_mremap$ setarch -R ./test first char is 0x41 mremap() succeeded; first char is 0x41 ``` Link: https://lkml.kernel.org/r/20241111-fix-mremap-32bit-wrap-v1-1-61d6be73b722@google.com Fixes: af8ca1c14906 ("mm/mremap: optimize the start addresses in move_page_tables()") Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-14tools/mm: fix compile errorMotiejus JakÅ`tys1-1/+1
Add a missing semicolon. Link: https://lkml.kernel.org/r/20241112171655.1662670-1-motiejus@jakstys.lt Fixes: ece5897e5a10 ("tools/mm: -Werror fixes in page-types/slabinfo") Signed-off-by: Motiejus JakÅ`tys <motiejus@jakstys.lt> Closes: https://github.com/NixOS/nixpkgs/issues/355369 Reviewed-by: SeongJae Park <sj@kernel.org> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name> Cc: Wladislav Wiebe <wladislav.kw@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-14mm, swap: fix allocation and scanning race with swapoffKairui Song1-3/+19
There are two flags used to synchronize allocation and scanning with swapoff: SWP_WRITEOK and SWP_SCANNING. SWP_WRITEOK: Swapoff will first unset this flag, at this point any further swap allocation or scanning on this device should just abort so no more new entries will be referencing this device. Swapoff will then unuse all existing swap entries. SWP_SCANNING: This flag is set when device is being scanned. Swapoff will wait for all scanner to stop before the final release of the swap device structures to avoid UAF. Note this flag is the highest used bit of si->flags so it could be added up arithmetically, if there are multiple scanner. commit 5f843a9a3a1e ("mm: swap: separate SSD allocation from scan_swap_map_slots()") ignored SWP_SCANNING and SWP_WRITEOK flags while separating cluster allocation path from the old allocation path. Add the flags back to fix swapoff race. The race is hard to trigger as si->lock prevents most parallel operations, but si->lock could be dropped for reclaim or discard. This issue is found during code review. This commit fixes this problem. For SWP_SCANNING, Just like before, set the flag before scan and remove it afterwards. For SWP_WRITEOK, there are several places where si->lock could be dropped, it will be error-prone and make the code hard to follow if we try to cover these places one by one. So just do one check before the real allocation, which is also very similar like before. With new cluster allocator it may waste a bit of time iterating the clusters but won't take long, and swapoff is not performance sensitive. Link: https://lkml.kernel.org/r/20241112083414.78174-1-ryncsn@gmail.com Fixes: 5f843a9a3a1e ("mm: swap: separate SSD allocation from scan_swap_map_slots()") Reported-by: "Huang, Ying" <ying.huang@intel.com> Closes: https://lore.kernel.org/linux-mm/87a5es3f1f.fsf@yhuang6-desk2.ccr.corp.intel.com/ Signed-off-by: Kairui Song <kasong@tencent.com> Cc: Barry Song <v-songbaohua@oppo.com> Cc: Chris Li <chrisl@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-14sched_ext: ops.cpu_acquire() should be called with SCX_KF_RESTTejun Heo1-1/+1
ops.cpu_acquire() is currently called with 0 kf_maks which is interpreted as SCX_KF_UNLOCKED which allows all unlocked kfuncs, but ops.cpu_acquire() is called from balance_one() under the rq lock and should only be allowed call kfuncs that are safe under the rq lock. Update it to use SCX_KF_REST. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: David Vernet <void@manifault.com> Cc: Zhao Mengmeng <zhaomzhao@126.com> Link: http://lkml.kernel.org/r/ZzYvf2L3rlmjuKzh@slm.duckdns.org Fixes: 245254f7081d ("sched_ext: Implement sched_ext_ops.cpu_acquire/release()")
2024-11-14tracing/ring-buffer: Clear all memory mapped CPU ring buffers on first recordingSteven Rostedt1-2/+26
The events of a memory mapped ring buffer from the previous boot should not be mixed in with events from the current boot. There's meta data that is used to handle KASLR so that function names can be shown properly. Also, since the timestamps of the previous boot have no meaning to the timestamps of the current boot, having them intermingled in a buffer can also cause confusion because there could possibly be events in the future. When a trace is activated the meta data is reset so that the pointers of are now processed for the new address space. The trace buffers are reset when tracing starts for the first time. The problem here is that the reset only happens on online CPUs. If a CPU is offline, it does not get reset. To demonstrate the issue, a previous boot had tracing enabled in the boot mapped ring buffer on reboot. On the following boot, tracing has not been started yet so the function trace from the previous boot is still visible. # trace-cmd show -B boot_mapped -c 3 | tail <idle>-0 [003] d.h2. 156.462395: __rcu_read_lock <-cpu_emergency_disable_virtualization <idle>-0 [003] d.h2. 156.462396: vmx_emergency_disable_virtualization_cpu <-cpu_emergency_disable_virtualization <idle>-0 [003] d.h2. 156.462396: __rcu_read_unlock <-__sysvec_reboot <idle>-0 [003] d.h2. 156.462397: stop_this_cpu <-__sysvec_reboot <idle>-0 [003] d.h2. 156.462397: set_cpu_online <-stop_this_cpu <idle>-0 [003] d.h2. 156.462397: disable_local_APIC <-stop_this_cpu <idle>-0 [003] d.h2. 156.462398: clear_local_APIC <-disable_local_APIC <idle>-0 [003] d.h2. 156.462574: mcheck_cpu_clear <-stop_this_cpu <idle>-0 [003] d.h2. 156.462575: mce_intel_feature_clear <-stop_this_cpu <idle>-0 [003] d.h2. 156.462575: lmce_supported <-mce_intel_feature_clear Now, if CPU 3 is taken offline, and tracing is started on the memory mapped ring buffer, the events from the previous boot in the CPU 3 ring buffer is not reset. Now those events are using the meta data from the current boot and produces just hex values. # echo 0 > /sys/devices/system/cpu/cpu3/online # trace-cmd start -B boot_mapped -p function # trace-cmd show -B boot_mapped -c 3 | tail <idle>-0 [003] d.h2. 156.462395: 0xffffffff9a1e3194 <-0xffffffff9a0f655e <idle>-0 [003] d.h2. 156.462396: 0xffffffff9a0a1d24 <-0xffffffff9a0f656f <idle>-0 [003] d.h2. 156.462396: 0xffffffff9a1e6bc4 <-0xffffffff9a0f7323 <idle>-0 [003] d.h2. 156.462397: 0xffffffff9a0d12b4 <-0xffffffff9a0f732a <idle>-0 [003] d.h2. 156.462397: 0xffffffff9a1458d4 <-0xffffffff9a0d12e2 <idle>-0 [003] d.h2. 156.462397: 0xffffffff9a0faed4 <-0xffffffff9a0d12e7 <idle>-0 [003] d.h2. 156.462398: 0xffffffff9a0faaf4 <-0xffffffff9a0faef2 <idle>-0 [003] d.h2. 156.462574: 0xffffffff9a0e3444 <-0xffffffff9a0d12ef <idle>-0 [003] d.h2. 156.462575: 0xffffffff9a0e4964 <-0xffffffff9a0d12ef <idle>-0 [003] d.h2. 156.462575: 0xffffffff9a0e3fb0 <-0xffffffff9a0e496f Reset all CPUs when starting a boot mapped ring buffer for the first time, and not just the online CPUs. Fixes: 7a1d1e4b9639f ("tracing/ring-buffer: Add last_boot_info file to boot instance") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-14btrfs: fix incorrect comparison for delayed refsJosef Bacik1-1/+1
When I reworked delayed ref comparison in cf4f04325b2b ("btrfs: move ->parent and ->ref_root into btrfs_delayed_ref_node"), I made a mistake and returned -1 for the case where ref1->ref_root was > than ref2->ref_root. This is a subtle bug that can result in improper delayed ref running order, which can result in transaction aborts. Fixes: cf4f04325b2b ("btrfs: move ->parent and ->ref_root into btrfs_delayed_ref_node") CC: stable@vger.kernel.org # 6.10+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-14Revert: "ring-buffer: Do not have boot mapped buffers hook to CPU hotplug"Steven Rostedt1-6/+3
A crash happened when testing cpu hotplug with respect to the memory mapped ring buffers. It was assumed that the hot plug code was adding a per CPU buffer that was already created that caused the crash. The real problem was due to ref counting and was fixed by commit 2cf9733891a4 ("ring-buffer: Fix refcount setting of boot mapped buffers"). When a per CPU buffer is created, it will not be created again even with CPU hotplug, so the fix to not use CPU hotplug was a red herring. In fact, it caused only the boot CPU buffer to be created, leaving the other CPU per CPU buffers disabled. Revert that change as it was not the culprit of the fix it was intended to be. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20241113230839.6c03640f@gandalf.local.home Fixes: 912da2c384d5 ("ring-buffer: Do not have boot mapped buffers hook to CPU hotplug") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-14net: sched: u32: Add test case for systematic hnode IDR leaksAlexandre Ferrieux1-0/+24
Add a tdc test case to exercise the just-fixed systematic leak of IDR entries in u32 hnode disposal. Given the IDR in question is confined to the range [1..0x7FF], it is sufficient to create/delete the same filter 2048 times to fill it up and get a nonzero exit status from "tc filter add". Signed-off-by: Alexandre Ferrieux <alexandre.ferrieux@orange.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Victor Nogueira <victor@mojatatu.com> Link: https://patch.msgid.link/20241113100428.360460-1-alexandre.ferrieux@orange.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-14drm/bridge: tc358768: Fix DSI command txFrancesco Dolcini1-2/+19
Wait for the command transmission to be completed in the DSI transfer function polling for the dc_start bit to go back to idle state after the transmission is started. This is documented in the datasheet and failures to do so lead to commands corruption. Fixes: ff1ca6397b1d ("drm/bridge: Add tc358768 driver") Cc: stable@vger.kernel.org Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20240926141246.48282-1-francesco@dolcini.it Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240926141246.48282-1-francesco@dolcini.it
2024-11-14selftests: bonding: add ns multicast group testingHangbin Liu1-1/+53
Add a test to make sure the backup slaves join correct multicast group when arp_validate enabled and ns_ip6_target is set. Here is the result: TEST: arp_validate (active-backup ns_ip6_target arp_validate 0) [ OK ] TEST: arp_validate (join mcast group) [ OK ] TEST: arp_validate (active-backup ns_ip6_target arp_validate 1) [ OK ] TEST: arp_validate (join mcast group) [ OK ] TEST: arp_validate (active-backup ns_ip6_target arp_validate 2) [ OK ] TEST: arp_validate (join mcast group) [ OK ] TEST: arp_validate (active-backup ns_ip6_target arp_validate 3) [ OK ] TEST: arp_validate (join mcast group) [ OK ] TEST: arp_validate (active-backup ns_ip6_target arp_validate 4) [ OK ] TEST: arp_validate (join mcast group) [ OK ] TEST: arp_validate (active-backup ns_ip6_target arp_validate 5) [ OK ] TEST: arp_validate (join mcast group) [ OK ] TEST: arp_validate (active-backup ns_ip6_target arp_validate 6) [ OK ] TEST: arp_validate (join mcast group) [ OK ] Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>