path: root/tools/perf/scripts/python/exported-sql-viewer.py
Age | Commit message | Author | Files | Lines
2025-01-13 | f2fs: add parameter @len to f2fs_invalidate_blocks() | Yi Sun | 5 | -16/+35
New function can process some consecutive blocks at a time. Function f2fs_invalidate_blocks()->down_write() and up_write() are very time-consuming, so if f2fs_invalidate_blocks() can process consecutive blocks at one time, it will save a lot of time. Signed-off-by: Yi Sun <yi.sun@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-08 | f2fs: update_sit_entry_for_release() supports consecutive blocks. | Yi Sun | 1 | -30/+45
This function can process some consecutive blocks at a time. When using update_sit_entry() to release consecutive blocks, ensure that the consecutive blocks belong to the same segment, because @segno is still in use in update_sit_entry() after update_sit_entry_for_release() returns. Signed-off-by: Yi Sun <yi.sun@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
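The same-segment requirement can be illustrated with a small sketch. It assumes a simplified layout in which a block's segment is just `blkaddr / BLOCKS_PER_SEG` (real F2FS first subtracts the start of the main area); `BLOCKS_PER_SEG`, `struct blk_range`, and `next_same_segment_chunk()` are hypothetical names chosen for illustration, not F2FS code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical segment size; F2FS derives the real value from the superblock. */
#define BLOCKS_PER_SEG 512u

struct blk_range {
	uint32_t start;	/* first block address of the chunk */
	uint32_t len;	/* number of consecutive blocks in the chunk */
};

/*
 * Return the longest prefix of [blkaddr, blkaddr + len) that stays inside
 * a single segment, so that one release call never crosses a boundary.
 */
struct blk_range next_same_segment_chunk(uint32_t blkaddr, uint32_t len)
{
	/* Blocks left in the segment that contains blkaddr. */
	uint32_t room = BLOCKS_PER_SEG - (blkaddr % BLOCKS_PER_SEG);
	struct blk_range chunk = { blkaddr, len < room ? len : room };

	assert(len > 0);	/* callers must pass a non-empty range */
	return chunk;
}
```

A caller would loop, handing each chunk to the release path and advancing `blkaddr` by `chunk.len` until the whole range is consumed.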
2025-01-08 | f2fs: introduce update_sit_entry_for_release/alloc() | Yi Sun | 1 | -69/+93
No logical changes, just for cleanliness. Signed-off-by: Yi Sun <yi.sun@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-08 | f2fs: don't call block truncation for aliased file | Jaegeuk Kim | 1 | -2/+3
This patch should avoid the below warning which does not corrupt the metadata tho. [ 51.508120][ T253] F2FS-fs (dm-59): access invalid blkaddr:36 [ 51.508156][ T253] __f2fs_is_valid_blkaddr+0x330/0x384 [ 51.508162][ T253] f2fs_is_valid_blkaddr_raw+0x10/0x24 [ 51.508163][ T253] f2fs_truncate_data_blocks_range+0x1ec/0x438 [ 51.508177][ T253] f2fs_remove_inode_page+0x8c/0x148 [ 51.508194][ T253] f2fs_evict_inode+0x230/0x76c Fixes: 128d333f0dff ("f2fs: introduce device aliasing file") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-08 | f2fs: Introduce linear search for dentries | Daniel Lee | 3 | -19/+45
This patch addresses an issue where some files in case-insensitive directories become inaccessible due to changes in how the kernel function, utf8_casefold(), generates case-folded strings from the commit 5c26d2f1d3f5 ("unicode: Don't special case ignorable code points"). F2FS uses these case-folded names to calculate hash values for locating dentries and stores them on disk. Since utf8_casefold() can produce different output across kernel versions, stored hash values and newly calculated hash values may differ. This results in affected files no longer being found via the hash-based lookup. To resolve this, the patch introduces a linear search fallback. If the initial hash-based search fails, F2FS will sequentially scan the directory entries. Fixes: 5c26d2f1d3f5 ("unicode: Don't special case ignorable code points") Link: https://bugzilla.kernel.org/show_bug.cgi?id=219586 Signed-off-by: Daniel Lee <chullee@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
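The fallback described above can be sketched in user space as follows. This is an assumption-laden illustration, not the F2FS implementation: `struct dentry_slot`, `lookup_by_hash()`, and `lookup()` are hypothetical, the "hash lookup" is a simple filter rather than a bucketed search, and names are compared with `strcmp()` where the kernel would compare case-folded names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct dentry_slot {
	unsigned int hash;	/* hash stored on disk when the entry was created */
	const char *name;	/* stored file name */
};

/* Fast path: only entries whose stored hash matches are considered. */
static const struct dentry_slot *
lookup_by_hash(const struct dentry_slot *dir, size_t n,
	       unsigned int hash, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (dir[i].hash == hash && strcmp(dir[i].name, name) == 0)
			return &dir[i];
	return NULL;
}

/* Try the hash-based lookup first; on a miss, scan every entry by name. */
const struct dentry_slot *
lookup(const struct dentry_slot *dir, size_t n,
       unsigned int hash, const char *name)
{
	const struct dentry_slot *found;

	assert(dir != NULL && name != NULL);
	found = lookup_by_hash(dir, n, hash, name);
	if (found)
		return found;

	/* Fallback: the stored hash may come from an older casefold result. */
	for (size_t i = 0; i < n; i++)
		if (strcmp(dir[i].name, name) == 0)
			return &dir[i];
	return NULL;
}
```

The trade-off is that a genuine miss now costs a full directory scan, which is why the linear pass runs only after the hash-based search has failed.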
2025-01-08 | f2fs: add parameter @len to f2fs_invalidate_internal_cache() | Yi Sun | 4 | -8/+8
New function can process some consecutive blocks at a time. Signed-off-by: Yi Sun <yi.sun@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-08 | f2fs: expand f2fs_invalidate_compress_page() to f2fs_invalidate_compress_pages_range() | Yi Sun | 2 | -6/+8
New function f2fs_invalidate_compress_pages_range() adds the @len parameter. So it can process some consecutive blocks at a time. Signed-off-by: Yi Sun <yi.sun@unisoc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 | f2fs: ensure that node info flags are always initialized | Dmitry Antipov | 1 | -0/+1
Syzbot has reported the following KMSAN splat: BUG: KMSAN: uninit-value in f2fs_new_node_page+0x1494/0x1630 f2fs_new_node_page+0x1494/0x1630 f2fs_new_inode_page+0xb9/0x100 f2fs_init_inode_metadata+0x176/0x1e90 f2fs_add_inline_entry+0x723/0xc90 f2fs_do_add_link+0x48f/0xa70 f2fs_symlink+0x6af/0xfc0 vfs_symlink+0x1f1/0x470 do_symlinkat+0x471/0xbc0 __x64_sys_symlink+0xcf/0x140 x64_sys_call+0x2fcc/0x3d90 do_syscall_64+0xd9/0x1b0 entry_SYSCALL_64_after_hwframe+0x77/0x7f Local variable new_ni created at: f2fs_new_node_page+0x9d/0x1630 f2fs_new_inode_page+0xb9/0x100 So adjust 'f2fs_get_node_info()' to ensure that 'flag' field of 'struct node_info' is always initialized. Reported-by: syzbot+5141f6db57a2f7614352@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=5141f6db57a2f7614352 Fixes: e05df3b115e7 ("f2fs: add node operations") Suggested-by: Chao Yu <chao@kernel.org> Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 | f2fs: The GC triggered by ioctl also needs to mark the segno as victim | Yongpeng Yang | 1 | -4/+7
In SSR mode, the segment selected for allocation might be the same as the target segment of the GC triggered by ioctl, resulting in the GC moving the CURSEG_I(sbi, type)->segno.

Thread A                                        Thread B or Thread A
- f2fs_ioc_gc_range
 - __f2fs_ioc_gc_range(.victim_segno=segno#N)
  - f2fs_gc
   - __get_victim
    - f2fs_get_victim
    : segno#N is valid, return segno#N as
      source segment of GC
                                                - f2fs_allocate_data_block
                                                 - need_new_seg
                                                 - get_ssr_segment
                                                  - f2fs_get_victim
                                                  : get segno #N as destination segment
                                                 - change_curseg

Fixes: e066b83c9b40 ("f2fs: add ioctl to flush data from faster device to cold area") Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 | f2fs: cache more dentry pages | zangyangyang1 | 1 | -1/+1
While traversing dir entries in a dentry page, it's better to refresh the currently accessed page in the LRU list by using the FGP_ACCESSED flag; otherwise, such a page may have less chance to survive memory reclaim, resulting in additional IO when the dentry page is revisited. Signed-off-by: zangyangyang1 <zangyangyang1@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 | f2fs: Remove calls to folio_file_mapping() | Matthew Wilcox (Oracle) | 3 | -7/+6
All folios that f2fs sees belong to f2fs and not to the swapcache so it can dereference folio->mapping directly like all other filesystems do. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 | f2fs: Convert __read_io_type() to take a folio | Matthew Wilcox (Oracle) | 1 | -4/+4
Remove the last call to page_file_mapping() as both callers can now pass in a folio. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 | f2fs: Use a data folio in f2fs_submit_page_bio() | Matthew Wilcox (Oracle) | 1 | -9/+5
Remove a call to compound_head(). We can call bio_add_folio_nofail() here because we just allocated the bio, so we know it can't fail and thus the error path can never be taken. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 | f2fs: Use a folio more in f2fs_submit_page_bio() | Matthew Wilcox (Oracle) | 1 | -4/+4
Cache the result of page_folio(fio->page) in a local variable so we don't have to keep calling it. Saves a couple of calls to compound_head() and removes an access to page->mapping. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 | f2fs: Convert f2fs_finish_read_bio() to use folios | Matthew Wilcox (Oracle) | 1 | -13/+8
Use bio_for_each_folio_all() to iterate over each folio in the bio. This lets us use folio_end_read() which saves an atomic operation and memory barrier compared to marking the folio uptodate and unlocking it as two separate operations. This also removes a few hidden calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 | f2fs: Add F2FS_F_SB() | Matthew Wilcox (Oracle) | 1 | -1/+6
This is the folio equivalent of F2FS_P_SB(). Removes a call to page_file_mapping() as we know folios seen by f2fs are never part of the swap cache. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 | f2fs: Convert submit tracepoints to take a folio | Matthew Wilcox (Oracle) | 2 | -18/+18
Remove accesses to page->index and page->mapping as well as unnecessary calls to page_file_mapping(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 | f2fs: Use a folio in f2fs_write_compressed_pages() | Matthew Wilcox (Oracle) | 1 | -3/+5
Remove accesses to page->index and an unnecessary reference to page->mapping. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 | f2fs: Use a folio in f2fs_truncate_partial_cluster() | Matthew Wilcox (Oracle) | 1 | -4/+5
Convert the incoming page to a folio and use it throughout. Removes an access to page->index. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 | f2fs: Use a folio in f2fs_compress_write_end() | Matthew Wilcox (Oracle) | 1 | -1/+2
This removes an access of page->index. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 | f2fs: Use a folio in f2fs_all_cluster_page_ready() | Matthew Wilcox (Oracle) | 1 | -3/+5
Remove references to page->index and use folio_test_uptodate() instead of PageUptodate(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-01 | Linux 6.13-rc1 | Linus Torvalds | 1 | -2/+2
2024-12-01 | strscpy: write destination buffer only once | Linus Torvalds | 1 | -6/+17
The point behind strscpy() was to once and for all avoid all the problems with 'strncpy()' and later broken "fixed" versions like strlcpy() that just made things worse. So strscpy not only guarantees NUL-termination (unlike strncpy), it also doesn't do unnecessary padding at the destination. But at the same time also avoids byte-at-a-time reads and writes by _allowing_ some extra NUL writes - within the size, of course - so that the whole copy can be done with word operations. It is also stable in the face of a mutable source string: it explicitly does not read the source buffer multiple times (so an implementation using "strnlen()+memcpy()" would be wrong), and does not read the source buffer past the size (like the mis-design that is strlcpy does). Finally, the return value is designed to be simple and unambiguous: if the string cannot be copied fully, it returns an actual negative error, making error handling clearer and simpler (and the caller already knows the size of the buffer). Otherwise it returns the string length of the result. However, there was one final stability issue that can be important to callers: the stability of the destination buffer. In particular, the same way we shouldn't read the source buffer more than once, we should avoid doing multiple writes to the destination buffer: first writing a potentially non-terminated string, and then terminating it with NUL at the end does not result in a stable result buffer. Yes, it gives the right result in the end, but if the rule for the destination buffer was that it is _always_ NUL-terminated even when accessed concurrently with updates, the final byte of the buffer needs to always _stay_ as a NUL byte. [ Note that "final byte is NUL" here is literally about the final byte in the destination array, not the terminating NUL at the end of the string itself. 
There is no attempt to try to make concurrent reads and writes give any kind of consistent string length or contents, but we do want to guarantee that there is always at least that final terminating NUL character at the end of the destination array if it existed before ] This is relevant in the kernel for the tsk->comm[] array, for example. Even without locking (for either readers or writers), we want to know that while the buffer contents may be garbled, it is always a valid C string and always has a NUL character at 'comm[TASK_COMM_LEN-1]' (and never has any "out of thin air" data). So avoid any "copy possibly non-terminated string, and terminate later" behavior, and write the destination buffer only once. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
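The "write the destination only once" rule can be shown with a byte-at-a-time sketch. This is not the kernel's strscpy(), which copies a word at a time; `copy_write_once()` is a hypothetical helper that only mirrors the properties listed in the message: each source byte is read once, nothing is read past `size`, each destination byte is stored at most once, and the final byte of the array only ever receives a NUL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Copy src into dst (an array of `size` bytes).
 * Returns the copied string length, or -E2BIG if the string was truncated.
 */
ssize_t copy_write_once(char *dst, const char *src, size_t size)
{
	size_t i;

	assert(dst != NULL && src != NULL);
	if (size == 0)
		return -E2BIG;

	for (i = 0; i + 1 < size; i++) {
		char c = src[i];	/* each source byte is read exactly once */

		dst[i] = c;		/* each destination byte is stored at most once */
		if (c == '\0')
			return (ssize_t)i;
	}

	/* The last byte of the array is never written with anything but NUL. */
	dst[size - 1] = '\0';
	return src[size - 1] == '\0' ? (ssize_t)(size - 1) : -E2BIG;
}
```

If the destination already ended in a NUL before the call, a concurrent reader sees a NUL there at every point during the copy, because the function never stores a non-NUL value at `dst[size - 1]`.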
2024-11-30 | printf: Remove unused 'bprintf' | Dr. David Alan Gilbert | 2 | -24/+0
bprintf() is unused. Remove it. It was added in the commit 4370aa4aa753 ("vsprintf: add binary printf") but as far as I can see was never used, unlike the other two functions in that patch. Link: https://lore.kernel.org/20241002173147.210107-1-linux@treblig.org Reviewed-by: Andy Shevchenko <andy@kernel.org> Acked-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-30 | tools/power turbostat: 2024.11.30 | Len Brown | 2 | -2/+2
since 2024.07.26:
    assorted minor bug fixes
    assorted platform specific tweaks
    initial RAPL PSYS (SysWatt) support
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 | tools/power turbostat: Add RAPL psys as a built-in counter | Patryk Wlazlyn | 2 | -10/+85
Introduce the counter as a part of the global, platform counters structure. We open the counter for only one cpu, but otherwise treat it as an ordinary RAPL counter, allowing for grouped perf read. The counter is disabled by default, because its interpretation may require additional, platform specific information, making it unsuitable for general use. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 | tools/power turbostat: Fix child's argument forwarding | Patryk Wlazlyn | 1 | -1/+1
Add '+' to optstring when early scanning for --no-msr and --no-perf. It causes option processing to stop as soon as a nonoption argument is encountered, effectively skipping child's arguments. Fixes: 3e4048466c39 ("tools/power turbostat: Add --no-msr option") Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
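The effect of the leading '+' can be sketched with GNU getopt_long(). This is not the turbostat source: `early_scan_no_msr()` and the short code 'M' are hypothetical, and resetting `optind` to 0 to force a fresh scan is a glibc convention that other C libraries may handle differently.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Return 1 if --no-msr appears before the first non-option argument. */
int early_scan_no_msr(int argc, char **argv)
{
	static const struct option long_opts[] = {
		{ "no-msr", no_argument, NULL, 'M' },
		{ NULL, 0, NULL, 0 },
	};
	int opt;
	int found = 0;

	assert(argc >= 1 && argv != NULL);
	optind = 0;	/* glibc: reinitialize the scanner for this argv */
	opterr = 0;	/* stay quiet about options this early pass ignores */

	/* The leading '+' stops scanning at the first non-option argument. */
	while ((opt = getopt_long(argc, argv, "+", long_opts, NULL)) != -1)
		if (opt == 'M')
			found = 1;
	return found;
}
```

With `turbostat --no-msr sleep 1` the option is seen, while with `turbostat sleep --no-msr` the scan stops at `sleep`, so the flag is left for the child command. Without the '+', glibc's default behavior is to permute argv and keep scanning past `sleep`.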
2024-11-30 | tools/power turbostat: Force --no-perf in --dump mode | Patryk Wlazlyn | 1 | -0/+6
Force the --no-perf early to prevent using it as a source. User asks for raw values, but perf returns them relative to the opening of the file descriptor. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 | tools/power turbostat: Add support for /sys/class/drm/card1 | Zhang Rui | 1 | -9/+29
On some machines, the graphics device is enumerated as /sys/class/drm/card1 instead of /sys/class/drm/card0. The current implementation does not handle this scenario, resulting in the loss of graphics C6 residency and frequency information. Add support for /sys/class/drm/card1, ensuring that turbostat can retrieve and display the graphics columns for these platforms. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 | tools/power turbostat: Cache graphics sysfs file descriptors during probe | Zhang Rui | 1 | -50/+32
Snapshots of the graphics sysfs knobs are taken based on file descriptors. To optimize this process, open the files and cache the file descriptors during the graphics probe phase. As a result, the previously cached pathnames become redundant and are removed. This change aims to streamline the code without altering its functionality. No functional change intended. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 | tools/power turbostat: Consolidate graphics sysfs access | Zhang Rui | 1 | -9/+6
Currently, there is an inconsistency in how graphics sysfs knobs are accessed: graphics residency sysfs knobs are opened and closed for each read, while graphics frequency sysfs knobs are opened once and remain open until turbostat exits. This inconsistency is confusing and adds unnecessary code complexity. Consolidate the access method by opening the sysfs files once and reusing the file pointers for subsequent accesses. This approach simplifies the code and ensures a consistent method for accessing graphics sysfs knobs. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 | tools/power turbostat: Remove unnecessary fflush() call | Zhang Rui | 1 | -4/+3
The graphics sysfs knobs are read-only, making the use of fflush() before reading them redundant. Remove the unnecessary fflush() call. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 | tools/power turbostat: Enhance platform divergence description | Zhang Rui | 1 | -28/+30
In various generations, platforms often share a majority of features, diverging only in a few specific aspects. The current approach of using hardcoded values in the 'platform_features' structure fails to effectively represent these divergences. To improve the description of platform divergence:
1. Each newly introduced 'platform_features' structure must have a base, typically derived from the previous generation.
2. Platform feature values should be inherited from the base structure rather than being hardcoded.
This approach ensures a more accurate and maintainable representation of platform-specific features across different generations. Convert `adl_features` and `lnl_features` to follow this new scheme. No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 | tools/power turbostat: Add initial support for GraniteRapids-D | Zhang Rui | 1 | -0/+1
Add initial support for GraniteRapids-D. It shares the same features with SapphireRapids. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 | tools/power turbostat: Remove PC3 support on Lunarlake | Zhang Rui | 1 | -1/+1
Lunarlake supports CC1/CC6/CC7/PC2/PC6/PC10. Remove PC3 support on Lunarlake. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 | tools/power turbostat: Rename arl_features to lnl_features | Zhang Rui | 1 | -2/+2
As ARL shares the same features with ADL/RPL/MTL, now 'arl_features' is used by Lunarlake platform only. Rename 'arl_features' to 'lnl_features'. No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 | tools/power turbostat: Add back PC8 support on Arrowlake | Zhang Rui | 1 | -3/+3
Similar to ADL/RPL/MTL, ARL supports CC1/CC6/CC7/PC2/PC3/PC6/PC8/PC10. Add back PC8 support on Arrowlake. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 | tools/power turbostat: Remove PC7/PC9 support on MTL | Zhang Rui | 1 | -2/+2
Similar to ADL/RPL, MTL supports CC1/CC6/CC7/PC2/PC3/PC6/PC8/PC10. Remove PC7/PC9 support on MTL. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 | tools/power turbostat: Honor --show CPU, even when even when num_cpus=1 | Patryk Wlazlyn | 1 | -2/+2
Honor --show CPU and --show Core when "topo.num_cpus == 1". Previously turbostat assumed that on a 1-CPU system, these columns should never appear. Honoring these flags makes it easier for several programs that parse turbostat output. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 | tools/power turbostat: Fix trailing '\n' parsing | Zhang Rui | 1 | -0/+3
parse_cpu_string() parses the string input either from command line or from /sys/fs/cgroup/cpuset.cpus.effective to get a list of CPUs that turbostat can run with. The cpu string returned by /sys/fs/cgroup/cpuset.cpus.effective contains a trailing '\n', but strtoul() fails to treat this as an error. That is, for the code below val = strtoul("\n", NULL, 10); val is 0, and errno is also not set. As a result, CPU0 is erroneously considered an allowed CPU and this causes failures when turbostat tries to run on CPU0. get_counters: Could not migrate to CPU 0 ... turbostat: re-initialized with num_cpus 8, allowed_cpus 5 get_counters: Could not migrate to CPU 0 Add a check to return immediately if '\n' or '\0' is detected. Fixes: 8c3dd2c9e542 ("tools/power/turbostat: Abstrct function for parsing cpu string") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
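The pitfall is easy to reproduce in isolation: strtoul() skips leading whitespace, finds no digits in "\n", and returns 0. A minimal sketch of the kind of guard the message describes follows; `parse_cpu_number()` is a hypothetical helper, not the turbostat function, and the extra end-pointer check is an addition of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Parse one CPU number at the start of s.
 * Returns 0 and stores the value in *cpu, or -1 if no number starts there.
 */
int parse_cpu_number(const char *s, unsigned long *cpu)
{
	char *end;
	unsigned long val;

	assert(s != NULL && cpu != NULL);

	/* A bare newline or empty string must not be read as "CPU 0". */
	if (*s == '\n' || *s == '\0')
		return -1;

	val = strtoul(s, &end, 10);
	if (end == s)		/* no digits were consumed */
		return -1;

	*cpu = val;
	return 0;
}
```

Checking the end pointer catches other digit-free inputs as well, since strtoul() leaves `end` equal to `s` whenever no conversion is performed.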
2024-11-30 | tools/power turbostat: Allow using cpu device in perf counters on hybrid platforms | Patryk Wlazlyn | 2 | -7/+123
Intel hybrid platforms expose different perf devices for P and E cores. Instead of a single "/sys/bus/event_source/devices/cpu" device, there are "/sys/bus/event_source/devices/{cpu_core,cpu_atom}". This, however, makes things more complicated for the user, because most of the counters are available on both devices and have to be handled manually. This patch allows users to use a "virtual" cpu device that is seamlessly translated to the cpu_core and cpu_atom perf devices, depending on the type of CPU we are opening the counter for. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 | tools/power turbostat: Fix column printing for PMT xtal_time counters | Patryk Wlazlyn | 1 | -3/+3
If the very first printed column was for a PMT counter of type xtal_time we would misalign the column header, because we were always printing the delimiter. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 | tools/power turbostat: fix GCC9 build regression | Todd Brandt | 1 | -9/+6
Fix build regression seen when using old gcc-9 compiler. Signed-off-by: Todd Brandt <todd.e.brandt@intel.com> Reviewed-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 | PCI/pwrctrl: Unregister platform device only if one actually exists | Brian Norris | 1 | -2/+7
If a PCI device has an associated device_node with power supplies, pci_bus_add_device() creates platform devices for use by pwrctrl. When the PCI device is removed, pci_stop_dev() uses of_find_device_by_node() to locate the related platform device, then unregisters it. But when we remove a PCI device with no associated device node, dev_of_node(dev) is NULL, and of_find_device_by_node(NULL) returns the first device with "dev->of_node == NULL". The result is that we (a) mistakenly unregister a completely unrelated platform device, leading to issues like the first trace below, and (b) dereference the NULL pointer from dev_of_node() when clearing OF_POPULATED, as in the second trace. Unregister a platform device only if there is one associated with this PCI device. This resolves issues seen when doing: # echo 1 > /sys/bus/pci/devices/.../remove Sample issue from unregistering the wrong platform device: WARNING: CPU: 0 PID: 5095 at drivers/regulator/core.c:5885 regulator_unregister+0x140/0x160 Call trace: regulator_unregister+0x140/0x160 devm_rdev_release+0x1c/0x30 release_nodes+0x68/0x100 devres_release_all+0x98/0xf8 device_unbind_cleanup+0x20/0x70 device_release_driver_internal+0x1f4/0x240 device_release_driver+0x20/0x40 bus_remove_device+0xd8/0x170 device_del+0x154/0x380 device_unregister+0x28/0x88 of_device_unregister+0x1c/0x30 pci_stop_bus_device+0x154/0x1b0 pci_stop_and_remove_bus_device_locked+0x28/0x48 remove_store+0xa0/0xb8 dev_attr_store+0x20/0x40 sysfs_kf_write+0x4c/0x68 Later NULL pointer dereference for of_node_clear_flag(NULL, OF_POPULATED): Unable to handle kernel NULL pointer dereference at virtual address 00000000000000c0 Call trace: pci_stop_bus_device+0x190/0x1b0 pci_stop_and_remove_bus_device_locked+0x28/0x48 remove_store+0xa0/0xb8 dev_attr_store+0x20/0x40 sysfs_kf_write+0x4c/0x68 Link: https://lore.kernel.org/r/20241126210443.4052876-1-briannorris@chromium.org Fixes: 681725afb6b9 ("PCI/pwrctl: Remove pwrctl device without iterating over all children 
of pwrctl parent") Reported-by: Saurabh Sengar <ssengar@linux.microsoft.com> Closes: https://lore.kernel.org/r/1732890621-19656-1-git-send-email-ssengar@linux.microsoft.com Signed-off-by: Brian Norris <briannorris@chromium.org> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
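The failure mode in (a) follows from looking devices up by comparing node pointers: a NULL key matches the first device that has no node at all. A user-space sketch of that behavior and of the guard follows; `struct node`, `struct pdev`, `find_by_node()`, and `find_pwrctrl_dev()` are hypothetical stand-ins, not the kernel's types or functions.

```c
#include <assert.h>
#include <stddef.h>

struct node { int id; };

struct pdev {
	const struct node *of_node;	/* NULL when the device has no node */
	const char *name;
};

/* Pointer-compare lookup: a NULL key matches any device without a node. */
static struct pdev *find_by_node(struct pdev *devs, size_t n,
				 const struct node *np)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].of_node == np)
			return &devs[i];
	return NULL;
}

/* Guarded lookup: with no node there can be no associated device. */
struct pdev *find_pwrctrl_dev(struct pdev *devs, size_t n,
			      const struct node *np)
{
	assert(devs != NULL);
	if (np == NULL)
		return NULL;
	return find_by_node(devs, n, np);
}
```

The guard turns "no device node" into "nothing to unregister", so an unrelated device is never selected and no NULL node is touched afterwards.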
2024-11-30 | Revert "serial: sh-sci: Clean sci_ports[0] after at earlycon exit" | Greg Kroah-Hartman | 1 | -28/+0
This reverts commit 3791ea69a4858b81e0277f695ca40f5aae40f312. It was reported to cause boot-time issues, so revert it for now. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Fixes: 3791ea69a485 ("serial: sh-sci: Clean sci_ports[0] after at earlycon exit") Cc: stable <stable@kernel.org> Cc: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-30 | sh: intc: Fix use-after-free bug in register_intc_controller() | Dan Carpenter | 1 | -1/+1
In the error handling for this function, d is freed without ever removing it from intc_list which would lead to a use after free. To fix this, let's only add it to the list after everything has succeeded. Fixes: 2dcec7a988a1 ("sh: intc: set_irq_wake() support") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
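The "publish last" pattern described above can be sketched in plain C. This is an illustration under assumptions, not the sh intc code: `struct ctrl`, `ctrl_list`, `setup_ctrl()`, and `register_ctrl()` are hypothetical, and a singly linked list stands in for the kernel list.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct ctrl {
	struct ctrl *next;
	int nr_irqs;
};

static struct ctrl *ctrl_list;	/* global list of registered controllers */

/* Setup step that can fail; stands in for the real initialization work. */
static int setup_ctrl(struct ctrl *c, int nr_irqs)
{
	assert(c != NULL);
	if (nr_irqs <= 0)
		return -EINVAL;
	c->nr_irqs = nr_irqs;
	return 0;
}

int register_ctrl(int nr_irqs)
{
	struct ctrl *c = calloc(1, sizeof(*c));
	int err;

	if (!c)
		return -ENOMEM;

	err = setup_ctrl(c, nr_irqs);
	if (err) {
		free(c);	/* never published, so nothing can still point at it */
		return err;
	}

	/* Publish only after every step that can fail has succeeded. */
	c->next = ctrl_list;
	ctrl_list = c;
	return 0;
}
```

Because the object is linked in as the final step, the error path can free it without first having to unlink it from the list.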
2024-11-30 | sh: cpuinfo: Fix a warning for CONFIG_CPUMASK_OFFSTACK | Huacai Chen | 1 | -1/+1
When CONFIG_CPUMASK_OFFSTACK and CONFIG_DEBUG_PER_CPU_MAPS are selected, cpu_max_bits_warn() generates a runtime warning similar as below when showing /proc/cpuinfo. Fix this by using nr_cpu_ids (the runtime limit) instead of NR_CPUS to iterate CPUs. [ 3.052463] ------------[ cut here ]------------ [ 3.059679] WARNING: CPU: 3 PID: 1 at include/linux/cpumask.h:108 show_cpuinfo+0x5e8/0x5f0 [ 3.070072] Modules linked in: efivarfs autofs4 [ 3.076257] CPU: 0 PID: 1 Comm: systemd Not tainted 5.19-rc5+ #1052 [ 3.099465] Stack : 9000000100157b08 9000000000f18530 9000000000cf846c 9000000100154000 [ 3.109127] 9000000100157a50 0000000000000000 9000000100157a58 9000000000ef7430 [ 3.118774] 90000001001578e8 0000000000000040 0000000000000020 ffffffffffffffff [ 3.128412] 0000000000aaaaaa 1ab25f00eec96a37 900000010021de80 900000000101c890 [ 3.138056] 0000000000000000 0000000000000000 0000000000000000 0000000000aaaaaa [ 3.147711] ffff8000339dc220 0000000000000001 0000000006ab4000 0000000000000000 [ 3.157364] 900000000101c998 0000000000000004 9000000000ef7430 0000000000000000 [ 3.167012] 0000000000000009 000000000000006c 0000000000000000 0000000000000000 [ 3.176641] 9000000000d3de08 9000000001639390 90000000002086d8 00007ffff0080286 [ 3.186260] 00000000000000b0 0000000000000004 0000000000000000 0000000000071c1c [ 3.195868] ... 
[ 3.199917] Call Trace: [ 3.203941] [<90000000002086d8>] show_stack+0x38/0x14c [ 3.210666] [<9000000000cf846c>] dump_stack_lvl+0x60/0x88 [ 3.217625] [<900000000023d268>] __warn+0xd0/0x100 [ 3.223958] [<9000000000cf3c90>] warn_slowpath_fmt+0x7c/0xcc [ 3.231150] [<9000000000210220>] show_cpuinfo+0x5e8/0x5f0 [ 3.238080] [<90000000004f578c>] seq_read_iter+0x354/0x4b4 [ 3.245098] [<90000000004c2e90>] new_sync_read+0x17c/0x1c4 [ 3.252114] [<90000000004c5174>] vfs_read+0x138/0x1d0 [ 3.258694] [<90000000004c55f8>] ksys_read+0x70/0x100 [ 3.265265] [<9000000000cfde9c>] do_syscall+0x7c/0x94 [ 3.271820] [<9000000000202fe4>] handle_syscall+0xc4/0x160 [ 3.281824] ---[ end trace 8b484262b4b8c24c ]--- Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2024-11-29 | brd: decrease the number of allocated pages which discarded | Zhang Xianwei | 1 | -1/+3
The number of allocated pages is not decreased when pages are discarded. Fix it. Fixes: 9ead7efc6f3f ("brd: implement discard support") Signed-off-by: Zhang Xianwei <zhang.xianwei8@zte.com.cn> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241128170056565nPKSz2vsP8K8X2uk2iaDG@zte.com.cn Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-29 | block, bfq: fix bfqq uaf in bfq_limit_depth() | Yu Kuai | 1 | -13/+24
Setting a newly allocated bfqq in the bic and removing a freed bfqq from the bic are both protected by bfqd->lock; however, bfq_limit_depth() is dereferencing the bfqq from the bic without the lock, which can lead to a UAF if the io_context is shared by multiple tasks. For example, testing bfq with io_uring can trigger the following UAF in v6.6: ================================================================== BUG: KASAN: slab-use-after-free in bfqq_group+0x15/0x50 Call Trace: <TASK> dump_stack_lvl+0x47/0x80 print_address_description.constprop.0+0x66/0x300 print_report+0x3e/0x70 kasan_report+0xb4/0xf0 bfqq_group+0x15/0x50 bfqq_request_over_limit+0x130/0x9a0 bfq_limit_depth+0x1b5/0x480 __blk_mq_alloc_requests+0x2b5/0xa00 blk_mq_get_new_requests+0x11d/0x1d0 blk_mq_submit_bio+0x286/0xb00 submit_bio_noacct_nocheck+0x331/0x400 __block_write_full_folio+0x3d0/0x640 writepage_cb+0x3b/0xc0 write_cache_pages+0x254/0x6c0 write_cache_pages+0x254/0x6c0 do_writepages+0x192/0x310 filemap_fdatawrite_wbc+0x95/0xc0 __filemap_fdatawrite_range+0x99/0xd0 filemap_write_and_wait_range.part.0+0x4d/0xa0 blkdev_read_iter+0xef/0x1e0 io_read+0x1b6/0x8a0 io_issue_sqe+0x87/0x300 io_wq_submit_work+0xeb/0x390 io_worker_handle_work+0x24d/0x550 io_wq_worker+0x27f/0x6c0 ret_from_fork_asm+0x1b/0x30 </TASK> Allocated by task 808602: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 __kasan_slab_alloc+0x83/0x90 kmem_cache_alloc_node+0x1b1/0x6d0 bfq_get_queue+0x138/0xfa0 bfq_get_bfqq_handle_split+0xe3/0x2c0 bfq_init_rq+0x196/0xbb0 bfq_insert_request.isra.0+0xb5/0x480 bfq_insert_requests+0x156/0x180 blk_mq_insert_request+0x15d/0x440 blk_mq_submit_bio+0x8a4/0xb00 submit_bio_noacct_nocheck+0x331/0x400 __blkdev_direct_IO_async+0x2dd/0x330 blkdev_write_iter+0x39a/0x450 io_write+0x22a/0x840 io_issue_sqe+0x87/0x300 io_wq_submit_work+0xeb/0x390 io_worker_handle_work+0x24d/0x550 io_wq_worker+0x27f/0x6c0 ret_from_fork+0x2d/0x50 ret_from_fork_asm+0x1b/0x30 Freed by task 808589: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 
kasan_save_free_info+0x27/0x40 __kasan_slab_free+0x126/0x1b0 kmem_cache_free+0x10c/0x750 bfq_put_queue+0x2dd/0x770 __bfq_insert_request.isra.0+0x155/0x7a0 bfq_insert_request.isra.0+0x122/0x480 bfq_insert_requests+0x156/0x180 blk_mq_dispatch_plug_list+0x528/0x7e0 blk_mq_flush_plug_list.part.0+0xe5/0x590 __blk_flush_plug+0x3b/0x90 blk_finish_plug+0x40/0x60 do_writepages+0x19d/0x310 filemap_fdatawrite_wbc+0x95/0xc0 __filemap_fdatawrite_range+0x99/0xd0 filemap_write_and_wait_range.part.0+0x4d/0xa0 blkdev_read_iter+0xef/0x1e0 io_read+0x1b6/0x8a0 io_issue_sqe+0x87/0x300 io_wq_submit_work+0xeb/0x390 io_worker_handle_work+0x24d/0x550 io_wq_worker+0x27f/0x6c0 ret_from_fork+0x2d/0x50 ret_from_fork_asm+0x1b/0x30 Fix the problem by protecting bic_to_bfqq() with bfqd->lock. CC: Jan Kara <jack@suse.cz> Fixes: 76f1df88bbc2 ("bfq: Limit number of requests consumed by each cgroup") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20241129091509.2227136-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-29 | io_uring/tctx: work around xa_store() allocation error issue | Jens Axboe | 1 | -1/+12
syzbot triggered the following WARN_ON: WARNING: CPU: 0 PID: 16 at io_uring/tctx.c:51 __io_uring_free+0xfa/0x140 io_uring/tctx.c:51 which is the WARN_ON_ONCE(!xa_empty(&tctx->xa)); sanity check in __io_uring_free() when an io_uring_task is going through its final put. The syzbot test case includes injecting memory allocation failures, and it very much looks like xa_store() can fail one of its memory allocations and end up with ->head being non-NULL even though no entries exist in the xarray. Until this issue gets sorted out, work around it by attempting to iterate entries in our xarray, and WARN_ON_ONCE() if one is found. Reported-by: syzbot+cc36d44ec9f368e443d3@syzkaller.appspotmail.com Link: https://lore.kernel.org/io-uring/673c1643.050a0220.87769.0066.GAE@google.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>