aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2025-01-15genirq: Remove IRQ_MOVE_PCNTXT and related codeThomas Gleixner6-34/+2
Now that x86 is converted over to use the IRQCHIP_MOVE_DEFERRED flags, remove IRQ*_MOVE_PCNTXT and related code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241210103335.626707225@linutronix.de
2025-01-15x86/apic: Convert to IRQCHIP_MOVE_DEFERREDThomas Gleixner11-23/+7
Instead of marking individual interrupts as safe to be migrated in arbitrary contexts, mark the interrupt chips, which require the interrupt to be moved in actual interrupt context, with the new IRQCHIP_MOVE_DEFERRED flag. This makes more sense because this is a per interrupt chip property and not restricted to individual interrupts. That flips the logic from the historical opt-out to a opt-in model. This is simpler to handle for other architectures, which default to unrestricted affinity setting. It also allows to cleanup the redundant core logic significantly. All interrupt chips, which belong to a top-level domain sitting directly on top of the x86 vector domain are marked accordingly, unless the related setup code marks the interrupts with IRQ_MOVE_PCNTXT, i.e. XEN. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Acked-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/all/20241210103335.563277044@linutronix.de
2025-01-15genirq: Provide IRQCHIP_MOVE_DEFERREDThomas Gleixner4-3/+22
The logic of GENERIC_PENDING_IRQ is backwards for historical reasons. Most interrupt controllers allow to move the interrupt from arbitrary contexts. If GENERIC_PENDING_IRQ is enabled by an architecture to support a chip, which requires the affinity change to happen in interrupt context, all other chips have to be marked with IRQF_MOVE_PCNTXT. That's tedious and there is no real good reason for the extra flags in the irq descriptor and the irq data status fields. In fact the decision whether interrupts can be moved in arbitrary context or not is a property of the interrupt chip. To simplify adoption for RISC-V provide a new mechanism which is enabled via a config switch and allows to add a flag to irq_chip::flags to request that interrupt affinity changes are deferred. Setting the top level chip of an interrupt evaluates the flag and maps it into the existing logic. The config switch and the various PCNTXT flags are temporary until x86 is converted over to this scheme. This intermediate step also allows trivial backporting of the mechanism to plug the affinity change race of various RISC-V interrupt controllers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241210103335.500314436@linutronix.de
2025-01-15hexagon: Remove GENERIC_PENDING_IRQ leftoverThomas Gleixner1-1/+0
Commented out since 2011.... Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Brian Cain <bcain@quicinc.com> Link: https://lore.kernel.org/all/20241210103335.437630614@linutronix.de
2025-01-15ARC: Remove GENERIC_PENDING_IRQThomas Gleixner2-3/+0
Nothing uses the actual functionality and the MCIP controller sets the flags which disables the deferred affinity change. The other interrupt controller does not support affinity setting at all. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Vineet Gupta <vgupta@kernel.org>   # arch/arc/ Link: https://lore.kernel.org/all/20241210103335.373392568@linutronix.de
2025-01-15genirq: Remove handle_enforce_irqctx() wrapperThomas Gleixner3-7/+2
Now that it is unconditionally available, remove the wrapper. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241210101811.561078243@linutronix.de
2025-01-15genirq: Make handle_enforce_irqctx() unconditionally availableThomas Gleixner1-6/+3
Commit 1b57d91b969c ("irqchip/gic-v2, v3: Prevent SW resends entirely") sett the flag which enforces interrupt handling in interrupt context and prevents software base resends for ARM GIC v2/v3. But it missed that the helper function which checks the flag was hidden behind CONFIG_GENERIC_PENDING_IRQ, which is not set by ARM[64]. Make the helper unconditionally available so that the enforcement actually works. Fixes: 1b57d91b969c ("irqchip/gic-v2, v3: Prevent SW resends entirely") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241210101811.497716609@linutronix.de
2025-01-15irqchip/loongarch-avec: Add multi-nodes topology supportTianyang Zhang1-4/+12
avecintc_init() enables the Advanced Interrupt Controller (AVEC) of the boot CPU node, but nothing enables the AVEC on secondary nodes. Move the enablement to the CPU hotplug callback so that secondary nodes get the AVEC enabled too. In theory enabling it once per node would be sufficient, but redundant enabling does no hurt, so keep the code simple and do it unconditionally. Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Link: https://lore.kernel.org/all/20250111023704.17285-1-zhangtianyang@loongson.cn
2025-01-15irqchip/ts4800: Replace seq_printf() by seq_puts()Geert Uytterhoeven1-1/+1
Simplify "seq_printf(p, "%s", ...)" to "seq_puts(p, ...)". Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/1ba5692126804f9e1ff062ac24939b24030b4f72.1733403985.git.geert+renesas@glider.be
2025-01-15irqchip/ti-sci-inta : Add module build supportNicolas Frayer3-2/+3
Add module build support in Kconfig for the TI SCI interrupt aggregator driver. The driver's default build is built-in and it also depends on ARCH_K3 as the driver uses some 64 bit ops and should only be built for 64-bit platforms. Signed-off-by: Nicolas Frayer <nfrayer@baylibre.com> Signed-off-by: Guillaume La Roque <glaroque@baylibre.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Nishanth Menon <nm@ti.com> Link: https://lore.kernel.org/all/20241224-timodules-v4-2-c5e010f58e2c@baylibre.com
2025-01-15irqchip/ti-sci-intr: Add module build supportNicolas Frayer3-2/+3
Add module build support in Kconfig for the TI SCI interrupt router driver. This driver depends on the TI sci firmware driver which aready supports module build. Signed-off-by: Nicolas Frayer <nfrayer@baylibre.com> Signed-off-by: Guillaume La Roque <glaroque@baylibre.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Nishanth Menon <nm@ti.com> Link: https://lore.kernel.org/all/20241224-timodules-v4-1-c5e010f58e2c@baylibre.com
2025-01-15irqchip/irq-brcmstb-l2: Replace brcmstb_l2_mask_and_ack() by generic functionDr. David Alan Gilbert1-27/+1
Replace brcmstb_l2_mask_and_ack() by the generic irq_gc_mask_disable_and_ack_set(). brcmstb_l2_mask_and_ack() was added in commit 49aa6ef0b439 ("irqchip/brcmstb-l2: Remove some processing from the handler") in September 2017 with a comment saying it was actually generic and someone should add it to the generic code. commit 20608924cc2e ("genirq: generic chip: Add irq_gc_mask_disable_and_ack_set()") did that a few weeks later, however no one went back and took the brcmstb variant out. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/all/20241224001727.149337-1-linux@treblig.org
2025-01-15irqchip: keystone: Use syscon_regmap_lookup_by_phandle_argsKrzysztof Kozlowski1-9/+2
Use syscon_regmap_lookup_by_phandle_args() which is a wrapper over syscon_regmap_lookup_by_phandle() combined with getting the syscon argument. Except simpler code this annotates within one line that given phandle has arguments, so grepping for code would be easier. There is also no real benefit in printing errors on missing syscon argument, because this is done just too late: runtime check on static/build-time data. Dtschema and Devicetree bindings offer the static/build-time check for this already. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250111185414.183971-1-krzysztof.kozlowski@linaro.org
2024-12-11genirq/kexec: Prevent redundant IRQ masking by checking state before shutdownEliav Farber1-6/+2
During machine kexec, machine_kexec_mask_interrupts() is responsible for disabling or masking all interrupts. While the irq_disable() is only invoked when the interrupt is not yet disabled, it unconditionally invokes the irq_mask() callback for every interrupt descriptor, even when the interrupt is already masked or not even started up yet. A specific issue was observed in the crash kernel flow after unbinding a device (prior to kexec) that used a GPIO as an IRQ source. The warning was triggered by the gpiochip_disable_irq() function, which attempts to clear the FLAG_IRQ_IS_ENABLED flag when FLAG_USED_AS_IRQ was not set. This issue surfaced after commit a8173820f441 ("gpio: gpiolib: Allow GPIO IRQs to lazy disable") introduced lazy disablement for GPIO IRQs. It replaced disable/enable hooks with mask/unmask hooks. Unlike the disable hook, the mask hook doesn't handle already-masked IRQs. When a GPIO-IRQ driver is unbound, the IRQ is released, triggering __irq_disable() and irq_state_set_masked(). A subsequent call to machine_kexec_mask_interrupts() re-invokes chip->irq_mask(). This results in a call chain, including gpiochip_irq_mask() and gpiochip_disable_irq(). Since FLAG_USED_AS_IRQ was cleared earlier, the warning is triggered. Replace the direct invocation of the irq_mask() and irq_disable() callbacks invoking to irq_shutdown(), which handles the cases correct and avoid it all together when the interrupt has never been started up. Signed-off-by: Eliav Farber <farbere@amazon.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241204142003.32859-3-farbere@amazon.com
2024-12-11kexec: Consolidate machine_kexec_mask_interrupts() implementationEliav Farber11-101/+52
Consolidate the machine_kexec_mask_interrupts implementation into a common function located in a new file: kernel/irq/kexec.c. This removes duplicate implementations from architecture-specific files in arch/arm, arch/arm64, arch/powerpc, and arch/riscv, reducing code duplication and improving maintainability. The new implementation retains architecture-specific behavior for CONFIG_GENERIC_IRQ_KEXEC_CLEAR_VM_FORWARD, which was previously implemented for ARM64. When enabled (currently for ARM64), it clears the active state of interrupts forwarded to virtual machines (VMs) before handling other interrupt masking operations. Signed-off-by: Eliav Farber <farbere@amazon.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241204142003.32859-2-farbere@amazon.com
2024-12-03genirq: Reuse irq_thread_fn() for forced thread caseAndy Shevchenko1-5/+1
rq_forced_thread_fn() uses the same action callback as the non-forced variant but with different locking decorations. Reuse irq_thread_fn() here to make that clear. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241119104339.2112455-3-andriy.shevchenko@linux.intel.com
2024-12-03genirq: Move irq_thread_fn() further up in the codeAndy Shevchenko1-19/+16
In a preparation to reuse irq_thread_fn() move it further up in the code. No functional change intended. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241119104339.2112455-2-andriy.shevchenko@linux.intel.com
2024-12-01Get rid of 'remove_new' relic from platform driver structLinus Torvalds474-494/+484
The continual trickle of small conversion patches is grating on me, and is really not helping. Just get rid of the 'remove_new' member function, which is just an alias for the plain 'remove', and had a comment to that effect: /* * .remove_new() is a relic from a prototype conversion of .remove(). * New drivers are supposed to implement .remove(). Once all drivers are * converted to not use .remove_new any more, it will be dropped. */ This was just a tree-wide 'sed' script that replaced '.remove_new' with '.remove', with some care taken to turn a subsequent tab into two tabs to make things line up. I did do some minimal manual whitespace adjustment for places that used spaces to line things up. Then I just removed the old (sic) .remove_new member function, and this is the end result. No more unnecessary conversion noise. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-01Linux 6.13-rc1Linus Torvalds1-2/+2
2024-12-01strscpy: write destination buffer only onceLinus Torvalds1-6/+17
The point behind strscpy() was to once and for all avoid all the problems with 'strncpy()' and later broken "fixed" versions like strlcpy() that just made things worse. So strscpy not only guarantees NUL-termination (unlike strncpy), it also doesn't do unnecessary padding at the destination. But at the same time also avoids byte-at-a-time reads and writes by _allowing_ some extra NUL writes - within the size, of course - so that the whole copy can be done with word operations. It is also stable in the face of a mutable source string: it explicitly does not read the source buffer multiple times (so an implementation using "strnlen()+memcpy()" would be wrong), and does not read the source buffer past the size (like the mis-design that is strlcpy does). Finally, the return value is designed to be simple and unambiguous: if the string cannot be copied fully, it returns an actual negative error, making error handling clearer and simpler (and the caller already knows the size of the buffer). Otherwise it returns the string length of the result. However, there was one final stability issue that can be important to callers: the stability of the destination buffer. In particular, the same way we shouldn't read the source buffer more than once, we should avoid doing multiple writes to the destination buffer: first writing a potentially non-terminated string, and then terminating it with NUL at the end does not result in a stable result buffer. Yes, it gives the right result in the end, but if the rule for the destination buffer was that it is _always_ NUL-terminated even when accessed concurrently with updates, the final byte of the buffer needs to always _stay_ as a NUL byte. [ Note that "final byte is NUL" here is literally about the final byte in the destination array, not the terminating NUL at the end of the string itself. There is no attempt to try to make concurrent reads and writes give any kind of consistent string length or contents, but we do want to guarantee that there is always at least that final terminating NUL character at the end of the destination array if it existed before ] This is relevant in the kernel for the tsk->comm[] array, for example. Even without locking (for either readers or writers), we want to know that while the buffer contents may be garbled, it is always a valid C string and always has a NUL character at 'comm[TASK_COMM_LEN-1]' (and never has any "out of thin air" data). So avoid any "copy possibly non-terminated string, and terminate later" behavior, and write the destination buffer only once. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-11-30printf: Remove unused 'bprintf'Dr. David Alan Gilbert2-24/+0
bprintf() is unused. Remove it. It was added in the commit 4370aa4aa753 ("vsprintf: add binary printf") but as far as I can see was never used, unlike the other two functions in that patch. Link: https://lore.kernel.org/20241002173147.210107-1-linux@treblig.org Reviewed-by: Andy Shevchenko <andy@kernel.org> Acked-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-30tools/power turbostat: 2024.11.30Len Brown2-2/+2
since 2024.07.26: assorted minor bug fixes assorted platform specific tweaks initial RAPL PSYS (SysWatt) support Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Add RAPL psys as a built-in counterPatryk Wlazlyn2-10/+85
Introduce the counter as a part of global, platform counters structure. We open the counter for only one cpu, but otherwise treat it as an ordinary RAPL counter, allowing for grouped perf read. The counter is disabled by default, because it's interpretation may require additional, platform specific information, making it unsuitable for general use. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Fix child's argument forwardingPatryk Wlazlyn1-1/+1
Add '+' to optstring when early scanning for --no-msr and --no-perf. It causes option processing to stop as soon as a nonoption argument is encountered, effectively skipping child's arguments. Fixes: 3e4048466c39 ("tools/power turbostat: Add --no-msr option") Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Force --no-perf in --dump modePatryk Wlazlyn1-0/+6
Force the --no-perf early to prevent using it as a source. User asks for raw values, but perf returns them relative to the opening of the file descriptor. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Add support for /sys/class/drm/card1Zhang Rui1-9/+29
On some machines, the graphics device is enumerated as /sys/class/drm/card1 instead of /sys/class/drm/card0. The current implementation does not handle this scenario, resulting in the loss of graphics C6 residency and frequency information. Add support for /sys/class/drm/card1, ensuring that turbostat can retrieve and display the graphics columns for these platforms. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Cache graphics sysfs file descriptors during probeZhang Rui1-50/+32
Snapshots of the graphics sysfs knobs are taken based on file descriptors. To optimize this process, open the files and cache the file descriptors during the graphics probe phase. As a result, the previously cached pathnames become redundant and are removed. This change aims to streamline the code without altering its functionality. No functional change intended. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Consolidate graphics sysfs accessZhang Rui1-9/+6
Currently, there is an inconsistency in how graphics sysfs knobs are accessed: graphics residency sysfs knobs are opened and closed for each read, while graphics frequency sysfs knobs are opened once and remain open until turbostat exits. This inconsistency is confusing and adds unnecessary code complexity. Consolidate the access method by opening the sysfs files once and reusing the file pointers for subsequent accesses. This approach simplifies the code and ensures a consistent method for accessing graphics sysfs knobs. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Remove unnecessary fflush() callZhang Rui1-4/+3
The graphics sysfs knobs are read-only, making the use of fflush() before reading them redundant. Remove the unnecessary fflush() call. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Enhance platform divergence descriptionZhang Rui1-28/+30
In various generations, platforms often share a majority of features, diverging only in a few specific aspects. The current approach of using hardcoded values in 'platform_features' structure fails to effectively represent these divergences. To improve the description of platform divergence: 1. Each newly introduced 'platform_features' structure must have a base, typically derived from the previous generation. 2. Platform feature values should be inherited from the base structure rather than being hardcoded. This approach ensures a more accurate and maintainable representation of platform-specific features across different generations. Converts `adl_features` and `lnl_features` to follow this new scheme. No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Add initial support for GraniteRapids-DZhang Rui1-0/+1
Add initial support for GraniteRapids-D. It shares the same features with SapphireRapids. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Remove PC3 support on LunarlakeZhang Rui1-1/+1
Lunarlake supports CC1/CC6/CC7/PC2/PC6/PC10. Remove PC3 support on Lunarlake. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Rename arl_features to lnl_featuresZhang Rui1-2/+2
As ARL shares the same features with ADL/RPL/MTL, now 'arl_features' is used by Lunarlake platform only. Rename 'arl_features' to 'lnl_features'. No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Add back PC8 support on ArrowlakeZhang Rui1-3/+3
Similar to ADL/RPL/MTL, ARL supports CC1/CC6/CC7/PC2/PC3/PC6/PC8/PC10. Add back PC8 support on Arrowlake. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Remove PC7/PC9 support on MTLZhang Rui1-2/+2
Similar to ADL/RPL, MTL support CC1/CC6/CC7/PC2/PC3/PC6/PC8/CP10. Remove PC7/PC9 support on MTL. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Honor --show CPU, even when even when num_cpus=1Patryk Wlazlyn1-2/+2
Honor --show CPU and --show Core when "topo.num_cpus == 1". Previously turbostat assumed that on a 1-CPU system, these columns should never appear. Honoring these flags makes it easier for several programs that parse turbostat output. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Fix trailing '\n' parsingZhang Rui1-0/+3
parse_cpu_string() parses the string input either from command line or from /sys/fs/cgroup/cpuset.cpus.effective to get a list of CPUs that turbostat can run with. The cpu string returned by /sys/fs/cgroup/cpuset.cpus.effective contains a trailing '\n', but strtoul() fails to treat this as an error. That says, for the code below val = ("\n", NULL, 10); val returns 0, and errno is also not set. As a result, CPU0 is erroneously considered as allowed CPU and this causes failures when turbostat tries to run on CPU0. get_counters: Could not migrate to CPU 0 ... turbostat: re-initialized with num_cpus 8, allowed_cpus 5 get_counters: Could not migrate to CPU 0 Add a check to return immediately if '\n' or '\0' is detected. Fixes: 8c3dd2c9e542 ("tools/power/turbostat: Abstrct function for parsing cpu string") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Allow using cpu device in perf counters on hybrid platformsPatryk Wlazlyn2-7/+123
Intel hybrid platforms expose different perf devices for P and E cores. Instead of one, "/sys/bus/event_source/devices/cpu" device, there are "/sys/bus/event_source/devices/{cpu_core,cpu_atom}". This, however makes it more complicated for the user, because most of the counters are available on both and had to be handled manually. This patch allows users to use "virtual" cpu device that is seemingly translated to cpu_core and cpu_atom perf devices, depending on the type of a CPU we are opening the counter for. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: Fix column printing for PMT xtal_time countersPatryk Wlazlyn1-3/+3
If the very first printed column was for a PMT counter of type xtal_time we would misalign the column header, because we were always printing the delimiter. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30tools/power turbostat: fix GCC9 build regressionTodd Brandt1-9/+6
Fix build regression seen when using old gcc-9 compiler. Signed-off-by: Todd Brandt <todd.e.brandt@intel.com> Reviewed-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30PCI/pwrctrl: Unregister platform device only if one actually existsBrian Norris1-2/+7
If a PCI device has an associated device_node with power supplies, pci_bus_add_device() creates platform devices for use by pwrctrl. When the PCI device is removed, pci_stop_dev() uses of_find_device_by_node() to locate the related platform device, then unregisters it. But when we remove a PCI device with no associated device node, dev_of_node(dev) is NULL, and of_find_device_by_node(NULL) returns the first device with "dev->of_node == NULL". The result is that we (a) mistakenly unregister a completely unrelated platform device, leading to issues like the first trace below, and (b) dereference the NULL pointer from dev_of_node() when clearing OF_POPULATED, as in the second trace. Unregister a platform device only if there is one associated with this PCI device. This resolves issues seen when doing: # echo 1 > /sys/bus/pci/devices/.../remove Sample issue from unregistering the wrong platform device: WARNING: CPU: 0 PID: 5095 at drivers/regulator/core.c:5885 regulator_unregister+0x140/0x160 Call trace: regulator_unregister+0x140/0x160 devm_rdev_release+0x1c/0x30 release_nodes+0x68/0x100 devres_release_all+0x98/0xf8 device_unbind_cleanup+0x20/0x70 device_release_driver_internal+0x1f4/0x240 device_release_driver+0x20/0x40 bus_remove_device+0xd8/0x170 device_del+0x154/0x380 device_unregister+0x28/0x88 of_device_unregister+0x1c/0x30 pci_stop_bus_device+0x154/0x1b0 pci_stop_and_remove_bus_device_locked+0x28/0x48 remove_store+0xa0/0xb8 dev_attr_store+0x20/0x40 sysfs_kf_write+0x4c/0x68 Later NULL pointer dereference for of_node_clear_flag(NULL, OF_POPULATED): Unable to handle kernel NULL pointer dereference at virtual address 00000000000000c0 Call trace: pci_stop_bus_device+0x190/0x1b0 pci_stop_and_remove_bus_device_locked+0x28/0x48 remove_store+0xa0/0xb8 dev_attr_store+0x20/0x40 sysfs_kf_write+0x4c/0x68 Link: https://lore.kernel.org/r/20241126210443.4052876-1-briannorris@chromium.org Fixes: 681725afb6b9 ("PCI/pwrctl: Remove pwrctl device without iterating over all children of pwrctl parent") Reported-by: Saurabh Sengar <ssengar@linux.microsoft.com> Closes: https://lore.kernel.org/r/1732890621-19656-1-git-send-email-ssengar@linux.microsoft.com Signed-off-by: Brian Norris <briannorris@chromium.org> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-11-30Revert "serial: sh-sci: Clean sci_ports[0] after at earlycon exit"Greg Kroah-Hartman1-28/+0
This reverts commit 3791ea69a4858b81e0277f695ca40f5aae40f312. It was reported to cause boot-time issues, so revert it for now. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Fixes: 3791ea69a485 ("serial: sh-sci: Clean sci_ports[0] after at earlycon exit") Cc: stable <stable@kernel.org> Cc: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-30sh: intc: Fix use-after-free bug in register_intc_controller()Dan Carpenter1-1/+1
In the error handling for this function, d is freed without ever removing it from intc_list which would lead to a use after free. To fix this, let's only add it to the list after everything has succeeded. Fixes: 2dcec7a988a1 ("sh: intc: set_irq_wake() support") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2024-11-30sh: cpuinfo: Fix a warning for CONFIG_CPUMASK_OFFSTACKHuacai Chen1-1/+1
When CONFIG_CPUMASK_OFFSTACK and CONFIG_DEBUG_PER_CPU_MAPS are selected, cpu_max_bits_warn() generates a runtime warning similar as below when showing /proc/cpuinfo. Fix this by using nr_cpu_ids (the runtime limit) instead of NR_CPUS to iterate CPUs. [ 3.052463] ------------[ cut here ]------------ [ 3.059679] WARNING: CPU: 3 PID: 1 at include/linux/cpumask.h:108 show_cpuinfo+0x5e8/0x5f0 [ 3.070072] Modules linked in: efivarfs autofs4 [ 3.076257] CPU: 0 PID: 1 Comm: systemd Not tainted 5.19-rc5+ #1052 [ 3.099465] Stack : 9000000100157b08 9000000000f18530 9000000000cf846c 9000000100154000 [ 3.109127] 9000000100157a50 0000000000000000 9000000100157a58 9000000000ef7430 [ 3.118774] 90000001001578e8 0000000000000040 0000000000000020 ffffffffffffffff [ 3.128412] 0000000000aaaaaa 1ab25f00eec96a37 900000010021de80 900000000101c890 [ 3.138056] 0000000000000000 0000000000000000 0000000000000000 0000000000aaaaaa [ 3.147711] ffff8000339dc220 0000000000000001 0000000006ab4000 0000000000000000 [ 3.157364] 900000000101c998 0000000000000004 9000000000ef7430 0000000000000000 [ 3.167012] 0000000000000009 000000000000006c 0000000000000000 0000000000000000 [ 3.176641] 9000000000d3de08 9000000001639390 90000000002086d8 00007ffff0080286 [ 3.186260] 00000000000000b0 0000000000000004 0000000000000000 0000000000071c1c [ 3.195868] ... [ 3.199917] Call Trace: [ 3.203941] [<90000000002086d8>] show_stack+0x38/0x14c [ 3.210666] [<9000000000cf846c>] dump_stack_lvl+0x60/0x88 [ 3.217625] [<900000000023d268>] __warn+0xd0/0x100 [ 3.223958] [<9000000000cf3c90>] warn_slowpath_fmt+0x7c/0xcc [ 3.231150] [<9000000000210220>] show_cpuinfo+0x5e8/0x5f0 [ 3.238080] [<90000000004f578c>] seq_read_iter+0x354/0x4b4 [ 3.245098] [<90000000004c2e90>] new_sync_read+0x17c/0x1c4 [ 3.252114] [<90000000004c5174>] vfs_read+0x138/0x1d0 [ 3.258694] [<90000000004c55f8>] ksys_read+0x70/0x100 [ 3.265265] [<9000000000cfde9c>] do_syscall+0x7c/0x94 [ 3.271820] [<9000000000202fe4>] handle_syscall+0xc4/0x160 [ 3.281824] ---[ end trace 8b484262b4b8c24c ]--- Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2024-11-29brd: decrease the number of allocated pages which discardedZhang Xianwei1-1/+3
The number of allocated pages which discarded will not decrease. Fix it. Fixes: 9ead7efc6f3f ("brd: implement discard support") Signed-off-by: Zhang Xianwei <zhang.xianwei8@zte.com.cn> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241128170056565nPKSz2vsP8K8X2uk2iaDG@zte.com.cn Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-29block, bfq: fix bfqq uaf in bfq_limit_depth()Yu Kuai1-13/+24
Set new allocated bfqq to bic or remove freed bfqq from bic are both protected by bfqd->lock, however bfq_limit_depth() is deferencing bfqq from bic without the lock, this can lead to UAF if the io_context is shared by multiple tasks. For example, test bfq with io_uring can trigger following UAF in v6.6: ================================================================== BUG: KASAN: slab-use-after-free in bfqq_group+0x15/0x50 Call Trace: <TASK> dump_stack_lvl+0x47/0x80 print_address_description.constprop.0+0x66/0x300 print_report+0x3e/0x70 kasan_report+0xb4/0xf0 bfqq_group+0x15/0x50 bfqq_request_over_limit+0x130/0x9a0 bfq_limit_depth+0x1b5/0x480 __blk_mq_alloc_requests+0x2b5/0xa00 blk_mq_get_new_requests+0x11d/0x1d0 blk_mq_submit_bio+0x286/0xb00 submit_bio_noacct_nocheck+0x331/0x400 __block_write_full_folio+0x3d0/0x640 writepage_cb+0x3b/0xc0 write_cache_pages+0x254/0x6c0 write_cache_pages+0x254/0x6c0 do_writepages+0x192/0x310 filemap_fdatawrite_wbc+0x95/0xc0 __filemap_fdatawrite_range+0x99/0xd0 filemap_write_and_wait_range.part.0+0x4d/0xa0 blkdev_read_iter+0xef/0x1e0 io_read+0x1b6/0x8a0 io_issue_sqe+0x87/0x300 io_wq_submit_work+0xeb/0x390 io_worker_handle_work+0x24d/0x550 io_wq_worker+0x27f/0x6c0 ret_from_fork_asm+0x1b/0x30 </TASK> Allocated by task 808602: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 __kasan_slab_alloc+0x83/0x90 kmem_cache_alloc_node+0x1b1/0x6d0 bfq_get_queue+0x138/0xfa0 bfq_get_bfqq_handle_split+0xe3/0x2c0 bfq_init_rq+0x196/0xbb0 bfq_insert_request.isra.0+0xb5/0x480 bfq_insert_requests+0x156/0x180 blk_mq_insert_request+0x15d/0x440 blk_mq_submit_bio+0x8a4/0xb00 submit_bio_noacct_nocheck+0x331/0x400 __blkdev_direct_IO_async+0x2dd/0x330 blkdev_write_iter+0x39a/0x450 io_write+0x22a/0x840 io_issue_sqe+0x87/0x300 io_wq_submit_work+0xeb/0x390 io_worker_handle_work+0x24d/0x550 io_wq_worker+0x27f/0x6c0 ret_from_fork+0x2d/0x50 ret_from_fork_asm+0x1b/0x30 Freed by task 808589: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_save_free_info+0x27/0x40 __kasan_slab_free+0x126/0x1b0 kmem_cache_free+0x10c/0x750 bfq_put_queue+0x2dd/0x770 __bfq_insert_request.isra.0+0x155/0x7a0 bfq_insert_request.isra.0+0x122/0x480 bfq_insert_requests+0x156/0x180 blk_mq_dispatch_plug_list+0x528/0x7e0 blk_mq_flush_plug_list.part.0+0xe5/0x590 __blk_flush_plug+0x3b/0x90 blk_finish_plug+0x40/0x60 do_writepages+0x19d/0x310 filemap_fdatawrite_wbc+0x95/0xc0 __filemap_fdatawrite_range+0x99/0xd0 filemap_write_and_wait_range.part.0+0x4d/0xa0 blkdev_read_iter+0xef/0x1e0 io_read+0x1b6/0x8a0 io_issue_sqe+0x87/0x300 io_wq_submit_work+0xeb/0x390 io_worker_handle_work+0x24d/0x550 io_wq_worker+0x27f/0x6c0 ret_from_fork+0x2d/0x50 ret_from_fork_asm+0x1b/0x30 Fix the problem by protecting bic_to_bfqq() with bfqd->lock. CC: Jan Kara <jack@suse.cz> Fixes: 76f1df88bbc2 ("bfq: Limit number of requests consumed by each cgroup") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20241129091509.2227136-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-29io_uring/tctx: work around xa_store() allocation error issueJens Axboe1-1/+12
syzbot triggered the following WARN_ON: WARNING: CPU: 0 PID: 16 at io_uring/tctx.c:51 __io_uring_free+0xfa/0x140 io_uring/tctx.c:51 which is the WARN_ON_ONCE(!xa_empty(&tctx->xa)); sanity check in __io_uring_free() when a io_uring_task is going through its final put. The syzbot test case includes injecting memory allocation failures, and it very much looks like xa_store() can fail one of its memory allocations and end up with ->head being non-NULL even though no entries exist in the xarray. Until this issue gets sorted out, work around it by attempting to iterate entries in our xarray, and WARN_ON_ONCE() if one is found. Reported-by: syzbot+cc36d44ec9f368e443d3@syzkaller.appspotmail.com Link: https://lore.kernel.org/io-uring/673c1643.050a0220.87769.0066.GAE@google.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-29Revert "s390/mm: Allow large pages for KASAN shadow mapping"Vasily Gorbik1-11/+1
This reverts commit ff123eb7741638d55abf82fac090bb3a543c1e74. Allowing large pages for KASAN shadow mappings isn't inherently wrong, but adding POPULATE_KASAN_MAP_SHADOW to large_allowed() exposes an issue in can_large_pud() and can_large_pmd(). Since commit d8073dc6bc04 ("s390/mm: Allow large pages only for aligned physical addresses"), both can_large_pud() and can_large_pmd() call _pa() to check if large page physical addresses are aligned. However, _pa() has a side effect: it allocates memory in POPULATE_KASAN_MAP_SHADOW mode. This results in massive memory leaks. The proper fix would be to address both large_allowed() and _pa()'s side effects, but for now, revert this change to avoid the leaks. Fixes: ff123eb77416 ("s390/mm: Allow large pages for KASAN shadow mapping") Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-29posix-timers: Target group sigqueue to current task only if not exitingFrederic Weisbecker1-3/+4
A sigqueue belonging to a posix timer, which target is not a specific thread but a whole thread group, is preferrably targeted to the current task if it is part of that thread group. However nothing prevents a posix timer event from queueing such a sigqueue from a reaped yet running task. The interruptible code space between exit_notify() and the final call to schedule() is enough for posix_timer_fn() hrtimer to fire. If that happens while the current task is part of the thread group target, it is proposed to handle it but since its sighand pointer may have been cleared already, the sigqueue is dropped even if there are other tasks running within the group that could handle it. As a result posix timers with thread group wide target may miss signals when some of their threads are exiting. Fix this with verifying that the current task hasn't been through exit_notify() before proposing it as a preferred target so as to ensure that its sighand is still here and stable. complete_signal() might still reconsider the choice and find a better target within the group if current has passed retarget_shared_pending() already. Fixes: bcb7ee79029d ("posix-timers: Prefer delivery of signals to the current thread") Reported-by: Anthony Mallet <anthony.mallet@laas.fr> Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241122234811.60455-1-frederic@kernel.org Closes: https://lore.kernel.org/all/26411.57288.238690.681680@gargle.gargle.HOWL
2024-11-29MAINTAINERS: fix typo in I2C OF COMPONENT PROBERLukas Bulwahn1-1/+1
Commit 157ce8f381ef ("i2c: Introduce OF component probe function") adds the header file include/linux/i2c-of-prober.h and a corresponding file entry in the newly added MAINTAINERS section I2C OF COMPONENT PROBER. This file entry unfortunately has a typo. Fortunately, ./scripts/get_maintainer.pl --self-test=patterns detects this broken reference. Fix the typo in this file entry in the I2C OF COMPONENT PROBER section. Fixes: 157ce8f381ef ("i2c: Introduce OF component probe function") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>