path: root/tools/perf/scripts/python/export-to-postgresql.py
Age | Commit message | Author | Files | Lines
2025-04-06 | Linux 6.15-rc1 | Linus Torvalds | 1 | -2/+2
2025-04-06 | tools/include: make uapi/linux/types.h usable from assembly | Thomas Weißschuh | 1 | -0/+3
The "real" linux/types.h UAPI header gracefully degrades to a NOOP when included from assembly code. Mirror this behaviour in the tools/ variant. Test for __ASSEMBLER__ over __ASSEMBLY__ as the former is provided by the toolchain automatically. Reported-by: Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/lkml/af553c62-ca2f-4956-932c-dd6e3a126f58@sirena.org.uk/ Fixes: c9fbaa879508 ("selftests: vDSO: parse_vdso: Use UAPI headers instead of libc headers") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://patch.msgid.link/20250321-uapi-consistency-v1-1-439070118dc0@linutronix.de Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-04-06 | tools/power turbostat: v2025.05.06 | Len Brown | 1 | -1/+1
Support up to 8192 processors
Add cpuidle governor debug telemetry, disabled by default
Update default output to exclude cpuidle invocation counts
Bug fixes

Signed-off-by: Len Brown <len.brown@intel.com>
2025-04-06 | tools/power turbostat: disable "cpuidle" invocation counters, by default | Len Brown | 2 | -13/+33
Create "pct_idle" counter group, the software notion of residency, so it can now be singled out, independent of other counter groups.
Create "cpuidle" group, the cpuidle invocation counts. Disable "cpuidle", by default.
Create "swidle" = "cpuidle" + "pct_idle". Undocument "sysfs", the old name for "swidle", but keep it working for backwards compatibility.
Create "hwidle", all the HW idle counters.
Modify "idle", enabled by default: "idle" = "hwidle" + "pct_idle" (and now excludes "cpuidle").

Signed-off-by: Len Brown <len.brown@intel.com>
2025-04-06 | Disable SLUB_TINY for build testing | Linus Torvalds | 2 | -2/+2
... and don't error out so hard on missing module descriptions. Before commit 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()") we used to warn about missing module descriptions, but only when building with extra warnings (i.e. 'W=1'). After that commit the warning became an unconditional hard error. And it turns out not all modules have been converted despite the claims to the contrary. As reported by Damian Tometzki, the slub KUnit test didn't have a module description, and apparently nobody ever really noticed. The reason nobody noticed seems to be that the slub KUnit tests get disabled by SLUB_TINY, which also ends up disabling a lot of other code, both in tests and in slub itself. And so anybody doing full build tests didn't actually see this failure. So let's disable SLUB_TINY for build-only tests, since it clearly ends up limiting build coverage. Also turn the missing module descriptions error back into a warning, but let's keep it around for non-'W=1' builds. Reported-by: Damian Tometzki <damian@riscv-rocks.de> Link: https://lore.kernel.org/all/01070196099fd059-e8463438-7b1b-4ec8-816d-173874be9966-000000@eu-central-1.amazonses.com/ Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Fixes: 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-04-06 | tools/power turbostat: re-factor sysfs code | Len Brown | 1 | -10/+21
Probe cpuidle "sysfs" residency and counts separately, since soon we will make one enabled by default and the other disabled by default. Clarify that some BICs (built-in counters) are actually "groups", since we're about to rename some of those groups. No functional change. Signed-off-by: Len Brown <len.brown@intel.com>
2025-04-06 | tools/power turbostat: Restore GFX sysfs fflush() call | Zhang Rui | 1 | -0/+1
Do fflush() to discard the buffered data, before each read of the graphics sysfs knobs. Fixes: ba99a4fc8c24 ("tools/power turbostat: Remove unnecessary fflush() call") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-04-06 | tools/power turbostat: Document GNR UncMHz domain convention | Len Brown | 1 | -0/+1
Document that on Intel Granite Rapids Systems, Uncore domains 0-2 are CPU domains, and uncore domains 3-4 are IO domains. Signed-off-by: Len Brown <len.brown@intel.com>
2025-04-06 | tools/power turbostat: report CoreThr per measurement interval | Len Brown | 2 | -1/+3
The CoreThr column displays total thermal throttling events since boot time. Change it to report events during the measurement interval. This is more useful for showing a user the current conditions. Total events since boot time are still available to the user via /sys/devices/system/cpu/cpu*/thermal_throttle/* Document CoreThr on turbostat.8 Fixes: eae97e053fe30 ("turbostat: Support thermal throttle count print") Reported-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Cc: Chen Yu <yu.c.chen@intel.com>
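Reporting events per measurement interval rather than since boot amounts to differencing successive samples of a cumulative counter. A minimal sketch in Python, assuming made-up sample values; `per_interval_counts` is a hypothetical name, not a turbostat function:

```python
def per_interval_counts(cumulative_samples):
    """Turn since-boot totals into per-interval deltas (illustrative only)."""
    deltas = []
    # Pair each sample with the one that follows it and subtract.
    for previous, current in zip(cumulative_samples, cumulative_samples[1:]):
        deltas.append(current - previous)
    return deltas


# Hypothetical since-boot throttle totals read at four successive intervals.
samples = [120, 120, 123, 130]
print(per_interval_counts(samples))  # [0, 3, 7]
```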
2025-04-06 | tools/power turbostat: Increase CPU_SUBSET_MAXCPUS to 8192 | Justin Ernst | 1 | -1/+1
On systems with >= 1024 cpus (in my case 1152), turbostat fails with the error output: "turbostat: /sys/fs/cgroup/cpuset.cpus.effective: cpu str malformat 0-1151" A similar error appears with the use of turbostat --cpu when the inputted cpu range contains a cpu number >= 1024: # turbostat -c 1100-1151 "--cpu 1100-1151" malformed ... Both errors are caused by parse_cpu_str() reaching its limit of CPU_SUBSET_MAXCPUS. It's a good idea to limit the maximum cpu number being parsed, but 1024 is too low. For a small increase in compute and allocated memory, increasing CPU_SUBSET_MAXCPUS brings support for parsing cpu numbers >= 1024. Increase CPU_SUBSET_MAXCPUS to 8192, a common setting for CONFIG_NR_CPUS on x86_64. Signed-off-by: Justin Ernst <justin.ernst@hpe.com> Signed-off-by: Len Brown <len.brown@intel.com>
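The effect of the limit can be sketched with a small range parser. This is an illustration in Python under stated assumptions: `parse_cpu_list` and `MAX_CPUS` are hypothetical names, and the logic is not a port of turbostat's parse_cpu_str(); it only shows how a cap on the highest CPU number rejects a list such as "0-1151" when the cap is 1024 and accepts it when the cap is 8192.

```python
MAX_CPUS = 8192  # mirrors the new CPU_SUBSET_MAXCPUS value named in the commit


def parse_cpu_list(text, max_cpus=MAX_CPUS):
    """Parse a list such as "0-3,8" into a set of CPU numbers.

    Illustrative only. Raises ValueError if a range is reversed or if any
    CPU number is >= max_cpus.
    """
    cpus = set()
    for part in text.split(","):
        first, _, last = part.partition("-")
        start = int(first)
        end = int(last) if last else start
        if start > end or end >= max_cpus:
            raise ValueError(f"cpu list {text!r} is malformed or exceeds {max_cpus}")
        cpus.update(range(start, end + 1))
    return cpus


print(len(parse_cpu_list("0-1151")))  # 1152
```

With `max_cpus=1024` the same input raises `ValueError`, which corresponds to the failure described in the commit message.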
2025-04-06 | kbuild: rpm-pkg: build a debuginfo RPM | Uday Shankar | 2 | -2/+54
The rpm-pkg make target currently suffers from a few issues related to debuginfo:
1. debuginfo for things built into the kernel (vmlinux) is not available in any RPM produced by make rpm-pkg. This makes using tools like systemtap against a make rpm-pkg kernel impossible.
2. debug source for the kernel is not available. This means that commands like 'disas /s' in gdb, which display source intermixed with assembly, can only print file names/line numbers which then must be painstakingly resolved to actual source in a separate editor.
3. debuginfo for modules is available, but it remains bundled with the .ko files that contain module code, in the main kernel RPM. This is a waste of space for users who do not need to debug the kernel (i.e. most users).

Address all of these issues by additionally building a debuginfo RPM when the kernel configuration allows for it, in line with standard patterns followed by RPM distributors. With these changes:
1. systemtap now works (when these changes are backported to 6.11, since systemtap lags a bit behind in compatibility), as verified by the following simple test script:

    # stap -e 'probe kernel.function("do_sys_open").call { printf("%s\n", $$parms); }'
    dfd=0xffffffffffffff9c filename=0x7fe18800b160 flags=0x88800 mode=0x0
    ...

2. disas /s works correctly in gdb, with source and disassembly interspersed:

    # gdb vmlinux --batch -ex 'disas /s blk_op_str'
    Dump of assembler code for function blk_op_str:
    block/blk-core.c:
    125 {
       0xffffffff814c8740 <+0>: endbr64

    127
    128 if (op < ARRAY_SIZE(blk_op_name) && blk_op_name[op])
       0xffffffff814c8744 <+4>: mov $0xffffffff824a7378,%rax
       0xffffffff814c874b <+11>: cmp $0x23,%edi
       0xffffffff814c874e <+14>: ja 0xffffffff814c8768 <blk_op_str+40>
       0xffffffff814c8750 <+16>: mov %edi,%edi

    126 const char *op_str = "UNKNOWN";
       0xffffffff814c8752 <+18>: mov $0xffffffff824a7378,%rdx

    127
    128 if (op < ARRAY_SIZE(blk_op_name) && blk_op_name[op])
       0xffffffff814c8759 <+25>: mov -0x7dfa0160(,%rdi,8),%rax

    126 const char *op_str = "UNKNOWN";
       0xffffffff814c8761 <+33>: test %rax,%rax
       0xffffffff814c8764 <+36>: cmove %rdx,%rax

    129 op_str = blk_op_name[op];
    130
    131 return op_str;
    132 }
       0xffffffff814c8768 <+40>: jmp 0xffffffff81d01360 <__x86_return_thunk>
    End of assembler dump.

3. The size of the main kernel package goes down substantially, especially if many modules are built (quite typical). Here is a comparison of installed size of the kernel package (configured with allmodconfig, dwarf4 debuginfo, and module compression turned off) before and after this patch:

    # rpm -qi kernel-6.13* | grep -E '^(Version|Size)'
    Version : 6.13.0postpatch+
    Size : 1382874089
    Version : 6.13.0prepatch+
    Size : 17870795887

This is a ~92% size reduction.

Note that a debuginfo package can only be produced if the following configs are set:
- CONFIG_DEBUG_INFO=y
- CONFIG_MODULE_COMPRESS=n
- CONFIG_DEBUG_INFO_SPLIT=n

The first of these is obvious - we can't produce debuginfo if the build does not generate it. The second two requirements can in principle be removed, but doing so is difficult with the current approach, which uses a generic rpmbuild script find-debuginfo.sh that processes all packaged executables. If we want to remove those requirements the best path forward is likely to add some debuginfo extraction/installation logic to the modules_install target (controllable by flags). That way, it's easier to operate on modules before they're compressed, and the logic can be reused by all packaging targets.

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
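The "~92%" figure follows from the two Size values in the rpm output quoted in this commit message. A quick check in Python (the variable names are illustrative, the byte counts are the ones reported above):

```python
pre_patch_bytes = 17870795887   # Size reported for 6.13.0prepatch+
post_patch_bytes = 1382874089   # Size reported for 6.13.0postpatch+

# Fraction of the installed size that disappears from the main kernel package.
reduction = 1 - post_patch_bytes / pre_patch_bytes
print(f"{reduction:.1%}")
```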
2025-04-06 | kconfig: merge_config: use an empty file as initfile | Daniel Gomez | 1 | -2/+2
The scripts/kconfig/merge_config.sh script requires an existing $INITFILE (or the $1 argument) as a base file for merging Kconfig fragments. However, an empty $INITFILE can serve as an initial starting point, later referenced by the KCONFIG_ALLCONFIG Makefile variable if -m is not used. This variable can point to any configuration file containing preset config symbols (the merged output) as stated in Documentation/kbuild/kconfig.rst. When -m is used $INITFILE will contain just the merge output requiring the user to run make (i.e. KCONFIG_ALLCONFIG=<$INITFILE> make <allnoconfig/alldefconfig> or make olddefconfig). Instead of failing when `$INITFILE` is missing, create an empty file and use it as the starting point for merges. Signed-off-by: Daniel Gomez <da.gomez@samsung.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-04-06 | nios2: migrate to the generic rule for built-in DTB | Masahiro Yamada | 4 | -9/+10
Commit 654102df2ac2 ("kbuild: add generic support for built-in boot DTBs") introduced generic support for built-in DTBs. Select GENERIC_BUILTIN_DTB when built-in DTB support is enabled. To keep consistency across architectures, this commit also renames CONFIG_NIOS2_DTB_SOURCE_BOOL to CONFIG_BUILTIN_DTB, and CONFIG_NIOS2_DTB_SOURCE to CONFIG_BUILTIN_DTB_NAME. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-04-05 | sh: defconfig: Drop obsolete CONFIG_NET_CLS_TCINDEX | Johan Korsnes | 4 | -4/+0
This option was removed from Kconfig in 8c710f75256b ("net/sched: Retire tcindex classifier") but not from the defconfigs. Fixes: 8c710f75256b ("net/sched: Retire tcindex classifier") Signed-off-by: Johan Korsnes <johan.korsnes@gmail.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2025-04-05 | sh: Align .bss section padding to 8-byte boundary | Artur Rojek | 1 | -1/+14
J2-based devices expect to find a device tree blob at the end of the .bss section. As of a77725a9a3c5 ("scripts/dtc: Update to upstream version v1.6.1-19-g0a3a9d3449c8"), libfdt enforces 8-byte alignment for the DTB, causing J2 devices to fail early in sh_fdt_init(). As the J2 loader firmware calculates the DTB location based on the kernel image .bss section size rather than the __bss_stop symbol offset, the required alignment can't be enforced with BSS_SECTION(0, PAGE_SIZE, 8). To fix this, inline a modified version of the above macro which grows .bss by the required size. While this change affects all existing SH boards, it should be benign on platforms which don't need this alignment. Signed-off-by: Artur Rojek <contact@artur-rojek.eu> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Tested-by: Rob Landley <rob@landley.net> Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2025-04-05 | tracing/timers: Rename the hrtimer_init event to hrtimer_setup | Nam Cao | 4 | -7/+7
The function hrtimer_init() doesn't exist anymore. It was replaced by hrtimer_setup(). Thus, rename the hrtimer_init trace event to hrtimer_setup to keep it consistent. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/all/cba84c3d853c5258aa3a262363a6eac08e2c7afc.1738746927.git.namcao@linutronix.de
2025-04-05 | hrtimers: Rename debug_init_on_stack() to debug_setup_on_stack() | Nam Cao | 1 | -4/+4
All the hrtimer_init*() functions have been renamed to hrtimer_setup*(). Rename debug_init_on_stack() to debug_setup_on_stack() as well, to keep the names consistent. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/all/073cf6162779a2f5b12624677d4c49ee7eccc1ed.1738746927.git.namcao@linutronix.de
2025-04-05 | hrtimers: Rename debug_init() to debug_setup() | Nam Cao | 1 | -4/+2
All the hrtimer_init*() functions have been renamed to hrtimer_setup*(). Rename debug_init() to debug_setup() as well, to keep the names consistent. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/all/4b730c1f79648b16a1c5413f928fdc2e138dfc43.1738746927.git.namcao@linutronix.de
2025-04-05 | hrtimers: Rename __hrtimer_init_sleeper() to __hrtimer_setup_sleeper() | Nam Cao | 1 | -4/+4
All the hrtimer_init*() functions have been renamed to hrtimer_setup*(). Rename __hrtimer_init_sleeper() to __hrtimer_setup_sleeper() as well, to keep the names consistent. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/all/807694aedad9353421c4a7347629a30c5c31026f.1738746927.git.namcao@linutronix.de
2025-04-05 | hrtimers: Remove unnecessary NULL check in hrtimer_start_range_ns() | Nam Cao | 1 | -2/+0
The struct hrtimer::function field can only be changed using hrtimer_setup*() or hrtimer_update_function(), and both already null-check 'function'. Therefore, null-checking 'function' in hrtimer_start_range_ns() is not necessary. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/all/4661c571ee87980c340ccc318fc1a473c0c8f6bc.1738746927.git.namcao@linutronix.de
2025-04-05 | hrtimers: Make callback function pointer private | Nam Cao | 4 | -8/+8
Make the struct hrtimer::function field private, to prevent users from changing this field in an unsafe way. hrtimer_update_function() should be used if the callback function needs to be changed. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/all/7d0e6e0c5c59a64a9bea940051aac05d750bc0c2.1738746927.git.namcao@linutronix.de
2025-04-05 | hrtimers: Merge __hrtimer_init() into __hrtimer_setup() | Nam Cao | 1 | -9/+3
__hrtimer_init() is only called by __hrtimer_setup(). Simplify by merging __hrtimer_init() into __hrtimer_setup(). Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/all/8a0a847a35f711f66b2d05b57255aa44e7e61279.1738746927.git.namcao@linutronix.de
2025-04-05 | hrtimers: Switch to use __htimer_setup() | Nam Cao | 1 | -2/+1
__hrtimer_init_sleeper() calls __hrtimer_init() and also sets up the callback function. But there is already __hrtimer_setup() which does both actions. Switch to use __hrtimer_setup() to simplify the code. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/all/d9a45a51b6a8aa0045310d63f73753bf6b33f385.1738746927.git.namcao@linutronix.de
2025-04-05 | hrtimers: Delete hrtimer_init() | Nam Cao | 3 | -23/+1
hrtimer_init() is now unused. Delete it. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/all/003722f60c7a2a4f8d4ed24fb741aa313b7e5136.1738746927.git.namcao@linutronix.de
2025-04-05 | treewide: Convert new and leftover hrtimer_init() users | Thomas Gleixner | 4 | -12/+9
hrtimer_setup() takes the callback function pointer as argument and initializes the timer completely. Replace hrtimer_init() and the open coded initialization of hrtimer::function with the new setup mechanism. Coccinelle scripted cleanup. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-04-05 | treewide: Switch/rename to timer_delete[_sync]() | Thomas Gleixner | 787 | -1648/+1613
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree over and remove the historical wrapper inlines. Conversion was done with coccinelle plus manual fixups where necessary. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-04-04 | Revert "timekeeping: Fix possible inconsistencies in _COARSE clockids" | Thomas Gleixner | 1 | -69/+25
This reverts commit 757b000f7b936edf79311ab0971fe465bbda75ea.

Miroslav reported that the changes for handling the inconsistencies in the coarse time getters result in a regression on the adjtimex() side. There are two issues:
1) The forwarding of the base time moves the update out of the original period and establishes a new one.
2) The clearing of the accumulated NTP error is changing the behaviour as well.

Userspace expects that multiplier/frequency updates are in effect, when the syscall returns, so delaying the update to the next tick is not solving the problem either.

Revert the change, so that the established expectations of user space implementations (ntpd, chronyd) are restored. The re-introduced inconsistency of the coarse time getters will be addressed in a subsequent fix.

Fixes: 757b000f7b93 ("timekeeping: Fix possible inconsistencies in _COARSE clockids")
Reported-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/Z-qsg6iDGlcIJulJ@localhost
2025-04-04 | genirq/migration: Use irqd_get_parent_data() in irq_force_complete_move() | Thomas Gleixner | 1 | -1/+1
Frank reported that the common irq_force_complete_move() breaks the out-of-tree build of ia64. The reason is that ia64 uses the migration code, but does not have hierarchical interrupt domains enabled. This went unnoticed in mainline as both x86 and RISC-V have hierarchical domains enabled. Not that it matters for mainline, but it's still inconsistent. Use irqd_get_parent_data() instead of accessing the parent_data field directly. The helper returns NULL when hierarchical domains are disabled; otherwise it accesses the parent_data field of the domain. No functional change. Fixes: 751dc837dabd ("genirq: Introduce common irq_force_complete_move() implementation") Reported-by: Frank Scheiner <frank.scheiner@web.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Frank Scheiner <frank.scheiner@web.de> Link: https://lore.kernel.org/all/87h634ugig.ffs@tglx
2025-04-04 | selftests: net: amt: indicate progress in the stress test | Jakub Kicinski | 1 | -6/+14
Our CI expects output from the test at least once every 10 minutes. The AMT test when running on debug kernel is just on the edge of that time for the stress test. Improve the output:
- print the name of the test first, before starting it,
- output a dot every 10% of the way.

Output after:

    TEST: amt discovery [ OK ]
    TEST: IPv4 amt multicast forwarding [ OK ]
    TEST: IPv6 amt multicast forwarding [ OK ]
    TEST: IPv4 amt traffic forwarding torture .......... [ OK ]
    TEST: IPv6 amt traffic forwarding torture .......... [ OK ]

Reviewed-by: Taehee Yoo <ap420073@gmail.com>
Link: https://patch.msgid.link/20250403145636.2891166-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
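The two output changes (name first, then one dot per 10% of progress) can be sketched as follows. This is a Python illustration only; the selftest itself is a shell script, and `run_with_progress` with its parameters is a hypothetical helper, not code from the patch.

```python
import io


def run_with_progress(name, total_steps, step, out):
    """Print the test name up front, then one dot per 10% of completed steps."""
    out.write(f"TEST: {name} ")
    out.flush()  # make the name visible before the long-running work starts
    dots_written = 0
    for i in range(total_steps):
        step(i)
        # Number of dots that should be visible once step i has finished.
        due = (i + 1) * 10 // total_steps
        while dots_written < due:
            out.write(".")
            out.flush()
            dots_written += 1
    out.write(" [ OK ]\n")
    return dots_written


buf = io.StringIO()
run_with_progress("demo", 50, lambda i: None, buf)
print(buf.getvalue(), end="")  # TEST: demo .......... [ OK ]
```

Flushing after every dot matters for the stated goal: a CI system watching for output only sees progress if it is not held back in a buffer.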
2025-04-04 | irqdomain: Stop using 'host' for domain | Jiri Slaby (SUSE) | 1 | -6/+6
It is confusing to see both 'host' and 'domain' used to mean 'domain'. Given this header is all about domains, switch the remaining 'host' uses to 'domain'. Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250319092951.37667-5-jirislaby@kernel.org
2025-04-04 | irqdomain: Rename irq_get_default_host() to irq_get_default_domain() | Jiri Slaby (SUSE) | 8 | -11/+11
Naming interrupt domains host is confusing at best and the irqdomain code uses both domain and host inconsistently. Therefore rename irq_get_default_host() to irq_get_default_domain(). Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250319092951.37667-4-jirislaby@kernel.org
2025-04-04 | irqdomain: Rename irq_set_default_host() to irq_set_default_domain() | Jiri Slaby (SUSE) | 30 | -36/+36
Naming interrupt domains host is confusing at best and the irqdomain code uses both domain and host inconsistently. Therefore rename irq_set_default_host() to irq_set_default_domain(). Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250319092951.37667-3-jirislaby@kernel.org
2025-04-04 | netlink: specs: rt_route: pull the ifa- prefix out of the names | Jakub Kicinski | 1 | -89/+91
YAML specs don't normally include the C prefix name in the name of the YAML attr. Remove the ifa- prefix from all attributes in route-attrs and metrics and specify name-prefix instead. This is a bit risky, hopefully there aren't many users out there. Fixes: 023289b4f582 ("doc/netlink: Add spec for rt route messages") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250403013706.2828322-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04 | netlink: specs: rt_addr: pull the ifa- prefix out of the names | Jakub Kicinski | 2 | -20/+21
YAML specs don't normally include the C prefix name in the name of the YAML attr. Remove the ifa- prefix from all attributes in addr-attrs and specify name-prefix instead. This is a bit risky, hopefully there aren't many users out there. Fixes: dfb0f7d9d979 ("doc/netlink: Add spec for rt addr messages") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250403013706.2828322-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04 | netlink: specs: rt_addr: fix get multi command name | Jakub Kicinski | 2 | -2/+2
Command names should match C defines, codegens may depend on it. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Fixes: 4f280376e531 ("selftests/net: Add selftest for IPv4 RTM_GETMULTICAST support") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250403013706.2828322-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04 | netlink: specs: rt_addr: fix the spec format / schema failures | Jakub Kicinski | 1 | -0/+1
The spec is mis-formatted, schema validation says:

    Failed validating 'type' in schema['properties']['operations']['properties']['list']['items']['properties']['dump']['properties']['request']['properties']['value']:
        {'minimum': 0, 'type': 'integer'}

    On instance['operations']['list'][3]['dump']['request']['value']:
        '58 - ifa-family'

The ifa-family clearly wants to be part of an attribute list.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Yuyang Huang <yuyanghuang@google.com>
Fixes: 4f280376e531 ("selftests/net: Add selftest for IPv4 RTM_GETMULTICAST support")
Link: https://patch.msgid.link/20250403013706.2828322-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04 | net: avoid false positive warnings in __net_mp_close_rxq() | Jakub Kicinski | 2 | -8/+8
Commit under Fixes solved the problem of spurious warnings when we uninstall an MP from a device while it's down. The __net_mp_close_rxq() which is used by io_uring was not fixed. Move the fix over and reuse __net_mp_close_rxq() in the devmem path. Acked-by: Stanislav Fomichev <sdf@fomichev.me> Fixes: a70f891e0fa0 ("net: devmem: do not WARN conditionally after netdev_rx_queue_restart()") Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20250403013405.2827250-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04 | net: move mp dev config validation to __net_mp_open_rxq() | Jakub Kicinski | 4 | -57/+54
devmem code performs a number of safety checks to avoid having to reimplement all of them in the drivers. Move those to __net_mp_open_rxq() and reuse that function for binding to make sure that io_uring ZC also benefits from them. While at it rename the queue ID variable to rxq_idx in __net_mp_open_rxq(), we touch most of the relevant lines. The XArray insertion is reordered after the netdev_rx_queue_restart() call, otherwise we'd need to duplicate the queue index check or risk inserting an invalid pointer. The XArray allocation failures should be extremely rare. Reviewed-by: Mina Almasry <almasrymina@google.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Fixes: 6e18ed929d3b ("net: add helpers for setting a memory provider on an rx queue") Link: https://patch.msgid.link/20250403013405.2827250-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04 | net: ibmveth: make veth_pool_store stop hanging | Dave Marquardt | 1 | -12/+27
v2:
- Created a single error handling unlock and exit in veth_pool_store
- Greatly expanded commit message with previous explanatory-only text

Summary: Use rtnl_mutex to synchronize veth_pool_store with itself, ibmveth_close and ibmveth_open, preventing multiple calls in a row to napi_disable.

Background: Two (or more) threads could call veth_pool_store through writing to /sys/devices/vio/30000002/pool*/*. You can do this easily with a little shell script. This causes a hang.

I configured LOCKDEP, compiled ibmveth.c with DEBUG, and built a new kernel. I ran this test again and saw:

    Setting pool0/active to 0
    Setting pool1/active to 1
    [ 73.911067][ T4365] ibmveth 30000002 eth0: close starting
    Setting pool1/active to 1
    Setting pool1/active to 0
    [ 73.911367][ T4366] ibmveth 30000002 eth0: close starting
    [ 73.916056][ T4365] ibmveth 30000002 eth0: close complete
    [ 73.916064][ T4365] ibmveth 30000002 eth0: open starting
    [ 110.808564][ T712] systemd-journald[712]: Sent WATCHDOG=1 notification.
    [ 230.808495][ T712] systemd-journald[712]: Sent WATCHDOG=1 notification.
    [ 243.683786][ T123] INFO: task stress.sh:4365 blocked for more than 122 seconds.
    [ 243.683827][ T123] Not tainted 6.14.0-01103-g2df0c02dab82-dirty #8
    [ 243.683833][ T123] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 243.683838][ T123] task:stress.sh state:D stack:28096 pid:4365 tgid:4365 ppid:4364 task_flags:0x400040 flags:0x00042000
    [ 243.683852][ T123] Call Trace:
    [ 243.683857][ T123] [c00000000c38f690] [0000000000000001] 0x1 (unreliable)
    [ 243.683868][ T123] [c00000000c38f840] [c00000000001f908] __switch_to+0x318/0x4e0
    [ 243.683878][ T123] [c00000000c38f8a0] [c000000001549a70] __schedule+0x500/0x12a0
    [ 243.683888][ T123] [c00000000c38f9a0] [c00000000154a878] schedule+0x68/0x210
    [ 243.683896][ T123] [c00000000c38f9d0] [c00000000154ac80] schedule_preempt_disabled+0x30/0x50
    [ 243.683904][ T123] [c00000000c38fa00] [c00000000154dbb0] __mutex_lock+0x730/0x10f0
    [ 243.683913][ T123] [c00000000c38fb10] [c000000001154d40] napi_enable+0x30/0x60
    [ 243.683921][ T123] [c00000000c38fb40] [c000000000f4ae94] ibmveth_open+0x68/0x5dc
    [ 243.683928][ T123] [c00000000c38fbe0] [c000000000f4aa20] veth_pool_store+0x220/0x270
    [ 243.683936][ T123] [c00000000c38fc70] [c000000000826278] sysfs_kf_write+0x68/0xb0
    [ 243.683944][ T123] [c00000000c38fcb0] [c0000000008240b8] kernfs_fop_write_iter+0x198/0x2d0
    [ 243.683951][ T123] [c00000000c38fd00] [c00000000071b9ac] vfs_write+0x34c/0x650
    [ 243.683958][ T123] [c00000000c38fdc0] [c00000000071bea8] ksys_write+0x88/0x150
    [ 243.683966][ T123] [c00000000c38fe10] [c0000000000317f4] system_call_exception+0x124/0x340
    [ 243.683973][ T123] [c00000000c38fe50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
    ...
    [ 243.684087][ T123] Showing all locks held in the system:
    [ 243.684095][ T123] 1 lock held by khungtaskd/123:
    [ 243.684099][ T123] #0: c00000000278e370 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x50/0x248
    [ 243.684114][ T123] 4 locks held by stress.sh/4365:
    [ 243.684119][ T123] #0: c00000003a4cd3f8 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x88/0x150
    [ 243.684132][ T123] #1: c000000041aea888 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x154/0x2d0
    [ 243.684143][ T123] #2: c0000000366fb9a8 (kn->active#64){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x160/0x2d0
    [ 243.684155][ T123] #3: c000000035ff4cb8 (&dev->lock){+.+.}-{3:3}, at: napi_enable+0x30/0x60
    [ 243.684166][ T123] 5 locks held by stress.sh/4366:
    [ 243.684170][ T123] #0: c00000003a4cd3f8 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x88/0x150
    [ 243.684183][ T123] #1: c00000000aee2288 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x154/0x2d0
    [ 243.684194][ T123] #2: c0000000366f4ba8 (kn->active#64){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x160/0x2d0
    [ 243.684205][ T123] #3: c000000035ff4cb8 (&dev->lock){+.+.}-{3:3}, at: napi_disable+0x30/0x60
    [ 243.684216][ T123] #4: c0000003ff9bbf18 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0x138/0x12a0

From the ibmveth debug, two threads are calling veth_pool_store, which calls ibmveth_close and ibmveth_open. Here's the sequence:

    T4365             T4366
    ----------------- ----------------- ---------
    veth_pool_store
                      veth_pool_store
    ibmveth_close
                      ibmveth_close
    napi_disable
                      napi_disable
    ibmveth_open
    napi_enable                         <- HANG

ibmveth_close calls napi_disable at the top and ibmveth_open calls napi_enable at the top.

https://docs.kernel.org/networking/napi.html says:

    The control APIs are not idempotent. Control API calls are safe against concurrent use of datapath APIs but an incorrect sequence of control API calls may result in crashes, deadlocks, or race conditions. For example, calling napi_disable() multiple times in a row will deadlock.
In the normal open and close paths, rtnl_mutex is acquired to prevent other callers. This is missing from veth_pool_store. Using rtnl_mutex in veth_pool_store fixes these hangs. Signed-off-by: Dave Marquardt <davemarq@linux.ibm.com> Fixes: 860f242eb534 ("[PATCH] ibmveth change buffer pools dynamically") Reviewed-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250402154403.386744-1-davemarq@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
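The idea of the fix, serializing the close/open pair so that two writers can never issue two disables in a row, can be modelled in a few lines of Python. Everything here is a toy stand-in: `FakeNapi`, `pool_store`, `stress`, and `config_lock` are hypothetical names, a `threading.Lock` plays the role of rtnl_mutex, and the model raises an exception where the real napi_disable() would block.

```python
import threading


class FakeNapi:
    """Toy model of a non-idempotent enable/disable pair (not the kernel API)."""

    def __init__(self):
        self.enabled = True

    def disable(self):
        if not self.enabled:
            # The real napi_disable() would block forever at this point.
            raise RuntimeError("disable called twice in a row")
        self.enabled = False

    def enable(self):
        self.enabled = True


config_lock = threading.Lock()  # stands in for rtnl_mutex in this sketch


def pool_store(napi):
    """Close and reopen the device as one serialized unit."""
    with config_lock:
        napi.disable()  # ibmveth_close calls napi_disable at the top
        napi.enable()   # ibmveth_open calls napi_enable at the top


def stress(napi, writers=4, rounds=1000):
    """Run several concurrent writers and collect any double-disable errors."""
    errors = []

    def worker():
        try:
            for _ in range(rounds):
                pool_store(napi)
        except RuntimeError as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


print(stress(FakeNapi()))  # [] because the lock keeps each close/open pair atomic
```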
2025-04-04 | arcnet: Add NULL check in com20020pci_probe() | Henry Martin | 1 | -1/+16
devm_kasprintf() returns NULL when memory allocation fails. Currently, com20020pci_probe() does not check for this case, which results in a NULL pointer dereference. Add NULL check after devm_kasprintf() to prevent this issue and ensure no resources are left allocated. Fixes: 6b17a597fc2f ("arcnet: restoring support for multiple Sohard Arcnet cards") Signed-off-by: Henry Martin <bsdhenrymartin@gmail.com> Link: https://patch.msgid.link/20250402135036.44697-1-bsdhenrymartin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04 | ipv6: Do not consider link down nexthops in path selection | Ido Schimmel | 1 | -2/+4
Nexthops whose link is down are not supposed to be considered during path selection when the "ignore_routes_with_linkdown" sysctl is set. This is done by assigning them a negative region boundary. However, when comparing the computed hash (unsigned) with the region boundary (signed), the negative region boundary is treated as unsigned, resulting in incorrect nexthop selection. Fix by treating the computed hash as signed. Note that the computed hash is always in range of [0, 2^31 - 1]. Fixes: 3d709f69a3e7 ("ipv6: Use hash-threshold instead of modulo-N") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250402114224.293392-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
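The signed/unsigned mix-up described above can be reproduced outside the kernel. The following is a minimal sketch, not the kernel code: the helper names and the boundary value of -1 are illustrative assumptions, and `ctypes.c_uint32` is used only to mimic how C reinterprets a negative value as a 32-bit unsigned one.

```python
import ctypes


def as_u32(value):
    """Reinterpret an integer as a 32-bit unsigned value, as a C conversion would."""
    return ctypes.c_uint32(value).value


def selected_unsigned(hash_value, upper_bound):
    """Buggy comparison: the signed boundary is converted to unsigned first."""
    return as_u32(hash_value) <= as_u32(upper_bound)


def selected_signed(hash_value, upper_bound):
    """Fixed comparison: both sides are treated as signed."""
    return hash_value <= upper_bound


# Illustrative negative boundary for a nexthop whose link is down.
LINKDOWN_BOUND = -1
```

With the unsigned comparison, the negative boundary becomes the largest 32-bit value, so any hash falls at or below it and the link-down nexthop is picked. With the signed comparison, no hash in [0, 2^31 - 1] can match it.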
2025-04-04 | ipv6: Start path selection from the first nexthop | Ido Schimmel | 1 | -3/+35
Cited commit transitioned IPv6 path selection to use hash-threshold instead of modulo-N. With hash-threshold, each nexthop is assigned a region boundary in the multipath hash function's output space and a nexthop is chosen if the calculated hash is smaller than the nexthop's region boundary. Hash-threshold does not work correctly if path selection does not start with the first nexthop. For example, if fib6_select_path() is always passed the last nexthop in the group, then it will always be chosen because its region boundary covers the entire hash function's output space. Fix this by starting the selection process from the first nexthop and by not considering nexthops for which rt6_score_route() provided a negative score. Fixes: 3d709f69a3e7 ("ipv6: Use hash-threshold instead of modulo-N") Reported-by: Stanislav Fomichev <stfomichev@gmail.com> Closes: https://lore.kernel.org/netdev/Z9RIyKZDNoka53EO@mini-arch/ Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250402114224.293392-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
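Why the starting point matters under hash-threshold can be shown with a few lines. This is a simplified sketch, not fib6_select_path(): `select_path` and the boundary values are hypothetical, and a nexthop is treated as matching when the hash is at or below its boundary.

```python
def select_path(upper_bounds, hash_value, start=0):
    """Return the index of the first nexthop, scanning from `start`,
    whose region boundary is at or above the hash; None if no nexthop matches."""
    for index in range(start, len(upper_bounds)):
        if hash_value <= upper_bounds[index]:
            return index
    return None


# Three equal-weight nexthops splitting a 31-bit hash space (illustrative values).
BOUNDS = [0x2AAAAAAA, 0x55555554, 0x7FFFFFFF]
```

Scanning from index 0 spreads hashes across all three nexthops. Scanning from the last index returns that nexthop for every hash, because its boundary covers the whole output space.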
2025-04-04 | usbnet:fix NPE during rx_complete | Ying Lu | 1 | -3/+3
Missing usbnet_going_away check in critical path. The usb_submit_urb function lacks a usbnet_going_away check, whereas __usbnet_queue_skb includes one. This inconsistency creates a race condition in which a URB request may succeed while the corresponding SKB fails to be queued. Subsequent processing (e.g., rx_complete → defer_bh → __skb_unlink(skb, list)) then attempts to access skb->next, triggering a NULL pointer dereference (kernel panic). Fixes: 04e906839a05 ("usbnet: fix cyclical race on disconnect with work queue") Cc: stable@vger.kernel.org Signed-off-by: Ying Lu <luying1@xiaomi.com> Link: https://patch.msgid.link/4c9ef2efaa07eb7f9a5042b74348a67e5a3a7aea.1743584159.git.luying1@xiaomi.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04 | net: octeontx2: Handle XDP_ABORTED and XDP invalid as XDP_DROP | Lorenzo Bianconi | 1 | -5/+4
In the current implementation, octeontx2 handles XDP_ABORTED and invalid XDP verdicts as XDP_PASS, forwarding the skb to the networking stack. Align the behaviour with other XDP drivers, which handle XDP_ABORTED and invalid verdicts as XDP_DROP. Please note that this patch has only been compile tested. Fixes: 06059a1a9a4a5 ("octeontx2-pf: Add XDP support to netdev PF") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250401-octeontx2-xdp-abort-fix-v1-1-f0587c35a0b9@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
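The intended verdict handling amounts to a default-to-drop rule. The sketch below is a toy model rather than the octeontx2 driver: `effective_action` is a hypothetical helper, while the numeric values follow the order of `enum xdp_action` in the UAPI header.

```python
# Values follow enum xdp_action: ABORTED=0, DROP=1, PASS=2, TX=3, REDIRECT=4.
XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT = range(5)


def effective_action(verdict):
    """Map a program's verdict to the action taken.
    Aborted and unrecognised verdicts both fall through to a drop."""
    if verdict in (XDP_PASS, XDP_TX, XDP_REDIRECT):
        return verdict
    return XDP_DROP
```

The point of the default branch is that anything not explicitly recognised is dropped rather than passed up the stack.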
2025-04-03 | net: fix geneve_opt length integer overflow | Lin Ma | 4 | -4/+4
struct geneve_opt uses a 5-bit length field for each option, which means every variable-size option should be smaller than 128 bytes. However, none of the current related Netlink policies enforce this length condition, so an attacker can use an exactly 128-byte option to *fake* a zero-length option, confuse the parsing logic, and achieve a heap out-of-bounds read. An example crash log:
[ 3.905425] ==================================================================
[ 3.905925] BUG: KASAN: slab-out-of-bounds in nla_put+0xa9/0xe0
[ 3.906255] Read of size 124 at addr ffff888005f291cc by task poc/177
[ 3.906646]
[ 3.906775] CPU: 0 PID: 177 Comm: poc-oob-read Not tainted 6.1.132 #1
[ 3.907131] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 3.907784] Call Trace:
[ 3.907925] <TASK>
[ 3.908048] dump_stack_lvl+0x44/0x5c
[ 3.908258] print_report+0x184/0x4be
[ 3.909151] kasan_report+0xc5/0x100
[ 3.909539] kasan_check_range+0xf3/0x1a0
[ 3.909794] memcpy+0x1f/0x60
[ 3.909968] nla_put+0xa9/0xe0
[ 3.910147] tunnel_key_dump+0x945/0xba0
[ 3.911536] tcf_action_dump_1+0x1c1/0x340
[ 3.912436] tcf_action_dump+0x101/0x180
[ 3.912689] tcf_exts_dump+0x164/0x1e0
[ 3.912905] fw_dump+0x18b/0x2d0
[ 3.913483] tcf_fill_node+0x2ee/0x460
[ 3.914778] tfilter_notify+0xf4/0x180
[ 3.915208] tc_new_tfilter+0xd51/0x10d0
[ 3.918615] rtnetlink_rcv_msg+0x4a2/0x560
[ 3.919118] netlink_rcv_skb+0xcd/0x200
[ 3.919787] netlink_unicast+0x395/0x530
[ 3.921032] netlink_sendmsg+0x3d0/0x6d0
[ 3.921987] __sock_sendmsg+0x99/0xa0
[ 3.922220] __sys_sendto+0x1b7/0x240
[ 3.922682] __x64_sys_sendto+0x72/0x90
[ 3.922906] do_syscall_64+0x5e/0x90
[ 3.923814] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 3.924122] RIP: 0033:0x7e83eab84407
[ 3.924331] Code: 48 89 fa 4c 89 df e8 38 aa 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 faf
[ 3.925330] RSP: 002b:00007ffff505e370 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[ 3.925752] RAX: ffffffffffffffda RBX: 00007e83eaafa740 RCX: 00007e83eab84407
[ 3.926173] RDX: 00000000000001a8 RSI: 00007ffff505e3c0 RDI: 0000000000000003
[ 3.926587] RBP: 00007ffff505f460 R08: 00007e83eace1000 R09: 000000000000000c
[ 3.926977] R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffff505f3c0
[ 3.927367] R13: 00007ffff505f5c8 R14: 00007e83ead1b000 R15: 00005d4fbbe6dcb8
Fix these issues by enforcing the correct length condition in the related policies. Fixes: 925d844696d9 ("netfilter: nft_tunnel: add support for geneve opts") Fixes: 4ece47787077 ("lwtunnel: add options setting and dumping for geneve") Fixes: 0ed5269f9e41 ("net/sched: add tunnel option support to act_tunnel_key") Fixes: 0a6e77784f49 ("net/sched: allow flower to match tunnel options") Signed-off-by: Lin Ma <linma@zju.edu.cn> Reviewed-by: Xin Long <lucien.xin@gmail.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://patch.msgid.link/20250402165632.6958-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
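The wrap-around behind the faked zero-length option is plain bit arithmetic. The sketch below assumes the length field counts 4-byte units, as the Geneve option header does and as the 128-byte limit implies. The helper names are hypothetical and the check is only a model of what a policy would need to enforce, not the actual Netlink policy change.

```python
LENGTH_BITS = 5   # width of the per-option length field
LENGTH_UNIT = 4   # assumption: the field counts 4-byte units

MAX_OPT_BYTES = ((1 << LENGTH_BITS) - 1) * LENGTH_UNIT  # 31 * 4 = 124


def encode_opt_length(payload_bytes):
    """Return what the 5-bit field ends up holding for an option of this size."""
    return (payload_bytes // LENGTH_UNIT) & ((1 << LENGTH_BITS) - 1)


def is_valid_opt_length(payload_bytes):
    """Model of the missing check: a multiple of the unit that fits in 5 bits."""
    return payload_bytes % LENGTH_UNIT == 0 and 0 <= payload_bytes <= MAX_OPT_BYTES
```

A 124-byte option encodes as 31, the largest representable value. A 128-byte option would need the value 32, which does not fit in five bits and is stored as 0, so the option header claims an empty payload while 128 bytes of data follow it.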
2025-04-03 | fs: actually hold the namespace semaphore | Christian Brauner | 1 | -1/+2
Don't use a scoped guard that only protects the next statement. Use a regular guard to make sure that the namespace semaphore is held across the whole function. Signed-off-by: Christian Brauner <brauner@kernel.org> Reported-by: Leon Romanovsky <leon@kernel.org> Link: https://lore.kernel.org/all/20250401170715.GA112019@unreal/ Fixes: db04662e2f4f ("fs: allow detached mounts in clone_private_mount()") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
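The difference between a guard that covers one statement and one that covers the whole function can be illustrated by analogy with Python's `with` statement. This is only an analogy for the kernel's guard helpers, not their implementation; `TrackedLock` and both functions are hypothetical.

```python
import threading


class TrackedLock:
    """Hypothetical lock wrapper that exposes whether it is currently held."""

    def __init__(self):
        self._lock = threading.Lock()
        self.held = False

    def __enter__(self):
        self._lock.acquire()
        self.held = True
        return self

    def __exit__(self, exc_type, exc, tb):
        self.held = False
        self._lock.release()
        return False


def scoped_to_one_statement(lock):
    """Only the first step runs under the lock; the second runs after release."""
    seen = []
    with lock:
        seen.append(lock.held)
    seen.append(lock.held)
    return seen


def held_for_whole_function(lock):
    """Both steps run under the lock, which is released on leaving the function body."""
    seen = []
    with lock:
        seen.append(lock.held)
        seen.append(lock.held)
    return seen
```

The first function returns `[True, False]` and the second returns `[True, True]`, which mirrors the bug: work that was meant to be protected ran after the lock had already been dropped.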
2025-04-03 | io_uring/zcrx: fix selftests w/ updated netdev Python helpers | David Wei | 1 | -4/+4
Fix io_uring zero copy rx selftest with updated netdev Python helpers. Signed-off-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250402172414.895276-1-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03 | selftests: net: use netdevsim in netns test | Stanislav Fomichev | 2 | -4/+34
Netdevsim has extra register_netdevice_notifier_dev_net notifiers; use netdevsim instead of the dummy device to test them out. Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250401163452.622454-9-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03 | docs: net: document netdev notifier expectations | Stanislav Fomichev | 1 | -0/+23
We don't have a consistent state yet, but document where we think we are and where we wanna be. Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250401163452.622454-8-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03 | net: dummy: request ops lock | Stanislav Fomichev | 1 | -0/+1
Even though the dummy device doesn't really need an instance lock, a lot of selftests use dummy, so it's useful to have extra exposure to the instance lock on NIPA. Request the instance/ops locking. Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250401163452.622454-7-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>