aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2025-06-16tools headers x86 svm: Sync svm headers with the kernel sourcesArnaldo Carvalho de Melo1-0/+2
To pick the changes in: 827547bc3a2a2af6 ("KVM: SVM: Add architectural definitions/assets for Bus Lock Threshold") That triggers: CC /tmp/build/perf-tools/arch/x86/util/kvm-stat.o LD /tmp/build/perf-tools/arch/x86/util/perf-util-in.o LD /tmp/build/perf-tools/arch/x86/perf-util-in.o LD /tmp/build/perf-tools/arch/perf-util-in.o LD /tmp/build/perf-tools/perf-util-in.o AR /tmp/build/perf-tools/libperf-util.a LINK /tmp/build/perf-tools/perf The SVM_EXIT_BUS_LOCK exit reason was added to SVM_EXIT_REASONS, used in kvm-stat.c. This addresses this perf build warning: Warning: Kernel ABI header differences: diff -u tools/arch/x86/include/uapi/asm/svm.h arch/x86/include/uapi/asm/svm.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nikunj A Dadhania <nikunj@amd.com> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/aErcjuTTCVEZ-8Nb@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-06-16tools headers UAPI: Sync KVM's vmx.h header with the kernel sourcesArnaldo Carvalho de Melo1-1/+4
To pick the changes in: 6c441e4d6e729616 ("KVM: TDX: Handle EXIT_REASON_OTHER_SMI") c42856af8f70d983 ("KVM: TDX: Add a place holder for handler of TDX hypercalls (TDG.VP.VMCALL)") That makes 'perf kvm-stat' aware of this new TDCALL exit reason, thus addressing the following perf build warning: Warning: Kernel ABI header differences: diff -u tools/arch/x86/include/uapi/asm/vmx.h arch/x86/include/uapi/asm/vmx.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Isaku Yamahata <isaku.yamahata@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/aErcVn_4plQyODR1@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-06-16tools kvm headers arm64: Update KVM header from the kernel sourcesArnaldo Carvalho de Melo1-4/+5
To pick the changes from: b7628c7973765c85 ("KVM: arm64: Allow userspace to limit the number of PMU counters for EL2 VMs") That doesn't result in any changes in tooling (built on a Raspberry PI 5 running Debian GNU/Linux 12 (bookworm)), only addresses this perf build warning: Warning: Kernel ABI header differences: diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-06-16tools headers UAPI: Sync linux/prctl.h with the kernel sources to pick FUTEX knobArnaldo Carvalho de Melo1-0/+7
To pick the changes in: 63e8595c060a1fef ("futex: Allow to make the private hash immutable") 80367ad01d93ac78 ("futex: Add basic infrastructure for local task local hash") That adds a FUTEX knob: $ tools/perf/trace/beauty/prctl_option.sh > before $ cp include/uapi/linux/prctl.h tools/perf/trace/beauty/include/uapi/linux/prctl.h $ tools/perf/trace/beauty/prctl_option.sh > after $ diff -u before after --- before 2025-06-09 14:50:45.162579336 -0300 +++ after 2025-06-09 14:50:52.797660024 -0300 @@ -72,6 +72,7 @@ [75] = "SET_SHADOW_STACK_STATUS", [76] = "LOCK_SHADOW_STACK_STATUS", [77] = "TIMER_CREATE_RESTORE_IDS", + [78] = "FUTEX_HASH", }; static const char *prctl_set_mm_options[] = { [1] = "START_CODE", $ That now will be used to decode the syscall option and also to compose filters, for instance: [root@five ~]# perf trace -e syscalls:sys_enter_prctl --filter option==SET_NAME 0.000 Isolated Servi/3474327 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23f13b7aee) 0.032 DOM Worker/3474327 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23deb25670) 7.920 :3474328/3474328 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24fbb10) 7.935 StreamT~s #374/3474328 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24fb970) 8.400 Isolated Servi/3474329 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24bab10) 8.418 StreamT~s #374/3474329 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24ba970) ^C[root@five ~]# This addresses this perf build warning: Warning: Kernel ABI header differences: diff -u tools/perf/trace/beauty/include/uapi/linux/prctl.h include/uapi/linux/prctl.h Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/aEiYOtKkrVDT03hZ@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-06-16perf mem: Document new output fields (op, cache, mem, dtlb, snoop)Namhyung Kim2-17/+92
Update the documentation of the new fields with examples and caveats. Also update the related documentation for AMD IBS. Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250610005742.2173050-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-06-16tools headers: Update the fs headers with the kernel sourcesArnaldo Carvalho de Melo4-6/+17
To pick up changes from: 5d894321c49e6137 ("fs: add atomic write unit max opt to statx") a516403787e08119 ("fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions") c07d3aede2b26830 ("fscrypt: add support for hardware-wrapped keys") These are used to beautify fs syscall arguments, albeit the changes in this update are not affecting those beautifiers. This addresses these tools/ build warnings: Warning: Kernel ABI header differences: diff -u tools/include/uapi/linux/fscrypt.h include/uapi/linux/fscrypt.h diff -u tools/include/uapi/linux/stat.h include/uapi/linux/stat.h diff -u tools/perf/trace/beauty/include/uapi/linux/fs.h include/uapi/linux/fs.h diff -u tools/perf/trace/beauty/include/uapi/linux/stat.h include/uapi/linux/stat.h Please see tools/include/uapi/README for details (it's in the first patch of this series). Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrei Vagin <avagin@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Eric Biggers <ebiggers@google.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Liam R. Howlett <liam.howlett@oracle.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/aEce1keWdO-vGeqe@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-06-16perf test: Restrict uniquifying test to machines with 'uncore_imc'Chun-Tse Shao1-2/+10
The test would fail if target machine does not have 'uncore_imc' devices. Since event uniquifying behavior is similar among different architectures, we are restricting the test to only run on machines with `uncore_imc` devices. Suggested-by: Ian Rogers <irogers@google.com> Signed-off-by: Chun-Tse Shao <ctshao@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250521224513.1104129-1-ctshao@google.com [ Skip the test, i.e. return 2, instead of returning 0 as if the test had succeed ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-06-16fs: drop assert in file_seek_cur_needs_f_lockLuis Henriques1-2/+6
The assert in function file_seek_cur_needs_f_lock() can be triggered very easily because there are many users of vfs_llseek() (such as overlayfs) that do their custom locking around llseek instead of relying on fdget_pos(). Just drop the overzealous assertion. Fixes: da06e3c51794 ("fs: don't needlessly acquire f_lock") Suggested-by: Jan Kara <jack@suse.cz> Suggested-by: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Luis Henriques <luis@igalia.com> Link: https://lore.kernel.org/20250613101111.17716-1-luis@igalia.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-15Linux 6.16-rc2Linus Torvalds1-1/+1
2025-06-16gendwarfksyms: Fix structure type overridesSami Tolvanen2-58/+21
As we always iterate through the entire die_map when expanding type strings, recursively processing referenced types in type_expand_child() is not actually necessary. Furthermore, the type_string kABI rule added in commit c9083467f7b9 ("gendwarfksyms: Add a kABI rule to override type strings") can fail to override type strings for structures due to a missing kabi_get_type_string() check in this function. Fix the issue by dropping the unnecessary recursion and moving the override check to type_expand(). Note that symbol versions are otherwise unchanged with this patch. Fixes: c9083467f7b9 ("gendwarfksyms: Add a kABI rule to override type strings") Reported-by: Giuliano Procida <gprocida@google.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-06-16kbuild: move warnings about linux/export.h from W=1 to W=2Masahiro Yamada2-6/+12
This hides excessive warnings, as nobody builds with W=2. Fixes: a934a57a42f6 ("scripts/misc-check: check missing #include <linux/export.h> when W=1") Fixes: 7d95680d64ac ("scripts/misc-check: check unnecessary #include <linux/export.h> when W=1") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com>
2025-06-13io_uring: run local task_work from ring exit IOPOLL reapingJens Axboe1-0/+3
In preparation for needing to shift NVMe passthrough to always use task_work for polled IO completions, ensure that those are suitably run at exit time. See commit: 9ce6c9875f3e ("nvme: always punt polled uring_cmd end_io work to task_work") for details on why that is necessary. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-13nvme: always punt polled uring_cmd end_io work to task_workJens Axboe1-14/+7
Currently NVMe uring_cmd completions will complete locally, if they are polled. This is done because those completions are always invoked from task context. And while that is true, there's no guarantee that it's invoked under the right ring context, or even task. If someone does NVMe passthrough via multiple threads and with a limited number of poll queues, then ringA may find completions from ringB. For that case, completing the request may not be sound. Always just punt the passthrough completions via task_work, which will redirect the completion, if needed. Cc: stable@vger.kernel.org Fixes: 585079b6e425 ("nvme: wire up async polling for io passthrough commands") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-13posix-cpu-timers: fix race between handle_posix_cpu_timers() and posix_cpu_timer_del()Oleg Nesterov1-0/+9
If an exiting non-autoreaping task has already passed exit_notify() and calls handle_posix_cpu_timers() from IRQ, it can be reaped by its parent or debugger right after unlock_task_sighand(). If a concurrent posix_cpu_timer_del() runs at that moment, it won't be able to detect timer->it.cpu.firing != 0: cpu_timer_task_rcu() and/or lock_task_sighand() will fail. Add the tsk->exit_state check into run_posix_cpu_timers() to fix this. This fix is not needed if CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y, because exit_task_work() is called before exit_notify(). But the check still makes sense, task_work_add(&tsk->posix_cputimers_work.work) will fail anyway in this case. Cc: stable@vger.kernel.org Reported-by: Benoît Sevens <bsevens@google.com> Fixes: 0bdd2ed4138e ("sched: run_posix_cpu_timers: Don't check ->exit_state, use lock_task_sighand()") Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-06-13io_uring/kbuf: don't truncate end buffer for multiple buffer peeksJens Axboe1-1/+4
If peeking a bunch of buffers, normally io_ring_buffers_peek() will truncate the end buffer. This isn't optimal as presumably more data will be arriving later, and hence it's better to stop with the last full buffer rather than truncate the end buffer. Cc: stable@vger.kernel.org Fixes: 35c8711c8fc4 ("io_uring/kbuf: add helpers for getting/peeking multiple buffers") Reported-by: Christian Mazakas <christian.mazakas@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-13powerpc: Fix struct termio related ioctl macrosMadhavan Srinivasan1-4/+4
Since termio interface is now obsolete, include/uapi/asm/ioctls.h has some constant macros referring to "struct termio", this caused build failure at userspace. In file included from /usr/include/asm/ioctl.h:12, from /usr/include/asm/ioctls.h:5, from tst-ioctls.c:3: tst-ioctls.c: In function 'get_TCGETA': tst-ioctls.c:12:10: error: invalid application of 'sizeof' to incomplete type 'struct termio' 12 | return TCGETA; | ^~~~~~ Even though termios.h provides "struct termio", trying to juggle definitions around to make it compile could introduce regressions. So better to open code it. Reported-by: Tulio Magno <tuliom@ascii.art.br> Suggested-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Justin M. Forbes <jforbes@fedoraproject.org> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> Closes: https://lore.kernel.org/linuxppc-dev/8734dji5wl.fsf@ascii.art.br/ Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250517142237.156665-1-maddy@linux.ibm.com
2025-06-13Documentation: ublk: Separate UBLK_F_AUTO_BUF_REG fallback behavior sublistsBagas Sanjaya1-0/+2
Stephen Rothwell reports htmldocs warning on ublk docs: Documentation/block/ublk.rst:414: ERROR: Unexpected indentation. [docutils] Fix the warning by separating sublists of auto buffer registration fallback behavior from their appropriate parent list item. Fixes: ff20c516485e ("ublk: document auto buffer registration(UBLK_F_AUTO_BUF_REG)") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/linux-next/20250612132638.193de386@canb.auug.org.au/ Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://lore.kernel.org/r/20250613023857.15971-1-bagasdotme@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-13iommu/tegra: Fix incorrect size calculationJason Gunthorpe1-2/+2
This driver uses a mixture of ways to get the size of a PTE, tegra_smmu_set_pde() did it as sizeof(*pd) which became wrong when pd switched to a struct tegra_pd. Switch pd back to a u32* in tegra_smmu_set_pde() so the sizeof(*pd) returns 4. Fixes: 50568f87d1e2 ("iommu/terga: Do not use struct page as the handle for as->pd memory") Reported-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt> Closes: https://lore.kernel.org/all/62e7f7fe-6200-4e4f-ad42-d58ad272baa6@tecnico.ulisboa.pt/ Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Tested-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt> Link: https://lore.kernel.org/r/0-v1-da7b8b3d57eb+ce-iommu_terga_sizeof_jgg@nvidia.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-06-13block: Fix bvec_set_folio() for very large foliosMatthew Wilcox (Oracle)1-2/+5
Similarly to 26064d3e2b4d ("block: fix adding folio to bio"), if we attempt to add a folio that is larger than 4GB, we'll silently truncate the offset and len. Widen the parameters to size_t, assert that the length is less than 4GB and set the first page that contains the interesting data rather than the first page of the folio. Fixes: 26db5ee15851 (block: add a bvec_set_folio helper) Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20250612144255.2850278-1-willy@infradead.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-13bio: Fix bio_first_folio() for SPARSEMEM without VMEMMAPMatthew Wilcox (Oracle)1-1/+1
It is possible for physically contiguous folios to have discontiguous struct pages if SPARSEMEM is enabled and SPARSEMEM_VMEMMAP is not. This is correctly handled by folio_page_idx(), so remove this open-coded implementation. Fixes: 640d1930bef4 (block: Add bio_for_each_folio_all()) Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20250612144126.2849931-1-willy@infradead.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-13spi: spi-pci1xxxx: Drop MSI-X usage as unsupported by DMA engineThangaraj Samynathan1-1/+1
Removes MSI-X from the interrupt request path, as the DMA engine used by the SPI controller does not support MSI-X interrupts. Signed-off-by: Thangaraj Samynathan <thangaraj.s@microchip.com> Link: https://patch.msgid.link/20250612023059.71726-1-thangaraj.s@microchip.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-06-13powerpc: dts: mpc8315erdb: Add GPIO controller nodeJ. Neuschäfer1-0/+10
The MPC8315E SoC and variants have a GPIO controller at IMMR + 0xc00. This node was previously missing from the device tree. Signed-off-by: J. Neuschäfer <j.ne@posteo.net> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250611-mpc-gpio-v1-1-02d1f75336e2@posteo.net
2025-06-13powerpc/microwatt: Fix model property in device treeJ. Neuschäfer1-1/+1
The standard property for the model name is called "model". Signed-off-by: J. Neuschäfer <j.ne@posteo.net> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250611-microwatt-v2-1-80847bbc5f9c@posteo.net
2025-06-13powerpc/eeh: Fix missing PE bridge reconfiguration during VFIO EEH recoveryNarayana Murty N1-0/+2
VFIO EEH recovery for PCI passthrough devices fails on PowerNV and pseries platforms due to missing host-side PE bridge reconfiguration. In the current implementation, eeh_pe_configure() only performs RTAS or OPAL-based bridge reconfiguration for native host devices, but skips it entirely for PEs managed through VFIO in guest passthrough scenarios. This leads to incomplete EEH recovery when a PCI error affects a passthrough device assigned to a QEMU/KVM guest. Although VFIO triggers the EEH recovery flow through VFIO_EEH_PE_ENABLE ioctl, the platform-specific bridge reconfiguration step is silently bypassed. As a result, the PE's config space is not fully restored, causing subsequent config space access failures or EEH freeze-on-access errors inside the guest. This patch fixes the issue by ensuring that eeh_pe_configure() always invokes the platform's configure_bridge() callback (e.g., pseries_eeh_phb_configure_bridge) even for VFIO-managed PEs. This ensures that RTAS or OPAL calls to reconfigure the PE bridge are correctly issued on the host side, restoring the PE's configuration space after an EEH event. This fix is essential for reliable EEH recovery in QEMU/KVM guests using VFIO PCI passthrough on PowerNV and pseries systems. Tested with: - QEMU/KVM guest using VFIO passthrough (IBM Power9,(lpar)Power11 host) - Injected EEH errors with pseries EEH errinjct tool on host, recovery verified on qemu guest. - Verified successful config space access and CAP_EXP DevCtl restoration after recovery Fixes: 212d16cdca2d ("powerpc/eeh: EEH support for VFIO PCI device") Signed-off-by: Narayana Murty N <nnmlinux@linux.ibm.com> Reviewed-by: Vaibhav Jain <vaibhav@linux.ibm.com> Reviewed-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250508062928.146043-1-nnmlinux@linux.ibm.com
2025-06-13powerpc/vdso: Fix build of VDSO32 with pcrelChristophe Leroy2-2/+2
Building vdso32 on power10 with pcrel leads to following errors: VDSO32A arch/powerpc/kernel/vdso/gettimeofday-32.o arch/powerpc/kernel/vdso/gettimeofday.S: Assembler messages: arch/powerpc/kernel/vdso/gettimeofday.S:40: Error: syntax error; found `@', expected `,' arch/powerpc/kernel/vdso/gettimeofday.S:71: Info: macro invoked from here arch/powerpc/kernel/vdso/gettimeofday.S:40: Error: junk at end of line: `@notoc' arch/powerpc/kernel/vdso/gettimeofday.S:71: Info: macro invoked from here ... make[2]: *** [arch/powerpc/kernel/vdso/Makefile:85: arch/powerpc/kernel/vdso/gettimeofday-32.o] Error 1 make[1]: *** [arch/powerpc/Makefile:388: vdso_prepare] Error 2 Once the above is fixed, the following happens: VDSO32C arch/powerpc/kernel/vdso/vgettimeofday-32.o cc1: error: '-mpcrel' requires '-mcmodel=medium' make[2]: *** [arch/powerpc/kernel/vdso/Makefile:89: arch/powerpc/kernel/vdso/vgettimeofday-32.o] Error 1 make[1]: *** [arch/powerpc/Makefile:388: vdso_prepare] Error 2 make: *** [Makefile:251: __sub-make] Error 2 Make sure pcrel version of CFUNC() macro is used only for powerpc64 builds and remove -mpcrel for powerpc32 builds. Fixes: 7e3a68be42e1 ("powerpc/64: vmlinux support building with PCREL addresing") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/1fa3453f07d42a50a70114da9905bf7b73304fca.1747073669.git.christophe.leroy@csgroup.eu
2025-06-12mm: add mmap_prepare() compatibility layer for nested file systemsLorenzo Stoakes5-3/+107
Nested file systems, that is those which invoke call_mmap() within their own f_op->mmap() handlers, may encounter underlying file systems which provide the f_op->mmap_prepare() hook introduced by commit c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file callback"). We have a chicken-and-egg scenario here - until all file systems are converted to using .mmap_prepare(), we cannot convert these nested handlers, as we can't call f_op->mmap from an .mmap_prepare() hook. So we have to do it the other way round - invoke the .mmap_prepare() hook from an .mmap() one. in order to do so, we need to convert VMA state into a struct vm_area_desc descriptor, invoking the underlying file system's f_op->mmap_prepare() callback passing a pointer to this, and then setting VMA state accordingly and safely. This patch achieves this via the compat_vma_mmap_prepare() function, which we invoke from call_mmap() if f_op->mmap_prepare() is specified in the passed in file pointer. We place the fundamental logic into mm/vma.h where VMA manipulation belongs. We also update the VMA userland tests to accommodate the changes. The compat_vma_mmap_prepare() function and its associated machinery is temporary, and will be removed once the conversion of file systems is complete. We carefully place this code so it can be used with CONFIG_MMU and also with cutting edge nommu silicon. [akpm@linux-foundation.org: export compat_vma_mmap_prepare tp fix build] [lorenzo.stoakes@oracle.com: remove unused declarations] Link: https://lkml.kernel.org/r/ac3ae324-4c65-432a-8c6d-2af988b18ac8@lucifer.local Link: https://lkml.kernel.org/r/20250609165749.344976-1-lorenzo.stoakes@oracle.com Fixes: c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file callback"). Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reported-by: Jann Horn <jannh@google.com> Closes: https://lore.kernel.org/linux-mm/CAG48ez04yOEVx1ekzOChARDDBZzAKwet8PEoPM4Ln3_rk91AzQ@mail.gmail.com/ Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-06-12smb: improve directory cache reuse for readdir operationsBharath SM2-17/+19
Currently, cached directory contents were not reused across subsequent 'ls' operations because the cache validity check relied on comparing the ctx pointer, which changes with each readdir invocation. As a result, the cached dir entries was not marked as valid and the cache was not utilized for subsequent 'ls' operations. This change uses the file pointer, which remains consistent across all readdir calls for a given directory instance, to associate and validate the cache. As a result, cached directory contents can now be correctly reused, improving performance for repeated directory listings. Performance gains with local windows SMB server: Without the patch and default actimeo=1: 1000 directory enumeration operations on dir with 10k files took 135.0s With this patch and actimeo=0: 1000 directory enumeration operations on dir with 10k files took just 5.1s Signed-off-by: Bharath SM <bharathsm@microsoft.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-12smb: client: fix perf regression with deferred closesPaulo Alcantara1-3/+6
Customer reported that one of their applications started failing to open files with STATUS_INSUFFICIENT_RESOURCES due to NetApp server hitting the maximum number of opens to same file that it would allow for a single client connection. It turned out the client was failing to reuse open handles with deferred closes because matching ->f_flags directly without masking off O_CREAT|O_EXCL|O_TRUNC bits first broke the comparision and then client ended up with thousands of deferred closes to same file. Those bits are already satisfied on the original open, so no need to check them against existing open handles. Reproducer: #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include <fcntl.h> #include <pthread.h> #define NR_THREADS 4 #define NR_ITERATIONS 2500 #define TEST_FILE "/mnt/1/test/dir/foo" static char buf[64]; static void *worker(void *arg) { int i, j; int fd; for (i = 0; i < NR_ITERATIONS; i++) { fd = open(TEST_FILE, O_WRONLY|O_CREAT|O_APPEND, 0666); for (j = 0; j < 16; j++) write(fd, buf, sizeof(buf)); close(fd); } } int main(int argc, char *argv[]) { pthread_t t[NR_THREADS]; int fd; int i; fd = open(TEST_FILE, O_WRONLY|O_CREAT|O_TRUNC, 0666); close(fd); memset(buf, 'a', sizeof(buf)); for (i = 0; i < NR_THREADS; i++) pthread_create(&t[i], NULL, worker, NULL); for (i = 0; i < NR_THREADS; i++) pthread_join(t[i], NULL); return 0; } Before patch: $ mount.cifs //srv/share /mnt/1 -o ... $ mkdir -p /mnt/1/test/dir $ gcc repro.c && ./a.out ... number of opens: 1391 After patch: $ mount.cifs //srv/share /mnt/1 -o ... $ mkdir -p /mnt/1/test/dir $ gcc repro.c && ./a.out ... number of opens: 1 Cc: linux-cifs@vger.kernel.org Cc: David Howells <dhowells@redhat.com> Cc: Jay Shin <jaeshin@redhat.com> Cc: Pierguido Lambri <plambri@redhat.com> Fixes: b8ea3b1ff544 ("smb: enable reuse of deferred file handles for write operations") Acked-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-12drm/xe/lrc: Use a temporary buffer for WA BBLucas De Marchi1-4/+20
In case the BO is in iomem, we can't simply take the vaddr and write to it. Instead, prepare a separate buffer that is later copied into io memory. Right now it's just a few words that could be using xe_map_write32(), but the intention is to grow the WA BB for other uses. Fixes: 617d824c5323 ("drm/xe: Add WA BB to capture active context utilization") Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://lore.kernel.org/r/20250604-wa-bb-fix-v1-1-0dfc5dafcef0@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit ef48715b2d3df17c060e23b9aa636af3d95652f8) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-06-12selftests: drv-net: rss_ctx: Add test for ntuple rules targeting default RSS contextGal Pressman1-1/+58
Add test_rss_default_context_rule() to verify that ntuple rules can correctly direct traffic to the default RSS context (context 0). The test creates two ntuple rules with explicit location priorities: - A high-priority rule (loc 0) directing specific port traffic to context 0. - A low-priority rule (loc 1) directing all other TCP traffic to context 1. This validates that: 1. Rules targeting the default context function properly. 2. Traffic steering works as expected when mixing default and additional RSS contexts. The test was written by AI, and reviewed by humans. Reviewed-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20250612071958.1696361-3-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-12net: ethtool: Don't check if RSS context exists in case of context 0Gal Pressman1-1/+2
Context 0 (default context) always exists, there is no need to check whether it exists or not when adding a flow steering rule. The existing check fails when creating a flow steering rule for context 0 as it is not stored in the rss_ctx xarray. For example: $ ethtool --config-ntuple eth2 flow-type tcp4 dst-ip 194.237.147.23 dst-port 19983 context 0 loc 618 rmgr: Cannot insert RX class rule: Invalid argument Cannot insert classification rule An example usecase for this could be: - A high-priority rule (loc 0) directing specific port traffic to context 0. - A low-priority rule (loc 1) directing all other TCP traffic to context 1. This is a user-visible regression that was caught in our testing environment, it was not reported by a user yet. Fixes: de7f7582dff2 ("net: ethtool: prevent flow steering to RSS contexts which don't exist") Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/20250612071958.1696361-2-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-12af_unix: Allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD.Kuniyuki Iwashima1-1/+2
Before the cited commit, the kernel unconditionally embedded SCM credentials to skb for embryo sockets even when both the sender and listener disabled SO_PASSCRED and SO_PASSPIDFD. Now, the credentials are added to skb only when configured by the sender or the listener. However, as reported in the link below, it caused a regression for some programs that assume credentials are included in every skb, but sometimes not now. The only problematic scenario would be that a socket starts listening before setting the option. Then, there will be 2 types of non-small race window, where a client can send skb without credentials, which the peer receives as an "invalid" message (and aborts the connection it seems ?): Client Server ------ ------ s1.listen() <-- No SO_PASS{CRED,PIDFD} s2.connect() s2.send() <-- w/o cred s1.setsockopt(SO_PASS{CRED,PIDFD}) s2.send() <-- w/ cred or Client Server ------ ------ s1.listen() <-- No SO_PASS{CRED,PIDFD} s2.connect() s2.send() <-- w/o cred s3, _ = s1.accept() <-- Inherit cred options s2.send() <-- w/o cred but not set yet s3.setsockopt(SO_PASS{CRED,PIDFD}) s2.send() <-- w/ cred It's unfortunate that buggy programs depend on the behaviour, but let's restore the previous behaviour. Fixes: 3f84d577b79d ("af_unix: Inherit sk_flags at connect().") Reported-by: Jacek Łuczak <difrost.kernel@gmail.com> Closes: https://lore.kernel.org/all/68d38b0b-1666-4974-85d4-15575789c8d4@gmail.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Tested-by: Christian Heusel <christian@heusel.eu> Tested-by: André Almeida <andrealmeid@igalia.com> Tested-by: Jacek Łuczak <difrost.kernel@gmail.com> Link: https://patch.msgid.link/20250611202758.3075858-1-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-12ipv6: Move fib6_config_validate() to ip6_route_add().Kuniyuki Iwashima1-55/+55
syzkaller created an IPv6 route from a malformed packet, which has a prefix len > 128, triggering the splat below. [0] This is a similar issue fixed by commit 586ceac9acb7 ("ipv6: Restore fib6_config validation for SIOCADDRT."). The cited commit removed fib6_config validation from some callers of ip6_add_route(). Let's move the validation back to ip6_route_add() and ip6_route_multipath_add(). [0]: UBSAN: array-index-out-of-bounds in ./include/net/ipv6.h:616:34 index 20 is out of range for type '__u8 [16]' CPU: 1 UID: 0 PID: 7444 Comm: syz.0.708 Not tainted 6.16.0-rc1-syzkaller-g19272b37aa4f #0 PREEMPT Hardware name: riscv-virtio,qemu (DT) Call Trace: [<ffffffff80078a80>] dump_backtrace+0x2e/0x3c arch/riscv/kernel/stacktrace.c:132 [<ffffffff8000327a>] show_stack+0x30/0x3c arch/riscv/kernel/stacktrace.c:138 [<ffffffff80061012>] __dump_stack lib/dump_stack.c:94 [inline] [<ffffffff80061012>] dump_stack_lvl+0x12e/0x1a6 lib/dump_stack.c:120 [<ffffffff800610a6>] dump_stack+0x1c/0x24 lib/dump_stack.c:129 [<ffffffff8001c0ea>] ubsan_epilogue+0x14/0x46 lib/ubsan.c:233 [<ffffffff819ba290>] __ubsan_handle_out_of_bounds+0xf6/0xf8 lib/ubsan.c:455 [<ffffffff85b363a4>] ipv6_addr_prefix include/net/ipv6.h:616 [inline] [<ffffffff85b363a4>] ip6_route_info_create+0x8f8/0x96e net/ipv6/route.c:3793 [<ffffffff85b635da>] ip6_route_add+0x2a/0x1aa net/ipv6/route.c:3889 [<ffffffff85b02e08>] addrconf_prefix_route+0x2c4/0x4e8 net/ipv6/addrconf.c:2487 [<ffffffff85b23bb2>] addrconf_prefix_rcv+0x1720/0x1e62 net/ipv6/addrconf.c:2878 [<ffffffff85b92664>] ndisc_router_discovery+0x1a06/0x3504 net/ipv6/ndisc.c:1570 [<ffffffff85b99038>] ndisc_rcv+0x500/0x600 net/ipv6/ndisc.c:1874 [<ffffffff85bc2c18>] icmpv6_rcv+0x145e/0x1e0a net/ipv6/icmp.c:988 [<ffffffff85af6798>] ip6_protocol_deliver_rcu+0x18a/0x1976 net/ipv6/ip6_input.c:436 [<ffffffff85af8078>] ip6_input_finish+0xf4/0x174 net/ipv6/ip6_input.c:480 [<ffffffff85af8262>] NF_HOOK include/linux/netfilter.h:317 [inline] [<ffffffff85af8262>] NF_HOOK include/linux/netfilter.h:311 [inline] [<ffffffff85af8262>] ip6_input+0x16a/0x70c net/ipv6/ip6_input.c:491 [<ffffffff85af8dcc>] ip6_mc_input+0x5c8/0x1268 net/ipv6/ip6_input.c:588 [<ffffffff85af6112>] dst_input include/net/dst.h:469 [inline] [<ffffffff85af6112>] ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline] [<ffffffff85af6112>] NF_HOOK include/linux/netfilter.h:317 [inline] [<ffffffff85af6112>] NF_HOOK include/linux/netfilter.h:311 [inline] [<ffffffff85af6112>] ipv6_rcv+0x5ae/0x6e0 net/ipv6/ip6_input.c:309 [<ffffffff85087e84>] __netif_receive_skb_one_core+0x106/0x16e net/core/dev.c:5977 [<ffffffff85088104>] __netif_receive_skb+0x2c/0x144 net/core/dev.c:6090 [<ffffffff850883c6>] netif_receive_skb_internal net/core/dev.c:6176 [inline] [<ffffffff850883c6>] netif_receive_skb+0x1aa/0xbf2 net/core/dev.c:6235 [<ffffffff8328656e>] tun_rx_batched.isra.0+0x430/0x686 drivers/net/tun.c:1485 [<ffffffff8329ed3a>] tun_get_user+0x2952/0x3d6c drivers/net/tun.c:1938 [<ffffffff832a21e0>] tun_chr_write_iter+0xc4/0x21c drivers/net/tun.c:1984 [<ffffffff80b9b9ae>] new_sync_write fs/read_write.c:593 [inline] [<ffffffff80b9b9ae>] vfs_write+0x56c/0xa9a fs/read_write.c:686 [<ffffffff80b9c2be>] ksys_write+0x126/0x228 fs/read_write.c:738 [<ffffffff80b9c42e>] __do_sys_write fs/read_write.c:749 [inline] [<ffffffff80b9c42e>] __se_sys_write fs/read_write.c:746 [inline] [<ffffffff80b9c42e>] __riscv_sys_write+0x6e/0x94 fs/read_write.c:746 [<ffffffff80076912>] syscall_handler+0x94/0x118 arch/riscv/include/asm/syscall.h:112 [<ffffffff8637e31e>] do_trap_ecall_u+0x396/0x530 arch/riscv/kernel/traps.c:341 [<ffffffff863a69e2>] handle_exception+0x146/0x152 arch/riscv/kernel/entry.S:197 Fixes: fa76c1674f2e ("ipv6: Move some validation from ip6_route_info_create() to rtm_to_fib6_config().") Reported-by: syzbot+4c2358694722d304c44e@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/6849b8c3.a00a0220.1eb5f5.00f0.GAE@google.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250611193551.2999991-1-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-12net: drv: netdevsim: don't napi_complete() from netpollJakub Kicinski1-1/+2
netdevsim supports netpoll. Make sure we don't call napi_complete() from it, since it may not be scheduled. Breno reports hitting a warning in napi_complete_done(): WARNING: CPU: 14 PID: 104 at net/core/dev.c:6592 napi_complete_done+0x2cc/0x560 __napi_poll+0x2d8/0x3a0 handle_softirqs+0x1fe/0x710 This is presumably after netpoll stole the SCHED bit prematurely. Reported-by: Breno Leitao <leitao@debian.org> Fixes: 3762ec05a9fb ("netdevsim: add NAPI support") Tested-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20250611174643.2769263-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-12net/mlx5: HWS, Add error checking to hws_bwc_rule_complex_hash_node_get()Dan Carpenter1-2/+17
Check for if ida_alloc() or rhashtable_lookup_get_insert_fast() fails. Fixes: 17e0accac577 ("net/mlx5: HWS, support complex matchers") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Link: https://patch.msgid.link/aEmBONjyiF6z5yCV@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-12veth: prevent NULL pointer dereference in veth_xdp_rcvJesper Dangaard Brouer1-2/+2
The veth peer device is RCU protected, but when the peer device gets deleted (veth_dellink) then the pointer is assigned NULL (via RCU_INIT_POINTER). This patch adds a necessary NULL check in veth_xdp_rcv when accessing the veth peer net_device. This fixes a bug introduced in commit dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops"). The bug is a race and only triggers when having inflight packets on a veth that is being deleted. Reported-by: Ihor Solodrai <ihor.solodrai@linux.dev> Closes: https://lore.kernel.org/all/fecfcad0-7a16-42b8-bff2-66ee83a6e5c4@linux.dev/ Reported-by: syzbot+c4c7bf27f6b0c4bd97fe@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/683da55e.a00a0220.d8eae.0052.GAE@google.com/ Fixes: dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops") Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev> Link: https://patch.msgid.link/174964557873.519608.10855046105237280978.stgit@firesoul Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-12net_sched: remove qdisc_tree_flush_backlog()Eric Dumazet1-8/+0
This function is no longer used after the four prior fixes. Given all prior uses were wrong, it seems better to remove it. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250611111515.1983366-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-12net_sched: ets: fix a race in ets_qdisc_change()Eric Dumazet1-1/+1
Gerrard Tai reported a race condition in ETS, whenever SFQ perturb timer fires at the wrong time. The race is as follows: CPU 0 CPU 1 [1]: lock root [2]: qdisc_tree_flush_backlog() [3]: unlock root | | [5]: lock root | [6]: rehash | [7]: qdisc_tree_reduce_backlog() | [4]: qdisc_put() This can be abused to underflow a parent's qlen. Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog() should fix the race, because all packets will be purged from the qdisc before releasing the lock. Fixes: b05972f01e7d ("net: sched: tbf: don't call qdisc_put() while holding tree lock") Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg> Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250611111515.1983366-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-12net_sched: tbf: fix a race in tbf_change()Eric Dumazet1-1/+1
Gerrard Tai reported a race condition in TBF, whenever SFQ perturb timer fires at the wrong time. The race is as follows: CPU 0 CPU 1 [1]: lock root [2]: qdisc_tree_flush_backlog() [3]: unlock root | | [5]: lock root | [6]: rehash | [7]: qdisc_tree_reduce_backlog() | [4]: qdisc_put() This can be abused to underflow a parent's qlen. Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog() should fix the race, because all packets will be purged from the qdisc before releasing the lock. Fixes: b05972f01e7d ("net: sched: tbf: don't call qdisc_put() while holding tree lock") Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg> Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://patch.msgid.link/20250611111515.1983366-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-12net_sched: red: fix a race in __red_change()Eric Dumazet1-1/+1
Gerrard Tai reported a race condition in RED, whenever SFQ perturb timer fires at the wrong time. The race is as follows: CPU 0 CPU 1 [1]: lock root [2]: qdisc_tree_flush_backlog() [3]: unlock root | | [5]: lock root | [6]: rehash | [7]: qdisc_tree_reduce_backlog() | [4]: qdisc_put() This can be abused to underflow a parent's qlen. Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog() should fix the race, because all packets will be purged from the qdisc before releasing the lock. Fixes: 0c8d13ac9607 ("net: sched: red: delay destroying child qdisc on replace") Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg> Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250611111515.1983366-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-12net_sched: prio: fix a race in prio_tune()Eric Dumazet1-1/+1
Gerrard Tai reported a race condition in PRIO, whenever SFQ perturb timer fires at the wrong time. The race is as follows: CPU 0 CPU 1 [1]: lock root [2]: qdisc_tree_flush_backlog() [3]: unlock root | | [5]: lock root | [6]: rehash | [7]: qdisc_tree_reduce_backlog() | [4]: qdisc_put() This can be abused to underflow a parent's qlen. Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog() should fix the race, because all packets will be purged from the qdisc before releasing the lock. Fixes: 7b8e0b6e6599 ("net: sched: prio: delay destroying child qdiscs on change") Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg> Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250611111515.1983366-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-12net_sched: sch_sfq: reject invalid perturb periodEric Dumazet1-2/+8
Gerrard Tai reported that SFQ perturb_period has no range check yet, and this can be used to trigger a race condition fixed in a separate patch. We want to make sure ctl->perturb_period * HZ will not overflow and is positive. Tested: tc qd add dev lo root sfq perturb -10 # negative value : error Error: sch_sfq: invalid perturb period. tc qd add dev lo root sfq perturb 1000000000 # too big : error Error: sch_sfq: invalid perturb period. tc qd add dev lo root sfq perturb 2000000 # acceptable value tc -s -d qd sh dev lo qdisc sfq 8005: root refcnt 2 limit 127p quantum 64Kb depth 127 flows 128 divisor 1024 perturb 2000000sec Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20250611083501.1810459-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-12net: phy: phy_caps: Don't skip better duplex macth on non-exact matchMaxime Chevallier1-6/+12
When performing a non-exact phy_caps lookup, we are looking for a supported mode that matches as closely as possible the passed speed/duplex. Blamed patch broke that logic by returning a match too early in case the caller asks for half-duplex, as a full-duplex linkmode may match first, and returned as a non-exact match without even trying to mach on half-duplex modes. Reported-by: Jijie Shao <shaojijie@huawei.com> Closes: https://lore.kernel.org/netdev/20250603102500.4ec743cf@fedora/T/#m22ed60ca635c67dc7d9cbb47e8995b2beb5c1576 Tested-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Fixes: fc81e257d19f ("net: phy: phy_caps: Allow looking-up link caps based on speed and duplex") Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20250606094321.483602-1-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-12io_uring: consistently use rcu semantics with sqpoll threadKeith Busch4-15/+38
The sqpoll thread is dereferenced with rcu read protection in one place, so it needs to be annotated as an __rcu type, and should consistently use rcu helpers for access and assignment to make sparse happy. Since most of the accesses occur under the sqd->lock, we can use rcu_dereference_protected() without declaring an rcu read section. Provide a simple helper to get the thread from a locked context. Fixes: ac0b8b327a5677d ("io_uring: fix use-after-free of sq->thread in __io_uring_show_fdinfo()") Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20250611205343.1821117-1-kbusch@meta.com [axboe: fold in fix for register.c] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-12fs: unlock the superblock during iterate_supers_typeDarrick J. Wong1-1/+3
This function takes super_lock in shared mode, so it should release the same lock. Cc: stable@vger.kernel.org # v6.16-rc1 Fixes: af7551cf13cf7f ("super: remove pointless s_root checks") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Link: https://lore.kernel.org/20250611164044.GF6138@frogsfrogsfrogs Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-12ovl: fix debug print in case of mkdir errorAmir Goldstein1-3/+5
We want to print the name in case of mkdir failure and now we will get a cryptic (efault) as name. Fixes: c54b386969a5 ("VFS: Change vfs_mkdir() to return the dentry.") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/20250612072245.2825938-1-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-11init: fix build warnings about export.hHuacai Chen2-0/+2
After commit a934a57a42f64a4 ("scripts/misc-check: check missing #include <linux/export.h> when W=1") and 7d95680d64ac8e836c ("scripts/misc-check: check unnecessary #include <linux/export.h> when W=1"), we get some build warnings with W=1: init/main.c: warning: EXPORT_SYMBOL() is used, but #include <linux/export.h> is missing init/initramfs.c: warning: EXPORT_SYMBOL() is used, but #include <linux/export.h> is missing So fix these build warnings for the init code. Link: https://lkml.kernel.org/r/20250608141235.155206-1-chenhuacai@loongson.cn Fixes: a934a57a42f6 ("scripts/misc-check: check missing #include <linux/export.h> when W=1") Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-06-11MAINTAINERS: add Barry as a THP reviewerBarry Song1-0/+1
I have been actively contributing to mTHP and reviewing related patches for an extended period, and I would like to continue supporting patch reviews. Link: https://lkml.kernel.org/r/20250609002442.1856-1-21cnbao@gmail.com Signed-off-by: Barry Song <baohua@kernel.org> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Dev Jain <dev.jain@arm.com> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Nico Pache <npache@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-06-11drivers/rapidio/rio_cm.c: prevent possible heap overwriteAndrew Morton1-0/+3
In riocm_cdev_ioctl(RIO_CM_CHAN_SEND) -> cm_chan_msg_send() -> riocm_ch_send() cm_chan_msg_send() checks that userspace didn't send too much data but riocm_ch_send() failed to check that userspace sent sufficient data. The result is that riocm_ch_send() can write to fields in the rio_ch_chan_hdr which were outside the bounds of the space which cm_chan_msg_send() allocated. Address this by teaching riocm_ch_send() to check that the entire rio_ch_chan_hdr was copied in from userspace. Reported-by: maher azz <maherazz04@gmail.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Alexandre Bounine <alex.bou9@gmail.com> Cc: Linus Torvalds <torvalds@linuxfoundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-06-11mm: close theoretical race where stale TLB entries could lingerRyan Roberts1-0/+2
Commit 3ea277194daa ("mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries") described a theoretical race as such: """ Nadav Amit identified a theoretical race between page reclaim and mprotect due to TLB flushes being batched outside of the PTL being held. He described the race as follows: CPU0 CPU1 ---- ---- user accesses memory using RW PTE [PTE now cached in TLB] try_to_unmap_one() ==> ptep_get_and_clear() ==> set_tlb_ubc_flush_pending() mprotect(addr, PROT_READ) ==> change_pte_range() ==> [ PTE non-present - no flush ] user writes using cached RW PTE ... try_to_unmap_flush() The same type of race exists for reads when protecting for PROT_NONE and also exists for operations that can leave an old TLB entry behind such as munmap, mremap and madvise. """ The solution was to introduce flush_tlb_batched_pending() and call it under the PTL from mprotect/madvise/munmap/mremap to complete any pending tlb flushes. However, while madvise_free_pte_range() and madvise_cold_or_pageout_pte_range() were both retro-fitted to call flush_tlb_batched_pending() immediately after initially acquiring the PTL, they both temporarily release the PTL to split a large folio if they stumble upon one. In this case, where re-acquiring the PTL flush_tlb_batched_pending() must be called again, but it previously was not. Let's fix that. There are 2 Fixes: tags here: the first is the commit that fixed madvise_free_pte_range(). The second is the commit that added madvise_cold_or_pageout_pte_range(), which looks like it copy/pasted the faulty pattern from madvise_free_pte_range(). This is a theoretical bug discovered during code review. Link: https://lkml.kernel.org/r/20250606092809.4194056-1-ryan.roberts@arm.com Fixes: 3ea277194daa ("mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries") Fixes: 9c276cc65a58 ("mm: introduce MADV_COLD") Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Jann Horn <jannh@google.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mel Gorman <mgorman <mgorman@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>