path: root/include/linux
Age | Commit message | Author | Files | Lines
2025-09-30Documentation: net: add flow control guide and document ethtool APIOleksij Rempel1-3/+42
Introduce a new document, flow_control.rst, to provide a comprehensive guide on Ethernet Flow Control in Linux. The guide explains how flow control works, how autonegotiation resolves pause capabilities, and how to configure it using ethtool and Netlink. In parallel, document the pause and pause-stat attributes in the ethtool.yaml netlink spec. This enables the ynl tool to generate kernel-doc comments for the corresponding enums in the UAPI header, making the C interface self-documenting. Finally, replace the legacy flow control section in phy.rst with a reference to the new document and add pointers in the relevant C source files. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250924120241.724850-1-o.rempel@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-29Merge tag 'powerpc-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds1-4/+0
Pull powerpc updates from Madhavan Srinivasan:

 - powerpc support for BPF arena and arena atomics
 - Patches to switch to msi parent domain (per-device MSI domains)
 - Add a lock contention tracepoint in the queued spinlock slowpath
 - Fixes for underflow in pseries/powernv msi and pci paths
 - Switch from legacy-of-mm-gpiochip dependency to platform driver
 - Fixes for handling TLB misses
 - Introduce support for powerpc papr-hvpipe
 - Add vpa-dtl PMU driver for pseries platform
 - Misc fixes and cleanups

Thanks to Aboorva Devarajan, Aditya Bodkhe, Andrew Donnellan, Athira Rajeev, Cédric Le Goater, Christophe Leroy, Erhard Furtner, Gautam Menghani, Geert Uytterhoeven, Haren Myneni, Hari Bathini, Joe Lawrence, Kajol Jain, Kienan Stewart, Linus Walleij, Mahesh Salgaonkar, Nam Cao, Nicolas Schier, Nysal Jan K.A., Ritesh Harjani (IBM), Ruben Wauters, Saket Kumar Bhaskar, Shashank MS, Shrikanth Hegde, Tejas Manhas, Thomas Gleixner, Thomas Huth, Thorsten Blum, Tyrel Datwyler, and Venkat Rao Bagalkote.

* tag 'powerpc-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (49 commits)
  powerpc/pseries: Define __u{8,32} types in papr_hvpipe_hdr struct
  genirq/msi: Remove msi_post_free()
  powerpc/perf/vpa-dtl: Add documentation for VPA dispatch trace log PMU
  powerpc/perf/vpa-dtl: Handle the writing of perf record when aux wake up is needed
  powerpc/perf/vpa-dtl: Add support to capture DTL data in aux buffer
  powerpc/perf/vpa-dtl: Add support to setup and free aux buffer for capturing DTL data
  docs: ABI: sysfs-bus-event_source-devices-vpa-dtl: Document sysfs event format entries for vpa_dtl pmu
  powerpc/vpa_dtl: Add interface to expose vpa dtl counters via perf
  powerpc/time: Expose boot_tb via accessor
  powerpc/32: Remove PAGE_KERNEL_TEXT to fix startup failure
  powerpc/fprobe: fix updated fprobe for function-graph tracer
  powerpc/ftrace: support CONFIG_FUNCTION_GRAPH_RETVAL
  powerpc64/modules: replace stub allocation sentinel with an explicit counter
  powerpc64/modules: correctly iterate over stubs in setup_ftrace_ool_stubs
  powerpc/ftrace: ensure ftrace record ops are always set for NOPs
  powerpc/603: Really copy kernel PGD entries into all PGDIRs
  powerpc/8xx: Remove left-over instruction and comments in DataStoreTLBMiss handler
  powerpc/pseries: HVPIPE changes to support migration
  powerpc/pseries: Enable hvpipe with ibm,set-system-parameter RTAS
  powerpc/pseries: Enable HVPIPE event message interrupt
  ...
2025-09-29Merge tag 's390-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linuxLinus Torvalds1-0/+23
Pull s390 updates from Alexander Gordeev:

 - Refactor SCLP memory hotplug code
 - Introduce common boot_panic() decompressor helper macro and use it to get rid of a few nearly identical implementations
 - Take into account additional key generation flags and forward them to the ep11 implementation. With that, allow users to modify the key generation process, e.g. provide valid combinations of XCP_BLOB_* flags
 - Replace kmalloc() + copy_from_user() with memdup_user_nul() in s390 debug facility and HMC driver
 - Add DAX support for DCSS memory block devices
 - Make the compiler statement attribute "assume" available with a new __assume macro
 - Rework ffs() and fls() family bitops functions, including source code improvements and generated code optimizations. Use the newly introduced __assume macro for that
 - Enable additional network features in default configurations
 - Use __GFP_ACCOUNT flag for user page table allocations to add missing kmemcg accounting
 - Add WQ_PERCPU flag to explicitly request the use of the per-CPU workqueue for 3590 tape driver
 - Switch power reading to the per-CPU and the Hiperdispatch to the default workqueue
 - Add memory allocation profiling hooks to allow better profiling data and the /proc/allocinfo output similar to other architectures

* tag 's390-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (21 commits)
  s390/mm: Add memory allocation profiling hooks
  s390: Replace use of system_wq with system_dfl_wq
  s390/diag324: Replace use of system_wq with system_percpu_wq
  s390/tape: Add WQ_PERCPU to alloc_workqueue users
  s390/bitops: Switch to generic ffs() if supported by compiler
  s390/bitops: Switch to generic fls(), fls64(), etc.
  s390/mm: Use __GFP_ACCOUNT for user page table allocations
  s390/configs: Enable additional network features
  s390/bitops: Cleanup __flogr()
  s390/bitops: Use __assume() for __flogr() inline assembly return value
  compiler_types: Add __assume macro
  s390/bitops: Limit return value range of __flogr()
  s390/dcssblk: Add DAX support
  s390/hmcdrv: Replace kmalloc() + copy_from_user() with memdup_user_nul()
  s390/debug: Replace kmalloc() + copy_from_user() with memdup_user_nul()
  s390/pkey: Forward keygenflags to ep11_unwrapkey
  s390/boot: Add common boot_panic() code
  s390/bitops: Optimize inlining
  s390/bitops: Slightly optimize ffs() and fls64()
  s390/sclp: Move memory hotplug code for better modularity
  ...
2025-09-29dpll: add phase_offset_avg_factor_get/set callback opsIvan Vecera1-0/+6
Add new callback operations for a dpll device:
 - phase_offset_avg_factor_get(...) - to obtain the current phase offset averaging factor from the dpll device,
 - phase_offset_avg_factor_set(...) - to set the phase offset averaging factor.

Obtain the factor value using the get callback and provide it to the user if the device driver implements this callback. Execute the set callback upon user requests, if the driver implements it.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>

v2:
 * do not require 'set' callback to retrieve current value
 * always call 'set' callback regardless of current value

Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250927084912.2343597-3-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29Merge tag 'mlx5-next-lag' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linuxJakub Kicinski1-3/+8
Tariq Toukan says: ==================== mlx5-next updates 2025-09-28 * tag 'mlx5-next-lag' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: IFC add balance ID and LAG per MP group bits net/mlx5: Add IFC bit for TIR/SQ order capability ==================== Link: https://patch.msgid.link/1759093989-841873-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds1-0/+3
Pull arm64 updates from Will Deacon:
 "There's good stuff across the board, including some nice mm improvements for CPUs with the 'noabort' BBML2 feature and a clever patch to allow ptdump to play nicely with block mappings in the vmalloc area.

  Confidential computing:
   - Add support for accepting secrets from firmware (e.g. ACPI CCEL) and mapping them with appropriate attributes.

  CPU features:
   - Advertise atomic floating-point instructions to userspace
   - Extend Spectre workarounds to cover additional Arm CPU variants
   - Extend list of CPUs that support break-before-make level 2 and guarantee not to generate TLB conflict aborts for changes of mapping granularity (BBML2_NOABORT)
   - Add GCS support to our uprobes implementation.

  Documentation:
   - Remove bogus SME documentation concerning register state when entering/exiting streaming mode.

  Entry code:
   - Switch over to the generic IRQ entry code (GENERIC_IRQ_ENTRY)
   - Micro-optimise syscall entry path with a compiler branch hint.

  Memory management:
   - Enable huge mappings in vmalloc space even when kernel page-table dumping is enabled
   - Tidy up the types used in our early MMU setup code
   - Rework rodata= for closer parity with the behaviour on x86
   - For CPUs implementing BBML2_NOABORT, utilise block mappings in the linear map even when rodata= applies to virtual aliases
   - Don't re-allocate the virtual region between '_text' and '_stext', as doing so confused tools parsing /proc/vmcore.

  Miscellaneous:
   - Clean-up Kconfig menuconfig text for architecture features
   - Avoid redundant bitmap_empty() during determination of supported SME vector lengths
   - Re-enable warnings when building the 32-bit vDSO object
   - Avoid breaking our eggs at the wrong end.

  Perf and PMUs:
   - Support for v3 of the Hisilicon L3C PMU
   - Support for Hisilicon's MN and NoC PMUs
   - Support for Fujitsu's Uncore PMU
   - Support for SPE's extended event filtering feature
   - Preparatory work to enable data source filtering in SPE
   - Support for multiple lanes in the DWC PCIe PMU
   - Support for i.MX94 in the IMX DDR PMU driver
   - MAINTAINERS update (Thank you, Yicong)
   - Minor driver fixes (PERF_IDX2OFF() overflow, CMN register offsets).

  Selftests:
   - Add basic LSFE check to the existing hwcaps test
   - Support nolibc in GCS tests
   - Extend SVE ptrace test to pass unsupported regsets and invalid vector lengths
   - Minor cleanups (typos, cosmetic changes).

  System registers:
   - Fix ID_PFR1_EL1 definition
   - Fix incorrect signedness of some fields in ID_AA64MMFR4_EL1
   - Sync TCR_EL1 definition with the latest Arm ARM (L.b)
   - Be stricter about the input fed into our AWK sysreg generator script
   - Typo fixes and removal of redundant definitions.

  ACPI, EFI and PSCI:
   - Decouple Arm's "Software Delegated Exception Interface" (SDEI) support from the ACPI GHES code so that it can be used by platforms booted with device-tree
   - Remove unnecessary per-CPU tracking of the FPSIMD state across EFI runtime calls
   - Fix a node refcount imbalance in the PSCI device-tree code.

  CPU Features:
   - Ensure register sanitisation is applied to fields in ID_AA64MMFR4
   - Expose AIDR_EL1 to userspace via sysfs, primarily so that KVM guests can reliably query the underlying CPU types from the VMM
   - Re-enabling of SME support (CONFIG_ARM64_SME) as a result of fixes to our context-switching, signal handling and ptrace code"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits)
  arm64: cpufeature: Remove duplicate asm/mmu.h header
  arm64: Kconfig: Make CPU_BIG_ENDIAN depend on BROKEN
  perf/dwc_pcie: Fix use of uninitialized variable
  arm/syscalls: mark syscall invocation as likely in invoke_syscall
  Documentation: hisi-pmu: Add introduction to HiSilicon V3 PMU
  Documentation: hisi-pmu: Fix of minor format error
  drivers/perf: hisi: Add support for L3C PMU v3
  drivers/perf: hisi: Refactor the event configuration of L3C PMU
  drivers/perf: hisi: Extend the field of tt_core
  drivers/perf: hisi: Extract the event filter check of L3C PMU
  drivers/perf: hisi: Simplify the probe process of each L3C PMU version
  drivers/perf: hisi: Export hisi_uncore_pmu_isr()
  drivers/perf: hisi: Relax the event ID check in the framework
  perf: Fujitsu: Add the Uncore PMU driver
  arm64: map [_text, _stext) virtual address range non-executable+read-only
  arm64/sysreg: Update TCR_EL1 register
  arm64: Enable vmalloc-huge with ptdump
  arm64: cpufeature: add Neoverse-V3AE to BBML2 allow list
  arm64: errata: Apply workarounds for Neoverse-V3AE
  arm64: cputype: Add Neoverse-V3AE definitions
  ...
2025-09-29ptr_ring: __ptr_ring_zero_tail micro optimizationMichael S. Tsirkin1-3/+3
__ptr_ring_zero_tail currently does the - 1 operation twice:
 - during initialization of head
 - at each loop iteration

Let's just do it in one place; all we need to do is adjust the loop condition. This is better:
 - slightly clearer logic with less duplication
 - uses prefix --, so we don't need to save the old value
 - one less - 1 operation; for example, when the ring is empty we now don't do - 1 at all, while the existing code does it once

Text size shrinks from 15081 to 15050 bytes.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/bcd630c7edc628e20d4f8e037341f26c90ab4365.1758976026.git.mst@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
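For illustration, a before/after sketch of the pattern the commit describes (not a verbatim copy of include/linux/ptr_ring.h; variable and field names are assumed):

    /* Before: "- 1" is applied both when initializing head and on every
     * iteration via the postfix decrement. */
    int head = consumer_head - 1;

    while (likely(head >= r->consumer_tail))
            r->queue[head--] = NULL;

    /* After: fold the "- 1" into a single prefix decrement and adjust the
     * loop condition; an empty ring now performs no "- 1" at all. */
    int head = consumer_head;

    while (likely(head > r->consumer_tail))
            r->queue[--head] = NULL;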
2025-09-29Merge tag 'hardening-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linuxLinus Torvalds5-12/+34
Pull hardening updates from Kees Cook:
 "One notable addition is the creation of the 'transitional' keyword for kconfig so CONFIG renaming can go more smoothly. This has been a long-standing deficiency, and with the renaming of CONFIG_CFI_CLANG to CONFIG_CFI (since GCC will soon have KCFI support), this came up again. The breadth of the diffstat is mainly this renaming.

 - Clean up usage of TRAILING_OVERLAP() (Gustavo A. R. Silva)
 - lkdtm: fortify: Fix potential NULL dereference on kmalloc failure (Junjie Cao)
 - Add str_assert_deassert() helper (Lad Prabhakar)
 - gcc-plugins: Remove TODO_verify_il for GCC >= 16
 - kconfig: Fix BrokenPipeError warnings in selftests
 - kconfig: Add transitional symbol attribute for migration support
 - kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI"

* tag 'hardening-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  lib/string_choices: Add str_assert_deassert() helper
  kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI
  kconfig: Add transitional symbol attribute for migration support
  kconfig: Fix BrokenPipeError warnings in selftests
  gcc-plugins: Remove TODO_verify_il for GCC >= 16
  stddef: Introduce __TRAILING_OVERLAP()
  stddef: Remove token-pasting in TRAILING_OVERLAP()
  lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
2025-09-29Merge tag 'execve-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linuxLinus Torvalds1-0/+5
Pull execve updates from Kees Cook:

 - binfmt_elf: preserve original ELF e_flags for core dumps (Svetlana Parfenova)
 - exec: Fix incorrect type for ret (Xichao Zhao)
 - binfmt_elf: Replace offsetof() with struct_size() in fill_note_info() (Xichao Zhao)

* tag 'execve-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  binfmt_elf: preserve original ELF e_flags for core dumps
  binfmt_elf: Replace offsetof() with struct_size() in fill_note_info()
  exec: Fix incorrect type for ret
2025-09-29Merge tag 'ffs-const-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linuxLinus Torvalds1-1/+1
Pull ffs const-attribute cleanups from Kees Cook:
 "While working on various hardening refactoring a while back we encountered inconsistencies in the application of __attribute_const__ on the ffs() family of functions. This series fixes this across all archs and adds KUnit tests. Notably, this found a theoretical underflow in PCI (also fixed here) and uncovered an inefficiency in ARC (fixed in the ARC arch PR). I kept the series separate from the general hardening PR since it is a stand-alone "topic".

 - PCI: Fix theoretical underflow in use of ffs().
 - Universally apply __attribute_const__ to all architecture's ffs()-family of functions.
 - Add KUnit tests for ffs() behavior and const-ness"

* tag 'ffs-const-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  KUnit: ffs: Validate all the __attribute_const__ annotations
  sparc: Add __attribute_const__ to ffs()-family implementations
  xtensa: Add __attribute_const__ to ffs()-family implementations
  s390: Add __attribute_const__ to ffs()-family implementations
  parisc: Add __attribute_const__ to ffs()-family implementations
  mips: Add __attribute_const__ to ffs()-family implementations
  m68k: Add __attribute_const__ to ffs()-family implementations
  openrisc: Add __attribute_const__ to ffs()-family implementations
  riscv: Add __attribute_const__ to ffs()-family implementations
  hexagon: Add __attribute_const__ to ffs()-family implementations
  alpha: Add __attribute_const__ to ffs()-family implementations
  sh: Add __attribute_const__ to ffs()-family implementations
  powerpc: Add __attribute_const__ to ffs()-family implementations
  x86: Add __attribute_const__ to ffs()-family implementations
  csky: Add __attribute_const__ to ffs()-family implementations
  bitops: Add __attribute_const__ to generic ffs()-family implementations
  KUnit: Introduce ffs()-family tests
  PCI: Test for bit underflow in pcie_set_readrq()
2025-09-29Merge tag 'dlm-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlmLinus Torvalds1-1/+32
Pull dlm updates from David Teigland:
 "This adds a dlm_release_lockspace() flag to request that node-failure recovery be performed for the node leaving the lockspace. The implementation of this flag requires coordination with userland clustering components. It's been requested for use by GFS2"

* tag 'dlm-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  dlm: check for undefined release_option values
  dlm: handle release_option as unsigned
  dlm: move to rinfo for all middle conversion cases
  dlm: handle invalid lockspace member remove
  dlm: add new flag DLM_RELEASE_RECOVER for dlm_lockspace_release
  dlm: add new configfs entry release_recover for lockspace members
  dlm: add new RELEASE_RECOVER uevent attribute for release_lockspace
  dlm: use defines for force values in dlm_release_lockspace
  dlm: check for defined force value in dlm_lockspace_release
2025-09-29Merge tag 'hfs-v6.18-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfsLinus Torvalds1-0/+20
Pull hfs updates from Viacheslav Dubeyko:
 "This contains several fixes of syzbot reported issues, HFS/HFS+ fixes of xfstests failures, and rework of HFS/HFS+ debug output subsystem.

 - Kang Chen fixed a slab-out-of-bounds issue in hfsplus_uni2asc() when hfsplus_uni2asc() is called from hfsplus_listxattr().

 - Yang Chenzhi fixed a crash in hfsplus_bmap_alloc() if record offset or length is larger than node_size.

 - Yangtao Li corrected the error code from hfsplus_fill_super() if Catalog File contains corrupted record for the case of hidden directory's type.

 - KMSAN uninit-value fixes: hfs_find_set_zero_bits() and __hfsplus_ext_cache_extent() use kzalloc() instead of kmalloc(), and in hfsplus_delete_cat() by proper initialization of struct hfsplus_inode_info in the hfsplus_iget() logic.

 - A slab-out-of-bounds issue could happen in hfsplus_strcasecmp() if the length field of struct hfsplus_unistr is bigger than HFSPLUS_MAX_STRLEN. Fixed by checking the length of comparing strings, and if the strings' length is bigger than HFSPLUS_MAX_STRLEN, then the length is corrected to this value.

 - The generic/736 xfstest failed for HFS because the HFS volume becomes corrupted after the test run. The main reason was the absence of logic that corrects mdb->drNxtCNID/HFS_SB(sb)->next_id (next unused CNID) after deleting a record in Catalog File. That was fixed by implementing the necessary logic in hfs_correct_next_unused_CNID()"

* tag 'hfs-v6.18-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs:
  hfs/hfsplus: rework debug output subsystem
  hfsplus: fix slab-out-of-bounds read in hfsplus_strcasecmp()
  hfsplus: fix slab-out-of-bounds read in hfsplus_uni2asc()
  hfs: clear offset and space out of valid records in b-tree node
  hfs: add logic of correcting a next unused CNID
  hfsplus: fix KMSAN uninit-value issue in hfsplus_delete_cat()
  hfs: fix KMSAN uninit-value issue in hfs_find_set_zero_bits()
  hfs: make proper initalization of struct hfs_find_data
  hfsplus: fix KMSAN uninit-value issue in __hfsplus_ext_cache_extent()
  hfs: validate record offset in hfsplus_bmap_alloc
  hfsplus: return EIO when type of hidden directory mismatch in hfsplus_fill_super()
  MAINTAINERS: update location of hfs&hfsplus trees
2025-09-29Merge tag 'xfs-merge-6.18' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-0/+1
Pull xfs updates from Carlos Maiolino:
 "For this merge window, there are really no new features, but there are a few things worth emphasizing:

 - Deprecated for years already, the (no)attr2 and (no)ikeep mount options have been removed for good
 - Several cleanups (especially of typedefs) and bug fixes
 - Improvements made in the online repair reap calculations
 - online fsck is now enabled by default"

* tag 'xfs-merge-6.18' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (53 commits)
  xfs: rework datasync tracking and execution
  xfs: rearrange code in xfs_inode_item_precommit
  xfs: scrub: use kstrdup_const() for metapath scan setups
  xfs: use bt_nr_sectors in xfs_dax_translate_range
  xfs: track the number of blocks in each buftarg
  xfs: constify xfs_errortag_random_default
  xfs: improve default maximum number of open zones
  xfs: improve zone statistics message
  xfs: centralize error tag definitions
  xfs: remove pointless externs in xfs_error.h
  xfs: remove the expr argument to XFS_TEST_ERROR
  xfs: remove xfs_errortag_set
  xfs: remove xfs_errortag_get
  xfs: move the XLOG_REG_ constants out of xfs_log_format.h
  xfs: adjust the hint based zone allocation policy
  xfs: refactor hint based zone allocation
  fs: add an enum for number of life time hints
  xfs: fix log CRC mismatches between i386 and other architectures
  xfs: rename the old_crc variable in xlog_recover_process
  xfs: remove the unused xfs_log_iovec_t typedef
  ...
2025-09-29Merge tag 'vfs-6.18-rc1.async' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds2-11/+16
Pull vfs async directory updates from Christian Brauner:
 "This contains further preparatory changes for the asynchronous directory locking scheme:

 - Add lookup_one_positive_killable() which allows overlayfs to perform lookup that won't block on a fatal signal

 - Unify the mount idmap handling in struct renamedata as a rename can only happen within a single mount

 - Introduce kern_path_parent() for audit which sets the path to the parent and returns a dentry for the target without holding any locks on return

 - Rename kern_path_locked() as it is only used to prepare for the removal of an object from the filesystem:

     kern_path_locked()     => start_removing_path()
     kern_path_create()     => start_creating_path()
     user_path_create()     => start_creating_user_path()
     user_path_locked_at()  => start_removing_user_path_at()
     done_path_create()     => end_creating_path()
     NA                     => end_removing_path()"

* tag 'vfs-6.18-rc1.async' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  debugfs: rename start_creating() to debugfs_start_creating()
  VFS: rename kern_path_locked() and related functions.
  VFS/audit: introduce kern_path_parent() for audit
  VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata
  VFS: discard err2 in filename_create()
  VFS/ovl: add lookup_one_positive_killable()
2025-09-29Merge tag 'vfs-6.18-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds2-0/+6
Pull vfs writeback updates from Christian Brauner:
 "This contains work addressing lockups reported by users when a systemd unit reading lots of files from a filesystem mounted with the lazytime mount option exits.

  With the lazytime mount option enabled we can be switching many dirty inodes on cgroup exit to the parent cgroup. The numbers observed in practice when a systemd slice of a large cron job exits can easily reach hundreds of thousands or millions. The logic in inode_do_switch_wbs() which sorts the inode into the appropriate place in the b_dirty list of the target wb however has linear complexity in the number of dirty inodes, thus the overall time complexity of switching all the inodes is quadratic, leading to workers being pegged for hours consuming 100% of the CPU and switching inodes to the parent wb.

  Simple reproducer of the issue:

    FILES=10000
    # Filesystem mounted with lazytime mount option
    MNT=/mnt/

    echo "Creating files and switching timestamps"
    for (( j = 0; j < 50; j ++ )); do
        mkdir $MNT/dir$j
        for (( i = 0; i < $FILES; i++ )); do
            echo "foo" >$MNT/dir$j/file$i
        done
        touch -a -t 202501010000 $MNT/dir$j/file*
    done
    wait

    echo "Syncing and flushing"
    sync
    echo 3 >/proc/sys/vm/drop_caches

    echo "Reading all files from a cgroup"
    mkdir /sys/fs/cgroup/unified/mycg1 || exit
    echo $$ >/sys/fs/cgroup/unified/mycg1/cgroup.procs || exit
    for (( j = 0; j < 50; j ++ )); do
        cat /mnt/dir$j/file* >/dev/null &
    done
    wait

    echo "Switching wbs"
    # Now rmdir the cgroup after the script exits

  This can be solved by:

 - Avoiding contention on the wb->list_lock when switching inodes by running a single work item per wb and managing a queue of items switching to the wb

 - Allowing rescheduling when switching inodes over to a different cgroup to avoid softlockups

 - Maintaining the b_dirty list ordering instead of sorting it"

* tag 'vfs-6.18-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  writeback: Add tracepoint to track pending inode switches
  writeback: Avoid excessively long inode switching times
  writeback: Avoid softlockup when switching many inodes
  writeback: Avoid contention on wb->list_lock when switching inodes
2025-09-29Merge tag 'namespace-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds16-138/+435
Pull namespace updates from Christian Brauner:
 "This contains a larger set of changes around the generic namespace infrastructure of the kernel. Each specific namespace type (net, cgroup, mnt, ...) embeds a struct ns_common which carries the reference count of the namespace and so on. We open-coded and cargo-culted so many quirks for each namespace type that it just wasn't scalable anymore. So given there's a bunch of new changes coming in that area I've started cleaning all of this up.

  The core change is to make it possible to correctly initialize every namespace uniformly and derive the correct initialization settings from the type of the namespace such as namespace operations, namespace type and so on. This leaves the new ns_common_init() function with a single parameter which is the specific namespace type which derives the correct parameters statically. This also means the compiler will yell as soon as someone does something remotely fishy. The ns_common_init() addition also allows us to remove ns_alloc_inum() and drops any special-casing of the initial network namespace in the network namespace initialization code that Linus complained about.

  Another part is reworking the reference counting. The reference counting was open-coded and copy-pasted for each namespace type even though they all followed the same rules. This also removes all open accesses to the reference count and makes it private and only uses a very small set of dedicated helpers to manipulate them, just like we do for e.g., files.

  In addition this generalizes the mount namespace iteration infrastructure introduced a few cycles ago. As a reminder, the vfs makes it possible to iterate sequentially and bidirectionally through all mount namespaces on the system or all mount namespaces that the caller holds privilege over. This allows userspace to iterate over all mounts in all mount namespaces using the listmount() and statmount() system calls. Each mount namespace has a unique identifier for the lifetime of the system that is exposed to userspace. The network namespace also has a unique identifier working exactly the same way.

  This extends the concept to all other namespace types. The new nstree type makes it possible to look up namespaces purely by their identifier and to walk the namespace list sequentially and bidirectionally for all namespace types, allowing userspace to iterate through all namespaces. Looking up namespaces in the namespace tree works completely locklessly. This also means we can move the mount namespace onto the generic infrastructure and remove a bunch of code and members from struct mnt_namespace itself.

  There's a bunch of stuff coming on top of this in the future but for now this uses the generic namespace tree to extend a concept introduced first for pidfs a few cycles ago. For a while now we have supported pidfs file handles for pidfds. This has proven to be very useful. This extends the concept to cover namespaces as well. It is possible to encode and decode namespace file handles using the common name_to_handle_at() and open_by_handle_at() apis.

  As with pidfs file handles, namespace file handles are exhaustive, meaning it is not required to actually hold a reference to nsfs to be able to decode aka open_by_handle_at() a namespace file handle. Instead the FD_NSFS_ROOT constant can be passed, which will let the kernel grab a reference to the root of nsfs internally and thus decode the file handle.

  Namespace file descriptors can already be derived from pidfds, which means they aren't subject to overmount protection bugs. IOW, it's irrelevant if the caller would not have access to an appropriate /proc/<pid>/ns/ directory as they could always just derive the namespace based on a pidfd already.

  It has the same advantage as pidfds. It's possible to reliably and for the lifetime of the system refer to a namespace without pinning any resources and to compare them trivially.

  Permission checking is kept simple. If the caller is located in the namespace the file handle refers to they are able to open it, otherwise they must hold privilege over the owning namespace of the relevant namespace.

  The namespace file handle layout is exposed as uapi and has a stable and extensible format. For now it simply contains the namespace identifier, the namespace type, and the inode number. The stable format means that userspace may construct its own namespace file handles without going through name_to_handle_at(), as they are already allowed to for pidfs and cgroup file handles"

* tag 'namespace-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (65 commits)
  ns: drop assert
  ns: move ns type into struct ns_common
  nstree: make struct ns_tree private
  ns: add ns_debug()
  ns: simplify ns_common_init() further
  cgroup: add missing ns_common include
  ns: use inode initializer for initial namespaces
  selftests/namespaces: verify initial namespace inode numbers
  ns: rename to __ns_ref
  nsfs: port to ns_ref_*() helpers
  net: port to ns_ref_*() helpers
  uts: port to ns_ref_*() helpers
  ipv4: use check_net()
  net: use check_net()
  net-sysfs: use check_net()
  user: port to ns_ref_*() helpers
  time: port to ns_ref_*() helpers
  pid: port to ns_ref_*() helpers
  ipc: port to ns_ref_*() helpers
  cgroup: port to ns_ref_*() helpers
  ...
2025-09-29Merge tag 'kernel-6.18-rc1.clone3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds16-28/+28
Pull copy_process updates from Christian Brauner:
 "This contains the changes to enable support for clone3() on nios2 which apparently is still a thing. The more exciting part of this is that it cleans up the inconsistency in how the 64-bit flag argument is passed from copy_process() into the various other copy_*() helpers"

[ Fixed up rv ltl_monitor 32-bit support as per Sasha Levin in the merge ]

* tag 'kernel-6.18-rc1.clone3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  nios2: implement architecture-specific portion of sys_clone3
  arch: copy_thread: pass clone_flags as u64
  copy_process: pass clone_flags as u64 across calltree
  copy_sighand: Handle architectures where sizeof(unsigned long) < sizeof(u64)
2025-09-29Merge tag 'vfs-6.18-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds3-135/+208
Pull vfs inode updates from Christian Brauner:
 "This contains a series I originally wrote and that Eric brought over the finish line. It moves the i_crypt_info and i_verity_info pointers out of 'struct inode' and into the fs-specific part of the inode. So now the few filesystems that actually make use of this pay the price in their own private inode storage instead of forcing it upon every user of struct inode. The pointer for the crypt and verity info is simply found by storing an offset to its address in struct fsverity_operations and struct fscrypt_operations.

  This shrinks struct inode by 16 bytes. I hope to move a lot more out of it in the future so that struct inode becomes really just about very core stuff that we need, much like struct dentry and struct file, instead of the dumping ground it has become over the years.

  On top of this are various changes associated with the ongoing inode lifetime handling rework that multiple people are pushing forward:

 - Stop accessing inode->i_count directly in f2fs and gfs2. They simply should use the __iget() and iput() helpers

 - Make the i_state flags an enum

 - Rework the iput() logic

   Currently, if we are the last iput, and we have the I_DIRTY_TIME bit set, we will grab a reference on the inode again and then mark it dirty and then redo the put. This is to make sure we delay the time update for as long as possible.

   We can rework this logic to simply dec i_count if it is not 1, and if it is, do the time update while still holding the i_count reference. Then we can replace the atomic_dec_and_lock with locking the ->i_lock and doing atomic_dec_and_test, since we did the atomic_add_unless above

 - Add an icount_read() helper and convert everyone that accesses inode->i_count directly for this purpose to use the helper

 - Expand dump_inode() to dump more information about an inode, helping in debugging

 - Add some might_sleep() annotations to iput() and associated helpers"

* tag 'vfs-6.18-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: add might_sleep() annotation to iput() and more
  fs: expand dump_inode()
  inode: fix whitespace issues
  fs: add an icount_read helper
  fs: rework iput logic
  fs: make the i_state flags an enum
  fs: stop accessing ->i_count directly in f2fs and gfs2
  fsverity: check IS_VERITY() in fsverity_cleanup_inode()
  fs: remove inode::i_verity_info
  btrfs: move verity info pointer to fs-specific part of inode
  f2fs: move verity info pointer to fs-specific part of inode
  ext4: move verity info pointer to fs-specific part of inode
  fsverity: add support for info in fs-specific part of inode
  fs: remove inode::i_crypt_info
  ceph: move crypt info pointer to fs-specific part of inode
  ubifs: move crypt info pointer to fs-specific part of inode
  f2fs: move crypt info pointer to fs-specific part of inode
  ext4: move crypt info pointer to fs-specific part of inode
  fscrypt: add support for info in fs-specific part of inode
  fscrypt: replace raw loads of info pointer with helper function
2025-09-29Merge tag 'vfs-6.18-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds2-10/+14
Pull vfs mount updates from Christian Brauner:
 "This contains some work around mount api handling:

 - Output the warning message for mnt_too_revealing() triggered during fsmount() to the fscontext log. This makes it possible for the mount tool to output appropriate warnings on the command line.

   For example, with the newest fsopen()-based mount(8) from util-linux, the error messages now look like:

     # mount -t proc proc /tmp
     mount: /tmp: fsmount() failed: VFS: Mount too revealing.
            dmesg(1) may have more information after failed mount system call.

 - Do not consume fscontext log entries when returning -EMSGSIZE

   Userspace generally expects APIs that return -EMSGSIZE to allow for them to adjust their buffer size and retry the operation. However, the fscontext log would previously clear the message even in the -EMSGSIZE case.

   Given that it is very cheap for us to check whether the buffer is too small before we remove the message from the ring buffer, let's just do that instead.

 - Drop an unused argument from do_remount()"

* tag 'vfs-6.18-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  vfs: fs/namespace.c: remove ms_flags argument from do_remount
  selftests/filesystems: add basic fscontext log tests
  fscontext: do not consume log entries when returning -EMSGSIZE
  vfs: output mount_too_revealing() errors to fscontext
  docs/vfs: Remove mentions to the old mount API helpers
  fscontext: add custom-prefix log helpers
  fs: Remove mount_bdev
  fs: Remove mount_nodev
2025-09-29Merge tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds2-6/+13
Pull misc vfs updates from Christian Brauner:
 "This contains the usual selections of misc updates for this cycle.

  Features:

 - Add "initramfs_options" parameter to set initramfs mount options. This allows to add specific mount options to the rootfs to e.g., limit the memory size

 - Add RWF_NOSIGNAL flag for pwritev2()

   Add RWF_NOSIGNAL flag for pwritev2. This flag prevents the SIGPIPE signal from being raised when writing on disconnected pipes or sockets. The flag is handled directly by the pipe filesystem and converted to the existing MSG_NOSIGNAL flag for sockets

 - Allow to pass pid namespace as procfs mount option

   Ever since the introduction of pid namespaces, procfs has had very implicit behaviour surrounding them (the pidns used by a procfs mount is auto-selected based on the mounting process's active pidns, and the pidns itself is basically hidden once the mount has been constructed)

   This implicit behaviour has historically meant that userspace was required to do some special dances in order to configure the pidns of a procfs mount as desired. Examples include:

    * In order to bypass the mnt_too_revealing() check, Kubernetes creates a procfs mount from an empty pidns so that user namespaced containers can be nested (without this, the nested containers would fail to mount procfs). But this requires forking off a helper process because you cannot just one-shot this using mount(2)

    * Container runtimes in general need to fork into a container before configuring its mounts, which can lead to security issues in the case of shared-pidns containers (a privileged process in the pidns can interact with your container runtime process)

   While SUID_DUMP_DISABLE and user namespaces make this less of an issue, the strict need for this due to a minor uAPI wart is kind of unfortunate. Things would be much easier if there was a way for userspace to just specify the pidns they want.

   So this pull request contains changes to implement a new "pidns" argument which can be set using fsconfig(2):

     fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
     fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0);

   or classic mount(2) / mount(8):

     // mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc
     mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid");

  Cleanups:

 - Remove the last references to EXPORT_OP_ASYNC_LOCK
 - Make file_remove_privs_flags() static
 - Remove redundant __GFP_NOWARN when GFP_NOWAIT is used
 - Use try_cmpxchg() in start_dir_add()
 - Use try_cmpxchg() in sb_init_done_wq()
 - Replace offsetof() with struct_size() in ioctl_file_dedupe_range()
 - Remove vfs_ioctl() export
 - Replace rwlock() with spinlock in epoll code as rwlock causes priority inversion on preempt rt kernels
 - Make ns_entries in fs/proc/namespaces const
 - Use a switch() statement in init_special_inode() just like we do in may_open()
 - Use struct_size() in dir_add() in the initramfs code
 - Use str_plural() in rd_load_image()
 - Replace strcpy() with strscpy() in find_link()
 - Rename generic_delete_inode() to inode_just_drop() and generic_drop_inode() to inode_generic_drop()
 - Remove unused arguments from fcntl_{g,s}et_rw_hint()

  Fixes:

 - Document @name parameter for name_contains_dotdot() helper
 - Fix spelling mistake
 - Always return zero from replace_fd() instead of the file descriptor number
 - Limit the size for copy_file_range() in compat mode to prevent a signed overflow
 - Fix debugfs mount options not being applied
 - Verify the inode mode when loading it from disk in minixfs
 - Verify the inode mode when loading it from disk in cramfs
 - Don't trigger automounts with RESOLVE_NO_XDEV. If openat2() was called with RESOLVE_NO_XDEV it didn't traverse through automounts, but could still trigger them
 - Add FL_RECLAIM flag to show_fl_flags() macro so it appears in tracepoints
 - Fix unused variable warning in rd_load_image() on s390
 - Make INITRAMFS_PRESERVE_MTIME depend on BLK_DEV_INITRD
 - Use ns_capable_noaudit() when determining net sysctl permissions
 - Don't call path_put() under namespace semaphore in listmount() and statmount()"

* tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (38 commits)
  fcntl: trim arguments
  listmount: don't call path_put() under namespace semaphore
  statmount: don't call path_put() under namespace semaphore
  pid: use ns_capable_noaudit() when determining net sysctl permissions
  fs: rename generic_delete_inode() and generic_drop_inode()
  init: INITRAMFS_PRESERVE_MTIME should depend on BLK_DEV_INITRD
  initramfs: Replace strcpy() with strscpy() in find_link()
  initrd: Use str_plural() in rd_load_image()
  initramfs: Use struct_size() helper to improve dir_add()
  initrd: Fix unused variable warning in rd_load_image() on s390
  fs: use the switch statement in init_special_inode()
  fs/proc/namespaces: make ns_entries const
  filelock: add FL_RECLAIM to show_fl_flags() macro
  eventpoll: Replace rwlock with spinlock
  selftests/proc: add tests for new pidns APIs
  procfs: add "pidns" mount option
  pidns: move is-ancestor logic to helper
  openat2: don't trigger automounts with RESOLVE_NO_XDEV
  namei: move cross-device check to __traverse_mounts
  namei: remove LOOKUP_NO_XDEV check from handle_mounts
  ...
2025-09-29PM: runtime: Drop DEFINE_FREE() for pm_runtime_put()Rafael J. Wysocki1-2/+0
The DEFINE_FREE() for pm_runtime_put has been superseded by recently introduced runtime PM auto-cleanup macros and its only user has been converted to using one of the new macros, so drop it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Dhruva Gole <d-gole@ti.com> Reviewed-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
2025-09-29PM: runtime: Add auto-cleanup macros for "resume and get" operationsRafael J. Wysocki1-9/+35
It is generally useful to be able to automatically drop a device's runtime PM usage counter incremented by runtime PM operations that resume a device and bump up its usage counter [1]. To that end, add guard definition macros allowing pm_runtime_put() and pm_runtime_put_autosuspend() to be used for the auto-cleanup in those cases.

Simply put, a piece of code like below:

    pm_runtime_get_sync(dev);

    .....

    pm_runtime_put(dev);
    return 0;

can be transformed with guard() like:

    guard(pm_runtime_active)(dev);

    .....

    return 0;

(see the pm_runtime_put() call is gone).

However, it is better to do proper error handling in the majority of cases, so doing something like this instead of the above is recommended:

    ACQUIRE(pm_runtime_active_try, pm)(dev);
    if (ACQUIRE_ERR(pm_runtime_active_try, &pm))
        return -ENXIO;

    .....

    return 0;

In all of the cases in which runtime PM is known to be enabled for the given device or the device can be regarded as operational (and so it can be accessed) with runtime PM disabled, a piece of code like:

    ret = pm_runtime_resume_and_get(dev);
    if (ret < 0)
        return ret;

    .....

    pm_runtime_put(dev);
    return 0;

can be changed as follows:

    ACQUIRE(pm_runtime_active_try, pm)(dev);
    ret = ACQUIRE_ERR(pm_runtime_active_try, &pm);
    if (ret < 0)
        return ret;

    .....

    return 0;

(again, see the pm_runtime_put() call is gone).

Still, if the device cannot be accessed unless runtime PM has been enabled for it, the pm_runtime_active_try_enabled guard variant needs to be used, that is (in the context of the example above):

    ACQUIRE(pm_runtime_active_try_enabled, pm)(dev);
    ret = ACQUIRE_ERR(pm_runtime_active_try_enabled, &pm);
    if (ret < 0)
        return ret;

    .....

    return 0;

When the original code calls pm_runtime_put_autosuspend(), use one of the "auto" guard variants, pm_runtime_active_auto/_try/_enabled, so for example, a piece of code like:

    ret = pm_runtime_resume_and_get(dev);
    if (ret < 0)
        return ret;

    .....

    pm_runtime_put_autosuspend(dev);
    return 0;

will become:

    ACQUIRE(pm_runtime_active_auto_try_enabled, pm)(dev);
    ret = ACQUIRE_ERR(pm_runtime_active_auto_try_enabled, &pm);
    if (ret < 0)
        return ret;

    .....

    return 0;

Note that the cases in which the return value of pm_runtime_get_sync() is checked can also be handled with the help of the new guard macros. For example, a piece of code like:

    ret = pm_runtime_get_sync(dev);
    if (ret < 0) {
        pm_runtime_put(dev);
        return ret;
    }

    .....

    pm_runtime_put(dev);
    return 0;

can be rewritten as:

    ACQUIRE(pm_runtime_active_auto_try_enabled, pm)(dev);
    ret = ACQUIRE_ERR(pm_runtime_active_auto_try_enabled, &pm);
    if (ret < 0)
        return ret;

    .....

    return 0;

or pm_runtime_get_active_try can be used if transparent handling of disabled runtime PM is desirable.

Link: https://lore.kernel.org/linux-pm/878qimv24u.wl-tiwai@suse.de/ [1]
Link: https://lore.kernel.org/linux-pm/20250926150613.000073a4@huawei.com/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/2238241.irdbgypaU6@rafael.j.wysocki
[ rjw: Fixed leftovers from the previous version in the changelog ]
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-09-29Merge branches 'pm-core', 'pm-runtime' and 'pm-sleep'Rafael J. Wysocki1-0/+6
Merge changes related to system sleep and runtime PM framework for 6.18-rc1:

 - Annotate loops walking device links in the power management core code as _srcu and add macros for walking device links to reduce the likelihood of coding mistakes related to them (Rafael Wysocki)
 - Document time units for *_time functions in the runtime PM API (Brian Norris)
 - Clear power.must_resume in noirq suspend error path to avoid resuming a dependant device under a suspended parent or supplier (Rafael Wysocki)
 - Fix GFP mask handling during hybrid suspend and make the amdgpu driver handle hybrid suspend correctly (Mario Limonciello, Rafael Wysocki)
 - Fix GFP mask handling after aborted hibernation in platform mode and combine exit paths in power_down() to avoid code duplication (Rafael Wysocki)
 - Use vmalloc_array() and vcalloc() in the hibernation core to avoid open-coded size computations (Qianfeng Rong)
 - Fix typo in hibernation core code comment (Li Jun)
 - Call pm_wakeup_clear() in the same place where other functions that do bookkeeping prior to suspend_prepare() are called (Samuel Wu)

* pm-core:
  PM: core: Add two macros for walking device links
  PM: core: Annotate loops walking device links as _srcu

* pm-runtime:
  PM: runtime: Documentation: ABI: Document time units for *_time

* pm-sleep:
  PM: hibernate: Combine return paths in power_down()
  PM: hibernate: Restrict GFP mask in power_down()
  PM: hibernate: Fix pm_hibernation_mode_is_suspend() build breakage
  drm/amd: Fix hybrid sleep
  PM: hibernate: Add pm_hibernation_mode_is_suspend()
  PM: hibernate: Fix hybrid-sleep
  PM: sleep: core: Clear power.must_resume in noirq suspend error path
  PM: sleep: Make pm_wakeup_clear() call more clear
  PM: hibernate: Fix typo in memory bitmaps description comment
  PM: hibernate: Use vmalloc_array() and vcalloc() to improve code
2025-09-29Merge branches 'pm-em', 'pm-opp' and 'pm-devfreq'Rafael J. Wysocki1-0/+30
Merge energy model management, OPP (operating performance points) and devfreq updates for 6.18-rc1:

 - Prevent CPU capacity updates after registering a perf domain from failing on a first CPU that is not present (Christian Loehle)
 - Add support for the cases in which frequency alone is not sufficient to uniquely identify an OPP (Krishna Chaitanya Chundru)
 - Use to_result() for OPP error handling in Rust (Onur Özkan)
 - Add support for LPDDR5 on Rockchip RK3588 SoC to rockchip-dfi devfreq driver (Nicolas Frattaroli)
 - Fix an issue where DDR cycle counts on RK3588/RK3528 with LPDDR4(X) are reported as half by adding a cycle multiplier to the DFI driver in rockchip-dfi devfreq-event driver (Nicolas Frattaroli)
 - Fix missing error pointer dereference check of regulator instance in the mtk-cci devfreq driver probe and remove a redundant condition from an if () statement in that driver (Dan Carpenter, Liao Yuanhong)

* pm-em:
  PM: EM: Fix late boot with holes in CPU topology

* pm-opp:
  OPP: Add support to find OPP for a set of keys
  rust: opp: use to_result for error handling

* pm-devfreq:
  PM / devfreq: rockchip-dfi: add support for LPDDR5
  PM / devfreq: rockchip-dfi: double count on RK3588
  PM / devfreq: mtk-cci: avoid redundant conditions
  PM / devfreq: mtk-cci: Fix potential error pointer dereference in probe()
2025-09-29Merge series "slab: Re-entrant kmalloc_nolock()"Vlastimil Babka7-8/+42
From the cover letter [1]: This patch set introduces kmalloc_nolock() which is the next logical step towards any context allocation necessary to remove bpf_mem_alloc and get rid of preallocation requirement in BPF infrastructure. In production BPF maps grew to gigabytes in size. Preallocation wastes memory. Alloc from any context addresses this issue for BPF and other subsystems that are forced to preallocate too. This long task started with introduction of alloc_pages_nolock(), then memcg and objcg were converted to operate from any context including NMI, this set completes the task with kmalloc_nolock() that builds on top of alloc_pages_nolock() and memcg changes. After that BPF subsystem will gradually adopt it everywhere. Link: https://lore.kernel.org/all/20250909010007.1660-1-alexei.starovoitov@gmail.com/ [1]
2025-09-29slab: Introduce kmalloc_nolock() and kfree_nolock().Alexei Starovoitov3-5/+14
kmalloc_nolock() relies on the ability of local_trylock_t to detect the situation when a per-cpu kmem_cache is locked.

In !PREEMPT_RT local_(try)lock_irqsave(&s->cpu_slab->lock, flags) disables IRQs and marks s->cpu_slab->lock as acquired. local_lock_is_locked(&s->cpu_slab->lock) returns true when slab is in the middle of manipulating the per-cpu cache of that specific kmem_cache.

kmalloc_nolock() can be called from any context and can re-enter into ___slab_alloc():

  kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> NMI -> bpf ->
    kmalloc_nolock() -> ___slab_alloc(cache_B)

or

  kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> tracepoint/kprobe -> bpf ->
    kmalloc_nolock() -> ___slab_alloc(cache_B)

Hence the caller of ___slab_alloc() checks if &s->cpu_slab->lock can be acquired without a deadlock before invoking the function. If that specific per-cpu kmem_cache is busy, kmalloc_nolock() retries in a different kmalloc bucket. The second attempt will likely succeed, since this cpu locked a different kmem_cache.

Similarly, in PREEMPT_RT local_lock_is_locked() returns true when the per-cpu rt_spin_lock is locked by the current _task_. In this case re-entrance into the same kmalloc bucket is unsafe, and kmalloc_nolock() tries a different bucket that is most likely not locked by the current task. Though it may be locked by a different task, it's safe to rt_spin_lock() and sleep on it.

Similar to alloc_pages_nolock(), kmalloc_nolock() returns NULL immediately if called from hard irq or NMI in PREEMPT_RT.

kfree_nolock() defers freeing to irq_work when local_lock_is_locked() and (in_nmi() or in PREEMPT_RT).

SLUB_TINY config doesn't use local_lock_is_locked() and relies on spin_trylock_irqsave(&n->list_lock) to allocate, while kfree_nolock() always defers to irq_work.

Note, kfree_nolock() must be called _only_ for objects allocated with kmalloc_nolock(). Debug checks (like kmemleak and kfence) were skipped on allocation, hence

  obj = kmalloc(); kfree_nolock(obj);

will miss kmemleak/kfence book keeping and will cause false positives. large_kmalloc is not supported by either kmalloc_nolock() or kfree_nolock().

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
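For illustration, a caller-side sketch of how the pair is meant to be used, e.g. from NMI/tracing context where a regular kmalloc() would be unsafe (the exact function signatures are assumed from the description above, not quoted from slab.h):

    void *obj;

    obj = kmalloc_nolock(size, __GFP_ACCOUNT, NUMA_NO_NODE);
    if (!obj)
            return -ENOMEM;  /* may fail where kmalloc() would have waited */

    /* ... fill in obj ... */

    kfree_nolock(obj);       /* only valid for kmalloc_nolock() allocations */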
2025-09-29slab: Reuse first bit for OBJEXTS_ALLOC_FAILAlexei Starovoitov1-2/+8
Since the combination of valid upper bits in slab->obj_exts with OBJEXTS_ALLOC_FAIL bit can never happen, use OBJEXTS_ALLOC_FAIL == (1ull << 0) as a magic sentinel instead of (1ull << 2) to free up bit 2. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-09-29mm: Allow GFP_ACCOUNT to be used in alloc_pages_nolock().Alexei Starovoitov1-1/+1
Change alloc_pages_nolock() to default to __GFP_COMP when allocating pages, since upcoming reentrant alloc_slab_page() needs __GFP_COMP. Also allow __GFP_ACCOUNT flag to be specified, since most of BPF infra needs __GFP_ACCOUNT except BPF streams. Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-09-29locking/local_lock: Introduce local_lock_is_locked().Alexei Starovoitov3-0/+19
Introduce local_lock_is_locked() that returns true when given local_lock is locked by current cpu (in !PREEMPT_RT) or by current task (in PREEMPT_RT). The goal is to detect a deadlock by the caller. Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
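A minimal sketch of the deadlock-avoidance pattern the helper enables, loosely modelled on the slab usage described elsewhere in this series (names are illustrative, not the actual SLUB code):

    /* If this CPU (or, on PREEMPT_RT, this task) already holds the lock,
     * acquiring it again could deadlock -- back off so the caller can
     * retry on a different path/bucket instead. */
    if (local_lock_is_locked(&s->cpu_slab->lock))
            return NULL;

    local_lock_irqsave(&s->cpu_slab->lock, flags);
    /* ... manipulate the per-cpu data safely ... */
    local_unlock_irqrestore(&s->cpu_slab->lock, flags);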
2025-09-29maple_tree: Add single node allocation support to maple stateLiam R. Howlett1-1/+3
The fast path through a write will require replacing a single node in the tree. Using a sheaf (32 nodes) is too heavy for the fast path, so special case the node store operation by just allocating one node in the maple state. Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-09-29maple_tree: Prefilled sheaf conversion and testingLiam R. Howlett1-2/+4
Use prefilled sheaves instead of bulk allocations. This should speed up the allocations and the return path of unused allocations. Remove the push and pop of nodes from the maple state as this is now handled by the slab layer with sheaves. Testing has been removed as necessary since the features of the tree have been reduced. Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-09-29slab: sheaf prefilling for guaranteed allocationsVlastimil Babka1-0/+16
Add functions for efficient guaranteed allocations e.g. in a critical section that cannot sleep, when the exact number of allocations is not known beforehand, but an upper limit can be calculated.

kmem_cache_prefill_sheaf() returns a sheaf containing at least the given number of objects.

kmem_cache_alloc_from_sheaf() will allocate an object from the sheaf and is guaranteed not to fail until depleted.

kmem_cache_return_sheaf() is for giving the sheaf back to the slab allocator after the critical section. This will also attempt to refill it to the cache's sheaf capacity for better efficiency of sheaves handling, but it's not strictly necessary to succeed.

kmem_cache_refill_sheaf() can be used to refill a previously obtained sheaf to the requested size. If the current size is sufficient, it does nothing. If the requested size exceeds the cache's sheaf_capacity and the sheaf's current capacity, the sheaf will be replaced with a new one, hence the indirect pointer parameter.

kmem_cache_sheaf_size() can be used to query the current size.

The implementation supports requesting sizes that exceed the cache's sheaf_capacity, but it is not efficient - such "oversize" sheaves are allocated fresh in kmem_cache_prefill_sheaf() and flushed and freed immediately by kmem_cache_return_sheaf(). kmem_cache_refill_sheaf() might be especially ineffective when replacing a sheaf with a new one of a larger capacity. It is therefore better to size the cache's sheaf_capacity accordingly to make oversize sheaves exceptional.

CONFIG_SLUB_STATS counters are added for sheaf prefill and return operations. A prefill or return is considered _fast when it is able to grab or return a percpu spare sheaf (even if the sheaf needs a refill to satisfy the request, as those should amortize over time), and _slow otherwise (when the barn or even sheaf allocation/freeing has to be involved). sheaf_prefill_oversize is provided to determine how many prefills were oversize (a counter for oversize returns is not necessary as all oversize refills result in oversize returns).

When slub_debug is enabled for a cache with sheaves, no percpu sheaves exist for it, but the prefill functionality is still provided simply by all prefilled sheaves becoming oversize. If percpu sheaves are not created for a cache due to not passing the sheaf_capacity argument on cache creation, the prefills also work through oversize sheaves, but there's a WARN_ON_ONCE() to indicate the omission.

Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
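A usage sketch of the prefill API for a critical section with a known upper bound on allocations; the signatures and error-handling convention are assumed from the description above rather than quoted from the header:

    struct slab_sheaf *sheaf;
    void *obj;

    /* Outside the critical section: guarantee up to max_objs allocations. */
    sheaf = kmem_cache_prefill_sheaf(cache, GFP_KERNEL, max_objs);
    if (!sheaf)
            return -ENOMEM;

    spin_lock(&lock);                /* cannot sleep from here on */
    obj = kmem_cache_alloc_from_sheaf(cache, GFP_NOWAIT, sheaf);
    /* ... guaranteed to succeed until the sheaf is depleted ... */
    spin_unlock(&lock);

    /* Give the (possibly partially used) sheaf back to the allocator. */
    kmem_cache_return_sheaf(cache, GFP_KERNEL, sheaf);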
2025-09-28lib/string_choices: Add str_assert_deassert() helperLad Prabhakar1-0/+6
Add str_assert_deassert() helper to return the "assert" or "deassert" string literal depending on the boolean argument. Also add the inverse variant str_deassert_assert(). Suggested-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Andy Shevchenko <andy@kernel.org> Link: https://lore.kernel.org/r/20250923095229.2149740-1-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Kees Cook <kees@kernel.org>
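Following the existing str_*_*() pattern in string_choices.h, the helper presumably looks something like this (a sketch, not necessarily the committed form):

    static inline const char *str_assert_deassert(bool v)
    {
            return v ? "assert" : "deassert";
    }
    #define str_deassert_assert(v)  str_assert_deassert(!(v))

    /* typical use, e.g. in a reset controller driver */
    dev_dbg(dev, "%s reset line\n", str_assert_deassert(asserted));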
2025-09-28mm: convert folio_page() back to a macroDavid Hildenbrand1-4/+1
In commit 73b3294b1152 ("mm: simplify folio_page() and folio_page_idx()") we converted folio_page() into a static inline function. However briefly afterwards in commit a847b17009ec ("mm: constify highmem related functions for improved const-correctness") we had to add some nasty const-away casting to make the compiler happy when checking const correctness. So let's just convert it back to a simple macro so the compiler can check const correctness properly. There is the alternative of using a _Generic() similar to page_folio(), but there is not a lot of benefit compared to just using a simple macro. Link: https://lkml.kernel.org/r/20250923140058.2020023-1-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Kiryl Shutsemau <kas@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Lance Yang <lance.yang@linux.dev> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Rapoport <rppt@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
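The const-correctness point can be illustrated with a sketch (the macro body here is an approximation, not a quote of the commit): a macro is plain text substitution, so the pointer arithmetic inherits the qualifiers of the folio argument, while a static inline taking struct folio * needs a const-away cast to accept const struct folio *.

    /* sketch: the expression keeps the constness of 'folio' */
    #define folio_page(folio, n)    (&(folio)->page + (n))

    /*
     * By contrast, a static inline such as
     *   static inline struct page *folio_page(struct folio *folio, long n);
     * rejects 'const struct folio *' callers unless the const is cast away.
     */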
2025-09-28mm: silence data-race in update_hiwater_rssLance Yang1-1/+1
KCSAN reports a data race on mm_struct.hiwater_rss, which can be accessed concurrently from various paths like page migration and memory unmapping without synchronization. Since hiwater_rss is a statistical field for accounting purposes, this data race is benign. Annotate both the read and write accesses with data_race() to make KCSAN happy. Link: https://lkml.kernel.org/r/20250926092426.43312-1-lance.yang@linux.dev Signed-off-by: Lance Yang <lance.yang@linux.dev> Reported-by: syzbot+60192c8877d0bc92a92b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-mm/68d6364e.050a0220.3390a8.000d.GAE@google.com Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Marco Elver <elver@google.com> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
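A hedged sketch of the annotation pattern on the helper in question (the committed one-line change may differ in detail):

    static inline void update_hiwater_rss(struct mm_struct *mm)
    {
            unsigned long _rss = get_mm_rss(mm);

            /* benign race: hiwater_rss is only a statistics watermark */
            if (data_race(mm->hiwater_rss) < _rss)
                    data_race(mm->hiwater_rss = _rss);
    }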
2025-09-28mm/ksm: fix incorrect KSM counter handling in mm_struct during forkDonet Tom1-1/+7
Patch series "mm/ksm: Fix incorrect accounting of KSM counters during fork", v3. The first patch in this series fixes the incorrect accounting of KSM counters such as ksm_merging_pages, ksm_rmap_items, and the global ksm_zero_pages during fork. The following patch add a selftest to verify the ksm_merging_pages counter was updated correctly during fork. Test Results ============ Without the first patch ----------------------- # [RUN] test_fork_ksm_merging_page_count not ok 10 ksm_merging_page in child: 32 With the first patch -------------------- # [RUN] test_fork_ksm_merging_page_count ok 10 ksm_merging_pages is not inherited after fork This patch (of 2): Currently, the KSM-related counters in `mm_struct`, such as `ksm_merging_pages`, `ksm_rmap_items`, and `ksm_zero_pages`, are inherited by the child process during fork. This results in inconsistent accounting. When a process uses KSM, identical pages are merged and an rmap item is created for each merged page. The `ksm_merging_pages` and `ksm_rmap_items` counters are updated accordingly. However, after a fork, these counters are copied to the child while the corresponding rmap items are not. As a result, when the child later triggers an unmerge, there are no rmap items present in the child, so the counters remain stale, leading to incorrect accounting. A similar issue exists with `ksm_zero_pages`, which maintains both a global counter and a per-process counter. During fork, the per-process counter is inherited by the child, but the global counter is not incremented. Since the child also references zero pages, the global counter should be updated as well. Otherwise, during zero-page unmerge, both the global and per-process counters are decremented, causing the global counter to become inconsistent. To fix this, ksm_merging_pages and ksm_rmap_items are reset to 0 during fork, and the global ksm_zero_pages counter is updated with the per-process ksm_zero_pages value inherited by the child. This ensures that KSM statistics remain accurate and reflect the activity of each process correctly. Link: https://lkml.kernel.org/r/cover.1758648700.git.donettom@linux.ibm.com Link: https://lkml.kernel.org/r/7b9870eb67ccc0d79593940d9dbd4a0b39b5d396.1758648700.git.donettom@linux.ibm.com Fixes: 7609385337a4 ("ksm: count ksm merging pages for each process") Fixes: cb4df4cae4f2 ("ksm: count allocated ksm rmap_items for each process") Fixes: e2942062e01d ("ksm: count all zero pages placed by KSM") Signed-off-by: Donet Tom <donettom@linux.ibm.com> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev> Acked-by: David Hildenbrand <david@redhat.com> Cc: Aboorva Devarajan <aboorvad@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: xu xin <xu.xin16@zte.com.cn> Cc: <stable@vger.kernel.org> [6.6+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-28mm/page_vma_mapped: track if the page is mapped across page table boundaryKiryl Shutsemau1-0/+5
Patch series "mm: Improve mlock tracking for large folios", v3. The patchset includes several fixes and improvements related to mlock tracking of large folios. The main objective is to reduce the undercount of Mlocked memory in /proc/meminfo and improve the accuracy of the statistics. Patches 1-2: These patches address a minor race condition in folio_referenced_one() related to mlock_vma_folio(). Currently, mlock_vma_folio() is called on large folio without the page table lock, which can result in a race condition with unmap (i.e. MADV_DONTNEED). This can lead to partially mapped folios on the unevictable LRU list. While not a significant issue, I do not believe backporting is necessary. Patch 3: This patch adds mlocking logic similar to folio_referenced_one() to try_to_unmap_one(), allowing for mlocking of large folios where possible. Patch 4-5: These patches modifies finish_fault() and faultaround to map in the entire folio when possible, enabling efficient mlocking upon addition to the rmap. Patch 6: This patch makes rmap mlock large folios if they are fully mapped, addressing the primary source of mlock undercount for large folios. This patch (of 6): Add a PVMW_PGTABLE_CROSSSED flag that page_vma_mapped_walk() will set if the page is mapped across page table boundary. Unlike other PVMW_* flags, this one is result of page_vma_mapped_walk() and not set by the caller. folio_referenced_one() will use it to detect if it safe to mlock the folio. [akpm@linux-foundation.org: s/CROSSSED/CROSSED/] Link: https://lkml.kernel.org/r/20250923110711.690639-1-kirill@shutemov.name Link: https://lkml.kernel.org/r/20250923110711.690639-2-kirill@shutemov.name Signed-off-by: Kiryl Shutsemau <kas@kernel.org> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-28Merge tag 'mm-hotfixes-stable-2025-09-27-22-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mmLinus Torvalds2-3/+8
Pull misc fixes from Andrew Morton: "7 hotfixes. 4 are cc:stable and the remainder address post-6.16 issues or aren't considered necessary for -stable kernels. 6 of these fixes are for MM. All singletons, please see the changelogs for details" * tag 'mm-hotfixes-stable-2025-09-27-22-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: include/linux/pgtable.h: convert arch_enter_lazy_mmu_mode() and friends to static inlines mm/damon/sysfs: do not ignore callback's return value in damon_sysfs_damon_call() mailmap: add entry for Bence Csókás fs/proc/task_mmu: check p->vec_buf for NULL kmsan: fix out-of-bounds access to shadow memory mm/hugetlb: fix copy_hugetlb_page_range() to use ->pt_share_count mm/hugetlb: fix folio is still mapped when deleted
2025-09-28net/mlx5: IFC add balance ID and LAG per MP group bitsMark Bloch1-2/+6
Add interface definitions for load balance ID and LAG per multiplane group functionality. This patch introduces the hardware capability bits needed to support balance ID in multiplane LAG configurations. The new fields include: - load_balance_id: 4-bit field for balance identifier. - lag_per_mp_group: capability bit for LAG per multiplane group support. These interface additions are prerequisites for implementing balance ID support in the MLX5 driver. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1758521191-814350-3-git-send-email-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
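In mlx5_ifc.h convention each field is declared as a u8 array whose length is its width in bits; the additions presumably look roughly like this (the containing structure name and bit offsets are placeholders, only the field names come from the description above):

    struct mlx5_ifc_example_cap_bits {              /* placeholder name */
            u8      lag_per_mp_group[0x1];          /* capability bit */
            u8      reserved_at_1[0xb];
            u8      load_balance_id[0x4];           /* 4-bit balance identifier */
            u8      reserved_at_10[0x7f0];
    };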
2025-09-28net/mlx5: Add IFC bit for TIR/SQ order capabilityTariq Toukan1-1/+2
Before this cap, firmware requested a certain creation order between TIR objects and SQs of the same transport domain to properly support the self loopback prevention feature. If order is not preserved, explicit modify_tir operations are necessary after the opening of the SQs. When set, this cap bit indicates that this firmware requirement / limitation no longer holds. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1758521191-814350-2-git-send-email-tariqt@nvidia.com Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-27Have cc(1) catch attempts to modify ->f_pathAl Viro1-1/+6
There are very few places that have cause to do that - all in core VFS now, and all done to files that are not yet opened (or visible to anybody else, for that matter). Let's turn f_path into a union of struct path __f_path and const struct path f_path. It's C, not C++ - 6.5.2.3[4] in C99 and later explicitly allows that kind of type-punning. That way any attempts to bypass these checks will be either very easy to catch, or (if the bastards get sufficiently creative to make it hard to spot with grep alone) very clearly malicious - and still catchable with a bit of instrumentation for sparse. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
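The punning described above boils down to something like this inside struct file (a sketch; surrounding members elided):

    struct file {
            /* ... */
            union {
                    /* written only by core VFS, before the file is opened
                     * or visible to anyone else */
                    struct path             __f_path;
                    /* the view everybody else gets; an assignment to
                     * file->f_path now fails to compile */
                    const struct path       f_path;
            };
            /* ... */
    };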
2025-09-27Merge branches 'work.path' and 'work.mount' into work.f_pathAl Viro4-9/+10
2025-09-28Merge tag 'i2c-host-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-mergewindowWolfram Sang37-119/+195
i2c-host for v6.18 - Add support for MediaTek MT6878 I2C - Drop support for S3C2410
2025-09-27bpf: Enforce expected_attach_type for tailcall compatibilityDaniel Borkmann1-0/+1
Yinhao et al. recently reported: Our fuzzer tool discovered an uninitialized pointer issue in the bpf_prog_test_run_xdp() function within the Linux kernel's BPF subsystem. This leads to a NULL pointer dereference when a BPF program attempts to dereference the txq member of a struct xdp_buff object. The test initializes two programs of BPF_PROG_TYPE_XDP: progA acts as the entry point for bpf_prog_test_run_xdp() and its expected_attach_type can be neither BPF_XDP_DEVMAP nor BPF_XDP_CPUMAP. progA calls into a slot of a tailcall map it owns. progB's expected_attach_type must be BPF_XDP_DEVMAP to pass xdp_is_valid_access() validation. The program returns struct xdp_md's egress_ifindex, and the latter is only allowed to be accessed under the mentioned expected_attach_type. progB is then inserted into the tailcall map which progA calls. The underlying issue goes beyond XDP though. Another example is programs of type BPF_PROG_TYPE_CGROUP_SOCK_ADDR. sock_addr_is_valid_access() as well as sock_addr_func_proto() have different logic depending on the programs' expected_attach_type. Similarly, a program attached to BPF_CGROUP_INET4_GETPEERNAME should not be allowed to do a tailcall into a program which calls bpf_bind() out of BPF which is only enabled for BPF_CGROUP_INET4_CONNECT. In short, specifying expected_attach_type allows opening up additional functionality or restrictions beyond what the basic bpf_prog_type enables. The use of tailcalls must not violate these constraints. Fix it by enforcing expected_attach_type in __bpf_prog_map_compatible(). Note that we only enforce this for tailcall maps, but not for BPF devmaps or cpumaps: There, the programs are invoked through dev_map_bpf_prog_run*() and cpu_map_bpf_prog_run*() which set up a new environment / context and therefore these situations are not prone to this issue. Fixes: 5e43f899b03a ("bpf: Check attach type at prog load time") Reported-by: Yinhao Hu <dddddd@hust.edu.cn> Reported-by: Kaiyan Mei <M202472210@hust.edu.cn> Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20250926171201.188490-1-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
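Conceptually the check records the attach type of the first program inserted into a tailcall map and rejects later mismatches; a hedged sketch of the added logic in __bpf_prog_map_compatible() (the owner bookkeeping field names are assumptions):

    /* only tailcall maps carry this constraint; devmap/cpumap programs
     * run through their own dev_map/cpu_map run helpers instead */
    if (map->map_type == BPF_MAP_TYPE_PROG_ARRAY) {
            if (!owner->type) {
                    owner->type = prog->type;
                    owner->attach_type = prog->expected_attach_type;
            } else if (owner->type != prog->type ||
                       owner->attach_type != prog->expected_attach_type) {
                    return false;   /* incompatible tail call target */
            }
    }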
2025-09-27PM: runtime: Update kerneldoc return codesBrian Norris1-25/+31
APIs based on __pm_runtime_idle() (pm_runtime_idle(), pm_request_idle()) do not return 1 when already suspended. They return -EAGAIN. This is already covered in the docs, so the entry for "1" is redundant and conflicting. (pm_runtime_put() and pm_runtime_put_sync() were previously incorrect, but that's fixed in "PM: runtime: pm_runtime_put{,_sync}() returns 1 when already suspended", to ensure consistency with APIs like pm_runtime_put_autosuspend().) RPM_GET_PUT APIs based on __pm_runtime_suspend() do return 1 when already suspended, but the language is a little unclear -- it's not really an "error", so it seems better to list as a clarification before the 0/success case. Additionally, they only actually return 1 when the refcount makes it to 0; if the usage counter is still non-zero, we return 0. pm_runtime_put(), etc., also don't appear at first like they can ever see "-EAGAIN: Runtime PM usage_count non-zero", because in non-racy conditions, pm_runtime_put() would drop its reference count, see it's non-zero, and return early (in __pm_runtime_idle()). However, it's possible to race with another actor that increments the usage_count afterward, since rpm_idle() is protected by a separate lock; in such a case, we may see -EAGAIN. Because this case is only seen in the presence of concurrent actors, it makes sense to clarify that this is when "usage_count **became** non-zero", by way of some racing actor. Lastly, pm_runtime_put_sync_suspend() duplicated some -EAGAIN language. Fix that. Fixes: 271ff96d6066 ("PM: runtime: Document return values of suspend-related API functions") Link: https://lore.kernel.org/linux-pm/aJ5pkEJuixTaybV4@google.com/ Signed-off-by: Brian Norris <briannorris@chromium.org> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: 6.17+ <stable@vger.kernel.org> # 6.17+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
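From a caller's perspective the practical consequence is that positive results and -EAGAIN/-EBUSY are usually not failures; a hedged usage sketch:

    int ret = pm_runtime_put_sync(dev);

    /*
     * 1 (with the companion pm_runtime_put{,_sync}() change) means the
     * device was already suspended; -EAGAIN/-EBUSY mean a concurrent
     * usage reference or active children blocked the suspend for now.
     */
    if (ret < 0 && ret != -EAGAIN && ret != -EBUSY)
            dev_warn(dev, "runtime put failed: %d\n", ret);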
2025-09-27bpf: Remove duplicate crypto/sha2.h headerJiapeng Chong1-1/+0
./include/linux/bpf.h: crypto/sha2.h is included more than once. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=25501 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Acked-by: Quentin Monnet <qmo@kernel.org> Link: https://lore.kernel.org/r/20250926095240.3397539-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-26net: airoha: npu: Add a NPU callback to initialize flow statsLorenzo Bianconi1-3/+4
Introduce an NPU callback to initialize flow stats and remove NPU stats initialization from the airoha_npu_get routine. Add num_stats_entries to the airoha_npu_ppe_stats_setup routine. This patch makes the code more readable since NPU statistics are now initialized on demand by the NPU consumer (at the moment NPU statistics are configured only by the airoha_eth driver). Moreover, this patch allows the NPU consumer (PPE module) to explicitly enable/disable NPU flow stats. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250924-airoha-npu-init-stats-callback-v1-1-88bdf3c941b2@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26ethtool: add FEC bins histogram reportVadim Fedorenko1-1/+24
IEEE 802.3ck-2022 defines counters for FEC bins and 802.3df-2024 clarifies them a bit further. Implement a reporting interface as an addition to the FEC stats available in ethtool. Drivers can leave the bin counter uninitialized if per-lane values are provided. In this case the core will recalculate the sum for the bin. Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Link: https://patch.msgid.link/20250924124037.1508846-2-vadim.fedorenko@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26ptr_ring: drop duplicated tail zeroing codeMichael S. Tsirkin1-18/+24
We have some rather subtle code around zeroing tail entries, minimizing cache bouncing. Let's put it all in one place. Doing this also reduces the text size slightly, e.g. for drivers/vhost/net.o Before: text: 15,114 bytes After: text: 15,082 bytes Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/adb9d941de4a2b619ddb2be271a9939849e70687.1758690291.git.mst@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
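A hedged sketch of what "one place" for the tail zeroing could look like; the helper name is an assumption, and the loop mirrors the existing consumer-side batching in ptr_ring.h:

    /*
     * Zero consumed entries from the old tail up to the new consumer head
     * in one pass, so the cache line shared with the producer is dirtied
     * once per batch instead of once per entry.
     */
    static inline void __ptr_ring_zero_tail(struct ptr_ring *r, int consumer_head)
    {
            int head = consumer_head - 1;

            while (likely(head >= r->consumer_tail))
                    r->queue[head--] = NULL;
            r->consumer_tail = consumer_head;
    }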
2025-09-26Merge tag 'linux-can-next-for-6.18-20250924' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-nextJakub Kicinski3-65/+53
Marc Kleine-Budde says: ==================== pull-request: can-next 2025-09-25 this is a pull request of 48 patches for net-next/main, which supersedes tags/linux-can-next-for-6.18-20250923. The 1st patch is by Xichao Zhao and converts ns_to_ktime() to us_to_ktime() in the m_can driver. Vincent Mailhol contributes 2 patches: Updating the MAINTAINERS and mailmap files to Vincent's new email address and sorting the includes in the CAN helper library alphabetically. Stéphane Grosjean's patch modifies all peak CAN drivers and the mailmap to reflect Stéphane's new email address. 4 patches by Biju Das update the CAN-FD handling in the rcar_canfd driver. Followed by 11 patches by Geert Uytterhoeven updating and improving the rcar_can driver. Stefan Mätje contributes 2 patches for the esd_usb driver updating the error messages. The next 3 patch series are all by Vincent Mailhol: 3 patches to optimize the size of struct raw_sock and struct uniqframe. 4 patches which rework the CAN MTU logic as preparation for CAN-XL interfaces. And finally 20 patches that prepare and refactor the CAN netlink code for the upcoming CAN-XL support. * tag 'linux-can-next-for-6.18-20250924' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (48 commits) can: netlink: add userland error messages can: dev: add can_get_ctrlmode_str() can: calc_bittiming: make can_calc_tdco() FD agnostic can: netlink: make can_tdc_fill_info() FD agnostic can: netlink: add can_bitrate_const_fill_info() can: netlink: add can_bittiming_const_fill_info() can: netlink: add can_bittiming_fill_info() can: netlink: add can_data_bittiming_get_size() can: netlink: make can_tdc_get_size() FD agnostic can: netlink: add can_ctrlmode_changelink() can: netlink: add can_dtb_changelink() can: netlink: make can_tdc_changelink() FD agnostic can: netlink: remove useless check in can_tdc_changelink() can: netlink: refactor CAN_CTRLMODE_TDC_{AUTO,MANUAL} flag reset logic can: netlink: add can_validate_databittiming() can: netlink: add can_validate_tdc() can: netlink: refactor can_validate_bittiming() can: netlink: document which symbols are FD specific can: dev: make can_get_relative_tdco() FD agnostic and move it to bittiming.h can: dev: move struct data_bittiming_params to linux/can/bittiming.h ... ==================== Link: https://patch.msgid.link/20250925121332.848157-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>