linux-rng - Development tree for the kernel CSPRNG

Age	Commit message (Collapse)	Author	Files	Lines
2025-03-10	riscv/crc64: add Zbc optimized CRC64 functions	Eric Biggers	7	-1/+112
	Wire up crc64_be_arch() and crc64_nvme_arch() for 64-bit RISC-V using crc-clmul-template.h. This greatly improves the performance of these CRCs on Zbc-capable CPUs in 64-bit kernels. These optimized CRC64 functions are not yet supported in 32-bit kernels, since crc-clmul-template.h assumes that the CRC fits in an unsigned long. That implementation limitation could be addressed, but it would add a fair bit of complexity, so it has been omitted for now. Tested-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250216225530.306980-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-03-10	riscv/crc-t10dif: add Zbc optimized CRC-T10DIF function	Eric Biggers	6	-1/+66
	Wire up crc_t10dif_arch() for RISC-V using crc-clmul-template.h. This greatly improves CRC-T10DIF performance on Zbc-capable CPUs. Tested-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250216225530.306980-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-03-10	riscv/crc32: reimplement the CRC32 functions using new template	Eric Biggers	7	-310/+177
	Delete the previous Zbc optimized CRC32 code, and re-implement it using the new template. The new implementation is more optimized and shares more code among CRC variants. Tested-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250216225530.306980-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-03-10	riscv/crc: add "template" for Zbc optimized CRC functions	Eric Biggers	2	-1/+319
	Add a "template" crc-clmul-template.h that can generate RISC-V Zbc optimized CRC functions. Each generated CRC function is parameterized by CRC length and bit order, and it accepts a pointer to the constants struct required for the specific CRC polynomial desired. Update gen-crc-consts.py to support generating the needed constants structs. This makes it possible to easily wire up a Zbc optimized implementation of almost any CRC. The design generally follows what I did for x86, but it is simplified by using RISC-V's scalar carryless multiplication Zbc, which has no equivalent on x86. RISC-V's clmulr instruction is also helpful. A potential switch to Zvbc (or support for Zvbc alongside Zbc) is left for future work. For long messages Zvbc should be fastest, but it would need to be shown to be worthwhile over just using Zbc which is significantly more convenient to use, especially in the kernel context. Compared to the existing Zbc-optimized CRC32 code and the earlier proposed Zbc-optimized CRC-T10DIF code (https://lore.kernel.org/r/20250211071101.181652-1-zhihang.shao.iscas@gmail.com), this submission deduplicates the code among CRC variants and is significantly more optimized. It uses "folding" to take better advantage of instruction-level parallelism (to a more limited extent than x86 for now, but it could be extended to more), it reworks the Barrett reduction to eliminate unnecessary instructions, and it documents all the math used and makes all the constants reproducible. Tested-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250216225530.306980-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-18	x86/crc: add ANNOTATE_NOENDBR to suppress objtool warnings	Eric Biggers	1	-0/+5
	The assembly functions generated by crc-pclmul-template.S are called only via static_call, so they do not need to begin with an endbr instruction. But objtool still warns about a missing endbr by default. Add ANNOTATE_NOENDBR to suppress these warnings: vmlinux.o: warning: objtool: crc32_x86_init+0x1c0: relocation to !ENDBR: crc32_lsb_vpclmul_avx10_256+0x0 vmlinux.o: warning: objtool: crc64_x86_init+0x183: relocation to !ENDBR: crc64_msb_vpclmul_avx10_256+0x0 vmlinux.o: warning: objtool: crc_t10dif_x86_init+0x183: relocation to !ENDBR: crc16_msb_vpclmul_avx10_256+0x0 vmlinux.o: warning: objtool: __SCK__crc32_lsb_pclmul+0x0: data relocation to !ENDBR: crc32_lsb_pclmul_sse+0x0 vmlinux.o: warning: objtool: __SCK__crc64_lsb_pclmul+0x0: data relocation to !ENDBR: crc64_lsb_pclmul_sse+0x0 vmlinux.o: warning: objtool: __SCK__crc64_msb_pclmul+0x0: data relocation to !ENDBR: crc64_msb_pclmul_sse+0x0 vmlinux.o: warning: objtool: __SCK__crc16_msb_pclmul+0x0: data relocation to !ENDBR: crc16_msb_pclmul_sse+0x0 Fixes: 8d2d3e72e35b ("x86/crc: add "template" for [V]PCLMULQDQ based CRC functions") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/r/20250217170555.3d14df62@canb.auug.org.au/ Suggested-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250217193230.100443-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-12	x86/crc32: improve crc32c_arch() code generation with clang	Eric Biggers	1	-2/+2
	crc32c_arch() is affected by https://github.com/llvm/llvm-project/issues/20571 where clang unnecessarily spills the inputs to "rm"-constrained operands to the stack. Replace "rm" with ASM_INPUT_RM which partially works around this by expanding to "r" when the compiler is clang. This results in better code generation with clang, though still not optimal. Link: https://lore.kernel.org/r/20250210210741.471725-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-10	x86/crc64: implement crc64_be and crc64_nvme using new template	Eric Biggers	5	-1/+158
	Add x86_64 [V]PCLMULQDQ optimized implementations of crc64_be() and crc64_nvme() by wiring them up to crc-pclmul-template.S. crc64_be() is used by bcache and bcachefs, and crc64_nvme() is used by blk-integrity. Both features can CRC large amounts of data, and the developers of both features have expressed interest in having these CRCs be optimized. So this optimization should be worthwhile. (See https://lore.kernel.org/r/v36sousjd5ukqlkpdxslvpu7l37zbu7d7slgc2trjjqwty2bny@qgzew34feo2r and https://lore.kernel.org/r/20220222163144.1782447-11-kbusch@kernel.org) Benchmark results on AMD Ryzen 9 9950X (Zen 5) using crc_kunit: crc64_be: Length Before After ------ ------ ----- 1 633 MB/s 477 MB/s 16 717 MB/s 2517 MB/s 64 715 MB/s 7525 MB/s 127 714 MB/s 10002 MB/s 128 713 MB/s 13344 MB/s 200 715 MB/s 15752 MB/s 256 714 MB/s 22933 MB/s 511 715 MB/s 28025 MB/s 512 714 MB/s 49772 MB/s 1024 715 MB/s 65261 MB/s 3173 714 MB/s 78773 MB/s 4096 714 MB/s 83315 MB/s 16384 714 MB/s 89487 MB/s crc64_nvme: Length Before After ------ ------ ----- 1 716 MB/s 474 MB/s 16 717 MB/s 3303 MB/s 64 713 MB/s 7940 MB/s 127 715 MB/s 9867 MB/s 128 714 MB/s 13698 MB/s 200 715 MB/s 15995 MB/s 256 714 MB/s 23479 MB/s 511 714 MB/s 28013 MB/s 512 715 MB/s 51533 MB/s 1024 715 MB/s 66788 MB/s 3173 715 MB/s 79182 MB/s 4096 715 MB/s 83966 MB/s 16384 715 MB/s 89739 MB/s Acked-by: Keith Busch <kbusch@kernel.org> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20250210174540.161705-7-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-10	x86/crc-t10dif: implement crc_t10dif using new template	Eric Biggers	6	-349/+64
	Instantiate crc-pclmul-template.S for crc_t10dif and delete the original PCLMULQDQ optimized implementation. This has the following advantages: - Less CRC-variant-specific code. - VPCLMULQDQ support, greatly improving performance on sufficiently long messages on newer CPUs. - A faster reduction from 128 bits to the final CRC. - Support for i386. Benchmark results on AMD Ryzen 9 9950X (Zen 5) using crc_kunit: Length Before After ------ ------ ----- 1 440 MB/s 386 MB/s 16 1865 MB/s 2008 MB/s 64 4343 MB/s 6917 MB/s 127 5440 MB/s 8909 MB/s 128 5533 MB/s 12150 MB/s 200 5908 MB/s 14423 MB/s 256 15870 MB/s 21288 MB/s 511 14219 MB/s 25840 MB/s 512 18361 MB/s 37797 MB/s 1024 19941 MB/s 61374 MB/s 3173 20461 MB/s 74909 MB/s 4096 21310 MB/s 78919 MB/s 16384 21663 MB/s 85012 MB/s Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Keith Busch <kbusch@kernel.org> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20250210174540.161705-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-10	x86/crc32: implement crc32_le using new template	Eric Biggers	3	-244/+65
	Instantiate crc-pclmul-template.S for crc32_le, and delete the original PCLMULQDQ optimized implementation. This has the following advantages: - Less CRC-variant-specific code. - VPCLMULQDQ support, greatly improving performance on sufficiently long messages on newer CPUs. - A faster reduction from 128 bits to the final CRC. - Support for lengths not a multiple of 16 bytes, improving performance for such lengths. - Support for misaligned buffers, improving performance in such cases. Benchmark results on AMD Ryzen 9 9950X (Zen 5) using crc_kunit: Length Before After ------ ------ ----- 1 427 MB/s 605 MB/s 16 710 MB/s 3631 MB/s 64 704 MB/s 7615 MB/s 127 3610 MB/s 9710 MB/s 128 8759 MB/s 12702 MB/s 200 7083 MB/s 15343 MB/s 256 17284 MB/s 22904 MB/s 511 10919 MB/s 27309 MB/s 512 19849 MB/s 48900 MB/s 1024 21216 MB/s 62630 MB/s 3173 22150 MB/s 72437 MB/s 4096 22496 MB/s 79593 MB/s 16384 22018 MB/s 85106 MB/s Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Keith Busch <kbusch@kernel.org> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20250210174540.161705-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-10	x86/crc: add "template" for [V]PCLMULQDQ based CRC functions	Eric Biggers	2	-0/+665
	The Linux kernel implements many variants of CRC, such as crc16, crc_t10dif, crc32_le, crc32c, crc32_be, crc64_nvme, and crc64_be. On x86, except for crc32c which has special scalar instructions, the fastest way to compute any of these CRCs on any message of length roughly >= 16 bytes is to use the SIMD carryless multiplication instructions PCLMULQDQ or VPCLMULQDQ. Depending on the available CPU features this can mean PCLMULQDQ+SSE4.1, VPCLMULQDQ+AVX2, VPCLMULQDQ+AVX10/256, or VPCLMULQDQ+AVX10/512 (or the AVX512 equivalents to AVX10/*). This results in a total of 20+ CRC implementations being potentially needed to properly optimize all CRCs that someone cares about for x86. Besides crc32c, currently only crc32_le and crc_t10dif are actually optimized for x86, and they only use PCLMULQDQ, which means they can be 2-4x slower than what is possible with VPCLMULQDQ. Fortunately, at a high level the code that is needed for any [V]PCLMULQDQ based CRC implementation is mostly the same. Therefore, this patch introduces an assembly macro that expands into the body of a [V]PCLMULQDQ based CRC function for a given number of bits (8, 16, 32, or 64), bit order (lsb or msb-first), vector length, and AVX level. The function expects to be passed a constants table, specific to the polynomial desired, that was generated by the script previously added. When two CRC variants share the same number of bits and bit order, the same functions can be reused, with only the constants table differing. A new C header is also added to make it easy to integrate the new assembly code using a static call. The result is that it becomes straightforward to wire up an optimized implementation of any CRC-8, CRC-16, CRC-32, or CRC-64 for x86. Later patches will wire up specific CRC variants. Although this new template allows easily generating many functions, care was taken to still keep the binary size fairly low. Each generated function is only 550 to 850 bytes depending on the CRC variant and target CPU features. And only one function per CRC variant is actually used at runtime (since all functions support all lengths >= 16 bytes). Note that a similar approach should also work for other architectures that have carryless multiplication instructions, such as arm64. Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20250210174540.161705-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-10	scripts/gen-crc-consts: add gen-crc-consts.py	Eric Biggers	2	-0/+239
	Add a Python script that generates constants for computing the given CRC variant(s) using x86's pclmulqdq or vpclmulqdq instructions. This is specifically tuned for x86's crc-pclmul-template.S. However, other architectures with a 64x64 => 128-bit carryless multiplication instruction should be able to use the generated constants too. (Some tweaks may be warranted based on the exact instructions available on each arch, so the script may grow an arch argument in the future.) The script also supports generating the tables needed for table-based CRC computation. Thus, it can also be used to reproduce the tables like t10_dif_crc_table[] and crc16_table[] that are currently hardcoded in the source with no generation script explicitly documented. Python is used rather than C since it enables implementing the CRC math in the simplest way possible, using arbitrary precision integers. The outputs of this script are intended to be checked into the repo, so Python will continue to not be required to build the kernel, and the script has been optimized for simplicity rather than performance. Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20250210174540.161705-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-10	x86: move ZMM exclusion list into CPU feature flag	Eric Biggers	3	-21/+24
	Lift zmm_exclusion_list in aesni-intel_glue.c into the x86 CPU setup code, and add a new x86 CPU feature flag X86_FEATURE_PREFER_YMM that is set when the CPU is on this list. This allows other code in arch/x86/, such as the CRC library code, to apply the same exclusion list when deciding whether to execute 256-bit or 512-bit optimized functions. Note that full AVX512 support including ZMM registers is still exposed to userspace and is still supported for in-kernel use. This flag just indicates whether in-kernel code should prefer to use YMM registers. Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Keith Busch <kbusch@kernel.org> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20250210174540.161705-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-09	lib/crc-t10dif: remove crc_t10dif_is_optimized()	Eric Biggers	5	-33/+0
	With the "crct10dif" algorithm having been removed from the crypto API, crc_t10dif_is_optimized() is no longer used. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250208175647.12333-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-08	crypto: crct10dif - remove from crypto API	Eric Biggers	11	-489/+0
	Remove the "crct10dif" shash algorithm from the crypto API. It has no known user now that the lib is no longer built on top of it. It has no remaining references in kernel code. The only other potential users would be the usual components that allow specifying arbitrary hash algorithms by name, namely AF_ALG and dm-integrity. However there are no indications that "crct10dif" is being used with these components. Debian Code Search and web searches don't find anything relevant, and explicitly grepping the source code of the usual suspects (cryptsetup, libell, iwd) finds no matches either. "crc32" and "crc32c" are used in a few more places, but that doesn't seem to be the case for "crct10dif". crc_t10dif_update() is also tested by crc_kunit now, so the test coverage provided via the crypto self-tests is no longer needed. Also note that the "crct10dif" shash algorithm was inconsistent with the rest of the shash API in that it wrote the digest in CPU endianness, making the resulting byte array differ on little endian vs. big endian platforms. This means it was effectively just built for use by the lib functions, and it was not actually correct to treat it as "just another hash function" that could be dropped in via the shash API. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20250206173857.39794-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-08	lib/crc32: remove "_le" from crc32c base and arch functions	Eric Biggers	12	-40/+40
	Following the standardization on crc32c() as the lib entry point for the Castagnoli CRC32 instead of the previous mix of crc32c(), crc32c_le(), and __crc32c_le(), make the same change to the underlying base and arch functions that implement it. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250208024911.14936-7-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-08	lib/crc32: rename __crc32c_le_combine() to crc32c_combine()	Eric Biggers	5	-23/+21
	Since the Castagnoli CRC32 is now always just crc32c(), rename __crc32c_le_combine() and __crc32c_le_shift() accordingly. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250208024911.14936-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-08	lib/crc32: standardize on crc32c() name for Castagnoli CRC32	Eric Biggers	11	-45/+32
	For historical reasons, the Castagnoli CRC32 is available under 3 names: crc32c(), crc32c_le(), and __crc32c_le(). Most callers use crc32c(). The more verbose versions are not really warranted; there is no "_be" version that the "_le" version needs to be differentiated from, and the leading underscores are pointless. Therefore, let's standardize on just crc32c(). Remove the other two names, and update callers accordingly. Specifically, the new crc32c() comes from what was previously __crc32c_le(), so compared to the old crc32c() it now takes a size_t length rather than unsigned int, and it's now in linux/crc32.h instead of just linux/crc32c.h (which includes linux/crc32.h). Later patches will also rename __crc32c_le_combine(), crc32c_le_base(), and crc32c_le_arch(). Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250208024911.14936-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-08	lib/crc32: don't bother with pure and const function attributes	Eric Biggers	4	-29/+27
	Drop the use of __pure and __attribute_const__ from the CRC32 library functions that had them. Both of these are unusual optimizations that don't help properly written code. They seem more likely to cause problems than have any real benefit. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250208024911.14936-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-08	lib/crc32: use void pointer for data	Eric Biggers	1	-3/+3
	Update crc32_le(), crc32_be(), and __crc32c_le() to take the data as a 'const void ' instead of 'const u8 '. This makes them slightly easier to use, as it can eliminate the need for casts in the calling code. It's the only pointer argument, so there is no possibility for confusion with another pointer argument. Also, some of the CRC library functions, for example crc32c() and crc64_be(), already used 'const void '. Let's standardize on that, as it seems like a better choice. The underlying base and arch functions continue to use 'const u8 ', as that is often more convenient for the implementation. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250208024911.14936-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-08	mips/crc32: remove unused enums	Eric Biggers	1	-9/+0
	Remove enum crc_op_size and enum crc_type, since they are never actually used. Tokens with the names of the enum values do appear in the file, but they are only used for token concatenation with the preprocessor. This prevents a conflict with the addition of crc32c() to linux/crc32.h. Reported-by: Nathan Chancellor <nathan@kernel.org> Closes: https://lore.kernel.org/r/20250207224233.GA1261167@ax162 Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250208024911.14936-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-08	lib/crc32: remove obsolete CRC32 options from defconfig files	Eric Biggers	10	-10/+0
	Remove all remaining references to CONFIG_CRC32_BIT, CONFIG_CRC32_SARWATE, CONFIG_CRC32_SLICEBY4, and CONFIG_CRC32_SLICEBY8. These options no longer exist, now that we've standardized on a single generic CRC32 implementation. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250205000424.75149-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-08	lib/crc64: add support for arch-optimized implementations	Eric Biggers	3	-32/+37
	Add support for architecture-optimized implementations of the CRC64 library functions, following the approach taken for the CRC32 and CRC-T10DIF library functions. Also take the opportunity to tweak the function prototypes: - Use 'const void ' for the lib entry points (since this is easier for users) but 'const u8 ' for the underlying arch and generic functions (since this is easier for the implementations of these functions). - Don't bother with __pure. It's an unusual optimization that doesn't help properly written code. It's a weird quirk we can do without. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20250130035130.180676-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-08	lib/crc_kunit.c: add test and benchmark for CRC64-NVME	Eric Biggers	1	-1/+29
	Wire up crc64_nvme() to the new CRC unit test and benchmark. This replaces and improves on the test coverage that was lost by removing this CRC variant from the crypto API. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20250130035130.180676-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-08	lib/crc64: rename CRC64-Rocksoft to CRC64-NVME	Eric Biggers	4	-15/+18
	This CRC64 variant comes from the NVME NVM Command Set Specification (https://nvmexpress.org/wp-content/uploads/NVM-Express-NVM-Command-Set-Specification-1.0e-2024.07.29-Ratified.pdf). The "Rocksoft Model CRC Algorithm", published in 1993 and available at https://www.zlib.net/crc_v3.txt, is a generalized CRC algorithm that can calculate any variant of CRC, given a list of parameters such as polynomial, bit order, etc. It is not a CRC variant. The NVME NVM Command Set Specification has a table that gives the "Rocksoft Model Parameters" for the CRC variant it uses. When support for this CRC variant was added to Linux, this table seems to have been misinterpreted as naming the CRC variant the "Rocksoft" CRC. In fact, the table names the CRC variant as the "NVM Express 64b CRC". Most implementations of this CRC variant outside Linux have been calling it CRC64-NVME. Therefore, update Linux to match. While at it, remove the superfluous "update" from the function name, so crc64_rocksoft_update() is now just crc64_nvme(), matching most of the other CRC library functions. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20250130035130.180676-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-08	crypto: crc64-rocksoft - remove from crypto API	Eric Biggers	6	-125/+0
	Remove crc64-rocksoft from the crypto API. It has no known user now that the lib is no longer built on top of it. It was also added much more recently than the longstanding crc32 and crc32c. Unlike crc32 and crc32c, crc64-rocksoft is also not mentioned in the dm-integrity documentation and there are no references to it in anywhere in the cryptsetup git repo, so it is unlikely to have any user there either. Also, this CRC variant is named incorrectly; it has nothing to do with Rocksoft and should be called crc64-nvme. That is yet another reason to remove it from the crypto API; we would not want anyone to start depending on the current incorrect algorithm name of crc64-rocksoft. Note that this change temporarily makes this CRC variant not be covered by any tests, as previously it was relying on the crypto self-tests. This will be fixed by adding this CRC variant to crc_kunit. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20250130035130.180676-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-08	lib/crc64-rocksoft: stop wrapping the crypto API	Eric Biggers	6	-146/+12
	Following what was done for the CRC32 and CRC-T10DIF library functions, get rid of the pointless use of the crypto API and make crc64_rocksoft_update() call into the library directly. This is faster and simpler. Remove crc64_rocksoft() (the version of the function that did not take a 'crc' argument) since it is unused. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20250130035130.180676-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-02	Linux 6.14-rc1	Linus Torvalds	1	-2/+2

2025-02-02	tools/power turbostat: version 2025.02.02	Len Brown	1	-1/+1
	Summary of Changes since 2024.11.30: Fix regression in 2023.11.07 that affinitized forked child in one-shot mode. Harden one-shot mode against hotplug online/offline Enable RAPL SysWatt column by default. Add initial PTL, CWF platform support. Harden initial PMT code in response to early use. Enable first built-in PMT counter: CWF c1e residency Refuse to run on unsupported platforms without --force, to encourage updating to a version that supports the system, and to avoid no-so-useful measurement results. Signed-off-by: Len Brown <len.brown@intel.com>
2025-02-01	MAINTAINERS: include linux-mm for xarray maintenance	Andrew Morton	1	-0/+1
	MM developers have an interest in the xarray code. Cc: David Gow <davidgow@google.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Tamir Duberstein <tamird@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	revert "xarray: port tests to kunit"	Andrew Morton	16	-410/+294
	Revert c7bb5cf9fc4e ("xarray: port tests to kunit"). It broke the build when compiing the xarray userspace test harness code. Reported-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Closes: https://lkml.kernel.org/r/07cf896e-adf8-414f-a629-a808fc26014a@oracle.com Cc: David Gow <davidgow@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Tamir Duberstein <tamird@gmail.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	MAINTAINERS: add lib/test_xarray.c	Tamir Duberstein	1	-0/+1
	Ensure test-only changes are sent to the relevant maintainer. Link: https://lkml.kernel.org/r/20250129-xarray-test-maintainer-v1-1-482e31f30f47@gmail.com Signed-off-by: Tamir Duberstein <tamird@gmail.com> Cc: Mattew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mailmap, MAINTAINERS, docs: update Carlos's email address	Carlos Bilbao	3	-6/+8
	Update .mailmap to reflect my new (and final) primary email address, carlos.bilbao@kernel.org. Also update contact information in files Documentation/translations/sp_SP/index.rst and MAINTAINERS. Link: https://lkml.kernel.org/r/20250130012248.1196208-1-carlos.bilbao@kernel.org Signed-off-by: Carlos Bilbao <carlos.bilbao@kernel.org> Cc: Carlos Bilbao <bilbao@vt.edu> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mattew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm/hugetlb: fix hugepage allocation for interleaved memory nodes	Ritesh Harjani (IBM)	1	-1/+1
	gather_bootmem_prealloc() assumes the start nid as 0 and size as num_node_state(N_MEMORY). That means in case if memory attached numa nodes are interleaved, then gather_bootmem_prealloc_parallel() will fail to scan few of these nodes. Since memory attached numa nodes can be interleaved in any fashion, hence ensure that the current code checks for all numa node ids (.size = nr_node_ids). Let's still keep max_threads as N_MEMORY, so that it can distributes all nr_node_ids among the these many no. threads. e.g. qemu cmdline ======================== numa_cmd="-numa node,nodeid=1,memdev=mem1,cpus=2-3 -numa node,nodeid=0,cpus=0-1 -numa dist,src=0,dst=1,val=20" mem_cmd="-object memory-backend-ram,id=mem1,size=16G" w/o this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2): ========================== ~ # cat /proc/meminfo \|grep -i huge AnonHugePages: 0 kB ShmemHugePages: 0 kB FileHugePages: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 1048576 kB Hugetlb: 0 kB with this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2): =========================== ~ # cat /proc/meminfo \|grep -i huge AnonHugePages: 0 kB ShmemHugePages: 0 kB FileHugePages: 0 kB HugePages_Total: 2 HugePages_Free: 2 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 1048576 kB Hugetlb: 2097152 kB Link: https://lkml.kernel.org/r/f8d8dad3a5471d284f54185f65d575a6aaab692b.1736592534.git.ritesh.list@gmail.com Fixes: b78b27d02930 ("hugetlb: parallelize 1G hugetlb initialization") Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reported-by: Pavithra Prakash <pavrampu@linux.ibm.com> Suggested-by: Muchun Song <muchun.song@linux.dev> Tested-by: Sourabh Jain <sourabhjain@linux.ibm.com> Reviewed-by: Luiz Capitulino <luizcap@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Gang Li <gang.li@linux.dev> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm: gup: fix infinite loop within __get_longterm_locked	Zhaoyang Huang	1	-10/+4
	We can run into an infinite loop in __get_longterm_locked() when collect_longterm_unpinnable_folios() finds only folios that are isolated from the LRU or were never added to the LRU. This can happen when all folios to be pinned are never added to the LRU, for example when vm_ops->fault allocated pages using cma_alloc() and never added them to the LRU. Fix it by simply taking a look at the list in the single caller, to see if anything was added. [zhaoyang.huang@unisoc.com: move definition of local] Link: https://lkml.kernel.org/r/20250122012604.3654667-1-zhaoyang.huang@unisoc.com Link: https://lkml.kernel.org/r/20250121020159.3636477-1-zhaoyang.huang@unisoc.com Fixes: 67e139b02d99 ("mm/gup.c: refactor check_and_migrate_movable_pages()") Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: David Hildenbrand <david@redhat.com> Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Aijun Sun <aijun.sun@unisoc.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm, swap: fix reclaim offset calculation error during allocation	Kairui Song	1	-1/+1
	There is a code error that will cause the swap entry allocator to reclaim and check the whole cluster with an unexpected tail offset instead of the part that needs to be reclaimed. This may cause corruption of the swap map, so fix it. Link: https://lkml.kernel.org/r/20250130115131.37777-1-ryncsn@gmail.com Fixes: 3b644773eefd ("mm, swap: reduce contention on device lock") Signed-off-by: Kairui Song <kasong@tencent.com> Cc: Chris Li <chrisl@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	.mailmap: update email address for Christopher Obbard	Christopher Obbard	1	-0/+1
	Update my email address. Link: https://lkml.kernel.org/r/20250122-wip-obbardc-update-email-v2-1-12bde6b79ad0@linaro.org Signed-off-by: Christopher Obbard <christopher.obbard@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	kfence: skip __GFP_THISNODE allocations on NUMA systems	Marco Elver	1	-0/+2
	On NUMA systems, __GFP_THISNODE indicates that an allocation _must_ be on a particular node, and failure to allocate on the desired node will result in a failed allocation. Skip __GFP_THISNODE allocations if we are running on a NUMA system, since KFENCE can't guarantee which node its pool pages are allocated on. Link: https://lkml.kernel.org/r/20250124120145.410066-1-elver@google.com Fixes: 236e9f153852 ("kfence: skip all GFP_ZONEMASK allocations") Signed-off-by: Marco Elver <elver@google.com> Reported-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Alexander Potapenko <glider@google.com> Cc: Chistoph Lameter <cl@linux.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	nilfs2: fix possible int overflows in nilfs_fiemap()	Nikita Zhandarovich	1	-3/+3
	Since nilfs_bmap_lookup_contig() in nilfs_fiemap() calculates its result by being prepared to go through potentially maxblocks == INT_MAX blocks, the value in n may experience an overflow caused by left shift of blkbits. While it is extremely unlikely to occur, play it safe and cast right hand expression to wider type to mitigate the issue. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Link: https://lkml.kernel.org/r/20250124222133.5323-1-konishi.ryusuke@gmail.com Fixes: 622daaff0a89 ("nilfs2: fiemap support") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm: compaction: use the proper flag to determine watermarks	yangge	1	-4/+25
	There are 4 NUMA nodes on my machine, and each NUMA node has 32GB of memory. I have configured 16GB of CMA memory on each NUMA node, and starting a 32GB virtual machine with device passthrough is extremely slow, taking almost an hour. Long term GUP cannot allocate memory from CMA area, so a maximum of 16 GB of no-CMA memory on a NUMA node can be used as virtual machine memory. There is 16GB of free CMA memory on a NUMA node, which is sufficient to pass the order-0 watermark check, causing the __compaction_suitable() function to consistently return true. For costly allocations, if the __compaction_suitable() function always returns true, it causes the __alloc_pages_slowpath() function to fail to exit at the appropriate point. This prevents timely fallback to allocating memory on other nodes, ultimately resulting in excessively long virtual machine startup times. Call trace: __alloc_pages_slowpath if (compact_result == COMPACT_SKIPPED \|\| compact_result == COMPACT_DEFERRED) goto nopage; // should exit __alloc_pages_slowpath() from here We could use the real unmovable allocation context to have __zone_watermark_unusable_free() subtract CMA pages, and thus we won't pass the order-0 check anymore once the non-CMA part is exhausted. There is some risk that in some different scenario the compaction could in fact migrate pages from the exhausted non-CMA part of the zone to the CMA part and succeed, and we'll skip it instead. But only __GFP_NORETRY allocations should be affected in the immediate "goto nopage" when compaction is skipped, others will attempt with DEF_COMPACT_PRIORITY anyway and won't fail without trying to compact-migrate the non-CMA pageblocks into CMA pageblocks first, so it should be fine. After this fix, it only takes a few tens of seconds to start a 32GB virtual machine with device passthrough functionality. Link: https://lore.kernel.org/lkml/1736335854-548-1-git-send-email-yangge1116@126.com/ Link: https://lkml.kernel.org/r/1737788037-8439-1-git-send-email-yangge1116@126.com Signed-off-by: yangge <yangge1116@126.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Barry Song <21cnbao@gmail.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	kernel: be more careful about dup_mmap() failures and uprobe registering	Liam R. Howlett	2	-3/+18
	If a memory allocation fails during dup_mmap(), the maple tree can be left in an unsafe state for other iterators besides the exit path. All the locks are dropped before the exit_mmap() call (in mm/mmap.c), but the incomplete mm_struct can be reached through (at least) the rmap finding the vmas which have a pointer back to the mm_struct. Up to this point, there have been no issues with being able to find an mm_struct that was only partially initialised. Syzbot was able to make the incomplete mm_struct fail with recent forking changes, so it has been proven unsafe to use the mm_struct that hasn't been initialised, as referenced in the link below. Although 8ac662f5da19f ("fork: avoid inappropriate uprobe access to invalid mm") fixed the uprobe access, it does not completely remove the race. This patch sets the MMF_OOM_SKIP to avoid the iteration of the vmas on the oom side (even though this is extremely unlikely to be selected as an oom victim in the race window), and sets MMF_UNSTABLE to avoid other potential users from using a partially initialised mm_struct. When registering vmas for uprobe, skip the vmas in an mm that is marked unstable. Modifying a vma in an unstable mm may cause issues if the mm isn't fully initialised. Link: https://lore.kernel.org/all/6756d273.050a0220.2477f.003d.GAE@google.com/ Link: https://lkml.kernel.org/r/20250127170221.1761366-1-Liam.Howlett@oracle.com Fixes: d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()") Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm/fake-numa: handle cases with no SRAT info	Bruno Faccini	1	-1/+10
	Handle more gracefully cases where no SRAT information is available, like in VMs with no Numa support, and allow fake-numa configuration to complete successfully in these cases Link: https://lkml.kernel.org/r/20250127171623.1523171-1-bfaccini@nvidia.com Fixes: 63db8170bf34 (“mm/fake-numa: allow later numa node hotplug”) Signed-off-by: Bruno Faccini <bfaccini@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hyeonggon Yoo <hyeonggon.yoo@sk.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Len Brown <lenb@kernel.org> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm: kmemleak: fix upper boundary check for physical address objects	Catalin Marinas	1	-1/+1
	Memblock allocations are registered by kmemleak separately, based on their physical address. During the scanning stage, it checks whether an object is within the min_low_pfn and max_low_pfn boundaries and ignores it otherwise. With the recent addition of __percpu pointer leak detection (commit 6c99d4eb7c5e ("kmemleak: enable tracking for percpu pointers")), kmemleak started reporting leaks in setup_zone_pageset() and setup_per_cpu_pageset(). These were caused by the node_data[0] object (initialised in alloc_node_data()) ending on the PFN_PHYS(max_low_pfn) boundary. The non-strict upper boundary check introduced by commit 84c326299191 ("mm: kmemleak: check physical address when scan") causes the pg_data_t object to be ignored (not scanned) and the __percpu pointers it contains to be reported as leaks. Make the max_low_pfn upper boundary check strict when deciding whether to ignore a physical address object and not scan it. Link: https://lkml.kernel.org/r/20250127184233.2974311-1-catalin.marinas@arm.com Fixes: 84c326299191 ("mm: kmemleak: check physical address when scan") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Cc: Patrick Wang <patrick.wang.shcn@gmail.com> Cc: <stable@vger.kernel.org> [6.0.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mailmap: add an entry for Hamza Mahfooz	Hamza Mahfooz	1	-0/+1
	Map my previous work email to my current one. Link: https://lkml.kernel.org/r/20250120205659.139027-1-hamzamahfooz@linux.microsoft.com Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: David S. Miller <davem@davemloft.net> Cc: Hans verkuil <hverkuil@xs4all.nl> Cc: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	MAINTAINERS: mailmap: update Yosry Ahmed's email address	Yosry Ahmed	2	-1/+2
	Moving to a linux.dev email address. Link: https://lkml.kernel.org/r/20250123231344.817358-1-yosry.ahmed@linux.dev Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Nhat Pham <nphamcs@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	scripts/gdb: fix aarch64 userspace detection in get_current_task	Jan Kiszka	1	-1/+1
	At least recent gdb releases (seen with 14.2) return SP_EL0 as signed long which lets the right-shift always return 0. Link: https://lkml.kernel.org/r/dcd2fabc-9131-4b48-8419-6444e2d67454@siemens.com Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Barry Song <baohua@kernel.org> Cc: Kieran Bingham <kbingham@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm/vmscan: accumulate nr_demoted for accurate demotion statistics	Li Zhijian	1	-3/+4
	In shrink_folio_list(), demote_folio_list() can be called 2 times. Currently stat->nr_demoted will only store the last nr_demoted( the later nr_demoted is always zero, the former nr_demoted will get lost), as a result number of demoted pages is not accurate. Accumulate the nr_demoted count across multiple calls to demote_folio_list(), ensuring accurate reporting of demotion statistics. [lizhijian@fujitsu.com: introduce local nr_demoted to fix nr_reclaimed double counting] Link: https://lkml.kernel.org/r/20250111015253.425693-1-lizhijian@fujitsu.com Link: https://lkml.kernel.org/r/20250110122133.423481-1-lizhijian@fujitsu.com Fixes: f77f0c751478 ("mm,memcg: provide per-cgroup counters for NUMA balancing operations") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Acked-by: Kaiyang Zhao <kaiyang2@cs.cmu.edu> Tested-by: Donet Tom <donettom@linux.ibm.com> Reviewed-by: Donet Tom <donettom@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	ocfs2: fix incorrect CPU endianness conversion causing mount failure	Heming Zhao	1	-1/+1
	Commit 23aab037106d ("ocfs2: fix UBSAN warning in ocfs2_verify_volume()") introduced a regression bug. The blksz_bits value is already converted to CPU endian in the previous code; therefore, the code shouldn't use le32_to_cpu() anymore. Link: https://lkml.kernel.org/r/20250121112204.12834-1-heming.zhao@suse.com Fixes: 23aab037106d ("ocfs2: fix UBSAN warning in ocfs2_verify_volume()") Signed-off-by: Heming Zhao <heming.zhao@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm/zsmalloc: add __maybe_unused attribute for is_first_zpdesc()	Hyeonggon Yoo	1	-1/+1
	Commit c1b3bb73d55e ("mm/zsmalloc: use zpdesc in trylock_zspage()/lock_zspage()") introduces is_first_zpdesc() function. However, the function is only used when CONFIG_DEBUG_VM=y. When building with LLVM=1 and W=1 option, the following warning is generated: $ make -j12 W=1 LLVM=1 mm/zsmalloc.o mm/zsmalloc.c:455:20: error: function 'is_first_zpdesc' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration] 455 \| static inline bool is_first_zpdesc(struct zpdesc *zpdesc) \| ^~~~~~~~~~~~~~~ 1 error generated. Fix the warning by adding __maybe_unused attribute to the function. No functional change intended. Link: https://lkml.kernel.org/r/20250127231631.4363-1-42.hyeyoo@gmail.com Fixes: c1b3bb73d55e ("mm/zsmalloc: use zpdesc in trylock_zspage()/lock_zspage()") Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202501240958.4ILzuBrH-lkp@intel.com/ Cc: Alex Shi <alexs@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm/vmscan: fix hard LOCKUP in function isolate_lru_folios	liuye	2	-1/+6
	This fixes the following hard lockup in isolate_lru_folios() during memory reclaim. If the LRU mostly contains ineligible folios this may trigger watchdog. watchdog: Watchdog detected hard LOCKUP on cpu 173 RIP: 0010:native_queued_spin_lock_slowpath+0x255/0x2a0 Call Trace: _raw_spin_lock_irqsave+0x31/0x40 folio_lruvec_lock_irqsave+0x5f/0x90 folio_batch_move_lru+0x91/0x150 lru_add_drain_per_cpu+0x1c/0x40 process_one_work+0x17d/0x350 worker_thread+0x27b/0x3a0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 lruvec->lru_lock owner： PID: 2865 TASK: ffff888139214d40 CPU: 40 COMMAND: "kswapd0" #0 [fffffe0000945e60] crash_nmi_callback at ffffffffa567a555 #1 [fffffe0000945e68] nmi_handle at ffffffffa563b171 #2 [fffffe0000945eb0] default_do_nmi at ffffffffa6575920 #3 [fffffe0000945ed0] exc_nmi at ffffffffa6575af4 #4 [fffffe0000945ef0] end_repeat_nmi at ffffffffa6601dde [exception RIP: isolate_lru_folios+403] RIP: ffffffffa597df53 RSP: ffffc90006fb7c28 RFLAGS: 00000002 RAX: 0000000000000001 RBX: ffffc90006fb7c60 RCX: ffffea04a2196f88 RDX: ffffc90006fb7c60 RSI: ffffc90006fb7c60 RDI: ffffea04a2197048 RBP: ffff88812cbd3010 R8: ffffea04a2197008 R9: 0000000000000001 R10: 0000000000000000 R11: 0000000000000001 R12: ffffea04a2197008 R13: ffffea04a2197048 R14: ffffc90006fb7de8 R15: 0000000003e3e937 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 <NMI exception stack> #5 [ffffc90006fb7c28] isolate_lru_folios at ffffffffa597df53 #6 [ffffc90006fb7cf8] shrink_active_list at ffffffffa597f788 #7 [ffffc90006fb7da8] balance_pgdat at ffffffffa5986db0 #8 [ffffc90006fb7ec0] kswapd at ffffffffa5987354 #9 [ffffc90006fb7ef8] kthread at ffffffffa5748238 crash> Scenario: User processe are requesting a large amount of memory and keep page active. Then a module continuously requests memory from ZONE_DMA32 area. Memory reclaim will be triggered due to ZONE_DMA32 watermark alarm reached. However pages in the LRU(active_anon) list are mostly from the ZONE_NORMAL area. Reproduce: Terminal 1: Construct to continuously increase pages active(anon). mkdir /tmp/memory mount -t tmpfs -o size=1024000M tmpfs /tmp/memory dd if=/dev/zero of=/tmp/memory/block bs=4M tail /tmp/memory/block Terminal 2: vmstat -a 1 active will increase. procs ---memory--- ---swap-- ---io---- -system-- ---cpu--- ... r b swpd free inact active si so bi bo 1 0 0 1445623076 45898836 83646008 0 0 0 1 0 0 1445623076 43450228 86094616 0 0 0 1 0 0 1445623076 41003480 88541364 0 0 0 1 0 0 1445623076 38557088 90987756 0 0 0 1 0 0 1445623076 36109688 93435156 0 0 0 1 0 0 1445619552 33663256 95881632 0 0 0 1 0 0 1445619804 31217140 98327792 0 0 0 1 0 0 1445619804 28769988 100774944 0 0 0 1 0 0 1445619804 26322348 103222584 0 0 0 1 0 0 1445619804 23875592 105669340 0 0 0 cat /proc/meminfo \| head Active(anon) increase. MemTotal: 1579941036 kB MemFree: 1445618500 kB MemAvailable: 1453013224 kB Buffers: 6516 kB Cached: 128653956 kB SwapCached: 0 kB Active: 118110812 kB Inactive: 11436620 kB Active(anon): 115345744 kB Inactive(anon): 945292 kB When the Active(anon) is 115345744 kB, insmod module triggers the ZONE_DMA32 watermark. perf record -e vmscan:mm_vmscan_lru_isolate -aR perf script isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=2 nr_skipped=2 nr_taken=0 lru=active_anon isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=0 nr_skipped=0 nr_taken=0 lru=active_anon isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=28835844 nr_skipped=28835844 nr_taken=0 lru=active_anon isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=28835844 nr_skipped=28835844 nr_taken=0 lru=active_anon isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=29 nr_skipped=29 nr_taken=0 lru=active_anon isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=0 nr_skipped=0 nr_taken=0 lru=active_anon See nr_scanned=28835844. 28835844 * 4k = 115343376KB approximately equal to 115345744 kB. If increase Active(anon) to 1000G then insmod module triggers the ZONE_DMA32 watermark. hard lockup will occur. In my device nr_scanned = 0000000003e3e937 when hard lockup. Convert to memory size 0x0000000003e3e937 * 4KB = 261072092 KB. [ffffc90006fb7c28] isolate_lru_folios at ffffffffa597df53 ffffc90006fb7c30: 0000000000000020 0000000000000000 ffffc90006fb7c40: ffffc90006fb7d40 ffff88812cbd3000 ffffc90006fb7c50: ffffc90006fb7d30 0000000106fb7de8 ffffc90006fb7c60: ffffea04a2197008 ffffea0006ed4a48 ffffc90006fb7c70: 0000000000000000 0000000000000000 ffffc90006fb7c80: 0000000000000000 0000000000000000 ffffc90006fb7c90: 0000000000000000 0000000000000000 ffffc90006fb7ca0: 0000000000000000 0000000003e3e937 ffffc90006fb7cb0: 0000000000000000 0000000000000000 ffffc90006fb7cc0: 8d7c0b56b7874b00 ffff88812cbd3000 About the Fixes: Why did it take eight years to be discovered? The problem requires the following conditions to occur: 1. The device memory should be large enough. 2. Pages in the LRU(active_anon) list are mostly from the ZONE_NORMAL area. 3. The memory in ZONE_DMA32 needs to reach the watermark. If the memory is not large enough, or if the usage design of ZONE_DMA32 area memory is reasonable, this problem is difficult to detect. notes: The problem is most likely to occur in ZONE_DMA32 and ZONE_NORMAL, but other suitable scenarios may also trigger the problem. Link: https://lkml.kernel.org/r/20241119060842.274072-1-liuye@kylinos.cn Fixes: b2e18757f2c9 ("mm, vmscan: begin reclaiming pages on a per-node basis") Signed-off-by: liuye <liuye@kylinos.cn> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Yang Shi <yang@os.amperecomputing.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	sh: boards: Use imply to enable hardware with complex dependencies	Geert Uytterhoeven	1	-2/+2
	If CONFIG_I2C=n: WARNING: unmet direct dependencies detected for SND_SOC_AK4642 Depends on [n]: SOUND [=y] && SND [=y] && SND_SOC [=y] && I2C [=n] Selected by [y]: - SH_7724_SOLUTION_ENGINE [=y] && CPU_SUBTYPE_SH7724 [=y] && SND_SIMPLE_CARD [=y] WARNING: unmet direct dependencies detected for SND_SOC_DA7210 Depends on [n]: SOUND [=y] && SND [=y] && SND_SOC [=y] && SND_SOC_I2C_AND_SPI [=n] Selected by [y]: - SH_ECOVEC [=y] && CPU_SUBTYPE_SH7724 [=y] && SND_SIMPLE_CARD [=y] Fix this by replacing select by imply, instead of adding a dependency on I2C. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202501240836.OvXqmANX-lkp@intel.com/ Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>