|
In order to be able to completely disable SVE even if the HW
seems to support it (most likely because the FW is broken),
move the SVE setup into the EL2 finalisation block, and
use a new idreg override to deal with it.
Note that we also nuke id_aa64zfr0_el1 as a byproduct, and
that SME also gets disabled, due to the dependency between the
two features.
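For example, assuming the new override ends up wired to the kernel
command line like the other idreg overrides (the option names below are
illustrative, not taken verbatim from this patch), SVE could be turned
off at boot with something like:
  arm64.nosve            # convenience alias; also disables SME
  id_aa64pfr0.sve=0      # raw idreg-override form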
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220630160500.1536744-9-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
In order to be able to completely disable SME even if the HW
seems to support it (most likely because the FW is broken),
move the SME setup into the EL2 finalisation block, and
use a new idreg override to deal with it.
Note that we also nuke id_aa64smfr0_el1 as a byproduct.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220630160500.1536744-8-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
In order to deal with early override of features that are not
classically encoded in a standard ID register with a 4 bit wide
field, add a primitive that takes a sysreg value as an input
(instead of the usual sysreg name) as well as a bit field
width (usually 4).
No functional change.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220630160500.1536744-7-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Currently, the override mechanism can only deal with 4-bit fields,
which is the most common case. However, we now have a bunch of
ID registers that have more diverse field widths, such as
ID_AA64SMFR0_EL1, which has fields that are a single bit wide.
Add support for variable widths, and a macro that encodes
a feature width of 4 for all existing overrides.
No functional change.
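As an illustration, the field descriptors could then carry an explicit
width, with the wrapper macro defaulting it to 4 so existing users are
unchanged (a simplified sketch, not the exact arm64 structures):

  struct override_field {
          const char      *name;
          u8              shift;
          u8              width;          /* in bits; previously always 4 */
          bool            (*filter)(u64 val);
  };

  /* existing 4-bit overrides keep working without any edits */
  #define FIELD(n, s, f)  { .name = n, .shift = s, .width = 4, .filter = f }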
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220630160500.1536744-6-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Checking for a feature being supported from assembly code is
a bit tedious if we need to factor in the idreg override.
Since we already have such code written for forcing nVHE, move
the whole thing into a macro. This heavily relies on the override
structure being called foo_override for foo_el1.
No functional change.
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220630160500.1536744-5-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
For CPUs that have the unfortunate mis-feature to be stuck in
VHE mode, we perform a funny dance where we completely shortcut
the normal boot process to enable VHE and run the kernel at EL2,
and only then start booting the kernel.
Not only is this pretty ugly, but it means that the EL2 finalisation
occurs before we have processed the sysreg override.
Instead, start executing the kernel as if it was an EL1 guest and
rely on the normal EL2 finalisation to go back to EL2.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220630160500.1536744-4-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
As we're about to switch the way E2H-stuck CPUs boot, save
the boot CPU E2H state as a flag tied to the boot mode
that can then be checked by the idreg override code.
This allows us to replace the is_kernel_in_hyp_mode() check
with a simple comparison with this state, even when running
at EL1. Note that this flag isn't saved in __boot_cpu_mode,
and is only kept in a register in the assembly code.
Use with caution.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220630160500.1536744-3-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
As we are about to perform a lot more in 'mutate_to_vhe' than
we currently do, this function really becomes the point where
we finalise the basic EL2 configuration.
Reflect this into the code by renaming a bunch of things:
- HVC_VHE_RESTART -> HVC_FINALISE_EL2
- switch_to_vhe -> finalise_el2
- mutate_to_vhe -> __finalise_el2
No functional changes.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220630160500.1536744-2-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Joey reports that booting 52-bit VA capable builds on 52-bit VA capable
CPUs is broken since commit 0d9b1ffefabe ("arm64: mm: make vabits_actual
a build time constant if possible"). This is due to the fact that the
primary CPU reads the vabits_actual variable before it has been
assigned.
The reason for deferring the assignment of vabits_actual was that we try
to perform as few stores to memory as we can with the MMU and caches
off, due to the cache coherency issues it creates.
Since __cpu_setup() [which is where the read of vabits_actual occurs] is
also called on the secondary boot path, we cannot just read the CPU ID
registers directly, given that the size of the VA space is decided by
the capabilities of the primary CPU. So let's read vabits_actual only on
the secondary boot path, and read the CPU ID registers directly on the
primary boot path, by making it a function parameter of __cpu_setup().
To ensure that all users of vabits_actual (including kasan_early_init())
observe the correct value, move the assignment of vabits_actual back
into asm code, but still defer it to after the MMU and caches have been
enabled.
Cc: Will Deacon <will@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Fixes: 0d9b1ffefabe ("arm64: mm: make vabits_actual a build time constant if possible")
Reported-by: Joey Gouly <joey.gouly@arm.com>
Co-developed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220701111045.2944309-1-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
It's very easy to confuse __PHYS_OFFSET and PHYS_OFFSET. To clarify
things, let's remove __PHYS_OFFSET and use KERNEL_START directly, with
comments to show that we're using physical address, as we do for other
objects.
At the same time, update the comment regarding the kernel entry address
to mention __pa(KERNEL_START) rather than __pa(PAGE_OFFSET).
There should be no functional change as a result of this patch.
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20220629041207.1670133-1-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Currently, a build with CONFIG_EFI=n and CONFIG_KASAN=y will not
complete successfully because of missing symbols. This is due to the
fact that the __pi_ prefixed aliases for __memcpy/__memmove were put
inside a #ifdef CONFIG_EFI block inadvertently, and are therefore
missing from the build in question.
These definitions should only be provided when needed, as they will
otherwise clutter up the symbol table, kallsyms etc for no reason.
Fortunately, instead of using CPP conditionals, we can achieve the same
result by using the linker's PROVIDE() directive, which only defines a
symbol if it is required to complete the link. So let's use that for all
symbol alias definitions.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220629083246.3729177-1-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
We no longer need to call into the kernel to map the FDT before
entering the kernel proper, so let's drop the helpers we added for this.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-22-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Currently, when KASLR is in effect, we set up the kernel virtual address
space twice: the first time, the KASLR seed is looked up in the device
tree, and the kernel virtual mapping is torn down and recreated again,
after which the relocations are applied a second time. The latter step
means that statically initialized global pointer variables will be reset
to their initial values, and to ensure that BSS variables are not set to
values based on the initial translation, they are cleared again as well.
All of this is needed because we need the command line (taken from the
DT) to tell us whether or not to randomize the virtual address space
before entering the kernel proper. However, this code has expanded
little by little and now creates global state unrelated to the virtual
randomization of the kernel before the mapping is torn down and set up
again, and the BSS cleared for a second time. This has created some
issues in the past, and it would be better to avoid this little dance if
possible.
So instead, let's use the temporary mapping of the device tree, and
execute the bare minimum of code to decide whether or not KASLR should
be enabled, and what the seed is. Only then, create the virtual kernel
mapping, clear BSS, etc and proceed as normal. This avoids the issues
around inconsistent global state due to BSS being cleared twice, and is
generally more maintainable, as it permits us to defer all the remaining
DT parsing and KASLR initialization to a later time.
This means the relocation fixup code runs only a single time as well,
allowing us to simplify the RELR handling code too, which is not
idempotent and was therefore required to keep track of the offset that
was applied the first time around.
Note that this means we have to clone a pair of FDT library objects, so
that we can control how they are built - we need the stack protector
and other instrumentation disabled so that the code can tolerate being
called this early. Note that only the kernel page tables and the
temporary stack are mapped read-write at this point, which ensures that
the early code does not modify any global state inadvertently.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-21-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The early KASLR init code runs extremely early, and anything that could
be deferred until later should be. So let's defer the randomization of
the module region until much later - this also simplifies the
arithmetic, given that we no longer have to reason about the link time
vs load time placement of the core kernel explicitly. Also get rid of
the global status variable, and infer the status reported by the
diagnostic print from other KASLR related context.
While at it, get rid of the special case for KASAN without
KASAN_VMALLOC, which never occurs in practice.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-20-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
In order to avoid having to touch memory with the MMU and caches
disabled, and therefore having to invalidate it from the caches
explicitly, just defer storing the value until after the MMU has been
turned on, unless we are giving up with an error.
While at it, move the associated variable definitions into C code.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-19-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Now that we can access the entire kernel image via the ID map, we can
execute the page table population code with the MMU and caches enabled.
The only thing we need to ensure is that translations via TTBR1 remain
disabled while we are updating the page tables the second time around,
in case KASLR wants them to be randomized.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-18-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Create a macro load_ttbr1 to avoid having to repeat the same instruction
sequence 3 times in a subsequent patch. No functional change intended.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-17-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Instead of calling into the kernel to map the FDT into the kernel page
tables before even calling start_kernel(), let's switch to the initial,
temporary mapping of the device tree that has been added to the ID map.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-16-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
We need to access the DT very early to get at the command line and the
KASLR seed, which currently means we rely on some hacks to call into the
kernel before really calling into the kernel, which is undesirable.
So instead, let's create a mapping for the FDT in the initial ID map,
which is feasible now that it has been extended to cover more than a
single page or block, and can be updated in place to remap other output
addresses.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-15-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Formerly, we had to access the RELA and RELR tables via the kernel
mapping that was being relocated, and so deriving the start and end
addresses using ADRP/ADD references was not possible, as the relocation
code runs from the ID map.
Now that we map the entire kernel image via the ID map, we can simplify
this, and just load the entries via the ID map as well.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-14-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
As a first step towards avoiding the need to create, tear down and
recreate the kernel virtual mapping with MMU and caches disabled, start
by expanding the ID map so it covers the page tables as well as all
executable code. This will allow us to populate the page tables with the
MMU and caches on, and call KASLR init code before setting up the
virtual mapping.
Since this ID map is only needed at boot, create it as a temporary set
of page tables, and populate the permanent ID map after enabling the MMU
and caches. While at it, switch to read-only attributes where
possible, as writable permissions are only needed for the initial kernel
page tables. Note that on 4k granule configurations, the permanent ID
map will now be reduced to a single page rather than a 2M block mapping.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-13-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The asm macros used to create the initial ID map and kernel mappings
don't support randomly remapping parts of the address space after it has
been populated. What we can do, however, given that all block or page
mappings are created at the final level, is take a subset of the mapped
range and update its attributes or output address. This will permit us
to make parts of these page tables read-only, or remap a part of it to
cover the device tree.
So add a helper that encapsulates this.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-12-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
In preparation for changing the way we initialize the permanent ID map,
update cpu_replace_ttbr1() so we can use it with the initial ID map as
well.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-11-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
We will be adding an initial ID map that covers the entire kernel image,
so we will pass the actual ID map root table to use to __enable_mmu(),
rather than hard-coding it.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-10-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Some early boot code runs before the virtual placement of the kernel is
finalized, and we used to go back to the very start and recreate the ID
map along with the page tables describing the virtual kernel mapping,
and this involved setting some global variables with the caches off.
In order to ensure that global state created by the KASLR code is not
corrupted by the cache invalidation that occurs in that case, we needed
to clean those global variables to the PoC explicitly.
This is no longer needed now that the ID map is created only once (and
the associated global variable updates are no longer repeated). So drop
the cache maintenance that is no longer necessary.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20220624150651.1358849-9-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Split off the creation of the ID map page tables, so that we can avoid
running it again unnecessarily when KASLR is in effect (which only
randomizes the virtual placement). This will permit us to drop some
explicit cache maintenance to the PoC which was necessary because the
cache invalidation being performed on some global variables might
otherwise clobber unrelated variables that happen to share a cacheline.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-8-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
In a future patch, we will start using an ID map that covers the entire
image, rather than a single page. This means that we need to deal with
the pathological case of an extended ID map where the kernel image does
not fit neatly inside a single entry at the root level, which means we
will need to create additional table entries and map additional pages
for page tables.
The existing map_memory macro already takes care of most of that, so
let's just extend it to deal with this case as well. While at it, drop
the conditional branch on the value of T0SZ: we don't set the variable
anymore in the entry code, and so we can just let the map_memory macro
deal with the case where the output address exceeds VA_BITS.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-7-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Simplify the macros in head.S that are used to set up the early page
tables, by switching to immediates for the number of bits that are
interpreted as the table index at each level. This makes it much
easier to infer from the instruction stream what is going on, and
reduces the number of instructions emitted substantially.
Note that the extended ID map for cases where no additional level needs
to be configured now uses a compile time size as well, which means that
we interpret up to 10 bits as the table index at the root level (for
52-bit physical addressing), without taking into account whether or not
this is supported on the current system. However, those bits can only
be set if we are executing the image from an address that exceeds the
48-bit PA range, and are guaranteed to be cleared otherwise, and given
that we are dealing with a mapping in the lower TTBR0 range of the
address space, the result is therefore the same as if we'd mask off only
6 bits.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-6-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The assignment of idmap_ptrs_per_pgd lacks any cache invalidation, even
though it is updated with the MMU and caches disabled. However, we never
bother to read the value again except in the very next instruction, and
so we can just drop the variable entirely.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20220624150651.1358849-5-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Setting idmap_t0sz involves fiddling with the caches if done with the
MMU off. Since we will be creating an initial ID map with the MMU and
caches off, and the permanent ID map with the MMU and caches on, let's
move this assignment of idmap_t0sz out of the startup code, and replace
it with a macro that simply issues the three instructions needed to
calculate the value wherever it is needed before the MMU is turned on.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-4-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Currently, we only support 52-bit virtual addressing on 64k page
configurations, and in all other cases, vabits_actual is guaranteed to
equal VA_BITS (== VA_BITS_MIN). So get rid of the variable entirely in
that case.
While at it, move the assignment out of the asm entry code - it has no
need to be there.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-3-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
This variable definition does not need to be in head.S so move it out.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20220624150651.1358849-2-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com>
Link: https://lore.kernel.org/r/20220605091503.12513-1-wangxiang@cdjrlc.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
We hit a NULL pointer dereference when resizing a corrupt ext4 image
whose resize_inode feature has just been cleared (without running
e2fsck). It can easily be reproduced by the following steps. The problem
is that once the resize_inode feature is cleared, ext4_resize_fs()
converts the filesystem to meta_bg mode, but es->s_reserved_gdt_blocks
is not reduced to zero, so we may mistakenly call reserve_backup_gdb()
and pass an uninitialized resize_inode to it when adding new group
descriptors.
mkfs.ext4 /dev/sda 3G
tune2fs -O ^resize_inode /dev/sda #forget to run requested e2fsck
mount /dev/sda /mnt
resize2fs /dev/sda 8G
========
BUG: kernel NULL pointer dereference, address: 0000000000000028
CPU: 19 PID: 3243 Comm: resize2fs Not tainted 5.18.0-rc7-00001-gfde086c5ebfd #748
...
RIP: 0010:ext4_flex_group_add+0xe08/0x2570
...
Call Trace:
<TASK>
ext4_resize_fs+0xbec/0x1660
__ext4_ioctl+0x1749/0x24e0
ext4_ioctl+0x12/0x20
__x64_sys_ioctl+0xa6/0x110
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f2dd739617b
========
The fix is simple: add a check in ext4_resize_begin() to make sure that
es->s_reserved_gdt_blocks is zero when the resize_inode feature is
disabled.
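A minimal sketch of such a check (the error message and return value
below are illustrative):

  if (!ext4_has_feature_resize_inode(sb) &&
      le16_to_cpu(es->s_reserved_gdt_blocks)) {
          ext4_error(sb, "resize_inode disabled but reserved GDT blocks non-zero");
          return -EFSCORRUPTED;
  }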
Cc: stable@kernel.org
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220601092717.763694-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Since dx_make_map() may now return -EFSCORRUPTED, change "count" to a
signed integer so we can correctly check for an error code returned
by dx_make_map().
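Sketched out, the idea at the call site is simply (argument list and
error handling simplified):

  int count;      /* signed, so it can hold a negative errno */

  count = dx_make_map(dir, bh, hinfo, map);
  if (count < 0)
          goto out;       /* propagate e.g. -EFSCORRUPTED, not a bogus count */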
Fixes: 46c116b920eb ("ext4: verify dir block before splitting it")
Cc: stable@kernel.org
Signed-off-by: Ding Xiang <dingxiang@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20220530100047.537598-1-dingxiang@cmss.chinamobile.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
ext4_mb_normalize_request() can move the logical start of the allocated
blocks to reduce fragmentation and better utilize preallocation. However,
the logical block requested as the start of the allocation
(ac->ac_o_ex.fe_logical) should always be covered by the allocated
blocks, so check for that by changing '&&' to '||' in the assertion.
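Concretely, the assertion changes along these lines (context trimmed):

  /* before: only fired when both conditions held */
  BUG_ON(start + size <= ac->ac_o_ex.fe_logical &&
         start > ac->ac_o_ex.fe_logical);

  /* after: either condition means fe_logical is not covered */
  BUG_ON(start + size <= ac->ac_o_ex.fe_logical ||
         start > ac->ac_o_ex.fe_logical);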
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220528110017.354175-3-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Hulk Robot reported a BUG_ON:
==================================================================
kernel BUG at fs/ext4/mballoc.c:3211!
[...]
RIP: 0010:ext4_mb_mark_diskspace_used.cold+0x85/0x136f
[...]
Call Trace:
ext4_mb_new_blocks+0x9df/0x5d30
ext4_ext_map_blocks+0x1803/0x4d80
ext4_map_blocks+0x3a4/0x1a10
ext4_writepages+0x126d/0x2c30
do_writepages+0x7f/0x1b0
__filemap_fdatawrite_range+0x285/0x3b0
file_write_and_wait_range+0xb1/0x140
ext4_sync_file+0x1aa/0xca0
vfs_fsync_range+0xfb/0x260
do_fsync+0x48/0xa0
[...]
==================================================================
The above issue may happen as follows:
-------------------------------------
do_fsync
vfs_fsync_range
ext4_sync_file
file_write_and_wait_range
__filemap_fdatawrite_range
do_writepages
ext4_writepages
mpage_map_and_submit_extent
mpage_map_one_extent
ext4_map_blocks
ext4_mb_new_blocks
ext4_mb_normalize_request
>>> start + size <= ac->ac_o_ex.fe_logical
ext4_mb_regular_allocator
ext4_mb_simple_scan_group
ext4_mb_use_best_found
ext4_mb_new_preallocation
ext4_mb_new_inode_pa
ext4_mb_use_inode_pa
>>> set ac->ac_b_ex.fe_len <= 0
ext4_mb_mark_diskspace_used
>>> BUG_ON(ac->ac_b_ex.fe_len <= 0);
We can easily reproduce this problem with the following commands:
`fallocate -l100M disk`
`mkfs.ext4 -b 1024 -g 256 disk`
`mount disk /mnt`
`fsstress -d /mnt -l 0 -n 1000 -p 1`
The size must be smaller than or equal to EXT4_BLOCKS_PER_GROUP, so
"start + size <= ac->ac_o_ex.fe_logical" may occur when the size is
truncated. Therefore, start should be the start position of the group
where ac_o_ex.fe_logical is located, after alignment. In addition, when
the value of fe_logical or EXT4_BLOCKS_PER_GROUP is very large, the
value calculated from start_off is more accurate.
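A sketch of the intent (not the actual patch; the helper used here is
only illustrative):

  /* align start down to the group containing the requested block, so
   * that ac_o_ex.fe_logical always falls inside [start, start + size) */
  start = rounddown(ac->ac_o_ex.fe_logical,
                    EXT4_BLOCKS_PER_GROUP(ac->ac_sb));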
Cc: stable@kernel.org
Fixes: cd648b8a8fd5 ("ext4: trim allocation requests to group size")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220528110017.354175-2-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Since ext4 was converted to the new mount API, the test_dummy_encryption
mount option isn't being handled entirely correctly, because the needed
fscrypt_set_test_dummy_encryption() helper function combines
parsing/checking/applying into one function. That doesn't work well
with the new mount API, which splits these into separate steps.
This was sort of okay anyway, due to the parsing logic that was copied
from fscrypt_set_test_dummy_encryption() into ext4_parse_param(),
combined with an additional check in ext4_check_test_dummy_encryption().
However, these overlooked the case of changing the value of
test_dummy_encryption on remount, which isn't allowed but which ext4
wasn't detecting until ext4_apply_options(), when it's too late to fail.
Another bug is that if test_dummy_encryption was specified multiple
times with an argument, memory was leaked.
Fix this up properly by using the new helper functions that allow
splitting up the parse/check/apply steps for test_dummy_encryption.
Fixes: cebe85d570cf ("ext4: switch to the new mount api")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20220526040412.173025-1-ebiggers@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Replace kmalloc + memcpy with kmemdup()
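The transformation is the usual one (variable names are illustrative):

  /* before */
  new = kmalloc(len, GFP_NOFS);
  if (new)
          memcpy(new, src, len);

  /* after */
  new = kmemdup(src, len, GFP_NOFS);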
Signed-off-by: Shuqi Zhang <zhangshuqi3@huawei.com>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220525030120.803330-1-zhangshuqi3@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
We got the following issue:
[home]# mount /dev/sda test
EXT4-fs (sda): warning: mounting fs with errors, running e2fsck is recommended
[home]# dmesg
EXT4-fs (sda): warning: mounting fs with errors, running e2fsck is recommended
EXT4-fs (sda): Errors on filesystem, clearing orphan list.
EXT4-fs (sda): recovery complete
EXT4-fs (sda): mounted filesystem with ordered data mode. Quota mode: none.
[home]# debugfs /dev/sda
debugfs 1.46.5 (30-Dec-2021)
Checksum errors in superblock! Retrying...
The reason is that ext4_orphan_cleanup() resets 's_last_orphan' but does
not update the superblock checksum.
To solve the above issue, defer updating the superblock checksum until
after ext4_orphan_cleanup().
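A sketch of the idea (the surrounding mount code is simplified here):

  ext4_orphan_cleanup(sb, es);
  /* s_last_orphan was reset above, so recompute the checksum now */
  ext4_superblock_csum_set(sb);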
Signed-off-by: Ye Bin <yebin10@huawei.com>
Cc: stable@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220525012904.1604737-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
cifs_ses_get_chan_index gets the index for a given server pointer.
When a match is not found, we warn about a possible bug.
However, printing details about the non-matching server would be more
useful for debugging here.
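For instance, the warning could identify the server being looked up,
along these lines (which fields to print is an assumption, not the
actual patch):

  cifs_dbg(VFS, "unable to get chan index for server: 0x%llx\n",
           server->conn_id);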
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
load_unaligned_zeropad() can lead to unwanted loads across page boundaries.
The unwanted loads are typically harmless. But, they might be made to
totally unrelated or even unmapped memory. load_unaligned_zeropad()
relies on exception fixup (#PF, #GP and now #VE) to recover from these
unwanted loads.
In TDX guests, the second page can be a shared page and a VMM may
configure it to trigger #VE.
The kernel assumes that a #VE on a shared page is an MMIO access and
tries to decode the instruction to handle it. In the case of
load_unaligned_zeropad() this results in confusion, as it is not an MMIO
access.
Fix it by detecting split page MMIO accesses and failing them.
load_unaligned_zeropad() will recover using exception fixups.
The issue was discovered by analysis and reproduced artificially. It was
not triggered during testing.
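The check itself is small; a sketch of the idea in the MMIO #VE handler
(variable names assumed):

  /* MMIO accesses are naturally aligned and never cross a page boundary;
   * if this one does, it came from load_unaligned_zeropad(), so fail the
   * emulation and let its exception fixup recover instead. */
  if (vaddr / PAGE_SIZE != (vaddr + size - 1) / PAGE_SIZE)
          return -EFAULT;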
[ dhansen: fix up changelogs and comments for grammar and clarity,
plus incorporate Kirill's off-by-one fix]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20220614120135.14812-4-kirill.shutemov@linux.intel.com
|
|
This reverts commit 4c5e242d3e93.
Prior to 4c5e242d3e93 ("x86/PCI: Clip only host bridge windows for E820
regions"), E820 regions did not affect PCI host bridge windows. We only
looked at E820 regions and avoided them when allocating new MMIO space.
If firmware PCI bridge window and BAR assignments used E820 regions, we
left them alone.
After 4c5e242d3e93, we removed E820 regions from the PCI host bridge
windows before looking at BARs, so firmware assignments in E820 regions
looked like errors, and we moved things around to fit in the space left
(if any) after removing the E820 regions. This unnecessary BAR
reassignment broke several machines.
Guilherme reported that Steam Deck fails to boot after 4c5e242d3e93. We
clipped the window that contained most 32-bit BARs:
BIOS-e820: [mem 0x00000000a0000000-0x00000000a00fffff] reserved
acpi PNP0A08:00: clipped [mem 0x80000000-0xf7ffffff window] to [mem 0xa0100000-0xf7ffffff window] for e820 entry [mem 0xa0000000-0xa00fffff]
which forced us to reassign all those BARs, for example, this NVMe BAR:
pci 0000:00:01.2: PCI bridge to [bus 01]
pci 0000:00:01.2: bridge window [mem 0x80600000-0x806fffff]
pci 0000:01:00.0: BAR 0: [mem 0x80600000-0x80603fff 64bit]
pci 0000:00:01.2: can't claim window [mem 0x80600000-0x806fffff]: no compatible bridge window
pci 0000:01:00.0: can't claim BAR 0 [mem 0x80600000-0x80603fff 64bit]: no compatible bridge window
pci 0000:00:01.2: bridge window: assigned [mem 0xa0100000-0xa01fffff]
pci 0000:01:00.0: BAR 0: assigned [mem 0xa0100000-0xa0103fff 64bit]
All the reassignments were successful, so the devices should have been
functional at the new addresses, but some were not.
Andy reported a similar failure on an Intel MID platform. Benjamin
reported a similar failure on a VMWare Fusion VM.
Note: this is not a clean revert; this revert keeps the later change to
make the clipping dependent on a new pci_use_e820 bool, moving the checking
of this bool to arch_remove_reservations().
[bhelgaas: commit log, add more reporters and testers]
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216109
Reported-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reported-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reported-by: Benjamin Coddington <bcodding@redhat.com>
Reported-by: Jongman Heo <jongman.heo@gmail.com>
Fixes: 4c5e242d3e93 ("x86/PCI: Clip only host bridge windows for E820 regions")
Link: https://lore.kernel.org/r/20220612144325.85366-1-hdegoede@redhat.com
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
When using the streaming DMA API to map a buffer prior to inbound
non-coherent DMA (i.e. DMA_FROM_DEVICE), arch_sync_dma_for_device()
invalidates any dirty CPU cachelines so that they will not be written
back during the transfer and corrupt the buffer contents written by the
DMA. This, however, poses two potential problems:
potential problems:
(1) If the DMA transfer does not write to every byte in the buffer,
then the unwritten bytes will contain stale data once the transfer
has completed.
(2) If the buffer has a virtual alias in userspace, then stale data
may be visible via this alias during the period between performing
the cache invalidation and the DMA writes landing in memory.
Address both of these issues by cleaning (aka writing-back) the dirty
lines in arch_sync_dma_for_device(DMA_FROM_DEVICE) instead of discarding
them using invalidation.
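On arm64 this roughly amounts to the following (a sketch assuming the
usual cache maintenance helpers; the real change may be in assembly):

  void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
                                enum dma_data_direction dir)
  {
          unsigned long start = (unsigned long)phys_to_virt(paddr);

          /* clean (write back) rather than invalidate for DMA_FROM_DEVICE */
          dcache_clean_poc(start, start + size);
  }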
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220606152150.GA31568@willie-the-truck
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220610151228.4562-2-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Notes are better expressed with reST admonitions.
Fixes: f23b22599f8e ("Documentation/zh_CN: Add basic LoongArch documentations")
Reviewed-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Notes are better expressed with reST admonitions.
Fixes: 0ea8ce61cb2c ("Documentation: LoongArch: Add basic documentations")
Reviewed-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Commit c604abc3f6e ("vmlinux.lds.h: Split ELF_DETAILS from STABS_DEBUG")
splits ELF_DETAILS from STABS_DEBUG, resulting in missing ELF_DETAILS
information in the LoongArch architecture, so add it.
Fixes: c604abc3f6e ("vmlinux.lds.h: Split ELF_DETAILS from STABS_DEBUG")
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
io_arm_poll_handler() will recycle the buffer appropriately if we end
up arming poll (or if we're ready to retry), but not for the io-wq case
if we have attempted poll first.
Explicitly recycle the buffer to avoid both hanging on to it for too
long and multiple reads grabbing the same one. This can happen
for ring mapped buffers, since it hasn't necessarily been committed.
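Assuming the existing buffer-recycling helper is reused, the fix is
conceptually just (call site and flags are assumptions):

  /* hand the selected buffer back before punting the request to io-wq */
  io_kbuf_recycle(req, issue_flags);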
Fixes: c7fb19428d67 ("io_uring: add support for ring mapped supplied buffers")
Link: https://github.com/axboe/liburing/issues/605
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The code in dm-log rounds up bitset_size to 32 bits. It then uses
find_next_zero_bit_le on the allocated region. find_next_zero_bit_le
accesses the bitmap using unsigned long pointers. So, on 64-bit
architectures, it may access 4 bytes beyond the allocated size.
Fix this bug by rounding up bitset_size to BITS_PER_LONG.
This bug was found by running the lvm2 testsuite with kasan.
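A sketch of the sizing change (simplified relative to the real dm-log
code):

  /* round the bit count up to whole longs before converting to bytes, so
   * find_next_zero_bit_le() never reads past the allocation */
  bitset_size = roundup(region_count, BITS_PER_LONG) / BITS_PER_BYTE;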
Fixes: 29121bd0b00e ("[PATCH] dm mirror log: bitset_size fix")
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|