|
FF-A function IDs and error codes will be needed in the hypervisor too,
so move them to the header file where they can be shared. Rename the
version constants with an "FFA_" prefix so that they are less likely
to clash with other code in the tree.
Co-developed-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20221116170335.2341003-2-qperret@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
If a Cortex-A715 CPU sees a page mapping's permissions change from executable
to non-executable, it may corrupt the ESR_ELx and FAR_ELx registers on the
next instruction abort caused by a permission fault.
Only user space performs the executable to non-executable permission
transition, via the mprotect() system call, which calls the
ptep_modify_prot_start() and ptep_modify_prot_commit() helpers while changing
the page mapping. The platform code can override these helpers via
__HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION.
Work around the problem by doing a break-before-make TLB invalidation for
all executable user-space mappings that go through the mprotect() system
call. This is done by overriding ptep_modify_prot_start() and
ptep_modify_prot_commit(), defining __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
on the platform, thus giving an opportunity to intercept user-space exec
mappings and do the necessary TLB invalidation. Similar interceptions are
also implemented for HugeTLB.
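For reference, the user-space path that exercises this transition can be
reproduced with a minimal (hypothetical) test that maps an executable page
and then downgrades it with mprotect():

  #include <stdio.h>
  #include <sys/mman.h>
  #include <unistd.h>

  int main(void)
  {
      long pagesz = sysconf(_SC_PAGESIZE);
      void *p = mmap(NULL, pagesz, PROT_READ | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

      if (p == MAP_FAILED) {
          perror("mmap");
          return 1;
      }

      /* executable -> non-executable: this is the path routed through
       * ptep_modify_prot_start()/ptep_modify_prot_commit() */
      if (mprotect(p, pagesz, PROT_READ)) {
          perror("mprotect");
          return 1;
      }

      munmap(p, pagesz);
      return 0;
  }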
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221116140915.356601-3-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Add the CPU part numbers for the new Arm designs.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20221116140915.356601-2-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Add missing kerneldoc and fix the alignment of one of the arguments of
the apmt_add_platform_device() function.
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Link: https://lore.kernel.org/r/20221111234323.16182-1-bwicaksono@nvidia.com
[will: Fixed up additional indentation issue]
Signed-off-by: Will Deacon <will@kernel.org>
|
|
FPDT provides some boot timing records useful for analyzing
parts of the UEFI boot stack. Given that the existing code works
on arm64 and allows reading the values without utilizing
/dev/mem, it seems like a good idea to turn it on.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20221109174720.203723-1-jeremy.linton@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Implement dynamic shadow call stack support on Clang, by parsing the
unwind tables at init time to locate all occurrences of PACIASP/AUTIASP
instructions, and replacing them with the shadow call stack push and pop
instructions, respectively.
This is useful because the overhead of the shadow call stack is
difficult to justify on hardware that implements pointer authentication
(PAC), and given that the PAC instructions are executed as NOPs on
hardware that doesn't, we can just replace them without breaking
anything. As PACIASP/AUTIASP are guaranteed to be paired with respect to
manipulations of the return address, replacing them 1:1 with shadow call
stack pushes and pops is guaranteed to result in the desired behavior.
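Conceptually, the patching step boils down to the sketch below. The opcode
values were computed by hand and should be treated as illustrative; the real
implementation locates the instructions via the unwind tables and uses the
proper text-patching and cache-maintenance helpers rather than a linear scan.

  #include <linux/types.h>

  #define INSN_PACIASP   0xd503233fU  /* hint #25 */
  #define INSN_AUTIASP   0xd50323bfU  /* hint #29 */
  #define INSN_SCS_PUSH  0xf800865eU  /* str x30, [x18], #8 */
  #define INSN_SCS_POP   0xf85f8e5eU  /* ldr x30, [x18, #-8]! */

  static void scs_patch_range(u32 *start, u32 *end)
  {
      for (u32 *p = start; p < end; p++) {
          if (*p == INSN_PACIASP)
              *p = INSN_SCS_PUSH;   /* push return address to the shadow stack */
          else if (*p == INSN_AUTIASP)
              *p = INSN_SCS_POP;    /* pop it back before returning */
      }
  }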
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20221027155908.1940624-4-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
In order to allow arches to use code patching to conditionally emit the
shadow stack pushes and pops, rather than always taking the performance
hit even on CPUs that implement alternatives such as stack pointer
authentication on arm64, add a Kconfig symbol that can be set by the
arch to omit the SCS codegen itself, without otherwise affecting how the
SCS support code and compiler options (for register reservation, for
instance) are emitted.
Also, add a static key and some plumbing to omit the allocation of
shadow call stack for dynamic SCS configurations if SCS is disabled at
runtime.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20221027155908.1940624-3-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Enable asynchronous unwind table generation for both the core kernel and
modules, and emit the resulting .eh_frame sections as init code
so we can use the unwind directives for code patching at boot or module
load time.
This will be used by dynamic shadow call stack support, which will rely
on code patching rather than compiler codegen to emit the shadow call
stack push and pop instructions.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20221027155908.1940624-2-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Add coverage for FEAT_SVE2p1.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221017152520.1039165-7-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
FEAT_SVE2p1 introduces a number of new SVE instructions. Since there is no
new architectural state added, kernel support is simply a new hwcap which
lets userspace know that the feature is supported.
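From user space on an arm64 system, discovery would look something like the
following sketch, assuming the hwcap is exposed as HWCAP2_SVE2P1 in the uapi
headers:

  #include <stdio.h>
  #include <sys/auxv.h>
  #include <asm/hwcap.h>

  int main(void)
  {
      unsigned long hwcap2 = getauxval(AT_HWCAP2);

  #ifdef HWCAP2_SVE2P1
      printf("SVE2.1 %ssupported\n", (hwcap2 & HWCAP2_SVE2P1) ? "" : "not ");
  #else
      (void)hwcap2;
      puts("headers predate HWCAP2_SVE2P1");
  #endif
      return 0;
  }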
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221017152520.1039165-6-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Since the newly added instruction is in the HINT space, we can't reasonably
test for it actually being present.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221017152520.1039165-5-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
FEAT_RPRFM adds a new range prefetch hint within the existing PRFM space.
Add a new hwcap to allow userspace to discover
support for the new instruction.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221017152520.1039165-4-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Add FEAT_CSSC to the set of features checked by the hwcap selftest.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221017152520.1039165-3-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
FEAT_CSSC adds a number of new instructions usable to optimise common short
sequences of instructions. Add a hwcap indicating that the feature is
available and can be used by userspace.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221017152520.1039165-2-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Currently, for reasons lost in the mists of time, the kernel_neon_ APIs are
EXPORT_SYMBOL(), but the general policy for floating-point usage is that it
should be GPL only, given the non-standard runtime environment that holds
while it is in use and the PCS impact when code is compiled for FP usage.
Given the limited existing deployment of non-GPL modules for arm64 and the
fact that other architectures like x86 already make their equivalent
functions GPL only, this is not expected to be disruptive to existing users.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221107170747.276910-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The ARM architecture revision v8.4 introduces a data independent timing
control (DIT) which can be set at any exception level, and instructs the
CPU to avoid optimizations that may result in a correlation between the
execution time of certain instructions and the value of the data they
operate on.
The DIT bit is part of PSTATE, and is therefore context switched as
usual, given that it becomes part of the saved program state (SPSR) when
taking an exception. We have also defined a hwcap for DIT, so user
space can already discover whether or not DIT is available. This means
that, as far as user space is concerned, DIT is wired up and fully
functional.
In the kernel, however, we never bothered with DIT: we disable it at
boot (i.e., INIT_PSTATE_EL1 has DIT cleared) and ignore the fact that we
might run with DIT enabled if user space happened to set it.
Currently, we have no idea whether or not running privileged code with
DIT disabled on a CPU that implements support for it may result in a
side channel that exposes privileged data to unprivileged user space
processes, so let's be cautious and just enable DIT while running in the
kernel if supported by all CPUs.
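For illustration, user space can already query the feature and the current
PSTATE.DIT value along these lines, assuming an arm64 toolchain whose
assembler knows the DIT register name (e.g. with -march=armv8.4-a):

  #include <stdio.h>
  #include <sys/auxv.h>
  #include <asm/hwcap.h>

  int main(void)
  {
      unsigned long dit;

      if (!(getauxval(AT_HWCAP) & HWCAP_DIT)) {
          puts("DIT not implemented");
          return 0;
      }

      asm volatile("mrs %0, dit" : "=r" (dit));
      printf("PSTATE.DIT = %lu\n", dit >> 24);  /* DIT is PSTATE bit 24 */
      return 0;
  }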
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Adam Langley <agl@google.com>
Link: https://lore.kernel.org/all/YwgCrqutxmX0W72r@gmail.com/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20221107172400.1851434-1-ardb@kernel.org
[will: Removed cpu_has_dit() as per Mark's suggestion on the list]
Signed-off-by: Will Deacon <will@kernel.org>
|
|
One of the failure paths in the arm_pmu ACPI code is missing an early
return, permitting a NULL pointer dereference upon a memory allocation
failure.
Add the missing return.
Fixes: fe40ffdb7656 ("arm_pmu: rework ACPI probing")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221108093725.1239563-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The current ACPI PMU probing logic tries to associate PMUs with CPUs
when the CPU is first brought online, in order to handle late hotplug,
though PMUs are only registered during early boot, and so for late
hotplugged CPUs this can only associate the CPU with an existing PMU.
We tried to be clever and have the arm_pmu_acpi_cpu_starting()
callback allocate a struct arm_pmu when no matching instance is found,
in order to avoid duplication of logic. However, as above this doesn't
do anything useful for late hotplugged CPUs, and this requires us to
allocate memory in an atomic context, which is especially problematic
for PREEMPT_RT, as reported by Valentin and Pierre.
This patch reworks the probing to detect PMUs for all online CPUs in the
arm_pmu_acpi_probe() function, which is more aligned with how DT probing
works. The arm_pmu_acpi_cpu_starting() callback only tries to associate
CPUs with an existing arm_pmu instance, avoiding the problem of
allocating in atomic context.
Note that as we didn't previously register PMUs for late-hotplugged
CPUs, this change doesn't result in a loss of existing functionality,
though we will now warn when we cannot associate a CPU with a PMU.
This change allows us to pull the hotplug callback registration into the
arm_pmu_acpi_probe() function, as we no longer need the callbacks to be
invoked shortly after probing the boot CPUs, and can register it without
invoking the calls.
For the moment the arm_pmu_acpi_init() initcall remains to register the
SPE PMU, though in future this should probably be moved elsewhere (e.g.
the arm64 ACPI init code), since this doesn't need to be tied to the
regular CPU PMU code.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lore.kernel.org/r/20210810134127.1394269-2-valentin.schneider@arm.com/
Reported-by: Pierre Gondois <pierre.gondois@arm.com>
Link: https://lore.kernel.org/linux-arm-kernel/20220912155105.1443303-1-pierre.gondois@arm.com/
Cc: Pierre Gondois <pierre.gondois@arm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-and-tested-by: Pierre Gondois <pierre.gondois@arm.com>
Link: https://lore.kernel.org/r/20220930111844.1522365-4-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
A subsequent patch will rework the ACPI probing of PMUs, and we'll need
to match a CPU with a known cpuid in two separate paths.
Factor out the matching logic into a helper function so that it can be
reused.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Pierre Gondois <pierre.gondois@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-and-tested-by: Pierre Gondois <pierre.gondois@arm.com>
Link: https://lore.kernel.org/r/20220930111844.1522365-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
A subsequent patch will rework the ACPI probing of PMUs, and we'll need
to associate a CPU with a PMU in two separate paths.
Factor out the association logic into a helper function so that it can
be reused.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Pierre Gondois <pierre.gondois@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-and-tested-by: Pierre Gondois <pierre.gondois@arm.com>
Link: https://lore.kernel.org/r/20220930111844.1522365-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Inspired by x86 commit 864b435514b2 ("x86/jump_label: Mark arguments as
const to satisfy asm constraints"), constify the alternative_has_feature_*
argument to satisfy asm constraints. Steven in [1] also pointed out that
"The "i" constraint needs to be a constant."
Tested by building a simple external kernel module with -O0.
Before the patch, gcc produced warnings and errors similar to those below:
In file included from <command-line>:
In function ‘alternative_has_feature_likely’,
inlined from ‘system_capabilities_finalized’ at
arch/arm64/include/asm/cpufeature.h:440:9,
inlined from ‘arm64_preempt_schedule_irq’ at
arch/arm64/kernel/entry-common.c:264:6:
include/linux/compiler_types.h:285:33: warning:
‘asm’ operand 0 probably does not match constraints
285 | #define asm_volatile_goto(x...) asm goto(x)
| ^~~
arch/arm64/include/asm/alternative-macros.h:232:9:
note: in expansion of macro ‘asm_volatile_goto’
232 | asm_volatile_goto(
| ^~~~~~~~~~~~~~~~~
include/linux/compiler_types.h:285:33: error:
impossible constraint in ‘asm’
285 | #define asm_volatile_goto(x...) asm goto(x)
| ^~~
arch/arm64/include/asm/alternative-macros.h:232:9:
note: in expansion of macro ‘asm_volatile_goto’
232 | asm_volatile_goto(
| ^~~~~~~~~~~~~~~~~
After the patch, the simple external test kernel module builds fine
with -O0.
[1]https://lore.kernel.org/all/20210212094059.5f8d05e8@gandalf.local.home/
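A minimal sketch of the idea (not the actual arm64 macro): with the
parameter declared const, the compiler can still satisfy the "i" (immediate)
constraint even when an out-of-tree module is built at -O0.

  static __always_inline bool has_cap(const unsigned long cap)
  {
      asm goto("nop	// capability %c[cap], patched at boot"
               : : [cap] "i" (cap) : : l_yes);
      return false;
  l_yes:
      return true;
  }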
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20221006075542.2658-3-jszhang@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Inspired by x86 commit 864b435514b2 ("x86/jump_label: Mark arguments as
const to satisfy asm constraints"), mark arch_static_branch()'s and
arch_static_branch_jump()'s arguments as const to satisfy asm
constraints. And Steven in [1] also pointed out that "The "i"
constraint needs to be a constant."
Tested by building a simple external kernel module with -O0.
[1]https://lore.kernel.org/all/20210212094059.5f8d05e8@gandalf.local.home/
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20221006075542.2658-2-jszhang@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
IORT E.e now allows SMMUv3 nodes to describe the DeviceID for MSIs
independently of wired GSIVs, where the previous oddly-restrictive
definition meant that an SMMU without PRI support had to provide a
DeviceID even if it didn't support MSIs either. Support this, with
the usual temporary flag definition while the real one is making
its way through ACPICA.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/r/4b3e2ead4f392d1a47a7528da119d57918e5d806.1664392886.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The ARM Performance Monitoring Unit Table (APMT) describes the properties
of PMU support in ARM-based systems. The APMT table contains a list of
nodes, each representing a PMU in the system that conforms to the ARM
CoreSight PMU architecture. The properties of each node include information
required
to access the PMU (e.g. MMIO base address, interrupt number) and also
identification. For more detailed information, please refer to the
specification below:
* APMT: https://developer.arm.com/documentation/den0117/latest
* ARM Coresight PMU:
https://developer.arm.com/documentation/ihi0091/latest
The initial support adds the detection of APMT table and generic
infrastructure to create platform devices for ARM CoreSight PMUs.
Similar to IORT, the root pointer of APMT is preserved during runtime
and each PMU platform device is given a pointer to the corresponding
APMT node.
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20220929002834.32664-1-bwicaksono@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
With the new fortify string system, rework the memcpy to avoid this
warning:
memcpy: detected field-spanning write (size 60) of single field "&raw_inode->i_generation" at fs/ext4/fast_commit.c:1551 (size 4)
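The general pattern for silencing such warnings (shown here as an
illustration, not as the exact ext4 change) is to group the members that are
legitimately written as a unit, so FORTIFY_SOURCE sees a single destination
field of the right size:

  #include <linux/stddef.h>
  #include <linux/string.h>
  #include <linux/types.h>

  struct raw_record {
      __le32  magic;
      struct_group(payload,          /* members copied as one unit */
          __le32  generation;
          __le32  blocks;
          __u8    body[52];
      );
      __le32  checksum;
  };

  static inline void fill_record(struct raw_record *rec, const void *src)
  {
      memcpy(&rec->payload, src, sizeof(rec->payload));
  }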
Cc: stable@kernel.org
Fixes: 54d9469bc515 ("fortify: Add run-time WARN for cross-field memcpy()")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The return value is wrong in ext4_load_and_init_journal(). The local
variable 'err' needs to be initialized before 'goto out'. The original code
in __ext4_fill_super() is fine because it has two return values 'ret'
and 'err' and 'ret' is initialized as -EINVAL. After we factor out
ext4_load_and_init_journal(), this code is broken. So fix it by directly
returning -EINVAL in the error handler path.
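A simplified sketch of the before/after shape (helper names are made up for
illustration):

  static int load_and_init_journal(struct super_block *sb)
  {
      int err;

      if (!journal_usable(sb)) {          /* hypothetical check */
          /* Fix: set a concrete error; the broken version jumped to
           * 'out' here with 'err' still uninitialized. */
          err = -EINVAL;
          goto out;
      }

      err = setup_journal(sb);            /* hypothetical helper */
      if (err)
          goto out;

      return 0;
  out:
      return err;
  }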
Cc: stable@kernel.org
Fixes: 9c1dd22d7422 ("ext4: factor out ext4_load_and_init_journal()")
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20221025040206.3134773-1-yanaijie@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Syzkaller reported the following issue:
EXT4-fs (loop0): Free/Dirty block details
EXT4-fs (loop0): free_blocks=0
EXT4-fs (loop0): dirty_blocks=0
EXT4-fs (loop0): Block reservation details
EXT4-fs (loop0): i_reserved_data_blocks=0
EXT4-fs warning (device loop0): ext4_da_release_space:1527: ext4_da_release_space: ino 18, to_free 1 with only 0 reserved data blocks
------------[ cut here ]------------
WARNING: CPU: 0 PID: 92 at fs/ext4/inode.c:1528 ext4_da_release_space+0x25e/0x370 fs/ext4/inode.c:1524
Modules linked in:
CPU: 0 PID: 92 Comm: kworker/u4:4 Not tainted 6.0.0-syzkaller-09423-g493ffd6605b2 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: writeback wb_workfn (flush-7:0)
RIP: 0010:ext4_da_release_space+0x25e/0x370 fs/ext4/inode.c:1528
RSP: 0018:ffffc900015f6c90 EFLAGS: 00010296
RAX: 42215896cd52ea00 RBX: 0000000000000000 RCX: 42215896cd52ea00
RDX: 0000000000000000 RSI: 0000000080000001 RDI: 0000000000000000
RBP: 1ffff1100e907d96 R08: ffffffff816aa79d R09: fffff520002bece5
R10: fffff520002bece5 R11: 1ffff920002bece4 R12: ffff888021fd2000
R13: ffff88807483ecb0 R14: 0000000000000001 R15: ffff88807483e740
FS: 0000000000000000(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005555569ba628 CR3: 000000000c88e000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
ext4_es_remove_extent+0x1ab/0x260 fs/ext4/extents_status.c:1461
mpage_release_unused_pages+0x24d/0xef0 fs/ext4/inode.c:1589
ext4_writepages+0x12eb/0x3be0 fs/ext4/inode.c:2852
do_writepages+0x3c3/0x680 mm/page-writeback.c:2469
__writeback_single_inode+0xd1/0x670 fs/fs-writeback.c:1587
writeback_sb_inodes+0xb3b/0x18f0 fs/fs-writeback.c:1870
wb_writeback+0x41f/0x7b0 fs/fs-writeback.c:2044
wb_do_writeback fs/fs-writeback.c:2187 [inline]
wb_workfn+0x3cb/0xef0 fs/fs-writeback.c:2227
process_one_work+0x877/0xdb0 kernel/workqueue.c:2289
worker_thread+0xb14/0x1330 kernel/workqueue.c:2436
kthread+0x266/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>
The above issue may happen as follows:
ext4_da_write_begin
ext4_create_inline_data
ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS);
ext4_set_inode_flag(inode, EXT4_INODE_INLINE_DATA);
__ext4_ioctl
ext4_ext_migrate -> will lead to eh->eh_entries not zero, and set extent flag
ext4_da_write_begin
ext4_da_convert_inline_data_to_extent
ext4_da_write_inline_data_begin
ext4_da_map_blocks
ext4_insert_delayed_block
if (!ext4_es_scan_clu(inode, &ext4_es_is_delonly, lblk))
if (!ext4_es_scan_clu(inode, &ext4_es_is_mapped, lblk))
ext4_clu_mapped(inode, EXT4_B2C(sbi, lblk)); -> will return 1
allocated = true;
ext4_es_insert_delayed_block(inode, lblk, allocated);
ext4_writepages
mpage_map_and_submit_extent(handle, &mpd, &give_up_on_write); -> return -ENOSPC
mpage_release_unused_pages(&mpd, give_up_on_write); -> give_up_on_write == 1
ext4_es_remove_extent
ext4_da_release_space(inode, reserved);
if (unlikely(to_free > ei->i_reserved_data_blocks))
-> to_free == 1 but ei->i_reserved_data_blocks == 0
-> then trigger warning as above
To solve the above issue, forbid migration for inodes which have inline data.
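In other words, the guard amounts to a check of this shape early in
ext4_ext_migrate() (the exact errno is an assumption here):

      if (ext4_has_inline_data(inode))
          return -EOPNOTSUPP;   /* refuse extent migration for inline-data inodes */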
Cc: stable@kernel.org
Reported-by: syzbot+c740bb18df70ad00952e@syzkaller.appspotmail.com
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20221018022701.683489-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The rec_len field in the directory entry has to be a multiple of 4. A
corrupted filesystem image can be used to hit a BUG() in
ext4_rec_len_to_disk(), called from make_indexed_dir().
------------[ cut here ]------------
kernel BUG at fs/ext4/ext4.h:2413!
...
RIP: 0010:make_indexed_dir+0x53f/0x5f0
...
Call Trace:
<TASK>
? add_dirent_to_buf+0x1b2/0x200
ext4_add_entry+0x36e/0x480
ext4_add_nondir+0x2b/0xc0
ext4_create+0x163/0x200
path_openat+0x635/0xe90
do_filp_open+0xb4/0x160
? __create_object.isra.0+0x1de/0x3b0
? _raw_spin_unlock+0x12/0x30
do_sys_openat2+0x91/0x150
__x64_sys_open+0x6c/0xa0
do_syscall_64+0x3c/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
The fix simply adds a call to ext4_check_dir_entry() to validate the
directory entry, returning -EFSCORRUPTED if the entry is invalid.
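The shape of the added check is roughly the following (argument names are
illustrative; the real call passes the buffer, size and offset of the entry
being converted):

      /* in make_indexed_dir(), before trusting de->rec_len */
      if (ext4_check_dir_entry(dir, NULL, de, bh, buf, blocksize, offset))
          return -EFSCORRUPTED;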
CC: stable@kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216540
Signed-off-by: Luís Henriques <lhenriques@suse.de>
Link: https://lore.kernel.org/r/20221012131330.32456-1-lhenriques@suse.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
xfstests generic/011 reported a use-after-free bug as follows:
BUG: KASAN: use-after-free in __d_alloc+0x269/0x859
Read of size 15 at addr ffff8880078933a0 by task dirstress/952
CPU: 1 PID: 952 Comm: dirstress Not tainted 6.1.0-rc3+ #77
Call Trace:
__dump_stack+0x23/0x29
dump_stack_lvl+0x51/0x73
print_address_description+0x67/0x27f
print_report+0x3e/0x5c
kasan_report+0x7b/0xa8
kasan_check_range+0x1b2/0x1c1
memcpy+0x22/0x5d
__d_alloc+0x269/0x859
d_alloc+0x45/0x20c
d_alloc_parallel+0xb2/0x8b2
lookup_open+0x3b8/0x9f9
open_last_lookups+0x63d/0xc26
path_openat+0x11a/0x261
do_filp_open+0xcc/0x168
do_sys_openat2+0x13b/0x3f7
do_sys_open+0x10f/0x146
__se_sys_creat+0x27/0x2e
__x64_sys_creat+0x55/0x6a
do_syscall_64+0x40/0x96
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Allocated by task 952:
kasan_save_stack+0x1f/0x42
kasan_set_track+0x21/0x2a
kasan_save_alloc_info+0x17/0x1d
__kasan_kmalloc+0x7e/0x87
__kmalloc_node_track_caller+0x59/0x155
kstrndup+0x60/0xe6
parse_mf_symlink+0x215/0x30b
check_mf_symlink+0x260/0x36a
cifs_get_inode_info+0x14e1/0x1690
cifs_revalidate_dentry_attr+0x70d/0x964
cifs_revalidate_dentry+0x36/0x62
cifs_d_revalidate+0x162/0x446
lookup_open+0x36f/0x9f9
open_last_lookups+0x63d/0xc26
path_openat+0x11a/0x261
do_filp_open+0xcc/0x168
do_sys_openat2+0x13b/0x3f7
do_sys_open+0x10f/0x146
__se_sys_creat+0x27/0x2e
__x64_sys_creat+0x55/0x6a
do_syscall_64+0x40/0x96
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 950:
kasan_save_stack+0x1f/0x42
kasan_set_track+0x21/0x2a
kasan_save_free_info+0x1c/0x34
____kasan_slab_free+0x1c1/0x1d5
__kasan_slab_free+0xe/0x13
__kmem_cache_free+0x29a/0x387
kfree+0xd3/0x10e
cifs_fattr_to_inode+0xb6a/0xc8c
cifs_get_inode_info+0x3cb/0x1690
cifs_revalidate_dentry_attr+0x70d/0x964
cifs_revalidate_dentry+0x36/0x62
cifs_d_revalidate+0x162/0x446
lookup_open+0x36f/0x9f9
open_last_lookups+0x63d/0xc26
path_openat+0x11a/0x261
do_filp_open+0xcc/0x168
do_sys_openat2+0x13b/0x3f7
do_sys_open+0x10f/0x146
__se_sys_creat+0x27/0x2e
__x64_sys_creat+0x55/0x6a
do_syscall_64+0x40/0x96
entry_SYSCALL_64_after_hwframe+0x63/0xcd
When a symlink is opened, the link name comes from 'inode->i_link', but it
may be reset to a new value when the dentry is revalidated. If some process
gets the link name in this race scenario, a UAF will happen on the link name.
Fix this by implementing the 'get_link' interface to duplicate the link name.
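A minimal sketch of such a ->get_link follows; locking against the revalidate
path and RCU-walk support are elided, and the names are illustrative rather
than the exact cifs implementation:

  #include <linux/fs.h>
  #include <linux/slab.h>
  #include <linux/string.h>
  #include <linux/err.h>

  static const char *sketch_get_link(struct dentry *dentry, struct inode *inode,
                                     struct delayed_call *done)
  {
      char *target;

      if (!dentry)
          return ERR_PTR(-ECHILD);   /* no RCU-walk support in this sketch */

      target = kstrdup(inode->i_link, GFP_KERNEL);
      if (!target)
          return ERR_PTR(-ENOMEM);

      set_delayed_call(done, kfree_link, target);  /* freed once the caller is done */
      return target;
  }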
Fixes: 76894f3e2f71 ("cifs: improve symlink handling for smb2+")
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
In a few places, we do unnecessary iterations over TCP sessions, even when
the server struct is provided. This change avoids that and uses the provided
server struct instead.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
SMB sessions and tcons currently hang off the primary channel only;
secondary channels have empty lists. Whenever there's a need to iterate
sessions or tcons, we should use the list in the corresponding primary
channel.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
This reverts commit 54cc3dbfc10dc3db7cb1cf49aee4477a8398fbde.
Zev Weiss reports that the reverted patch may cause a regulator
undercount. Here is his report:
... having regulator-dummy set as a supply on my PMBus regulators
(instead of having them as their own top-level regulators without
an upstream supply) leads to enable-count underflow errors when
disabling them:
# echo 0 > /sys/bus/platform/devices/efuse01/state
[ 906.094477] regulator-dummy: Underflow of regulator enable count
[ 906.100563] Failed to disable vout: -EINVAL
[ 136.992676] reg-userspace-consumer efuse01: Failed to configure state: -22
Zev reports that reverting the patch fixes the problem. So let's do that
for now.
Fixes: 54cc3dbfc10d ("hwmon: (pmbus) Add regulator supply into macro")
Cc: Marcello Sylvester Bauer <sylv@sylv.io>
Reported-by: Zev Weiss <zev@bewilderbeest.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Available sensors are enumerated and reported by the SCMI platform server
using a 16bit identification number; not all such sensors are of a type
supported by hwmon subsystem and, among the supported ones, only a subset
could be temperature sensors that have to be registered with the Thermal
Framework.
Potential clashes between hwmon channel indexes and the underlying real
sensor IDs do not play well with the hwmon<-->thermal bridge automatic
registration routines and could require a considerable number of fake dummy
sensors to be made up in order to keep indexes and IDs in sync.
Avoid using the hwmon<-->thermal bridge by dropping the HWMON_C_REGISTER_TZ
attribute and instead explicitly register temperature sensors directly with
the Thermal Framework.
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: linux-hwmon@vger.kernel.org
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20221031114018.59048-1-cristian.marussi@arm.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
At region creation time the next region-id is atomically cached so that
there is predictability of region device names. If that region is
destroyed and then a new one is created the region id increments. That
ends up looking like a memory leak, or is otherwise surprising that
identifiers roll forward even after destroying all previously created
regions.
Try to reuse rather than free old region ids at region release time.
While this fixes a cosmetic issue, the needlessly advancing memory
region-id gives the appearance of a memory leak, hence the "Fixes" tag,
but no "Cc: stable" tag.
Cc: Ben Widawsky <bwidawsk@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Fixes: 779dd20cfb56 ("cxl/region: Add region creation support")
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/166752186062.947915.13200195701224993317.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
When programming port decode targets, the algorithm wants to ensure that
two devices are compatible to be programmed as peers beneath a given
port. A compatible peer is a target that shares the same dport, and
where that target's interleave position also routes it to the same
dport. Compatibility is determined by the device's interleave position
being >= the distance. For example, if a given dport can only map every
Nth position then positions less than N away from the last target
programmed are incompatible.
The @distance for the host-bridge's cxl_port in a simple dual-ported
host-bridge configuration with 2 direct-attached devices is 1, i.e. an
x2 region divided by 2 dports to reach 2 region targets.
An x4 region under an x2 host-bridge would need 2 intervening switches
where the @distance at the host bridge level is 2 (x4 region divided by
2 switches to reach 4 devices).
However, the distance between peers underneath a single ported
host-bridge is always zero because there is no limit to the number of
devices that can be mapped. In other words, there are no decoders to
program in a passthrough, all descendants are mapped and distance only
starts matters for the intervening descendant ports of the passthrough
port.
Add tracking for the number of dports mapped to a port, and use that to
detect the passthrough case for calculating @distance.
Cc: <stable@vger.kernel.org>
Reported-by: Bobo WL <lmw.bobo@gmail.com>
Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: http://lore.kernel.org/r/20221010172057.00001559@huawei.com
Fixes: 27b3f8d13830 ("cxl/region: Program target lists")
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/166752185440.947915.6617495912508299445.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Jonathan reports that region creation fails when a single-port
host-bridge connects to a multi-port switch. Mock up that configuration
so a fix can be tested and regression tested going forward.
Reported-by: Bobo WL <lmw.bobo@gmail.com>
Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: http://lore.kernel.org/r/20221010172057.00001559@huawei.com
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/166752184838.947915.2167957540894293891.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Fix a few typos where 'goto err_port' was used rather than the object
specific cleanup.
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/166752184255.947915.16163477849330181425.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
When a cxl_nvdimm object goes through a ->remove() event (device
physically removed, nvdimm-bridge disabled, or nvdimm device disabled),
then any associated regions must also be disabled. As highlighted by the
cxl-create-region.sh test [1], a single device may host multiple
regions, but the driver was only tracking one region at a time. This
leads to a situation where only the last enabled region per nvdimm
device is cleaned up properly. Other regions are leaked, and this also
causes cxl_memdev reference leaks.
Fix the tracking by allowing cxl_nvdimm objects to track multiple region
associations.
Cc: <stable@vger.kernel.org>
Link: https://github.com/pmem/ndctl/blob/main/test/cxl-create-region.sh [1]
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
Fixes: 04ad63f086d1 ("cxl/region: Introduce cxl_pmem_region objects")
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/166752183647.947915.2045230911503793901.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
When a region is deleted any targets that have been previously assigned
to that region hold references to it. Trigger those references to
drop by detaching all targets at unregister_region() time.
Otherwise that region object will leak as userspace has lost the ability
to detach targets once region sysfs is torn down.
Cc: <stable@vger.kernel.org>
Fixes: b9686e8c8e39 ("cxl/region: Enable the assignment of endpoint decoders to regions")
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/166752183055.947915.17681995648556534844.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Some regions may not have any address space allocated. Skip them when
validating HPA order; otherwise a crash like the following may result:
devm_cxl_add_region: cxl_acpi cxl_acpi.0: decoder3.4: created region9
BUG: kernel NULL pointer dereference, address: 0000000000000000
[..]
RIP: 0010:store_targetN+0x655/0x1740 [cxl_core]
[..]
Call Trace:
<TASK>
kernfs_fop_write_iter+0x144/0x200
vfs_write+0x24a/0x4d0
ksys_write+0x69/0xf0
do_syscall_64+0x3a/0x90
store_targetN+0x655/0x1740:
alloc_region_ref at drivers/cxl/core/region.c:676
(inlined by) cxl_port_attach_region at drivers/cxl/core/region.c:850
(inlined by) cxl_region_attach at drivers/cxl/core/region.c:1290
(inlined by) attach_target at drivers/cxl/core/region.c:1410
(inlined by) store_targetN at drivers/cxl/core/region.c:1453
Cc: <stable@vger.kernel.org>
Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders")
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166752182461.947915.497032805239915067.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
These servers are all on the public versions of the roadmap. The model
numbers for Grand Ridge, Granite Rapids, and Sierra Forest were included
in the September 2022 edition of the Instruction Set Extensions document.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20221103203310.5058-1-tony.luck@intel.com
|
|
test_gen_kprobe_cmd() only frees buf in the failure path, so buf will leak
when there is no failure. Move kfree(buf) from the failure path to the common
path to prevent the memleak (see the sketch after the kmemleak report below).
The same reason and solution apply to test_gen_kretprobe_cmd().
unreferenced object 0xffff888143b14000 (size 2048):
comm "insmod", pid 52490, jiffies 4301890980 (age 40.553s)
hex dump (first 32 bytes):
70 3a 6b 70 72 6f 62 65 73 2f 67 65 6e 5f 6b 70 p:kprobes/gen_kp
72 6f 62 65 5f 74 65 73 74 20 64 6f 5f 73 79 73 robe_test do_sys
backtrace:
[<000000006d7b836b>] kmalloc_trace+0x27/0xa0
[<0000000009528b5b>] 0xffffffffa059006f
[<000000008408b580>] do_one_initcall+0x87/0x2a0
[<00000000c4980a7e>] do_init_module+0xdf/0x320
[<00000000d775aad0>] load_module+0x3006/0x3390
[<00000000e9a74b80>] __do_sys_finit_module+0x113/0x1b0
[<000000003726480d>] do_syscall_64+0x35/0x80
[<000000003441e93b>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
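The resulting shape is roughly the following (helper and constant names are
simplified for illustration):

  #include <linux/slab.h>

  static int __init test_gen_kprobe_cmd(void)
  {
      char *buf;
      int ret;

      buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERNEL);  /* size constant assumed */
      if (!buf)
          return -ENOMEM;

      ret = build_and_register_kprobe_event(buf);       /* hypothetical helper */

      kfree(buf);   /* freed on success and failure alike */
      return ret;
  }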
Link: https://lore.kernel.org/all/20221102072954.26555-1-shangxiaojing@huawei.com/
Fixes: 64836248dda2 ("tracing: Add kprobe event command generation test module")
Cc: stable@vger.kernel.org
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
Since commit ab51e15d535e ("fprobe: Introduce FPROBE_FL_KPROBE_SHARED flag
for fprobe") introduced fprobe_kprobe_handler() for fprobe::ops::func,
unregister_fprobe() fails to unregister the registered fprobe if the user
specifies the FPROBE_FL_KPROBE_SHARED flag.
Moreover, __register_ftrace_function() may change ftrace_ops::func, so we
have to check fprobe::ops::saved_func instead.
To check it correctly, unregister_fprobe() should confirm that
fprobe::ops::saved_func is either fprobe_handler() or
fprobe_kprobe_handler().
Link: https://lore.kernel.org/all/166677683946.1459107.15997653945538644683.stgit@devnote3/
Fixes: cad9931f64dc ("fprobe: Add ftrace based probe APIs")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
Check whether fp->rethook was successfully allocated. Otherwise, if
rethook_alloc() fails, we end up dereferencing a NULL pointer in
rethook_add_node().
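The added check amounts to something like this in the registration path
(the handler name is used for illustration):

      fp->rethook = rethook_alloc((void *)fp, fprobe_exit_handler);
      if (!fp->rethook)
          return -ENOMEM;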
Link: https://lore.kernel.org/all/20221025031209.954836-1-rafaelmendsr@gmail.com/
Fixes: 5b0ab78998e3 ("fprobe: Add exit_handler support")
Cc: stable@vger.kernel.org
Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
In the aggregate kprobe case, when arm_kprobe() fails, we need to set
KPROBE_FLAG_DISABLED in kp->flags again. If not, the 'kp' kprobe will be
considered enabled even though it is not actually enabled.
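A sketch of the fix, simplified from the aggregate-kprobe path:

      if (!kprobes_all_disarmed && kprobe_disabled(ap)) {
          ap->flags &= ~KPROBE_FLAG_DISABLED;
          ret = arm_kprobe(ap);
          if (ret)
              /* arming failed, so reflect that in the flags again */
              ap->flags |= KPROBE_FLAG_DISABLED;
      }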
Link: https://lore.kernel.org/all/20220902155820.34755-1-liq3ea@163.com/
Fixes: 12310e343755 ("kprobes: Propagate error from arm_kprobe_ftrace()")
Cc: stable@vger.kernel.org
Signed-off-by: Li Qiang <liq3ea@163.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
"struct_size() + n" may cause a integer overflow,
use size_add() to handle it.
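For example (names are illustrative; both helpers come from
<linux/overflow.h>), the extra bytes are added with the saturating helper
instead of plain '+':

      rec = kzalloc(size_add(struct_size(rec, entries, nr_entries), extra_bytes),
                    GFP_KERNEL);  /* saturates to SIZE_MAX on overflow, so the allocation fails cleanly */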
Signed-off-by: Yu Zhe <yuzhe@nfschina.com>
Link: https://lore.kernel.org/r/20220927070247.23148-1-yuzhe@nfschina.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Commit 237405ebef58 ("arm64: cpufeature: Force HWCAP to be based on the
sysreg visible to user-space") forced the hwcaps to use sanitised
user-space view of the id registers. However, the ID register structures
used to select few compat cpufeatures (vfp, crc32, ...) are masked and
hence such hwcaps do not appear in /proc/cpuinfo anymore for PER_LINUX32
personality.
Add the ID register structures explicitly and set the relevant entry as
visible. As these ID registers are now of type visible so make them
available in 64-bit userspace by making necessary changes in register
emulation logic and documentation.
While at it, update the comment for structure ftr_generic_32bits[] which
lists the ID register that use it.
Fixes: 237405ebef58 ("arm64: cpufeature: Force HWCAP to be based on the sysreg visible to user-space")
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Link: https://lore.kernel.org/r/20221103082232.19189-1-amit.kachhap@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Unlike x86, which has machinery to deal with page faults that occur
during the execution of EFI runtime services, arm64 has nothing like
that, and a synchronous exception raised by firmware code brings down
the whole system.
With more EFI based systems appearing that were not built to run Linux
(such as the Windows-on-ARM laptops based on Qualcomm SOCs), as well as
the introduction of PRM (platform specific firmware routines that are
callable just like EFI runtime services), we are more likely to run into
issues of this sort, and it is much more likely that we can identify and
work around such issues if they don't bring down the system entirely.
Since we already use an EFI runtime services call wrapper in assembler,
we can quite easily add some code that captures the execution state at
the point where the call is made, allowing us to revert to this state
and proceed execution if the call triggered a synchronous exception.
Given that the kernel and the firmware don't share any data structures
that could end up in an indeterminate state, we can happily continue
running, as long as we mark the EFI runtime services as unavailable from
that point on.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Swap the 1st and 2nd arguments to be consistent with the usage of
kvcalloc().
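That is, the call should read with the element count first and the element
size second (variable names here are illustrative):

      /* kvcalloc(n, size, gfp) */
      slots = kvcalloc(nr_slots, sizeof(*slots), GFP_KERNEL_ACCOUNT);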
Fixes: c9b8fecddb5b ("KVM: use kvcalloc for array allocations")
Signed-off-by: Liao Chang <liaochang1@huawei.com>
Message-Id: <20221103011749.139262-1-liaochang1@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|