Age | Commit message (Collapse) | Author | Files | Lines |
|
Current ACPI ARM64 NUMA initialization code in
acpi_numa_gicc_affinity_init()
carries out NUMA nodes creation and cpu<->node mappings at the same time
in the arch backend so that a single SRAT walk is needed to parse both
pieces of information. This implies that the cpu<->node mappings must
be stashed in an array (sized NR_CPUS) so that SMP code can later use
the stashed values to avoid another SRAT table walk to set-up the early
cpu<->node mappings.
If the kernel is configured with a NR_CPUS value less than the actual
processor entries in the SRAT (and MADT), the logic in
acpi_numa_gicc_affinity_init() is broken in that the cpu<->node mapping
is only carried out (and stashed for future use) only for a number of
SRAT entries up to NR_CPUS, which do not necessarily correspond to the
possible cpus detected at SMP initialization in
acpi_map_gic_cpu_interface() (ie MADT and SRAT processor entries order
is not enforced), which leaves the kernel with broken cpu<->node
mappings.
Furthermore, given the current ACPI NUMA code parsing logic in
acpi_numa_gicc_affinity_init(), PXM domains for CPUs that are not parsed
because they exceed NR_CPUS entries are not mapped to NUMA nodes (ie the
PXM corresponding node is not created in the kernel) leaving the system
with a broken NUMA topology.
Rework the ACPI ARM64 NUMA initialization process so that the NUMA
nodes creation and cpu<->node mappings are decoupled. cpu<->node
mappings are moved to SMP initialization code (where they are needed),
at the cost of an extra SRAT walk so that ACPI NUMA mappings can be
batched before being applied, fixing current parsing pitfalls.
Acked-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: John Garry <john.garry@huawei.com>
Fixes: d8b47fca8c23 ("arm64, ACPI, NUMA: NUMA support based on SRAT and
SLIT")
Link: http://lkml.kernel.org/r/1527768879-88161-2-git-send-email-xiexiuqi@huawei.com
Reported-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Punit Agrawal <punit.agrawal@arm.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Flatmem is useful in reducing kernel memory usage.
One usecase is in kdump kernel. We are able to save
~14M by moving to flatmem scheme.
Cc: xe-kernel@external.cisco.com
Cc: Nikunj Kela <nkela@cisco.com>
Signed-off-by: Nikunj Kela <nkela@cisco.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The arm-soc tree does a good job handling .dts files, so exclude them
from the ARM64 entry in MAINTAINERS.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
lkdtm calls flush_icache_range(), which results in an out-of-line call
to __flush_icache_range(), which is not exported to modules.
Export the symbol to modules to fix this build breakage.
Fixes: 3b8c9f1cdfc5 ("arm64: IPI each CPU after invalidating the I-cache for kernel mappings")
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Commit 37c3ec2d810f ("arm64: topology: divorce MC scheduling domain from
core_siblings") selected the smallest of LLC, socket siblings, and NUMA
node siblings to ensure that the sched domain we build for the MC layer
isn't larger than the DIE above it or it's shrunk to the socket or NUMA
node if LLC exist acrosis NUMA node/chiplets.
Commit acd32e52e4e0 ("arm64: topology: Avoid checking numa mask for
scheduler MC selection") reverted the NUMA siblings checks since the
CPU topology masks weren't updated on hotplug at that time.
This patch re-introduces numa mask check as the CPU and NUMA topology
is now updated in hotplug paths. Effectively, this patch does the
partial revert of commit acd32e52e4e0.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Similar to core_sibling and thread_sibling, it's better to align and
rename llc_siblings to llc_sibling.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
We already repopulate the information on CPU hotplug-in, so we can safely
remove the CPU topology and NUMA cpumap information during CPU hotplug
out operation. This will help to provide the correct cpumask for
scheduler domains.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
It's incorrect to iterate over all the possible CPUs to update the
sibling masks when any CPU is hotplugged in. In case the topology
siblings masks of the CPU is removed when is it hotplugged out, we
end up updating those masks when one of it's sibling is powered up
again. This will provide inconsistent view.
Further, since the CPU calling update_sibling_masks is yet to be set
online, there's no need to compare itself with each online CPU when
updating the siblings masks.
This patch restricts updation of sibling masks only for CPUs that are
already online. It also the drops the unnecessary cpuid check.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
This patch adds support to remove all the CPU topology information using
clear_cpu_topology and also resetting the sibling information on other
sibling CPUs. This will be used in cpu_disable so that all the topology
sibling information is removed on CPU hotplug out.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Currently numa_clear_node removes both cpu information from the NUMA
node cpumap as well as the NUMA node id from the cpu. Similarly
numa_store_cpu_info updates both percpu nodeid and NUMA cpumap.
However we need to retain the numa node id for the cpu and only remove
the cpu information from the numa node cpumap during CPU hotplug out.
The same can be extended for hotplugging in the CPU.
This patch separates out numa_{add,remove}_cpu from numa_clear_node and
numa_store_cpu_info.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Currently reset_cpu_topology clears all the CPU topology information
and resets to default values. However we may need to just clear the
information when we hotplug out the CPU. In preparation to add the
support the same, let's refactor reset_cpu_topology to just reset
the information and move clearing out the topology information to
clear_cpu_topology.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The ERRATA_MIDR_REV_RANGE macro assigns ARM64_CPUCAP_LOCAL_CPU_ERRATUM
to the '.type' field of the 'struct arm64_cpu_capabilities', so there's
no need to assign it explicitly as well.
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
arm64 requires break-before-make. Originally, before
setting up new pmd/pud entry for huge mapping, in few
cases, the modifying pmd/pud entry was still valid
and pointing to next level page table as we only
clear off leaf PTE in unmap leg.
a) This was resulting into stale entry in TLBs (as few
TLBs also cache intermediate mapping for performance
reasons)
b) Also, modifying pmd/pud was the only reference to
next level page table and it was getting lost without
freeing it. So, page leaks were happening.
Implement pud_free_pmd_page() and pmd_free_pte_page() to
enforce BBM and also free the leaking page tables.
Implementation requires,
1) Clearing off the current pud/pmd entry
2) Invalidation of TLB
3) Freeing of the un-used next level page tables
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Add an interface to invalidate intermediate page tables
from TLB for kernel.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Patching kernel instructions at runtime requires other CPUs to undergo
a context synchronisation event via an explicit ISB or an IPI in order
to ensure that the new instructions are visible. This is required even
for "hotpatch" instructions such as NOP and BL, so avoid optimising in
this case and always go via stop_machine() when performing general
patching.
ftrace isn't quite as strict, so it can continue to call the nosync
code directly.
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
When invalidating the instruction cache for a kernel mapping via
flush_icache_range(), it is also necessary to flush the pipeline for
other CPUs so that instructions fetched into the pipeline before the
I-cache invalidation are discarded. For example, if module 'foo' is
unloaded and then module 'bar' is loaded into the same area of memory,
a CPU could end up executing instructions from 'foo' when branching into
'bar' if these instructions were fetched into the pipeline before 'foo'
was unloaded.
Whilst this is highly unlikely to occur in practice, particularly as
any exception acts as a context-synchronizing operation, following the
letter of the architecture requires us to execute an ISB on each CPU
in order for the new instruction stream to be visible.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Now that users have been migrated to PSR_AA32, kill the unused
COMPAT_PSR definitions.
The only difference we need a definition for is COMPAT_PSR_DIT_BIT,
which differs from PSR_AA32_DIT_BIT.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Some code cares about the SPSR_ELx format for exceptions taken from
AArch32 to inspect or manipulate the SPSR_ELx value, which is already in
the SPSR_ELx format, and not in the AArch32 PSR format.
To separate these from cases where we care about the AArch32 PSR format,
migrate these cases to use the PSR_AA32_* definitions rather than
COMPAT_PSR_*.
There should be no functional change as a result of this patch.
Note that arm64 KVM does not support a compat KVM API, and always uses
the SPSR_ELx format, even for AArch32 guests.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Some code cares about the SPSR_ELx format for exceptions taken from
AArch32 to inspect or manipulate the SPSR_ELx value, which is already in
the SPSR_ELx format, and not in the AArch32 PSR format.
To separate these from cases where we care about the AArch32 PSR format,
migrate these cases to use the PSR_AA32_* definitions rather than
COMPAT_PSR_*.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The SPSR_ELx format for exceptions taken from AArch32 is slightly
different to the AArch32 PSR format.
Map between the two in the compat ptrace code.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 7206dc93a58fb764 ("arm64: Expose Arm v8.4 features")
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The SPSR_ELx format for exceptions taken from AArch32 differs from the
AArch32 PSR format. Thus, we must translate between the two when setting
up a compat sigframe, or restoring context from a compat sigframe.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 7206dc93a58fb764 ("arm64: Expose Arm v8.4 features")
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Currently valid_user_regs() treats SPSR_ELx.DIT as a RES0 bit, causing
it to be zeroed upon exception return, rather than preserved. Thus, code
relying on DIT will not function as expected, and may expose an
unexpected timing sidechannel.
Let's remove DIT from the set of RES0 bits, such that it is preserved.
At the same time, the related comment is updated to better describe the
situation, and to take into account the most recent documentation of
SPSR_ELx, in ARM DDI 0487C.a.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 7206dc93a58fb764 ("arm64: Expose Arm v8.4 features")
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The AArch32 CPSR/SPSR format is *almost* identical to the AArch64
SPSR_ELx format for exceptions taken from AArch32, but the two have
diverged with the addition of DIT, and we need to treat the two as
logically distinct.
This patch adds new definitions for the SPSR_ELx format for exceptions
taken from AArch32, with a consistent PSR_AA32_ prefix. The existing
COMPAT_PSR_ definitions will be used for the PSR format as seen from
AArch32.
Definitions of DIT are provided for both, and inline functions are
provided to map between the two formats. Note that for SPSR_ELx, the
(RES0) J bit has been re-allocated as the DIT bit.
Once users of the COMPAT_PSR definitions have been migrated over to the
PSR_AA32 definitions, the (majority of) the former will be removed, so
no efforts is made to avoid duplication until then.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Track mismatches in the cache type register (CTR_EL0), other
than the D/I min line sizes and trap user accesses if there are any.
Fixes: be68a8aaf925 ("arm64: cpufeature: Fix CTR_EL0 field definitions")
Cc: <stable@vger.kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
If there is a mismatch in the I/D min line size, we must
always use the system wide safe value both in applications
and in the kernel, while performing cache operations. However,
we have been checking more bits than just the min line sizes,
which triggers false negatives. We may need to trap the user
accesses in such cases, but not necessarily patch the kernel.
This patch fixes the check to do the right thing as advertised.
A new capability will be added to check mismatches in other
fields and ensure we trap the CTR accesses.
Fixes: be68a8aaf925 ("arm64: cpufeature: Fix CTR_EL0 field definitions")
Cc: <stable@vger.kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
When running with CONFIG_PREEMPT=n, the spinlock fastpaths fit inside
64 bytes, which typically coincides with the L1 I-cache line size.
Inline the spinlock fastpaths, like we do already for rwlocks.
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
It's fair to say that our ticket lock has served us well over time, but
it's time to bite the bullet and start using the generic qspinlock code
so we can make use of explicit MCS queuing and potentially better PV
performance in future.
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
We can provide an implementation of smp_cond_load_relaxed using READ_ONCE
and __cmpwait_relaxed.
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates
a pud / pmd map. The following preconditions are met at their entry.
- All pte entries for a target pud/pmd address range have been cleared.
- System-wide TLB purges have been peformed for a target pud/pmd address
range.
The preconditions assure that there is no stale TLB entry for the range.
Speculation may not cache TLB entries since it requires all levels of page
entries, including ptes, to have P & A-bits set for an associated address.
However, speculation may cache pud/pmd entries (paging-structure caches)
when they have P-bit set.
Add a system-wide TLB purge (INVLPG) to a single page after clearing
pud/pmd entry's P-bit.
SDM 4.10.4.1, Operation that Invalidate TLBs and Paging-Structure Caches,
states that:
INVLPG invalidates all paging-structure caches associated with the
current PCID regardless of the liner addresses to which they correspond.
Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: mhocko@suse.com
Cc: akpm@linux-foundation.org
Cc: hpa@zytor.com
Cc: cpandya@codeaurora.org
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: Joerg Roedel <joro@8bytes.org>
Cc: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20180627141348.21777-4-toshi.kani@hpe.com
|
|
The following kernel panic was observed on ARM64 platform due to a stale
TLB entry.
1. ioremap with 4K size, a valid pte page table is set.
2. iounmap it, its pte entry is set to 0.
3. ioremap the same address with 2M size, update its pmd entry with
a new value.
4. CPU may hit an exception because the old pmd entry is still in TLB,
which leads to a kernel panic.
Commit b6bdb7517c3d ("mm/vmalloc: add interfaces to free unmapped page
table") has addressed this panic by falling to pte mappings in the above
case on ARM64.
To support pmd mappings in all cases, TLB purge needs to be performed
in this case on ARM64.
Add a new arg, 'addr', to pud_free_pmd_page() and pmd_free_pte_page()
so that TLB purge can be added later in seprate patches.
[toshi.kani@hpe.com: merge changes, rewrite patch description]
Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: mhocko@suse.com
Cc: akpm@linux-foundation.org
Cc: hpa@zytor.com
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: Will Deacon <will.deacon@arm.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20180627141348.21777-3-toshi.kani@hpe.com
|
|
ioremap() supports pmd mappings on x86-PAE. However, kernel's pmd
tables are not shared among processes on x86-PAE. Therefore, any
update to sync'd pmd entries need re-syncing. Freeing a pte page
also leads to a vmalloc fault and hits the BUG_ON in vmalloc_sync_one().
Disable free page handling on x86-PAE. pud_free_pmd_page() and
pmd_free_pte_page() simply return 0 if a given pud/pmd entry is present.
This assures that ioremap() does not update sync'd pmd entries at the
cost of falling back to pte mappings.
Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Reported-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: mhocko@suse.com
Cc: akpm@linux-foundation.org
Cc: hpa@zytor.com
Cc: cpandya@codeaurora.org
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20180627141348.21777-2-toshi.kani@hpe.com
|
|
Currently machine_kexec() doesn't reset to EL2 in the case of a
crashdump kernel. This leaves potentially dodgy state active at EL2, and
means that if the crashdump kernel attempts to online secondary CPUs,
these will be booted as mismatched ELs.
Let's reset to EL2, as we do in all other cases, and simplify things. If
EL2 state is corrupt, things are already sufficiently bad that kdump is
unlikely to work, and it's best-effort regardless.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
I've got this infinite stacktrace when debugging another problem:
[ 908.795225] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 908.796176] 1-...!: (1 GPs behind) idle=952/1/4611686018427387904 softirq=1462/1462 fqs=355
[ 908.797692] 2-...!: (1 GPs behind) idle=f42/1/4611686018427387904 softirq=1550/1551 fqs=355
[ 908.799189] (detected by 0, t=2109 jiffies, g=130, c=129, q=235)
[ 908.800284] Task dump for CPU 1:
[ 908.800871] kworker/1:1 R running task 0 32 2 0x00000022
[ 908.802127] Workqueue: writecache-writeabck writecache_writeback [dm_writecache]
[ 908.820285] Call trace:
[ 908.824785] __switch_to+0x68/0x90
[ 908.837661] 0xfffffe00603afd90
[ 908.844119] 0xfffffe00603afd90
[ 908.850091] 0xfffffe00603afd90
[ 908.854285] 0xfffffe00603afd90
[ 908.863538] 0xfffffe00603afd90
[ 908.865523] 0xfffffe00603afd90
The machine just locked up and kept on printing the same line over and
over again. This patch fixes it.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
Signed-off-by: Peng Donglin <dolinux.peng@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
|
|
As suggested by Nick Piggin it seems we can drop the -ffunction-sections
compile flag, now that the kernel uses thin archives. Testing with 32-
and 64-bit kernel showed no difference in kernel size.
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
This was introduced more than a decade ago when sg chaining was
added, but we never really caught anything with it. The scatterlist
entry size can be critical, since drivers allocate it, so remove
the magic member. Recently it's been triggering allocation stalls
and failures in NVMe.
Tested-by: Jordan Glover <Golden_Miller83@protonmail.ch>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Fix the test that verifies whether bio_op(bio) represents a discard
or write zeroes operation. Compile-tested only.
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Fixes: 7435e9018f91 ("drbd: zero-out partial unaligned discards on local backend")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Some devices have different queue limits depending on the type of IO. A
classic case is SATA NCQ, where some commands can queue, but others
cannot. If we have NCQ commands inflight and encounter a non-queueable
command, the driver returns busy. Currently we attempt to dispatch more
from the scheduler, if we were able to queue some commands. But for the
case where we ended up stopping due to BUSY, we should not attempt to
retrieve more from the scheduler. If we do, we can get into a situation
where we attempt to queue a non-queueable command, get BUSY, then
successfully retrieve more commands from that scheduler and queue those.
This can repeat forever, starving the non-queuable command indefinitely.
Fix this by NOT attempting to pull more commands from the scheduler, if
we get a BUSY return. This should also be more optimal in terms of
letting requests stay in the scheduler for as long as possible, if we
get a BUSY due to the regular out-of-tags condition.
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_pgetevents() will not change the signal mask. Mark it const to make
it clear and to reduce the need for casts in user code.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Avi Kivity <avi@scylladb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[hch: reapply the patch that got incorrectly reverted]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The big aio poll revert broke various network protocols that don't
implement ->poll as a patch in the aio poll serie removed sock_no_poll
and made the common code handle this case.
Reported-by: syzbot+57727883dbad76db2ef0@syzkaller.appspotmail.com
Reported-by: syzbot+cdb0d3176b53d35ad454@syzkaller.appspotmail.com
Reported-by: syzbot+2c7e8f74f8b2571c87e8@syzkaller.appspotmail.com
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Fixes: a11e1d432b51 ("Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
It seems that during the conversion from gpio* to gpiod*, the initial
state of SCL was wrongly switched to LOW. Fix it to be HIGH again.
Fixes: 7bb75029ef34 ("i2c: gpio: Enforce open drain through gpiolib")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
|
|
If DMA safe memory was allocated, but the subsequent I2C transfer
fails the memory is leaked. Plug this leak.
Fixes: 8a77821e74d6 ("i2c: smbus: use DMA safe buffers for emulated SMBus transactions")
Signed-off-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
|
|
So, if somebody wants to re-implement this in the future, we pinpoint to
a problem case.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|
This reverts commit 3e5f06bed72fe72166a6778f630241a893f67799. As per
bugzilla #200045, this caused a regression. I don't really see a way to
fix it without having the hardware. So, revert the patch and I will fix
the issue I was seeing originally in the i2c-gpio driver itself. I
couldn't find new users of this algorithm since, so there should be no
one depending on the new behaviour.
Reported-by: Sergey Larin <cerg2010cerg2010@mail.ru>
Fixes: 3e5f06bed72f ("i2c: algo-bit: init the bus to a known state")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Sergey Larin <cerg2010cerg2010@mail.ru>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
|
|
If a user is accessing a file in selinuxfs with a pointer to a userspace
buffer that is backed by e.g. a userfaultfd, the userspace access can
stall indefinitely, which can block fsi->mutex if it is held.
For sel_read_policy(), remove the locking, since this method doesn't seem
to access anything that requires locking.
For sel_read_bool(), move the user access below the locked region.
For sel_write_bool() and sel_commit_bools_write(), move the user access
up above the locked region.
Cc: stable@vger.kernel.org
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: removed an unused variable in sel_read_policy()]
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Currently device_supports_dax() just checks to see if the QUEUE_FLAG_DAX
flag is set on the device's request queue to decide whether or not the
device supports filesystem DAX. Really we should be using
bdev_dax_supported() like filesystems do at mount time. This performs
other tests like checking to make sure the dax_direct_access() path works.
We also explicitly clear QUEUE_FLAG_DAX on the DM device's request queue if
any of the underlying devices do not support DAX. This makes the handling
of QUEUE_FLAG_DAX consistent with the setting/clearing of most other flags
in dm_table_set_restrictions().
Now that bdev_dax_supported() explicitly checks for QUEUE_FLAG_DAX, this
will ensure that filesystems built upon DM devices will only be able to
mount with DAX if all underlying devices also support DAX.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Fixes: commit 545ed20e6df6 ("dm: add infrastructure for DAX support")
Cc: stable@vger.kernel.org
Acked-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Add an explicit check for QUEUE_FLAG_DAX to __bdev_dax_supported(). This
is needed for DM configurations where the first element in the dm-linear or
dm-stripe target supports DAX, but other elements do not. Without this
check __bdev_dax_supported() will pass for such devices, letting a
filesystem on that device mount with the DAX option.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Mike Snitzer <snitzer@redhat.com>
Fixes: commit 545ed20e6df6 ("dm: add infrastructure for DAX support")
Cc: stable@vger.kernel.org
Acked-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
QUEUE_FLAG_DAX is an indication that a given block device supports
filesystem DAX and should not be set for PMEM namespaces which are in "raw"
mode. These namespaces lack struct page and are prevented from
participating in filesystem DAX as of commit 569d0365f571 ("dax: require
'struct page' by default for filesystem dax").
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Mike Snitzer <snitzer@redhat.com>
Fixes: 569d0365f571 ("dax: require 'struct page' by default for filesystem dax")
Cc: stable@vger.kernel.org
Acked-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|