Age | Commit message (Collapse) | Author | Files | Lines |
|
Conflicts:
tools/arch/x86/include/asm/cpufeatures.h
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Collect AMD specific platform header files in <asm/amd/*.h>.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <superm1@kernel.org>
Link: https://lore.kernel.org/r/20250413084144.3746608-7-mingo@kernel.org
|
|
Collect AMD specific platform header files in <asm/amd/*.h>.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <superm1@kernel.org>
Link: https://lore.kernel.org/r/20250413084144.3746608-4-mingo@kernel.org
|
|
Some operations require checking, or ignoring, specific bits in an address
value. For example, this can be comparing address values to identify unique
structures.
Currently, the full address value is compared when filtering for duplicates.
This results in over counting and creation of extra records. This gives the
impression that more unique events occurred than did in reality.
Mask the address for physical rows on MI300.
[ bp: Simplify. ]
Fixes: 6f15e617cc99 ("RAS: Introduce a FRU memory poison manager")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
|
|
SMN access was bolted into amd_nb mostly as convenience. This has
limitations though that require incurring tech debt to keep it working.
Move SMN access to the newly introduced AMD Node driver.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> # pdx86
Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> # PMF, PMC
Link: https://lore.kernel.org/r/20241206161210.163701-11-yazen.ghannam@amd.com
|
|
AMD Zen-based systems report memory error addresses through machine
check banks representing Unified Memory Controllers (UMCs) in the form
of UMC relative "normalized" addresses. A normalized address must be
converted to a system physical address to be usable by the OS.
Future AMD platforms will provide a UEFI PRM module that implements a
number of address translation PRM handlers. This will provide an
interface for the OS to call platform specific code without requiring
the use of SMM or other heavy firmware operations.
Add support for the normalized to system physical address translation
PRM handler in the AMD Address Translation Library and prefer it over
native code if available. The GUID and parameter buffer structure are
specific to the normalized to system physical address handler provided
by the address translation PRM module included in future AMD systems.
The address translation PRM module is documented in chapter 22 of the
publicly available "AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
ACPI v6.5 Porting Guide".
[ bp: Massage commit message. ]
Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240730151731.15363-3-john.allen@amd.com
|
|
Pull EDAC updates from Borislav Petkov:
- The AMD memory controllers data fabric version 4.5 supports
non-power-of-2 denormalization in the sense that certain bits of the
system physical address cannot be reconstructed from the normalized
address reported by the RAS hardware. Add support for handling such
addresses
- Switch the EDAC drivers to the new Intel CPU model defines
- The usual fixes and cleanups all over the place
* tag 'edac_updates_for_v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC: Add missing MODULE_DESCRIPTION() macros
EDAC/dmc520: Use devm_platform_ioremap_resource()
EDAC/igen6: Add Intel Arrow Lake-U/H SoCs support
RAS/AMD/FMPM: Use atl internal.h for INVALID_SPA
RAS/AMD/ATL: Implement DF 4.5 NP2 denormalization
RAS/AMD/ATL: Validate address map when information is gathered
RAS/AMD/ATL: Expand helpers for adding and removing base and hole
RAS/AMD/ATL: Read DRAM hole base early
RAS/AMD/ATL: Add amd_atl pr_fmt() prefix
RAS/AMD/ATL: Add a missing module description
EDAC, i10nm: make skx_common.o a separate module
EDAC/skx: Switch to new Intel CPU model defines
EDAC/sb_edac: Switch to new Intel CPU model defines
EDAC, pnd2: Switch to new Intel CPU model defines
EDAC/i10nm: Switch to new Intel CPU model defines
EDAC/ghes: Add missing newline to pr_info() statement
RAS/AMD/ATL: Add missing newline to pr_info() statement
EDAC/thunderx: Remove unused struct error_syndrome
|
|
The currently used normalized address format is not applicable to all
MI300 systems. This leads to incorrect results during address
translation.
Drop the fixed layout and construct the normalized address from system
settings.
Fixes: 87a612375307 ("RAS/AMD/ATL: Add MI300 DRAM to normalized address translation support")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20240607-mi300-dram-xl-fix-v1-2-2f11547a178c@amd.com
|
|
Unlike with previous Data Fabric versions, with Data Fabric 4.5
non-power-of-2 denormalization, there are bits of the system physical
address that can't be fully reconstructed from the normalized address.
To determine the proper combination of missing system physical address
bits, iterate through each possible combination of these bits, normalize
the resulting system physical address, and compare to the original
address that is being translated. If the addresses match, then the
correct permutation of bits has been found.
Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20240606203313.51197-6-john.allen@amd.com
|
|
The ret_addr field in struct addr_ctx contains the intermediate value of
the returned address as it passes through multiple steps in the
translation process. Currently, adding the DRAM base and legacy hole
is only done once, so it operates directly on the intermediate value.
However, for DF 4.5 non-power-of-2 denormalization, adding and removing
the DRAM base and legacy hole needs to be done for multiple temporary
address values. During this process, the intermediate value should not be
lost so the ret_addr value can't be reused.
Update the existing 'add' helper to operate on an arbitrary address
and introduce a new 'remove' helper to do the inverse operations.
Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20240606203313.51197-4-john.allen@amd.com
|
|
Read DRAM hole base when constructing the address map as the value will
not change during run time.
Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20240606203313.51197-3-john.allen@amd.com
|
|
Prefix all AMD ATL pr_* statements with "amd_atl:".
Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240606203313.51197-2-john.allen@amd.com
|
|
Zen-based AMD systems report DRAM ECC errors through Unified Memory
Controller (UMC) MCA banks. The value provided in MCA_ADDR is
a "normalized" address which represents the UMC's view of its managed
memory. The normalized address must be translated to a system physical
address for software to take action.
MI300 systems, uniquely, do not provide a normalized address in MCA_ADDR
for DRAM ECC errors. Rather, the "DRAM" address is reported. This value
includes identifiers for the bank, row, column, pseudochannel and stack
of the memory location.
The DRAM address must be converted to a normalized address in order to
be further translated to a system physical address.
Add helper functions to do the DRAM to normalized translation for MI300
systems. The method is based on the fixed hardware layout of the on-chip
memory.
[ bp: Massage commit message, decapitalize some, rename function. ]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Co-developed-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Muralidhara M K <muralidhara.mk@amd.com>
Link: https://lore.kernel.org/r/20240131165732.88297-1-yazen.ghannam@amd.com
|
|
AMD MI300 systems include on-die HBM3 memory and a unique topology. And
they fall under Data Fabric version 4.5 in overall design.
Generally, topology information (IDs, etc.) is gathered from Data Fabric
registers. However, the unique topology for MI300 means that some
topology information is fixed in hardware and follows arbitrary
mappings. Furthermore, not all hardware instances are software-visible,
so register accesses must be adjusted.
Recognize and add helper functions for the new MI300 interleave modes.
Add lookup tables for fixed values where appropriate. Adjust how Die and
Node IDs are found and used.
Also, fix some register bitmasks that were mislabeled.
Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240128155950.1434067-1-yazen.ghannam@amd.com
|
|
AMD Zen-based systems report memory errors through Machine Check banks
representing Unified Memory Controllers (UMCs). The address value
reported for DRAM ECC errors is a "normalized address" that is relative
to the UMC. This normalized address must be converted to a system
physical address to be usable by the OS.
Support for this address translation was introduced to the MCA subsystem
with Zen1 systems. The code was later moved to the AMD64 EDAC module,
since this was the only user of the code at the time.
However, there are uses for this translation outside of EDAC. The system
physical address can be used in MCA for preemptive page offlining as done
in some MCA notifier functions. Also, this translation is needed as the
basis of similar functionality needed for some CXL configurations on AMD
systems.
Introduce a common address translation library that can be used for
multiple subsystems including MCA, EDAC, and CXL.
Include support for UMC normalized to system physical address
translation for current CPU systems.
The Data Fabric Indirect register access offsets and one of the register
fields were changed. Default to the current offsets and register field
definition. And fallback to the older values if running on a "legacy"
system.
Provide built-in code to facilitate the loading and unloading of the
library module without affecting other modules or built-in code.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240123041401.79812-2-yazen.ghannam@amd.com
|