<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-dev/arch/x86/kernel/cpu/mce, branch master</title>
<subtitle>Linux kernel development work - see feature branches</subtitle>
<id>https://git.zx2c4.com/linux-dev/atom/arch/x86/kernel/cpu/mce?h=master</id>
<link rel='self' href='https://git.zx2c4.com/linux-dev/atom/arch/x86/kernel/cpu/mce?h=master'/>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/'/>
<updated>2022-08-29T07:33:42Z</updated>
<entry>
<title>x86/mce: Retrieve poison range from hardware</title>
<updated>2022-08-29T07:33:42Z</updated>
<author>
<name>Jane Chu</name>
<email>jane.chu@oracle.com</email>
</author>
<published>2022-08-26T23:38:51Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=f9781bb18ed828e7b83b7bac4a4ad7cd497ee7d7'/>
<id>urn:sha1:f9781bb18ed828e7b83b7bac4a4ad7cd497ee7d7</id>
<content type='text'>
When memory poison consumption machine checks fire, MCE notifier
handlers like nfit_handle_mce() record the impacted physical address
range which is reported by the hardware in the MCi_MISC MSR. The error
information includes data about blast radius, i.e. how many cachelines
did the hardware determine are impacted. A recent change

  7917f9cdb503 ("acpi/nfit: rely on mce-&gt;misc to determine poison granularity")

updated nfit_handle_mce() to stop hard coding the blast radius value of
1 cacheline, and instead rely on the blast radius reported in 'struct
mce' which can be up to 4K (64 cachelines).

It turns out that apei_mce_report_mem_error() had a similar problem in
that it hard coded a blast radius of 4K rather than reading the blast
radius from the error information. Fix apei_mce_report_mem_error() to
convey the proper poison granularity.

Signed-off-by: Jane Chu &lt;jane.chu@oracle.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8502@oracle.com
Link: https://lore.kernel.org/r/20220826233851.1319100-1-jane.chu@oracle.com
</content>
</entry>
<entry>
<title>x86/mce: Check whether writes to MCA_STATUS are getting ignored</title>
<updated>2022-06-28T10:08:10Z</updated>
<author>
<name>Smita Koralahalli</name>
<email>Smita.KoralahalliChannabasappa@amd.com</email>
</author>
<published>2022-06-27T20:56:46Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=891e465a1bd8798d5f97c3afb99393f123817fef'/>
<id>urn:sha1:891e465a1bd8798d5f97c3afb99393f123817fef</id>
<content type='text'>
The platform can sometimes - depending on its settings - cause writes
to MCA_STATUS MSRs to get ignored, regardless of HWCR[McStatusWrEn]'s
value.

For further info see

  PPR for AMD Family 19h, Model 01h, Revision B1 Processors, doc ID 55898

at https://bugzilla.kernel.org/show_bug.cgi?id=206537.

Therefore, probe for ignored writes to MCA_STATUS to determine if hardware
error injection is at all possible.

  [ bp: Heavily massage commit message and patch. ]

Signed-off-by: Smita Koralahalli &lt;Smita.KoralahalliChannabasappa@amd.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Link: https://lore.kernel.org/r/20220214233640.70510-2-Smita.KoralahalliChannabasappa@amd.com
</content>
</entry>
<entry>
<title>Merge tag 'libnvdimm-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm</title>
<updated>2022-05-27T22:49:30Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-05-27T22:49:30Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=35cdd8656eac470b9abc9de8d4bd268fbc0fb34b'/>
<id>urn:sha1:35cdd8656eac470b9abc9de8d4bd268fbc0fb34b</id>
<content type='text'>
Pull libnvdimm and DAX updates from Dan Williams:
 "New support for clearing memory errors when a file is in DAX mode,
  alongside with some other fixes and cleanups.

  Previously it was only possible to clear these errors using a truncate
  or hole-punch operation to trigger the filesystem to reallocate the
  block, now, any page aligned write can opportunistically clear errors
  as well.

  This change spans x86/mm, nvdimm, and fs/dax, and has received the
  appropriate sign-offs. Thanks to Jane for her work on this.

  Summary:

   - Add support for clearing memory error via pwrite(2) on DAX

   - Fix 'security overwrite' support in the presence of media errors

   - Miscellaneous cleanups and fixes for nfit_test (nvdimm unit tests)"

* tag 'libnvdimm-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  pmem: implement pmem_recovery_write()
  pmem: refactor pmem_clear_poison()
  dax: add .recovery_write dax_operation
  dax: introduce DAX_RECOVERY_WRITE dax access mode
  mce: fix set_mce_nospec to always unmap the whole page
  x86/mce: relocate set{clear}_mce_nospec() functions
  acpi/nfit: rely on mce-&gt;misc to determine poison granularity
  testing: nvdimm: asm/mce.h is not needed in nfit.c
  testing: nvdimm: iomap: make __nfit_test_ioremap a macro
  nvdimm: Allow overwrite in the presence of disabled dimms
  tools/testing/nvdimm: remove unneeded flush_workqueue
</content>
</entry>
<entry>
<title>Merge tag 'acpi-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm</title>
<updated>2022-05-24T22:46:55Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-05-24T22:46:55Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=1961b06c9126e5b2b949fab806c4e4304d1eae8b'/>
<id>urn:sha1:1961b06c9126e5b2b949fab806c4e4304d1eae8b</id>
<content type='text'>
Pull ACPI updates from Rafael Wysocki:
 "These update the ACPICA kernel code to upstream revision 20220331,
  improve handling of PCI devices that are in D3cold during system
  initialization, add support for a few features, fix bugs and clean up
  code.

  Specifics:

   - Update ACPICA code in the kernel to upstream revision 20220331
     including the following changes:
       - Add support for the Windows 11 _OSI string (Mario Limonciello)
       - Add the CFMWS subtable to the CEDT table (Lawrence Hileman).
       - iASL: NHLT: Treat Terminator as specific_config (Piotr
         Maziarz).
       - iASL: NHLT: Fix parsing undocumented bytes at the end of
         Endpoint Descriptor (Piotr Maziarz).
       - iASL: NHLT: Rename linux specific strucures to device_info
         (Piotr Maziarz).
       - Add new ACPI 6.4 semantics to Load() and LoadTable() (Bob
         Moore).
       - Clean up double word in comment (Tom Rix).
       - Update copyright notices to the year 2022 (Bob Moore).
       - Remove some tabs and // comments - automated cleanup (Bob
         Moore).
       - Replace zero-length array with flexible-array member (Gustavo
         A. R. Silva).
       - Interpreter: Add units to time variable names (Paul Menzel).
       - Add support for ARM Performance Monitoring Unit Table (Besar
         Wicaksono).
       - Inform users about ACPI spec violation related to sleep length
         (Paul Menzel).
       - iASL/MADT: Add OEM-defined subtable (Bob Moore).
       - Interpreter: Fix some typo mistakes (Selvarasu Ganesan).
       - Updates for revision E.d of IORT (Shameer Kolothum).
       - Use ACPI_FORMAT_UINT64 for 64-bit output (Bob Moore).

   - Improve debug messages in the ACPI device PM code (Rafael Wysocki).

   - Block ASUS B1400CEAE from suspend to idle by default (Mario
     Limonciello).

   - Improve handling of PCI devices that are in D3cold during system
     initialization (Rafael Wysocki).

   - Fix BERT error region memory mapping (Lorenzo Pieralisi).

   - Add support for NVIDIA 16550-compatible port subtype to the SPCR
     parsing code (Jeff Brasen).

   - Use static for BGRT_SHOW kobj_attribute defines (Tom Rix).

   - Fix missing prototype warning for acpi_agdi_init() (Ilkka
     Koskinen).

   - Fix missing ERST record ID in the APEI code (Liu Xinpeng).

   - Make APEI error injection to refuse to inject into the zero page
     (Tony Luck).

   - Correct description of INT3407 / INT3532 DPTF attributes in sysfs
     (Sumeet Pawnikar).

   - Add support for high frequency impedance notification to the DPTF
     driver (Sumeet Pawnikar).

   - Make mp_config_acpi_gsi() a void function (Li kunyu).

   - Unify Package () representation for properties in the ACPI device
     properties documentation (Andy Shevchenko).

   - Include UUID in _DSM evaluation warning (Michael Niewöhner)"

* tag 'acpi-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (41 commits)
  Revert "ACPICA: executer/exsystem: Warn about sleeps greater than 10 ms"
  ACPI: utils: include UUID in _DSM evaluation warning
  ACPI: PM: Block ASUS B1400CEAE from suspend to idle by default
  x86: ACPI: Make mp_config_acpi_gsi() a void function
  ACPI: DPTF: Add support for high frequency impedance notification
  ACPI: AGDI: Fix missing prototype warning for acpi_agdi_init()
  ACPI: bus: Avoid non-ACPI device objects in walks over children
  ACPI: DPTF: Correct description of INT3407 / INT3532 attributes
  ACPI: BGRT: use static for BGRT_SHOW kobj_attribute defines
  ACPI, APEI, EINJ: Refuse to inject into the zero page
  ACPI: PM: Always print final debug message in acpi_device_set_power()
  ACPI: SPCR: Add support for NVIDIA 16550-compatible port subtype
  ACPI: docs: enumeration: Unify Package () for properties (part 2)
  ACPI: APEI: Fix missing ERST record id
  ACPICA: Update version to 20220331
  ACPICA: exsystem.c: Use ACPI_FORMAT_UINT64 for 64-bit output
  ACPICA: IORT: Updates for revision E.d
  ACPICA: executer/exsystem: Fix some typo mistakes
  ACPICA: iASL/MADT: Add OEM-defined subtable
  ACPICA: executer/exsystem: Warn about sleeps greater than 10 ms
  ...
</content>
</entry>
<entry>
<title>mce: fix set_mce_nospec to always unmap the whole page</title>
<updated>2022-05-16T18:46:44Z</updated>
<author>
<name>Jane Chu</name>
<email>jane.chu@oracle.com</email>
</author>
<published>2022-05-16T18:38:10Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=5898b43af954b83c4a4ee4ab85c4dbafa395822a'/>
<id>urn:sha1:5898b43af954b83c4a4ee4ab85c4dbafa395822a</id>
<content type='text'>
The set_memory_uc() approach doesn't work well in all cases.
As Dan pointed out when "The VMM unmapped the bad page from
guest physical space and passed the machine check to the guest."
"The guest gets virtual #MC on an access to that page. When
the guest tries to do set_memory_uc() and instructs cpa_flush()
to do clean caches that results in taking another fault / exception
perhaps because the VMM unmapped the page from the guest."

Since the driver has special knowledge to handle NP or UC,
mark the poisoned page with NP and let driver handle it when
it comes down to repair.

Please refer to discussions here for more details.
https://lore.kernel.org/all/CAPcyv4hrXPb1tASBZUg-GgdVs0OOFKXMXLiHmktg_kFi7YBMyQ@mail.gmail.com/

Now since poisoned page is marked as not-present, in order to
avoid writing to a not-present page and trigger kernel Oops,
also fix pmem_do_write().

Fixes: 284ce4011ba6 ("x86/memory_failure: Introduce {set, clear}_mce_nospec()")
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Jane Chu &lt;jane.chu@oracle.com&gt;
Acked-by: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lore.kernel.org/r/165272615484.103830.2563950688772226611.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>x86/mce: Add messages for panic errors in AMD's MCE grading</title>
<updated>2022-04-25T10:40:48Z</updated>
<author>
<name>Carlos Bilbao</name>
<email>carlos.bilbao@amd.com</email>
</author>
<published>2022-04-05T18:32:14Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=fa619f5156cf1ee3773edc6d756be262c9ef35de'/>
<id>urn:sha1:fa619f5156cf1ee3773edc6d756be262c9ef35de</id>
<content type='text'>
When a machine error is graded as PANIC by the AMD grading logic, the
MCE handler calls mce_panic(). The notification chain does not come
into effect so the AMD EDAC driver does not decode the errors. In these
cases, the messages displayed to the user are more cryptic and miss
information that might be relevant, like the context in which the error
took place.

Add messages to the grading logic for machine errors so that it is clear
what error it was.

  [ bp: Massage commit message. ]

Signed-off-by: Carlos Bilbao &lt;carlos.bilbao@amd.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Link: https://lore.kernel.org/r/20220405183212.354606-3-carlos.bilbao@amd.com
</content>
</entry>
<entry>
<title>x86/mce: Simplify AMD severity grading logic</title>
<updated>2022-04-25T10:32:03Z</updated>
<author>
<name>Carlos Bilbao</name>
<email>carlos.bilbao@amd.com</email>
</author>
<published>2022-04-05T18:32:13Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=70c459d915e838b7f536b8e26e0b3a6141bd2645'/>
<id>urn:sha1:70c459d915e838b7f536b8e26e0b3a6141bd2645</id>
<content type='text'>
The MCE handler needs to understand the severity of the machine errors to
act accordingly. Simplify the AMD grading logic following a logic that
closely resembles the descriptions of the public PPR documents. This will
help include more fine-grained grading of errors in the future.

  [ bp: Touchups. ]

Signed-off-by: Carlos Bilbao &lt;carlos.bilbao@amd.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Link: https://lore.kernel.org/r/20220405183212.354606-2-carlos.bilbao@amd.com
</content>
</entry>
<entry>
<title>ACPI: APEI: Fix missing ERST record id</title>
<updated>2022-04-13T18:29:24Z</updated>
<author>
<name>Liu Xinpeng</name>
<email>liuxp11@chinatelecom.cn</email>
</author>
<published>2022-04-12T03:05:19Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=a090931524d03a4975c84b89a32210420d852313'/>
<id>urn:sha1:a090931524d03a4975c84b89a32210420d852313</id>
<content type='text'>
Read a record is cleared by others, but the deleted record cache entry is
still created by erst_get_record_id_next. When next enumerate the records,
get the cached deleted record, then erst_read() return -ENOENT and try to
get next record, loop back to first ID will return 0 in function
__erst_record_id_cache_add_one and then set record_id as
APEI_ERST_INVALID_RECORD_ID, finished this time read operation.
It will result in read the records just in the cache hereafter.

This patch cleared the deleted record cache, fix the issue that
"./erst-inject -p" shows record counts not equal to "./erst-inject -n".

A reproducer of the problem(retry many times):

[root@localhost erst-inject]# ./erst-inject -c 0xaaaaa00011
[root@localhost erst-inject]# ./erst-inject -p
rc: 273
rcd sig: CPER
rcd id: 0xaaaaa00012
rc: 273
rcd sig: CPER
rcd id: 0xaaaaa00013
rc: 273
rcd sig: CPER
rcd id: 0xaaaaa00014
[root@localhost erst-inject]# ./erst-inject -i 0xaaaaa000006
[root@localhost erst-inject]# ./erst-inject -i 0xaaaaa000007
[root@localhost erst-inject]# ./erst-inject -i 0xaaaaa000008
[root@localhost erst-inject]# ./erst-inject -p
rc: 273
rcd sig: CPER
rcd id: 0xaaaaa00012
rc: 273
rcd sig: CPER
rcd id: 0xaaaaa00013
rc: 273
rcd sig: CPER
rcd id: 0xaaaaa00014
[root@localhost erst-inject]# ./erst-inject -n
total error record count: 6

Signed-off-by: Liu Xinpeng &lt;liuxp11@chinatelecom.cn&gt;
Reviewed-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>x86/MCE/AMD: Fix memory leak when threshold_create_bank() fails</title>
<updated>2022-04-05T19:24:37Z</updated>
<author>
<name>Ammar Faizi</name>
<email>ammarfaizi2@gnuweeb.org</email>
</author>
<published>2022-03-29T10:47:05Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=e5f28623ceb103e13fc3d7bd45edf9818b227fd0'/>
<id>urn:sha1:e5f28623ceb103e13fc3d7bd45edf9818b227fd0</id>
<content type='text'>
In mce_threshold_create_device(), if threshold_create_bank() fails, the
previously allocated threshold banks array @bp will be leaked because
the call to mce_threshold_remove_device() will not free it.

This happens because mce_threshold_remove_device() fetches the pointer
through the threshold_banks per-CPU variable but bp is written there
only after the bank creation is successful, and not before, when
threshold_create_bank() fails.

Add a helper which unwinds all the bank creation work previously done
and pass into it the previously allocated threshold banks array for
freeing.

  [ bp: Massage. ]

Fixes: 6458de97fc15 ("x86/mce/amd: Straighten CPU hotplug path")
Co-developed-by: Alviro Iskandar Setiawan &lt;alviro.iskandar@gnuweeb.org&gt;
Signed-off-by: Alviro Iskandar Setiawan &lt;alviro.iskandar@gnuweeb.org&gt;
Co-developed-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Signed-off-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Signed-off-by: Ammar Faizi &lt;ammarfaizi2@gnuweeb.org&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Link: https://lore.kernel.org/r/20220329104705.65256-3-ammarfaizi2@gnuweeb.org
</content>
</entry>
<entry>
<title>x86/mce: Avoid unnecessary padding in struct mce_bank</title>
<updated>2022-04-05T19:23:34Z</updated>
<author>
<name>Smita Koralahalli</name>
<email>Smita.KoralahalliChannabasappa@amd.com</email>
</author>
<published>2022-02-25T19:33:40Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=9f1b19b977ee3cbd3fe9135ff63dbf221eac1d6a'/>
<id>urn:sha1:9f1b19b977ee3cbd3fe9135ff63dbf221eac1d6a</id>
<content type='text'>
Convert struct mce_bank member "init" from bool to a bitfield to get rid
of unnecessary padding.

$ pahole -C mce_bank arch/x86/kernel/cpu/mce/core.o

before:

  /* size: 16, cachelines: 1, members: 2 */
  /* padding: 7 */
  /* last cacheline: 16 bytes */

after:

  /* size: 16, cachelines: 1, members: 3 */
  /* last cacheline: 16 bytes */

No functional changes.

Signed-off-by: Smita Koralahalli &lt;Smita.KoralahalliChannabasappa@amd.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Link: https://lore.kernel.org/r/20220225193342.215780-2-Smita.KoralahalliChannabasappa@amd.com
</content>
</entry>
</feed>
