<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-dev/drivers/acpi/apei, branch master</title>
<subtitle>Linux kernel development work - see feature branches</subtitle>
<id>https://git.zx2c4.com/linux-dev/atom/drivers/acpi/apei?h=master</id>
<link rel='self' href='https://git.zx2c4.com/linux-dev/atom/drivers/acpi/apei?h=master'/>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/'/>
<updated>2022-10-13T18:40:09Z</updated>
<entry>
<title>ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()</title>
<updated>2022-10-13T18:40:09Z</updated>
<author>
<name>Ashish Kalra</name>
<email>ashish.kalra@amd.com</email>
</author>
<published>2022-10-05T16:32:53Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=43d2748394c3feb86c0c771466f5847e274fc043'/>
<id>urn:sha1:43d2748394c3feb86c0c771466f5847e274fc043</id>
<content type='text'>
Change num_ghes from int to unsigned int, preventing an overflow
and causing subsequent vmalloc() to fail.

The overflow happens in ghes_estatus_pool_init() when calculating
len during execution of the statement below as both multiplication
operands here are signed int:

len += (num_ghes * GHES_ESOURCE_PREALLOC_MAX_SIZE);

The following call trace is observed because of this bug:

[    9.317108] swapper/0: vmalloc error: size 18446744071562596352, exceeds total pages, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0-1
[    9.317131] Call Trace:
[    9.317134]  &lt;TASK&gt;
[    9.317137]  dump_stack_lvl+0x49/0x5f
[    9.317145]  dump_stack+0x10/0x12
[    9.317146]  warn_alloc.cold+0x7b/0xdf
[    9.317150]  ? __device_attach+0x16a/0x1b0
[    9.317155]  __vmalloc_node_range+0x702/0x740
[    9.317160]  ? device_add+0x17f/0x920
[    9.317164]  ? dev_set_name+0x53/0x70
[    9.317166]  ? platform_device_add+0xf9/0x240
[    9.317168]  __vmalloc_node+0x49/0x50
[    9.317170]  ? ghes_estatus_pool_init+0x43/0xa0
[    9.317176]  vmalloc+0x21/0x30
[    9.317177]  ghes_estatus_pool_init+0x43/0xa0
[    9.317179]  acpi_hest_init+0x129/0x19c
[    9.317185]  acpi_init+0x434/0x4a4
[    9.317188]  ? acpi_sleep_proc_init+0x2a/0x2a
[    9.317190]  do_one_initcall+0x48/0x200
[    9.317195]  kernel_init_freeable+0x221/0x284
[    9.317200]  ? rest_init+0xe0/0xe0
[    9.317204]  kernel_init+0x1a/0x130
[    9.317205]  ret_from_fork+0x22/0x30
[    9.317208]  &lt;/TASK&gt;

Signed-off-by: Ashish Kalra &lt;ashish.kalra@amd.com&gt;
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>ACPI: APEI: do not add task_work to kernel thread to avoid memory leak</title>
<updated>2022-10-04T14:04:41Z</updated>
<author>
<name>Shuai Xue</name>
<email>xueshuai@linux.alibaba.com</email>
</author>
<published>2022-09-24T07:49:53Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=415fed694fe11395df56e05022d6e7cee1d39dd3'/>
<id>urn:sha1:415fed694fe11395df56e05022d6e7cee1d39dd3</id>
<content type='text'>
If an error is detected as a result of user-space process accessing a
corrupt memory location, the CPU may take an abort. Then the platform
firmware reports kernel via NMI like notifications, e.g. NOTIFY_SEA,
NOTIFY_SOFTWARE_DELEGATED, etc.

For NMI like notifications, commit 7f17b4a121d0 ("ACPI: APEI: Kick the
memory_failure() queue for synchronous errors") keep track of whether
memory_failure() work was queued, and make task_work pending to flush out
the queue so that the work is processed before return to user-space.

The code use init_mm to check whether the error occurs in user space:

    if (current-&gt;mm != &amp;init_mm)

The condition is always true, becase _nobody_ ever has "init_mm" as a real
VM any more.

In addition to abort, errors can also be signaled as asynchronous
exceptions, such as interrupt and SError. In such case, the interrupted
current process could be any kind of thread. When a kernel thread is
interrupted, the work ghes_kick_task_work deferred to task_work will never
be processed because entry_handler returns to call ret_to_kernel() instead
of ret_to_user(). Consequently, the estatus_node alloced from
ghes_estatus_pool in ghes_in_nmi_queue_one_entry() will not be freed.
After around 200 allocations in our platform, the ghes_estatus_pool will
run of memory and ghes_in_nmi_queue_one_entry() returns ENOMEM. As a
result, the event failed to be processed.

    sdei: event 805 on CPU 113 failed with error: -2

Finally, a lot of unhandled events may cause platform firmware to exceed
some threshold and reboot.

The condition should generally just do

    if (current-&gt;mm)

as described in active_mm.rst documentation.

Then if an asynchronous error is detected when a kernel thread is running,
(e.g. when detected by a background scrubber), do not add task_work to it
as the original patch intends to do.

Fixes: 7f17b4a121d0 ("ACPI: APEI: Kick the memory_failure() queue for synchronous errors")
Signed-off-by: Shuai Xue &lt;xueshuai@linux.alibaba.com&gt;
Reviewed-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>ACPI: APEI: Remove unneeded result variables</title>
<updated>2022-09-24T16:50:42Z</updated>
<author>
<name>ye xingchen</name>
<email>ye.xingchen@zte.com.cn</email>
</author>
<published>2022-09-21T09:28:34Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=382c5fec89f3b0ce870a17f028c547a9f95b7834'/>
<id>urn:sha1:382c5fec89f3b0ce870a17f028c547a9f95b7834</id>
<content type='text'>
Return the erst_get_record_id_begin() and apei_exec_write_register()
return values directly instead of storing them in redundant local
variables.

Reported-by: Zeal Robot &lt;zealci@zte.com.cn&gt;
Signed-off-by: ye xingchen &lt;ye.xingchen@zte.com.cn&gt;
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>ACPI: APEI: Add BERT error log footer</title>
<updated>2022-09-03T18:25:39Z</updated>
<author>
<name>Dmitry Monakhov</name>
<email>dmtrmonakhov@yandex-team.ru</email>
</author>
<published>2022-08-24T13:01:13Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=37096428962d125591144876db6ae037096e39ed'/>
<id>urn:sha1:37096428962d125591144876db6ae037096e39ed</id>
<content type='text'>
Print total number of records found during BERT log parsing.
This also simplify dmesg parser implementation for BERT events.

Signed-off-by: Dmitry Monakhov &lt;dmtrmonakhov@yandex-team.ru&gt;
Acked-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>ACPI: APEI: Fix _EINJ vs EFI_MEMORY_SP</title>
<updated>2022-06-29T18:24:19Z</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2022-06-24T23:05:26Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=b13a3e5fd40b7d1b394c5ecbb5eb301a4c38e7b2'/>
<id>urn:sha1:b13a3e5fd40b7d1b394c5ecbb5eb301a4c38e7b2</id>
<content type='text'>
When a platform marks a memory range as "special purpose" it is not
onlined as System RAM by default. However, it is still suitable for
error injection. Add IORES_DESC_SOFT_RESERVED to einj_error_inject() as
a permissible memory type in the sanity checking of the arguments to
_EINJ.

Fixes: 262b45ae3ab4 ("x86/efi: EFI soft reservation to E820 enumeration")
Reviewed-by: Tony Luck &lt;tony.luck@intel.com&gt;
Reported-by: Omar Avelar &lt;omar.avelar@intel.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>ACPI: APEI: Better fix to avoid spamming the console with old error logs</title>
<updated>2022-06-29T17:55:11Z</updated>
<author>
<name>Tony Luck</name>
<email>tony.luck@intel.com</email>
</author>
<published>2022-06-22T17:09:06Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=c3481b6b75b4797657838f44028fd28226ab48e0'/>
<id>urn:sha1:c3481b6b75b4797657838f44028fd28226ab48e0</id>
<content type='text'>
The fix in commit 3f8dec116210 ("ACPI/APEI: Limit printable size of BERT
table data") does not work as intended on systems where the BIOS has a
fixed size block of memory for the BERT table, relying on s/w to quit
when it finds a record with estatus-&gt;block_status == 0. On these systems
all errors are suppressed because the check:

	if (region_len &lt; ACPI_BERT_PRINT_MAX_LEN)

always fails.

New scheme skips individual CPER records that are too large, and also
limits the total number of records that will be printed to 5.

Fixes: 3f8dec116210 ("ACPI/APEI: Limit printable size of BERT table data")
Cc: All applicable &lt;stable@vger.kernel.org&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>ACPI: APEI: Fix double word in a comment</title>
<updated>2022-06-09T18:46:47Z</updated>
<author>
<name>Xiang wangx</name>
<email>wangxiang@cdjrlc.com</email>
</author>
<published>2022-06-04T04:12:53Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=55b350529e799f8a1a1636dea7df1749cb78ce39'/>
<id>urn:sha1:55b350529e799f8a1a1636dea7df1749cb78ce39</id>
<content type='text'>
Delete the redundant word 'the'.

Signed-off-by: Xiang wangx &lt;wangxiang@cdjrlc.com&gt;
[ rjw: New subject ]
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>ACPI, APEI, EINJ: Refuse to inject into the zero page</title>
<updated>2022-04-22T14:52:27Z</updated>
<author>
<name>Tony Luck</name>
<email>tony.luck@intel.com</email>
</author>
<published>2022-04-19T21:19:21Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=ab59c89396c007c360b1a4d762732d1621ff5456'/>
<id>urn:sha1:ab59c89396c007c360b1a4d762732d1621ff5456</id>
<content type='text'>
Some validation tests dynamically inject errors into memory used by
applications to check that the system can recover from a variety of
poison consumption sceenarios.

But sometimes the virtual address picked by these tests is mapped to
the zero page.

This causes additional unexpected machine checks as other processes that
map the zero page also consume the poison.

Disallow injection to the zero page.

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>ACPI: APEI: Fix missing ERST record id</title>
<updated>2022-04-13T18:29:24Z</updated>
<author>
<name>Liu Xinpeng</name>
<email>liuxp11@chinatelecom.cn</email>
</author>
<published>2022-04-12T03:05:19Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=a090931524d03a4975c84b89a32210420d852313'/>
<id>urn:sha1:a090931524d03a4975c84b89a32210420d852313</id>
<content type='text'>
Read a record is cleared by others, but the deleted record cache entry is
still created by erst_get_record_id_next. When next enumerate the records,
get the cached deleted record, then erst_read() return -ENOENT and try to
get next record, loop back to first ID will return 0 in function
__erst_record_id_cache_add_one and then set record_id as
APEI_ERST_INVALID_RECORD_ID, finished this time read operation.
It will result in read the records just in the cache hereafter.

This patch cleared the deleted record cache, fix the issue that
"./erst-inject -p" shows record counts not equal to "./erst-inject -n".

A reproducer of the problem(retry many times):

[root@localhost erst-inject]# ./erst-inject -c 0xaaaaa00011
[root@localhost erst-inject]# ./erst-inject -p
rc: 273
rcd sig: CPER
rcd id: 0xaaaaa00012
rc: 273
rcd sig: CPER
rcd id: 0xaaaaa00013
rc: 273
rcd sig: CPER
rcd id: 0xaaaaa00014
[root@localhost erst-inject]# ./erst-inject -i 0xaaaaa000006
[root@localhost erst-inject]# ./erst-inject -i 0xaaaaa000007
[root@localhost erst-inject]# ./erst-inject -i 0xaaaaa000008
[root@localhost erst-inject]# ./erst-inject -p
rc: 273
rcd sig: CPER
rcd id: 0xaaaaa00012
rc: 273
rcd sig: CPER
rcd id: 0xaaaaa00013
rc: 273
rcd sig: CPER
rcd id: 0xaaaaa00014
[root@localhost erst-inject]# ./erst-inject -n
total error record count: 6

Signed-off-by: Liu Xinpeng &lt;liuxp11@chinatelecom.cn&gt;
Reviewed-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>ACPI, APEI: Use the correct variable for sizeof()</title>
<updated>2022-03-22T17:52:21Z</updated>
<author>
<name>Jakob Koschel</name>
<email>jakobkoschel@gmail.com</email>
</author>
<published>2022-03-19T21:54:47Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=fa3416509605621b8db263b0ee9aaf85285aee4b'/>
<id>urn:sha1:fa3416509605621b8db263b0ee9aaf85285aee4b</id>
<content type='text'>
While the original code is valid, it is not the obvious choice for the
sizeof() call and in preparation to limit the scope of the list iterator
variable the sizeof should be changed to the size of the variable
being allocated.

Signed-off-by: Jakob Koschel &lt;jakobkoschel@gmail.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
</feed>
