Age | Commit message (Collapse) | Author | Files | Lines |
|
When kobject_init_and_add() fails during cache creation,
kobj->name can be leaked because SLUB does not call kobject_put(),
which should be invoked per the kobject API documentation.
This has a bit of historical context, though; SLUB does not call
kobject_put() to avoid double-free for struct kmem_cache because
1) simply calling it would free all resources related to the cache, and
2) struct kmem_cache descriptor is always freed by cache_cache()'s
error handling path, causing struct kmem_cache to be freed twice.
This issue can be reproduced by creating new slab caches while applying
failslab for kernfs_node_cache. This makes kobject_add_varg() succeed,
but causes kobject_add_internal() to fail in kobject_init_and_add()
during cache creation.
Historically, this issue has attracted developers' attention several times.
Each time a fix addressed either the leak or the double-free,
it caused the other issue. Let's summarize a bit of history here:
The leak has existed since the early days of SLUB.
Commit 54b6a731025f ("slub: fix leak of 'name' in sysfs_slab_add")
introduced a double-free bug while fixing the leak.
Commit 80da026a8e5d ("mm/slub: fix slab double-free in case of duplicate
sysfs filename") re-introduced the leak while fixing the double-free
error.
Commit dde3c6b72a16 ("mm/slub: fix a memory leak in sysfs_slab_add()")
fixed the memory leak, but it was later reverted by commit 757fed1d0898
("Revert "mm/slub: fix a memory leak in sysfs_slab_add()"") to avoid
the double-free error.
This is where we are now: we've chosen a memory leak over a double-free.
To resolve this memory leak, skip creating sysfs files if it fails
and continue with cache creation regardless (as suggested by Christoph).
This resolves the memory leak because both the cache and the kobject
remain alive on kobject_init_and_add() failure.
If SLUB tries to create an alias for a cache without sysfs files,
its symbolic link will not be generated.
Since a slab cache might not have associated sysfs files, call kobject_del()
only if such files exist.
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Boot with slub_debug=UFPZ.
If allocated object failed in alloc_consistency_checks, all objects of
the slab will be marked as used, and then the slab will be removed from
the partial list.
When an object belonging to the slab got freed later, the remove_full()
function is called. Because the slab is neither on the partial list nor
on the full list, it eventually lead to a list corruption (actually a
list poison being detected).
So we need to mark and isolate the slab page with metadata corruption,
do not put it back in circulation.
Because the debug caches avoid all the fastpaths, reusing the frozen bit
to mark slab page with metadata corruption seems to be fine.
[ 4277.385669] list_del corruption, ffffea00044b3e50->next is LIST_POISON1 (dead000000000100)
[ 4277.387023] ------------[ cut here ]------------
[ 4277.387880] kernel BUG at lib/list_debug.c:56!
[ 4277.388680] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[ 4277.389562] CPU: 5 PID: 90 Comm: kworker/5:1 Kdump: loaded Tainted: G OE 6.6.1-1 #1
[ 4277.392113] Workqueue: xfs-inodegc/vda1 xfs_inodegc_worker [xfs]
[ 4277.393551] RIP: 0010:__list_del_entry_valid_or_report+0x7b/0xc0
[ 4277.394518] Code: 48 91 82 e8 37 f9 9a ff 0f 0b 48 89 fe 48 c7 c7 28 49 91 82 e8 26 f9 9a ff 0f 0b 48 89 fe 48 c7 c7 58 49 91
[ 4277.397292] RSP: 0018:ffffc90000333b38 EFLAGS: 00010082
[ 4277.398202] RAX: 000000000000004e RBX: ffffea00044b3e50 RCX: 0000000000000000
[ 4277.399340] RDX: 0000000000000002 RSI: ffffffff828f8715 RDI: 00000000ffffffff
[ 4277.400545] RBP: ffffea00044b3e40 R08: 0000000000000000 R09: ffffc900003339f0
[ 4277.401710] R10: 0000000000000003 R11: ffffffff82d44088 R12: ffff888112cf9910
[ 4277.402887] R13: 0000000000000001 R14: 0000000000000001 R15: ffff8881000424c0
[ 4277.404049] FS: 0000000000000000(0000) GS:ffff88842fd40000(0000) knlGS:0000000000000000
[ 4277.405357] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4277.406389] CR2: 00007f2ad0b24000 CR3: 0000000102a3a006 CR4: 00000000007706e0
[ 4277.407589] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4277.408780] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4277.410000] PKRU: 55555554
[ 4277.410645] Call Trace:
[ 4277.411234] <TASK>
[ 4277.411777] ? die+0x32/0x80
[ 4277.412439] ? do_trap+0xd6/0x100
[ 4277.413150] ? __list_del_entry_valid_or_report+0x7b/0xc0
[ 4277.414158] ? do_error_trap+0x6a/0x90
[ 4277.414948] ? __list_del_entry_valid_or_report+0x7b/0xc0
[ 4277.415915] ? exc_invalid_op+0x4c/0x60
[ 4277.416710] ? __list_del_entry_valid_or_report+0x7b/0xc0
[ 4277.417675] ? asm_exc_invalid_op+0x16/0x20
[ 4277.418482] ? __list_del_entry_valid_or_report+0x7b/0xc0
[ 4277.419466] ? __list_del_entry_valid_or_report+0x7b/0xc0
[ 4277.420410] free_to_partial_list+0x515/0x5e0
[ 4277.421242] ? xfs_iext_remove+0x41a/0xa10 [xfs]
[ 4277.422298] xfs_iext_remove+0x41a/0xa10 [xfs]
[ 4277.423316] ? xfs_inodegc_worker+0xb4/0x1a0 [xfs]
[ 4277.424383] xfs_bmap_del_extent_delay+0x4fe/0x7d0 [xfs]
[ 4277.425490] __xfs_bunmapi+0x50d/0x840 [xfs]
[ 4277.426445] xfs_itruncate_extents_flags+0x13a/0x490 [xfs]
[ 4277.427553] xfs_inactive_truncate+0xa3/0x120 [xfs]
[ 4277.428567] xfs_inactive+0x22d/0x290 [xfs]
[ 4277.429500] xfs_inodegc_worker+0xb4/0x1a0 [xfs]
[ 4277.430479] process_one_work+0x171/0x340
[ 4277.431227] worker_thread+0x277/0x390
[ 4277.431962] ? __pfx_worker_thread+0x10/0x10
[ 4277.432752] kthread+0xf0/0x120
[ 4277.433382] ? __pfx_kthread+0x10/0x10
[ 4277.434134] ret_from_fork+0x2d/0x50
[ 4277.434837] ? __pfx_kthread+0x10/0x10
[ 4277.435566] ret_from_fork_asm+0x1b/0x30
[ 4277.436280] </TASK>
Fixes: 643b113849d8 ("slub: enable tracking of full slabs")
Suggested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: yuan.gao <yuan.gao@ucloud.cn>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Danilo Krummrich raised issue about krealloc+GFP_ZERO [1], and Vlastimil
suggested to add some test case which can sanity test the kmalloc-redzone
and zeroing by utilizing the kmalloc's 'orig_size' debug feature.
It covers the grow and shrink case of krealloc() re-using current kmalloc
object, and the case of re-allocating a new bigger object.
[1]. https://lore.kernel.org/lkml/20240812223707.32049-1-dakr@kernel.org/
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
For current krealloc(), one problem is its caller doesn't pass the old
request size, say the object is 64 bytes kmalloc one, but caller may
only requested 48 bytes. Then when krealloc() shrinks or grows in the
same object, or allocate a new bigger object, it lacks this 'original
size' information to do accurate data preserving or zeroing (when
__GFP_ZERO is set).
Thus with slub debug redzone and object tracking enabled, parts of the
object after krealloc() might contain redzone data instead of zeroes,
which is violating the __GFP_ZERO guarantees. Good thing is in this
case, kmalloc caches do have this 'orig_size' feature. So solve the
problem by utilize 'org_size' to do accurate data zeroing and preserving.
[Thanks to syzbot and V, Narasimhan for discovering kfence and big
kmalloc related issues in early patch version]
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
When 'orig_size' of kmalloc object is enabled by debug option, it
should either contains the actual requested size or the cache's
'object_size'.
But it's not true if that object is a kfence-allocated one, and the
data at 'orig_size' offset of metadata could be zero or other values.
This is not a big issue for current 'orig_size' usage, as init_object()
and check_object() during alloc/free process will be skipped for kfence
addresses. But it could cause trouble for other usage in future.
Use the existing kfence helper kfence_ksize() which can return the
real original request size.
Signed-off-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
The old SLAB allocator used to support memory policies on a per
allocation bases. In SLUB the memory policies are applied on a
per page frame / folio bases. Doing so avoids having to check memory
policies in critical code paths for kmalloc and friends.
This worked on general well on Intel/AMD/PowerPC because the
interconnect technology is mature and can minimize the latencies
through intelligent caching even if a small object is not
placed optimally.
However, on ARM we have an emergence of new NUMA interconnect
technology based more on embedded devices. Caching of remote content
can currently be ineffective using the standard building blocks / mesh
available on that platform. Such architectures benefit if each slab
object is individually placed according to memory policies
and other restrictions.
This patch adds another kernel parameter
slab_strict_numa
If that is set then a static branch is activated that will cause
the hotpaths of the allocator to evaluate the current memory
allocation policy. Each object will be properly placed by
paying the price of extra processing and SLUB will no longer
defer to the page allocator to apply memory policies at the
folio level.
This patch improves performance of memcached running
on Ampere Altra 2P system (ARM Neoverse N1 processor)
by 3.6% due to accurate placement of small kernel objects.
Tested-by: Huang Shijie <shijie@os.amperecomputing.com>
Signed-off-by: Christoph Lameter (Ampere) <cl@gentwo.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
We have many SLAB_ flags but many are used only internally, by kunit
tests or debugging subsystems cooperating with slab, or are set
according to slab_debug boot parameter.
Create kerneldocs for the commonly used flags that may be passed to
kmem_cache_create(). SLAB_TYPESAFE_BY_RCU already had a detailed
description, so turn it to a kerneldoc. Add some details for
SLAB_ACCOUNT, SLAB_RECLAIM_ACCOUNT and SLAB_HWCACHE_ALIGN. Reference
them from the __kmem_cache_create_args() kerneldoc.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
The WARN_ON() check in static function create_cache() is done by its only
parent __kmem_cache_create_args() before calling it.
if (... ||
WARN_ON(... ||
object_size - args->usersize < args->useroffset))
args->usersize = args->useroffset = 0;
...
s = create_cache(cache_name, object_size, args, flags);
Therefore, the WARN_ON() check in create_cache() can be safely removed.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
This is a preparation for the following refactoring of krealloc(),
for more efficient function calling as it will call some internal
functions defined in slub.c.
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
For a kmalloc object, when both kasan and slub redzone sanity check
are enabled, they could both manipulate its data space like storing
kasan free meta data and setting up kmalloc redzone, and may affect
accuracy of that object's 'orig_size'.
As an accurate 'orig_size' will be needed by some function like
krealloc() soon, save kasan's free meta data in slub's metadata area
instead of inside object when 'orig_size' is enabled.
This will make it easier to maintain/understand the code. Size wise,
when these two options are both enabled, the slub meta data space is
already huge, and this just slightly increase the overall size.
Signed-off-by: Feng Tang <feng.tang@intel.com>
Acked-by: Andrey Konovalov <andreyknvl@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
This patch addresses an issue introduced by commit 1a83a716ec233 ("mm:
krealloc: consider spare memory for __GFP_ZERO") which causes MTE
(Memory Tagging Extension) to falsely report a slab-out-of-bounds error.
The problem occurs when zeroing out spare memory in __do_krealloc. The
original code only considered software-based KASAN and did not account
for MTE. It does not reset the KASAN tag before calling memset, leading
to a mismatch between the pointer tag and the memory tag, resulting
in a false positive.
Example of the error:
==================================================================
swapper/0: BUG: KASAN: slab-out-of-bounds in __memset+0x84/0x188
swapper/0: Write at addr f4ffff8005f0fdf0 by task swapper/0/1
swapper/0: Pointer tag: [f4], memory tag: [fe]
swapper/0:
swapper/0: CPU: 4 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.
swapper/0: Hardware name: MT6991(ENG) (DT)
swapper/0: Call trace:
swapper/0: dump_backtrace+0xfc/0x17c
swapper/0: show_stack+0x18/0x28
swapper/0: dump_stack_lvl+0x40/0xa0
swapper/0: print_report+0x1b8/0x71c
swapper/0: kasan_report+0xec/0x14c
swapper/0: __do_kernel_fault+0x60/0x29c
swapper/0: do_bad_area+0x30/0xdc
swapper/0: do_tag_check_fault+0x20/0x34
swapper/0: do_mem_abort+0x58/0x104
swapper/0: el1_abort+0x3c/0x5c
swapper/0: el1h_64_sync_handler+0x80/0xcc
swapper/0: el1h_64_sync+0x68/0x6c
swapper/0: __memset+0x84/0x188
swapper/0: btf_populate_kfunc_set+0x280/0x3d8
swapper/0: __register_btf_kfunc_id_set+0x43c/0x468
swapper/0: register_btf_kfunc_id_set+0x48/0x60
swapper/0: register_nf_nat_bpf+0x1c/0x40
swapper/0: nf_nat_init+0xc0/0x128
swapper/0: do_one_initcall+0x184/0x464
swapper/0: do_initcall_level+0xdc/0x1b0
swapper/0: do_initcalls+0x70/0xc0
swapper/0: do_basic_setup+0x1c/0x28
swapper/0: kernel_init_freeable+0x144/0x1b8
swapper/0: kernel_init+0x20/0x1a8
swapper/0: ret_from_fork+0x10/0x20
==================================================================
Fixes: 1a83a716ec233 ("mm: krealloc: consider spare memory for __GFP_ZERO")
Signed-off-by: Qun-Wei Lin <qun-wei.lin@mediatek.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
'modprobe slub_kunit' will have a warning as shown below. The root cause
is that __kmalloc_cache_noprof was directly used, which resulted in no
alloc_tag being allocated. This caused current->alloc_tag to be null,
leading to a warning in alloc_tag_add_check.
Let's add an alloc_hook layer to __kmalloc_cache_noprof specifically
within lib/slub_kunit.c, which is the only user of this internal slub
function outside kmalloc implementation itself.
[58162.947016] WARNING: CPU: 2 PID: 6210 at
./include/linux/alloc_tag.h:125 alloc_tagging_slab_alloc_hook+0x268/0x27c
[58162.957721] Call trace:
[58162.957919] alloc_tagging_slab_alloc_hook+0x268/0x27c
[58162.958286] __kmalloc_cache_noprof+0x14c/0x344
[58162.958615] test_kmalloc_redzone_access+0x50/0x10c [slub_kunit]
[58162.959045] kunit_try_run_case+0x74/0x184 [kunit]
[58162.959401] kunit_generic_run_threadfn_adapter+0x2c/0x4c [kunit]
[58162.959841] kthread+0x10c/0x118
[58162.960093] ret_from_fork+0x10/0x20
[58162.960363] ---[ end trace 0000000000000000 ]---
Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn>
Fixes: a0a44d9175b3 ("mm, slab: don't wrap internal functions with alloc_hooks()")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
|
|
The blank line causes execve() to fail:
# strace ./postinst
execve("./postinst", ...) = -1 ENOEXEC (Exec format error)
strace: exec: Exec format error
+++ exited with 1 +++
However running the scripts via shell does work (at least with bash)
because the shell attempts to execute the file as a shell script when
execve() fails.
Fixes: b611daae5efc ("kbuild: deb-pkg: split image and debug objects staging out into functions")
Signed-off-by: Aaron Thompson <dev@aaront.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
If we follow "make help" to "make dt_binding_schema", we will see
below error:
$ make dt_binding_schema
make[1]: *** No rule to make target 'dt_binding_schema'. Stop.
make: *** [Makefile:224: __sub-make] Error 2
It should be a typo. So this will fix it.
Fixes: 604a57ba9781 ("dt-bindings: kbuild: Add separate target/dependency for processed-schema.json")
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Import list_is_first, list_is_last, list_replace, and list_replace_init.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
x86_android_tablet_remove() frees the pdevs[] array, so it should not
be used after calling x86_android_tablet_remove().
When platform_device_register() fails, store the pdevs[x] PTR_ERR() value
into the local ret variable before calling x86_android_tablet_remove()
to avoid using pdevs[] after it has been freed.
Fixes: 5eba0141206e ("platform/x86: x86-android-tablets: Add support for instantiating platform-devs")
Fixes: e2200d3f26da ("platform/x86: x86-android-tablets: Add gpio_keys support to x86_android_tablet_init()")
Cc: stable@vger.kernel.org
Reported-by: Aleksandr Burakov <a.burakov@rosalinux.ru>
Closes: https://lore.kernel.org/platform-driver-x86/20240917120458.7300-1-a.burakov@rosalinux.ru/
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20241005130545.64136-1-hdegoede@redhat.com
|
|
The WMI driver core now passes the WMI event data to legacy notify
handlers, so WMI devices sharing notification IDs are now being
handled properly.
Fixes: e04e2b760ddb ("platform/x86: wmi: Pass event data directly to legacy notify handlers")
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20241005213825.701887-1-W_Armin@gmx.de
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
Fix typo in word 'diagnostics' in documentation.
Signed-off-by: Anaswara T Rajan <anaswaratrajan@gmail.com>
Reviewed-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20241005070056.16326-1-anaswaratrajan@gmail.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
Alienware supports firmware-attributes and has its own OEM string.
Signed-off-by: Crag Wang <crag_wang@dell.com>
Link: https://lore.kernel.org/r/20241004152826.93992-1-crag_wang@dell.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
Add Diamond Rapids (INTEL_PANTHERCOVE_X) to tpmi_cpu_ids to support
domaid id mappings.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20241003215554.3013807-3-srinivas.pandruvada@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
Add Diamond Rapids (INTEL_PANTHERCOVE_X) to SST support list by adding
to isst_cpu_ids.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20241003215554.3013807-2-srinivas.pandruvada@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
There have been multiple reports that the ACPI PM Timer disabling is
causing Sky and Kaby Lake systems to hang on all suspend (s2idle, s3,
hibernate) methods.
Remove the acpi_pm_tmr_ctl_offset and acpi_pm_tmr_disable_bit settings from
spt_reg_map to disable the ACPI PM Timer disabling on Sky and Kaby Lake to
fix the hang on suspend.
Fixes: e86c8186d03a ("platform/x86:intel/pmc: Enable the ACPI PM Timer to be turned off when suspended")
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Closes: https://lore.kernel.org/linux-pm/18784f62-91ff-4d88-9621-6c88eb0af2b5@molgen.mpg.de/
Reported-by: Todd Brandt <todd.e.brandt@intel.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219346
Cc: Marek Maslanka <mmaslanka@google.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: Todd Brandt <todd.e.brandt@intel.com>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> # Dell XPS 13 9360/0596KF
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20241003202614.17181-2-hdegoede@redhat.com
|
|
If the battery hook encounters a unsupported battery, it will
return an error. This in turn will cause the battery driver to
automatically unregister the battery hook.
On machines with multiple batteries however, this will prevent
the battery hook from handling the primary battery, since it will
always get unregistered upon encountering one of the unsupported
batteries.
Fix this by simply ignoring unsupported batteries.
Reviewed-by: Pali Rohár <pali@kernel.org>
Fixes: ab58016c68cc ("platform/x86:dell-laptop: Add knobs to change battery charge settings")
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20241001212835.341788-4-W_Armin@gmx.de
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
Ashok is no longer with Intel and his e-mail address will start bouncing
soon. Update his email address to the new one he provided to ensure
correct contact details in the MAINTAINERS file.
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Link: https://lore.kernel.org/r/20241001170808.203970-1-jithu.joseph@intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
Guard them with CONFIG_KVM_X86_COMMON rather than the two vendor modules.
In practice this has no functional change, because CONFIG_KVM_X86_COMMON
is set if and only if at least one vendor-specific module is being built.
However, it is cleaner to specify CONFIG_KVM_X86_COMMON for functions that
are used in kvm.ko.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: 590b09b1d88e ("KVM: x86: Register "emergency disable" callbacks when virt is enabled")
Fixes: 6d55a94222db ("x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
kvm.ko is nothing but library code shared by kvm-intel.ko and kvm-amd.ko.
It provides no functionality on its own and it is unnecessary unless one
of the vendor-specific module is compiled. In particular, /dev/kvm is
not created until one of kvm-intel.ko or kvm-amd.ko is loaded.
Use CONFIG_KVM to decide if it is built-in or a module, but use the
vendor-specific modules for the actual decision on whether to build it.
This also fixes a build failure when CONFIG_KVM_INTEL and CONFIG_KVM_AMD
are both disabled. The cpu_emergency_register_virt_callback() function
is called from kvm.ko, but it is only defined if at least one of
CONFIG_KVM_INTEL and CONFIG_KVM_AMD is provided.
Fixes: 590b09b1d88e ("KVM: x86: Register "emergency disable" callbacks when virt is enabled")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Attaching SST PCI device to VM causes "BUG: KASAN: slab-out-of-bounds".
kasan report:
[ 19.411889] ==================================================================
[ 19.413702] BUG: KASAN: slab-out-of-bounds in _isst_if_get_pci_dev+0x3d5/0x400 [isst_if_common]
[ 19.415634] Read of size 8 at addr ffff888829e65200 by task cpuhp/16/113
[ 19.417368]
[ 19.418627] CPU: 16 PID: 113 Comm: cpuhp/16 Tainted: G E 6.9.0 #10
[ 19.420435] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.20192059.B64.2207280713 07/28/2022
[ 19.422687] Call Trace:
[ 19.424091] <TASK>
[ 19.425448] dump_stack_lvl+0x5d/0x80
[ 19.426963] ? _isst_if_get_pci_dev+0x3d5/0x400 [isst_if_common]
[ 19.428694] print_report+0x19d/0x52e
[ 19.430206] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 19.431837] ? _isst_if_get_pci_dev+0x3d5/0x400 [isst_if_common]
[ 19.433539] kasan_report+0xf0/0x170
[ 19.435019] ? _isst_if_get_pci_dev+0x3d5/0x400 [isst_if_common]
[ 19.436709] _isst_if_get_pci_dev+0x3d5/0x400 [isst_if_common]
[ 19.438379] ? __pfx_sched_clock_cpu+0x10/0x10
[ 19.439910] isst_if_cpu_online+0x406/0x58f [isst_if_common]
[ 19.441573] ? __pfx_isst_if_cpu_online+0x10/0x10 [isst_if_common]
[ 19.443263] ? ttwu_queue_wakelist+0x2c1/0x360
[ 19.444797] cpuhp_invoke_callback+0x221/0xec0
[ 19.446337] cpuhp_thread_fun+0x21b/0x610
[ 19.447814] ? __pfx_cpuhp_thread_fun+0x10/0x10
[ 19.449354] smpboot_thread_fn+0x2e7/0x6e0
[ 19.450859] ? __pfx_smpboot_thread_fn+0x10/0x10
[ 19.452405] kthread+0x29c/0x350
[ 19.453817] ? __pfx_kthread+0x10/0x10
[ 19.455253] ret_from_fork+0x31/0x70
[ 19.456685] ? __pfx_kthread+0x10/0x10
[ 19.458114] ret_from_fork_asm+0x1a/0x30
[ 19.459573] </TASK>
[ 19.460853]
[ 19.462055] Allocated by task 1198:
[ 19.463410] kasan_save_stack+0x30/0x50
[ 19.464788] kasan_save_track+0x14/0x30
[ 19.466139] __kasan_kmalloc+0xaa/0xb0
[ 19.467465] __kmalloc+0x1cd/0x470
[ 19.468748] isst_if_cdev_register+0x1da/0x350 [isst_if_common]
[ 19.470233] isst_if_mbox_init+0x108/0xff0 [isst_if_mbox_msr]
[ 19.471670] do_one_initcall+0xa4/0x380
[ 19.472903] do_init_module+0x238/0x760
[ 19.474105] load_module+0x5239/0x6f00
[ 19.475285] init_module_from_file+0xd1/0x130
[ 19.476506] idempotent_init_module+0x23b/0x650
[ 19.477725] __x64_sys_finit_module+0xbe/0x130
[ 19.476506] idempotent_init_module+0x23b/0x650
[ 19.477725] __x64_sys_finit_module+0xbe/0x130
[ 19.478920] do_syscall_64+0x82/0x160
[ 19.480036] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 19.481292]
[ 19.482205] The buggy address belongs to the object at ffff888829e65000
which belongs to the cache kmalloc-512 of size 512
[ 19.484818] The buggy address is located 0 bytes to the right of
allocated 512-byte region [ffff888829e65000, ffff888829e65200)
[ 19.487447]
[ 19.488328] The buggy address belongs to the physical page:
[ 19.489569] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888829e60c00 pfn:0x829e60
[ 19.491140] head: order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 19.492466] anon flags: 0x57ffffc0000840(slab|head|node=1|zone=2|lastcpupid=0x1fffff)
[ 19.493914] page_type: 0xffffffff()
[ 19.494988] raw: 0057ffffc0000840 ffff88810004cc80 0000000000000000 0000000000000001
[ 19.496451] raw: ffff888829e60c00 0000000080200018 00000001ffffffff 0000000000000000
[ 19.497906] head: 0057ffffc0000840 ffff88810004cc80 0000000000000000 0000000000000001
[ 19.499379] head: ffff888829e60c00 0000000080200018 00000001ffffffff 0000000000000000
[ 19.500844] head: 0057ffffc0000003 ffffea0020a79801 ffffea0020a79848 00000000ffffffff
[ 19.502316] head: 0000000800000000 0000000000000000 00000000ffffffff 0000000000000000
[ 19.503784] page dumped because: kasan: bad access detected
[ 19.505058]
[ 19.505970] Memory state around the buggy address:
[ 19.507172] ffff888829e65100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 19.508599] ffff888829e65180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 19.510013] >ffff888829e65200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 19.510014] ^
[ 19.510016] ffff888829e65280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 19.510018] ffff888829e65300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 19.515367] ==================================================================
The reason for this error is physical_package_ids assigned by VMware VMM
are not continuous and have gaps. This will cause value returned by
topology_physical_package_id() to be more than topology_max_packages().
Here the allocation uses topology_max_packages(). The call to
topology_max_packages() returns maximum logical package ID not physical
ID. Hence use topology_logical_package_id() instead of
topology_physical_package_id().
Fixes: 9a1aac8a96dc ("platform/x86: ISST: PUNIT device mapping with Sub-NUMA clustering")
Cc: stable@vger.kernel.org
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Zach Wade <zachwade.k@gmail.com>
Link: https://lore.kernel.org/r/20240923144508.1764-1-zachwade.k@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
Initially it was thought that we just wanted to ignore errors from
logged op replay, but it turns out we do need to catch -EROFS, or we'll
go into an infinite loop.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
These shouldn't always be fatal errors - logged op resume, in
particular, and we want it as a parameter there.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
It was initially believed that it would be better to be explicit about
the snapshot we're updating when writing inodes in fsck; however, it
turns out that passing around the snapshot separately is more error
prone and we're usually updating the inode in the same snapshow we read
it from.
This is different from normal filesystem paths, where we do the update
in the snapshot of the subvolume we're in.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We want to check for this early so it can be reattached if necessary in
check_unreachable_inodes(); better than letting it be deleted and having
the children reattached, losing their filenames.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
link count works differently in bcachefs - it's only nonzero for files
with multiple hardlinks, which means we can also avoid checking it
except for files that are known to have hardlinks.
That means we need a few different checks instead; in particular, we
don't want fsck to delet a file that has a dirent pointing to it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
It's legal for regular files to have missing backpointers (due to
hardlinks), and fsck should automatically add them, but for directories
this is an error that should be flagged.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The fragmentation_lru field hasn't been needed since we reworked the LRU
btrees to use the btree write buffer; previously it was used to resolve
collisions, but the revised LRU btree uses the backpointer (the bucket)
as part of the key.
It should have been deleted at the time of the LRU rework; since it
wasn't, that left places for bugs to hide, in check/repair.
This fixes LRU fsck on a filesystem image helpfully provided by a user
who disappeared before I could get his name for the reported-by.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
check_lru_key() wasn't using write buffer updates for deleting bad lru
entries - dating from before the lru btree used the btree write buffer.
And when possibly flushing the btree write buffer (to make sure we're
seeing a real inconsistency), we need to be using the modern
bch2_btree_write_buffer_maybe_flush().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Errors are getting marked as AUTOFIX once they've been (re)-tested and
audited.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Newly generated keys, in the transaction commit path or write path,
should not be AUTOFIX; those indicate bugs that we need to fail fast
for.
Fixes: 5612daafb764 ("bcachefs: Fix fsck warnings from bkey validation")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Ensure a copy of the lost+found inode exists in the snapshot that we're
reattaching, so that we don't trigger warnings in
lookup_inode_for_snapshot() later.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This fixes two different bugs:
- Looser locking with the rhashtable means we need to recheck if the
inode is still hashed after prepare_to_wait(), and add a corresponding
wakeup after removing from the hash table.
- da18ecbf0fb6 ("fs: add i_state helpers") changed the bit waitqueues
used for inodes, and bcachefs wasn't updated and thus broke; this
updates bcachefs to the new helper.
Fixes: 112d21fd1a12 ("bcachefs: switch to rhashtable for vfs inodes hash")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Wesley reported an issue:
==================================================================
EXT4-fs (dm-5): resizing filesystem from 7168 to 786432 blocks
------------[ cut here ]------------
kernel BUG at fs/ext4/resize.c:324!
CPU: 9 UID: 0 PID: 3576 Comm: resize2fs Not tainted 6.11.0+ #27
RIP: 0010:ext4_resize_fs+0x1212/0x12d0
Call Trace:
__ext4_ioctl+0x4e0/0x1800
ext4_ioctl+0x12/0x20
__x64_sys_ioctl+0x99/0xd0
x64_sys_call+0x1206/0x20d0
do_syscall_64+0x72/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e
==================================================================
While reviewing the patch, Honza found that when adjusting resize_bg in
alloc_flex_gd(), it was possible for flex_gd->resize_bg to be bigger than
flexbg_size.
The reproduction of the problem requires the following:
o_group = flexbg_size * 2 * n;
o_size = (o_group + 1) * group_size;
n_group: [o_group + flexbg_size, o_group + flexbg_size * 2)
o_size = (n_group + 1) * group_size;
Take n=0,flexbg_size=16 as an example:
last:15
|o---------------|--------------n-|
o_group:0 resize to n_group:30
The corresponding reproducer is:
img=test.img
rm -f $img
truncate -s 600M $img
mkfs.ext4 -F $img -b 1024 -G 16 8M
dev=`losetup -f --show $img`
mkdir -p /tmp/test
mount $dev /tmp/test
resize2fs $dev 248M
Delete the problematic plus 1 to fix the issue, and add a WARN_ON_ONCE()
to prevent the issue from happening again.
[ Note: another reproucer which this commit fixes is:
img=test.img
rm -f $img
truncate -s 25MiB $img
mkfs.ext4 -b 4096 -E nodiscard,lazy_itable_init=0,lazy_journal_init=0 $img
truncate -s 3GiB $img
dev=`losetup -f --show $img`
mkdir -p /tmp/test
mount $dev /tmp/test
resize2fs $dev 3G
umount $dev
losetup -d $dev
-- TYT ]
Reported-by: Wesley Hershberger <wesley.hershberger@canonical.com>
Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2081231
Reported-by: Stéphane Graber <stgraber@stgraber.org>
Closes: https://lore.kernel.org/all/20240925143325.518508-1-aleksandr.mikhalitsyn@canonical.com/
Tested-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Tested-by: Eric Sandeen <sandeen@redhat.com>
Fixes: 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation in alloc_flex_gd()")
Cc: stable@vger.kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240927133329.1015041-1-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Calling ext4_fc_mark_ineligible() with a NULL handle is racy and may result
in a fast-commit being done before the filesystem is effectively marked as
ineligible. This patch moves the call to this function so that an handle
can be used. If a transaction fails to start, then there's not point in
trying to mark the filesystem as ineligible, and an error will eventually be
returned to user-space.
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240923104909.18342-3-luis.henriques@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
Calling ext4_fc_mark_ineligible() with a NULL handle is racy and may result
in a fast-commit being done before the filesystem is effectively marked as
ineligible. This patch fixes the calls to this function in
__track_dentry_update() by adding an extra parameter to the callback used in
ext4_fc_track_template().
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240923104909.18342-2-luis.henriques@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
This patch reverts two TOMOYO patches that were merged into Linus' tree
during the v6.12 merge window:
8b985bbfabbe ("tomoyo: allow building as a loadable LSM module")
268225a1de1a ("tomoyo: preparation step for building as a loadable LSM module")
Together these two patches introduced the CONFIG_SECURITY_TOMOYO_LKM
Kconfig build option which enabled a TOMOYO specific dynamic LSM loading
mechanism (see the original commits for more details). Unfortunately,
this approach was widely rejected by the LSM community as well as some
members of the general kernel community. Objections included concerns
over setting a bad precedent regarding individual LSMs managing their
LSM callback registrations as well as general kernel symbol exporting
practices. With little to no support for the CONFIG_SECURITY_TOMOYO_LKM
approach outside of Tetsuo, and multiple objections, we need to revert
these changes.
Link: https://lore.kernel.org/all/0c4b443a-9c72-4800-97e8-a3816b6a9ae2@I-love.SAKURA.ne.jp
Link: https://lore.kernel.org/all/CAHC9VhR=QjdoHG3wJgHFJkKYBg7vkQH2MpffgVzQ0tAByo_wRg@mail.gmail.com
Acked-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
Add the Microsoft Azure Cobalt 100 CPU to the list of CPUs suffering
from erratum 3194386 added in commit 75b3c43eab59 ("arm64: errata:
Expand speculative SSBS workaround")
CC: Mark Rutland <mark.rutland@arm.com>
CC: James More <james.morse@arm.com>
CC: Will Deacon <will@kernel.org>
CC: stable@vger.kernel.org # 6.6+
Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Link: https://lore.kernel.org/r/20241003225239.321774-1-eahariha@linux.microsoft.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
We received a regression report for System76 Pangolin (pang14) due to
the recent fix for Tuxedo Sirius devices to support the top speaker.
The reason was the conflicting PCI SSID, as often seen.
As a workaround, now the codec SSID is checked and the quirk is
applied conditionally only to Sirius devices.
Fixes: 4178d78cd7a8 ("ALSA: hda/conexant: Add pincfg quirk to enable top speakers on Sirius devices")
Reported-by: Christian Heusel <christian@heusel.eu>
Reported-by: Jerry <jerryluo225@gmail.com>
Closes: https://lore.kernel.org/c930b6a6-64e5-498f-b65a-1cd5e0a1d733@heusel.eu
Link: https://patch.msgid.link/20241004082602.29016-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Add hw monitor volume control for POD HD500X. This is done adding
LINE6_CAP_HWMON_CTL to the capabilities
Signed-off-by: Hans P. Moller <hmoller@uc.cl>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20241003232828.5819-1-hmoller@uc.cl
|
|
If get_bpos() fails, it is likely that the corresponding error code should
be returned.
Fixes: a6970bb1dd99 ("ALSA: gus: Convert to the new PCM ops")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://patch.msgid.link/d9ca841edad697154afa97c73a5d7a14919330d9.1727984008.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|