aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2024-07-26erofs: fix race in z_erofs_get_gbuf()Gao Xiang1-0/+3
In z_erofs_get_gbuf(), the current task may be migrated to another CPU between `z_erofs_gbuf_id()` and `spin_lock(&gbuf->lock)`. Therefore, z_erofs_put_gbuf() will trigger the following issue which was found by stress test: <2>[772156.434168] kernel BUG at fs/erofs/zutil.c:58! .. <4>[772156.435007] <4>[772156.439237] CPU: 0 PID: 3078 Comm: stress Kdump: loaded Tainted: G E 6.10.0-rc7+ #2 <4>[772156.439239] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 1.0.0 01/01/2017 <4>[772156.439241] pstate: 83400005 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) <4>[772156.439243] pc : z_erofs_put_gbuf+0x64/0x70 [erofs] <4>[772156.439252] lr : z_erofs_lz4_decompress+0x600/0x6a0 [erofs] .. <6>[772156.445958] stress (3127): drop_caches: 1 <4>[772156.446120] Call trace: <4>[772156.446121] z_erofs_put_gbuf+0x64/0x70 [erofs] <4>[772156.446761] z_erofs_lz4_decompress+0x600/0x6a0 [erofs] <4>[772156.446897] z_erofs_decompress_queue+0x740/0xa10 [erofs] <4>[772156.447036] z_erofs_runqueue+0x428/0x8c0 [erofs] <4>[772156.447160] z_erofs_readahead+0x224/0x390 [erofs] .. Fixes: f36f3010f676 ("erofs: rename per-CPU buffers to global buffer pool and make it configurable") Cc: <stable@vger.kernel.org> # 6.10+ Reviewed-by: Chunhai Guo <guochunhai@vivo.com> Reviewed-by: Sandeep Dhavale <dhavale@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240722035110.3456740-1-hsiangkao@linux.alibaba.com
2024-07-26erofs: support STATX_DIOALIGNHongbo Li1-2/+17
Add support for STATX_DIOALIGN to EROFS, so that direct I/O alignment restrictions are exposed to userspace in a generic way. [Before] ``` ./statx_test /mnt/erofs/testfile statx(/mnt/erofs/testfile) = 0 dio mem align:0 dio offset align:0 ``` [After] ``` ./statx_test /mnt/erofs/testfile statx(/mnt/erofs/testfile) = 0 dio mem align:512 dio offset align:512 ``` Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240718083243.2485437-1-hsiangkao@linux.alibaba.com
2024-07-13erofs: silence uninitialized variable warning in z_erofs_scan_folio()Dan Carpenter1-1/+1
Smatch complains that: fs/erofs/zdata.c:1047 z_erofs_scan_folio() error: uninitialized symbol 'err'. The issue is if we hit this (!(map->m_flags & EROFS_MAP_MAPPED)) { condition then "err" isn't set. It's inside a loop so we would have to hit that condition on every iteration. Initialize "err" to zero to solve this. Fixes: 5b9654efb604 ("erofs: teach z_erofs_scan_folios() to handle multi-page folios") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/f78ab50e-ed6d-4275-8dd4-a4159fa565a2@stanley.mountain Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-07-11erofs: avoid refcounting short-lived pagesGao Xiang3-26/+24
LZ4 always reuses the decompressed buffer as its LZ77 sliding window (dynamic dictionary) for optimal performance. However, in specific cases, the output buffer may not fully contain valid page cache pages, resulting in the use of short-lived pages for temporary purposes. Due to the limited sliding window size, LZ4 shortlived bounce pages can also be reused in a sliding manner, so each bounce page can be vmapped multiple times in different relative positions by design. In order to avoiding double frees, currently, reuse counts are recorded via page refcount, but it will no longer be used as-is in the future world of Memdescs. Just maintain a lookup table to check if a shortlived page is reused. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240711053659.1364989-1-hsiangkao@linux.alibaba.com
2024-07-10erofs: get rid of z_erofs_map_blocks_iter_* tracepointsHongzhen Luo2-31/+5
Consolidate them under erofs_map_blocks_* for simplicity since we have many other ways to know if a given inode is compressed or not. Signed-off-by: Hongzhen Luo <hongzhen@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240710083459.208362-1-hongzhen@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-07-09erofs: tidy up stream decompressorsGao Xiang5-304/+209
Just use a generic helper to prepare buffers for all supported stream decompressors, eliminating similar logic. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240709094106.3018109-3-hsiangkao@linux.alibaba.com
2024-07-09erofs: refine z_erofs_{init,exit}_subsystem()Gao Xiang8-78/+69
Introduce z_erofs_{init,exit}_decompressor() to unexport z_erofs_{deflate,lzma,zstd}_{init,exit}(). Besides, call them in z_erofs_{init,exit}_subsystem() for simplicity. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240709094106.3018109-2-hsiangkao@linux.alibaba.com
2024-07-09erofs: move each decompressor to its own source fileGao Xiang6-50/+44
Thus *_config() function declarations can be avoided. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240709094106.3018109-1-hsiangkao@linux.alibaba.com
2024-07-08erofs: tidy up `struct z_erofs_bvec`Gao Xiang1-52/+49
After revisiting the design, I believe `struct z_erofs_bvec` should be page-based instead of folio-based due to the reasons below: - The minimized memory mapping block is a page; - Under the certain circumstances, only temporary pages needs to be used instead of folios since refcount, mapcount for such pages are unnecessary; - Decompressors handle all types of pages including temporary pages, not only folios. When handling `struct z_erofs_bvec`, all folio-related information is now accessed using the page_folio() helper. The final goal of this round adaptation is to eliminate direct accesses to `struct page` in the EROFS codebase, except for some exceptions like `z_erofs_is_shortlived_page()` and `z_erofs_page_is_invalidated()`, which require a new helper to determine the memdesc type of an arbitrary page. Actually large folios of compressed files seem to work now, yet I tend to conduct more tests before officially enabling this for all scenarios. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240703120051.3653452-4-hsiangkao@linux.alibaba.com
2024-07-08erofs: teach z_erofs_scan_folios() to handle multi-page foliosGao Xiang1-85/+82
Previously, a folio just contains one page. In order to enable large folios, z_erofs_scan_folios() needs to handle multi-page folios. First, this patch eliminates all gotos. Instead, the new loop deal with multiple parts in each folio. It's simple to handle the parts which belong to unmapped extents or fragment extents; but for encoded extents, the page boundaries needs to be considered for `tight` and `split` to keep inplace I/Os work correctly: when a part crosses the page boundary, they needs to be reseted properly. Besides, simplify `tight` derivation since Z_EROFS_PCLUSTER_HOOKED has been removed for quite a while. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240703120051.3653452-3-hsiangkao@linux.alibaba.com
2024-07-08erofs: convert z_erofs_read_fragment() to foliosGao Xiang1-8/+7
Just a straight-forward conversion. No logic changes. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240703120051.3653452-2-hsiangkao@linux.alibaba.com
2024-07-08erofs: convert z_erofs_pcluster_readmore() to foliosGao Xiang2-17/+12
Unlike `pagecache_get_page()`, `__filemap_get_folio()` returns error pointers instead of NULL, thus switching to `IS_ERR_OR_NULL`. Apart from that, it's just a straightforward conversion. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240703120051.3653452-1-hsiangkao@linux.alibaba.com
2024-07-07Linux 6.10-rc7Linus Torvalds1-1/+1
2024-07-06selftests/powerpc: Fix build with USERCFLAGS setMichael Ellerman1-4/+1
Currently building the powerpc selftests with USERCFLAGS set to anything causes the build to break: $ make -C tools/testing/selftests/powerpc V=1 USERCFLAGS=-Wno-error ... gcc -Wno-error cache_shape.c ... cache_shape.c:18:10: fatal error: utils.h: No such file or directory 18 | #include "utils.h" | ^~~~~~~~~ compilation terminated. This happens because the USERCFLAGS are added to CFLAGS in lib.mk, which causes the check of CFLAGS in powerpc/flags.mk to skip setting CFLAGS at all, resulting in none of the usual CFLAGS being passed. That can be seen in the output above, the only flag passed to the compiler is -Wno-error. Fix it by dropping the conditional setting of CFLAGS in flags.mk. Instead always set CFLAGS, but also append USERCFLAGS if they are set. Note that appending to CFLAGS (with +=) wouldn't work, because flags.mk is included by multiple Makefiles (to support partial builds), causing CFLAGS to be appended to multiple times. Additionally that would place the USERCFLAGS prior to the standard CFLAGS, meaning the USERCFLAGS couldn't override the standard flags. Being able to override the standard flags is desirable, for example for adding -Wno-error. With the fix in place, the CFLAGS are set correctly, including the USERCFLAGS: $ make -C tools/testing/selftests/powerpc V=1 USERCFLAGS=-Wno-error ... gcc -std=gnu99 -O2 -Wall -Werror -DGIT_VERSION='"v6.10-rc2-7-gdea17e7e56c3"' -I/home/michael/linux/tools/testing/selftests/powerpc/include -Wno-error cache_shape.c ... Fixes: 5553a79387e9 ("selftests/powerpc: Add flags.mk to support pmu buildable") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240706120833.909853-1-mpe@ellerman.id.au
2024-07-05gpiolib: of: add polarity quirk for TSC2005Dmitry Torokhov1-0/+8
DTS for Nokia N900 incorrectly specifies "active high" polarity for the reset line, while the chip documentation actually specifies it as "active low". In the past the driver fudged gpiod API and inverted the logic internally, but it was changed in d0d89493bff8. Fixes: d0d89493bff8 ("Input: tsc2004/5 - switch to using generic device properties") Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/ZoWXwYtwgJIxi-hD@google.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-07-05tpm: Address !chip->auth in tpm_buf_append_hmac_session*()Jarkko Sakkinen2-124/+130
Unless tpm_chip_bootstrap() was called by the driver, !chip->auth can cause a null derefence in tpm_buf_hmac_session*(). Thus, address !chip->auth in tpm_buf_hmac_session*() and remove the fallback implementation for !TCG_TPM2_HMAC. Cc: stable@vger.kernel.org # v6.9+ Reported-by: Stefan Berger <stefanb@linux.ibm.com> Closes: https://lore.kernel.org/linux-integrity/20240617193408.1234365-1-stefanb@linux.ibm.com/ Fixes: 1085b8276bb4 ("tpm: Add the rest of the session HMAC API") Tested-by: Michael Ellerman <mpe@ellerman.id.au> # ppc Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-07-05tpm: Address !chip->auth in tpm_buf_append_name()Jarkko Sakkinen3-111/+131
Unless tpm_chip_bootstrap() was called by the driver, !chip->auth can cause a null derefence in tpm_buf_append_name(). Thus, address !chip->auth in tpm_buf_append_name() and remove the fallback implementation for !TCG_TPM2_HMAC. Cc: stable@vger.kernel.org # v6.10+ Reported-by: Stefan Berger <stefanb@linux.ibm.com> Closes: https://lore.kernel.org/linux-integrity/20240617193408.1234365-1-stefanb@linux.ibm.com/ Fixes: d0a25bb961e6 ("tpm: Add HMAC session name/handle append") Tested-by: Michael Ellerman <mpe@ellerman.id.au> # ppc Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-07-05tpm: Address !chip->auth in tpm2_*_auth_session()Jarkko Sakkinen1-2/+12
Unless tpm_chip_bootstrap() was called by the driver, !chip->auth can cause a null derefence in tpm2_*_auth_session(). Thus, address !chip->auth in tpm2_*_auth_session(). Cc: stable@vger.kernel.org # v6.9+ Reported-by: Stefan Berger <stefanb@linux.ibm.com> Closes: https://lore.kernel.org/linux-integrity/20240617193408.1234365-1-stefanb@linux.ibm.com/ Fixes: 699e3efd6c64 ("tpm: Add HMAC session start and end functions") Tested-by: Michael Ellerman <mpe@ellerman.id.au> # ppc Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-07-04bnxt_en: Fix the resource check condition for RSS contextsPavan Chebbi1-1/+5
While creating a new RSS context, bnxt_rfs_capable() currently makes a strict check to see if the required VNICs are already available. If the current VNICs are not what is required, either too many or not enough, it will call the firmware to reserve the exact number required. There is a bug in the firmware when the driver tries to relinquish some reserved VNICs and RSS contexts. It will cause the default VNIC to lose its RSS configuration and cause receive packets to be placed incorrectly. Workaround this problem by skipping the resource reduction. The driver will not reduce the VNIC and RSS context reservations when a context is deleted. The resources will be available for use when new contexts are created later. Potentially, this workaround can cause us to run out of VNIC and RSS contexts if there are a lot of VF functions creating and deleting RSS contexts. In the future, we will conditionally disable this workaround when the firmware fix is available. Fixes: 438ba39b25fe ("bnxt_en: Improve RSS context reservation infrastructure") Reported-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/netdev/20240625010210.2002310-1-kuba@kernel.org/ Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240703180112.78590-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-04mlxsw: core_linecards: Fix double memory deallocation in case of invalid INI fileAleksandr Mishin1-0/+1
In case of invalid INI file mlxsw_linecard_types_init() deallocates memory but doesn't reset pointer to NULL and returns 0. In case of any error occurred after mlxsw_linecard_types_init() call, mlxsw_linecards_init() calls mlxsw_linecard_types_fini() which performs memory deallocation again. Add pointer reset to NULL. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: b217127e5e4e ("mlxsw: core_linecards: Add line card objects and implement provisioning") Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/20240703203251.8871-1-amishin@t-argos.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-04inet_diag: Initialize pad field in struct inet_diag_req_v2Shigeru Yoshida1-0/+2
KMSAN reported uninit-value access in raw_lookup() [1]. Diag for raw sockets uses the pad field in struct inet_diag_req_v2 for the underlying protocol. This field corresponds to the sdiag_raw_protocol field in struct inet_diag_req_raw. inet_diag_get_exact_compat() converts inet_diag_req to inet_diag_req_v2, but leaves the pad field uninitialized. So the issue occurs when raw_lookup() accesses the sdiag_raw_protocol field. Fix this by initializing the pad field in inet_diag_get_exact_compat(). Also, do the same fix in inet_diag_dump_compat() to avoid the similar issue in the future. [1] BUG: KMSAN: uninit-value in raw_lookup net/ipv4/raw_diag.c:49 [inline] BUG: KMSAN: uninit-value in raw_sock_get+0x657/0x800 net/ipv4/raw_diag.c:71 raw_lookup net/ipv4/raw_diag.c:49 [inline] raw_sock_get+0x657/0x800 net/ipv4/raw_diag.c:71 raw_diag_dump_one+0xa1/0x660 net/ipv4/raw_diag.c:99 inet_diag_cmd_exact+0x7d9/0x980 inet_diag_get_exact_compat net/ipv4/inet_diag.c:1404 [inline] inet_diag_rcv_msg_compat+0x469/0x530 net/ipv4/inet_diag.c:1426 sock_diag_rcv_msg+0x23d/0x740 net/core/sock_diag.c:282 netlink_rcv_skb+0x537/0x670 net/netlink/af_netlink.c:2564 sock_diag_rcv+0x35/0x40 net/core/sock_diag.c:297 netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline] netlink_unicast+0xe74/0x1240 net/netlink/af_netlink.c:1361 netlink_sendmsg+0x10c6/0x1260 net/netlink/af_netlink.c:1905 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x332/0x3d0 net/socket.c:745 ____sys_sendmsg+0x7f0/0xb70 net/socket.c:2585 ___sys_sendmsg+0x271/0x3b0 net/socket.c:2639 __sys_sendmsg net/socket.c:2668 [inline] __do_sys_sendmsg net/socket.c:2677 [inline] __se_sys_sendmsg net/socket.c:2675 [inline] __x64_sys_sendmsg+0x27e/0x4a0 net/socket.c:2675 x64_sys_call+0x135e/0x3ce0 arch/x86/include/generated/asm/syscalls_64.h:47 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xd9/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Uninit was stored to memory at: raw_sock_get+0x650/0x800 net/ipv4/raw_diag.c:71 raw_diag_dump_one+0xa1/0x660 net/ipv4/raw_diag.c:99 inet_diag_cmd_exact+0x7d9/0x980 inet_diag_get_exact_compat net/ipv4/inet_diag.c:1404 [inline] inet_diag_rcv_msg_compat+0x469/0x530 net/ipv4/inet_diag.c:1426 sock_diag_rcv_msg+0x23d/0x740 net/core/sock_diag.c:282 netlink_rcv_skb+0x537/0x670 net/netlink/af_netlink.c:2564 sock_diag_rcv+0x35/0x40 net/core/sock_diag.c:297 netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline] netlink_unicast+0xe74/0x1240 net/netlink/af_netlink.c:1361 netlink_sendmsg+0x10c6/0x1260 net/netlink/af_netlink.c:1905 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x332/0x3d0 net/socket.c:745 ____sys_sendmsg+0x7f0/0xb70 net/socket.c:2585 ___sys_sendmsg+0x271/0x3b0 net/socket.c:2639 __sys_sendmsg net/socket.c:2668 [inline] __do_sys_sendmsg net/socket.c:2677 [inline] __se_sys_sendmsg net/socket.c:2675 [inline] __x64_sys_sendmsg+0x27e/0x4a0 net/socket.c:2675 x64_sys_call+0x135e/0x3ce0 arch/x86/include/generated/asm/syscalls_64.h:47 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xd9/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Local variable req.i created at: inet_diag_get_exact_compat net/ipv4/inet_diag.c:1396 [inline] inet_diag_rcv_msg_compat+0x2a6/0x530 net/ipv4/inet_diag.c:1426 sock_diag_rcv_msg+0x23d/0x740 net/core/sock_diag.c:282 CPU: 1 PID: 8888 Comm: syz-executor.6 Not tainted 6.10.0-rc4-00217-g35bb670d65fc #32 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 Fixes: 432490f9d455 ("net: ip, diag -- Add diag interface for raw sockets") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240703091649.111773-1-syoshida@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-04tcp: Don't flag tcp_sk(sk)->rx_opt.saw_unknown for TCP AO.Kuniyuki Iwashima1-0/+7
When we process segments with TCP AO, we don't check it in tcp_parse_options(). Thus, opt_rx->saw_unknown is set to 1, which unconditionally triggers the BPF TCP option parser. Let's avoid the unnecessary BPF invocation. Fixes: 0a3a809089eb ("net/tcp: Verify inbound TCP-AO signed segments") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20240703033508.6321-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-04drm/xe/mcr: Avoid clobbering DSS steeringMatt Roper1-3/+3
A couple copy/paste mistakes in the code that selects steering targets for OADDRM and INSTANCE0 unintentionally clobbered the steering target for DSS ranges in some cases. The OADDRM/INSTANCE0 values were also not assigned as intended, although that mistake wound up being harmless since the desired values for those specific ranges were '0' which the kzalloc of the GT structure should have already taken care of implicitly. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240626210536.1620176-2-matthew.d.roper@intel.com (cherry picked from commit 4f82ac6102788112e599a6074d2c1f2afce923df) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-07-04drm/xe: fix error handling in xe_migrate_update_pgtablesMatthew Auld1-4/+4
Don't call drm_suballoc_free with sa_bo pointing to PTR_ERR. References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2120 Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240620102025.127699-2-matthew.auld@intel.com (cherry picked from commit ce6b63336f79ec5f3996de65f452330e395f99ae) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-07-04drm/ttm: Always take the bo delayed cleanup path for imported bosThomas Hellström1-0/+1
Bos can be put with multiple unrelated dma-resv locks held. But imported bos attempt to grab the bo dma-resv during dma-buf detach that typically happens during cleanup. That leads to lockde splats similar to the below and a potential ABBA deadlock. Fix this by always taking the delayed workqueue cleanup path for imported bos. Requesting stable fixes from when the Xe driver was introduced, since its usage of drm_exec and wide vm dma_resvs appear to be the first reliable trigger of this. [22982.116427] ============================================ [22982.116428] WARNING: possible recursive locking detected [22982.116429] 6.10.0-rc2+ #10 Tainted: G U W [22982.116430] -------------------------------------------- [22982.116430] glxgears:sh0/5785 is trying to acquire lock: [22982.116431] ffff8c2bafa539a8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: dma_buf_detach+0x3b/0xf0 [22982.116438] but task is already holding lock: [22982.116438] ffff8c2d9aba6da8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_exec_lock_obj+0x49/0x2b0 [drm_exec] [22982.116442] other info that might help us debug this: [22982.116442] Possible unsafe locking scenario: [22982.116443] CPU0 [22982.116444] ---- [22982.116444] lock(reservation_ww_class_mutex); [22982.116445] lock(reservation_ww_class_mutex); [22982.116447] *** DEADLOCK *** [22982.116447] May be due to missing lock nesting notation [22982.116448] 5 locks held by glxgears:sh0/5785: [22982.116449] #0: ffff8c2d9aba58c8 (&xef->vm.lock){+.+.}-{3:3}, at: xe_file_close+0xde/0x1c0 [xe] [22982.116507] #1: ffff8c2e28cc8480 (&vm->lock){++++}-{3:3}, at: xe_vm_close_and_put+0x161/0x9b0 [xe] [22982.116578] #2: ffff8c2e31982970 (&val->lock){.+.+}-{3:3}, at: xe_validation_ctx_init+0x6d/0x70 [xe] [22982.116647] #3: ffffacdc469478a8 (reservation_ww_class_acquire){+.+.}-{0:0}, at: xe_vma_destroy_unlocked+0x7f/0xe0 [xe] [22982.116716] #4: ffff8c2d9aba6da8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_exec_lock_obj+0x49/0x2b0 [drm_exec] [22982.116719] stack backtrace: [22982.116720] CPU: 8 PID: 5785 Comm: glxgears:sh0 Tainted: G U W 6.10.0-rc2+ #10 [22982.116721] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023 [22982.116723] Call Trace: [22982.116724] <TASK> [22982.116725] dump_stack_lvl+0x77/0xb0 [22982.116727] __lock_acquire+0x1232/0x2160 [22982.116730] lock_acquire+0xcb/0x2d0 [22982.116732] ? dma_buf_detach+0x3b/0xf0 [22982.116734] ? __lock_acquire+0x417/0x2160 [22982.116736] __ww_mutex_lock.constprop.0+0xd0/0x13b0 [22982.116738] ? dma_buf_detach+0x3b/0xf0 [22982.116741] ? dma_buf_detach+0x3b/0xf0 [22982.116743] ? ww_mutex_lock+0x2b/0x90 [22982.116745] ww_mutex_lock+0x2b/0x90 [22982.116747] dma_buf_detach+0x3b/0xf0 [22982.116749] drm_prime_gem_destroy+0x2f/0x40 [drm] [22982.116775] xe_ttm_bo_destroy+0x32/0x220 [xe] [22982.116818] ? __mutex_unlock_slowpath+0x3a/0x290 [22982.116821] drm_exec_unlock_all+0xa1/0xd0 [drm_exec] [22982.116823] drm_exec_fini+0x12/0xb0 [drm_exec] [22982.116824] xe_validation_ctx_fini+0x15/0x40 [xe] [22982.116892] xe_vma_destroy_unlocked+0xb1/0xe0 [xe] [22982.116959] xe_vm_close_and_put+0x41a/0x9b0 [xe] [22982.117025] ? xa_find+0xe3/0x1e0 [22982.117028] xe_file_close+0x10a/0x1c0 [xe] [22982.117074] drm_file_free+0x22a/0x280 [drm] [22982.117099] drm_release_noglobal+0x22/0x70 [drm] [22982.117119] __fput+0xf1/0x2d0 [22982.117122] task_work_run+0x59/0x90 [22982.117125] do_exit+0x330/0xb40 [22982.117127] do_group_exit+0x36/0xa0 [22982.117129] get_signal+0xbd2/0xbe0 [22982.117131] arch_do_signal_or_restart+0x3e/0x240 [22982.117134] syscall_exit_to_user_mode+0x1e7/0x290 [22982.117137] do_syscall_64+0xa1/0x180 [22982.117139] ? lock_acquire+0xcb/0x2d0 [22982.117140] ? __set_task_comm+0x28/0x1e0 [22982.117141] ? find_held_lock+0x2b/0x80 [22982.117144] ? __set_task_comm+0xe1/0x1e0 [22982.117145] ? lock_release+0xca/0x290 [22982.117147] ? __do_sys_prctl+0x245/0xab0 [22982.117149] ? lockdep_hardirqs_on_prepare+0xde/0x190 [22982.117150] ? syscall_exit_to_user_mode+0xb0/0x290 [22982.117152] ? do_syscall_64+0xa1/0x180 [22982.117154] ? __lock_acquire+0x417/0x2160 [22982.117155] ? reacquire_held_locks+0xd1/0x1f0 [22982.117156] ? do_user_addr_fault+0x30c/0x790 [22982.117158] ? lock_acquire+0xcb/0x2d0 [22982.117160] ? find_held_lock+0x2b/0x80 [22982.117162] ? do_user_addr_fault+0x357/0x790 [22982.117163] ? lock_release+0xca/0x290 [22982.117164] ? do_user_addr_fault+0x361/0x790 [22982.117166] ? trace_hardirqs_off+0x4b/0xc0 [22982.117168] ? clear_bhb_loop+0x45/0xa0 [22982.117170] ? clear_bhb_loop+0x45/0xa0 [22982.117172] ? clear_bhb_loop+0x45/0xa0 [22982.117174] entry_SYSCALL_64_after_hwframe+0x76/0x7e [22982.117176] RIP: 0033:0x7f943d267169 [22982.117192] Code: Unable to access opcode bytes at 0x7f943d26713f. [22982.117193] RSP: 002b:00007f9430bffc80 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca [22982.117195] RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 00007f943d267169 [22982.117196] RDX: 0000000000000000 RSI: 0000000000000189 RDI: 00005622f89579d0 [22982.117197] RBP: 00007f9430bffcb0 R08: 0000000000000000 R09: 00000000ffffffff [22982.117198] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [22982.117199] R13: 0000000000000000 R14: 0000000000000000 R15: 00005622f89579d0 [22982.117202] </TASK> Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Christian König <christian.koenig@amd.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: dri-devel@lists.freedesktop.org Cc: intel-xe@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v6.8+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240628153848.4989-1-thomas.hellstrom@linux.intel.com
2024-07-03selftests: make order checking verbose in msg_zerocopy selftestZijian Zhang1-1/+1
We find that when lock debugging is on, notifications may not come in order. Thus, we have order checking outputs managed by cfg_verbose, to avoid too many outputs in this case. Fixes: 07b65c5b31ce ("test: add msg_zerocopy test") Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com> Signed-off-by: Xiaochun Lu <xiaochun.lu@bytedance.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240701225349.3395580-3-zijianzhang@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-03selftests: fix OOM in msg_zerocopy selftestZijian Zhang1-1/+11
In selftests/net/msg_zerocopy.c, it has a while loop keeps calling sendmsg on a socket with MSG_ZEROCOPY flag, and it will recv the notifications until the socket is not writable. Typically, it will start the receiving process after around 30+ sendmsgs. However, as the introduction of commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale"), the sender is always writable and does not get any chance to run recv notifications. The selftest always exits with OUT_OF_MEMORY because the memory used by opt_skb exceeds the net.core.optmem_max. Meanwhile, it could be set to a different value to trigger OOM on older kernels too. Thus, we introduce "cfg_notification_limit" to force sender to receive notifications after some number of sendmsgs. Fixes: 07b65c5b31ce ("test: add msg_zerocopy test") Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com> Signed-off-by: Xiaochun Lu <xiaochun.lu@bytedance.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240701225349.3395580-2-zijianzhang@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-03ice: use proper macro for testing bitPetr Oros1-1/+1
Do not use _test_bit() macro for testing bit. The proper macro for this is one without underline. _test_bit() is what test_bit() was prior to const-optimization. It directly calls arch_test_bit(), i.e. the arch-specific implementation (or the generic one). It's strictly _internal_ and shouldn't be used anywhere outside the actual test_bit() macro. test_bit() is a wrapper which checks whether the bitmap and the bit number are compile-time constants and if so, it calls the optimized function which evaluates this call to a compile-time constant as well. If either of them is not a compile-time constant, it just calls _test_bit(). test_bit() is the actual function to use anywhere in the kernel. IOW, calling _test_bit() avoids potential compile-time optimizations. The sensors is not a compile-time constant, thus most probably there are no object code changes before and after the patch. But anyway, we shouldn't call internal wrappers instead of the actual API. Fixes: 4da71a77fc3b ("ice: read internal temperature sensor") Acked-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20240702171459.2606611-5-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-03ice: Reject pin requests with unsupported flagsJacob Keller2-16/+23
The driver receives requests for configuring pins via the .enable callback of the PTP clock object. These requests come into the driver with flags which modify the requested behavior from userspace. Current implementation in ice does not reject flags that it doesn't support. This causes the driver to incorrectly apply requests with such flags as PTP_PEROUT_DUTY_CYCLE, or any future flags added by the kernel which it is not yet aware of. Fix this by properly validating flags in both ice_ptp_cfg_perout and ice_ptp_cfg_extts. Ensure that we check by bit-wise negating supported flags rather than just checking and rejecting known un-supported flags. This is preferable, as it ensures better compatibility with future kernels. Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20240702171459.2606611-4-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-03ice: Don't process extts if PTP is disabledJacob Keller1-0/+4
The ice_ptp_extts_event() function can race with ice_ptp_release() and result in a NULL pointer dereference which leads to a kernel panic. Panic occurs because the ice_ptp_extts_event() function calls ptp_clock_event() with a NULL pointer. The ice driver has already released the PTP clock by the time the interrupt for the next external timestamp event occurs. To fix this, modify the ice_ptp_extts_event() function to check the PTP state and bail early if PTP is not ready. Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20240702171459.2606611-3-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-03ice: Fix improper extts handlingMilena Olech2-22/+91
Extts events are disabled and enabled by the application ts2phc. However, in case where the driver is removed when the application is running, a specific extts event remains enabled and can cause a kernel crash. As a side effect, when the driver is reloaded and application is started again, remaining extts event for the channel from a previous run will keep firing and the message "extts on unexpected channel" might be printed to the user. To avoid that, extts events shall be disabled when PTP is released. Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Co-developed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20240702171459.2606611-2-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-03selftest: af_unix: Add test case for backtrack after finalising SCC.Kuniyuki Iwashima1-2/+23
syzkaller reported a KMSAN splat in __unix_walk_scc() while backtracking edge_stack after finalising SCC. Let's add a test case exercising the path. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Link: https://patch.msgid.link/20240702160428.10153-2-syoshida@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-03af_unix: Fix uninit-value in __unix_walk_scc()Shigeru Yoshida1-4/+5
KMSAN reported uninit-value access in __unix_walk_scc() [1]. In the list_for_each_entry_reverse() loop, when the vertex's index equals it's scc_index, the loop uses the variable vertex as a temporary variable that points to a vertex in scc. And when the loop is finished, the variable vertex points to the list head, in this case scc, which is a local variable on the stack (more precisely, it's not even scc and might underflow the call stack of __unix_walk_scc(): container_of(&scc, struct unix_vertex, scc_entry)). However, the variable vertex is used under the label prev_vertex. So if the edge_stack is not empty and the function jumps to the prev_vertex label, the function will access invalid data on the stack. This causes the uninit-value access issue. Fix this by introducing a new temporary variable for the loop. [1] BUG: KMSAN: uninit-value in __unix_walk_scc net/unix/garbage.c:478 [inline] BUG: KMSAN: uninit-value in unix_walk_scc net/unix/garbage.c:526 [inline] BUG: KMSAN: uninit-value in __unix_gc+0x2589/0x3c20 net/unix/garbage.c:584 __unix_walk_scc net/unix/garbage.c:478 [inline] unix_walk_scc net/unix/garbage.c:526 [inline] __unix_gc+0x2589/0x3c20 net/unix/garbage.c:584 process_one_work kernel/workqueue.c:3231 [inline] process_scheduled_works+0xade/0x1bf0 kernel/workqueue.c:3312 worker_thread+0xeb6/0x15b0 kernel/workqueue.c:3393 kthread+0x3c4/0x530 kernel/kthread.c:389 ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Uninit was stored to memory at: unix_walk_scc net/unix/garbage.c:526 [inline] __unix_gc+0x2adf/0x3c20 net/unix/garbage.c:584 process_one_work kernel/workqueue.c:3231 [inline] process_scheduled_works+0xade/0x1bf0 kernel/workqueue.c:3312 worker_thread+0xeb6/0x15b0 kernel/workqueue.c:3393 kthread+0x3c4/0x530 kernel/kthread.c:389 ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Local variable entries created at: ref_tracker_free+0x48/0xf30 lib/ref_tracker.c:222 netdev_tracker_free include/linux/netdevice.h:4058 [inline] netdev_put include/linux/netdevice.h:4075 [inline] dev_put include/linux/netdevice.h:4101 [inline] update_gid_event_work_handler+0xaa/0x1b0 drivers/infiniband/core/roce_gid_mgmt.c:813 CPU: 1 PID: 12763 Comm: kworker/u8:31 Not tainted 6.10.0-rc4-00217-g35bb670d65fc #32 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 Workqueue: events_unbound __unix_gc Fixes: 3484f063172d ("af_unix: Detect Strongly Connected Components.") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20240702160428.10153-1-syoshida@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-03bonding: Fix out-of-bounds read in bond_option_arp_ip_targets_set()Sam Sun1-3/+3
In function bond_option_arp_ip_targets_set(), if newval->string is an empty string, newval->string+1 will point to the byte after the string, causing an out-of-bound read. BUG: KASAN: slab-out-of-bounds in strlen+0x7d/0xa0 lib/string.c:418 Read of size 1 at addr ffff8881119c4781 by task syz-executor665/8107 CPU: 1 PID: 8107 Comm: syz-executor665 Not tainted 6.7.0-rc7 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:364 [inline] print_report+0xc1/0x5e0 mm/kasan/report.c:475 kasan_report+0xbe/0xf0 mm/kasan/report.c:588 strlen+0x7d/0xa0 lib/string.c:418 __fortify_strlen include/linux/fortify-string.h:210 [inline] in4_pton+0xa3/0x3f0 net/core/utils.c:130 bond_option_arp_ip_targets_set+0xc2/0x910 drivers/net/bonding/bond_options.c:1201 __bond_opt_set+0x2a4/0x1030 drivers/net/bonding/bond_options.c:767 __bond_opt_set_notify+0x48/0x150 drivers/net/bonding/bond_options.c:792 bond_opt_tryset_rtnl+0xda/0x160 drivers/net/bonding/bond_options.c:817 bonding_sysfs_store_option+0xa1/0x120 drivers/net/bonding/bond_sysfs.c:156 dev_attr_store+0x54/0x80 drivers/base/core.c:2366 sysfs_kf_write+0x114/0x170 fs/sysfs/file.c:136 kernfs_fop_write_iter+0x337/0x500 fs/kernfs/file.c:334 call_write_iter include/linux/fs.h:2020 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x96a/0xd80 fs/read_write.c:584 ksys_write+0x122/0x250 fs/read_write.c:637 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b ---[ end trace ]--- Fix it by adding a check of string length before using it. Fixes: f9de11a16594 ("bonding: add ip checks when store ip target") Signed-off-by: Yue Sun <samsun1006219@gmail.com> Signed-off-by: Simon Horman <horms@kernel.org> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20240702-bond-oob-v6-1-2dfdba195c19@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-03net: rswitch: Avoid use-after-free in rswitch_poll()Radu Rendec1-2/+2
The use-after-free is actually in rswitch_tx_free(), which is inlined in rswitch_poll(). Since `skb` and `gq->skbs[gq->dirty]` are in fact the same pointer, the skb is first freed using dev_kfree_skb_any(), then the value in skb->len is used to update the interface statistics. Let's move around the instructions to use skb->len before the skb is freed. This bug is trivial to reproduce using KFENCE. It will trigger a splat every few packets. A simple ARP request or ICMP echo request is enough. Fixes: 271e015b9153 ("net: rswitch: Add unmap_addrs instead of dma address in each desc") Signed-off-by: Radu Rendec <rrendec@redhat.com> Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Link: https://patch.msgid.link/20240702210838.2703228-1-rrendec@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-04btrfs: fix folio refcount in __alloc_dummy_extent_buffer()Boris Burkov1-1/+1
Another improper use of __folio_put() in an error path after freshly allocating pages/folios which returns them with the refcount initialized to 1. The refactor from __free_pages() -> __folio_put() (instead of folio_put) removed a refcount decrement found in __free_pages() and folio_put but absent from __folio_put(). Fixes: 13df3775efca ("btrfs: cleanup metadata page pointer usage") CC: stable@vger.kernel.org # 6.8+ Tested-by: Ed Tomlinson <edtoml@gmail.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-04btrfs: fix folio refcount in btrfs_do_encoded_write()Boris Burkov1-1/+1
The conversion to folios switched __free_page() to __folio_put() in the error path in btrfs_do_encoded_write(). However, this gets the page refcounting wrong. If we do hit that error path (I reproduced by modifying btrfs_do_encoded_write to pretend to always fail in a way that jumps to out_folios and running the fstests case btrfs/281), then we always hit the following BUG freeing the folio: BUG: Bad page state in process btrfs pfn:40ab0b page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x61be5 pfn:0x40ab0b flags: 0x5ffff0000000000(node=0|zone=2|lastcpupid=0x1ffff) raw: 05ffff0000000000 0000000000000000 dead000000000122 0000000000000000 raw: 0000000000061be5 0000000000000000 00000001ffffffff 0000000000000000 page dumped because: nonzero _refcount Call Trace: <TASK> dump_stack_lvl+0x3d/0xe0 bad_page+0xea/0xf0 free_unref_page+0x8e1/0x900 ? __mem_cgroup_uncharge+0x69/0x90 __folio_put+0xe6/0x190 btrfs_do_encoded_write+0x445/0x780 ? current_time+0x25/0xd0 btrfs_do_write_iter+0x2cc/0x4b0 btrfs_ioctl_encoded_write+0x2b6/0x340 It turns out __free_page() decreases the page reference count while __folio_put() does not. Switch __folio_put() to folio_put() which decreases the folio reference count first. Fixes: 400b172b8cdc ("btrfs: compression: migrate compression/decompression paths to folios") Tested-by: Ed Tomlinson <edtoml@gmail.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-04netfilter: nf_tables: unconditionally flush pending work before notifierFlorian Westphal1-2/+1
syzbot reports: KASAN: slab-uaf in nft_ctx_update include/net/netfilter/nf_tables.h:1831 KASAN: slab-uaf in nft_commit_release net/netfilter/nf_tables_api.c:9530 KASAN: slab-uaf int nf_tables_trans_destroy_work+0x152b/0x1750 net/netfilter/nf_tables_api.c:9597 Read of size 2 at addr ffff88802b0051c4 by task kworker/1:1/45 [..] Workqueue: events nf_tables_trans_destroy_work Call Trace: nft_ctx_update include/net/netfilter/nf_tables.h:1831 [inline] nft_commit_release net/netfilter/nf_tables_api.c:9530 [inline] nf_tables_trans_destroy_work+0x152b/0x1750 net/netfilter/nf_tables_api.c:9597 Problem is that the notifier does a conditional flush, but its possible that the table-to-be-removed is still referenced by transactions being processed by the worker, so we need to flush unconditionally. We could make the flush_work depend on whether we found a table to delete in nf-next to avoid the flush for most cases. AFAICS this problem is only exposed in nf-next, with commit e169285f8c56 ("netfilter: nf_tables: do not store nft_ctx in transaction objects"), with this commit applied there is an unconditional fetch of table->family which is whats triggering the above splat. Fixes: 2c9f0293280e ("netfilter: nf_tables: flush pending destroy work before netlink notifier") Reported-and-tested-by: syzbot+4fd66a69358fc15ae2ad@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=4fd66a69358fc15ae2ad Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-07-04i2c: pnx: Fix potential deadlock warning from del_timer_sync() call in isrPiotr Wojtaszczyk1-38/+10
When del_timer_sync() is called in an interrupt context it throws a warning because of potential deadlock. The timer is used only to exit from wait_for_completion() after a timeout so replacing the call with wait_for_completion_timeout() allows to remove the problematic timer and its related functions altogether. Fixes: 41561f28e76a ("i2c: New Philips PNX bus driver") Signed-off-by: Piotr Wojtaszczyk <piotr.wojtaszczyk@timesys.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-07-03tracing: Have memmapped ring buffer use ioctl of "R" range 0x20-2FSteven Rostedt (Google)2-1/+2
To prevent conflicts with other ioctl numbers to allow strace to have an idea of what is happening, add the range of ioctls for the trace buffer mapping from _IO("T", 0x1) to the range of "R" 0x20 - 0x2F. Link: https://lore.kernel.org/linux-trace-kernel/20240630105322.GA17573@altlinux.org/ Link: https://lore.kernel.org/linux-trace-kernel/20240630213626.GA23566@altlinux.org/ Cc: Jonathan Corbet <corbet@lwn.net> Fixes: cf9f0f7c4c5bb ("tracing: Allow user-space mapping of the ring-buffer") Link: https://lore.kernel.org/20240702153354.367861db@rorschach.local.home Reported-by: "Dmitry V. Levin" <ldv@strace.io> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-07-03riscv: kexec: Avoid deadlock in kexec crash pathSong Shuai1-9/+1
If the kexec crash code is called in the interrupt context, the machine_kexec_mask_interrupts() function will trigger a deadlock while trying to acquire the irqdesc spinlock and then deactivate irqchip in irq_set_irqchip_state() function. Unlike arm64, riscv only requires irq_eoi handler to complete EOI and keeping irq_set_irqchip_state() will only leave this possible deadlock without any use. So we simply remove it. Link: https://lore.kernel.org/linux-riscv/20231208111015.173237-1-songshuaishuai@tinylab.org/ Fixes: b17d19a5314a ("riscv: kexec: Fixup irq controller broken in kexec crash path") Signed-off-by: Song Shuai <songshuaishuai@tinylab.org> Reviewed-by: Ryo Takakura <takakura@valinux.co.jp> Link: https://lore.kernel.org/r/20240626023316.539971-1-songshuaishuai@tinylab.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-03riscv: stacktrace: fix usage of ftrace_graph_ret_addr()Puranjay Mohan1-1/+2
ftrace_graph_ret_addr() takes an `idx` integer pointer that is used to optimize the stack unwinding. Pass it a valid pointer to utilize the optimizations that might be available in the future. The commit is making riscv's usage of ftrace_graph_ret_addr() match x86_64. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20240618145820.62112-1-puranjay@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-03riscv: selftests: Fix vsetivli args for clangCharlie Jenkins1-1/+1
Clang does not support implicit LMUL in the vset* instruction sequences. Introduce an explicit LMUL in the vsetivli instruction. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Fixes: 9d5328eeb185 ("riscv: selftests: Add signal handling vector tests") Link: https://lore.kernel.org/r/20240702-fix_sigreturn_test-v1-1-485f88a80612@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-03perf: RISC-V: Check standard event availabilitySamuel Holland2-3/+41
The RISC-V SBI PMU specification defines several standard hardware and cache events. Currently, all of these events are exposed to userspace, even when not actually implemented. They appear in the `perf list` output, and commands like `perf stat` try to use them. This is more than just a cosmetic issue, because the PMU driver's .add function fails for these events, which causes pmu_groups_sched_in() to prematurely stop scheduling in other (possibly valid) hardware events. Add logic to check which events are supported by the hardware (i.e. can be mapped to some counter), so only usable events are reported to userspace. Since the kernel does not know the mapping between events and possible counters, this check must happen during boot, when no counters are in use. Make the check asynchronous to minimize impact on boot time. Fixes: e9991434596f ("RISC-V: Add perf platform driver based on SBI PMU extension") Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Tested-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240628-misc_perf_fixes-v4-3-e01cfddcf035@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-03drivers/perf: riscv: Reset the counter to hpmevent mapping while starting cpusSamuel Holland1-1/+1
Currently, we stop all the counters while a new cpu is brought online. However, the hpmevent to counter mappings are not reset. The firmware may have some stale encoding in their mapping structure which may lead to undesirable results. We have not encountered such scenario though. Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240628-misc_perf_fixes-v4-2-e01cfddcf035@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-03drivers/perf: riscv: Do not update the event data if uptodateAtish Patra1-1/+1
In case of an counter overflow, the event data may get corrupted if called from an external overflow handler. This happens because we can't update the counter without starting it when SBI PMU extension is in use. However, the prev_count has been already updated at the first pass while the counter value is still the old one. The solution is simple where we don't need to update it again if it is already updated which can be detected using hwc state. The event state in the overflow handler is updated in the following patch. Thus, this fix can't be backported to kernel version where overflow support was added. Fixes: a8625217a054 ("drivers/perf: riscv: Implement SBI PMU snapshot function") Closes:https://lore.kernel.org/all/CC51D53B-846C-4D81-86FC-FBF969D0A0D6@pku.edu.cn/ Reported-by: garthlei@pku.edu.cn Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240628-misc_perf_fixes-v4-1-e01cfddcf035@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-03nilfs2: fix incorrect inode allocation from reserved inodesRyusuke Konishi4-12/+20
If the bitmap block that manages the inode allocation status is corrupted, nilfs_ifile_create_inode() may allocate a new inode from the reserved inode area where it should not be allocated. Previous fix commit d325dc6eb763 ("nilfs2: fix use-after-free bug of struct nilfs_root"), fixed the problem that reserved inodes with inode numbers less than NILFS_USER_INO (=11) were incorrectly reallocated due to bitmap corruption, but since the start number of non-reserved inodes is read from the super block and may change, in which case inode allocation may occur from the extended reserved inode area. If that happens, access to that inode will cause an IO error, causing the file system to degrade to an error state. Fix this potential issue by adding a wraparound option to the common metadata object allocation routine and by modifying nilfs_ifile_create_inode() to disable the option so that it only allocates inodes with inode numbers greater than or equal to the inode number read in "nilfs->ns_first_ino", regardless of the bitmap status of reserved inodes. Link: https://lkml.kernel.org/r/20240623051135.4180-4-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03nilfs2: add missing check for inode numbers on directory entriesRyusuke Konishi2-0/+11
Syzbot reported that mounting and unmounting a specific pattern of corrupted nilfs2 filesystem images causes a use-after-free of metadata file inodes, which triggers a kernel bug in lru_add_fn(). As Jan Kara pointed out, this is because the link count of a metadata file gets corrupted to 0, and nilfs_evict_inode(), which is called from iput(), tries to delete that inode (ifile inode in this case). The inconsistency occurs because directories containing the inode numbers of these metadata files that should not be visible in the namespace are read without checking. Fix this issue by treating the inode numbers of these internal files as errors in the sanity check helper when reading directory folios/pages. Also thanks to Hillf Danton and Matthew Wilcox for their initial mm-layer analysis. Link: https://lkml.kernel.org/r/20240623051135.4180-3-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+d79afb004be235636ee8@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d79afb004be235636ee8 Reported-by: Jan Kara <jack@suse.cz> Closes: https://lkml.kernel.org/r/20240617075758.wewhukbrjod5fp5o@quack3 Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03nilfs2: fix inode number range checksRyusuke Konishi3-3/+10
Patch series "nilfs2: fix potential issues related to reserved inodes". This series fixes one use-after-free issue reported by syzbot, caused by nilfs2's internal inode being exposed in the namespace on a corrupted filesystem, and a couple of flaws that cause problems if the starting number of non-reserved inodes written in the on-disk super block is intentionally (or corruptly) changed from its default value. This patch (of 3): In the current implementation of nilfs2, "nilfs->ns_first_ino", which gives the first non-reserved inode number, is read from the superblock, but its lower limit is not checked. As a result, if a number that overlaps with the inode number range of reserved inodes such as the root directory or metadata files is set in the super block parameter, the inode number test macros (NILFS_MDT_INODE and NILFS_VALID_INODE) will not function properly. In addition, these test macros use left bit-shift calculations using with the inode number as the shift count via the BIT macro, but the result of a shift calculation that exceeds the bit width of an integer is undefined in the C specification, so if "ns_first_ino" is set to a large value other than the default value NILFS_USER_INO (=11), the macros may potentially malfunction depending on the environment. Fix these issues by checking the lower bound of "nilfs->ns_first_ino" and by preventing bit shifts equal to or greater than the NILFS_USER_INO constant in the inode number test macros. Also, change the type of "ns_first_ino" from signed integer to unsigned integer to avoid the need for type casting in comparisons such as the lower bound check introduced this time. Link: https://lkml.kernel.org/r/20240623051135.4180-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20240623051135.4180-2-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03mm: avoid overflows in dirty throttling logicJan Kara1-4/+26
The dirty throttling logic is interspersed with assumptions that dirty limits in PAGE_SIZE units fit into 32-bit (so that various multiplications fit into 64-bits). If limits end up being larger, we will hit overflows, possible divisions by 0 etc. Fix these problems by never allowing so large dirty limits as they have dubious practical value anyway. For dirty_bytes / dirty_background_bytes interfaces we can just refuse to set so large limits. For dirty_ratio / dirty_background_ratio it isn't so simple as the dirty limit is computed from the amount of available memory which can change due to memory hotplug etc. So when converting dirty limits from ratios to numbers of pages, we just don't allow the result to exceed UINT_MAX. This is root-only triggerable problem which occurs when the operator sets dirty limits to >16 TB. Link: https://lkml.kernel.org/r/20240621144246.11148-2-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Zach O'Keefe <zokeefe@google.com> Reviewed-By: Zach O'Keefe <zokeefe@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>