aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2024-09-16cifs: Update SFU comments about fifos and socketsPali Rohár3-6/+6
In SFU mode, activated by -o sfu mount option is now also support for creating new fifos and sockets. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-16cifs: Add support for creating SFU symlinksPali Rohár5-29/+77
Linux cifs client can already detect SFU symlinks and reads it content (target location). But currently is not able to create new symlink. So implement this missing support. When 'sfu' mount option is specified and 'mfsymlinks' is not specified then create new symlinks in SFU-style. This will provide full SFU compatibility of symlinks when mounting cifs share with 'sfu' option. 'mfsymlinks' option override SFU for better Apple compatibility as explained in fs_context.c file in smb3_update_mnt_flags() function. Extend __cifs_sfu_make_node() function, which now can handle also S_IFLNK type and refactor structures passed to sync_write() in this function, by splitting SFU type and SFU data from original combined struct win_dev as combined fixed-length struct cannot be used for variable-length symlinks. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-15smb: use LIST_HEAD() to simplify codeHongbo Li4-16/+7
list_head can be initialized automatically with LIST_HEAD() instead of calling INIT_LIST_HEAD(). No functional impact. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-15cifs: Recognize SFU socket typePali Rohár1-1/+5
SFU since its (first) version 3.0 supports AF_LOCAL sockets and stores them on filesytem as system file with one zero byte. Add support for detecting this SFU socket type into cifs_sfu_type() function. With this change cifs_sfu_type() would correctly detect all special file types created by SFU: fifo, socket, symlink, block and char. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-15cifs: Show debug message when SFU Fifo type was detectedPali Rohár1-0/+1
For debugging purposes it is a good idea to show detected SFU type also for Fifo. Debug message is already print for all other special types. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-15cifs: Put explicit zero byte into SFU block/char typesPali Rohár2-4/+4
SFU types IntxCHR and IntxBLK are 8 bytes with zero as last byte. Make it explicit in memcpy and memset calls, so the zero byte is visible in the code (and not hidden as string trailing nul byte). It is important for reader to show the last byte for block and char types because it differs from the last byte of symlink type (which has it 0x01). Also it is important to show that the type is not nul-term string, but rather 8 bytes (with some printable bytes). Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-15cifs: Add support for reading SFU symlink locationPali Rohár1-0/+29
Currently when sfu mount option is specified then CIFS can recognize SFU symlink, but is not able to read symlink target location. readlink() syscall just returns that operation is not supported. Implement this missing functionality in cifs_sfu_type() function. Read target location of SFU-style symlink, parse it and fill into fattr's cf_symlink_target member. SFU-style symlink is file which has system attribute set and file content is buffer "IntxLNK\1" (8th byte is 0x01) followed by the target location encoded in little endian UCS-2/UTF-16. This format was introduced in Interix 3.0 subsystem, as part of the Microsoft SFU 3.0 and is used also by all later versions. Previous versions had no symlink support. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-15cifs: Fix recognizing SFU symlinksPali Rohár1-1/+1
SFU symlinks have 8 byte prefix: "IntxLNK\1". So check also the last 8th byte 0x01. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-15smb: client: compress: fix an "illegal accesses" issueQianqiang Liu1-1/+1
Using uninitialized value "bkt" when calling "kfree" Fixes: 13b68d44990d ("smb: client: compress: LZ77 code improvements cleanup") Signed-off-by: Qianqiang Liu <qianqiang.liu@163.com> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-15smb: client: compress: fix a potential issue of freeing an invalid pointerQianqiang Liu1-1/+1
The dst pointer may not be initialized when calling kvfree(dst) Fixes: 13b68d44990d9 ("smb: client: compress: LZ77 code improvements cleanup") Signed-off-by: Qianqiang Liu <qianqiang.liu@163.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-15smb: client: compress: LZ77 code improvements cleanupEnzo Matsumiya4-463/+532
- Check data compressibility with some heuristics (copied from btrfs): - should_compress() final decision is is_compressible(data) - Cleanup compress/lz77.h leaving only lz77_compress() exposed: - Move parts to compress/lz77.c, while removing the rest of it because they were either unused, used only once, were implemented wrong (thanks to David Howells for the help) - Updated the compression parameters (still compatible with Windows implementation) trading off ~20% compression ratio for ~40% performance: - min match len: 3 -> 4 - max distance: 8KiB -> 1KiB - hash table type: u32 * -> u64 * Known bugs: This implementation currently works fine in general, but breaks with some payloads used during testing. Investigation ongoing, to be fixed in a next commit. Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Co-developed-by: David Howells <dhowells@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-15smb: client: insert compression check/call on write requestsEnzo Matsumiya2-0/+9
On smb2_async_writev(), set CIFS_COMPRESS_REQ on request flags if should_compress() returns true. On smb_send_rqst() check the flags, and compress and send the request to the server. (*) If the compression fails with -EMSGSIZE (i.e. compressed size is >= uncompressed size), the original uncompressed request is sent instead. Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-15smb3: mark compression as CONFIG_EXPERIMENTAL and fix missing compression operationSteve French9-4/+685
Move SMB3.1.1 compression code into experimental config option, and fix the compress mount option. Implement unchained LZ77 "plain" compression algorithm as per MS-XCA specification section "2.3 Plain LZ77 Compression Algorithm Details". Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-15cifs: Remove obsoleted declaration for cifs_dir_openGaosheng Cui1-1/+0
The cifs_dir_open() have been removed since commit 737b758c965a ("[PATCH] cifs: character mapping of special characters (part 3 of 3)"), and now it is useless, so remove it. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-15smb: client: Use min() macroShen Lichuan2-5/+3
Use the min() macro to simplify the function and improve its readability. Signed-off-by: Shen Lichuan <shenlichuan@vivo.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-15cifs: convert to use ERR_CAST()Yuesong Li1-1/+1
Use ERR_CAST() as it is designed for casting an error pointer to another type. This macro uses the __force and __must_check modifiers, which are used to tell the compiler to check for errors where this macro is used. Signed-off-by: Yuesong Li <liyuesong@vivo.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-15smb: add comment to STATUS_MCA_OCCUREDChenXiaoSong1-0/+4
Explained why the typo was not corrected. Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-15smb: move SMB2 Status code to common header fileChenXiaoSong15-1835/+15
There are only 4 different definitions between the client and server: - STATUS_SERVER_UNAVAILABLE: from client/smb2status.h - STATUS_FILE_NOT_AVAILABLE: from client/smb2status.h - STATUS_NO_PREAUTH_INTEGRITY_HASH_OVERLAP: from server/smbstatus.h - STATUS_INVALID_LOCK_RANGE: from server/smbstatus.h Rename client/smb2status.h to common/smb2status.h, and merge the 2 different definitions of server to common header file. Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-15smb: move some duplicate definitions to common/smbacl.hChenXiaoSong3-200/+123
In order to maintain the code more easily, move duplicate definitions to new common header file. Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-15smb/client: rename cifs_ace to smb_aceChenXiaoSong5-41/+41
Preparation for moving acl definitions to new common header file. Use the following shell command to rename: find fs/smb/client -type f -exec sed -i \ 's/struct cifs_ace/struct smb_ace/g' {} + Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-15smb/client: rename cifs_acl to smb_aclChenXiaoSong2-19/+19
Preparation for moving acl definitions to new common header file. Use the following shell command to rename: find fs/smb/client -type f -exec sed -i \ 's/struct cifs_acl/struct smb_acl/g' {} + Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-15smb/client: rename cifs_sid to smb_sidChenXiaoSong7-63/+63
Preparation for moving acl definitions to new common header file. Use the following shell command to rename: find fs/smb/client -type f -exec sed -i \ 's/struct cifs_sid/struct smb_sid/g' {} + Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-15smb/client: rename cifs_ntsd to smb_ntsdChenXiaoSong9-49/+49
Preparation for moving acl definitions to new common header file. Use the following shell command to rename: find fs/smb/client -type f -exec sed -i \ 's/struct cifs_ntsd/struct smb_ntsd/g' {} + Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-15Linux 6.11Linus Torvalds1-1/+1
2024-09-15Revert "KVM: VMX: Always honor guest PAT on CPUs that support self-snoop"Paolo Bonzini2-11/+7
This reverts commit 377b2f359d1f71c75f8cc352b5c81f2210312d83. This caused a regression with the bochsdrm driver, which used ioremap() instead of ioremap_wc() to map the video RAM. After the commit, the WB memory type is used without the IGNORE_PAT, resulting in the slower UC memory type. In fact, UC is slow enough to basically cause guests to not boot... but only on new processors such as Sapphire Rapids and Cascade Lake. Coffee Lake for example works properly, though that might also be an effect of being on a larger, more NUMA system. The driver has been fixed but that does not help older guests. Until we figure out whether Cascade Lake and newer processors are working as intended, revert the commit. Long term we might add a quirk, but the details depend on whether the processors are working as intended: for example if they are, the quirk might reference bochs-compatible devices, e.g. in the name and documentation, so that userspace can disable the quirk by default and only leave it enabled if such a device is being exposed to the guest. If instead this is actually a bug in CLX+, then the actions we need to take are different and depend on the actual cause of the bug. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-09-13pinctrl: pinctrl-cy8c95x0: Fix regcachePatrick Rudolph1-5/+9
The size of the mux stride was off by one, which could result in invalid pin configuration on the device side or invalid state readings on the software side. While on it also update the code and: - Increase the mux stride size to 16 - Align the virtual muxed regmap range to 16 - Start the regmap window at the selector - Mark reserved registers as not-readable Fixes: 8670de9fae49 ("pinctrl: cy8c95x0: Use regmap ranges") Signed-off-by: Patrick Rudolph <patrick.rudolph@9elements.com> Reported-by: Andy Shevchenko <andy@kernel.org> Reviewed-by: Andy Shevchenko <andy@kernel.org> Link: https://lore.kernel.org/20240902072859.583490-1-patrick.rudolph@9elements.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2024-09-12cifs: Fix signature miscalculationDavid Howells1-1/+1
Fix the calculation of packet signatures by adding the offset into a page in the read or write data payload when hashing the pages from it. Fixes: 39bc58203f04 ("cifs: Add a function to Hash the contents of an iterator") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Tom Talpey <tom@talpey.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-12mm: avoid leaving partial pfn mappings around in error caseLinus Torvalds1-5/+22
As Jann points out, PFN mappings are special, because unlike normal memory mappings, there is no lifetime information associated with the mapping - it is just a raw mapping of PFNs with no reference counting of a 'struct page'. That's all very much intentional, but it does mean that it's easy to mess up the cleanup in case of errors. Yes, a failed mmap() will always eventually clean up any partial mappings, but without any explicit lifetime in the page table mapping itself, it's very easy to do the error handling in the wrong order. In particular, it's easy to mistakenly free the physical backing store before the page tables are actually cleaned up and (temporarily) have stale dangling PTE entries. To make this situation less error-prone, just make sure that any partial pfn mapping is torn down early, before any other error handling. Reported-and-tested-by: Jann Horn <jannh@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Simona Vetter <simona.vetter@ffwll.ch> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-12drm/xe/client: add missing bo locking in show_meminfo()Matthew Auld1-3/+36
bo_meminfo() wants to inspect bo state like tt and the ttm resource, however this state can change at any point leading to stuff like NPD and UAF, if the bo lock is not held. Grab the bo lock when calling bo_meminfo(), ensuring we drop any spinlocks first. In the case of object_idr we now also need to hold a ref. v2 (MattB) - Also add xe_bo_assert_held() Fixes: 0845233388f8 ("drm/xe: Implement fdinfo memory stats printing") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240911155527.178910-6-matthew.auld@intel.com (cherry picked from commit 4f63d712fa104c3ebefcb289d1e733e86d8698c7) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-09-12drm/xe/client: fix deadlock in show_meminfo()Matthew Auld1-1/+5
There is a real deadlock as well as sleeping in atomic() bug in here, if the bo put happens to be the last ref, since bo destruction wants to grab the same spinlock and sleeping locks. Fix that by dropping the ref using xe_bo_put_deferred(), and moving the final commit outside of the lock. Dropping the lock around the put is tricky since the bo can go out of scope and delete itself from the list, making it difficult to navigate to the next list entry. Fixes: 0845233388f8 ("drm/xe: Implement fdinfo memory stats printing") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2727 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240911155527.178910-5-matthew.auld@intel.com (cherry picked from commit 0083b8e6f11d7662283a267d4ce7c966812ffd8a) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-09-12drm/xe/oa: Enable Xe2+ PES disaggregationAshutosh Dixit2-0/+5
Enable Xe2+ PES disaggregation (for OAG) to retrieve disaggregated metrics when disaggregated data is needed. Userspace can select whether to receive aggregated or disaggregated metrics via the particular OA configuration it uses (programmed via DRM_XE_OBSERVATION_OP_ADD_CONFIG). Bspec: 61101 Fixes: e936f885f1e9 ("drm/xe/oa/uapi: Expose OA stream fd") Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240909165933.2638765-1-ashutosh.dixit@intel.com Cc: stable@vger.kernel.org (cherry picked from commit fb2551a0e93897aec7fb3d4f473ebc06b146d160) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-09-12drm/xe/display: fix compat IS_DISPLAY_STEP() range endJani Nikula1-1/+1
It's supposed to be an open range at the end like in i915. Fingers crossed that nobody relies on this definition. Fixes: 44e694958b95 ("drm/xe/display: Implement display support") Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/fe8743770694e429f6902491cdb306c97bdf701a.1724180287.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit 453afb1a439994deeacb8d9ecbb48c1f2348ea0a) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-09-12drm/xe: Fix access_ok check in user_fence_createNirmoy Das1-1/+1
Check size of the data not size of the pointer. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202407300421.IBkAja96-lkp@intel.com/ Fixes: ddeb7989a98f ("drm/xe: Validate user fence during creation") Cc: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Reviewed-by: Apoorva Singh <apoorva.singh@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240806110722.28661-1-nirmoy.das@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> (cherry picked from commit e102b5ed6e283a144793cab8fcd95f61d0ddbadb) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-09-12drm/xe: Fix possible UAF in guc_exec_queue_process_msgMatthew Brost1-1/+3
Store xe_device ahead of processing message as message can be free'd in some cases. v2: - Including missing local changes v3: - Resend for CI Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202407231445.rpisd1vA-lkp@intel.com/ Fixes: 55ea73aacfb9 ("drm/xe: Build PM into GuC CT layer") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240724164341.1848954-1-matthew.brost@intel.com (cherry picked from commit 1a394b4f504f33eac8c38b6f42ba025105c7e869) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-09-12drm/xe: Remove fence check from send_tlb_invalidationMatthew Brost1-2/+2
'fence' argument in send_tlb_invalidation cannot be NULL, remove non-NULL check from send_tlb_invalidation. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202407231049.esig0Fkb-lkp@intel.com/ Fixes: 58bfe6674467 ("drm/xe: Drop xe_gt_tlb_invalidation_wait") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240723190714.1744653-1-matthew.brost@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> (cherry picked from commit 6482253e6e1ad1c3a76645a3899d3cfdb5b918cb) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-09-12drm/xe/gt: Remove double includeLucas De Marchi1-1/+0
The header generated/xe_wa_oob.h is included twice. Remove one. Fixes: 27cb2b7fec2a ("drm/xe/bmg: implement Wa_16023588340") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202407052122.AzuWSPuo-lkp@intel.com/ Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240708173301.1543871-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 3d122660dc70029d9cccb4e8670125f0affa959e) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-09-12net: netfilter: move nf flowtable bpf initialization in nf_flow_table_module_init()Lorenzo Bianconi2-1/+7
Move nf flowtable bpf initialization in nf_flow_table module load routine since nf_flow_table_bpf is part of nf_flow_table module and not nf_flow_table_inet one. This patch allows to avoid the following kernel warning running the reproducer below: $modprobe nf_flow_table_inet $rmmod nf_flow_table_inet $modprobe nf_flow_table_inet modprobe: ERROR: could not insert 'nf_flow_table_inet': Invalid argument [ 184.081501] ------------[ cut here ]------------ [ 184.081527] WARNING: CPU: 0 PID: 1362 at kernel/bpf/btf.c:8206 btf_populate_kfunc_set+0x23c/0x330 [ 184.081550] CPU: 0 UID: 0 PID: 1362 Comm: modprobe Kdump: loaded Not tainted 6.11.0-0.rc5.22.el10.x86_64 #1 [ 184.081553] Hardware name: Red Hat OpenStack Compute, BIOS 1.14.0-1.module+el8.4.0+8855+a9e237a9 04/01/2014 [ 184.081554] RIP: 0010:btf_populate_kfunc_set+0x23c/0x330 [ 184.081558] RSP: 0018:ff22cfb38071fc90 EFLAGS: 00010202 [ 184.081559] RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000000 [ 184.081560] RDX: 000000000000006e RSI: ffffffff95c00000 RDI: ff13805543436350 [ 184.081561] RBP: ffffffffc0e22180 R08: ff13805543410808 R09: 000000000001ec00 [ 184.081562] R10: ff13805541c8113c R11: 0000000000000010 R12: ff13805541b83c00 [ 184.081563] R13: ff13805543410800 R14: 0000000000000001 R15: ffffffffc0e2259a [ 184.081564] FS: 00007fa436c46740(0000) GS:ff1380557ba00000(0000) knlGS:0000000000000000 [ 184.081569] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 184.081570] CR2: 000055e7b3187000 CR3: 0000000100c48003 CR4: 0000000000771ef0 [ 184.081571] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 184.081572] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 184.081572] PKRU: 55555554 [ 184.081574] Call Trace: [ 184.081575] <TASK> [ 184.081578] ? show_trace_log_lvl+0x1b0/0x2f0 [ 184.081580] ? show_trace_log_lvl+0x1b0/0x2f0 [ 184.081582] ? __register_btf_kfunc_id_set+0x199/0x200 [ 184.081585] ? btf_populate_kfunc_set+0x23c/0x330 [ 184.081586] ? __warn.cold+0x93/0xed [ 184.081590] ? btf_populate_kfunc_set+0x23c/0x330 [ 184.081592] ? report_bug+0xff/0x140 [ 184.081594] ? handle_bug+0x3a/0x70 [ 184.081596] ? exc_invalid_op+0x17/0x70 [ 184.081597] ? asm_exc_invalid_op+0x1a/0x20 [ 184.081601] ? btf_populate_kfunc_set+0x23c/0x330 [ 184.081602] __register_btf_kfunc_id_set+0x199/0x200 [ 184.081605] ? __pfx_nf_flow_inet_module_init+0x10/0x10 [nf_flow_table_inet] [ 184.081607] do_one_initcall+0x58/0x300 [ 184.081611] do_init_module+0x60/0x230 [ 184.081614] __do_sys_init_module+0x17a/0x1b0 [ 184.081617] do_syscall_64+0x7d/0x160 [ 184.081620] ? __count_memcg_events+0x58/0xf0 [ 184.081623] ? handle_mm_fault+0x234/0x350 [ 184.081626] ? do_user_addr_fault+0x347/0x640 [ 184.081630] ? clear_bhb_loop+0x25/0x80 [ 184.081633] ? clear_bhb_loop+0x25/0x80 [ 184.081634] ? clear_bhb_loop+0x25/0x80 [ 184.081637] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 184.081639] RIP: 0033:0x7fa43652e4ce [ 184.081647] RSP: 002b:00007ffe8213be18 EFLAGS: 00000246 ORIG_RAX: 00000000000000af [ 184.081649] RAX: ffffffffffffffda RBX: 000055e7b3176c20 RCX: 00007fa43652e4ce [ 184.081650] RDX: 000055e7737fde79 RSI: 0000000000003990 RDI: 000055e7b3185380 [ 184.081651] RBP: 000055e7737fde79 R08: 0000000000000007 R09: 000055e7b3179bd0 [ 184.081651] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000040000 [ 184.081652] R13: 000055e7b3176fa0 R14: 0000000000000000 R15: 000055e7b3179b80 Fixes: 391bb6594fd3 ("netfilter: Add bpf_xdp_flow_lookup kfunc") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Link: https://patch.msgid.link/20240911-nf-flowtable-bpf-modprob-fix-v1-1-f9fc075aafc3@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-09-12PCI: Fix potential deadlock in pcim_intx()Philipp Stanner1-0/+2
25216afc9db5 ("PCI: Add managed pcim_intx()") moved the allocation step for pci_intx()'s device resource from pcim_enable_device() to pcim_intx(). As before, pcim_enable_device() sets pci_dev.is_managed to true; and it is never set to false again. Due to the lifecycle of a struct pci_dev, it can happen that a second driver obtains the same pci_dev after a first driver ran. If one driver uses pcim_enable_device() and the other doesn't, this causes the other driver to run into managed pcim_intx(), which will try to allocate when called for the first time. Allocations might sleep, so calling pci_intx() while holding spinlocks becomes then invalid, which causes lockdep warnings and could cause deadlocks: ======================================================== WARNING: possible irq lock inversion dependency detected 6.11.0-rc6+ #59 Tainted: G W -------------------------------------------------------- CPU 0/KVM/1537 just changed the state of lock: ffffa0f0cff965f0 (&vdev->irqlock){-...}-{2:2}, at: vfio_intx_handler+0x21/0xd0 [vfio_pci_core] but this lock took another, HARDIRQ-unsafe lock in the past: (fs_reclaim){+.+.}-{0:0} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); local_irq_disable(); lock(&vdev->irqlock); lock(fs_reclaim); <Interrupt> lock(&vdev->irqlock); *** DEADLOCK *** Have pcim_enable_device()'s release function, pcim_disable_device(), set pci_dev.is_managed to false so that subsequent drivers using the same struct pci_dev do not implicitly run into managed code. Link: https://lore.kernel.org/r/20240905072556.11375-2-pstanner@redhat.com Fixes: 25216afc9db5 ("PCI: Add managed pcim_intx()") Reported-by: Alex Williamson <alex.williamson@redhat.com> Closes: https://lore.kernel.org/all/20240903094431.63551744.alex.williamson@redhat.com/ Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
2024-09-11workqueue: Clear worker->pool in the worker thread contextLai Jiangshan1-2/+6
Marc Hartmayer reported: [ 23.133876] Unable to handle kernel pointer dereference in virtual kernel address space [ 23.133950] Failing address: 0000000000000000 TEID: 0000000000000483 [ 23.133954] Fault in home space mode while using kernel ASCE. [ 23.133957] AS:000000001b8f0007 R3:0000000056cf4007 S:0000000056cf3800 P:000000000000003d [ 23.134207] Oops: 0004 ilc:2 [#1] SMP (snip) [ 23.134516] Call Trace: [ 23.134520] [<0000024e326caf28>] worker_thread+0x48/0x430 [ 23.134525] ([<0000024e326caf18>] worker_thread+0x38/0x430) [ 23.134528] [<0000024e326d3a3e>] kthread+0x11e/0x130 [ 23.134533] [<0000024e3264b0dc>] __ret_from_fork+0x3c/0x60 [ 23.134536] [<0000024e333fb37a>] ret_from_fork+0xa/0x38 [ 23.134552] Last Breaking-Event-Address: [ 23.134553] [<0000024e333f4c04>] mutex_unlock+0x24/0x30 [ 23.134562] Kernel panic - not syncing: Fatal exception: panic_on_oops With debuging and analysis, worker_thread() accesses to the nullified worker->pool when the newly created worker is destroyed before being waken-up, in which case worker_thread() can see the result detach_worker() reseting worker->pool to NULL at the begining. Move the code "worker->pool = NULL;" out from detach_worker() to fix the problem. worker->pool had been designed to be constant for regular workers and changeable for rescuer. To share attaching/detaching code for regular and rescuer workers and to avoid worker->pool being accessed inadvertently when the worker has been detached, worker->pool is reset to NULL when detached no matter the worker is rescuer or not. To maintain worker->pool being reset after detached, move the code "worker->pool = NULL;" in the worker thread context after detached. It is either be in the regular worker thread context after PF_WQ_WORKER is cleared or in rescuer worker thread context with wq_pool_attach_mutex held. So it is safe to do so. Cc: Marc Hartmayer <mhartmay@linux.ibm.com> Link: https://lore.kernel.org/lkml/87wmjj971b.fsf@linux.ibm.com/ Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com> Fixes: f4b7b53c94af ("workqueue: Detach workers directly in idle_cull_fn()") Cc: stable@vger.kernel.org # v6.11+ Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-11net: tighten bad gso csum offset check in virtio_net_hdrWillem de Bruijn1-1/+2
The referenced commit drops bad input, but has false positives. Tighten the check to avoid these. The check detects illegal checksum offload requests, which produce csum_start/csum_off beyond end of packet after segmentation. But it is based on two incorrect assumptions: 1. virtio_net_hdr_to_skb with VIRTIO_NET_HDR_GSO_TCP[46] implies GSO. True in callers that inject into the tx path, such as tap. But false in callers that inject into rx, like virtio-net. Here, the flags indicate GRO, and CHECKSUM_UNNECESSARY or CHECKSUM_NONE without VIRTIO_NET_HDR_F_NEEDS_CSUM is normal. 2. TSO requires checksum offload, i.e., ip_summed == CHECKSUM_PARTIAL. False, as tcp[46]_gso_segment will fix up csum_start and offset for all other ip_summed by calling __tcp_v4_send_check. Because of 2, we can limit the scope of the fix to virtio_net_hdr that do try to set these fields, with a bogus value. Link: https://lore.kernel.org/netdev/20240909094527.GA3048202@port70.net/ Fixes: 89add40066f9 ("net: drop bad gso csum_start and offset in virtio_net_hdr") Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20240910213553.839926-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11netlink: specs: mptcp: fix port endiannessAsbjørn Sloth Tønnesen1-1/+0
The MPTCP port attribute is in host endianness, but was documented as big-endian in the ynl specification. Below are two examples from net/mptcp/pm_netlink.c showing that the attribute is converted to/from host endianness for use with netlink. Import from netlink: addr->port = htons(nla_get_u16(tb[MPTCP_PM_ADDR_ATTR_PORT])) Export to netlink: nla_put_u16(skb, MPTCP_PM_ADDR_ATTR_PORT, ntohs(addr->port)) Where addr->port is defined as __be16. No functional change intended. Fixes: bc8aeb2045e2 ("Documentation: netlink: add a YAML spec for mptcp") Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240911091003.1112179-1-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11net: dpaa: Pad packets to ETH_ZLENSean Anderson1-1/+8
When sending packets under 60 bytes, up to three bytes of the buffer following the data may be leaked. Avoid this by extending all packets to ETH_ZLEN, ensuring nothing is leaked in the padding. This bug can be reproduced by running $ ping -s 11 destination Fixes: 9ad1a3749333 ("dpaa_eth: add support for DPAA Ethernet") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240910143144.1439910-1-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11mptcp: pm: Fix uaf in __timer_delete_syncEdward Adam Davis1-4/+9
There are two paths to access mptcp_pm_del_add_timer, result in a race condition: CPU1 CPU2 ==== ==== net_rx_action napi_poll netlink_sendmsg __napi_poll netlink_unicast process_backlog netlink_unicast_kernel __netif_receive_skb genl_rcv __netif_receive_skb_one_core netlink_rcv_skb NF_HOOK genl_rcv_msg ip_local_deliver_finish genl_family_rcv_msg ip_protocol_deliver_rcu genl_family_rcv_msg_doit tcp_v4_rcv mptcp_pm_nl_flush_addrs_doit tcp_v4_do_rcv mptcp_nl_remove_addrs_list tcp_rcv_established mptcp_pm_remove_addrs_and_subflows tcp_data_queue remove_anno_list_by_saddr mptcp_incoming_options mptcp_pm_del_add_timer mptcp_pm_del_add_timer kfree(entry) In remove_anno_list_by_saddr(running on CPU2), after leaving the critical zone protected by "pm.lock", the entry will be released, which leads to the occurrence of uaf in the mptcp_pm_del_add_timer(running on CPU1). Keeping a reference to add_timer inside the lock, and calling sk_stop_timer_sync() with this reference, instead of "entry->add_timer". Move list_del(&entry->list) to mptcp_pm_del_add_timer and inside the pm lock, do not directly access any members of the entry outside the pm lock, which can avoid similar "entry->x" uaf. Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout") Cc: stable@vger.kernel.org Reported-and-tested-by: syzbot+f3a31fb909db9b2a5c4d@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=f3a31fb909db9b2a5c4d Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Edward Adam Davis <eadavis@qq.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/tencent_7142963A37944B4A74EF76CD66EA3C253609@qq.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11net: libwx: fix number of Rx and Tx descriptorsJiawen Wu1-3/+3
The number of transmit and receive descriptors must be a multiple of 128 due to the hardware limitation. If it is set to a multiple of 8 instead of a multiple 128, the queues will easily be hung. Cc: stable@vger.kernel.org Fixes: 883b5984a5d2 ("net: wangxun: add ethtool_ops for ring parameters") Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240910095629.570674-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11net: dsa: felix: ignore pending status of TAS module when it's disabledXiaoliang Yang1-4/+7
The TAS module could not be configured when it's running in pending status. We need disable the module and configure it again. However, the pending status is not cleared after the module disabled. TC taprio set will always return busy even it's disabled. For example, a user uses tc-taprio to configure Qbv and a future basetime. The TAS module will run in a pending status. There is no way to reconfigure Qbv, it always returns busy. Actually the TAS module can be reconfigured when it's disabled. So it doesn't need to check the pending status if the TAS module is disabled. After the patch, user can delete the tc taprio configuration to disable Qbv and reconfigure it again. Fixes: de143c0e274b ("net: dsa: felix: Configure Time-Aware Scheduler via taprio offload") Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Link: https://patch.msgid.link/20240906093550.29985-1-xiaoliang.yang_1@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11net: hsr: prevent NULL pointer dereference in hsr_proxy_announce()Jeongjun Park1-0/+4
In the function hsr_proxy_annouance() added in the previous commit 5f703ce5c981 ("net: hsr: Send supervisory frames to HSR network with ProxyNodeTable data"), the return value of the hsr_port_get_hsr() function is not checked to be a NULL pointer, which causes a NULL pointer dereference. To solve this, we need to add code to check whether the return value of hsr_port_get_hsr() is NULL. Reported-by: syzbot+02a42d9b1bd395cbcab4@syzkaller.appspotmail.com Fixes: 5f703ce5c981 ("net: hsr: Send supervisory frames to HSR network with ProxyNodeTable data") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Lukasz Majewski <lukma@denx.de> Link: https://patch.msgid.link/20240907190341.162289-1-aha310510@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11selftests: mptcp: include net_helper.sh fileMatthieu Baerts (NGI0)1-1/+1
Similar to the previous commit, the net_helper.sh file from the parent directory is used by the MPTCP selftests and it needs to be present when running the tests. This file then needs to be listed in the Makefile to be included when exporting or installing the tests, e.g. with: make -C tools/testing/selftests \ TARGETS=net/mptcp \ install INSTALL_PATH=$KSFT_INSTALL_PATH cd $KSFT_INSTALL_PATH ./run_kselftest.sh -c net/mptcp Fixes: 1af3bc912eac ("selftests: mptcp: lib: use wait_local_port_listen helper") Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-3-8f124aa9156d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11selftests: mptcp: include lib.sh fileMatthieu Baerts (NGI0)1-0/+2
The lib.sh file from the parent directory is used by the MPTCP selftests and it needs to be present when running the tests. This file then needs to be listed in the Makefile to be included when exporting or installing the tests, e.g. with: make -C tools/testing/selftests \ TARGETS=net/mptcp \ install INSTALL_PATH=$KSFT_INSTALL_PATH cd $KSFT_INSTALL_PATH ./run_kselftest.sh -c net/mptcp Fixes: f265d3119a29 ("selftests: mptcp: lib: use setup/cleanup_ns helpers") Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-2-8f124aa9156d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11selftests: mptcp: join: restrict fullmesh endp on 1st sfMatthieu Baerts (NGI0)1-1/+3
A new endpoint using the IP of the initial subflow has been recently added to increase the code coverage. But it breaks the test when using old kernels not having commit 86e39e04482b ("mptcp: keep track of local endpoint still available for each msk"), e.g. on v5.15. Similar to commit d4c81bbb8600 ("selftests: mptcp: join: support local endpoint being tracked or not"), it is possible to add the new endpoint conditionally, by checking if "mptcp_pm_subflow_check_next" is present in kallsyms: this is not directly linked to the commit introducing this symbol but for the parent one which is linked anyway. So we can know in advance what will be the expected behaviour, and add the new endpoint only when it makes sense to do so. Fixes: 4878f9f8421f ("selftests: mptcp: join: validate fullmesh endp on 1st sf") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-1-8f124aa9156d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12netfilter: nft_socket: make cgroupsv2 matching work with namespacesFlorian Westphal1-3/+38
When running in container environmment, /sys/fs/cgroup/ might not be the real root node of the sk-attached cgroup. Example: In container: % stat /sys//fs/cgroup/ Device: 0,21 Inode: 2214 .. % stat /sys/fs/cgroup/foo Device: 0,21 Inode: 2264 .. The expectation would be for: nft add rule .. socket cgroupv2 level 1 "foo" counter to match traffic from a process that got added to "foo" via "echo $pid > /sys/fs/cgroup/foo/cgroup.procs". However, 'level 3' is needed to make this work. Seen from initial namespace, the complete hierarchy is: % stat /sys/fs/cgroup/system.slice/docker-.../foo Device: 0,21 Inode: 2264 .. i.e. hierarchy is 0 1 2 3 / -> system.slice -> docker-1... -> foo ... but the container doesn't know that its "/" is the "docker-1.." cgroup. Current code will retrieve the 'system.slice' cgroup node and store its kn->id in the destination register, so compare with 2264 ("foo" cgroup id) will not match. Fetch "/" cgroup from ->init() and add its level to the level we try to extract. cgroup root-level is 0 for the init-namespace or the level of the ancestor that is exposed as the cgroup root inside the container. In the above case, cgrp->level of "/" resolved in the container is 2 (docker-1...scope/) and request for 'level 1' will get adjusted to fetch the actual level (3). v2: use CONFIG_SOCK_CGROUP_DATA, eval function depends on it. (kernel test robot) Cc: cgroups@vger.kernel.org Fixes: e0bb96db96f8 ("netfilter: nft_socket: add support for cgroupsv2") Reported-by: Nadia Pinaeva <n.m.pinaeva@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>