Age | Commit message (Collapse) | Author | Files | Lines |
|
In SFU mode, activated by -o sfu mount option is now also support for
creating new fifos and sockets.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Linux cifs client can already detect SFU symlinks and reads it content
(target location). But currently is not able to create new symlink. So
implement this missing support.
When 'sfu' mount option is specified and 'mfsymlinks' is not specified then
create new symlinks in SFU-style. This will provide full SFU compatibility
of symlinks when mounting cifs share with 'sfu' option. 'mfsymlinks' option
override SFU for better Apple compatibility as explained in fs_context.c
file in smb3_update_mnt_flags() function.
Extend __cifs_sfu_make_node() function, which now can handle also S_IFLNK
type and refactor structures passed to sync_write() in this function, by
splitting SFU type and SFU data from original combined struct win_dev as
combined fixed-length struct cannot be used for variable-length symlinks.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
list_head can be initialized automatically with LIST_HEAD()
instead of calling INIT_LIST_HEAD(). No functional impact.
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
SFU since its (first) version 3.0 supports AF_LOCAL sockets and stores them
on filesytem as system file with one zero byte. Add support for detecting
this SFU socket type into cifs_sfu_type() function.
With this change cifs_sfu_type() would correctly detect all special file
types created by SFU: fifo, socket, symlink, block and char.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
For debugging purposes it is a good idea to show detected SFU type also for
Fifo. Debug message is already print for all other special types.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
SFU types IntxCHR and IntxBLK are 8 bytes with zero as last byte. Make it
explicit in memcpy and memset calls, so the zero byte is visible in the
code (and not hidden as string trailing nul byte).
It is important for reader to show the last byte for block and char types
because it differs from the last byte of symlink type (which has it 0x01).
Also it is important to show that the type is not nul-term string, but
rather 8 bytes (with some printable bytes).
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Currently when sfu mount option is specified then CIFS can recognize SFU
symlink, but is not able to read symlink target location. readlink()
syscall just returns that operation is not supported.
Implement this missing functionality in cifs_sfu_type() function. Read
target location of SFU-style symlink, parse it and fill into fattr's
cf_symlink_target member.
SFU-style symlink is file which has system attribute set and file content
is buffer "IntxLNK\1" (8th byte is 0x01) followed by the target location
encoded in little endian UCS-2/UTF-16. This format was introduced in
Interix 3.0 subsystem, as part of the Microsoft SFU 3.0 and is used also by
all later versions. Previous versions had no symlink support.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
SFU symlinks have 8 byte prefix: "IntxLNK\1".
So check also the last 8th byte 0x01.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Using uninitialized value "bkt" when calling "kfree"
Fixes: 13b68d44990d ("smb: client: compress: LZ77 code improvements cleanup")
Signed-off-by: Qianqiang Liu <qianqiang.liu@163.com>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
The dst pointer may not be initialized when calling kvfree(dst)
Fixes: 13b68d44990d9 ("smb: client: compress: LZ77 code improvements cleanup")
Signed-off-by: Qianqiang Liu <qianqiang.liu@163.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
- Check data compressibility with some heuristics (copied from
btrfs):
- should_compress() final decision is is_compressible(data)
- Cleanup compress/lz77.h leaving only lz77_compress() exposed:
- Move parts to compress/lz77.c, while removing the rest of it
because they were either unused, used only once, were
implemented wrong (thanks to David Howells for the help)
- Updated the compression parameters (still compatible with
Windows implementation) trading off ~20% compression ratio
for ~40% performance:
- min match len: 3 -> 4
- max distance: 8KiB -> 1KiB
- hash table type: u32 * -> u64 *
Known bugs:
This implementation currently works fine in general, but breaks with
some payloads used during testing. Investigation ongoing, to be
fixed in a next commit.
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Co-developed-by: David Howells <dhowells@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
On smb2_async_writev(), set CIFS_COMPRESS_REQ on request flags if
should_compress() returns true.
On smb_send_rqst() check the flags, and compress and send the request to
the server.
(*) If the compression fails with -EMSGSIZE (i.e. compressed size is >=
uncompressed size), the original uncompressed request is sent instead.
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Move SMB3.1.1 compression code into experimental config option,
and fix the compress mount option. Implement unchained LZ77
"plain" compression algorithm as per MS-XCA specification
section "2.3 Plain LZ77 Compression Algorithm Details".
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
The cifs_dir_open() have been removed since
commit 737b758c965a ("[PATCH] cifs: character mapping of special
characters (part 3 of 3)"), and now it is useless, so remove it.
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Use the min() macro to simplify the function and improve
its readability.
Signed-off-by: Shen Lichuan <shenlichuan@vivo.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Use ERR_CAST() as it is designed for casting an error pointer to
another type.
This macro uses the __force and __must_check modifiers, which are used
to tell the compiler to check for errors where this macro is used.
Signed-off-by: Yuesong Li <liyuesong@vivo.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Explained why the typo was not corrected.
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
There are only 4 different definitions between the client and server:
- STATUS_SERVER_UNAVAILABLE: from client/smb2status.h
- STATUS_FILE_NOT_AVAILABLE: from client/smb2status.h
- STATUS_NO_PREAUTH_INTEGRITY_HASH_OVERLAP: from server/smbstatus.h
- STATUS_INVALID_LOCK_RANGE: from server/smbstatus.h
Rename client/smb2status.h to common/smb2status.h, and merge the
2 different definitions of server to common header file.
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
In order to maintain the code more easily, move duplicate definitions
to new common header file.
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Preparation for moving acl definitions to new common header file.
Use the following shell command to rename:
find fs/smb/client -type f -exec sed -i \
's/struct cifs_ace/struct smb_ace/g' {} +
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Preparation for moving acl definitions to new common header file.
Use the following shell command to rename:
find fs/smb/client -type f -exec sed -i \
's/struct cifs_acl/struct smb_acl/g' {} +
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Preparation for moving acl definitions to new common header file.
Use the following shell command to rename:
find fs/smb/client -type f -exec sed -i \
's/struct cifs_sid/struct smb_sid/g' {} +
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Preparation for moving acl definitions to new common header file.
Use the following shell command to rename:
find fs/smb/client -type f -exec sed -i \
's/struct cifs_ntsd/struct smb_ntsd/g' {} +
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
|
|
This reverts commit 377b2f359d1f71c75f8cc352b5c81f2210312d83.
This caused a regression with the bochsdrm driver, which used ioremap()
instead of ioremap_wc() to map the video RAM. After the commit, the
WB memory type is used without the IGNORE_PAT, resulting in the slower
UC memory type. In fact, UC is slow enough to basically cause guests
to not boot... but only on new processors such as Sapphire Rapids and
Cascade Lake. Coffee Lake for example works properly, though that might
also be an effect of being on a larger, more NUMA system.
The driver has been fixed but that does not help older guests. Until we
figure out whether Cascade Lake and newer processors are working as
intended, revert the commit. Long term we might add a quirk, but the
details depend on whether the processors are working as intended: for
example if they are, the quirk might reference bochs-compatible devices,
e.g. in the name and documentation, so that userspace can disable the
quirk by default and only leave it enabled if such a device is being
exposed to the guest.
If instead this is actually a bug in CLX+, then the actions we need to
take are different and depend on the actual cause of the bug.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The size of the mux stride was off by one, which could result in
invalid pin configuration on the device side or invalid state
readings on the software side.
While on it also update the code and:
- Increase the mux stride size to 16
- Align the virtual muxed regmap range to 16
- Start the regmap window at the selector
- Mark reserved registers as not-readable
Fixes: 8670de9fae49 ("pinctrl: cy8c95x0: Use regmap ranges")
Signed-off-by: Patrick Rudolph <patrick.rudolph@9elements.com>
Reported-by: Andy Shevchenko <andy@kernel.org>
Reviewed-by: Andy Shevchenko <andy@kernel.org>
Link: https://lore.kernel.org/20240902072859.583490-1-patrick.rudolph@9elements.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Fix the calculation of packet signatures by adding the offset into a page
in the read or write data payload when hashing the pages from it.
Fixes: 39bc58203f04 ("cifs: Add a function to Hash the contents of an iterator")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
As Jann points out, PFN mappings are special, because unlike normal
memory mappings, there is no lifetime information associated with the
mapping - it is just a raw mapping of PFNs with no reference counting of
a 'struct page'.
That's all very much intentional, but it does mean that it's easy to
mess up the cleanup in case of errors. Yes, a failed mmap() will always
eventually clean up any partial mappings, but without any explicit
lifetime in the page table mapping itself, it's very easy to do the
error handling in the wrong order.
In particular, it's easy to mistakenly free the physical backing store
before the page tables are actually cleaned up and (temporarily) have
stale dangling PTE entries.
To make this situation less error-prone, just make sure that any partial
pfn mapping is torn down early, before any other error handling.
Reported-and-tested-by: Jann Horn <jannh@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Simona Vetter <simona.vetter@ffwll.ch>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
bo_meminfo() wants to inspect bo state like tt and the ttm resource,
however this state can change at any point leading to stuff like NPD and
UAF, if the bo lock is not held. Grab the bo lock when calling
bo_meminfo(), ensuring we drop any spinlocks first. In the case of
object_idr we now also need to hold a ref.
v2 (MattB)
- Also add xe_bo_assert_held()
Fixes: 0845233388f8 ("drm/xe: Implement fdinfo memory stats printing")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240911155527.178910-6-matthew.auld@intel.com
(cherry picked from commit 4f63d712fa104c3ebefcb289d1e733e86d8698c7)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
There is a real deadlock as well as sleeping in atomic() bug in here, if
the bo put happens to be the last ref, since bo destruction wants to
grab the same spinlock and sleeping locks. Fix that by dropping the ref
using xe_bo_put_deferred(), and moving the final commit outside of the
lock. Dropping the lock around the put is tricky since the bo can go
out of scope and delete itself from the list, making it difficult to
navigate to the next list entry.
Fixes: 0845233388f8 ("drm/xe: Implement fdinfo memory stats printing")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2727
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240911155527.178910-5-matthew.auld@intel.com
(cherry picked from commit 0083b8e6f11d7662283a267d4ce7c966812ffd8a)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Enable Xe2+ PES disaggregation (for OAG) to retrieve disaggregated metrics
when disaggregated data is needed. Userspace can select whether to receive
aggregated or disaggregated metrics via the particular OA configuration it
uses (programmed via DRM_XE_OBSERVATION_OP_ADD_CONFIG).
Bspec: 61101
Fixes: e936f885f1e9 ("drm/xe/oa/uapi: Expose OA stream fd")
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240909165933.2638765-1-ashutosh.dixit@intel.com
Cc: stable@vger.kernel.org
(cherry picked from commit fb2551a0e93897aec7fb3d4f473ebc06b146d160)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
It's supposed to be an open range at the end like in i915. Fingers
crossed that nobody relies on this definition.
Fixes: 44e694958b95 ("drm/xe/display: Implement display support")
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/fe8743770694e429f6902491cdb306c97bdf701a.1724180287.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
(cherry picked from commit 453afb1a439994deeacb8d9ecbb48c1f2348ea0a)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Check size of the data not size of the pointer.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202407300421.IBkAja96-lkp@intel.com/
Fixes: ddeb7989a98f ("drm/xe: Validate user fence during creation")
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Apoorva Singh <apoorva.singh@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240806110722.28661-1-nirmoy.das@intel.com
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
(cherry picked from commit e102b5ed6e283a144793cab8fcd95f61d0ddbadb)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Store xe_device ahead of processing message as message can be free'd in
some cases.
v2:
- Including missing local changes
v3:
- Resend for CI
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202407231445.rpisd1vA-lkp@intel.com/
Fixes: 55ea73aacfb9 ("drm/xe: Build PM into GuC CT layer")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240724164341.1848954-1-matthew.brost@intel.com
(cherry picked from commit 1a394b4f504f33eac8c38b6f42ba025105c7e869)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
'fence' argument in send_tlb_invalidation cannot be NULL, remove
non-NULL check from send_tlb_invalidation.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202407231049.esig0Fkb-lkp@intel.com/
Fixes: 58bfe6674467 ("drm/xe: Drop xe_gt_tlb_invalidation_wait")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240723190714.1744653-1-matthew.brost@intel.com
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
(cherry picked from commit 6482253e6e1ad1c3a76645a3899d3cfdb5b918cb)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
The header generated/xe_wa_oob.h is included twice. Remove one.
Fixes: 27cb2b7fec2a ("drm/xe/bmg: implement Wa_16023588340")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202407052122.AzuWSPuo-lkp@intel.com/
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240708173301.1543871-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 3d122660dc70029d9cccb4e8670125f0affa959e)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Move nf flowtable bpf initialization in nf_flow_table module load
routine since nf_flow_table_bpf is part of nf_flow_table module and not
nf_flow_table_inet one. This patch allows to avoid the following kernel
warning running the reproducer below:
$modprobe nf_flow_table_inet
$rmmod nf_flow_table_inet
$modprobe nf_flow_table_inet
modprobe: ERROR: could not insert 'nf_flow_table_inet': Invalid argument
[ 184.081501] ------------[ cut here ]------------
[ 184.081527] WARNING: CPU: 0 PID: 1362 at kernel/bpf/btf.c:8206 btf_populate_kfunc_set+0x23c/0x330
[ 184.081550] CPU: 0 UID: 0 PID: 1362 Comm: modprobe Kdump: loaded Not tainted 6.11.0-0.rc5.22.el10.x86_64 #1
[ 184.081553] Hardware name: Red Hat OpenStack Compute, BIOS 1.14.0-1.module+el8.4.0+8855+a9e237a9 04/01/2014
[ 184.081554] RIP: 0010:btf_populate_kfunc_set+0x23c/0x330
[ 184.081558] RSP: 0018:ff22cfb38071fc90 EFLAGS: 00010202
[ 184.081559] RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000000
[ 184.081560] RDX: 000000000000006e RSI: ffffffff95c00000 RDI: ff13805543436350
[ 184.081561] RBP: ffffffffc0e22180 R08: ff13805543410808 R09: 000000000001ec00
[ 184.081562] R10: ff13805541c8113c R11: 0000000000000010 R12: ff13805541b83c00
[ 184.081563] R13: ff13805543410800 R14: 0000000000000001 R15: ffffffffc0e2259a
[ 184.081564] FS: 00007fa436c46740(0000) GS:ff1380557ba00000(0000) knlGS:0000000000000000
[ 184.081569] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 184.081570] CR2: 000055e7b3187000 CR3: 0000000100c48003 CR4: 0000000000771ef0
[ 184.081571] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 184.081572] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 184.081572] PKRU: 55555554
[ 184.081574] Call Trace:
[ 184.081575] <TASK>
[ 184.081578] ? show_trace_log_lvl+0x1b0/0x2f0
[ 184.081580] ? show_trace_log_lvl+0x1b0/0x2f0
[ 184.081582] ? __register_btf_kfunc_id_set+0x199/0x200
[ 184.081585] ? btf_populate_kfunc_set+0x23c/0x330
[ 184.081586] ? __warn.cold+0x93/0xed
[ 184.081590] ? btf_populate_kfunc_set+0x23c/0x330
[ 184.081592] ? report_bug+0xff/0x140
[ 184.081594] ? handle_bug+0x3a/0x70
[ 184.081596] ? exc_invalid_op+0x17/0x70
[ 184.081597] ? asm_exc_invalid_op+0x1a/0x20
[ 184.081601] ? btf_populate_kfunc_set+0x23c/0x330
[ 184.081602] __register_btf_kfunc_id_set+0x199/0x200
[ 184.081605] ? __pfx_nf_flow_inet_module_init+0x10/0x10 [nf_flow_table_inet]
[ 184.081607] do_one_initcall+0x58/0x300
[ 184.081611] do_init_module+0x60/0x230
[ 184.081614] __do_sys_init_module+0x17a/0x1b0
[ 184.081617] do_syscall_64+0x7d/0x160
[ 184.081620] ? __count_memcg_events+0x58/0xf0
[ 184.081623] ? handle_mm_fault+0x234/0x350
[ 184.081626] ? do_user_addr_fault+0x347/0x640
[ 184.081630] ? clear_bhb_loop+0x25/0x80
[ 184.081633] ? clear_bhb_loop+0x25/0x80
[ 184.081634] ? clear_bhb_loop+0x25/0x80
[ 184.081637] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 184.081639] RIP: 0033:0x7fa43652e4ce
[ 184.081647] RSP: 002b:00007ffe8213be18 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[ 184.081649] RAX: ffffffffffffffda RBX: 000055e7b3176c20 RCX: 00007fa43652e4ce
[ 184.081650] RDX: 000055e7737fde79 RSI: 0000000000003990 RDI: 000055e7b3185380
[ 184.081651] RBP: 000055e7737fde79 R08: 0000000000000007 R09: 000055e7b3179bd0
[ 184.081651] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000040000
[ 184.081652] R13: 000055e7b3176fa0 R14: 0000000000000000 R15: 000055e7b3179b80
Fixes: 391bb6594fd3 ("netfilter: Add bpf_xdp_flow_lookup kfunc")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Link: https://patch.msgid.link/20240911-nf-flowtable-bpf-modprob-fix-v1-1-f9fc075aafc3@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
25216afc9db5 ("PCI: Add managed pcim_intx()") moved the allocation step for
pci_intx()'s device resource from pcim_enable_device() to pcim_intx(). As
before, pcim_enable_device() sets pci_dev.is_managed to true; and it is
never set to false again.
Due to the lifecycle of a struct pci_dev, it can happen that a second
driver obtains the same pci_dev after a first driver ran. If one driver
uses pcim_enable_device() and the other doesn't, this causes the other
driver to run into managed pcim_intx(), which will try to allocate when
called for the first time.
Allocations might sleep, so calling pci_intx() while holding spinlocks
becomes then invalid, which causes lockdep warnings and could cause
deadlocks:
========================================================
WARNING: possible irq lock inversion dependency detected
6.11.0-rc6+ #59 Tainted: G W
--------------------------------------------------------
CPU 0/KVM/1537 just changed the state of lock:
ffffa0f0cff965f0 (&vdev->irqlock){-...}-{2:2}, at:
vfio_intx_handler+0x21/0xd0 [vfio_pci_core] but this lock took another,
HARDIRQ-unsafe lock in the past: (fs_reclaim){+.+.}-{0:0}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(fs_reclaim);
local_irq_disable();
lock(&vdev->irqlock);
lock(fs_reclaim);
<Interrupt>
lock(&vdev->irqlock);
*** DEADLOCK ***
Have pcim_enable_device()'s release function, pcim_disable_device(), set
pci_dev.is_managed to false so that subsequent drivers using the same
struct pci_dev do not implicitly run into managed code.
Link: https://lore.kernel.org/r/20240905072556.11375-2-pstanner@redhat.com
Fixes: 25216afc9db5 ("PCI: Add managed pcim_intx()")
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Closes: https://lore.kernel.org/all/20240903094431.63551744.alex.williamson@redhat.com/
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
|
|
Marc Hartmayer reported:
[ 23.133876] Unable to handle kernel pointer dereference in virtual kernel address space
[ 23.133950] Failing address: 0000000000000000 TEID: 0000000000000483
[ 23.133954] Fault in home space mode while using kernel ASCE.
[ 23.133957] AS:000000001b8f0007 R3:0000000056cf4007 S:0000000056cf3800 P:000000000000003d
[ 23.134207] Oops: 0004 ilc:2 [#1] SMP
(snip)
[ 23.134516] Call Trace:
[ 23.134520] [<0000024e326caf28>] worker_thread+0x48/0x430
[ 23.134525] ([<0000024e326caf18>] worker_thread+0x38/0x430)
[ 23.134528] [<0000024e326d3a3e>] kthread+0x11e/0x130
[ 23.134533] [<0000024e3264b0dc>] __ret_from_fork+0x3c/0x60
[ 23.134536] [<0000024e333fb37a>] ret_from_fork+0xa/0x38
[ 23.134552] Last Breaking-Event-Address:
[ 23.134553] [<0000024e333f4c04>] mutex_unlock+0x24/0x30
[ 23.134562] Kernel panic - not syncing: Fatal exception: panic_on_oops
With debuging and analysis, worker_thread() accesses to the nullified
worker->pool when the newly created worker is destroyed before being
waken-up, in which case worker_thread() can see the result detach_worker()
reseting worker->pool to NULL at the begining.
Move the code "worker->pool = NULL;" out from detach_worker() to fix the
problem.
worker->pool had been designed to be constant for regular workers and
changeable for rescuer. To share attaching/detaching code for regular
and rescuer workers and to avoid worker->pool being accessed inadvertently
when the worker has been detached, worker->pool is reset to NULL when
detached no matter the worker is rescuer or not.
To maintain worker->pool being reset after detached, move the code
"worker->pool = NULL;" in the worker thread context after detached.
It is either be in the regular worker thread context after PF_WQ_WORKER
is cleared or in rescuer worker thread context with wq_pool_attach_mutex
held. So it is safe to do so.
Cc: Marc Hartmayer <mhartmay@linux.ibm.com>
Link: https://lore.kernel.org/lkml/87wmjj971b.fsf@linux.ibm.com/
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Fixes: f4b7b53c94af ("workqueue: Detach workers directly in idle_cull_fn()")
Cc: stable@vger.kernel.org # v6.11+
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
The referenced commit drops bad input, but has false positives.
Tighten the check to avoid these.
The check detects illegal checksum offload requests, which produce
csum_start/csum_off beyond end of packet after segmentation.
But it is based on two incorrect assumptions:
1. virtio_net_hdr_to_skb with VIRTIO_NET_HDR_GSO_TCP[46] implies GSO.
True in callers that inject into the tx path, such as tap.
But false in callers that inject into rx, like virtio-net.
Here, the flags indicate GRO, and CHECKSUM_UNNECESSARY or
CHECKSUM_NONE without VIRTIO_NET_HDR_F_NEEDS_CSUM is normal.
2. TSO requires checksum offload, i.e., ip_summed == CHECKSUM_PARTIAL.
False, as tcp[46]_gso_segment will fix up csum_start and offset for
all other ip_summed by calling __tcp_v4_send_check.
Because of 2, we can limit the scope of the fix to virtio_net_hdr
that do try to set these fields, with a bogus value.
Link: https://lore.kernel.org/netdev/20240909094527.GA3048202@port70.net/
Fixes: 89add40066f9 ("net: drop bad gso csum_start and offset in virtio_net_hdr")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20240910213553.839926-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The MPTCP port attribute is in host endianness, but was documented
as big-endian in the ynl specification.
Below are two examples from net/mptcp/pm_netlink.c showing that the
attribute is converted to/from host endianness for use with netlink.
Import from netlink:
addr->port = htons(nla_get_u16(tb[MPTCP_PM_ADDR_ATTR_PORT]))
Export to netlink:
nla_put_u16(skb, MPTCP_PM_ADDR_ATTR_PORT, ntohs(addr->port))
Where addr->port is defined as __be16.
No functional change intended.
Fixes: bc8aeb2045e2 ("Documentation: netlink: add a YAML spec for mptcp")
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240911091003.1112179-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When sending packets under 60 bytes, up to three bytes of the buffer
following the data may be leaked. Avoid this by extending all packets to
ETH_ZLEN, ensuring nothing is leaked in the padding. This bug can be
reproduced by running
$ ping -s 11 destination
Fixes: 9ad1a3749333 ("dpaa_eth: add support for DPAA Ethernet")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240910143144.1439910-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There are two paths to access mptcp_pm_del_add_timer, result in a race
condition:
CPU1 CPU2
==== ====
net_rx_action
napi_poll netlink_sendmsg
__napi_poll netlink_unicast
process_backlog netlink_unicast_kernel
__netif_receive_skb genl_rcv
__netif_receive_skb_one_core netlink_rcv_skb
NF_HOOK genl_rcv_msg
ip_local_deliver_finish genl_family_rcv_msg
ip_protocol_deliver_rcu genl_family_rcv_msg_doit
tcp_v4_rcv mptcp_pm_nl_flush_addrs_doit
tcp_v4_do_rcv mptcp_nl_remove_addrs_list
tcp_rcv_established mptcp_pm_remove_addrs_and_subflows
tcp_data_queue remove_anno_list_by_saddr
mptcp_incoming_options mptcp_pm_del_add_timer
mptcp_pm_del_add_timer kfree(entry)
In remove_anno_list_by_saddr(running on CPU2), after leaving the critical
zone protected by "pm.lock", the entry will be released, which leads to the
occurrence of uaf in the mptcp_pm_del_add_timer(running on CPU1).
Keeping a reference to add_timer inside the lock, and calling
sk_stop_timer_sync() with this reference, instead of "entry->add_timer".
Move list_del(&entry->list) to mptcp_pm_del_add_timer and inside the pm lock,
do not directly access any members of the entry outside the pm lock, which
can avoid similar "entry->x" uaf.
Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout")
Cc: stable@vger.kernel.org
Reported-and-tested-by: syzbot+f3a31fb909db9b2a5c4d@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f3a31fb909db9b2a5c4d
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/tencent_7142963A37944B4A74EF76CD66EA3C253609@qq.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The number of transmit and receive descriptors must be a multiple of 128
due to the hardware limitation. If it is set to a multiple of 8 instead of
a multiple 128, the queues will easily be hung.
Cc: stable@vger.kernel.org
Fixes: 883b5984a5d2 ("net: wangxun: add ethtool_ops for ring parameters")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240910095629.570674-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The TAS module could not be configured when it's running in pending
status. We need disable the module and configure it again. However, the
pending status is not cleared after the module disabled. TC taprio set
will always return busy even it's disabled.
For example, a user uses tc-taprio to configure Qbv and a future
basetime. The TAS module will run in a pending status. There is no way
to reconfigure Qbv, it always returns busy.
Actually the TAS module can be reconfigured when it's disabled. So it
doesn't need to check the pending status if the TAS module is disabled.
After the patch, user can delete the tc taprio configuration to disable
Qbv and reconfigure it again.
Fixes: de143c0e274b ("net: dsa: felix: Configure Time-Aware Scheduler via taprio offload")
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Link: https://patch.msgid.link/20240906093550.29985-1-xiaoliang.yang_1@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In the function hsr_proxy_annouance() added in the previous commit
5f703ce5c981 ("net: hsr: Send supervisory frames to HSR network
with ProxyNodeTable data"), the return value of the hsr_port_get_hsr()
function is not checked to be a NULL pointer, which causes a NULL
pointer dereference.
To solve this, we need to add code to check whether the return value
of hsr_port_get_hsr() is NULL.
Reported-by: syzbot+02a42d9b1bd395cbcab4@syzkaller.appspotmail.com
Fixes: 5f703ce5c981 ("net: hsr: Send supervisory frames to HSR network with ProxyNodeTable data")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Lukasz Majewski <lukma@denx.de>
Link: https://patch.msgid.link/20240907190341.162289-1-aha310510@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Similar to the previous commit, the net_helper.sh file from the parent
directory is used by the MPTCP selftests and it needs to be present when
running the tests.
This file then needs to be listed in the Makefile to be included when
exporting or installing the tests, e.g. with:
make -C tools/testing/selftests \
TARGETS=net/mptcp \
install INSTALL_PATH=$KSFT_INSTALL_PATH
cd $KSFT_INSTALL_PATH
./run_kselftest.sh -c net/mptcp
Fixes: 1af3bc912eac ("selftests: mptcp: lib: use wait_local_port_listen helper")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-3-8f124aa9156d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The lib.sh file from the parent directory is used by the MPTCP selftests
and it needs to be present when running the tests.
This file then needs to be listed in the Makefile to be included when
exporting or installing the tests, e.g. with:
make -C tools/testing/selftests \
TARGETS=net/mptcp \
install INSTALL_PATH=$KSFT_INSTALL_PATH
cd $KSFT_INSTALL_PATH
./run_kselftest.sh -c net/mptcp
Fixes: f265d3119a29 ("selftests: mptcp: lib: use setup/cleanup_ns helpers")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-2-8f124aa9156d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A new endpoint using the IP of the initial subflow has been recently
added to increase the code coverage. But it breaks the test when using
old kernels not having commit 86e39e04482b ("mptcp: keep track of local
endpoint still available for each msk"), e.g. on v5.15.
Similar to commit d4c81bbb8600 ("selftests: mptcp: join: support local
endpoint being tracked or not"), it is possible to add the new endpoint
conditionally, by checking if "mptcp_pm_subflow_check_next" is present
in kallsyms: this is not directly linked to the commit introducing this
symbol but for the parent one which is linked anyway. So we can know in
advance what will be the expected behaviour, and add the new endpoint
only when it makes sense to do so.
Fixes: 4878f9f8421f ("selftests: mptcp: join: validate fullmesh endp on 1st sf")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-1-8f124aa9156d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When running in container environmment, /sys/fs/cgroup/ might not be
the real root node of the sk-attached cgroup.
Example:
In container:
% stat /sys//fs/cgroup/
Device: 0,21 Inode: 2214 ..
% stat /sys/fs/cgroup/foo
Device: 0,21 Inode: 2264 ..
The expectation would be for:
nft add rule .. socket cgroupv2 level 1 "foo" counter
to match traffic from a process that got added to "foo" via
"echo $pid > /sys/fs/cgroup/foo/cgroup.procs".
However, 'level 3' is needed to make this work.
Seen from initial namespace, the complete hierarchy is:
% stat /sys/fs/cgroup/system.slice/docker-.../foo
Device: 0,21 Inode: 2264 ..
i.e. hierarchy is
0 1 2 3
/ -> system.slice -> docker-1... -> foo
... but the container doesn't know that its "/" is the "docker-1.."
cgroup. Current code will retrieve the 'system.slice' cgroup node
and store its kn->id in the destination register, so compare with
2264 ("foo" cgroup id) will not match.
Fetch "/" cgroup from ->init() and add its level to the level we try to
extract. cgroup root-level is 0 for the init-namespace or the level
of the ancestor that is exposed as the cgroup root inside the container.
In the above case, cgrp->level of "/" resolved in the container is 2
(docker-1...scope/) and request for 'level 1' will get adjusted
to fetch the actual level (3).
v2: use CONFIG_SOCK_CGROUP_DATA, eval function depends on it.
(kernel test robot)
Cc: cgroups@vger.kernel.org
Fixes: e0bb96db96f8 ("netfilter: nft_socket: add support for cgroupsv2")
Reported-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|