Age | Commit message (Collapse) | Author | Files | Lines |
|
e500 KVM tries to bypass __kvm_faultin_pfn() in order to map VM_PFNMAP
VMAs as huge pages. This is a Bad Idea because VM_PFNMAP VMAs could
become noncontiguous as a result of callsto remap_pfn_range().
Instead, use the already existing host PTE lookup to retrieve a
valid host-side mapping level after __kvm_faultin_pfn() has
returned. Then find the largest size that will satisfy the
guest's request while staying within a single host PTE.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The new __kvm_faultin_pfn() function is upset by the fact that e500 KVM
ignores host page permissions - __kvm_faultin requires a "writable"
outgoing argument, but e500 KVM is nonchalantly passing NULL.
If the host page permissions do not include writability, the shadow
TLB entry is forcibly mapped read-only.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add the possibility of marking a page so that the UW and SW bits are
force-cleared. This is stored in the private info so that it persists
across multiple calls to kvmppc_e500_setup_stlbe.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
kvmppc_e500_ref_setup is returning whether the guest TLB entry is writable,
which is than passed to kvm_release_faultin_page. This makes little sense
for two reasons: first, because the function sets up the private data for
the page and the return value feels like it has been bolted on the side;
second, because what really matters is whether the _shadow_ TLB entry is
writable. If it is not writable, the page can be released as non-dirty.
Shift from using tlbe_is_writable(gtlbe) to doing the same check on
the shadow TLB entry.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
If find_linux_pte fails, IRQs will not be restored. This is unlikely
to happen in practice since it would have been reported as hanging
hosts, but it should of course be fixed anyway.
Cc: stable@vger.kernel.org
Reported-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
exynos_bus_parse_of() still declares a parameter struct device_node that
is not used yet. This parameter is unnecessary and should be removed.
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
|
|
Maddy is taking over the day-to-day maintenance of powerpc. I will still
be around to help, and as a backup.
Re-order the main POWERPC list to put Maddy first to reflect that.
KVM/powerpc patches will be handled by Maddy via the powerpc tree with
review from Nick, so replace myself with Maddy there.
Remove myself from BPF, leaving Hari & Christophe as maintainers.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In some cases, when password2 becomes the working password, the
client swaps the two password fields in the root session struct, but
not in the smb3_fs_context struct in cifs_sb. DFS automounts inherit
fs context from their parent mounts. Therefore, they might end up
getting the passwords in the stale order.
The automount should succeed, because the mount function will end up
retrying with the actual password anyway. But to reduce these
unnecessary session setup retries for automounts, we can sync the
parent context's passwords with the root session's passwords before
duplicating it to the child's fs context.
Cc: stable@vger.kernel.org
Signed-off-by: Meetakshi Setiya <msetiya@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
With the consolidation of put_prev_task/set_next_task(), see
commit 436f3eed5c69 ("sched: Combine the last put_prev_task() and the
first set_next_task()"), we are now skipping the transition between
these two functions when the previous and the next tasks are the same.
As a result, the scx idle state of a CPU is updated only when
transitioning to or from the idle thread. While this is generally
correct, it can lead to uneven and inefficient core utilization in
certain scenarios [1].
A typical scenario involves proactive wake-ups: scx_bpf_pick_idle_cpu()
selects and marks an idle CPU as busy, followed by a wake-up via
scx_bpf_kick_cpu(), without dispatching any tasks. In this case, the CPU
continues running the idle thread, returns to idle, but remains marked
as busy, preventing it from being selected again as an idle CPU (until a
task eventually runs on it and releases the CPU).
For example, running a workload that uses 20% of each CPU, combined with
an scx scheduler using proactive wake-ups, results in the following core
utilization:
CPU 0: 25.7%
CPU 1: 29.3%
CPU 2: 26.5%
CPU 3: 25.5%
CPU 4: 0.0%
CPU 5: 25.5%
CPU 6: 0.0%
CPU 7: 10.5%
To address this, refresh the idle state also in pick_task_idle(), during
idle-to-idle transitions, but only trigger ops.update_idle() on actual
state changes to prevent unnecessary updates to the scx scheduler and
maintain balanced state transitions.
With this change in place, the core utilization in the previous example
becomes the following:
CPU 0: 18.8%
CPU 1: 19.4%
CPU 2: 18.0%
CPU 3: 18.7%
CPU 4: 19.3%
CPU 5: 18.9%
CPU 6: 18.7%
CPU 7: 19.3%
[1] https://github.com/sched-ext/scx/pull/1139
Fixes: 7c65ae81ea86 ("sched_ext: Don't call put_prev_task_scx() before picking the next task")
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
With IORING_SETUP_SQPOLL all requests are created by the SQPOLL task,
which means that req->task should always match sqd->thread. Since
accesses to sqd->thread should be separately protected, use req->task
in io_req_normal_work_add() instead.
Note, in the eyes of io_req_normal_work_add(), the SQPOLL task struct
is always pinned and alive, and sqd->thread can either be the task or
NULL. It's only problematic if the compiler decides to reload the value
after the null check, which is not so likely.
Cc: stable@vger.kernel.org
Cc: Bui Quang Minh <minhquangbui99@gmail.com>
Reported-by: lizetao <lizetao1@huawei.com>
Fixes: 78f9b61bd8e54 ("io_uring: wake SQPOLL task when task_work is added to an empty queue")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1cbbe72cf32c45a8fee96026463024cd8564a7d7.1736541357.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Syzkeller reports:
BUG: KASAN: slab-use-after-free in thread_group_cputime+0x409/0x700 kernel/sched/cputime.c:341
Read of size 8 at addr ffff88803578c510 by task syz.2.3223/27552
Call Trace:
<TASK>
...
kasan_report+0x143/0x180 mm/kasan/report.c:602
thread_group_cputime+0x409/0x700 kernel/sched/cputime.c:341
thread_group_cputime_adjusted+0xa6/0x340 kernel/sched/cputime.c:639
getrusage+0x1000/0x1340 kernel/sys.c:1863
io_uring_show_fdinfo+0xdfe/0x1770 io_uring/fdinfo.c:197
seq_show+0x608/0x770 fs/proc/fd.c:68
...
That's due to sqd->task not being cleared properly in cases where
SQPOLL task tctx setup fails, which can essentially only happen with
fault injection to insert allocation errors.
Cc: stable@vger.kernel.org
Fixes: 1251d2025c3e1 ("io_uring/sqpoll: early exit thread if task_context wasn't allocated")
Reported-by: syzbot+3d92cfcfa84070b0a470@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/efc7ec7010784463b2e7466d7b5c02c2cb381635.1736519461.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
delayed_work submitted to an offlined cpu, will not get executed,
after the specified delay if the cpu remains offline. If the cpu
never comes online the work will never get executed.
checking for online cpu in __queue_delayed_work, does not sound
like a good idea because to do this reliably we need hotplug lock
and since work may be submitted from atomic contexts, we would
have to use cpus_read_trylock. But if trylock fails we would queue
the work on any cpu and this may not be optimal because our intended
cpu might still be online.
Putting a WARN_ON_ONCE for an already offlined cpu, will indicate users
of queue_delayed_work_on, if they are (wrongly) trying to queue
delayed_work on offlined cpu. Also indicate the problem of using
offlined cpu with queue_delayed_work_on, in its description.
Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
It no longer has users.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250107162743.GA18947@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Now that poll_wait() provides a full barrier we can remove smp_mb() from
sock_poll_wait().
Also, the poll_does_not_wait() check before poll_wait() just adds the
unnecessary confusion, kill it. poll_wait() does the same "p && p->_qproc"
check.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250107162736.GA18944@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Now that poll_wait() provides a full barrier we can remove smp_rmb() from
io_uring_poll().
In fact I don't think smp_rmb() was correct, it can't serialize LOADs and
STOREs.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250107162730.GA18940@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This check is historical and no longer needed, wait_address is never NULL.
These days we rely on the poll_table->_qproc check. NULL if select/poll
is not going to sleep, or it already has a data to report, or all waiters
have already been registered after the 1st iteration.
However, poll_table *p can be NULL, see p9_fd_poll() for example, so we
can't remove the "p != NULL" check.
Link: https://lore.kernel.org/all/20250106180325.GF7233@redhat.com/
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250107162724.GA18926@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
As the comment above waitqueue_active() explains, it can only be used
if both waker and waiter have mb()'s that pair with each other. However
__pollwait() is broken in this respect.
This is not pipe-specific, but let's look at pipe_poll() for example:
poll_wait(...); // -> __pollwait() -> add_wait_queue()
LOAD(pipe->head);
LOAD(pipe->head);
In theory these LOAD()'s can leak into the critical section inside
add_wait_queue() and can happen before list_add(entry, wq_head), in this
case pipe_poll() can race with wakeup_pipe_readers/writers which do
smp_mb();
if (waitqueue_active(wq_head))
wake_up_interruptible(wq_head);
There are more __pollwait()-like functions (grep init_poll_funcptr), and
it seems that at least ep_ptable_queue_proc() has the same problem, so the
patch adds smp_mb() into poll_wait().
Link: https://lore.kernel.org/all/20250102163320.GA17691@redhat.com/
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250107162717.GA18922@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
We have to lock the buffer before we can delete the dquot log item from
the buffer's log item list.
Cc: stable@vger.kernel.org # v6.13-rc3
Fixes: acc8f8628c3737 ("xfs: attach dquot buffer to dquot log item buffer")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
In the previous commit referenced below, I had to split
the short fops handling into different proxy fops. This
necessitated knowing out-of-band whether or not the ops
are short or full, when attempting to convert from fops
to allocated fsdata.
Unfortunately, I only converted full_proxy_open() which
is used for the new full_proxy_open_regular() and
full_proxy_open_short(), but forgot about the call in
open_proxy_open(), used for debugfs_create_file_unsafe().
Fix that, it never has short fops.
Fixes: f8f25893a477 ("fs: debugfs: differentiate short fops with proxy ops")
Reported-by: Suresh Kumar Kurmi <suresh.kumar.kurmi@intel.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202501101055.bb8bf3e7-lkp@intel.com
Reported-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20250110085826.cd74f3b7a36b.I430c79c82ec3f954c2ff9665753bf6ac9e63eef8@changeid
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Max Makarov reported kernel panic [1] in perf user callchain code.
The reason for that is the race between uprobe_free_utask and bpf
profiler code doing the perf user stack unwind and is triggered
within uprobe_free_utask function:
- after current->utask is freed and
- before current->utask is set to NULL
general protection fault, probably for non-canonical address 0x9e759c37ee555c76: 0000 [#1] SMP PTI
RIP: 0010:is_uprobe_at_func_entry+0x28/0x80
...
? die_addr+0x36/0x90
? exc_general_protection+0x217/0x420
? asm_exc_general_protection+0x26/0x30
? is_uprobe_at_func_entry+0x28/0x80
perf_callchain_user+0x20a/0x360
get_perf_callchain+0x147/0x1d0
bpf_get_stackid+0x60/0x90
bpf_prog_9aac297fb833e2f5_do_perf_event+0x434/0x53b
? __smp_call_single_queue+0xad/0x120
bpf_overflow_handler+0x75/0x110
...
asm_sysvec_apic_timer_interrupt+0x1a/0x20
RIP: 0010:__kmem_cache_free+0x1cb/0x350
...
? uprobe_free_utask+0x62/0x80
? acct_collect+0x4c/0x220
uprobe_free_utask+0x62/0x80
mm_release+0x12/0xb0
do_exit+0x26b/0xaa0
__x64_sys_exit+0x1b/0x20
do_syscall_64+0x5a/0x80
It can be easily reproduced by running following commands in
separate terminals:
# while :; do bpftrace -e 'uprobe:/bin/ls:_start { printf("hit\n"); }' -c ls; done
# bpftrace -e 'profile:hz:100000 { @[ustack()] = count(); }'
Fixing this by making sure current->utask pointer is set to NULL
before we start to release the utask object.
[1] https://github.com/grafana/pyroscope/issues/3673
Fixes: cfa7f3d2c526 ("perf,x86: avoid missing caller address in stack traces captured in uprobe")
Reported-by: Max Makarov <maxpain@linux.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20250109141440.2692173-1-jolsa@kernel.org
|
|
In __trace_kprobe_create(), if something fails it must goto error block
to free objects. But when strdup() a symbol, it returns without that.
Fix it to goto the error block to free objects correctly.
Link: https://lore.kernel.org/all/173643297743.1514810.2408159540454241947.stgit@devnote2/
Fixes: 6212dd29683e ("tracing/kprobes: Use dyn_event framework for kprobe events")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Remove misplaced colon in stm32_firewall_get_firewall()
which results in a syntax error when the code is compiled
without CONFIG_STM32_FIREWALL.
Fixes: 5c9668cfc6d7 ("firewall: introduce stm32_firewall framework")
Signed-off-by: guanjing <guanjing@cmss.chinamobile.com>
Reviewed-by: Gatien Chevallier <gatien.chevallier@foss.st.com>
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
The SBI specification allows only lower 48bits of hpmeventX to be
configured via SBI PMU. Currently, the driver masks of the higher
bits but doesn't return an error. This will lead to an additional
SBI call for config matching which should return for an invalid
event error in most of the cases.
However, if a platform(i.e Rocket and sifive cores) implements a
bitmap of all bits in the event encoding this will lead to an
incorrect event being programmed leading to user confusion.
Report the error to the user if higher bits are set during the
event mapping itself to avoid the confusion and save an additional
SBI call.
Suggested-by: Samuel Holland <samuel.holland@sifive.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20241212-pmu_event_fixes_v2-v2-3-813e8a4f5962@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
If the upper two bits has an invalid valid (0x1), the event mapping
is not reliable as it returns an uninitialized variable.
Return appropriate value for the default case.
Fixes: f0c9363db2dd ("perf/riscv-sbi: Add platform specific firmware event handling")
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20241212-pmu_event_fixes_v2-v2-2-813e8a4f5962@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Platform firmware event data field is allowed to be 62 bits for
Linux as uppper most two bits are reserved to indicate SBI fw or
platform specific firmware events.
However, the event data field is masked as per the hardware raw
event mask which is not correct.
Fix the platform firmware event data field with proper mask.
Fixes: f0c9363db2dd ("perf/riscv-sbi: Add platform specific firmware event handling")
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20241212-pmu_event_fixes_v2-v2-1-813e8a4f5962@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Add the test count to drop the warning message.
"Planned tests != run tests (0 != 1)"
Fixes: 7cf6198ce22d ("selftests: Test RISC-V Vector prctl interface")
Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Andy Chiu <AndybnAC@gmail.com>
Link: https://lore.kernel.org/r/20241220091730.28006-3-yongxuan.wang@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Add the pass message after we successfully complete the test.
Fixes: 5c93c4c72fbc ("selftests: Test RISC-V Vector's first-use handler")
Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Andy Chiu <AndybnAC@gmail.com>
Link: https://lore.kernel.org/r/20241220091730.28006-2-yongxuan.wang@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The per-netns structure can be obtained from the table->data using
container_of(), then the 'net' one can be retrieved from the listen
socket (if available).
Fixes: c6a58ffed536 ("RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-9-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using
container_of().
Note that table->data could also be used directly, as this is the only
member needed from the 'net' structure, but that would increase the size
of this fix, to use '*data' everywhere 'net->sctp.probe_interval' is
used.
Fixes: d1e462a7a5f3 ("sctp: add probe_interval in sysctl and sock/asoc/transport")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-8-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using
container_of().
Note that table->data could also be used directly, but that would
increase the size of this fix, while 'sctp.ctl_sock' still needs to be
retrieved from 'net' structure.
Fixes: 046c052b475e ("sctp: enable udp tunneling socks")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-7-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using
container_of().
Note that table->data could also be used directly, but that would
increase the size of this fix, while 'sctp.ctl_sock' still needs to be
retrieved from 'net' structure.
Fixes: b14878ccb7fa ("net: sctp: cache auth_enable per endpoint")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-6-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using
container_of().
Note that table->data could also be used directly, as this is the only
member needed from the 'net' structure, but that would increase the size
of this fix, to use '*data' everywhere 'net->sctp.rto_min/max' is used.
Fixes: 4f3fdf3bc59c ("sctp: add check rto_min and rto_max in sysctl")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-5-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using
container_of().
Note that table->data could also be used directly, as this is the only
member needed from the 'net' structure, but that would increase the size
of this fix, to use '*data' everywhere 'net->sctp.sctp_hmac_alg' is
used.
Fixes: 3c68198e7511 ("sctp: Make hmac algorithm selection for cookie generation dynamic")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-4-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As mentioned in the previous commit, using the 'net' structure via
'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'pernet' structure can be obtained from the table->data using
container_of().
Fixes: 27069e7cb3d1 ("mptcp: disable active MPTCP in case of blackhole")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-3-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Using the 'net' structure via 'current' is not recommended for different
reasons.
First, if the goal is to use it to read or write per-netns data, this is
inconsistent with how the "generic" sysctl entries are doing: directly
by only using pointers set to the table entry, e.g. table->data. Linked
to that, the per-netns data should always be obtained from the table
linked to the netns it had been created for, which may not coincide with
the reader's or writer's netns.
Another reason is that access to current->nsproxy->netns can oops if
attempted when current->nsproxy had been dropped when the current task
is exiting. This is what syzbot found, when using acct(2):
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
CPU: 1 UID: 0 PID: 5924 Comm: syz-executor Not tainted 6.13.0-rc5-syzkaller-00004-gccb98ccef0e5 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
proc_sys_call_handler+0x403/0x5d0 fs/proc/proc_sysctl.c:601
__kernel_write_iter+0x318/0xa80 fs/read_write.c:612
__kernel_write+0xf6/0x140 fs/read_write.c:632
do_acct_process+0xcb0/0x14a0 kernel/acct.c:539
acct_pin_kill+0x2d/0x100 kernel/acct.c:192
pin_kill+0x194/0x7c0 fs/fs_pin.c:44
mnt_pin_kill+0x61/0x1e0 fs/fs_pin.c:81
cleanup_mnt+0x3ac/0x450 fs/namespace.c:1366
task_work_run+0x14e/0x250 kernel/task_work.c:239
exit_task_work include/linux/task_work.h:43 [inline]
do_exit+0xad8/0x2d70 kernel/exit.c:938
do_group_exit+0xd3/0x2a0 kernel/exit.c:1087
get_signal+0x2576/0x2610 kernel/signal.c:3017
arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337
exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218
do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fee3cb87a6a
Code: Unable to access opcode bytes at 0x7fee3cb87a40.
RSP: 002b:00007fffcccac688 EFLAGS: 00000202 ORIG_RAX: 0000000000000037
RAX: 0000000000000000 RBX: 00007fffcccac710 RCX: 00007fee3cb87a6a
RDX: 0000000000000041 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 0000000000000003 R08: 00007fffcccac6ac R09: 00007fffcccacac7
R10: 00007fffcccac710 R11: 0000000000000202 R12: 00007fee3cd49500
R13: 00007fffcccac6ac R14: 0000000000000000 R15: 00007fee3cd4b000
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess), 1 bytes skipped:
0: 42 80 3c 38 00 cmpb $0x0,(%rax,%r15,1)
5: 0f 85 fe 02 00 00 jne 0x309
b: 4d 8b a4 24 08 09 00 mov 0x908(%r12),%r12
12: 00
13: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
1a: fc ff df
1d: 49 8d 7c 24 28 lea 0x28(%r12),%rdi
22: 48 89 fa mov %rdi,%rdx
25: 48 c1 ea 03 shr $0x3,%rdx
* 29: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <-- trapping instruction
2d: 0f 85 cc 02 00 00 jne 0x2ff
33: 4d 8b 7c 24 28 mov 0x28(%r12),%r15
38: 48 rex.W
39: 8d .byte 0x8d
3a: 84 24 c8 test %ah,(%rax,%rcx,8)
Here with 'net.mptcp.scheduler', the 'net' structure is not really
needed, because the table->data already has a pointer to the current
scheduler, the only thing needed from the per-netns data.
Simply use 'data', instead of getting (most of the time) the same thing,
but from a longer and indirect way.
Fixes: 6963c508fd7a ("mptcp: only allow set existing scheduler for net.mptcp.scheduler")
Cc: stable@vger.kernel.org
Reported-by: syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-2-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
'net.mptcp.available_schedulers' sysctl knob is there to list available
schedulers, not to modify this list.
There are then no reasons to give write access to it.
Nothing would have been written anyway, but no errors would have been
returned, which is unexpected.
Fixes: 73c900aa3660 ("mptcp: add net.mptcp.available_schedulers")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-1-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We have not seen emails or tags from Lars in almost 4 years.
Steen and Daniel are pretty active, but the review coverage
isn't stellar (35% of changes go in without a review tag).
Subsystem ARM/Microchip Sparx5 SoC support
Changes 28 / 79 (35%)
Last activity: 2024-11-24
Lars Povlsen <lars.povlsen@microchip.com>:
Steen Hegelund <Steen.Hegelund@microchip.com>:
Tags 6c7c4b91aa43 2024-04-08 00:00:00 15
Daniel Machon <daniel.machon@microchip.com>:
Author 48ba00da2eb4 2024-04-09 00:00:00 2
Tags f164b296638d 2024-11-24 00:00:00 6
Top reviewers:
[7]: horms@kernel.org
[1]: jacob.e.keller@intel.com
[1]: jensemil.schulzostergaard@microchip.com
[1]: horatiu.vultur@microchip.com
INACTIVE MAINTAINER Lars Povlsen <lars.povlsen@microchip.com>
Acked-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20250108155242.2575530-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Noam Dagan was added to ENA reviewers in 2021, we have not seen
a single email from this person to any list, ever (according to lore).
Git history mentions the name in 2 SoB tags from 2020.
Acked-by: Arthur Kiyanovski <akiyano@amazon.com>
Link: https://patch.msgid.link/20250108155242.2575530-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There is a steady stream of fixes for TIPC, even tho the development
has slowed down a lot. Over last 2 years we have merged almost 70
TIPC patches, but we haven't heard from Ying Xue once:
Subsystem TIPC NETWORK LAYER
Changes 42 / 69 (60%)
Last activity: 2023-10-04
Jon Maloy <jmaloy@redhat.com>:
Tags 08e50cf07184 2023-10-04 00:00:00 6
Ying Xue <ying.xue@windriver.com>:
Top reviewers:
[9]: horms@kernel.org
[8]: tung.q.nguyen@dektech.com.au
[4]: jiri@nvidia.com
[3]: tung.q.nguyen@endava.com
[2]: kuniyu@amazon.com
INACTIVE MAINTAINER Ying Xue <ying.xue@windriver.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250108155242.2575530-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The mailing lists have seen no email from Mark Lee in the last 4 years.
gitdm missingmaints says:
Subsystem MEDIATEK ETHERNET DRIVER
Changes 103 / 400 (25%)
Last activity: 2024-12-19
Felix Fietkau <nbd@nbd.name>:
Author 88806efc034a 2024-10-17 00:00:00 44
Tags 88806efc034a 2024-10-17 00:00:00 51
Sean Wang <sean.wang@mediatek.com>:
Tags a5d75538295b 2020-04-07 00:00:00 1
Mark Lee <Mark-MC.Lee@mediatek.com>:
Lorenzo Bianconi <lorenzo@kernel.org>:
Author 0c7469ee718e 2024-12-19 00:00:00 123
Tags 0c7469ee718e 2024-12-19 00:00:00 139
Top reviewers:
[32]: horms@kernel.org
[15]: leonro@nvidia.com
[9]: andrew@lunn.ch
INACTIVE MAINTAINER Mark Lee <Mark-MC.Lee@mediatek.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250108155242.2575530-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
I tried a couple of things to reinvigorate the stmmac maintainers
over the last few years but with little effect. The maintainers
are not active, let the MAINTAINERS file reflect reality.
The Synopsys IP this driver supports is very popular we need
a solid maintainer to deal with the complexity of the driver.
gitdm missingmaints says:
Subsystem STMMAC ETHERNET DRIVER
Changes 344 / 978 (35%)
Last activity: 2020-05-01
Alexandre Torgue <alexandre.torgue@foss.st.com>:
Tags 1bb694e20839 2020-05-01 00:00:00 1
Jose Abreu <joabreu@synopsys.com>:
Top reviewers:
[75]: horms@kernel.org
[49]: andrew@lunn.ch
[46]: fancer.lancer@gmail.com
INACTIVE MAINTAINER Jose Abreu <joabreu@synopsys.com>
Acked-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
Link: https://patch.msgid.link/20250108155242.2575530-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Andy does not participate much in bonding reviews, unfortunately.
Move him to CREDITS.
gitdm missingmaint says:
Subsystem BONDING DRIVER
Changes 149 / 336 (44%)
Last activity: 2024-09-05
Jay Vosburgh <jv@jvosburgh.net>:
Tags 68db604e16d5 2024-09-05 00:00:00 8
Andy Gospodarek <andy@greyhouse.net>:
Top reviewers:
[65]: jay.vosburgh@canonical.com
[23]: liuhangbin@gmail.com
[16]: razor@blackwall.org
INACTIVE MAINTAINER Andy Gospodarek <andy@greyhouse.net>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250108155242.2575530-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Woojung Huh seems to have only replied to the list 35 times
in the last 5 years, and didn't provide any reviews in 3 years.
The LAN78XX driver has seen quite a bit of activity lately.
gitdm missingmaints says:
Subsystem USB LAN78XX ETHERNET DRIVER
Changes 35 / 91 (38%)
(No activity)
Top reviewers:
[23]: andrew@lunn.ch
[3]: horms@kernel.org
[2]: mateusz.polchlopek@intel.com
INACTIVE MAINTAINER Woojung Huh <woojung.huh@microchip.com>
Move Woojung to CREDITS and add new maintainers who are more
likely to review LAN78xx patches.
Acked-by: Woojung Huh <woojung.huh@microchip.com>
Acked-by: Rengarajan Sundararajan <rengarajan.s@microchip.com>
Link: https://patch.msgid.link/20250108155242.2575530-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There's not much review support from Jose, there is a sharp
drop in his participation around 4 years ago.
The DW XPCS IP is very popular and the driver requires active
maintenance.
gitdm missingmaints says:
Subsystem SYNOPSYS DESIGNWARE ETHERNET XPCS DRIVER
Changes 33 / 94 (35%)
(No activity)
Top reviewers:
[16]: andrew@lunn.ch
[12]: vladimir.oltean@nxp.com
[2]: f.fainelli@gmail.com
INACTIVE MAINTAINER Jose Abreu <Jose.Abreu@synopsys.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250108155242.2575530-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When cmd_alloc_index(), fails cmd_work_handler() needs
to complete ent->slotted before returning early.
Otherwise the task which issued the command may hang:
mlx5_core 0000:01:00.0: cmd_work_handler:877:(pid 3880418): failed to allocate command entry
INFO: task kworker/13:2:4055883 blocked for more than 120 seconds.
Not tainted 4.19.90-25.44.v2101.ky10.aarch64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/13:2 D 0 4055883 2 0x00000228
Workqueue: events mlx5e_tx_dim_work [mlx5_core]
Call trace:
__switch_to+0xe8/0x150
__schedule+0x2a8/0x9b8
schedule+0x2c/0x88
schedule_timeout+0x204/0x478
wait_for_common+0x154/0x250
wait_for_completion+0x28/0x38
cmd_exec+0x7a0/0xa00 [mlx5_core]
mlx5_cmd_exec+0x54/0x80 [mlx5_core]
mlx5_core_modify_cq+0x6c/0x80 [mlx5_core]
mlx5_core_modify_cq_moderation+0xa0/0xb8 [mlx5_core]
mlx5e_tx_dim_work+0x54/0x68 [mlx5_core]
process_one_work+0x1b0/0x448
worker_thread+0x54/0x468
kthread+0x134/0x138
ret_from_fork+0x10/0x18
Fixes: 485d65e13571 ("net/mlx5: Add a timeout to acquire the command queue semaphore")
Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250108030009.68520-1-zhaochenguang@kylinos.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The pci_irq_vector() function never returns zero. It returns negative
error codes or a positive non-zero IRQ number. Fix the error checking to
test for negatives.
Fixes: a36e9f5cfe9e ("rtase: Add support for a pci table in this module")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/f2ecc88d-af13-4651-9820-7cc665230019@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot reported a lock held when returning to userspace[1]. This is
because if argc is less than 0 and the function returns directly, the held
inode lock is not released.
Fix this by store the error in ret and jump to done to clean up instead of
returning directly.
[dh: Modified Lizhi Xu's original patch to make it honour the error code
from afs_split_string()]
[1]
WARNING: lock held when returning to user space!
6.13.0-rc3-syzkaller-00209-g499551201b5f #0 Not tainted
------------------------------------------------
syz-executor133/5823 is leaving the kernel with locks still held!
1 lock held by syz-executor133/5823:
#0: ffff888071cffc00 (&sb->s_type->i_mutex_key#9){++++}-{4:4}, at: inode_lock include/linux/fs.h:818 [inline]
#0: ffff888071cffc00 (&sb->s_type->i_mutex_key#9){++++}-{4:4}, at: afs_proc_addr_prefs_write+0x2bb/0x14e0 fs/afs/addr_prefs.c:388
Reported-by: syzbot+76f33569875eb708e575@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=76f33569875eb708e575
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241226012616.2348907-1-lizhi.xu@windriver.com/
Link: https://lore.kernel.org/r/529850.1736261552@warthog.procyon.org.uk
Tested-by: syzbot+76f33569875eb708e575@syzkaller.appspotmail.com
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Nvidia's Tegra MGBE controllers require the IOMMU "Stream ID" (SID) to be
written to the MGBE_WRAP_AXI_ASID0_CTRL register.
The current driver is hard coded to use MGBE0's SID for all controllers.
This causes softirq time outs and kernel panics when using controllers
other than MGBE0.
Example dmesg errors when an ethernet cable is connected to MGBE1:
[ 116.133290] tegra-mgbe 6910000.ethernet eth1: Link is Up - 1Gbps/Full - flow control rx/tx
[ 121.851283] tegra-mgbe 6910000.ethernet eth1: NETDEV WATCHDOG: CPU: 5: transmit queue 0 timed out 5690 ms
[ 121.851782] tegra-mgbe 6910000.ethernet eth1: Reset adapter.
[ 121.892464] tegra-mgbe 6910000.ethernet eth1: Register MEM_TYPE_PAGE_POOL RxQ-0
[ 121.905920] tegra-mgbe 6910000.ethernet eth1: PHY [stmmac-1:00] driver [Aquantia AQR113] (irq=171)
[ 121.907356] tegra-mgbe 6910000.ethernet eth1: Enabling Safety Features
[ 121.907578] tegra-mgbe 6910000.ethernet eth1: IEEE 1588-2008 Advanced Timestamp supported
[ 121.908399] tegra-mgbe 6910000.ethernet eth1: registered PTP clock
[ 121.908582] tegra-mgbe 6910000.ethernet eth1: configuring for phy/10gbase-r link mode
[ 125.961292] tegra-mgbe 6910000.ethernet eth1: Link is Up - 1Gbps/Full - flow control rx/tx
[ 181.921198] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 181.921404] rcu: 7-....: (1 GPs behind) idle=540c/1/0x4000000000000002 softirq=1748/1749 fqs=2337
[ 181.921684] rcu: (detected by 4, t=6002 jiffies, g=1357, q=1254 ncpus=8)
[ 181.921878] Sending NMI from CPU 4 to CPUs 7:
[ 181.921886] NMI backtrace for cpu 7
[ 181.922131] CPU: 7 UID: 0 PID: 0 Comm: swapper/7 Kdump: loaded Not tainted 6.13.0-rc3+ #6
[ 181.922390] Hardware name: NVIDIA CTI Forge + Orin AGX/Jetson, BIOS 202402.1-Unknown 10/28/2024
[ 181.922658] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 181.922847] pc : handle_softirqs+0x98/0x368
[ 181.922978] lr : __do_softirq+0x18/0x20
[ 181.923095] sp : ffff80008003bf50
[ 181.923189] x29: ffff80008003bf50 x28: 0000000000000008 x27: 0000000000000000
[ 181.923379] x26: ffffce78ea277000 x25: 0000000000000000 x24: 0000001c61befda0
[ 181.924486] x23: 0000000060400009 x22: ffffce78e99918bc x21: ffff80008018bd70
[ 181.925568] x20: ffffce78e8bb00d8 x19: ffff80008018bc20 x18: 0000000000000000
[ 181.926655] x17: ffff318ebe7d3000 x16: ffff800080038000 x15: 0000000000000000
[ 181.931455] x14: ffff000080816680 x13: ffff318ebe7d3000 x12: 000000003464d91d
[ 181.938628] x11: 0000000000000040 x10: ffff000080165a70 x9 : ffffce78e8bb0160
[ 181.945804] x8 : ffff8000827b3160 x7 : f9157b241586f343 x6 : eeb6502a01c81c74
[ 181.953068] x5 : a4acfcdd2e8096bb x4 : ffffce78ea277340 x3 : 00000000ffffd1e1
[ 181.960329] x2 : 0000000000000101 x1 : ffffce78ea277340 x0 : ffff318ebe7d3000
[ 181.967591] Call trace:
[ 181.970043] handle_softirqs+0x98/0x368 (P)
[ 181.974240] __do_softirq+0x18/0x20
[ 181.977743] ____do_softirq+0x14/0x28
[ 181.981415] call_on_irq_stack+0x24/0x30
[ 181.985180] do_softirq_own_stack+0x20/0x30
[ 181.989379] __irq_exit_rcu+0x114/0x140
[ 181.993142] irq_exit_rcu+0x14/0x28
[ 181.996816] el1_interrupt+0x44/0xb8
[ 182.000316] el1h_64_irq_handler+0x14/0x20
[ 182.004343] el1h_64_irq+0x80/0x88
[ 182.007755] cpuidle_enter_state+0xc4/0x4a8 (P)
[ 182.012305] cpuidle_enter+0x3c/0x58
[ 182.015980] cpuidle_idle_call+0x128/0x1c0
[ 182.020005] do_idle+0xe0/0xf0
[ 182.023155] cpu_startup_entry+0x3c/0x48
[ 182.026917] secondary_start_kernel+0xdc/0x120
[ 182.031379] __secondary_switched+0x74/0x78
[ 212.971162] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 7-.... } 6103 jiffies s: 417 root: 0x80/.
[ 212.985935] rcu: blocking rcu_node structures (internal RCU debug):
[ 212.992758] Sending NMI from CPU 0 to CPUs 7:
[ 212.998539] NMI backtrace for cpu 7
[ 213.004304] CPU: 7 UID: 0 PID: 0 Comm: swapper/7 Kdump: loaded Not tainted 6.13.0-rc3+ #6
[ 213.016116] Hardware name: NVIDIA CTI Forge + Orin AGX/Jetson, BIOS 202402.1-Unknown 10/28/2024
[ 213.030817] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 213.040528] pc : handle_softirqs+0x98/0x368
[ 213.046563] lr : __do_softirq+0x18/0x20
[ 213.051293] sp : ffff80008003bf50
[ 213.055839] x29: ffff80008003bf50 x28: 0000000000000008 x27: 0000000000000000
[ 213.067304] x26: ffffce78ea277000 x25: 0000000000000000 x24: 0000001c61befda0
[ 213.077014] x23: 0000000060400009 x22: ffffce78e99918bc x21: ffff80008018bd70
[ 213.087339] x20: ffffce78e8bb00d8 x19: ffff80008018bc20 x18: 0000000000000000
[ 213.097313] x17: ffff318ebe7d3000 x16: ffff800080038000 x15: 0000000000000000
[ 213.107201] x14: ffff000080816680 x13: ffff318ebe7d3000 x12: 000000003464d91d
[ 213.116651] x11: 0000000000000040 x10: ffff000080165a70 x9 : ffffce78e8bb0160
[ 213.127500] x8 : ffff8000827b3160 x7 : 0a37b344852820af x6 : 3f049caedd1ff608
[ 213.138002] x5 : cff7cfdbfaf31291 x4 : ffffce78ea277340 x3 : 00000000ffffde04
[ 213.150428] x2 : 0000000000000101 x1 : ffffce78ea277340 x0 : ffff318ebe7d3000
[ 213.162063] Call trace:
[ 213.165494] handle_softirqs+0x98/0x368 (P)
[ 213.171256] __do_softirq+0x18/0x20
[ 213.177291] ____do_softirq+0x14/0x28
[ 213.182017] call_on_irq_stack+0x24/0x30
[ 213.186565] do_softirq_own_stack+0x20/0x30
[ 213.191815] __irq_exit_rcu+0x114/0x140
[ 213.196891] irq_exit_rcu+0x14/0x28
[ 213.202401] el1_interrupt+0x44/0xb8
[ 213.207741] el1h_64_irq_handler+0x14/0x20
[ 213.213519] el1h_64_irq+0x80/0x88
[ 213.217541] cpuidle_enter_state+0xc4/0x4a8 (P)
[ 213.224364] cpuidle_enter+0x3c/0x58
[ 213.228653] cpuidle_idle_call+0x128/0x1c0
[ 213.233993] do_idle+0xe0/0xf0
[ 213.237928] cpu_startup_entry+0x3c/0x48
[ 213.243791] secondary_start_kernel+0xdc/0x120
[ 213.249830] __secondary_switched+0x74/0x78
This bug has existed since the dwmac-tegra driver was added in Dec 2022
(See Fixes tag below for commit hash).
The Tegra234 SOC has 4 MGBE controllers, however Nvidia's Developer Kit
only uses MGBE0 which is why the bug was not found previously. Connect Tech
has many products that use 2 (or more) MGBE controllers.
The solution is to read the controller's SID from the existing "iommus"
device tree property. The 2nd field of the "iommus" device tree property
is the controller's SID.
Device tree snippet from tegra234.dtsi showing MGBE1's "iommus" property:
smmu_niso0: iommu@12000000 {
compatible = "nvidia,tegra234-smmu", "nvidia,smmu-500";
...
}
/* MGBE1 */
ethernet@6900000 {
compatible = "nvidia,tegra234-mgbe";
...
iommus = <&smmu_niso0 TEGRA234_SID_MGBE_VF1>;
...
}
Nvidia's arm-smmu driver reads the "iommus" property and stores the SID in
the MGBE device's "fwspec" struct. The dwmac-tegra driver can access the
SID using the tegra_dev_iommu_get_stream_id() helper function found in
linux/iommu.h.
Calling tegra_dev_iommu_get_stream_id() should not fail unless the "iommus"
property is removed from the device tree or the IOMMU is disabled.
While the Tegra234 SOC technically supports bypassing the IOMMU, it is not
supported by the current firmware, has not been tested and not recommended.
More detailed discussion with Thierry Reding from Nvidia linked below.
Fixes: d8ca113724e7 ("net: stmmac: tegra: Add MGBE support")
Link: https://lore.kernel.org/netdev/cover.1731685185.git.pnewman@connecttech.com
Signed-off-by: Parker Newman <pnewman@connecttech.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Thierry Reding <treding@nvidia.com>
Link: https://patch.msgid.link/6fb97f32cf4accb4f7cf92846f6b60064ba0a3bd.1736284360.git.pnewman@connecttech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix netfslib's read-retry to only call ->prepare_read() in the backing
filesystem such a function is provided. We can get to this point if a
there's an active cache as failed reads from the cache need negotiating
with the server instead.
Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/529329.1736261010@warthog.procyon.org.uk
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Netfslib needs to be able to handle kernel-initiated asynchronous DIO that
is supplied with a bio_vec[] array. Currently, because of the async flag,
this gets passed to netfs_extract_user_iter() which throws a warning and
fails because it only handles IOVEC and UBUF iterators. This can be
triggered through a combination of cifs and a loopback blockdev with
something like:
mount //my/cifs/share /foo
dd if=/dev/zero of=/foo/m0 bs=4K count=1K
losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
echo hello >/dev/loop2046
This causes the following to appear in syslog:
WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 netfs_extract_user_iter+0x170/0x250 [netfs]
and the write to fail.
Fix this by removing the check in netfs_unbuffered_write_iter_locked() that
causes async kernel DIO writes to be handled as userspace writes. Note
that this change relies on the kernel caller maintaining the existence of
the bio_vec array (or kvec[] or folio_queue) until the op is complete.
Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
Reported-by: Nicolas Baranger <nicolas.baranger@3xo.fr>
Closes: https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/608725.1736275167@warthog.procyon.org.uk
Tested-by: Nicolas Baranger <nicolas.baranger@3xo.fr>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Steve French <smfrench@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|