<feed xmlns='http://www.w3.org/2005/Atom'>
<title>wireguard-linux/kernel, branch devel</title>
<subtitle>WireGuard for the Linux kernel</subtitle>
<id>https://git.zx2c4.com/wireguard-linux/atom/kernel?h=devel</id>
<link rel='self' href='https://git.zx2c4.com/wireguard-linux/atom/kernel?h=devel'/>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/'/>
<updated>2026-04-30T05:21:44Z</updated>
<entry>
<title>Merge tag 'trace-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace</title>
<updated>2026-04-30T05:21:44Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-04-30T05:21:44Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=e75a43c7cec459a07d91ed17de4de13ede2b7758'/>
<id>urn:sha1:e75a43c7cec459a07d91ed17de4de13ede2b7758</id>
<content type='text'>
Pull tracing fixes from Steven Rostedt:

 - Fix inverted check of registering the stats for branch tracing

   When calling register_stat_tracer() which returns zero on success and
   negative on error, the callers were checking the return of zero as an
   error and printing a warning message. Because this was just a normal
   printk() message and not a WARN(), it wasn't caught in any testing.

   Fix the check to print the warning message when an error actually
   happens.

 - Fix a typo in a comment in tracepoint.h

 - Limit the size of event probes to 3K in size

   It is possible to create a dynamic event probe via the tracefs system
   that is greater than the max size of an event that the ring buffer
   can hold. This basically causes the event to become useless.

   Limit the size of an event probe to be 3K as that should be large
   enough to handle any dynamic events being created, and fits within
   the PAGE_SIZE sub-buffers of the ring buffer.

* tag 'trace-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/probes: Limit size of event probe to 3K
  tracepoint: Fix typo in tracepoint.h comment
  tracing: branch: Fix inverted check on stat tracer registration
</content>
</entry>
<entry>
<title>tracing/probes: Limit size of event probe to 3K</title>
<updated>2026-04-29T20:07:38Z</updated>
<author>
<name>Steven Rostedt</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2026-04-28T16:23:02Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=b2aa3b4d64e460ac606f386c24e7d8a873ce6f1a'/>
<id>urn:sha1:b2aa3b4d64e460ac606f386c24e7d8a873ce6f1a</id>
<content type='text'>
There currently isn't a max limit an event probe can be. One could make an
event greater than PAGE_SIZE, which makes the event useless because if
it's bigger than the max event that can be recorded into the ring buffer,
then it will never be recorded.

A event probe should never need to be greater than 3K, so make that the
max size. As long as the max is less than the max that can be recorded
onto the ring buffer, it should be fine.

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Fixes: 93ccae7a22274 ("tracing/kprobes: Support basic types on dynamic events")
Link: https://patch.msgid.link/20260428122302.706610ba@gandalf.local.home
Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'sched_ext-for-7.1-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext</title>
<updated>2026-04-28T23:26:11Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-04-28T23:26:11Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=664f0f6be37ce4ef80992cf2ed74761cd5bbe207'/>
<id>urn:sha1:664f0f6be37ce4ef80992cf2ed74761cd5bbe207</id>
<content type='text'>
Pull sched_ext fixes from Tejun Heo:
 "The merge window pulled in the cgroup sub-scheduler infrastructure,
  and new AI reviews are accelerating bug reporting and fixing - hence
  the larger than usual fixes batch:

   - Use-after-frees during scheduler load/unload:
       - The disable path could free the BPF scheduler while deferred
         irq_work / kthread work was still in flight
       - cgroup setter callbacks read the active scheduler outside the
         rwsem that synchronizes against teardown
     Fix both, and reuse the disable drain in the enable error paths so
     the BPF JIT page can't be freed under live callbacks.

   - Several BPF op invocations didn't tell the framework which runqueue
     was already locked, so helper kfuncs that re-acquire the runqueue
     by CPU could deadlock on the held lock

     Fix the affected callsites, including recursive parent-into-child
     dispatch.

   - The hardlockup notifier ran from NMI but eventually took a
     non-NMI-safe lock. Bounce it through irq_work.

   - A handful of bugs in the new sub-scheduler hierarchy:
       - helper kfuncs hard-coded the root instead of resolving the
         caller's scheduler
       - the enable error path tried to disable per-task state that had
         never been initialized, and leaked cpus_read_lock on the way
         out
       - a sysfs object was leaked on every load/unload
       - the dispatch fast-path used the root scheduler instead of the
         task's
       - a couple of CONFIG #ifdef guards were misclassified

   - Verifier-time hardening: BPF programs of unrelated struct_ops types
     (e.g. tcp_congestion_ops) could call sched_ext kfuncs - a semantic
     bug and, once sub-sched was enabled, a KASAN out-of-bounds read.
     Now rejected at load. Plus a few NULL and cross-task argument
     checks on sched_ext kfuncs, and a selftest covering the new deny.

   - rhashtable (Herbert): restore the insecure_elasticity toggle and
     bounce the deferred-resize kick through irq_work to break a
     lock-order cycle observable from raw-spinlock callers. sched_ext's
     scheduler-instance hash is the first user of both.

   - The bypass-mode load balancer used file-scope cpumasks; with
     multiple scheduler instances now possible, those raced. Move to
     per-instance cpumasks, plus a follow-up to skip tasks whose
     recorded CPU is stale relative to the new owning runqueue.

   - Smaller fixes:
       - a dispatch queue's first-task tracking misbehaved when a parked
         iterator cursor sat in the list
       - the runqueue's next-class wasn't promoted on local-queue
         enqueue, leaving an SCX task behind RT in edge cases
       - the reference qmap scheduler stopped erroring on legitimate
         cross-scheduler task-storage misses"

* tag 'sched_ext-for-7.1-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (26 commits)
  sched_ext: Fix scx_flush_disable_work() UAF race
  sched_ext: Call wakeup_preempt() in local_dsq_post_enq()
  sched_ext: Release cpus_read_lock on scx_link_sched() failure in root enable
  sched_ext: Reject NULL-sch callers in scx_bpf_task_set_slice/dsq_vtime
  sched_ext: Refuse cross-task select_cpu_from_kfunc calls
  sched_ext: Align cgroup #ifdef guards with SUB_SCHED vs GROUP_SCHED
  sched_ext: Make bypass LB cpumasks per-scheduler
  sched_ext: Pass held rq to SCX_CALL_OP() for core_sched_before
  sched_ext: Pass held rq to SCX_CALL_OP() for dump_cpu/dump_task
  sched_ext: Save and restore scx_locked_rq across SCX_CALL_OP
  sched_ext: Use dsq-&gt;first_task instead of list_empty() in dispatch_enqueue() FIFO-tail
  sched_ext: Resolve caller's scheduler in scx_bpf_destroy_dsq() / scx_bpf_dsq_nr_queued()
  sched_ext: Read scx_root under scx_cgroup_ops_rwsem in cgroup setters
  sched_ext: Don't disable tasks in scx_sub_enable_workfn() abort path
  sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()
  sched_ext: Guard scx_dsq_move() against NULL kit-&gt;dsq after failed iter_new
  sched_ext: Unregister sub_kset on scheduler disable
  sched_ext: Defer scx_hardlockup() out of NMI
  sched_ext: sync disable_irq_work in bpf_scx_unreg()
  sched_ext: Fix local_dsq_post_enq() to use task's scheduler in sub-sched
  ...
</content>
</entry>
<entry>
<title>tracing: branch: Fix inverted check on stat tracer registration</title>
<updated>2026-04-28T18:28:29Z</updated>
<author>
<name>Breno Leitao</name>
<email>leitao@debian.org</email>
</author>
<published>2026-04-20T13:25:09Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=3b75dd76e64a04771861bb5647951c264919e563'/>
<id>urn:sha1:3b75dd76e64a04771861bb5647951c264919e563</id>
<content type='text'>
init_annotated_branch_stats() and all_annotated_branch_stats() check the
return value of register_stat_tracer() with "if (!ret)", but
register_stat_tracer() returns 0 on success and a negative errno on
failure. The inverted check causes the warning to be printed on every
successful registration, e.g.:

  Warning: could not register annotated branches stats

while leaving real failures silent. The initcall also returned a
hard-coded 1 instead of the actual error.

Invert the check and propagate ret so that the warning fires on real
errors and the initcall reports the correct status.

Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Link: https://patch.msgid.link/20260420-tracing-v1-1-d8f4cd0d6af1@debian.org
Fixes: 002bb86d8d42 ("tracing/ftrace: separate events tracing and stats tracing engine")
Signed-off-by: Breno Leitao &lt;leitao@debian.org&gt;
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>sched_ext: Fix scx_flush_disable_work() UAF race</title>
<updated>2026-04-28T17:40:03Z</updated>
<author>
<name>Cheng-Yang Chou</name>
<email>yphbchou0911@gmail.com</email>
</author>
<published>2026-04-28T17:36:12Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=d99f7a32f09dccbe396187370ec1a74a31b73d7e'/>
<id>urn:sha1:d99f7a32f09dccbe396187370ec1a74a31b73d7e</id>
<content type='text'>
scx_flush_disable_work() calls irq_work_sync() followed by
kthread_flush_work() to ensure that the disable kthread work has
fully completed before bpf_scx_unreg() frees the SCX scheduler.

However, a concurrent scx_vexit() (e.g., triggered by a watchdog stall)
creates a race window between scx_claim_exit() and irq_work_queue():

  CPU A (scx_vexit (watchdog))        CPU B (bpf_scx_unreg)
  ----                                ----
  scx_claim_exit()
    atomic_try_cmpxchg(NONE-&gt;kind)
  stack_trace_save()
  vscnprintf()
                                      scx_disable()
                                        scx_claim_exit() -&gt; FAIL
                                      scx_flush_disable_work()
                                        irq_work_sync()      // no-op: not queued yet
                                        kthread_flush_work() // no-op: not queued yet
                                      kobject_put(&amp;sch-&gt;kobj) -&gt; free %sch
  irq_work_queue() -&gt; UAF on %sch
  scx_disable_irq_workfn()
    kthread_queue_work() -&gt; UAF

The root cause is that CPU B's scx_flush_disable_work() returns after
syncing an irq_work that has not yet been queued, while CPU A is still
executing the code between scx_claim_exit() and irq_work_queue().

Loop until exit_kind reaches SCX_EXIT_DONE or SCX_EXIT_NONE, draining
disable_irq_work and disable_work in each pass. This ensures that any
work queued after the previous check is caught, while also correctly
handling cases where no disable was triggered (e.g., the
scx_sub_enable_workfn() abort path).

Fixes: 510a27055446 ("sched_ext: sync disable_irq_work in bpf_scx_unreg()")
Reported-by: https://sashiko.dev/#/patchset/20260424100221.32407-1-icheng%40nvidia.com
Suggested-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Cheng-Yang Chou &lt;yphbchou0911@gmail.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched_ext: Call wakeup_preempt() in local_dsq_post_enq()</title>
<updated>2026-04-28T16:28:48Z</updated>
<author>
<name>Kuba Piecuch</name>
<email>jpiecuch@google.com</email>
</author>
<published>2026-04-28T12:46:01Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=163f8b7f9a84086c67c76aeadc04e6d43e32df6e'/>
<id>urn:sha1:163f8b7f9a84086c67c76aeadc04e6d43e32df6e</id>
<content type='text'>
There are several edge cases (see linked thread) where an IMMED task
can be left lingering on a local DSQ if an RT task swoops in at the
wrong time. All of these edge cases are due to rq-&gt;next_class being idle
even after dispatching a task to rq's local DSQ. We should bump
rq-&gt;next_class to &amp;ext_sched_class as soon as we've inserted a task into
the local DSQ.

To optimize the common case of rq-&gt;next_class == &amp;ext_sched_class,
only call wakeup_preempt() if rq-&gt;next_class is below EXT. If next_class
is EXT or above, wakeup_preempt() is a no-op anyway.

This lets us also simplify the preempt_curr() logic a bit since
wakeup_preempt() will call preempt_curr() for us if next_class is
below EXT.

Link: https://lore.kernel.org/all/DHZPHUFXB4N3.2RY28MUEWBNYK@google.com/
Signed-off-by: Kuba Piecuch &lt;jpiecuch@google.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'cgroup-for-7.1-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup</title>
<updated>2026-04-27T23:51:27Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-04-27T23:51:27Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=3b3bea6d4b9c162f9e555905d96b8c1da67ecd5b'/>
<id>urn:sha1:3b3bea6d4b9c162f9e555905d96b8c1da67ecd5b</id>
<content type='text'>
Pull cgroup fixes from Tejun Heo:

 - Fix UAF race in psi pressure_write() against cgroup file release by
   extending cgroup_mutex coverage and ordering of-&gt;priv access after
   cgroup_kn_lock_live()

 - Fix integer overflow in rdmacg_try_charge() when usage equals INT_MAX
   by performing the increment in s64

 - Fix asymmetric DL bandwidth accounting on cpuset attach rollback by
   recording the CPU used by dl_bw_alloc() so cancel_attach() returns
   the reservation to the same root domain

 - Fix nr_dying_subsys_* race that briefly showed 0 in cgroup.stat after
   rmdir by incrementing from kill_css() instead of offline_css()

 - Typo fix in cgroup-v2 documentation

* tag 'cgroup-for-7.1-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  docs: cgroup: fix typo 'protetion' -&gt; 'protection'
  cgroup: Increment nr_dying_subsys_* from rmdir context
  cgroup/cpuset: record DL BW alloc CPU for attach rollback
  cgroup/rdma: fix integer overflow in rdmacg_try_charge()
  sched/psi: fix race between file release and pressure write
</content>
</entry>
<entry>
<title>sched_ext: Release cpus_read_lock on scx_link_sched() failure in root enable</title>
<updated>2026-04-25T00:31:36Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2026-04-25T00:31:36Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=deb7b2f93d0129b79425f830a1e5e7e1bb2c4973'/>
<id>urn:sha1:deb7b2f93d0129b79425f830a1e5e7e1bb2c4973</id>
<content type='text'>
scx_root_enable_workfn() takes cpus_read_lock() before
scx_link_sched(sch), but the `if (ret) goto err_disable` on failure
skips the matching cpus_read_unlock() - all other err_disable gotos
along this path drop the lock first.

scx_link_sched() only returns non-zero on the sub-sched path
(parent != NULL), so the leak path is unreachable via the root
caller today. Still, the unwind is out of line with the surrounding
paths.

Drop cpus_read_lock() before goto err_disable.

v2: Correct Fixes: tag (Andrea Righi).

Fixes: 25037af712eb ("sched_ext: Add rhashtable lookup for sub-schedulers")
Reported-by: Chris Mason &lt;clm@meta.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched_ext: Reject NULL-sch callers in scx_bpf_task_set_slice/dsq_vtime</title>
<updated>2026-04-25T00:31:36Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2026-04-25T00:31:36Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=05b4a9a9bc37f1fa289a8f07b4fbfc3ae681b650'/>
<id>urn:sha1:05b4a9a9bc37f1fa289a8f07b4fbfc3ae681b650</id>
<content type='text'>
scx_prog_sched(aux) returns NULL for TRACING / SYSCALL BPF progs that
have no struct_ops association when the root scheduler has sub_attach
set. scx_bpf_task_set_slice() and scx_bpf_task_set_dsq_vtime() pass
that NULL into scx_task_on_sched(sch, p), which under
CONFIG_EXT_SUB_SCHED is rcu_access_pointer(p-&gt;scx.sched) == sch. For
any non-scx task p-&gt;scx.sched is NULL, so NULL == NULL returns true
and the authority gate is bypassed - a privileged but
non-struct_ops-associated prog can poke p-&gt;scx.slice /
p-&gt;scx.dsq_vtime on arbitrary tasks.

Reject !sch up front so the gate only admits callers with a resolved
scheduler.

Fixes: 245d09c594ea ("sched_ext: Enforce scheduler ownership when updating slice and dsq_vtime")
Reported-by: Chris Mason &lt;clm@meta.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Andrea Righi &lt;arighi@nvidia.com&gt;
</content>
</entry>
<entry>
<title>sched_ext: Refuse cross-task select_cpu_from_kfunc calls</title>
<updated>2026-04-25T00:31:36Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2026-04-25T00:31:36Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=ea7c716a24aebe887e0990649ab697bd698cc325'/>
<id>urn:sha1:ea7c716a24aebe887e0990649ab697bd698cc325</id>
<content type='text'>
select_cpu_from_kfunc() skipped pi_lock for @p when called from
ops.select_cpu() or another rq-locked SCX op, assuming the held lock
protects @p. scx_bpf_select_cpu_dfl() / __scx_bpf_select_cpu_and() accept an
arbitrary KF_RCU task_struct, so a caller in e.g. ops.select_cpu(p1) or
ops.enqueue(p1) can pass some other p2 - the held pi_lock / rq lock is p1's,
not p2's - and reading p2-&gt;cpus_ptr / nr_cpus_allowed races with
set_cpus_allowed_ptr() and migrate_disable_switch() on another CPU.

Abort the scheduler on cross-task calls in both branches: for
ops.select_cpu() use scx_kf_arg_task_ok() to verify @p is the wake-up
task recorded in current-&gt;scx.kf_tasks[] by SCX_CALL_OP_TASK_RET();
for other rq-locked SCX ops compare task_rq(p) against scx_locked_rq().

v2: Switch the in_select_cpu cross-task check from direct_dispatch_task
    comparison to scx_kf_arg_task_ok(). The former spuriously rejects when
    ops.select_cpu() calls scx_bpf_dsq_insert() first, then calls
    scx_bpf_select_cpu_*() on the same task. (Andrea Righi)

Fixes: 0022b328504d ("sched_ext: Decouple kfunc unlocked-context check from kf_mask")
Reported-by: Chris Mason &lt;clm@meta.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Andrea Righi &lt;arighi@nvidia.com&gt;
</content>
</entry>
</feed>
