| Age | Commit message (Collapse) | Author | Files | Lines |
|
Merge an ACPICA change, device ACPI properties handling update, ACPI
power management updates, and an ACPI battery driver update for
6.19-rc1:
- Avoid walking the ACPI namespace in the AML interpreter if the
starting node cannot be determined (Cryolitia PukNgae)
- Use min() instead of min_t() in the ACPI device properties handling
code to avoid discarding significant bits (David Laight)
- Fix potential fwnode refcount leak in acpi_fwnode_graph_parse_endpoint()
that may prevent the parent fwnode from being released (Haotian Zhang)
- Rework acpi_graph_get_next_endpoint() to use ACPI functions only, remove
unnecessary contitionals from it to make it easier to follow, and make
acpi_get_next_subnode() static (Sakari Ailus)
- Drop unused function acpi_get_lps0_constraint(), make some Low-Power
S0 callback functions for suspend-to-idle static, and rearrange the
code retrieving Low-Power S0 constraits so it only runs when the
constraits are actually used (Rafael Wysocki)
- Drop redundant locking from the ACPI battery driver (Rafael Wysocki)
* acpica:
ACPICA: Avoid walking the Namespace if start_node is NULL
* acpi-property:
ACPI: property: use min() instead of min_t()
ACPI: property: Fix fwnode refcount leak in acpi_fwnode_graph_parse_endpoint()
ACPI: property: Rework acpi_graph_get_next_endpoint()
ACPI: property: Use ACPI functions in acpi_graph_get_next_endpoint() only
ACPI: property: Make acpi_get_next_subnode() static
* acpi-pm:
ACPI: PM: s2idle: Only retrieve constraints when needed
ACPI: PM: s2idle: Staticise LPS0 callback functions
ACPI: PM: s2idle: Drop acpi_get_lps0_constraint()
* acpi-battery:
ACPI: battery: Drop redundant locking
|
|
I've been playing with this to allow for moderately flexible usage of
the get_unused_fd_flags() + create file + fd_install() pattern that's
used quite extensively.
How callers allocate files is really heterogenous so it's not really
convenient to fold them into a single class. It's possibe to split them
into subclasses like for anon inodes. I think that's not necessarily
nice as well.
My take is to add two primites:
(1) FD_ADD() the simple cases a file is installed:
fd = FD_ADD(O_CLOEXEC, open_file(some, args)));
if (fd >= 0)
kvm_get_kvm(vcpu->kvm);
return fd;
(2) FD_PREPARE() that captures all the cases where access to fd or file
or additional work before publishing the fd is needed:
FD_PREPARE(fdf, open_flag, file_open_handle(&path, open_flag));
if (fdf.err)
return fdf.err;
if (copy_to_user(/* something something */))
return -EFAULT;
return fd_publish(fdf);
I've converted all of the easy cases over to it and it gets rid of an
aweful lot of convoluted cleanup logic.
It's centered around struct fd_prepare. FD_PREPARE() encapsulates all of
allocation and cleanup logic and must be followed by a call to
fd_publish() which associates the fd with the file and installs it into
the callers fdtable. If fd_publish() isn't called both are deallocated.
It mandates a specific order namely that first we allocate the fd and
then instantiate the file. But that shouldn't be a problem nearly
everyone I've converted uses this exact pattern anyway.
There's a bunch of additional cases where it would be easy to convert
them to this pattern. For example, the whole sync file stuff in dma
currently retains the containing structure of the file instead of the
file itself even though it's only used to allocate files. Changing that
would make it fall into the FD_PREPARE() pattern easily. I've not done
that work yet.
There's room for extending this in a way that wed'd have subclasses for
some particularly often use patterns but as I said I'm not even sure
that's worth it.
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-1-b6efa1706cfd@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The definition of struct delegation uses stdint.h integer types. Add the
necessary headers to ensure that always works.
Fixes: 1602bad16d7d ("vfs: expose delegation support to userland")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Pull ceph fixes from Ilya Dryomov:
"A patch to make sparse read handling work in msgr2 secure mode from
Slava and a couple of fixes from Ziming and myself to avoid operating
on potentially invalid memory, all marked for stable"
* tag 'ceph-for-6.18-rc8' of https://github.com/ceph/ceph-client:
libceph: prevent potential out-of-bounds writes in handle_auth_session_key()
libceph: replace BUG_ON with bounds check for map->max_osd
ceph: fix crash in process_v2_sparse_read() for encrypted directories
libceph: drop started parameter of __ceph_open_session()
libceph: fix potential use-after-free in have_mon_and_osd_map()
|
|
Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth and CAN. No known outstanding
regressions.
Current release - regressions:
- mptcp: initialize rcv_mss before calling tcp_send_active_reset()
- eth: mlx5e: fix validation logic in rate limiting
Previous releases - regressions:
- xsk: avoid data corruption on cq descriptor number
- bluetooth:
- prevent race in socket write iter and sock bind
- fix not generating mackey and ltk when repairing
- can:
- kvaser_usb: fix potential infinite loop in command parsers
- rcar_canfd: fix CAN-FD mode as default
- eth:
- veth: reduce XDP no_direct return section to fix race
- virtio-net: avoid unnecessary checksum calculation on guest RX
Previous releases - always broken:
- sched: fix TCF_LAYER_TRANSPORT handling in tcf_get_base_ptr()
- bluetooth: mediatek: fix kernel crash when releasing iso interface
- vhost: rewind next_avail_head while discarding descriptors
- eth:
- r8169: fix RTL8127 hang on suspend/shutdown
- aquantia: add missing descriptor cache invalidation on ATL2
- dsa: microchip: fix resource releases in error path"
* tag 'net-6.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
mptcp: Initialise rcv_mss before calling tcp_send_active_reset() in mptcp_do_fastclose().
net: fec: do not register PPS event for PEROUT
net: fec: do not allow enabling PPS and PEROUT simultaneously
net: fec: do not update PEROUT if it is enabled
net: fec: cancel perout_timer when PEROUT is disabled
net: mctp: unconditionally set skb->dev on dst output
net: atlantic: fix fragment overflow handling in RX path
MAINTAINERS: separate VIRTIO NET DRIVER and add netdev
virtio-net: avoid unnecessary checksum calculation on guest RX
eth: fbnic: Fix counter roll-over issue
mptcp: clear scheduled subflows on retransmit
net: dsa: sja1105: fix SGMII linking at 10M or 100M but not passing traffic
s390/net: list Aswin Karuvally as maintainer
net: wwan: mhi: Keep modem name match with Foxconn T99W640
vhost: rewind next_avail_head while discarding descriptors
net/sched: em_canid: fix uninit-value in em_canid_match
can: rcar_canfd: Fix CAN-FD mode as default
xsk: avoid data corruption on cq descriptor number
r8169: fix RTL8127 hang on suspend/shutdown
net: sxgbe: fix potential NULL dereference in sxgbe_rx()
...
|
|
Pull devfreq changes for v6.19 from Chanwoo Choi:
"- Move governor.h under include/linux/ and rename to devfreq-governor.h
in order to allow devfreq governor definitions in out of drivers/devfreq/.
- Fix potential use-after-free issue of OPP handling on hisi_uncore_freq.c
- Use min() to improve the readability on tegra30-devfreq.c
- Fix typo in DFSO_DOWNDIFFERENTIAL macro name on governor_simpleondemand.c"
* tag 'devfreq-next-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux:
PM / devfreq: Fix typo in DFSO_DOWNDIFFERENTIAL macro name
PM / devfreq: tegra30: use min to simplify actmon_cpu_to_emc_rate
PM / devfreq: hisi: Fix potential UAF in OPP handling
PM / devfreq: Move governor.h to a public header location
|
|
Commit a2fb4bc4e2a6 ("net: implement virtio helpers to handle UDP
GSO tunneling.") inadvertently altered checksum offload behavior
for guests not using UDP GSO tunneling.
Before, tun_put_user called tun_vnet_hdr_from_skb, which passed
has_data_valid = true to virtio_net_hdr_from_skb.
After, tun_put_user began calling tun_vnet_hdr_tnl_from_skb instead,
which passes has_data_valid = false into both call sites.
This caused virtio hdr flags to not include VIRTIO_NET_HDR_F_DATA_VALID
for SKBs where skb->ip_summed == CHECKSUM_UNNECESSARY. As a result,
guests are forced to recalculate checksums unnecessarily.
Restore the previous behavior by ensuring has_data_valid = true is
passed in the !tnl_gso_type case, but only from tun side, as
virtio_net_hdr_tnl_from_skb() is used also by the virtio_net driver,
which in turn must not use VIRTIO_NET_HDR_F_DATA_VALID on tx.
cc: stable@vger.kernel.org
Fixes: a2fb4bc4e2a6 ("net: implement virtio helpers to handle UDP GSO tunneling.")
Signed-off-by: Jon Kohler <jon@nutanix.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20251125222754.1737443-1-jon@nutanix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
With the previous commit revamping the timeout handling, started isn't
used anymore. It could be taken into account by adjusting the initial
value of the timeout, but there is little point as both callers capture
the timestamp shortly before calling __ceph_open_session() -- the only
thing of note that happens in the interim is taking client->mount_mutex
and that isn't expected to take multiple seconds.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
|
|
Now that all pieces are in place, change the implementations of
sched_mm_cid_fork() and sched_mm_cid_exit() to adhere to the new strict
ownership scheme and switch context_switch() over to use the new
mm_cid_schedin() functionality.
The common case is that there is no mode change required, which makes
fork() and exit() just update the user count and the constraints.
In case that a new user would exceed the CID space limit the fork() context
handles the transition to per CPU mode with mm::mm_cid::mutex held. exit()
handles the transition back to per task mode when the user count drops
below the switch back threshold. fork() might also be forced to handle a
deferred switch back to per task mode, when a affinity change increased the
number of allowed CPUs enough.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172550.280380631@linutronix.de
|
|
When affinity changes cause an increase of the number of CPUs allowed for
tasks which are related to a MM, that might results in a situation where
the ownership mode can go back from per CPU mode to per task mode.
As affinity changes happen with runqueue lock held there is no way to do
the actual mode change and required fixup right there.
Add the infrastructure to defer it to a workqueue. The scheduled work can
race with a fork() or exit(). Whatever happens first takes care of it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172550.216484739@linutronix.de
|
|
... to avoid header recursion hell.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172550.152813625@linutronix.de
|
|
CIDs are either owned by tasks or by CPUs. The ownership mode depends on
the number of tasks related to a MM and the number of CPUs on which these
tasks are theoretically allowed to run on. Theoretically because that
number is the superset of CPU affinities of all tasks which only grows and
never shrinks.
Switching to per CPU mode happens when the user count becomes greater than
the maximum number of CIDs, which is calculated by:
opt_cids = min(mm_cid::nr_cpus_allowed, mm_cid::users);
max_cids = min(1.25 * opt_cids, nr_cpu_ids);
The +25% allowance is useful for tight CPU masks in scenarios where only a
few threads are created and destroyed to avoid frequent mode
switches. Though this allowance shrinks, the closer opt_cids becomes to
nr_cpu_ids, which is the (unfortunate) hard ABI limit.
At the point of switching to per CPU mode the new user is not yet visible
in the system, so the task which initiated the fork() runs the fixup
function: mm_cid_fixup_tasks_to_cpu() walks the thread list and either
transfers each tasks owned CID to the CPU the task runs on or drops it into
the CID pool if a task is not on a CPU at that point in time. Tasks which
schedule in before the task walk reaches them do the handover in
mm_cid_schedin(). When mm_cid_fixup_tasks_to_cpus() completes it's
guaranteed that no task related to that MM owns a CID anymore.
Switching back to task mode happens when the user count goes below the
threshold which was recorded on the per CPU mode switch:
pcpu_thrs = min(opt_cids - (opt_cids / 4), nr_cpu_ids / 2);
This threshold is updated when a affinity change increases the number of
allowed CPUs for the MM, which might cause a switch back to per task mode.
If the switch back was initiated by a exiting task, then that task runs the
fixup function. If it was initiated by a affinity change, then it's run
either in the deferred update function in context of a workqueue or by a
task which forks a new one or by a task which exits. Whatever happens
first. mm_cid_fixup_cpus_to_task() walks through the possible CPUs and
either transfers the CPU owned CIDs to a related task which runs on the CPU
or drops it into the pool. Tasks which schedule in on a CPU which the walk
did not cover yet do the handover themselves.
This transition from CPU to per task ownership happens in two phases:
1) mm:mm_cid.transit contains MM_CID_TRANSIT. This is OR'ed on the task
CID and denotes that the CID is only temporarily owned by the
task. When it schedules out the task drops the CID back into the
pool if this bit is set.
2) The initiating context walks the per CPU space and after completion
clears mm:mm_cid.transit. After that point the CIDs are strictly
task owned again.
This two phase transition is required to prevent CID space exhaustion
during the transition as a direct transfer of ownership would fail if
two tasks are scheduled in on the same CPU before the fixup freed per
CPU CIDs.
When mm_cid_fixup_cpus_to_tasks() completes it's guaranteed that no CID
related to that MM is owned by a CPU anymore.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172550.088189028@linutronix.de
|
|
The MM CID management has two fundamental requirements:
1) It has to guarantee that at no given point in time the same CID is
used by concurrent tasks in userspace.
2) The CID space must not exceed the number of possible CPUs in a
system. While most allocators (glibc, tcmalloc, jemalloc) do not
care about that, there seems to be at least some LTTng library
depending on it.
The CID space compaction itself is not a functional correctness
requirement, it is only a useful optimization mechanism to reduce the
memory foot print in unused user space pools.
The optimal CID space is:
min(nr_tasks, nr_cpus_allowed);
Where @nr_tasks is the number of actual user space threads associated to
the mm and @nr_cpus_allowed is the superset of all task affinities. It is
growth only as it would be insane to take a racy snapshot of all task
affinities when the affinity of one task changes just do redo it 2
milliseconds later when the next task changes it's affinity.
That means that as long as the number of tasks is lower or equal than the
number of CPUs allowed, each task owns a CID. If the number of tasks
exceeds the number of CPUs allowed it switches to per CPU mode, where the
CPUs own the CIDs and the tasks borrow them as long as they are scheduled
in.
For transition periods CIDs can go beyond the optimal space as long as they
don't go beyond the number of possible CPUs.
The current upstream implementation adds overhead into task migration to
keep the CID with the task. It also has to do the CID space consolidation
work from a task work in the exit to user space path. As that work is
assigned to a random task related to a MM this can inflict unwanted exit
latencies.
Implement the context switch parts of a strict ownership mechanism to
address this.
This removes most of the work from the task which schedules out. Only
during transitioning from per CPU to per task ownership it is required to
drop the CID when leaving the CPU to prevent CID space exhaustion. Other
than that scheduling out is just a single check and branch.
The task which schedules in has to check whether:
1) The ownership mode changed
2) The CID is within the optimal CID space
In stable situations this results in zero work. The only short disruption
is when ownership mode changes or when the associated CID is not in the
optimal CID space. The latter only happens when tasks exit and therefore
the optimal CID space shrinks.
That mechanism is strictly optimized for the common case where no change
happens. The only case where it actually causes a temporary one time spike
is on mode changes when and only when a lot of tasks related to a MM
schedule exactly at the same time and have eventually to compete on
allocating a CID from the bitmap.
In the sysbench test case which triggered the spinlock contention in the
initial CID code, __schedule() drops significantly in perf top on a 128
Core (256 threads) machine when running sysbench with 255 threads, which
fits into the task mode limit of 256 together with the parent thread:
Upstream rseq/perf branch +CID rework
0.42% 0.37% 0.32% [k] __schedule
Increasing the number of threads to 256, which puts the test process into
per CPU mode looks about the same.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172550.023984859@linutronix.de
|
|
The MM CID management has two fundamental requirements:
1) It has to guarantee that at no given point in time the same CID is
used by concurrent tasks in userspace.
2) The CID space must not exceed the number of possible CPUs in a
system. While most allocators (glibc, tcmalloc, jemalloc) do not care
about that, there seems to be at least librseq depending on it.
The CID space compaction itself is not a functional correctness
requirement, it is only a useful optimization mechanism to reduce the
memory foot print in unused user space pools.
The optimal CID space is:
min(nr_tasks, nr_cpus_allowed);
Where @nr_tasks is the number of actual user space threads associated to
the mm and @nr_cpus_allowed is the superset of all task affinities. It is
growth only as it would be insane to take a racy snapshot of all task
affinities when the affinity of one task changes just do redo it 2
milliseconds later when the next task changes its affinity.
That means that as long as the number of tasks is lower or equal than the
number of CPUs allowed, each task owns a CID. If the number of tasks
exceeds the number of CPUs allowed it switches to per CPU mode, where the
CPUs own the CIDs and the tasks borrow them as long as they are scheduled
in.
For transition periods CIDs can go beyond the optimal space as long as they
don't go beyond the number of possible CPUs.
The current upstream implementation adds overhead into task migration to
keep the CID with the task. It also has to do the CID space consolidation
work from a task work in the exit to user space path. As that work is
assigned to a random task related to a MM this can inflict unwanted exit
latencies.
This can be done differently by implementing a strict CID ownership
mechanism. Either the CIDs are owned by the tasks or by the CPUs. The
latter provides less locality when tasks are heavily migrating, but there
is no justification to optimize for overcommit scenarios and thereby
penalizing everyone else.
Provide the basic infrastructure to implement this:
- Change the UNSET marker to BIT(31) from ~0U
- Add the ONCPU marker as BIT(30)
- Add the TRANSIT marker as BIT(29)
That allows to check for ownership trivially and provides a simple check for
UNSET as well. The TRANSIT marker is required to prevent CID space
exhaustion when switching from per CPU to per task mode.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251119172549.960252358@linutronix.de
|
|
Prepare for the new CID management scheme which puts the CID ownership
transition into the fork() and exit() slow path by serializing
sched_mm_cid_fork()/exit() with it, so task list and cpu mask walks can be
done in interruptible and preemptible code.
The contention on it is not worse than on other concurrency controls in the
fork()/exit() machinery.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172549.895826703@linutronix.de
|
|
Reading mm::mm_users and mm:::mm_cid::nr_cpus_allowed every time to compute
the maximal CID value is just wasteful as that value is only changing on
fork(), exit() and eventually when the affinity changes.
So it can be easily precomputed at those points and provided in mm::mm_cid
for consumption in the hot path.
But there is an issue with using mm::mm_users for accounting because that
does not necessarily reflect the number of user space tasks as other kernel
code can take temporary references on the MM which skew the picture.
Solve that by adding a users counter to struct mm_mm_cid, which is modified
by fork() and exit() and used for precomputing under mm_mm_cid::lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172549.832764634@linutronix.de
|
|
It's getting bigger soon, so just move it out of line to the rest of the
code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172549.769636491@linutronix.de
|
|
There is no need anymore to keep this under sighand lock as the current
code and the upcoming replacement are not depending on the exit state of a
task anymore.
That allows to use a mutex in the exit path.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172549.706439391@linutronix.de
|
|
This is truly a bitmap and just conveniently uses a cpumask because the
maximum size of the bitmap is nr_cpu_ids.
But that prevents to do searches for a zero bit in a limited range, which
is helpful to provide an efficient mechanism to consolidate the CID space
when the number of users decreases.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Link: https://patch.msgid.link/20251119172549.642866767@linutronix.de
|
|
Reevaluating num_possible_cpus() over and over does not make sense. That
becomes a constant after init as cpu_possible_mask is marked ro_after_init.
Cache the value during initialization and provide that for consumption.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20251119172549.578653738@linutronix.de
|
|
A CPU system wakeup QoS limit may have been requested by user space. To
avoid breaking this constraint when entering a low power state during
s2idle, let's start to take into account the QoS limit.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Kevin Hilman (TI) <khilman@baylibre.com>
Tested-by: Kevin Hilman (TI) <khilman@baylibre.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/20251125112650.329269-5-ulf.hansson@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
A CPU system wakeup QoS limit may have been requested by user space. To
avoid breaking this constraint when entering a low power state during
s2idle through genpd, let's extend the corresponding genpd governor for
CPUs. More precisely, during s2idle let the genpd governor select a
suitable domain idle state, by taking into account the QoS limit.
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Kevin Hilman (TI) <khilman@baylibre.com>
Tested-by: Kevin Hilman (TI) <khilman@baylibre.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/20251125112650.329269-3-ulf.hansson@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Some platforms supports multiple low power states for CPUs that can be used
when entering system-wide suspend. Currently we are always selecting the
deepest possible state for the CPUs, which can break the system wakeup
latency constraint that may be required for a use case.
Let's take the first step towards addressing this problem, by introducing
an interface for user space, that allows us to specify the CPU system
wakeup QoS limit. Subsequent changes will start taking into account the new
QoS limit.
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Kevin Hilman (TI) <khilman@baylibre.com>
Tested-by: Kevin Hilman (TI) <khilman@baylibre.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/20251125112650.329269-2-ulf.hansson@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Revert commit 7a8c994cbb2d ("ACPI: processor: idle: Optimize ACPI idle
driver registration") because it is reported to introduce a cpuidle
regression leading to a kernel crash on a platform using the ACPI idle
driver.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reported-by: Borislav Petkov <bp@alien8.de>
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Closes: https://lore.kernel.org/lkml/20251124200019.GIaSS5U9HhsWBotrQZ@fat_crate.local/
|
|
Revert commit 5020d05b3476 ("ACPI: processor: Remove unused empty stubs
of some functions") because it depends on a problematic one.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Revert commit bdf780fbcef5 ("ACPI: processor: idle: Rearrange declarations
in header file") because it depends on a problematic one.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Revert commit fbd401e95e56 ("ACPI: processor: idle: Redefine two
functions as void") because it depends on a problematic one.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Revert commit 559f2eacc8a2 ACPI: processor: Do not expose global variable
acpi_idle_driver" because it depends on a problematic one.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Since 0f67b56d84b4c ("clocksource/drivers/arm_arch_timer_mmio: Switch
over to standalone driver"), acpi_arch_timer_mem_init() is unused.
Remove it.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
1. inode_bit_waitqueue() was somehow placed between __inode_add_lru() and
inode_add_lru(). move it up
2. assert ->i_lock is held in __inode_add_lru instead of just claiming it is
needed
3. s/__inode_add_lru/__inode_lru_list_add/ for consistency with itself
(inode_lru_list_del()) and similar routines for sb and io list
management
4. push list presence check into inode_lru_list_del(), just like sb and
io list
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://patch.msgid.link/20251029131428.654761-2-mjguzik@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
In the inode hash code grab the state while ->i_lock is held. If found
to be set, synchronize the sleep once more with the lock held.
In the real world the flag is not set most of the time.
Apart from being simpler to reason about, it comes with a minor speed up
as now clearing the flag does not require the smp_mb() fence.
While here rename wait_on_inode() to wait_on_new_inode() to line it up
with __wait_on_freeing_inode().
Christian Brauner <brauner@kernel.org> says:
As per the discussion in [1] I folded in the diff sent in [2].
Link: https://lore.kernel.org/69238e4d.a70a0220.d98e3.006e.GAE@google.com [1]
Link: https://lore.kernel.org/c2kpawomkbvtahjm7y5mposbhckb7wxthi3iqy5yr22ggpucrm@ufvxwy233qxo [2]
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://patch.msgid.link/20251010221737.1403539-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This was added by commit 099ada2c8726 ("io_uring/rw: add write support
for IOCB_DIO_CALLER_COMP") and disabled a little later by commit
838b35bb6a89 ("io_uring/rw: disable IOCB_DIO_CALLER_COMP") because it
didn't work. Remove all the related code that sat unused for 2 years.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20251113170633.1453259-2-hch@lst.de
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Trivial fix.
Signed-off-by: Askar Safin <safinaskar@gmail.com>
Link: https://patch.msgid.link/20251120195140.571608-1-safinaskar@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
In a recent commit, I inadvertently changed a comparison from being an
unsigned comparison (on 64-bit systems) to being a signed comparison
(which it had always been on 32-bit systems). This led to a sporadic
fstests failure.
To make sure this comparison is always unsigned, introduce a new type,
uoff_t which is the unsigned version of loff_t. Generally file sizes
are restricted to being a signed integer, but in these two places it is
convenient to pass -1 to indicate "up to the end of the file".
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20251123220518.1447261-1-willy@infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Make the enum name in kernel-doc match the code to prevent kernel-doc warnings:
Warning: include/linux/cc_platform.h:106 Enum value
'CC_ATTR_GUEST_SEV_SNP' not described in enum 'cc_attr'
Warning: include/linux/cc_platform.h:106 Excess enum value
'%CC_ATTR_SEV_SNP' description in 'cc_attr'
Fixes: f742b90e61bb ("x86/mm: Extend cc_attr to include AMD SEV-SNP")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20251125022730.3163679-1-rdunlap@infradead.org
|
|
syzbot reported that tcf_get_base_ptr() can be called while transport
header is not set [1].
Instead of returning a dangling pointer, return NULL.
Fix tcf_get_base_ptr() callers to handle this NULL value.
[1]
WARNING: CPU: 1 PID: 6019 at ./include/linux/skbuff.h:3071 skb_transport_header include/linux/skbuff.h:3071 [inline]
WARNING: CPU: 1 PID: 6019 at ./include/linux/skbuff.h:3071 tcf_get_base_ptr include/net/pkt_cls.h:539 [inline]
WARNING: CPU: 1 PID: 6019 at ./include/linux/skbuff.h:3071 em_nbyte_match+0x2d8/0x3f0 net/sched/em_nbyte.c:43
Modules linked in:
CPU: 1 UID: 0 PID: 6019 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Call Trace:
<TASK>
tcf_em_match net/sched/ematch.c:494 [inline]
__tcf_em_tree_match+0x1ac/0x770 net/sched/ematch.c:520
tcf_em_tree_match include/net/pkt_cls.h:512 [inline]
basic_classify+0x115/0x2d0 net/sched/cls_basic.c:50
tc_classify include/net/tc_wrapper.h:197 [inline]
__tcf_classify net/sched/cls_api.c:1764 [inline]
tcf_classify+0x4cf/0x1140 net/sched/cls_api.c:1860
multiq_classify net/sched/sch_multiq.c:39 [inline]
multiq_enqueue+0xfd/0x4c0 net/sched/sch_multiq.c:66
dev_qdisc_enqueue+0x4e/0x260 net/core/dev.c:4118
__dev_xmit_skb net/core/dev.c:4214 [inline]
__dev_queue_xmit+0xe83/0x3b50 net/core/dev.c:4729
packet_snd net/packet/af_packet.c:3076 [inline]
packet_sendmsg+0x3e33/0x5080 net/packet/af_packet.c:3108
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg+0x21c/0x270 net/socket.c:742
____sys_sendmsg+0x505/0x830 net/socket.c:2630
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+f3a497f02c389d86ef16@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6920855a.a70a0220.2ea503.0058.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20251121154100.1616228-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement WARN_ONCE like WARN using BUGFLAG_ONCE.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251110115758.339309119@infradead.org
|
|
Since we have an explicit format string, use it for the condition string
instead of frobbing it in the file string.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251110115758.097401406@infradead.org
|
|
Arm FEAT_SPE_FDS adds the ability to filter on the data source of a
packet using another 64-bits of event filtering control. As the existing
perf_event_attr::configN fields are all used up for SPE PMU, an
additional field is needed. Add a new 'config4' field.
Reviewed-by: Leo Yan <leo.yan@arm.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Since the rework of the kernel virtual address space [1] the module area
and the kernel image are within the same 4GB area. Therefore there is no
need for the weak per cpu workaround for modules anymore. Remove it.
[1] commit c98d2ecae08f ("s390/mm: Uncouple physical vs virtual address spaces")
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Add the at_least (i.e. 'static') decoration to the fixed-size array
parameters of the sha2 library functions. This causes clang to warn
when a too-small array of known size is passed.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20251122194206.31822-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Add the at_least (i.e. 'static') decoration to the fixed-size array
parameters of the sha1 library functions. This causes clang to warn
when a too-small array of known size is passed.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20251122194206.31822-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Add the at_least (i.e. 'static') decoration to the fixed-size array
parameters of the poly1305 library functions. This causes clang to warn
when a too-small array of known size is passed.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20251122194206.31822-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Add the at_least (i.e. 'static') decoration to the fixed-size array
parameters of the md5 library functions. This causes clang to warn when
a too-small array of known size is passed.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20251122194206.31822-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Add the at_least (i.e. 'static') decoration to the fixed-size array
parameters of the curve25519 library functions. This causes clang to
warn when a too-small array of known size is passed.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20251122194206.31822-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Add the at_least (i.e. 'static') decoration to the fixed-size array
parameters of the chacha library functions. This causes clang to warn
when a too-small array of known size is passed.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20251122194206.31822-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Several parameters of the chacha20poly1305 functions require arrays of
an exact length. Use the new at_least keyword to instruct gcc and
clang to statically check that the caller is passing an object of at
least that length.
Here it is in action, with this faulty patch to wireguard's cookie.h:
struct cookie_checker {
u8 secret[NOISE_HASH_LEN];
- u8 cookie_encryption_key[NOISE_SYMMETRIC_KEY_LEN];
+ u8 cookie_encryption_key[NOISE_SYMMETRIC_KEY_LEN - 1];
u8 message_mac1_key[NOISE_SYMMETRIC_KEY_LEN];
If I try compiling this code, I get this helpful warning:
CC drivers/net/wireguard/cookie.o
drivers/net/wireguard/cookie.c: In function ‘wg_cookie_message_create’:
drivers/net/wireguard/cookie.c:193:9: warning: ‘xchacha20poly1305_encrypt’ reading 32 bytes from a region of size 31 [-Wstringop-overread]
193 | xchacha20poly1305_encrypt(dst->encrypted_cookie, cookie, COOKIE_LEN,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
194 | macs->mac1, COOKIE_LEN, dst->nonce,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
195 | checker->cookie_encryption_key);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/wireguard/cookie.c:193:9: note: referencing argument 7 of type ‘const u8 *’ {aka ‘const unsigned char *’}
In file included from drivers/net/wireguard/messages.h:10,
from drivers/net/wireguard/cookie.h:9,
from drivers/net/wireguard/cookie.c:6:
include/crypto/chacha20poly1305.h:28:6: note: in a call to function ‘xchacha20poly1305_encrypt’
28 | void xchacha20poly1305_encrypt(u8 *dst, const u8 *src, const size_t src_len,
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20251123054819.2371989-4-Jason@zx2c4.com
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Clang and recent gcc support warning if they are able to prove that the
user is passing to a function an array that is too short in size. For
example:
void blah(unsigned char herp[at_least 7]);
static void schma(void)
{
unsigned char good[] = { 1, 2, 3, 4, 5, 6, 7 };
unsigned char bad[] = { 1, 2, 3, 4, 5, 6 };
blah(good);
blah(bad);
}
The notation here, `static 7`, which this commit makes explicit by
allowing us to write it as `at_least 7`, means that it's incorrect to
pass anything less than 7 elements. This is section 6.7.5.3 of C99:
If the keyword static also appears within the [ and ] of the array
type derivation, then for each call to the function, the value of
the corresponding actual argument shall provide access to the first
element of an array with at least as many elements as specified by
the size expression.
Here is the output from gcc 15:
zx2c4@thinkpad /tmp $ gcc -c a.c
a.c: In function ‘schma’:
a.c:9:9: warning: ‘blah’ accessing 7 bytes in a region of size 6 [-Wstringop-overflow=]
9 | blah(bad);
| ^~~~~~~~~
a.c:9:9: note: referencing argument 1 of type ‘unsigned char[7]’
a.c:2:6: note: in a call to function ‘blah’
2 | void blah(unsigned char herp[at_least 7]);
| ^~~~
And from clang 21:
zx2c4@thinkpad /tmp $ clang -c a.c
a.c:9:2: warning: array argument is too small; contains 6 elements, callee requires at least 7
[-Warray-bounds]
9 | blah(bad);
| ^ ~~~
a.c:2:25: note: callee declares array parameter as static here
2 | void blah(unsigned char herp[at_least 7]);
| ^ ~~~~~~~~~~
1 warning generated.
So these are covered by, variously, -Wstringop-overflow and
-Warray-bounds.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20251123054819.2371989-3-Jason@zx2c4.com
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Some device drivers (and out-of-tree modules) might want to define
device-specific device governors. Rather than restricting all of them to
be a part of drivers/devfreq/ (which is not possible for out-of-tree
drivers anyway) move governor.h to include/linux/devfreq-governor.h and
update all drivers to use it.
The devfreq_cpu_data is only used internally, by the passive governor,
so it is moved to the driver source rather than being a part of the
public interface.
Reported-by: Robie Basak <robibasa@qti.qualcomm.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Acked-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Link: https://patchwork.kernel.org/project/linux-pm/patch/20251030-governor-public-v2-1-432a11a9975a@oss.qualcomm.com/
|
|
Pull input fixes from Dmitry Torokhov:
- INPUT_PROP_HAPTIC_TOUCHPAD definition added early in 6.18 cycle has
been renamed to INPUT_PROP_PRESSUREPAD to better reflect the kind of
devices it is supposed to be set for
- a new ID for a touchscreen found in Ayaneo Flip DS in Goodix driver
- Goodix driver no longer tries to set reset pin as "input" as it
causes issues when there is no pull up resistor installed on the
board
- fixes for cros_ec_keyb, imx_sc_key, and pegasus-notetaker drivers to
deal with potential out-of-bounds access and memory corruption issues
* tag 'input-for-v6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: rename INPUT_PROP_HAPTIC_TOUCHPAD to INPUT_PROP_PRESSUREPAD
Input: cros_ec_keyb - fix an invalid memory access
Input: imx_sc_key - fix memory corruption on unload
Input: pegasus-notetaker - fix potential out-of-bounds access
Input: goodix - remove setting of RST pin to input
Input: goodix - add support for ACPI ID GDIX1003
|