aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2025-02-12KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update)Sean Christopherson1-1/+2
When updating paravirtual clocks, setup the Hyper-V TSC page before Xen PV clocks. This will allow dropping xen_pvclock_tsc_unstable in favor of simply clearing PVCLOCK_TSC_STABLE_BIT in the reference flags. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Paul Durrant <paul@xen.org> Link: https://lore.kernel.org/r/20250201013827.680235-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12KVM: x86: Remove per-vCPU "cache" of its reference pvclockSean Christopherson3-17/+21
Remove the per-vCPU "cache" of the reference pvclock and instead cache only the TSC shift+multiplier. All other fields in pvclock are fully recomputed by kvm_guest_time_update(), i.e. aren't actually persisted. In addition to shaving a few bytes, explicitly tracking the TSC shift/mul fields makes it easier to see that those fields are tied to hw_tsc_khz (they exist to avoid having to do expensive math in the common case). And conversely, not tracking the other fields makes it easier to see that things like the version number are pulled from the guest's copy, not from KVM's reference. Reviewed-by: Paul Durrant <paul@xen.org> Link: https://lore.kernel.org/r/20250201013827.680235-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12KVM: x86: Pass reference pvclock as a param to kvm_setup_guest_pvclock()Sean Christopherson1-7/+7
Pass the reference pvclock structure that's used to setup each individual pvclock as a parameter to kvm_setup_guest_pvclock() as a preparatory step toward removing kvm_vcpu_arch.hv_clock. No functional change intended. Reviewed-by: Paul Durrant <paul@xen.org> Link: https://lore.kernel.org/r/20250201013827.680235-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clockSean Christopherson1-8/+9
Handle "guest stopped" propagation only for kvmclock, as the flag is set if and only if kvmclock is "active", i.e. can only be set for Xen PV clock if kvmclock *and* Xen PV clock are in-use by the guest, which creates very bizarre behavior for the guest. Simply restrict the flag to kvmclock, e.g. instead of trying to handle Xen PV clock, as propagation of PVCLOCK_GUEST_STOPPED was unintentionally added during a refactoring, and while Xen proper defines XEN_PVCLOCK_GUEST_STOPPED, there's no evidence that Xen guests actually support the flag. Check and clear pvclock_set_guest_stopped_request if and only if kvmclock is active to preserve the original behavior, i.e. keep the flag pending if kvmclock happens to be disabled when KVM processes the initial request. Fixes: aa096aa0a05f ("KVM: x86/xen: setup pvclock updates") Cc: Paul Durrant <pdurrant@amazon.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org> Link: https://lore.kernel.org/r/20250201013827.680235-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12KVM: x86: Don't bleed PVCLOCK_GUEST_STOPPED across PV clocksSean Christopherson1-5/+8
When updating a specific PV clock, make a full copy of KVM's reference copy/cache so that PVCLOCK_GUEST_STOPPED doesn't bleed across clocks. E.g. in the unlikely scenario the guest has enabled both kvmclock and Xen PV clock, a dangling GUEST_STOPPED in kvmclock would bleed into Xen PV clock. Using a local copy of the pvclock structure also sets the stage for eliminating the per-vCPU copy/cache (only the TSC frequency information actually "needs" to be cached/persisted). Fixes: aa096aa0a05f ("KVM: x86/xen: setup pvclock updates") Reviewed-by: Paul Durrant <paul@xen.org> Link: https://lore.kernel.org/r/20250201013827.680235-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12KVM: x86/xen: Use guest's copy of pvclock when starting timerSean Christopherson1-5/+60
Use the guest's copy of its pvclock when starting a Xen timer, as KVM's reference copy may not be up-to-date, i.e. may yield a false positive of sorts. In the unlikely scenario that the guest is starting a Xen timer and has used a Xen pvclock in the past, but has since but turned it "off", then vcpu->arch.hv_clock may be stale, as KVM's reference copy is updated if and only if at least one pvclock is enabled. Furthermore, vcpu->arch.hv_clock is currently used by three different pvclocks: kvmclock, Xen, and Xen compat. While it's extremely unlikely a guest would ever enable multiple pvclocks, effectively sharing KVM's reference clock could yield very weird behavior. Using the guest's active Xen pvclock instead of KVM's reference will allow dropping KVM's reference copy. Fixes: 451a707813ae ("KVM: x86/xen: improve accuracy of Xen timers") Cc: Paul Durrant <pdurrant@amazon.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org> Link: https://lore.kernel.org/r/20250201013827.680235-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12KVM: x86: Process "guest stopped request" once per guest time updateSean Christopherson1-5/+12
Handle "guest stopped" requests once per guest time update in preparation of restoring KVM's historical behavior of setting PVCLOCK_GUEST_STOPPED for kvmclock and only kvmclock. For now, simply move the code to minimize the probability of an unintentional change in functionally. Note, in practice, all clocks are guaranteed to see the request (or not) even though each PV clock processes the request individual, as KVM holds vcpu->mutex (blocks KVM_KVMCLOCK_CTRL) and it should be impossible for KVM's suspend notifier to run while KVM is handling requests. And because the helper updates the reference flags, all subsequent PV clock updates will pick up PVCLOCK_GUEST_STOPPED. Note #2, once PVCLOCK_GUEST_STOPPED is restricted to kvmclock, the horrific #ifdef will go away. Cc: Paul Durrant <pdurrant@amazon.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org> Link: https://lore.kernel.org/r/20250201013827.680235-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12KVM: x86: Drop local pvclock_flags variable in kvm_guest_time_update()Sean Christopherson1-5/+2
Drop the local pvclock_flags in kvm_guest_time_update(), the local variable is immediately shoved into the per-vCPU "cache", i.e. the local variable serves no purpose. No functional change intended. Reviewed-by: Paul Durrant <paul@xen.org> Link: https://lore.kernel.org/r/20250201013827.680235-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12KVM: x86: Eliminate "handling" of impossible errors during SUSPENDSean Christopherson1-13/+7
Drop KVM's handling of kvm_set_guest_paused() failure when reacting to a SUSPEND notification, as kvm_set_guest_paused() only "fails" if the vCPU isn't using kvmclock, and KVM's notifier callback pre-checks that kvmclock is active. I.e. barring some bizarre edge case that shouldn't be treated as an error in the first place, kvm_arch_suspend_notifier() can't fail. Reviewed-by: Paul Durrant <paul@xen.org> Link: https://lore.kernel.org/r/20250201013827.680235-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12KVM: x86: Don't take kvm->lock when iterating over vCPUs in suspend notifierSean Christopherson1-2/+0
When queueing vCPU PVCLOCK updates in response to SUSPEND or HIBERNATE, don't take kvm->lock as doing so can trigger a largely theoretical deadlock, it is perfectly safe to iterate over the xarray of vCPUs without holding kvm->lock, and kvm->lock doesn't protect kvm_set_guest_paused() in any way (pv_time.active and pvclock_set_guest_stopped_request are protected by vcpu->mutex, not kvm->lock). Reported-by: syzbot+352e553a86e0d75f5120@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/677c0f36.050a0220.3b3668.0014.GAE@google.com Fixes: 7d62874f69d7 ("kvm: x86: implement KVM PM-notifier") Reviewed-by: Paul Durrant <paul@xen.org> Link: https://lore.kernel.org/r/20250201013827.680235-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-09Linux 6.14-rc2Linus Torvalds1-1/+1
2025-02-09PM: sleep: core: Restrict power.set_active propagationRafael J. Wysocki1-12/+9
Commit 3775fc538f53 ("PM: sleep: core: Synchronize runtime PM status of parents and children") exposed an issue related to simple_pm_bus_pm_ops that uses pm_runtime_force_suspend() and pm_runtime_force_resume() as bus type PM callbacks for the noirq phases of system-wide suspend and resume. The problem is that pm_runtime_force_suspend() does not distinguish runtime-suspended devices from devices for which runtime PM has never been enabled, so if it sees a device with runtime PM status set to RPM_ACTIVE, it will assume that runtime PM is enabled for that device and so it will attempt to suspend it with the help of its runtime PM callbacks which may not be ready for that. As it turns out, this causes simple_pm_bus_runtime_suspend() to crash due to a NULL pointer dereference. Another problem related to the above commit and simple_pm_bus_pm_ops is that setting runtime PM status of a device handled by the latter to RPM_ACTIVE will actually prevent it from being resumed because pm_runtime_force_resume() only resumes devices with runtime PM status set to RPM_SUSPENDED. To mitigate these issues, do not allow power.set_active to propagate beyond the parent of the device with DPM_FLAG_SMART_SUSPEND set that will need to be resumed, which should be a sufficient stop-gap for the time being, but they will need to be properly addressed in the future because in general during system-wide resume it is necessary to resume all devices in a dependency chain in which at least one device is going to be resumed. Fixes: 3775fc538f53 ("PM: sleep: core: Synchronize runtime PM status of parents and children") Closes: https://lore.kernel.org/linux-pm/1c2433d4-7e0f-4395-b841-b8eac7c25651@nvidia.com/ Reported-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/6137505.lOV4Wx5bFT@rjwysocki.net
2025-02-08fgraph: Fix set_graph_notrace with setting TRACE_GRAPH_NOTRACE_BITSteven Rostedt1-1/+1
The code was restructured where the function graph notrace code, that would not trace a function and all its children is done by setting a NOTRACE flag when the function that is not to be traced is hit. There's a TRACE_GRAPH_NOTRACE_BIT which defines the bit in the flags and a TRACE_GRAPH_NOTRACE which is the mask with that bit set. But the restructuring used TRACE_GRAPH_NOTRACE_BIT when it should have used TRACE_GRAPH_NOTRACE. For example: # cd /sys/kernel/tracing # echo set_track_prepare stack_trace_save > set_graph_notrace # echo function_graph > current_tracer # cat trace [..] 0) | __slab_free() { 0) | free_to_partial_list() { 0) | arch_stack_walk() { 0) | __unwind_start() { 0) 0.501 us | get_stack_info(); Where a non filter trace looks like: # echo > set_graph_notrace # cat trace 0) | free_to_partial_list() { 0) | set_track_prepare() { 0) | stack_trace_save() { 0) | arch_stack_walk() { 0) | __unwind_start() { Where the filter should look like: # cat trace 0) | free_to_partial_list() { 0) | _raw_spin_lock_irqsave() { 0) 0.350 us | preempt_count_add(); 0) 0.351 us | do_raw_spin_lock(); 0) 2.440 us | } Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250208001511.535be150@batman.local.home Fixes: b84214890a9bc ("function_graph: Move graph notrace bit to shadow stack global var") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-02-07kbuild: Move -Wenum-enum-conversion to W=2Nathan Chancellor1-1/+4
-Wenum-enum-conversion was strengthened in clang-19 to warn for C, which caused the kernel to move it to W=1 in commit 75b5ab134bb5 ("kbuild: Move -Wenum-{compare-conditional,enum-conversion} into W=1") because there were numerous instances that would break builds with -Werror. Unfortunately, this is not a full solution, as more and more developers, subsystems, and distributors are building with W=1 as well, so they continue to see the numerous instances of this warning. Since the move to W=1, there have not been many new instances that have appeared through various build reports and the ones that have appeared seem to be following similar existing patterns, suggesting that most instances of this warning will not be real issues. The only alternatives for silencing this warning are adding casts (which is generally seen as an ugly practice) or refactoring the enums to macro defines or a unified enum (which may be undesirable because of type safety in other parts of the code). Move the warning to W=2, where warnings that occur frequently but may be relevant should reside. Cc: stable@vger.kernel.org Fixes: 75b5ab134bb5 ("kbuild: Move -Wenum-{compare-conditional,enum-conversion} into W=1") Link: https://lore.kernel.org/ZwRA9SOcOjjLJcpi@google.com/ Signed-off-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-02-08kbuild: install-extmod-build: add missing quotation marks for CC variableWangYuli1-1/+1
While attempting to build a Debian packages with CC="ccache gcc", I saw the following error as builddeb builds linux-headers-$KERNELVERSION: make HOSTCC=ccache gcc VPATH= srcroot=. -f ./scripts/Makefile.build obj=debian/linux-headers-6.14.0-rc1/usr/src/linux-headers-6.14.0-rc1/scripts make[6]: *** No rule to make target 'gcc'. Stop. Upon investigation, it seems that one instance of $(CC) variable reference in ./scripts/package/install-extmod-build was missing quotation marks, causing the above error. Add the missing quotation marks around $(CC) to fix build. Fixes: 5f73e7d0386d ("kbuild: refactor cross-compiling linux-headers package") Co-developed-by: Mingcong Bai <jeffbai@aosc.io> Signed-off-by: Mingcong Bai <jeffbai@aosc.io> Tested-by: WangYuli <wangyuli@uniontech.com> Signed-off-by: WangYuli <wangyuli@uniontech.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-02-07MAINTAINERS: Remove myselfHector Martin1-1/+0
I no longer have any faith left in the kernel development process or community management approach. Apple/ARM platform development will continue downstream. If I feel like sending some patches upstream in the future myself for whatever subtree I may, or I may not. Anyone who feels like fighting the upstreaming fight themselves is welcome to do so. Signed-off-by: Hector Martin <marcan@marcan.st> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-02-07MAINTAINERS: Move Pavel to kernel.org addressPavel Machek2-9/+7
I need to filter my emails better, switch to pavel@kernel.org address to help with that. Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-02-07vfs: sanity check the length passed to inode_set_cached_link()Mateusz Guzik1-0/+13
This costs a strlen() call when instatianating a symlink. Preferably it would be hidden behind VFS_WARN_ON (or compatible), but there is no such facility at the moment. With the facility in place the call can be patched out in production kernels. In the meantime, since the cost is being paid unconditionally, use the result to a fixup the bad caller. This is not expected to persist in the long run (tm). Sample splat: bad length passed for symlink [/tmp/syz-imagegen43743633/file0/file0] (got 131109, expected 37) [rest of WARN blurp goes here] Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250204213207.337980-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07pidfs: improve ioctl handlingChristian Brauner1-1/+11
Pidfs supports extensible and non-extensible ioctls. The extensible ioctls need to check for the ioctl number itself not just the ioctl command otherwise both backward- and forward compatibility are broken. The pidfs ioctl handler also needs to look at the type of the ioctl command to guard against cases where "[...] a daemon receives some random file descriptor from a (potentially less privileged) client and expects the FD to be of some specific type, it might call ioctl() on this FD with some type-specific command and expect the call to fail if the FD is of the wrong type; but due to the missing type check, the kernel instead performs some action that userspace didn't expect." (cf. [1]] Link: https://lore.kernel.org/r/20250204-work-pidfs-ioctl-v1-1-04987d239575@kernel.org Link: https://lore.kernel.org/r/CAG48ez2K9A5GwtgqO31u9ZL292we8ZwAA=TJwwEv7wRuJ3j4Lw@mail.gmail.com [1] Fixes: 8ce352818820 ("pidfs: check for valid ioctl commands") Acked-by: Luca Boccassi <luca.boccassi@gmail.com> Reported-by: Jann Horn <jannh@google.com> Cc: stable@vger.kernel.org # v6.13; please backport with 8ce352818820 ("pidfs: check for valid ioctl commands") Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07fsnotify: disable pre-content and permission events by defaultAmir Goldstein1-0/+5
After introducing pre-content events, we had a regression related to disabling huge faults on files that should never have pre-content events enabled. This happened because the default f_mode of allocated files (0) does not disable pre-content events. Pre-content events are disabled in file_set_fsnotify_mode_by_watchers() but internal files may not get to call this helper. Initialize f_mode to disable permission and pre-content events for all files and if needed they will be enabled for the callers of file_set_fsnotify_mode_by_watchers(). Fixes: 20bf82a898b6 ("mm: don't allow huge faults for files with pre content watches") Reported-by: Alex Williamson <alex.williamson@redhat.com> Closes: https://lore.kernel.org/linux-fsdevel/20250131121703.1e4d00a7.alex.williamson@redhat.com/ Tested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20250203223205.861346-4-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07selftests: always check mask returned by statmount(2)Miklos Szeredi1-1/+21
STATMOUNT_MNT_OPTS can actually be missing if there are no options. This is a change of behavior since 75ead69a7173 ("fs: don't let statmount return empty strings"). The other checks shouldn't actually trigger, but add them for correctness and for easier debugging if the test fails. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://lore.kernel.org/r/20250129160641.35485-1-mszeredi@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07fsnotify: disable notification by default for all pseudo filesAmir Goldstein4-2/+24
Most pseudo files are not applicable for fsnotify events at all, let alone to the new pre-content events. Disable notifications to all files allocated with alloc_file_pseudo() and enable legacy inotify events for the specific cases of pipe and socket, which have known users of inotify events. Pre-content events are also kept disabled for sockets and pipes. Fixes: 20bf82a898b6 ("mm: don't allow huge faults for files with pre content watches") Reported-by: Alex Williamson <alex.williamson@redhat.com> Closes: https://lore.kernel.org/linux-fsdevel/20250131121703.1e4d00a7.alex.williamson@redhat.com/ Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wi2pThSVY=zhO=ZKxViBj5QCRX-=AS2+rVknQgJnHXDFg@mail.gmail.com/ Tested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20250203223205.861346-3-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07fs: fix adding security options to statmount.mnt_optMiklos Szeredi1-15/+14
Prepending security options was made conditional on sb->s_op->show_options, but security options are independent of sb options. Fixes: 056d33137bf9 ("fs: prepend statmount.mnt_opts string with security_sb_mnt_opts()") Fixes: f9af549d1fd3 ("fs: export mount options via statmount()") Cc: stable@vger.kernel.org # v6.11 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://lore.kernel.org/r/20250129151253.33241-1-mszeredi@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07fsnotify: use accessor to set FMODE_NONOTIFY_*Amir Goldstein5-13/+25
The FMODE_NONOTIFY_* bits are a 2-bits mode. Open coding manipulation of those bits is risky. Use an accessor file_set_fsnotify_mode() to set the mode. Rename file_set_fsnotify_mode() => file_set_fsnotify_mode_from_watchers() to make way for the simple accessor name. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20250203223205.861346-2-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07lockref: remove count argument of lockref_initAndreas Gruenbacher5-7/+8
All users of lockref_init() now initialize the count to 1, so hardcode that and remove the count argument. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Link: https://lore.kernel.org/r/20250130135624.1899988-4-agruenba@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07gfs2: switch to lockref_init(..., 1)Andreas Gruenbacher1-2/+2
In qd_alloc(), initialize the lockref count to 1 to cover the common case. Compensate for that in gfs2_quota_init() by adjusting the count back down to 0; this only occurs when mounting the filesystem rw. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Link: https://lore.kernel.org/r/20250130135624.1899988-3-agruenba@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07gfs2: use lockref_init for gl_lockrefAndreas Gruenbacher2-2/+1
Move the initialization of gl_lockref from gfs2_init_glock_once() to gfs2_glock_get(). This allows to use lockref_init() there. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Link: https://lore.kernel.org/r/20250130135624.1899988-2-agruenba@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07statmount: let unset strings be emptyMiklos Szeredi1-9/+16
Just like it's normal for unset values to be zero, unset strings should be empty instead of containing random values. It seems to be a typical mistake that the mask returned by statmount is not checked, which can result in various bugs. With this fix, these bugs are prevented, since it is highly likely that userspace would just want to turn the missing mask case into an empty string anyway (most of the recently found cases are of this type). Link: https://lore.kernel.org/all/CAJfpegsVCPfCn2DpM8iiYSS5DpMsLB8QBUCHecoj6s0Vxf4jzg@mail.gmail.com/ Fixes: 68385d77c05b ("statmount: simplify string option retrieval") Fixes: 46eae99ef733 ("add statmount(2) syscall") Cc: stable@vger.kernel.org # v6.8 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://lore.kernel.org/r/20250130121500.113446-1-mszeredi@redhat.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07vboxsf: fix building with GCC 15Brahmajit Das1-1/+2
Building with GCC 15 results in build error fs/vboxsf/super.c:24:54: error: initializer-string for array of ‘unsigned char’ is too long [-Werror=unterminated-string-initialization] 24 | static const unsigned char VBSF_MOUNT_SIGNATURE[4] = "\000\377\376\375"; | ^~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors Due to GCC having enabled -Werror=unterminated-string-initialization[0] by default. Separately initializing each array element of VBSF_MOUNT_SIGNATURE to ensure NUL termination, thus satisfying GCC 15 and fixing the build error. [0]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wno-unterminated-string-initialization Signed-off-by: Brahmajit Das <brahmajit.xyz@gmail.com> Link: https://lore.kernel.org/r/20250121162648.1408743-1-brahmajit.xyz@gmail.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07fs/stat.c: avoid harmless garbage value problem in vfs_statx_path()Su Hui1-1/+3
Clang static checker(scan-build) warning: fs/stat.c:287:21: warning: The left expression of the compound assignment is an uninitialized value. The computed value will also be garbage. 287 | stat->result_mask |= STATX_MNT_ID_UNIQUE; | ~~~~~~~~~~~~~~~~~ ^ fs/stat.c:290:21: warning: The left expression of the compound assignment is an uninitialized value. The computed value will also be garbage. 290 | stat->result_mask |= STATX_MNT_ID; When vfs_getattr() failed because of security_inode_getattr(), 'stat' is uninitialized. In this case, there is a harmless garbage problem in vfs_statx_path(). It's better to return error directly when vfs_getattr() failed, avoiding garbage value and more clearly. Signed-off-by: Su Hui <suhui@nfschina.com> Link: https://lore.kernel.org/r/20250119025946.1168957-1-suhui@nfschina.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07timers/migration: Fix off-by-one root mis-connectionFrederic Weisbecker1-1/+9
Before attaching a new root to the old root, the children counter of the new root is checked to verify that only the upcoming CPU's top group have been connected to it. However since the recently added commit b729cc1ec21a ("timers/migration: Fix another race between hotplug and idle entry/exit") this check is not valid anymore because the old root is pre-accounted as a child to the new root. Therefore after connecting the upcoming CPU's top group to the new root, the children count to be expected must be 2 and not 1 anymore. This omission results in the old root to not be connected to the new root. Then eventually the system may run with more than one top level, which defeats the purpose of a single idle migrator. Also the old root is pre-accounted but not connected upon the new root creation. But it can be connected to the new root later on. Therefore the old root may be accounted twice to the new root. The propagation of such overcommit can end up creating a double final top-level root with a groupmask incorrectly initialized. Although harmless given that the final top level roots will never have a parent to walk up to, this oddity opportunistically reported the core issue: WARNING: CPU: 8 PID: 0 at kernel/time/timer_migration.c:543 tmigr_requires_handle_remote CPU: 8 UID: 0 PID: 0 Comm: swapper/8 RIP: 0010:tmigr_requires_handle_remote Call Trace: <IRQ> ? tmigr_requires_handle_remote ? hrtimer_run_queues update_process_times tick_periodic tick_handle_periodic __sysvec_apic_timer_interrupt sysvec_apic_timer_interrupt </IRQ> Fix the problem by taking the old root into account in the children count of the new root so the connection is not omitted. Also warn when more than one top level group exists to better detect similar issues in the future. Fixes: b729cc1ec21a ("timers/migration: Fix another race between hotplug and idle entry/exit") Reported-by: Matt Fleming <mfleming@cloudflare.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20250205160220.39467-1-frederic@kernel.org
2025-02-07genirq: Remove leading space from irq_chip::irq_print_chip() callbacksGeert Uytterhoeven4-4/+4
The space separator was factored out from the multiple chip name prints, but several irq_chip::irq_print_chip() callbacks still print a leading space. Remove the superfluous double spaces. Fixes: 9d9f204bdf7243bf ("genirq/proc: Add missing space separator back") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/893f7e9646d8933cd6786d5a1ef3eb076d263768.1738764803.git.geert+renesas@glider.be
2025-02-06bcachefs: bch2_bkey_sectors_need_rebalance() now only depends on bch_extent_rebalanceKent Overstreet4-20/+26
Previously, bch2_bkey_sectors_need_rebalance() called bch2_target_accepts_data(), checking whether the target is writable. However, this means that adding or removing devices from a target would change the value of bch2_bkey_sectors_need_rebalance() for an existing extent; this needs to be invariant so that the extent trigger can correctly maintain rebalance_work accounting. Instead, check target_accepts_data() in io_opts_to_rebalance_opts(), before creating the bch_extent_rebalance entry. This fixes (one?) cause of rebalance_work accounting being off. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-06bcachefs: Fix rcu imbalance in bch2_fs_btree_key_cache_exit()Kent Overstreet1-1/+0
Spotted by sparse. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-06bcachefs: Fix discard path journal flushingKent Overstreet8-35/+55
The discard path is supposed to issue journal flushes when there's too many buckets empty buckets that need a journal commit before they can be written to again, but at some point this code seems to have been lost. Bring it back with a new optimization to make sure we don't issue too many journal flushes: the journal now tracks the sequence number of the most recent flush in progress, which the discard path uses when deciding which buckets need a journal flush. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-06bcachefs: fix deadlock in journal_entry_open()Jeongjun Park4-2/+28
In the previous commit b3d82c2f2761, code was added to prevent journal sequence overflow. Among them, the code added to journal_entry_open() uses the bch2_fs_fatal_err_on() function to handle errors. However, __journal_res_get() , which calls journal_entry_open() , calls journal_entry_open() while holding journal->lock , but bch2_fs_fatal_err_on() internally tries to acquire journal->lock , which results in a deadlock. So we need to add a locked helper to handle fatal errors even when the journal->lock is held. Fixes: b3d82c2f2761 ("bcachefs: Guard against journal seq overflow") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-06bcachefs: fix incorrect pointer check in __bch2_subvolume_delete()Jeongjun Park1-1/+6
For some unknown reason, checks on struct bkey_s_c_snapshot and struct bkey_s_c_snapshot_tree pointers are missing. Therefore, I think it would be appropriate to fix the incorrect pointer checking through this patch. Fixes: 4bd06f07bcb5 ("bcachefs: Fixes for snapshot_tree.master_subvol") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-06bcachefs docs: SubmittingPatches.rstKent Overstreet3-0/+100
Add an (initial?) patch submission checklist, focusing mainly on testing. Yes, all patches must be tested, and that starts (but does not end) with the patch author. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-06string.h: Use ARRAY_SIZE() for memtostr*()/strtomem*()Kees Cook1-4/+8
The destination argument of memtostr*() and strtomem*() must be a fixed-size char array at compile time, so there is no need to use __builtin_object_size() (which is useful for when an argument is either a pointer or unknown). Instead use ARRAY_SIZE(), which has the benefit of working around a bug in Clang (fixed[1] in 15+) that got __builtin_object_size() wrong sometimes. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202501310832.kiAeOt2z-lkp@intel.com/ Suggested-by: Kent Overstreet <kent.overstreet@linux.dev> Link: https://github.com/llvm/llvm-project/commit/d8e0a6d5e9dd2311641f9a8a5d2bf90829951ddc [1] Tested-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-06compiler.h: Introduce __must_be_byte_array()Kees Cook1-1/+7
In preparation for adding stricter type checking to the str/mem*() helpers, provide a way to check that a variable is a byte array via __must_be_byte_array(). Suggested-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-06compiler.h: Move C string helpers into C-only kernel sectionKees Cook1-13/+13
The C kernel helpers for evaluating C Strings were positioned where they were visible to assembly inclusion, which was not intended. Move them into the kernel and C-only area of the header so future changes won't confuse the assembler. Fixes: d7a516c6eeae ("compiler.h: Fix undefined BUILD_BUG_ON_ZERO()") Fixes: 559048d156ff ("string: Check for "nonstring" attribute on strscpy() arguments") Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-07x86: rust: set rustc-abi=x86-softfloat on rustc>=1.86.0Alice Ryhl1-0/+18
When using Rust on the x86 architecture, we are currently using the unstable target.json feature to specify the compilation target. Rustc is going to change how softfloat is specified in the target.json file on x86, thus update generate_rust_target.rs to specify softfloat using the new option. Note that if you enable this parameter with a compiler that does not recognize it, then that triggers a warning but it does not break the build. [ For future reference, this solves the following error: RUSTC L rust/core.o error: Error loading target specification: target feature `soft-float` is incompatible with the ABI but gets enabled in target spec. Run `rustc --print target-list` for a list of built-in targets - Miguel ] Cc: <stable@vger.kernel.org> # Needed in 6.12.y and 6.13.y only (Rust is pinned in older LTSs). Link: https://github.com/rust-lang/rust/pull/136146 Signed-off-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> # for x86 Link: https://lore.kernel.org/r/20250203-rustc-1-86-x86-softfloat-v1-1-220a72a5003e@google.com [ Added 6.13.y too to Cc: stable tag and added reasoning to avoid over-backporting. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-02-06selftests/seccomp: validate uretprobe syscall passes through seccompEyal Birger1-0/+199
The uretprobe syscall is implemented as a performance enhancement on x86_64 by having the kernel inject a call to it on function exit; User programs cannot call this system call explicitly. As such, this syscall is considered a kernel implementation detail and should not be filtered by seccomp. Enhance the seccomp bpf test suite to check that uretprobes can be attached to processes without the killing the process regardless of seccomp policy. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Link: https://lore.kernel.org/r/20250202162921.335813-3-eyal.birger@gmail.com [kees: Skip archs without __NR_uretprobe] Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-06seccomp: passthrough uretprobe systemcall without filteringEyal Birger1-0/+12
When attaching uretprobes to processes running inside docker, the attached process is segfaulted when encountering the retprobe. The reason is that now that uretprobe is a system call the default seccomp filters in docker block it as they only allow a specific set of known syscalls. This is true for other userspace applications which use seccomp to control their syscall surface. Since uretprobe is a "kernel implementation detail" system call which is not used by userspace application code directly, it is impractical and there's very little point in forcing all userspace applications to explicitly allow it in order to avoid crashing tracked processes. Pass this systemcall through seccomp without depending on configuration. Note: uretprobe is currently only x86_64 and isn't expected to ever be supported in i386. Fixes: ff474a78cef5 ("uprobe: Add uretprobe syscall to speed up return probe") Reported-by: Rafael Buchbinder <rafi@rbk.io> Closes: https://lore.kernel.org/lkml/CAHsH6Gs3Eh8DFU0wq58c_LF8A4_+o6z456J7BidmcVY2AqOnHQ@mail.gmail.com/ Link: https://lore.kernel.org/lkml/20250121182939.33d05470@gandalf.local.home/T/#me2676c378eff2d6a33f3054fed4a5f3afa64e65b Link: https://lore.kernel.org/lkml/20250128145806.1849977-1-eyal.birger@gmail.com/ Cc: stable@vger.kernel.org Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Link: https://lore.kernel.org/r/20250202162921.335813-2-eyal.birger@gmail.com [kees: minimized changes for easier backporting, tweaked commit log] Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-06stackinit: Fix comment for test_small_endGeert Uytterhoeven1-1/+1
In union test_small_end, the small members are three and four. Fixes: e71a29db79da1946 ("stackinit: Add union initialization to selftests") Closes: https://lore.kernel.org/CAMuHMdWvcKOc6v5o3-9-SqP_4oh5-GZQjZZb=-krhY=mVRED_Q@mail.gmail.com Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/3f8faa2d7d0d6b36571093ab0fb1fd5157abd7bb.1738593178.git.geert+renesas@glider.be Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-06stackinit: Keep selftest union size small on m68kKees Cook1-1/+3
The stack frame on m68k is very sensitive to the size of what needs to be stored. Like done for long string testing, reduce the size of the large trailing struct in the union initialization testing. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Closes: https://lore.kernel.org/all/CAMuHMdXW8VbtOAixO7w+aDOG70aZtZ50j1Ybcr8B3eYnRUcrcA@mail.gmail.com Fixes: e71a29db79da ("stackinit: Add union initialization to selftests") Link: https://lore.kernel.org/r/20250204174509.work.711-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
2025-02-06cpufreq/amd-pstate: Fix cpufreq_policy ref countingDhananjay Ugwekar1-4/+5
amd_pstate_update_limits() takes a cpufreq_policy reference but doesn't decrement the refcount in one of the exit paths, fix that. Fixes: 45722e777fd9 ("cpufreq: amd-pstate: Optimize amd_pstate_update_limits()") Signed-off-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20250205112523.201101-10-dhananjay.ugwekar@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2025-02-06rust: kbuild: do not export generated KASAN ODR symbolsMatthew Maurer1-1/+1
ASAN generates special synthetic symbols to help check for ODR violations. These synthetic symbols lack debug information, so gendwarfksyms emits warnings when processing them. No code should ever have a dependency on these symbols, so we should not be exporting them, just like the __cfi symbols. Signed-off-by: Matthew Maurer <mmaurer@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20250122-gendwarfksyms-kasan-rust-v1-1-5ee5658f4fb6@google.com [ Fixed typo in commit message. Slightly reworded title. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-02-06PCI/TPH: Restore TPH Requester Enable correctlyRobin Murphy1-1/+1
When we reenable TPH after changing a Steering Tag value, we need the actual TPH Requester Enable value, not the ST Mode (which only happens to work out by chance for non-extended TPH in interrupt vector mode). Link: https://lore.kernel.org/r/13118098116d7bce07aa20b8c52e28c7d1847246.1738759933.git.robin.murphy@arm.com Fixes: d2e8a34876ce ("PCI/TPH: Add Steering Tag support") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Wei Huang <wei.huang2@amd.com>
2025-02-06rust: kbuild: add -fzero-init-padding-bits to bindgen_skip_cflagsJustin M. Forbes1-0/+1
This seems to break the build when building with gcc15: Unable to generate bindings: ClangDiagnostic("error: unknown argument: '-fzero-init-padding-bits=all'\n") Thus skip that flag. Signed-off-by: Justin M. Forbes <jforbes@fedoraproject.org> Fixes: dce4aab8441d ("kbuild: Use -fzero-init-padding-bits=all") Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250129215003.1736127-1-jforbes@fedoraproject.org [ Slightly reworded commit. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>