aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/exported-sql-viewer.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2024-12-24drivers: base: test: Add ...find_device_by...(... NULL) testsBrian Norris1-1/+40
We recently updated these device_match*() (and therefore, various *find_device_by*()) functions to return a consistent 'false' value when trying to match a NULL handle. Add tests for this. This provides regression-testing coverage for the sorts of bugs that underly commit 5c8418cf4025 ("PCI/pwrctrl: Unregister platform device only if one actually exists"). Reviewed-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Brian Norris <briannorris@chromium.org> Acked-by: Shuah Khan <skhan@linuxfoundation.org> Reviewed-by: David Gow <davidgow@google.com> Link: https://lore.kernel.org/r/20241216201148.535115-4-briannorris@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24drivers: base: test: Enable device model tests with KUNIT_ALL_TESTSBrian Norris1-0/+1
Per commit bebe94b53eb7 ("drivers: base: default KUNIT_* fragments to KUNIT_ALL_TESTS"), it seems like we should default to KUNIT_ALL_TESTS. This enables these platform_device tests for common configurations, such as with: ./tools/testing/kunit/kunit.py run Signed-off-by: Brian Norris <briannorris@chromium.org> Reviewed-by: David Gow <davidgow@google.com> Reviewed-by: Maxime Ripard <mripard@kernel.org> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://lore.kernel.org/r/20241216201148.535115-3-briannorris@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24drivers: base: Don't match devices with NULL of_node/fwnode/etcBrian Norris1-4/+4
of_find_device_by_node(), bus_find_device_by_of_node(), bus_find_device_by_fwnode(), ..., all produce arbitrary results when provided with a NULL of_node, fwnode, ACPI handle, etc. This is counterintuitive, and the source of a few bugs, such as the one fixed by commit 5c8418cf4025 ("PCI/pwrctrl: Unregister platform device only if one actually exists"). It's hard to imagine a good reason that these device_match_*() APIs should return 'true' for a NULL argument. Augment these to return 0 (false). Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Brian Norris <briannorris@chromium.org> Acked-by: David Gow <davidgow@google.com> Acked-by: Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/20241216201148.535115-2-briannorris@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24kheaders: Simplify attribute through __BIN_ATTR_SIMPLE_RO()Thomas Weißschuh1-16/+3
The utility macro from the sysfs core is sufficient to implement this attribute. Make use of it. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20241221-sysfs-const-bin_attr-kheaders-v2-1-8205538aa012@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-20MAINTAINERS: add Danilo to DRIVER COREDanilo Krummrich1-0/+1
Let's keep an eye on the Rust abstractions; add myself as reviewer to DRIVER CORE. Signed-off-by: Danilo Krummrich <dakr@kernel.org> Tested-by: Dirk Behme <dirk.behme@de.bosch.com> Tested-by: Fabien Parent <fabien.parent@linaro.org> Link: https://lore.kernel.org/r/20241219170425.12036-17-dakr@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-20samples: rust: add Rust platform sample driverDanilo Krummrich5-0/+66
Add a sample Rust platform driver illustrating the usage of the platform bus abstractions. This driver probes through either a match of device / driver name or a match within the OF ID table. Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Tested-by: Dirk Behme <dirk.behme@de.bosch.com> Tested-by: Fabien Parent <fabien.parent@linaro.org> Link: https://lore.kernel.org/r/20241219170425.12036-16-dakr@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-20rust: platform: add basic platform device / driver abstractionsDanilo Krummrich6-0/+215
Implement the basic platform bus abstractions required to write a basic platform driver. This includes the following data structures: The `platform::Driver` trait represents the interface to the driver and provides `platform::Driver::probe` for the driver to implement. The `platform::Device` abstraction represents a `struct platform_device`. In order to provide the platform bus specific parts to a generic `driver::Registration` the `driver::RegistrationOps` trait is implemented by `platform::Adapter`. Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Tested-by: Dirk Behme <dirk.behme@de.bosch.com> Link: https://lore.kernel.org/r/20241219170425.12036-15-dakr@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-20rust: driver: implement `Adapter`Danilo Krummrich2-1/+58
In order to not duplicate code in bus specific implementations (e.g. platform), implement a generic `driver::Adapter` to represent the connection of matched drivers and devices. Bus specific `Adapter` implementations can simply implement this trait to inherit generic functionality, such as matching OF or ACPI device IDs and ID table entries. Suggested-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Tested-by: Dirk Behme <dirk.behme@de.bosch.com> Link: https://lore.kernel.org/r/20241219170425.12036-14-dakr@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-20rust: of: add `of::DeviceId` abstractionDanilo Krummrich3-0/+62
`of::DeviceId` is an abstraction around `struct of_device_id`. This is used by subsequent patches, in particular the platform bus abstractions, to create OF device ID tables. Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Tested-by: Dirk Behme <dirk.behme@de.bosch.com> Tested-by: Fabien Parent <fabien.parent@linaro.org> Link: https://lore.kernel.org/r/20241219170425.12036-13-dakr@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-20samples: rust: add Rust PCI sample driverDanilo Krummrich4-0/+123
This commit adds a sample Rust PCI driver for QEMU's "pci-testdev" device. To enable this device QEMU has to be called with `-device pci-testdev`. The same driver shows how to use the PCI device / driver abstractions, as well as how to request and map PCI BARs, including a short sequence of MMIO operations. Signed-off-by: Danilo Krummrich <dakr@kernel.org> Tested-by: Dirk Behme <dirk.behme@de.bosch.com> Link: https://lore.kernel.org/r/20241219170425.12036-12-dakr@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-20rust: pci: implement I/O mappable `pci::Bar`Danilo Krummrich1-1/+141
Implement `pci::Bar`, `pci::Device::iomap_region` and `pci::Device::iomap_region_sized` to allow for I/O mappings of PCI BARs. To ensure that a `pci::Bar`, and hence the I/O memory mapping, can't out-live the PCI device, the `pci::Bar` type is always embedded into a `Devres` container, such that the `pci::Bar` is revoked once the device is unbound and hence the I/O mapped memory is unmapped. A `pci::Bar` can be requested with (`pci::Device::iomap_region_sized`) or without (`pci::Device::iomap_region`) a const generic representing the minimal requested size of the I/O mapped memory region. In case of the latter only runtime checked I/O reads / writes are possible. Co-developed-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Tested-by: Dirk Behme <dirk.behme@de.bosch.com> Link: https://lore.kernel.org/r/20241219170425.12036-11-dakr@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-20rust: pci: add basic PCI device / driver abstractionsDanilo Krummrich6-0/+315
Implement the basic PCI abstractions required to write a basic PCI driver. This includes the following data structures: The `pci::Driver` trait represents the interface to the driver and provides `pci::Driver::probe` for the driver to implement. The `pci::Device` abstraction represents a `struct pci_dev` and provides abstractions for common functions, such as `pci::Device::set_master`. In order to provide the PCI specific parts to a generic `driver::Registration` the `driver::RegistrationOps` trait is implemented by `pci::Adapter`. `pci::DeviceId` implements PCI device IDs based on the generic `device_id::RawDevceId` abstraction. Co-developed-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Tested-by: Dirk Behme <dirk.behme@de.bosch.com> Link: https://lore.kernel.org/r/20241219170425.12036-10-dakr@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-20rust: add devres abstractionDanilo Krummrich5-0/+191
Add a Rust abstraction for the kernel's devres (device resource management) implementation. The Devres type acts as a container to manage the lifetime and accessibility of device bound resources. Therefore it registers a devres callback and revokes access to the resource on invocation. Users of the Devres abstraction can simply free the corresponding resources in their Drop implementation, which is invoked when either the Devres instance goes out of scope or the devres callback leads to the resource being revoked, which implies a call to drop_in_place(). Signed-off-by: Danilo Krummrich <dakr@kernel.org> Tested-by: Dirk Behme <dirk.behme@de.bosch.com> Link: https://lore.kernel.org/r/20241219170425.12036-9-dakr@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-20rust: add `io::{Io, IoRaw}` base typesDanilo Krummrich4-0/+363
I/O memory is typically either mapped through direct calls to ioremap() or subsystem / bus specific ones such as pci_iomap(). Even though subsystem / bus specific functions to map I/O memory are based on ioremap() / iounmap() it is not desirable to re-implement them in Rust. Instead, implement a base type for I/O mapped memory, which generically provides the corresponding accessors, such as `Io::readb` or `Io:try_readb`. `Io` supports an optional const generic, such that a driver can indicate the minimal expected and required size of the mapping at compile time. Correspondingly, calls to the 'non-try' accessors, support compile time checks of the I/O memory offset to read / write, while the 'try' accessors, provide boundary checks on runtime. `IoRaw` is meant to be embedded into a structure (e.g. pci::Bar or io::IoMem) which creates the actual I/O memory mapping and initializes `IoRaw` accordingly. To ensure that I/O mapped memory can't out-live the device it may be bound to, subsystems must embed the corresponding I/O memory type (e.g. pci::Bar) into a `Devres` container, such that it gets revoked once the device is unbound. Reviewed-by: Alice Ryhl <aliceryhl@google.com> Tested-by: Daniel Almeida <daniel.almeida@collabora.com> Reviewed-by: Daniel Almeida <daniel.almeida@collabora.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Tested-by: Dirk Behme <dirk.behme@de.bosch.com> Link: https://lore.kernel.org/r/20241219170425.12036-8-dakr@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-20rust: add `Revocable` typeWedson Almeida Filho2-0/+220
Revocable allows access to objects to be safely revoked at run time. This is useful, for example, for resources allocated during device probe; when the device is removed, the driver should stop accessing the device resources even if another state is kept in memory due to existing references (i.e., device context data is ref-counted and has a non-zero refcount after removal of the device). Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com> Co-developed-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Tested-by: Dirk Behme <dirk.behme@de.bosch.com> Link: https://lore.kernel.org/r/20241219170425.12036-7-dakr@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-20rust: types: add `Opaque::pin_init`Danilo Krummrich1-0/+11
Analogous to `Opaque::new` add `Opaque::pin_init`, which instead of a value `T` takes a `PinInit<T>` and returns a `PinInit<Opaque<T>>`. Reviewed-by: Alice Ryhl <aliceryhl@google.com> Suggested-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Tested-by: Dirk Behme <dirk.behme@de.bosch.com> Tested-by: Fabien Parent <fabien.parent@linaro.org> Link: https://lore.kernel.org/r/20241219170425.12036-6-dakr@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-20rust: add rcu abstractionWedson Almeida Filho5-0/+63
Add a simple abstraction to guard critical code sections with an rcu read lock. Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com> Co-developed-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Tested-by: Dirk Behme <dirk.behme@de.bosch.com> Tested-by: Fabien Parent <fabien.parent@linaro.org> Link: https://lore.kernel.org/r/20241219170425.12036-5-dakr@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-20rust: implement `IdArray`, `IdTable` and `RawDeviceId`Danilo Krummrich3-0/+172
Most subsystems use some kind of ID to match devices and drivers. Hence, we have to provide Rust drivers an abstraction to register an ID table for the driver to match. Generally, those IDs are subsystem specific and hence need to be implemented by the corresponding subsystem. However, the `IdArray`, `IdTable` and `RawDeviceId` types provide a generalized implementation that makes the life of subsystems easier to do so. Co-developed-by: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com> Co-developed-by: Gary Guo <gary@garyguo.net> Signed-off-by: Gary Guo <gary@garyguo.net> Co-developed-by: Fabien Parent <fabien.parent@linaro.org> Signed-off-by: Fabien Parent <fabien.parent@linaro.org> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Tested-by: Dirk Behme <dirk.behme@de.bosch.com> Tested-by: Fabien Parent <fabien.parent@linaro.org> Link: https://lore.kernel.org/r/20241219170425.12036-4-dakr@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-20rust: implement generic driver registrationDanilo Krummrich3-0/+119
Implement the generic `Registration` type and the `RegistrationOps` trait. The `Registration` structure is the common type that represents a driver registration and is typically bound to the lifetime of a module. However, it doesn't implement actual calls to the kernel's driver core to register drivers itself. Instead the `RegistrationOps` trait is provided to subsystems, which have to implement `RegistrationOps::register` and `RegistrationOps::unregister`. Subsystems have to provide an implementation for both of those methods where the subsystem specific variants to register / unregister a driver have to implemented. For instance, the PCI subsystem would call __pci_register_driver() from `RegistrationOps::register` and pci_unregister_driver() from `DrvierOps::unregister`. Co-developed-by: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Tested-by: Dirk Behme <dirk.behme@de.bosch.com> Tested-by: Fabien Parent <fabien.parent@linaro.org> Link: https://lore.kernel.org/r/20241219170425.12036-3-dakr@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-20rust: module: add trait `ModuleMetadata`Danilo Krummrich2-0/+10
In order to access static metadata of a Rust kernel module, add the `ModuleMetadata` trait. In particular, this trait provides the name of a Rust kernel module as specified by the `module!` macro. Signed-off-by: Danilo Krummrich <dakr@kernel.org> Tested-by: Dirk Behme <dirk.behme@de.bosch.com> Tested-by: Fabien Parent <fabien.parent@linaro.org> Link: https://lore.kernel.org/r/20241219170425.12036-2-dakr@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-16rust: miscdevice: add fops->show_fdinfo() hookAlice Ryhl1-0/+34
File descriptors should generally provide a fops->show_fdinfo() hook for debugging purposes. Thus, add such a hook to the miscdevice abstractions. Signed-off-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20241203-miscdevice-showfdinfo-v1-1-7e990732d430@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-16samples: rust_misc_device: Provide an example C program to exercise functionalityLee Jones1-0/+90
Here is the expected output (manually spliced together): USERSPACE: Opening /dev/rust-misc-device for reading and writing KERNEL: rust_misc_device: Opening Rust Misc Device Sample USERSPACE: Calling Hello KERNEL: rust_misc_device: IOCTLing Rust Misc Device Sample KERNEL: rust_misc_device: -> Hello from the Rust Misc Device USERSPACE: Fetching initial value KERNEL: rust_misc_device: IOCTLing Rust Misc Device Sample KERNEL: rust_misc_device: -> Copying data to userspace (value: 0) USERSPACE: Submitting new value (1) KERNEL: rust_misc_device: IOCTLing Rust Misc Device Sample KERNEL: rust_misc_device: -> Copying data from userspace (value: 1) USERSPACE: Fetching new value KERNEL: rust_misc_device: IOCTLing Rust Misc Device Sample KERNEL: rust_misc_device: -> Copying data to userspace (value: 1) USERSPACE: Attempting to call in to an non-existent IOCTL KERNEL: rust_misc_device: IOCTLing Rust Misc Device Sample KERNEL: rust_misc_device: -> IOCTL not recognised: 20992 USERSPACE: ioctl: Succeeded to fail - this was expected: Inappropriate ioctl for device USERSPACE: Closing /dev/rust-misc-device KERNEL: rust_misc_device: Exiting the Rust Misc Device Sample USERSPACE: Success Signed-off-by: Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/20241213134715.601415-6-lee@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-16MAINTAINERS: Add Rust Misc Sample to MISC entryLee Jones1-0/+1
Signed-off-by: Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/20241213134715.601415-5-lee@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-16samples: rust_misc_device: Demonstrate additional get/set value functionalityLee Jones1-14/+75
Expand the complexity of the sample driver by providing the ability to get and set an integer. The value is protected by a mutex. Signed-off-by: Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/20241213134715.601415-4-lee@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-16samples: rust: Provide example using the new Rust MiscDevice abstractionLee Jones3-0/+98
This sample driver demonstrates the following basic operations: * Register a Misc Device * Create /dev/rust-misc-device * Provide open call-back for the aforementioned character device * Operate on the character device via a simple ioctl() * Provide close call-back for the character device Signed-off-by: Lee Jones <lee@kernel.org> Reviewed-by: Danilo Krummrich <dakr@kernel.org> Link: https://lore.kernel.org/r/20241213134715.601415-3-lee@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-16Documentation: ioctl-number: Carve out some identifiers for use by sample driversLee Jones1-0/+1
32 IDs should be plenty (at yet, not too greedy) since a lot of sample drivers will use their own subsystem allocations. Sample drivers predominately reside in <KERNEL_ROOT>/samples, but there should be no issue with in-place example drivers using them. Signed-off-by: Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/20241213134715.601415-2-lee@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-16rust: miscdevice: Provide accessor to pull out miscdevice::this_deviceLee Jones1-0/+11
There are situations where a pointer to a `struct device` will become necessary (e.g. for calling into dev_*() functions). This accessor allows callers to pull this out from the `struct miscdevice`. Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Alice Ryhl <aliceryhl@google.com> Tested-by: Lee Jones <lee@kernel.org> Reviewed-by: Danilo Krummrich <dakr@kernel.org> Link: https://lore.kernel.org/r/20241210-miscdevice-file-param-v3-3-b2a79b666dc5@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-16rust: miscdevice: access the `struct miscdevice` from fops->open()Alice Ryhl1-8/+22
Providing access to the underlying `struct miscdevice` is useful for various reasons. For example, this allows you access the miscdevice's internal `struct device` for use with the `dev_*` printing macros. Note that since the underlying `struct miscdevice` could get freed at any point after the fops->open() call (if misc_deregister is called), only the open call is given access to it. To use `dev_*` printing macros from other fops hooks, take a refcount on `miscdevice->this_device` to keep it alive. See the linked thread for further discussion on the lifetime of `struct miscdevice`. Link: https://lore.kernel.org/r/2024120951-botanist-exhale-4845@gregkh Signed-off-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Lee Jones <lee@kernel.org> Tested-by: Lee Jones <lee@kernel.org> Reviewed-by: Danilo Krummrich <dakr@kernel.org> Link: https://lore.kernel.org/r/20241210-miscdevice-file-param-v3-2-b2a79b666dc5@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-16rust: miscdevice: access file in fopsAlice Ryhl1-6/+25
This allows fops to access information about the underlying struct file for the miscdevice. For example, the Binder driver needs to inspect the O_NONBLOCK flag inside the fops->ioctl() hook. Signed-off-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Lee Jones <lee@kernel.org> Tested-by: Lee Jones <lee@kernel.org> Reviewed-by: Danilo Krummrich <dakr@kernel.org> Link: https://lore.kernel.org/r/20241210-miscdevice-file-param-v3-1-b2a79b666dc5@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-15Linux 6.13-rc3Linus Torvalds1-1/+1
2024-12-14bpf: Avoid deadlock caused by nested kprobe and fentry bpf programsPriya Bala Govindasamy1-0/+6
BPF program types like kprobe and fentry can cause deadlocks in certain situations. If a function takes a lock and one of these bpf programs is hooked to some point in the function's critical section, and if the bpf program tries to call the same function and take the same lock it will lead to deadlock. These situations have been reported in the following bug reports. In percpu_freelist - Link: https://lore.kernel.org/bpf/CAADnVQLAHwsa+2C6j9+UC6ScrDaN9Fjqv1WjB1pP9AzJLhKuLQ@mail.gmail.com/T/ Link: https://lore.kernel.org/bpf/CAPPBnEYm+9zduStsZaDnq93q1jPLqO-PiKX9jy0MuL8LCXmCrQ@mail.gmail.com/T/ In bpf_lru_list - Link: https://lore.kernel.org/bpf/CAPPBnEajj+DMfiR_WRWU5=6A7KKULdB5Rob_NJopFLWF+i9gCA@mail.gmail.com/T/ Link: https://lore.kernel.org/bpf/CAPPBnEZQDVN6VqnQXvVqGoB+ukOtHGZ9b9U0OLJJYvRoSsMY_g@mail.gmail.com/T/ Link: https://lore.kernel.org/bpf/CAPPBnEaCB1rFAYU7Wf8UxqcqOWKmRPU1Nuzk3_oLk6qXR7LBOA@mail.gmail.com/T/ Similar bugs have been reported by syzbot. In queue_stack_maps - Link: https://lore.kernel.org/lkml/0000000000004c3fc90615f37756@google.com/ Link: https://lore.kernel.org/all/20240418230932.2689-1-hdanton@sina.com/T/ In lpm_trie - Link: https://lore.kernel.org/linux-kernel/00000000000035168a061a47fa38@google.com/T/ In ringbuf - Link: https://lore.kernel.org/bpf/20240313121345.2292-1-hdanton@sina.com/T/ Prevent kprobe and fentry bpf programs from attaching to these critical sections by removing CC_FLAGS_FTRACE for percpu_freelist.o, bpf_lru_list.o, queue_stack_maps.o, lpm_trie.o, ringbuf.o files. The bugs reported by syzbot are due to tracepoint bpf programs being called in the critical sections. This patch does not aim to fix deadlocks caused by tracepoint programs. However, it does prevent deadlocks from occurring in similar situations due to kprobe and fentry programs. Signed-off-by: Priya Bala Govindasamy <pgovind2@uci.edu> Link: https://lore.kernel.org/r/CAPPBnEZpjGnsuA26Mf9kYibSaGLm=oF6=12L21X1GEQdqjLnzQ@mail.gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-13selftests/bpf: Add tests for raw_tp NULL argsKumar Kartikeya Dwivedi2-0/+27
Add tests to ensure that arguments are correctly marked based on their specified positions, and whether they get marked correctly as maybe null. For modules, all tracepoint parameters should be marked PTR_MAYBE_NULL by default. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241213221929.3495062-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-13bpf: Augment raw_tp arguments with PTR_MAYBE_NULLKumar Kartikeya Dwivedi2-10/+147
Arguments to a raw tracepoint are tagged as trusted, which carries the semantics that the pointer will be non-NULL. However, in certain cases, a raw tracepoint argument may end up being NULL. More context about this issue is available in [0]. Thus, there is a discrepancy between the reality, that raw_tp arguments can actually be NULL, and the verifier's knowledge, that they are never NULL, causing explicit NULL check branch to be dead code eliminated. A previous attempt [1], i.e. the second fixed commit, was made to simulate symbolic execution as if in most accesses, the argument is a non-NULL raw_tp, except for conditional jumps. This tried to suppress branch prediction while preserving compatibility, but surfaced issues with production programs that were difficult to solve without increasing verifier complexity. A more complete discussion of issues and fixes is available at [2]. Fix this by maintaining an explicit list of tracepoints where the arguments are known to be NULL, and mark the positional arguments as PTR_MAYBE_NULL. Additionally, capture the tracepoints where arguments are known to be ERR_PTR, and mark these arguments as scalar values to prevent potential dereference. Each hex digit is used to encode NULL-ness (0x1) or ERR_PTR-ness (0x2), shifted by the zero-indexed argument number x 4. This can be represented as follows: 1st arg: 0x1 2nd arg: 0x10 3rd arg: 0x100 ... and so on (likewise for ERR_PTR case). In the future, an automated pass will be used to produce such a list, or insert __nullable annotations automatically for tracepoints. Each compilation unit will be analyzed and results will be collated to find whether a tracepoint pointer is definitely not null, maybe null, or an unknown state where verifier conservatively marks it PTR_MAYBE_NULL. A proof of concept of this tool from Eduard is available at [3]. Note that in case we don't find a specification in the raw_tp_null_args array and the tracepoint belongs to a kernel module, we will conservatively mark the arguments as PTR_MAYBE_NULL. This is because unlike for in-tree modules, out-of-tree module tracepoints may pass NULL freely to the tracepoint. We don't protect against such tracepoints passing ERR_PTR (which is uncommon anyway), lest we mark all such arguments as SCALAR_VALUE. While we are it, let's adjust the test raw_tp_null to not perform dereference of the skb->mark, as that won't be allowed anymore, and make it more robust by using inline assembly to test the dead code elimination behavior, which should still stay the same. [0]: https://lore.kernel.org/bpf/ZrCZS6nisraEqehw@jlelli-thinkpadt14gen4.remote.csb [1]: https://lore.kernel.org/all/20241104171959.2938862-1-memxor@gmail.com [2]: https://lore.kernel.org/bpf/20241206161053.809580-1-memxor@gmail.com [3]: https://github.com/eddyz87/llvm-project/tree/nullness-for-tracepoint-params Reported-by: Juri Lelli <juri.lelli@redhat.com> # original bug Reported-by: Manu Bretelle <chantra@meta.com> # bugs in masking fix Fixes: 3f00c5239344 ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs") Fixes: cb4158ce8ec8 ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL") Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Co-developed-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241213221929.3495062-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-13bpf: Revert "bpf: Mark raw_tp arguments with PTR_MAYBE_NULL"Kumar Kartikeya Dwivedi4-87/+9
This patch reverts commit cb4158ce8ec8 ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL"). The patch was well-intended and meant to be as a stop-gap fixing branch prediction when the pointer may actually be NULL at runtime. Eventually, it was supposed to be replaced by an automated script or compiler pass detecting possibly NULL arguments and marking them accordingly. However, it caused two main issues observed for production programs and failed to preserve backwards compatibility. First, programs relied on the verifier not exploring == NULL branch when pointer is not NULL, thus they started failing with a 'dereference of scalar' error. Next, allowing raw_tp arguments to be modified surfaced the warning in the verifier that warns against reg->off when PTR_MAYBE_NULL is set. More information, context, and discusson on both problems is available in [0]. Overall, this approach had several shortcomings, and the fixes would further complicate the verifier's logic, and the entire masking scheme would have to be removed eventually anyway. Hence, revert the patch in preparation of a better fix avoiding these issues to replace this commit. [0]: https://lore.kernel.org/bpf/20241206161053.809580-1-memxor@gmail.com Reported-by: Manu Bretelle <chantra@meta.com> Fixes: cb4158ce8ec8 ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241213221929.3495062-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-13ARC: build: Try to guess GCC variant of cross compilerLeon Romanovsky1-1/+1
ARC GCC compiler is packaged starting from Fedora 39i and the GCC variant of cross compile tools has arc-linux-gnu- prefix and not arc-linux-. This is causing that CROSS_COMPILE variable is left unset. This change allows builds without need to supply CROSS_COMPILE argument if distro package is used. Before this change: $ make -j 128 ARCH=arc W=1 drivers/infiniband/hw/mlx4/ gcc: warning: ‘-mcpu=’ is deprecated; use ‘-mtune=’ or ‘-march=’ instead gcc: error: unrecognized command-line option ‘-mmedium-calls’ gcc: error: unrecognized command-line option ‘-mlock’ gcc: error: unrecognized command-line option ‘-munaligned-access’ [1] https://packages.fedoraproject.org/pkgs/cross-gcc/gcc-arc-linux-gnu/index.html Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Vineet Gupta <vgupta@kernel.org>
2024-12-13KVM: x86: Cache CPUID.0xD XSTATE offsets+sizes during module initSean Christopherson3-5/+29
Snapshot the output of CPUID.0xD.[1..n] during kvm.ko initiliaization to avoid the overead of CPUID during runtime. The offset, size, and metadata for CPUID.0xD.[1..n] sub-leaves does not depend on XCR0 or XSS values, i.e. is constant for a given CPU, and thus can be cached during module load. On Intel's Emerald Rapids, CPUID is *wildly* expensive, to the point where recomputing XSAVE offsets and sizes results in a 4x increase in latency of nested VM-Enter and VM-Exit (nested transitions can trigger xstate_required_size() multiple times per transition), relative to using cached values. The issue is easily visible by running `perf top` while triggering nested transitions: kvm_update_cpuid_runtime() shows up at a whopping 50%. As measured via RDTSC from L2 (using KVM-Unit-Test's CPUID VM-Exit test and a slightly modified L1 KVM to handle CPUID in the fastpath), a nested roundtrip to emulate CPUID on Skylake (SKX), Icelake (ICX), and Emerald Rapids (EMR) takes: SKX 11650 ICX 22350 EMR 28850 Using cached values, the latency drops to: SKX 6850 ICX 9000 EMR 7900 The underlying issue is that CPUID itself is slow on ICX, and comically slow on EMR. The problem is exacerbated on CPUs which support XSAVES and/or XSAVEC, as KVM invokes xstate_required_size() twice on each runtime CPUID update, and because there are more supported XSAVE features (CPUID for supported XSAVE feature sub-leafs is significantly slower). SKX: CPUID.0xD.2 = 348 cycles CPUID.0xD.3 = 400 cycles CPUID.0xD.4 = 276 cycles CPUID.0xD.5 = 236 cycles <other sub-leaves are similar> EMR: CPUID.0xD.2 = 1138 cycles CPUID.0xD.3 = 1362 cycles CPUID.0xD.4 = 1068 cycles CPUID.0xD.5 = 910 cycles CPUID.0xD.6 = 914 cycles CPUID.0xD.7 = 1350 cycles CPUID.0xD.8 = 734 cycles CPUID.0xD.9 = 766 cycles CPUID.0xD.10 = 732 cycles CPUID.0xD.11 = 718 cycles CPUID.0xD.12 = 734 cycles CPUID.0xD.13 = 1700 cycles CPUID.0xD.14 = 1126 cycles CPUID.0xD.15 = 898 cycles CPUID.0xD.16 = 716 cycles CPUID.0xD.17 = 748 cycles CPUID.0xD.18 = 776 cycles Note, updating runtime CPUID information multiple times per nested transition is itself a flaw, especially since CPUID is a mandotory intercept on both Intel and AMD. E.g. KVM doesn't need to ensure emulated CPUID state is up-to-date while running L2. That flaw will be fixed in a future patch, as deferring runtime CPUID updates is more subtle than it appears at first glance, the benefits aren't super critical to have once the XSAVE issue is resolved, and caching CPUID output is desirable even if KVM's updates are deferred. Cc: Jim Mattson <jmattson@google.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20241211013302.1347853-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-13block: Fix potential deadlock while freezing queue and acquiring sysfs_lockNilay Shroff3-23/+26
For storing a value to a queue attribute, the queue_attr_store function first freezes the queue (->q_usage_counter(io)) and then acquire ->sysfs_lock. This seems not correct as the usual ordering should be to acquire ->sysfs_lock before freezing the queue. This incorrect ordering causes the following lockdep splat which we are able to reproduce always simply by accessing /sys/kernel/debug file using ls command: [ 57.597146] WARNING: possible circular locking dependency detected [ 57.597154] 6.12.0-10553-gb86545e02e8c #20 Tainted: G W [ 57.597162] ------------------------------------------------------ [ 57.597168] ls/4605 is trying to acquire lock: [ 57.597176] c00000003eb56710 (&mm->mmap_lock){++++}-{4:4}, at: __might_fault+0x58/0xc0 [ 57.597200] but task is already holding lock: [ 57.597207] c0000018e27c6810 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: iterate_dir+0x94/0x1d4 [ 57.597226] which lock already depends on the new lock. [ 57.597233] the existing dependency chain (in reverse order) is: [ 57.597241] -> #5 (&sb->s_type->i_mutex_key#3){++++}-{4:4}: [ 57.597255] down_write+0x6c/0x18c [ 57.597264] start_creating+0xb4/0x24c [ 57.597274] debugfs_create_dir+0x2c/0x1e8 [ 57.597283] blk_register_queue+0xec/0x294 [ 57.597292] add_disk_fwnode+0x2e4/0x548 [ 57.597302] brd_alloc+0x2c8/0x338 [ 57.597309] brd_init+0x100/0x178 [ 57.597317] do_one_initcall+0x88/0x3e4 [ 57.597326] kernel_init_freeable+0x3cc/0x6e0 [ 57.597334] kernel_init+0x34/0x1cc [ 57.597342] ret_from_kernel_user_thread+0x14/0x1c [ 57.597350] -> #4 (&q->debugfs_mutex){+.+.}-{4:4}: [ 57.597362] __mutex_lock+0xfc/0x12a0 [ 57.597370] blk_register_queue+0xd4/0x294 [ 57.597379] add_disk_fwnode+0x2e4/0x548 [ 57.597388] brd_alloc+0x2c8/0x338 [ 57.597395] brd_init+0x100/0x178 [ 57.597402] do_one_initcall+0x88/0x3e4 [ 57.597410] kernel_init_freeable+0x3cc/0x6e0 [ 57.597418] kernel_init+0x34/0x1cc [ 57.597426] ret_from_kernel_user_thread+0x14/0x1c [ 57.597434] -> #3 (&q->sysfs_lock){+.+.}-{4:4}: [ 57.597446] __mutex_lock+0xfc/0x12a0 [ 57.597454] queue_attr_store+0x9c/0x110 [ 57.597462] sysfs_kf_write+0x70/0xb0 [ 57.597471] kernfs_fop_write_iter+0x1b0/0x2ac [ 57.597480] vfs_write+0x3dc/0x6e8 [ 57.597488] ksys_write+0x84/0x140 [ 57.597495] system_call_exception+0x130/0x360 [ 57.597504] system_call_common+0x160/0x2c4 [ 57.597516] -> #2 (&q->q_usage_counter(io)#21){++++}-{0:0}: [ 57.597530] __submit_bio+0x5ec/0x828 [ 57.597538] submit_bio_noacct_nocheck+0x1e4/0x4f0 [ 57.597547] iomap_readahead+0x2a0/0x448 [ 57.597556] xfs_vm_readahead+0x28/0x3c [ 57.597564] read_pages+0x88/0x41c [ 57.597571] page_cache_ra_unbounded+0x1ac/0x2d8 [ 57.597580] filemap_get_pages+0x188/0x984 [ 57.597588] filemap_read+0x13c/0x4bc [ 57.597596] xfs_file_buffered_read+0x88/0x17c [ 57.597605] xfs_file_read_iter+0xac/0x158 [ 57.597614] vfs_read+0x2d4/0x3b4 [ 57.597622] ksys_read+0x84/0x144 [ 57.597629] system_call_exception+0x130/0x360 [ 57.597637] system_call_common+0x160/0x2c4 [ 57.597647] -> #1 (mapping.invalidate_lock#2){++++}-{4:4}: [ 57.597661] down_read+0x6c/0x220 [ 57.597669] filemap_fault+0x870/0x100c [ 57.597677] xfs_filemap_fault+0xc4/0x18c [ 57.597684] __do_fault+0x64/0x164 [ 57.597693] __handle_mm_fault+0x1274/0x1dac [ 57.597702] handle_mm_fault+0x248/0x484 [ 57.597711] ___do_page_fault+0x428/0xc0c [ 57.597719] hash__do_page_fault+0x30/0x68 [ 57.597727] do_hash_fault+0x90/0x35c [ 57.597736] data_access_common_virt+0x210/0x220 [ 57.597745] _copy_from_user+0xf8/0x19c [ 57.597754] sel_write_load+0x178/0xd54 [ 57.597762] vfs_write+0x108/0x6e8 [ 57.597769] ksys_write+0x84/0x140 [ 57.597777] system_call_exception+0x130/0x360 [ 57.597785] system_call_common+0x160/0x2c4 [ 57.597794] -> #0 (&mm->mmap_lock){++++}-{4:4}: [ 57.597806] __lock_acquire+0x17cc/0x2330 [ 57.597814] lock_acquire+0x138/0x400 [ 57.597822] __might_fault+0x7c/0xc0 [ 57.597830] filldir64+0xe8/0x390 [ 57.597839] dcache_readdir+0x80/0x2d4 [ 57.597846] iterate_dir+0xd8/0x1d4 [ 57.597855] sys_getdents64+0x88/0x2d4 [ 57.597864] system_call_exception+0x130/0x360 [ 57.597872] system_call_common+0x160/0x2c4 [ 57.597881] other info that might help us debug this: [ 57.597888] Chain exists of: &mm->mmap_lock --> &q->debugfs_mutex --> &sb->s_type->i_mutex_key#3 [ 57.597905] Possible unsafe locking scenario: [ 57.597911] CPU0 CPU1 [ 57.597917] ---- ---- [ 57.597922] rlock(&sb->s_type->i_mutex_key#3); [ 57.597932] lock(&q->debugfs_mutex); [ 57.597940] lock(&sb->s_type->i_mutex_key#3); [ 57.597950] rlock(&mm->mmap_lock); [ 57.597958] *** DEADLOCK *** [ 57.597965] 2 locks held by ls/4605: [ 57.597971] #0: c0000000137c12f8 (&f->f_pos_lock){+.+.}-{4:4}, at: fdget_pos+0xcc/0x154 [ 57.597989] #1: c0000018e27c6810 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: iterate_dir+0x94/0x1d4 Prevent the above lockdep warning by acquiring ->sysfs_lock before freezing the queue while storing a queue attribute in queue_attr_store function. Later, we also found[1] another function __blk_mq_update_nr_ hw_queues where we first freeze queue and then acquire the ->sysfs_lock. So we've also updated lock ordering in __blk_mq_update_nr_hw_queues function and ensured that in all code paths we follow the correct lock ordering i.e. acquire ->sysfs_lock before freezing the queue. [1] https://lore.kernel.org/all/CAFj5m9Ke8+EHKQBs_Nk6hqd=LGXtk4mUxZUN5==ZcCjnZSBwHw@mail.gmail.com/ Reported-by: kjain@linux.ibm.com Fixes: af2814149883 ("block: freeze the queue in queue_attr_store") Tested-by: kjain@linux.ibm.com Cc: hch@lst.de Cc: axboe@kernel.dk Cc: ritesh.list@gmail.com Cc: ming.lei@redhat.com Cc: gjoyce@linux.ibm.com Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241210144222.1066229-1-nilay@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-13irqchip/gic-v3: Work around insecure GIC integrationsMarc Zyngier1-1/+16
It appears that the relatively popular RK3399 SoC has been put together using a large amount of illicit substances, as experiments reveal that its integration of GIC500 exposes the *secure* programming interface to non-secure. This has some pretty bad effects on the way priorities are handled, and results in a dead machine if booting with pseudo-NMI enabled (irqchip.gicv3_pseudo_nmi=1) if the kernel contains 18fdb6348c480 ("arm64: irqchip/gic-v3: Select priorities at boot time"), which relies on the priorities being programmed using the NS view. Let's restore some sanity by going one step further and disable security altogether in this case. This is not any worse, and puts us in a mode where priorities actually make some sense. Huge thanks to Mark Kettenis who initially identified this issue on OpenBSD, and to Chen-Yu Tsai who reported the problem in Linux. Fixes: 18fdb6348c480 ("arm64: irqchip/gic-v3: Select priorities at boot time") Reported-by: Mark Kettenis <mark.kettenis@xs4all.nl> Reported-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Chen-Yu Tsai <wens@csie.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241213141037.3995049-1-maz@kernel.org
2024-12-13irqchip/gic: Correct declaration of *percpu_base pointer in union gic_baseUros Bizjak1-1/+1
percpu_base is used in various percpu functions that expect variable in __percpu address space. Correct the declaration of percpu_base to void __iomem * __percpu *percpu_base; to declare the variable as __percpu pointer. The patch fixes several sparse warnings: irq-gic.c:1172:44: warning: incorrect type in assignment (different address spaces) irq-gic.c:1172:44: expected void [noderef] __percpu *[noderef] __iomem *percpu_base irq-gic.c:1172:44: got void [noderef] __iomem *[noderef] __percpu * ... irq-gic.c:1231:43: warning: incorrect type in argument 1 (different address spaces) irq-gic.c:1231:43: expected void [noderef] __percpu *__pdata irq-gic.c:1231:43: got void [noderef] __percpu *[noderef] __iomem *percpu_base There were no changes in the resulting object files. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/all/20241213145809.2918-2-ubizjak@gmail.com
2024-12-13block: Fix queue_iostats_passthrough_show()Bart Van Assche1-1/+1
Make queue_iostats_passthrough_show() report 0/1 in sysfs instead of 0/4. This patch fixes the following sparse warning: block/blk-sysfs.c:266:31: warning: incorrect type in argument 1 (different base types) block/blk-sysfs.c:266:31: expected unsigned long var block/blk-sysfs.c:266:31: got restricted blk_flags_t Cc: Keith Busch <kbusch@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Fixes: 110234da18ab ("block: enable passthrough command statistics") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20241212212941.1268662-4-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-13blk-mq: Clean up blk_mq_requeue_work()Bart Van Assche1-6/+4
Move a statement that occurs in both branches of an if-statement in front of the if-statement. Fix a typo in a source code comment. No functionality has been changed. Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20241212212941.1268662-3-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-13mq-deadline: Remove a local variableBart Van Assche1-4/+1
Since commit fde02699c242 ("block: mq-deadline: Remove support for zone write locking"), the local variable 'insert_before' is assigned once and is used once. Hence remove this local variable. Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20241212212941.1268662-2-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-13kselftest/arm64: abi: fix SVCR detectionWeizhao Ouyang1-17/+15
When using svcr_in to check ZA and Streaming Mode, we should make sure that the value in x2 is correct, otherwise it may trigger an Illegal instruction if FEAT_SVE and !FEAT_SME. Fixes: 43e3f85523e4 ("kselftest/arm64: Add SME support to syscall ABI test") Signed-off-by: Weizhao Ouyang <o451686892@gmail.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241211111639.12344-1-o451686892@gmail.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-12-13iommu/vt-d: Avoid draining PRQ in sva mm release pathLu Baolu1-1/+2
When a PASID is used for SVA by a device, it's possible that the PASID entry is cleared before the device flushes all ongoing DMA requests and removes the SVA domain. This can occur when an exception happens and the process terminates before the device driver stops DMA and calls the iommu driver to unbind the PASID. There's no need to drain the PRQ in the mm release path. Instead, the PRQ will be drained in the SVA unbind path. Unfortunately, commit c43e1ccdebf2 ("iommu/vt-d: Drain PRQs when domain removed from RID") changed this behavior by unconditionally draining the PRQ in intel_pasid_tear_down_entry(). This can lead to a potential sleeping-in-atomic-context issue. Smatch static checker warning: drivers/iommu/intel/prq.c:95 intel_iommu_drain_pasid_prq() warn: sleeping in atomic context To avoid this issue, prevent draining the PRQ in the SVA mm release path and restore the previous behavior. Fixes: c43e1ccdebf2 ("iommu/vt-d: Drain PRQs when domain removed from RID") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/linux-iommu/c5187676-2fa2-4e29-94e0-4a279dc88b49@stanley.mountain/ Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20241212021529.1104745-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-12-13iommu/vt-d: Fix qi_batch NULL pointer with nested parent domainYi Liu1-7/+27
The qi_batch is allocated when assigning cache tag for a domain. While for nested parent domain, it is missed. Hence, when trying to map pages to the nested parent, NULL dereference occurred. Also, there is potential memleak since there is no lock around domain->qi_batch allocation. To solve it, add a helper for qi_batch allocation, and call it in both the __cache_tag_assign_domain() and __cache_tag_assign_parent_domain(). BUG: kernel NULL pointer dereference, address: 0000000000000200 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 8104795067 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 223 UID: 0 PID: 4357 Comm: qemu-system-x86 Not tainted 6.13.0-rc1-00028-g4b50c3c3b998-dirty #2632 Call Trace: ? __die+0x24/0x70 ? page_fault_oops+0x80/0x150 ? do_user_addr_fault+0x63/0x7b0 ? exc_page_fault+0x7c/0x220 ? asm_exc_page_fault+0x26/0x30 ? cache_tag_flush_range_np+0x13c/0x260 intel_iommu_iotlb_sync_map+0x1a/0x30 iommu_map+0x61/0xf0 batch_to_domain+0x188/0x250 iopt_area_fill_domains+0x125/0x320 ? rcu_is_watching+0x11/0x50 iopt_map_pages+0x63/0x100 iopt_map_common.isra.0+0xa7/0x190 iopt_map_user_pages+0x6a/0x80 iommufd_ioas_map+0xcd/0x1d0 iommufd_fops_ioctl+0x118/0x1c0 __x64_sys_ioctl+0x93/0xc0 do_syscall_64+0x71/0x140 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: 705c1cdf1e73 ("iommu/vt-d: Introduce batched cache invalidation") Cc: stable@vger.kernel.org Co-developed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20241210130322.17175-1-yi.l.liu@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-12-13iommu/vt-d: Remove cache tags before disabling ATSLu Baolu1-1/+3
The current implementation removes cache tags after disabling ATS, leading to potential memory leaks and kernel crashes. Specifically, CACHE_TAG_DEVTLB type cache tags may still remain in the list even after the domain is freed, causing a use-after-free condition. This issue really shows up when multiple VFs from different PFs passed through to a single user-space process via vfio-pci. In such cases, the kernel may crash with kernel messages like: BUG: kernel NULL pointer dereference, address: 0000000000000014 PGD 19036a067 P4D 1940a3067 PUD 136c9b067 PMD 0 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 74 UID: 0 PID: 3183 Comm: testCli Not tainted 6.11.9 #2 RIP: 0010:cache_tag_flush_range+0x9b/0x250 Call Trace: <TASK> ? __die+0x1f/0x60 ? page_fault_oops+0x163/0x590 ? exc_page_fault+0x72/0x190 ? asm_exc_page_fault+0x22/0x30 ? cache_tag_flush_range+0x9b/0x250 ? cache_tag_flush_range+0x5d/0x250 intel_iommu_tlb_sync+0x29/0x40 intel_iommu_unmap_pages+0xfe/0x160 __iommu_unmap+0xd8/0x1a0 vfio_unmap_unpin+0x182/0x340 [vfio_iommu_type1] vfio_remove_dma+0x2a/0xb0 [vfio_iommu_type1] vfio_iommu_type1_ioctl+0xafa/0x18e0 [vfio_iommu_type1] Move cache_tag_unassign_domain() before iommu_disable_pci_caps() to fix it. Fixes: 3b1d9e2b2d68 ("iommu/vt-d: Add cache tag assignment interface") Cc: stable@vger.kernel.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20241129020506.576413-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-12-13arm64: signal: Ensure signal delivery failure is recoverableKevin Brodsky1-15/+33
Commit eaf62ce1563b ("arm64/signal: Set up and restore the GCS context for signal handlers") introduced a potential failure point at the end of setup_return(). This is unfortunate as it is too late to deliver a SIGSEGV: if that SIGSEGV is handled, the subsequent sigreturn will end up returning to the original handler, which is not the intention (since we failed to deliver that signal). Make sure this does not happen by calling gcs_signal_entry() at the very beginning of setup_return(), and add a comment just after to discourage error cases being introduced from that point onwards. While at it, also take care of copy_siginfo_to_user(): since it may fail, we shouldn't be calling it after setup_return() either. Call it before setup_return() instead, and move the setting of X1/X2 inside setup_return() where it belongs (after the "point of no failure"). Background: the first part of setup_rt_frame(), including setup_sigframe(), has no impact on the execution of the interrupted thread. The signal frame is written to the stack, but the stack pointer remains unchanged. Failure at this stage can be recovered by a SIGSEGV handler, and sigreturn will restore the original context, at the point where the original signal occurred. On the other hand, once setup_return() has updated registers including SP, the thread's control flow has been modified and we must deliver the original signal. Fixes: eaf62ce1563b ("arm64/signal: Set up and restore the GCS context for signal handlers") Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Reviewed-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241210160940.2031997-1-kevin.brodsky@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-12-13sched/dlserver: Fix dlserver time accountingVineeth Pillai (Google)1-6/+9
dlserver time is accounted when: - dlserver is active and the dlserver proxies the cfs task. - dlserver is active but deferred and cfs task runs after being picked through the normal fair class pick. dl_server_update is called in two places to make sure that both the above times are accounted for. But it doesn't check if dlserver is active or not. Now that we have this dl_server_active flag, we can consolidate dl_server_update into one place and all we need to check is whether dlserver is active or not. When dlserver is active there is only two possible conditions: - dlserver is deferred. - cfs task is running on behalf of dlserver. Fixes: a110a81c52a9 ("sched/deadline: Deferrable dl server") Signed-off-by: "Vineeth Pillai (Google)" <vineeth@bitbyteword.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Marcel Ziswiler <marcel.ziswiler@codethink.co.uk> # ROCK 5B Link: https://lore.kernel.org/r/20241213032244.877029-2-vineeth@bitbyteword.org
2024-12-13sched/dlserver: Fix dlserver double enqueueVineeth Pillai (Google)3-2/+18
dlserver can get dequeued during a dlserver pick_task due to the delayed deueue feature and this can lead to issues with dlserver logic as it still thinks that dlserver is on the runqueue. The dlserver throttling and replenish logic gets confused and can lead to double enqueue of dlserver. Double enqueue of dlserver could happend due to couple of reasons: Case 1 ------ Delayed dequeue feature[1] can cause dlserver being stopped during a pick initiated by dlserver: __pick_next_task pick_task_dl -> server_pick_task pick_task_fair pick_next_entity (if (sched_delayed)) dequeue_entities dl_server_stop server_pick_task goes ahead with update_curr_dl_se without knowing that dlserver is dequeued and this confuses the logic and may lead to unintended enqueue while the server is stopped. Case 2 ------ A race condition between a task dequeue on one cpu and same task's enqueue on this cpu by a remote cpu while the lock is released causing dlserver double enqueue. One cpu would be in the schedule() and releasing RQ-lock: current->state = TASK_INTERRUPTIBLE(); schedule(); deactivate_task() dl_stop_server(); pick_next_task() pick_next_task_fair() sched_balance_newidle() rq_unlock(this_rq) at which point another CPU can take our RQ-lock and do: try_to_wake_up() ttwu_queue() rq_lock() ... activate_task() dl_server_start() --> first enqueue wakeup_preempt() := check_preempt_wakeup_fair() update_curr() update_curr_task() if (current->dl_server) dl_server_update() enqueue_dl_entity() --> second enqueue This bug was not apparent as the enqueue in dl_server_start doesn't usually happen because of the defer logic. But as a side effect of the first case(dequeue during dlserver pick), dl_throttled and dl_yield will be set and this causes the time accounting of dlserver to messup and then leading to a enqueue in dl_server_start. Have an explicit flag representing the status of dlserver to avoid the confusion. This is set in dl_server_start and reset in dlserver_stop. Fixes: 63ba8422f876 ("sched/deadline: Introduce deadline servers") Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: "Vineeth Pillai (Google)" <vineeth@bitbyteword.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Marcel Ziswiler <marcel.ziswiler@codethink.co.uk> # ROCK 5B Link: https://lkml.kernel.org/r/20241213032244.877029-1-vineeth@bitbyteword.org
2024-12-13efi/esrt: remove esre_attribute::store()Jiri Slaby (SUSE)1-2/+0
esre_attribute::store() is not needed since commit af97a77bc01c (efi: Move some sysfs files to be read-only by root). Drop it. Found by https://github.com/jirislaby/clang-struct. Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: linux-efi@vger.kernel.org Signed-off-by: Ard Biesheuvel <ardb@kernel.org>