aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/include/linux (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-05-07mm: vmalloc: support more granular vrealloc() sizingKees Cook1-0/+1
Introduce struct vm_struct::requested_size so that the requested (re)allocation size is retained separately from the allocated area size. This means that KASAN will correctly poison the correct spans of requested bytes. This also means we can support growing the usable portion of an allocation that can already be supported by the existing area's existing allocation. Link: https://lkml.kernel.org/r/20250426001105.it.679-kees@kernel.org Fixes: 3ddc2fefe6f3 ("mm: vmalloc: implement vrealloc()") Signed-off-by: Kees Cook <kees@kernel.org> Reported-by: Erhard Furtner <erhard_f@mailbox.org> Closes: https://lore.kernel.org/all/20250408192503.6149a816@outsider.home/ Reviewed-by: Danilo Krummrich <dakr@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-07ipmi:msghandler: Export and fix panic messaging capabilityCorey Minyard1-0/+10
Don't have the other users that do things at panic time (the watchdog) do all this themselves, provide a function to do it. Also, with the new design where most stuff happens at thread context, a few things needed to be fixed to avoid doing locking in a panic context. Signed-off-by: Corey Minyard <cminyard@mvista.com>
2025-05-07ipmi: Add a note about the pretimeout callbackCorey Minyard1-1/+2
You can't do IPMI calls from the callback, it's called with locks held. Signed-off-by: Corey Minyard <cminyard@mvista.com>
2025-05-07fs: add atomic write unit max opt to statxJohn Garry2-1/+3
XFS will be able to support large atomic writes (atomic write > 1x block) in future. This will be achieved by using different operating methods, depending on the size of the write. Specifically a new method of operation based in FS atomic extent remapping will be supported in addition to the current HW offload-based method. The FS method will generally be appreciably slower performing than the HW-offload method. However the FS method will be typically able to contribute to achieving a larger atomic write unit max limit. XFS will support a hybrid mode, where HW offload method will be used when possible, i.e. HW offload is used when the length of the write is supported, and for other times FS-based atomic writes will be used. As such, there is an atomic write length at which the user may experience appreciably slower performance. Advertise this limit in a new statx field, stx_atomic_write_unit_max_opt. When zero, it means that there is no such performance boundary. Masks STATX{_ATTR}_WRITE_ATOMIC can be used to get this new field. This is ok for older kernels which don't support this new field, as they would report 0 in this field (from zeroing in cp_statx()) already. Furthermore those older kernels don't support large atomic writes - apart from block fops, but there would be consistent performance there for atomic writes in range [unit min, unit max]. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: John Garry <john.g.garry@oracle.com>
2025-05-07arch_topology: Relocate cpu_scale to topology.[h|c]Ricardo Neri2-8/+9
arch_topology.c provides functionality to parse and scale CPU capacity. It also provides a corresponding sysfs interface. Some architectures parse and scale CPU capacity differently as per their own needs. On Intel processors, for instance, it is responsibility of the Intel P-state driver. Relocate the implementation of that interface to a common location in topology.c. Architectures can use the interface and populate it using their own mechanisms. An alternative approach would be to compile arch_topology.c even if not needed only to get this interface. This approach would create duplicated and conflicting functionality and data structures. Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Tested-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/20250419025504.9760-2-ricardo.neri-calderon@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-05-07cpufreq/sched: Move cpufreq-specific EAS checks to cpufreqRafael J. Wysocki1-0/+2
Doing cpufreq-specific EAS checks that require accessing policy internals directly from sched_is_eas_possible() is a bit unfortunate, so introduce cpufreq_ready_for_eas() in cpufreq, move those checks into that new function and make sched_is_eas_possible() call it. While at it, address a possible race between the EAS governor check and governor change by doing the former under the policy rwsem. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Tested-by: Christian Loehle <christian.loehle@arm.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://patch.msgid.link/2317800.iZASKD2KPV@rjwysocki.net
2025-05-07cpufreq/sched: schedutil: Add helper for governor checksRafael J. Wysocki1-0/+9
Add a helper for checking if schedutil is the current governor for a given cpufreq policy and use it in sched_is_eas_possible() to avoid accessing cpufreq policy internals directly from there. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Tested-by: Christian Loehle <christian.loehle@arm.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://patch.msgid.link/3365956.44csPzL39Z@rjwysocki.net
2025-05-07irqdomain: Add IRQ_DOMAIN_FLAG_MSI_IMMUTABLE and irq_domain_is_msi_immutable()Frank Li1-0/+7
Add the flag IRQ_DOMAIN_FLAG_MSI_IMMUTABLE and the API function irq_domain_is_msi_immutable() to check if the MSI controller retains an immutable address/data pair during irq_set_affinity(). Ensure compatibility with MSI users like PCIe Endpoint Doorbell, which require the address/data pair to remain unchanged after setup. Use this function to verify if the MSI controller is immutable. Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250414-ep-msi-v18-2-f69b49917464@nxp.com
2025-05-07block: remove the q argument from blk_rq_map_kernChristoph Hellwig1-2/+2
Remove the q argument from blk_rq_map_kern and the internal helpers called by it as the queue can trivially be derived from the request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20250507120451.4000627-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-07block: add a bio_add_vmalloc helpersChristoph Hellwig1-0/+3
Add a helper to add a vmalloc region to a bio, abstracting away the vmalloc addresses from the underlying pages and another one wrapping it for the simple case where all data fits into a single bio. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20250507120451.4000627-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-07block: add a bio_add_max_vecs helperChristoph Hellwig1-0/+15
Add a helper to check how many bio_vecs are needed to add a kernel virtual address range to a bio, accounting for the always contiguous direct mapping and vmalloc mappings that usually need a bio_vec per page sized chunk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20250507120451.4000627-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-07block: add a bdev_rw_virt helperChristoph Hellwig1-1/+4
Add a helper to perform synchronous I/O on a kernel direct map range. Currently this is implemented in various places in usually not very efficient ways, so provide a generic helper instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20250507120451.4000627-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-07block: add a bio_add_virt_nofail helperChristoph Hellwig1-0/+2
Add a helper to add a directly mapped kernel virtual address to a bio so that callers don't have to convert to pages or folios. For now only the _nofail variant is provided as that is what all the obvious callers want. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20250507120451.4000627-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-07block: fix warning on 'make htmldocs'Ming Lei1-0/+3
Fix the following warning when running 'make htmldocs': +WARNING: include/linux/blk-mq.h:532 struct member 'update_nr_hwq_lock' not described in 'blk_mq_tag_set' Fixes: 98e68f67020c ("block: prevent adding/deleting disk during updating nr_hw_queues") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/all/20250507163220.00141d77@canb.auug.org.au/ Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250507092537.3009112-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-07KVM: arm64: Introduce CONFIG_UBSAN_KVM_EL2Mostafa Saleh1-1/+1
Add a new Kconfig CONFIG_UBSAN_KVM_EL2 for KVM which enables UBSAN for EL2 code (in protected/nvhe/hvhe) modes. This will re-use the same checks enabled for the kernel for the hypervisor. The only difference is that for EL2 it always emits a "brk" instead of implementing hooks as the hypervisor can't print reports. The KVM code will re-use the same code for the kernel "report_ubsan_failure()" so #ifdefs are changed to also have this code for CONFIG_UBSAN_KVM_EL2 Signed-off-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250430162713.1997569-4-smostafa@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-07ubsan: Remove regs from report_ubsan_failure()Mostafa Saleh1-2/+2
report_ubsan_failure() doesn't use argument regs, and soon it will be called from the hypervisor context were regs are not available. So, remove the unused argument. Signed-off-by: Mostafa Saleh <smostafa@google.com> Acked-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250430162713.1997569-3-smostafa@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-07genirq: Remove unused remove_percpu_irq()Dr. David Alan Gilbert1-1/+0
remove_percpu_irq() has been unused since it was added in 2011 by commit 31d9d9b6d830 ("genirq: Add support for per-cpu dev_id interrupts") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250420164656.112641-1-linux@treblig.org
2025-05-07bus: firewall: Fix missing static inline annotations for stubsKrzysztof Kozlowski1-6/+9
Stubs in the header file for !CONFIG_STM32_FIREWALL case should be both static and inline, because they do not come with earlier declaration and should be inlined in every unit including the header. Cc: Patrice Chotard <patrice.chotard@foss.st.com> Cc: stable@vger.kernel.org Fixes: 5c9668cfc6d7 ("firewall: introduce stm32_firewall framework") Link: https://lore.kernel.org/r/20250507092121.95121-2-krzysztof.kozlowski@linaro.org Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-05-07genirq/manage: Rework can_request_irq()Thomas Gleixner1-1/+1
Use the new guards to get and lock the interrupt descriptor and tidy up the code. Make the return value boolean to reflect it's meaning. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/20250429065422.187250840@linutronix.de
2025-05-06Merge tag 'wireless-2025-05-06' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wirelessJakub Kicinski1-1/+1
Johannes Berg says: ==================== Couple of fixes: * iwlwifi: add two missing device entries * cfg80211: fix a potential out-of-bounds access * mac80211: fix format of TID to link mapping action frames * tag 'wireless-2025-05-06' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: iwlwifi: add support for Killer on MTL wifi: mac80211: fix the type of status_code for negotiated TID to Link Mapping wifi: cfg80211: fix out-of-bounds access during multi-link element defragmentation ==================== Link: https://patch.msgid.link/20250506203506.158818-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-06Merge tag 'wireless-next-2025-05-06' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-nextJakub Kicinski2-3/+77
Johannes Berg says: ==================== wireless features, notably * stack - free SKBTX_WIFI_STATUS flag - fixes for VLAN multicast in multi-link - improve codel parameters (revert some old twiddling) * ath12k - Enable AHB support for IPQ5332. - Add monitor interface support to QCN9274. - Add MLO support to WCN7850. - Add 802.11d scan offload support to WCN7850. * ath11k - Restore hibernation support * iwlwifi - EMLSR on two 5 GHz links * mwifiex - cleanups/refactoring along with many other small features/cleanups * tag 'wireless-next-2025-05-06' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (177 commits) Revert "wifi: iwlwifi: clean up config macro" wifi: iwlwifi: move phy_filters to fw_runtime wifi: iwlwifi: pcie: make sure to lock rxq->read wifi: iwlwifi: add definitions for iwl_mac_power_cmd version 2 wifi: iwlwifi: clean up config macro wifi: iwlwifi: mld: simplify iwl_mld_rx_fill_status() wifi: iwlwifi: mld: rx: simplify channel handling wifi: iwlwifi: clean up band in RX metadata wifi: iwlwifi: mld: skip unknown FW channel load values wifi: iwlwifi: define API for external FSEQ images wifi: iwlwifi: mld: allow EMLSR on separated 5 GHz subbands wifi: iwlwifi: mld: use cfg80211_chandef_get_width() wifi: iwlwifi: mld: fix iwl_mld_emlsr_disallowed_with_link() return wifi: iwlwifi: mld: clarify variable type wifi: iwlwifi: pcie: add support for the reset handshake in MSI wifi: mac80211_hwsim: Prevent tsf from setting if beacon is disabled wifi: mac80211: restructure tx profile retrieval for MLO MBSSID wifi: nl80211: add link id of transmitted profile for MLO MBSSID wifi: ieee80211: Add helpers to fetch EMLSR delay and timeout values wifi: mac80211: update ML STA with EML capabilities ... ==================== Link: https://patch.msgid.link/20250506174656.119970-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-06net: add missing instance lock to dev_set_promiscuityStanislav Fomichev1-0/+1
Accidentally spotted while trying to understand what else needs to be renamed to netif_ prefix. Most of the calls to dev_set_promiscuity are adjacent to dev_set_allmulti or dev_disable_lro so it should be safe to add the lock. Note that new netif_set_promiscuity is currently unused, the locked paths call __dev_set_promiscuity directly. Fixes: ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations") Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250506011919.2882313-1-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-07This patch set did some clean up and add runtime pmMark Brown4-48/+60
Merge series from Haibo Chen <haibo.chen@nxp.com>: PATCH1/3/4 to clean up the code, make the code more readable PATCH2 add the runtime pm support PATCH5 use devm_add_action_or_reset() to replace remove() callback, this can avoid oops when do bind/unbind test
2025-05-06wifi: mac80211: fix the type of status_code for negotiated TID to Link MappingMichael-CY Lee1-1/+1
The status code should be type of __le16. Fixes: 83e897a961b8 ("wifi: ieee80211: add definitions for negotiated TID to Link map") Fixes: 8f500fbc6c65 ("wifi: mac80211: process and save negotiated TID to Link mapping request") Signed-off-by: Michael-CY Lee <michael-cy.lee@mediatek.com> Link: https://patch.msgid.link/20250505081946.3927214-1-michael-cy.lee@mediatek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-05-06rpmsg: core: Remove deadcodeDr. David Alan Gilbert1-22/+0
rpmsg_send_offchannel() and rpmsg_trysend_offchannel() have been unused since they were added in 2011's commit bcabbccabffe ("rpmsg: add virtio-based remote processor messaging bus") Remove them and associated docs. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Acked-by: Arnaud Pouliquen <arnaud.pouliquen@foss.st.com> Link: https://lore.kernel.org/r/20250429234600.301083-2-linux@treblig.org Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2025-05-06kill vfs_submount()Al Viro2-4/+0
The last remaining user of vfs_submount() (tracefs) is easy to convert to fs_context_for_submount(); do that and bury that thing, along with SB_SUBMOUNT Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-05-06f2fs: fix to bail out in get_new_segment()Chao Yu1-0/+1
------------[ cut here ]------------ WARNING: CPU: 3 PID: 579 at fs/f2fs/segment.c:2832 new_curseg+0x5e8/0x6dc pc : new_curseg+0x5e8/0x6dc Call trace: new_curseg+0x5e8/0x6dc f2fs_allocate_data_block+0xa54/0xe28 do_write_page+0x6c/0x194 f2fs_do_write_node_page+0x38/0x78 __write_node_page+0x248/0x6d4 f2fs_sync_node_pages+0x524/0x72c f2fs_write_checkpoint+0x4bc/0x9b0 __checkpoint_and_complete_reqs+0x80/0x244 issue_checkpoint_thread+0x8c/0xec kthread+0x114/0x1bc ret_from_fork+0x10/0x20 get_new_segment() detects inconsistent status in between free_segmap and free_secmap, let's record such error into super block, and bail out get_new_segment() instead of continue using the segment. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-06nvme: add FDP definitionsChristoph Hellwig1-0/+77
Add the config feature result, config log page, and management receive commands needed for FDP. Partially based on a patch from Kanchan Joshi <joshi.k@samsung.com>. Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20250506121732.8211-10-joshi.k@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-06block: introduce a write_stream_granularity queue limitChristoph Hellwig1-0/+1
Export the granularity that write streams should be discarded with, as it is essential for making good use of them. Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20250506121732.8211-5-joshi.k@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-06block: introduce max_write_streams queue limitKeith Busch1-0/+9
Drivers with hardware that support write streams need a way to export how many are available so applications can generically query this. Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Signed-off-by: Keith Busch <kbusch@kernel.org> [hch: renamed hints to streams, removed stacking] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20250506121732.8211-4-joshi.k@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-06block: add a bi_write_stream fieldChristoph Hellwig1-0/+1
Add the ability to pass a write stream for placement control in the bio. The new field fits in an existing hole, so does not change the size of the struct. Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20250506121732.8211-3-joshi.k@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-06fs: add a write stream field to the kiocbChristoph Hellwig1-0/+1
Prepare for io_uring passthrough of write streams. The write stream field in the kiocb structure fits into an existing 2-byte hole, so its size is not changed. Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20250506121732.8211-2-joshi.k@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-06block: move wbt_enable_default() out of queue freezing from sched ->exit()Ming Lei1-0/+2
scheduler's ->exit() is called with queue frozen and elevator lock is held, and wbt_enable_default() can't be called with queue frozen, otherwise the following lockdep warning is triggered: #6 (&q->rq_qos_mutex){+.+.}-{4:4}: #5 (&eq->sysfs_lock){+.+.}-{4:4}: #4 (&q->elevator_lock){+.+.}-{4:4}: #3 (&q->q_usage_counter(io)#3){++++}-{0:0}: #2 (fs_reclaim){+.+.}-{0:0}: #1 (&sb->s_type->i_mutex_key#3){+.+.}-{4:4}: #0 (&q->debugfs_mutex){+.+.}-{4:4}: Fix the issue by moving wbt_enable_default() out of bfq's exit(), and call it from elevator_change_done(). Meantime add disk->rqos_state_mutex for covering wbt state change, which matches the purpose more than ->elevator_lock. Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250505141805.2751237-26-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-06block: add new helper for disabling elevator switch when deleting diskMing Lei1-0/+3
Add new helper disable_elv_switch() and new flag QUEUE_FLAG_NO_ELV_SWITCH for disabling elevator switch before deleting disk: - originally flag QUEUE_FLAG_REGISTERED is added for preventing elevator switch during removing disk, but this flag has been used widely for other purposes, so add one new flag for disabling elevator switch only - for avoiding deadlock risk, we have to move elevator queue register/unregister out of elevator lock and queue freeze, which will be done in next patch. However, this way adds small race window between elevator switch and deleting ->queue_kobj, in which elevator queue register/unregister could be run concurrently. The added helper will be used for avoiding the race in the following patch. - drain in-progress elevator switch before deleting disk Suggested-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Link: https://lore.kernel.org/r/20250505141805.2751237-21-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-06block: prevent adding/deleting disk during updating nr_hw_queuesMing Lei1-0/+3
Both adding/deleting disk code are reader of `nr_hw_queues`, so we can't allow them in-progress when updating nr_hw_queues, kernel panic and kasan has been reported in [1]. Prevent adding/deleting disk during updating nr_hw_queues by adding rw_semaphore to tagset, write lock is grabbed in blk_mq_update_nr_hw_queues(), and read lock is acquired when adding/deleting disk. Also mark GFP_NOIO allocation scope for adding/deleting disk because blk_mq_update_nr_hw_queues() is part of some driver's error handler. This way avoids lot of trouble. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Suggested-by: Nilay Shroff <nilay@linux.ibm.com> Reported-by: Nilay Shroff <nilay@linux.ibm.com> Closes: https://lore.kernel.org/linux-block/a5896cdb-a59a-4a37-9f99-20522f5d2987@linux.ibm.com/ Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250505141805.2751237-9-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-06block: move ELEVATOR_FLAG_DISABLE_WBT a request queue flagMing Lei1-0/+3
ELEVATOR_FLAG_DISABLE_WBT is only used by BFQ to disallow wbt when BFQ is in use. The flag is set in BFQ's init(), and cleared in BFQ's exit(). Making it as request queue flag, so that we can avoid to deal with elevator switch race. Also it isn't graceful to checking one scheduler flag in wbt_enable_default(). Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250505141805.2751237-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-06BackMerge tag 'v6.15-rc5' into drm-nextDave Airlie24-350/+339
Linux 6.15-rc5, requested by tzimmerman for fixes required in drm-next. Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-05-06dma-mapping: add a dma_need_unmap helperChristoph Hellwig1-0/+5
Add helper that allows a driver to skip calling dma_unmap_* if the DMA layer can guarantee that they are no-nops. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2025-05-06dma-mapping: Implement link/unlink ranges APILeon Romanovsky1-0/+32
Introduce new DMA APIs to perform DMA linkage of buffers in layers higher than DMA. In proposed API, the callers will perform the following steps. In map path: if (dma_can_use_iova(...)) dma_iova_alloc() for (page in range) dma_iova_link_next(...) dma_iova_sync(...) else /* Fallback to legacy map pages */ for (all pages) dma_map_page(...) In unmap path: if (dma_can_use_iova(...)) dma_iova_destroy() else for (all pages) dma_unmap_page(...) Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2025-05-06dma-mapping: Provide an interface to allow allocate IOVALeon Romanovsky1-0/+48
The existing .map_pages() callback provides both allocating of IOVA and linking DMA pages. That combination works great for most of the callers who use it in control paths, but is less effective in fast paths where there may be multiple calls to map_page(). These advanced callers already manage their data in some sort of database and can perform IOVA allocation in advance, leaving range linkage operation to be in fast path. Provide an interface to allocate/deallocate IOVA and next patch link/unlink DMA ranges to that specific IOVA. In the new API a DMA mapping transaction is identified by a struct dma_iova_state, which holds some recomputed information for the transaction which does not change for each page being mapped, so add a check if IOVA can be used for the specific transaction. The API is exported from dma-iommu as it is the only implementation supported, the namespace is clearly different from iommu_* functions which are not allowed to be used. This code layout allows us to save function call per API call used in datapath as well as a lot of boilerplate code. Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2025-05-06iommu: generalize the batched sync after map interfaceChristoph Hellwig1-0/+4
For the upcoming IOVA-based DMA API we want to batch the ops->iotlb_sync_map() call after mapping multiple IOVAs from dma-iommu without having a scatterlist. Improve the API. Add a wrapper for the map_sync as iommu_sync_map() so that callers don't need to poke into the methods directly. Formalize __iommu_map() into iommu_map_nosync() which requires the caller to call iommu_sync_map() after all maps are completed. Refactor the existing sanity checks from all the different layers into iommu_map_nosync(). Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Will Deacon <will@kernel.org> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2025-05-06dma-mapping: move the PCI P2PDMA mapping helpers to pci-p2pdma.hChristoph Hellwig2-85/+85
To support the upcoming non-scatterlist mapping helpers, we need to go back to have them called outside of the DMA API. Thus move them out of dma-map-ops.h, which is only for DMA API implementations to pci-p2pdma.h, which is for driver use. Note that the core helper is still not exported as the mapping is expected to be done only by very highlevel subsystem code at least for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2025-05-06PCI/P2PDMA: Refactor the p2pdma mapping helpersChristoph Hellwig1-10/+41
The current scheme with a single helper to determine the P2P status and map a scatterlist segment force users to always use the map_sg helper to DMA map, which we're trying to get away from because they are very cache inefficient. Refactor the code so that there is a single helper that checks the P2P state for a page, including the result that it is not a P2P page to simplify the callers, and a second one to perform the address translation for a bus mapped P2P transfer that does not depend on the scatterlist structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2025-05-05workqueue: fix typo in commentGuan-Chun Wu1-1/+1
Fix a duplicated word "that that" in the comment describing the @max_active behavior for unbound workqueues. Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-06AsoC: Phase out hybrid PCI devresMark Brown4-48/+60
Merge series from Philipp Stanner <phasta@kernel.org>: A year ago we spent quite some work trying to get PCI into better shape. Some pci_ functions can be sometimes managed with devres, which is obviously bad. We want to provide an obvious API, where pci_ functions are never, and pcim_ functions are always managed. Thus, everyone enabling his device with pcim_enable_device() must be ported to pcim_ functions. Porting all users will later enable us to significantly simplify parts of the PCI subsystem. See here [1] for details. This patch series does that for sound. Feel free to squash the commits as you see fit. P. [1] https://elixir.bootlin.com/linux/v6.14-rc4/source/drivers/pci/devres.c#L18
2025-05-05of: reserved_mem: Add functions to parse "memory-region"Rob Herring (Arm)1-0/+26
Drivers with "memory-region" properties currently have to do their own parsing of "memory-region" properties. The result is all the drivers have similar patterns of a call to parse "memory-region" and then get the region's address and size. As this is a standard property, it should have common functions for drivers to use. Add new functions to count the number of regions and retrieve the region's address as a resource. Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com> Acked-by: Arnaud Pouliquen <arnaud.pouliquen@foss.st.com> Link: https://lore.kernel.org/r/20250423-dt-memory-region-v2-v2-1-2fbd6ebd3c88@kernel.org Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2025-05-05Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski1-0/+3
Martin KaFai Lau says: ==================== pull-request: bpf-next 2025-05-02 We've added 14 non-merge commits during the last 10 day(s) which contain a total of 13 files changed, 740 insertions(+), 121 deletions(-). The main changes are: 1) Avoid skipping or repeating a sk when using a UDP bpf_iter, from Jordan Rife. 2) Fixed a crash when a bpf qdisc is set in the net.core.default_qdisc, from Amery Hung. 3) A few other fixes in the bpf qdisc, from Amery Hung. - Always call qdisc_watchdog_init() in the .init prologue such that the .reset/.destroy epilogue can always call qdisc_watchdog_cancel() without issue. - bpf_qdisc_init_prologue() was incorrectly returning an error when the bpf qdisc is set as the default_qdisc and the mq is creating the default_qdisc. It is now fixed. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: selftests/bpf: Cleanup bpf qdisc selftests selftests/bpf: Test attaching a bpf qdisc with incomplete operators bpf: net_sched: Make some Qdisc_ops ops mandatory selftests/bpf: Test setting and creating bpf qdisc as default qdisc bpf: net_sched: Fix bpf qdisc init prologue when set as default qdisc selftests/bpf: Add tests for bucket resume logic in UDP socket iterators selftests/bpf: Return socket cookies from sock_iter_batch progs bpf: udp: Avoid socket skips and repeats during iteration bpf: udp: Use bpf_udp_iter_batch_item for bpf_udp_iter_state batch items bpf: udp: Get rid of st_bucket_done bpf: udp: Make sure iter->batch always contains a full bucket snapshot bpf: udp: Make mem flags configurable through bpf_iter_udp_realloc_batch bpf: net_sched: Fix using bpf qdisc as default qdisc selftests/bpf: Fix compilation errors ==================== Link: https://patch.msgid.link/20250503010755.4030524-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-05mm: remove NR_BOUNCE zone statChristoph Hellwig1-1/+0
The stat is always 0 now, so remove it and hardwire the user visible output to 0. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20250505081138.3435992-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-05block: remove bounce buffering supportChristoph Hellwig2-5/+1
The block layer bounce buffering support is unused now, remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20250505081138.3435992-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-05dma-fence: Add helper to sort and deduplicate dma_fence arraysArvind Yadav1-0/+2
Export a new helper function `dma_fence_dedup_array()` that sorts an array of dma_fence pointers by context, then deduplicates the array by retaining only the most recent fence per context. This utility is useful when merging or optimizing sets of fences where redundant entries from the same context can be pruned. The operation is performed in-place and releases references to dropped fences using dma_fence_put(). v2: - Export this code from dma-fence-unwrap.c(by Christian). v3: - To split this in a dma_buf patch and amd userq patch(by Sunil). - No need to add a new function just re-use existing(by Christian). v4: - Export dma_fence_dedub_array and use it(by Christian). Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>