aboutsummaryrefslogtreecommitdiffstats
path: root/include (follow)
AgeCommit message (Collapse)AuthorFilesLines
2024-02-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski17-34/+49
Cross-merge networking fixes after downstream PR. Conflicts: net/mptcp/protocol.c adf1bb78dab5 ("mptcp: fix snd_wnd initialization for passive socket") 9426ce476a70 ("mptcp: annotate lockless access for RX path fields") https://lore.kernel.org/all/20240228103048.19255709@canb.auug.org.au/ Adjacent changes: drivers/dpll/dpll_core.c 0d60d8df6f49 ("dpll: rely on rcu for netdev_dpll_pin()") e7f8df0e81bf ("dpll: move xa_erase() call in to match dpll_pin_alloc() error path order") drivers/net/veth.c 1ce7d306ea63 ("veth: try harder when allocating queue memory") 0bef512012b1 ("net: add netdev_lockdep_set_classes() to virtual drivers") drivers/net/wireless/intel/iwlwifi/mvm/d3.c 8c9bef26e98b ("wifi: iwlwifi: mvm: d3: implement suspend with MLO") 78f65fbf421a ("wifi: iwlwifi: mvm: ensure offloading TID queue exists") net/wireless/nl80211.c f78c1375339a ("wifi: nl80211: reject iftype change with mesh ID change") 414532d8aa89 ("wifi: cfg80211: use IEEE80211_MAX_MESH_ID_LEN appropriately") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-29Merge tag 'net-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds5-11/+15
Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, WiFi and netfilter. We have one outstanding issue with the stmmac driver, which may be a LOCKDEP false positive, not a blocker. Current release - regressions: - netfilter: nf_tables: re-allow NFPROTO_INET in nft_(match/target)_validate() - eth: ionic: fix error handling in PCI reset code Current release - new code bugs: - eth: stmmac: complete meta data only when enabled, fix null-deref - kunit: fix again checksum tests on big endian CPUs Previous releases - regressions: - veth: try harder when allocating queue memory - Bluetooth: - hci_bcm4377: do not mark valid bd_addr as invalid - hci_event: fix handling of HCI_EV_IO_CAPA_REQUEST Previous releases - always broken: - info leak in __skb_datagram_iter() on netlink socket - mptcp: - map v4 address to v6 when destroying subflow - fix potential wake-up event loss due to sndbuf auto-tuning - fix double-free on socket dismantle - wifi: nl80211: reject iftype change with mesh ID change - fix small out-of-bound read when validating netlink be16/32 types - rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back - ipv6: fix potential "struct net" ref-leak in inet6_rtm_getaddr() - ip_tunnel: prevent perpetual headroom growth with huge number of tunnels on top of each other - mctp: fix skb leaks on error paths of mctp_local_output() - eth: ice: fixes for DPLL state reporting - dpll: rely on rcu for netdev_dpll_pin() to prevent UaF - eth: dpaa: accept phy-interface-type = '10gbase-r' in the device tree" * tag 'net-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits) dpll: fix build failure due to rcu_dereference_check() on unknown type kunit: Fix again checksum tests on big endian CPUs tls: fix use-after-free on failed backlog decryption tls: separate no-async decryption request handling from async tls: fix peeking with sync+async decryption tls: decrement decrypt_pending if no async completion will be called gtp: fix use-after-free and null-ptr-deref in gtp_newlink() net: hsr: Use correct offset for HSR TLV values in supervisory HSR frames igb: extend PTP timestamp adjustments to i211 rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back tools: ynl: fix handling of multiple mcast groups selftests: netfilter: add bridge conntrack + multicast test case netfilter: bridge: confirm multicast packets before passing them up the stack netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate() Bluetooth: qca: Fix triggering coredump implementation Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT Bluetooth: qca: Fix wrong event type for patch config command Bluetooth: Enforce validation on max value of connection interval Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST Bluetooth: mgmt: Fix limited discoverable off timeout ...
2024-02-29dpll: fix build failure due to rcu_dereference_check() on unknown typeEric Dumazet1-4/+4
Tasmiya reports that their compiler complains that we deref a pointer to unknown type with rcu_dereference_rtnl(): include/linux/rcupdate.h:439:9: error: dereferencing pointer to incomplete type ‘struct dpll_pin’ Unclear what compiler it is, at the moment, and we can't report but since DPLL can't be a module - move the code from the header into the source file. Fixes: 0d60d8df6f49 ("dpll: rely on rcu for netdev_dpll_pin()") Reported-by: Tasmiya Nalatwad <tasmiya@linux.vnet.ibm.com> Link: https://lore.kernel.org/all/3fcf3a2c-1c1b-42c1-bacb-78fdcd700389@linux.vnet.ibm.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240229190515.2740221-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-29Merge tag 'nf-24-02-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfPaolo Abeni1-0/+1
Pablo Neira Ayuso says: ==================== Netfilter fixes for net Patch #1 restores NFPROTO_INET with nft_compat, from Ignat Korchagin. Patch #2 fixes an issue with bridge netfilter and broadcast/multicast packets. There is a day 0 bug in br_netfilter when used with connection tracking. Conntrack assumes that an nf_conn structure that is not yet added to hash table ("unconfirmed"), is only visible by the current cpu that is processing the sk_buff. For bridge this isn't true, sk_buff can get cloned in between, and clones can be processed in parallel on different cpu. This patch disables NAT and conntrack helpers for multicast packets. Patch #3 adds a selftest to cover for the br_netfilter bug. netfilter pull request 24-02-29 * tag 'nf-24-02-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: selftests: netfilter: add bridge conntrack + multicast test case netfilter: bridge: confirm multicast packets before passing them up the stack netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate() ==================== Link: https://lore.kernel.org/r/20240229000135.8780-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-28tcp: remove some holes in struct tcp_sockEric Dumazet1-4/+4
By moving some fields around, this patch shrinks holes size from 56 to 32, saving 24 bytes on 64bit arches. After the patch pahole gives the following for 'struct tcp_sock': /* size: 2304, cachelines: 36, members: 162 */ /* sum members: 2234, holes: 6, sum holes: 32 */ /* sum bitfield members: 34 bits, bit holes: 5, sum bit holes: 14 bits */ /* padding: 32 */ /* paddings: 3, sum paddings: 10 */ /* forced alignments: 1, forced holes: 1, sum forced holes: 12 */ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240227192721.3558982-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-28inet: annotate devconf data-racesEric Dumazet1-6/+8
Add READ_ONCE() in ipv4_devconf_get() and corresponding WRITE_ONCE() in ipv4_devconf_set() Add IPV4_DEVCONF_RO() and IPV4_DEVCONF_ALL_RO() macros, and use them when reading devconf fields. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240227092411.2315725-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-29netfilter: bridge: confirm multicast packets before passing them up the stackFlorian Westphal1-0/+1
conntrack nf_confirm logic cannot handle cloned skbs referencing the same nf_conn entry, which will happen for multicast (broadcast) frames on bridges. Example: macvlan0 | br0 / \ ethX ethY ethX (or Y) receives a L2 multicast or broadcast packet containing an IP packet, flow is not yet in conntrack table. 1. skb passes through bridge and fake-ip (br_netfilter)Prerouting. -> skb->_nfct now references a unconfirmed entry 2. skb is broad/mcast packet. bridge now passes clones out on each bridge interface. 3. skb gets passed up the stack. 4. In macvlan case, macvlan driver retains clone(s) of the mcast skb and schedules a work queue to send them out on the lower devices. The clone skb->_nfct is not a copy, it is the same entry as the original skb. The macvlan rx handler then returns RX_HANDLER_PASS. 5. Normal conntrack hooks (in NF_INET_LOCAL_IN) confirm the orig skb. The Macvlan broadcast worker and normal confirm path will race. This race will not happen if step 2 already confirmed a clone. In that case later steps perform skb_clone() with skb->_nfct already confirmed (in hash table). This works fine. But such confirmation won't happen when eb/ip/nftables rules dropped the packets before they reached the nf_confirm step in postrouting. Pablo points out that nf_conntrack_bridge doesn't allow use of stateful nat, so we can safely discard the nf_conn entry and let inet call conntrack again. This doesn't work for bridge netfilter: skb could have a nat transformation. Also bridge nf prevents re-invocation of inet prerouting via 'sabotage_in' hook. Work around this problem by explicit confirmation of the entry at LOCAL_IN time, before upper layer has a chance to clone the unconfirmed entry. The downside is that this disables NAT and conntrack helpers. Alternative fix would be to add locking to all code parts that deal with unconfirmed packets, but even if that could be done in a sane way this opens up other problems, for example: -m physdev --physdev-out eth0 -j SNAT --snat-to 1.2.3.4 -m physdev --physdev-out eth1 -j SNAT --snat-to 1.2.3.5 For multicast case, only one of such conflicting mappings will be created, conntrack only handles 1:1 NAT mappings. Users should set create a setup that explicitly marks such traffic NOTRACK (conntrack bypass) to avoid this, but we cannot auto-bypass them, ruleset might have accept rules for untracked traffic already, so user-visible behaviour would change. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217777 Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-28net: ethtool: eee: Remove legacy _u32 from keeeAndrew Lunn1-3/+0
All MAC drivers have been converted to use the link mode members of keee. So remove the _u32 values, and the code in the ethtool core to convert the legacy _u32 values to link modes. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-28net: ioam6: multicast eventJustin Iurman1-0/+4
Add a multicast group to the ioam6 generic netlink family and provide ioam6_event() to send an ioam6 event to the multicast group. Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-28uapi: ioam6: API for netlink multicast eventsJustin Iurman1-0/+20
Add new api to support ioam6 events for generic netlink multicast. A first "trace" event is added to the list of ioam6 events, which will represent an IOAM Pre-allocated Trace Option-Type. It provides another solution to share IOAM data with user space. Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-28tcp: make the dropreason really work when calling tcp_rcv_state_process()Jason Xing1-2/+2
Update three callers including both ipv4 and ipv6 and let the dropreason mechanism work in reality. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-28tcp: add dropreasons in tcp_rcv_state_process()Jason Xing1-1/+1
In this patch, I equipped this function with more dropreasons, but it still doesn't work yet, which I will do later. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-28tcp: introduce dropreasons in receive pathJason Xing1-1/+14
Soon later patches can use these relatively more accurate reasons to recognise and find out the cause. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-28tcp: add a dropreason definitions and prepare for cookie checkJason Xing1-1/+10
Adding one drop reason to detect the condition of skb dropped because of hook points in cookie check and extending NO_SOCKET to consider another two cases can be used later. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-28net: make SK_MEMORY_PCPU_RESERV tunableAdam Li1-2/+3
This patch adds /proc/sys/net/core/mem_pcpu_rsv sysctl file, to make SK_MEMORY_PCPU_RESERV tunable. Commit 3cd3399dd7a8 ("net: implement per-cpu reserves for memory_allocated") introduced per-cpu forward alloc cache: "Implement a per-cpu cache of +1/-1 MB, to reduce number of changes to sk->sk_prot->memory_allocated, which would otherwise be cause of false sharing." sk_prot->memory_allocated points to global atomic variable: atomic_long_t tcp_memory_allocated ____cacheline_aligned_in_smp; If increasing the per-cpu cache size from 1MB to e.g. 16MB, changes to sk->sk_prot->memory_allocated can be further reduced. Performance may be improved on system with many cores. Signed-off-by: Adam Li <adamli@os.amperecomputing.com> Reviewed-by: Christoph Lameter (Ampere) <cl@linux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-27uapi: in6: replace temporary label with rfc9486Justin Iurman1-1/+1
Not really a fix per se, but IPV6_TLV_IOAM is still tagged as "TEMPORARY IANA allocation for IOAM", while RFC 9486 is available for some time now. Just update the reference. Fixes: 9ee11f0fff20 ("ipv6: ioam: Data plane support for Pre-allocated Trace") Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240226124921.9097-1-justin.iurman@uliege.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-27Merge tag 'mm-hotfixes-stable-2024-02-27-14-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mmLinus Torvalds1-0/+3
Pull misc fixes from Andrew Morton: "Six hotfixes. Three are cc:stable and the remainder address post-6.7 issues or aren't considered appropriate for backporting" * tag 'mm-hotfixes-stable-2024-02-27-14-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/debug_vm_pgtable: fix BUG_ON with pud advanced test mm: cachestat: fix folio read-after-free in cache walk MAINTAINERS: add memory mapping entry with reviewers mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index kasan: revert eviction of stack traces in generic mode stackdepot: use variable size records for non-evictable entries
2024-02-26ipv6: anycast: complete RCU handling of struct ifacaddr6Eric Dumazet1-2/+2
struct ifacaddr6 are already freed after RCU grace period. Add __rcu qualifier to aca_next pointer, and idev->ac_list Add relevant rcu_assign_pointer() and dereference accessors. ipv6_chk_acast_dev() no longer needs to acquire idev->lock. /proc/net/anycast6 is now purely RCU protected, it no longer acquires idev->lock. Similarly in6_dump_addrs() can use RCU protection to iterate through anycast addresses. It was relying on a mixture of RCU and RTNL but next patches will get rid of RTNL there. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240223201054.220534-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26dpll: rely on rcu for netdev_dpll_pin()Eric Dumazet2-10/+12
This fixes a possible UAF in if_nlmsg_size(), which can run without RTNL. Add rcu protection to "struct dpll_pin" Move netdev_dpll_pin() from netdevice.h to dpll.h to decrease name pollution. Note: This looks possible to no longer acquire RTNL in netdev_dpll_pin_assign() later in net-next. v2: do not force rcu_read_lock() in rtnl_dpll_pin_size() (Jiri Pirko) Fixes: 5f1842692880 ("netdev: expose DPLL pin handle for netdevice") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240223123208.3543319-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26nexthop: allow nexthop_mpath_fill_node() to be called without RTNLEric Dumazet1-1/+1
nexthop_mpath_fill_node() will be potentially called from contexts holding rcu_lock instead of RTNL. Suggested-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/all/ZdZDWVdjMaQkXBgW@shredder/ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26inet: allow ip_valid_fib_dump_req() to be called with RTNL or RCUEric Dumazet1-0/+1
Add a new field into struct fib_dump_filter, to let callers tell if they use RTNL locking or RCU. This is used in the following patch, when inet_dump_fib() no longer holds RTNL. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flagEric Dumazet2-0/+3
Similarly to RTNL_FLAG_DOIT_UNLOCKED, this new flag allows dump operations registered via rtnl_register() or rtnl_register_module() to opt-out from RTNL protection. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26ipv6: prepare inet6_fill_ifinfo() for RCU protectionEric Dumazet1-2/+4
We want to use RCU protection instead of RTNL for inet6_fill_ifinfo(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-25Merge tag 'pull-fixes.pathwalk-rcu-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds2-0/+3
Pull RCU pathwalk fixes from Al Viro: "We still have some races in filesystem methods when exposed to RCU pathwalk. This series is a result of code audit (the second round of it) and it should deal with most of that stuff. Still pending: ntfs3 ->d_hash()/->d_compare() and ceph_d_revalidate(). Up to maintainers (a note for NTFS folks - when documentation says that a method may not block, it *does* imply that blocking allocations are to be avoided. Really)" [ More explanations for people who aren't familiar with the vagaries of RCU path walking: most of it is hidden from filesystems, but if a filesystem actively participates in the low-level path walking it needs to make sure the fields involved in that walk are RCU-safe. That "actively participate in low-level path walking" includes things like having its own ->d_hash()/->d_compare() routines, or by having its own directory permission function that doesn't just use the common helpers. Having a ->d_revalidate() function will also have this issue. Note that instead of making everything RCU safe you can also choose to abort the RCU pathwalk if your operation cannot be done safely under RCU, but that obviously comes with a performance penalty. One common pattern is to allow the simple cases under RCU, and abort only if you need to do something more complicated. So not everything needs to be RCU-safe, and things like the inode etc that the VFS itself maintains obviously already are. But these fixes tend to be about properly RCU-delaying things like ->s_fs_info that are maintained by the filesystem and that got potentially released too early. - Linus ] * tag 'pull-fixes.pathwalk-rcu-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ext4_get_link(): fix breakage in RCU mode cifs_get_link(): bail out in unsafe case fuse: fix UAF in rcu pathwalks procfs: make freeing proc_fs_info rcu-delayed procfs: move dropping pde and pid from ->evict_inode() to ->free_inode() nfs: fix UAF on pathwalk running into umount nfs: make nfs_set_verifier() safe for use in RCU pathwalk afs: fix __afs_break_callback() / afs_drop_open_mmap() race hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper affs: free affs_sb_info with kfree_rcu() rcu pathwalk: prevent bogus hard errors from may_lookup() fs/super.c: don't drop ->s_user_ns until we free struct super_block itself
2024-02-25Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-0/+1
Pull vfs fixes from Al Viro: "A couple of fixes - revert of regression from this cycle and a fix for erofs failure exit breakage (had been there since way back)" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: erofs: fix handling kern_mount() failure Revert "get rid of DCACHE_GENOCIDE"
2024-02-25procfs: make freeing proc_fs_info rcu-delayedAl Viro1-0/+1
makes proc_pid_ns() safe from rcu pathwalk (put_pid_ns() is still synchronous, but that's not a problem - it does rcu-delay everything that needs to be) Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25nfs: fix UAF on pathwalk running into umountAl Viro1-0/+2
NFS ->d_revalidate(), ->permission() and ->get_link() need to access some parts of nfs_server when called in RCU mode: server->flags server->caps *(server->io_stats) and, worst of all, call server->nfs_client->rpc_ops->have_delegation (the last one - as NFS_PROTO(inode)->have_delegation()). We really don't want to RCU-delay the entire nfs_free_server() (it would have to be done with schedule_work() from RCU callback, since it can't be made to run from interrupt context), but actual freeing of nfs_server and ->io_stats can be done via call_rcu() just fine. nfs_client part is handled simply by making nfs_free_client() use kfree_rcu(). Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-24Merge tag 'iommu-fixes-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommuLinus Torvalds1-0/+3
Pull iommu fixes from Joerg Roedel: - Intel VT-d fixes for nested domain handling: - Cache invalidation for changes in a parent domain - Dirty tracking setting for parent and nested domains - Fix a constant-out-of-range warning - ARM SMMU fixes: - Fix CD allocation from atomic context when using SVA with SMMUv3 - Revert the conversion of SMMUv2 to domain_alloc_paging(), as it breaks the boot for Qualcomm MSM8996 devices - Restore SVA handle sharing in core code as it turned out there are still drivers relying on it * tag 'iommu-fixes-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/sva: Restore SVA handle sharing iommu/arm-smmu-v3: Do not use GFP_KERNEL under as spinlock iommu/vt-d: Fix constant-out-of-range warning iommu/vt-d: Set SSADE when attaching to a parent with dirty tracking iommu/vt-d: Add missing dirty tracking set for parent domain iommu/vt-d: Wrap the dirty tracking loop to be a helper iommu/vt-d: Remove domain parameter for intel_pasid_setup_dirty_tracking() iommu/vt-d: Add missing device iotlb flush for parent domain iommu/vt-d: Update iotlb in nested domain attach iommu/vt-d: Add missing iotlb flush for parent domain iommu/vt-d: Add __iommu_flush_iotlb_psi() iommu/vt-d: Track nested domains in parent Revert "iommu/arm-smmu: Convert to domain_alloc_paging()"
2024-02-24Merge tag 'cxl-fixes-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxlLinus Torvalds2-18/+2
Pull cxl fixes from Dan Williams: "A collection of significant fixes for the CXL subsystem. The largest change in this set, that bordered on "new development", is the fix for the fact that the location of the new qos_class attribute did not match the Documentation. The fix ends up deleting more code than it added, and it has a new unit test to backstop basic errors in this interface going forward. So the "red-diff" and unit test saved the "rip it out and try again" response. In contrast, the new notification path for firmware reported CXL errors (CXL CPER notifications) has a locking context bug that can not be fixed with a red-diff. Given where the release cycle stands, it is not comfortable to squeeze in that fix in these waning days. So, that receives the "back it out and try again later" treatment. There is a regression fix in the code that establishes memory NUMA nodes for platform CXL regions. That has an ack from x86 folks. There are a couple more fixups for Linux to understand (reassemble) CXL regions instantiated by platform firmware. The policy around platforms that do not match host-physical-address with system-physical-address (i.e. systems that have an address translation mechanism between the address range reported in the ACPI CEDT.CFMWS and endpoint decoders) has been softened to abort driver load rather than teardown the memory range (can cause system hangs). Lastly, there is a robustness / regression fix for cases where the driver would previously continue in the face of error, and a fixup for PCI error notification handling. Summary: - Fix NUMA initialization from ACPI CEDT.CFMWS - Fix region assembly failures due to async init order - Fix / simplify export of qos_class information - Fix cxl_acpi initialization vs single-window-init failures - Fix handling of repeated 'pci_channel_io_frozen' notifications - Workaround platforms that violate host-physical-address == system-physical address assumptions - Defer CXL CPER notification handling to v6.9" * tag 'cxl-fixes-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/acpi: Fix load failures due to single window creation failure acpi/ghes: Remove CXL CPER notifications cxl/pci: Fix disabling memory if DVSEC CXL Range does not match a CFMWS window cxl/test: Add support for qos_class checking cxl: Fix sysfs export of qos_class for memdev cxl: Remove unnecessary type cast in cxl_qos_class_verify() cxl: Change 'struct cxl_memdev_state' *_perf_list to single 'struct cxl_dpa_perf' cxl/region: Allow out of order assembly of autodiscovered regions cxl/region: Handle endpoint decoders in cxl_region_find_decoder() x86/numa: Fix the sort compare func used in numa_fill_memblks() x86/numa: Fix the address overlap check in numa_fill_memblks() cxl/pci: Skip to handle RAS errors if CXL.mem device is detached
2024-02-24Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds1-4/+1
Pull SCSI fixes from James Bottomley: "Six fixes: the four driver ones are pretty trivial. The larger two core changes are to try to fix various USB attached devices which have somewhat eccentric ways of handling the VPD and other mode pages which necessitate multiple revalidates (that were removed in the interests of efficiency) and updating the heuristic for supported VPD pages" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: jazz_esp: Only build if SCSI core is builtin scsi: smartpqi: Fix disable_managed_interrupts scsi: ufs: Uninitialized variable in ufshcd_devfreq_target() scsi: target: pscsi: Fix bio_put() for error case scsi: core: Consult supported VPD page list prior to fetching page scsi: sd: usb_storage: uas: Access media prior to querying device properties
2024-02-23genetlink: make info in GENL_REQ_ATTR_CHECK() constJakub Kicinski1-1/+1
Make the local variable in GENL_REQ_ATTR_CHECK() const. genl_info_dump() returns a const pointer, so the macro is currently hard to use in genl dumps. Link: https://lore.kernel.org/r/20240222222819.156320-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-23stackdepot: use variable size records for non-evictable entriesMarco Elver1-0/+3
With the introduction of stack depot evictions, each stack record is now fixed size, so that future reuse after an eviction can safely store differently sized stack traces. In all cases that do not make use of evictions, this wastes lots of space. Fix it by re-introducing variable size stack records (up to the max allowed size) for entries that will never be evicted. We know if an entry will never be evicted if the flag STACK_DEPOT_FLAG_GET is not provided, since a later stack_depot_put() attempt is undefined behavior. With my current kernel config that enables KASAN and also SLUB owner tracking, I observe (after a kernel boot) a whopping reduction of 296 stack depot pools, which translates into 4736 KiB saved. The savings here are from SLUB owner tracking only, because KASAN generic mode still uses refcounting. Before: pools: 893 allocations: 29841 frees: 6524 in_use: 23317 freelist_size: 3454 After: pools: 597 refcounted_allocations: 17547 refcounted_frees: 6477 refcounted_in_use: 11070 freelist_size: 3497 persistent_count: 12163 persistent_bytes: 1717008 [elver@google.com: fix -Wstringop-overflow warning] Link: https://lore.kernel.org/all/20240201135747.18eca98e@canb.auug.org.au/ Link: https://lkml.kernel.org/r/20240201090434.1762340-1-elver@google.com Link: https://lore.kernel.org/all/CABXGCsOzpRPZGg23QqJAzKnqkZPKzvieeg=W7sgjgi3q0pBo0g@mail.gmail.com/ Link: https://lkml.kernel.org/r/20240129100708.39460-1-elver@google.com Link: https://lore.kernel.org/all/CABXGCsOzpRPZGg23QqJAzKnqkZPKzvieeg=W7sgjgi3q0pBo0g@mail.gmail.com/ Fixes: 108be8def46e ("lib/stackdepot: allow users to evict stack traces") Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23net: mpls: error out if inner headers are not setFlorian Westphal1-0/+5
mpls_gso_segment() assumes skb_inner_network_header() returns a valid result: mpls_hlen = skb_inner_network_header(skb) - skb_network_header(skb); if (unlikely(!mpls_hlen || mpls_hlen % MPLS_HLEN)) goto out; if (unlikely(!pskb_may_pull(skb, mpls_hlen))) With syzbot reproducer, skb_inner_network_header() yields 0, skb_network_header() returns 108, so this will "pskb_may_pull(skb, -108)))" which triggers a newly added DEBUG_NET_WARN_ON_ONCE() check: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 5068 at include/linux/skbuff.h:2723 pskb_may_pull_reason include/linux/skbuff.h:2723 [inline] WARNING: CPU: 0 PID: 5068 at include/linux/skbuff.h:2723 pskb_may_pull include/linux/skbuff.h:2739 [inline] WARNING: CPU: 0 PID: 5068 at include/linux/skbuff.h:2723 mpls_gso_segment+0x773/0xaa0 net/mpls/mpls_gso.c:34 [..] skb_mac_gso_segment+0x383/0x740 net/core/gso.c:53 nsh_gso_segment+0x40a/0xad0 net/nsh/nsh.c:108 skb_mac_gso_segment+0x383/0x740 net/core/gso.c:53 __skb_gso_segment+0x324/0x4c0 net/core/gso.c:124 skb_gso_segment include/net/gso.h:83 [inline] [..] sch_direct_xmit+0x11a/0x5f0 net/sched/sch_generic.c:327 [..] packet_sendmsg+0x46a9/0x6130 net/packet/af_packet.c:3113 [..] First iteration of this patch made mpls_hlen signed and changed test to error out to "mpls_hlen <= 0 || ..". Eric Dumazet said: > I was thinking about adding a debug check in skb_inner_network_header() > if inner_network_header is zero (that would mean it is not 'set' yet), > but this would trigger even after your patch. So add new skb_inner_network_header_was_set() helper and use that. The syzbot reproducer injects data via packet socket. The skb that gets allocated and passed down the stack has ->protocol set to NSH (0x894f) and gso_type set to SKB_GSO_UDP | SKB_GSO_DODGY. This gets passed to skb_mac_gso_segment(), which sees NSH as ptype to find a callback for. nsh_gso_segment() retrieves next type: proto = tun_p_to_eth_p(nsh_hdr(skb)->np); ... which is MPLS (TUN_P_MPLS_UC). It updates skb->protocol and then calls mpls_gso_segment(). Inner offsets are all 0, so mpls_gso_segment() ends up with a negative header size. In case more callers rely on silent handling of such large may_pull values we could also 'legalize' this behaviour, either replacing the debug check with (len > INT_MAX) test or removing it and instead adding a comment before existing if (unlikely(len > skb->len)) return SKB_DROP_REASON_PKT_TOO_SMALL; test in pskb_may_pull_reason(), saying that this check also implicitly takes care of callers that miscompute header sizes. Cc: Simon Horman <horms@kernel.org> Fixes: 219eee9c0d16 ("net: skbuff: add overflow debug check to pull/push helpers") Reported-by: syzbot+99d15fcdb0132a1e1a82@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/00000000000043b1310611e388aa@google.com/raw Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20240222140321.14080-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-23Merge tag 'mm-hotfixes-stable-2024-02-22-15-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mmLinus Torvalds1-0/+5
Pull misc fixes from Andrew Morton: "A batch of MM (and one non-MM) hotfixes. Ten are cc:stable and the remainder address post-6.7 issues or aren't considered appropriate for backporting" * tag 'mm-hotfixes-stable-2024-02-22-15-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: kasan: guard release_free_meta() shadow access with kasan_arch_is_ready() mm/damon/lru_sort: fix quota status loss due to online tunings mm/damon/reclaim: fix quota stauts loss due to online tunings MAINTAINERS: mailmap: update Shakeel's email address mm/damon/sysfs-schemes: handle schemes sysfs dir removal before commit_schemes_quota_goals mm: memcontrol: clarify swapaccount=0 deprecation warning mm/memblock: add MEMBLOCK_RSRV_NOINIT into flagname[] array mm/zswap: invalidate duplicate entry when !zswap_enabled lib/Kconfig.debug: TEST_IOV_ITER depends on MMU mm/swap: fix race when skipping swapcache mm/swap_state: update zswap LRU's protection range with the folio locked selftests/mm: uffd-unit-test check if huge page size is 0 mm/damon/core: check apply interval in damon_do_apply_schemes() mm: zswap: fix missing folio cleanup in writeback race path
2024-02-23Merge tag 'drm-fixes-2024-02-23' of git://anongit.freedesktop.org/drm/drmLinus Torvalds2-1/+14
Pull drm fixes from Dave Airlie: "This is the weekly drm fixes. Non-drivers there is a fbdev/sparc fix, syncobj, ttm and buddy fixes. On the driver side, ivpu, meson, i915 have a small fix each. Then amdgpu and xe have a bunch. Nouveau has some minor uapi additions to give userspace some useful info along with a Kconfig change to allow the new GSP firmware paths to be used by default on the GPUs it supports. Seems about the usual amount for this time of release cycle. fbdev: - fix sparc undefined reference syncobj: - fix sync obj fence waiting - handle NULL fence in syncobj eventfd code ttm: - fix invalid free buddy: - fix list handling - fix 32-bit build meson: - don't remove bridges from other drivers nouveau: - fix build warnings - add two minor info parameters - add a Kconfig to allow GSP by default on some GPUs ivpu: - allow fw to do initial tile config i915: - fix TV mode amdgpu: - Suspend/resume fixes - Backlight error fix - DCN 3.5 fixes - Misc fixes xe: - Remove support for persistent exec_queues - Drop a reduntant sysfs newline printout - A three-patch fix for a VM_BIND rebind optimization path - Fix a modpost warning on an xe KUNIT module" * tag 'drm-fixes-2024-02-23' of git://anongit.freedesktop.org/drm/drm: (27 commits) nouveau: add an ioctl to report vram usage nouveau: add an ioctl to return vram bar size. nouveau/gsp: add kconfig option to enable GSP paths by default drm/amdgpu: Fix the runtime resume failure issue drm/amd/display: fix null-pointer dereference on edid reading drm/amd/display: Fix memory leak in dm_sw_fini() drm/amd/display: fix input states translation error for dcn35 & dcn351 drm/amd/display: Fix potential null pointer dereference in dc_dmub_srv drm/amd/display: Only allow dig mapping to pwrseq in new asic drm/amd/display: adjust few initialization order in dm drm/syncobj: handle NULL fence in syncobj_eventfd_entry_func drm/syncobj: call drm_syncobj_fence_add_wait when WAIT_AVAILABLE flag is set drm/ttm: Fix an invalid freeing on already freed page in error path sparc: Fix undefined reference to fb_is_primary_device drm/xe: Fix modpost warning on xe_mocs kunit module drm/xe/xe_gt_idle: Drop redundant newline in name drm/xe: Return 2MB page size for compact 64k PTEs drm/xe: Add XE_VMA_PTE_64K VMA flag drm/xe: Fix xe_vma_set_pte_size drm/xe/uapi: Remove support for persistent exec_queues ...
2024-02-23iommu/sva: Restore SVA handle sharingJason Gunthorpe1-0/+3
Prior to commit 092edaddb660 ("iommu: Support mm PASID 1:n with sva domains") the code allowed a SVA handle to be bound multiple times to the same (mm, device) pair. This was alluded to in the kdoc comment, but we had understood this to be more a remark about allowing multiple devices, not a literal same-driver re-opening the same SVA. It turns out uacce and idxd were both relying on the core code to handle reference counting for same-device same-mm scenarios. As this looks hard to resolve in the drivers bring it back to the core code. The new design has changed the meaning of the domain->users refcount to refer to the number of devices that are sharing that domain for the same mm. This is part of the design to lift the SVA domain de-duplication out of the drivers. Return the old behavior by explicitly de-duplicating the struct iommu_sva handle. The same (mm, device) will return the same handle pointer and the core code will handle tracking this. The last unbind of the handle will destroy it. Fixes: 092edaddb660 ("iommu: Support mm PASID 1:n with sva domains") Reported-by: Zhangfei Gao <zhangfei.gao@linaro.org> Closes: https://lore.kernel.org/all/20240221110658.529-1-zhangfei.gao@linaro.org/ Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/0-v1-9455fc497a6f+3b4-iommu_sva_sharing_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-02-22net: mctp: take ownership of skb in mctp_local_outputJeremy Kerr1-0/+1
Currently, mctp_local_output only takes ownership of skb on success, and we may leak an skb if mctp_local_output fails in specific states; the skb ownership isn't transferred until the actual output routing occurs. Instead, make mctp_local_output free the skb on all error paths up to the route action, so it always consumes the passed skb. Fixes: 833ef3b91de6 ("mctp: Populate socket implementation") Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240220081053.1439104-1-jk@codeconstruct.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22Merge tag 'nf-next-24-02-21' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-nextJakub Kicinski2-2/+0
Florian Westphal says: ==================== netfilter updates for net-next 1. Prefer KMEM_CACHE() macro to create kmem caches, from Kunwu Chan. Patches 2 and 3 consolidate nf_log NULL checks and introduces extra boundary checks on family and type to make it clear that no out of bounds access will happen. No in-tree user currently passes such values, but thats not clear from looking at the function. From Pablo Neira Ayuso. Patch 4, also from Pablo, gets rid of unneeded conditional in nft_osf init function. Patch 5, from myself, fixes erroneous Kconfig dependencies that came in an earlier net-next pull request. This should get rid of the xtables related build failure reports. Patches 6 to 10 are an update to nftables' concatenated-ranges set type to speed up element insertions. This series also compacts a few data structures and cleans up a few oddities such as reliance on ZERO_SIZE_PTR when asking to allocate a set with no elements. From myself. Patches 11 moves the nf_reinject function from the netfilter core (vmlinux) into the nfnetlink_queue backend, the only location where this is called from. Also from myself. Patch 12, from Kees Cook, switches xtables' compat layer to use unsafe_memcpy because xt_entry_target cannot easily get converted to a real flexible array (its UAPI and used inside other structs). * tag 'nf-next-24-02-21' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: x_tables: Use unsafe_memcpy() for 0-sized destination netfilter: move nf_reinject into nfnetlink_queue modules netfilter: nft_set_pipapo: use GFP_KERNEL for insertions netfilter: nft_set_pipapo: speed up bulk element insertions netfilter: nft_set_pipapo: shrink data structures netfilter: nft_set_pipapo: do not rely on ZERO_SIZE_PTR netfilter: nft_set_pipapo: constify lookup fn args where possible netfilter: xtables: fix up kconfig dependencies netfilter: nft_osf: simplify init path netfilter: nf_log: validate nf_logger_find_get() netfilter: nf_log: consolidate check for NULL logger in lookup function netfilter: expect: Simplify the allocation of slab caches in nf_conntrack_expect_init ==================== Link: https://lore.kernel.org/r/20240221112637.5396-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-23nouveau: add an ioctl to report vram usageDave Airlie1-0/+7
This reports the currently used vram allocations. userspace using this has been proposed for nvk, but it's a rather trivial uapi addition. Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2024-02-23nouveau: add an ioctl to return vram bar size.Dave Airlie1-0/+7
This returns the BAR resources size so userspace can make decisions based on rebar support. userspace using this has been proposed for nvk, but it's a rather trivial uapi addition. Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2024-02-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski15-23/+76
Cross-merge networking fixes after downstream PR. Conflicts: net/ipv4/udp.c f796feabb9f5 ("udp: add local "peek offset enabled" flag") 56667da7399e ("net: implement lockless setsockopt(SO_PEEK_OFF)") Adjacent changes: net/unix/garbage.c aa82ac51d633 ("af_unix: Drop oob_skb ref before purging queue in GC.") 11498715f266 ("af_unix: Remove io_uring code for GC.") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22Merge tag 'wireless-next-2024-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-nextJakub Kicinski4-7/+58
Kalle Valo says: ==================== wireless-next patches for v6.9 The third "new features" pull request for v6.9. This is a quick followup to send commit 04edb5dc68f4 ("wifi: ath12k: Fix uninitialized use of ret in ath12k_mac_allocate()") to fix the ath12k clang warning introduced in the previous pull request. We also have support for QCA2066 in ath11k, several new features in ath12k and few other changes in drivers. In stack it's mostly cleanup and refactoring. Major changes: ath12k * firmware-2.bin support * support having multiple identical PCI devices (firmware needs to have ATH12K_FW_FEATURE_MULTI_QRTR_ID) * QCN9274: support split-PHY devices * WCN7850: enable Power Save Mode in station mode * WCN7850: P2P support ath11k: * QCA6390 & WCN6855: support 2 concurrent station interfaces * QCA2066 support iwlwifi * mvm: support wider-bandwidth OFDMA * bump firmware API to 90 for BZ/SC devices brcmfmac * DMI nvram filename quirk for ACEPC W5 Pro * tag 'wireless-next-2024-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (75 commits) wifi: wilc1000: revert reset line logic flip wifi: brcmfmac: Add DMI nvram filename quirk for ACEPC W5 Pro wifi: rtlwifi: set initial values for unexpected cases of USB endpoint priority wifi: rtl8xxxu: check vif before using in rtl8xxxu_tx() wifi: rtlwifi: rtl8192cu: Fix TX aggregation wifi: wilc1000: remove AKM suite be32 conversion for external auth request wifi: nl80211: refactor parsing CSA offsets wifi: nl80211: force WLAN_AKM_SUITE_SAE in big endian in NL80211_CMD_EXTERNAL_AUTH wifi: iwlwifi: load b0 version of ucode for HR1/HR2 wifi: iwlwifi: handle per-phy statistics from fw wifi: iwlwifi: iwl-fh.h: fix kernel-doc issues wifi: iwlwifi: api: fix kernel-doc reference wifi: iwlwifi: mvm: unlock mvm if there is no primary link wifi: iwlwifi: bump FW API to 90 for BZ/SC devices wifi: iwlwifi: mvm: support PHY context version 6 wifi: iwlwifi: mvm: partially support PHY context version 6 wifi: iwlwifi: mvm: support wider-bandwidth OFDMA wifi: cfg80211: use ML element parsing helpers wifi: mac80211: align ieee80211_mle_get_bss_param_ch_cnt() wifi: cfg80211: refactor RNR parsing ... ==================== Link: https://lore.kernel.org/r/20240222105205.CEC54C433F1@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22Merge tag 'vfs-6.8-rc6.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds1-0/+2
Pull vfs fixes from Christian Brauner: - Fix a memory leak in cachefiles - Restrict aio cancellations to I/O submitted through the aio interfaces as this is otherwise causing issues for I/O submitted via io_uring - Increase buffer for afs volume status to avoid overflow - Fix a missing zero-length check in unbuffered writes in the netfs library. If generic_write_checks() returns zero make netfs_unbuffered_write_iter() return right away - Prevent a leak in i_dio_count caused by netfs_begin_read() operating past i_size. It will return early and leave i_dio_count incremented - Account for ipv4 addresses as well as ipv6 addresses when processing incoming callbacks in afs * tag 'vfs-6.8-rc6.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs/aio: Restrict kiocb_set_cancel_fn() to I/O submitted via libaio afs: Increase buffer size in afs_update_volume_status() afs: Fix ignored callbacks over ipv4 cachefiles: fix memory leak in cachefiles_add_cache() netfs: Fix missing zero-length check in unbuffered write netfs: Fix i_dio_count leak on DIO read past i_size
2024-02-22Merge tag 'net-6.8.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds3-2/+5
Pull networking fixes from Paolo Abeni: "Including fixes from bpf and netfilter. Current release - regressions: - af_unix: fix another unix GC hangup Previous releases - regressions: - core: fix a possible AF_UNIX deadlock - bpf: fix NULL pointer dereference in sk_psock_verdict_data_ready() - netfilter: nft_flow_offload: release dst in case direct xmit path is used - bridge: switchdev: ensure MDB events are delivered exactly once - l2tp: pass correct message length to ip6_append_data - dccp/tcp: unhash sk from ehash for tb2 alloc failure after check_estalblished() - tls: fixes for record type handling with PEEK - devlink: fix possible use-after-free and memory leaks in devlink_init() Previous releases - always broken: - bpf: fix an oops when attempting to read the vsyscall page through bpf_probe_read_kernel - sched: act_mirred: use the backlog for mirred ingress - netfilter: nft_flow_offload: fix dst refcount underflow - ipv6: sr: fix possible use-after-free and null-ptr-deref - mptcp: fix several data races - phonet: take correct lock to peek at the RX queue Misc: - handful of fixes and reliability improvements for selftests" * tag 'net-6.8.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits) l2tp: pass correct message length to ip6_append_data net: phy: realtek: Fix rtl8211f_config_init() for RTL8211F(D)(I)-VD-CG PHY selftests: ioam: refactoring to align with the fix Fix write to cloned skb in ipv6_hop_ioam() phonet/pep: fix racy skb_queue_empty() use phonet: take correct lock to peek at the RX queue net: sparx5: Add spinlock for frame transmission from CPU net/sched: flower: Add lock protection when remove filter handle devlink: fix port dump cmd type net: stmmac: Fix EST offset for dwmac 5.10 tools: ynl: don't leak mcast_groups on init error tools: ynl: make sure we always pass yarg to mnl_cb_run net: mctp: put sock on tag allocation failure netfilter: nf_tables: use kzalloc for hook allocation netfilter: nf_tables: register hooks last when adding new chain/flowtable netfilter: nft_flow_offload: release dst in case direct xmit path is used netfilter: nft_flow_offload: reset dst in route object after setting up flow netfilter: nf_tables: set dormant flag on hook register failure selftests: tls: add test for peeking past a record of a different type selftests: tls: add test for merging of same-type control messages ...
2024-02-22net: mctp: provide a more specific tag allocation ioctlJeremy Kerr1-0/+32
Now that we have net-specific tags, extend the tag allocation ioctls (SIOCMCTPALLOCTAG / SIOCMCTPDROPTAG) to allow a network parameter to be passed to the tag allocation. We also add a local_addr member to the ioc struct, to allow for a future finer-grained tag allocation using local EIDs too. We don't add any specific support for that now though, so require MCTP_ADDR_ANY or MCTP_ADDR_NULL for those at present. The old ioctls will still work, but allocate for the default MCTP net. These are now marked as deprecated in the header. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22net: mctp: separate key correlation across netsJeremy Kerr1-0/+2
Currently, we lookup sk_keys from the entire struct net_namespace, which may contain multiple MCTP net IDs. In those cases we want to distinguish between endpoints with the same EID but different net ID. Add the net ID data to the struct mctp_sk_key, populate on add and filter on this during route lookup. For the ioctl interface, we use a default net of MCTP_INITIAL_DEFAULT_NET (ie., what will be in use for single-net configurations), but we'll extend the ioctl interface to provide net-specific tag allocation in an upcoming change. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22net: mctp: avoid confusion over local/peer dest/source addressesJeremy Kerr1-2/+2
We have a double-swap of local and peer addresses in mctp_alloc_local_tag; the arguments in both call sites are swapped, but there is also a swap in the implementation of alloc_local_tag. This is opaque because we're using source/dest address references, which don't match the local/peer semantics. Avoid this confusion by naming the arguments as 'local' and 'peer', and remove the double swap. The calling order now matches mctp_key_alloc. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-21udp: add local "peek offset enabled" flagPaolo Abeni1-0/+10
We want to re-organize the struct sock layout. The sk_peek_off field location is problematic, as most protocols want it in the RX read area, while UDP wants it on a cacheline different from sk_receive_queue. Create a local (inside udp_sock) copy of the 'peek offset is enabled' flag and place it inside the same cacheline of reader_queue. Check such flag before reading sk_peek_off. This will save potential false sharing and cache misses in the fast-path. Tested under UDP flood with small packets. The struct sock layout update causes a 4% performance drop, and this patch restores completely the original tput. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/67ab679c15fbf49fa05b3ffe05d91c47ab84f147.1708426665.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22netfilter: nft_flow_offload: reset dst in route object after setting up flowPablo Neira Ayuso1-1/+1
dst is transferred to the flow object, route object does not own it anymore. Reset dst in route object, otherwise if flow_offload_add() fails, error path releases dst twice, leading to a refcount underflow. Fixes: a3c90f7a2323 ("netfilter: nf_tables: flow offload expression") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-21net: phy: marvell-88q2xxx: add driver for the Marvell 88Q2220 PHYDimitri Fedrau1-0/+1
Add a driver for the Marvell 88Q2220. This driver allows to detect the link, switch between 100BASE-T1 and 1000BASE-T1 and switch between master and slave mode. Autonegotiation is supported. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Gregor Herburger <gregor.herburger@ew.tq-group.com> Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com> Link: https://lore.kernel.org/r/20240218075753.18067-6-dima.fedrau@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>