wireguard-linux - WireGuard for the Linux kernel

Age	Commit message (Collapse)	Author	Files	Lines
2025-01-29	block: get rid of request queue ->sysfs_dir_lock	Nilay Shroff	5	-31/+5
	The request queue uses ->sysfs_dir_lock for protecting the addition/ deletion of kobject entries under sysfs while we register/unregister blk-mq. However kobject addition/deletion is already protected with kernfs/sysfs internal synchronization primitives. So use of q->sysfs_ dir_lock seems redundant. Moreover, q->sysfs_dir_lock is also used at few other callsites along with q->sysfs_lock for protecting the addition/deletion of kojects. One such example is when we register with sysfs a set of independent access ranges for a disk. Here as well we could get rid off q->sysfs_ dir_lock and only use q->sysfs_lock. The only variable which q->sysfs_dir_lock appears to protect is q-> mq_sysfs_init_done which is set/unset while registering/unregistering blk-mq with sysfs. But use of q->mq_sysfs_init_done could be easily replaced using queue registered bit QUEUE_FLAG_REGISTERED. So with this patch we remove q->sysfs_dir_lock from each callsite and replace q->mq_sysfs_init_done using QUEUE_FLAG_REGISTERED. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20250128143436.874357-2-nilay@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-27	loop: don't clear LO_FLAGS_PARTSCAN on LOOP_SET_STATUS{,64}	Christoph Hellwig	1	-2/+1
	LOOP_SET_STATUS{,64} can set a lot more flags than it is supposed to clear (the LOOP_SET_STATUS_CLEARABLE_FLAGS vs LOOP_SET_STATUS_SETTABLE_FLAGS defines should have been a hint..). Fix this by only clearing the bits in LOOP_SET_STATUS_CLEARABLE_FLAGS. Fixes: ae074d07a0e5 ("loop: move updating lo_flag s out of loop_set_status_from_info") Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250127143045.538279-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-24	md/md-bitmap: Synchronize bitmap_get_stats() with bitmap lifetime	Yu Kuai	2	-1/+9
	After commit ec6bb299c7c3 ("md/md-bitmap: add 'sync_size' into struct md_bitmap_stats"), following panic is reported: Oops: general protection fault, probably for non-canonical address RIP: 0010:bitmap_get_stats+0x2b/0xa0 Call Trace: <TASK> md_seq_show+0x2d2/0x5b0 seq_read_iter+0x2b9/0x470 seq_read+0x12f/0x180 proc_reg_read+0x57/0xb0 vfs_read+0xf6/0x380 ksys_read+0x6c/0xf0 do_syscall_64+0x82/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Root cause is that bitmap_get_stats() can be called at anytime if mddev is still there, even if bitmap is destroyed, or not fully initialized. Deferenceing bitmap in this case can crash the kernel. Meanwhile, the above commit start to deferencing bitmap->storage, make the problem easier to trigger. Fix the problem by protecting bitmap_get_stats() with bitmap_info.mutex. Cc: stable@vger.kernel.org # v6.12+ Fixes: 32a7627cf3a3 ("[PATCH] md: optimised resync using Bitmap based intent logging") Reported-and-tested-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Closes: https://lore.kernel.org/linux-raid/ca3a91a2-50ae-4f68-b317-abd9889f3907@oracle.com/T/#m6e5086c95201135e4941fe38f9efa76daf9666c5 Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20250124092055.4050195-1-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org>
2025-01-23	blk-mq: create correct map for fallback case	Daniel Wagner	1	-2/+1
	The fallback code in blk_mq_map_hw_queues is original from blk_mq_pci_map_queues and was added to handle the case where pci_irq_get_affinity will return NULL for !SMP configuration. blk_mq_map_hw_queues replaces besides blk_mq_pci_map_queues also blk_mq_virtio_map_queues which used to use blk_mq_map_queues for the fallback. It's possible to use blk_mq_map_queues for both cases though. blk_mq_map_queues creates the same map as blk_mq_clear_mq_map for !SMP that is CPU 0 will be mapped to hctx 0. The WARN_ON_ONCE has to be dropped for virtio as the fallback is also taken for certain configuration on default. Though there is still a WARN_ON_ONCE check in lib/group_cpus.c: WARN_ON(nr_present + nr_others < numgrps); which will trigger if the caller tries to create more hardware queues than CPUs. It tests the same as the WARN_ON_ONCE in blk_mq_pci_map_queues did. Fixes: a5665c3d150c ("virtio: blk/scsi: replace blk_mq_virtio_map_queues with blk_mq_map_hw_queues") Reported-by: Steven Rostedt <rostedt@goodmis.org> Closes: https://lore.kernel.org/all/20250122093020.6e8a4e5b@gandalf.local.home/ Signed-off-by: Daniel Wagner <wagi@kernel.org> Link: https://lore.kernel.org/r/20250123-fix-blk_mq_map_hw_queues-v1-1-08dbd01f2c39@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-23	block: don't revert iter for -EIOCBQUEUED	Jens Axboe	1	-2/+3
	blkdev_read_iter() has a few odd checks, like gating the position and count adjustment on whether or not the result is bigger-than-or-equal to zero (where bigger than makes more sense), and not checking the return value of blkdev_direct_IO() before doing an iov_iter_revert(). The latter can lead to attempting to revert with a negative value, which when passed to iov_iter_revert() as an unsigned value will lead to throwing a WARN_ON() because unroll is bigger than MAX_RW_COUNT. Be sane and don't revert for -EIOCBQUEUED, like what is done in other spots. Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-21	cachestat: fix page cache statistics permission checking	Linus Torvalds	1	-0/+17
	When the 'cachestat()' system call was added in commit cf264e1329fb ("cachestat: implement cachestat syscall"), it was meant to be a much more convenient (and performant) version of mincore() that didn't need mapping things into the user virtual address space in order to work. But it ended up missing the "check for writability or ownership" fix for mincore(), done in commit 134fca9063ad ("mm/mincore.c: make mincore() more conservative"). This just adds equivalent logic to 'cachestat()', modified for the file context (rather than vma). Reported-by: Sudheendra Raghav Neela <sneela@tugraz.at> Fixes: cf264e1329fb ("cachestat: implement cachestat syscall") Tested-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Nhat Pham <nphamcs@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-01-21	MAINTAINERS: ipmi: update my email address	Corey Minyard	1	-1/+1
	Old email still works, and will indefinitely, but I'm switching to a new one. Signed-off-by: Corey Minyard <corey@minyard.net>
2025-01-21	rseq: Fix rseq unregistration regression	Mathieu Desnoyers	1	-1/+1
	A logic inversion in rseq_reset_rseq_cpu_node_id() causes the rseq unregistration to fail when rseq_validate_ro_fields() succeeds rather than the opposite. This affects both CONFIG_DEBUG_RSEQ=y and CONFIG_DEBUG_RSEQ=n. Fixes: 7d5265ffcd8b ("rseq: Validate read-only fields under DEBUG_RSEQ config") Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250116205956.836074-1-mathieu.desnoyers@efficios.com
2025-01-20	Input: synaptics - fix crash when enabling pass-through port	Dmitry Torokhov	2	-14/+43
	When enabling a pass-through port an interrupt might come before psmouse driver binds to the pass-through port. However synaptics sub-driver tries to access psmouse instance presumably associated with the pass-through port to figure out if only 1 byte of response or entire protocol packet needs to be forwarded to the pass-through port and may crash if psmouse instance has not been attached to the port yet. Fix the crash by introducing open() and close() methods for the port and check if the port is open before trying to access psmouse instance. Because psmouse calls serio_open() only after attaching psmouse instance to serio port instance this prevents the potential crash. Reported-by: Takashi Iwai <tiwai@suse.de> Fixes: 100e16959c3c ("Input: libps2 - attach ps2dev instances as serio port's drvdata") Link: https://bugzilla.suse.com/show_bug.cgi?id=1219522 Cc: stable@vger.kernel.org Reviewed-by: Takashi Iwai <tiwai@suse.de> Link: https://lore.kernel.org/r/Z4qSHORvPn7EU2j1@google.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2025-01-20	Input: atkbd - map F23 key to support default copilot shortcut	Mark Pearson	1	-1/+1
	Microsoft defined Meta+Shift+F23 as the Copilot shortcut instead of a dedicated keycode, and multiple vendors have their keyboards emit this sequence in response to users pressing a dedicated "Copilot" key. Unfortunately the default keymap table in atkbd does not map scancode 0x6e (F23) and so the key combination does not work even if userspace is ready to handle it. Because this behavior is common between multiple vendors and the scancode is currently unused map 0x6e to keycode 193 (KEY_F23) so that key sequence is generated properly. MS documentation for the scan code: https://learn.microsoft.com/en-us/windows/win32/inputdev/about-keyboard-input#scan-codes Confirmed on Lenovo, HP and Dell machines by Canonical. Tested on Lenovo T14s G6 AMD. Signed-off-by: Mark Pearson <mpearson-lenovo@squebb.ca> Link: https://lore.kernel.org/r/20250107034554.25843-1-mpearson-lenovo@squebb.ca Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2025-01-20	net/rose: prevent integer overflows in rose_setsockopt()	Nikita Zhandarovich	1	-8/+8
	In case of possible unpredictably large arguments passed to rose_setsockopt() and multiplied by extra values on top of that, integer overflows may occur. Do the safest minimum and fix these issues by checking the contents of 'opt' and returning -EINVAL if they are too large. Also, switch to unsigned int and remove useless check for negative 'opt' in ROSE_IDLE case. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Link: https://patch.msgid.link/20250115164220.19954-1-n.zhandarovich@fintech.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	net: phylink: fix regression when binding a PHY	Russell King (Oracle)	1	-1/+5
	Some PHYs don't support clause 45 access, and return -EOPNOTSUPP from phy_modify_mmd(), which causes phylink_bringup_phy() to fail. Prevent this failure by allowing -EOPNOTSUPP to also mean success. Reported-by: Jiawen Wu <jiawenwu@trustnetic.com> Tested-by: Jiawen Wu <jiawenwu@trustnetic.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/E1tZp1a-001V62-DT@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	net: ethernet: ti: am65-cpsw: streamline TX queue creation and cleanup	Roger Quadros	1	-46/+77
	Introduce am65_cpsw_create_txqs() and am65_cpsw_destroy_txqs() and use them. Signed-off-by: Roger Quadros <rogerq@kernel.org> Link: https://patch.msgid.link/20250117-am65-cpsw-streamline-v2-3-91a29c97e569@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	net: ethernet: ti: am65-cpsw: streamline RX queue creation and cleanup	Roger Quadros	1	-124/+119
	Introduce am65_cpsw_create_rxqs() and am65_cpsw_destroy_rxqs() and use them. Signed-off-by: Roger Quadros <rogerq@kernel.org> Link: https://patch.msgid.link/20250117-am65-cpsw-streamline-v2-2-91a29c97e569@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	net: ethernet: ti: am65-cpsw: ensure proper channel cleanup in error path	Roger Quadros	1	-23/+44
	We are missing netif_napi_del() and am65_cpsw_nuss_free_tx/rx_chns() in error path when am65_cpsw_nuss_init_tx/rx_chns() is used anywhere other than at probe(). i.e. am65_cpsw_nuss_update_tx_rx_chns and am65_cpsw_nuss_resume() As reported, in am65_cpsw_nuss_update_tx_rx_chns(), if am65_cpsw_nuss_init_tx_chns() partially fails then devm_add_action(dev, am65_cpsw_nuss_free_tx_chns,..) is added but the cleanup via am65_cpsw_nuss_free_tx_chns() will not run. Same issue exists for am65_cpsw_nuss_init_tx/rx_chns() failures in am65_cpsw_nuss_resume() as well. This would otherwise require more instances of devm_add/remove_action and is clearly more of a distraction than any benefit. So, drop devm_add/remove_action for am65_cpsw_nuss_free_tx/rx_chns() and call am65_cpsw_nuss_free_tx/rx_chns() and netif_napi_del() where required. Reported-by: Siddharth Vadapalli <s-vadapalli@ti.com> Closes: https://lore.kernel.org/all/m4rhkzcr7dlylxr54udyt6lal5s2q4krrvmyay6gzgzhcu4q2c@r34snfumzqxy/ Signed-off-by: Roger Quadros <rogerq@kernel.org> Link: https://patch.msgid.link/20250117-am65-cpsw-streamline-v2-1-91a29c97e569@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	ipv6: Convert inet6_rtm_deladdr() to per-netns RTNL.	Kuniyuki Iwashima	1	-4/+8
	Let's register inet6_rtm_deladdr() with RTNL_FLAG_DOIT_PERNET and hold rtnl_net_lock() before inet6_addr_del(). Now that inet6_addr_del() is always called under per-netns RTNL. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250115080608.28127-12-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	ipv6: Convert inet6_rtm_newaddr() to per-netns RTNL.	Kuniyuki Iwashima	1	-8/+17
	Let's register inet6_rtm_newaddr() with RTNL_FLAG_DOIT_PERNET and hold rtnl_net_lock() before __dev_get_by_index(). Now that inet6_addr_add() and inet6_addr_modify() are always called under per-netns RTNL. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250115080608.28127-11-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	ipv6: Move lifetime validation to inet6_rtm_newaddr().	Kuniyuki Iwashima	1	-58/+35
	inet6_addr_add() and inet6_addr_modify() have the same code to validate IPv6 lifetime that is done under RTNL. Let's factorise it out to inet6_rtm_newaddr() so that we can validate the lifetime without RTNL later. Note that inet6_addr_add() is called from addrconf_add_ifaddr(), but the lifetime is INFINITY_LIFE_TIME in the path, so expires and flags are 0. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250115080608.28127-10-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	ipv6: Set cfg.ifa_flags before device lookup in inet6_rtm_newaddr().	Kuniyuki Iwashima	1	-7/+7
	We will convert inet6_rtm_newaddr() to per-netns RTNL. Except for IFA_F_OPTIMISTIC, cfg.ifa_flags can be set before __dev_get_by_index(). Let's move ifa_flags setup before __dev_get_by_index() so that we can set ifa_flags without RTNL. Also, now it's moved before tb[IFA_CACHEINFO] in preparing for the next patch. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250115080608.28127-9-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	ipv6: Pass dev to inet6_addr_add().	Kuniyuki Iwashima	1	-10/+10
	inet6_addr_add() is called from inet6_rtm_newaddr() and addrconf_add_ifaddr(). inet6_addr_add() looks up dev by __dev_get_by_index(), but it's already done in inet6_rtm_newaddr(). Let's move the 2nd lookup to addrconf_add_ifaddr() and pass dev to inet6_addr_add(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250115080608.28127-8-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	ipv6: Convert inet6_ioctl() to per-netns RTNL.	Kuniyuki Iwashima	1	-6/+6
	These functions are called from inet6_ioctl() with a socket's netns and hold RTNL. * SIOCSIFADDR : addrconf_add_ifaddr() * SIOCDIFADDR : addrconf_del_ifaddr() * SIOCSIFDSTADDR : addrconf_set_dstaddr() Let's use rtnl_net_lock(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250115080608.28127-7-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	ipv6: Hold rtnl_net_lock() in addrconf_init() and addrconf_cleanup().	Kuniyuki Iwashima	1	-5/+5
	addrconf_init() holds RTNL for blackhole_netdev, which is the global device in init_net. addrconf_cleanup() holds RTNL to clean up devices in init_net too. Let's use rtnl_net_lock(&init_net) there. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250115080608.28127-6-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	ipv6: Hold rtnl_net_lock() in addrconf_dad_work().	Kuniyuki Iwashima	1	-3/+6
	addrconf_dad_work() is per-address work and holds RTNL internally. We can fetch netns as dev_net(ifp->idev->dev). Let's use rtnl_net_lock(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250115080608.28127-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	ipv6: Hold rtnl_net_lock() in addrconf_verify_work().	Kuniyuki Iwashima	1	-2/+2
	addrconf_verify_work() is per-netns work to call addrconf_verify_rtnl() under RTNL. Let's use rtnl_net_lock(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250115080608.28127-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	ipv6: Convert net.ipv6.conf.${DEV}.XXX sysctl to per-netns RTNL.	Kuniyuki Iwashima	1	-30/+30
	net.ipv6.conf.${DEV}.XXX sysctl are changed under RTNL: * forwarding * ignore_routes_with_linkdown * disable_ipv6 * proxy_ndp * addr_gen_mode * stable_secret * disable_policy Let's use rtnl_net_lock() there. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250115080608.28127-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	ipv6: Add __in6_dev_get_rtnl_net().	Kuniyuki Iwashima	1	-0/+5
	We will convert rtnl_lock() with rtnl_net_lock(), and we want to convert __in6_dev_get() too. __in6_dev_get() uses rcu_dereference_rtnl(), but as written in its comment, rtnl_dereference() or rcu_dereference() is preferable. Let's add __in6_dev_get_rtnl_net() that uses rtnl_net_dereference(). We can add the RCU version helper later if needed. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250115080608.28127-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	net: stmmac: Drop redundant skb_mark_for_recycle() for SKB frags	Furong Xu	1	-6/+0
	After commit df542f669307 ("net: stmmac: Switch to zero-copy in non-XDP RX path"), SKBs are always marked for recycle, it is redundant to mark SKBs more than once when new frags are appended. Signed-off-by: Furong Xu <0x1207@gmail.com> Link: https://patch.msgid.link/20250117062805.192393-1-0x1207@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	net: mii: Fix the Speed display when the network cable is not connected	Xiangqian Zhang	1	-0/+3
	Two different models of usb card, the drivers are r8152 and asix. If no network cable is connected, Speed = 10Mb/s. This problem is repeated in linux 3.10, 4.19, 5.4, 6.12. This problem also exists on the latest kernel. Both drivers call mii_ethtool_get_link_ksettings, but the value of cmd->base.speed in this function can only be SPEED_1000 or SPEED_100 or SPEED_10. When the network cable is not connected, set cmd->base.speed =SPEED_UNKNOWN. Signed-off-by: Xiangqian Zhang <zhangxiangqian@kylinos.cn> Link: https://patch.msgid.link/20250117094603.4192594-1-zhangxiangqian@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	sysctl net: Remove macro checks for CONFIG_SYSCTL	Denis Kirjanov	2	-8/+0
	Since dccp and llc makefiles already check sysctl code compilation with xxx-$(CONFIG_SYSCTL) we can drop the checks Signed-off-by: Denis Kirjanov <kirjanov@gmail.com> Link: https://patch.msgid.link/20250119134254.19250-1-kirjanov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	eth: bnxt: update header sizing defaults	Jakub Kicinski	1	-1/+6
	300-400B RPC requests are fairly common. With the current default of 256B HDS threshold bnxt ends up splitting those, lowering PCIe bandwidth efficiency and increasing the number of memory allocation. Increase the HDS threshold to fit 4 buffers in a 4k page. This works out to 640B as the threshold on a typical kernel confing. This change increases the performance for a microbenchmark which receives 400B RPCs and sends empty responses by 4.5%. Admittedly this is just a single benchmark, but 256B works out to just 6 (so 2 more) packets per head page, because shinfo size dominates the headers. Now that we use page pool for the header pages I was also tempted to default rx_copybreak to 0, but in synthetic testing the copybreak size doesn't seem to make much difference. Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250119020518.1962249-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	eth: bnxt: allocate enough buffer space to meet HDS threshold	Jakub Kicinski	1	-3/+4
	Now that we can configure HDS threshold separately from the rx_copybreak HDS threshold may be higher than rx_copybreak. We need to make sure that we have enough space for the headers. Fixes: 6b43673a25c3 ("bnxt_en: add support for hds-thresh ethtool command") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250119020518.1962249-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	net: ethtool: populate the default HDS params in the core	Jakub Kicinski	3	-6/+4
	The core has the current HDS config, it can pre-populate the values for the drivers. While at it, remove the zero-setting in netdevsim. Zero are the default values since the config is zalloc'ed. Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250119020518.1962249-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	eth: bnxt: apply hds_thrs settings correctly	Jakub Kicinski	1	-1/+1
	Use the pending config for hds_thrs. Core will only update the "current" one after we return success. Without this change 2 reconfigs would be required for the setting to reach the device. Fixes: 6b43673a25c3 ("bnxt_en: add support for hds-thresh ethtool command") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250119020518.1962249-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	net: provide pending ring configuration in net_device	Jakub Kicinski	4	-8/+29
	Record the pending configuration in net_device struct. ethtool core duplicates the current config and the specific handlers (for now just ringparam) can modify it. Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250119020518.1962249-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	net: ethtool: store netdev in a temp variable in ethnl_default_set_doit()	Jakub Kicinski	1	-3/+6
	For ease of review of the next patch store the dev pointer on the stack, instead of referring to req_info.dev every time. No functional changes. Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250119020518.1962249-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	net: move HDS config from ethtool state	Jakub Kicinski	10	-24/+43
	Separate the HDS config from the ethtool state struct. The HDS config contains just simple parameters, not state. Having it as a separate struct will make it easier to clone / copy and also long term potentially make it per-queue. Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250119020518.1962249-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	af_unix: Use consume_skb() in connect() and sendmsg().	Kuniyuki Iwashima	1	-14/+14
	This is based on Donald Hunter's patch. These functions could fail for various reasons, sometimes triggering kfree_skb(). * unix_stream_connect() : connect() * unix_stream_sendmsg() : sendmsg() * queue_oob() : sendmsg(MSG_OOB) * unix_dgram_sendmsg() : sendmsg() Such kfree_skb() is tied to the errno of connect() and sendmsg(), and we need not define skb drop reasons. Let's use consume_skb() not to churn kfree_skb() events. Link: https://lore.kernel.org/netdev/eb30b164-7f86-46bf-a5d3-0f8bda5e9398@redhat.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250116053441.5758-10-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	af_unix: Reuse out_pipe label in unix_stream_sendmsg().	Kuniyuki Iwashima	1	-14/+9
	This is a follow-up of commit d460b04bc452 ("af_unix: Clean up error paths in unix_stream_sendmsg()."). If we initialise skb with NULL in unix_stream_sendmsg(), we can reuse the existing out_pipe label for the SEND_SHUTDOWN check. Let's rename it and adjust the existing label as out_pipe_lock. While at it, size and data_len are moved to the while loop scope. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250116053441.5758-9-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	af_unix: Set drop reason in unix_dgram_disconnected().	Kuniyuki Iwashima	2	-1/+10
	unix_dgram_disconnected() is called from two places: 1. when a connect()ed socket dis-connect()s or re-connect()s to another socket 2. when sendmsg() fails because the peer socket that the client has connect()ed to has been close()d Then, the client's recv queue is purged to remove all messages from the old peer socket. Let's define a new drop reason for that case. # echo 1 > /sys/kernel/tracing/events/skb/kfree_skb/enable # python3 >>> from socket import * >>> >>> # s1 has a message from s2 >>> s1, s2 = socketpair(AF_UNIX, SOCK_DGRAM) >>> s2.send(b'hello world') >>> >>> # re-connect() drops the message from s2 >>> s3 = socket(AF_UNIX, SOCK_DGRAM) >>> s3.bind('') >>> s1.connect(s3.getsockname()) # cat /sys/kernel/tracing/trace_pipe python3-250 ... kfree_skb: ... location=skb_queue_purge_reason+0xdc/0x110 reason: UNIX_DISCONNECT Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250116053441.5758-8-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	af_unix: Set drop reason in unix_stream_read_skb().	Kuniyuki Iwashima	1	-2/+2
	unix_stream_read_skb() is called when BPF SOCKMAP reads some data from a socket in the map. SOCKMAP does not support MSG_OOB, and reading OOB results in a drop. Let's set drop reasons respectively. * SOCKET_CLOSE : the socket in SOCKMAP was close()d * UNIX_SKIP_OOB : OOB was read from the socket in SOCKMAP Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250116053441.5758-7-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	af_unix: Set drop reason in manage_oob().	Kuniyuki Iwashima	2	-1/+7
	AF_UNIX SOCK_STREAM socket supports MSG_OOB. When OOB data is sent to a socket, recv() will break at that point. If the next recv() does not have MSG_OOB, the normal data following the OOB data is returned. Then, the OOB skb is dropped. Let's define a new drop reason for that case in manage_oob(). # echo 1 > /sys/kernel/tracing/events/skb/kfree_skb/enable # python3 >>> from socket import * >>> s1, s2 = socketpair(AF_UNIX) >>> s1.send(b'a', MSG_OOB) >>> s1.send(b'b') >>> s2.recv(2) b'b' # cat /sys/kernel/tracing/trace_pipe ... python3-223 ... kfree_skb: ... location=unix_stream_read_generic+0x59e/0xc20 reason: UNIX_SKIP_OOB Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250116053441.5758-6-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	af_unix: Set drop reason in __unix_gc().	Kuniyuki Iwashima	1	-1/+1
	Inflight file descriptors by SCM_RIGHTS hold references to the struct file. AF_UNIX sockets could hold references to each other, forming reference cycles. Once such sockets are close()d without the fd recv()ed, they will be unaccessible from userspace but remain in kernel. __unix_gc() garbage-collects skb with the dead file descriptors and frees them by __skb_queue_purge(). Let's set SKB_DROP_REASON_SOCKET_CLOSE there. # echo 1 > /sys/kernel/tracing/events/skb/kfree_skb/enable # python3 >>> from socket import * >>> from array import array >>> >>> # Create a reference cycle >>> s1 = socket(AF_UNIX, SOCK_DGRAM) >>> s1.bind('') >>> s1.sendmsg([b"nop"], [(SOL_SOCKET, SCM_RIGHTS, array("i", [s1.fileno()]))], 0, s1.getsockname()) >>> s1.close() >>> >>> # Trigger GC >>> s2 = socket(AF_UNIX) >>> s2.close() # cat /sys/kernel/tracing/trace_pipe ... kworker/u16:2-42 ... kfree_skb: ... location=__unix_gc+0x4ad/0x580 reason: SOCKET_CLOSE Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250116053441.5758-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	af_unix: Set drop reason in unix_sock_destructor().	Kuniyuki Iwashima	1	-1/+1
	unix_sock_destructor() is called as sk->sk_destruct() just before the socket is actually freed. Let's use SKB_DROP_REASON_SOCKET_CLOSE for skb_queue_purge(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250116053441.5758-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	af_unix: Set drop reason in unix_release_sock().	Kuniyuki Iwashima	2	-2/+5
	unix_release_sock() is called when the last refcnt of struct file is released. Let's define a new drop reason SKB_DROP_REASON_SOCKET_CLOSE and set it for kfree_skb() in unix_release_sock(). # echo 1 > /sys/kernel/tracing/events/skb/kfree_skb/enable # python3 >>> from socket import * >>> s1, s2 = socketpair(AF_UNIX) >>> s1.send(b'hello world') >>> s2.close() # cat /sys/kernel/tracing/trace_pipe ... python3-280 ... kfree_skb: ... protocol=0 location=unix_release_sock+0x260/0x420 reason: SOCKET_CLOSE To be precise, unix_release_sock() is also called for a new child socket in unix_stream_connect() when something fails, but the new sk does not have skb in the recv queue then and no event is logged. Note that only tcp_inbound_ao_hash() uses a similar drop reason, SKB_DROP_REASON_TCP_CLOSE, and this can be generalised later. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250116053441.5758-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	net: dropreason: Gather SOCKET_ drop reasons.	Kuniyuki Iwashima	1	-6/+6
	The following patch adds a new drop reason starting with the SOCKET_ prefix. Let's gather the existing SOCKET_ reasons. Note that the order is not part of uAPI. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250116053441.5758-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	selftests/net/ipsec: Fix Null pointer dereference in rtattr_pack()	Liu Ye	1	-1/+2
	Address Null pointer dereference / undefined behavior in rtattr_pack (note that size is 0 in the bad case). Flagged by cppcheck as: tools/testing/selftests/net/ipsec.c:230:25: warning: Possible null pointer dereference: payload [nullPointer] memcpy(RTA_DATA(attr), payload, size); ^ tools/testing/selftests/net/ipsec.c:1618:54: note: Calling function 'rtattr_pack', 4th argument 'NULL' value is 0 if (rtattr_pack(&req.nh, sizeof(req), XFRMA_IF_ID, NULL, 0)) { ^ tools/testing/selftests/net/ipsec.c:230:25: note: Null pointer dereference memcpy(RTA_DATA(attr), payload, size); ^ Signed-off-by: Liu Ye <liuye@kylinos.cn> Link: https://patch.msgid.link/20250116013037.29470-1-liuye@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	x86: use cmov for user address masking	Linus Torvalds	2	-9/+8
	This was a suggestion by David Laight, and while I was slightly worried that some micro-architecture would predict cmov like a conditional branch, there is little reason to actually believe any core would be that broken. Intel documents that their existing cores treat CMOVcc as a data dependency that will constrain speculation in their "Speculative Execution Side Channel Mitigations" whitepaper: "Other instructions such as CMOVcc, AND, ADC, SBB and SETcc can also be used to prevent bounds check bypass by constraining speculative execution on current family 6 processors (Intel® Core™, Intel® Atom™, Intel® Xeon® and Intel® Xeon Phi™ processors)" and while that leaves the future uarch issues open, that's certainly true of our traditional SBB usage too. Any core that predicts CMOV will be unusable for various crypto algorithms that need data-independent timing stability, so let's just treat CMOV as the safe choice that simplifies the address masking by avoiding an extra instruction and doesn't need a temporary register. Suggested-by: David Laight <David.Laight@aculab.com> Link: https://www.intel.com/content/dam/develop/external/us/en/documents/336996-speculative-execution-side-channel-mitigations.pdf Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-01-20	x86: use proper 'clac' and 'stac' opcode names	Linus Torvalds	1	-11/+7
	Back when we added SMAP support, all versions of binutils didn't necessarily understand the 'clac' and 'stac' instructions. So we implemented those instructions manually as ".byte" sequences. But we've since upgraded the minimum version of binutils to version 2.25, and that included proper support for the SMAP instructions, and there's no reason for us to use some line noise to express them any more. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-01-20	net: phy: realtek: HWMON support for standalone versions of RTL8221B and RTL8251	Aleksander Jan Bajkowski	1	-0/+5
	HWMON support has been added for the RTL8221/8251 PHYs integrated together with the MAC inside the RTL8125/8126 chips. This patch extends temperature reading support for standalone variants of the mentioned PHYs. I don't know whether the earlier revisions of the RTL8226 also have a built-in temperature sensor, so they have been skipped for now. Tested on RTL8221B-VB-CG. Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-01-20	tcp_cubic: fix incorrect HyStart round start detection	Mahdi Arghavani	1	-3/+5
	I noticed that HyStart incorrectly marks the start of rounds, leading to inaccurate measurements of ACK train lengths and resetting the `ca->sample_cnt` variable. This inaccuracy can impact HyStart's functionality in terminating exponential cwnd growth during Slow-Start, potentially degrading TCP performance. The issue arises because the changes introduced in commit 4e1fddc98d25 ("tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows") moved the caller of the `bictcp_hystart_reset` function inside the `hystart_update` function. This modification added an additional condition for triggering the caller, requiring that (tcp_snd_cwnd(tp) >= hystart_low_window) must also be satisfied before invoking `bictcp_hystart_reset`. This fix ensures that `bictcp_hystart_reset` is correctly called at the start of a new round, regardless of the congestion window size. This is achieved by moving the condition (tcp_snd_cwnd(tp) >= hystart_low_window) from before calling `bictcp_hystart_reset` to after it. I tested with a client and a server connected through two Linux software routers. In this setup, the minimum RTT was 150 ms, the bottleneck bandwidth was 50 Mbps, and the bottleneck buffer size was 1 BDP, calculated as (50M / 1514 / 8) * 0.150 = 619 packets. I conducted the test twice, transferring data from the server to the client for 1.5 seconds. Before the patch was applied, HYSTART-DELAY stopped the exponential growth of cwnd when cwnd = 516, and the bottleneck link was not yet saturated (516 < 619). After the patch was applied, HYSTART-ACK-TRAIN stopped the exponential growth of cwnd when cwnd = 632, and the bottleneck link was saturated (632 > 619). In this test, applying the patch resulted in 300 KB more data delivered. Fixes: 4e1fddc98d25 ("tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows") Signed-off-by: Mahdi Arghavani <ma.arghavani@yahoo.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Haibo Zhang <haibo.zhang@otago.ac.nz> Cc: David Eyers <david.eyers@otago.ac.nz> Cc: Abbas Arghavani <abbas.arghavani@mdu.se> Reviewed-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>