wireguard-linux - WireGuard for the Linux kernel

Age	Commit message (Collapse)	Author	Files	Lines
2023-04-26	net: phy: hide the PHYLIB_LEDS knob	Paolo Abeni	1	-3/+1
	commit 4bb7aac70b5d ("net: phy: fix circular LEDS_CLASS dependencies") solved a build failure, but introduces a new config knob with a default 'y' value: PHYLIB_LEDS. The latter is against the current new config policy. The exception was raised to allow the user to catch bad configurations without led support. Anyway the current definition of PHYLIB_LEDS does not fit the above goal: if LEDS_CLASS is disabled, the new config will be available only with PHYLIB disabled, too. Hide the mentioned config, to preserve the randconfig testing done so far, while respecting the mentioned policy. Suggested-by: Andrew Lunn <andrew@lunn.ch> Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/d82489be8ed911c383c3447e9abf469995ccf39a.1682496488.git.pabeni@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-04-25	spi: bcm63xx: use macro DEFINE_SIMPLE_DEV_PM_OPS	Dhruva Gole	1	-3/+1
	Using this macro makes the code more readable. It also inits the members of dev_pm_ops in the following manner without us explicitly needing to: .suspend = bcm63xx_spi_suspend, \ .resume = bcm63xx_spi_resume, \ .freeze = bcm63xx_spi_suspend, \ .thaw = bcm63xx_spi_resume, \ .poweroff = bcm63xx_spi_suspend, \ .restore = bcm63xx_spi_resume Signed-off-by: Dhruva Gole <d-gole@ti.com> Link: https://lore.kernel.org/r/20230424102546.1604484-1-d-gole@ti.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-04-25	gfs2: gfs2_ail_empty_gl no log flush on error	Bob Peterson	1	-2/+3
	Before this patch, function gfs2_ail_empty_gl called gfs2_log_flush even in cases where it encountered an error. It should probably skip the log flush and leave the file system in an inconsistent state, letting a subsequent withdraw force the journal to be replayed to reestablish metadata consistency. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-04-25	gfs2: Issue message when revokes cannot be written	Bob Peterson	1	-1/+3
	Before this patch, function gfs2_ail_empty_gl would silently return an error to the caller. This would get silently set into sd_log_error which would cause a withdraw, but there was no indication why the file system was withdrawn. This patch adds a fs_err to log the appropriate error message. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-04-25	gfs2: Perform second log flush in gfs2_make_fs_ro	Bob Peterson	1	-0/+9
	Before this patch, function gfs2_make_fs_ro called gfs2_log_flush once to finalize the log. However, if there's dirty metadata, log flushes tend to sync the metadata and formulate revokes. Before this patch, those revokes may not be written out to the journal immediately, which meant unresolved glocks could still have revokes in their ail lists. When the glock worker runs, it tries to transition the glock, but the unresolved revokes in the ail still need to be written, so it tries to start a transaction. It's impossible to start a transaction because at that point, the SDF_JOURNAL_LIVE flag has been cleared by gfs2_make_fs_ro. That causes the glock worker to fail, unable to write the revokes. The calling sequence looked something like this: gfs2_make_fs_ro gfs2_log_flush - with GFS2_LOG_HEAD_FLUSH_SHUTDOWN flag set if (flags & GFS2_LOG_HEAD_FLUSH_SHUTDOWN) clear_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags); ...meanwhile... glock_work_func do_xmote rgrp_go_sync (or possibly inode_go_sync) ... gfs2_ail_empty_gl __gfs2_trans_begin if (unlikely(!test_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags))) { ... return -EROFS; The previous patch in the series ("gfs2: return errors from gfs2_ail_empty_gl") now causes the transaction error to no longer be ignored, so it causes a warning from MOST of the xfstests: WARNING: CPU: 11 PID: X at fs/gfs2/super.c:603 gfs2_put_super [gfs2] which corresponds to: WARN_ON(gfs2_withdrawing(sdp)); The withdraw was triggered silently from do_xmote by: if (unlikely(sdp->sd_log_error && !gfs2_withdrawn(sdp))) gfs2_withdraw_delayed(sdp); This patch adds a second log_flush to gfs2_make_fs_ro: one to sync the data and one to sync any outstanding revokes and finalize the journal. Note that both of these log flushes need to be "special," in other words, not GFS2_LOG_HEAD_FLUSH_NORMAL. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-04-25	gfs2: return errors from gfs2_ail_empty_gl	Bob Peterson	1	-3/+5
	Before this patch, function gfs2_ail_empty_gl did not return errors it encountered from __gfs2_trans_begin. Those errors usually came from the fact that the file system was made read-only, often due to unmount (but theoretically could be due to -o remount,ro), which prevented the transaction from starting. The inability to start a transaction prevented its revokes from being properly written to the journal for glocks during unmount (and transition to ro). That meant glocks could be unlocked without the metadata properly revoked in the journal. So other nodes could grab the glock thinking that their lvb values were correct, but in fact corresponded to the glock without its revokes properly synced. That presented as lvb mismatch errors. This patch allows gfs2_ail_empty_gl to return the error properly to the caller. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-04-25	HID: amd_sfh: Fix max supported HID devices	Basavaraj Natikar	1	-1/+1
	commit 4bd763568dbd ("HID: amd_sfh: Support for additional light sensor") adds additional sensor devices, but forgets to add the number of HID devices to match. Thus, the number of HID devices does not match the actual number of sensors. In order to prevent corruption and system hangs when more than the allowed number of HID devices are accessed, the number of HID devices is increased accordingly. Fixes: 4bd763568dbd ("HID: amd_sfh: Support for additional light sensor") Link: https://bugzilla.kernel.org/show_bug.cgi?id=217354 Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com> Link: https://lore.kernel.org/r/20230424160406.2579888-1-Basavaraj.Natikar@amd.com Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
2023-04-25	net: phy: marvell-88x2222: remove unnecessary (void*) conversions	wuych	1	-2/+2
	Pointer variables of void * type do not require type cast. Signed-off-by: wuych <yunchuan@nfschina.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-25	tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.	Kuniyuki Iwashima	1	-0/+3
	syzkaller reported [0] memory leaks of an UDP socket and ZEROCOPY skbs. We can reproduce the problem with these sequences: sk = socket(AF_INET, SOCK_DGRAM, 0) sk.setsockopt(SOL_SOCKET, SO_TIMESTAMPING, SOF_TIMESTAMPING_TX_SOFTWARE) sk.setsockopt(SOL_SOCKET, SO_ZEROCOPY, 1) sk.sendto(b'', MSG_ZEROCOPY, ('127.0.0.1', 53)) sk.close() sendmsg() calls msg_zerocopy_alloc(), which allocates a skb, sets skb->cb->ubuf.refcnt to 1, and calls sock_hold(). Here, struct ubuf_info_msgzc indirectly holds a refcnt of the socket. When the skb is sent, __skb_tstamp_tx() clones it and puts the clone into the socket's error queue with the TX timestamp. When the original skb is received locally, skb_copy_ubufs() calls skb_unclone(), and pskb_expand_head() increments skb->cb->ubuf.refcnt. This additional count is decremented while freeing the skb, but struct ubuf_info_msgzc still has a refcnt, so __msg_zerocopy_callback() is not called. The last refcnt is not released unless we retrieve the TX timestamped skb by recvmsg(). Since we clear the error queue in inet_sock_destruct() after the socket's refcnt reaches 0, there is a circular dependency. If we close() the socket holding such skbs, we never call sock_put() and leak the count, sk, and skb. TCP has the same problem, and commit e0c8bccd40fc ("net: stream: purge sk_error_queue in sk_stream_kill_queues()") tried to fix it by calling skb_queue_purge() during close(). However, there is a small chance that skb queued in a qdisc or device could be put into the error queue after the skb_queue_purge() call. In __skb_tstamp_tx(), the cloned skb should not have a reference to the ubuf to remove the circular dependency, but skb_clone() does not call skb_copy_ubufs() for zerocopy skb. So, we need to call skb_orphan_frags_rx() for the cloned skb to call skb_copy_ubufs(). [0]: BUG: memory leak unreferenced object 0xffff88800c6d2d00 (size 1152): comm "syz-executor392", pid 264, jiffies 4294785440 (age 13.044s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 cd af e8 81 00 00 00 00 ................ 02 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00 ...@............ backtrace: [<0000000055636812>] sk_prot_alloc+0x64/0x2a0 net/core/sock.c:2024 [<0000000054d77b7a>] sk_alloc+0x3b/0x800 net/core/sock.c:2083 [<0000000066f3c7e0>] inet_create net/ipv4/af_inet.c:319 [inline] [<0000000066f3c7e0>] inet_create+0x31e/0xe40 net/ipv4/af_inet.c:245 [<000000009b83af97>] __sock_create+0x2ab/0x550 net/socket.c:1515 [<00000000b9b11231>] sock_create net/socket.c:1566 [inline] [<00000000b9b11231>] __sys_socket_create net/socket.c:1603 [inline] [<00000000b9b11231>] __sys_socket_create net/socket.c:1588 [inline] [<00000000b9b11231>] __sys_socket+0x138/0x250 net/socket.c:1636 [<000000004fb45142>] __do_sys_socket net/socket.c:1649 [inline] [<000000004fb45142>] __se_sys_socket net/socket.c:1647 [inline] [<000000004fb45142>] __x64_sys_socket+0x73/0xb0 net/socket.c:1647 [<0000000066999e0e>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<0000000066999e0e>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 [<0000000017f238c1>] entry_SYSCALL_64_after_hwframe+0x63/0xcd BUG: memory leak unreferenced object 0xffff888017633a00 (size 240): comm "syz-executor392", pid 264, jiffies 4294785440 (age 13.044s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 2d 6d 0c 80 88 ff ff .........-m..... backtrace: [<000000002b1c4368>] __alloc_skb+0x229/0x320 net/core/skbuff.c:497 [<00000000143579a6>] alloc_skb include/linux/skbuff.h:1265 [inline] [<00000000143579a6>] sock_omalloc+0xaa/0x190 net/core/sock.c:2596 [<00000000be626478>] msg_zerocopy_alloc net/core/skbuff.c:1294 [inline] [<00000000be626478>] msg_zerocopy_realloc+0x1ce/0x7f0 net/core/skbuff.c:1370 [<00000000cbfc9870>] __ip_append_data+0x2adf/0x3b30 net/ipv4/ip_output.c:1037 [<0000000089869146>] ip_make_skb+0x26c/0x2e0 net/ipv4/ip_output.c:1652 [<00000000098015c2>] udp_sendmsg+0x1bac/0x2390 net/ipv4/udp.c:1253 [<0000000045e0e95e>] inet_sendmsg+0x10a/0x150 net/ipv4/af_inet.c:819 [<000000008d31bfde>] sock_sendmsg_nosec net/socket.c:714 [inline] [<000000008d31bfde>] sock_sendmsg+0x141/0x190 net/socket.c:734 [<0000000021e21aa4>] __sys_sendto+0x243/0x360 net/socket.c:2117 [<00000000ac0af00c>] __do_sys_sendto net/socket.c:2129 [inline] [<00000000ac0af00c>] __se_sys_sendto net/socket.c:2125 [inline] [<00000000ac0af00c>] __x64_sys_sendto+0xe1/0x1c0 net/socket.c:2125 [<0000000066999e0e>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<0000000066999e0e>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 [<0000000017f238c1>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: f214f915e7db ("tcp: enable MSG_ZEROCOPY") Fixes: b5947e5d1e71 ("udp: msg_zerocopy") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-25	net: amd: Fix link leak when verifying config failed	Gencen Gan	1	-1/+1
	After failing to verify configuration, it returns directly without releasing link, which may cause memory leak. Paolo Abeni thinks that the whole code of this driver is quite "suboptimal" and looks unmainatained since at least ~15y, so he suggests that we could simply remove the whole driver, please take it into consideration. Simon Horman suggests that the fix label should be set to "Linux-2.6.12-rc2" considering that the problem has existed since the driver was introduced and the commit above doesn't seem to exist in net/net-next. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Gan Gecen <gangecen@hust.edu.cn> Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-25	media: ov5670: Fix probe on ACPI	Sakari Ailus	1	-1/+1
	devm_clk_get() will return either an error or NULL, which the driver handles, continuing to use the clock of reading the value of the clock-frequency property. However, the value of ov5670->xvclk is left as-is and the other clock framework functions aren't capable of handling error values. Use devm_clk_get_optional() to obtain NULL instead of -ENOENT. Fixes: 8004c91e2095 ("media: i2c: ov5670: Use common clock framework") Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2023-04-24	net: phy: marvell: Fix inconsistent indenting in led_blink_set	Christian Marangi	1	-4/+4
	Fix inconsistent indeinting in m88e1318_led_blink_set reported by kernel test robot, probably done by the presence of an if condition dropped in later revision of the same code. Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202304240007.0VEX8QYG-lkp@intel.com/ Fixes: ea9e86485dec ("net: phy: marvell: Implement led_blink_set()") Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230423172800.3470-1-ansuelsmth@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-24	lan966x: Don't use xdp_frame when action is XDP_TX	Horatiu Vultur	3	-23/+28
	When the action of an xdp program was XDP_TX, lan966x was creating a xdp_frame and use this one to send the frame back. But it is also possible to send back the frame without needing a xdp_frame, because it is possible to send it back using the page. And then once the frame is transmitted is possible to use directly page_pool_recycle_direct as lan966x is using page pools. This would save some CPU usage on this path, which results in higher number of transmitted frames. Bellow are the statistics: Frame size: Improvement: 64 ~8% 256 ~11% 512 ~8% 1000 ~0% 1500 ~0% Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://lore.kernel.org/r/20230422142344.3630602-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-24	tsnep: Add XDP socket zero-copy TX support	Gerhard Engleder	2	-11/+121
	Send and complete XSK pool frames within TX NAPI context. NAPI context is triggered by ndo_xsk_wakeup. Test results with A53 1.2GHz: xdpsock txonly copy mode, 64 byte frames: pps pkts 1.00 tx 284,409 11,398,144 Two CPUs with 100% and 10% utilization. xdpsock txonly zero-copy mode, 64 byte frames: pps pkts 1.00 tx 511,929 5,890,368 Two CPUs with 100% and 1% utilization. xdpsock l2fwd copy mode, 64 byte frames: pps pkts 1.00 rx 248,985 7,315,885 tx 248,921 7,315,885 Two CPUs with 100% and 10% utilization. xdpsock l2fwd zero-copy mode, 64 byte frames: pps pkts 1.00 rx 254,735 3,039,456 tx 254,735 3,039,456 Two CPUs with 100% and 4% utilization. Packet rate increases and CPU utilization is reduced in both cases. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-24	tsnep: Add XDP socket zero-copy RX support	Gerhard Engleder	3	-14/+556
	Add support for XSK zero-copy to RX path. The setup of the XSK pool can be done at runtime. If the netdev is running, then the queue must be disabled and enabled during reconfiguration. This can be done easily with functions introduced in previous commits. A more important property is that, if the netdev is running, then the setup of the XSK pool shall not stop the netdev in case of errors. A broken netdev after a failed XSK pool setup is bad behavior. Therefore, the allocation and setup of resources during XSK pool setup is done only before any queue is disabled. Additionally, freeing and later allocation of resources is eliminated in some cases. Page pool entries are kept for later use. Two memory models are registered in parallel. As a result, the XSK pool setup cannot fail during queue reconfiguration. In contrast to other drivers, XSK pool setup and XDP BPF program setup are separate actions. XSK pool setup can be done without any XDP BPF program. The XDP BPF program can be added, removed or changed without any reconfiguration of the XSK pool. Test results with A53 1.2GHz: xdpsock rxdrop copy mode, 64 byte frames: pps pkts 1.00 rx 856,054 10,625,775 Two CPUs with both 100% utilization. xdpsock rxdrop zero-copy mode, 64 byte frames: pps pkts 1.00 rx 889,388 4,615,284 Two CPUs with 100% and 20% utilization. Packet rate increases and CPU utilization is reduced. 100% CPU load seems to the base load. This load is consumed by ksoftirqd just for dropping the generated packets without xdpsock running. Using batch API reduced CPU utilization slightly, but measurements are not stable enough to provide meaningful numbers. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-24	tsnep: Move skb receive action to separate function	Gerhard Engleder	1	-16/+23
	The function tsnep_rx_poll() is already pretty long and the skb receive action can be reused for XSK zero-copy support. Move page based skb receive to separate function. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-24	tsnep: Add functions for queue enable/disable	Gerhard Engleder	1	-33/+64
	Move queue enable and disable code to separate functions. This way the activation and deactivation of the queues are defined actions, which can be used in future execution paths. This functions will be used for the queue reconfiguration at runtime, which is necessary for XSK zero-copy support. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-24	tsnep: Rework TX/RX queue initialization	Gerhard Engleder	1	-43/+51
	Make initialization of TX and RX queues less dynamic by moving some initialization from netdev open/close to device probing. Additionally, move some initialization code to separate functions to enable future use in other execution paths. This is done as preparation for queue reconfigure at runtime, which is necessary for XSK zero-copy support. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-24	tsnep: Replace modulo operation with mask	Gerhard Engleder	2	-14/+15
	TX/RX ring size is static and power of 2 to enable compiler to optimize modulo operation to mask operation. Make this optimization already in the code and don't rely on the compiler. CPU utilisation during high packet rate has not changed. So no performance improvement has been measured. But it is best practice to prevent modulo operations. Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-24	net: phy: dp83867: Add led_brightness_set support	Alexander Stein	1	-0/+31
	Up to 4 LEDs can be attached to the PHY, add support for setting brightness manually. Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230424134625.303957-1-alexander.stein@ew.tq-group.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-24	net: phy: Fix reading LED reg property	Alexander Stein	1	-1/+5
	'reg' is always encoded in 32 bits, thus it has to be read using the function with the corresponding bit width. Fixes: 01e5b728e9e4 ("net: phy: Add a binding for PHY LEDs") Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230424141648.317944-1-alexander.stein@ew.tq-group.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-24	drivers: nfc: nfcsim: remove return value check of `dev_dir`	Jianuo Kuang	1	-5/+0
	Smatch complains that: nfcsim_debugfs_init_dev() warn: 'dev_dir' is an error pointer or valid According to the documentation of the debugfs_create_dir() function, there is no need to check the return value of this function. Just delete the dead code. Signed-off-by: Jianuo Kuang <u202110722@hust.edu.cn> Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20230424024140.34607-1-u202110722@hust.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-24	net: phy: dp83867: Remove unnecessary (void*) conversions	wuych	1	-2/+1
	Pointer variables of void * type do not require type cast. Signed-off-by: wuych <yunchuan@nfschina.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230424101550.664319-1-yunchuan@nfschina.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-24	net: ethtool: coalesce: try to make user settings stick twice	Jakub Kicinski	2	-11/+47
	SET_COALESCE may change operation mode and parameters in one call. Changing operation mode may cause the driver to reset the parameter values to what is a reasonable default for new operation mode. Since driver does not know which parameters come from user and which are echoed back from ->get, driver may ignore the parameters when switching operation modes. This used to be inevitable for ioctl() but in netlink we know which parameters are actually specified by the user. We could inform which parameters were set by the user but this would lead to a lot of code duplication in the drivers. Instead try to call the drivers twice if both mode and params are changed. The set method already checks if any params need updating so in case the driver did the right thing the first time around - there will be no second call to it's ->set method (only an extra call to ->get()). For mlx5 for example before this patch we'd see: # ethtool -C eth0 adaptive-rx on adaptive-tx on # ethtool -C eth0 adaptive-rx off adaptive-tx off \ tx-usecs 123 rx-usecs 123 Adaptive RX: off TX: off rx-usecs: 3 rx-frames: 32 tx-usecs: 16 tx-frames: 32 [...] After the change: # ethtool -C eth0 adaptive-rx on adaptive-tx on # ethtool -C eth0 adaptive-rx off adaptive-tx off \ tx-usecs 123 rx-usecs 123 Adaptive RX: off TX: off rx-usecs: 123 rx-frames: 32 tx-usecs: 123 tx-frames: 32 [...] This only works for netlink, so it's a small discrepancy between netlink and ioctl(). Since we anticipate most users to move to netlink I believe it's worth making their lives easier. Link: https://lore.kernel.org/r/20230420233302.944382-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-24	net: mana: Check if netdev/napi_alloc_frag returns single page	Haiyang Zhang	1	-0/+15
	netdev/napi_alloc_frag() may fall back to single page which is smaller than the requested size. Add error checking to avoid memory overwritten. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-24	net: mana: Rename mana_refill_rxoob and remove some empty lines	Haiyang Zhang	1	-6/+3
	Rename mana_refill_rxoob for naming consistency. And remove some empty lines between function call and error checking. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-24	net: veth: add page_pool stats	Lorenzo Bianconi	2	-3/+18
	Introduce page_pool stats support to report info about local page_pool through ethtool Tested-by: Maryam Tahhan <mtahhan@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-24	net: veth: add page_pool for page recycling	Lorenzo Bianconi	2	-4/+45
	Introduce page_pool support in veth driver in order to recycle pages in veth_convert_skb_to_xdp_buff routine and avoid reallocating the skb through the page allocator. The patch has been tested sending tcp traffic to a veth pair where the remote peer is running a simple xdp program just returning xdp_pass: veth upstream codebase: MTU 1500B: ~ 8Gbps MTU 8000B: ~ 13.9Gbps veth upstream codebase + pp support: MTU 1500B: ~ 9.2Gbps MTU 8000B: ~ 16.2Gbps Tested-by: Maryam Tahhan <mtahhan@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-24	netlink: Use copy_to_user() for optval in netlink_getsockopt().	Kuniyuki Iwashima	1	-52/+23
	Brad Spencer provided a detailed report [0] that when calling getsockopt() for AF_NETLINK, some SOL_NETLINK options set only 1 byte even though such options require at least sizeof(int) as length. The options return a flag value that fits into 1 byte, but such behaviour confuses users who do not initialise the variable before calling getsockopt() and do not strictly check the returned value as char. Currently, netlink_getsockopt() uses put_user() to copy data to optlen and optval, but put_user() casts the data based on the pointer, char *optval. As a result, only 1 byte is set to optval. To avoid this behaviour, we need to use copy_to_user() or cast optval for put_user(). Note that this changes the behaviour on big-endian systems, but we document that the size of optval is int in the man page. $ man 7 netlink ... Socket options To set or get a netlink socket option, call getsockopt(2) to read or setsockopt(2) to write the option with the option level argument set to SOL_NETLINK. Unless otherwise noted, optval is a pointer to an int. Fixes: 9a4595bc7e67 ("[NETLINK]: Add set/getsockopt options to support more than 32 groups") Fixes: be0c22a46cfb ("netlink: add NETLINK_BROADCAST_ERROR socket option") Fixes: 38938bfe3489 ("netlink: add NETLINK_NO_ENOBUFS socket flag") Fixes: 0a6a3a23ea6e ("netlink: add NETLINK_CAP_ACK socket option") Fixes: 2d4bc93368f5 ("netlink: extended ACK reporting") Fixes: 89d35528d17d ("netlink: Add new socket option to enable strict checking on dumps") Reported-by: Brad Spencer <bspencer@blackberry.com> Link: https://lore.kernel.org/netdev/ZD7VkNWFfp22kTDt@datsun.rim.net/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Link: https://lore.kernel.org/r/20230421185255.94606-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-24	selftests/bpf: avoid mark_all_scalars_precise() trigger in one of iter tests	Andrii Nakryiko	1	-11/+15
	iter_pass_iter_ptr_to_subprog subtest is relying on actual array size being passed as subprog parameter. This combined with recent fixes to precision tracking in conditional jumps ([0]) is now causing verifier to backtrack all the way to the point where sum() and fill() subprogs are called, at which point precision backtrack bails out and forces all the states to have precise SCALAR registers. This in turn causes each possible value of i within fill() and sum() subprogs to cause a different non-equivalent state, preventing iterator code to converge. For now, change the test to assume fixed size of passed in array. Once BPF verifier supports precision tracking across subprogram calls, these changes will be reverted as unnecessary. [0] 71b547f56124 ("bpf: Fix incorrect verifier pruning due to missing register precision taints") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230424235128.1941726-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-24	bpf: Add __rcu_read_{lock,unlock} into btf id deny list	Yafang Shao	1	-0/+4
	The tracing recursion prevention mechanism must be protected by rcu, that leaves __rcu_read_{lock,unlock} unprotected by this mechanism. If we trace them, the recursion will happen. Let's add them into the btf id deny list. When CONFIG_PREEMPT_RCU is enabled, it can be reproduced with a simple bpf program as such: SEC("fentry/__rcu_read_lock") int fentry_run() { return 0; } Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20230424161104.3737-2-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-24	bpf: Disable bpf_refcount_acquire kfunc calls until race conditions are fixed	Dave Marchevsky	2	-3/+4
	As reported by Kumar in [0], the shared ownership implementation for BPF programs has some race conditions which need to be addressed before it can safely be used. This patch does so in a minimal way instead of ripping out shared ownership entirely, as proper fixes for the issues raised will follow ASAP, at which point this patch's commit can be reverted to re-enable shared ownership. The patch removes the ability to call bpf_refcount_acquire_impl from BPF programs. Programs can only bump refcount and obtain a new owning reference using this kfunc, so removing the ability to call it effectively disables shared ownership. Instead of changing success / failure expectations for bpf_refcount-related selftests, this patch just disables them from running for now. [0]: https://lore.kernel.org/bpf/d7hyspcow5wtjcmw4fugdgyp3fwhljwuscp3xyut5qnwivyeru@ysdq543otzv2/ Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230424204321.2680232-1-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-24	f2fs: remove unnessary comment in __may_age_extent_tree	Qi Han	1	-1/+0
	This comment make no sense and is in the wrong place, so let's remove it. Signed-off-by: Qi Han <hanqi@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-24	f2fs: allocate node blocks for atomic write block replacement	Daeho Jeong	1	-1/+1
	When a node block is missing for atomic write block replacement, we need to allocate it in advance of the replacement. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-24	f2fs: use cow inode data when updating atomic write	Daeho Jeong	1	-5/+10
	Need to use cow inode data content instead of the one in the original inode, when we try to write the already updated atomic write files. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-24	f2fs: remove power-of-two limitation of zoned device	Jaegeuk Kim	4	-11/+6
	In f2fs, there's no reason to force po2. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-24	iov: improve copy_iovec_from_user() code generation	Linus Torvalds	1	-9/+26
	Use the same pattern as the compat version of this code does: instead of copying the whole array to a kernel buffer and then having a separate phase of verifying it, just do it one entry at a time, verifying as you go. On Jens' /dev/zero readv() test this improves performance by ~6%. [ This was obviously triggered by Jens' ITER_UBUF updates series ] Reported-and-tested-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/all/de35d11d-bce7-e976-7372-1f2caf417103@kernel.dk/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-24	scripts: Remove ICC-related dead code	Ruihan Li	1	-4/+0
	Intel compiler support has already been completely removed in commit 95207db8166a ("Remove Intel compiler support"). However, it appears that there is still some ICC-related code in scripts/cc-version.sh. There is no harm in leaving the code as it is, but removing the dead code makes the codebase a bit cleaner. Hopefully all ICC-related stuff in the build scripts will be removed after this commit, given the grep output as below: (linux/scripts) $ grep -i -w -R 'icc' cc-version.sh:ICC) cc-version.sh: min_version=$($min_tool_version icc) dtc/include-prefixes/arm64/qcom/sm6350.dtsi:#include <dt-bindings/interconnect/qcom,icc.h> Fixes: 95207db8166a ("Remove Intel compiler support") Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-24	tpm: Add !tpm_amd_is_rng_defective() to the hwrng_unregister() call site	Jarkko Sakkinen	1	-1/+2
	The following crash was reported: [ 1950.279393] list_del corruption, ffff99560d485790->next is NULL [ 1950.279400] ------------[ cut here ]------------ [ 1950.279401] kernel BUG at lib/list_debug.c:49! [ 1950.279405] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 1950.279407] CPU: 11 PID: 5886 Comm: modprobe Tainted: G O 6.2.8_1 #1 [ 1950.279409] Hardware name: Gigabyte Technology Co., Ltd. B550M AORUS PRO-P/B550M AORUS PRO-P, BIOS F15c 05/11/2022 [ 1950.279410] RIP: 0010:__list_del_entry_valid+0x59/0xc0 [ 1950.279415] Code: 48 8b 01 48 39 f8 75 5a 48 8b 72 08 48 39 c6 75 65 b8 01 00 00 00 c3 cc cc cc cc 48 89 fe 48 c7 c7 08 a8 13 9e e8 b7 0a bc ff <0f> 0b 48 89 fe 48 c7 c7 38 a8 13 9e e8 a6 0a bc ff 0f 0b 48 89 fe [ 1950.279416] RSP: 0018:ffffa96d05647e08 EFLAGS: 00010246 [ 1950.279418] RAX: 0000000000000033 RBX: ffff99560d485750 RCX: 0000000000000000 [ 1950.279419] RDX: 0000000000000000 RSI: ffffffff9e107c59 RDI: 00000000ffffffff [ 1950.279420] RBP: ffffffffc19c5168 R08: 0000000000000000 R09: ffffa96d05647cc8 [ 1950.279421] R10: 0000000000000003 R11: ffffffff9ea2a568 R12: 0000000000000000 [ 1950.279422] R13: ffff99560140a2e0 R14: ffff99560127d2e0 R15: 0000000000000000 [ 1950.279422] FS: 00007f67da795380(0000) GS:ffff995d1f0c0000(0000) knlGS:0000000000000000 [ 1950.279424] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1950.279424] CR2: 00007f67da7e65c0 CR3: 00000001feed2000 CR4: 0000000000750ee0 [ 1950.279426] PKRU: 55555554 [ 1950.279426] Call Trace: [ 1950.279428] <TASK> [ 1950.279430] hwrng_unregister+0x28/0xe0 [rng_core] [ 1950.279436] tpm_chip_unregister+0xd5/0xf0 [tpm] Add the forgotten !tpm_amd_is_rng_defective() invariant to the hwrng_unregister() call site inside tpm_chip_unregister(). Cc: stable@vger.kernel.org Reported-by: Martin Dimov <martin@dmarto.com> Link: https://lore.kernel.org/linux-integrity/3d1d7e9dbfb8c96125bc93b6b58b90a7@dmarto.com/ Fixes: f1324bbc4011 ("tpm: disable hwrng for fTPM on some AMD designs") Fixes: b006c439d58d ("hwrng: core - start hwrng kthread also for untrusted sources") Tested-by: Martin Dimov <martin@dmarto.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-04-24	tpm_tis: fix stall after iowrite*()s	Haris Okanovic	1	-2/+41
	ioread8() operations to TPM MMIO addresses can stall the CPU when immediately following a sequence of iowrite()'s to the same region. For example, cyclitest measures ~400us latency spikes when a non-RT usermode application communicates with an SPI-based TPM chip (Intel Atom E3940 system, PREEMPT_RT kernel). The spikes are caused by a stalling ioread8() operation following a sequence of 30+ iowrite8()s to the same address. I believe this happens because the write sequence is buffered (in CPU or somewhere along the bus), and gets flushed on the first LOAD instruction (ioread()) that follows. The enclosed change appears to fix this issue: read the TPM chip's access register (status code) after every iowrite*() operation to amortize the cost of flushing data to chip across multiple instructions. Signed-off-by: Haris Okanovic <haris.okanovic@ni.com> Link: https://lore.kernel.org/r/20230323153436.B2SATnZV@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-04-24	tpm/tpm_tis_synquacer: Convert to platform remove callback returning void	Uwe Kleine-König	1	-4/+2
	The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is (mostly) ignored and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new() which already returns void. Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-04-24	tpm/tpm_tis: Convert to platform remove callback returning void	Uwe Kleine-König	1	-4/+2
	The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is (mostly) ignored and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new() which already returns void. Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-04-24	tpm/tpm_ftpm_tee: Convert to platform remove callback returning void	Uwe Kleine-König	1	-3/+3
	The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is (mostly) ignored and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new() which already returns void. ftpm_tee_remove() returns zero unconditionally (and cannot easily converted to return void). So ignore the return value to be able to make ftpm_plat_tee_remove() return void. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-04-24	tpm: tpm_tis_spi: Mark ACPI and OF related data as maybe unused	Krzysztof Kozlowski	1	-2/+2
	The driver can be compile tested with !CONFIG_OF or !CONFIG_ACPI making unused: drivers/char/tpm/tpm_tis_spi_main.c:234:34: error: ‘of_tis_spi_match’ defined but not used [-Werror=unused-const-variable=] Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-04-24	tpm: st33zp24: Mark ACPI and OF related data as maybe unused	Krzysztof Kozlowski	2	-4/+4
	The driver can be compile tested with !CONFIG_OF or !CONFIG_ACPI making drivers/char/tpm/st33zp24/i2c.c:141:34: error: ‘of_st33zp24_i2c_match’ defined but not used [-Werror=unused-const-variable=] drivers/char/tpm/st33zp24/spi.c:258:34: error: ‘of_st33zp24_spi_match’ defined but not used [-Werror=unused-const-variable=] Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-04-24	tpm, tpm_tis: Enable interrupt test	Lino Sanfilippo	1	-0/+5
	The test for interrupts in tpm_tis_send() is skipped if the flag TPM_CHIP_FLAG_IRQ is not set. Since the current code never sets the flag initially the test is never executed. Fix this by setting the flag in tpm_tis_gen_interrupt() right after interrupts have been enabled and before the test is executed. Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com> Tested-by: Michael Niewöhner <linux@mniewoehner.de> Tested-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-04-24	tpm, tpm_tis: startup chip before testing for interrupts	Lino Sanfilippo	3	-14/+30
	In tpm_tis_gen_interrupt() a request for a property value is sent to the TPM to test if interrupts are generated. However after a power cycle the TPM responds with TPM_RC_INITIALIZE which indicates that the TPM is not yet properly initialized. Fix this by first starting the TPM up before the request is sent. For this the startup implementation is removed from tpm_chip_register() and put into the new function tpm_chip_startup() which is called before the interrupts are tested. Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com> Tested-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-04-24	tpm, tpm_tis: Claim locality when interrupts are reenabled on resume	Lino Sanfilippo	1	-10/+9
	In tpm_tis_resume() make sure that the locality has been claimed when tpm_tis_reenable_interrupts() is called. Otherwise the writings to the register might not have any effect. Fixes: 45baa1d1fa39 ("tpm_tis: Re-enable interrupts upon (S3) resume") Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com> Tested-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-04-24	tpm, tpm_tis: Claim locality in interrupt handler	Lino Sanfilippo	1	-0/+2
	Writing the TPM_INT_STATUS register in the interrupt handler to clear the interrupts only has effect if a locality is held. Since this is not guaranteed at the time the interrupt is fired, claim the locality explicitly in the handler. Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com> Tested-by: Michael Niewöhner <linux@mniewoehner.de> Tested-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-04-24	tpm, tpm_tis: Request threaded interrupt handler	Lino Sanfilippo	1	-2/+5
	The TIS interrupt handler at least has to read and write the interrupt status register. In case of SPI both operations result in a call to tpm_tis_spi_transfer() which uses the bus_lock_mutex of the spi device and thus must only be called from a sleepable context. To ensure this request a threaded interrupt handler. Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com> Tested-by: Michael Niewöhner <linux@mniewoehner.de> Tested-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>