linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2021-09-27	nfp: Move delink_register to be last command	Leon Romanovsky	2	-10/+4
	Open user space access to the devlink after driver is probed. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	net: mscc: ocelot: delay devlink registration to the end	Leon Romanovsky	1	-3/+2
	Open access to the devlink interface when the driver fully initialized. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	mlxsw: core: Register devlink instance last	Leon Romanovsky	1	-13/+6
	Make sure that devlink is open to receive user input when all parameters are initialized. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	net/mlx5: Accept devlink user input after driver initialization complete	Leon Romanovsky	3	-7/+6
	The change of devlink_alloc() to accept device makes sure that device is fully initialized and device_register() does nothing except allowing users to use that devlink instance. Such change ensures that no user input will be usable till that point and it eliminates the need to worry about internal locking as long as devlink_register is called last since all accesses to the devlink are during initialization. This change fixes the following lockdep warning. ====================================================== WARNING: possible circular locking dependency detected 5.14.0-rc2+ #27 Not tainted ------------------------------------------------------ devlink/265 is trying to acquire lock: ffff8880133c2bc0 (&dev->intf_state_mutex){+.+.}-{3:3}, at: mlx5_unload_one+0x1e/0xa0 [mlx5_core] but task is already holding lock: ffffffff8362b468 (devlink_mutex){+.+.}-{3:3}, at: devlink_nl_pre_doit+0x2b/0x8d0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (devlink_mutex){+.+.}-{3:3}: __mutex_lock+0x149/0x1310 devlink_register+0xe7/0x280 mlx5_devlink_register+0x118/0x480 [mlx5_core] mlx5_init_one+0x34b/0x440 [mlx5_core] probe_one+0x480/0x6e0 [mlx5_core] pci_device_probe+0x2a0/0x4a0 really_probe+0x1cb/0xba0 __driver_probe_device+0x18f/0x470 driver_probe_device+0x49/0x120 __driver_attach+0x1ce/0x400 bus_for_each_dev+0x11e/0x1a0 bus_add_driver+0x309/0x570 driver_register+0x20f/0x390 0xffffffffa04a0062 do_one_initcall+0xd5/0x400 do_init_module+0x1c8/0x760 load_module+0x7d9d/0xa4b0 __do_sys_finit_module+0x118/0x1a0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (&dev->intf_state_mutex){+.+.}-{3:3}: __lock_acquire+0x2999/0x5a40 lock_acquire+0x1a9/0x4a0 __mutex_lock+0x149/0x1310 mlx5_unload_one+0x1e/0xa0 [mlx5_core] mlx5_devlink_reload_down+0x185/0x2b0 [mlx5_core] devlink_reload+0x1f2/0x640 devlink_nl_cmd_reload+0x6c3/0x10d0 genl_family_rcv_msg_doit+0x1e9/0x2f0 genl_rcv_msg+0x27f/0x4a0 netlink_rcv_skb+0x11e/0x340 genl_rcv+0x24/0x40 netlink_unicast+0x433/0x700 netlink_sendmsg+0x6fb/0xbe0 sock_sendmsg+0xb0/0xe0 __sys_sendto+0x192/0x240 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(devlink_mutex); lock(&dev->intf_state_mutex); lock(devlink_mutex); lock(&dev->intf_state_mutex); * DEADLOCK * 3 locks held by devlink/265: #0: ffffffff836371d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40 #1: ffffffff83637288 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg+0x31a/0x4a0 #2: ffffffff8362b468 (devlink_mutex){+.+.}-{3:3}, at: devlink_nl_pre_doit+0x2b/0x8d0 stack backtrace: CPU: 0 PID: 265 Comm: devlink Not tainted 5.14.0-rc2+ #27 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack_lvl+0x45/0x59 check_noncircular+0x268/0x310 ? print_circular_bug+0x460/0x460 ? __kernel_text_address+0xe/0x30 ? alloc_chain_hlocks+0x1e6/0x5a0 __lock_acquire+0x2999/0x5a40 ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0 ? add_lock_to_list.constprop.0+0x6c/0x530 lock_acquire+0x1a9/0x4a0 ? mlx5_unload_one+0x1e/0xa0 [mlx5_core] ? lock_release+0x6c0/0x6c0 ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0 ? lock_is_held_type+0x98/0x110 __mutex_lock+0x149/0x1310 ? mlx5_unload_one+0x1e/0xa0 [mlx5_core] ? lock_is_held_type+0x98/0x110 ? mlx5_unload_one+0x1e/0xa0 [mlx5_core] ? find_held_lock+0x2d/0x110 ? mutex_lock_io_nested+0x1160/0x1160 ? mlx5_lag_is_active+0x72/0x90 [mlx5_core] ? lock_downgrade+0x6d0/0x6d0 ? do_raw_spin_lock+0x12e/0x270 ? rwlock_bug.part.0+0x90/0x90 ? mlx5_unload_one+0x1e/0xa0 [mlx5_core] mlx5_unload_one+0x1e/0xa0 [mlx5_core] mlx5_devlink_reload_down+0x185/0x2b0 [mlx5_core] ? netlink_broadcast_filtered+0x308/0xac0 ? mlx5_devlink_info_get+0x1f0/0x1f0 [mlx5_core] ? __build_skb_around+0x110/0x2b0 ? __alloc_skb+0x113/0x2b0 devlink_reload+0x1f2/0x640 ? devlink_unregister+0x1e0/0x1e0 ? security_capable+0x51/0x90 devlink_nl_cmd_reload+0x6c3/0x10d0 ? devlink_nl_cmd_get_doit+0x1e0/0x1e0 ? devlink_nl_pre_doit+0x72/0x8d0 genl_family_rcv_msg_doit+0x1e9/0x2f0 ? __lock_acquire+0x15e2/0x5a40 ? genl_family_rcv_msg_attrs_parse.constprop.0+0x240/0x240 ? mutex_lock_io_nested+0x1160/0x1160 ? security_capable+0x51/0x90 genl_rcv_msg+0x27f/0x4a0 ? genl_get_cmd+0x3c0/0x3c0 ? lock_acquire+0x1a9/0x4a0 ? devlink_nl_cmd_get_doit+0x1e0/0x1e0 ? lock_release+0x6c0/0x6c0 netlink_rcv_skb+0x11e/0x340 ? genl_get_cmd+0x3c0/0x3c0 ? netlink_ack+0x930/0x930 genl_rcv+0x24/0x40 netlink_unicast+0x433/0x700 ? netlink_attachskb+0x750/0x750 ? __alloc_skb+0x113/0x2b0 netlink_sendmsg+0x6fb/0xbe0 ? netlink_unicast+0x700/0x700 ? netlink_unicast+0x700/0x700 sock_sendmsg+0xb0/0xe0 __sys_sendto+0x192/0x240 ? __x64_sys_getpeername+0xb0/0xb0 ? do_sys_openat2+0x10a/0x370 ? down_write_nested+0x150/0x150 ? do_user_addr_fault+0x215/0xd50 ? __x64_sys_openat+0x11f/0x1d0 ? __x64_sys_open+0x1a0/0x1a0 __x64_sys_sendto+0xdc/0x1b0 ? syscall_enter_from_user_mode+0x1d/0x50 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f50b50b6b3a Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c RSP: 002b:00007fff6c0d3f38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f50b50b6b3a RDX: 0000000000000038 RSI: 000055763ac08440 RDI: 0000000000000003 RBP: 000055763ac08410 R08: 00007f50b5192200 R09: 000000000000000c R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 000055763ac08410 R15: 000055763ac08440 mlx5_core 0000:00:09.0: firmware version: 4.8.9999 mlx5_core 0000:00:09.0: 0.000 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x255 link) mlx5_core 0000:00:09.0 eth1: Link up Fixes: a6f3b62386a0 ("net/mlx5: Move devlink registration before interfaces load") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	net/mlx4: Move devlink_register to be the last initialization command	Leon Romanovsky	1	-5/+3
	Refactor the code to make sure that devlink_register() is the last command during initialization stage. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	net/prestera: Split devlink and traps registrations to separate routines	Leon Romanovsky	3	-27/+14
	Separate devlink registrations and traps registrations so devlink will be registered when driver is fully initialized. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	octeontx2: Move devlink registration to be last devlink command	Leon Romanovsky	2	-20/+5
	This change prevents from users to access device before devlink is fully configured. This change allows us to delete call to devlink_params_publish() and impossible check during unregister flow. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	ice: Open devlink when device is ready	Leon Romanovsky	1	-4/+2
	Move devlink_registration routine to be the last command, when the device is fully initialized. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	net: hinic: Open device for the user access when it is ready	Leon Romanovsky	1	-5/+2
	Move devlink registration to be the last command in device activation, so it opens the driver to accept such devlink commands from the user when it is fully initialized. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	dpaa2-eth: Register devlink instance at the end of probe	Leon Romanovsky	3	-7/+21
	Move devlink_register to be the last command in the initialization sequence. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	liquidio: Overcome missing device lock protection in init/remove flows	Leon Romanovsky	1	-7/+12
	The liquidio driver is broken by design. It initialize PCI devices in separate delayed works. It causes to the situation where device lock is dropped during initialize and remove sequences. That lock is part of driver/core and needed to protect from races during init, destroy and bus invocations. In addition to lack of locking protection, it has incorrect order of destroy flows and very questionable synchronization scheme based on atomic_t. This change doesn't fix that driver but makes sure that rest of the netdev subsystem doesn't suffer from such basic protection by adding device_lock over devlink_*() APIs and by moving devlink_register() to be last command in setup_nic_devices(). Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	bnxt_en: Register devlink instance at the end devlink configuration	Leon Romanovsky	1	-9/+6
	Move devlink_register() to be last command in devlink configuration sequence, so no user space access will be possible till devlink instance is fully operable. As part of this change, the devlink_params_publish call is removed as not needed. This change fixes forgotten devlink_params_unpublish() too. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	devlink: Notify users when objects are accessible	Leon Romanovsky	1	-14/+93
	The devlink core code notified users about add/remove objects without relation if this object can be accessible or not. In this patch we unify such user visible notifications in one place. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	cxgb: avoid open-coded offsetof()	Arnd Bergmann	1	-1/+1
	clang-14 does not like the custom offsetof() macro in vsc7326: drivers/net/ethernet/chelsio/cxgb/vsc7326.c:597:3: error: performing pointer subtraction with a null pointer has undefined behavior [-Werror,-Wnull-pointer-subtraction] HW_STAT(RxUnicast, RxUnicastFramesOK), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/chelsio/cxgb/vsc7326.c:594:56: note: expanded from macro 'HW_STAT' { reg, (&((struct cmac_statistics )NULL)->stat_name) - (u64 )NULL } Rewrite this to use the version provided by the kernel. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	net: stmmac: fix gcc-10 -Wrestrict warning	Arnd Bergmann	1	-0/+4
	gcc-10 and later warn about a theoretical array overrun when accessing priv->int_name_rx_irq[i] with an out of bounds value of 'i': drivers/net/ethernet/stmicro/stmmac/stmmac_main.c: In function 'stmmac_request_irq_multi_msi': drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:3528:17: error: 'snprintf' argument 4 may overlap destination object 'dev' [-Werror=restrict] 3528 \| snprintf(int_name, int_name_len, "%s:%s-%d", dev->name, "tx", i); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:3404:60: note: destination object referenced by 'restrict'-qualified argument 1 was declared here 3404 \| static int stmmac_request_irq_multi_msi(struct net_device *dev) \| ~~~~~~~~~~~~~~~~~~~^~~ The warning is a bit strange since it's not actually about the array bounds but rather about possible string operations with overlapping arguments, but it's not technically wrong. Avoid the warning by adding an extra bounds check. Fixes: 8532f613bc78 ("net: stmmac: introduce MSI Interrupt routines for mac, safety, RX & TX") Link: https://lore.kernel.org/all/20210421134743.3260921-1-arnd@kernel.org/ Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	net: ethernet: emac: utilize of_net's of_get_mac_address()	Christian Lamparter	1	-7/+5
	of_get_mac_address() reads the same "local-mac-address" property. ... But goes above and beyond by checking the MAC value properly. printk+message seems outdated too, so let's put dev_err in the queue. Signed-off-by: Christian Lamparter <chunkeey@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	net: sparx5: fix resource_size.cocci warnings	Yang Li	1	-2/+1
	Use resource_size function on resource object instead of explicit computation. Clean up coccicheck warning: ./drivers/net/ethernet/microchip/sparx5/sparx5_main.c:237:19-22: ERROR: Missing resource_size with iores [ idx ] Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	ibmveth: Use dma_alloc_coherent() instead of kmalloc/dma_map_single()	Cai Huoqing	1	-16/+9
	Replacing kmalloc/kfree/dma_map_single/dma_unmap_single() with dma_alloc_coherent/dma_free_coherent() helps to reduce code size, and simplify the code, and coherent DMA will not clear the cache every time. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	net: cisco: Fix a function name in comments	Cai Huoqing	2	-3/+3
	Use dma_alloc_coherent() instead of pci_alloc_consistent(), because only dma_alloc_coherent() is called here. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Reviewed-by: Govindarajulu Varadarajan <gvaradar@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	net/ipv4/tcp_nv.c: remove superfluous header files from tcp_nv.c	Mianhan Liu	1	-1/+0
	tcp_nv.c hasn't use any macro or function declared in mm.h. Thus, these files can be removed from tcp_nv.c safely without affecting the compilation of the net module. Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	net: smsc: Fix function names in print messages and comments	Cai Huoqing	1	-3/+3
	Use dma_xxx_xxx() instead of pci_xxx_xxx(), because the pci function wrappers are not called here. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	net: sis: Fix a function name in comments	Cai Huoqing	1	-1/+1
	Use dma_alloc_coherent() instead of pci_alloc_consistent(), because only dma_alloc_coherent() is called here. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	net: broadcom: Fix a function name in comments	Cai Huoqing	1	-1/+1
	Use dma_alloc_coherent() instead of pci_alloc_consistent(), because only dma_alloc_coherent() is called here. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	net: atl1c: Fix a function name in print messages	Cai Huoqing	1	-1/+1
	Use dma_map_single() instead of pci_map_single(), because the pci function wrappers are not called here. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	net: fddi: skfp: Fix a function name in comments	Cai Huoqing	1	-1/+1
	Use dma_map_single() instead of pci_map_single(), because only dma_map_single() is called here. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	FDDI: defxx: Fix function names in coments	Cai Huoqing	1	-3/+3
	Use dma_xxx_xxx() instead of pci_xxx_xxx(), because the pci function wrappers are not called here. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	net: make napi_disable() symmetric with enable	Jakub Kicinski	1	-6/+12
	Commit 3765996e4f0b ("napi: fix race inside napi_enable") fixed an ordering bug in napi_enable() and made the napi_enable() diverge from napi_disable(). The state transitions done on disable are not symmetric to enable. There is no known bug in napi_disable() this is just refactoring. Eric suggests we can also replace msleep(1) with a more opportunistic usleep_range(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	ptp: clockmatrix: use rsmu driver to access i2c/spi bus	Min Li	4	-1242/+461
	rsmu (Renesas Synchronization Management Unit ) driver is located in drivers/mfd and responsible for creating multiple devices including clockmatrix phc, which will then use the exposed regmap and mutex handle to access i2c/spi bus. Signed-off-by: Min Li <min.li.xe@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	selftests: net: fib_nexthops: Wait before checking reported idle time	Petr Machata	1	-0/+1
	The purpose of this test is to verify that after a short activity passes, the reported time is reasonable: not zero (which could be reported by mistake), and not something outrageous (which would be indicative of an issue in used units). However, the idle time is reported in units of clock_t, or hundredths of second. If the initial sequence of commands is very quick, it is possible that the idle time is reported as just flat-out zero. When this test was recently enabled in our nightly regression, we started seeing spurious failures for exactly this reason. Therefore buffer the delay leading up to the test with a sleep, to make sure there is no legitimate way of reporting 0. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-26	octeontx2-af: Optimize KPU1 processing for variable-length headers	Kiran Kumar K	7	-323/+195
	Optimized KPU1 entry processing for variable-length custom L2 headers of size 24B, 90B by - Moving LA LTYPE parsing for 24B and 90B headers to PKIND. - Removing LA flags assignment for 24B and 90B headers. - Reserving a PKIND 55 to parse variable length headers. Also, new mailbox(NPC_SET_PKIND) added to configure PKIND with corresponding variable-length offset, mask, and shift count (NPC_AF_KPUX_ENTRYX_ACTION0). Signed-off-by: Kiran Kumar K <kirankumark@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-26	octeontx2-af: Limit KPU parsing for GTPU packets	Kiran Kumar K	1	-19/+2
	With current KPU profile, while parsing GTPU packets, GTPU payload is also being parsed and GTPU PDU payload is being treated as IPV4 data, which is not correct. In case of GTPU packets, parsing should be stopped after identifying the GTPU. Adding changes to limit KPU profile parsing for GTPU payload. Signed-off-by: Kiran Kumar K <kirankumark@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-25	mptcp: re-arm retransmit timer if data is pending	Florian Westphal	1	-3/+18
	The retransmit head will be NULL in case there is no in-flight data (meaning all data injected into network has been acked). In that case the retransmit timer is stopped. This is only correct if there is no more pending, not-yet-sent data. If there is, the retransmit timer needs to set the PENDING bit again so that mptcp tries to send the remaining (new) data once a subflow can accept more data. Also, mptcp_subflow_get_retrans() has to be called unconditionally. This function checks for subflows that have become unresponsive and marks them as stale, so in the case where the rtx queue is empty, subflows will never be marked stale which prevents available backup subflows from becoming eligible for transmit. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/226 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-25	mptcp: remove tx_pending_data	Florian Westphal	2	-5/+0
	The update on recovery is not correct. msk->tx_pending_data += msk->snd_nxt - rtx_head->data_seq; will update tx_pending_data multiple times when a subflow is declared stale while earlier recovery is still in progress. This means that tx_pending_data will still be positive even after all data as has been transmitted. Rather than fix it, remove this field: there are no consumers. The outstanding data byte count can be computed either via "msk->write_seq - rtx_head->data_seq" or "msk->write_seq - msk->snd_una". The latter is more recent/accurate estimate as rtx_head adjustment is deferred until mptcp lock can be acquired. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-25	mptcp: use lockdep_assert_held_once() instead of open-coding it	Paolo Abeni	1	-6/+3
	We have a few more places where the mptcp code duplicates lockdep_assert_held_once(). Let's use the existing macro and avoid a bunch of compiler's conditional. Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-25	mptcp: use OPTIONS_MPTCP_MPC	Geliang Tang	1	-5/+2
	Since OPTIONS_MPTCP_MPC has been defined, use it instead of open-coding. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@xiaomi.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-25	mptcp: do not shrink snd_nxt when recovering	Florian Westphal	2	-17/+34
	When recovering after a link failure, snd_nxt should not be set to a lower value. Else, update of snd_nxt is broken because: msk->snd_nxt += ret; (where ret is number of bytes sent) assumes that snd_nxt always moves forward. After reduction, its possible that snd_nxt update gets out of sync: dfrag we just sent might have had a data sequence number even past recovery_snd_nxt. This change factors the common msk state update to a helper and updates snd_nxt based on the current dfrag data sequence number. The conditional is required for the recovery phase where we may re-transmit old dfrags that are before current snd_nxt. After this change, snd_nxt only moves forward and covers all in-sequence data that was transmitted. recovery_snd_nxt is retained to detect when recovery has completed. Fixes: 1e1d9d6f119c5 ("mptcp: handle pending data on closed subflow") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24	net/mlx5e: Enable TC offload for ingress MACVLAN	Dima Chumak	1	-2/+16
	Support offloading of TC rules that filter ingress traffic from a MACVLAN device, which is attached to uplink representor. Signed-off-by: Dima Chumak <dchumak@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-24	net/mlx5e: Enable TC offload for egress MACVLAN	Dima Chumak	1	-0/+4
	Support offloading of TC rules that mirror/redirect egress traffic to a MACVLAN device, which is attached to mlx5 representor net device. Signed-off-by: Dima Chumak <dchumak@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-24	net/mlx5e: loopback test is not supported in switchdev mode	Roi Dayan	3	-43/+56
	In switchdev mode we insert steering rules to eswitch that make sure packets can't be looped back. Modify the self tests infra and have flags per test. Add a flag for tests that needs to be skipped in switchdev mode. Before this commit: $ ethtool --test enp8s0f0 The test result is FAIL The test extra info: Link Test 0 Speed Test 0 Health Test 0 Loopback Test 1 After this commit: $ ethtool --test enp8s0f0 The test result is PASS The test extra info: Link Test 0 Speed Test 0 Health Test 0 Example output in dmesg: enp8s0f0: Self test begin.. enp8s0f0: [0] Link Test start.. enp8s0f0: [0] Link Test end: result(0) enp8s0f0: [1] Speed Test start.. enp8s0f0: [1] Speed Test end: result(0) enp8s0f0: [2] Health Test start.. enp8s0f0: [2] Health Test end: result(0) enp8s0f0: Self test out: status flags(0x1) Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-24	net/mlx5e: Use NL_SET_ERR_MSG_MOD() for errors parsing tunnel attributes	Roi Dayan	1	-4/+4
	This to be consistent and adds the module name to the error message. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-24	net/mlx5e: Use tc sample stubs instead of ifdefs in source file	Roi Dayan	3	-14/+27
	Instead of having sparse ifdefs in source files use a single ifdef in the tc sample header file and use stubs. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-24	net/mlx5e: Remove redundant priv arg from parse_pedit_to_reformat()	Roi Dayan	1	-3/+2
	The priv argument is not being used. remove it. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-24	net/mlx5e: Check action fwd/drop flag exists also for nic flows	Roi Dayan	1	-7/+6
	The driver should add offloaded rules with either a fwd or drop action. The check existed in parsing fdb flows but not when parsing nic flows. Move the test into actions_match_supported() which is called for checking nic flows and fdb flows. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-24	net/mlx5e: Set action fwd flag when parsing tc action goto	Roi Dayan	1	-25/+18
	Do it when parsing like in other actions instead of when checking if goto is supported in current scenario. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-24	net/mlx5e: Remove incorrect addition of action fwd flag	Roi Dayan	1	-3/+0
	A user is expected to explicit request a fwd or drop action. It is not correct to implicit add a fwd action for the user, when modify header action flag exists. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-24	net/mlx5e: Use correct return type	Roi Dayan	1	-14/+13
	modify_header_match_supported() should return type bool but it returns the value returned by is_action_keys_supported() which is type int. is_action_keys_supported() always returns either -EOPNOTSUPP or 0 and it shouldn't change as the purpose of the function is checking for support. so just make the function return a bool type. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-24	net/mlx5e: Add error flow for ethtool -X command	Aya Levin	1	-5/+22
	Prior to this patch, ethtool -X fail but the user receives a success status. Try to roll-back when failing and return success status accordingly. Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-24	net/mlx5: DR, Fix code indentation in dr_ste_v1	Yevgeny Kliteynik	1	-1/+1
	Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-24	rxrpc: Fix _usecs_to_jiffies() by using usecs_to_jiffies()	Jiasheng Jiang	1	-1/+1
	Directly using _usecs_to_jiffies() might be unsafe, so it's better to use usecs_to_jiffies() instead. Because we can see that the result of _usecs_to_jiffies() could be larger than MAX_JIFFY_OFFSET values without the check of the input. Fixes: c410bf01933e ("Fix the excessive initial retransmission timeout") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24	tcp: tracking packets with CE marks in BW rate sample	Yuchung Cheng	4	-11/+17
	In order to track CE marks per rate sample (one round trip), TCP needs a per-skb header field to record the tp->delivered_ce count when the skb was sent. To make space, we replace the "last_in_flight" field which is used exclusively for NV congestion control. The stat needed by NV can be alternatively approximated by existing stats tcp_sock delivered and mss_cache. This patch counts the number of packets delivered which have CE marks in the rate sample, using similar approach of delivery accounting. Cc: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Luke Hsiao <lukehsiao@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>