aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2024-02-26Merge branch 'mptcp-more-misc-fixes-for-v6-8'Jakub Kicinski8-27/+128
Matthieu Baerts says: ==================== mptcp: more misc. fixes for v6.8 This series includes 6 types of fixes: - Patch 1 fixes v4 mapped in v6 addresses support for the userspace PM, when asking to delete a subflow. It was done everywhere else, but not there. Patch 2 validates the modification, thanks to a subtest in mptcp_join.sh. These patches can be backported up to v5.19. - Patch 3 is a small fix for a recent bug-fix patch, just to avoid printing an irrelevant warning (pr_warn()) once. It can be backported up to v5.6, alongside the bug-fix that has been introduced in the v6.8-rc5. - Patches 4 to 6 are fixes for bugs found by Paolo while working on TCP_NOTSENT_LOWAT support for MPTCP. These fixes can improve the performances in some cases. Patches can be backported up to v5.6, v5.11 and v6.7 respectively. - Patch 7 makes sure 'ss -M' is available when starting MPTCP Join selftest as it is required for some subtests since v5.18. - Patch 8 fixes a possible double-free on socket dismantle. The issue always existed, but was unnoticed because it was not causing any problem so far. This fix can be backported up to v5.6. - Patch 9 is a fix for a very recent patch causing lockdep warnings in subflow diag. The patch causing the regression -- which fixes another issue present since v5.7 -- should be part of the future v6.8-rc6. Patch 10 validates the modification, thanks to a new subtest in diag.sh. ==================== Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-0-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26selftests: mptcp: explicitly trigger the listener diag code-pathPaolo Abeni1-1/+29
The mptcp diag interface already experienced a few locking bugs that lockdep and appropriate coverage have detected in advance. Let's add a test-case triggering the relevant code path, to prevent similar issues in the future. Be careful to cope with very slow environments. Note that we don't need an explicit timeout on the mptcp_connect subprocess to cope with eventual bug/hang-up as the final cleanup terminating the child processes will take care of that. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-10-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26mptcp: fix possible deadlock in subflow diagPaolo Abeni1-0/+3
Syzbot and Eric reported a lockdep splat in the subflow diag: WARNING: possible circular locking dependency detected 6.8.0-rc4-syzkaller-00212-g40b9385dd8e6 #0 Not tainted syz-executor.2/24141 is trying to acquire lock: ffff888045870130 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at: tcp_diag_put_ulp net/ipv4/tcp_diag.c:100 [inline] ffff888045870130 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at: tcp_diag_get_aux+0x738/0x830 net/ipv4/tcp_diag.c:137 but task is already holding lock: ffffc9000135e488 (&h->lhash2[i].lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffffc9000135e488 (&h->lhash2[i].lock){+.+.}-{2:2}, at: inet_diag_dump_icsk+0x39f/0x1f80 net/ipv4/inet_diag.c:1038 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&h->lhash2[i].lock){+.+.}-{2:2}: lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] __inet_hash+0x335/0xbe0 net/ipv4/inet_hashtables.c:743 inet_csk_listen_start+0x23a/0x320 net/ipv4/inet_connection_sock.c:1261 __inet_listen_sk+0x2a2/0x770 net/ipv4/af_inet.c:217 inet_listen+0xa3/0x110 net/ipv4/af_inet.c:239 rds_tcp_listen_init+0x3fd/0x5a0 net/rds/tcp_listen.c:316 rds_tcp_init_net+0x141/0x320 net/rds/tcp.c:577 ops_init+0x352/0x610 net/core/net_namespace.c:136 __register_pernet_operations net/core/net_namespace.c:1214 [inline] register_pernet_operations+0x2cb/0x660 net/core/net_namespace.c:1283 register_pernet_device+0x33/0x80 net/core/net_namespace.c:1370 rds_tcp_init+0x62/0xd0 net/rds/tcp.c:735 do_one_initcall+0x238/0x830 init/main.c:1236 do_initcall_level+0x157/0x210 init/main.c:1298 do_initcalls+0x3f/0x80 init/main.c:1314 kernel_init_freeable+0x42f/0x5d0 init/main.c:1551 kernel_init+0x1d/0x2a0 init/main.c:1441 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242 -> #0 (k-sk_lock-AF_INET6){+.+.}-{0:0}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18ca/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754 lock_sock_fast include/net/sock.h:1723 [inline] subflow_get_info+0x166/0xd20 net/mptcp/diag.c:28 tcp_diag_put_ulp net/ipv4/tcp_diag.c:100 [inline] tcp_diag_get_aux+0x738/0x830 net/ipv4/tcp_diag.c:137 inet_sk_diag_fill+0x10ed/0x1e00 net/ipv4/inet_diag.c:345 inet_diag_dump_icsk+0x55b/0x1f80 net/ipv4/inet_diag.c:1061 __inet_diag_dump+0x211/0x3a0 net/ipv4/inet_diag.c:1263 inet_diag_dump_compat+0x1c1/0x2d0 net/ipv4/inet_diag.c:1371 netlink_dump+0x59b/0xc80 net/netlink/af_netlink.c:2264 __netlink_dump_start+0x5df/0x790 net/netlink/af_netlink.c:2370 netlink_dump_start include/linux/netlink.h:338 [inline] inet_diag_rcv_msg_compat+0x209/0x4c0 net/ipv4/inet_diag.c:1405 sock_diag_rcv_msg+0xe7/0x410 netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543 sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280 netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline] netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367 netlink_sendmsg+0xa3b/0xd70 net/netlink/af_netlink.c:1908 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x221/0x270 net/socket.c:745 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584 ___sys_sendmsg net/socket.c:2638 [inline] __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667 do_syscall_64+0xf9/0x240 entry_SYSCALL_64_after_hwframe+0x6f/0x77 As noted by Eric we can break the lock dependency chain avoid dumping any extended info for the mptcp subflow listener: nothing actually useful is presented there. Fixes: b8adb69a7d29 ("mptcp: fix lockless access in subflow ULP diag") Cc: stable@vger.kernel.org Reported-by: Eric Dumazet <edumazet@google.com> Closes: https://lore.kernel.org/netdev/CANn89iJ=Oecw6OZDwmSYc9HJKQ_G32uN11L+oUcMu+TOD5Xiaw@mail.gmail.com/ Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-9-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26mptcp: fix double-free on socket dismantleDavide Caratti1-0/+49
when MPTCP server accepts an incoming connection, it clones its listener socket. However, the pointer to 'inet_opt' for the new socket has the same value as the original one: as a consequence, on program exit it's possible to observe the following splat: BUG: KASAN: double-free in inet_sock_destruct+0x54f/0x8b0 Free of addr ffff888485950880 by task swapper/25/0 CPU: 25 PID: 0 Comm: swapper/25 Kdump: loaded Not tainted 6.8.0-rc1+ #609 Hardware name: Supermicro SYS-6027R-72RF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0 07/26/2013 Call Trace: <IRQ> dump_stack_lvl+0x32/0x50 print_report+0xca/0x620 kasan_report_invalid_free+0x64/0x90 __kasan_slab_free+0x1aa/0x1f0 kfree+0xed/0x2e0 inet_sock_destruct+0x54f/0x8b0 __sk_destruct+0x48/0x5b0 rcu_do_batch+0x34e/0xd90 rcu_core+0x559/0xac0 __do_softirq+0x183/0x5a4 irq_exit_rcu+0x12d/0x170 sysvec_apic_timer_interrupt+0x6b/0x80 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x16/0x20 RIP: 0010:cpuidle_enter_state+0x175/0x300 Code: 30 00 0f 84 1f 01 00 00 83 e8 01 83 f8 ff 75 e5 48 83 c4 18 44 89 e8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc fb 45 85 ed <0f> 89 60 ff ff ff 48 c1 e5 06 48 c7 43 18 00 00 00 00 48 83 44 2b RSP: 0018:ffff888481cf7d90 EFLAGS: 00000202 RAX: 0000000000000000 RBX: ffff88887facddc8 RCX: 0000000000000000 RDX: 1ffff1110ff588b1 RSI: 0000000000000019 RDI: ffff88887fac4588 RBP: 0000000000000004 R08: 0000000000000002 R09: 0000000000043080 R10: 0009b02ea273363f R11: ffff88887fabf42b R12: ffffffff932592e0 R13: 0000000000000004 R14: 0000000000000000 R15: 00000022c880ec80 cpuidle_enter+0x4a/0xa0 do_idle+0x310/0x410 cpu_startup_entry+0x51/0x60 start_secondary+0x211/0x270 secondary_startup_64_no_verify+0x184/0x18b </TASK> Allocated by task 6853: kasan_save_stack+0x1c/0x40 kasan_save_track+0x10/0x30 __kasan_kmalloc+0xa6/0xb0 __kmalloc+0x1eb/0x450 cipso_v4_sock_setattr+0x96/0x360 netlbl_sock_setattr+0x132/0x1f0 selinux_netlbl_socket_post_create+0x6c/0x110 selinux_socket_post_create+0x37b/0x7f0 security_socket_post_create+0x63/0xb0 __sock_create+0x305/0x450 __sys_socket_create.part.23+0xbd/0x130 __sys_socket+0x37/0xb0 __x64_sys_socket+0x6f/0xb0 do_syscall_64+0x83/0x160 entry_SYSCALL_64_after_hwframe+0x6e/0x76 Freed by task 6858: kasan_save_stack+0x1c/0x40 kasan_save_track+0x10/0x30 kasan_save_free_info+0x3b/0x60 __kasan_slab_free+0x12c/0x1f0 kfree+0xed/0x2e0 inet_sock_destruct+0x54f/0x8b0 __sk_destruct+0x48/0x5b0 subflow_ulp_release+0x1f0/0x250 tcp_cleanup_ulp+0x6e/0x110 tcp_v4_destroy_sock+0x5a/0x3a0 inet_csk_destroy_sock+0x135/0x390 tcp_fin+0x416/0x5c0 tcp_data_queue+0x1bc8/0x4310 tcp_rcv_state_process+0x15a3/0x47b0 tcp_v4_do_rcv+0x2c1/0x990 tcp_v4_rcv+0x41fb/0x5ed0 ip_protocol_deliver_rcu+0x6d/0x9f0 ip_local_deliver_finish+0x278/0x360 ip_local_deliver+0x182/0x2c0 ip_rcv+0xb5/0x1c0 __netif_receive_skb_one_core+0x16e/0x1b0 process_backlog+0x1e3/0x650 __napi_poll+0xa6/0x500 net_rx_action+0x740/0xbb0 __do_softirq+0x183/0x5a4 The buggy address belongs to the object at ffff888485950880 which belongs to the cache kmalloc-64 of size 64 The buggy address is located 0 bytes inside of 64-byte region [ffff888485950880, ffff8884859508c0) The buggy address belongs to the physical page: page:0000000056d1e95e refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888485950700 pfn:0x485950 flags: 0x57ffffc0000800(slab|node=1|zone=2|lastcpupid=0x1fffff) page_type: 0xffffffff() raw: 0057ffffc0000800 ffff88810004c640 ffffea00121b8ac0 dead000000000006 raw: ffff888485950700 0000000000200019 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888485950780: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff888485950800: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc >ffff888485950880: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ^ ffff888485950900: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff888485950980: 00 00 00 00 00 01 fc fc fc fc fc fc fc fc fc fc Something similar (a refcount underflow) happens with CALIPSO/IPv6. Fix this by duplicating IP / IPv6 options after clone, so that ip{,6}_sock_destruct() doesn't end up freeing the same memory area twice. Fixes: cf7da0d66cc1 ("mptcp: Create SUBFLOW socket for incoming connections") Cc: stable@vger.kernel.org Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-8-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26selftests: mptcp: join: add ss mptcp support checkGeliang Tang1-0/+5
Commands 'ss -M' are used in script mptcp_join.sh to display only MPTCP sockets. So it must be checked if ss tool supports MPTCP in this script. Fixes: e274f7154008 ("selftests: mptcp: add subflow limits test-cases") Cc: stable@vger.kernel.org Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-7-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26mptcp: fix potential wake-up event lossPaolo Abeni1-10/+11
After the blamed commit below, the send buffer auto-tuning can happen after that the mptcp_propagate_sndbuf() completes - via the delegated action infrastructure. We must check for write space even after such change or we risk missing the wake-up event. Fixes: 8005184fd1ca ("mptcp: refactor sndbuf auto-tuning") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-6-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26mptcp: fix snd_wnd initialization for passive socketPaolo Abeni1-1/+1
Such value should be inherited from the first subflow, but passive sockets always used 'rsk_rcv_wnd'. Fixes: 6f8a612a33e4 ("mptcp: keep track of advertised windows right edge") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-5-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26mptcp: push at DSS boundariesPaolo Abeni1-0/+1
when inserting not contiguous data in the subflow write queue, the protocol creates a new skb and prevent the TCP stack from merging it later with already queued skbs by setting the EOR marker. Still no push flag is explicitly set at the end of previous GSO packet, making the aggregation on the receiver side sub-optimal - and packetdrill self-tests less predictable. Explicitly mark the end of not contiguous DSS with the push flag. Fixes: 6d0060f600ad ("mptcp: Write MPTCP DSS headers to outgoing data packets") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-4-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26mptcp: avoid printing warning once on client sideMatthieu Baerts (NGI0)1-1/+1
After the 'Fixes' commit mentioned below, the client side might print the following warning once when a subflow is fully established at the reception of any valid additional ack: MPTCP: bogus mpc option on established client sk That's a normal situation, and no warning should be printed for that. We can then skip the check when the label is used. Fixes: e4a0fa47e816 ("mptcp: corner case locking for rx path fields initialization") Cc: stable@vger.kernel.org Suggested-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-3-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26selftests: mptcp: rm subflow with v4/v4mapped addrGeliang Tang2-14/+18
Now both a v4 address and a v4-mapped address are supported when destroying a userspace pm subflow, this patch adds a second subflow to "userspace pm add & remove address" test, and two subflows could be removed two different ways, one with the v4mapped and one with v4. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/387 Fixes: 48d73f609dcc ("selftests: mptcp: update userspace pm addr tests") Cc: stable@vger.kernel.org Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Mat Martineau <martineau@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-2-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26mptcp: map v4 address to v6 when destroying subflowGeliang Tang1-0/+10
Address family of server side mismatches with that of client side, like in "userspace pm add & remove address" test: userspace_pm_add_addr $ns1 10.0.2.1 10 userspace_pm_rm_sf $ns1 "::ffff:10.0.2.1" $SUB_ESTABLISHED That's because on the server side, the family is set to AF_INET6 and the v4 address is mapped in a v6 one. This patch fixes this issue. In mptcp_pm_nl_subflow_destroy_doit(), before checking local address family with remote address family, map an IPv4 address to an IPv6 address if the pair is a v4-mapped address. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/387 Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment") Cc: stable@vger.kernel.org Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Mat Martineau <martineau@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-1-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26ipv6: anycast: complete RCU handling of struct ifacaddr6Eric Dumazet3-42/+27
struct ifacaddr6 are already freed after RCU grace period. Add __rcu qualifier to aca_next pointer, and idev->ac_list Add relevant rcu_assign_pointer() and dereference accessors. ipv6_chk_acast_dev() no longer needs to acquire idev->lock. /proc/net/anycast6 is now purely RCU protected, it no longer acquires idev->lock. Similarly in6_dump_addrs() can use RCU protection to iterate through anycast addresses. It was relying on a mixture of RCU and RTNL but next patches will get rid of RTNL there. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240223201054.220534-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26dt-bindings: net: cdns,macb: add sam9x7 ethernet interfaceVarshini Rajendran1-0/+5
Add documentation for sam9x7 ethernet interface. Signed-off-by: Varshini Rajendran <varshini.rajendran@microchip.com> Acked-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20240223172228.671553-1-varshini.rajendran@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26dpll: rely on rcu for netdev_dpll_pin()Eric Dumazet5-12/+16
This fixes a possible UAF in if_nlmsg_size(), which can run without RTNL. Add rcu protection to "struct dpll_pin" Move netdev_dpll_pin() from netdevice.h to dpll.h to decrease name pollution. Note: This looks possible to no longer acquire RTNL in netdev_dpll_pin_assign() later in net-next. v2: do not force rcu_read_lock() in rtnl_dpll_pin_size() (Jiri Pirko) Fixes: 5f1842692880 ("netdev: expose DPLL pin handle for netdevice") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240223123208.3543319-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26net/vsockmon: Do not set zeroed statisticsBreno Leitao1-3/+0
Do not set rtnl_link_stats64 fields to zero, since they are zeroed before ops->ndo_get_stats64 is called in core dev_get_stats() function. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://lore.kernel.org/r/20240223115839.3572852-2-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26net/vsockmon: Leverage core stats allocatorBreno Leitao1-15/+1
With commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and convert veth & vrf"), stats allocation could be done on net core instead of this driver. With this new approach, the driver doesn't have to bother with error handling (allocation failure checking, making sure free happens in the right spot, etc). This is core responsibility now. Remove the allocation in the vsockmon driver and leverage the network core allocation instead. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20240223115839.3572852-1-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26lan78xx: enable auto speed configuration for LAN7850 if no EEPROM is detectedOleksij Rempel1-1/+2
Same as LAN7800, LAN7850 can be used without EEPROM. If EEPROM is not present or not flashed, LAN7850 will fail to sync the speed detected by the PHY with the MAC. In case link speed is 100Mbit, it will accidentally work, otherwise no data can be transferred. Better way would be to implement link_up callback, or set auto speed configuration unconditionally. But this changes would be more intrusive. So, for now, set it only if no EEPROM is found. Fixes: e69647a19c87 ("lan78xx: Set ASD in MAC_CR when EEE is enabled.") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20240222123839.2816561-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26Merge tag 'mtd/fixes-for-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linuxLinus Torvalds3-10/+10
Pull mtd fixes from Miquel Raynal: "Many NAND page layouts have been added to the Marvell NAND controller but could not be used in practice so they are being removed. Regarding the SPI-NAND area, Gigadevice chips were not using the right buffer for an ECC status check operation. Aside from these driver fixes, there is also a refcount fix in the MTD core nodes parsing logic" * tag 'mtd/fixes-for-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: mtd: rawnand: marvell: fix layouts mtd: Fix possible refcounting issue when going through partition nodes mtd: spinand: gigadevice: Fix the get ecc status issue
2024-02-26Merge tag 'for-6.8-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds6-22/+88
Pull btrfs fixes from David Sterba: "A more fixes for recently reported or discovered problems: - fix corner case of send that would generate potentially large stream of zeros if there's a hole at the end of the file - fix chunk validation in zoned mode on conventional zones, it was possible to create chunks that would not be allowed on sequential zones - fix validation of dev-replace ioctl filenames - fix KCSAN warnings about access to block reserve struct members" * tag 'for-6.8-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix data race at btrfs_use_block_rsv() when accessing block reserve btrfs: fix data races when accessing the reserved amount of block reserves btrfs: send: don't issue unnecessary zero writes for trailing hole btrfs: dev-replace: properly validate device names btrfs: zoned: don't skip block group profile checks on conventional zones
2024-02-26fs/ntfs3: fix build without CONFIG_NTFS3_LZX_XPRESSMark O'Donovan1-1/+1
When CONFIG_NTFS3_LZX_XPRESS is not set then we get the following build error: fs/ntfs3/frecord.c:2460:16: error: unused variable ‘i_size’ Signed-off-by: Mark O'Donovan <shiftee@posteo.net> Fixes: 4fd6c08a16d7 ("fs/ntfs3: Use i_size_read and i_size_write") Tested-by: Chris Clayton <chris2553@googlemail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-02-26landlock: Fix asymmetric private inodes referringMickaël Salaün1-2/+2
When linking or renaming a file, if only one of the source or destination directory is backed by an S_PRIVATE inode, then the related set of layer masks would be used as uninitialized by is_access_to_paths_allowed(). This would result to indeterministic access for one side instead of always being allowed. This bug could only be triggered with a mounted filesystem containing both S_PRIVATE and !S_PRIVATE inodes, which doesn't seem possible. The collect_domain_accesses() calls return early if is_nouser_or_private() returns false, which means that the directory's superblock has SB_NOUSER or its inode has S_PRIVATE. Because rename or link actions are only allowed on the same mounted filesystem, the superblock is always the same for both source and destination directories. However, it might be possible in theory to have an S_PRIVATE parent source inode with an !S_PRIVATE parent destination inode, or vice versa. To make sure this case is not an issue, explicitly initialized both set of layer masks to 0, which means to allow all actions on the related side. If at least on side has !S_PRIVATE, then collect_domain_accesses() and is_access_to_paths_allowed() check for the required access rights. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Jann Horn <jannh@google.com> Cc: Shervin Oloumi <enlightened@chromium.org> Cc: stable@vger.kernel.org Fixes: b91c3e4ea756 ("landlock: Add support for file reparenting with LANDLOCK_ACCESS_FS_REFER") Link: https://lore.kernel.org/r/20240219190345.2928627-1-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-02-26Merge branch 'pcs-xpcs-cleanups'David S. Miller1-9/+8
Serge Semin says: ==================== net: pcs: xpcs: Cleanups before adding MMIO dev support As stated in the subject this series is a short prequel before submitting the main patches adding the memory-mapped DW XPCS support to the DW XPCS and DW *MAC (STMMAC) drivers. Originally it was a part of the bigger patchset (see the changelog v2 link below) but was detached to a preparation set to shrink down the main series thus simplifying it' review. The patchset' content is straightforward: drop the redundant sentinel entry and the header files; return EINVAL errno from the soft-reset method and make sure that the interface validation method return EINVAL straight away if the requested interface isn't supported by the XPCS device instance. All of these changes are required to simplify the changes being introduced a bit later in the framework of the memory-mapped DW XPCS support patches. Link: https://lore.kernel.org/netdev/20231205103559.9605-1-fancer.lancer@gmail.com Changelog v2: - Move the preparation patches to a separate series. - Simplify the commit messages (@Russell, @Vladimir). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26net: pcs: xpcs: Explicitly return error on caps validationSerge Semin1-3/+4
If an unsupported interface is passed to the PCS validation callback there is no need in further link-modes calculations since the resultant array will be initialized with zeros which will be perceived by the phylink subsystem as error anyway (see phylink_validate_mac_and_pcs()). Instead let's explicitly return the -EINVAL error to inform the caller about the unsupported interface as it's done in the rest of the pcs_validate callbacks. Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26net: pcs: xpcs: Return EINVAL in the internal methodsSerge Semin1-2/+2
In particular the xpcs_soft_reset() and xpcs_do_config() functions currently return -1 if invalid auto-negotiation mode is specified. That value might be then passed to the generic kernel subsystems which require a standard kernel errno value. Even though the erroneous conditions are very specific (memory corruption or buggy driver implementation) using a hard-coded -1 literal doesn't seem correct anyway especially when it comes to passing it higher to the network subsystem or printing to the system log. Convert the hard-coded error values to -EINVAL then. Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26net: pcs: xpcs: Drop redundant workqueue.h include directiveSerge Semin1-1/+1
There is nothing CM workqueue-related in the driver. So the respective include directive can be dropped. While at it add an empty line delimiter between the generic and local path include directives to visually separate them. Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26net: pcs: xpcs: Drop sentinel entry from 2500basex ifaces listSerge Semin1-3/+1
There are currently only two methods (xpcs_find_compat() and xpcs_get_interfaces()) defined in the driver which loop over the available interfaces. All of them rely on the xpcs_compat::num_interfaces field value to get the total number of supported interfaces. Thus the interface arrays are supposed to be filled with actual interface IDs and there is no need in the dummy terminating ID placed at the end of the arrays. Based on the above drop the PHY_INTERFACE_MODE_MAX entry from the xpcs_2500basex_interfaces array and the PHY_INTERFACE_MODE_MAX-based conditional statement from the xpcs_get_interfaces() method as redundant. Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26ipv6: fix potential "struct net" leak in inet6_rtm_getaddr()Eric Dumazet1-3/+4
It seems that if userspace provides a correct IFA_TARGET_NETNSID value but no IFA_ADDRESS and IFA_LOCAL attributes, inet6_rtm_getaddr() returns -EINVAL with an elevated "struct net" refcount. Fixes: 6ecf4c37eb3e ("ipv6: enable IFA_TARGET_NETNSID for RTM_GETADDR") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Ahern <dsahern@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26Merge branch 'rtnetlink-reduce-rtnl-pressure'David S. Miller32-198/+238
Eric Dumazet says: ==================== rtnetlink: reduce RTNL pressure for dumps This series restarts the conversion of rtnl dump operations to RCU protection, instead of requiring RTNL. In this new attempt (prior one failed in 2011), I chose to allow a gradual conversion of selected operations. After this series, "ip -6 addr" and "ip -4 ro" no longer need to acquire RTNL. I refrained from changing inet_dump_ifaddr() and inet6_dump_addr() to avoid merge conflicts because of two fixes in net tree. I also started the work for "ip link" future conversion. v2: rtnl_fill_link_ifmap() always emit IFLA_MAP (Jiri Pirko) Added "nexthop: allow nexthop_mpath_fill_node() to be called without RTNL" to avoid a lockdep splat (Ido Schimmel) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26rtnetlink: provide RCU protection to rtnl_fill_prop_list()Eric Dumazet1-5/+4
We want to be able to run rtnl_fill_ifinfo() under RCU protection instead of RTNL in the future. dev->name_node items are already rcu protected. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26rtnetlink: make rtnl_fill_link_ifmap() RCU readyEric Dumazet1-10/+11
Use READ_ONCE() to read the following device fields: dev->mem_start dev->mem_end dev->base_addr dev->irq dev->dma dev->if_port Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26inet: switch inet_dump_fib() to RCU protectionEric Dumazet2-21/+20
No longer hold RTNL while calling inet_dump_fib(). Also change return value for a completed dump: Returning 0 instead of skb->len allows NLMSG_DONE to be appended to the skb. User space does not have to call us again to get a standalone NLMSG_DONE marker. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26nexthop: allow nexthop_mpath_fill_node() to be called without RTNLEric Dumazet1-1/+1
nexthop_mpath_fill_node() will be potentially called from contexts holding rcu_lock instead of RTNL. Suggested-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/all/ZdZDWVdjMaQkXBgW@shredder/ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26inet: allow ip_valid_fib_dump_req() to be called with RTNL or RCUEric Dumazet6-9/+26
Add a new field into struct fib_dump_filter, to let callers tell if they use RTNL locking or RCU. This is used in the following patch, when inet_dump_fib() no longer holds RTNL. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26ipv6: switch inet6_dump_ifinfo() to RCU protectionEric Dumazet1-1/+1
No longer hold RTNL while calling inet6_dump_ifinfo() Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flagEric Dumazet4-0/+8
Similarly to RTNL_FLAG_DOIT_UNLOCKED, this new flag allows dump operations registered via rtnl_register() or rtnl_register_module() to opt-out from RTNL protection. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26rtnetlink: change nlk->cb_mutex roleEric Dumazet2-16/+21
In commit af65bdfce98d ("[NETLINK]: Switch cb_lock spinlock to mutex and allow to override it"), Patrick McHardy used a common mutex to protect both nlk->cb and the dump() operations. The override is used for rtnl dumps, registered with rntl_register() and rntl_register_module(). We want to be able to opt-out some dump() operations to not acquire RTNL, so we need to protect nlk->cb with a per socket mutex. This patch renames nlk->cb_def_mutex to nlk->nl_cb_mutex The optional pointer to the mutex used to protect dump() call is stored in nlk->dump_cb_mutex Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26netlink: hold nlk->cb_mutex longer in __netlink_dump_start()Eric Dumazet1-7/+6
__netlink_dump_start() releases nlk->cb_mutex right before calling netlink_dump() which grabs it again. This seems dangerous, even if KASAN did not bother yet. Add a @lock_taken parameter to netlink_dump() to let it grab the mutex if called from netlink_recvmsg() only. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26netlink: fix netlink_diag_dump() return valueEric Dumazet1-1/+1
__netlink_diag_dump() returns 1 if the dump is not complete, zero if no error occurred. If err variable is zero, this means the dump is complete: We should not return skb->len in this case, but 0. This allows NLMSG_DONE to be appended to the skb. User space does not have to call us again only to get NLMSG_DONE. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26ipv6: use xarray iterator to implement inet6_dump_ifinfo()Eric Dumazet1-27/+19
Prepare inet6_dump_ifinfo() to run with RCU protection instead of RTNL and use for_each_netdev_dump() interface. Also properly return 0 at the end of a dump, avoiding an extra recvmsg() system call and RTNL acquisition. Note that RTNL-less dumps need core changes, coming later in the series. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26ipv6: prepare inet6_fill_ifinfo() for RCU protectionEric Dumazet3-8/+13
We want to use RCU protection instead of RTNL for inet6_fill_ifinfo(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26ipv6: prepare inet6_fill_ifla6_attrs() for RCUEric Dumazet2-75/+90
We want to no longer hold RTNL while calling inet6_fill_ifla6_attrs() in the future. Add needed READ_ONCE()/WRITE_ONCE() annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26rtnetlink: prepare nla_put_iflink() to run under RCUEric Dumazet16-20/+20
We want to be able to run rtnl_fill_ifinfo() under RCU protection instead of RTNL in the future. This patch prepares dev_get_iflink() and nla_put_iflink() to run either with RTNL or RCU held. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26Merge branch 'dp83826'David S. Miller2-0/+60
Jérémie Dautheribes says: ==================== Add support for TI DP83826 configuration This short patch series introduces the possibility of overriding some parameters which are latched by default by hardware straps on the TI DP83826 PHY. The settings that can be overridden include: - Configuring the PHY in either MII mode or RMII mode. - When in RMII mode, configuring the PHY in RMII slave mode or RMII master mode. The RMII master/slave mode is TI-specific and determines whether the PHY operates from a 25MHz reference clock (master mode) or from a 50MHz reference clock (slave mode). While these features should be supported by all the TI DP8382x family, I have only been able to test them on TI DP83826 hardware. Therefore, support has been added specifically for this PHY in this patch series. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26net: phy: dp83826: support configuring RMII master/slave operation modeJérémie Dautheribes1-5/+38
The TI DP83826 PHY can operate between two RMII modes: - master mode (PHY operates from a 25MHz clock reference) - slave mode (PHY operates from a 50MHz clock reference) By default, the operation mode is configured by hardware straps. Add support to configure the operation mode from within the driver. Signed-off-by: Jérémie Dautheribes <jeremie.dautheribes@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26net: phy: dp83826: Add support for phy-mode configurationJérémie Dautheribes1-0/+11
The TI DP83826 PHY can operate in either MII mode or RMII mode. By default, it is configured by straps. It can also be configured by writing to the bit 5 of register 0x17 - RMII and Status Register (RCSR). When phydev->interface is rmii, rmii mode must be enabled, otherwise mii mode must be set. This prevents misconfiguration of hw straps. Signed-off-by: Jérémie Dautheribes <jeremie.dautheribes@bootlin.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26dt-bindings: net: dp83822: support configuring RMII master/slave modeJérémie Dautheribes1-0/+16
Add property ti,rmii-mode to support selecting the RMII operation mode between: - master mode (PHY operates from a 25MHz clock reference) - slave mode (PHY operates from a 50MHz clock reference) If not set, the operation mode is configured by hardware straps. Signed-off-by: Jérémie Dautheribes <jeremie.dautheribes@bootlin.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26net: dsa: microchip: Add support for bridge port isolationOleksij Rempel2-5/+51
Implement bridge port isolation for KSZ switches. Enabling the isolation of switch ports from each other while maintaining connectivity with the CPU and other forwarding ports. For instance, to isolate swp1 and swp2 from each other, use the following commands: - bridge link set dev swp1 isolated on - bridge link set dev swp2 isolated on Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26selftests: net: veth: test syncing GRO and XDP state while device is downJakub Kicinski1-0/+14
Test that we keep GRO flag in sync when XDP is disabled while the device is closed. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26net: veth: clear GRO when clearing XDP even when downJakub Kicinski1-18/+17
veth sets NETIF_F_GRO automatically when XDP is enabled, because both features use the same NAPI machinery. The logic to clear NETIF_F_GRO sits in veth_disable_xdp() which is called both on ndo_stop and when XDP is turned off. To avoid the flag from being cleared when the device is brought down, the clearing is skipped when IFF_UP is not set. Bringing the device down should indeed not modify its features. Unfortunately, this means that clearing is also skipped when XDP is disabled _while_ the device is down. And there's nothing on the open path to bring the device features back into sync. IOW if user enables XDP, disables it and then brings the device up we'll end up with a stray GRO flag set but no NAPI instances. We don't depend on the GRO flag on the datapath, so the datapath won't crash. We will crash (or hang), however, next time features are sync'ed (either by user via ethtool or peer changing its config). The GRO flag will go away, and veth will try to disable the NAPIs. But the open path never created them since XDP was off, the GRO flag was a stray. If NAPI was initialized before we'll hang in napi_disable(). If it never was we'll crash trying to stop uninitialized hrtimer. Move the GRO flag updates to the XDP enable / disable paths, instead of mixing them with the ndo_open / ndo_close paths. Fixes: d3256efd8e8b ("veth: allow enabling NAPI even without XDP") Reported-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: syzbot+039399a9b96297ddedca@syzkaller.appspotmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-25Linux 6.8-rc6Linus Torvalds1-1/+1