wireguard-linux - WireGuard for the Linux kernel

Age	Commit message (Collapse)	Author	Files	Lines
2024-05-24	bpf: Fix potential integer overflow in resolve_btfids	Friedrich Vock	1	-1/+1
	err is a 32-bit integer, but elf_update returns an off_t, which is 64-bit at least on 64-bit platforms. If symbols_patch is called on a binary between 2-4GB in size, the result will be negative when cast to a 32-bit integer, which the code assumes means an error occurred. This can wrongly trigger build failures when building very large kernel images. Fixes: fbbb68de80a4 ("bpf: Add resolve_btfids tool to resolve BTF IDs in ELF object") Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240514070931.199694-1-friedrich.vock@gmx.de
2024-05-21	MAINTAINERS: Add myself as reviewer of ARM64 BPF JIT	Xu Kuohai	1	-0/+1
	I am working on ARM64 BPF JIT for a while, hence add myself as reviewer. Signed-off-by: Xu Kuohai <xukuohai@huaweicloud.com> Acked-by: Hengqi Chen <hengqi.chen@gmail.com> Link: https://lore.kernel.org/r/20240516020928.156125-1-xukuohai@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-21	openvswitch: Set the skbuff pkt_type for proper pmtud support.	Aaron Conole	1	-0/+6
	Open vSwitch is originally intended to switch at layer 2, only dealing with Ethernet frames. With the introduction of l3 tunnels support, it crossed into the realm of needing to care a bit about some routing details when making forwarding decisions. If an oversized packet would need to be fragmented during this forwarding decision, there is a chance for pmtu to get involved and generate a routing exception. This is gated by the skbuff->pkt_type field. When a flow is already loaded into the openvswitch module this field is set up and transitioned properly as a packet moves from one port to another. In the case that a packet execute is invoked after a flow is newly installed this field is not properly initialized. This causes the pmtud mechanism to omit sending the required exception messages across the tunnel boundary and a second attempt needs to be made to make sure that the routing exception is properly setup. To fix this, we set the outgoing packet's pkt_type to PACKET_OUTGOING, since it can only get to the openvswitch module via a port device or packet command. Even for bridge ports as users, the pkt_type needs to be reset when doing the transmit as the packet is truly outgoing and routing needs to get involved post packet transformations, in the case of VXLAN/GENEVE/udp-tunnel packets. In general, the pkt_type on output gets ignored, since we go straight to the driver, but in the case of tunnel ports they go through IP routing layer. This issue is periodically encountered in complex setups, such as large openshift deployments, where multiple sets of tunnel traversal occurs. A way to recreate this is with the ovn-heater project that can setup a networking environment which mimics such large deployments. We need larger environments for this because we need to ensure that flow misses occur. In these environment, without this patch, we can see: ./ovn_cluster.sh start podman exec ovn-chassis-1 ip r a 170.168.0.5/32 dev eth1 mtu 1200 podman exec ovn-chassis-1 ip netns exec sw01p1 ip r flush cache podman exec ovn-chassis-1 ip netns exec sw01p1 \ ping 21.0.0.3 -M do -s 1300 -c2 PING 21.0.0.3 (21.0.0.3) 1300(1328) bytes of data. From 21.0.0.3 icmp_seq=2 Frag needed and DF set (mtu = 1142) --- 21.0.0.3 ping statistics --- ... Using tcpdump, we can also see the expected ICMP FRAG_NEEDED message is not sent into the server. With this patch, setting the pkt_type, we see the following: podman exec ovn-chassis-1 ip netns exec sw01p1 \ ping 21.0.0.3 -M do -s 1300 -c2 PING 21.0.0.3 (21.0.0.3) 1300(1328) bytes of data. From 21.0.0.3 icmp_seq=1 Frag needed and DF set (mtu = 1222) ping: local error: message too long, mtu=1222 --- 21.0.0.3 ping statistics --- ... In this case, the first ping request receives the FRAG_NEEDED message and a local routing exception is created. Tested-by: Jaime Caamano <jcaamano@redhat.com> Reported-at: https://issues.redhat.com/browse/FDP-164 Fixes: 58264848a5a7 ("openvswitch: Add vxlan tunneling support.") Signed-off-by: Aaron Conole <aconole@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Link: https://lore.kernel.org/r/20240516200941.16152-1-aconole@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-21	selftest: af_unix: Make SCM_RIGHTS into OOB data.	Kuniyuki Iwashima	1	-2/+2
	scm_rights.c covers various test cases for inflight file descriptors and garbage collector for AF_UNIX sockets. Currently, SCM_RIGHTS messages are sent with 3-bytes string, and it's not good for MSG_OOB cases, as SCM_RIGTS cmsg goes with the first 2-bytes, which is non-OOB data. Let's send SCM_RIGHTS messages with 1-byte character to pack SCM_RIGHTS into OOB data. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-21	af_unix: Fix garbage collection of embryos carrying OOB with SCM_RIGHTS	Michal Luczaj	1	-9/+14
	GC attempts to explicitly drop oob_skb's reference before purging the hit list. The problem is with embryos: kfree_skb(u->oob_skb) is never called on an embryo socket. The python script below [0] sends a listener's fd to its embryo as OOB data. While GC does collect the embryo's queue, it fails to drop the OOB skb's refcount. The skb which was in embryo's receive queue stays as unix_sk(sk)->oob_skb and keeps the listener's refcount [1]. Tell GC to dispose embryo's oob_skb. [0]: from array import array from socket import * addr = '\x00unix-oob' lis = socket(AF_UNIX, SOCK_STREAM) lis.bind(addr) lis.listen(1) s = socket(AF_UNIX, SOCK_STREAM) s.connect(addr) scm = (SOL_SOCKET, SCM_RIGHTS, array('i', [lis.fileno()])) s.sendmsg([b'x'], [scm], MSG_OOB) lis.close() [1] $ grep unix-oob /proc/net/unix $ ./unix-oob.py $ grep unix-oob /proc/net/unix 0000000000000000: 00000002 00000000 00000000 0001 02 0 @unix-oob 0000000000000000: 00000002 00000000 00010000 0001 01 6072 @unix-oob Fixes: 4090fa373f0e ("af_unix: Replace garbage collection algorithm.") Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-21	tcp: Fix shift-out-of-bounds in dctcp_update_alpha().	Kuniyuki Iwashima	1	-1/+12
	In dctcp_update_alpha(), we use a module parameter dctcp_shift_g as follows: alpha -= min_not_zero(alpha, alpha >> dctcp_shift_g); ... delivered_ce <<= (10 - dctcp_shift_g); It seems syzkaller started fuzzing module parameters and triggered shift-out-of-bounds [0] by setting 100 to dctcp_shift_g: memcpy((void)0x20000080, "/sys/module/tcp_dctcp/parameters/dctcp_shift_g\000", 47); res = syscall(__NR_openat, /fd=/0xffffffffffffff9cul, /file=/0x20000080ul, /flags=/2ul, /mode=/0ul); memcpy((void)0x20000000, "100\000", 4); syscall(__NR_write, /fd=/r[0], /val=/0x20000000ul, /len=/4ul); Let's limit the max value of dctcp_shift_g by param_set_uint_minmax(). With this patch: # echo 10 > /sys/module/tcp_dctcp/parameters/dctcp_shift_g # cat /sys/module/tcp_dctcp/parameters/dctcp_shift_g 10 # echo 11 > /sys/module/tcp_dctcp/parameters/dctcp_shift_g -bash: echo: write error: Invalid argument [0]: UBSAN: shift-out-of-bounds in net/ipv4/tcp_dctcp.c:143:12 shift exponent 100 is too large for 32-bit type 'u32' (aka 'unsigned int') CPU: 0 PID: 8083 Comm: syz-executor345 Not tainted 6.9.0-05151-g1b294a1f3561 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x201/0x300 lib/dump_stack.c:114 ubsan_epilogue lib/ubsan.c:231 [inline] __ubsan_handle_shift_out_of_bounds+0x346/0x3a0 lib/ubsan.c:468 dctcp_update_alpha+0x540/0x570 net/ipv4/tcp_dctcp.c:143 tcp_in_ack_event net/ipv4/tcp_input.c:3802 [inline] tcp_ack+0x17b1/0x3bc0 net/ipv4/tcp_input.c:3948 tcp_rcv_state_process+0x57a/0x2290 net/ipv4/tcp_input.c:6711 tcp_v4_do_rcv+0x764/0xc40 net/ipv4/tcp_ipv4.c:1937 sk_backlog_rcv include/net/sock.h:1106 [inline] __release_sock+0x20f/0x350 net/core/sock.c:2983 release_sock+0x61/0x1f0 net/core/sock.c:3549 mptcp_subflow_shutdown+0x3d0/0x620 net/mptcp/protocol.c:2907 mptcp_check_send_data_fin+0x225/0x410 net/mptcp/protocol.c:2976 __mptcp_close+0x238/0xad0 net/mptcp/protocol.c:3072 mptcp_close+0x2a/0x1a0 net/mptcp/protocol.c:3127 inet_release+0x190/0x1f0 net/ipv4/af_inet.c:437 __sock_release net/socket.c:659 [inline] sock_close+0xc0/0x240 net/socket.c:1421 __fput+0x41b/0x890 fs/file_table.c:422 task_work_run+0x23b/0x300 kernel/task_work.c:180 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0x9c8/0x2540 kernel/exit.c:878 do_group_exit+0x201/0x2b0 kernel/exit.c:1027 __do_sys_exit_group kernel/exit.c:1038 [inline] __se_sys_exit_group kernel/exit.c:1036 [inline] __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1036 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xe4/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x67/0x6f RIP: 0033:0x7f6c2b5005b6 Code: Unable to access opcode bytes at 0x7f6c2b50058c. RSP: 002b:00007ffe883eb948 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 00007f6c2b5862f0 RCX: 00007f6c2b5005b6 RDX: 0000000000000001 RSI: 000000000000003c RDI: 0000000000000001 RBP: 0000000000000001 R08: 00000000000000e7 R09: ffffffffffffffc0 R10: 0000000000000006 R11: 0000000000000246 R12: 00007f6c2b5862f0 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001 </TASK> Reported-by: syzkaller <syzkaller@googlegroups.com> Reported-by: Yue Sun <samsun1006219@gmail.com> Reported-by: xingwei lee <xrivendell7@gmail.com> Closes: https://lore.kernel.org/netdev/CAEkJfYNJM=cw-8x7_Vmj1J6uYVCWMbbvD=EFmDPVBGpTsqOxEA@mail.gmail.com/ Fixes: e3118e8359bb ("net: tcp: add DCTCP congestion control algorithm") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240517091626.32772-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-21	selftests/net: use tc rule to filter the na packet	Hangbin Liu	3	-94/+75
	Test arp_ndisc_untracked_subnets use tcpdump to filter the unsolicited and untracked na messages. It set -e before calling tcpdump. But if tcpdump filters 0 packet, it will return none zero, and cause the script to exit. Instead of using slow tcpdump to capture packets, let's using tc rule to filter out the na message. At the same time, fix function setup_v6 which only needs one parameter. Move all the related helpers from forwarding lib.sh to net lib.sh. Fixes: 0ea7b0a454ca ("selftests: net: arp_ndisc_untracked_subnets: test for arp_accept and accept_untracked_na") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240517010327.2631319-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-21	ipv6: sr: fix memleak in seg6_hmac_init_algo	Hangbin Liu	1	-14/+28
	seg6_hmac_init_algo returns without cleaning up the previous allocations if one fails, so it's going to leak all that memory and the crypto tfms. Update seg6_hmac_exit to only free the memory when allocated, so we can reuse the code directly. Fixes: bf355b8d2c30 ("ipv6: sr: add core files for SR HMAC support") Reported-by: Sabrina Dubroca <sd@queasysnail.net> Closes: https://lore.kernel.org/netdev/Zj3bh-gE7eT6V6aH@hog/ Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/20240517005435.2600277-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-21	af_unix: Update unix_sk(sk)->oob_skb under sk_receive_queue lock.	Kuniyuki Iwashima	1	-6/+22
	Billy Jheng Bing-Jhong reported a race between __unix_gc() and queue_oob(). __unix_gc() tries to garbage-collect close()d inflight sockets, and then if the socket has MSG_OOB in unix_sk(sk)->oob_skb, GC will drop the reference and set NULL to it locklessly. However, the peer socket still can send MSG_OOB message and queue_oob() can update unix_sk(sk)->oob_skb concurrently, leading NULL pointer dereference. [0] To fix the issue, let's update unix_sk(sk)->oob_skb under the sk_receive_queue's lock and take it everywhere we touch oob_skb. Note that we defer kfree_skb() in manage_oob() to silence lockdep false-positive (See [1]). [0]: BUG: kernel NULL pointer dereference, address: 0000000000000008 PF: supervisor write access in kernel mode PF: error_code(0x0002) - not-present page PGD 8000000009f5e067 P4D 8000000009f5e067 PUD 9f5d067 PMD 0 Oops: 0002 [#1] PREEMPT SMP PTI CPU: 3 PID: 50 Comm: kworker/3:1 Not tainted 6.9.0-rc5-00191-gd091e579b864 #110 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: events delayed_fput RIP: 0010:skb_dequeue (./include/linux/skbuff.h:2386 ./include/linux/skbuff.h:2402 net/core/skbuff.c:3847) Code: 39 e3 74 3e 8b 43 10 48 89 ef 83 e8 01 89 43 10 49 8b 44 24 08 49 c7 44 24 08 00 00 00 00 49 8b 14 24 49 c7 04 24 00 00 00 00 <48> 89 42 08 48 89 10 e8 e7 c5 42 00 4c 89 e0 5b 5d 41 5c c3 cc cc RSP: 0018:ffffc900001bfd48 EFLAGS: 00000002 RAX: 0000000000000000 RBX: ffff8880088f5ae8 RCX: 00000000361289f9 RDX: 0000000000000000 RSI: 0000000000000206 RDI: ffff8880088f5b00 RBP: ffff8880088f5b00 R08: 0000000000080000 R09: 0000000000000001 R10: 0000000000000003 R11: 0000000000000001 R12: ffff8880056b6a00 R13: ffff8880088f5280 R14: 0000000000000001 R15: ffff8880088f5a80 FS: 0000000000000000(0000) GS:ffff88807dd80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 0000000006314000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: <TASK> unix_release_sock (net/unix/af_unix.c:654) unix_release (net/unix/af_unix.c:1050) __sock_release (net/socket.c:660) sock_close (net/socket.c:1423) __fput (fs/file_table.c:423) delayed_fput (fs/file_table.c:444 (discriminator 3)) process_one_work (kernel/workqueue.c:3259) worker_thread (kernel/workqueue.c:3329 kernel/workqueue.c:3416) kthread (kernel/kthread.c:388) ret_from_fork (arch/x86/kernel/process.c:153) ret_from_fork_asm (arch/x86/entry/entry_64.S:257) </TASK> Modules linked in: CR2: 0000000000000008 Link: https://lore.kernel.org/netdev/a00d3993-c461-43f2-be6d-07259c98509a@rbox.co/ [1] Fixes: 1279f9d9dec2 ("af_unix: Call kfree_skb() for dead unix_(sk)->oob_skb in GC.") Reported-by: Billy Jheng Bing-Jhong <billy@starlabs.sg> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240516134835.8332-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-21	Revert "r8169: don't try to disable interrupts if NAPI is, scheduled already"	Heiner Kallweit	1	-4/+2
	This reverts commit 7274c4147afbf46f45b8501edbdad6da8cd013b9. Ken reported that RTL8125b can lock up if gro_flush_timeout has the default value of 20000 and napi_defer_hard_irqs is set to 0. In this scenario device interrupts aren't disabled, what seems to trigger some silicon bug under heavy load. I was able to reproduce this behavior on RTL8168h. Fix this by reverting 7274c4147afb. Fixes: 7274c4147afb ("r8169: don't try to disable interrupts if NAPI is scheduled already") Cc: stable@vger.kernel.org Reported-by: Ken Milmore <ken.milmore@gmail.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/9b5b6f4c-4f54-4b90-b0b3-8d8023c2e780@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-20	nfc: nci: Fix uninit-value in nci_rx_work	Ryosuke Yasuoka	1	-1/+14
	syzbot reported the following uninit-value access issue [1] nci_rx_work() parses received packet from ndev->rx_q. It should be validated header size, payload size and total packet size before processing the packet. If an invalid packet is detected, it should be silently discarded. Fixes: d24b03535e5e ("nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet") Reported-and-tested-by: syzbot+d7b4dc6cd50410152534@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d7b4dc6cd50410152534 [1] Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-20	selftests: net: kill smcrouted in the cleanup logic in amt.sh	Taehee Yoo	1	-1/+7
	The amt.sh requires smcrouted for multicasting routing. So, it starts smcrouted before forwarding tests. It must be stopped after all tests, but it isn't. To fix this issue, it kills smcrouted in the cleanup logic. Fixes: c08e8baea78e ("selftests: add amt interface selftest script") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-20	ipv6: sr: fix missing sk_buff release in seg6_input_core	Andrea Mayer	1	-5/+6
	The seg6_input() function is responsible for adding the SRH into a packet, delegating the operation to the seg6_input_core(). This function uses the skb_cow_head() to ensure that there is sufficient headroom in the sk_buff for accommodating the link-layer header. In the event that the skb_cow_header() function fails, the seg6_input_core() catches the error but it does not release the sk_buff, which will result in a memory leak. This issue was introduced in commit af3b5158b89d ("ipv6: sr: fix BUG due to headroom too small after SRH push") and persists even after commit 7a3f5b0de364 ("netfilter: add netfilter hooks to SRv6 data plane"), where the entire seg6_input() code was refactored to deal with netfilter hooks. The proposed patch addresses the identified memory leak by requiring the seg6_input_core() function to release the sk_buff in the event that skb_cow_head() fails. Fixes: af3b5158b89d ("ipv6: sr: fix BUG due to headroom too small after SRH push") Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-20	net: Always descend into dsa/ folder with CONFIG_NET_DSA enabled	Florian Fainelli	1	-1/+3
	Stephen reported that he was unable to get the dsa_loop driver to get probed, and the reason ended up being because he had CONFIG_FIXED_PHY=y in his kernel configuration. As Masahiro explained it: "obj-m += dsa/" means everything under dsa/ must be modular. If there is a built-in object under dsa/ with CONFIG_NET_DSA=m, you cannot do "obj-$(CONFIG_NET_DSA) += dsa/". You need to change it back to "obj-y += dsa/". This was the case here whereby CONFIG_NET_DSA=m, and so the obj-$(CONFIG_FIXED_PHY) += dsa_loop_bdinfo.o rule is not executed and the DSA loop mdio_board info structure is not registered with the kernel, and eventually the device is simply not found. To preserve the intention of the original commit of limiting the amount of folder descending, conditionally descend into drivers/net/dsa when CONFIG_NET_DSA is enabled. Fixes: 227d72063fcc ("dsa: simplify Kconfig symbols and dependencies") Reported-by: Stephen Langstaff <stephenlangstaff1@gmail.com> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-17	kprobe/ftrace: fix build error due to bad function definition	Linus Torvalds	1	-1/+1
	Commit 1a7d0890dd4a ("kprobe/ftrace: bail out if ftrace was killed") introduced a bad K&R function definition, which we haven't accepted in a long long time. Gcc seems to let it slide, but clang notices with the appropriate error: kernel/kprobes.c:1140:24: error: a function declaration without a prototype is deprecated in all > 1140 \| void kprobe_ftrace_kill() \| ^ \| void but this commit was apparently never in linux-next before it was sent upstream, so it didn't get the appropriate build test coverage. Fixes: 1a7d0890dd4a kprobe/ftrace: bail out if ftrace was killed Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-05-17	selftests: net: local_termination: annotate the expected failures	Jakub Kicinski	1	-12/+18
	Vladimir said when adding this test: The bridge driver fares particularly badly [...] mainly because it does not implement IFF_UNICAST_FLT. See commit 90b9566aa5cd ("selftests: forwarding: add a test for local_termination.sh"). We don't want to hide the known gaps, but having a test which always fails prevents us from catching regressions. Report the cases we know may fail as XFAIL. Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/20240516152513.1115270-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-17	net: dsa: microchip: Correct initialization order for KSZ88x3 ports	Oleksij Rempel	1	-0/+10
	Adjust the initialization sequence of KSZ88x3 switches to enable 802.1p priority control on Port 2 before configuring Port 1. This change ensures the apptrust functionality on Port 1 operates correctly, as it depends on the priority settings of Port 2. The prior initialization sequence incorrectly configured Port 1 first, which could lead to functional discrepancies. Fixes: a1ea57710c9d ("net: dsa: microchip: dcb: add special handling for KSZ88X3 family") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Hariprasad Kelam <hkelam@marvell.com> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Link: https://lore.kernel.org/r/20240517050121.2174412-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-17	MAINTAINERS: net: Update reviewers for TI's Ethernet drivers	Ravi Gunasekaran	1	-1/+0
	Remove myself as reviewer for TI's ethernet drivers Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Link: https://lore.kernel.org/r/20240516082545.6412-1-r-gunasekaran@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-17	dt-bindings: net: ti: Update maintainers list	Ravi Gunasekaran	3	-3/+0
	Update the list with the current maintainers of TI's CPSW ethernet peripheral. Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Roger Quadros <rogerq@kernel.org> Link: https://lore.kernel.org/r/20240516054932.27597-1-r-gunasekaran@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-17	l2tp: fix ICMP error handling for UDP-encap sockets	Tom Parkin	1	-11/+33
	Since commit a36e185e8c85 ("udp: Handle ICMP errors for tunnels with same destination port on both endpoints") UDP's handling of ICMP errors has allowed for UDP-encap tunnels to determine socket associations in scenarios where the UDP hash lookup could not. Subsequently, commit d26796ae58940 ("udp: check udp sock encap_type in __udp_lib_err") subtly tweaked the approach such that UDP ICMP error handling would be skipped for any UDP socket which has encapsulation enabled. In the case of L2TP tunnel sockets using UDP-encap, this latter modification effectively broke ICMP error reporting for the L2TP control plane. To a degree this isn't catastrophic inasmuch as the L2TP control protocol defines a reliable transport on top of the underlying packet switching network which will eventually detect errors and time out. However, paying attention to the ICMP error reporting allows for more timely detection of errors in L2TP userspace, and aids in debugging connectivity issues. Reinstate ICMP error handling for UDP encap L2TP tunnels: * implement struct udp_tunnel_sock_cfg .encap_err_rcv in order to allow the L2TP code to handle ICMP errors; * only implement error-handling for tunnels which have a managed socket: unmanaged tunnels using a kernel socket have no userspace to report errors back to; * flag the error on the socket, which allows for userspace to get an error such as -ECONNREFUSED back from sendmsg/recvmsg; * pass the error into ip[v6]_icmp_error() which allows for userspace to get extended error information via. MSG_ERRQUEUE. Fixes: d26796ae5894 ("udp: check udp sock encap_type in __udp_lib_err") Signed-off-by: Tom Parkin <tparkin@katalix.com> Link: https://lore.kernel.org/r/20240513172248.623261-1-tparkin@katalix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-17	net: txgbe: fix to control VLAN strip	Jiawen Wu	7	-13/+84
	When VLAN tag strip is changed to enable or disable, the hardware requires the Rx ring to be in a disabled state, otherwise the feature cannot be changed. Fixes: f3b03c655f67 ("net: wangxun: Implement vlan add and kill functions") Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-17	net: wangxun: match VLAN CTAG and STAG features	Jiawen Wu	4	-0/+50
	Hardware requires VLAN CTAG and STAG configuration always matches. And whether VLAN CTAG or STAG changes, the configuration needs to be changed as well. Fixes: 6670f1ece2c8 ("net: txgbe: Add netdev features support") Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Sai Krishna <saikrishnag@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-17	net: wangxun: fix to change Rx features	Jiawen Wu	1	-1/+3
	Fix the issue where some Rx features cannot be changed. When using ethtool -K to turn off rx offload, it returns error and displays "Could not change any device features". And netdev->features is not assigned a new value to actually configure the hardware. Fixes: 6dbedcffcf54 ("net: libwx: Implement xx_set_features ops") Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-16	af_packet: do not call packet_read_pending() from tpacket_destruct_skb()	Eric Dumazet	1	-2/+1
	trafgen performance considerably sank on hosts with many cores after the blamed commit. packet_read_pending() is very expensive, and calling it in af_packet fast path defeats Daniel intent in commit b013840810c2 ("packet: use percpu mmap tx frame pending refcount") tpacket_destruct_skb() makes room for one packet, we can immediately wakeup a producer, no need to completely drain the tx ring. Fixes: 89ed5b519004 ("af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240515163358.4105915-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-16	virtio_net: Fix missed rtnl_unlock	Daniel Jurgens	1	-3/+3
	The rtnl_lock would stay locked if allocating promisc_allmulti failed. Also changed the allocation to GFP_KERNEL. Fixes: ff7c7d9f5261 ("virtio_net: Remove command data from control_buf") Reported-by: Eric Dumazet <edumaset@google.com> Link: https://lore.kernel.org/netdev/CANn89iLazVaUCvhPm6RPJJ0owra_oFnx7Fhc8d60gV-65ad3WQ@mail.gmail.com/ Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20240515163125.569743-1-danielj@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-16	netrom: fix possible dead-lock in nr_rt_ioctl()	Eric Dumazet	1	-12/+7
	syzbot loves netrom, and found a possible deadlock in nr_rt_ioctl [1] Make sure we always acquire nr_node_list_lock before nr_node_lock(nr_node) [1] WARNING: possible circular locking dependency detected 6.9.0-rc7-syzkaller-02147-g654de42f3fc6 #0 Not tainted ------------------------------------------------------ syz-executor350/5129 is trying to acquire lock: ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline] ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: nr_node_lock include/net/netrom.h:152 [inline] ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: nr_dec_obs net/netrom/nr_route.c:464 [inline] ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: nr_rt_ioctl+0x1bb/0x1090 net/netrom/nr_route.c:697 but task is already holding lock: ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline] ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_dec_obs net/netrom/nr_route.c:462 [inline] ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_rt_ioctl+0x10a/0x1090 net/netrom/nr_route.c:697 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (nr_node_list_lock){+...}-{2:2}: lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline] _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178 spin_lock_bh include/linux/spinlock.h:356 [inline] nr_remove_node net/netrom/nr_route.c:299 [inline] nr_del_node+0x4b4/0x820 net/netrom/nr_route.c:355 nr_rt_ioctl+0xa95/0x1090 net/netrom/nr_route.c:683 sock_do_ioctl+0x158/0x460 net/socket.c:1222 sock_ioctl+0x629/0x8e0 net/socket.c:1341 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:904 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:890 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (&nr_node->node_lock){+...}-{2:2}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline] _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178 spin_lock_bh include/linux/spinlock.h:356 [inline] nr_node_lock include/net/netrom.h:152 [inline] nr_dec_obs net/netrom/nr_route.c:464 [inline] nr_rt_ioctl+0x1bb/0x1090 net/netrom/nr_route.c:697 sock_do_ioctl+0x158/0x460 net/socket.c:1222 sock_ioctl+0x629/0x8e0 net/socket.c:1341 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:904 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:890 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(nr_node_list_lock); lock(&nr_node->node_lock); lock(nr_node_list_lock); lock(&nr_node->node_lock); * DEADLOCK * 1 lock held by syz-executor350/5129: #0: ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline] #0: ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_dec_obs net/netrom/nr_route.c:462 [inline] #0: ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_rt_ioctl+0x10a/0x1090 net/netrom/nr_route.c:697 stack backtrace: CPU: 0 PID: 5129 Comm: syz-executor350 Not tainted 6.9.0-rc7-syzkaller-02147-g654de42f3fc6 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187 check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline] _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178 spin_lock_bh include/linux/spinlock.h:356 [inline] nr_node_lock include/net/netrom.h:152 [inline] nr_dec_obs net/netrom/nr_route.c:464 [inline] nr_rt_ioctl+0x1bb/0x1090 net/netrom/nr_route.c:697 sock_do_ioctl+0x158/0x460 net/socket.c:1222 sock_ioctl+0x629/0x8e0 net/socket.c:1341 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:904 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:890 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240515142934.3708038-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-16	idpf: don't skip over ethtool tcp-data-split setting	Michal Schmidt	1	-1/+2
	Disabling tcp-data-split on idpf silently fails: # ethtool -G $NETDEV tcp-data-split off # ethtool -g $NETDEV \| grep 'TCP data split' TCP data split: on But it works if you also change 'tx' or 'rx': # ethtool -G $NETDEV tcp-data-split off tx 256 # ethtool -g $NETDEV \| grep 'TCP data split' TCP data split: off The bug is in idpf_set_ringparam, where it takes a shortcut out if the TX and RX sizes are not changing. Fix it by checking also if the tcp-data-split setting remains unchanged. Only then can the soft reset be skipped. Fixes: 9b1aa3ef2328 ("idpf: add get/set for Ethtool's header split ringparam") Reported-by: Xu Du <xudu@redhat.com> Closes: https://issues.redhat.com/browse/RHEL-36182 Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://lore.kernel.org/r/20240515092414.158079-1-mschmidt@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-16	dt-bindings: net: qcom: ethernet: Allow dma-coherent	Sagar Cheluvegowda	1	-0/+2
	On SA8775P, Ethernet DMA controller is coherent with the CPU. allow specifying that. Signed-off-by: Sagar Cheluvegowda <quic_scheluve@quicinc.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240514-mark_ethernet_devices_dma_coherent-v4-2-04e1198858c5@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-16	bonding: fix oops during rmmod	Tony Battersby	1	-6/+7
	"rmmod bonding" causes an oops ever since commit cc317ea3d927 ("bonding: remove redundant NULL check in debugfs function"). Here are the relevant functions being called: bonding_exit() bond_destroy_debugfs() debugfs_remove_recursive(bonding_debug_root); bonding_debug_root = NULL; <--------- SET TO NULL HERE bond_netlink_fini() rtnl_link_unregister() __rtnl_link_unregister() unregister_netdevice_many_notify() bond_uninit() bond_debug_unregister() (commit removed check for bonding_debug_root == NULL) debugfs_remove() simple_recursive_removal() down_write() -> OOPS However, reverting the bad commit does not solve the problem completely because the original code contains a race that could cause the same oops, although it was much less likely to be triggered unintentionally: CPU1 rmmod bonding bonding_exit() bond_destroy_debugfs() debugfs_remove_recursive(bonding_debug_root); CPU2 echo -bond0 > /sys/class/net/bonding_masters bond_uninit() bond_debug_unregister() if (!bonding_debug_root) CPU1 bonding_debug_root = NULL; So do NOT revert the bad commit (since the removed checks were racy anyway), and instead change the order of actions taken during module removal. The same oops can also happen if there is an error during module init, so apply the same fix there. Fixes: cc317ea3d927 ("bonding: remove redundant NULL check in debugfs function") Cc: stable@vger.kernel.org Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/641f914f-3216-4eeb-87dd-91b78aa97773@cybernetics.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-16	net/ipv6: Fix route deleting failure when metric equals 0	xu xin	1	-1/+4
	Problem ========= After commit 67f695134703 ("ipv6: Move setting default metric for routes"), we noticed that the logic of assigning the default value of fc_metirc changed in the ioctl process. That is, when users use ioctl(fd, SIOCADDRT, rt) with a non-zero metric to add a route, then they may fail to delete a route with passing in a metric value of 0 to the kernel by ioctl(fd, SIOCDELRT, rt). But iproute can succeed in deleting it. As a reference, when using iproute tools by netlink to delete routes with a metric parameter equals 0, like the command as follows: ip -6 route del fe80::/64 via fe81::5054:ff:fe11:3451 dev eth0 metric 0 the user can still succeed in deleting the route entry with the smallest metric. Root Reason =========== After commit 67f695134703 ("ipv6: Move setting default metric for routes"), When ioctl() pass in SIOCDELRT with a zero metric, rtmsg_to_fib6_config() will set a defalut value (1024) to cfg->fc_metric in kernel, and in ip6_route_del() and the line 4074 at net/ipv3/route.c, it will check by if (cfg->fc_metric && cfg->fc_metric != rt->fib6_metric) continue; and the condition is true and skip the later procedure (deleting route) because cfg->fc_metric != rt->fib6_metric. But before that commit, cfg->fc_metric is still zero there, so the condition is false and it will do the following procedure (deleting). Solution ======== In order to keep a consistent behaviour across netlink() and ioctl(), we should allow to delete a route with a metric value of 0. So we only do the default setting of fc_metric in route adding. CC: stable@vger.kernel.org # 5.4+ Fixes: 67f695134703 ("ipv6: Move setting default metric for routes") Co-developed-by: Fan Yu <fan.yu9@zte.com.cn> Signed-off-by: Fan Yu <fan.yu9@zte.com.cn> Signed-off-by: xu xin <xu.xin16@zte.com.cn> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240514201102055dD2Ba45qKbLlUMxu_DTHP@zte.com.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-16	selftests/net: reduce xfrm_policy test time	Hangbin Liu	1	-2/+2
	The check_random_order test add/get plenty of xfrm rules, which consume a lot time on debug kernel and always TIMEOUT. Let's reduce the test loop and see if it works. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20240514095227.2597730-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-17	selftests/bpf: Adjust btf_dump test to reflect recent change in file_operations	Martin KaFai Lau	1	-1/+1
	The btf_dump test fails: test_btf_dump_struct_data:FAIL:file_operations unexpected file_operations: actual '(struct file_operations){ .owner = (struct module )0xffffffffffffffff, .fop_flags = (fop_flags_t)4294967295, .llseek = (loff_t ()(struct f' != expected '(struct file_operations){ .owner = (struct module )0xffffffffffffffff, .llseek = (loff_t ()(struct file , loff_t, int))0xffffffffffffffff,' The "fop_flags" is a recent addition to the struct file_operations in commit 210a03c9d51a ("fs: claw back a few FMODE_ bits") This patch changes the test_btf_dump_struct_data() to reflect this change. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/bpf/20240516164310.2481460-1-martin.lau@linux.dev
2024-05-17	selftests/bpf: Adjust test_access_variable_array after a kernel function name change	Martin KaFai Lau	1	-1/+1
	After commit 4c3e509ea9f2 ("sched/balancing: Rename load_balance() => sched_balance_rq()"), the load_balance kernel function is renamed to sched_balance_rq. This patch adjusts the fentry program in test_access_variable_array.c to reflect this kernel function name change. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240516170140.2689430-1-martin.lau@linux.dev
2024-05-16	rtla: Documentation: Fix -t, --trace	John Kacur	3	-4/+8
	Move -t, --trace from common_options.rst to common_osnoise_options.rst and common_timerlat_options.rst so that it will appear in the man pages rtla-timerlat-hist.1 rtla-timerlat-top.1 rtla-osnoise-hist.1 rtla-osnoise-top.1 Remove the equals ('=') sign and add a space. Link: https://lkml.kernel.org/r/20240516143121.12614-1-jkacur@redhat.com Cc: Daniel Bristot de Oliveria <bristot@kernel.org> Signed-off-by: John Kacur <jkacur@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
2024-05-16	rtla: Fix -t\--trace[=file]	John Kacur	4	-20/+36
	The -t option has an optional argument. The usual case is for a short option to be specified without an '=' and for the long version to be specified with an '=' Various forms of this do not work as expected. For example: rtla timerlat hist -T50 -tfile.txt will result in a truncated file name of "ile.txt" Another example is that the long form without the '=' will result in the default file name instead of the requested file name. This patch properly parses the optional argument with and without '=' and with and without spaces for the short form. This patch was also tested using -t and --trace without providing a file name both as the last requested option and with a following long and short option. For example: rtla timerlat hist -T50 -t -u rtla timerlat hist -T50 --trace -u This fix is applied to both timerlat top and hist and to osnoise top and hist. Here is the full testing for rtla timerlat hist. Before applying the patch rtla timerlat hist -T50 -t=file.txt Works as expected, "file.txt" rtla timerlat hist -T50 -tfile.txt Truncated file name "ile.txt" rtla timerlat hist -T50 -t file.txt Default file name instead of file.txt rtla timerlat hist -T50 --trace=file.txt Truncated file name "ile.txt" rtla timerlat hist -T50 --trace file.txt Default file name "timerlat_trace.txt" instead of "file.txt" After applying the patch: rtla timerlat hist -T50 -t=file.txt Works as expected, "file.txt" rtla timerlat hist -T50 -tfile.txt Works as expected, "file.txt" rtla timerlat hist -T50 -t file.txt Works as expected, "file.txt" rtla timerlat hist -T50 --trace=file.txt Works as expected, "file.txt" rtla timerlat hist -T50 --trace file.txt Works as expected, "file.txt" In addition the following tests were performed to make sure that the default file name worked as expected including with trailing options. rtla timerlat hist -T50 -t Works as expected "timerlat_trace.txt" rtla timerlat hist -T50 --trace Works as expected "timerlat_trace.txt" rtla timerlat hist -T50 -t -u Works as expected "timerlat_trace.txt" rtla timerlat hist -T50 --trace -u Works as expected "timerlat_trace.txt" Link: https://lkml.kernel.org/r/20240515183024.59985-1-jkacur@redhat.com Cc: Daniel Bristot de Oliveria <bristot@kernel.org> Signed-off-by: John Kacur <jkacur@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
2024-05-16	rtla/timerlat: Fix histogram report when a cpu count is 0	John Kacur	1	-18/+42
	On short runs it is possible to get no samples on a cpu, like this: # rtla timerlat hist -u -T50 Index IRQ-001 Thr-001 Usr-001 IRQ-002 Thr-002 Usr-002 2 1 0 0 0 0 0 33 0 1 0 0 0 0 36 0 0 1 0 0 0 49 0 0 0 1 0 0 52 0 0 0 0 1 0 over: 0 0 0 0 0 0 count: 1 1 1 1 1 0 min: 2 33 36 49 52 18446744073709551615 avg: 2 33 36 49 52 - max: 2 33 36 49 52 0 rtla timerlat hit stop tracing IRQ handler delay: (exit from idle) 48.21 us (91.09 %) IRQ latency: 49.11 us Timerlat IRQ duration: 2.17 us (4.09 %) Blocking thread: 1.01 us (1.90 %) swapper/2:0 1.01 us ------------------------------------------------------------------------ Thread latency: 52.93 us (100%) Max timerlat IRQ latency from idle: 49.11 us in cpu 2 Note, the value 18446744073709551615 is the same as ~0. Fix this by reporting no results for the min, avg and max if the count is 0. Link: https://lkml.kernel.org/r/20240510190318.44295-1-jkacur@redhat.com Cc: stable@vger.kernel.org Fixes: 1eeb6328e8b3 ("rtla/timerlat: Add timerlat hist mode") Suggested-by: Daniel Bristot de Oliveria <bristot@kernel.org> Signed-off-by: John Kacur <jkacur@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
2024-05-16	rtla: Add --trace-buffer-size option	Daniel Bristot de Oliveira	8	-5/+71
	Add the option allow the users to set a different buffer size for the trace. For example, in large systems, the user might be interested on reducing the trace buffer to avoid large tracing files. The buffer size is specified in kB, and it is only affecting the tracing instance. The function trace_set_buffer_size() appears on libtracefs v1.6, so increase the minimum required version on Makefile.config. Link: https://lkml.kernel.org/r/e7c9ca5b3865f28e131a49ec3b984fadf2d056c6.1715860611.git.bristot@kernel.org Cc: Jonathan Corbet <corbet@lwn.net> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
2024-05-16	powerpc/fadump: Fix section mismatch warning	Michael Ellerman	1	-1/+1
	With some compilers/configs fadump_setup_param_area() isn't inlined into its caller (which is __init), leading to a section mismatch warning: WARNING: modpost: vmlinux: section mismatch in reference: fadump_setup_param_area+0x200 (section: .text.fadump_setup_param_area) -> memblock_phys_alloc_range (section: .init.text) Fix it by adding an __init annotation. Fixes: 683eab94da75 ("powerpc/fadump: setup additional parameters for dump capture kernel") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/all/20240515163708.3380c4d1@canb.auug.org.au/ Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/all/202405140922.oucLOx4Y-lkp@intel.com/ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240516132631.347956-1-mpe@ellerman.id.au
2024-05-16	selftests/net/lib: no need to record ns name if it already exist	Hangbin Liu	1	-2/+4
	There is no need to add the name to ns_list again if the netns already recoreded. Fixes: 25ae948b4478 ("selftests/net: add lib.sh") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-16	net: qrtr: ns: Fix module refcnt	Chris Lew	1	-0/+27
	The qrtr protocol core logic and the qrtr nameservice are combined into a single module. Neither the core logic or nameservice provide much functionality by themselves; combining the two into a single module also prevents any possible issues that may stem from client modules loading inbetween qrtr and the ns. Creating a socket takes two references to the module that owns the socket protocol. Since the ns needs to create the control socket, this creates a scenario where there are always two references to the qrtr module. This prevents the execution of 'rmmod' for qrtr. To resolve this, forcefully put the module refcount for the socket opened by the nameservice. Fixes: a365023a76f2 ("net: qrtr: combine nameservice into main module") Reported-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Tested-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Chris Lew <quic_clew@quicinc.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-16	net: lan966x: remove debugfs directory in probe() error path	Herve Codina	1	-2/+4
	A debugfs directory entry is create early during probe(). This entry is not removed on error path leading to some "already present" issues in case of EPROBE_DEFER. Create this entry later in the probe() code to avoid the need to change many 'return' in 'goto' and add the removal in the already present error path. Fixes: 942814840127 ("net: lan966x: Add VCAP debugFS support") Cc: <stable@vger.kernel.org> Signed-off-by: Herve Codina <herve.codina@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-16	drm/tests: Add a unit test for range bias allocation	Arunpravin Paneer Selvam	1	-1/+35
	Allocate cleared blocks in the bias range when the DRM buddy's clear avail is zero. This will validate the bias range allocation in scenarios like system boot when no cleared blocks are available and exercise the fallback path too. The resulting blocks should always be dirty. v1:(Matthew) - move the size to the variable declaration section. - move the mm.clear_avail init to allocator init. Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240514145636.16253-2-Arunpravin.PaneerSelvam@amd.com
2024-05-16	drm/buddy: Fix the range bias clear memory allocation issue	Arunpravin Paneer Selvam	1	-1/+2
	Problem statement: During the system boot time, an application request for the bulk volume of cleared range bias memory when the clear_avail is zero, we dont fallback into normal allocation method as we had an unnecessary clear_avail check which prevents the fallback method leads to fb allocation failure following system goes into unresponsive state. Solution: Remove the unnecessary clear_avail check in the range bias allocation function. v2: add a kunit for this corner case (Daniel Vetter) Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Fixes: 96950929eb23 ("drm/buddy: Implement tracking clear page feature") Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240514145636.16253-1-Arunpravin.PaneerSelvam@amd.com
2024-05-16	kprobe/ftrace: bail out if ftrace was killed	Stephen Brennan	10	-0/+35
	If an error happens in ftrace, ftrace_kill() will prevent disarming kprobes. Eventually, the ftrace_ops associated with the kprobes will be freed, yet the kprobes will still be active, and when triggered, they will use the freed memory, likely resulting in a page fault and panic. This behavior can be reproduced quite easily, by creating a kprobe and then triggering a ftrace_kill(). For simplicity, we can simulate an ftrace error with a kernel module like [1]: [1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer sudo perf probe --add commit_creds sudo perf trace -e probe:commit_creds # In another terminal make sudo insmod ftrace_killer.ko # calls ftrace_kill(), simulating bug # Back to perf terminal # ctrl-c sudo perf probe --del commit_creds After a short period, a page fault and panic would occur as the kprobe continues to execute and uses the freed ftrace_ops. While ftrace_kill() is supposed to be used only in extreme circumstances, it is invoked in FTRACE_WARN_ON() and so there are many places where an unexpected bug could be triggered, yet the system may continue operating, possibly without the administrator noticing. If ftrace_kill() does not panic the system, then we should do everything we can to continue operating, rather than leave a ticking time bomb. Link: https://lore.kernel.org/all/20240501162956.229427-1-stephen.s.brennan@oracle.com/ Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Guo Ren <guoren@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-05-15	ARM: 9398/1: Fix userspace enter on LPAE with CC_OPTIMIZE_FOR_SIZE=y	Geert Uytterhoeven	2	-2/+2
	Booting an LPAE-enabled kernel built with CONFIG_CC_OPTIMIZE_FOR_SIZE=y fails when starting userspace: Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004 CPU: 1 PID: 1 Comm: init Tainted: G W N 6.9.0-rc1-koelsch-00004-g7af5b901e847 #1930 Hardware name: Generic R-Car Gen2 (Flattened Device Tree) Call trace: unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x78/0xa8 dump_stack_lvl from panic+0x118/0x398 panic from do_exit+0x1ec/0x938 do_exit from sys_exit_group+0x0/0x10 ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004 ]--- Add the missing memory clobber to cpu_set_ttbcr(), as suggested by Russell King. Force inlining of uaccess_save_and_enable(), as suggested by Ard Biesheuvel. The latter fixes booting on Koelsch. Closes: https://lore.kernel.org/r/CAMuHMdWTAJcZ9BReWNhpmsgkOzQxLNb5OhNYxzxv6D5TSh2fwQ@mail.gmail.com/ Fixes: 7af5b901e84743c6 ("ARM: 9358/2: Implement PAN for LPAE by TTBR0 page table walks disablement") Acked-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2024-05-15	selftests/kvm: remove dead file	Paolo Bonzini	1	-1135/+0
	This file was supposed to be removed in commit 2b7deea3ec7c ("Revert "kvm: selftests: move base kvm_util.h declarations to kvm_util_base.h""), but it survived. Remove it now. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-15	selftests/bpf: add more variations of map-in-map situations	Andrii Nakryiko	1	-0/+10
	Add test cases validating usage of PERCPU_ARRAY and PERCPU_HASH maps as inner maps. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20240515062440.846086-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-15	bpf: save extended inner map info for percpu array maps as well	Andrii Nakryiko	1	-2/+2
	ARRAY_OF_MAPS and HASH_OF_MAPS map types have special logic to save a few extra fields required for correct operations of ARRAY maps, when they are used as inner maps. PERCPU_ARRAY maps have similar requirements as they now support generating inline element lookup logic. So make sure that both classes of maps are handled correctly. Reported-by: Jakub Kicinski <kuba@kernel.org> Fixes: db69718b8efa ("bpf: inline bpf_map_lookup_elem() for PERCPU_ARRAY maps") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20240515062440.846086-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-15	MAINTAINERS: Update ARM64 BPF JIT maintainer	Puranjay Mohan	1	-1/+1
	Zi Shen Lim is not actively doing kernel development and has decided to tranfer the responsibility of maintaining the JIT to me. Add myself as the maintainer for BPF JIT for ARM64 and remove Zi Shen Lim. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Zi Shen Lim <zlim.lnx@gmail.com> Link: https://lore.kernel.org/r/20240514183914.27737-1-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-15	bpf, docs: Fix the description of 'src' in ALU instructions	Puranjay Mohan	1	-2/+3
	An ALU instruction's source operand can be the value in the source register or the 32-bit immediate value encoded in the instruction. This is controlled by the 's' bit of the 'opcode'. The current description explicitly uses the phrase 'value of the source register' when defining the meaning of 'src'. Change the description to use 'source operand' in place of 'value of the source register'. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Dave Thaler <dthaler1968@gmail.com> Link: https://lore.kernel.org/r/20240514130303.113607-1-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>