aboutsummaryrefslogtreecommitdiffstats
path: root/net (follow)
AgeCommit message (Collapse)AuthorFilesLines
2021-07-23Merge tag 'mac80211-for-net-2021-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211David S. Miller7-33/+63
Couple of fixes: * fix aggregation on mesh * fix late enabling of 4-addr mode * leave monitor SKBs with some headroom * limit band information for old applications * fix virt-wifi WARN_ON * fix memory leak in cfg80211 BSS list maintenance
2021-07-23net: qrtr: fix memory leaksPavel Skripkin1-1/+5
Syzbot reported memory leak in qrtr. The problem was in unputted struct sock. qrtr_local_enqueue() function calls qrtr_port_lookup() which takes sock reference if port was found. Then there is the following check: if (!ipc || &ipc->sk == skb->sk) { ... return -ENODEV; } Since we should drop the reference before returning from this function and ipc can be non-NULL inside this if, we should add qrtr_port_put() inside this if. The similar corner case is in qrtr_endpoint_post() as Manivannan reported. In case of sock_queue_rcv_skb() failure we need to put port reference to avoid leaking struct sock pointer. Fixes: e04df98adf7d ("net: qrtr: Remove receive worker") Fixes: bdabad3e363d ("net: Add Qualcomm IPC router") Reported-and-tested-by: syzbot+35a511c72ea7356cdcf3@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Reviewed-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller6-10/+41
Pablo Neira Ayusosays: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Memleak in commit audit error path, from Dongliang Mu. 2) Avoid possible false sharing for flowtable timeout updates and nft_last use. 3) Adjust conntrack timestamp due to garbage collection delay, from Florian Westphal. 4) Fix nft_nat without layer 3 address for the inet family. 5) Fix compilation warning in nfnl_hook when ingress support is disabled, from Arnd Bergmann. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23ipv6: decrease hop limit counter in ip6_forward()Kangmin Park1-2/+3
Decrease hop limit counter when deliver skb to ndp proxy. Signed-off-by: Kangmin Park <l4stpr0gr4m@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23net: Set true network header for ECN decapsulationGilad Naaman1-1/+1
In cases where the header straight after the tunnel header was another ethernet header (TEB), instead of the network header, the ECN decapsulation code would treat the ethernet header as if it was an IP header, resulting in mishandling and possible wrong drops or corruption of the IP header. In this case, ECT(1) is sent, so IP_ECN_decapsulate tries to copy it to the inner IPv4 header, and correct its checksum. The offset of the ECT bits in an IPv4 header corresponds to the lower 2 bits of the second octet of the destination MAC address in the ethernet header. The IPv4 checksum corresponds to end of the source address. In order to reproduce: $ ip netns add A $ ip netns add B $ ip -n A link add _v0 type veth peer name _v1 netns B $ ip -n A link set _v0 up $ ip -n A addr add dev _v0 10.254.3.1/24 $ ip -n A route add default dev _v0 scope global $ ip -n B link set _v1 up $ ip -n B addr add dev _v1 10.254.1.6/24 $ ip -n B route add default dev _v1 scope global $ ip -n B link add gre1 type gretap local 10.254.1.6 remote 10.254.3.1 key 0x49000000 $ ip -n B link set gre1 up # Now send an IPv4/GRE/Eth/IPv4 frame where the outer header has ECT(1), # and the inner header has no ECT bits set: $ cat send_pkt.py #!/usr/bin/env python3 from scapy.all import * pkt = IP(b'E\x01\x00\xa7\x00\x00\x00\x00@/`%\n\xfe\x03\x01\n\xfe\x01\x06 \x00eXI\x00' b'\x00\x00\x18\xbe\x92\xa0\xee&\x18\xb0\x92\xa0l&\x08\x00E\x00\x00}\x8b\x85' b'@\x00\x01\x01\xe4\xf2\x82\x82\x82\x01\x82\x82\x82\x02\x08\x00d\x11\xa6\xeb' b'3\x1e\x1e\\xf3\\xf7`\x00\x00\x00\x00ZN\x00\x00\x00\x00\x00\x00\x10\x11\x12' b'\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./01234' b'56789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ') send(pkt) $ sudo ip netns exec B tcpdump -neqlllvi gre1 icmp & ; sleep 1 $ sudo ip netns exec A python3 send_pkt.py In the original packet, the source/destinatio MAC addresses are dst=18:be:92:a0:ee:26 src=18:b0:92:a0:6c:26 In the received packet, they are dst=18:bd:92:a0:ee:26 src=18:b0:92:a0:6c:27 Thanks to Lahav Schlesinger <lschlesinger@drivenets.com> and Isaac Garzon <isaac@speed.io> for helping me pinpoint the origin. Fixes: b723748750ec ("tunnel: Propagate ECT(1) when decapsulating as recommended by RFC6040") Cc: David S. Miller <davem@davemloft.net> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: David Ahern <dsahern@kernel.org> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Gilad Naaman <gnaaman@drivenets.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23tipc: fix sleeping in tipc accept routineHoang Le1-5/+4
The release_sock() is blocking function, it would change the state after sleeping. In order to evaluate the stated condition outside the socket lock context, switch to use wait_woken() instead. Fixes: 6398e23cdb1d8 ("tipc: standardize accept routine") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23tipc: fix implicit-connect for SYN+Xin Long1-8/+13
For implicit-connect, when it's either SYN- or SYN+, an ACK should be sent back to the client immediately. It's not appropriate for the client to enter established state only after receiving data from the server. On client side, after the SYN is sent out, tipc_wait_for_connect() should be called to wait for the ACK if timeout is set. This patch also restricts __tipc_sendstream() to call __sendmsg() only when it's in TIPC_OPEN state, so that the client can program in a single loop doing both connecting and data sending like: for (...) sendmsg(dest, buf); This makes the implicit-connect more implicit. Fixes: b97bf3fd8f6a ("[TIPC] Initial merge") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23netfilter: nfnl_hook: fix unused variable warningArnd Bergmann1-0/+2
The only user of this variable is in an #ifdef: net/netfilter/nfnetlink_hook.c: In function 'nfnl_hook_entries_head': net/netfilter/nfnetlink_hook.c:177:28: error: unused variable 'netdev' [-Werror=unused-variable] Fixes: e2cf17d3774c ("netfilter: add new hook nfnl subsystem") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-07-23netfilter: nft_nat: allow to specify layer 4 protocol NAT onlyPablo Neira Ayuso1-1/+3
nft_nat reports a bogus EAFNOSUPPORT if no layer 3 information is specified. Fixes: d07db9884a5f ("netfilter: nf_tables: introduce nft_validate_register_load()") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-07-23netfilter: conntrack: adjust stop timestamp to real expiry valueFlorian Westphal1-1/+6
In case the entry is evicted via garbage collection there is delay between the timeout value and the eviction event. This adjusts the stop value based on how much time has passed. Fixes: b87a2f9199ea82 ("netfilter: conntrack: add gc worker to remove timed-out entries") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-07-23netfilter: nft_last: avoid possible false sharingPablo Neira Ayuso1-7/+13
Use the idiom described in: https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE#it-may-improve-performance Moreover, prevent a compiler optimization. Fixes: 836382dc2471 ("netfilter: nf_tables: add last expression") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-07-23netfilter: flowtable: avoid possible false sharingPablo Neira Ayuso1-1/+5
The flowtable follows the same timeout approach as conntrack, use the same idiom as in cc16921351d8 ("netfilter: conntrack: avoid same-timeout update") but also include the fix provided by e37542ba111f ("netfilter: conntrack: avoid possible false sharing"). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-07-23cfg80211: Fix possible memory leak in function cfg80211_bss_updateNguyen Dinh Phi1-4/+2
When we exceed the limit of BSS entries, this function will free the new entry, however, at this time, it is the last door to access the inputed ies, so these ies will be unreferenced objects and cause memory leak. Therefore we should free its ies before deallocating the new entry, beside of dropping it from hidden_list. Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com> Link: https://lore.kernel.org/r/20210628132334.851095-1-phind.uet@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-07-23nl80211: limit band information in non-split dataJohannes Berg1-1/+4
In non-split data, we shouldn't be adding S1G and 6 GHz data (or future bands) since we're really close to the 4k message size limit. Remove those bands, any modern userspace that can use S1G or 6 GHz should already be using split dumps, and if not then it needs to update. Link: https://lore.kernel.org/r/20210712215329.31444162a2c2.I5555312e4a074c84f8b4e7ad79dc4d1fbfc5126c@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-07-23mac80211: fix enabling 4-address mode on a sta vif after assocFelix Fietkau3-2/+23
Notify the driver about the 4-address mode change and also send a nulldata packet to the AP to notify it about the change Fixes: 1ff4e8f2dec8 ("mac80211: notify the driver when a sta uses 4-address mode") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20210702050111.47546-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-07-23mac80211: fix starting aggregation sessions on mesh interfacesFelix Fietkau1-25/+32
The logic for starting aggregation sessions was recently moved from minstrel_ht to mac80211, into the subif tx handler just after the sta lookup. Unfortunately this didn't work for mesh interfaces, since the sta lookup is deferred until a much later point in time on those. Fix this by also calling the aggregation check right after the deferred sta lookup. Fixes: 08a46c642001 ("mac80211: move A-MPDU session check from minstrel_ht to mac80211") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20210629112853.29785-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-07-23mac80211: Do not strip skb headroom on monitor framesJohan Almbladh1-1/+2
When a monitor interface is present together with other interfaces, a received skb is copied and received on the monitor netdev. Before, the copied skb was allocated with exactly the amount of space needed for the radiotap header, resulting in an skb without any headroom at all being received on the monitor netdev. With the introduction of eBPF and XDP in the kernel, skbs may be processed by custom eBPF programs. However, since the skb cannot be reallocated in the eBPF program, no more data or headers can be pushed. The old code made sure the final headroom was zero regardless of the value of NET_SKB_PAD, so increasing that constant would have no effect. Now we allocate monitor skb copies with a headroom of NET_SKB_PAD bytes before the radiotap header. Monitor interfaces now behave in the same way as other netdev interfaces that honor the NET_SKB_PAD constant. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Link: https://lore.kernel.org/r/20210628123713.2070753-1-johan.almbladh@anyfinetworks.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-07-22net: sched: cls_api: Fix the the wrong parameterYajun Deng1-1/+1
The 4th parameter in tc_chain_notify() should be flags rather than seq. Let's change it back correctly. Fixes: 32a4f5ecd738 ("net: sched: introduce chain object to uapi") Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21net: dsa: tag_ksz: dont let the hardware process the layer 4 checksumLino Sanfilippo1-0/+9
If the checksum calculation is offloaded to the network device (e.g due to NETIF_F_HW_CSUM inherited from the DSA master device), the calculated layer 4 checksum is incorrect. This is since the DSA tag which is placed after the layer 4 data is considered as being part of the daa and thus errorneously included into the checksum calculation. To avoid this, always calculate the layer 4 checksum in software. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21net: dsa: ensure linearized SKBs in case of tail taggersLino Sanfilippo1-5/+9
The function skb_put() that is used by tail taggers to make room for the DSA tag must only be called for linearized SKBS. However in case that the slave device inherited features like NETIF_F_HW_SG or NETIF_F_FRAGLIST the SKB passed to the slaves transmit function may not be linearized. Avoid those SKBs by clearing the NETIF_F_HW_SG and NETIF_F_FRAGLIST flags for tail taggers. Furthermore since the tagging protocol can be changed at runtime move the code for setting up the slaves features into dsa_slave_setup_tagger(). Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21tcp: disable TFO blackhole logic by defaultWei Wang2-2/+9
Multiple complaints have been raised from the TFO users on the internet stating that the TFO blackhole logic is too aggressive and gets falsely triggered too often. (e.g. https://blog.apnic.net/2021/07/05/tcp-fast-open-not-so-fast/) Considering that most middleboxes no longer drop TFO packets, we decide to disable the blackhole logic by setting /proc/sys/net/ipv4/tcp_fastopen_blackhole_timeout_set to 0 by default. Fixes: cf1ef3f0719b4 ("net/tcp_fastopen: Disable active side TFO in certain scenarios") Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21sctp: do not update transport pathmtu if SPP_PMTUD_ENABLE is not setXin Long1-2/+2
Currently, in sctp_packet_config(), sctp_transport_pmtu_check() is called to update transport pathmtu with dst's mtu when dst's mtu has been changed by non sctp stack like xfrm. However, this should only happen when SPP_PMTUD_ENABLE is set, no matter where dst's mtu changed. This patch is to fix by checking SPP_PMTUD_ENABLE flag before calling sctp_transport_pmtu_check(). Thanks Jacek for reporting and looking into this issue. v1->v2: - add the missing "{" to fix the build error. Fixes: 69fec325a643 ('Revert "sctp: remove sctp_transport_pmtu_check"') Reported-by: Jacek Szafraniec <jacek.szafraniec@nokia.com> Tested-by: Jacek Szafraniec <jacek.szafraniec@nokia.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21udp: check encap socket in __udp_lib_errVadim Fedorenko2-12/+38
Commit d26796ae5894 ("udp: check udp sock encap_type in __udp_lib_err") added checks for encapsulated sockets but it broke cases when there is no implementation of encap_err_lookup for encapsulation, i.e. ESP in UDP encapsulation. Fix it by calling encap_err_lookup only if socket implements this method otherwise treat it as legal socket. Fixes: d26796ae5894 ("udp: check udp sock encap_type in __udp_lib_err") Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21sctp: update active_key for asoc when old key is being replacedXin Long1-0/+2
syzbot reported a call trace: BUG: KASAN: use-after-free in sctp_auth_shkey_hold+0x22/0xa0 net/sctp/auth.c:112 Call Trace: sctp_auth_shkey_hold+0x22/0xa0 net/sctp/auth.c:112 sctp_set_owner_w net/sctp/socket.c:131 [inline] sctp_sendmsg_to_asoc+0x152e/0x2180 net/sctp/socket.c:1865 sctp_sendmsg+0x103b/0x1d30 net/sctp/socket.c:2027 inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:821 sock_sendmsg_nosec net/socket.c:703 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:723 This is an use-after-free issue caused by not updating asoc->shkey after it was replaced in the key list asoc->endpoint_shared_keys, and the old key was freed. This patch is to fix by also updating active_key for asoc when old key is being replaced with a new one. Note that this issue doesn't exist in sctp_auth_del_key_id(), as it's not allowed to delete the active_key from the asoc. Fixes: 1b1e0bc99474 ("sctp: add refcnt support for sh_key") Reported-by: syzbot+b774577370208727d12b@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21net/xfrm/compat: Copy xfrm_spdattr_type_t atributesDmitry Safonov1-5/+44
The attribute-translator has to take in mind maxtype, that is xfrm_link::nla_max. When it is set, attributes are not of xfrm_attr_type_t. Currently, they can be only XFRMA_SPD_MAX (message XFRM_MSG_NEWSPDINFO), their UABI is the same for 64/32-bit, so just copy them. Thanks to YueHaibing for reporting this: In xfrm_user_rcv_msg_compat() if maxtype is not zero and less than XFRMA_MAX, nlmsg_parse_deprecated() do not initialize attrs array fully. xfrm_xlate32() will access uninit 'attrs[i]' while iterating all attrs array. KASAN: probably user-memory-access in range [0x0000000041b58ab0-0x0000000041b58ab7] CPU: 0 PID: 15799 Comm: syz-executor.2 Tainted: G W 5.14.0-rc1-syzkaller #0 RIP: 0010:nla_type include/net/netlink.h:1130 [inline] RIP: 0010:xfrm_xlate32_attr net/xfrm/xfrm_compat.c:410 [inline] RIP: 0010:xfrm_xlate32 net/xfrm/xfrm_compat.c:532 [inline] RIP: 0010:xfrm_user_rcv_msg_compat+0x5e5/0x1070 net/xfrm/xfrm_compat.c:577 [...] Call Trace: xfrm_user_rcv_msg+0x556/0x8b0 net/xfrm/xfrm_user.c:2774 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504 xfrm_netlink_rcv+0x6b/0x90 net/xfrm/xfrm_user.c:2824 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:702 [inline] Fixes: 5106f4a8acff ("xfrm/compat: Add 32=>64-bit messages translator") Cc: <stable@kernel.org> Reported-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-07-20ipv6: fix another slab-out-of-bounds in fib6_nh_flush_exceptionsPaolo Abeni1-1/+1
While running the self-tests on a KASAN enabled kernel, I observed a slab-out-of-bounds splat very similar to the one reported in commit 821bbf79fe46 ("ipv6: Fix KASAN: slab-out-of-bounds Read in fib6_nh_flush_exceptions"). We additionally need to take care of fib6_metrics initialization failure when the caller provides an nh. The fix is similar, explicitly free the route instead of calling fib6_info_release on a half-initialized object. Fixes: f88d8ea67fbdb ("ipv6: Plumb support for nexthop object in a fib6_info") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20net/sched: act_skbmod: Skip non-Ethernet packetsPeilin Ye1-4/+8
Currently tcf_skbmod_act() assumes that packets use Ethernet as their L2 protocol, which is not always the case. As an example, for CAN devices: $ ip link add dev vcan0 type vcan $ ip link set up vcan0 $ tc qdisc add dev vcan0 root handle 1: htb $ tc filter add dev vcan0 parent 1: protocol ip prio 10 \ matchall action skbmod swap mac Doing the above silently corrupts all the packets. Do not perform skbmod actions for non-Ethernet packets. Fixes: 86da71b57383 ("net_sched: Introduce skbmod action") Reviewed-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20net: bridge: do not replay fdb entries pointing towards the bridge twiceVladimir Oltean1-1/+1
This simple script: ip link add br0 type bridge ip link set swp2 master br0 ip link set br0 address 00:01:02:03:04:05 ip link del br0 produces this result on a DSA switch: [ 421.306399] br0: port 1(swp2) entered blocking state [ 421.311445] br0: port 1(swp2) entered disabled state [ 421.472553] device swp2 entered promiscuous mode [ 421.488986] device swp2 left promiscuous mode [ 421.493508] br0: port 1(swp2) entered disabled state [ 421.886107] sja1105 spi0.1: port 1 failed to delete 00:01:02:03:04:05 vid 1 from fdb: -ENOENT [ 421.894374] sja1105 spi0.1: port 1 failed to delete 00:01:02:03:04:05 vid 0 from fdb: -ENOENT [ 421.943982] br0: port 1(swp2) entered blocking state [ 421.949030] br0: port 1(swp2) entered disabled state [ 422.112504] device swp2 entered promiscuous mode A very simplified view of what happens is: (1) the bridge port is created, and the bridge device inherits its MAC address (2) when joining, the bridge port (DSA) requests a replay of the addition of all FDB entries towards this bridge port and towards the bridge device itself. In fact, DSA calls br_fdb_replay() twice: br_fdb_replay(br, brport_dev); br_fdb_replay(br, br); DSA uses reference counting for the FDB entries. So the MAC address of the bridge is simply kept with refcount 2. When the bridge port leaves under normal circumstances, everything cancels out since the replay of the FDB entry deletion is also done twice per VLAN. (3) when the bridge MAC address changes, switchdev is notified of the deletion of the old address and of the insertion of the new one. But the old address does not really go away, since it had refcount 2, and the new address is added "only" with refcount 1. (4) when the bridge port leaves now, it will replay a deletion of the FDB entries pointing towards the bridge twice. Then DSA will complain that it can't delete something that no longer exists. It is clear that the problem is that the FDB entries towards the bridge are replayed too many times, so let's fix that problem. Fixes: 63c51453c82c ("net: dsa: replay the local bridge FDB entries pointing to the bridge dev too") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20210719093916.4099032-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-07-20ipv6: ip6_finish_output2: set sk into newly allocated nskbVasily Averin1-1/+1
skb_set_owner_w() should set sk not to old skb but to new nskb. Fixes: 5796015fa968 ("ipv6: allocate enough headroom in ip6_finish_output2()") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Link: https://lore.kernel.org/r/70c0744f-89ae-1869-7e3e-4fa292158f4b@virtuozzo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-07-19net/tcp_fastopen: fix data races around tfo_active_disable_stampEric Dumazet1-3/+16
tfo_active_disable_stamp is read and written locklessly. We need to annotate these accesses appropriately. Then, we need to perform the atomic_inc(tfo_active_disable_times) after the timestamp has been updated, and thus add barriers to make sure tcp_fastopen_active_should_disable() wont read a stale timestamp. Fixes: cf1ef3f0719b ("net/tcp_fastopen: Disable active side TFO in certain scenarios") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-18netrom: Decrease sock refcount when sock timers expireNguyen Dinh Phi1-9/+11
Commit 63346650c1a9 ("netrom: switch to sock timer API") switched to use sock timer API. It replaces mod_timer() by sk_reset_timer(), and del_timer() by sk_stop_timer(). Function sk_reset_timer() will increase the refcount of sock if it is called on an inactive timer, hence, in case the timer expires, we need to decrease the refcount ourselves in the handler, otherwise, the sock refcount will be unbalanced and the sock will never be freed. Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com> Reported-by: syzbot+10f1194569953b72f1ae@syzkaller.appspotmail.com Fixes: 63346650c1a9 ("netrom: switch to sock timer API") Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-18sctp: trim optlen when it's a huge value in sctp_setsockoptXin Long1-0/+4
After commit ca84bd058dae ("sctp: copy the optval from user space in sctp_setsockopt"), it does memory allocation in sctp_setsockopt with the optlen, and it would fail the allocation and return error if the optlen from user space is a huge value. This breaks some sockopts, like SCTP_HMAC_IDENT, SCTP_RESET_STREAMS and SCTP_AUTH_KEY, as when processing these sockopts before, optlen would be trimmed to a biggest value it needs when optlen is a huge value, instead of failing the allocation and returning error. This patch is to fix the allocation failure when it's a huge optlen from user space by trimming it to the biggest size sctp sockopt may need when necessary, and this biggest size is from sctp_setsockopt_reset_streams() for SCTP_RESET_STREAMS, which is bigger than those for SCTP_HMAC_IDENT and SCTP_AUTH_KEY. Fixes: ca84bd058dae ("sctp: copy the optval from user space in sctp_setsockopt") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-18net: sched: fix memory leak in tcindex_partial_destroy_workPavel Skripkin1-1/+4
Syzbot reported memory leak in tcindex_set_parms(). The problem was in non-freed perfect hash in tcindex_partial_destroy_work(). In tcindex_set_parms() new tcindex_data is allocated and some fields from old one are copied to new one, but not the perfect hash. Since tcindex_partial_destroy_work() is the destroy function for old tcindex_data, we need to free perfect hash to avoid memory leak. Reported-and-tested-by: syzbot+f0bbb2287b8993d4fa74@syzkaller.appspotmail.com Fixes: 331b72922c5f ("net: sched: RCU cls_tcindex") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-18net: Fix zero-copy head len calculation.Pravin B Shelar1-1/+4
In some cases skb head could be locked and entire header data is pulled from skb. When skb_zerocopy() called in such cases, following BUG is triggered. This patch fixes it by copying entire skb in such cases. This could be optimized incase this is performance bottleneck. ---8<--- kernel BUG at net/core/skbuff.c:2961! invalid opcode: 0000 [#1] SMP PTI CPU: 2 PID: 0 Comm: swapper/2 Tainted: G OE 5.4.0-77-generic #86-Ubuntu Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.13.0-1ubuntu1.1 04/01/2014 RIP: 0010:skb_zerocopy+0x37a/0x3a0 RSP: 0018:ffffbcc70013ca38 EFLAGS: 00010246 Call Trace: <IRQ> queue_userspace_packet+0x2af/0x5e0 [openvswitch] ovs_dp_upcall+0x3d/0x60 [openvswitch] ovs_dp_process_packet+0x125/0x150 [openvswitch] ovs_vport_receive+0x77/0xd0 [openvswitch] netdev_port_receive+0x87/0x130 [openvswitch] netdev_frame_hook+0x4b/0x60 [openvswitch] __netif_receive_skb_core+0x2b4/0xc90 __netif_receive_skb_one_core+0x3f/0xa0 __netif_receive_skb+0x18/0x60 process_backlog+0xa9/0x160 net_rx_action+0x142/0x390 __do_softirq+0xe1/0x2d6 irq_exit+0xae/0xb0 do_IRQ+0x5a/0xf0 common_interrupt+0xf/0xf Code that triggered BUG: int skb_zerocopy(struct sk_buff *to, struct sk_buff *from, int len, int hlen) { int i, j = 0; int plen = 0; /* length of skb->head fragment */ int ret; struct page *page; unsigned int offset; BUG_ON(!from->head_frag && !hlen); Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-17netfilter: nf_tables: fix audit memory leak in nf_tables_commitDongliang Mu1-0/+12
In nf_tables_commit, if nf_tables_commit_audit_alloc fails, it does not free the adp variable. Fix this by adding nf_tables_commit_audit_free which frees the linked list with the head node adl. backtrace: kmalloc include/linux/slab.h:591 [inline] kzalloc include/linux/slab.h:721 [inline] nf_tables_commit_audit_alloc net/netfilter/nf_tables_api.c:8439 [inline] nf_tables_commit+0x16e/0x1760 net/netfilter/nf_tables_api.c:8508 nfnetlink_rcv_batch+0x512/0xa80 net/netfilter/nfnetlink.c:562 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline] nfnetlink_rcv+0x1fa/0x220 net/netfilter/nfnetlink.c:652 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x2c7/0x3e0 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x36b/0x6b0 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:702 [inline] sock_sendmsg+0x56/0x80 net/socket.c:722 Reported-by: syzbot <syzkaller@googlegroups.com> Reported-by: kernel test robot <lkp@intel.com> Fixes: c520292f29b8 ("audit: log nftables configuration change events once per table") Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-07-16net: decnet: Fix sleeping inside in af_decnetYajun Deng1-15/+12
The release_sock() is blocking function, it would change the state after sleeping. use wait_woken() instead. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-16skbuff: Fix a potential race while recycling page_pool packetsIlias Apalodimas1-1/+12
As Alexander points out, when we are trying to recycle a cloned/expanded SKB we might trigger a race. The recycling code relies on the pp_recycle bit to trigger, which we carry over to cloned SKBs. If that cloned SKB gets expanded or if we get references to the frags, call skb_release_data() and overwrite skb->head, we are creating separate instances accessing the same page frags. Since the skb_release_data() will first try to recycle the frags, there's a potential race between the original and cloned SKB, since both will have the pp_recycle bit set. Fix this by explicitly those SKBs not recyclable. The atomic_sub_return effectively limits us to a single release case, and when we are calling skb_release_data we are also releasing the option to perform the recycling, or releasing the pages from the page pool. Fixes: 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling") Reported-by: Alexander Duyck <alexanderduyck@fb.com> Suggested-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller5-11/+26
Andrii Nakryiko says: ==================== pull-request: bpf 2021-07-15 The following pull-request contains BPF updates for your *net* tree. We've added 9 non-merge commits during the last 5 day(s) which contain a total of 9 files changed, 37 insertions(+), 15 deletions(-). The main changes are: 1) Fix NULL pointer dereference in BPF_TEST_RUN for BPF_XDP_DEVMAP and BPF_XDP_CPUMAP programs, from Xuan Zhuo. 2) Fix use-after-free of net_device in XDP bpf_link, from Xuan Zhuo. 3) Follow-up fix to subprog poke descriptor use-after-free problem, from Daniel Borkmann and John Fastabend. 4) Fix out-of-range array access in s390 BPF JIT backend, from Colin Ian King. 5) Fix memory leak in BPF sockmap, from John Fastabend. 6) Fix for sockmap to prevent proc stats reporting bug, from John Fastabend and Jakub Sitnicki. 7) Fix NULL pointer dereference in bpftool, from Tobias Klauser. 8) AF_XDP documentation fixes, from Baruch Siach. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-15net: fix uninit-value in caif_seqpkt_sendmsgZiyang Xuan1-1/+2
When nr_segs equal to zero in iovec_from_user, the object msg->msg_iter.iov is uninit stack memory in caif_seqpkt_sendmsg which is defined in ___sys_sendmsg. So we cann't just judge msg->msg_iter.iov->base directlly. We can use nr_segs to judge msg in caif_seqpkt_sendmsg whether has data buffers. ===================================================== BUG: KMSAN: uninit-value in caif_seqpkt_sendmsg+0x693/0xf60 net/caif/caif_socket.c:542 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x220 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215 caif_seqpkt_sendmsg+0x693/0xf60 net/caif/caif_socket.c:542 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg net/socket.c:672 [inline] ____sys_sendmsg+0x12b6/0x1350 net/socket.c:2343 ___sys_sendmsg net/socket.c:2397 [inline] __sys_sendmmsg+0x808/0xc90 net/socket.c:2480 __compat_sys_sendmmsg net/compat.c:656 [inline] Reported-by: syzbot+09a5d591c1f98cf5efcb@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=1ace85e8fc9b0d5a45c08c2656c3e91762daa9b8 Fixes: bece7b2398d0 ("caif: Rewritten socket implementation") Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-15bpf, sockmap, udp: sk_prot needs inuse_idx set for proc statsJakub Sitnicki1-1/+1
The proc socket stats use sk_prot->inuse_idx value to record inuse sock stats. We currently do not set this correctly from sockmap side. The result is reading sock stats '/proc/net/sockstat' gives incorrect values. The socket counter is incremented correctly, but because we don't set the counter correctly when we replace sk_prot we may omit the decrement. To get the correct inuse_idx value move the core_initcall that initializes the UDP proto handlers to late_initcall. This way it is initialized after UDP has the chance to assign the inuse_idx value from the register protocol handler. Fixes: edc6741cc660 ("bpf: Add sockmap hooks for UDP sockets") Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210714154750.528206-1-jakub@cloudflare.com
2021-07-15bpf, sockmap, tcp: sk_prot needs inuse_idx set for proc statsJohn Fastabend1-1/+1
The proc socket stats use sk_prot->inuse_idx value to record inuse sock stats. We currently do not set this correctly from sockmap side. The result is reading sock stats '/proc/net/sockstat' gives incorrect values. The socket counter is incremented correctly, but because we don't set the counter correctly when we replace sk_prot we may omit the decrement. To get the correct inuse_idx value move the core_initcall that initializes the TCP proto handlers to late_initcall. This way it is initialized after TCP has the chance to assign the inuse_idx value from the register protocol handler. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/bpf/20210712195546.423990-3-john.fastabend@gmail.com
2021-07-15bpf, sockmap: Fix potential memory leak on unlikely error caseJohn Fastabend1-5/+11
If skb_linearize is needed and fails we could leak a msg on the error handling. To fix ensure we kfree the msg block before returning error. Found during code review. Fixes: 4363023d2668e ("bpf, sockmap: Avoid failures from skb_to_sgvec when skb has frag_list") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/bpf/20210712195546.423990-2-john.fastabend@gmail.com
2021-07-15net_sched: introduce tracepoint trace_qdisc_enqueue()Qitao Xu1-4/+16
Tracepoint trace_qdisc_enqueue() is introduced to trace skb at the entrance of TC layer on TX side. This is similar to trace_qdisc_dequeue(): 1. For both we only trace successful cases. The failure cases can be traced via trace_kfree_skb(). 2. They are called at entrance or exit of TC layer, not for each ->enqueue() or ->dequeue(). This is intentional, because we want to make trace_qdisc_enqueue() symmetric to trace_qdisc_dequeue(), which is easier to use. The return value of qdisc_enqueue() is not interesting here, we have Qdisc's drop packets in ->dequeue(), it is impossible to trace them even if we have the return value, the only way to trace them is tracing kfree_skb(). We only add information we need to trace ring buffer. If any other information is needed, it is easy to extend it without breaking ABI, see commit 3dd344ea84e1 ("net: tracepoint: exposing sk_family in all tcp:tracepoints"). Reviewed-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Qitao Xu <qitao.xu@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-14Merge tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds61-174/+671
Pull networking fixes from Jakub Kicinski. "Including fixes from bpf and netfilter. Current release - regressions: - sock: fix parameter order in sock_setsockopt() Current release - new code bugs: - netfilter: nft_last: - fix incorrect arithmetic when restoring last used - honor NFTA_LAST_SET on restoration Previous releases - regressions: - udp: properly flush normal packet at GRO time - sfc: ensure correct number of XDP queues; don't allow enabling the feature if there isn't sufficient resources to Tx from any CPU - dsa: sja1105: fix address learning getting disabled on the CPU port - mptcp: addresses a rmem accounting issue that could keep packets in subflow receive buffers longer than necessary, delaying MPTCP-level ACKs - ip_tunnel: fix mtu calculation for ETHER tunnel devices - do not reuse skbs allocated from skbuff_fclone_cache in the napi skb cache, we'd try to return them to the wrong slab cache - tcp: consistently disable header prediction for mptcp Previous releases - always broken: - bpf: fix subprog poke descriptor tracking use-after-free - ipv6: - allocate enough headroom in ip6_finish_output2() in case iptables TEE is used - tcp: drop silly ICMPv6 packet too big messages to avoid expensive and pointless lookups (which may serve as a DDOS vector) - make sure fwmark is copied in SYNACK packets - fix 'disable_policy' for forwarded packets (align with IPv4) - netfilter: conntrack: - do not renew entry stuck in tcp SYN_SENT state - do not mark RST in the reply direction coming after SYN packet for an out-of-sync entry - mptcp: cleanly handle error conditions with MP_JOIN and syncookies - mptcp: fix double free when rejecting a join due to port mismatch - validate lwtstate->data before returning from skb_tunnel_info() - tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path - mt76: mt7921: continue to probe driver when fw already downloaded - bonding: fix multiple issues with offloading IPsec to (thru?) bond - stmmac: ptp: fix issues around Qbv support and setting time back - bcmgenet: always clear wake-up based on energy detection Misc: - sctp: move 198 addresses from unusable to private scope - ptp: support virtual clocks and timestamping - openvswitch: optimize operation for key comparison" * tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (158 commits) net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave() sfc: add logs explaining XDP_TX/REDIRECT is not available sfc: ensure correct number of XDP queues sfc: fix lack of XDP TX queues - error XDP TX failed (-22) net: fddi: fix UAF in fza_probe net: dsa: sja1105: fix address learning getting disabled on the CPU port net: ocelot: fix switchdev objects synced for wrong netdev with LAG offload net: Use nlmsg_unicast() instead of netlink_unicast() octeontx2-pf: Fix uninitialized boolean variable pps ipv6: allocate enough headroom in ip6_finish_output2() net: hdlc: rename 'mod_init' & 'mod_exit' functions to be module-specific net: bridge: multicast: fix MRD advertisement router port marking race net: bridge: multicast: fix PIM hello router port marking race net: phy: marvell10g: fix differentiation of 88X3310 from 88X3340 dsa: fix for_each_child.cocci warnings virtio_net: check virtqueue_add_sgs() return value mptcp: properly account bulk freed memory selftests: mptcp: fix case multiple subflows limited by server mptcp: avoid processing packet if a subflow reset mptcp: fix syncookie process if mptcp can not_accept new subflow ...
2021-07-13net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave()Vladimir Oltean1-2/+2
This was not caught because there is no switch driver which implements the .port_bridge_join but not .port_bridge_leave method, but it should nonetheless be fixed, as in certain conditions (driver development) it might lead to NULL pointer dereference. Fixes: f66a6a69f97a ("net: dsa: permit cross-chip bridging between all trees in the system") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-13net: Use nlmsg_unicast() instead of netlink_unicast()Yajun Deng8-27/+13
It has 'if (err >0 )' statement in nlmsg_unicast(), so use nlmsg_unicast() instead of netlink_unicast(), this looks more concise. v2: remove the change in netfilter. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-13xdp, net: Fix use-after-free in bpf_xdp_link_releaseXuan Zhuo1-4/+10
The problem occurs between dev_get_by_index() and dev_xdp_attach_link(). At this point, dev_xdp_uninstall() is called. Then xdp link will not be detached automatically when dev is released. But link->dev already points to dev, when xdp link is released, dev will still be accessed, but dev has been released. dev_get_by_index() | link->dev = dev | | rtnl_lock() | unregister_netdevice_many() | dev_xdp_uninstall() | rtnl_unlock() rtnl_lock(); | dev_xdp_attach_link() | rtnl_unlock(); | | netdev_run_todo() // dev released bpf_xdp_link_release() | /* access dev. | use-after-free */ | [ 45.966867] BUG: KASAN: use-after-free in bpf_xdp_link_release+0x3b8/0x3d0 [ 45.967619] Read of size 8 at addr ffff00000f9980c8 by task a.out/732 [ 45.968297] [ 45.968502] CPU: 1 PID: 732 Comm: a.out Not tainted 5.13.0+ #22 [ 45.969222] Hardware name: linux,dummy-virt (DT) [ 45.969795] Call trace: [ 45.970106] dump_backtrace+0x0/0x4c8 [ 45.970564] show_stack+0x30/0x40 [ 45.970981] dump_stack_lvl+0x120/0x18c [ 45.971470] print_address_description.constprop.0+0x74/0x30c [ 45.972182] kasan_report+0x1e8/0x200 [ 45.972659] __asan_report_load8_noabort+0x2c/0x50 [ 45.973273] bpf_xdp_link_release+0x3b8/0x3d0 [ 45.973834] bpf_link_free+0xd0/0x188 [ 45.974315] bpf_link_put+0x1d0/0x218 [ 45.974790] bpf_link_release+0x3c/0x58 [ 45.975291] __fput+0x20c/0x7e8 [ 45.975706] ____fput+0x24/0x30 [ 45.976117] task_work_run+0x104/0x258 [ 45.976609] do_notify_resume+0x894/0xaf8 [ 45.977121] work_pending+0xc/0x328 [ 45.977575] [ 45.977775] The buggy address belongs to the page: [ 45.978369] page:fffffc00003e6600 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4f998 [ 45.979522] flags: 0x7fffe0000000000(node=0|zone=0|lastcpupid=0x3ffff) [ 45.980349] raw: 07fffe0000000000 fffffc00003e6708 ffff0000dac3c010 0000000000000000 [ 45.981309] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 45.982259] page dumped because: kasan: bad access detected [ 45.982948] [ 45.983153] Memory state around the buggy address: [ 45.983753] ffff00000f997f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 45.984645] ffff00000f998000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 45.985533] >ffff00000f998080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 45.986419] ^ [ 45.987112] ffff00000f998100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 45.988006] ffff00000f998180: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 45.988895] ================================================================== [ 45.989773] Disabling lock debugging due to kernel taint [ 45.990552] Kernel panic - not syncing: panic_on_warn set ... [ 45.991166] CPU: 1 PID: 732 Comm: a.out Tainted: G B 5.13.0+ #22 [ 45.991929] Hardware name: linux,dummy-virt (DT) [ 45.992448] Call trace: [ 45.992753] dump_backtrace+0x0/0x4c8 [ 45.993208] show_stack+0x30/0x40 [ 45.993627] dump_stack_lvl+0x120/0x18c [ 45.994113] dump_stack+0x1c/0x34 [ 45.994530] panic+0x3a4/0x7d8 [ 45.994930] end_report+0x194/0x198 [ 45.995380] kasan_report+0x134/0x200 [ 45.995850] __asan_report_load8_noabort+0x2c/0x50 [ 45.996453] bpf_xdp_link_release+0x3b8/0x3d0 [ 45.997007] bpf_link_free+0xd0/0x188 [ 45.997474] bpf_link_put+0x1d0/0x218 [ 45.997942] bpf_link_release+0x3c/0x58 [ 45.998429] __fput+0x20c/0x7e8 [ 45.998833] ____fput+0x24/0x30 [ 45.999247] task_work_run+0x104/0x258 [ 45.999731] do_notify_resume+0x894/0xaf8 [ 46.000236] work_pending+0xc/0x328 [ 46.000697] SMP: stopping secondary CPUs [ 46.001226] Dumping ftrace buffer: [ 46.001663] (ftrace buffer empty) [ 46.002110] Kernel Offset: disabled [ 46.002545] CPU features: 0x00000001,23202c00 [ 46.003080] Memory Limit: none Fixes: aa8d3a716b59db6c ("bpf, xdp: Add bpf_link-based XDP attachment API") Reported-by: Abaci <abaci@linux.alibaba.com> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210710031635.41649-1-xuanzhuo@linux.alibaba.com
2021-07-12ipv6: allocate enough headroom in ip6_finish_output2()Vasily Averin1-0/+28
When TEE target mirrors traffic to another interface, sk_buff may not have enough headroom to be processed correctly. ip_finish_output2() detect this situation for ipv4 and allocates new skb with enogh headroom. However ipv6 lacks this logic in ip_finish_output2 and it leads to skb_under_panic: skbuff: skb_under_panic: text:ffffffffc0866ad4 len:96 put:24 head:ffff97be85e31800 data:ffff97be85e317f8 tail:0x58 end:0xc0 dev:gre0 ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:110! invalid opcode: 0000 [#1] SMP PTI CPU: 2 PID: 393 Comm: kworker/2:2 Tainted: G OE 5.13.0 #13 Hardware name: Virtuozzo KVM, BIOS 1.11.0-2.vz7.4 04/01/2014 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:skb_panic+0x48/0x4a Call Trace: skb_push.cold.111+0x10/0x10 ipgre_header+0x24/0xf0 [ip_gre] neigh_connected_output+0xae/0xf0 ip6_finish_output2+0x1a8/0x5a0 ip6_output+0x5c/0x110 nf_dup_ipv6+0x158/0x1000 [nf_dup_ipv6] tee_tg6+0x2e/0x40 [xt_TEE] ip6t_do_table+0x294/0x470 [ip6_tables] nf_hook_slow+0x44/0xc0 nf_hook.constprop.34+0x72/0xe0 ndisc_send_skb+0x20d/0x2e0 ndisc_send_ns+0xd1/0x210 addrconf_dad_work+0x3c8/0x540 process_one_work+0x1d1/0x370 worker_thread+0x30/0x390 kthread+0x116/0x130 ret_from_fork+0x22/0x30 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-12bpf, test: fix NULL pointer dereference on invalid expected_attach_typeXuan Zhuo1-0/+3
These two types of XDP progs (BPF_XDP_DEVMAP, BPF_XDP_CPUMAP) will not be executed directly in the driver, therefore we should also not directly run them from here. To run in these two situations, there must be further preparations done, otherwise these may cause a kernel panic. For more details, see also dev_xdp_attach(). [ 46.982479] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 46.984295] #PF: supervisor read access in kernel mode [ 46.985777] #PF: error_code(0x0000) - not-present page [ 46.987227] PGD 800000010dca4067 P4D 800000010dca4067 PUD 10dca6067 PMD 0 [ 46.989201] Oops: 0000 [#1] SMP PTI [ 46.990304] CPU: 7 PID: 562 Comm: a.out Not tainted 5.13.0+ #44 [ 46.992001] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/24 [ 46.995113] RIP: 0010:___bpf_prog_run+0x17b/0x1710 [ 46.996586] Code: 49 03 14 cc e8 76 f6 fe ff e9 ad fe ff ff 0f b6 43 01 48 0f bf 4b 02 48 83 c3 08 89 c2 83 e0 0f c0 ea 04 02 [ 47.001562] RSP: 0018:ffffc900005afc58 EFLAGS: 00010246 [ 47.003115] RAX: 0000000000000000 RBX: ffffc9000023f068 RCX: 0000000000000000 [ 47.005163] RDX: 0000000000000000 RSI: 0000000000000079 RDI: ffffc900005afc98 [ 47.007135] RBP: 0000000000000000 R08: ffffc9000023f048 R09: c0000000ffffdfff [ 47.009171] R10: 0000000000000001 R11: ffffc900005afb40 R12: ffffc900005afc98 [ 47.011172] R13: 0000000000000001 R14: 0000000000000001 R15: ffffffff825258a8 [ 47.013244] FS: 00007f04a5207580(0000) GS:ffff88842fdc0000(0000) knlGS:0000000000000000 [ 47.015705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 47.017475] CR2: 0000000000000000 CR3: 0000000100182005 CR4: 0000000000770ee0 [ 47.019558] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 47.021595] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 47.023574] PKRU: 55555554 [ 47.024571] Call Trace: [ 47.025424] __bpf_prog_run32+0x32/0x50 [ 47.026296] ? printk+0x53/0x6a [ 47.027066] ? ktime_get+0x39/0x90 [ 47.027895] bpf_test_run.cold.28+0x23/0x123 [ 47.028866] ? printk+0x53/0x6a [ 47.029630] bpf_prog_test_run_xdp+0x149/0x1d0 [ 47.030649] __sys_bpf+0x1305/0x23d0 [ 47.031482] __x64_sys_bpf+0x17/0x20 [ 47.032316] do_syscall_64+0x3a/0x80 [ 47.033165] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 47.034254] RIP: 0033:0x7f04a51364dd [ 47.035133] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 48 [ 47.038768] RSP: 002b:00007fff8f9fc518 EFLAGS: 00000213 ORIG_RAX: 0000000000000141 [ 47.040344] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f04a51364dd [ 47.041749] RDX: 0000000000000048 RSI: 0000000020002a80 RDI: 000000000000000a [ 47.043171] RBP: 00007fff8f9fc530 R08: 0000000002049300 R09: 0000000020000100 [ 47.044626] R10: 0000000000000004 R11: 0000000000000213 R12: 0000000000401070 [ 47.046088] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 47.047579] Modules linked in: [ 47.048318] CR2: 0000000000000000 [ 47.049120] ---[ end trace 7ad34443d5be719a ]--- [ 47.050273] RIP: 0010:___bpf_prog_run+0x17b/0x1710 [ 47.051343] Code: 49 03 14 cc e8 76 f6 fe ff e9 ad fe ff ff 0f b6 43 01 48 0f bf 4b 02 48 83 c3 08 89 c2 83 e0 0f c0 ea 04 02 [ 47.054943] RSP: 0018:ffffc900005afc58 EFLAGS: 00010246 [ 47.056068] RAX: 0000000000000000 RBX: ffffc9000023f068 RCX: 0000000000000000 [ 47.057522] RDX: 0000000000000000 RSI: 0000000000000079 RDI: ffffc900005afc98 [ 47.058961] RBP: 0000000000000000 R08: ffffc9000023f048 R09: c0000000ffffdfff [ 47.060390] R10: 0000000000000001 R11: ffffc900005afb40 R12: ffffc900005afc98 [ 47.061803] R13: 0000000000000001 R14: 0000000000000001 R15: ffffffff825258a8 [ 47.063249] FS: 00007f04a5207580(0000) GS:ffff88842fdc0000(0000) knlGS:0000000000000000 [ 47.065070] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 47.066307] CR2: 0000000000000000 CR3: 0000000100182005 CR4: 0000000000770ee0 [ 47.067747] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 47.069217] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 47.070652] PKRU: 55555554 [ 47.071318] Kernel panic - not syncing: Fatal exception [ 47.072854] Kernel Offset: disabled [ 47.073683] ---[ end Kernel panic - not syncing: Fatal exception ]--- Fixes: 9216477449f3 ("bpf: cpumap: Add the possibility to attach an eBPF program to cpumap") Fixes: fbee97feed9b ("bpf: Add support to attach bpf program to a devmap entry") Reported-by: Abaci <abaci@linux.alibaba.com> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: David Ahern <dsahern@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210708080409.73525-1-xuanzhuo@linux.alibaba.com
2021-07-11net: bridge: multicast: fix MRD advertisement router port marking raceNikolay Aleksandrov1-0/+4
When an MRD advertisement is received on a bridge port with multicast snooping enabled, we mark it as a router port automatically, that includes adding that port to the router port list. The multicast lock protects that list, but it is not acquired in the MRD advertisement case leading to a race condition, we need to take it to fix the race. Cc: stable@vger.kernel.org Cc: linus.luessing@c0d3.blue Fixes: 4b3087c7e37f ("bridge: Snoop Multicast Router Advertisements") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>