aboutsummaryrefslogtreecommitdiffstats
path: root/include (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-06-02net: ipv4: provide __rcu annotation for ifa_listFlorian Westphal1-15/+6
ifa_list is protected by rcu, yet code doesn't reflect this. Add the __rcu annotations and fix up all places that are now reported by sparse. I've done this in the same commit to not add intermediate patches that result in new warnings. Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-02net: inetdevice: provide replacement iterators for in_ifaddr walkFlorian Westphal1-1/+9
The ifa_list is protected either by rcu or rtnl lock, but the current iterators do not account for this. This adds two iterators as replacement, a later patch in the series will update them with the needed rcu/rtnl_dereference calls. Its not done in this patch yet to avoid sparse warnings -- the fields lack the proper __rcu annotation. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-02Merge tag 'isdn-removal' of https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playgroundDavid S. Miller12-1717/+0
Arnd Bergmann says: ==================== isdn: deprecate non-mISDN drivers When isdn4linux came up in the context of another patch series, I remembered that we had discussed removing it a while ago. It turns out that the suggestion from Karsten Keil wa to remove I4L in 2018 after the last public ISDN networks are shut down. This has happened now (with a very small number of exceptions), so I guess it's time to try again. We currently have three ISDN stacks in the kernel: the original isdn4linux (with the hisax driver), the newer CAPI (with four drivers), and finally the mISDN stack (supporting roughly the same hardware as hisax). As far as I can tell, anyone using ISDN with mainline kernel drivers in the past few years uses mISDN, and this is typically used for voice-only PBX installations that don't require a public network. The older stacks support additional features for data networks, but those typically make no sense any more if there is no network to connect to. My proposal for this time is to kill off isdn4linux entirely, as it seems to have been unusable for quite a while. This code has been abandoned for many years and it does cause problems for treewide maintenance as it tends to do everything that we try to stop doing. Birger Harzenetter mentioned that is is still using i4l in order to make use of the 'divert' feature that is not part of mISDN, but has otherwise moved on to mISDN for normal operation, like apparently everyone else. CAPI in turn is not quite as obsolete, but two of the drivers (avm and hysdn) don't seem to be used at all, while another one (gigaset) will stop being maintained as Paul Bolle is no longer able to test it after the network gets shut down in September. All three are now moved into drivers/staging to let others speak up in case there are remaining users. This leaves Bluetooth CMTP as the only remaining user of CAPI, but Marcel Holtmann wishes to keep maintaining it. For the discussion on version 1, see [2] Unfortunately, Karsten Keil as the maintainer has not participated in the discussion. Arnd [1] https://patchwork.kernel.org/patch/8484861/#17900371 [2] https://listserv.isdn4linux.de/pipermail/isdn4linux/2019-April/thread.html ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller5-10/+19
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset container Netfilter/IPVS update for net-next: 1) Add UDP tunnel support for ICMP errors in IPVS. Julian Anastasov says: This patchset is a followup to the commit that adds UDP/GUE tunnel: "ipvs: allow tunneling with gue encapsulation". What we do is to put tunnel real servers in hash table (patch 1), add function to lookup tunnels (patch 2) and use it to strip the embedded tunnel headers from ICMP errors (patch 3). 2) Extend xt_owner to match for supplementary groups, from Lukasz Pawelczyk. 3) Remove unused oif field in flow_offload_tuple object, from Taehee Yoo. 4) Release basechain counters from workqueue to skip synchronize_rcu() call. From Florian Westphal. 5) Replace skb_make_writable() by skb_ensure_writable(). Patchset from Florian Westphal. 6) Checksum support for gue encapsulation in IPVS, from Jacky Hu. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller6-18/+160
Alexei Starovoitov says: ==================== pull-request: bpf-next 2019-05-31 The following pull-request contains BPF updates for your *net-next* tree. Lots of exciting new features in the first PR of this developement cycle! The main changes are: 1) misc verifier improvements, from Alexei. 2) bpftool can now convert btf to valid C, from Andrii. 3) verifier can insert explicit ZEXT insn when requested by 32-bit JITs. This feature greatly improves BPF speed on 32-bit architectures. From Jiong. 4) cgroups will now auto-detach bpf programs. This fixes issue of thousands bpf programs got stuck in dying cgroups. From Roman. 5) new bpf_send_signal() helper, from Yonghong. 6) cgroup inet skb programs can signal CN to the stack, from Lawrence. 7) miscellaneous cleanups, from many developers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-31bpf: move memory size checks to bpf_map_charge_init()Roman Gushchin1-1/+1
Most bpf map types doing similar checks and bytes to pages conversion during memory allocation and charging. Let's unify these checks by moving them into bpf_map_charge_init(). Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-31bpf: rework memlock-based memory accounting for mapsRoman Gushchin1-1/+4
In order to unify the existing memlock charging code with the memcg-based memory accounting, which will be added later, let's rework the current scheme. Currently the following design is used: 1) .alloc() callback optionally checks if the allocation will likely succeed using bpf_map_precharge_memlock() 2) .alloc() performs actual allocations 3) .alloc() callback calculates map cost and sets map.memory.pages 4) map_create() calls bpf_map_init_memlock() which sets map.memory.user and performs actual charging; in case of failure the map is destroyed <map is in use> 1) bpf_map_free_deferred() calls bpf_map_release_memlock(), which performs uncharge and releases the user 2) .map_free() callback releases the memory The scheme can be simplified and made more robust: 1) .alloc() calculates map cost and calls bpf_map_charge_init() 2) bpf_map_charge_init() sets map.memory.user and performs actual charge 3) .alloc() performs actual allocations <map is in use> 1) .map_free() callback releases the memory 2) bpf_map_charge_finish() performs uncharge and releases the user The new scheme also allows to reuse bpf_map_charge_init()/finish() functions for memcg-based accounting. Because charges are performed before actual allocations and uncharges after freeing the memory, no bogus memory pressure can be created. In cases when the map structure is not available (e.g. it's not created yet, or is already destroyed), on-stack bpf_map_memory structure is used. The charge can be transferred with the bpf_map_charge_move() function. Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-31bpf: group memory related fields in struct bpf_map_memoryRoman Gushchin1-3/+7
Group "user" and "pages" fields of bpf_map into the bpf_map_memory structure. Later it can be extended with "memcg" and other related information. The main reason for a such change (beside cosmetics) is to pass bpf_map_memory structure to charging functions before the actual allocation of bpf_map. Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-31bpf: cgroup inet skb programs can return 0 to 3brakmo1-1/+2
Allows cgroup inet skb programs to return values in the range [0, 3]. The second bit is used to deterine if congestion occurred and higher level protocol should decrease rate. E.g. TCP would call tcp_enter_cwr() The bpf_prog must set expected_attach_type to BPF_CGROUP_INET_EGRESS at load time if it uses the new return values (i.e. 2 or 3). The expected_attach_type is currently not enforced for BPF_PROG_TYPE_CGROUP_SKB. e.g Meaning the current bpf_prog with expected_attach_type setting to BPF_CGROUP_INET_EGRESS can attach to BPF_CGROUP_INET_INGRESS. Blindly enforcing expected_attach_type will break backward compatibility. This patch adds a enforce_expected_attach_type bit to only enforce the expected_attach_type when it uses the new return value. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-31bpf: Create BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAYbrakmo1-0/+50
Create new macro BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY() to be used by __cgroup_bpf_run_filter_skb for EGRESS BPF progs so BPF programs can request cwr for TCP packets. Current cgroup skb programs can only return 0 or 1 (0 to drop the packet. This macro changes the behavior so the low order bit indicates whether the packet should be dropped (0) or not (1) and the next bit is used for congestion notification (cn). Hence, new allowed return values of CGROUP EGRESS BPF programs are: 0: drop packet 1: keep packet 2: drop packet and call cwr 3: keep packet and call cwr This macro then converts it to one of NET_XMIT values or -EPERM that has the effect of dropping the packet with no cn. 0: NET_XMIT_SUCCESS skb should be transmitted (no cn) 1: NET_XMIT_DROP skb should be dropped and cwr called 2: NET_XMIT_CN skb should be transmitted and cwr called 3: -EPERM skb should be dropped (no cn) Note that when more than one BPF program is called, the packet is dropped if at least one of programs requests it be dropped, and there is cn if at least one program returns cn. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-31net: sfp: remove sfp-bus use of netdevsRussell King1-4/+2
The sfp-bus code now no longer has any use for the network device structure, so remove its use. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-31net: sfp: add mandatory attach/detach methods for sfp busesRussell King1-0/+6
Add attach and detach methods for SFP buses, which will allow us to get rid of the netdev storage in sfp-bus. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller135-1440/+179
The phylink conflict was between a bug fix by Russell King to make sure we have a consistent PHY interface mode, and a change in net-next to pull some code in phylink_resolve() into the helper functions phylink_mac_link_{up,down}() On the dp83867 side it's mostly overlapping changes, with the 'net' side removing a condition that was supposed to trigger for RGMII but because of how it was coded never actually could trigger. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-31netfilter: nf_conntrack_bridge: fix CONFIG_IPV6=yPablo Neira Ayuso1-0/+2
This patch fixes a few problems with CONFIG_IPV6=y and CONFIG_NF_CONNTRACK_BRIDGE=m: In file included from net/netfilter/utils.c:5: include/linux/netfilter_ipv6.h: In function 'nf_ipv6_br_defrag': include/linux/netfilter_ipv6.h:110:9: error: implicit declaration of function 'nf_ct_frag6_gather'; did you mean 'nf_ct_attach'? [-Werror=implicit-function-declaration] And these too: net/ipv6/netfilter.c:242:2: error: unknown field 'br_defrag' specified in initializer net/ipv6/netfilter.c:243:2: error: unknown field 'br_fragment' specified in initializer This patch includes an original chunk from wenxu. Fixes: 764dd163ac92 ("netfilter: nf_conntrack_bridge: add support for IPv6") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reported-by: Yuehaibing <yuehaibing@huawei.com> Reported-by: kbuild test robot <lkp@intel.com> Reported-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-31ipvs: add checksum support for gue encapsulationJacky Hu2-0/+9
Add checksum support for gue encapsulation with the tun_flags parameter, which could be one of the values below: IP_VS_TUNNEL_ENCAP_FLAG_NOCSUM IP_VS_TUNNEL_ENCAP_FLAG_CSUM IP_VS_TUNNEL_ENCAP_FLAG_REMCSUM Signed-off-by: Jacky Hu <hengqing.hu@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-05-31netfilter: replace skb_make_writable with skb_ensure_writableFlorian Westphal1-5/+0
This converts all remaining users and then removes skb_make_writable. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-05-31netfilter: nf_flow_table: remove unnecessary variable in flow_offload_tupleTaehee Yoo1-2/+0
The oifidx in the struct flow_offload_tuple is not used anymore. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-05-31netfilter: xt_owner: Add supplementary groups optionLukasz Pawelczyk1-3/+4
The XT_OWNER_SUPPL_GROUPS flag causes GIDs specified with XT_OWNER_GID to be also checked in the supplementary groups of a process. f_cred->group_info cannot be modified during its lifetime and f_cred holds a reference to it so it's safe to use. Signed-off-by: Lukasz Pawelczyk <l.pawelczyk@samsung.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-05-31ipvs: add function to find tunnelsJulian Anastasov1-0/+3
Add ip_vs_find_tunnel() to match tunnel headers by family, address and optional port. Use it to properly find the tunnel real server used in received ICMP errors. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-05-31ipvs: allow rs_table to contain different real server typesJulian Anastasov1-0/+3
Before now rs_table was used only for NAT real servers. Change it to allow TUN real severs from different types, possibly hashed with different port key. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-05-31isdn: hdlc: move into mISDNArnd Bergmann1-82/+0
The last remnant of the isdn4linux interface is now the isdnhdlc support, used by the netjet driver. Move it next to that driver. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-05-31isdn: remove isdn4linuxArnd Bergmann11-1648/+0
With all isdn4linux hardware drivers gone, this is only a wrapper around CAPI to support old user space. However, from looking at the mailing list, it seems that the last time anyone asked about it was in 2014, when the upgrade from a linux-2.4 installation failed, and mISDN was suggested as a replacement. The largest public ISDN network (Deutsche Telekom) was supposed to be shut down 2018, which must have drastically reduced the number of legacy installations. When we last discussed removing i4l in 2016, Karsten Keil suggested revisiting this in 2018. I guess this is overdue. Link: http://listserv.isdn4linux.de/pipermail/isdn4linux/2014-October/006165.html Link: https://patchwork.kernel.org/patch/8484861/#17900371 Link: https://listserv.isdn4linux.de/pipermail/isdn4linux/2019-April/thread.html Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-05-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds3-2/+18
Pull networking fixes from David Miller: 1) Fix OOPS during nf_tables rule dump, from Florian Westphal. 2) Use after free in ip_vs_in, from Yue Haibing. 3) Fix various kTLS bugs (NULL deref during device removal resync, netdev notification ignoring, etc.) From Jakub Kicinski. 4) Fix ipv6 redirects with VRF, from David Ahern. 5) Memory leak fix in igmpv3_del_delrec(), from Eric Dumazet. 6) Missing memory allocation failure check in ip6_ra_control(), from Gen Zhang. And likewise fix ip_ra_control(). 7) TX clean budget logic error in aquantia, from Igor Russkikh. 8) SKB leak in llc_build_and_send_ui_pkt(), from Eric Dumazet. 9) Double frees in mlx5, from Parav Pandit. 10) Fix lost MAC address in r8169 during PCI D3, from Heiner Kallweit. 11) Fix botched register access in mvpp2, from Antoine Tenart. 12) Use after free in napi_gro_frags(), from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (89 commits) net: correct zerocopy refcnt with udp MSG_MORE ethtool: Check for vlan etype or vlan tci when parsing flow_rule net: don't clear sock->sk early to avoid trouble in strparser net-gro: fix use-after-free read in napi_gro_frags() net: dsa: tag_8021q: Create a stable binary format net: dsa: tag_8021q: Change order of rx_vid setup net: mvpp2: fix bad MVPP2_TXQ_SCHED_TOKEN_CNTR_REG queue value ipv4: tcp_input: fix stack out of bounds when parsing TCP options. mlxsw: spectrum: Prevent force of 56G mlxsw: spectrum_acl: Avoid warning after identical rules insertion net: dsa: mv88e6xxx: fix handling of upper half of STATS_TYPE_PORT r8169: fix MAC address being lost in PCI D3 net: core: support XDP generic on stacked devices. netvsc: unshare skb in VF rx handler udp: Avoid post-GRO UDP checksum recalculation net: phy: dp83867: Set up RGMII TX delay net: phy: dp83867: do not call config_init twice net: phy: dp83867: increase SGMII autoneg timer duration net: phy: dp83867: fix speed 10 in sgmii mode net: phy: marvell10g: report if the PHY fails to boot firmware ...
2019-05-30net: phy: export phy_queue_state_machineHeiner Kallweit1-1/+1
We face the issue that link change interrupt and link status may be reported by different PHY layers. As a result the link change interrupt may occur before the link status changes. Export phy_queue_state_machine to allow PHY drivers to specify a delay between link status change interrupt and link status check. v2: - change jiffies parameter type to unsigned long Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Suggested-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30net: phy: add callback for custom interrupt handler to struct phy_driverHeiner Kallweit1-0/+3
The phylib interrupt handler handles link change events only currently. However PHY drivers may want to use other interrupt sources too, e.g. to report temperature monitoring events. Therefore add a callback to struct phy_driver allowing PHY drivers to implement a custom interrupt handler. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Suggested-by: Russell King - ARM Linux admin <linux@armlinux.org.uk> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30net: phy: enable interrupts when PHY is attached alreadyHeiner Kallweit1-0/+1
This patch is a step towards allowing PHY drivers to handle more interrupt sources than just link change. E.g. several PHY's have built-in temperature monitoring and can raise an interrupt if a temperature threshold is exceeded. We may be interested in such interrupts also if the phylib state machine isn't started. Therefore move enabling interrupts to phy_request_interrupt(). v2: - patch added to series Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30sctp: deduplicate identical skb_checksum_opsMatteo Croce1-5/+7
The same skb_checksum_ops struct is defined twice in two different places, leading to code duplication. Declare it as a global variable into a common header instead of allocating it on the stack on each function call. bloat-o-meter reports a slight code shrink. add/remove: 1/1 grow/shrink: 0/10 up/down: 128/-1282 (-1154) Function old new delta sctp_csum_ops - 128 +128 crc32c_csum_ops 16 - -16 sctp_rcv 6616 6583 -33 sctp_packet_pack 4542 4504 -38 nf_conntrack_sctp_packet 4980 4926 -54 execute_masked_set_action 6453 6389 -64 tcf_csum_sctp 575 428 -147 sctp_gso_segment 1292 1126 -166 sctp_csum_check 579 412 -167 sctp_snat_handler 957 772 -185 sctp_dnat_handler 1321 1132 -189 l4proto_manip_pkt 2536 2313 -223 Total: Before=359297613, After=359296459, chg -0.00% Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Matteo Croce <mcroce@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30netfilter: nf_conntrack_bridge: add support for IPv6Pablo Neira Ayuso1-0/+50
br_defrag() and br_fragment() indirections are added in case that IPv6 support comes as a module, to avoid pulling innecessary dependencies in. The new fraglist iterator and fragment transformer APIs are used to implement the refragmentation code. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30netfilter: bridge: add connection tracking systemPablo Neira Ayuso2-0/+10
This patch adds basic connection tracking support for the bridge, including initial IPv4 support. This patch register two hooks to deal with the bridge forwarding path, one from the bridge prerouting hook to call nf_conntrack_in(); and another from the bridge postrouting hook to confirm the entry. The conntrack bridge prerouting hook defragments packets before passing them to nf_conntrack_in() to look up for an existing entry, otherwise a new entry is allocated and it is attached to the skbuff. The conntrack bridge postrouting hook confirms new conntrack entries, ie. if this is the first packet seen, then it adds the entry to the hashtable and (if needed) it refragments the skbuff into the original fragments, leaving the geometry as is if possible. Exceptions are linearized skbuffs, eg. skbuffs that are passed up to nfqueue and conntrack helpers, as well as cloned skbuff for the local delivery (eg. tcpdump), also in case of bridge port flooding (cloned skbuff too). The packet defragmentation is done through the ip_defrag() call. This forces us to save the bridge control buffer, reset the IP control buffer area and then restore it after call. This function also bumps the IP fragmentation statistics, it would be probably desiderable to have independent statistics for the bridge defragmentation/refragmentation. The maximum fragment length is stored in the control buffer and it is used to refragment the skbuff from the postrouting path. The new fraglist splitter and fragment transformer APIs are used to implement the bridge refragmentation code. The br_ip_fragment() function drops the packet in case the maximum fragment size seen is larger than the output port MTU. This patchset follows the principle that conntrack should not drop packets, so users can do it through policy via invalid state matching. Like br_netfilter, there is no refragmentation for packets that are passed up for local delivery, ie. prerouting -> input path. There are calls to nf_reset() already in several spots in the stack since time ago already, eg. af_packet, that show that skbuff fraglist handling from the netif_rx path is supported already. The helpers are called from the postrouting hook, before confirmation, from there we may see packet floods to bridge ports. Then, although unlikely, this may result in exercising the helpers many times for each clone. It would be good to explore how to pass all the packets in a list to the conntrack hook to do this handle only once for this case. Thanks to Florian Westphal for handing me over an initial patchset version to add support for conntrack bridge. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30netfilter: nf_conntrack: allow to register bridge supportPablo Neira Ayuso2-0/+14
This patch adds infrastructure to register and to unregister bridge support for the conntrack module via nf_ct_bridge_register() and nf_ct_bridge_unregister(). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30net: ipv6: split skbuff into fragments transformerPablo Neira Ayuso1-0/+19
This patch exposes a new API to refragment a skbuff. This allows you to split either a linear skbuff or to force the refragmentation of an existing fraglist using a different mtu. The API consists of: * ip6_frag_init(), that initializes the internal state of the transformer. * ip6_frag_next(), that allows you to fetch the next fragment. This function internally allocates the skbuff that represents the fragment, it pushes the IPv6 header, and it also copies the payload for each fragment. The ip6_frag_state object stores the internal state of the splitter. This code has been extracted from ip6_fragment(). Symbols are also exported to allow to reuse this iterator from the bridge codepath to build its own refragmentation routine by reusing the existing codebase. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30net: ipv4: split skbuff into fragments transformerPablo Neira Ayuso1-0/+16
This patch exposes a new API to refragment a skbuff. This allows you to split either a linear skbuff or to force the refragmentation of an existing fraglist using a different mtu. The API consists of: * ip_frag_init(), that initializes the internal state of the transformer. * ip_frag_next(), that allows you to fetch the next fragment. This function internally allocates the skbuff that represents the fragment, it pushes the IPv4 header, and it also copies the payload for each fragment. The ip_frag_state object stores the internal state of the splitter. This code has been extracted from ip_do_fragment(). Symbols are also exported to allow to reuse this iterator from the bridge codepath to build its own refragmentation routine by reusing the existing codebase. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30net: ipv6: add skbuff fraglist splitterPablo Neira Ayuso1-0/+25
This patch adds the skbuff fraglist split iterator. This API provides an iterator to transform the fraglist into single skbuff objects, it consists of: * ip6_fraglist_init(), that initializes the internal state of the fraglist iterator. * ip6_fraglist_prepare(), that restores the IPv6 header on the fragment. * ip6_fraglist_next(), that retrieves the fragment from the fraglist and updates the internal state of the iterator to point to the next fragment in the fraglist. The ip6_fraglist_iter object stores the internal state of the iterator. This code has been extracted from ip6_fragment(). Symbols are also exported to allow to reuse this iterator from the bridge codepath to build its own refragmentation routine by reusing the existing codebase. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30net: ipv4: add skbuff fraglist splitterPablo Neira Ayuso1-0/+23
This patch adds the skbuff fraglist splitter. This API provides an iterator to transform the fraglist into single skbuff objects, it consists of: * ip_fraglist_init(), that initializes the internal state of the fraglist splitter. * ip_fraglist_prepare(), that restores the IPv4 header on the fragments. * ip_fraglist_next(), that retrieves the fragment from the fraglist and it updates the internal state of the splitter to point to the next fragment skbuff in the fraglist. The ip_fraglist_iter object stores the internal state of the iterator. This code has been extracted from ip_do_fragment(). Symbols are also exported to allow to reuse this iterator from the bridge codepath to build its own refragmentation routine by reusing the existing codebase. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30tcp: add backup TFO key infrastructureJason Baron2-3/+39
We would like to be able to rotate TFO keys while minimizing the number of client cookies that are rejected. Currently, we have only one key which can be used to generate and validate cookies, thus if we simply replace this key clients can easily have cookies rejected upon rotation. We propose having the ability to have both a primary key and a backup key. The primary key is used to generate as well as to validate cookies. The backup is only used to validate cookies. Thus, keys can be rotated as: 1) generate new key 2) add new key as the backup key 3) swap the primary and backup key, thus setting the new key as the primary We don't simply set the new key as the primary key and move the old key to the backup slot because the ip may be behind a load balancer and we further allow for the fact that all machines behind the load balancer will not be updated simultaneously. We make use of this infrastructure in subsequent patches. Suggested-by: Igor Lubashev <ilubashe@akamai.com> Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30i2c: acpi: export i2c_acpi_find_adapter_by_handleRuslan Babayev1-0/+6
This allows drivers to lookup i2c adapters on ACPI based systems similar to of_get_i2c_adapter_by_node() with DT based systems. Signed-off-by: Ruslan Babayev <ruslan@babayev.com> Cc: xe-linux-external@cisco.com Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30udp: Avoid post-GRO UDP checksum recalculationSean Tranchetti1-1/+8
Currently, when resegmenting an unexpected UDP GRO packet, the full UDP checksum will be calculated for every new SKB created by skb_segment() because the netdev features passed in by udp_rcv_segment() lack any information about checksum offload capabilities. Usually, we have no need to perform this calculation again, as 1) The GRO implementation guarantees that any packets making it to the udp_rcv_segment() function had correct checksums, and, more importantly, 2) Upon the successful return of udp_rcv_segment(), we immediately pull the UDP header off and either queue the segment to the socket or hand it off to a new protocol handler. Unless userspace has set the IP_CHECKSUM sockopt to indicate that they want the final checksum values, we can pass the needed netdev feature flags to __skb_gso_segment() to avoid checksumming each segment in skb_segment(). Fixes: cf329aa42b66 ("udp: cope with UDP GRO packet misdirection") Cc: Paolo Abeni <pabeni@redhat.com> Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-29net: phylink: Add PHYLINK_DEV operation typeIoana Ciornei1-0/+1
In the PHYLINK_DEV operation type, the PHYLINK infrastructure can work without an attached net_device. For printing usecases, instead, a struct device * should be passed to PHYLINK using the phylink_config structure. Also, netif_carrier_* calls ar guarded by the presence of a valid net_device. When using the PHYLINK_DEV operation type, we cannot check link status using the netif_carrier_ok() API so instead, keep an internal state of the MAC and call mac_link_{down,up} only when the link changed. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-29net: phylink: Add struct phylink_config to PHYLINK APIIoana Ciornei2-20/+38
The phylink_config structure will encapsulate a pointer to a struct device and the operation type requested for this instance of PHYLINK. This patch does not make any functional changes, it just transitions the PHYLINK internals and all its users to the new API. A pointer to a phylink_config structure will be passed to phylink_create() instead of the net_device directly. Also, the same phylink_config pointer will be passed back to all phylink_mac_ops callbacks instead of the net_device. Using this mechanism, a PHYLINK user can get the original net_device using a structure such as 'to_net_dev(config->dev)' or directly the structure containing the phylink_config using a container_of call. At the moment, only the PHYLINK_NETDEV is defined as a valid operation type for PHYLINK. In this mode, a valid reference to a struct device linked to the original net_device should be passed to PHYLINK through the phylink_config structure. This API changes is mainly driven by the necessity of adding a new operation type in PHYLINK that disconnects the phy_device from the net_device and also works when the net_device is lacking. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-29net: sched: Introduce act_ctinfo actionKevin 'ldir' Darbyshire-Bryant3-0/+63
ctinfo is a new tc filter action module. It is designed to restore information contained in firewall conntrack marks to other packet fields and is typically used on packet ingress paths. At present it has two independent sub-functions or operating modes, DSCP restoration mode & skb mark restoration mode. The DSCP restore mode: This mode copies DSCP values that have been placed in the firewall conntrack mark back into the IPv4/v6 diffserv fields of relevant packets. The DSCP restoration is intended for use and has been found useful for restoring ingress classifications based on egress classifications across links that bleach or otherwise change DSCP, typically home ISP Internet links. Restoring DSCP on ingress on the WAN link allows qdiscs such as but by no means limited to CAKE to shape inbound packets according to policies that are easier to set & mark on egress. Ingress classification is traditionally a challenging task since iptables rules haven't yet run and tc filter/eBPF programs are pre-NAT lookups, hence are unable to see internal IPv4 addresses as used on the typical home masquerading gateway. Thus marking the connection in some manner on egress for later restoration of classification on ingress is easier to implement. Parameters related to DSCP restore mode: dscpmask - a 32 bit mask of 6 contiguous bits and indicate bits of the conntrack mark field contain the DSCP value to be restored. statemask - a 32 bit mask of (usually) 1 bit length, outside the area specified by dscpmask. This represents a conditional operation flag whereby the DSCP is only restored if the flag is set. This is useful to implement a 'one shot' iptables based classification where the 'complicated' iptables rules are only run once to classify the connection on initial (egress) packet and subsequent packets are all marked/restored with the same DSCP. A mask of zero disables the conditional behaviour ie. the conntrack mark DSCP bits are always restored to the ip diffserv field (assuming the conntrack entry is found & the skb is an ipv4/ipv6 type) e.g. dscpmask 0xfc000000 statemask 0x01000000 |----0xFC----conntrack mark----000000---| | Bits 31-26 | bit 25 | bit24 |~~~ Bit 0| | DSCP | unused | flag |unused | |-----------------------0x01---000000---| | | | | ---| Conditional flag v only restore if set |-ip diffserv-| | 6 bits | |-------------| The skb mark restore mode (cpmark): This mode copies the firewall conntrack mark to the skb's mark field. It is completely the functional equivalent of the existing act_connmark action with the additional feature of being able to apply a mask to the restored value. Parameters related to skb mark restore mode: mask - a 32 bit mask applied to the firewall conntrack mark to mask out bits unwanted for restoration. This can be useful where the conntrack mark is being used for different purposes by different applications. If not specified and by default the whole mark field is copied (i.e. default mask of 0xffffffff) e.g. mask 0x00ffffff to mask out the top 8 bits being used by the aforementioned DSCP restore mode. |----0x00----conntrack mark----ffffff---| | Bits 31-24 | | | DSCP & flag| some value here | |---------------------------------------| | | v |------------skb mark-------------------| | | | | zeroed | | |---------------------------------------| Overall parameters: zone - conntrack zone control - action related control (reclassify | pipe | drop | continue | ok | goto chain <CHAIN_INDEX>) Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-29rhashtable: Add rht_ptr_rcu and improve rht_ptrHerbert Xu1-18/+18
This patch moves common code between rht_ptr and rht_ptr_exclusive into __rht_ptr. It also adds a new helper rht_ptr_rcu exclusively for the RCU case. This way rht_ptr becomes a lock-only construct so we can use the lighter rcu_dereference_protected primitive. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-29bpf: cgroup: properly use bpf_prog_array apiStanislav Fomichev1-1/+1
Now that we don't have __rcu markers on the bpf_prog_array helpers, let's use proper rcu_dereference_protected to obtain array pointer under mutex. We also don't need __rcu annotations on cgroup_bpf.inactive since it's not read/updated concurrently. v4: * drop cgroup_rcu_xyz wrappers and use rcu APIs directly; presumably should be more clear to understand which mutex/refcount protects each particular place v3: * amend cgroup_rcu_dereference to include percpu_ref_is_dying; cgroup_bpf is now reference counted and we don't hold cgroup_mutex anymore in cgroup_bpf_release v2: * replace xchg with rcu_swap_protected Cc: Roman Gushchin <guro@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Roman Gushchin <guro@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-29bpf: remove __rcu annotations from bpf_prog_arrayStanislav Fomichev1-6/+6
Drop __rcu annotations and rcu read sections from bpf_prog_array helper functions. They are not needed since all existing callers call those helpers from the rcu update side while holding a mutex. This guarantees that use-after-free could not happen. In the next patches I'll fix the callers with missing rcu_dereference_protected to make sparse/lockdep happy, the proper way to use these helpers is: struct bpf_prog_array __rcu *progs = ...; struct bpf_prog_array *p; mutex_lock(&mtx); p = rcu_dereference_protected(progs, lockdep_is_held(&mtx)); bpf_prog_array_length(p); bpf_prog_array_copy_to_user(p, ...); bpf_prog_array_delete_safe(p, ...); bpf_prog_array_copy_info(p, ...); bpf_prog_array_copy(p, ...); bpf_prog_array_free(p); mutex_unlock(&mtx); No functional changes! rcu_dereference_protected with lockdep_is_held should catch any cases where we update prog array without a mutex (I've looked at existing call sites and I think we hold a mutex everywhere). Motivation is to fix sparse warnings: kernel/bpf/core.c:1803:9: warning: incorrect type in argument 1 (different address spaces) kernel/bpf/core.c:1803:9: expected struct callback_head *head kernel/bpf/core.c:1803:9: got struct callback_head [noderef] <asn:4> * kernel/bpf/core.c:1877:44: warning: incorrect type in initializer (different address spaces) kernel/bpf/core.c:1877:44: expected struct bpf_prog_array_item *item kernel/bpf/core.c:1877:44: got struct bpf_prog_array_item [noderef] <asn:4> * kernel/bpf/core.c:1901:26: warning: incorrect type in assignment (different address spaces) kernel/bpf/core.c:1901:26: expected struct bpf_prog_array_item *existing kernel/bpf/core.c:1901:26: got struct bpf_prog_array_item [noderef] <asn:4> * kernel/bpf/core.c:1935:26: warning: incorrect type in assignment (different address spaces) kernel/bpf/core.c:1935:26: expected struct bpf_prog_array_item *[assigned] existing kernel/bpf/core.c:1935:26: got struct bpf_prog_array_item [noderef] <asn:4> * v2: * remove comment about potential race; that can't happen because all callers are in rcu-update section Cc: Roman Gushchin <guro@fb.com> Acked-by: Roman Gushchin <guro@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-28nexthop: Add support for nexthop groupsDavid Ahern1-1/+97
Allow the creation of nexthop groups which reference other nexthop objects to create multipath routes: +--------------+ +------------+ +--------------+ | | nh nh_grp --->| nh_grp_entry |-+ +------------+ +---------|----+ ^ | | +------------+ +----------------+ +--->| nh, weight | nh_parent +------------+ A group entry points to a nexthop with a weight for that hop within the group. The nexthop has a list_head, grp_list, for tracking which groups it is a member of and the group entry has a reference back to the parent. The grp_list is used when a nexthop is deleted - to efficiently remove it from groups using it. If a nexthop group spec is given, no other attributes can be set. Each nexthop id in a group spec must already exist. Similar to single nexthops, the specification of a nexthop group can be updated so that data is managed with rcu locking. Add path selection function to account for multiple paths and add ipv{4,6}_good_nh helpers to know that if a neighbor entry exists it is in a good state. Update NETDEV event handling to rebalance multipath nexthop groups if a nexthop is deleted due to a link event (down or unregister). When a nexthop is removed any groups using it are updated. Groups using a nexthop a tracked via a grp_list. Nexthop dumps can be limited to groups only by adding NHA_GROUPS to the request. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-28nexthop: Add support for lwt encapsDavid Ahern1-0/+3
Add support for NHA_ENCAP and NHA_ENCAP_TYPE. Leverages the existing code for lwtunnel within fib_nh_common, so the only change needed is handling the attributes in the nexthop code. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-28nexthop: Add support for IPv6 gatewaysDavid Ahern1-0/+3
Handle IPv6 gateway in a nexthop spec. If nh_family is set to AF_INET6, NHA_GATEWAY is expected to be an IPv6 address. Add ipv6 option to gw in nh_config to hold the address, add fib6_nh to nh_info to leverage the ipv6 initialization and cleanup code. Update nh_fill_node to dump the v6 address. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-28nexthop: Add support for IPv4 nexthopsDavid Ahern1-0/+5
Add support for IPv4 nexthops. If nh_family is set to AF_INET, then NHA_GATEWAY is expected to be an IPv4 address. Register for netdev events to be notified of admin up/down changes as well as deletes. A hash table is used to track nexthop per devices to quickly convert device events to the affected nexthops. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-28net: Initial nexthop codeDavid Ahern3-0/+108
Barebones start point for nexthops. Implementation for RTM commands, notifications, management of rbtree for holding nexthops by id, and kernel side data structures for nexthops and nexthop config. Nexthops are maintained in an rbtree sorted by id. Similar to routes, nexthops are configured per namespace using netns_nexthop struct added to struct net. Nexthop notifications are sent when a nexthop is added or deleted, but NOT if the delete is due to a device event or network namespace teardown (which also involves device events). Applications are expected to use the device down event to flush nexthops and any routes used by the nexthops. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-28net: nexthop uapiDavid Ahern2-0/+66
New UAPI for nexthops as standalone objects: - defines netlink ancillary header, struct nhmsg - RTM commands for nexthop objects, RTM_*NEXTHOP, - RTNLGRP for nexthop notifications, RTNLGRP_NEXTHOP, - Attributes for creating nexthops, NHA_* - Attribute for route specs to specify a nexthop by id, RTA_NH_ID. The nexthop attributes and semantics follow the route and RTA ones for device, gateway and lwt encap. Unique to nexthop objects are a blackhole and a group which contains references to other nexthop objects. With the exception of blackhole and group, nexthop objects MUST contain a device. Gateway and encap are optional. Nexthop groups can only reference other pre-existing nexthops by id. If the NHA_ID attribute is present that id is used for the nexthop. If not specified, one is auto assigned. Dump requests can include attributes: - NHA_GROUPS to return only nexthop groups, - NHA_MASTER to limit dumps to nexthops with devices enslaved to the given master (e.g., VRF) - NHA_OIF to limit dumps to nexthops using given device nlmsg_route_perms in selinux code is updated for the new RTM comands. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-28inet: frags: fix use-after-free read in inet_frag_destroy_rcuEric Dumazet1-0/+3
As caught by syzbot [1], the rcu grace period that is respected before fqdir_rwork_fn() proceeds and frees fqdir is not enough to prevent inet_frag_destroy_rcu() being run after the freeing. We need a proper rcu_barrier() synchronization to replace the one we had in inet_frags_fini() We also have to fix a potential problem at module removal : inet_frags_fini() needs to make sure that all queued work queues (fqdir_rwork_fn) have completed, otherwise we might call kmem_cache_destroy() too soon and get another use-after-free. [1] BUG: KASAN: use-after-free in inet_frag_destroy_rcu+0xd9/0xe0 net/ipv4/inet_fragment.c:201 Read of size 8 at addr ffff88806ed47a18 by task swapper/1/0 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.2.0-rc1+ #2 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188 __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317 kasan_report+0x12/0x20 mm/kasan/common.c:614 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132 inet_frag_destroy_rcu+0xd9/0xe0 net/ipv4/inet_fragment.c:201 __rcu_reclaim kernel/rcu/rcu.h:222 [inline] rcu_do_batch kernel/rcu/tree.c:2092 [inline] invoke_rcu_callbacks kernel/rcu/tree.c:2310 [inline] rcu_core+0xba5/0x1500 kernel/rcu/tree.c:2291 __do_softirq+0x25c/0x94c kernel/softirq.c:293 invoke_softirq kernel/softirq.c:374 [inline] irq_exit+0x180/0x1d0 kernel/softirq.c:414 exiting_irq arch/x86/include/asm/apic.h:536 [inline] smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806 </IRQ> RIP: 0010:native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:61 Code: ff ff 48 89 df e8 f2 95 8c fa eb 82 e9 07 00 00 00 0f 00 2d e4 45 4b 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d d4 45 4b 00 fb f4 <c3> 90 55 48 89 e5 41 57 41 56 41 55 41 54 53 e8 8e 18 42 fa e8 99 RSP: 0018:ffff8880a98e7d78 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13 RAX: 1ffffffff1164e11 RBX: ffff8880a98d4340 RCX: 0000000000000000 RDX: dffffc0000000000 RSI: 0000000000000006 RDI: ffff8880a98d4bbc RBP: ffff8880a98e7da8 R08: ffff8880a98d4340 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 R13: ffffffff88b27078 R14: 0000000000000001 R15: 0000000000000000 arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:571 default_idle_call+0x36/0x90 kernel/sched/idle.c:94 cpuidle_idle_call kernel/sched/idle.c:154 [inline] do_idle+0x377/0x560 kernel/sched/idle.c:263 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:354 start_secondary+0x34e/0x4c0 arch/x86/kernel/smpboot.c:267 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243 Allocated by task 8877: save_stack+0x23/0x90 mm/kasan/common.c:71 set_track mm/kasan/common.c:79 [inline] __kasan_kmalloc mm/kasan/common.c:489 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:503 kmem_cache_alloc_trace+0x151/0x750 mm/slab.c:3555 kmalloc include/linux/slab.h:547 [inline] kzalloc include/linux/slab.h:742 [inline] fqdir_init include/net/inet_frag.h:115 [inline] ipv6_frags_init_net+0x48/0x460 net/ipv6/reassembly.c:513 ops_init+0xb3/0x410 net/core/net_namespace.c:130 setup_net+0x2d3/0x740 net/core/net_namespace.c:316 copy_net_ns+0x1df/0x340 net/core/net_namespace.c:439 create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107 unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206 ksys_unshare+0x440/0x980 kernel/fork.c:2692 __do_sys_unshare kernel/fork.c:2760 [inline] __se_sys_unshare kernel/fork.c:2758 [inline] __x64_sys_unshare+0x31/0x40 kernel/fork.c:2758 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 17: save_stack+0x23/0x90 mm/kasan/common.c:71 set_track mm/kasan/common.c:79 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:451 kasan_slab_free+0xe/0x10 mm/kasan/common.c:459 __cache_free mm/slab.c:3432 [inline] kfree+0xcf/0x220 mm/slab.c:3755 fqdir_rwork_fn+0x33/0x40 net/ipv4/inet_fragment.c:154 process_one_work+0x989/0x1790 kernel/workqueue.c:2269 worker_thread+0x98/0xe40 kernel/workqueue.c:2415 kthread+0x354/0x420 kernel/kthread.c:255 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 The buggy address belongs to the object at ffff88806ed47a00 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 24 bytes inside of 512-byte region [ffff88806ed47a00, ffff88806ed47c00) The buggy address belongs to the page: page:ffffea0001bb51c0 refcount:1 mapcount:0 mapping:ffff8880aa400940 index:0x0 flags: 0x1fffc0000000200(slab) raw: 01fffc0000000200 ffffea000282a788 ffffea0001bb53c8 ffff8880aa400940 raw: 0000000000000000 ffff88806ed47000 0000000100000006 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88806ed47900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88806ed47980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88806ed47a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88806ed47a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88806ed47b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>