aboutsummaryrefslogtreecommitdiffstats
path: root/net (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-01-23net: sched: pie: export symbols to be reused by FQ-PIEMohit P. Tahiliani1-85/+88
This patch makes the drop_early(), calculate_probability() and pie_process_dequeue() functions generic enough to be used by both PIE and FQ-PIE (to be added in a future commit). The major change here is in the way the functions take in arguments. This patch exports these functions and makes FQ-PIE dependent on sch_pie. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23net: sched: pie: fix alignment in struct instancesMohit P. Tahiliani1-9/+9
Make the alignment in the initialization of the struct instances consistent in the file. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23net: sched: pie: fix commentingMohit P. Tahiliani1-5/+5
Fix punctuation and logical mistakes in the comments. The logical mistake was that "dequeue_rate" is no longer the default way to calculate queuing delay and is not needed. The default way to calculate queue delay was changed in commit cec2975f2b70 ("net: sched: pie: enable timestamp based delay calculation"). Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23pie: rearrange structure members and their initializationsMohit P. Tahiliani1-1/+1
Rearrange the members of the structure such that closely referenced members appear together and/or fit in the same cacheline. Also, change the order of their initializations to match the order in which they appear in the structure. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23net: sched: pie: move common code to pie.hMohit P. Tahiliani1-85/+1
This patch moves macros, structures and small functions common to PIE and FQ-PIE (to be added in a future commit) from the file net/sched/sch_pie.c to the header file include/net/pie.h. All the moved functions are made inline. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller11-92/+303
Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-01-22 The following pull-request contains BPF updates for your *net-next* tree. We've added 92 non-merge commits during the last 16 day(s) which contain a total of 320 files changed, 7532 insertions(+), 1448 deletions(-). The main changes are: 1) function by function verification and program extensions from Alexei. 2) massive cleanup of selftests/bpf from Toke and Andrii. 3) batched bpf map operations from Brian and Yonghong. 4) tcp congestion control in bpf from Martin. 5) bulking for non-map xdp_redirect form Toke. 6) bpf_send_signal_thread helper from Yonghong. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-22bpf: Add BPF_FUNC_jiffies64Martin KaFai Lau1-0/+2
This patch adds a helper to read the 64bit jiffies. It will be used in a later patch to implement the bpf_cubic.c. The helper is inlined for jit_requested and 64 BITS_PER_LONG as the map_gen_lookup(). Other cases could be considered together with map_gen_lookup() if needed. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200122233646.903260-1-kafai@fb.com
2020-01-22xsk, net: Make sock_def_readable() have external linkageBjörn Töpel2-2/+2
XDP sockets use the default implementation of struct sock's sk_data_ready callback, which is sock_def_readable(). This function is called in the XDP socket fast-path, and involves a retpoline. By letting sock_def_readable() have external linkage, and being called directly, the retpoline can be avoided. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200120092917.13949-1-bjorn.topel@gmail.com
2020-01-21Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-nextDavid S. Miller11-43/+819
Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2020-01-21 1) Add support for TCP encapsulation of IKE and ESP messages, as defined by RFC 8229. Patchset from Sabrina Dubroca. Please note that there is a merge conflict in: net/unix/af_unix.c between commit: 3c32da19a858 ("unix: Show number of pending scm files of receive queue in fdinfo") from the net-next tree and commit: b50b0580d27b ("net: add queue argument to __skb_wait_for_more_packets and __skb_{,try_}recv_datagram") from the ipsec-next tree. The conflict can be solved as done in linux-next. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-21tcp/ipv4: remove AF_INET_FAMILYAlex Shi1-6/+0
After commit 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") the macro isn't used anymore. remove it. Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-21net/hsr: remove seq_nr_after_or_eqAlex Shi1-1/+0
It's never used after introduced. So maybe better to remove. Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Cc: Arvid Brodin <arvid.brodin@alten.se> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-21net/smc: allow unprivileged users to read pnet tableHans Wippel1-1/+1
The current flags of the SMC_PNET_GET command only allow privileged users to retrieve entries from the pnet table via netlink. The content of the pnet table may be useful for all users though, e.g., for debugging smc connection problems. This patch removes the GENL_ADMIN_PERM flag so that unprivileged users can read the pnet table. Signed-off-by: Hans Wippel <ndev@hwipl.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-21Merge tag 'rds-odp-for-5.5' of https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdmaDavid S. Miller7-58/+257
Leon Romanovsky says: ==================== Use ODP MRs for kernel ULPs The following series extends MR creation routines to allow creation of user MRs through kernel ULPs as a proxy. The immediate use case is to allow RDS to work over FS-DAX, which requires ODP (on-demand-paging) MRs to be created and such MRs were not possible to create prior this series. The first part of this patchset extends RDMA to have special verb ib_reg_user_mr(). The common use case that uses this function is a userspace application that allocates memory for HCA access but the responsibility to register the memory at the HCA is on an kernel ULP. This ULP acts as an agent for the userspace application. The second part provides advise MR functionality for ULPs. This is integral part of ODP flows and used to trigger pagefaults in advance to prepare memory before running working set. The third part is actual user of those in-kernel APIs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-19Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/netDavid S. Miller35-166/+290
2020-01-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds33-159/+266
Pull networking fixes from David Miller: 1) Fix non-blocking connect() in x25, from Martin Schiller. 2) Fix spurious decryption errors in kTLS, from Jakub Kicinski. 3) Netfilter use-after-free in mtype_destroy(), from Cong Wang. 4) Limit size of TSO packets properly in lan78xx driver, from Eric Dumazet. 5) r8152 probe needs an endpoint sanity check, from Johan Hovold. 6) Prevent looping in tcp_bpf_unhash() during sockmap/tls free, from John Fastabend. 7) hns3 needs short frames padded on transmit, from Yunsheng Lin. 8) Fix netfilter ICMP header corruption, from Eyal Birger. 9) Fix soft lockup when low on memory in hns3, from Yonglong Liu. 10) Fix NTUPLE firmware command failures in bnxt_en, from Michael Chan. 11) Fix memory leak in act_ctinfo, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits) cxgb4: reject overlapped queues in TC-MQPRIO offload cxgb4: fix Tx multi channel port rate limit net: sched: act_ctinfo: fix memory leak bnxt_en: Do not treat DSN (Digital Serial Number) read failure as fatal. bnxt_en: Fix ipv6 RFS filter matching logic. bnxt_en: Fix NTUPLE firmware command failures. net: systemport: Fixed queue mapping in internal ring map net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec net: dsa: sja1105: Don't error out on disabled ports with no phy-mode net: phy: dp83867: Set FORCE_LINK_GOOD to default after reset net: hns: fix soft lockup when there is not enough memory net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key() net/sched: act_ife: initalize ife->metalist earlier netfilter: nat: fix ICMP header corruption on ICMP errors net: wan: lapbether.c: Use built-in RCU list checking netfilter: nf_tables: fix flowtable list del corruption netfilter: nf_tables: fix memory leak in nf_tables_parse_netdev_hooks() netfilter: nf_tables: remove WARN and add NLA_STRING upper limits netfilter: nft_tunnel: ERSPAN_VERSION must not be null netfilter: nft_tunnel: fix null-attribute check ...
2020-01-19devlink: Add overlay source MAC is multicast trapAmit Cohen1-0/+1
Add packet trap that can report NVE packets that the device decided to drop because their overlay source MAC is multicast. Signed-off-by: Amit Cohen <amitc@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-19devlink: Add tunnel generic packet trapsAmit Cohen1-0/+2
Add packet traps that can report packets that were dropped during tunnel decapsulation. Signed-off-by: Amit Cohen <amitc@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-19devlink: Add non-routable packet trapAmit Cohen1-0/+1
Add packet trap that can report packets that reached the router, but are non-routable. For example, IGMP queries can be flooded by the device in layer 2 and reach the router. Such packets should not be routed and instead dropped. Signed-off-by: Amit Cohen <amitc@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-19net: sched: act_ctinfo: fix memory leakEric Dumazet1-0/+11
Implement a cleanup method to properly free ci->params BUG: memory leak unreferenced object 0xffff88811746e2c0 (size 64): comm "syz-executor617", pid 7106, jiffies 4294943055 (age 14.250s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ c0 34 60 84 ff ff ff ff 00 00 00 00 00 00 00 00 .4`............. backtrace: [<0000000015aa236f>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline] [<0000000015aa236f>] slab_post_alloc_hook mm/slab.h:586 [inline] [<0000000015aa236f>] slab_alloc mm/slab.c:3320 [inline] [<0000000015aa236f>] kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3549 [<000000002c946bd1>] kmalloc include/linux/slab.h:556 [inline] [<000000002c946bd1>] kzalloc include/linux/slab.h:670 [inline] [<000000002c946bd1>] tcf_ctinfo_init+0x21a/0x530 net/sched/act_ctinfo.c:236 [<0000000086952cca>] tcf_action_init_1+0x400/0x5b0 net/sched/act_api.c:944 [<000000005ab29bf8>] tcf_action_init+0x135/0x1c0 net/sched/act_api.c:1000 [<00000000392f56f9>] tcf_action_add+0x9a/0x200 net/sched/act_api.c:1410 [<0000000088f3c5dd>] tc_ctl_action+0x14d/0x1bb net/sched/act_api.c:1465 [<000000006b39d986>] rtnetlink_rcv_msg+0x178/0x4b0 net/core/rtnetlink.c:5424 [<00000000fd6ecace>] netlink_rcv_skb+0x61/0x170 net/netlink/af_netlink.c:2477 [<0000000047493d02>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442 [<00000000bdcf8286>] netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] [<00000000bdcf8286>] netlink_unicast+0x223/0x310 net/netlink/af_netlink.c:1328 [<00000000fc5b92d9>] netlink_sendmsg+0x2c0/0x570 net/netlink/af_netlink.c:1917 [<00000000da84d076>] sock_sendmsg_nosec net/socket.c:639 [inline] [<00000000da84d076>] sock_sendmsg+0x54/0x70 net/socket.c:659 [<0000000042fb2eee>] ____sys_sendmsg+0x2d0/0x300 net/socket.c:2330 [<000000008f23f67e>] ___sys_sendmsg+0x8a/0xd0 net/socket.c:2384 [<00000000d838e4f6>] __sys_sendmsg+0x80/0xf0 net/socket.c:2417 [<00000000289a9cb1>] __do_sys_sendmsg net/socket.c:2426 [inline] [<00000000289a9cb1>] __se_sys_sendmsg net/socket.c:2424 [inline] [<00000000289a9cb1>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2424 Fixes: 24ec483cec98 ("net: sched: Introduce act_ctinfo action") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Kevin 'ldir' Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Kevin 'ldir' Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller7-154/+314
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next, they are: 1) Incorrect uapi header comment in bitwise, from Jeremy Sowden. 2) Fetch flow statistics if flow is still active. 3) Restrict flow matching on hardware based on input device. 4) Add nf_flow_offload_work_alloc() helper function. 5) Remove the last client of the FLOW_OFFLOAD_DYING flag, use teardown instead. 6) Use atomic bitwise operation to operate with flow flags. 7) Add nf_flowtable_hw_offload() helper function to check for the NF_FLOWTABLE_HW_OFFLOAD flag. 8) Add NF_FLOW_HW_REFRESH to retry hardware offload from the flowtable software datapath. 9) Remove indirect calls in xt_hashlimit, from Florian Westphal. 10) Add nf_flow_offload_tuple() helper to consolidate code. 11) Add nf_flow_table_offload_cmd() helper function. 12) A few whitespace cleanups in nf_tables in bitwise and the bitmap/hash set types, from Jeremy Sowden. 13) Cleanup netlink attribute checks in bitwise, from Jeremy Sowden. 14) Replace goto by return in error path of nft_bitwise_dump(), from Jeremy Sowden. 15) Add bitwise operation netlink attribute, also from Jeremy. 16) Add nft_bitwise_init_bool(), from Jeremy Sowden. 17) Add nft_bitwise_eval_bool(), also from Jeremy. 18) Add nft_bitwise_dump_bool(), from Jeremy Sowden. 19) Disallow hardware offload for other that NFT_BITWISE_BOOL, from Jeremy Sowden. 20) Add NFTA_BITWISE_DATA netlink attribute, again from Jeremy. 21) Add support for bitwise shift operation, from Jeremy Sowden. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-18net/rds: Use prefetch for On-Demand-Paging MRHans Westgaard Ry1-0/+9
Try prefetching pages when using On-Demand-Paging MR using ib_advise_mr. Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-18net/rds: Handle ODP mr registration/unregistrationHans Westgaard Ry7-56/+244
On-Demand-Paging MRs are registered using ib_reg_user_mr and unregistered with ib_dereg_mr. Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-17netns: Constify exported functionsGuillaume Nault1-3/+3
Mark function parameters as 'const' where possible. Signed-off-by: Guillaume Nault <gnault@redhat.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-17net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key()Cong Wang1-12/+0
syzbot reported some bogus lockdep warnings, for example bad unlock balance in sch_direct_xmit(). They are due to a race condition between slow path and fast path, that is qdisc_xmit_lock_key gets re-registered in netdev_update_lockdep_key() on slow path, while we could still acquire the queue->_xmit_lock on fast path in this small window: CPU A CPU B __netif_tx_lock(); lockdep_unregister_key(qdisc_xmit_lock_key); __netif_tx_unlock(); lockdep_register_key(qdisc_xmit_lock_key); In fact, unlike the addr_list_lock which has to be reordered when the master/slave device relationship changes, queue->_xmit_lock is only acquired on fast path and only when NETIF_F_LLTX is not set, so there is likely no nested locking for it. Therefore, we can just get rid of re-registration of qdisc_xmit_lock_key. Reported-by: syzbot+4ec99438ed7450da6272@syzkaller.appspotmail.com Fixes: ab92d68fc22f ("net: core: add generic lockdep keys") Cc: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-17net/sched: act_ife: initalize ife->metalist earlierEric Dumazet1-4/+3
It seems better to init ife->metalist earlier in tcf_ife_init() to avoid the following crash : kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline] RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431 Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09 RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000 RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8 R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000 FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119 __tcf_action_put+0xfa/0x130 net/sched/act_api.c:135 __tcf_idr_release net/sched/act_api.c:165 [inline] __tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145 tcf_idr_release include/net/act_api.h:171 [inline] tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616 tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944 tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000 tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410 tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465 rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:639 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:659 ____sys_sendmsg+0x753/0x880 net/socket.c:2330 ___sys_sendmsg+0x100/0x170 net/socket.c:2384 __sys_sendmsg+0x105/0x1d0 net/socket.c:2417 __do_sys_sendmsg net/socket.c:2426 [inline] __se_sys_sendmsg net/socket.c:2424 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller5-24/+54
Pablo Neira Ayuso says: ==================== Netfilter updates for net The following patchset contains Netfilter fixes for net: 1) Fix use-after-free in ipset bitmap destroy path, from Cong Wang. 2) Missing init netns in entry cleanup path of arp_tables, from Florian Westphal. 3) Fix WARN_ON in set destroy path due to missing cleanup on transaction error. 4) Incorrect netlink sanity check in tunnel, from Florian Westphal. 5) Missing sanity check for erspan version netlink attribute, also from Florian. 6) Remove WARN in nft_request_module() that can be triggered from userspace, from Florian Westphal. 7) Memleak in NFTA_HOOK_DEVS netlink parser, from Dan Carpenter. 8) List poison from commit path for flowtables that are added and deleted in the same batch, from Florian Westphal. 9) Fix NAT ICMP packet corruption, from Eyal Birger. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-16xdp: Use bulking for non-map XDP_REDIRECT and consolidate code pathsToke Høiland-Jørgensen1-71/+19
Since the bulk queue used by XDP_REDIRECT now lives in struct net_device, we can re-use the bulking for the non-map version of the bpf_redirect() helper. This is a simple matter of having xdp_do_redirect_slow() queue the frame on the bulk queue instead of sending it out with __bpf_tx_xdp(). Unfortunately we can't make the bpf_redirect() helper return an error if the ifindex doesn't exit (as bpf_redirect_map() does), because we don't have a reference to the network namespace of the ingress device at the time the helper is called. So we have to leave it as-is and keep the device lookup in xdp_do_redirect_slow(). Since this leaves less reason to have the non-map redirect code in a separate function, so we get rid of the xdp_do_redirect_slow() function entirely. This does lose us the tracepoint disambiguation, but fortunately the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint entry structures. This means both can contain a map index, so we can just amend the tracepoint definitions so we always emit the xdp_redirect(_err) tracepoints, but with the map ID only populated if a map is present. This means we retire the xdp_redirect_map(_err) tracepoints entirely, but keep the definitions around in case someone is still listening for them. With this change, the performance of the xdp_redirect sample program goes from 5Mpps to 8.4Mpps (a 68% increase). Since the flush functions are no longer map-specific, rename the flush() functions to drop _map from their names. One of the renamed functions is the xdp_do_flush_map() callback used in all the xdp-enabled drivers. To keep from having to update all drivers, use a #define to keep the old name working, and only update the virtual drivers in this patch. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/157918768505.1458396.17518057312953572912.stgit@toke.dk
2020-01-16xdp: Move devmap bulk queue into struct net_deviceToke Høiland-Jørgensen1-0/+2
Commit 96360004b862 ("xdp: Make devmap flush_list common for all map instances"), changed devmap flushing to be a global operation instead of a per-map operation. However, the queue structure used for bulking was still allocated as part of the containing map. This patch moves the devmap bulk queue into struct net_device. The motivation for this is reusing it for the non-map variant of XDP_REDIRECT, which will be changed in a subsequent commit. To avoid other fields of struct net_device moving to different cache lines, we also move a couple of other members around. We defer the actual allocation of the bulk queue structure until the NETDEV_REGISTER notification devmap.c. This makes it possible to check for ndo_xdp_xmit support before allocating the structure, which is not possible at the time struct net_device is allocated. However, we keep the freeing in free_netdev() to avoid adding another RCU callback on NETDEV_UNREGISTER. Because of this change, we lose the reference back to the map that originated the redirect, so change the tracepoint to always return 0 as the map ID and index. Otherwise no functional change is intended with this patch. After this patch, the relevant part of struct net_device looks like this, according to pahole: /* --- cacheline 14 boundary (896 bytes) --- */ struct netdev_queue * _tx __attribute__((__aligned__(64))); /* 896 8 */ unsigned int num_tx_queues; /* 904 4 */ unsigned int real_num_tx_queues; /* 908 4 */ struct Qdisc * qdisc; /* 912 8 */ unsigned int tx_queue_len; /* 920 4 */ spinlock_t tx_global_lock; /* 924 4 */ struct xdp_dev_bulk_queue * xdp_bulkq; /* 928 8 */ struct xps_dev_maps * xps_cpus_map; /* 936 8 */ struct xps_dev_maps * xps_rxqs_map; /* 944 8 */ struct mini_Qdisc * miniq_egress; /* 952 8 */ /* --- cacheline 15 boundary (960 bytes) --- */ struct hlist_head qdisc_hash[16]; /* 960 128 */ /* --- cacheline 17 boundary (1088 bytes) --- */ struct timer_list watchdog_timer; /* 1088 40 */ /* XXX last struct has 4 bytes of padding */ int watchdog_timeo; /* 1128 4 */ /* XXX 4 bytes hole, try to pack */ struct list_head todo_list; /* 1136 16 */ /* --- cacheline 18 boundary (1152 bytes) --- */ Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Björn Töpel <bjorn.topel@intel.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/157918768397.1458396.12673224324627072349.stgit@toke.dk
2020-01-16netfilter: bitwise: add support for shifts.Jeremy Sowden1-0/+77
Hitherto nft_bitwise has only supported boolean operations: NOT, AND, OR and XOR. Extend it to do shifts as well. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16netfilter: bitwise: add NFTA_BITWISE_DATA attribute.Jeremy Sowden1-0/+5
Add a new bitwise netlink attribute that will be used by shift operations to store the size of the shift. It is not used by boolean operations. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16netfilter: bitwise: only offload boolean operations.Jeremy Sowden1-0/+3
Only boolean operations supports offloading, so check the type of the operation and return an error for other types. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16netfilter: bitwise: add helper for dumping boolean operations.Jeremy Sowden1-8/+21
Split the code specific to dumping bitwise boolean operations out into a separate function. A similar function will be added later for shift operations. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16netfilter: bitwise: add helper for evaluating boolean operations.Jeremy Sowden1-3/+14
Split the code specific to evaluating bitwise boolean operations out into a separate function. Similar functions will be added later for shift operations. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16netfilter: bitwise: add helper for initializing boolean operations.Jeremy Sowden1-25/+41
Split the code specific to initializing bitwise boolean operations out into a separate function. A similar function will be added later for shift operations. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16netfilter: bitwise: add NFTA_BITWISE_OP netlink attribute.Jeremy Sowden1-0/+16
Add a new bitwise netlink attribute, NFTA_BITWISE_OP, which is set to a value of a new enum, nft_bitwise_ops. It describes the type of operation an expression contains. Currently, it only has one value: NFT_BITWISE_BOOL. More values will be added later to implement shifts. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16netfilter: bitwise: replace gotos with returns.Jeremy Sowden1-8/+5
When dumping a bitwise expression, if any of the puts fails, we use goto to jump to a label. However, no clean-up is required and the only statement at the label is a return. Drop the goto's and return immediately instead. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16netfilter: bitwise: remove NULL comparisons from attribute checks.Jeremy Sowden1-5/+5
In later patches, we will be adding more checks. In order to be consistent and prevent complaints from checkpatch.pl, replace the existing comparisons with NULL with logical NOT operators. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16netfilter: nf_tables: white-space fixes.Jeremy Sowden3-5/+5
Indentation fixes for the parameters of a few nft functions. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16netfilter: flowtable: add nf_flow_table_offload_cmd()Pablo Neira Ayuso1-12/+28
Split nf_flow_table_offload_setup() in two functions to make it more maintainable. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16netfilter: flowtable: add nf_flow_offload_tuple() helperPablo Neira Ayuso1-23/+24
Consolidate code to configure the flow_cls_offload structure into one helper function. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16netfilter: hashlimit: do not use indirect calls during gcFlorian Westphal1-18/+4
no need, just use a simple boolean to indicate we want to reap all entries. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16netfilter: flowtable: refresh flow if hardware offload failsPablo Neira Ayuso3-10/+21
If nf_flow_offload_add() fails to add the flow to hardware, then the NF_FLOW_HW_REFRESH flag bit is set and the flow remains in the flowtable software path. If flowtable hardware offload is enabled, this patch enqueues a new request to offload this flow to hardware. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16netfilter: flowtable: add nf_flowtable_hw_offload() helper functionPablo Neira Ayuso2-3/+3
This function checks for the NF_FLOWTABLE_HW_OFFLOAD flag, meaning that the flowtable hardware offload is enabled. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16netfilter: flowtable: use atomic bitwise operations for flow flagsPablo Neira Ayuso3-24/+24
Originally, all flow flag bits were set on only from the workqueue. With the introduction of the flow teardown state and hardware offload this is no longer true. Let's be safe and use atomic bitwise operation to operation with flow flags. Fixes: 59c466dd68e7 ("netfilter: nf_flow_table: add a new flow state for tearing down offloading") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16netfilter: flowtable: remove dying bit, use teardown bit insteadPablo Neira Ayuso1-5/+3
The dying bit removes the conntrack entry if the netdev that owns this flow is going down. Instead, use the teardown mechanism to push back the flow to conntrack to let the classic software path decide what to do with it. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16netfilter: flowtable: add nf_flow_offload_work_alloc()Pablo Neira Ayuso1-16/+22
Add helper function to allocate and initialize flow offload work and use it to consolidate existing code. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16netfilter: flowtable: restrict flow dissector match on meta ingress devicePablo Neira Ayuso1-1/+7
Set on FLOW_DISSECTOR_KEY_META meta key using flow tuple ingress interface. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-16netfilter: flowtable: fetch stats only if flow is still alivePablo Neira Ayuso2-5/+3
Do not fetch statistics if flow has expired since it might not in hardware anymore. After this update, remove the FLOW_OFFLOAD_HW_DYING check from nf_flow_offload_stats() since this flag is never set on. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: wenxu <wenxu@ucloud.cn>
2020-01-16net/rds: Detect need of On-Demand-Paging memory registrationHans Westgaard Ry1-2/+4
Add code to check if memory intended for RDMA is FS-DAX-memory. RDS will fail with error code EOPNOTSUPP if FS-DAX-memory is detected. Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16netfilter: nat: fix ICMP header corruption on ICMP errorsEyal Birger1-0/+13
Commit 8303b7e8f018 ("netfilter: nat: fix spurious connection timeouts") made nf_nat_icmp_reply_translation() use icmp_manip_pkt() as the l4 manipulation function for the outer packet on ICMP errors. However, icmp_manip_pkt() assumes the packet has an 'id' field which is not correct for all types of ICMP messages. This is not correct for ICMP error packets, and leads to bogus bytes being written the ICMP header, which can be wrongfully regarded as 'length' bytes by RFC 4884 compliant receivers. Fix by assigning the 'id' field only for ICMP messages that have this semantic. Reported-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Fixes: 8303b7e8f018 ("netfilter: nat: fix spurious connection timeouts") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>