aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2019-11-21enetc: make enetc_setup_tc_mqprio staticMao Wenan1-1/+1
While using ARCH=mips CROSS_COMPILE=mips-linux-gnu- command to compile, make C=2 drivers/net/ethernet/freescale/enetc/enetc.o one warning can be found: drivers/net/ethernet/freescale/enetc/enetc.c:1439:5: warning: symbol 'enetc_setup_tc_mqprio' was not declared. Should it be static? This patch make symbol enetc_setup_tc_mqprio static. Fixes: 34c6adf1977b ("enetc: Configure the Time-Aware Scheduler via tc-taprio offload") Signed-off-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21Merge branch 'net-introduce-and-use-route-hint'David S. Miller10-16/+160
Paolo Abeni says: ==================== net: introduce and use route hint This series leverages the listification infrastructure to avoid unnecessary route lookup on ingress packets. In absence of custom rules, packets with equal daddr will usually land on the same dst. When processing packet bursts (lists) we can easily reference the previous dst entry. When we hit the 'same destination' condition we can avoid the route lookup, coping the already available dst. Detailed performance numbers are available in the individual commit messages. v3 -> v4: - move helpers to their own patches (Eric D.) - enable hints for SUBTREE builds (David A.) - re-enable hints for ipv4 forward (David A.) v2 -> v3: - use fib*_has_custom_rules() helpers (David A.) - add ip*_extract_route_hint() helper (Edward C.) - use prev skb as hint instead of copying data (Willem ) v1 -> v2: - fix build issue with !CONFIG_IP*_MULTIPLE_TABLES - fix potential race in ip6_list_rcv_finish() ==================== Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21ipv4: use dst hint for ipv4 list receivePaolo Abeni3-4/+77
This is alike the previous change, with some additional ipv4 specific quirk. Even when using the route hint we still have to do perform additional per packet checks about source address validity: a new helper is added to wrap them. Hints are explicitly disabled if the destination is a local broadcast, that keeps the code simple and local broadcast are a slower path anyway. UDP flood performances vs recvmmsg() receiver: vanilla patched delta Kpps Kpps % 1683 1871 +11 In the worst case scenario - each packet has a different destination address - the performance delta is within noise range. v3 -> v4: - re-enable hints for forward v2 -> v3: - really fix build (sic) and hint usage check - use fib4_has_custom_rules() helpers (David A.) - add ip_extract_route_hint() helper (Edward C.) - use prev skb as hint instead of copying data (Willem) v1 -> v2: - fix build issue with !CONFIG_IP_MULTIPLE_TABLES Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21ipv4: move fib4_has_custom_rules() helper to public headerPaolo Abeni2-10/+10
So that we can use it in the next patch. Additionally constify the helper argument. Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21ipv6: introduce and uses route look hints for list input.Paolo Abeni1-2/+24
When doing RX batch packet processing, we currently always repeat the route lookup for each ingress packet. When no custom rules are in place, and there aren't routes depending on source addresses, we know that packets with the same destination address will use the same dst. This change tries to avoid per packet route lookup caching the destination address of the latest successful lookup, and reusing it for the next packet when the above conditions are in place. Ingress traffic for most servers should fit. The measured performance delta under UDP flood vs a recvmmsg receiver is as follow: vanilla patched delta Kpps Kpps % 1431 1674 +17 In the worst-case scenario - each packet has a different destination address - the performance delta is within noise range. v3 -> v4: - support hints for SUBFLOW build, too (David A.) - several style fixes (Eric) v2 -> v3: - add fib6_has_custom_rules() helpers (David A.) - add ip6_extract_route_hint() helper (Edward C.) - use hint directly in ip6_list_rcv_finish() (Willem) v1 -> v2: - fix build issue with !CONFIG_IPV6_MULTIPLE_TABLES - fix potential race when fib6_has_custom_rules is set while processing a packet batch Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21ipv6: keep track of routes using srcPaolo Abeni4-0/+40
Use a per namespace counter, increment it on successful creation of any route using the source address, decrement it on deletion of such routes. This allows us to check easily if the routing decision in the current namespace depends on the packet source. Will be used by the next patch. Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21ipv6: add fib6_has_custom_rules() helperPaolo Abeni1-0/+9
It wraps the namespace field with the same name, to easily access it regardless of build options. Suggested-by: David Ahern <dsahern@gmail.com> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21Merge branch 'DSA-Felix-PTP'David S. Miller7-82/+222
Yangbo Lu says: ==================== Support PTP clock and hardware timestamping for DSA Felix driver This patch-set is to support PTP clock and hardware timestamping for DSA Felix driver. Some functions in ocelot.c/ocelot_board.c driver were reworked/exported, so that DSA Felix driver was able to reuse them as much as possible. On TX path, timestamping works on packet which requires timestamp. The injection header will be configured accordingly, and skb clone requires timestamp will be added into a list. The TX timestamp is final handled in threaded interrupt handler when PTP timestamp FIFO is ready. On RX path, timestamping is always working. The RX timestamp could be got from extraction header. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21net: dsa: ocelot: add hardware timestamping support for FelixYangbo Lu2-1/+102
This patch is to reuse ocelot functions as possible to enable PTP clock and to support hardware timestamping on Felix. On TX path, timestamping works on packet which requires timestamp. The injection header will be configured accordingly, and skb clone requires timestamp will be added into a list. The TX timestamp is final handled in threaded interrupt handler when PTP timestamp FIFO is ready. On RX path, timestamping is always working. The RX timestamp could be got from extraction header. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21net: dsa: ocelot: define PTP registers for felix_vsc9959Yangbo Lu1-0/+16
This patch is to define PTP registers for felix_vsc9959. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21net: mscc: ocelot: convert to use ocelot_port_add_txtstamp_skb()Yangbo Lu2-16/+29
Convert to use ocelot_port_add_txtstamp_skb() for adding skbs which require TX timestamp into list. Export it so that DSA Felix driver could reuse it too. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21net: mscc: ocelot: convert to use ocelot_get_txtstamp()Yangbo Lu4-61/+69
The method getting TX timestamp by reading timestamp FIFO and matching skbs list is common for DSA Felix driver too. So move code out of ocelot_board.c, convert to use ocelot_get_txtstamp() function and export it. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21net: mscc: ocelot: export ocelot_hwstamp_get/set functionsYangbo Lu2-4/+6
Export ocelot_hwstamp_get/set functions so that DSA driver is able to reuse them. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21bpf: skmsg, fix potential psock NULL pointer dereferenceJohn Fastabend1-5/+8
Report from Dan Carpenter, net/core/skmsg.c:792 sk_psock_write_space() error: we previously assumed 'psock' could be null (see line 790) net/core/skmsg.c 789 psock = sk_psock(sk); 790 if (likely(psock && sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED))) Check for NULL 791 schedule_work(&psock->work); 792 write_space = psock->saved_write_space; ^^^^^^^^^^^^^^^^^^^^^^^^ 793 rcu_read_unlock(); 794 write_space(sk); Ensure psock dereference on line 792 only occurs if psock is not null. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21audit: Move audit_log_task declaration under CONFIG_AUDITSYSCALLJiri Olsa1-3/+5
The 0-DAY found that audit_log_task is not declared under CONFIG_AUDITSYSCALL which causes compilation error when it is not defined: kernel/bpf/syscall.o: In function `bpf_audit_prog.isra.30': >> syscall.c:(.text+0x860): undefined reference to `audit_log_task' Adding the audit_log_task declaration and stub within CONFIG_AUDITSYSCALL ifdef. Fixes: 91e6015b082b ("bpf: Emit audit messages upon successful prog load and unload") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21net: Fix Kconfig indentation, continuedKrzysztof Kozlowski5-148/+148
Adjust indentation from spaces to tab (+optional two spaces) as in coding style. This fixes various indentation mixups (seven spaces, tab+one space, etc). Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21drivers: net: Fix Kconfig indentation, continuedKrzysztof Kozlowski10-149/+149
Adjust indentation from spaces to tab (+optional two spaces) as in coding style. This fixes various indentation mixups (seven spaces, tab+one space, etc). Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21lwtunnel: check erspan options before allocating tun_infoXin Long1-8/+16
As Jakub suggested on another patch, it's better to do the check on erspan options before allocating memory. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21lwtunnel: be STRICT to validate the new LWTUNNEL_IP(6)_OPTSXin Long1-0/+3
LWTUNNEL_IP(6)_OPTS are the new items in ip(6)_tun_policy, which are parsed by nla_parse_nested_deprecated(). We should check it strictly by setting .strict_start_type = LWTUNNEL_IP(6)_OPTS. This patch also adds missing LWTUNNEL_IP6_OPTS in ip6_tun_policy. Fixes: 4ece47787077 ("lwtunnel: add options setting and dumping for geneve") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21net: remove the unnecessary strict_start_type in some policiesXin Long3-3/+0
ct_policy and mpls_policy are parsed with nla_parse_nested(), which does NL_VALIDATE_STRICT validation, strict_start_type is not needed to set as it is actually trying to make some attributes parsed with NL_VALIDATE_STRICT. This patch is to remove it, and do the same on rtm_nh_policy which is parsed by nlmsg_parse(). Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21Merge branch 'net-sched-support-vxlan-and-erspan-options'David S. Miller4-1/+514
Xin Long says: ==================== net: sched: support vxlan and erspan options This patchset is to add vxlan and erspan options support in cls_flower and act_tunnel_key. The form is pretty much like geneve_opts in: https://patchwork.ozlabs.org/patch/935272/ https://patchwork.ozlabs.org/patch/954564/ but only one option is allowed for vxlan and erspan. v1->v2: - see each patch changelog. ==================== Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21net: sched: allow flower to match erspan optionsXin Long2-0/+161
This patch is to allow matching options in erspan. The options can be described in the form: VER:INDEX:DIR:HWID/VER:INDEX_MASK:DIR_MASK:HWID_MASK. When ver is set to 1, index will be applied while dir and hwid will be ignored, and when ver is set to 2, dir and hwid will be used while index will be ignored. Different from geneve, only one option can be set. And also, geneve options, vxlan options or erspan options can't be set at the same time. # ip link add name erspan1 type erspan external # tc qdisc add dev erspan1 ingress # tc filter add dev erspan1 protocol ip parent ffff: \ flower \ enc_src_ip 10.0.99.192 \ enc_dst_ip 10.0.99.193 \ enc_key_id 11 \ erspan_opts 1:12:0:0/1:ffff:0:0 \ ip_proto udp \ action mirred egress redirect dev eth0 v1->v2: - improve some err msgs of extack. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21net: sched: allow flower to match vxlan optionsXin Long2-0/+122
This patch is to allow matching gbp option in vxlan. The options can be described in the form GBP/GBP_MASK, where GBP is represented as a 32bit hexadecimal value. Different from geneve, only one option can be set. And also, geneve options and vxlan options can't be set at the same time. # ip link add name vxlan0 type vxlan dstport 0 external # tc qdisc add dev vxlan0 ingress # tc filter add dev vxlan0 protocol ip parent ffff: \ flower \ enc_src_ip 10.0.99.192 \ enc_dst_ip 10.0.99.193 \ enc_key_id 11 \ vxlan_opts 01020304/ffffffff \ ip_proto udp \ action mirred egress redirect dev eth0 v1->v2: - add .strict_start_type for enc_opts_policy as Jakub noticed. - use Duplicate instead of Wrong in err msg for extack as Jakub suggested. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21net: sched: add erspan option support to act_tunnel_keyXin Long2-0/+134
This patch is to allow setting erspan options using the act_tunnel_key action. Different from geneve options, only one option can be set. And also, geneve options, vxlan options or erspan options can't be set at the same time. Options are expressed as ver:index:dir:hwid, when ver is set to 1, index will be applied while dir and hwid will be ignored, and when ver is set to 2, dir and hwid will be used while index will be ignored. # ip link add name erspan1 type erspan external # tc qdisc add dev eth0 ingress # tc filter add dev eth0 protocol ip parent ffff: \ flower indev eth0 \ ip_proto udp \ action tunnel_key \ set src_ip 10.0.99.192 \ dst_ip 10.0.99.193 \ dst_port 6081 \ id 11 \ erspan_opts 1:2:0:0 \ action mirred egress redirect dev erspan1 v1->v2: - do the validation when dst is not yet allocated as Jakub suggested. - use Duplicate instead of Wrong in err msg for extack. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21net: sched: add vxlan option support to act_tunnel_keyXin Long2-1/+97
This patch is to allow setting vxlan options using the act_tunnel_key action. Different from geneve options, only one option can be set. And also, geneve options and vxlan options can't be set at the same time. gbp is the only param for vxlan options: # ip link add name vxlan0 type vxlan dstport 0 external # tc qdisc add dev eth0 ingress # tc filter add dev eth0 protocol ip parent ffff: \ flower indev eth0 \ ip_proto udp \ action tunnel_key \ set src_ip 10.0.99.192 \ dst_ip 10.0.99.193 \ dst_port 6081 \ id 11 \ vxlan_opts 01020304 \ action mirred egress redirect dev vxlan0 v1->v2: - add .strict_start_type for enc_opts_policy as Jakub noticed. - use Duplicate instead of Wrong in err msg for extack as Jakub suggested. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21octeontx2-af: Fix uninitialized variable in debugfsDan Carpenter1-1/+2
If rvu_get_blkaddr() fails, then this rvu_cgx_nix_cuml_stats() returns zero and we write some uninitialized data into the debugfs output. On the error paths, the use of the uninitialized "*stat" is harmless, but it will lead to a Smatch warning (static analysis) and a UBSan warning (runtime analysis) so we should prevent that as well. Fixes: f967488d095e ("octeontx2-af: Add per CGX port level NIX Rx/Tx counters") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21vsock: avoid to assign transport if its initialization failsStefano Garzarella1-1/+8
If transport->init() fails, we can't assign the transport to the socket, because it's not initialized correctly, and any future calls to the transport callbacks would have an unexpected behavior. Fixes: c0cfa2d8a788 ("vsock: add multi-transports support") Reported-and-tested-by: syzbot+e2e5c07bf353b2f79daa@syzkaller.appspotmail.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20net: sfp: soft status and control supportRussell King2-20/+94
Add support for the soft status and control register, which allows TX_FAULT and RX_LOS to be monitored and TX_DISABLE to be set. We make use of this when the board does not support GPIOs for these signals. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20Merge branch 'sfp-quirks'David S. Miller1-0/+79
Russell King says: ==================== Add rudimentary SFP module quirk support The SFP module EEPROM describes the capabilities of the module, but doesn't describe the host interface. We have a certain amount of guess-work to work out how to configure the host - which works most of the time. However, there are some (such as GPON) modules which are able to support different host interfaces, such as 1000BASE-X and 2500BASE-X. The module will switch between each mode until it achieves link with the host. There is no defined way to describe this in the SFP EEPROM, so we can only recognise the module and handle it appropriately. This series adds the necessary recognition of the modules using a quirk system, and tweaks the support mask to allow them to link with the host at 2500BASE-X, thereby allowing the user to achieve full line rate. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20net: sfp: add some quirks for GPON modulesRussell King1-0/+25
Marc Micalizzi reports that Huawei MA5671A and Alcatel/Lucent G-010S-P modules are capable of 2500base-X, but incorrectly report their capabilities in the EEPROM. It seems rather common that GPON modules mis-report. Let's fix these modules by adding some quirks. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20net: sfp: add support for module quirksRussell King1-0/+54
Add support for applying module quirks to the list of supported ethtool link modes. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20tcp: warn if offset reach the maxlen limit when using snprintfHangbin Liu3-0/+13
snprintf returns the number of chars that would be written, not number of chars that were actually written. As such, 'offs' may get larger than 'tbl.maxlen', causing the 'tbl.maxlen - offs' being < 0, and since the parameter is size_t, it would overflow. Since using scnprintf may hide the limit error, while the buffer is still enough now, let's just add a WARN_ON_ONCE in case it reach the limit in future. v2: Use WARN_ON_ONCE as Jiri and Eric suggested. Suggested-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20ip_gre: Make none-tun-dst gre tunnel store tunnel info as metadat_dst in recvwenxu1-1/+5
Currently collect_md gre tunnel will store the tunnel info(metadata_dst) to skb_dst. And now the non-tun-dst gre tunnel already can add tunnel header through lwtunnel. When received a arp_request on the non-tun-dst gre tunnel. The packet of arp response will send through the non-tun-dst tunnel without tunnel info which will lead the arp response packet to be dropped. If the non-tun-dst gre tunnel also store the tunnel info as metadata_dst, The arp response packet will set the releted tunnel info in the iptunnel_metadata_reply. The following is the test script: ip netns add cl ip l add dev vethc type veth peer name eth0 netns cl ifconfig vethc 172.168.0.7/24 up ip l add dev tun1000 type gretap key 1000 ip link add user1000 type vrf table 1 ip l set user1000 up ip l set dev tun1000 master user1000 ifconfig tun1000 10.0.1.1/24 up ip netns exec cl ifconfig eth0 172.168.0.17/24 up ip netns exec cl ip l add dev tun type gretap local 172.168.0.17 remote 172.168.0.7 key 1000 ip netns exec cl ifconfig tun 10.0.1.7/24 up ip r r 10.0.1.7 encap ip id 1000 dst 172.168.0.17 key dev tun1000 table 1 With this patch ip netns exec cl ping 10.0.1.1 can success Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller120-1078/+4952
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-11-20 The following pull-request contains BPF updates for your *net-next* tree. We've added 81 non-merge commits during the last 17 day(s) which contain a total of 120 files changed, 4958 insertions(+), 1081 deletions(-). There are 3 trivial conflicts, resolve it by always taking the chunk from 196e8ca74886c433: <<<<<<< HEAD ======= void *bpf_map_area_mmapable_alloc(u64 size, int numa_node); >>>>>>> 196e8ca74886c433dcfc64a809707074b936aaf5 <<<<<<< HEAD void *bpf_map_area_alloc(u64 size, int numa_node) ======= static void *__bpf_map_area_alloc(u64 size, int numa_node, bool mmapable) >>>>>>> 196e8ca74886c433dcfc64a809707074b936aaf5 <<<<<<< HEAD if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) { ======= /* kmalloc()'ed memory can't be mmap()'ed */ if (!mmapable && size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) { >>>>>>> 196e8ca74886c433dcfc64a809707074b936aaf5 The main changes are: 1) Addition of BPF trampoline which works as a bridge between kernel functions, BPF programs and other BPF programs along with two new use cases: i) fentry/fexit BPF programs for tracing with practically zero overhead to call into BPF (as opposed to k[ret]probes) and ii) attachment of the former to networking related programs to see input/output of networking programs (covering xdpdump use case), from Alexei Starovoitov. 2) BPF array map mmap support and use in libbpf for global data maps; also a big batch of libbpf improvements, among others, support for reading bitfields in a relocatable manner (via libbpf's CO-RE helper API), from Andrii Nakryiko. 3) Extend s390x JIT with usage of relative long jumps and loads in order to lift the current 64/512k size limits on JITed BPF programs there, from Ilya Leoshkevich. 4) Add BPF audit support and emit messages upon successful prog load and unload in order to have a timeline of events, from Daniel Borkmann and Jiri Olsa. 5) Extension to libbpf and xdpsock sample programs to demo the shared umem mode (XDP_SHARED_UMEM) as well as RX-only and TX-only sockets, from Magnus Karlsson. 6) Several follow-up bug fixes for libbpf's auto-pinning code and a new API call named bpf_get_link_xdp_info() for retrieving the full set of prog IDs attached to XDP, from Toke Høiland-Jørgensen. 7) Add BTF support for array of int, array of struct and multidimensional arrays and enable it for skb->cb[] access in kfree_skb test, from Martin KaFai Lau. 8) Fix AF_XDP by using the correct number of channels from ethtool, from Luigi Rizzo. 9) Two fixes for BPF selftest to get rid of a hang in test_tc_tunnel and to avoid xdping to be run as standalone, from Jiri Benc. 10) Various BPF selftest fixes when run with latest LLVM trunk, from Yonghong Song. 11) Fix a memory leak in BPF fentry test run data, from Colin Ian King. 12) Various smaller misc cleanups and improvements mostly all over BPF selftests and samples, from Daniel T. Lee, Andre Guedes, Anders Roxell, Mao Wenan, Yue Haibing. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20bpf: Switch bpf_map_{area_alloc,area_mmapable_alloc}() to u64 sizeDaniel Borkmann2-7/+10
Given we recently extended the original bpf_map_area_alloc() helper in commit fc9702273e2e ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY"), we need to apply the same logic as in ff1c08e1f74b ("bpf: Change size to u64 for bpf_map_{area_alloc, charge_init}()"). To avoid conflicts, extend it for bpf-next. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-11-20bpf: Emit audit messages upon successful prog load and unloadDaniel Borkmann4-1/+36
Allow for audit messages to be emitted upon BPF program load and unload for having a timeline of events. The load itself is in syscall context, so additional info about the process initiating the BPF prog creation can be logged and later directly correlated to the unload event. The only info really needed from BPF side is the globally unique prog ID where then audit user space tooling can query / dump all info needed about the specific BPF program right upon load event and enrich the record, thus these changes needed here can be kept small and non-intrusive to the core. Raw example output: # auditctl -D # auditctl -a always,exit -F arch=x86_64 -S bpf # ausearch --start recent -m 1334 [...] ---- time->Wed Nov 20 12:45:51 2019 type=PROCTITLE msg=audit(1574271951.590:8974): proctitle="./test_verifier" type=SYSCALL msg=audit(1574271951.590:8974): arch=c000003e syscall=321 success=yes exit=14 a0=5 a1=7ffe2d923e80 a2=78 a3=0 items=0 ppid=742 pid=949 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=2 comm="test_verifier" exe="/root/bpf-next/tools/testing/selftests/bpf/test_verifier" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null) type=UNKNOWN[1334] msg=audit(1574271951.590:8974): auid=0 uid=0 gid=0 ses=2 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=949 comm="test_verifier" exe="/root/bpf-next/tools/testing/selftests/bpf/test_verifier" prog-id=3260 event=LOAD ---- time->Wed Nov 20 12:45:51 2019 type=UNKNOWN[1334] msg=audit(1574271951.590:8975): prog-id=3260 event=UNLOAD ---- [...] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191120213816.8186-1-jolsa@kernel.org
2019-11-20Merge branch 'r8169-smaller-improvements-to-firmware-handling'David S. Miller1-7/+12
Heiner Kallweit says: ==================== r8169: smaller improvements to firmware handling This series includes few smaller improvements to firmware handling. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20r8169: add check for PHY_MDIO_CHG to rtl_nic_fw_data_okHeiner Kallweit1-5/+10
Only values 0 and 1 are currently defined as parameters for PHY_MDIO_CHG. Instead of silently ignoring unknown values and misinterpreting the firmware code let's explicitly check. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20r8169: use macro FIELD_SIZEOF in definition of FW_OPCODE_SIZEHeiner Kallweit1-1/+1
Using macro FIELD_SIZEOF makes this define easier understandable. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20r8169: change mdelay to msleep in rtl_fw_write_firmwareHeiner Kallweit1-1/+1
We're not in atomic context here, therefore switch to msleep. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20net: ipconfig: Wait for deferred device probesThomas Bogendoerfer1-0/+3
If network device drives are using deferred probing, it was possible that waiting for devices to show up in ipconfig was already over, when the device eventually showed up. By calling wait_for_device_probe() we now make sure deferred probing is done before checking for available devices. Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20vsock/vmci: make vmci_vsock_cb_host_called staticMao Wenan1-1/+1
When using make C=2 drivers/misc/vmw_vmci/vmci_driver.o to compile, below warning can be seen: drivers/misc/vmw_vmci/vmci_driver.c:33:6: warning: symbol 'vmci_vsock_cb_host_called' was not declared. Should it be static? This patch make symbol vmci_vsock_cb_host_called static. Fixes: b1bba80a4376 ("vsock/vmci: register vmci_transport only when VMCI guest/host are active") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Mao Wenan <maowenan@huawei.com> Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20Merge branch 'page_pool-DMA-sync'David S. Miller3-17/+67
Lorenzo Bianconi says: ==================== add DMA-sync-for-device capability to page_pool API Introduce the possibility to sync DMA memory for device in the page_pool API. This feature allows to sync proper DMA size and not always full buffer (dma_sync_single_for_device can be very costly). Please note DMA-sync-for-CPU is still device driver responsibility. Relying on page_pool DMA sync mvneta driver improves XDP_DROP pps of about 170Kpps: - XDP_DROP DMA sync managed by mvneta driver: ~420Kpps - XDP_DROP DMA sync managed by page_pool API: ~585Kpps Do not change naming convention for the moment since the changes will hit other drivers as well. I will address it in another series. Changes since v4: - do not allow the driver to set max_len to 0 - convert PP_FLAG_DMA_MAP/PP_FLAG_DMA_SYNC_DEV to BIT() macro Changes since v3: - move dma_sync_for_device before putting the page in ptr_ring in __page_pool_recycle_into_ring since ptr_ring can be consumed concurrently. Simplify the code moving dma_sync_for_device before running __page_pool_recycle_direct/__page_pool_recycle_into_ring Changes since v2: - rely on PP_FLAG_DMA_SYNC_DEV flag instead of dma_sync Changes since v1: - rename sync in dma_sync - set dma_sync_size to 0xFFFFFFFF in page_pool_recycle_direct and page_pool_put_page routines - Improve documentation ==================== Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20net: mvneta: get rid of huge dma sync in mvneta_rx_refillLorenzo Bianconi1-11/+15
Get rid of costly dma_sync_single_for_device in mvneta_rx_refill since now the driver can let page_pool API to manage needed DMA sync with a proper size. - XDP_DROP DMA sync managed by mvneta driver: ~420Kpps - XDP_DROP DMA sync managed by page_pool API: ~585Kpps Tested-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20net: page_pool: add the possibility to sync DMA memory for deviceLorenzo Bianconi2-8/+52
Introduce the following parameters in order to add the possibility to sync DMA memory for device before putting allocated pages in the page_pool caches: - PP_FLAG_DMA_SYNC_DEV: if set in page_pool_params flags, all pages that the driver gets from page_pool will be DMA-synced-for-device according to the length provided by the device driver. Please note DMA-sync-for-CPU is still device driver responsibility - offset: DMA address offset where the DMA engine starts copying rx data - max_len: maximum DMA memory size page_pool is allowed to flush. This is currently used in __page_pool_alloc_pages_slow routine when pages are allocated from page allocator These parameters are supposed to be set by device drivers. This optimization reduces the length of the DMA-sync-for-device. The optimization is valid because pages are initially DMA-synced-for-device as defined via max_len. At RX time, the driver will perform a DMA-sync-for-CPU on the memory for the packet length. What is important is the memory occupied by packet payload, because this is the area CPU is allowed to read and modify. As we don't track cache-lines written into by the CPU, simply use the packet payload length as dma_sync_size at page_pool recycle time. This also take into account any tail-extend. Tested-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20net: mvneta: rely on page_pool_recycle_direct in mvneta_run_xdpLorenzo Bianconi1-2/+4
Rely on page_pool_recycle_direct and not on xdp_return_buff in mvneta_run_xdp. This is a preliminary patch to limit the dma sync len to the one strictly necessary Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20net: sched: pie: enable timestamp based delay calculationGautam Ramakrishnan2-29/+113
RFC 8033 suggests an alternative approach to calculate the queue delay in PIE by using a timestamp on every enqueued packet. This patch adds an implementation of that approach and sets it as the default method to calculate queue delay. The previous method (based on Little's law) to calculate queue delay is set as optional. Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Acked-by: Dave Taht <dave.taht@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20isdn: Fix Kconfig indentationKrzysztof Kozlowski1-1/+1
Adjust indentation from spaces to tab (+optional two spaces) as in coding style with command like: $ sed -e 's/^ /\t/' -i */Kconfig Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20nfc: Fix Kconfig indentationKrzysztof Kozlowski1-1/+1
Adjust indentation from spaces to tab (+optional two spaces) as in coding style with command like: $ sed -e 's/^ /\t/' -i */Kconfig Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20Merge branch 'cxgb4-add-TC-MATCHALL-classifier-offload'David S. Miller14-88/+708
Rahul Lakkireddy says: ==================== cxgb4: add TC-MATCHALL classifier offload This series of patches add support to offload TC-MATCHALL classifier to hardware to classify all outgoing and incoming traffic on the underlying port. Only 1 egress and 1 ingress rule each can be offloaded on the underlying port. Patch 1 adds support for TC-MATCHALL classifier offload on the egress side. TC-POLICE is the only action that can be offloaded on the egress side and is used to rate limit all outgoing traffic to specified max rate. Patch 2 adds logic to reject the current rule offload if its priority conflicts with existing rules in the TCAM. Patch 3 adds support for TC-MATCHALL classifier offload on the ingress side. The same set of actions supported by existing TC-FLOWER classifier offload can be applied on all the incoming traffic. v5: - Fixed commit message and comment to include comparison for equal priority in patch 2. v4: - Removed check in patch 1 to reject police offload if prio is not 1. - Moved TC_SETUP_BLOCK code to separate function in patch 1. - Added logic to ensure the prio passed by TC doesn't conflict with other rules in TCAM in patch 2. - Higher index has lower priority than lower index in TCAM. So, rework cxgb4_get_free_ftid() to search free index from end of TCAM in descending order in patch 2. - Added check to ensure the matchall rule's prio doesn't conflict with other rules in TCAM in patch 3. - Added logic to fill default mask for VIID, if none has been provided, to prevent conflict with duplicate VIID rules in patch 3. - Used existing variables in private structure to fill VIID info, instead of extracting the info manually in patch 3. v3: - Added check in patch 1 to reject police offload if prio is not 1. - Assign block_shared variable only for TC_SETUP_BLOCK in patch 1. v2: - Added check to reject flow block sharing for policers in patch 1. - Removed logic to fetch free index from end of TCAM in patch 2. Must maintain the same ordering as in kernel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>