linux-dev - Linux kernel development work

Age	Commit message	Author	Files	Lines
2018-05-09	fm10k: reduce duplicate fm10k_stat macro code	Jacob Keller	1	-14/+15
	Share some of the code for setting up fm10k_stat macros by implementing an FM10K_STAT_FIELDS macro which we can use when setting up the type specific macros. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
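A minimal sketch of the shared-macro pattern, assuming a simplified stat descriptor. One helper fills the name, size and offset, and each type-specific macro only supplies its own struct type. All names here (`stat_desc`, `STAT_FIELDS`, `NETDEV_STAT`, `QUEUE_STAT`, the counter structs) are hypothetical stand-ins, not the fm10k definitions.

```c
#include <stddef.h>

/* Hypothetical stat descriptor, loosely modeled on a driver stats table. */
struct stat_desc {
	const char *name;
	size_t size;   /* width of the counter in bytes */
	size_t offset; /* byte offset of the counter inside its struct */
};

/* Shared helper: the only place that knows how to fill a descriptor. */
#define STAT_FIELDS(_type, _name, _stat) {		\
	.name = _name,					\
	.size = sizeof(((_type *)0)->_stat),		\
	.offset = offsetof(_type, _stat),		\
}

/* Hypothetical structures whose counters are exported. */
struct netdev_counters { unsigned long rx_packets, tx_packets; };
struct queue_counters { unsigned long long packets; unsigned int restarts; };

/* Type-specific wrappers shrink to a single line each. */
#define NETDEV_STAT(_name, _stat) STAT_FIELDS(struct netdev_counters, _name, _stat)
#define QUEUE_STAT(_name, _stat)  STAT_FIELDS(struct queue_counters, _name, _stat)

static const struct stat_desc netdev_stats[] = {
	NETDEV_STAT("rx_packets", rx_packets),
	NETDEV_STAT("tx_packets", tx_packets),
};

static const struct stat_desc queue_stats[] = {
	QUEUE_STAT("packets", packets),
	QUEUE_STAT("restarts", restarts),
};
```

Adding a new stat type then means writing one more one-line wrapper rather than repeating the field setup.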
2018-05-09	fm10k: setup VLANs for l2 accelerated macvlan interfaces	Jacob Keller	1	-2/+48
	We have support for accelerating macvlan devices via the .ndo_dfwd_add_station() netdev op. These accelerated macvlan MAC addresses are stored in the l2_accel structure, separate from the unicast or multicast address lists. If a VLAN is added on top of the macvlan device by the stack, traffic will not properly flow to the macvlan. This occurs because we fail to set up the VLANs for l2_accel MAC addresses. In the non-offloaded case the MAC address is added to the unicast address list, and thus the normal setup for enabling VLANs works as expected. We also need to add VLANs marked from .ndo_vlan_rx_add_vid() into the l2_accel MAC addresses. Otherwise, VLAN traffic will not properly be received by the VLAN devices attached to the offloaded macvlan devices. Fix this by adding the necessary logic to set up VLANs not only for the unicast and multicast addresses, but also for the l2_accel list. We need similar logic in dfwd_add_station, dfwd_del_station, fm10k_update_vid, and fm10k_restore_rx_state. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-05-08	udp: Do not copy destructor if one is not present	Alexander Duyck	1	-8/+14
	This patch makes it so that if a destructor is not present we avoid trying to update the skb socket or any reference counting that would be associated with the NULL socket and/or descriptor. By doing this we can support traffic coming from another namespace without any issues. Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-08	udp: Add support for software checksum and GSO_PARTIAL with GSO offload	Alexander Duyck	2	-20/+20
	This patch adds support for a software-provided checksum and GSO_PARTIAL segmentation support. With this we can offload UDP segmentation on devices that only have partial support for tunnels. Since we no longer need the hardware checksum, we can drop the checks in the segmentation code that were verifying if it was present. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-08	udp: Partially unroll handling of first segment and last segment	Alexander Duyck	1	-14/+19
	This patch allows us to take care of unrolling the first segment and the last segment of the loop for processing the segmented skb. Part of the motivation for this is that it makes it easier to process the fact that the first frame and all of the frames in between should be mostly identical in terms of header data, and the last frame has differences in the length and partial checksum. In addition I am dropping the header length calculation since we don't really need it for anything but the last frame and it can be easily obtained by just pulling the data_len and offset of tail from the transport header. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-08	udp: Do not pass checksum as a parameter to GSO segmentation	Alexander Duyck	3	-22/+20
	This patch is meant to allow us to avoid having to recompute the checksum from scratch and have it passed as a parameter. Instead of taking that approach we can take advantage of the fact that the length that was used to compute the existing checksum is included in the UDP header. Finally to avoid the need to invert the result we can just call csum16_add and csum16_sub directly. By doing this we can avoid a number of instructions in the loop that is handling segmentation. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
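A userspace sketch of the incremental update idea (in the spirit of RFC 1624), modeled on the kernel's csum16_add()/csum16_sub() helpers with byte-order annotations dropped. The length already folded into an existing ones'-complement sum is subtracted and the new length added, instead of recomputing from scratch. The word values and the `csum_words`/`csum_replace_len` helpers are hypothetical and for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement add of a 16-bit value with end-around carry. */
static uint16_t csum16_add(uint16_t csum, uint16_t addend)
{
	uint16_t res = (uint16_t)(csum + addend);

	return (uint16_t)(res + (res < addend)); /* fold the carry back in */
}

/* Subtraction is addition of the ones' complement. */
static uint16_t csum16_sub(uint16_t csum, uint16_t addend)
{
	return csum16_add(csum, (uint16_t)~addend);
}

/* Full (uncomplemented) ones'-complement sum over n 16-bit words. */
static uint16_t csum_words(const uint16_t *w, size_t n)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += w[i];
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Swap old_len for new_len inside an existing partial sum. */
static uint16_t csum_replace_len(uint16_t sum, uint16_t old_len, uint16_t new_len)
{
	return csum16_add(csum16_sub(sum, old_len), new_len);
}
```

Because the sum stays in uncomplemented form, no final inversion is needed. This is the saving the commit describes when it calls csum16_add and csum16_sub directly.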
2018-05-08	udp: Do not pass MSS as parameter to GSO segmentation	Alexander Duyck	3	-4/+6
	There is no point in passing MSS as a parameter for the GSO segmentation call as it is already available via the shared info for the skb itself. Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-08	udp: Record gso_segs when supporting UDP segmentation offload	Alexander Duyck	1	-0/+2
	We need to record the number of segments that will be generated when this frame is segmented. The expectation is that if gso_size is set then gso_segs is set as well. Without this some drivers such as ixgbe get confused if they attempt to offload this as they record 0 segments for the entire packet instead of the correct value. Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
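The segment count is the payload length divided by gso_size, rounded up, which is what the kernel's DIV_ROUND_UP computes. The sketch below assumes `payload_len` excludes the UDP header and `gso_size` is nonzero. The function name is hypothetical. In the kernel the result would live in skb_shinfo(skb)->gso_segs next to gso_size.

```c
/* Number of segments a GSO packet will be split into.
 * payload_len excludes the UDP header; gso_size must be nonzero. */
static unsigned int udp_gso_segs(unsigned int payload_len, unsigned int gso_size)
{
	/* Round up: a short final segment still counts as one segment. */
	return (payload_len + gso_size - 1) / gso_size;
}
```

For example, 3000 bytes of payload with gso_size 1400 yields three segments (1400 + 1400 + 200), not zero.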
2018-05-08	dt-bindings: dsa: Remove unnecessary #address/#size-cells	Fabio Estevam	1	-6/+0
	If the example binding is used on a real dts file, the following DTC warning is seen with W=1: arch/arm/boot/dts/imx6q-b450v3.dtb: Warning (avoid_unnecessary_addr_size): /mdio-gpio/switch@0: unnecessary #address-cells/#size-cells without "ranges" or child "reg" property Remove unnecessary #address-cells/#size-cells to improve the binding document examples. Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-08	net: phy: sfp: handle cases where neither BR,min nor BR,max is given	Antoine Tenart	1	-0/+7
	When computing the bitrate using values read from an SFP module EEPROM, we use the nominal BR plus BR,min and BR,max to determine the boundaries. But in some cases BR,min and BR,max aren't provided, which led the SFP code to end up having the nominal value for both the minimum and maximum bitrate values. When using a passive cable, the nominal value should be used as the maximum one, and there is no minimum one so we should use 0. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
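A sketch of the bound computation under stated assumptions. The nominal rate is taken from SFF-8472 byte 12 in units of 100 MBd, and BR,max / BR,min from bytes 66 / 67 as percentage margins. The 0xFF extended encoding is not handled. Names are hypothetical, and the kernel's sfp code may structure this differently.

```c
#include <stdbool.h>
#include <stdint.h>

struct br_range {
	unsigned int min_mbd; /* lower bound in MBd */
	unsigned int max_mbd; /* upper bound in MBd */
};

static struct br_range sfp_br_range(uint8_t br_nominal, uint8_t br_max_pct,
				    uint8_t br_min_pct, bool passive_cable)
{
	struct br_range r = { 0, 0 };
	unsigned int nom = br_nominal * 100u;

	if (br_nominal == 0 || br_nominal == 0xff)
		return r; /* unspecified or extended encoding: not handled */

	/* p percent of (br_nominal * 100 MBd) is br_nominal * p MBd. */
	r.max_mbd = nom + br_nominal * (unsigned int)br_max_pct;
	r.min_mbd = br_min_pct >= 100 ? 0 :
		    nom - br_nominal * (unsigned int)br_min_pct;

	/* Neither margin given: both bounds collapse to nominal. For a
	 * passive cable, keep nominal as the maximum and use 0 as minimum. */
	if (r.min_mbd == r.max_mbd && passive_cable)
		r.min_mbd = 0;

	return r;
}
```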
2018-05-08	bnxt_en: Always forward VF MAC address to the PF.	Michael Chan	2	-2/+3
	The current code already forwards the VF MAC address to the PF, except in one case. If the VF driver gets a valid MAC address from the firmware during probe time, it will not forward the MAC address to the PF, incorrectly assuming that the PF already knows the MAC address. This causes "ip link show" to show zero VF MAC addresses for this case. This assumption is not correct. Newer firmware remembers the VF MAC address last used by the VF and provides it to the VF driver during probe. So we need to always forward the VF MAC address to the PF. The forwarded MAC address may now be the PF assigned MAC address and so we need to make sure we approve it for this case. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-08	bnxt_en: Read phy eeprom A2h address only when optical diagnostics is supported.	Vasundhara Volam	2	-14/+9
	For SFP+ modules, 0xA2 page is available only when Diagnostic Monitoring Type [Address A0h, Byte 92] is implemented. Extend bnxt_get_module_info() to read optical diagnostics support at offset 92 (0x5c) and set eeprom_len length to ETH_MODULE_SFF_8436_LEN (to exclude A2 page), if diagnostics is not supported. Also in bnxt_get_module_info(), module id is read from offset 0x5e which is not correct. It was working by accident, as offset was not effective without setting enables flag in the firmware request. SFP module id is present at location 0. Fix this by removing the offset and read it from location 0. Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
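A sketch of the length decision on a raw A0h page dump. The two lengths match the ethtool uapi values ETH_MODULE_SFF_8472_LEN (512) and ETH_MODULE_SFF_8436_LEN (256) but are redefined locally. Testing bit 6 of byte 92 ("digital diagnostic monitoring implemented" in SFF-8472) is an assumption about how support is detected, not necessarily the bnxt check.

```c
#include <stdint.h>

#define SFF_8472_LEN		512	/* A0h page + A2h page */
#define SFF_8436_LEN		256	/* A0h page only */

#define SFF_ID_OFFSET		0x00	/* identifier byte lives at offset 0 */
#define SFF_ID_SFP		0x03	/* SFP/SFP+ */
#define SFF_DIAG_TYPE_OFFSET	0x5c	/* byte 92: Diagnostic Monitoring Type */
#define SFF_DIAG_IMPLEMENTED	0x40	/* assumed: bit 6 = DDM implemented */

/* Returns the EEPROM length to report, or 0 if the module is not an SFP. */
static unsigned int sfp_eeprom_len(const uint8_t a0[256])
{
	if (a0[SFF_ID_OFFSET] != SFF_ID_SFP)
		return 0;
	if (!(a0[SFF_DIAG_TYPE_OFFSET] & SFF_DIAG_IMPLEMENTED))
		return SFF_8436_LEN; /* do not try to read the absent A2h page */
	return SFF_8472_LEN;
}
```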
2018-05-08	bnxt_en: Check unsupported speeds in bnxt_update_link() on PF only.	Michael Chan	1	-0/+3
	Only non-NPAR PFs need to actively check and manage unsupported link speeds. NPAR functions and VFs do not control the link speed and should skip the unsupported speed detection logic, to avoid warning messages from firmware rejecting the unsupported firmware calls. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-08	bnxt_en: Fix firmware message delay loop regression.	Michael Chan	2	-4/+15
	A recent change to reduce delay granularity waiting for firmware response has caused a regression. With a tighter delay loop, the driver may see the beginning part of the response faster. The original 5 usec delay to wait for the rest of the message is not long enough and some messages are detected as invalid. Increase the maximum wait time from 5 usec to 20 usec. Also, fix the debug message that shows the total delay time for the response when the message times out. With the new logic, the delay time is not fixed per iteration of the loop, so we define a macro to show the total delay time. Fixes: 9751e8e71487 ("bnxt_en: reduce timeout on initial HWRM calls") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
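When the per-iteration delay changes partway through a polling loop, `iterations * delay` no longer gives the time waited, so a macro has to account for both phases. The schedule below is a sketch with hypothetical constants (`SHORT_POLLS`, `SHORT_DELAY_US`, `LONG_DELAY_US`). It does not reproduce the actual bnxt values or macro name.

```c
/* Hypothetical schedule: the first SHORT_POLLS iterations sleep
 * SHORT_DELAY_US each, and every later iteration sleeps LONG_DELAY_US. */
#define SHORT_POLLS	5
#define SHORT_DELAY_US	3
#define LONG_DELAY_US	25

/* Total microseconds waited after n polls under that schedule. */
#define TOTAL_DELAY_US(n)						\
	((n) <= SHORT_POLLS ? (n) * SHORT_DELAY_US :			\
	 SHORT_POLLS * SHORT_DELAY_US + ((n) - SHORT_POLLS) * LONG_DELAY_US)
```

With these values, a timeout after 10 polls reports 5 * 3 + 5 * 25 = 140 usec.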
2018-05-08	net-next/hinic: add pci device ids for 25ge and 100ge card	Zhao Chen	1	-2/+6
	This patch adds PCI device IDs to support 25GE and 100GE card: 1. Add device id 0x0201 for HINIC 100GE dual port card. 2. Add device id 0x0200 for HINIC 25GE dual port card. 3. Macro of device id 0x1822 is modified for HINIC 25GE quad port card. Signed-off-by: Zhao Chen <zhaochen6@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-08	flow_dissector: do not rely on implicit casts	Paolo Abeni	2	-3/+3
	This change fixes a couple of type mismatches reported by the sparse tool, explicitly using the requested type for the offending arguments. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-08	net: core: rework basic flow dissection helper	Paolo Abeni	4	-20/+28
	When the core networking needs to detect the transport offset in a given packet and parse it explicitly, a full-blown flow_keys struct is used for storage. This patch introduces a smaller keys store, reworks the basic flow dissect helper to use it, and applies this new helper where possible, namely in skb_probe_transport_header(). The flow dissector data structures used are renamed to match their new role more closely. The above gives a ~50% performance improvement in micro benchmarking around skb_probe_transport_header() and ~30% around eth_get_headlen(), mostly due to the smaller memset. A small but measurable improvement is also seen in macro benchmarking. v1 -> v2: use the new helper in eth_get_headlen() and skb_get_poff(), as per DaveM's suggestion Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-07	net: ipv6/gre: Add GRO support	Eran Ben Elisha	1	-10/+27
	Add GRO capability for IPv6 GRE tunnel and ip6erspan tap, via gro_cells infrastructure. Performance testing: 55% higher bandwidth. Measuring bandwidth of 1 thread IPv4 TCP traffic over IPv6 GRE tunnel while GRO on the physical interface is disabled. CPU: Intel Xeon E312xx (Sandy Bridge) NIC: Mellanox Technologies MT27700 Family [ConnectX-4] Before (GRO not working in tunnel) : 2.47 Gbits/sec After (GRO working in tunnel) : 3.85 Gbits/sec Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> CC: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-07	net: ipv6: Fix typo in ipv6_find_hdr() documentation	Tariq Toukan	1	-1/+1
	Fix 'an' into 'and', and use a comma instead of a period. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-07	qed: Add support for Unified Fabric Port.	Sudarsana Reddy Kalluru	14	-27/+283
	This patch adds driver changes for supporting the Unified Fabric Port (UFP). This is a new partitioning mode wherein MFW provides the set of parameters to be used by the device such as traffic class, outer-vlan tag value, priority type etc. The driver receives this info via notifications from the MFW and configures the hardware accordingly. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-07	qed: Add support for multi function mode with 802.1ad tagging.	Sudarsana Reddy Kalluru	2	-20/+49
	The patch adds support for new Multi function mode wherein the traffic classification is done based on the 802.1ad tagging and the outer vlan tag provided by the management firmware. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-07	qed: Remove unused data member 'is_mf_default'.	Sudarsana Reddy Kalluru	2	-3/+0
	The data member 'is_mf_default' is not used by the qed/qede drivers, so remove it. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-07	qed*: Refactor mf_mode to consist of bits.	Sudarsana Reddy Kalluru	8	-46/+71
	`mf_mode' field indicates the multi-partitioning mode the device is configured to. This method doesn't scale very well, adding a new MF mode requires going over all the existing conditions, and deciding whether those are needed for the new mode or not. The patch defines a set of bit-fields for modes which are derived according to the mode info shared by the MFW and all the configuration would be made according to those. To add a new mode, there would be a single place where we'll need to go and choose which bits apply and which don't. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
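A sketch of the bit-field approach, with hypothetical bit names, mode names and mode-to-bit assignments that differ from the real qed definitions. One function decides which behaviours each mode implies, and the rest of the code tests behaviours rather than enumerating modes.

```c
#include <stdbool.h>

/* Hypothetical per-behaviour bits. */
enum mf_bit {
	MF_LLH_MAC_CLSS   = 1u << 0, /* classify traffic by MAC address */
	MF_OVLAN_CLSS     = 1u << 1, /* classify traffic by outer VLAN */
	MF_8021AD_TAGGING = 1u << 2, /* outer tag uses the 802.1ad TPID */
	MF_NEED_DEF_PF    = 1u << 3, /* a default PF must be configured */
};

/* Hypothetical modes as reported by management firmware. */
enum mf_mode { MF_MODE_SWITCH_INDEPENDENT, MF_MODE_OVLAN, MF_MODE_8021AD };

/* The single place to edit when a new mode is added. */
static unsigned int mf_mode_to_bits(enum mf_mode mode)
{
	switch (mode) {
	case MF_MODE_SWITCH_INDEPENDENT:
		return MF_LLH_MAC_CLSS | MF_NEED_DEF_PF;
	case MF_MODE_OVLAN:
		return MF_OVLAN_CLSS;
	case MF_MODE_8021AD:
		return MF_OVLAN_CLSS | MF_8021AD_TAGGING;
	}
	return 0;
}

/* Call sites ask "does this behaviour apply?" instead of "which mode?". */
static bool mf_has(unsigned int bits, enum mf_bit bit)
{
	return (bits & bit) != 0;
}
```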
2018-05-07	net/9p: correct the variable name in v9fs_get_trans_by_name() comment	Sun Lianwen	1	-1/+1
	The v9fs_get_trans_by_name(char *s) variable name is not "name" but "s". Signed-off-by: Sun Lianwen <sunlw.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-07	vlan: correct the file path in vlan_dev_change_flags() comment	Sun Lianwen	1	-1/+3
	The vlan_flags enum is defined in include/uapi/linux/if_vlan.h file, not in include/linux/if_vlan.h file. Signed-off-by: Sun Lianwen <sunlw.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-07	liquidio: support use of ethtool to set link speed of CN23XX-225 cards	Weilin Chang	7	-24/+425
	Support setting the link speed of CN23XX-225 cards (which can do 25Gbps or 10Gbps) via ethtool_ops.set_link_ksettings. Also fix the function assigned to ethtool_ops.get_link_ksettings to use the new link_ksettings api completely (instead of partially via ethtool_convert_legacy_u32_to_link_mode). Signed-off-by: Weilin Chang <weilin.chang@cavium.com> Acked-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-07	net: 3com: 3c59x: irq save variant of ISR	Anna-Maria Gleixner	1	-14/+4
	When vortex_boomerang_interrupt() is invoked from vortex_tx_timeout() or poll_vortex(), interrupts must be disabled. This detaches the interrupt disable logic from locking which requires patching for PREEMPT_RT. The advantage of avoiding spin_lock_irqsave() in the interrupt handler is minimal, but converting it removes all the extra code for callers that do not come from interrupt context. Cc: Steffen Klassert <klassert@mathematik.tu-chemnitz.de> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-07	net: 3com: 3c59x: Pull locking out of ISR	Anna-Maria Gleixner	1	-11/+9
	Locking is done in the same way in _vortex_interrupt() and _boomerang_interrupt(). To prevent duplication, move the locking into the calling vortex_boomerang_interrupt() function. No functional change. Cc: Steffen Klassert <klassert@mathematik.tu-chemnitz.de> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-07	net: 3com: 3c59x: Move boomerang/vortex conditional into function	Anna-Maria Gleixner	1	-14/+20
	If vp->full_bus_master_tx is set, vp->full_bus_master_rx is set as well (see vortex_probe1()). Therefore the conditionals deciding whether the boomerang or vortex ISR is executed have the same result. Instead of repeating the explicit conditional execution of the boomerang/vortex ISR, move it into a separate function. No functional change. Cc: Steffen Klassert <klassert@mathematik.tu-chemnitz.de> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-07	net: u64_stats_sync: Remove functions without user	Anna-Maria Gleixner	1	-14/+0
	Commit 67db3e4bfbc9 ("tcp: no longer hold ehash lock while calling tcp_get_info()") removed the only users of u64_stats_update_end/begin_raw() without removing the functions from the header file. Remove the no longer used functions. Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-07	selftests: net: add udpgso* to TEST_GEN_FILES	Anders Roxell	1	-1/+1
	The generated files udpgso* shouldn't be part of TEST_PROGS; they are used by udpgso.sh and udpgso_bench.sh. They should be added to TEST_GEN_FILES so that they get installed without being added to the main run_kselftest.sh script. Fixes: 3a687bef148d ("selftests: udp gso benchmark") Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-07	netfilter: nft_dynset: fix timeout updates on 32bit	Florian Westphal	1	-1/+1
	This must now use a 64bit jiffies value, else we set a bogus timeout on 32bit. Fixes: 8e1102d5a1596 ("netfilter: nf_tables: support timeouts larger than 23 days") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
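Why a 64-bit jiffies value matters on 32-bit builds: a long timeout converted to ticks can exceed 2^32, and storing it in a 32-bit `unsigned long` silently drops the upper bits. In this sketch `uint32_t` stands in for `unsigned long` on a 32-bit build, and HZ=1000 and the 60-day timeout are example values. The helper names are hypothetical.

```c
#include <stdint.h>

/* Convert a millisecond timeout to ticks at the given HZ, in 64 bits. */
static uint64_t ms_to_jiffies64(uint64_t timeout_ms, unsigned int hz)
{
	return timeout_ms * hz / 1000;
}

/* What a 32-bit 'unsigned long' keeps of the same value. */
static uint32_t as_ulong32(uint64_t jiffies64)
{
	return (uint32_t)jiffies64; /* upper 32 bits are discarded */
}
```

At HZ=1000, 60 days is 5,184,000,000 ticks. A 32-bit variable keeps only 889,032,704 of that, which is roughly 10.3 days, a bogus timeout.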
2018-05-07	netfilter: ctnetlink: export nf_conntrack_max	Florent Fourcot	3	-0/+5
	The IPCTNL_MSG_CT_GET_STATS netlink command allows monitoring the current number of conntrack entries. However, if one wants to compare it with the maximum (and detect exhaustion), the only solution currently is to read the sysctl value. This patch adds the nf_conntrack_max value to the netlink message, simplifying monitoring for applications built on the netlink API. Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-07	netfilter: extract Passive OS fingerprint infrastructure from xt_osf	Fernando Fernandez Mancera	7	-289/+359
	Add nf_osf_ttl() and nf_osf_match() into nf_osf.c to prepare for nf_tables support. Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-06	netfilter: nf_tables: Provide NFT_{RT,CT}_MAX for userspace	Phil Sutter	1	-0/+4
	These macros allow conveniently declaring arrays which use NFT_{RT,CT}_* values as indexes. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
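This follows the common `__FOO_MAX` / `FOO_MAX` convention in kernel uapi headers. The sketch uses hypothetical enum names, not the actual NFT_RT_* / NFT_CT_* members. The `_MAX` macro equals the highest valid key, so an array indexed by key is sized `MAX + 1` and grows automatically when keys are appended before the sentinel.

```c
/* Hypothetical key enum following the uapi sentinel convention. */
enum example_keys {
	EXAMPLE_KEY_CLASSID,
	EXAMPLE_KEY_NEXTHOP4,
	EXAMPLE_KEY_NEXTHOP6,
	__EXAMPLE_KEY_MAX	/* sentinel: always last, never used as a key */
};
#define EXAMPLE_KEY_MAX (__EXAMPLE_KEY_MAX - 1)	/* highest valid key */

/* Userspace table indexed directly by key value. */
static const char *const example_key_names[EXAMPLE_KEY_MAX + 1] = {
	[EXAMPLE_KEY_CLASSID]  = "classid",
	[EXAMPLE_KEY_NEXTHOP4] = "nexthop4",
	[EXAMPLE_KEY_NEXTHOP6] = "nexthop6",
};
```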
2018-05-06	netfilter: nf_nat: remove unused ct arg from lookup functions	Florian Westphal	7	-42/+22
	Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-06	netfilter: ip6t_srh: extend SRH matching for previous, next and last SID	Ahmed Abdelsalam	2	-11/+205
	The IPv6 Segment Routing Header (SRH) contains a list of SIDs to be crossed by an SR-encapsulated packet. Each SID is encoded as an IPv6 prefix. When a firewall receives an SR-encapsulated packet, it should be able to identify which node previously processed the packet (previous SID), which node is going to process the packet next (next SID), and which node is the last to process the packet (last SID), which represents the final destination of the packet in case of inline SR mode. An example use case for these features could be a SID list that includes two firewalls. When the second firewall receives a packet, it can check whether the packet has been processed by the first firewall or not. Based on that check, it decides to apply all rules, apply just a subset of the rules, or totally skip all rules and forward the packet to the next SID. This patch extends SRH match to support matching previous SID, next SID, and last SID. Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
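The SRH Segment List is encoded in reverse order: segments[0] is the last SID and segments[segments_left] is the active one. The three SIDs therefore reduce to index arithmetic with boundary checks. This sketch returns indices only. The helper names are hypothetical. `last_entry` corresponds to the field that the kernel's struct ipv6_sr_hdr calls first_segment.

```c
/* Each helper returns an index into the Segment List, or -1 if absent. */

static int srh_prev_sid_idx(unsigned int segments_left, unsigned int last_entry)
{
	/* Node that processed the packet before the active SID; none if the
	 * active SID is the first one visited (segments[last_entry]). */
	return segments_left < last_entry ? (int)segments_left + 1 : -1;
}

static int srh_next_sid_idx(unsigned int segments_left)
{
	/* Node that processes the packet after the active SID; none once
	 * segments_left has reached 0. */
	return segments_left > 0 ? (int)segments_left - 1 : -1;
}

static int srh_last_sid_idx(void)
{
	return 0; /* final destination is always the first encoded entry */
}
```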
2018-05-06	netfilter: nft_numgen: enable hashing of one element	Laura Garcia Liebana	1	-1/+1
	The modulus in the hash function was limited to > 1 as initially there seemed to be no point in hashing to just one element. Nevertheless, there are certain cases, especially for load balancing, where this case needs to be addressed. This patch fixes the following error. Error: Could not process rule: Numerical result out of range add rule ip nftlb lb01 dnat to jhash ip saddr mod 1 map { 0: 192.168.0.10 } ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The solution is to force the hash to 0 when the modulus is 1. Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
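A sketch of mapping a 32-bit hash into a key range. `reciprocal_scale()` is written the way the kernel helper of that name is, using a multiply-shift instead of a division. The `hash_to_key()` wrapper is hypothetical and forces the result to 0 (plus offset) when the modulus is 1, the behaviour the commit describes.

```c
#include <stdint.h>

/* Multiply-shift scaling: maps val into [0, ep_ro) without a division. */
static uint32_t reciprocal_scale(uint32_t val, uint32_t ep_ro)
{
	return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}

/* Hypothetical helper: turn a hash into a map key (modulus >= 1).
 * With a single bucket, the only valid key is 0. */
static uint32_t hash_to_key(uint32_t hash, uint32_t modulus, uint32_t offset)
{
	if (modulus == 1)
		return offset;
	return reciprocal_scale(hash, modulus) + offset;
}
```

With `mod 1`, every source address therefore selects map key 0, which is what the `{ 0: 192.168.0.10 }` rule in the error message expects.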
2018-05-06	netfilter: nft_numgen: add map lookups for numgen statements	Laura Garcia Liebana	2	-5/+84
	This patch includes a new attribute in the numgen structure to allow the lookup of an element based on the number generator as a key. For this purpose, different ops have been included to extend the current numgen inc functions. Currently this is only supported for numgen incremental operations; support for random will be added in a follow-up patch. Signed-off-by: Laura Garcia Liebana <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
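A userspace sketch of an incremental number generator whose output is used as a map key, with C11 atomics standing in for the kernel's atomic_t. The struct, function names and the array-as-map are hypothetical. The compare-and-swap loop shows one lock-free way to advance a counter modulo N, not necessarily the nft_numgen implementation.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical incremental generator state. */
struct ng_inc {
	atomic_uint counter;
	unsigned int modulus;
	unsigned int offset;
};

/* Advance the counter modulo 'modulus' without taking a lock. */
static unsigned int ng_inc_next(struct ng_inc *ng)
{
	unsigned int oval, nval;

	oval = atomic_load(&ng->counter);
	do {
		nval = (oval + 1 < ng->modulus) ? oval + 1 : 0;
		/* On failure, oval is refreshed with the current counter. */
	} while (!atomic_compare_exchange_weak(&ng->counter, &oval, nval));

	return nval + ng->offset;
}

/* Use the generated number as the key into a map (a plain array here). */
static const char *ng_map_lookup(struct ng_inc *ng, const char *const *map,
				 size_t len)
{
	unsigned int key = ng_inc_next(ng);

	return key < len ? map[key] : NULL;
}
```

With three backends and the counter starting at 0, successive lookups return entries 1, 2, 0, 1, and so on: a round-robin over the map.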
2018-05-04	net/ipv6: rename rt6_next to fib6_next	David Ahern	3	-22/+22
	This slipped through the cracks in the followup set to the fib6_info flip. Rename rt6_next to fib6_next. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-04	bpf, xskmap: fix crash in xsk_map_alloc error path handling	Daniel Borkmann	1	-0/+2
	If bpf_map_precharge_memlock() did not fail, then we set err to zero. However, any subsequent failure from either alloc_percpu() or the bpf_map_area_alloc() will return ERR_PTR(0) which in find_and_alloc_map() will cause NULL pointer deref. In devmap we have the convention that we return -EINVAL on page count overflow, so keep the same logic here and just set err to -ENOMEM after successful bpf_map_precharge_memlock(). Fixes: fbfc504a24f5 ("bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Björn Töpel <bjorn.topel@intel.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
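The bug hinges on `ERR_PTR(0)` being plain NULL, which `IS_ERR()` does not treat as an error. This sketch re-creates the kernel's error-pointer helpers in userspace. `precharge_memlock()` and `map_alloc()` are hypothetical stand-ins for the xskmap allocation path. The `fixed` flag toggles the one-line change of setting `err = -ENOMEM` after the precharge succeeds.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace re-creations of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct map { int dummy; };

static int precharge_memlock(void) { return 0; } /* hypothetical: succeeds */

/* Allocation path with an optional late failure. */
static struct map *map_alloc(bool fail_late, bool fixed)
{
	struct map *m;
	int err = precharge_memlock();

	if (err)
		return ERR_PTR(err);
	if (fixed)
		err = -ENOMEM; /* every later failure now reports -ENOMEM */

	m = fail_late ? NULL : malloc(sizeof(*m));
	if (!m)
		return ERR_PTR(err); /* unfixed: ERR_PTR(0) is just NULL */
	return m;
}
```

A caller that checks only `IS_ERR()` accepts the unfixed NULL as a valid map and dereferences it.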
2018-05-04	bpf: fix references to free_bpf_prog_info() in comments	Jakub Kicinski	1	-2/+2
	Comments in the verifier refer to free_bpf_prog_info() which seems to have never existed in tree. Replace it with free_used_maps(). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-04	tools: bpftool: add simple perf event output reader	Jakub Kicinski	8	-19/+444
	Users of BPF sooner or later discover perf_event_output() helpers and BPF_MAP_TYPE_PERF_EVENT_ARRAY. Dumping this array type is not possible; however, we can add simple reading of perf events. Create a new event_pipe subcommand for maps; this subcommand will only work with BPF_MAP_TYPE_PERF_EVENT_ARRAY maps. Parts of the code are taken from samples/bpf/trace_output_user.c. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-04	tools: bpftool: move get_possible_cpus() to common code	Jakub Kicinski	3	-58/+59
	Move the get_possible_cpus() function to shared code. No functional changes. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-04	tools: bpftool: fold hex keyword in command help	Jakub Kicinski	2	-15/+17
	Instead of spelling [hex] BYTES everywhere, use DATA as the keyword for a generalized value. This will help us keep the messages concise when longer commands are added in the future. It will also be useful once BTF support comes. We will only have to change the definition of DATA. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-04	nfp: bpf: rewrite map pointers with NFP TIDs	Jakub Kicinski	2	-21/+32
	The kernel will now replace map fds with actual pointers before calling the offload prepare. We can identify those pointers and replace them with NFP table IDs instead of loading the table ID in code generated for the CALL instruction. This allows us to support having the same CALL being used with different maps. Since we don't want to change the FW ABI, we still need to move the TID from R1 to a portion of R0 before the jump. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-04	nfp: bpf: perf event output helpers support	Jakub Kicinski	7	-4/+187
	Add support for the perf_event_output family of helpers. The implementation on the NFP will not match the host code exactly. The state of the host map and rings is unknown to the device, hence the device can't return errors when rings are not installed. The device simply packs the data into a firmware notification message and sends it over to the host, returning success to the program. There is no notion of a host CPU on the device when packets are being processed. The device will only offload programs which set BPF_F_CURRENT_CPU. Still, if the map index doesn't match the CPU, no error will be returned (see above). Dropped/lost firmware notification messages will not cause a "lost events" event on the perf ring; they are only visible via device error counters. Firmware notification messages may also get reordered with respect to the packets which caused their generation. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-04	bpf: replace map pointer loads before calling into offloads	Jakub Kicinski	1	-5/+5
	Offloads may find host map pointers more useful than map fds. Map pointers can be used to identify the map, while fds are only valid within the context of loading process. Jump to skip_full_check on error in case verifier log overflow has to be handled (replace_map_fd_with_map_ptr() prints to the log, driver prep may do that too in the future). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-04	bpf: export bpf_event_output()	Jakub Kicinski	1	-0/+1
	bpf_event_output() is useful for offloads to add events to BPF event rings, export it. Note that export is placed near the stub since tracing is optional and kernel/bpf/core.c is always going to be built. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-04	nfp: bpf: record offload neutral maps in the driver	Jakub Kicinski	5	-6/+168
	For asynchronous events originating from the device, like perf event output, we need to be able to make sure that objects being referred to by the FW message are valid on the host. FW events can get queued and reordered. Even if we had a FW message "barrier" we should still protect ourselves from bogus FW output. Add a reverse-mapping hash table and record in it all raw map pointers FW may refer to. Only record neutral maps, i.e. perf event arrays. These are currently the only objects FW can refer to. Use RCU protection on the read side, update side is under RTNL. Since program vs map destruction order is slightly painful for offload simply take an extra reference on all the recorded maps to make sure they don't disappear. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
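A single-threaded sketch of the validation idea only: record the raw host pointers firmware may mention, and accept a pointer value from a firmware message only if it matches a recorded one. The driver uses an RCU-protected hash table and takes map references. This sketch instead uses a hypothetical fixed-size, open-addressing table with no deletion, no locking and no reference counting, and all names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define RMAP_SIZE 64 /* hypothetical fixed capacity, power of two */

/* Set of host object pointers that firmware is allowed to refer to. */
struct rmap {
	const void *slot[RMAP_SIZE];
};

static size_t rmap_hash(uint64_t key)
{
	/* Drop low alignment bits, mix a little, mask to table size. */
	return (size_t)((key >> 4) ^ (key >> 12)) & (RMAP_SIZE - 1);
}

/* Record a pointer; returns 0 on success, -1 if the table is full. */
static int rmap_add(struct rmap *t, const void *obj)
{
	size_t i = rmap_hash((uintptr_t)obj);

	for (size_t n = 0; n < RMAP_SIZE; n++, i = (i + 1) & (RMAP_SIZE - 1)) {
		if (!t->slot[i] || t->slot[i] == obj) {
			t->slot[i] = obj;
			return 0;
		}
	}
	return -1;
}

/* Validate a raw value from a firmware message: only recorded pointers pass. */
static const void *rmap_lookup(const struct rmap *t, uint64_t fw_value)
{
	size_t i = rmap_hash(fw_value);

	for (size_t n = 0; n < RMAP_SIZE; n++, i = (i + 1) & (RMAP_SIZE - 1)) {
		if (!t->slot[i])
			return NULL; /* empty slot ends the probe sequence */
		if ((uintptr_t)t->slot[i] == fw_value)
			return t->slot[i];
	}
	return NULL;
}
```

A bogus or stale value from firmware then resolves to NULL instead of being dereferenced as a map pointer.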