linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2017-08-29	xdp: separate xdp_redirect tracepoint in error case	Jesper Dangaard Brouer	2	-17/+42
	There is a need to separate the xdp_redirect tracepoint into two tracepoints, for separating the error case from the normal forward case. Due to the extreme speeds XDP is operating at, loading a tracepoint have a measurable impact. Single core XDP REDIRECT (ethtool tuned rx-usecs 25) can do 13.7 Mpps forwarding, but loading a simple bpf_prog at the tracepoint (with a return 0) reduce perf to 10.2 Mpps (CPU E5-1650 v4 @ 3.60GHz, driver: ixgbe) The overhead of loading a bpf-based tracepoint can be calculated to cost 25 nanosec ((1/13782002-1/10267937)10^9 = -24.83 ns). Using perf record on the tracepoint event, with a non-matching --filter expression, the overhead is much larger. Performance drops to 8.3 Mpps, cost 48 nanosec ((1/13782002-1/8312497)10^9 = -47.74)) Having a separate tracepoint for err cases, which should be less frequent, allow running a continuous monitor for errors while not affecting the redirect forward performance (this have also been verified by measurements). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29	xdp: make xdp tracepoints report bpf prog id instead of prog_tag	Jesper Dangaard Brouer	1	-10/+8
	Given previous patch expose the map_id, it seems natural to also report the bpf prog id. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29	xdp: tracepoint xdp_redirect also need a map argument	Jesper Dangaard Brouer	2	-11/+33
	To make sense of the map index, the tracepoint user also need to know that map we are talking about. Supply the map pointer but only expose the map->id. The 'to_index' is renamed 'to_ifindex'. In the xdp_redirect_map case, this is the result of the devmap lookup. The map lookup key is exposed as map_index, which is needed to troubleshoot in case the lookup failed. The 'to_ifindex' is placed after 'err' to keep TP_STRUCT as common as possible. This also keeps the TP_STRUCT similar enough, that userspace can write a monitor program, that doesn't need to care about whether bpf_redirect or bpf_redirect_map were used. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29	xdp: remove redundant argument to trace_xdp_redirect	Jesper Dangaard Brouer	2	-6/+6
	Supplying the action argument XDP_REDIRECT to the tracepoint xdp_redirect is redundant as it is only called in-case this action was specified. Remove the argument, but keep "act" member of the tracepoint struct and populate it with XDP_REDIRECT. This makes it easier to write a common bpf_prog processing events. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29	addrlabel: add/delete/get can run without rtnl	Florian Westphal	1	-5/+17
	There appears to be no need to use rtnl, addrlabel entries are refcounted and add/delete is serialized by the addrlabel table spinlock. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29	selftests: add addrlabel add/delete to rtnetlink.sh	Florian Westphal	1	-0/+41
	Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29	staging: irda: update MAINTAINERS	Greg Kroah-Hartman	1	-3/+1
	Now that the IRDA code has moved under drivers/staging/irda/, update the MAINTAINERS file with the new location. Reported-by: Joe Perches <joe@perches.com> Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29	bnxt_en: add a dummy definition for bnxt_vf_rep_get_fid()	Sathya Perla	1	-0/+5
	When bnxt VF-reps are not compiled in (CONFIG_BNXT_SRIOV is off) bnxt_tc.c needs a dummy definition of the routine bnxt_vf_rep_get_fid(). Reported-by: kbuild test robot <fengguang.wu@intel.com> Fixes: 2ae7408fedfe ("bnxt_en: bnxt: add TC flower filter offload support") Signed-off-by: Sathya Perla <sathya.perla@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29	rxrpc: Allow failed client calls to be retried	David Howells	7	-31/+261
	Allow a client call that failed on network error to be retried, provided that the Tx queue still holds DATA packet 1. This allows an operation to be submitted to another server or another address for the same server without having to repackage and re-encrypt the data so far processed. Two new functions are provided: (1) rxrpc_kernel_check_call() - This is used to find out the completion state of a call to guess whether it can be retried and whether it should be retried. (2) rxrpc_kernel_retry_call() - Disconnect the call from its current connection, reset the state and submit it as a new client call to a new address. The new address need not match the previous address. A call may be retried even if all the data hasn't been loaded into it yet; a partially constructed will be retained at the same point it was at when an error condition was detected. msg_data_left() can be used to find out how much data was packaged before the error occurred. Signed-off-by: David Howells <dhowells@redhat.com>
2017-08-29	rxrpc: Add notification of end-of-Tx phase	David Howells	4	-20/+77
	Add a callback to rxrpc_kernel_send_data() so that a kernel service can get a notification that the AF_RXRPC call has transitioned out the Tx phase and is now waiting for a reply or a final ACK. This is called from AF_RXRPC with the call state lock held so the notification is guaranteed to come before any reply is passed back. Further, modify the AFS filesystem to make use of this so that we don't have to change the afs_call state before sending the last bit of data. Signed-off-by: David Howells <dhowells@redhat.com>
2017-08-29	rxrpc: Remove some excess whitespace	David Howells	1	-3/+3
	Remove indentation from some blank lines. Signed-off-by: David Howells <dhowells@redhat.com>
2017-08-29	rxrpc: Don't negate call->error before returning it	David Howells	1	-4/+4
	call->error is stored as 0 or a negative error code. Don't negate this value (ie. make it positive) before returning it from a kernel function (though it should still be negated before passing to userspace through a control message). Signed-off-by: David Howells <dhowells@redhat.com>
2017-08-29	rxrpc: Fix IPv6 support	David Howells	8	-18/+31
	Fix IPv6 support in AF_RXRPC in the following ways: (1) When extracting the address from a received IPv4 packet, if the local transport socket is open for IPv6 then fill out the sockaddr_rxrpc struct for an IPv4-mapped-to-IPv6 AF_INET6 transport address instead of an AF_INET one. (2) When sending CHALLENGE or RESPONSE packets, the transport length needs to be set from the sockaddr_rxrpc::transport_len field rather than sizeof() on the IPv4 transport address. (3) When processing an IPv4 ICMP packet received by an IPv6 socket, set up the address correctly before searching for the affected peer. Signed-off-by: David Howells <dhowells@redhat.com>
2017-08-29	rxrpc: Use correct timestamp from Kerberos 5 ticket	David Howells	1	-1/+1
	When an XDR-encoded Kerberos 5 ticket is added as an rxrpc-type key, the expiry time should be drawn from the k5 part of the token union (which was what was filled in), rather than the kad part of the union. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David Howells <dhowells@redhat.com>
2017-08-29	net: rxrpc: Replace time_t type with time64_t type	Baolin Wang	4	-16/+45
	Since the 'expiry' variable of 'struct key_preparsed_payload' has been changed to 'time64_t' type, which is year 2038 safe on 32bits system. In net/rxrpc subsystem, we need convert 'u32' type to 'time64_t' type when copying ticket expires time to 'prep->expiry', then this patch introduces two helper functions to help convert 'u32' to 'time64_t' type. This patch also uses ktime_get_real_seconds() to get current time instead of get_seconds() which is not year 2038 safe on 32bits system. Signed-off-by: Baolin Wang <baolin.wang@linaro.org> Signed-off-by: David Howells <dhowells@redhat.com>
2017-08-28	hinic: don't build the module by default	Vitaly Kuznetsov	1	-1/+0
	We probably don't want to enable code supporting particular hardware by default e.g. when someone does 'make defconfig'. Other ethernet modules don't do it. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	bnxt_en: add code to query TC flower offload stats	Sathya Perla	1	-0/+95
	This patch adds code to implement TC_CLSFLOWER_STATS TC-cmd and the required FW code to query the stats from the HW. Signed-off-by: Sathya Perla <sathya.perla@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	bnxt_en: add TC flower offload flow_alloc/free FW cmds	Sathya Perla	1	-2/+139
	This patch adds the hwrm_cfa_flow_alloc/free() routines that are needed to issue the FW cmds needed for TC flower offload. Signed-off-by: Sathya Perla <sathya.perla@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	bnxt_en: bnxt: add TC flower filter offload support	Sathya Perla	8	-9/+850
	This patch adds support for offloading TC based flow rules and actions for the 'flower' classifier in the bnxt_en driver. It includes logic to parse flow rules and actions received from the TC subsystem, store them and issue the corresponding hwrm_cfa_flow_alloc/free FW cmds. L2/IPv4/IPv6 flows and drop, redir, vlan push/pop actions are supported in this patch. In this patch the hwrm_cfa_flow_xxx routines are just stubs. The code for these routines is introduced in the next patch for easier review. Also, the code to query the TC/flower action stats will be introduced in a subsequent patch. Signed-off-by: Sathya Perla <sathya.perla@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	bnxt_en: fix clearing devlink ptr from bnxt struct	Sathya Perla	2	-7/+11
	The routine bnxt_link_bp_to_dl() is used to set the devlink ptr in bnxt struct (bp) and also to set the bnxt back ptr in the devlink struct. If devlink_register() fails, bp->dl must be cleared which is not happening currently. This patch fixes bnxt_link_bp_to_dl() to clear bp->dl by passing a NULL dl ptr. Fixes: 4ab0c6a8ffd7 ("bnxt_en: add support to enable VF-representors") Signed-off-by: Sathya Perla <sathya.perla@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	bnxt_en: Reduce default rings on multi-port cards.	Michael Chan	2	-4/+10
	Reduce default rings from 8 to 4 on multi-port cards to reduce memory usage. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	bnxt_en: Improve -ENOMEM logic in NAPI poll loop.	Michael Chan	1	-0/+7
	If we cannot allocate RX buffers in the NAPI poll loop when processing an RX event, the current code does not count that event towards the NAPI budget. This can cause us to potentially loop forever in NAPI if we consistently cannot allocate new buffers. Improve it by counting -ENOMEM event as 1 towards the NAPI budget. Cc: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reported-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	bnxt: initialize board_info values with proper enums	Scott Branden	1	-32/+32
	initialize board_info values with proper enums for defensive programming purposes. This will avoid any errors of the enums being declared not lining up with the board_info array. Signed-off-by: Scott Branden <scott.branden@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	bnxt: Add PCIe device IDs for bcm58802/bcm58808	Ray Jui	2	-1/+15
	Add PCIe device ID for bcm58802 and bcm58808. Also add chip number update to declare bcm588xx as chip class phase 4 and later Signed-off-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	bnxt_en: assign CPU affinity hints to bnxt_en IRQs	Vasundhara Volam	2	-2/+27
	This patch provides hints to irqbalance to map bnxt_en device IRQs to specific CPU cores. cpumask_local_spread() is used, which first maps IRQs to near NUMA cores; when those cores are exhausted, IRQs are mapped to far NUMA cores. Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	bnxt_en: Improve tx ring reservation logic.	Michael Chan	4	-14/+44
	When the number of TX rings is changed (e.g. ethtool -L, enabling XDP TX rings, etc), the current code tries to reserve the new number of TX rings before closing and re-opening the NIC. If we are unable to reserve the new TX rings, we abort the operation and keep the current TX rings. The problem is that the firmware will disable the current TX rings even when it cannot reserve the new set of TX rings. We fix it as follows: 1. Instead of reserving the new set of TX rings, just ask the firmware to check if the new set of TX rings is available. There is a flag in the firmware message to do that. If not available, abort and the current TX rings will not be disabled. 2. Do the actual TX ring reservation in the path that opens the NIC. We keep the number of TX rings currently successfully reserved. If the number of TX rings is different than the reserved TX rings, we call firmware and reserve again. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	bnxt_en: Update firmware interface spec. to 1.8.1.4.	Michael Chan	1	-5/+181
	Flow APIs are added in this firmware interface. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	ftgmac100: Support NCSI VLAN filtering when available	Samuel Mendoza-Jonas	1	-0/+5
	Register the ndo_vlan_rx_{add,kill}_vid callbacks and set the NETIF_F_HW_VLAN_CTAG_FILTER if NCSI is available. This allows the VLAN core to notify the NCSI driver when changes occur so that the remote NCSI channel can be properly configured to filter on the set VLAN tags. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	net/ncsi: Configure VLAN tag filter	Samuel Mendoza-Jonas	4	-4/+326
	Make use of the ndo_vlan_rx_{add,kill}_vid callbacks to have the NCSI stack process new VLAN tags and configure the channel VLAN filter appropriately. Several VLAN tags can be set and a "Set VLAN Filter" packet must be sent for each one, meaning the ncsi_dev_state_config_svf state must be repeated. An internal list of VLAN tags is maintained, and compared against the current channel's ncsi_channel_filter in order to keep track within the state. VLAN filters are removed in a similar manner, with the introduction of the ncsi_dev_state_config_clear_vids state. The maximum number of VLAN tag filters is determined by the "Get Capabilities" response from the channel. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	net/ncsi: Fix several packet definitions	Samuel Mendoza-Jonas	3	-7/+8
	Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	net-next/hinic: fix comparison of a uint16_t type with -1	Aviad Krawczyk	2	-36/+22
	Remove the search for index of constant buffer size Signed-off-by: Aviad Krawczyk <aviad.krawczyk@huawei.com> Signed-off-by: Zhao Chen <zhaochen6@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28	net-next/hinic: Fix MTU limitation	Aviad Krawczyk	1	-0/+1
	Fix the hw MTU limitation by setting max_mtu Signed-off-by: Aviad Krawczyk <aviad.krawczyk@huawei.com> Signed-off-by: Zhao Chen <zhaochen6@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>