linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2017-06-06	net/mlxfw: remove redundant goto on error check	Colin Ian King	1	-2/+0
	The check to see of err is set and the subsequent goto is extraneous as the next statement is where the goto is jumping to. Remove this redundant check and goto. Detected by CoverityScan, CID#1437734 ("Identical code for different branches") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06	e1000e: use disable_hardirq() also for MSIX vectors in e1000_netpoll()	Konstantin Khlebnikov	1	-6/+6
	Replace disable_irq() which waits for threaded irq handlers with disable_hardirq() which waits only for hardirq part. Fixes: 311191297125 ("e1000: use disable_hardirq() for e1000_netpoll()") Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06	e1000e: Don't return uninitialized stats	Benjamin Poirier	1	-1/+1
	Some statistics passed to ethtool are garbage because e1000e_get_stats64() doesn't write them, for example: tx_heartbeat_errors. This leaks kernel memory to userspace and confuses users. Do like ixgbe and use dev_get_stats() which first zeroes out rtnl_link_stats64. Fixes: 5944701df90d ("net: remove useless memset's in drivers get_stats64") Reported-by: Stefan Priebe <s.priebe@profihost.ag> Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06	igb: Remove useless argument	Benjamin Poirier	3	-7/+7
	Given that all callers of igb_update_stats() pass the same two arguments: (adapter, &adapter->stats64), the second argument can be removed. Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06	igb: check for Tx timestamp timeouts during watchdog	Jacob Keller	3	-0/+31
	The igb driver has logic to handle only one Tx timestamp at a time, using a state bit lock to avoid multiple requests at once. It may be possible, if incredibly unlikely, that a Tx timestamp event is requested but never completes. Since we use an interrupt scheme to determine when the Tx timestamp occurred we would never clear the state bit in this case. Add an igb_ptp_tx_hang() function similar to the already existing igb_ptp_rx_hang() function. This function runs in the watchdog routine and makes sure we eventually recover from this case instead of permanently disabling Tx timestamps. Note: there is no currently known way to cause this without hacking the driver code to force it. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06	igb: add statistic indicating number of skipped Tx timestamps	Jacob Keller	3	-0/+4
	The igb driver can only handle one Tx timestamp request at a time. This means it is possible for an application timestamp request to be ignored. There is no easy way for an administrator to determine if this occurred. Add a new statistic which tracks this, tx_hwtstamp_skipped. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06	e1000e: add statistic indicating number of skipped Tx timestamps	Jacob Keller	3	-7/+12
	The e1000e driver can only handle one Tx timestamp request at a time. This means it is possible for an application timestamp request to be ignored. There is no easy way for an administrator to determine if this occurred. Add a new statistic which tracks this, tx_hwtstamp_skipped. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06	igb: avoid permanent lock of *_PTP_TX_IN_PROGRESS	Jacob Keller	1	-5/+18
	The igb driver uses a state bit lock to avoid handling more than one Tx timestamp request at once. This is required because hardware is limited to a single set of registers for Tx timestamps. The state bit lock is not properly cleaned up during igb_xmit_frame_ring() if the transmit fails such as due to DMA or TSO failure. In some hardware this results in blocking timestamps until the service task times out. In other hardware this results in a permanent lock of the timestamp bit because we never receive an interrupt indicating the timestamp occurred, since indeed the packet was never transmitted. Fix this by checking for DMA and TSO errors in igb_xmit_frame_ring() and properly cleaning up after ourselves when these occur. Reported-by: Reported-by: David Mirabito <davidm@metamako.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06	igb: fix race condition with PTP_TX_IN_PROGRESS bits	Jacob Keller	1	-2/+10
	Hardware related to the igb driver has a limitation of only handling one Tx timestamp at a time. Thus, the driver uses a state bit lock to enforce that only one timestamp request is honored at a time. Unfortunately this suffers from a simple race condition. The bit lock is not cleared until after skb_tstamp_tx() is called notifying the stack of a new Tx timestamp. Even a well behaved application which sends only one timestamp request at once and waits for a response might wake up and send a new packet before the bit lock is cleared. This results in needlessly dropping some Tx timestamp requests. We can fix this by unlocking the state bit as soon as we read the Timestamp register, as this is the first point at which it is safe to unlock. To avoid issues with the skb pointer, we'll use a copy of the pointer and set the global variable in the driver structure to NULL first. This ensures that the next timestamp request does not modify our local copy of the skb pointer. This ensures that well behaved applications do not accidentally race with the unlock bit. Obviously an application which sends multiple Tx timestamp requests at once will still only timestamp one packet at a time. Unfortunately there is nothing we can do about this. Reported-by: David Mirabito <davidm@metamako.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06	e1000e: fix race condition around skb_tstamp_tx()	Jacob Keller	1	-2/+8
	The e1000e driver and related hardware has a limitation on Tx PTP packets which requires we limit to timestamping a single packet at once. We do this by verifying that we never request a new Tx timestamp while we still have a tx_hwtstamp_skb pointer. Unfortunately the driver suffers from a race condition around this. The tx_hwtstamp_skb pointer is not set to NULL until after skb_tstamp_tx() is called. This function notifies the stack and applications of a new timestamp. Even a well behaved application that only sends a new request when the first one is finished might be woken up and possibly send a packet before we can free the timestamp in the driver again. The result is that we needlessly ignore some Tx timestamp requests in this corner case. Fix this by assigning the tx_hwtstamp_skb pointer prior to calling skb_tstamp_tx() and use a temporary pointer to hold the timestamped skb until that function finishes. This ensures that the application is not woken up until the driver is ready to begin timestamping a new packet. This ensures that well behaved applications do not accidentally race with condition to skip Tx timestamps. Obviously an application which sends multiple Tx timestamp requests at once will still only timestamp one packet at a time. Unfortunately there is nothing we can do about this. Reported-by: David Mirabito <davidm@metamako.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06	igb: mark PM functions as __maybe_unused	Arnd Bergmann	1	-13/+5
	The new wake function is only used by the suspend/resume handlers that are defined in inside of an #ifdef, which can cause this harmless warning: drivers/net/ethernet/intel/igb/igb_main.c:7988:13: warning: 'igb_deliver_wake_packet' defined but not used [-Wunused-function] Removing the #ifdef, instead using a __maybe_unused annotation simplifies the code and avoids the warning. Fixes: b90fa8763560 ("igb: Enable reading of wake up packet") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06	igb: Explicitly select page 0 at initialization	Matwey V Kornilov	1	-0/+1
	The functions igb_read_phy_reg_gs40g/igb_write_phy_reg_gs40g (which were removed in 2a3cdea) explicitly selected the required page at every phy_reg access. Currently, igb_get_phy_id_82575 relays on the fact that page 0 is already selected. The assumption is not fulfilled for my Lex 3I380CW motherboard with integrated dual i211 based gigabit ethernet. This leads to igb initialization failure and network interfaces are not working: igb: Intel(R) Gigabit Ethernet Network Driver - version 5.4.0-k igb: Copyright (c) 2007-2014 Intel Corporation. igb: probe of 0000:01:00.0 failed with error -2 igb: probe of 0000:02:00.0 failed with error -2 In order to fix it, we explicitly select page 0 before first access to phy registers. See also: https://bugzilla.suse.com/show_bug.cgi?id=1009911 See also: http://www.lex.com.tw/products/pdf/3I380A&3I380CW.pdf Fixes: 2a3cdea ("igb: Remove GS40G specific defines/functions") Cc: <stable@vger.kernel.org> # 4.5+ Signed-off-by: Matwey V Kornilov <matwey@sai.msu.ru> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-05	mdio: mux: fix an incorrect less than zero error check using a u32	Colin Ian King	1	-1/+1
	The u32 variable v is being checked to see if an error return is less than zero and this check has no effect because it is unsigned. Fix this by making v and int (this also matches the type of cb->bus_number which is assigned to the value in v). Detected by CoverityScan, CID#1440454 ("Unsigned compared against zero") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-05	net-next: stmmac: dwmac-sun8i: ensure the EPHY is properly reseted	Icenowy Zheng	1	-0/+5
	The EPHY may be already enabled by bootloaders which have Ethernet capability (e.g. current U-Boot). Thus it should be reseted properly before doing the enabling sequence in the dwmac-sun8i driver, otherwise the EMAC reset process may fail if no cable is plugged, and then fail the dwmac-sun8i probing. Tested on Orange Pi PC, One and Zero. All the boards fail to have dwmac-sun8i probed with "EMAC reset timeout" without cable plugged before, and with this fix they're now all able to successfully probe the EMAC without cable plugged and then use the connection after a cable is hot-plugged in. Fixes: 9f93ac8d408 ("net-next: stmmac: Add dwmac-sun8i") Signed-off-by: Icenowy Zheng <icenowy@aosc.io> Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com> Acked-by: Corentin Labbe <clabbe.montjoie@gmail.com> Reviewed-by: Corentin Labbe <clabbe.montjoie@gmail.com> Acked-by: is not as formal as Signed-off-by:. It is a record that the acker Reviewed-by: is similar. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-05	net/3com: Make el3_netdev_get_ecmd return void	yuval.shaia@oracle.com	1	-5/+3
	Make return value void since function never returns meaningfull value. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-05	net/{mii, smsc}: Make mii_ethtool_get_link_ksettings and smc_netdev_get_ecmd return void	yuval.shaia@oracle.com	29	-73/+78
	Make return value void since functions never returns meaningfull value. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-05	net/dec: Make __de_get_link_ksettings return void	yuval.shaia@oracle.com	1	-7/+4
	Make return value void since function never return meaningfull value Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-05	net: sched: select cls when cls_act is enabled	Jiri Pirko	1	-0/+1
	It really makes no sense to have cls_act enabled without cls. In that case, the cls_act code is dead. So select it. This also fixes an issue recently reported by kbuild robot: [linux-next:master 1326/4151] net/sched/act_api.c:37:18: error: implicit declaration of function 'tcf_chain_get' Reported-by: kbuild test robot <fengguang.wu@intel.com> Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-05	genetlink: remove ops_list from genetlink header.	Rosen, Rami	1	-1/+0
	commit d91824c08fbc ("genetlink: register family ops as array") removed the ops_list member from both genl_family and genl_ops; while the documentation of genl_family was updated accordingly by this patch, ops_list remained in the documentation of the genl_ops object. This patch fixes it by removing ops_list from genl_ops documentation. Signed-off-by: Rami Rosen <rami.rosen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-05	rxrpc: Add service upgrade support for client connections	David Howells	8	-12/+106
	Make it possible for a client to use AuriStor's service upgrade facility. The client does this by adding an RXRPC_UPGRADE_SERVICE control message to the first sendmsg() of a call. This takes no parameters. When recvmsg() starts returning data from the call, the service ID field in the returned msg_name will reflect the result of the upgrade attempt. If the upgrade was ignored, srx_service will match what was set in the sendmsg(); if the upgrade happened the srx_service will be altered to indicate the service the server upgraded to. Note that: (1) The choice of upgrade service is up to the server (2) Further client calls to the same server that would share a connection are blocked if an upgrade probe is in progress. (3) This should only be used to probe the service. Clients should then use the returned service ID in all subsequent communications with that server (and not set the upgrade). Note that the kernel will not retain this information should the connection expire from its cache. (4) If a server that supports upgrading is replaced by one that doesn't, whilst a connection is live, and if the replacement is running, say, OpenAFS 1.6.4 or older or an older IBM AFS, then the replacement server will not respond to packets sent to the upgraded connection. At this point, calls will time out and the server must be reprobed. Signed-off-by: David Howells <dhowells@redhat.com>
2017-06-05	rxrpc: Implement service upgrade	David Howells	7	-12/+71
	Implement AuriStor's service upgrade facility. There are three problems that this is meant to deal with: (1) Various of the standard AFS RPC calls have IPv4 addresses in their requests and/or replies - but there's no room for including IPv6 addresses. (2) Definition of IPv6-specific RPC operations in the standard operation sets has not yet been achieved. (3) One could envision the creation a new service on the same port that as the original service. The new service could implement improved operations - and the client could try this first, falling back to the original service if it's not there. Unfortunately, certain servers ignore packets addressed to a service they don't implement and don't respond in any way - not even with an ABORT. This means that the client must then wait for the call timeout to occur. What service upgrade does is to see if the connection is marked as being 'upgradeable' and if so, change the service ID in the server and thus the request and reply formats. Note that the upgrade isn't mandatory - a server that supports only the original call set will ignore the upgrade request. In the protocol, the procedure is then as follows: (1) To request an upgrade, the first DATA packet in a new connection must have the userStatus set to 1 (this is normally 0). The userStatus value is normally ignored by the server. (2) If the server doesn't support upgrading, the reply packets will contain the same service ID as for the first request packet. (3) If the server does support upgrading, all future reply packets on that connection will contain the new service ID and the new service ID will be applied to all further calls on that connection as well. (4) The RPC op used to probe the upgrade must take the same request data as the shadow call in the upgrade set (but may return a different reply). GetCapability RPC ops were added to all standard sets for just this purpose. Ops where the request formats differ cannot be used for probing. (5) The client must wait for completion of the probe before sending any further RPC ops to the same destination. It should then use the service ID that recvmsg() reported back in all future calls. (6) The shadow service must have call definitions for all the operation IDs defined by the original service. To support service upgrading, a server should: (1) Call bind() twice on its AF_RXRPC socket before calling listen(). Each bind() should supply a different service ID, but the transport addresses must be the same. This allows the server to receive requests with either service ID. (2) Enable automatic upgrading by calling setsockopt(), specifying RXRPC_UPGRADEABLE_SERVICE and passing in a two-member array of unsigned shorts as the argument: unsigned short optval[2]; This specifies a pair of service IDs. They must be different and must match the service IDs bound to the socket. Member 0 is the service ID to upgrade from and member 1 is the service ID to upgrade to. Signed-off-by: David Howells <dhowells@redhat.com>
2017-06-05	rxrpc: Permit multiple service binding	David Howells	6	-24/+51
	Permit bind() to be called on an AF_RXRPC socket more than once (currently maximum twice) to bind multiple listening services to it. There are some restrictions: (1) All bind() calls involved must have a non-zero service ID. (2) The service IDs must all be different. (3) The rest of the address (notably the transport part) must be the same in all (a single UDP socket is shared). (4) This must be done before listen() or sendmsg() is called. This allows someone to connect to the service socket with different service IDs and lays the foundation for service upgrading. The service ID used by an incoming call can be extracted from the msg_name returned by recvmsg(). Signed-off-by: David Howells <dhowells@redhat.com>
2017-06-05	rxrpc: Separate the connection's protocol service ID from the lookup ID	David Howells	10	-13/+19
	Keep the rxrpc_connection struct's idea of the service ID that is exposed in the protocol separate from the service ID that's used as a lookup key. This allows the protocol service ID on a client connection to get upgraded without making the connection unfindable for other client calls that also would like to use the upgraded connection. The connection's actual service ID is then returned through recvmsg() by way of msg_name. Whilst we're at it, we get rid of the last_service_id field from each channel. The service ID is per-connection, not per-call and an entire connection is upgraded in one go. Signed-off-by: David Howells <dhowells@redhat.com>
2017-06-04	mlxsw: spectrum_router: Align RIF index allocation with existing code	Ido Schimmel	1	-12/+12
	The way we usually allocate an index is by letting the allocation function return an error instead of an invalid index. Do the same for RIF index. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04	mlxsw: Fix typo inside enumeration	Ido Schimmel	2	-4/+4
	Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04	mlxsw: spectrum: Tidy up header file	Ido Schimmel	1	-22/+25
	Make it clear where functions are defined and move misplaced declaration to their correct place. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04	mlxsw: spectrum: Rename the firmware file	Yotam Gigi	1	-1/+1
	Change the firmware file name to be in "mellanox" directory. This commit is a followup to the linux-firmware commit a4c72696f5f4 ("Mellanox: Add firmware for mlxsw_spectrum") Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04	qede: VF XDP support	Mintz, Yuval	1	-3/+31
	This introduces 2 changes needed for XDP to be supported for VFs: a. On VF-side, publish the NDO based on qed outputs b. On PF-side, request qed to allocate sufficient cids per-VF to allow the child vfs to support it Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04	qed: VF XDP support	Mintz, Yuval	4	-7/+40
	The final addition on the qed front - - VFs would now require their PFs to provide multiple CIDs - Based on the availability of connections from PF, determine whether XDP is feasible and share it with qede via dev_info. Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04	qed: VFs to try utilizing the doorbell bar	Mintz, Yuval	7	-72/+231
	VFs are currently not mapping their doorbell bar, instead relying on the small doorbell window they have in their limited regview bar. In order to increase the number of possible Tx connections [queues] employeed by VF past 16, we need to start using the doorbell bar if one such is exposed - VF would communicate this fact to PF which would return the size-bar internally configured into chip, according to which the VF would decide whether to actually utilize the doorbell bar. Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04	qed: Multiple qzone queues for VFs	Mintz, Yuval	7	-35/+215
	This adds the infrastructure for supporting VFs that want to open multiple transmission queues on the same queue-zone. At this point, there are no VFs that actually request this functionality, but later patches would remedy that. a. VF and PF would communicate the capability during ACQUIRE; Legacy VFs would continue on behaving as they do today b. PF would communicate number of supported CIDs to the VF and would enforce said limitation c. Whenever VF passes a request for a given queue configuration it would also pass an associated index within said queue-zone Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04	qed: IOV db support multiple queues per qzone	Mintz, Yuval	4	-72/+123
	Allow the infrastructure a PF maintains for each one of its VFs to support multiple queue-cids on a single queue-zone. Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04	qed: Make VF legacy a bitfield	Mintz, Yuval	3	-21/+33
	Until now we used to have a single VF legacy compatibility mode, one that affected the place of the Rx producers of those VFs [mostly]. As PF would soon support allocating CIDs for VFs instead of having a static CID<->queue configuration for them, we'll need to have an additional legacy mode since existing VFs would need to continue on using the older mode of operation. Change the infrastrucutre so that the legacy would be able to indicate which of the legacy behaviors is needed for a given VF. Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04	qed: Assign a unique per-queue index to queue-cid	Mintz, Yuval	4	-4/+88
	When a queue-cid is allocated, assign an index inside that's CID's queue-zone. For PFs and VFS, this number is going to be unique and derive from a per-queue-zone bitmap, while for PF's VFs queues the number is currently going to constant; Later, we'd add the capability of a VF to communicate such an index to its PF. Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04	qed: Pass vf_params when creating a queue-cid	Mintz, Yuval	3	-45/+95
	We're going to need additional information for queue-cids that a PF creates for its VFs, so start by refactoring existing logic used for initializing said struct into receiving a structure encapsulating the VF-specific information that needs to be provided. This also introduces QED_QUEUE_CID_SELF - each queue-cid would hold an indication to whether it belongs to the hw-function holding it [whether that's a PF or a VF], or else what's the VF id it belongs to. Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04	qed*: L2 interface to use the SB structures directly	Mintz, Yuval	6	-32/+52
	Part of an effort of a cleaner seperation between qed and the protocol drivers, the L2 interface is to use the SB structure for initialization purposes opaquely. Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04	qed: Create L2 queue database	Mintz, Yuval	6	-3/+133
	First step in allowing a single PF/VF to open multiple queues on the same queue zone is to add per-hwfn database of queue-cids as a two-dimensional array where entry would be according to [queue zone][internal index]. Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04	qed: Add bitmaps for VF CIDs	Mintz, Yuval	2	-74/+202
	Each PF has a bitmap for its own ranges of CIDs, to allow easy grabbing of an available CID when such is needed. But VFs are not using the same mechanism, instead relying on hard-coded CIDs [ queue-index == cid ]. As an infrastructure step toward increasing number of CIDs of VFs, the PF is going to maintain bitmaps for the VF CIDs as well - the bitmaps would be per-VF and the ranges would be the same [in HW all VFs of a given PF have the same mapping of CIDs, and the HW is capable of distinguishing between those according to the VF index] Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04	virtio_net: check return value of skb_to_sgvec always	Jason A. Donenfeld	1	-2/+7
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04	macsec: check return value of skb_to_sgvec always	Jason A. Donenfeld	1	-2/+11
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04	rxrpc: check return value of skb_to_sgvec always	Jason A. Donenfeld	1	-5/+14
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04	ipsec: check return value of skb_to_sgvec always	Jason A. Donenfeld	4	-18/+38
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04	skbuff: return -EMSGSIZE in skb_to_sgvec to prevent overflow	Jason A. Donenfeld	2	-27/+46
	This is a defense-in-depth measure in response to bugs like 4d6fa57b4dab ("macsec: avoid heap overflow in skb_to_sgvec"). There's not only a potential overflow of sglist items, but also a stack overflow potential, so we fix this by limiting the amount of recursion this function is allowed to do. Not actually providing a bounded base case is a future disaster that we can easily avoid here. As a small matter of house keeping, we take this opportunity to move the documentation comment over the actual function the documentation is for. While this could be implemented by using an explicit stack of skbuffs, when implementing this, the function complexity increased considerably, and I don't think such complexity and bloat is actually worth it. So, instead I built this and tested it on x86, x86_64, ARM, ARM64, and MIPS, and measured the stack usage there. I also reverted the recent MIPS changes that give it a separate IRQ stack, so that I could experience some worst-case situations. I found that limiting it to 24 layers deep yielded a good stack usage with room for safety, as well as being much deeper than any driver actually ever creates. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: David Howells <dhowells@redhat.com> Cc: Sabrina Dubroca <sd@queasysnail.net> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04	bpf: update perf event helper functions documentation	Teng Qin	2	-8/+14
	This commit updates documentation of the bpf_perf_event_output and bpf_perf_event_read helpers to match their implementation. Signed-off-by: Teng Qin <qinteng@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04	samples/bpf: add tests for more perf event types	Teng Qin	4	-56/+228
	$ trace_event tests attaching BPF program to HW_CPU_CYCLES, SW_CPU_CLOCK, HW_CACHE_L1D and other events. It runs 'dd' in the background while bpf program collects user and kernel stack trace on counter overflow. User space expects to see sys_read and sys_write in the kernel stack. $ tracex6 tests reading of various perf counters from BPF program. Both tests were refactored to increase coverage and be more accurate. Signed-off-by: Teng Qin <qinteng@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04	perf, bpf: Add BPF support to all perf_event types	Alexei Starovoitov	4	-56/+48
	Allow BPF_PROG_TYPE_PERF_EVENT program types to attach to all perf_event types, including HW_CACHE, RAW, and dynamic pmu events. Only tracepoint/kprobe events are treated differently which require BPF_PROG_TYPE_TRACEPOINT/BPF_PROG_TYPE_KPROBE program types accordingly. Also add support for reading all event counters using bpf_perf_event_read() helper. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04	neigh: Really delete an arp/neigh entry on "ip neigh delete" or "arp -d"	Sowmini Varadhan	3	-11/+54
	The command # arp -s 62.2.0.1 a:b:c:d:e:f dev eth2 adds an entry like the following (listed by "arp -an") ? (62.2.0.1) at 0a:0b:0c:0d:0e:0f [ether] PERM on eth2 but the symmetric deletion command # arp -i eth2 -d 62.2.0.1 does not remove the PERM entry from the table, and instead leaves behind ? (62.2.0.1) at <incomplete> on eth2 The reason is that there is a refcnt of 1 for the arp_tbl itself (neigh_alloc starts off the entry with a refcnt of 1), thus the neigh_release() call from arp_invalidate() will (at best) just decrement the ref to 1, but will never actually free it from the table. To fix this, we need to do something like neigh_forced_gc: if the refcnt is 1 (i.e., on the table's ref), remove the entry from the table and free it. This patch refactors and shares common code between neigh_forced_gc and the newly added neigh_remove_one. A similar issue exists for IPv6 Neighbor Cache entries, and is fixed in a similar manner by this patch. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Reviewed-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04	net: phy: smsc: Implement PHY statistics	Andrew Lunn	1	-0/+72
	Most of the PHYs supported by the SMSC driver have a counter of symbol errors. This is 16 bit wide and wraps around when it reaches its maximum value. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-By: Woojung Huh <Woojung.Huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04	net: dsa: mv88e6xxx: mv88e6161 uses mv88e6320 stats snapshot	Andrew Lunn	1	-2/+2
	The mv88e6161 was using the wrong method to perform statistics snapshot. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04	net: dsa: mv88e6xxx: 6161 uses global 2 for PHY access	Andrew Lunn	1	-4/+4
	Access to the internal PHYs of the 6161 and 6123 go through global 2 SMI registers. Fix the ops structure. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>