aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/net (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2016-06-16vmxnet3: update to version 3Shrikrishna Khare2-3/+8
With all vmxnet3 version 3 changes incorporated in the vmxnet3 driver, the driver can configure emulation to run at vmxnet3 version 3, provided the emulation advertises support for version 3. Signed-off-by: Shrikrishna Khare <skhare@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16vmxnet3: introduce command to register memory regionShrikrishna Khare1-0/+17
In vmxnet3 version 3, the emulation added support for the vmxnet3 driver to communicate information about the memory regions the driver will use for rx/tx buffers. The driver can also indicate which rx/tx queue the memory region is applicable for. If this information is communicated to the emulation, the emulation will always keep these memory regions mapped, thereby avoiding the mapping/unmapping overhead for every packet. Currently, Linux vmxnet3 driver does not leverage this capability. The feasibility of using this approach for the Linux vmxnet3 driver will be investigated independently and if possible, will be part of a different patch. This patch only exposes the emulation capability to the driver (vmxnet3_defs.h is identical between the driver and the emulation). Signed-off-by: Guolin Yang <gyang@vmware.com> Signed-off-by: Shrikrishna Khare <skhare@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16vmxnet3: add support for get_coalesce, set_coalesce ethtool operationsShrikrishna Khare4-1/+253
The emulation supports a variety of coalescing modes viz. disabled (no coalescing), adaptive, static (number of packets to batch before raising an interrupt), rate based (number of interrupts per second). This patch implements get_coalesce and set_coalesce methods to allow querying and configuring different coalescing modes. Signed-off-by: Keyong Sun <sunk@vmware.com> Signed-off-by: Manoj Tammali <tammalim@vmware.com> Signed-off-by: Shrikrishna Khare <skhare@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16vmxnet3: add receive data ring supportShrikrishna Khare4-45/+193
vmxnet3 driver preallocates buffers for receiving packets and posts the buffers to the emulation. In order to deliver a received packet to the guest, the emulation must map buffer(s) and copy the packet into it. To avoid this memory mapping overhead, this patch introduces the receive data ring - a set of small sized buffers that are always mapped by the emulation. If a packet fits into the receive data ring buffer, the emulation delivers the packet via the receive data ring (which must be copied by the guest driver), or else the usual receive path is used. Receive Data Ring buffer length is configurable via ethtool -G ethX rx-mini Signed-off-by: Shrikrishna Khare <skhare@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16vmxnet3: allow variable length transmit data ring bufferShrikrishna Khare4-19/+64
vmxnet3 driver supports transmit data ring viz. a set of fixed size buffers used by the driver to copy packet headers. Small packets that fit these buffers are copied into these buffers entirely. Currently this buffer size of fixed at 128 bytes. This patch extends transmit data ring implementation to allow variable length transmit data ring buffers. The length of the buffer is read from the emulation during initialization. Signed-off-by: Sriram Rangarajan <rangarajans@vmware.com> Signed-off-by: Shrikrishna Khare <skhare@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16vmxnet3: introduce generalized command interface to configure the deviceShrikrishna Khare1-1/+21
Shared memory is used to exchange information between the vmxnet3 driver and the emulation. In order to request emulation to perform a task, the driver first populates specific fields in this shared memory and then issues corresponding command by writing to the command register(CMD). The layout of the shared memory was defined by vmxnet3 version 1 and cannot be extended for every new command without breaking backward compatibility. To address this problem, in vmxnet3 version 3, the emulation repurposed a reserved field in the shared memory to represent command information instead. For new commands, the driver first populates the command information field in the shared memory and then issues the command. The emulation interprets the data written to the command information depending on the type of the command. This patch exposes this capability to the driver. Signed-off-by: Guolin Yang <gyang@vmware.com> Signed-off-by: Shrikrishna Khare <skhare@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16vmxnet3: prepare for version 3 changesShrikrishna Khare6-20/+36
vmxnet3 is currently at version 2, but some command definitions from previous vmxnet3 versions are missing. Add those definitions before moving to version 3. Also, introduce utility macros for vmxnet3 version comparison and update Copyright information and Maintained by. Signed-off-by: Shrikrishna Khare <skhare@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16s390/qeth: fix indentation in qeth_l3_arp_querySebastian Ott1-13/+12
gcc-6 warns about obviously wrong indentation: drivers/s390/net/qeth_l3_main.c: In function 'qeth_l3_arp_query': drivers/s390/net/qeth_l3_main.c:2315:3: warning: this 'if' clause does not guard... [-Wmisleading-indentation] if (copy_to_user(udata, qinfo.udata, 4)) ^~ drivers/s390/net/qeth_l3_main.c:2317:4: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the 'if' goto free_and_out; ^~~~ Although this particular case is harmless, fix the indentation to get rid of that warning and improve readability. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16qeth: omit outbound queue 3 for unicast packets in Priority Queuing on HiperSocketsHans Wippel2-4/+17
On HiperSockets only outbound queues 0 to 2 are available for unicast packets. Current Priority Queuing implementation in the qeth driver puts outgoing packets in outbound queues 0 to 3. This puts outgoing unicast packets into outbound queue 2 instead of outbound queue 3 when using Priority Queuing on a HiperSocket. Additionally, the default outbound queue cannot be set to outbound queue 3 on HiperSockets. Signed-off-by: Hans Wippel <hwippel@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16qeth: improve set_features error handlingHans Wippel1-11/+27
The function set_features is called to configure network device features on the hardware. If errors occur, the network device features should reflect the changed hardware state and the function should return an error in order to notify the user. In case of an error, the current implementation does not necessarily save the changed hardware state in the network device features before an error is returned. This patch improves error handling by saving features, that could be changed, to the network device features before returning an error. If the device is not running, an additional check in fix_features removes features, that require hardware changes, before they are passed to set_features. Thus, the corresponding check was removed in set_features. Signed-off-by: Hans Wippel <hwippel@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16qeth: add network device features for VLAN devicesHans Wippel2-2/+10
Network device features indicate the capabilities of network devices (e.g., TX checksum offloading and TSO) and their configuration state. Additional network device features (vlan_features) indicate for each network device, which capabilities can be used on VLAN devices, that are configured on the respective base network device. In the current qeth implementation, network device features are only set for the base network devices and not for the VLAN devices. Thus, features like TX checksum offloading cannot be used on VLAN devices. This patch adds network device features to vlan_features, so they can be used by VLAN devices. Signed-off-by: Hans Wippel <hwippel@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16qeth layer 2 and layer 3 common feature handlingThomas Richter4-198/+111
This patch introduces a common set of fix_features and set_features functions for layer 2 and layer 3. The RX, TX and TSO offload functionality on the OSA card is enabled using ethtool at user's request and not at device initialization as done before. For layer 3 the RX checksum offloading is disabled at device initialization time. Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16qeth: optimize IP handling in rx_mode callbackLakhvich Dmitriy7-458/+482
In layer3 mode of the qeth driver, multicast IP addresses from struct net_device and other type of IP addresses from other sources require mapping to the OSA-card. This patch simplifies the IP address mapping logic, and changes imple- mentation of ndo_set_rx_mode callback and ip notifier events. Addresses are stored in private hashtables instead of lists now. It allows hardware registration/removal for new/deleted multicast addresses only. Signed-off-by: Lakhvich Dmitriy <ldmitriy@ru.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Reviewed-by: Evgeny Cherkashin <Eugene.Crosser@ru.ibm.com> Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16qeth: introduce linearization fail count to statsEugene Crosser4-8/+23
When skb data touches too many pages, skb_linearize() is called opportunistically in the hope that less pages will be required for a big linear buffer than for multiple fragments. This patch intoduces a separate counter in ethtool statistics structure representing _failed_ linearization attempts. Signed-off-by: Eugene Crosser <Eugene.Crosser@ru.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Reviewed-by: Lakhvich Dmitriy <ldmitriy@ru.ibm.com> Reviewed-by: Thomas Richter <tmricht@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16qeth: enable scatter/gather by defaultEugene Crosser2-3/+4
Set scatter/gather ON by default on OSA, for both layer 2 and layer 3 modes. We always use fragmentation over QDIO anyway, so let the upper layers of the stack take advantage of that. Signed-off-by: Eugene Crosser <Eugene.Crosser@ru.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Reviewed-by: Lakhvich Dmitriy <ldmitriy@ru.ibm.com> Reviewed-by: Thomas Richter <tmricht@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16qeth: enable scatter/gather in layer 2 modeEugene Crosser1-1/+25
The patch enables NETIF_F_SG flag for OSA in layer 2 mode. It also adds performance accounting for fragmented sends, adds a conditional skb_linearize() attempt if the skb had too many fragments for QDIO SBAL, and fills netdevice->gso_* attributes. Signed-off-by: Eugene Crosser <Eugene.Crosser@ru.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Reviewed-by: Lakhvich Dmitriy <ldmitriy@ru.ibm.com> Reviewed-by: Thomas Richter <tmricht@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16qeth: fill netdevice->gso_* attributes accuratelyEugene Crosser1-1/+3
Use QETH_MAX_BUFFER_ELEMENTS(card) instead of constant 16. Also fill gso_max_segs and gso_min_segs. Signed-off-by: Eugene Crosser <Eugene.Crosser@ru.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16qeth: clean up condition when tso is usedEugene Crosser2-20/+30
Make conditions under which TSO is activated more stringent. Make calculation of SBALEs required for the skb more accurate. Signed-off-by: Eugene Crosser <Eugene.Crosser@ru.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16qeth: refactor calculation of SBALE countEugene Crosser3-41/+85
Rewrite the functions that calculate the required number of buffer elements needed to represent SKB data, to make them hopefully more comprehensible. Plus a few cleanups. Signed-off-by: Eugene Crosser <Eugene.Crosser@ru.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16qeth: Include error message for "OS Mismatch"Eugene Crosser1-0/+6
Having understood the semantics of BRIDGEPORT error code 0x0010, we can introduce a meaningful error message. Signed-off-by: Eugene Crosser <Eugene.Crosser@ru.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16net: xfrm: fix old-style declarationArnd Bergmann2-6/+6
Modern C standards expect the '__inline__' keyword to come before the return type in a declaration, and we get a couple of warnings for this with "make W=1" in the xfrm{4,6}_policy.c files: net/ipv6/xfrm6_policy.c:369:1: error: 'inline' is not at beginning of declaration [-Werror=old-style-declaration] static int inline xfrm6_net_sysctl_init(struct net *net) net/ipv6/xfrm6_policy.c:374:1: error: 'inline' is not at beginning of declaration [-Werror=old-style-declaration] static void inline xfrm6_net_sysctl_exit(struct net *net) net/ipv4/xfrm4_policy.c:339:1: error: 'inline' is not at beginning of declaration [-Werror=old-style-declaration] static int inline xfrm4_net_sysctl_init(struct net *net) net/ipv4/xfrm4_policy.c:344:1: error: 'inline' is not at beginning of declaration [-Werror=old-style-declaration] static void inline xfrm4_net_sysctl_exit(struct net *net) Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16net: gianfar: fix old-style declarationArnd Bergmann1-1/+1
Modern C standards expect the '__inline__' keyword to come before the return type in a declaration, and we get a warning for this with "make W=1": drivers/net/ethernet/freescale/gianfar.c:2278:1: error: 'inline' is not at beginning of declaration [-Werror=old-style-declaration] Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16isdn: eicon: fix old-style declarationsArnd Bergmann2-9/+9
Modern C standards expect the '__inline__' keyword to come before the return type in a declaration, and we get many warnings for this with "make W=1" because the eicon driver has this in a header file: eicon/divasmain.c:448:1: error: '__inline__' is not at beginning of declaration [-Werror=old-style-declaration] eicon/divasmain.c:453:1: error: '__inline__' is not at beginning of declaration [-Werror=old-style-declaration] eicon/divasmain.c:458:1: error: '__inline__' is not at beginning of declaration [-Werror=old-style-declaration] eicon/divasmain.c:463:1: error: '__inline__' is not at beginning of declaration [-Werror=old-style-declaration] eicon/divasmain.c:468:1: error: '__inline__' is not at beginning of declaration [-Werror=old-style-declaration] eicon/divasmain.c:473:1: error: '__inline__' is not at beginning of declaration [-Werror=old-style-declaration] eicon/platform.h:274:1: error: '__inline__' is not at beginning of declaration [-Werror=old-style-declaration] eicon/platform.h:280:1: error: '__inline__' is not at beginning of declaration [-Werror=old-style-declaration] A similar warning gets printed for the diva_os_register_io_port() declaration, because 'register' is interpreted as a keyword instead of a variable name: In file included from eicon/diva_didd.c:21:0: eicon/platform.h:206:1: error: 'register' is not at beginning of declaration [-Werror=old-style-declaration] Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16hamradio: baycom: fix old-style declarationArnd Bergmann1-3/+3
Modern C standards expect the '__inline__' keyword to come before the return type in a declaration, and we get a warning for this with "make W=1": drivers/net/hamradio/baycom_par.c:159:1: error: '__inline__' is not at beginning of declaration [-Werror=old-style-declaration] For consistency with other drivers, I'm changing '__inline__' to 'inline' at the same time. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16net: the space is required after ','Wei Tang1-6/+6
The space is missing after ',', and this will introduce much more noise when checking patch around. Signed-off-by: Wei Tang <tangwei@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16net: do not initialise statics to 0Wei Tang1-1/+1
This patch fixes the checkpatch.pl error to dev.c: ERROR: do not initialise statics to 0 Signed-off-by: Wei Tang <tangwei@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16net: tlan: don't set unused function argumentArnd Bergmann1-1/+0
We get a warning for tlan_handle_tx_eoc when building with "make W=1" drivers/net/ethernet/ti/tlan.c: In function 'tlan_handle_tx_eoc': drivers/net/ethernet/ti/tlan.c:1647:59: error: parameter 'host_int' set but not used [-Werror=unused-but-set-parameter] static u32 tlan_handle_tx_eoc(struct net_device *dev, u16 host_int) This is harmless, but removing the unused assignment lets us avoid the warning with no downside. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16net: qlcnic: don't set unused function argumentArnd Bergmann1-1/+0
We get a warning for qlcnic_83xx_get_mac_address when building with "make W=1": drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c: In function 'qlcnic_83xx_get_mac_address': drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c:2156:8: error: parameter 'function' set but not used [-Werror=unused-but-set-parameter] Clearly this is harmless, but there is also no point for setting the variable, so we can simply remove the assignment. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Rajesh Borundia <rajesh.borundia@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16dsa: b53: fix big-endian register accessArnd Bergmann1-12/+8
The b53 dsa register access confusingly uses __raw register accessors when both the CPU and the device are big-endian, but it uses little- endian accessors when the same device is used from a little-endian CPU, which makes no sense. This uses normal accessors in device-endianess all the time, which will work in all four combinations of register and CPU endianess, and it will have the same barrier semantics in all cases. This also seems to take care of a (false positive) warning I'm getting: drivers/net/dsa/b53/b53_mmap.c: In function 'b53_mmap_read64': drivers/net/dsa/b53/b53_mmap.c:109:10: error: 'hi' may be used uninitialized in this function [-Werror=maybe-uninitialized] *val = ((u64)hi << 32) | lo; I originally planned to submit another patch for that warning and did this one as a preparation cleanup, but it does seem to be sufficient by itself. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16mpls: allow routes on ipgre devicesSimon Horman1-3/+4
This appears to be necessary and sufficient to provide MPLS in GRE (RFC4023) support. This can be used by establishing an ipgre tunnel device and then routing MPLS over it. The following example will forward MPLS frames received with an outermost MPLS label 100 over tun1, a GRE tunnel. The forwarded packet will have the outermost MPLS LSE removed and two new LSEs added with labels 200 (outermost) and 300 (next). ip link add name tun1 type gre remote 10.0.99.193 local 10.0.99.192 ttl 225 ip link set up dev tun1 ip addr add 10.0.98.192/24 dev tun1 ip route sh echo 1 > /proc/sys/net/mpls/conf/eth0/input echo 101 > /proc/sys/net/mpls/platform_labels ip -f mpls route add 100 as 200/300 via inet 10.0.98.193 ip -f mpls route sh Also remove unnecessary braces. Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16r8152: modify the check of the flag of PHY_RESET in set_speed functionhayeswang1-2/+2
In set_speed(), BMCR_RESET would be set when the flag of PHY_RESET is set. Use BMCR_RESET to replace testing the flag of PHY_RESET. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16net: ethernet: ax88796: use phy_ethtool_{get|set}_link_ksettingsPhilippe Reynes1-22/+2
There are two generics functions phy_ethtool_{get|set}_link_ksettings, so we can use them instead of defining the same code in the driver. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16net: ethernet: ax88796: use phydev from struct net_devicePhilippe Reynes1-13/+7
The private structure contain a pointer to phydev, but the structure net_device already contain such pointer. So we can remove the pointer phydev in the private structure, and update the driver to use the one contained in struct net_device. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16ARM: dts: rockchip: add interrupt for Wake-on-Lan on RK3288Vincent Palatin1-2/+3
In order to use Wake-on-Lan on RK3288 integrated MAC, we need to wake-up the CPU on the PMT interrupt when the MAC and the PHY are in low power mode. Adding the interrupt declaration. Signed-off-by: Vincent Palatin <vpalatin@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16net: stmmac: dwmac-rk: keep the PHY up for WoLVincent Palatin1-5/+43
When suspending the machine, do not shutdown the external PHY by cutting its regulator in the mac platform driver suspend code if Wake-on-Lan is enabled, else it cannot wake us up. In order to do this, split the suspend/resume callbacks from the init/exit callbacks, so we can condition the power-down on the lack of need to wake-up from the LAN but do it unconditionally when unloading the module. Signed-off-by: Vincent Palatin <vpalatin@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16net: stmmac: allow to split suspend/resume from init/exit callbacksVincent Palatin2-2/+8
Let the stmmac platform drivers provide dedicated suspend and resume callbacks rather than always re-using the init and exits callbacks. If the driver does not provide the suspend or resume callback, we fall back to the old behavior trying to use exit or init. This allows a specific platform to perform only a partial power-down on suspend if Wake-on-Lan is enabled but always perform the full shutdown sequence if the module is unloaded. Signed-off-by: Vincent Palatin <vpalatin@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16sctp: change sk state to CLOSED instead of CLOSING in sctp_sock_migrateXin Long1-1/+1
Commit d46e416c11c8 ("sctp: sctp should change socket state when shutdown is received") may set sk_state CLOSING in sctp_sock_migrate, but inet_accept doesn't allow the sk_state other than ESTABLISHED/ CLOSED for sctp. So we will change sk_state to CLOSED, instead of CLOSING, as actually sk is closed already there. Fixes: d46e416c11c8 ("sctp: sctp should change socket state when shutdown is received") Reported-by: Ye Xiaolong <xiaolong.ye@intel.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16netlink: Add comment to warn about deprecated netlink rings attribute requestFabien Siron1-0/+1
Signed-off-by: Fabien Siron <fabien.siron@epita.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15bpf, maps: flush own entries on perf map releaseDaniel Borkmann3-38/+91
The behavior of perf event arrays are quite different from all others as they are tightly coupled to perf event fds, f.e. shown recently by commit e03e7ee34fdd ("perf/bpf: Convert perf_event_array to use struct file") to make refcounting on perf event more robust. A remaining issue that the current code still has is that since additions to the perf event array take a reference on the struct file via perf_event_get() and are only released via fput() (that cleans up the perf event eventually via perf_event_release_kernel()) when the element is either manually removed from the map from user space or automatically when the last reference on the perf event map is dropped. However, this leads us to dangling struct file's when the map gets pinned after the application owning the perf event descriptor exits, and since the struct file reference will in such case only be manually dropped or via pinned file removal, it leads to the perf event living longer than necessary, consuming needlessly resources for that time. Relations between perf event fds and bpf perf event map fds can be rather complex. F.e. maps can act as demuxers among different perf event fds that can possibly be owned by different threads and based on the index selection from the program, events get dispatched to one of the per-cpu fd endpoints. One perf event fd (or, rather a per-cpu set of them) can also live in multiple perf event maps at the same time, listening for events. Also, another requirement is that perf event fds can get closed from application side after they have been attached to the perf event map, so that on exit perf event map will take care of dropping their references eventually. Likewise, when such maps are pinned, the intended behavior is that a user application does bpf_obj_get(), puts its fds in there and on exit when fd is released, they are dropped from the map again, so the map acts rather as connector endpoint. This also makes perf event maps inherently different from program arrays as described in more detail in commit c9da161c6517 ("bpf: fix clearing on persistent program array maps"). To tackle this, map entries are marked by the map struct file that added the element to the map. And when the last reference to that map struct file is released from user space, then the tracked entries are purged from the map. This is okay, because new map struct files instances resp. frontends to the anon inode are provided via bpf_map_new_fd() that is called when we invoke bpf_obj_get_user() for retrieving a pinned map, but also when an initial instance is created via map_create(). The rest is resolved by the vfs layer automatically for us by keeping reference count on the map's struct file. Any concurrent updates on the map slot are fine as well, it just means that perf_event_fd_array_release() needs to delete less of its own entires. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15bpf, maps: extend map_fd_get_ptr argumentsDaniel Borkmann3-10/+24
This patch extends map_fd_get_ptr() callback that is used by fd array maps, so that struct file pointer from the related map can be passed in. It's safe to remove map_update_elem() callback for the two maps since this is only allowed from syscall side, but not from eBPF programs for these two map types. Like in per-cpu map case, bpf_fd_array_map_update_elem() needs to be called directly here due to the extra argument. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15bpf, maps: add release callbackDaniel Borkmann2-2/+8
Add a release callback for maps that is invoked when the last reference to its struct file is gone and the struct file about to be released by vfs. The handler will be used by fd array maps. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15sfc: Fix VLAN filtering feature if vPort has VLAN_RESTRICT flagAndrew Rybchenko3-5/+74
If vPort has VLAN_RESTRICT flag, VLAN tagged traffic will not be delivered without corresponding Rx filters which may be proxied to and moderated by hypervisor. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15sfc: Update MCDI protocol definitionsEdward Cree1-48/+1279
Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15sfc: Disable VLAN filtering by default if not strictly requiredAndrew Rybchenko1-1/+9
If should be done after net_dev->hw_features initialization, to keep the feature there to be able to enable it later using ethtool. VLAN filtering is enforced and fixed if vPort requires usage of VLAN filters to receive tagged traffic. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15sfc: VLAN filters must only be created if the firmware supports this.Martin Habets2-0/+25
If it is not supported we simply disable the feature. For the feature to work we need firmware filter support for OUTER_VID + LOC_MAC and for OUTER_VID + LOC_MAC_IG. The low-latency firmware can match on OUTER_VID + LOC_MAC but not on OUTER_VID + LOC_MAC_IG. For the capture packet firmware it is the other way around. Only the full-feature variant can match on both combinations. Incorporates a fix by Andrew Rybchenko <Andrew.Rybchenko@oktetlabs.ru> in the net_dev->[hw_]features handling. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15sfc: Fix dup unknown multicast/unicast filters after datapath resetAndrew Rybchenko1-11/+69
Filter match flags are not unique criteria to be mapped to priority because of both unknown unicast and unknown multicast are mapped to LOC_MAC_IG. So, local MAC is required to map filter to priority. MCDI filter flags is unique criteria to find filter priority. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15sfc: Refactor checks for invalid filter IDEdward Cree1-26/+13
Nearly every time we call efx_ef10_filter_remove_unsafe, we first check for EFX_EF10_FILTER_ID_INVALID, in which case we do nothing. So move that check into the function, simplifying all the call sites. Also, change the return type to void, since none of the callers check it. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15sfc: Take mac_lock before calling efx_ef10_filter_table_probeMartin Habets3-2/+11
When trying to enslave an SFC interface to a bond the following BUG_ON was hit: kernel BUG [in ef10.c]! CPU: 0 PID: 4383 Comm: ifenslave Tainted: G ... Call Trace: efx_ef10_filter_add_vlan+0x121/0x180 [sfc] efx_ef10_filter_table_probe+0x2a2/0x4f0 [sfc] efx_ef10_set_mac_address+0x370/0x6d0 [sfc] efx_set_mac_address+0x7d/0x120 [sfc] dev_set_mac_address+0x43/0xa0 bond_enslave+0x337/0xea0 [bonding] This comes from function efx_ef10_filter_vlan_sync_rx_mode. To solve the bug we ensure the mac_lock is taken before calling efx_ef10_filter_add_vlan. But to avoid a priority inversion mac_lock must be taken before filter_sem. To satisfy these requirements we end up taking mac_lock in efx_ef10_vport_set_mac_address, efx_ef10_set_mac_address, efx_ef10_sriov_set_vf_vlan and efx_probe_filters. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15sfc: Implement ndo_vlan_rx_{add, kill}_vid() callbacksAndrew Rybchenko3-2/+124
Supports HW VLAN filtering, en/disabled using ethtool. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15sfc: Implement list of VLANs added over interfaceAndrew Rybchenko2-35/+295
Right now it contains dummy VLAN entry with unspecified VID only. The entry is used for the case when HW VLAN filtering is not used. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>