linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2016-09-06	vxlan: Update tx_errors statistics if vxlan_build_skb return err.	Haishuang Yan	1	-0/+1
	If vxlan_build_skb return err < 0, tx_errors should be also increased. Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-06	net/mlx4_en: protect ring->xdp_prog with rcu_read_lock	Brenden Blanco	3	-11/+19
	Depending on the preempt mode, the bpf_prog stored in xdp_prog may be freed despite the use of call_rcu inside bpf_prog_put. The situation is possible when running in PREEMPT_RCU=y mode, for instance, since the rcu callback for destroying the bpf prog can run even during the bh handling in the mlx4 rx path. Several options were considered before this patch was settled on: Add a napi_synchronize loop in mlx4_xdp_set, which would occur after all of the rings are updated with the new program. This approach has the disadvantage that as the number of rings increases, the speed of update will slow down significantly due to napi_synchronize's msleep(1). Add a new rcu_head in bpf_prog_aux, to be used by a new bpf_prog_put_bh. The action of the bpf_prog_put_bh would be to then call bpf_prog_put later. Those drivers that consume a bpf prog in a bh context (like mlx4) would then use the bpf_prog_put_bh instead when the ring is up. This has the problem of complexity, in maintaining proper refcnts and rcu lists, and would likely be harder to review. In addition, this approach to freeing must be exclusive with other frees of the bpf prog, for instance a _bh prog must not be referenced from a prog array that is consumed by a non-_bh prog. The placement of rcu_read_lock in this patch is functionally the same as putting an rcu_read_lock in napi_poll. Actually doing so could be a potentially controversial change, but would bring the implementation in line with sk_busy_loop (though of course the nature of those two paths is substantially different), and would also avoid future copy/paste problems with future supporters of XDP. Still, this patch does not take that opinionated option. Testing was done with kernels in either PREEMPT_RCU=y or CONFIG_PREEMPT_VOLUNTARY=y+PREEMPT_RCU=n modes, with neither exhibiting any drawback. With PREEMPT_RCU=n, the extra call to rcu_read_lock did not show up in the perf report whatsoever, and with PREEMPT_RCU=y the overhead of rcu_read_lock (according to perf) was the same before/after. In the rx path, rcu_read_lock is eventually called for every packet from netif_receive_skb_internal, so the napi poll call's rcu_read_lock is easily amortized. v2: Remove extra rcu_read_lock in mlx4_en_process_rx_cq body Annotate xdp_prog with __rcu, and convert all usages to rcu_assign or rcu_dereference[_protected] as appropriate. Add explicit mutex lock around rcu_assign instead of xchg loop. Fixes: d576acf0a22 ("net/mlx4_en: add page recycle to prepare rx ring for tx support") Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-06	net: ethernet: mediatek: enhance RX path by aggregating more SKBs into NAPI	Sean Wang	1	-14/+17
	The patch adds support for aggregating more SKBs feed into NAPI in order to get more benefits from generic receive offload (GRO) by peeking at the RX ring status and moving more packets right before returning from NAPI RX polling handler if NAPI budgets are still available and some packets already present in hardware. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-06	net: ethernet: mediatek: enhance RX path by reducing the frequency of the memory barrier used	Sean Wang	1	-5/+6
	The patch makes move wmb() to outside the loop that could help RX path handling more faster although that RX descriptors aren't freed for DMA to use as soon as possible, but based on my experiment and the result shows it still can reach about 943Mbpis without performance drop that is tested based on the setup with one port using Giga PHY and 256 RX descriptors for DMA to move. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-06	hso: Convert printk to pr_<level>	Joe Perches	1	-10/+11
	Use a more common logging style Miscellanea: o Add pr_fmt to prefix each output message o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-06	hso: Use a more common logging style	Joe Perches	1	-53/+44
	Macros that end in an underscore are just odd. Add hso_dbg(level, fmt, ...) and use it everwhere instead. Several uses had additional unnecessary newlines as the macro added a newline. Remove the newline from the macro and add newlines to each use as appropriate. Remove now unused D<digit> macros. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-06	smsc95xx: Add mdix control via ethtool	Woojung Huh	1	-3/+106
	Add mdix control through ethtool. Signed-off-by: Woojung Huh <Woojung.huh@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-06	smsc95xx: Add register define	Woojung Huh	1	-0/+8
	Add STRAP_STATUS defines. Signed-off-by: Woojung Huh <Woojung.huh@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-06	smsc95xx: Add maintainer	Woojung Huh	1	-0/+1
	Add Microchip Linux Driver Support as maintainer because this driver is maintaining by Microchip. Signed-off-by: Woojung Huh <Woojung.huh@gmail.com> Acked-by: Steve Glendinning <steve.glendinning@shawell.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-06	net: dsa: mv88e6xxx: make global2 code optional	Vivien Didelot	4	-1/+74
	Since not every chip has a Global2 set of registers, make its support optional, in which case the related functions will return -EOPNOTSUPP. This also allows to reduce the size of the mv88e6xxx driver for devices such as home routers embedding Ethernet chips without Global2 support. It is present on most recent chips, thus enable its support by default. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-06	net: dsa: mv88e6xxx: move Global2 code	Vivien Didelot	5	-450/+521
	Marvell chips are composed of multiple SMI devices. One of them at address 0x1C is called Global2. It provides an extended set of registers, used for interrupt control, EEPROM access, indirect PHY access (to bypass the PHY Polling Unit) and cross-chip related setup. Most chips have it, but some others don't (older ones such as 6060). Now that its related code is isolated in mv88e6xxx_g2_* functions, move it to its own global2.c file, making most of its setup code static. Document each registers in the meantime. Its compilation can be later avoided for chips without such registers. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-06	net: dsa: mv88e6xxx: fix module naming	Vivien Didelot	1	-1/+2
	Since the mv88e6xxx.c file has been renamed, the driver compiled as a module is called chip.ko instead of mv88e6xxx.ko. Fix this. Fixes: fad09c73c270 ("net: dsa: mv88e6xxx: rename single-chip support") Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-04	rxrpc Move enum rxrpc_command to sendmsg.c	David Howells	2	-7/+7
	Move enum rxrpc_command to sendmsg.c as it's now only used in that file. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-04	fs/afs/flock: Remove deprecated create_singlethread_workqueue	Bhaktipriya Shridhar	1	-2/+2
	The workqueue "afs_lock_manager" queues work item &vnode->lock_work, per vnode. Since there can be multiple vnodes and since their work items can be executed concurrently, alloc_workqueue has been used to replace the deprecated create_singlethread_workqueue instance. The WQ_MEM_RECLAIM flag has been set to ensure forward progress under memory pressure because the workqueue is being used on a memory reclaim path. Since there are fixed number of work items, explicit concurrency limit is unnecessary here. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-04	rxrpc: Rearrange net/rxrpc/sendmsg.c	David Howells	1	-281/+277
	Rearrange net/rxrpc/sendmsg.c to be in a more logical order. This makes it easier to follow and eliminates forward declarations. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-04	fs/afs/callback: Remove deprecated create_singlethread_workqueue	Bhaktipriya Shridhar	1	-2/+2
	The workqueue "afs_callback_update_worker" queues multiple work items viz &vnode->cb_broken_work, &server->cb_break_work which require strict execution ordering. Hence, an ordered dedicated workqueue has been used. Since the workqueue is being used on a memory reclaim path, WQ_MEM_RECLAIM has been set to ensure forward progress under memory pressure. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-04	rxrpc: Split sendmsg from packet transmission code	David Howells	6	-635/+659
	Split the sendmsg code from the packet transmission code (mostly to be found in output.c). Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-04	fs/afs/rxrpc: Remove deprecated create_singlethread_workqueue	Bhaktipriya Shridhar	1	-1/+1
	The workqueue "afs_async_calls" queues work item &call->async_work per afs_call. Since there could be multiple calls and since these calls can be run concurrently, alloc_workqueue has been used to replace the deprecated create_singlethread_workqueue instance. The WQ_MEM_RECLAIM flag has been set to ensure forward progress under memory pressure because the workqueue is being used on a memory reclaim path. Since there are fixed number of work items, explicit concurrency limit is unnecessary here. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-04	fs/afs/vlocation: Remove deprecated create_singlethread_workqueue	Bhaktipriya Shridhar	1	-2/+2
	The workqueue "afs_vlocation_update_worker" queues a single work item &afs_vlocation_update and hence it doesn't require execution ordering. Hence, alloc_workqueue has been used to replace the deprecated create_singlethread_workqueue instance. Since the workqueue is being used on a memory reclaim path, WQ_MEM_RECLAIM flag has been set to ensure forward progress under memory pressure. Since there are fixed number of work items, explicit concurrency limit is unnecessary here. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-04	rxrpc: Don't change the epoch	David Howells	1	-24/+8
	It seems the local epoch should only be changed on boot, so remove the code that changes it for client connections. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-04	rxrpc: Randomise epoch and starting client conn ID values	David Howells	2	-1/+9
	Create a random epoch value rather than a time-based one on startup and set the top bit to indicate that this is the case. Also create a random starting client connection ID value. This will be incremented from here as new client connections are created. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-04	net: ti: cpmac: Fix compiler warning due to type confusion	Paul Burton	1	-2/+3
	cpmac_start_xmit() used the max() macro on skb->len (an unsigned int) and ETH_ZLEN (a signed int literal). This led to the following compiler warning: In file included from include/linux/list.h:8:0, from include/linux/module.h:9, from drivers/net/ethernet/ti/cpmac.c:19: drivers/net/ethernet/ti/cpmac.c: In function 'cpmac_start_xmit': include/linux/kernel.h:748:17: warning: comparison of distinct pointer types lacks a cast (void) (&_max1 == &_max2); \ ^ drivers/net/ethernet/ti/cpmac.c:560:8: note: in expansion of macro 'max' len = max(skb->len, ETH_ZLEN); ^ On top of this, it assigned the result of the max() macro to a signed integer whilst all further uses of it result in it being cast to varying widths of unsigned integer. Fix this up by using max_t to ensure the comparison is performed as unsigned integers, and for consistency change the type of the len variable to unsigned int. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-04	cxgb4: Add support for ndo_get_vf_config	Hariprasad Shenai	3	-2/+72
	Adds support for ndo_get_vf_config, also fill the default mac address that will be provided to the VF by firmware, in case user doesn't provide one. So user can get the default MAC address address also through ndo_get_vf_config. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-04	netns: avoid disabling irq for netns id	WANG Cong	1	-20/+15
	We never read or change netns id in hardirq context, the only place we read netns id in softirq context is in vxlan_xmit(). So, it should be enough to just disable BH. Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-04	vxlan: call peernet2id() in fdb notification	WANG Cong	2	-2/+2
	netns id should be already allocated each time we change netns, that is, in dev_change_net_namespace() (more precisely in rtnl_fill_ifinfo()). It is safe to just call peernet2id() here. Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-04	openvswitch: Free tmpl with tmpl_free.	Joe Stringer	1	-1/+1
	When an error occurs during conntrack template creation as part of actions validation, we need to free the template. Previously we've been using nf_ct_put() to do this, but nf_ct_tmpl_free() is more appropriate. Signed-off-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-04	rxrpc: The client call state must be changed before attachment to conn	David Howells	2	-2/+4
	We must set the client call state to RXRPC_CALL_CLIENT_SEND_REQUEST before attaching the call to the connection struct, not after, as it's liable to receive errors and conn aborts as soon as the assignment is made - and these will cause its state to be changed outside of the initiating thread's control. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-02	liquidio:CN23XX pause frame support	Raghu Vatsavayi	3	-7/+126
	Adds support for pause frame and priv flag for cn23xx device. Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com> Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com> Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02	liquidio: CN23XX napi support	Raghu Vatsavayi	3	-6/+30
	This patch adds NAPI related support for cn23xx device. Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com> Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com> Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02	liquidio: CN23XX health monitoring	Raghu Vatsavayi	3	-2/+132
	Adds support for watchdog based health monitoring of octeon cores on cn23xx device. Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com> Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com> Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02	liquidio: ethtool and led control support	Raghu Vatsavayi	4	-8/+390
	This patch adds support for some control operations like LED identification, ethtool statistics and intr config for cn23xx device. Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com> Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com> Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02	liquidio: CN23XX octeon3 instruction	Raghu Vatsavayi	8	-76/+212
	Adds support for data path related changes based on octeon3 instruction header(ih3) for cn23xx. Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com> Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com> Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02	liquidio: CN23XX IQ access	Raghu Vatsavayi	1	-0/+66
	Adds support for Instruction Queue(IQ) index manipulation routines through bar1 of cn23xx. Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com> Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com> Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02	liquidio: RX control commands	Raghu Vatsavayi	3	-13/+96
	Adds support for RX control commands on cn23xx device. Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com> Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com> Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02	liquidio: link and control commands	Raghu Vatsavayi	3	-8/+76
	This patch adds work queue support for link status and control commands. Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com> Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com> Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02	tipc: send broadcast nack directly upon sequence gap detection	Jon Paul Maloy	1	-7/+16
	Because of the risk of an excessive number of NACK messages and retransissions, receivers have until now abstained from sending broadcast NACKS directly upon detection of a packet sequence number gap. We have instead relied on such gaps being detected by link protocol STATE message exchange, something that by necessity delays such detection and subsequent retransmissions. With the introduction of unicast NACK transmission and rate control of retransmissions we can now remove this limitation. We now allow receiving nodes to send NACKS immediately, while coordinating the permission to do so among the nodes in order to avoid NACK storms. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02	tipc: rate limit broadcast retransmissions	Jon Paul Maloy	1	-5/+47
	As cluster sizes grow, so does the amount of identical or overlapping broadcast NACKs generated by the packet receivers. This often leads to 'NACK crunches' resulting in huge numbers of redundant retransmissions of the same packet ranges. In this commit, we introduce rate control of broadcast retransmissions, so that a retransmitted range cannot be retransmitted again until after at least 10 ms. This reduces the frequency of duplicate, redundant retransmissions by an order of magnitude, while having a significant positive impact on overall throughput and scalability. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02	tipc: transfer broadcast nacks in link state messages	Jon Paul Maloy	7	-27/+108
	When we send broadcasts in clusters of more 70-80 nodes, we sometimes see the broadcast link resetting because of an excessive number of retransmissions. This is caused by a combination of two factors: 1) A 'NACK crunch", where loss of broadcast packets is discovered and NACK'ed by several nodes simultaneously, leading to multiple redundant broadcast retransmissions. 2) The fact that the NACKS as such also are sent as broadcast, leading to excessive load and packet loss on the transmitting switch/bridge. This commit deals with the latter problem, by moving sending of broadcast nacks from the dedicated BCAST_PROTOCOL/NACK message type to regular unicast LINK_PROTOCOL/STATE messages. We allocate 10 unused bits in word 8 of the said message for this purpose, and introduce a new capability bit, TIPC_BCAST_STATE_NACK in order to keep the change backwards compatible. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02	net: stmmac: dwmac-rk: add pd_gmac support for rk3399	David Wu	1	-0/+9
	Add the gmac power domain support for rk3399, in order to save more power consumption. Signed-off-by: David Wu <david.wu@rock-chips.com> Signed-off-by: Caesar Wang <wxt@rock-chips.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02	net: stmmac: dwmac-rk: fixes the gmac resume after PD on/off	Roger Chen	1	-9/+10
	GMAC Power Domain(PD) will be disabled during suspend. That will causes GRF registers reset. So corresponding GRF registers for GMAC must be setup again. Signed-off-by: Roger Chen <roger.chen@rock-chips.com> Signed-off-by: Caesar Wang <wxt@rock-chips.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02	net: stmmac: dwmac-rk: add rk3366 & rk3399 specific data	Roger Chen	2	-2/+232
	Add constants and callback functions for the dwmac on rk3228/rk3229 socs. As can be seen, the base structure is the same, only registers and the bits in them moved slightly. Signed-off-by: Roger Chen <roger.chen@rock-chips.com> Signed-off-by: Caesar Wang <wxt@rock-chips.com> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02	rxrpc: Fix uninitialised variable warning	David Howells	1	-3/+2
	Fix the following uninitialised variable warning: ../net/rxrpc/call_event.c: In function 'rxrpc_process_call': ../net/rxrpc/call_event.c:879:58: warning: 'error' may be used uninitialized in this function [-Wmaybe-uninitialized] _debug("post net error %d", error); ^ Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-02	rxrpc: fix undefined behavior in rxrpc_mark_call_released	Arnd Bergmann	1	-1/+1
	gcc -Wmaybe-initialized correctly points out a newly introduced bug through which we can end up calling rxrpc_queue_call() for a dead connection: net/rxrpc/call_object.c: In function 'rxrpc_mark_call_released': net/rxrpc/call_object.c:600:5: error: 'sched' may be used uninitialized in this function [-Werror=maybe-uninitialized] This sets the 'sched' variable to zero to restore the previous behavior. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: f5c17aaeb2ae ("rxrpc: Calls should only have one terminal state") Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-02	switchdev: Fix return value of switchdev_port_fdb_dump().	Rosen, Rami	1	-1/+1
	This patch fixes the retun value of switchdev_port_fdb_dump() when CONFIG_NET_SWITCHDEV is not set. This avoids getting "warning: return makes integer from pointer without a cast [-Wint-conversion]" when building when CONFIG_NET_SWITCHDEV is not set under several compiler versions. This warning is due to commit d297653dd6f07afbe7e6c702a4bcd7615680002e ("rtnetlink: fdb dump: optimize by saving last interface markers"). Signed-off-by: Rami Rosen <rami.rosen@intel.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02	samples/bpf: add sampleip example	Brendan Gregg	3	-0/+238
	sample instruction pointer and frequency count in a BPF map Signed-off-by: Brendan Gregg <bgregg@netflix.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02	samples/bpf: add perf_event+bpf example	Alexei Starovoitov	5	-1/+290
	The bpf program is called 50 times a second and does hashmap[kern&user_stackid]++ It's primary purpose to check that key bpf helpers like map lookup, update, get_stackid, trace_printk and ctx access are all working. It checks: - PERF_COUNT_HW_CPU_CYCLES on all cpus - PERF_COUNT_HW_CPU_CYCLES for current process and inherited perf_events to children - PERF_COUNT_SW_CPU_CLOCK on all cpus - PERF_COUNT_SW_CPU_CLOCK for current process Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02	perf, bpf: add perf events core support for BPF_PROG_TYPE_PERF_EVENT programs	Alexei Starovoitov	3	-1/+96
	Allow attaching BPF_PROG_TYPE_PERF_EVENT programs to sw and hw perf events via overflow_handler mechanism. When program is attached the overflow_handlers become stacked. The program acts as a filter. Returning zero from the program means that the normal perf_event_output handler will not be called and sampling event won't be stored in the ring buffer. The overflow_handler_context==NULL is an additional safety check to make sure programs are not attached to hw breakpoints and watchdog in case other checks (that prevent that now anyway) get accidentally relaxed in the future. The program refcnt is incremented in case perf_events are inhereted when target task is forked. Similar to kprobe and tracepoint programs there is no ioctl to detach the program or swap already attached program. The user space expected to close(perf_event_fd) like it does right now for kprobe+bpf. That restriction simplifies the code quite a bit. The invocation of overflow_handler in __perf_event_overflow() is now done via READ_ONCE, since that pointer can be replaced when the program is attached while perf_event itself could have been active already. There is no need to do similar treatment for event->prog, since it's assigned only once before it's accessed. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02	bpf: perf_event progs should only use preallocated maps	Alexei Starovoitov	1	-1/+21
	Make sure that BPF_PROG_TYPE_PERF_EVENT programs only use preallocated hash maps, since doing memory allocation in overflow_handler can crash depending on where nmi got triggered. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02	bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type	Alexei Starovoitov	5	-0/+86
	Introduce BPF_PROG_TYPE_PERF_EVENT programs that can be attached to HW and SW perf events (PERF_TYPE_HARDWARE and PERF_TYPE_SOFTWARE correspondingly in uapi/linux/perf_event.h) The program visible context meta structure is struct bpf_perf_event_data { struct pt_regs regs; __u64 sample_period; }; which is accessible directly from the program: int bpf_prog(struct bpf_perf_event_data *ctx) { ... ctx->sample_period ... ... ctx->regs.ip ... } The bpf verifier rewrites the accesses into kernel internal struct bpf_perf_event_data_kern which allows changing struct perf_sample_data without affecting bpf programs. New fields can be added to the end of struct bpf_perf_event_data in the future. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02	bpf: support 8-byte metafield access	Alexei Starovoitov	1	-3/+6
	The verifier supported only 4-byte metafields in struct __sk_buff and struct xdp_md. The metafields in upcoming struct bpf_perf_event are 8-byte to match register width in struct pt_regs. Teach verifier to recognize 8-byte metafield access. The patch doesn't affect safety of sockets and xdp programs. They check for 4-byte only ctx access before these conditions are hit. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>