Age | Commit message | Author | Files, Lines
2014-01-16 | dm9000: fix a lot of checkpatch issues | Barry Song | 1 file, -12/+11
Recently, the dm9000 code has accumulated many checkpatch errors and warnings:

WARNING: please, no space before tabs 3: FILE: dm9000.c:3: + * ^ICopyright (C) 1997 Sten Wang$
WARNING: please, no space before tabs 5: FILE: dm9000.c:5: + * ^IThis program is free software; you can redistribute it and/or$
WARNING: please, no space before tabs 6: FILE: dm9000.c:6: + * ^Imodify it under the terms of the GNU General Public License$
WARNING: please, no space before tabs 7: FILE: dm9000.c:7: + * ^Ias published by the Free Software Foundation; either version 2$
WARNING: please, no space before tabs 8: FILE: dm9000.c:8: + * ^Iof the License, or (at your option) any later version.$
WARNING: please, no space before tabs 10: FILE: dm9000.c:10: + * ^IThis program is distributed in the hope that it will be useful,$
WARNING: please, no space before tabs 11: FILE: dm9000.c:11: + * ^Ibut WITHOUT ANY WARRANTY; without even the implied warranty of$
WARNING: please, no space before tabs 12: FILE: dm9000.c:12: + * ^IMERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the$
WARNING: please, no space before tabs 13: FILE: dm9000.c:13: + * ^IGNU General Public License for more details.$
WARNING: do not add new typedefs 97: FILE: dm9000.c:97: +typedef struct board_info {
ERROR: spaces prohibited around that ':' (ctx:WxV) 113: FILE: dm9000.c:113: + unsigned int in_suspend :1; ^
ERROR: spaces prohibited around that ':' (ctx:WxV) 114: FILE: dm9000.c:114: + unsigned int wake_supported :1; ^

This patch fixes the important errors among them.

Signed-off-by: Barry Song <Baohua.Song@csr.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 | packet: use percpu mmap tx frame pending refcount | Daniel Borkmann | 3 files, -7/+62
In PF_PACKET's packet mmap(), we can avoid using one atomic_inc() and one atomic_dec() call in the skb destructor and use a percpu reference count instead in order to determine if packets are still pending to be sent out. Micro-benchmark with [1] that has been slightly modified (that is, protocol = 0 in socket(2) and bind(2)), example on a rather crappy testing machine; I expect it to scale and have even better results on bigger machines:

./packet_mm_tx -s7000 -m7200 -z700000 em1, avg over 2500 runs:
  With patch:    4,022,015 cyc
  Without patch: 4,812,994 cyc

time ./packet_mm_tx -s64 -c10000000 em1 > /dev/null, stable:
  With patch:    real 1m32.241s, user 0m0.287s, sys 1m29.316s
  Without patch: real 1m38.386s, user 0m0.265s, sys 1m35.572s

In function tpacket_snd(), it is okay to use packet_read_pending() since in the fast path we short-circuit the condition already with ph != NULL, since we have next frames to process. In case we have MSG_DONTWAIT, we also do not execute this path as need_wait is false here anyway, and in case of _no_ MSG_DONTWAIT flag, it is okay to call packet_read_pending(), because whenever we reach that path, we're done processing outgoing frames anyway and only look if there are skbs still outstanding to be orphaned. We can stay lockless in this percpu counter since it's acceptable when we reach this path for the sum to be imprecise first, but we'll level out at 0 after all pending frames have reached the skb destructor eventually through tx reclaim. When people pin a tx process to particular CPUs, we expect overflows to happen in the reference counter as on one CPU we expect a heavy increase, and, distributed through ksoftirqd on all CPUs, a decrease, for example. As David Laight points out, since the C language doesn't define the result of signed int overflow (i.e. rather than wrap, it is allowed to saturate as a possible outcome), we have to use unsigned int as the reference count. The sum over all CPUs when tx is complete will result in 0 again.

The BUG_ON() in tpacket_destruct_skb() we can remove as well. It can _only_ be set from inside the tpacket_snd() path and we made sure to increase tx_ring.pending in any case before we called po->xmit(skb). So testing for tx_ring.pending == 0 is not too useful. Instead, it would rather have been useful to test if lower layers didn't orphan the skb so that we're missing ring slots being put back to TP_STATUS_AVAILABLE. But such a bug will be caught in user space already as we end up realizing that we do not have any TP_STATUS_AVAILABLE slots left anymore. Therefore, we're all set.

Btw, in case of the RX_RING path, we do not make use of the pending member, therefore we also don't need to use up any percpu memory here. Also note that __alloc_percpu() already returns a zero-filled percpu area, so initialization is done already.

[1] http://wiki.ipxwarzone.com/index.php5?title=Linux_packet_mmap

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
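For illustration, a minimal sketch of the per-cpu pending-frame counting scheme described above; the struct and helper names here are assumed for the example, only packet_read_pending() is named in the changelog:

#include <linux/percpu.h>
#include <linux/cpumask.h>

struct tx_pending {
        unsigned int __percpu *refcnt;  /* unsigned: wrap-around is well defined */
};

static void pending_inc(struct tx_pending *p)
{
        this_cpu_inc(*p->refcnt);       /* hot path: no atomics, no locks */
}

static void pending_dec(struct tx_pending *p)
{
        this_cpu_dec(*p->refcnt);       /* may run on another CPU at tx reclaim */
}

/* Sum over all CPUs: intermediate reads may be imprecise, but the total
 * levels out at 0 once every pending skb has hit its destructor.
 */
static unsigned int pending_read(const struct tx_pending *p)
{
        unsigned int refcnt = 0;
        int cpu;

        for_each_possible_cpu(cpu)
                refcnt += *per_cpu_ptr(p->refcnt, cpu);

        return refcnt;
}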
2014-01-16 | packet: don't unconditionally schedule() in case of MSG_DONTWAIT | Daniel Borkmann | 1 file, -7/+6
In tpacket_snd(), when we've discovered a first frame that is not in status TP_STATUS_SEND_REQUEST, and return a NULL buffer, we exit the send routine in case of MSG_DONTWAIT, since we've finished traversing the mmaped send ring buffer and don't care about pending frames. While doing so, we still unconditionally call an expensive schedule() in the packet_current_frame() "error" path, which is unnecessary in this case since it's enough to just quit the function. Also, in case MSG_DONTWAIT is not set, we should rather test for need_resched() first and do schedule() only if necessary since meanwhile pending frames could already have finished processing and called skb destructor. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
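A small illustrative helper (not an actual af_packet function) capturing the resulting wait policy under the assumptions above:

#include <linux/sched.h>

/* Never schedule() when MSG_DONTWAIT means we will not wait anyway;
 * otherwise only yield the CPU when a reschedule is actually pending.
 */
static void tx_ring_maybe_yield(bool need_wait)
{
        if (need_wait && need_resched())
                schedule();
}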
2014-01-16 | packet: improve socket create/bind latency in some cases | Daniel Borkmann | 1 file, -11/+22
Most people acquire PF_PACKET sockets with a protocol argument in the socket call, e.g. libpcap does so with htons(ETH_P_ALL) for all its sockets. Most likely, at some point in time a subsequent bind() call will follow, e.g. in libpcap with ...

  memset(&sll, 0, sizeof(sll));
  sll.sll_family = AF_PACKET;
  sll.sll_ifindex = ifindex;
  sll.sll_protocol = htons(ETH_P_ALL);

... as arguments. What happens in the kernel is that already in the socket() syscall, we install a proto hook via register_prot_hook() if our protocol argument is != 0. Yet, in bind() we're almost doing the same work by doing an unregister_prot_hook() with an expensive synchronize_net() call in case during socket() the proto was != 0, plus a follow-up register_prot_hook() with a bound device this time, in order to limit the traffic we get.

In the case when the protocol and user-supplied device index (== 0) do not change from socket() to bind(), we can spare ourselves doing the same work twice. Similarly for re-binding to the same device and protocol. For these scenarios, we can decrease create/bind latency from ~7447us (sock-bind-2 case) to ~89us (sock-bind-1 case) with this patch.

Alternatively, for the first case, if people care, they should simply create their sockets with a proto == 0 argument and define the protocol during bind(), as this saves a call to synchronize_net() as well (sock-bind-3 case). In all other cases, we're tied to user space behaviour we must not change, also since a bind() is not strictly required. Thus, we need the synchronize_net() to make sure no asynchronous packet processing paths still refer to the previous elements of po->prot_hook.

In case of mmap()ed sockets, the workflow that includes bind() is socket() -> setsockopt(<ring>) -> bind(). In that case, a pair of {__unregister, register}_prot_hook is being called from setsockopt() in order to install the new protocol receive handler. Thus, when we call bind and can skip a re-hook, we have already previously installed the new handler. For fanout, this is handled entirely differently, so we should be good.

Timings on an i7-3520M machine:

  * sock-bind-1:   89 us
  * sock-bind-2: 7447 us
  * sock-bind-3:   75 us

sock-bind-1:
  socket(PF_PACKET, SOCK_RAW, htons(ETH_P_IP)) = 3
  bind(3, {sa_family=AF_PACKET, proto=htons(ETH_P_IP), if=all(0), pkttype=PACKET_HOST, addr(0)={0, }, 20) = 0

sock-bind-2:
  socket(PF_PACKET, SOCK_RAW, htons(ETH_P_IP)) = 3
  bind(3, {sa_family=AF_PACKET, proto=htons(ETH_P_IP), if=lo(1), pkttype=PACKET_HOST, addr(0)={0, }, 20) = 0

sock-bind-3:
  socket(PF_PACKET, SOCK_RAW, 0) = 3
  bind(3, {sa_family=AF_PACKET, proto=htons(ETH_P_IP), if=lo(1), pkttype=PACKET_HOST, addr(0)={0, }, 20) = 0

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
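A user-space sketch of the recommended sock-bind-3 pattern (hypothetical helper, error handling kept minimal): create the socket with protocol 0 so no hook is installed, and supply the protocol only at bind() time:

#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <net/if.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h>

static int open_packet_socket(const char *ifname)
{
        struct sockaddr_ll sll;
        int fd;

        fd = socket(AF_PACKET, SOCK_RAW, 0);    /* proto 0: no hook installed yet */
        if (fd < 0)
                return -1;

        memset(&sll, 0, sizeof(sll));
        sll.sll_family   = AF_PACKET;
        sll.sll_ifindex  = if_nametoindex(ifname);
        sll.sll_protocol = htons(ETH_P_ALL);    /* hook installed once, here */

        if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
                close(fd);
                return -1;
        }
        return fd;
}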
2014-01-16 | i40e: Remove autogenerated Module.symvers file. | David S. Miller | 1 file, -0/+0
Fixes: 9d8bf54 ("i40e: associate VMDq queue with VM type") Reported-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 | net/ipv4: don't use module_init in non-modular gre_offload | Paul Gortmaker | 1 file, -8/+2
Recent commit 438e38fadca2f6e57eeecc08326c8a95758594d4 ("gre_offload: statically build GRE offloading support") added new module_init/module_exit calls to the gre_offload.c file. The file is obj-y and can't be anything other than built-in. Currently it can never be built modular, so using module_init as an alias for __initcall can be somewhat misleading. Fix this up now, so that we can relocate module_init from init.h into module.h in the future. If we don't do this, we'd have to add module.h to obviously non-modular code, and that would be a worse thing. We also make the inclusion explicit. Note that direct use of __initcall is discouraged, vs. one of the priority categorized subgroups. As __initcall gets mapped onto device_initcall, our use of device_initcall directly in this change means that the runtime impact is zero -- it will remain at level 6 in initcall ordering. As for the module_exit, rather than replace it with __exitcall, we simply remove it, since it appears only UML does anything with those, and even for UML, there is no relevant cleanup to be done here. Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 | net/mlx4_core: clean up srq_res_start_move_to() | Paul Bolle | 1 file, -30/+16
Building resource_tracker.o triggers a GCC warning:

  drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'mlx4_HW2SW_SRQ_wrapper':
  drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:3202:17: warning: 'srq' may be used uninitialized in this function [-Wmaybe-uninitialized]
    atomic_dec(&srq->mtt->ref_count);
                  ^

This is a false positive. But a cleanup of srq_res_start_move_to() can help GCC here. The code currently uses a switch statement where a plain if/else would do, since only two of the switch's four cases can ever occur. Dropping that switch makes the warning go away. While we're at it, add some missing braces, and convert state to the correct type.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 | net/mlx4_core: clean up cq_res_start_move_to() | Paul Bolle | 1 file, -33/+19
Building resource_tracker.o triggers a GCC warning:

  drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'mlx4_HW2SW_CQ_wrapper':
  drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:3019:16: warning: 'cq' may be used uninitialized in this function [-Wmaybe-uninitialized]
    atomic_dec(&cq->mtt->ref_count);
                  ^

This is a false positive. But a cleanup of cq_res_start_move_to() can help GCC here. The code currently uses a switch statement where an if/else construct would do too, since only two of the switch's four cases can ever occur. Dropping that switch makes the warning go away. While we're at it, add some missing braces.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
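A generic sketch of the cleanup applied in both resource-tracker helpers above; the enum, struct and function names are made up for the example, only the switch-to-if/else idea is taken from the changelogs:

#include <linux/errno.h>

enum res_state { RES_ALLOCATED, RES_HW, RES_MAPPED, RES_RESERVED };

struct tracked_res {
        enum res_state state;
};

/* Only two of the four states can ever be requested here, so an if/else
 * replaces the switch and *res_out is visibly assigned on every successful
 * path, which keeps -Wmaybe-uninitialized quiet in the callers.
 */
static int res_start_move_to(struct tracked_res *r, enum res_state new_state,
                             struct tracked_res **res_out)
{
        if (new_state == RES_HW) {
                if (r->state != RES_ALLOCATED)
                        return -EINVAL;
        } else if (new_state == RES_ALLOCATED) {
                if (r->state != RES_HW)
                        return -EINVAL;
        } else {
                return -EINVAL;
        }

        r->state = new_state;
        *res_out = r;
        return 0;
}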
2014-01-16 | ixgbe: Fix incorrect logic for fixed fiber eeprom write | Don Skidmore | 1 file, -1/+1
In this code we wanted to set the bit selected by IXGBE_SFF_SOFT_RS_SELECT_MASK to the value in rs, so we really needed a logical OR rather than an AND; this patch makes that change.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
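A minimal illustration of the masking pattern involved (hypothetical helper; the actual driver code operates on the SFF EEPROM byte):

#include <linux/types.h>

/* Set the bits selected by @mask to the corresponding bits of @rs.
 * Using '&' instead of the final '|' could only ever clear bits, which
 * is the kind of bug being fixed here.
 */
static u8 apply_field(u8 reg, u8 rs, u8 mask)
{
        return (reg & ~mask) | (rs & mask);
}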
2014-01-16 | ixgbevf: create function for all of ring init | Don Skidmore | 2 files, -137/+183
This patch creates new functions for ring initialization, ixgbevf_configure_tx_ring() and ixgbevf_configure_rx_ring(). The work previously done in these functions was spread across several other functions, and this change should hopefully lead to greater readability and make the code more like ixgbe. This patch also moves the placement of some older functions to avoid having to write prototypes. It also promotes a couple of debug messages to errors.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 | ixgbevf: Convert ring storage from pointer to an array to array of pointers | Don Skidmore | 3 files, -86/+98
This changes how we store the ring arrays in the adapter struct. We used to have a pointer to an array; now we use an array of pointers. This will allow us to support multiple queues on multiple nodes, and at some point we would be able to reallocate the rings so that each is on a local node if needed.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 | ixgbevf: use pci drvdata correctly in ixgbevf_suspend() | Wei Yongjun | 1 file, -2/+2
We had set the pci driver-specific data in ixgbevf_probe() as a type of struct net_device, so we should use it as netdev in ixgbevf_suspend(). Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 | ixgbevf: set the disable state when ixgbevf_qv_disable is called | Jacob Keller | 1 file, -0/+1
The ixgbevf_qv_disable function used by CONFIG_NET_RX_BUSY_POLL is broken, because it does not properly set the IXGBEVF_QV_STATE_DISABLED bit, indicating that the q_vector should be disabled (and preventing future locks from obtaining the vector). This patch corrects the issue by setting the disable state. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 | ixgbe: reinit_locked() should be called with rtnl_lock | John Fastabend | 1 file, -0/+2
ixgbe_service_task() is calling ixgbe_reinit_locked() without the rtnl_lock being held. This is because it is being called from a worker thread and not a rtnl netlink or dcbnl path. Add rtnl_{un}lock() semantics. I found this during code review. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 | net: eth_type_trans() should use skb_header_pointer() | Eric Dumazet | 1 file, -2/+5
eth_type_trans() can read uninitialized memory as drivers do not necessarily pull more than 14 bytes in skb->head before calling it. As David suggested, we can use skb_header_pointer() to fix this without breaking some drivers that might not expect eth_type_trans() pulling 2 additional bytes. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
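A hedged sketch of the skb_header_pointer() pattern referred to above (not the exact eth_type_trans() hunk): the header bytes are copied into a stack buffer when they are not linear, without pulling the skb:

#include <linux/skbuff.h>
#include <linux/if_ether.h>

static __be16 peek_eth_proto(const struct sk_buff *skb)
{
        struct ethhdr _eth;
        const struct ethhdr *eth;

        /* Returns a pointer into the linear area when possible, otherwise
         * copies the bytes into _eth; NULL if the frame is too short.
         */
        eth = skb_header_pointer(skb, 0, sizeof(_eth), &_eth);
        if (!eth)
                return 0;

        return eth->h_proto;
}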
2014-01-16 | net: stmmac: notify the PM core of a wakeup event. | Srinivas Kandagatla | 2 files, -2/+8
In the PM_SUSPEND_FREEZE and WOL (Wake-on-LAN) case, when the driver gets a wakeup event, either the driver or the platform-specific PM code should notify the PM core about it, so that the system can wake up from low power. In cases where there is no involvement of platform-specific PM, it becomes the driver's responsibility to notify the PM core to wake up the system. Without this, WOL with PM_SUSPEND_FREEZE does not work on STi-based SoCs.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 | net: stmmac: restore pinstate in pm resume. | Srinivas Kandagatla | 1 file, -0/+3
This patch adds code to restore default pinstate of the pins when it comes back from low power state. Without this patch the state of the pins would be unknown and the driver would not work. This patch also adds code to put the pins in to sleep state when the driver enters low power state. Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
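For reference, a minimal sketch of this kind of pinctrl PM handling (generic callbacks, not the actual stmmac ones):

#include <linux/device.h>
#include <linux/pinctrl/consumer.h>

static int sketch_suspend(struct device *dev)
{
        /* ...stop the MAC, arm Wake-on-LAN if configured... */
        pinctrl_pm_select_sleep_state(dev);     /* pins into sleep state */
        return 0;
}

static int sketch_resume(struct device *dev)
{
        pinctrl_pm_select_default_state(dev);   /* restore default pinstate */
        /* ...re-initialise the MAC... */
        return 0;
}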
2014-01-16 | net: stmmac: use suspend functions for hibernation | Srinivas Kandagatla | 3 files, -50/+12
In the hibernation freeze case the driver just releases resources like DMA buffers and IRQs and unregisters the driver, and during restore it registers and requests the resources again. This is not really necessary: as part of power management all the data structures stay intact, so all the previously allocated resources can be used after coming out of low power. This patch uses the suspend and resume callbacks for freeze and restore, which initialize the hardware correctly without unregistering or releasing the resources; this should also help in reducing the time to restore. Also, this patch fixes a bug in stmmac_pltfr_restore and stmmac_pltfr_freeze where they try to get hold of platform data via a dev_get_platdata call, which returns NULL in device-tree cases, and the next if statement would crash as there is no NULL check.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 | net: stmmac: fix power management suspend-resume case | Srinivas Kandagatla | 1 file, -6/+7
The driver's PM resume assumes that the IP is still powered up and that all the register contents are undisturbed when it comes out of the low-power suspend case. This assumption is wrong; basically, the driver should not rely on any register state after it comes out of low power. The driver can keep part of the IP powered up if it is a wake-up source, but it cannot assume the register state of the IP. It is also possible that the SoC glue layer takes power off the IP when it is not a wake-up source, to reduce power consumption. This patch re-initializes the hardware by calling the stmmac_hw_setup function in the resume case.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 | net: stmmac: make stmmac_mdio_reset non-static | Srinivas Kandagatla | 2 files, -1/+2
This patch promotes the stmmac_mdio_reset function from static to non-static, so that the power management functions can decide to reset when the IP comes out of a low-power state, especially in hibernation cases.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 | net: stmmac: move hardware setup for stmmac_open to new function | Srinivas Kandagatla | 1 file, -67/+88
This patch moves the hardware-setup part of the code in stmmac_open to a new function, stmmac_hw_setup. The reason for doing this is to make hardware initialization an independent function so that the PM functions can reuse it to re-initialize the IP after returning from a low-power state. This also avoids code duplication across stmmac_resume/restore and stmmac_open.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 | net: stmmac: move dma allocation to new function | Srinivas Kandagatla | 1 file, -84/+85
This patch moves DMA resource allocation to a new function, alloc_dma_desc_resources. The reason for moving this to a new function is to keep the memory allocations in a separate function. One more reason is to get the suspend and hibernation cases working without releasing and re-allocating these resources during the suspend-resume and freeze-restore cycles.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
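A simplified structural sketch of how ndo_open and the PM paths can share code after this commit and the previous one; only alloc_dma_desc_resources() and stmmac_hw_setup() are taken from the changelogs, everything else (including the signatures) is assumed:

#include <linux/netdevice.h>

int alloc_dma_desc_resources(void *priv);       /* provided elsewhere; signature assumed */
int stmmac_hw_setup(struct net_device *dev);    /* provided elsewhere; signature assumed */

static int open_sketch(struct net_device *dev)
{
        int ret;

        ret = alloc_dma_desc_resources(netdev_priv(dev));      /* kept across suspend */
        if (ret)
                return ret;

        return stmmac_hw_setup(dev);            /* shared with resume/restore */
}

static int resume_sketch(struct net_device *dev)
{
        /* DMA descriptors are still allocated; only the hardware needs to
         * be reprogrammed after the IP lost power.
         */
        return stmmac_hw_setup(dev);
}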
2014-01-16 | net: stmmac: mdio: remove reset gpio free | Srinivas Kandagatla | 1 file, -1/+0
This patch removes the gpio_free for the reset line of the PHY; the driver stores the GPIO number in its private data structure for later use. As the driver will keep using this pin, it should not be freed.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 | net: stmmac: support max-speed device tree property | Srinivas Kandagatla | 3 files, -1/+8
This patch adds support for the "max-speed" property, which is a standard Ethernet device tree property. max-speed specifies the maximum speed (in megabits per second) supported by the device. Depending on the clocking scheme, some boards can only support a few link speeds, so having a way to limit the link speed in the MAC driver allows such setups to work reliably. Without this patch there is no way to tell the driver to limit the link speed.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
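A hedged sketch of how such a property is typically consumed (the actual stmmac parsing may differ):

#include <linux/of.h>

static u32 read_max_speed(struct device_node *np)
{
        u32 max_speed = 0;      /* 0: no limit requested in the device tree */

        /* leaves max_speed untouched if the property is absent */
        of_property_read_u32(np, "max-speed", &max_speed);
        return max_speed;       /* e.g. 10, 100 or 1000 (Mbit/s) */
}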
2014-01-16 | net: mvneta: make mvneta_txq_done() return void | Arnaud Ebalard | 1 file, -5/+4
The function return parameter is not used in mvneta_tx_done_gbe(), where the function is called. This patch makes the function return void. Reviewed-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Arnaud Ebalard <arno@natisbad.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 | net: mvneta: mvneta_tx_done_gbe() cleanups | Arnaud Ebalard | 1 file, -13/+4
The mvneta_tx_done_gbe() return value and third parameter are no longer used. This patch changes the function prototype and removes a useless variable at the call site.

Reviewed-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 | net: mvneta: implement rx_copybreak | willy tarreau | 1 file, -6/+38
calling dma_map_single()/dma_unmap_single() is quite expensive compared to copying a small packet. So let's copy short frames and keep the buffers mapped. We set the limit to 256 bytes which seems to give good results both on the XP-GP board and on the AX3/4. The Rx small packet rate increased by 16.4% doing this, from 486kpps to 573kpps. It is worth noting that even the call to the function dma_sync_single_range_for_cpu() is expensive (300 ns) although less than dma_unmap_single(). Without it, the packet rate raises to 711kpps (+24% more). Thus on systems where coherency from device to CPU is guaranteed by a snoop control unit, this patch should provide even more gains, and probably rx_copybreak could be increased. Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by: Arnaud Ebalard <arno@natisbad.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
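A hedged sketch of the copybreak idea (names and structure are illustrative, not the exact mvneta code): short frames are copied into a fresh small skb so the original DMA buffer stays mapped and goes straight back to the ring:

#include <linux/skbuff.h>
#include <linux/netdevice.h>
#include <linux/dma-mapping.h>
#include <linux/string.h>

static struct sk_buff *rx_copybreak_skb(struct net_device *dev,
                                        struct device *dma_dev,
                                        dma_addr_t phys, void *data,
                                        int len, int copybreak)
{
        struct sk_buff *skb;

        if (len > copybreak)
                return NULL;    /* caller unmaps and hands off the big buffer */

        skb = netdev_alloc_skb_ip_align(dev, len);
        if (!skb)
                return NULL;

        /* Make the received bytes visible to the CPU, copy them, and leave
         * the original buffer mapped for the next frame.
         */
        dma_sync_single_range_for_cpu(dma_dev, phys, 0, len, DMA_FROM_DEVICE);
        memcpy(skb_put(skb, len), data, len);

        return skb;
}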
2014-01-16 | net: mvneta: convert to build_skb() | willy tarreau | 1 file, -14/+35
Make use of build_skb() to allocate frags on the RX path. When frag size is lower than a page size, we can use netdev_alloc_frag(), and we fall back to kmalloc() for larger sizes. The frag size is stored into the mvneta_port struct. The alloc/free functions check the frag size to decide what alloc/ free method to use. MTU changes are safe because the MTU change function stops the device and clears the queues before applying the change. With this patch, I observed a reproducible 2% performance improvement on HTTP-based benchmarks, and 5% on small packet RX rate. Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by: Arnaud Ebalard <arno@natisbad.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
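A hedged sketch of the allocation policy described above (illustrative helpers, not the actual mvneta functions):

#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/slab.h>
#include <linux/mm.h>

static void *rx_buf_alloc(unsigned int frag_size)
{
        if (frag_size <= PAGE_SIZE)
                return netdev_alloc_frag(frag_size);    /* page-fragment path */
        return kmalloc(frag_size, GFP_ATOMIC);          /* oversized frames */
}

static struct sk_buff *rx_buf_to_skb(void *data, unsigned int frag_size)
{
        /* build_skb() takes 0 when the buffer came from kmalloc() */
        return build_skb(data, frag_size <= PAGE_SIZE ? frag_size : 0);
}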
2014-01-16 | net: mvneta: prefetch next rx descriptor instead of current one | willy tarreau | 1 file, -1/+1
Currently, the mvneta driver tries to prefetch the current Rx descriptor during read. Tests have shown that prefetching the next one instead increases general performance by about 1% on HTTP traffic. Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by: Arnaud Ebalard <arno@natisbad.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 | net: mvneta: simplify access to the rx descriptor status | willy tarreau | 1 file, -13/+12
At several places, we already know the value of the rx status but we call functions which dereference the pointer again to get it and don't need the descriptor for anything else. Simplify this task by replacing the rx desc pointer by the status word itself. Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by: Arnaud Ebalard <arno@natisbad.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 | net: mvneta: factor rx refilling code | willy tarreau | 1 file, -20/+3
Make mvneta_rxq_fill() use mvneta_rx_refill() instead of using duplicate code. Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by: Arnaud Ebalard <arno@natisbad.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 | net: mvneta: remove tests for impossible cases in the tx_done path | willy tarreau | 1 file, -6/+9
Currently, mvneta_txq_bufs_free() calls mvneta_tx_done_policy() with a non-null cause to retrieve the pointer to the next queue to process. There are useless tests on the return queue number and on the pointer, all of which are well defined within a known limited set. This code path is fast, although not critical. Removing 3 tests here that the compiler could not optimize (verified) is always desirable. Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by: Arnaud Ebalard <arno@natisbad.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 | net: mvneta: replace Tx timer with a real interrupt | willy tarreau | 1 file, -59/+12
Right now the mvneta driver doesn't handle Tx IRQ, and relies on two mechanisms to flush Tx descriptors : a flush at the end of mvneta_tx() and a timer. If a burst of packets is emitted faster than the device can send them, then the queue is stopped until next wake-up of the timer 10ms later. This causes jerky output traffic with bursts and pauses, making it difficult to reach line rate with very few streams. A test on UDP traffic shows that it's not possible to go beyond 134 Mbps / 12 kpps of outgoing traffic with 1500-bytes IP packets. Routed traffic tends to observe pauses as well if the traffic is bursty, making it even burstier after the wake-up. It seems that this feature was inherited from the original driver but nothing there mentions any reason for not using the interrupt instead, which the chip supports. Thus, this patch enables Tx interrupts and removes the timer. It does the two at once because it's not really possible to make the two mechanisms coexist, so a split patch doesn't make sense. First tests performed on a Mirabox (Armada 370) show that less CPU seems to be used when sending traffic. One reason might be that we now call the mvneta_tx_done_gbe() with a mask indicating which queues have been done instead of looping over all of them. The same UDP test above now happily reaches 987 Mbps / 87.7 kpps. Single-stream TCP traffic can now more easily reach line rate. HTTP transfers of 1 MB objects over a single connection went from 730 to 840 Mbps. It is even possible to go significantly higher (>900 Mbps) by tweaking tcp_tso_win_divisor. Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Gregory CLEMENT <gregory.clement@free-electrons.com> Cc: Arnaud Ebalard <arno@natisbad.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Arnaud Ebalard <arno@natisbad.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 | net: mvneta: add missing bit descriptions for interrupt masks and causes | willy tarreau | 1 file, -2/+42
Marvell has not published the chip's datasheet yet, so it's very hard to find the relevant bits to manipulate to change the IRQ behaviour. Fortunately, these bits are described in the proprietary LSP patch set which is publicly available here : http://www.plugcomputer.org/downloads/mirabox/ So let's put them back in the driver in order to reduce the burden of current and future maintenance. Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by: Arnaud Ebalard <arno@natisbad.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 | net: mvneta: do not schedule in mvneta_tx_timeout | willy tarreau | 1 file, -11/+0
If a queue timeout is reported, we can oops because of some schedules while the caller is atomic, as shown below:

  mvneta d0070000.ethernet eth0: tx timeout
  BUG: scheduling while atomic: bash/1528/0x00000100
  Modules linked in: slhttp_ethdiv(C) [last unloaded: slhttp_ethdiv]
  CPU: 2 PID: 1528 Comm: bash Tainted: G WC 3.13.0-rc4-mvebu-nf #180
  [<c0011bd9>] (unwind_backtrace+0x1/0x98) from [<c000f1ab>] (show_stack+0xb/0xc)
  [<c000f1ab>] (show_stack+0xb/0xc) from [<c02ad323>] (dump_stack+0x4f/0x64)
  [<c02ad323>] (dump_stack+0x4f/0x64) from [<c02abe67>] (__schedule_bug+0x37/0x4c)
  [<c02abe67>] (__schedule_bug+0x37/0x4c) from [<c02ae261>] (__schedule+0x325/0x3ec)
  [<c02ae261>] (__schedule+0x325/0x3ec) from [<c02adb97>] (schedule_timeout+0xb7/0x118)
  [<c02adb97>] (schedule_timeout+0xb7/0x118) from [<c0020a67>] (msleep+0xf/0x14)
  [<c0020a67>] (msleep+0xf/0x14) from [<c01dcbe5>] (mvneta_stop_dev+0x21/0x194)
  [<c01dcbe5>] (mvneta_stop_dev+0x21/0x194) from [<c01dcfe9>] (mvneta_tx_timeout+0x19/0x24)
  [<c01dcfe9>] (mvneta_tx_timeout+0x19/0x24) from [<c024afc7>] (dev_watchdog+0x18b/0x1c4)
  [<c024afc7>] (dev_watchdog+0x18b/0x1c4) from [<c0020b53>] (call_timer_fn.isra.27+0x17/0x5c)
  [<c0020b53>] (call_timer_fn.isra.27+0x17/0x5c) from [<c0020cad>] (run_timer_softirq+0x115/0x170)
  [<c0020cad>] (run_timer_softirq+0x115/0x170) from [<c001ccb9>] (__do_softirq+0xbd/0x1a8)
  [<c001ccb9>] (__do_softirq+0xbd/0x1a8) from [<c001cfad>] (irq_exit+0x61/0x98)
  [<c001cfad>] (irq_exit+0x61/0x98) from [<c000d4bf>] (handle_IRQ+0x27/0x60)
  [<c000d4bf>] (handle_IRQ+0x27/0x60) from [<c000843b>] (armada_370_xp_handle_irq+0x33/0xc8)
  [<c000843b>] (armada_370_xp_handle_irq+0x33/0xc8) from [<c000fba9>] (__irq_usr+0x49/0x60)

Ben Hutchings attempted to propose a better fix consisting in using a scheduled work for this, but while it fixed this panic, it caused other random freezes and panics proving that the reset sequence in the driver is unreliable and that additional fixes should be investigated. When sending multiple streams over a link limited to 100 Mbps, Tx timeouts happen from time to time, and the driver correctly recovers only when the function is disabled.

Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Tested-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 | net: mvneta: use per_cpu stats to fix an SMP lock up | willy tarreau | 1 file, -29/+55
Stats writers are mvneta_rx() and mvneta_tx(). They don't lock anything when they update the stats, and as a result, it randomly happens that the stats freeze on SMP if two updates happen during stats retrieval. This is very easily reproducible by starting two HTTP servers and binding each of them to a different CPU, then consulting /proc/net/dev in loops during transfers; the interface should immediately lock up. This issue also randomly happens upon link state changes during transfers, because the stats are collected in this situation, but it takes more attempts to reproduce it.

The comments in netdevice.h suggest using per_cpu stats instead to get rid of this issue. This patch implements this. It merges both rx_stats and tx_stats into a single "stats" member with a single syncp. Both mvneta_rx() and mvneta_tx() now only update a single CPU's counters. In turn, mvneta_get_stats64() does the summing by iterating over all CPUs to get their respective stats. With this change, stats are still correct and no more lockups are encountered.

Note that this bug was present since the first import of the mvneta driver. It might make sense to backport it to some stable trees. If so, it depends on "d33dc73 net: mvneta: increase the 64-bit rx/tx stats out of the hot path".

Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Tested-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
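A hedged sketch of the per-cpu stats scheme described above (field and helper names are illustrative): writers only ever touch their own CPU's counters under the syncp, and the 64-bit readout sums all CPUs:

#include <linux/netdevice.h>
#include <linux/percpu.h>
#include <linux/u64_stats_sync.h>

struct pcpu_port_stats {
        u64                     rx_packets;
        u64                     rx_bytes;
        struct u64_stats_sync   syncp;
};

static void stats_add_rx(struct pcpu_port_stats __percpu *stats,
                         unsigned int pkts, unsigned int bytes)
{
        struct pcpu_port_stats *s = this_cpu_ptr(stats);

        u64_stats_update_begin(&s->syncp);
        s->rx_packets += pkts;
        s->rx_bytes   += bytes;
        u64_stats_update_end(&s->syncp);
}

static void stats_read(struct pcpu_port_stats __percpu *stats,
                       struct rtnl_link_stats64 *out)
{
        int cpu;

        for_each_possible_cpu(cpu) {
                const struct pcpu_port_stats *s = per_cpu_ptr(stats, cpu);
                unsigned int start;
                u64 rxp, rxb;

                do {
                        start = u64_stats_fetch_begin_bh(&s->syncp);
                        rxp = s->rx_packets;
                        rxb = s->rx_bytes;
                } while (u64_stats_fetch_retry_bh(&s->syncp, start));

                out->rx_packets += rxp;
                out->rx_bytes   += rxb;
        }
}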
2014-01-16 | net: mvneta: increase the 64-bit rx/tx stats out of the hot path | willy tarreau | 1 file, -4/+11
It is better to count packets and bytes on the stack in 32-bit variables and then accumulate them into the 64-bit stats once at the end. This saves two memory writes and two memory barriers per packet. The incoming packet rate increased by 4.7% on the OpenBlocks AX3 thanks to this.

Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Tested-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 | drivers/net: delete non-required instances of include <linux/init.h> | Paul Gortmaker | 155 files, -155/+0
None of these files are actually using any __init type directives and hence don't need to include <linux/init.h>. Most are just a left over from __devinit and __cpuinit removal, or simply due to code getting copied from one driver to the next. This covers everything under drivers/net except for wireless, which has been submitted separately. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 | neigh: use NEIGH_VAR_INIT in ndo_neigh_setup functions. | Jiri Pirko | 2 files, -2/+7
When ndo_neigh_setup is called, the bitfield used by NEIGH_VAR_SET is not initialized yet. This might cause confusion for the people who use NEIGH_VAR_SET in ndo_neigh_setup. So rather introduce NEIGH_VAR_INIT for usage in ndo_neigh_setup. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 | ixgbe: Clear head write-back registers on VF reset | Alexander Duyck | 2 files, -0/+17
The Tx head write-back registers are not cleared during an FLR or VF reset. As a result a configuration that had head write-back enabled can leave the registers set after the driver is unloaded. If the next driver loaded doesn't use the write-back registers this can lead to a bad configuration where head write-back is enabled, but the driver didn't request it. To avoid this situation the PF should be resetting the Tx head write-back registers when the VF requests a reset. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 | ixgbe: Force QDE via PFQDE for VFs during reset | Alexander Duyck | 2 files, -3/+18
This change makes it so that the QDE bits are set for a VF before the Rx queues are enabled. As such we avoid head of line blocking in the event that the VF stops cleaning Rx descriptors for whatever reason. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 14 ++++++++++++++ drivers/net/ethernet/intel/ixgbe/ixgbe_type.h | 7 ++++--- 2 files changed, 18 insertions(+), 3 deletions(-) Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 | ipv6 addrconf: don't cleanup prefix route for IFA_F_NOPREFIXROUTE | Thomas Haller | 1 file, -75/+109
Refactor the deletion/update of prefix routes when removing an address. Now also consider IFA_F_NOPREFIXROUTE and if there is an address present with this flag, to not cleanup the route. Instead, assume that userspace is taking care of this route. Also perform the same cleanup, when userspace changes an existing address to add NOPREFIXROUTE (to an address that didn't have this flag). This is done because when the address was added, a prefix route was created for it. Since the user now wants to handle this route by himself, we cleanup this route. This cleanup of the route is not totally robust. There is no guarantee, that the route we are about to delete was really the one added by the kernel. This behavior does not change by the patch, and in practice it should work just fine. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 | ipv6 addrconf: add IFA_F_NOPREFIXROUTE flag to suppress creation of IP6 routes | Thomas Haller | 2 files, -6/+14
When adding/modifying an IPv6 address, the userspace application needs a way to suppress adding a prefix route. This is for example relevant together with IFA_F_MANAGERTEMPADDR, where userspace creates autoconf generated addresses, but depending on on-link, no route for the prefix should be added. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15Revert "batman-adv: drop dependency against CRC16"David S. Miller1-0/+1
This reverts commit 12afc36e38b3b6a0ec9bda71632c2285e7fdbab2. The dependency is actually still necessary. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 | sctp: create helper function to enable|disable sackdelay | wangweidong | 1 file, -18/+19
Add sctp_spp_sackdelay_{enable|disable} helper functions to avoid code duplication.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 | ipv6: move IPV6_TCLASS_SHIFT into ipv6.h and define a helper | Li RongQing | 5 files, -6/+8
Two places defined IPV6_TCLASS_SHIFT, so move it into ipv6.h and use that macro where possible. Also define an ip6_tclass helper to return the traffic class.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
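A sketch of such a shared definition and helper, based on the IPv6 header layout (4 bits version, 8 bits traffic class, 20 bits flow label); the exact mask name used in ipv6.h may differ:

#include <linux/types.h>
#include <asm/byteorder.h>

#define IPV6_TCLASS_SHIFT       20
#define IPV6_TCLASS_MASK        cpu_to_be32(0x0ff00000)

static inline __u8 ip6_tclass(__be32 flowinfo)
{
        /* flowinfo holds the traffic class and flow label in network order */
        return ntohl(flowinfo & IPV6_TCLASS_MASK) >> IPV6_TCLASS_SHIFT;
}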
2014-01-15 | be2net: update driver version to 10.0.x | Sathya Perla | 1 file, -1/+1
Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 | be2net: cleanup wake-on-lan code | Suresh Reddy | 4 files, -35/+19
This patch cleans up the wake-on-lan code in the following ways:

1) Removes some driver hacks in be_cmd_get_acpi_wol_cap() that were based on incorrect assumptions.
2) Uses the adapter->wol_en and wol_cap variables for checking if WoL is supported and enabled on an interface instead of referring to the exclusion list via the macro be_is_wol_supported().

Signed-off-by: Suresh Reddy <suresh.reddy@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 | be2net: use GET_MAC_LIST cmd to query mac-address from a pmac-id | Suresh Reddy | 3 files, -17/+23
The use of NTKW_MAC_QUERY cmd has been deprecated for Skyhawk-R. Replace the last remaining usage in be_vfs_mac_query() routine. Signed-off-by: Suresh Reddy <suresh.reddy@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 | be2net: do not use frag index in the RX-compl entry | Suresh Reddy | 2 files, -22/+9
Instead, use the tail of the RXQ to pick the associated RXQ entry. This fix is required in preparation for supporting RXQ lengths greater than 1K. For such queues, the frag index in the RX-compl entry is not valid, as it is only a 10-bit field and cannot address RXQs longer than 1K.

Signed-off-by: Suresh Reddy <suresh.reddy@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>