linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2014-10-18	x86,kvm,vmx: Preserve CR4 across VM entry	Andy Lutomirski	1	-2/+14
	CR4 isn't constant; at least the TSD and PCE bits can vary. TBH, treating CR0 and CR3 as constant scares me a bit, too, but it looks like it's correct. This adds a branch and a read from cr4 to each vm entry. Because it is extremely likely that consecutive entries into the same vcpu will have the same host cr4 value, this fixes up the vmcs instead of restoring cr4 after the fact. A subsequent patch will add a kernel-wide cr4 shadow, reducing the overhead in the common case to just two memory reads and a branch. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: stable@vger.kernel.org Cc: Petr Matousek <pmatouse@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-18	futex: Ensure get_futex_key_refs() always implies a barrier	Catalin Marinas	1	-0/+2
	Commit b0c29f79ecea (futexes: Avoid taking the hb->lock if there's nothing to wake up) changes the futex code to avoid taking a lock when there are no waiters. This code has been subsequently fixed in commit 11d4616bd07f (futex: revert back to the explicit waiter counting code). Both the original commit and the fix-up rely on get_futex_key_refs() to always imply a barrier. However, for private futexes, none of the cases in the switch statement of get_futex_key_refs() would be hit and the function completes without a memory barrier as required before checking the "waiters" in futex_wake() -> hb_waiters_pending(). The consequence is a race with a thread waiting on a futex on another CPU, allowing the waker thread to read "waiters == 0" while the waiter thread to have read "futex_val == locked" (in kernel). Without this fix, the problem (user space deadlocks) can be seen with Android bionic's mutex implementation on an arm64 multi-cluster system. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Matteo Franchin <Matteo.Franchin@arm.com> Fixes: b0c29f79ecea (futexes: Avoid taking the hb->lock if there's nothing to wake up) Acked-by: Davidlohr Bueso <dave@stgolabs.net> Tested-by: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: <stable@vger.kernel.org> Cc: Darren Hart <dvhart@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-17	bna: fix skb->truesize underestimation	Eric Dumazet	1	-1/+1
	skb->truesize is not meant to be tracking amount of used bytes in an skb, but amount of reserved/consumed bytes in memory. For instance, if we use a single byte in last page fragment, we have to account the full size of the fragment. skb->truesize can be very different from skb->len, that has a very specific safety purpose. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Rasesh Mody <rasesh.mody@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17	net: dsa: add includes for ethtool and phy_fixed definitions	Florian Fainelli	2	-0/+2
	net/dsa/slave.c uses functions and structures declared in phy_fixed.h but does not explicitely include it, while dsa.h needs structure declarations for 'struct ethtool_wolinfo' and 'struct ethtool_eee', fix those by including the correct header files. Fixes: ec9436baedb6 ("net: dsa: allow drivers to do link adjustment") Fixes: ce31b31c68e7 ("net: dsa: allow updating fixed PHY link information") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17	openvswitch: Set flow-key members.	Pravin B Shelar	1	-3/+3
	This patch adds missing memset which are required to initialize flow key member. For example for IP flow we need to initialize ip.frag for all cases. Found by inspection. This bug is introduced by commit 0714812134d7dcadeb7ecfbfeb18788aa7e1eaac ("openvswitch: Eliminate memset() from flow_extract"). Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17	netrom: use linux/uaccess.h	Fabian Frederick	7	-7/+7
	replace asm/uaccess.h by linux/uaccess.h Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17	dsa: Fix conversion from host device to mii bus	Guenter Roeck	2	-8/+22
	Commit b4d2394d01bc ("dsa: Replace mii_bus with a generic host device") replaces mii_bus with a generic host_dev, and introduces dsa_host_dev_to_mii_bus() to support conversion from host_dev to mii_bus. However, in some cases it uses to_mii_bus to perform that conversion. Since host_dev is not the phy bus device but typically a platform device, this fails and results in a crash with the affected drivers. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81781d35>] __mutex_lock_slowpath+0x75/0x100 PGD 406783067 PUD 406784067 PMD 0 Oops: 0002 [#1] SMP ... Call Trace: [<ffffffff810a538b>] ? pick_next_task_fair+0x61b/0x880 [<ffffffff81781de3>] mutex_lock+0x23/0x37 [<ffffffff81533244>] mdiobus_read+0x34/0x60 [<ffffffff8153b95a>] __mv88e6xxx_reg_read+0x8a/0xa0 [<ffffffff8153b9bc>] mv88e6xxx_reg_read+0x4c/0xa0 Fixes: b4d2394d01bc ("dsa: Replace mii_bus with a generic host device") Cc: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17	tipc: fix bug in bundled buffer reception	Jon Paul Maloy	1	-1/+6
	In commit ec8a2e5621db2da24badb3969eda7fd359e1869f ("tipc: same receive code path for connection protocol and data messages") we omitted the the possiblilty that an arriving message extracted from a bundle buffer may be a multicast message. Such messages need to be to be delivered to the socket via a separate function, tipc_sk_mcast_rcv(). As a result, small multicast messages arriving as members of a bundle buffer will be silently dropped. This commit corrects the error by considering this case in the function tipc_link_bundle_rcv(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17	ipv6: introduce tcp_v6_iif()	Eric Dumazet	5	-15/+30
	Commit 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses") added a regression for SO_BINDTODEVICE on IPv6. This is because we still use inet6_iif() which expects that IP6 control block is still at the beginning of skb->cb[] This patch adds tcp_v6_iif() helper and uses it where necessary. Because __inet6_lookup_skb() is used by TCP and DCCP, we add an iif parameter to it. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses") Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17	sfc: add support for skb->xmit_more	Edward Cree	2	-29/+43
	Don't ring the doorbell, and don't do PIO. This will also prevent TX Push, because there will be more than one buffer waiting when the doorbell is rung. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17	r8152: return -EBUSY for runtime suspend	hayeswang	1	-7/+15
	Remove calling cancel_delayed_work_sync() for runtime suspend, because it would cause dead lock. Instead, return -EBUSY to avoid the device enters suspending if the net is running and the delayed work is pending or running. The delayed work would try to wake up the device later, so the suspending is not necessary. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17	ipv4: fix a potential use after free in fou.c	Li RongQing	1	-0/+3
	pskb_may_pull() maybe change skb->data and make uh pointer oboslete, so reload uh and guehdr Fixes: 37dd0247 ("gue: Receive side for Generic UDP Encapsulation") Cc: Tom Herbert <therbert@google.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17	ipv4: fix a potential use after free in ip_tunnel_core.c	Li RongQing	1	-1/+2
	pskb_may_pull() maybe change skb->data and make eth pointer oboslete, so set eth after pskb_may_pull() Fixes:3d7b46cd("ip_tunnel: push generic protocol handling to ip_tunnel module") Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17	hyperv: Add handling of IP header with option field in netvsc_set_hash()	Haiyang Zhang	1	-16/+10
	In case that the IP header has optional field at the end, this patch will get the port numbers after that field, and compute the hash. The general parser skb_flow_dissect() is used here. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17	openvswitch: Create right mask with disabled megaflows	Pravin B Shelar	1	-21/+72
	If megaflows are disabled, the userspace does not send the netlink attribute OVS_FLOW_ATTR_MASK, and the kernel must create an exact match mask. sw_flow_mask_set() sets every bytes (in 'range') of the mask to 0xff, even the bytes that represent padding for struct sw_flow, or the bytes that represent fields that may not be set during ovs_flow_extract(). This is a problem, because when we extract a flow from a packet, we do not memset() anymore the struct sw_flow to 0. This commit gets rid of sw_flow_mask_set() and introduces mask_set_nlattr(), which operates on the netlink attributes rather than on the mask key. Using this approach we are sure that only the bytes that the user provided in the flow are matched. Also, if the parse_flow_mask_nlattrs() for the mask ENCAP attribute fails, we now return with an error. This bug is introduced by commit 0714812134d7dcadeb7ecfbfeb18788aa7e1eaac ("openvswitch: Eliminate memset() from flow_extract"). Reported-by: Alex Wang <alexw@nicira.com> Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17	vxlan: fix a free after use	Li RongQing	1	-0/+1
	pskb_may_pull maybe change skb->data and make eth pointer oboslete, so eth needs to reload Fixes: 91269e390d062 ("vxlan: using pskb_may_pull as early as possible") Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17	openvswitch: fix a use after free	Li RongQing	1	-1/+2
	pskb_may_pull() called by arphdr_ok can change skb->data, so put the arp setting after arphdr_ok to avoid the use the freed memory Fixes: 0714812134d7d ("openvswitch: Eliminate memset() from flow_extract.") Cc: Jesse Gross <jesse@nicira.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17	ipv4: dst_entry leak in ip_send_unicast_reply()	Vasily Averin	1	-3/+9
	ip_setup_cork() called inside ip_append_data() steals dst entry from rt to cork and in case errors in __ip_append_data() nobody frees stolen dst entry Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()") Signed-off-by: Vasily Averin <vvs@parallels.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17	ipv4: clean up cookie_v4_check()	Cong Wang	3	-7/+6
	We can retrieve opt from skb, no need to pass it as a parameter. And opt should always be non-NULL, no need to check. Cc: Krzysztof Kolasa <kkolasa@winsoft.pl> Cc: Eric Dumazet <edumazet@google.com> Tested-by: Krzysztof Kolasa <kkolasa@winsoft.pl> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17	ipv4: share tcp_v4_save_options() with cookie_v4_check()	Cong Wang	3	-29/+21
	cookie_v4_check() allocates ip_options_rcu in the same way with tcp_v4_save_options(), we can just make it a helper function. Cc: Krzysztof Kolasa <kkolasa@winsoft.pl> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17	ipv4: call __ip_options_echo() in cookie_v4_check()	Cong Wang	1	-1/+1
	commit 971f10eca186cab238c49da ("tcp: better TCP_SKB_CB layout to reduce cache line misses") missed that cookie_v4_check() still calls ip_options_echo() which uses IPCB(). It should use TCPCB() at TCP layer, so call __ip_options_echo() instead. Fixes: commit 971f10eca186cab238c49da ("tcp: better TCP_SKB_CB layout to reduce cache line misses") Cc: Krzysztof Kolasa <kkolasa@winsoft.pl> Cc: Eric Dumazet <edumazet@google.com> Reported-by: Krzysztof Kolasa <kkolasa@winsoft.pl> Tested-by: Krzysztof Kolasa <kkolasa@winsoft.pl> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17	atm: simplify lanai.c by using module_pci_driver	Michael Opdenacker	1	-21/+1
	This simplifies the lanai.c driver by using the module_pci_driver() macro, at the expense of losing only debugging messages. Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-16	netlink: fix description of portid	Nicolas Dichtel	1	-1/+1
	Avoid confusion between pid and portid. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-16	ixgbe: check for vfs outside of sriov_num_vfs before dereference	Emil Tantilov	1	-0/+3
	The check for vfinfo is not sufficient because it does not protect against specifying vf that is outside of sriov_num_vfs range. All of the ndo functions have a check for it except for ixgbevf_ndo_set_spoofcheck(). The following patch is all we need to protect against this panic: ip link set p96p1 vf 0 spoofchk off BUG: unable to handle kernel NULL pointer dereference at 0000000000000052 IP: [<ffffffffa044a1c1>] ixgbe_ndo_set_vf_spoofchk+0x51/0x150 [ixgbe] Reported-by: Thierry Herbelot <thierry.herbelot@6wind.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Acked-by: Thierry Herbelot <thierry.herbelot@6wind.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-10-16	fm10k: Add CONFIG_FM10K_VXLAN configuration option	Andy Zhou	2	-3/+14
	Compiling with CONFIG_FM10K=y and VXLAN=m resulting in linking error: drivers/built-in.o: In function `fm10k_open': (.text+0x1f9d7a): undefined reference to `vxlan_get_rx_port' make: *** [vmlinux] Error 1 The fix follows the same strategy as I40E. Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-10-16	fm10k: Unlock mailbox on VLAN addition failures	Matthew Vick	1	-3/+4
	After grabbing the mailbox lock and detecting an error, the lock must be released before the error code can be returned. Signed-off-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-10-16	fm10k: Check the host state when bringing the interface up	Matthew Vick	1	-0/+1
	Set the flag to fetch the host state before kicking off the service task that reads the host state when bringing the interface back up. Signed-off-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-10-15	vxlan: using pskb_may_pull as early as possible	Li RongQing	1	-4/+2
	pskb_may_pull should be used to check if skb->data has enough space, skb->len can not ensure that. Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-15	vxlan: fix a use after free in vxlan_encap_bypass	Li RongQing	1	-3/+5
	when netif_rx() is done, the netif_rx handled skb maybe be freed, and should not be used. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-15	openvswitch: use vport instead of p	Fabian Frederick	1	-2/+2
	All functions used struct vport *vport except ovs_vport_find_upcall_portid. This fixes 1 kerneldoc warning Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-15	openvswitch: kerneldoc warning fix	Fabian Frederick	1	-1/+1
	s/sock/gs Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-15	gianfar: Add FCS to rx buffer size (fix)	Claudiu Manoil	1	-1/+1
	For each Rx frame the eTSEC writes its FCS (Frame Check Sequence) to the Rx buffer. The eTSEC h/w manual states in the "Receive Buffer Descriptor Field Descriptions" table: "Data length is the number of octets written by the eTSEC into this BD's data buffer if L is cleared (the value is equal to MRBLR), or, if L is set, the length of the frame including CRC, FCB (if RCTRL[PRSDEP > 00), preamble (if MACCFG2[PreAmRxEn]=1), time stamp (if RCTRL[TS] = 1) and any padding (RCTRL[PAL])." Though the FCS bytes are removed by the driver before passing the skb to the net stack, the Rx buffer size computation does not currently take into account the FCS bytes (4 bytes). Because the Rx buffer size is multiple of 512 bytes, leaving out the FCS is not a problem for the default MTU of 1500, as the Rx buffer size is 1536 in this case. However, for custom MTUs, where the difference between the MTU size and the Rx buffer size is less, this can be a problem as the computed Rx buffer size won't be enough to accomodate the FCS for a received frame that is big enough (close to MTU size). In such case the received frame is considered to be incomplete (L flag not set in the RxBD status) and silently dropped. Note that the driver does not currently support S/G on Rx, so it has to compute its Rx buffer size based on the MTU of the device. Reported-by: Kristian Otnes <kotnes@cisco.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-15	virtio_net: fix use after free	Michael S. Tsirkin	1	-1/+3
	commit 0b725a2ca61bedc33a2a63d0451d528b268cf975 net: Remove ndo_xmit_flush netdev operation, use signalling instead. added code that looks at skb->xmit_more after the skb has been put in TX VQ. Since some paths process the ring and free the skb immediately, this can cause use after free. Fix by storing xmit_more in a local variable. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-15	net: fec: ptp: fix convergence issue to support LinuxPTP stack	Nimrod Andy	3	-42/+56
	iMX6SX IEEE 1588 module has one hw issue in capturing the ATVR register. The current SW flow is: ENET0->ATCR \|= ENET_ATCR_CAPTURE_MASK; ts_counter_ns = ENET0->ATVR; The ATVR value is not expected value that cause LinuxPTP stack cannot be convergent. ENET Block Guide/ Chapter for the iMX6SX (PELE) address the issue: After set ENET_ATCR[Capture], there need some time cycles before the counter value is capture in the register clock domain. The wait-time-cycles is at least 6 clock cycles of the slower clock between the register clock and the 1588 clock. So need something like: ENET0->ATCR \|= ENET_ATCR_CAPTURE_MASK; wait(); ts_counter_ns = ENET0->ATVR; For iMX6SX, the 1588 ts_clk is fixed to 25Mhz, register clock is 66Mhz, so the wait-time-cycles must be greater than 240ns (40ns * 6). The patch add 1us delay before cpu read ATVR register. Changes V2: Modify the commit/comments log to describe the issue clearly. Signed-off-by: Fugang Duan <B38611@freescale.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-15	Drivers: ide: Remove typedef atiixp_ide_timing	Himangi Saraogi	1	-4/+4
	The Linux kernel coding style guidelines suggest not using typedefs for structure types. This patch gets rid of the typedef for atiixp_ide_timing. The following Coccinelle semantic patch detects the case: @tn1@ type td; @@ typedef struct { ... } td; @script:python tf@ td << tn1.td; tdres; @@ coccinelle.tdres = td; @@ type tn1.td; identifier tf.tdres; @@ -typedef struct + tdres { ... } -td ; @@ type tn1.td; identifier tf.tdres; @@ -td + struct tdres Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-15	cxgb4i : Fix -Wmaybe-uninitialized warning.	Anish Bhatt	1	-1/+1
	Identified by kbuild test robot. csk family is always set to be AF_INET or AF_INET6, so skb will always be initialized to some value but there is no harm in silencing the warning anyways. Signed-off-by: Anish Bhatt <anish@chelsio.com> Fixes : f42bb57c61fd ('cxgb4i : Fix -Wunused-function warning') Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-15	net: Add ndo_gso_check	Tom Herbert	4	-4/+14
	Add ndo_gso_check which a device can define to indicate whether is is capable of doing GSO on a packet. This funciton would be called from the stack to determine whether software GSO is needed to be done. A driver should populate this function if it advertises GSO types for which there are combinations that it wouldn't be able to handle. For instance a device that performs UDP tunneling might only implement support for transparent Ethernet bridging type of inner packets or might have limitations on lengths of inner headers. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-15	stmmac: fix sti compatibililies	Giuseppe CAVALLARO	2	-5/+6
	this patch is to fix the stmmac data compatibilities for all the SoCs inside the platform file. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-15	cxgb4i: Remove duplicate call to dst_neigh_lookup()	Anish Bhatt	1	-5/+0
	There is an extra call to dst_neigh_lookup() leftover in cxgb4i that can cause an unreleased refcnt issue. Remove extraneous call. Signed-off-by: Anish Bhatt <anish@chelsio.com> Fixes : 759a0cc5a3e1b ('cxgb4i: Add ipv6 code to driver, call into libcxgbi ipv6 api') Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-15	cxgb4i : Fix -Wunused-function warning	Anish Bhatt	2	-0/+10
	A bunch of ipv6 related code is left on by default. While this causes no compilation issues, there is no need to have this enabled by default. Guard with an ipv6 check, which also takes care of a -Wunused-function warning. Signed-off-by: Anish Bhatt <anish@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-15	cxgb4 : Fix build failure in cxgb4 when ipv6 is disabled/not in-built	Anish Bhatt	2	-1/+9
	cxgb4 ipv6 does not guard against ipv6 being disabled, or the standard ipv6 module vs inbuilt tri-state issue. This was fixed for cxgb4i & iw_cxgb4 but missed for cxgb4. Signed-off-by: Anish Bhatt <anish@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-15	cxgb4i : Remove duplicated CLIP handling code	Anish Bhatt	2	-133/+7
	cxgb4 already handles CLIP updates from a previous changeset for iw_cxgb4, there is no need to have this functionality in cxgb4i. Remove duplicated code Signed-off-by: Anish Bhatt <anish@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14	sparc64: Fix FPU register corruption with AES crypto offload.	David S. Miller	2	-1/+21
	The AES loops in arch/sparc/crypto/aes_glue.c use a scheme where the key material is preloaded into the FPU registers, and then we loop over and over doing the crypt operation, reusing those pre-cooked key registers. There are intervening blkcipher*() calls between the crypt operation calls. And those might perform memcpy() and thus also try to use the FPU. The sparc64 kernel FPU usage mechanism is designed to allow such recursive uses, but with a catch. There has to be a trap between the two FPU using threads of control. The mechanism works by, when the FPU is already in use by the kernel, allocating a slot for FPU saving at trap time. Then if, within the trap handler, we try to use the FPU registers, the pre-trap FPU register state is saved into the slot. Then at trap return time we notice this and restore the pre-trap FPU state. Over the long term there are various more involved ways we can make this work, but for a quick fix let's take advantage of the fact that the situation where this happens is very limited. All sparc64 chips that support the crypto instructiosn also are using the Niagara4 memcpy routine, and that routine only uses the FPU for large copies where we can't get the source aligned properly to a multiple of 8 bytes. We look to see if the FPU is already in use in this context, and if so we use the non-large copy path which only uses integer registers. Furthermore, we also limit this special logic to when we are doing kernel copy, rather than a user copy. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14	mnt: Prevent pivot_root from creating a loop in the mount tree	Eric W. Biederman	1	-0/+3
	Andy Lutomirski recently demonstrated that when chroot is used to set the root path below the path for the new ``root'' passed to pivot_root the pivot_root system call succeeds and leaks mounts. In examining the code I see that starting with a new root that is below the current root in the mount tree will result in a loop in the mount tree after the mounts are detached and then reattached to one another. Resulting in all kinds of ugliness including a leak of that mounts involved in the leak of the mount loop. Prevent this problem by ensuring that the new mount is reachable from the current root of the mount tree. [Added stable cc. Fixes CVE-2014-7970. --Andy] Cc: stable@vger.kernel.org Reported-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/87bnpmihks.fsf@x220.int.ebiederm.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
2014-10-14	tcp: TCP Small Queues and strange attractors	Eric Dumazet	1	-7/+19
	TCP Small queues tries to keep number of packets in qdisc as small as possible, and depends on a tasklet to feed following packets at TX completion time. Choice of tasklet was driven by latencies requirements. Then, TCP stack tries to avoid reorders, by locking flows with outstanding packets in qdisc in a given TX queue. What can happen is that many flows get attracted by a low performing TX queue, and cpu servicing TX completion has to feed packets for all of them, making this cpu 100% busy in softirq mode. This became particularly visible with latest skb->xmit_more support Strategy adopted in this patch is to detect when tcp_wfree() is called from ksoftirqd and let the outstanding queue for this flow being drained before feeding additional packets, so that skb->ooo_okay can be set to allow select_queue() to select the optimal queue : Incoming ACKS are normally handled by different cpus, so this patch gives more chance for these cpus to take over the burden of feeding qdisc with future packets. Tested: lpaa23:~# ./super_netperf 1400 --google-pacing-rate 3028000 -H lpaa24 -l 3600 & lpaa23:~# sar -n DEV 1 10 \| grep eth1 06:16:18 AM eth1 595448.00 1190564.00 38381.09 1760253.12 0.00 0.00 1.00 06:16:19 AM eth1 594858.00 1189686.00 38340.76 1758952.72 0.00 0.00 0.00 06:16:20 AM eth1 597017.00 1194019.00 38480.79 1765370.29 0.00 0.00 1.00 06:16:21 AM eth1 595450.00 1190936.00 38380.19 1760805.05 0.00 0.00 0.00 06:16:22 AM eth1 596385.00 1193096.00 38442.56 1763976.29 0.00 0.00 1.00 06:16:23 AM eth1 598155.00 1195978.00 38552.97 1768264.60 0.00 0.00 0.00 06:16:24 AM eth1 594405.00 1188643.00 38312.57 1757414.89 0.00 0.00 1.00 06:16:25 AM eth1 593366.00 1187154.00 38252.16 1755195.83 0.00 0.00 0.00 06:16:26 AM eth1 593188.00 1186118.00 38232.88 1753682.57 0.00 0.00 1.00 06:16:27 AM eth1 596301.00 1192241.00 38440.94 1762733.09 0.00 0.00 0.00 Average: eth1 595457.30 1190843.50 38381.69 1760664.84 0.00 0.00 0.50 lpaa23:~# ./tc -s -d qd sh dev eth1 \| grep backlog backlog 7606336b 2513p requeues 167982 backlog 224072b 74p requeues 566 backlog 581376b 192p requeues 5598 backlog 181680b 60p requeues 1070 backlog 5305056b 1753p requeues 110166 // Here, this TX queue is attracting flows backlog 157456b 52p requeues 1758 backlog 672216b 222p requeues 3025 backlog 60560b 20p requeues 24541 backlog 448144b 148p requeues 21258 lpaa23:~# echo 1 >/proc/sys/net/ipv4/tcp_tsq_enable_tcp_wfree_ksoftirqd_detect Immediate jump to full bandwidth, and traffic is properly shard on all tx queues. lpaa23:~# sar -n DEV 1 10 \| grep eth1 06:16:46 AM eth1 1397632.00 2795397.00 90081.87 4133031.26 0.00 0.00 1.00 06:16:47 AM eth1 1396874.00 2793614.00 90032.99 4130385.46 0.00 0.00 0.00 06:16:48 AM eth1 1395842.00 2791600.00 89966.46 4127409.67 0.00 0.00 1.00 06:16:49 AM eth1 1395528.00 2791017.00 89946.17 4126551.24 0.00 0.00 0.00 06:16:50 AM eth1 1397891.00 2795716.00 90098.74 4133497.39 0.00 0.00 1.00 06:16:51 AM eth1 1394951.00 2789984.00 89908.96 4125022.51 0.00 0.00 0.00 06:16:52 AM eth1 1394608.00 2789190.00 89886.90 4123851.36 0.00 0.00 1.00 06:16:53 AM eth1 1395314.00 2790653.00 89934.33 4125983.09 0.00 0.00 0.00 06:16:54 AM eth1 1396115.00 2792276.00 89984.25 4128411.21 0.00 0.00 1.00 06:16:55 AM eth1 1396829.00 2793523.00 90030.19 4130250.28 0.00 0.00 0.00 Average: eth1 1396158.40 2792297.00 89987.09 4128439.35 0.00 0.00 0.50 lpaa23:~# tc -s -d qd sh dev eth1 \| grep backlog backlog 7900052b 2609p requeues 173287 backlog 878120b 290p requeues 589 backlog 1068884b 354p requeues 5621 backlog 996212b 329p requeues 1088 backlog 984100b 325p requeues 115316 backlog 956848b 316p requeues 1781 backlog 1080996b 357p requeues 3047 backlog 975016b 322p requeues 24571 backlog 990156b 327p requeues 21274 (All 8 TX queues get a fair share of the traffic) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14	qlcnic: Fix number of arguments in destroy tx context command	Rajesh Borundia	1	-1/+1
	o Number of arguments taken by destroy tx command is three instead of two. Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14	qlcnic: Fix programming number of arguments in a command.	Rajesh Borundia	1	-2/+2
	o Initially we were programming maximum number of arguments. Instead we should program number of arguments required in a command. o Maximum number of arguments for 82xx adapter is four. Fix it for GET_ESWITCH_STATS command. Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14	genl_magic: Resolve logical-op warnings	Mark Rustad	1	-2/+2
	Resolve "logical 'and' applied to non-boolean constant" warnings" that appear in W=2 builds by adding !! to a bit test. Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14	net: Trap attempts to call sock_kfree_s() with a NULL pointer.	David S. Miller	1	-0/+2
	Unlike normal kfree() it is never right to call sock_kfree_s() with a NULL pointer, because sock_kfree_s() also has the side effect of discharging the memory from the sockets quota. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14	rds: avoid calling sock_kfree_s() on allocation failure	Cong Wang	1	-3/+4
	It is okay to free a NULL pointer but not okay to mischarge the socket optmem accounting. Compile test only. Reported-by: rucsoftsec@gmail.com Cc: Chien Yen <chien.yen@oracle.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>