linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2016-05-12	net: mvneta: bm: fix dependencies again	Arnd Bergmann	1	-1/+1
	I tried to fix this before, but my previous fix was incomplete and we can still get the same link error in randconfig builds because of the way that Kconfig treats the default y if MVNETA=y && MVNETA_BM_ENABLE line that does not actually trigger when MVNETA_BM_ENABLE=m, unlike I intended. Changing the line to use MVNETA_BM_ENABLE!=n however has the desired effect and hopefully makes all configurations work as expected. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 019ded3aa7c9 ("net: mvneta: bm: clarify dependencies") Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-11	bnxt_en: Add workaround to detect bad opaque in rx completion (part 2)	Michael Chan	2	-0/+61
	Add detection and recovery code when the hardware returned opaque value does not match the expected consumer index. Once the issue is detected, we skip the processing of all RX and LRO/GRO packets. These completion entries are discarded without sending the SKB to the stack and without producing new buffers. The function will be reset from a workqueue. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-11	bnxt_en: Add workaround to detect bad opaque in rx completion (part 1)	Michael Chan	2	-0/+4
	There is a rare hardware bug that can cause a bad opaque value in the RX or TPA completion. When this happens, the hardware may have used the same buffer twice for 2 rx packets. In addition, the driver will also crash later using the bad opaque as the index into the ring. The rx opaque value is predictable and is always monotonically increasing. The workaround is to keep track of the expected next opaque value and compare it with the one returned by hardware during RX and TPA start completions. If they miscompare, we will not process any more RX and TPA completions and exit NAPI. We will then schedule a workqueue to reset the function. This patch adds the logic to keep track of the next rx consumer index. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-11	qlcnic: potential NULL dereference in qlcnic_83xx_get_minidump_template()	Dan Carpenter	1	-2/+6
	If qlcnic_fw_cmd_get_minidump_temp() fails then "fw_dump->tmpl_hdr" is NULL or possibly freed. It can lead to an oops later. Fixes: d01a6d3c8ae1 ('qlcnic: Add support to enable capability to extend minidump for iSCSI') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-11	gre: do not keep the GRE header around in collect medata mode	Jiri Benc	1	-1/+6
	For ipgre interface in collect metadata mode, it doesn't make sense for the interface to be of ARPHRD_IPGRE type. The outer header of received packets is not needed, as all the information from it is present in metadata_dst. We already don't set ipgre_header_ops for collect metadata interfaces, which is the only consumer of mac_header pointing to the outer IP header. Just set the interface type to ARPHRD_NONE in collect metadata mode for ipgre (not gretap, that still correctly stays ARPHRD_ETHER) and reset mac_header. Fixes: a64b04d86d14 ("gre: do not assign header_ops in collect metadata mode") Fixes: 2e15ea390e6f4 ("ip_gre: Add support to collect tunnel metadata.") Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-11	openvswitch: Fix cached ct with helper.	Joe Stringer	1	-0/+13
	When using conntrack helpers from OVS, a common configuration is to perform a lookup without specifying a helper, then go through a firewalling policy, only to decide to attach a helper afterwards. In this case, the initial lookup will cause a ct entry to be attached to the skb, then the later commit with helper should attach the helper and confirm the connection. However, the helper attachment has been missing. If the user has enabled automatic helper attachment, then this issue will be masked as it will be applied in init_conntrack(). It is also masked if the action is executed from ovs_packet_cmd_execute() as that will construct a fresh skb. This patch fixes the issue by making an explicit call to try to assign the helper if there is a discrepancy between the action's helper and the current skb->nfct. Fixes: cae3a2627520 ("openvswitch: Allow attaching helpers to ct action") Signed-off-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-11	x86/extable: ensure entries are swapped completely when sorting	Mathias Krause	1	-0/+8
	The x86 exception table sorting was changed in commit 29934b0fb8ff ("x86/extable: use generic search and sort routines") to use the arch independent code in lib/extable.c. However, the patch was mangled somehow on its way into the kernel from the last version posted at [1]. The committed version kind of attempted to incorporate the changes of commit 548acf19234d ("x86/mm: Expand the exception table logic to allow new handling options") as in _completely_ _ignoring_ the x86 specific 'handler' member of struct exception_table_entry. This effectively broke the sorting as entries will only partly be swapped now. Fortunately, the x86 Kconfig selects BUILDTIME_EXTABLE_SORT, so the exception table doesn't need to be sorted at runtime. However, in case that ever changes, we better not break the exception table sorting just because of that. [ Ard Biesheuvel points out that BUILDTIME_EXTABLE_SORT applies to the core image only, but we still rely on the sorting routines for modules in that case - Linus ] Fix this by providing a swap_ex_entry_fixup() macro that takes care of the 'handler' member. [1] https://lkml.org/lkml/2016/1/27/232 Signed-off-by: Mathias Krause <minipli@googlemail.com> Fixes: 29934b0fb8f ("x86/extable: use generic search and sort routines") Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-10	net sched: ife action fix late binding	Jamal Hadi Salim	1	-4/+10
	The process below was broken and is fixed with this patch. //add an ife action and give it an instance id of 1 sudo tc actions add action ife encode \ type 0xDEAD allow mark dst 02:15:15:15:15:15 index 1 //create a filter which binds to ife action id 1 sudo tc filter add dev $DEV parent ffff: protocol ip prio 1 u32\ match ip dst 17.0.0.1/32 flowid 1:11 action ife index 1 Message before fix was: RTNETLINK answers: Invalid argument We have an error talking to the kernel Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-10	net sched: skbedit action fix late binding	Jamal Hadi Salim	1	-7/+11
	The process below was broken and is fixed with this patch. //add a skbedit action and give it an instance id of 1 sudo tc actions add action skbedit mark 10 index 1 //create a filter which binds to skbedit action id 1 sudo tc filter add dev $DEV parent ffff: protocol ip prio 1 u32\ match ip dst 17.0.0.1/32 flowid 1:10 action skbedit index 1 Message before fix was: RTNETLINK answers: Invalid argument We have an error talking to the kernel Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-10	net sched: simple action fix late binding	Jamal Hadi Salim	1	-6/+12
	The process below was broken and is fixed with this patch. //add a simple action and give it an instance id of 1 sudo tc actions add action simple sdata "foobar" index 1 //create a filter which binds to simple action id 1 sudo tc filter add dev $DEV parent ffff: protocol ip prio 1 u32\ match ip dst 17.0.0.1/32 flowid 1:10 action simple index 1 Message before fix was: RTNETLINK answers: Invalid argument We have an error talking to the kernel Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-10	net sched: mirred action fix late binding	Jamal Hadi Salim	1	-6/+13
	The process below was broken and is fixed with this patch. //add an mirred action and give it an instance id of 1 sudo tc actions add action mirred egress mirror dev $MDEV index 1 //create a filter which binds to mirred action id 1 sudo tc filter add dev $DEV parent ffff: protocol ip prio 1 u32\ match ip dst 17.0.0.1/32 flowid 1:10 action mirred index 1 Message before bug fix was: RTNETLINK answers: Invalid argument We have an error talking to the kernel Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-10	net sched: ipt action fix late binding	Jamal Hadi Salim	1	-7/+12
	This was broken and is fixed with this patch. //add an ipt action and give it an instance id of 1 sudo tc actions add action ipt -j mark --set-mark 2 index 1 //create a filter which binds to ipt action id 1 sudo tc filter add dev $DEV parent ffff: protocol ip prio 1 u32\ match ip dst 17.0.0.1/32 flowid 1:10 action ipt index 1 Message before bug fix was: RTNETLINK answers: Invalid argument We have an error talking to the kernel Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-10	net sched: vlan action fix late binding	Jamal Hadi Salim	1	-6/+16
	Late vlan action binding was broken and is fixed with this patch. //add a vlan action to pop and give it an instance id of 1 sudo tc actions add action vlan pop index 1 //create filter which binds to vlan action id 1 sudo tc filter add dev $DEV parent ffff: protocol ip prio 1 u32 \ match ip dst 17.0.0.1/32 flowid 1:1 action vlan index 1 current message(before bug fix) was: RTNETLINK answers: Invalid argument We have an error talking to the kernel Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-10	net: phylib: fix interrupts re-enablement in phy_start	Shaohui Xie	1	-3/+5
	If phy was suspended and is starting, current driver always enable phy's interrupts, if phy works in polling, phy can raise unexpected interrupt which will not be handled, the interrupt will block system enter suspend again. So interrupts should only be re-enabled if phy works in interrupt. Signed-off-by: Shaohui Xie <Shaohui.Xie@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-10	tcp: refresh skb timestamp at retransmit time	Eric Dumazet	1	-2/+4
	In the very unlikely case __tcp_retransmit_skb() can not use the cloning done in tcp_transmit_skb(), we need to refresh skb_mstamp before doing the copy and transmit, otherwise TCP TS val will be an exact copy of original transmit. Fixes: 7faee5c0d514 ("tcp: remove TCP_SKB_CB(skb)->when") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-10	net: nps_enet: bug fix - handle lost tx interrupts	Elad Kanfi	1	-0/+15
	The tx interrupt is of edge type, and in case such interrupt is triggered while it is masked it will not be handled even after tx interrupts are re-enabled in the end of NAPI poll. This will cause tx network to stop in the following scenario: * Rx is being handled, hence interrupts are masked. * Tx interrupt is triggered after checking if there is some tx to handle and before re-enabling the interrupts. In this situation only rx transaction will release tx requests. In order to handle the tx that was missed( if there was one ), a NAPI reschdule was added after enabling the interrupts. Signed-off-by: Elad Kanfi <eladkan@mellanox.com> Acked-by: Noam Camus <noamca@mellanox.com> Acked-by: Gilad Ben-Yossef <giladby@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-10	net: nps_enet: Tx handler synchronization	Elad Kanfi	2	-8/+9
	Below is a description of a possible problematic sequence. CPU-A is sending a frame and CPU-B handles the interrupt that indicates the frame was sent. CPU-B reads an invalid value of tx_packet_sent. CPU-A CPU-B ----- ----- nps_enet_send_frame . . tx_skb = skb tx_packet_sent = true order HW to start tx . . HW complete tx ------> get tx complete interrupt . . if(tx_packet_sent == true) handle tx_skb end memory transaction (tx_packet_sent actually written) Furthermore there is a dependency between tx_skb and tx_packet_sent. There is no assurance that tx_skb contains a valid pointer at CPU B when it sees tx_packet_sent == true. Solution: Initialize tx_skb to NULL and use it to indicate that packet was sent, in this way tx_packet_sent can be removed. Add a write memory barrier after setting tx_skb in order to make sure that it is valid before HW is informed and IRQ is fired. Fixed sequence will be: CPU-A CPU-B ----- ----- tx_skb = skb wmb() . . order HW to start tx . . HW complete tx ------> get tx complete interrupt . . if(tx_skb != NULL) handle tx_skb tx_skb = NULL Signed-off-by: Elad Kanfi <eladkan@mellanox.com> Acked-by: Noam Camus <noamca@mellanox.com> Acked-by: Gilad Ben-Yossef <giladby@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-10	kvmconfig: add more virtio drivers	Andrey Utkin	1	-0/+3
	"make defconfig kvmconfig" is supposed to end up with usable kernel for KVM guest. In practice, it won't work for e.g. Hetzner VPS (KVM-based) unless you add these options. Signed-off-by: Andrey Utkin <andrey_utkin@fastmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-10	x86/kvm: Add stack frame dependency to fastop() inline asm	Josh Poimboeuf	1	-1/+5
	The kbuild test robot reported this objtool warning [1]: arch/x86/kvm/emulate.o: warning: objtool: fastop()+0x69: call without frame pointer save/setup The issue seems to be caused by CONFIG_PROFILE_ALL_BRANCHES. With that option, for some reason gcc decides not to create a stack frame in fastop() before doing the inline asm call, which can result in a bad stack trace. Force a stack frame to be created if CONFIG_FRAME_POINTER is enabled by listing the stack pointer as an output operand for the inline asm statement. This change has no effect for !CONFIG_PROFILE_ALL_BRANCHES. [1] https://lists.01.org/pipermail/kbuild-all/2016-March/018249.html Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-10	sched/rt, sched/dl: Don't push if task's scheduling class was changed	Xunlei Pang	2	-0/+2
	We got this warning: WARNING: CPU: 1 PID: 2468 at kernel/sched/core.c:1161 set_task_cpu+0x1af/0x1c0 [...] Call Trace: dump_stack+0x63/0x87 __warn+0xd1/0xf0 warn_slowpath_null+0x1d/0x20 set_task_cpu+0x1af/0x1c0 push_dl_task.part.34+0xea/0x180 push_dl_tasks+0x17/0x30 __balance_callback+0x45/0x5c __sched_setscheduler+0x906/0xb90 SyS_sched_setattr+0x150/0x190 do_syscall_64+0x62/0x110 entry_SYSCALL64_slow_path+0x25/0x25 This corresponds to: WARN_ON_ONCE(p->state == TASK_RUNNING && p->sched_class == &fair_sched_class && (p->on_rq && !task_on_rq_migrating(p))) It happens because in find_lock_later_rq(), the task whose scheduling class was changed to fair class is still pushed away as if it were a deadline task ... So, check in find_lock_later_rq() after double_lock_balance(), if the scheduling class of the deadline task was changed, break and retry. Apply the same logic to RT tasks. Signed-off-by: Xunlei Pang <xlpang@redhat.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Juri Lelli <juri.lelli@arm.com> Link: http://lkml.kernel.org/r/1462767091-1215-1-git-send-email-xlpang@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-10	x86/topology: Set x86_max_cores to 1 for CONFIG_SMP=n	Thomas Gleixner	1	-1/+1
	Josef reported that the uncore driver trips over with CONFIG_SMP=n because x86_max_cores is 16 instead of 12. The reason is, that for SMP=n the extended topology detection is a NOOP and the cache leaf is used to determine the number of cores. That's wrong in two aspects: 1) The cache leaf enumerates the maximum addressable number of cores in the package, which is obviously not correct 2) UP has no business with topology bits at all. Make intel_num_cpu_cores() return 1 for CONFIG_SMP=n Reported-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-team <Kernel-team@fb.com> Cc: Kan Liang <kan.liang@intel.com> Link: http://lkml.kernel.org/r/761b4a2a-0332-7954-f030-c6639f949612@fb.com
2016-05-10	export tc ife uapi header	Jamal Hadi Salim	1	-0/+1
	Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-10	net: thunderx: avoid exposing kernel stack	xypron.glpk@gmx.de	1	-0/+4
	Reserved fields should be set to zero to avoid exposing bits from the kernel stack. Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09	net: fix a kernel infoleak in x25 module	Kangjie Lu	1	-0/+1
	Stack object "dte_facilities" is allocated in x25_rx_call_request(), which is supposed to be initialized in x25_negotiate_facilities. However, 5 fields (8 bytes in total) are not initialized. This object is then copied to userland via copy_to_user, thus infoleak occurs. Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09	ravb: Add missing free_irq() call to ravb_close()	Geert Uytterhoeven	1	-0/+2
	When reopening the network device on ra7795/salvator-x, e.g. after a DHCP timeout: IP-Config: Reopening network devices... genirq: Flags mismatch irq 139. 00000000 (eth0:ch24:emac) vs. 00000000 (eth0:ch24:emac) ravb e6800000.ethernet eth0: cannot request IRQ eth0:ch24:emac IP-Config: Failed to open eth0 IP-Config: No network devices available The "mismatch" is due to requesting an IRQ that is already in use, while IRQF_PROBE_SHARED wasn't set. However, the real cause is that ravb_close() doesn't release the R-Car Gen3-specific secondary IRQ. Add the missing free_irq() call to fix this. Fixes: 22d4df8ff3a3cc72 ("ravb: Add support for r8a7795 SoC") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09	uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h	Mikko Rapeli	2	-0/+72
	glibc's net/if.h contains copies of definitions from linux/if.h and these conflict and cause build failures if both files are included by application source code. Changes in uapi headers, which fixed header file dependencies to include linux/if.h when it was needed, e.g. commit 1ffad83d, made the net/if.h and linux/if.h incompatibilities visible as build failures for userspace applications like iproute2 and xtables-addons. This patch fixes compile errors when glibc net/if.h is included before linux/if.h: ./linux/if.h:99:21: error: redeclaration of enumerator ‘IFF_NOARP’ ./linux/if.h:98:23: error: redeclaration of enumerator ‘IFF_RUNNING’ ./linux/if.h:97:26: error: redeclaration of enumerator ‘IFF_NOTRAILERS’ ./linux/if.h:96:27: error: redeclaration of enumerator ‘IFF_POINTOPOINT’ ./linux/if.h:95:24: error: redeclaration of enumerator ‘IFF_LOOPBACK’ ./linux/if.h:94:21: error: redeclaration of enumerator ‘IFF_DEBUG’ ./linux/if.h:93:25: error: redeclaration of enumerator ‘IFF_BROADCAST’ ./linux/if.h:92:19: error: redeclaration of enumerator ‘IFF_UP’ ./linux/if.h:252:8: error: redefinition of ‘struct ifconf’ ./linux/if.h:203:8: error: redefinition of ‘struct ifreq’ ./linux/if.h:169:8: error: redefinition of ‘struct ifmap’ ./linux/if.h:107:23: error: redeclaration of enumerator ‘IFF_DYNAMIC’ ./linux/if.h:106:25: error: redeclaration of enumerator ‘IFF_AUTOMEDIA’ ./linux/if.h:105:23: error: redeclaration of enumerator ‘IFF_PORTSEL’ ./linux/if.h:104:25: error: redeclaration of enumerator ‘IFF_MULTICAST’ ./linux/if.h:103:21: error: redeclaration of enumerator ‘IFF_SLAVE’ ./linux/if.h:102:22: error: redeclaration of enumerator ‘IFF_MASTER’ ./linux/if.h:101:24: error: redeclaration of enumerator ‘IFF_ALLMULTI’ ./linux/if.h:100:23: error: redeclaration of enumerator ‘IFF_PROMISC’ The cases where linux/if.h is included before net/if.h need a similar fix in the glibc side, or the order of include files can be changed userspace code as a workaround. This change was tested in x86 userspace on Debian unstable with scripts/headers_compile_test.sh: $ make headers_install && \ cd usr/include && ../../scripts/headers_compile_test.sh -l -k ... cc -Wall -c -nostdinc -I /usr/lib/gcc/i586-linux-gnu/5/include -I /usr/lib/gcc/i586-linux-gnu/5/include-fixed -I . -I /home/mcfrisk/src/linux-2.6/usr/headers_compile_test_include.2uX2zH -I /home/mcfrisk/src/linux-2.6/usr/headers_compile_test_include.2uX2zH/i586-linux-gnu -o /dev/null ./linux/if.h_libc_before_kernel.h PASSED libc before kernel test: ./linux/if.h Reported-by: Jan Engelhardt <jengelh@inai.de> Reported-by: Josh Boyer <jwboyer@fedoraproject.org> Reported-by: Stephen Hemminger <shemming@brocade.com> Reported-by: Waldemar Brodkorb <mail@waldemar-brodkorb.de> Cc: Gabriel Laskar <gabriel@lse.epita.fr> Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09	perf/core: Change the default paranoia level to 2	Andy Lutomirski	2	-2/+2
	Allowing unprivileged kernel profiling lets any user dump follow kernel control flow and dump kernel registers. This most likely allows trivial kASLR bypassing, and it may allow other mischief as well. (Off the top of my head, the PERF_SAMPLE_REGS_INTR output during /dev/urandom reads could be quite interesting.) Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-09	zsmalloc: fix zs_can_compact() integer overflow	Sergey Senozhatsky	1	-2/+5
	zs_can_compact() has two race conditions in its core calculation: unsigned long obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) - zs_stat_get(class, OBJ_USED); 1) classes are not locked, so the numbers of allocated and used objects can change by the concurrent ops happening on other CPUs 2) shrinker invokes it from preemptible context Depending on the circumstances, thus, OBJ_ALLOCATED can become less than OBJ_USED, which can result in either very high or negative `total_scan' value calculated later in do_shrink_slab(). do_shrink_slab() has some logic to prevent those cases: vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62 vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62 vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64 vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62 vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62 vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62 However, due to the way `total_scan' is calculated, not every shrinker->count_objects() overflow can be spotted and handled. To demonstrate the latter, I added some debugging code to do_shrink_slab() (x86_64) and the results were: vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615] vmscan: but total_scan > 0: 92679974445502 vmscan: resulting total_scan: 92679974445502 [..] vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615] vmscan: but total_scan > 0: 22634041808232578 vmscan: resulting total_scan: 22634041808232578 Even though shrinker->count_objects() has returned an overflowed value, the resulting `total_scan' is positive, and, what is more worrisome, it is insanely huge. This value is getting used later on in shrinker->scan_objects() loop: while (total_scan >= batch_size \|\| total_scan >= freeable) { unsigned long ret; unsigned long nr_to_scan = min(batch_size, total_scan); shrinkctl->nr_to_scan = nr_to_scan; ret = shrinker->scan_objects(shrinker, shrinkctl); if (ret == SHRINK_STOP) break; freed += ret; count_vm_events(SLABS_SCANNED, nr_to_scan); total_scan -= nr_to_scan; cond_resched(); } `total_scan >= batch_size' is true for a very-very long time and 'total_scan >= freeable' is also true for quite some time, because `freeable < 0' and `total_scan' is large enough, for example, 22634041808232578. The only break condition, in the given scheme of things, is shrinker->scan_objects() == SHRINK_STOP test, which is a bit too weak to rely on, especially in heavy zsmalloc-usage scenarios. To fix the issue, take a pool stat snapshot and use it instead of racy zs_stat_get() calls. Link: http://lkml.kernel.org/r/20160509140052.3389-1-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> [4.3+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-09	Revert "proc/base: make prompt shell start from new line after executing "cat /proc/$pid/wchan""	Robin Humble	1	-1/+1
	This reverts the 4.6-rc1 commit 7e2bc81da333 ("proc/base: make prompt shell start from new line after executing "cat /proc/$pid/wchan") because it breaks /proc/$PID/whcan formatting in ps and top. Revert also because the patch is inconsistent - it adds a newline at the end of only the '0' wchan, and does not add a newline when /proc/$PID/wchan contains a symbol name. eg. $ ps -eo pid,stat,wchan,comm PID STAT WCHAN COMMAND ... 1189 S - dbus-launch 1190 Ssl 0 dbus-daemon 1198 Sl 0 lightdm 1299 Ss ep_pol systemd 1301 S - (sd-pam) 1304 Ss wait sh Signed-off-by: Robin Humble <plaguedbypenguins@gmail.com> Cc: Minfei Huang <mnfhuang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-09	compiler-gcc: require gcc 4.8 for powerpc __builtin_bswap16()	Josh Poimboeuf	1	-1/+1
	gcc support for __builtin_bswap16() was supposedly added for powerpc in gcc 4.6, and was then later added for other architectures in gcc 4.8. However, Stephen Rothwell reported that attempting to use it on powerpc in gcc 4.6 fails with: lib/vsprintf.c:160:2: error: initializer element is not constant lib/vsprintf.c:160:2: error: (near initialization for 'decpair[0]') lib/vsprintf.c:160:2: error: initializer element is not constant lib/vsprintf.c:160:2: error: (near initialization for 'decpair[1]') ... I'm not entirely sure what those errors mean, but I don't see them on gcc 4.8. So let's consider gcc 4.8 to be the official starting point for __builtin_bswap16(). Arnd Bergmann adds: "I found the commit in gcc-4.8 that replaced the powerpc-specific implementation of __builtin_bswap16 with an architecture-independent one. Apparently the powerpc version (gcc-4.6 and 4.7) just mapped to the lhbrx/sthbrx instructions, so it ended up not being a constant, though the intent of the patch was mainly to add support for the builtin to x86: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52624 has the patch that went into gcc-4.8 and more information." Fixes: 7322dd755e7d ("byteswap: try to avoid __builtin_constant_p gcc bug") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-09	net/mlx5e: make VXLAN support conditional	Arnd Bergmann	5	-3/+24
	VXLAN can be disabled at compile-time or it can be a loadable module while mlx5 is built-in, which leads to a link error: drivers/net/built-in.o: In function `mlx5e_create_netdev': ntb_netdev.c:(.text+0x106de4): undefined reference to `vxlan_get_rx_port' This avoids the link error and makes the vxlan code optional, like the other ethernet drivers do as well. Link: https://patchwork.ozlabs.org/patch/589296/ Fixes: b3f63c3d5e2c ("net/mlx5e: Add netdev support for VXLAN tunneling") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09	Revert "net/mlx5: Kconfig: Fix MLX5_EN/VXLAN build issue"	Arnd Bergmann	1	-1/+0
	This reverts commit 69976fb1045850a742deb9790ea49cbc6f497531. We cannot select VXLAN when IPv4 support is disabled, that just gives us additional build errors, including: warning: (MLX5_CORE_EN) selects VXLAN which has unmet direct dependencies (NETDEVICES && NET_CORE && INET) In file included from ../drivers/net/vxlan.c:36:0: include/net/udp_tunnel.h: In function 'udp_tunnel_handle_offloads': include/net/udp_tunnel.h:112:9: error: implicit declaration of function 'iptunnel_handle_offloads' [-Werror=implicit-function-declaration] return iptunnel_handle_offloads(skb, type); ^~~~~~~~~~~~~~~~~~~~~~~~ I'm sending a proper fix for the original bug in a separate patch. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09	macsec: key identifier is 128 bits, not 64	Sabrina Dubroca	2	-7/+16
	The MACsec standard mentions a key identifier for each key, but doesn't specify anything about it, so I arbitrarily chose 64 bits. IEEE 802.1X-2010 specifies MKA (MACsec Key Agreement), and defines the key identifier to be 128 bits (96 bits "member identifier" + 32 bits "key number"). Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-08	Documentation/networking: more accurate LCO explanation	Shmulik Ladkani	1	-7/+7
	In few places the term "ones-complement sum" was used but the actual meaning is "the complement of the ones-complement sum". Also, avoid enclosing long statements with underscore, to ease readability. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-08	macvtap: segmented packet is consumed	Eric Dumazet	1	-1/+1
	If GSO packet is segmented and its segments are properly queued, we call consume_skb() instead of kfree_skb() to be drop monitor friendly. Fixes: 3e4f8b7873709 ("macvtap: Perform GSO on forwarding path.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-08	tools: bpf_jit_disasm: check for klogctl failure	Colin Ian King	1	-0/+3
	klogctl can fail and return -ve len, so check for this and return NULL to avoid passing a (size_t)-1 to malloc. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-08	qede: uninitialized variable in qede_start_xmit()	Dan Carpenter	1	-1/+1
	"data_split" was never set to false. It's just uninitialized. Fixes: 2950219d87b0 ('qede: Add basic network device support') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-08	Linux 4.6-rc7	Linus Torvalds	1	-1/+1

2016-05-08	MAINTAINERS: Add mmiotrace entry	Ingo Molnar	1	-0/+14
	The Nouveau maintainers would like to follow and review mmiotrace changes as well, so create a separate entry for that code. The high level bits are living in the tracing code, the low level bits in the x86 code. Acked-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Pekka Paalanen <ppaalanen@gmail.com> Acked-by: karol herbst <karolherbst@gmail.com> Cc: linux-kernel@vger.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-07	netxen: netxen_rom_fast_read() doesn't return -1	Dan Carpenter	1	-1/+2
	The error handling is broken here. netxen_rom_fast_read() returns zero on success and -EIO on error. It never returns -1. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-07	netxen: reversed condition in netxen_nic_set_link_parameters()	Dan Carpenter	1	-1/+1
	My static checker complains that we are using "autoneg" without initializing it. The problem is the ->phy_read() condition is reversed so we only set this on error instead of success. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-07	netxen: fix error handling in netxen_get_flash_block()	Dan Carpenter	1	-4/+8
	My static checker complained that "v" can be used unintialized if netxen_rom_fast_read() returns -EIO. That function never actually returns -1. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-07	x86/topology: Handle CPUID bogosity gracefully	Thomas Gleixner	1	-0/+5
	Joseph reported that a XEN guest dies with a division by 0 in the package topology setup code. This happens if cpu_info.x86_max_cores is zero. Handle that case and emit a warning. This does not fix the underlying XEN bug, but makes the code more robust. Reported-and-tested-by: Joseph Salisbury <joseph.salisbury@canonical.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1605062046270.3540@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-05-07	sched/fair: Fix !CONFIG_SMP kernel cpufreq governor breakage	Rafael J. Wysocki	1	-1/+8
	The following commit: 34e2c555f3e1 ("cpufreq: Add mechanism for registering utilization update callbacks") overlooked the fact that update_load_avg(), where CFS invokes cpufreq utilization update callbacks, becomes an empty stub on UP kernels. In consequence, if !CONFIG_SMP, cpufreq governors are never invoked from CFS and they do not have a chance to evaluate CPU performace levels and update them often enough. Needless to say, things don't work as expected then. Fix the problem by making the !CONFIG_SMP stub of update_load_avg() invoke cpufreq update callbacks too. Reported-by: Steve Muckle <steve.muckle@linaro.org> Tested-by: Steve Muckle <steve.muckle@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Steve Muckle <steve.muckle@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Linux PM list <linux-pm@vger.kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Fixes: 34e2c555f3e1 (cpufreq: Add mechanism for registering utilization update callbacks) Link: http://lkml.kernel.org/r/6282396.VVEdgVYxO3@vostro.rjw.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-06	mlxsw: spectrum: Add missing rollback in flood configuration	Ido Schimmel	1	-0/+8
	When we fail to set the flooding configuration for the broadcast and unregistered multicast traffic, we should revert the flooding configuration of the unknown unicast traffic. Fixes: 0293038e0c36 ("mlxsw: spectrum: Add support for flood control") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06	mlxsw: spectrum: Fix rollback order in LAG join failure	Ido Schimmel	1	-2/+2
	Make the leave procedure in the error path symmetric to the join procedure and first remove the port from the collector before potentially destroying the LAG. Fixes: 0d65fc13042f ("mlxsw: spectrum: Implement LAG port join/leave") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06	udp_offload: Set encapsulation before inner completes.	Jarno Rajahalme	5	-3/+18
	UDP tunnel segmentation code relies on the inner offsets being set for an UDP tunnel GSO packet, but the inner _complete() functions will set the inner offsets only if 'encapsulation' is set before calling them. Currently, udp_gro_complete() sets 'encapsulation' only after the inner _complete() functions are done. This causes the inner offsets having invalid values after udp_gro_complete() returns, which in turn will make it impossible to properly segment the packet in case it needs to be forwarded, which would be visible to the user either as invalid packets being sent or as packet loss. This patch fixes this by setting skb's 'encapsulation' in udp_gro_complete() before calling into the inner complete functions, and by making each possible UDP tunnel gro_complete() callback set the inner_mac_header to the beginning of the tunnel payload. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Reviewed-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06	udp_tunnel: Remove redundant udp_tunnel_gro_complete().	Jarno Rajahalme	4	-15/+0
	The setting of the UDP tunnel GSO type is already performed by udp[46]_gro_complete(). Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06	qede: prevent chip hang when increasing channels	Sudarsana Reddy Kalluru	1	-4/+2
	qede requires qed to provide enough resources to accommodate 16 combined channels, but that upper-bound isn't actually being enforced by it. Instead, qed inform back to qede how many channels can be opened based on available resources - but that calculation doesn't really take into account the resources requested by qede; Instead it considers other FW/HW available resources. As a result, if a user would increase the number of channels to more than 16 [e.g., using ethtool] the chip would hang. This change increments the resources requested by qede to 64 combined channels instead of 16; This value is an upper bound on the possible available channels [due to other FW/HW resources]. Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06	net: ipv6: tcp reset, icmp need to consider L3 domain	David Ahern	2	-4/+8
	Responses for packets to unused ports are getting lost with L3 domains. IPv4 has ip_send_unicast_reply for sending TCP responses which accounts for L3 domains; update the IPv6 counterpart tcp_v6_send_response. For icmp the L3 master check needs to be moved up in icmp6_send to properly respond to UDP packets to a port with no listener. Fixes: ca254490c8df ("net: Add VRF support to IPv6 stack") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>