linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2016-09-17	workqueue: make workqueue available early during boot	Tejun Heo	3	-17/+76
	Workqueue is currently initialized in an early init call; however, there are cases where early boot code has to be split and reordered to come after workqueue initialization or the same code path which makes use of workqueues is used both before workqueue initailization and after. The latter cases have to gate workqueue usages with keventd_up() tests, which is nasty and easy to get wrong. Workqueue usages have become widespread and it'd be a lot more convenient if it can be used very early from boot. This patch splits workqueue initialization into two steps. workqueue_init_early() which sets up the basic data structures so that workqueues can be created and work items queued, and workqueue_init() which actually brings up workqueues online and starts executing queued work items. The former step can be done very early during boot once memory allocation, cpumasks and idr are initialized. The latter right after kthreads become available. This allows work item queueing and canceling from very early boot which is what most of these use cases want. * As systemd_wq being initialized doesn't indicate that workqueue is fully online anymore, update keventd_up() to test wq_online instead. The follow-up patches will get rid of all its usages and the function itself. * Flushing doesn't make sense before workqueue is fully initialized. The flush functions trigger WARN and return immediately before fully online. * Work items are never in-flight before fully online. Canceling can always succeed by skipping the flush step. * Some code paths can no longer assume to be called with irq enabled as irq is disabled during early boot. Use irqsave/restore operations instead. v2: Watchdog init, which requires timer to be running, moved from workqueue_init_early() to workqueue_init(). Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/CA+55aFx0vPuMuxn00rBSM192n-Du5uxy+4AvKa0SBSOVJeuCGg@mail.gmail.com
2016-09-16	workqueue: dump workqueue state on sanity check failures in destroy_workqueue()	Tejun Heo	1	-0/+2
	destroy_workqueue() performs a number of sanity checks to ensure that the workqueue is empty before proceeding with destruction. However, it's not always easy to tell what's going on just from the warning message. Let's dump workqueue state after sanity check failures to help debugging. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/CACT4Y+Zs6vkjHo9qHb4TrEiz3S4+quvvVQ9VWvj2Mx6pETGb9Q@mail.gmail.com Cc: Dmitry Vyukov <dvyukov@google.com>
2016-09-14	arm/xen: fix SMP guests boot	Vitaly Kuznetsov	1	-4/+3
	Commit 88e957d6e47f ("xen: introduce xen_vcpu_id mapping") broke SMP ARM guests on Xen. When FIFO-based event channels are in use (this is the default), evtchn_fifo_alloc_control_block() is called on CPU_UP_PREPARE event and this happens before we set up xen_vcpu_id mapping in xen_starting_cpu. Temporary fix the issue by setting direct Linux CPU id <-> Xen vCPU id mapping for all possible CPUs at boot. We don't currently support kexec/kdump on Xen/ARM so these ids always match. In future, we have several ways to solve the issue, e.g.: - Eliminate all hypercalls from CPU_UP_PREPARE, do them from the starting CPU. This can probably be done for both x86 and ARM and, if done, will allow us to get Xen's idea of vCPU id from CPUID/MPIDR on the starting CPU directly, no messing with ACPI/device tree required. - Save vCPU id information from ACPI/device tree on ARM and use it to initialize xen_vcpu_id mapping. This is the same trick we currently do on x86. Reported-by: Julien Grall <julien.grall@arm.com> Tested-by: Wei Chen <Wei.Chen@arm.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-09-13	avr32: fix copy_from_user()	Al Viro	3	-4/+13
	really ugly, but apparently avr32 compilers turns access_ok() into something so bad that they want it in assembler. Left that way, zeroing added in inline wrapper. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-13	microblaze: fix __get_user()	Al Viro	1	-1/+1
	Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-13	microblaze: fix copy_from_user()	Al Viro	1	-3/+6
	Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-13	m32r: fix __get_user()	Al Viro	1	-1/+1
	Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-13	blackfin: fix copy_from_user()	Al Viro	1	-4/+5
	Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-13	sparc32: fix copy_from_user()	Al Viro	1	-1/+3
	Cc: stable@vger.kernel.org Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-13	sh: fix copy_from_user()	Al Viro	1	-1/+4
	Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-13	sh64: failing __get_user() should zero	Al Viro	1	-0/+1
	It could be done in exception-handling bits in __get_user_b() et.al., but the surgery involved would take more knowledge of sh64 details than I have or _want_ to have. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-13	score: fix copy_from_user() and friends	Al Viro	1	-21/+20
	Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-13	score: fix __get_user/get_user	Al Viro	1	-1/+4
	* should zero on any failure * __get_user() should use __copy_from_user(), not copy_from_user() Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-13	s390: get_user() should zero on failure	Al Viro	1	-4/+4
	Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-13	ppc32: fix copy_from_user()	Al Viro	1	-23/+2
	should clear on access_ok() failures. Also remove the useless range truncation logics. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-13	parisc: fix copy_from_user()	Al Viro	1	-2/+4
	Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-13	openrisc: fix copy_from_user()	Al Viro	1	-24/+11
	... that should zero on faults. Also remove the <censored> helpful logics wrt range truncation copied from ppc32. Where it had ever been needed only in case of copy_from_user() and had not been merged into the mainline until a month after the need had disappeared. A decade before openrisc went into mainline, I might add... Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-13	nios2: fix __get_user()	Al Viro	1	-2/+2
	a) should not leave crap on fault b) should _not_ require access_ok() in any cases. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-13	nios2: copy_from_user() should zero the tail of destination	Al Viro	1	-3/+6
	Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-13	mn10300: copy_from_user() should zero on access_ok() failure...	Al Viro	1	-1/+3
	Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-13	mn10300: failing __get_user() and get_user() should zero	Al Viro	1	-0/+1
	Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-13	mips: copy_from_user() must zero the destination on access_ok() failure	Al Viro	1	-0/+3
	Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-13	ARC: uaccess: get_user to zero out dest in cause of fault	Vineet Gupta	1	-2/+9
	Al reported potential issue with ARC get_user() as it wasn't clearing out destination pointer in case of fault due to bad address etc. Verified using following \| { \| u32 bogus1 = 0xdeadbeef; \| u64 bogus2 = 0xdead; \| int rc1, rc2; \| \| pr_info("Orig values %x %llx\n", bogus1, bogus2); \| rc1 = get_user(bogus1, (u32 __user )0x40000000); \| rc2 = get_user(bogus2, (u64 __user )0x50000000); \| pr_info("access %d %d, new values %x %llx\n", \| rc1, rc2, bogus1, bogus2); \| } \| [ARCLinux]# insmod /mnt/kernel-module/qtn.ko \| Orig values deadbeef dead \| access -14 -14, new values 0 0 Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-snps-arc@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-13	metag: copy_from_user() should zero the destination on access_ok() failure	Al Viro	1	-1/+2
	Cc: stable@vger.kernel.org Acked-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-13	ia64: copy_from_user() should zero the destination on access_ok() failure	Al Viro	1	-14/+11
	Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-13	hexagon: fix strncpy_from_user() error return	Al Viro	1	-1/+2
	It's -EFAULT, not -1 (and contrary to the comment in there, __strnlen_user() can return 0 - on faults). Cc: stable@vger.kernel.org Acked-by: Richard Kuo <rkuo@codeaurora.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-13	frv: fix clear_user()	Al Viro	1	-3/+9
	It should check access_ok(). Otherwise a bunch of places turn into trivially exploitable rootholes. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-13	cris: buggered copy_from_user/copy_to_user/clear_user	Al Viro	1	-39/+32
	* copy_from_user() on access_ok() failure ought to zero the destination * none of those primitives should skip the access_ok() check in case of small constant size. Cc: stable@vger.kernel.org Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-13	asm-generic: make get_user() clear the destination on errors	Al Viro	1	-3/+7
	both for access_ok() failures and for faults halfway through Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-11	Linux 4.8-rc6	Linus Torvalds	1	-1/+1

2016-09-11	net/mlx4_en: Fix panic on xmit while port is down	Moshe Shemesh	1	-5/+7
	When port is down, tx drop counter update is not needed. Updating the counter in this case can cause a kernel panic as when the port is down, ring can be NULL. Fixes: 63a664b7e92b ("net/mlx4_en: fix tx_dropped bug") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-11	net/mlx4_en: Fixes for DCBX	Tariq Toukan	4	-43/+28
	This patch adds a capability check before enabling DCBX. In addition, it re-organizes the relevant data structures, and fixes a typo in a define. Fixes: af7d51852631 ("net/mlx4_en: Add DCB PFC support through CEE netlink commands") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-11	net/mlx4_en: Fix the return value of mlx4_en_dcbnl_set_state()	Kamal Heib	1	-1/+4
	mlx4_en_dcbnl_set_state() returns u8, the return value from mlx4_en_setup_tc() could be negative in case of failure, so fix that. Fixes: af7d51852631 ("net/mlx4_en: Add DCB PFC support through CEE netlink commands") Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-11	net/mlx4_en: Fix the return value of mlx4_en_dcbnl_set_all()	Kamal Heib	1	-10/+11
	mlx4_en_dcbnl_set_all() returns u8, so return value can't be negative in case of failure. Fixes: af7d51852631 ("net/mlx4_en: Add DCB PFC support through CEE netlink commands") Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Rana Shahout <ranas@mellanox.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-11	nvme: make NVME_RDMA depend on BLOCK	Linus Torvalds	1	-1/+1
	Commit aa71987472a9 ("nvme: fabrics drivers don't need the nvme-pci driver") removed the dependency on BLK_DEV_NVME, but the cdoe does depend on the block layer (which used to be an implicit dependency through BLK_DEV_NVME). Otherwise you get various errors from the kbuild test robot random config testing when that happens to hit a configuration with BLOCK device support disabled. Cc: Christoph Hellwig <hch@lst.de> Cc: Jay Freyensee <james_p_freyensee@linux.intel.com> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-11	NFSv4.1: Fix the CREATE_SESSION slot number accounting	Trond Myklebust	1	-2/+10
	Ensure that we conform to the algorithm described in RFC5661, section 18.36.4 for when to bump the sequence id. In essence we do it for all cases except when the RPC call timed out, or in case of the server returning NFS4ERR_DELAY or NFS4ERR_STALE_CLIENTID. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org
2016-09-10	net: ethernet: renesas: sh_eth: add POST registers for rz	Chris Brandt	1	-0/+7
	Due to a mistake in the hardware manual, the FWSLC and POST1-4 registers were not documented and left out of the driver for RZ/A making the CAM feature non-operational. Additionally, when the offset values for POST1-4 are left blank, the driver attempts to set them using an offset of 0xFFFF which can cause a memory corruption or panic. This patch fixes the panic and properly enables CAM. Reported-by: Daniel Palmer <daniel@0x0f.com> Signed-off-by: Chris Brandt <chris.brandt@renesas.com> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10	perf/x86/intel: Fix PEBSv3 record drain	Peter Zijlstra	1	-8/+11
	Alexander hit the WARN_ON_ONCE(!event) on his Skylake while running the perf fuzzer. This means the PEBSv3 record included a status bit for an inactive event, something that _should_ not happen. Move the code that filters the status bits against our known PEBS events up a spot to guarantee we only deal with events we know about. Further add "continue" statements to the WARN_ON_ONCE()s such that we'll not die nor generate silly events in case we ever do hit them again. Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vince@deater.net> Cc: stable@vger.kernel.org Fixes: a3d86542de88 ("perf/x86/intel/pebs: Add PEBSv3 decoding") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-10	perf/x86/intel/bts: Kill a silly warning	Alexander Shishkin	1	-2/+0
	At the moment, intel_bts will WARN() out if there is more than one event writing to the same ring buffer, via SET_OUTPUT, and will only send data from one event to a buffer. There is no reason to have this warning in, so kill it. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160906132353.19887-6-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-10	perf/x86/intel/bts: Fix BTS PMI detection	Alexander Shishkin	1	-4/+15
	Since BTS doesn't have a dedicated PMI status bit, the driver needs to take extra care to check for the condition that triggers it to avoid spurious NMI warnings. Regardless of the local BTS context state, the only way of knowing that the NMI is ours is to compare the write pointer against the interrupt threshold. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160906132353.19887-5-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-10	perf/x86/intel/bts: Fix confused ordering of PMU callbacks	Alexander Shishkin	1	-24/+80
	The intel_bts driver is using a CPU-local 'started' variable to order callbacks and PMIs and make sure that AUX transactions don't get messed up. However, the ordering rules in regard to this variable is a complete mess, which recently resulted in perf_fuzzer-triggered warnings and panics. The general ordering rule that is patch is enforcing is that this cpu-local variable be set only when the cpu-local AUX transaction is active; consequently, this variable is to be checked before the AUX related bits can be touched. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160906132353.19887-4-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-10	perf/core: Fix aux_mmap_count vs aux_refcount order	Alexander Shishkin	1	-4/+11
	The order of accesses to ring buffer's aux_mmap_count and aux_refcount has to be preserved across the users, namely perf_mmap_close() and perf_aux_output_begin(), otherwise the inversion can result in the latter holding the last reference to the aux buffer and subsequently free'ing it in atomic context, triggering a warning. > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 257 at kernel/events/ring_buffer.c:541 __rb_free_aux+0x11a/0x130 > CPU: 0 PID: 257 Comm: stopbug Not tainted 4.8.0-rc1+ #2596 > Call Trace: > [<ffffffff810f3e0b>] __warn+0xcb/0xf0 > [<ffffffff810f3f3d>] warn_slowpath_null+0x1d/0x20 > [<ffffffff8121182a>] __rb_free_aux+0x11a/0x130 > [<ffffffff812127a8>] rb_free_aux+0x18/0x20 > [<ffffffff81212913>] perf_aux_output_begin+0x163/0x1e0 > [<ffffffff8100c33a>] bts_event_start+0x3a/0xd0 > [<ffffffff8100c42d>] bts_event_add+0x5d/0x80 > [<ffffffff81203646>] event_sched_in.isra.104+0xf6/0x2f0 > [<ffffffff8120652e>] group_sched_in+0x6e/0x190 > [<ffffffff8120694e>] ctx_sched_in+0x2fe/0x5f0 > [<ffffffff81206ca0>] perf_event_sched_in+0x60/0x80 > [<ffffffff81206d1b>] ctx_resched+0x5b/0x90 > [<ffffffff81207281>] __perf_event_enable+0x1e1/0x240 > [<ffffffff81200639>] event_function+0xa9/0x180 > [<ffffffff81202000>] ? perf_cgroup_attach+0x70/0x70 > [<ffffffff8120203f>] remote_function+0x3f/0x50 > [<ffffffff811971f3>] flush_smp_call_function_queue+0x83/0x150 > [<ffffffff81197bd3>] generic_smp_call_function_single_interrupt+0x13/0x60 > [<ffffffff810a6477>] smp_call_function_single_interrupt+0x27/0x40 > [<ffffffff81a26ea9>] call_function_single_interrupt+0x89/0x90 > [<ffffffff81120056>] finish_task_switch+0xa6/0x210 > [<ffffffff81120017>] ? finish_task_switch+0x67/0x210 > [<ffffffff81a1e83d>] __schedule+0x3dd/0xb50 > [<ffffffff81a1efe5>] schedule+0x35/0x80 > [<ffffffff81128031>] sys_sched_yield+0x61/0x70 > [<ffffffff81a25be5>] entry_SYSCALL_64_fastpath+0x18/0xa8 > ---[ end trace 6235f556f5ea83a9 ]--- This patch puts the checks in perf_aux_output_begin() in the same order as that of perf_mmap_close(). Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160906132353.19887-3-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-10	perf/core: Fix a race between mmap_close() and set_output() of AUX events	Alexander Shishkin	1	-6/+25
	In the mmap_close() path we need to stop all the AUX events that are writing data to the AUX area that we are unmapping, before we can safely free the pages. To determine if an event needs to be stopped, we're comparing its ->rb against the one that's getting unmapped. However, a SET_OUTPUT ioctl may turn up inside an AUX transaction and swizzle event::rb to some other ring buffer, but the transaction will keep writing data to the old ring buffer until the event gets scheduled out. At this point, mmap_close() will skip over such an event and will proceed to free the AUX area, while it's still being used by this event, which will set off a warning in the mmap_close() path and cause a memory corruption. To avoid this, always stop an AUX event before its ->rb is updated; this will release the (potentially) last reference on the AUX area of the buffer. If the event gets restarted, its new ring buffer will be used. If another SET_OUTPUT comes and switches it back to the old ring buffer that's getting unmapped, it's also fine: this ring buffer's aux_mmap_count will be zero and AUX transactions won't start any more. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160906132353.19887-2-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-10	fscrypto: require write access to mount to set encryption policy	Eric Biggers	4	-25/+29
	Since setting an encryption policy requires writing metadata to the filesystem, it should be guarded by mnt_want_write/mnt_drop_write. Otherwise, a user could cause a write to a frozen or readonly filesystem. This was handled correctly by f2fs but not by ext4. Make fscrypt_process_policy() handle it rather than relying on the filesystem to get it right. Signed-off-by: Eric Biggers <ebiggers@google.com> Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs} Signed-off-by: Theodore Ts'o <tytso@mit.edu> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-09	fscrypto: only allow setting encryption policy on directories	Eric Biggers	1	-0/+2
	The FS_IOC_SET_ENCRYPTION_POLICY ioctl allowed setting an encryption policy on nondirectory files. This was unintentional, and in the case of nonempty regular files did not behave as expected because existing data was not actually encrypted by the ioctl. In the case of ext4, the user could also trigger filesystem errors in ->empty_dir(), e.g. due to mismatched "directory" checksums when the kernel incorrectly tried to interpret a regular file as a directory. This bug affected ext4 with kernels v4.8-rc1 or later and f2fs with kernels v4.6 and later. It appears that older kernels only permitted directories and that the check was accidentally lost during the refactoring to share the file encryption code between ext4 and f2fs. This patch restores the !S_ISDIR() check that was present in older kernels. Signed-off-by: Eric Biggers <ebiggers@google.com> Cc: stable@vger.kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-09-09	fscrypto: add authorization check for setting encryption policy	Eric Biggers	1	-0/+3
	On an ext4 or f2fs filesystem with file encryption supported, a user could set an encryption policy on any empty directory() to which they had readonly access. This is obviously problematic, since such a directory might be owned by another user and the new encryption policy would prevent that other user from creating files in their own directory (for example). Fix this by requiring inode_owner_or_capable() permission to set an encryption policy. This means that either the caller must own the file, or the caller must have the capability CAP_FOWNER. () Or also on any regular file, for f2fs v4.6 and later and ext4 v4.8-rc1 and later; a separate bug fix is coming for that. Signed-off-by: Eric Biggers <ebiggers@google.com> Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs} Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-09-09	drivers: net: phy: mdio-xgene: Add hardware dependency	Jean Delvare	1	-0/+1
	The mdio-xgene driver is only useful on X-Gene SoC. Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Iyappan Subramanian <isubramanian@apm.com> Cc: David S. Miller <davem@davemloft.net> Acked-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09	dwc_eth_qos: do not register semi-initialized device	Lars Persson	1	-20/+18
	We move register_netdev() to the end of dwceqos_probe() to close any races where the netdev callbacks are called before the initialization has finished. Reported-by: Pavel Andrianov <andrianov@ispras.ru> Signed-off-by: Lars Persson <larper@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09	sctp: identify chunks that need to be fragmented at IP level	Marcelo Ricardo Leitner	1	-1/+12
	Previously, without GSO, it was easy to identify it: if the chunk didn't fit and there was no data chunk in the packet yet, we could fragment at IP level. So if there was an auth chunk and we were bundling a big data chunk, it would fragment regardless of the size of the auth chunk. This also works for the context of PMTU reductions. But with GSO, we cannot distinguish such PMTU events anymore, as the packet is allowed to exceed PMTU. So we need another check: to ensure that the chunk that we are adding, actually fits the current PMTU. If it doesn't, trigger a flush and let it be fragmented at IP level in the next round. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09	libnvdimm: allow legacy (e820) pmem region to clear bad blocks	Dave Jiang	1	-1/+5
	Bad blocks can be injected via /sys/block/pmemN/badblocks. In a situation where legacy pmem is being used or a pmem region created by using memmap kernel parameter, the injected bad blocks are not cleared due to nvdimm_clear_poison() failing from lack of ndctl function pointer. In this case we need to just return as handled and allow the bad blocks to be cleared rather than fail. Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>