wireguard-linux - WireGuard for the Linux kernel

Age	Commit message (Collapse)	Author	Files	Lines
2016-03-05	char: genrtc: replace blacklist with whitelist	Arnd Bergmann	1	-1/+2
	Every new architecture has to add itself to the growing list of those that do not support the old "generic" RTC driver. This replaces the long list of architectures that don't support it with a shorter list of those that do. The list is taken from those architectures that have a non-empty asm/rtc.h header file and were not explicitly blacklisted. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-05	drivers/hwtracing: make coresight-etm-perf.c explicitly non-modular	Paul Gortmaker	1	-2/+2
	In commit 941943cf519f7cacbbcecee5c4ef4b77b466bd5c ("drivers/hwtracing: make coresight-* explicitly non-modular") we removed all uses of modular functions/macros in favour of their built-in equivlents in this subsystem. However that commit and commit 0bcbf2e30ff2271b54f54c8697a185f7d86ec6e4 ("coresight: etm-perf: new PMU driver for ETM tracers") were in flight at the same time, and hence one new non-modular user of module_init crept back in. Fix it up like we did all the others. Since module_init translates to device_initcall in the non-modular case, the init ordering remains unchanged with this commit. Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-05	drivers: char: mem: fix IS_ERROR_VALUE usage	Andrzej Hajda	1	-1/+1
	IS_ERR_VALUE macro should be used only with unsigned long type. Specifically it works incorrectly with longer types. The patch follows conclusion from discussion on LKML [1][2]. [1]: http://permalink.gmane.org/gmane.linux.kernel/2120927 [2]: http://permalink.gmane.org/gmane.linux.kernel/2150581 Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-05	char: xillybus: Fix internal data structure initialization	Eli Billauer	1	-1/+3
	A couple of fields in a data structure, which is used by the driver only, were not initialized properly during the driver's setup. The primary issue with this bug was that channel->wr_buf_size remained zero, so calls to dma_sync_single_for_cpu() took place with zero size, and consequently did nothing. This had a rather minimal practical impact, because (a) these calls are NOPs on Intel/AMD platforms, as well as other platforms with coherent cache, and (b) it's extremely rare that any cache line would survive between two reads from a given DMA buffer Hence no significant practical difference is expected with this patch. Signed-off-by: Eli Billauer <eli.billauer@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-05	pch_phub: return -ENODATA if ROM can't be mapped	Colin Ian King	1	-1/+3
	The error return err is not initialized for the case when pci_map_rom fails and no ROM can me mapped. Fix this by setting ret to -ENODATA; (this is the same error value that is returned if the ROM data is successfully mapped but does not match the expected ROM signature.). Issue found from static code analysis using CoverityScan. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01	Drivers: hv: vmbus: Support kexec on ws2012 r2 and above	Alex Ng	1	-1/+9
	WS2012 R2 and above hosts can support kexec in that thay can support reconnecting to the host (as would be needed in the kexec path) on any CPU. Enable this. Pre ws2012 r2 hosts don't have this ability and consequently cannot support kexec. Signed-off-by: Alex Ng <alexng@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01	Drivers: hv: vmbus: Support handling messages on multiple CPUs	K. Y. Srinivasan	3	-7/+17
	Starting with Windows 2012 R2, message inteerupts can be delivered on any VCPU in the guest. Support this functionality. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01	Drivers: hv: utils: Remove util transport handler from list if registration fails	Alex Ng	1	-0/+3
	If util transport fails to initialize for any reason, the list of transport handlers may become corrupted due to freeing the transport handler without removing it from the list. Fix this by cleaning it up from the list. Signed-off-by: Alex Ng <alexng@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01	Drivers: hv: util: Pass the channel information during the init call	K. Y. Srinivasan	5	-3/+5
	Pass the channel information to the util drivers that need to defer reading the channel while they are processing a request. This would address the following issue reported by Vitaly: Commit 3cace4a61610 ("Drivers: hv: utils: run polling callback always in interrupt context") removed direct _transaction.state = HVUTIL_READY assignments from _handle_handshake() functions introducing the following race: if a userspace daemon connects before we get first non-negotiation request from the server hv_poll_channel() won't set transaction state to HVUTIL_READY as (!channel) condition will fail, we set it to non-NULL on the first real request from the server. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01	Drivers: hv: vmbus: avoid unneeded compiler optimizations in vmbus_wait_for_unload()	Vitaly Kuznetsov	1	-1/+1
	Message header is modified by the hypervisor and we read it in a loop, we need to prevent compilers from optimizing accesses. There are no such optimizations at this moment, this is just a future proof. Suggested-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Radim Kr.má<rkrcmar@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01	Drivers: hv: vmbus: remove code duplication in message handling	Vitaly Kuznetsov	3	-47/+27
	We have 3 functions dealing with messages and they all implement the same logic to finalize reads, move it to vmbus_signal_eom(). Suggested-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Radim Kr.má<rkrcmar@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01	Drivers: hv: vmbus: avoid wait_for_completion() on crash	Vitaly Kuznetsov	4	-6/+6
	wait_for_completion() may sleep, it enables interrupts and this is something we really want to avoid on crashes because interrupt handlers can cause other crashes. Switch to the recently introduced vmbus_wait_for_unload() doing busy wait instead. Reported-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Radim Kr.má<rkrcmar@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01	Drivers: hv: vmbus: don't loose HVMSG_TIMER_EXPIRED messages	Vitaly Kuznetsov	1	-35/+33
	We must handle HVMSG_TIMER_EXPIRED messages in the interrupt context and we offload all the rest to vmbus_on_msg_dpc() tasklet. This functions loops to see if there are new messages pending. In case we'll ever see HVMSG_TIMER_EXPIRED message there we're going to lose it as we can't handle it from there. Avoid looping in vmbus_on_msg_dpc(), we're OK with handling one message per interrupt. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Radim Kr.má<rkrcmar@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01	misc: at24: replace memory_accessor with nvmem_device_read	Andrew Lunn	6	-52/+13
	Now that the AT24 uses the NVMEM framework, replace the memory_accessor in the setup() callback with nvmem API calls. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Tested-by: Sekhar Nori <nsekhar@ti.com> Acked-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01	eeprom: 93xx46: extend driver to plug into the NVMEM framework	Andrew Lunn	2	-27/+95
	Add a regmap for accessing the EEPROM, and then use that with the NVMEM framework. Enable backward compatibility in the NVMEM config structure, so that the 'eeprom' file in sys is provided by the framework. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01	eeprom: at25: extend driver to plug into the NVMEM framework	Andrew Lunn	2	-55/+71
	Add a regmap for accessing the EEPROM, and then use that with the NVMEM framework. Enable backwards compatibility in the NVMEM config, so that the 'eeprom' file in sys is provided by the framework. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01	eeprom: at25: Remove in kernel API for accessing the EEPROM	Andrew Lunn	2	-28/+0
	The setup() callback is not used by any in kernel code. Remove it. Any new code which requires access to the eeprom can use the NVMEM API. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Acked-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01	eeprom: at24: extend driver to plug into the NVMEM framework	Andrew Lunn	2	-41/+82
	Add a regmap for accessing the EEPROM, and then use that with the NVMEM framework. Set the NVMEM config structure to enable backward, so that the 'eeprom' file in sys is provided by the framework. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Tested-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01	nvmem: Add backwards compatibility support for older EEPROM drivers.	Andrew Lunn	2	-9/+79
	Older drivers made an 'eeprom' file available in the /sys device directory. Have the NVMEM core provide this to retain backwards compatibility. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01	nvmem: Add flag to export NVMEM to root only	Andrew Lunn	2	-2/+56
	Legacy AT24, AT25 EEPROMs are exported in sys so that only root can read the contents. The EEPROMs may contain sensitive information. Add a flag so the provide can indicate that NVMEM should also restrict access to root only. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01	misc: ad525x_dpot: Fix the enabling of the "otpXen" attributes	Dan Bogdan Nechita	1	-1/+1
	Currently writing the attributes with "echo" will result in comparing: "enabled\n" with "enabled\0" and attribute is always set to false. Use the sysfs_streq() instead because it treats both NUL and new-line-then-NUL as equivalent string terminations. Signed-off-by: Dan Bogdan Nechita <dan.bogdan.nechita@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01	drivers/misc/ad525x_dpot: AD5274 fix RDAC read back errors	Michael Hennerich	1	-1/+1
	Fix RDAC read back errors caused by a typo. Value must shift by 2. Fixes: a4bd394956f2 ("drivers/misc/ad525x_dpot.c: new features") Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01	mei: me: add broxton pci device ids	Tomas Winkler	2	-0/+7
	Add device ids for Broxton SoC based devices. Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01	lkdtm: improve use-after-free tests	Kees Cook	1	-4/+15
	This improves the order of operations on the use-after-free tests to try to make sure we've executed any available sanity-checking code, and to report the poisoning that was found. Signed-off-by: Kees Cook <keescook@chromium.org>
2016-03-01	lkdtm: add test for atomic_t underflow/overflow	David Windsor	1	-0/+13
	dmesg output of running this LKDTM test with PaX: [187095.475573] lkdtm: No crash points registered, enable through debugfs [187118.020257] lkdtm: Performing direct entry WRAP_ATOMIC [187118.030045] lkdtm: attempting atomic underflow [187118.030929] PAX: refcount overflow detected in: bash:1790, uid/euid: 0/0 [187118.071667] PAX: refcount overflow occured at: lkdtm_do_action+0x19e/0x400 [lkdtm] [187118.081423] CPU: 3 PID: 1790 Comm: bash Not tainted 4.2.6-pax-refcount-split+ #2 [187118.083403] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [187118.102596] task: ffff8800da8de040 ti: ffff8800da8e4000 task.ti: ffff8800da8e4000 [187118.111321] RIP: 0010:[<ffffffffc00fc2fe>] [<ffffffffc00fc2fe>] lkdtm_do_action+0x19e/0x400 [lkdtm] ... [187118.128074] lkdtm: attempting atomic overflow [187118.128080] PAX: refcount overflow detected in: bash:1790, uid/euid: 0/0 [187118.128082] PAX: refcount overflow occured at: lkdtm_do_action+0x1b6/0x400 [lkdtm] [187118.128085] CPU: 3 PID: 1790 Comm: bash Not tainted 4.2.6-pax-refcount-split+ #2 [187118.128086] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [187118.128088] task: ffff8800da8de040 ti: ffff8800da8e4000 task.ti: ffff8800da8e4000 [187118.128092] RIP: 0010:[<ffffffffc00fc316>] [<ffffffffc00fc316>] lkdtm_do_action+0x1b6/0x400 [lkdtm] Signed-off-by: David Windsor <dave@progbits.org> [cleaned up whitespacing, keescook] Signed-off-by: Kees Cook <keescook@chromium.org>
2016-03-01	lkdtm: Add read/write after free tests for buddy memory	Laura Abbott	1	-0/+45
	The current tests for read/write after free work on slab allocated memory. Memory straight from the buddy allocator may behave slightly differently and have a different set of parameters to test. Add tests for those cases as well. On a basic x86 boot: # echo WRITE_BUDDY_AFTER_FREE > /sys/kernel/debug/provoke-crash/DIRECT [ 22.291950] lkdtm: Performing direct entry WRITE_BUDDY_AFTER_FREE [ 22.292983] lkdtm: Writing to the buddy page before free [ 22.293950] lkdtm: Attempting bad write to the buddy page after free # echo READ_BUDDY_AFTER_FREE > /sys/kernel/debug/provoke-crash/DIRECT [ 32.375601] lkdtm: Performing direct entry READ_BUDDY_AFTER_FREE [ 32.379896] lkdtm: Value in memory before free: 12345678 [ 32.383854] lkdtm: Attempting to read from freed memory [ 32.389309] lkdtm: Buddy page was not poisoned On x86 with CONFIG_DEBUG_PAGEALLOC and debug_pagealloc=on: # echo WRITE_BUDDY_AFTER_FREE > /sys/kernel/debug/provoke-crash/DIRECT [ 17.475533] lkdtm: Performing direct entry WRITE_BUDDY_AFTER_FREE [ 17.477360] lkdtm: Writing to the buddy page before free [ 17.479089] lkdtm: Attempting bad write to the buddy page after free [ 17.480904] BUG: unable to handle kernel paging request at ffff88000ebd8000 # echo READ_BUDDY_AFTER_FREE > /sys/kernel/debug/provoke-crash/DIRECT [ 14.606433] lkdtm: Performing direct entry READ_BUDDY_AFTER_FREE [ 14.607447] lkdtm: Value in memory before free: 12345678 [ 14.608161] lkdtm: Attempting to read from freed memory [ 14.608860] BUG: unable to handle kernel paging request at ffff88000eba3000 Note that arches without ARCH_SUPPORTS_DEBUG_PAGEALLOC may not produce the same crash. Signed-off-by: Laura Abbott <labbott@fedoraproject.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2016-03-01	lkdtm: Update WRITE_AFTER_FREE test	Laura Abbott	1	-4/+13
	The SLUB allocator may use the first word of a freed block to store the freelist information. This may make it harder to test poisoning features. Change the WRITE_AFTER_FREE test to better match what the READ_AFTER_FREE test does and also print out a big more information. Signed-off-by: Laura Abbott <labbott@fedoraproject.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2016-03-01	lkdtm: Add READ_AFTER_FREE test	Laura Abbott	1	-0/+38
	In a similar manner to WRITE_AFTER_FREE, add a READ_AFTER_FREE test to test free poisoning features. Sample output when no sanitization is present: # echo READ_AFTER_FREE > /sys/kernel/debug/provoke-crash/DIRECT [ 17.542473] lkdtm: Performing direct entry READ_AFTER_FREE [ 17.543866] lkdtm: Value in memory before free: 12345678 [ 17.545212] lkdtm: Attempting bad read from freed memory [ 17.546542] lkdtm: Memory was not poisoned with slub_debug=P: # echo READ_AFTER_FREE > /sys/kernel/debug/provoke-crash/DIRECT [ 22.415531] lkdtm: Performing direct entry READ_AFTER_FREE [ 22.416366] lkdtm: Value in memory before free: 12345678 [ 22.417137] lkdtm: Attempting bad read from freed memory [ 22.417897] lkdtm: Memory correctly poisoned, calling BUG Signed-off-by: Laura Abbott <labbott@fedoraproject.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2016-03-01	MAINTAINERS: add myself as lkdtm maintainer	Kees Cook	1	-0/+5
	Officially claim maintainership over the LKDTM code. Signed-off-by: Kees Cook <keescook@chromium.org>
2016-03-01	sparc64: Fix sparc64_set_context stack handling.	David S. Miller	1	-1/+1
	Like a signal return, we should use synchronize_user_stack() rather than flush_user_windows(). Reported-by: Ilya Malakhov <ilmalakhovthefirst@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01	sparc32: Add -Wa,-Av8 to KBUILD_CFLAGS.	David S. Miller	1	-0/+6
	Binutils used to be (erroneously) extremely permissive about instruction usage. But that got fixed and if you don't properly tell it to accept classes of instructions it will fail. This uncovered a specs bug on sparc in gcc where it wouldn't pass the proper options to binutils options. Deal with this in the kernel build by adding -Wa,-Av8 to KBUILD_CFLAGS. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-29	extcon: palmas: Drop IRQF_EARLY_RESUME flag	Grygorii Strashko	1	-2/+2
	Palams extcon IRQs are nested threaded and wired to the Palmas inerrupt controller. So, this flag is not required for nested irqs anymore, since commit 3c646f2c6aa9 ("genirq: Don't suspend nested_thread irqs over system suspend") was merged. Cc: MyungJoo Ham <myungjoo.ham@samsung.com> Cc: Chanwoo Choi <cw00.choi@samsung.com> Cc: Tony Lindgren <tony@atomide.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Roger Quadros <rogerq@ti.com> Cc: Nishanth Menon <nm@ti.com> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2016-02-28	Linux 4.5-rc6	Linus Torvalds	1	-1/+1

2016-02-27	do_last(): ELOOP failure exit should be done after leaving RCU mode	Al Viro	1	-5/+4
	... or we risk seeing a bogus value of d_is_symlink() there. Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27	should_follow_link(): validate ->d_seq after having decided to follow	Al Viro	1	-0/+5
	... otherwise d_is_symlink() above might have nothing to do with the inode value we've got. Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27	namei: ->d_inode of a pinned dentry is stable only for positives	Al Viro	1	-2/+2
	both do_last() and walk_component() risk picking a NULL inode out of dentry about to become positive, then checking its flags and seeing that it's not negative anymore and using (already stale by then) value they'd fetched earlier. Usually ends up oopsing soon after that... Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27	do_last(): don't let a bogus return value from ->open() et.al. to confuse us	Al Viro	1	-0/+4
	... into returning a positive to path_openat(), which would interpret that as "symlink had been encountered" and proceed to corrupt memory, etc. It can only happen due to a bug in some ->open() instance or in some LSM hook, etc., so we report any such event and make sure it doesn't trick us into further unpleasantness. Cc: stable@vger.kernel.org # v3.6+, at least Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27	fs: return -EOPNOTSUPP if clone is not supported	Christoph Hellwig	1	-2/+4
	-EBADF is a rather confusing error if an operations is not supported, and nfsd gets rather upset about it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27	hpfs: don't truncate the file when delete fails	Mikulas Patocka	1	-28/+3
	The delete opration can allocate additional space on the HPFS filesystem due to btree split. The HPFS driver checks in advance if there is available space, so that it won't corrupt the btree if we run out of space during splitting. If there is not enough available space, the HPFS driver attempted to truncate the file, but this results in a deadlock since the commit 7dd29d8d865efdb00c0542a5d2c87af8c52ea6c7 ("HPFS: Introduce a global mutex and lock it on every callback from VFS"). This patch removes the code that tries to truncate the file and -ENOSPC is returned instead. If the user hits -ENOSPC on delete, he should try to delete other files (that are stored in a leaf btree node), so that the delete operation will make some space for deleting the file stored in non-leaf btree node. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: stable@vger.kernel.org # 2.6.39+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27	ext2, ext4: fix issue with missing journal entry in ext4_dax_mkwrite()	Ross Zwisler	2	-35/+3
	As it is currently written ext4_dax_mkwrite() assumes that the call into __dax_mkwrite() will not have to do a block allocation so it doesn't create a journal entry. For a read that creates a zero page to cover a hole followed by a write that actually allocates storage this is incorrect. The ext4_dax_mkwrite() -> __dax_mkwrite() -> __dax_fault() path calls get_blocks() to allocate storage. Fix this by having the ->page_mkwrite fault handler call ext4_dax_fault() as this function already has all the logic needed to allocate a journal entry and call __dax_fault(). Also update the ext2 fault handlers in this same way to remove duplicate code and keep the logic between ext2 and ext4 the same. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-27	dax: move writeback calls into the filesystems	Ross Zwisler	7	-16/+43
	Previously calls to dax_writeback_mapping_range() for all DAX filesystems (ext2, ext4 & xfs) were centralized in filemap_write_and_wait_range(). dax_writeback_mapping_range() needs a struct block_device, and it used to get that from inode->i_sb->s_bdev. This is correct for normal inodes mounted on ext2, ext4 and XFS filesystems, but is incorrect for DAX raw block devices and for XFS real-time files. Instead, call dax_writeback_mapping_range() directly from the filesystem ->writepages function so that it can supply us with a valid block device. This also fixes DAX code to properly flush caches in response to sync(2). Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jens Axboe <axboe@fb.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27	dax: give DAX clearing code correct bdev	Ross Zwisler	6	-10/+13
	dax_clear_blocks() needs a valid struct block_device and previously it was using inode->i_sb->s_bdev in all cases. This is correct for normal inodes on mounted ext2, ext4 and XFS filesystems, but is incorrect for DAX raw block devices and for XFS real-time devices. Instead, rename dax_clear_blocks() to dax_clear_sectors(), and change its arguments to take a bdev and a sector instead of an inode and a block. This better reflects what the function does, and it allows the filesystem and raw block device code to pass in an appropriate struct block_device. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Suggested-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Dave Chinner <david@fromorbit.com> Cc: Jens Axboe <axboe@fb.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27	ext4: online defrag not supported with DAX	Ross Zwisler	1	-0/+5
	Online defrag operations for ext4 are hard coded to use the page cache. See ext4_ioctl() -> ext4_move_extents() -> move_extent_per_page() When combined with DAX I/O, which circumvents the page cache, this can result in data corruption. This was observed with xfstests ext4/307 and ext4/308. Fix this by only allowing online defrag for non-DAX files. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jens Axboe <axboe@fb.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27	ext2, ext4: only set S_DAX for regular inodes	Ross Zwisler	2	-2/+2
	When S_DAX is set on an inode we assume that if there are pages attached to the mapping (mapping->nrpages != 0), those pages are clean zero pages that were used to service reads from holes. Any dirty data associated with the inode should be in the form of DAX exceptional entries (mapping->nrexceptional) that is written back via dax_writeback_mapping_range(). With the current code, though, this isn't always true. For example, ext2 and ext4 directory inodes can have S_DAX set, but have their dirty data stored as dirty page cache entries. For these types of inodes, having S_DAX set doesn't really make sense since their I/O doesn't actually happen through the DAX code path. Instead, only allow S_DAX to be set for regular inodes for ext2 and ext4. This allows us to have strict DAX vs non-DAX paths in the writeback code. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jens Axboe <axboe@fb.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27	block: disable block device DAX by default	Dan Williams	2	-1/+18
	The recent *sync enabling discovered that we are inserting into the block_device pagecache counter to the expectations of the dirty data tracking for dax mappings. This can lead to data corruption. We want to support DAX for block devices eventually, but it requires wider changes to properly manage the pagecache. dump_stack+0x85/0xc2 dax_writeback_mapping_range+0x60/0xe0 blkdev_writepages+0x3f/0x50 do_writepages+0x21/0x30 __filemap_fdatawrite_range+0xc6/0x100 filemap_write_and_wait+0x4a/0xa0 set_blocksize+0x70/0xd0 sb_set_blocksize+0x1d/0x50 ext4_fill_super+0x75b/0x3360 mount_bdev+0x180/0x1b0 ext4_mount+0x15/0x20 mount_fs+0x38/0x170 Mark the support broken so its disabled by default, but otherwise still available for testing. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com> Suggested-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@fb.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27	ocfs2: unlock inode if deleting inode from orphan fails	Guozhonghua	1	-0/+1
	When doing append direct io cleanup, if deleting inode fails, it goes out without unlocking inode, which will cause the inode deadlock. This issue was introduced by commit cf1776a9e834 ("ocfs2: fix a tiny race when truncate dio orohaned entry"). Signed-off-by: Guozhonghua <guozhonghua@h3c.com> Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Gang He <ghe@suse.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: <stable@vger.kernel.org> [4.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27	mm: ASLR: use get_random_long()	Daniel Cashman	8	-14/+14
	Replace calls to get_random_int() followed by a cast to (unsigned long) with calls to get_random_long(). Also address shifting bug which, in case of x86 removed entropy mask for mmap_rnd_bits values > 31 bits. Signed-off-by: Daniel Cashman <dcashman@android.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: David S. Miller <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Nick Kralevich <nnk@google.com> Cc: Jeff Vander Stoep <jeffv@google.com> Cc: Mark Salyzyn <salyzyn@android.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27	drivers: char: random: add get_random_long()	Daniel Cashman	2	-0/+23
	Commit d07e22597d1d ("mm: mmap: add new /proc tunable for mmap_base ASLR") added the ability to choose from a range of values to use for entropy count in generating the random offset to the mmap_base address. The maximum value on this range was set to 32 bits for 64-bit x86 systems, but this value could be increased further, requiring more than the 32 bits of randomness provided by get_random_int(), as is already possible for arm64. Add a new function: get_random_long() which more naturally fits with the mmap usage of get_random_int() but operates exactly the same as get_random_int(). Also, fix the shifting constant in mmap_rnd() to be an unsigned long so that values greater than 31 bits generate an appropriate mask without overflow. This is especially important on x86, as its shift instruction uses a 5-bit mask for the shift operand, which meant that any value for mmap_rnd_bits over 31 acts as a no-op and effectively disables mmap_base randomization. Finally, replace calls to get_random_int() with get_random_long() where appropriate. This patch (of 2): Add get_random_long(). Signed-off-by: Daniel Cashman <dcashman@android.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: David S. Miller <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Nick Kralevich <nnk@google.com> Cc: Jeff Vander Stoep <jeffv@google.com> Cc: Mark Salyzyn <salyzyn@android.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27	mm: numa: quickly fail allocations for NUMA balancing on full nodes	Mel Gorman	1	-1/+1
	Commit 4167e9b2cf10 ("mm: remove GFP_THISNODE") removed the GFP_THISNODE flag combination due to confusing semantics. It noted that alloc_misplaced_dst_page() was one such user after changes made by commit e97ca8e5b864 ("mm: fix GFP_THISNODE callers and clarify"). Unfortunately when GFP_THISNODE was removed, users of alloc_misplaced_dst_page() started waking kswapd and entering direct reclaim because the wrong GFP flags are cleared. The consequence is that workloads that used to fit into memory now get reclaimed which is addressed by this patch. The problem can be demonstrated with "mutilate" that exercises memcached which is software dedicated to memory object caching. The configuration uses 80% of memory and is run 3 times for varying numbers of clients. The results on a 4-socket NUMA box are mutilate 4.4.0 4.4.0 vanilla numaswap-v1 Hmean 1 8394.71 ( 0.00%) 8395.32 ( 0.01%) Hmean 4 30024.62 ( 0.00%) 34513.54 ( 14.95%) Hmean 7 32821.08 ( 0.00%) 70542.96 (114.93%) Hmean 12 55229.67 ( 0.00%) 93866.34 ( 69.96%) Hmean 21 39438.96 ( 0.00%) 85749.21 (117.42%) Hmean 30 37796.10 ( 0.00%) 50231.49 ( 32.90%) Hmean 47 18070.91 ( 0.00%) 38530.13 (113.22%) The metric is queries/second with the more the better. The results are way outside of the noise and the reason for the improvement is obvious from some of the vmstats 4.4.0 4.4.0 vanillanumaswap-v1r1 Minor Faults 1929399272 2146148218 Major Faults 19746529 3567 Swap Ins 57307366 9913 Swap Outs 50623229 17094 Allocation stalls 35909 443 DMA allocs 0 0 DMA32 allocs 72976349 170567396 Normal allocs 5306640898 5310651252 Movable allocs 0 0 Direct pages scanned 404130893 799577 Kswapd pages scanned 160230174 0 Kswapd pages reclaimed 55928786 0 Direct pages reclaimed 1843936 41921 Page writes file 2391 0 Page writes anon 50623229 17094 The vanilla kernel is swapping like crazy with large amounts of direct reclaim and kswapd activity. The figures are aggregate but it's known that the bad activity is throughout the entire test. Note that simple streaming anon/file memory consumers also see this problem but it's not as obvious. In those cases, kswapd is awake when it should not be. As there are at least two reclaim-related bugs out there, it's worth spelling out the user-visible impact. This patch only addresses bugs related to excessive reclaim on NUMA hardware when the working set is larger than a NUMA node. There is a bug related to high kswapd CPU usage but the reports are against laptops and other UMA hardware and is not addressed by this patch. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> [4.1+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27	mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED	Andrea Arcangeli	1	-2/+12
	pmd_trans_unstable()/pmd_none_or_trans_huge_or_clear_bad() were introduced to locklessy (but atomically) detect when a pmd is a regular (stable) pmd or when the pmd is unstable and can infinitely transition from pmd_none() and pmd_trans_huge() from under us, while only holding the mmap_sem for reading (for writing not). While holding the mmap_sem only for reading, MADV_DONTNEED can run from under us and so before we can assume the pmd to be a regular stable pmd we need to compare it against pmd_none() and pmd_trans_huge() in an atomic way, with pmd_trans_unstable(). The old pmd_trans_huge() left a tiny window for a race. Useful applications are unlikely to notice the difference as doing MADV_DONTNEED concurrently with a page fault would lead to undefined behavior. [akpm@linux-foundation.org: tidy up comment grammar/layout] Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>