linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2017-03-08	overlayfs: remove now unnecessary header file include	Linus Torvalds	1	-1/+0
	This removes the extra include header file that was added in commit e58bc927835a "Pull overlayfs updates from Miklos Szeredi" now that it is no longer needed. There are probably other such includes that got added during the scheduler header splitup series, but this is the one that annoyed me personally and I know about. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-08	xfs: try any AG when allocating the first btree block when reflinking	Christoph Hellwig	2	-6/+10
	When a reflink operation causes the bmap code to allocate a btree block we're currently doing single-AG allocations due to having ->firstblock set and then try any higher AG due a little reflink quirk we've put in when adding the reflink code. But given that we do not have a minleft reservation of any kind in this AG we can still not have any space in the same or higher AG even if the file system has enough free space. To fix this use a XFS_ALLOCTYPE_FIRST_AG allocation in this fall back path instead. [And yes, we need to redo this properly instead of piling hacks over hacks. I'm working on that, but it's not going to be a small series. In the meantime this fixes the customer reported issue] Also add a warning for failing allocations to make it easier to debug. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-03-08	sched/headers: fix up header file dependency on <linux/sched/signal.h>	Linus Torvalds	2	-21/+49
	The scheduler header file split and cleanups ended up exposing a few nasty header file dependencies, and in particular it showed how we in <linux/wait.h> ended up depending on "signal_pending()", which now comes from <linux/sched/signal.h>. That's a very subtle and annoying dependency, which already caused a semantic merge conflict (see commit e58bc927835a "Pull overlayfs updates from Miklos Szeredi", which added that fixup in the merge commit). It turns out that we can avoid this dependency _and_ improve code generation by moving the guts of the fairly nasty helper #define __wait_event_interruptible_locked() to out-of-line code. The code that includes the signal_pending() check is all in the slow-path where we actually go to sleep waiting for the event anyway, so using a helper function is the right thing to do. Using a helper function is also what we already did for the non-locked versions, see the "__wait_event()" macros and the "prepare_to_wait()" set of helper functions. We might want to try to unify all these macro games, we have a _lot_ of subtly different wait-event loops. But this is the minimal patch to fix the annoying header dependency. Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-08	xfs: use iomap new flag for newly allocated delalloc blocks	Brian Foster	2	-17/+32
	Commit fa7f138 ("xfs: clear delalloc and cache on buffered write failure") fixed one regression in the iomap error handling code and exposed another. The fundamental problem is that if a buffered write is a rewrite of preexisting delalloc blocks and the write fails, the failure handling code can punch out preexisting blocks with valid file data. This was reproduced directly by sub-block writes in the LTP kernel/syscalls/write/write03 test. A first 100 byte write allocates a single block in a file. A subsequent 100 byte write fails and punches out the block, including the data successfully written by the previous write. To address this problem, update the ->iomap_begin() handler to distinguish newly allocated delalloc blocks from preexisting delalloc blocks via the IOMAP_F_NEW flag. Use this flag in the ->iomap_end() handler to decide when a failed or short write should punch out delalloc blocks. This introduces the subtle requirement that ->iomap_begin() should never combine newly allocated delalloc blocks with existing blocks in the resulting iomap descriptor. This can occur when a new delalloc reservation merges with a neighboring extent that is part of the current write, for example. Therefore, drop the post-allocation extent lookup from xfs_bmapi_reserve_delalloc() and just return the record inserted into the fork. This ensures only new blocks are returned and thus that preexisting delalloc blocks are always handled as "found" blocks and not punched out on a failed rewrite. Reported-by: Xiong Zhou <xzhou@redhat.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-03-08	axonram: Fix gendisk handling	Jan Kara	1	-1/+4
	It is invalid to call del_gendisk() when disk->queue is NULL. Fix error handling in axon_ram_probe() to avoid doing that. Also del_gendisk() does not drop a reference to gendisk allocated by alloc_disk(). That has to be done by put_disk(). Add that call where needed. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-08	blk: improve order of bio handling in generic_make_request()	NeilBrown	1	-4/+21
	To avoid recursion on the kernel stack when stacked block devices are in use, generic_make_request() will, when called recursively, queue new requests for later handling. They will be handled when the make_request_fn for the current bio completes. If any bios are submitted by a make_request_fn, these will ultimately be handled seqeuntially. If the handling of one of those generates further requests, they will be added to the end of the queue. This strict first-in-first-out behaviour can lead to deadlocks in various ways, normally because a request might need to wait for a previous request to the same device to complete. This can happen when they share a mempool, and can happen due to interdependencies particular to the device. Both md and dm have examples where this happens. These deadlocks can be erradicated by more selective ordering of bios. Specifically by handling them in depth-first order. That is: when the handling of one bio generates one or more further bios, they are handled immediately after the parent, before any siblings of the parent. That way, when generic_make_request() calls make_request_fn for some particular device, we can be certain that all previously submited requests for that device have been completely handled and are not waiting for anything in the queue of requests maintained in generic_make_request(). An easy way to achieve this would be to use a last-in-first-out stack instead of a queue. However this will change the order of consecutive bios submitted by a make_request_fn, which could have unexpected consequences. Instead we take a slightly more complex approach. A fresh queue is created for each call to a make_request_fn. After it completes, any bios for a different device are placed on the front of the main queue, followed by any bios for the same device, followed by all bios that were already on the queue before the make_request_fn was called. This provides the depth-first approach without reordering bios on the same level. This, by itself, it not enough to remove all deadlocks. It just makes it possible for drivers to take the extra step required themselves. To avoid deadlocks, drivers must never risk waiting for a request after submitting one to generic_make_request. This includes never allocing from a mempool twice in the one call to a make_request_fn. A common pattern in drivers is to call bio_split() in a loop, handling the first part and then looping around to possibly split the next part. Instead, a driver that finds it needs to split a bio should queue (with generic_make_request) the second part, handle the first part, and then return. The new code in generic_make_request will ensure the requests to underlying bios are processed first, then the second bio that was split off. If it splits again, the same process happens. In each case one bio will be completely handled before the next one is attempted. With this is place, it should be possible to disable the punt_bios_to_recover() recovery thread for many block devices, and eventually it may be possible to remove it completely. Ref: http://www.spinics.net/lists/raid/msg54680.html Tested-by: Jinpu Wang <jinpu.wang@profitbricks.com> Inspired-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-08	Revert "scsi, block: fix duplicate bdi name registration crashes"	Jan Kara	5	-65/+8
	This reverts commit 0dba1314d4f81115dce711292ec7981d17231064. It causes leaking of device numbers for SCSI when SCSI registers multiple gendisks for one request_queue in succession. It can be easily reproduced using Omar's script [1] on kernel with CONFIG_DEBUG_TEST_DRIVER_REMOVE. Furthermore the protection provided by this commit is not needed anymore as the problem it was fixing got also fixed by commit 165a5e22fafb "block: Move bdi_unregister() to del_gendisk()". [1]: http://marc.info/?l=linux-block&m=148554717109098&w=2 Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-08	block: Make del_gendisk() safer for disks without queues	Jan Kara	1	-6/+10
	Commit 165a5e22fafb "block: Move bdi_unregister() to del_gendisk()" added disk->queue dereference to del_gendisk(). Although del_gendisk() is not supposed to be called without disk->queue valid and blk_unregister_queue() warns in that case, this change will make it oops instead. Return to the old more robust behavior of just warning when del_gendisk() gets called for gendisk with disk->queue being NULL. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz> Tested-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-08	bdi: Fix use-after-free in wb_congested_put()	Jan Kara	1	-15/+21
	bdi_writeback_congested structures get created for each blkcg and bdi regardless whether bdi is registered or not. When they are created in unregistered bdi and the request queue (and thus bdi) is then destroyed while blkg still holds reference to bdi_writeback_congested structure, this structure will be referencing freed bdi and last wb_congested_put() will try to remove the structure from already freed bdi. With commit 165a5e22fafb "block: Move bdi_unregister() to del_gendisk()", SCSI started to destroy bdis without calling bdi_unregister() first (previously it was calling bdi_unregister() even for unregistered bdis) and thus the code detaching bdi_writeback_congested in cgwb_bdi_destroy() was not triggered and we started hitting this use-after-free bug. It is enough to boot a KVM instance with virtio-scsi device to trigger this behavior. Fix the problem by detaching bdi_writeback_congested structures in bdi_exit() instead of bdi_unregister(). This is also more logical as they can get attached to bdi regardless whether it ever got registered or not. Fixes: 165a5e22fafb127ecb5914e12e8c32a1f0d3f820 Signed-off-by: Jan Kara <jack@suse.cz> Tested-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-08	block: Allow bdi re-registration	Jan Kara	1	-0/+7
	SCSI can call device_add_disk() several times for one request queue when a device in unbound and bound, creating new gendisk each time. This will lead to bdi being repeatedly registered and unregistered. This was not a big problem until commit 165a5e22fafb "block: Move bdi_unregister() to del_gendisk()" since bdi was only registered repeatedly (bdi_register() handles repeated calls fine, only we ended up leaking reference to gendisk due to overwriting bdi->owner) but unregistered only in blk_cleanup_queue() which didn't get called repeatedly. After 165a5e22fafb we were doing correct bdi_register() - bdi_unregister() cycles however bdi_unregister() is not prepared for it. So make sure bdi_unregister() cleans up bdi in such a way that it is prepared for a possible following bdi_register() call. An easy way to provoke this behavior is to enable CONFIG_DEBUG_TEST_DRIVER_REMOVE and use scsi_debug driver to create a scsi disk which immediately hangs without this fix. Fixes: 165a5e22fafb127ecb5914e12e8c32a1f0d3f820 Signed-off-by: Jan Kara <jack@suse.cz> Tested-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-08	block/sed: Fix opal user range check and unused variables	Jon Derrick	1	-8/+2
	Fixes check that the opal user is within the range, and cleans up unused method variables. Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Reviewed-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-08	zram: set physical queue limits to avoid array out of bounds accesses	Johannes Thumshirn	1	-0/+2
	zram can handle at most SECTORS_PER_PAGE sectors in a bio's bvec. When using the NVMe over Fabrics loopback target which potentially sends a huge bulk of pages attached to the bio's bvec this results in a kernel panic because of array out of bounds accesses in zram_decompress_page(). Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-08	blk-mq: free hctx->cpumask in release handler of hctx's kobject	Ming Lei	2	-12/+1
	It is obviously that hctx->cpumask is per hctx, and both share same lifetime, so this patch moves freeing of hctx->cpumask into release handler of hctx's kobject. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-08	blk-mq: make lifetime consistent between hctx and its kobject	Ming Lei	2	-9/+11
	This patch removes kobject_put() over hctx in __blk_mq_unregister_dev(), and trys to keep lifetime consistent between hctx and hctx's kobject. Now blk_mq_sysfs_register() and blk_mq_sysfs_unregister() become totally symmetrical, and kobject's refcounter drops to zero just when the hctx is freed. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-08	blk-mq: make lifetime consitent between q/ctx and its kobject	Ming Lei	3	-8/+20
	Currently from kobject view, both q->mq_kobj and ctx->kobj can be released during one cycle of blk_mq_register_dev() and blk_mq_unregister_dev(). Actually, sw queue's lifetime is same with its request queue's, which is covered by request_queue->kobj. So we don't need to call kobject_put() for the two kinds of kobject in __blk_mq_unregister_dev(), instead we do that in release handler of request queue. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-08	blk-mq: initialize mq kobjects in blk_mq_init_allocated_queue()	Ming Lei	3	-4/+5
	Both q->mq_kobj and sw queues' kobjects should have been initialized once, instead of doing that each add_disk context. Also this patch removes clearing of ctx in blk_mq_init_cpu_queues() because percpu allocator fills zero to allocated variable. This patch fixes one issue[1] reported from Omar. [1] kernel wearning when doing unbind/bind on one scsi-mq device [ 19.347924] kobject (ffff8800791ea0b8): tried to init an initialized object, something is seriously wrong. [ 19.349781] CPU: 1 PID: 84 Comm: kworker/u8:1 Not tainted 4.10.0-rc7-00210-g53f39eeaa263 #34 [ 19.350686] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-20161122_114906-anatol 04/01/2014 [ 19.350920] Workqueue: events_unbound async_run_entry_fn [ 19.350920] Call Trace: [ 19.350920] dump_stack+0x63/0x83 [ 19.350920] kobject_init+0x77/0x90 [ 19.350920] blk_mq_register_dev+0x40/0x130 [ 19.350920] blk_register_queue+0xb6/0x190 [ 19.350920] device_add_disk+0x1ec/0x4b0 [ 19.350920] sd_probe_async+0x10d/0x1c0 [sd_mod] [ 19.350920] async_run_entry_fn+0x48/0x150 [ 19.350920] process_one_work+0x1d0/0x480 [ 19.350920] worker_thread+0x48/0x4e0 [ 19.350920] kthread+0x101/0x140 [ 19.350920] ? process_one_work+0x480/0x480 [ 19.350920] ? kthread_create_on_node+0x60/0x60 [ 19.350920] ret_from_fork+0x2c/0x40 Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-08	ktest: Make sure wait_for_input does honor the timeout	Steven Rostedt (VMware)	1	-7/+11
	The function wait_for_input takes in a timeout, and even has a default timeout. But if for some reason the STDIN descriptor keeps sending in data, the function will never time out. The timout is to wait for the data from the passed in file descriptor, not for STDIN. Adding a test in the case where there's no data from the passed in file descriptor that checks to see if the timeout passed, will ensure that it will timeout properly even if there's input in STDIN. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-08	ktest: Fix while loop in wait_for_input	Steven Rostedt (VMware)	1	-3/+4
	The run_command function was changed to use the wait_for_input function to allow having a timeout if the command to run takes too much time. There was a bug in the wait_for_input where it could end up going into an infinite loop. There's two issues here. One is that the return value of the sysread wasn't used for the write (to write a proper size), and that it should continue processing the passed in file descriptor too even if there was input. There was no check for error, if for some reason STDIN returned an error, the function would go into an infinite loop and never exit. Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Tested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Fixes: 6e98d1b4415f ("ktest: Add timeout to ssh command") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-08	MIPS: Add missing include files	Arnd Bergmann	14	-0/+20
	After the split of linux/sched.h, several platforms in arch/mips stopped building. Add the respective additional #include statements to fix the problem I first tried adding these into asm/processor.h, but ran into circular header dependencies with that which I could not figure out. The commit I listed as causing the problem is the branch merge, as there is likely a combination of multiple patches in that branch. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mips@linux-mips.org Cc: ralf@linux-mips.org Fixes: 1827adb11ad2 ("Merge branch 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip") Link: http://lkml.kernel.org/r/20170308072931.3836696-1-arnd@arndb.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-07	xfs: remove kmem_zalloc_greedy	Darrick J. Wong	3	-24/+2
	The sole remaining caller of kmem_zalloc_greedy is bulkstat, which uses it to grab 1-4 pages for staging of inobt records. The infinite loop in the greedy allocation function is causing hangs[1] in generic/269, so just get rid of the greedy allocator in favor of kmem_zalloc_large. This makes bulkstat somewhat more likely to ENOMEM if there's really no pages to spare, but eliminates a source of hangs. [1] http://lkml.kernel.org/r/20170301044634.rgidgdqqiiwsmfpj%40XZHOUW.usersys.redhat.com Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> --- v2: remove single-page fallback
2017-03-07	xfs: Use xfs_icluster_size_fsb() to calculate inode alignment mask	Chandan Rajendra	1	-2/+1
	When block size is larger than inode cluster size, the call to XFS_B_TO_FSBT(mp, mp->m_inode_cluster_size) returns 0. Also, mkfs.xfs would have set xfs_sb->sb_inoalignmt to 0. Hence in xfs_set_inoalignment(), xfs_mount->m_inoalign_mask gets initialized to -1 instead of 0. However, xfs_mount->m_sinoalign would get correctly intialized to 0 because for every positive value of xfs_mount->m_dalign, the condition "!(mp->m_dalign & mp->m_inoalign_mask)" would evaluate to false. Also, xfs_imap() worked fine even with xfs_mount->m_inoalign_mask having -1 as the value because blks_per_cluster variable would have the value 1 and hence we would never have a need to use xfs_mount->m_inoalign_mask to compute the inode chunk's agbno and offset within the chunk. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-03-07	xfs: fix and streamline error handling in xfs_end_io	Christoph Hellwig	1	-32/+27
	There are two different cases of buffered I/O errors: - first we can have an already shutdown fs. In that case we should skip any on-disk operations and just clean up the appen transaction if present and destroy the ioend - a real I/O error. In that case we should cleanup any lingering COW blocks. This gets skipped in the current code and is fixed by this patch. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-03-07	xfs: only reclaim unwritten COW extents periodically	Christoph Hellwig	6	-13/+22
	We only want to reclaim preallocations from our periodic work item. Currently this is archived by looking for a dirty inode, but that check is rather fragile. Instead add a flag to xfs_reflink_cancel_cow_* so that the caller can ask for just cancelling unwritten extents in the COW fork. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: fix typos in commit message] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-03-07	PCI/ASPM: Always set link->downstream to avoid NULL dereference on remove	Yinghai Lu	1	-3/+2
	We call pcie_aspm_exit_link_state() when we remove a device. If the device is the last PCIe function to be removed below a bridge and the bridge has an ASPM link_state struct, we disable ASPM on the link. Disabling ASPM requires link->downstream (used in pcie_config_aspm_link()). We previously set link->downstream in pcie_aspm_cap_init(), but only if the device was not blacklisted. Removing the blacklisted device caused a NULL pointer dereference in the pcie_aspm_exit_link_state() -> pcie_config_aspm_link() path: # echo 1 > /sys/bus/pci/devices/0000\:0b\:00.0/remove ... BUG: unable to handle kernel NULL pointer dereference at 0000000000000080 IP: pcie_config_aspm_link+0x5d/0x2b0 Call Trace: pcie_aspm_exit_link_state+0x75/0x130 pci_stop_bus_device+0xa4/0xb0 pci_stop_and_remove_bus_device_locked+0x1a/0x30 remove_store+0x50/0x70 dev_attr_store+0x18/0x30 sysfs_kf_write+0x44/0x60 kernfs_fop_write+0x10e/0x190 __vfs_write+0x28/0x110 ? rcu_read_lock_sched_held+0x5d/0x80 ? rcu_sync_lockdep_assert+0x2c/0x60 ? __sb_start_write+0x173/0x1a0 ? vfs_write+0xb3/0x180 vfs_write+0xc4/0x180 SyS_write+0x49/0xa0 do_syscall_64+0xa6/0x1c0 entry_SYSCALL64_slow_path+0x25/0x25 ---[ end trace bd187ee0267df5d9 ]--- To avoid this, set link->downstream in alloc_pcie_link_state(), so every pcie_link_state structure has a valid link->downstream pointer. [bhelgaas: changelog] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rajat Jain <rajatja@google.com> CC: stable@vger.kernel.org
2017-03-07	PCI: Prevent VPD access for QLogic ISP2722	Ethan Zhao	1	-0/+1
	QLogic ISP2722-based 16/32Gb Fibre Channel to PCIe Adapter has the VPD access issue too, while read the common pci-sysfs access interface shown as /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0/vpd with simple 'cat' could cause system hang and panic: Kernel panic - not syncing: An NMI occurred. Depending on your system the reason for the NMI is logged in any one of the following resources: 1. Integrated Management Log (IML) 2. OA Syslog 3. OA Forward Progress Log 4. iLO Event Log CPU: 0 PID: 15070 Comm: udevadm Not tainted 4.1.12 Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015 0000000000000086 000000007f0cdf51 ffff880c4fa05d58 ffffffff817193de ffffffffa00b42d8 0000000000000075 ffff880c4fa05dd8 ffffffff81714072 0000000000000008 ffff880c4fa05de8 ffff880c4fa05d88 000000007f0cdf51 Call Trace: <NMI> [<ffffffff817193de>] dump_stack+0x63/0x81 [<ffffffff81714072>] panic+0xd0/0x20e [<ffffffffa00b390d>] hpwdt_pretimeout+0xdd/0xe0 [hpwdt] [<ffffffff81021fc9>] ? sched_clock+0x9/0x10 [<ffffffff8101c101>] nmi_handle+0x91/0x170 [<ffffffff8101c10c>] ? nmi_handle+0x9c/0x170 [<ffffffff8101c5fe>] io_check_error+0x1e/0xa0 [<ffffffff8101c719>] default_do_nmi+0x99/0x140 [<ffffffff8101c8b4>] do_nmi+0xf4/0x170 [<ffffffff817232c5>] end_repeat_nmi+0x1a/0x1e [<ffffffff815d724b>] ? pci_conf1_read+0xeb/0x120 [<ffffffff815d724b>] ? pci_conf1_read+0xeb/0x120 [<ffffffff815d724b>] ? pci_conf1_read+0xeb/0x120 <<EOE>> [<ffffffff815db4b3>] raw_pci_read+0x23/0x40 [<ffffffff815db4fc>] pci_read+0x2c/0x30 [<ffffffff8136f612>] pci_user_read_config_word+0x72/0x110 [<ffffffff8136f746>] pci_vpd_pci22_wait+0x96/0x130 [<ffffffff8136ff9b>] pci_vpd_pci22_read+0xdb/0x1a0 [<ffffffff8136ea30>] pci_read_vpd+0x20/0x30 [<ffffffff8137d590>] read_vpd_attr+0x30/0x40 [<ffffffff8128e037>] sysfs_kf_bin_read+0x47/0x70 [<ffffffff8128d24e>] kernfs_fop_read+0xae/0x180 [<ffffffff8120dd97>] __vfs_read+0x37/0x100 [<ffffffff812ba7e4>] ? security_file_permission+0x84/0xa0 [<ffffffff8120e366>] ? rw_verify_area+0x56/0xe0 [<ffffffff8120e476>] vfs_read+0x86/0x140 [<ffffffff8120f3f5>] SyS_read+0x55/0xd0 [<ffffffff81720f2e>] system_call_fastpath+0x12/0x71 Shutting down cpus with NMI Kernel Offset: disabled drm_kms_helper: panic occurred, switching back to text console So blacklist the access to its VPD. Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: stable@vger.kernel.org # v4.6+
2017-03-07	PCI: exynos: Initialize elbi_base even when using PHY framework	Jaehoon Chung	1	-4/+4
	Even when using the PHY framework, we need the elbi_base. Before this patch, we didn't initialize elbi_base, which caused NULL pointer dereferences later. Fixes: e7cd7ef58e1f ("PCI: exynos: Support the PHY generic framework") Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-03-07	radix tree test suite: Specify -m32 in LDFLAGS too	Matthew Wilcox	1	-0/+1
	Michael's patch to use the default make rule for linking and the patch from Rehas to use -m32 if building a 32-bit test-suite on a 64-bit platform don't work well together. Reported-by: Rehas Sachdeva <aquannie@gmail.com> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-03-07	ida: Free correct IDA bitmap	Matthew Wilcox	4	-5/+35
	There's a relatively rare race where we look at the per-cpu preallocated IDA bitmap, see it's NULL, allocate a new one, and atomically update it. If the kmalloc() happened to sleep and we were rescheduled to a different CPU, or an interrupt came in at the exact right time, another task might have successfully allocated a bitmap and already deposited it. I forgot what the semantics of cmpxchg() were and ended up freeing the wrong bitmap leading to KASAN reporting a use-after-free. Dmitry found the bug with syzkaller & wrote the patch. I wrote the test case that will reproduce the bug without his patch being applied. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-03-07	radix tree test suite: Depend on Makefile and quieten grep	Matthew Wilcox	1	-2/+2
	Changing the CFLAGS in the Makefile didn't always lead to a recompilation because the OFILES didn't depend on the Makefile. Also, after doing make clean, grep would still complain about a missing map-shift.h; we need -s as well as -q. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-03-07	radix tree test suite: Fix build with --as-needed	Michael Ellerman	1	-4/+2
	Currently the radix tree test suite doesn't build with toolchains that use --as-needed by default, for example Ubuntu's: cc -I. -I../../include -g -O2 -Wall -D_LGPL_SOURCE -fsanitize=address -lpthread -lurcu main.o ... -o main /usr/bin/ld: regression1.o: undefined reference to symbol 'pthread_join@@GLIBC_2.17' /lib/powerpc64le-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line collect2: error: ld returned 1 exit status This is caused by the custom makefile rules placing LDFLAGS before the .o files that need the libraries. We could fix it by using --no-as-needed, or rewriting the custom rules. But we can also just drop the custom rules and move the libraries to LDLIBS, and then the default rules work correctly - with the one caveat that we need to add -fsanitize=address to LDFLAGS because that must be passed to the linker as well as the compiler. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-03-07	radix tree test suite: Build 32 bit binaries	Rehas Sachdeva	1	-0/+4
	Add option 'make BUILD=32' for building 32-bit binaries. Signed-off-by: Rehas Sachdeva <aquannie@gmail.com> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-03-07	radix tree test suite: Add performance test for radix_tree_join()	Rehas Sachdeva	1	-0/+47
	Signed-off-by: Rehas Sachdeva <aquannie@gmail.com> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-03-07	radix tree test suite: Add performance test for radix_tree_split()	Rehas Sachdeva	1	-0/+44
	Signed-off-by: Rehas Sachdeva <aquannie@gmail.com> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-03-07	radix tree test suite: Add performance benchmarks	Rehas Sachdeva	1	-7/+75
	Add performance benchmarks for radix tree insertion, tagging and deletion. Signed-off-by: Rehas Sachdeva <aquannie@gmail.com> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-03-07	radix tree test suite: Add test for radix_tree_clear_tags()	Rehas Sachdeva	1	-0/+29
	Assert that radix_tree_clear_tags() clears the tags on the passed node and slot. Assert that the case where the radix tree has only one entry at index zero and the node is NULL, is also handled. Signed-off-by: Rehas Sachdeva <aquannie@gmail.com> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-03-07	radix tree test suite: Add tests for ida_simple_get() and ida_simple_remove()	Rehas Sachdeva	1	-0/+19
	Assert that ida_simple_get() allocates an id in the passed range or returns error on failure, and ida_simple_remove() releases an allocated id. Signed-off-by: Rehas Sachdeva <aquannie@gmail.com> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-03-07	radix tree test suite: Add test for idr_get_next()	Rehas Sachdeva	1	-0/+25
	Assert that idr_get_next() returns the next populated entry in the tree with an ID greater than or equal to the value pointed to by @nextid argument. Signed-off-by: Rehas Sachdeva <aquannie@gmail.com> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-03-07	[media] v4l: vsp1: Adapt vsp1_du_setup_lif() interface to use a structure	Kieran Bingham	3	-21/+33
	The interface to configure the LIF in the VSP1 requires adapting the function prototype for any changes. This makes extending the interface difficult. Change the function prototype to pass a structure which can be easily extended. This changes the means of disabling the pipeline, by now passing a NULL configuration rather than passing either a 0 width or height. [Fixed kerneldoc, made vsp1_du_setup_lif() cfg argument const] Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com> Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Acked-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-03-07	jiffies: Revert bogus conversion of NSEC_PER_SEC to TICK_NSEC	Frederic Weisbecker	1	-1/+1
	commit 93825f2ec736 converted NSEC_PER_SEC to TICK_NSEC because the author confused NSEC_PER_JIFFY with NSEC_PER_SEC. As a result, the calculation of refined jiffies got broken, triggering lockups. Fixes: 93825f2ec736 ("jiffies: Reuse TICK_NSEC instead of NSEC_PER_JIFFY") Reported-and-tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1488880534-3777-1-git-send-email-fweisbec@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-07	objtool: Fix another GCC jump table detection issue	Josh Poimboeuf	3	-3/+25
	Arnd Bergmann reported a (false positive) objtool warning: drivers/infiniband/sw/rxe/rxe_resp.o: warning: objtool: rxe_responder()+0xfe: sibling call from callable instruction with changed frame pointer The issue is in find_switch_table(). It tries to find a switch statement's jump table by walking backwards from an indirect jump instruction, looking for a relocation to the .rodata section. In this case it stopped walking prematurely: the first .rodata relocation it encountered was for a variable (resp_state_name) instead of a jump table, so it just assumed there wasn't a jump table. The fix is to ignore any .rodata relocation which refers to an ELF object symbol. This works because the jump tables are anonymous and have no symbols associated with them. Reported-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 3732710ff6f2 ("objtool: Improve rare switch jump table pattern detection") Link: http://lkml.kernel.org/r/20170302225723.3ndbsnl4hkqbne7a@treble Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-07	drivers/char/nwbutton: Fix build breakage caused by include file reshuffling	Guenter Roeck	1	-1/+1
	Fix: drivers/char/nwbutton.c: In function 'button_sequence_finished': drivers/char/nwbutton.c:134:3: error: implicit declaration of function 'kill_cad_pid' The declaration has been moved from one include file to another. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: c3edc4010e9d102 ("sched/headers: Move task_struct::signal and ...") Link: http://lkml.kernel.org/r/1488762811-9022-1-git-send-email-linux@roeck-us.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-07	h8300: Fix build breakage caused by header file changes	Guenter Roeck	1	-1/+1
	Fix the following h8300 build failures: arch/h8300/kernel/ptrace_h.c: In function ‘trace_trap’: arch/h8300/kernel/ptrace_h.c:253:3: error: implicit declaration of function ‘force_sig’ Signed-off-by: Guenter Roeck <linux@roeck-us.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: uclinux-h8-devel@lists.sourceforge.jp Fixes: c3edc4010e9d ("sched/headers: Move task_struct::signal and ...") Link: http://lkml.kernel.org/r/1488738434-3504-1-git-send-email-linux@roeck-us.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-07	avr32: Fix build error caused by include file reshuffling	Guenter Roeck	1	-1/+1
	Various avr32 builds fail: arch/avr32/oprofile/backtrace.c:58: error: dereferencing pointer to incomplete type arch/avr32/oprofile/backtrace.c:60: error: implicit declaration of function 'user_mode' Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <rric@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: oprofile-list@lists.sf.net Fixes: f780d89a0e82 ("sched/headers: Remove <asm/ptrace.h> from ...") Link: http://lkml.kernel.org/r/1488762357-4500-1-git-send-email-linux@roeck-us.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-06	ucount: Remove the atomicity from ucount->count	Eric W. Biederman	2	-8/+12
	Always increment/decrement ucount->count under the ucounts_lock. The increments are there already and moving the decrements there means the locking logic of the code is simpler. This simplification in the locking logic fixes a race between put_ucounts and get_ucounts that could result in a use-after-free because the count could go zero then be found by get_ucounts and then be freed by put_ucounts. A bug presumably this one was found by a combination of syzkaller and KASAN. JongWhan Kim reported the syzkaller failure and Dmitry Vyukov spotted the race in the code. Cc: stable@vger.kernel.org Fixes: f6b2db1a3e8d ("userns: Make the count of user namespaces per user") Reported-by: JongHwan Kim <zzoru007@gmail.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2017-03-06	iomap: invalidate page caches should be after iomap_dio_complete() in direct write	Eryu Guan	1	-7/+10
	After XFS switching to iomap based DIO (commit acdda3aae146 ("xfs: use iomap_dio_rw")), I started to notice dio29/dio30 tests failures from LTP run on ppc64 hosts, and they can be reproduced on x86_64 hosts with 512B/1k block size XFS too. dio29 diotest3 -b 65536 -n 100 -i 1000 -o 1024000 dio30 diotest6 -b 65536 -n 100 -i 1000 -o 1024000 The failure message is like: bufcmp: offset 0: Expected: 0x62, got 0x0 diotest03 1 TPASS : Read with Direct IO, Write without diotest03 2 TFAIL : diotest3.c:142: comparsion failed; child=98 offset=1425408 diotest03 3 TFAIL : diotest3.c:194: Write Direct-child 98 failed Direct write wrote 0x62 but buffer read got zero. This is because, when doing direct write to a hole or preallocated file, we invalidate the page caches before converting the extent from unwritten state to normal state, which is done by iomap_dio_complete(), thus leave a window for other buffer reader to cache the unwritten state extent. Consider this case, with sub-page blocksize XFS, two processes are direct writing to different blocksize-aligned regions (say 512B) of the same preallocated file, and reading the region back via buffered I/O to compare contents. process A, region [0,512] process B, region [512,1024] xfs_file_write_iter xfs_file_aio_dio_write iomap_dio_rw iomap_apply invalidate_inode_pages2_range xfs_file_write_iter xfs_file_aio_dio_write iomap_dio_rw iomap_apply invalidate_inode_pages2_range iomap_dio_complete xfs_file_read_iter xfs_file_buffered_aio_read generic_file_read_iter do_generic_file_read <readahead fills pagecache with 0> iomap_dio_complete xfs_file_read_iter <read gets 0 from pagecache> Process A first invalidates page caches, at this point the underlying extent is still in unwritten state (iomap_dio_complete not called yet), and process B finishs direct write and populates page caches via readahead, which caches zeros in page for region A, then process A reads zeros from page cache, instead of the actual data. Fix it by invalidating page caches after converting unwritten extent to make sure we read content from disk after extent state changed, as what we did before switching to iomap based dio. Also introduce a new 'start' variable to save the original write offset (iomap_dio_complete() updates iocb->ki_pos), and a 'err' variable for invalidating caches result, cause we can't reuse 'ret' anymore. Signed-off-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-03-06	[media] dw2102: don't do DMA on stack	Jonathan McDowell	1	-99/+145
	On Kernel 4.9, WARNINGs about doing DMA on stack are hit at the dw2102 driver: one in su3000_power_ctrl() and the other in tt_s2_4600_frontend_attach(). Both were due to the use of buffers on the stack as parameters to dvb_usb_generic_rw() and the resulting attempt to do DMA with them. The device was non-functional as a result. So, switch this driver over to use a buffer within the device state structure, as has been done with other DVB-USB drivers. Tested with TechnoTrend TT-connect S2-4600. [mchehab@osg.samsung.com: fixed a warning at su3000_i2c_transfer() that state var were dereferenced before check 'd'] Signed-off-by: Jonathan McDowell <noodles@earth.li> Cc: <stable@vger.kernel.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-03-06	powerpc: Sort the selects under CONFIG_PPC	Michael Ellerman	1	-66/+72
	We have a big list of selects under CONFIG_PPC, and currently they're completely unsorted. This means people tend to add new selects at the bottom of the list, and so two commits which both add a new select will often conflict. Instead sort it alphabetically. This is nicer in and of itself, but also means two commits that add a new select will have a greater chance of not conflicting. Add a note at the top and bottom asking people to keep it sorted. And while we're here pad out the 'if' expressions to make them stand out. Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-06	powerpc/64: Fix L1D cache shape vector reporting L1I values	Michael Ellerman	1	-2/+2
	It seems we didn't pay quite enough attention when testing the new cache shape vectors, which means we didn't notice the bug where the vector for the L1D was using the L1I values. Fix it, resulting in eg: L1I cache size: 0x8000 32768B 32K L1I line size: 0x80 8-way associative L1D cache size: 0x10000 65536B 64K L1D line size: 0x80 8-way associative Fixes: 98a5f361b862 ("powerpc: Add new cache geometry aux vectors") Cut-and-paste-bug-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Badly-reviewed-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-06	x86/build/x86_64_defconfig: Enable CONFIG_R8169	Andy Shevchenko	1	-0/+1
	Very common PCIe ethernet card. Already enabled in i386_defconfig. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Link: http://lkml.kernel.org/r/20170306085748.85957-1-andriy.shevchenko@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-06	x86/reboot/quirks: Add ASUS EeeBook X205TA/W reboot quirk	Matjaz Hegedic	1	-0/+8
	Without the parameter reboot=a, ASUS EeeBook X205TA/W will hang when it should reboot. This adds the appropriate quirk, thus fixing the problem. Signed-off-by: Matjaz Hegedic <matjaz.hegedic@gmail.com> Link: http://lkml.kernel.org/r/1488737804-20681-1-git-send-email-matjaz.hegedic@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>