linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2016-03-04	gpu: host1x: Set DMA mask	Alexandre Courbot	2	-0/+8
	The default DMA mask covers a 32 bits address range, but host1x devices can address a larger range on TK1 and TX1. Set the DMA mask to the range addressable when we use the IOMMU to prevent the use of bounce buffers. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
2016-03-04	tracing: Do not have 'comm' filter override event 'comm' field	Steven Rostedt (Red Hat)	3	-12/+17
	Commit 9f61668073a8d "tracing: Allow triggers to filter for CPU ids and process names" added a 'comm' filter that will filter events based on the current tasks struct 'comm'. But this now hides the ability to filter events that have a 'comm' field too. For example, sched_migrate_task trace event. That has a 'comm' field of the task to be migrated. echo 'comm == "bash"' > events/sched_migrate_task/filter will now filter all sched_migrate_task events for tasks named "bash" that migrates other tasks (in interrupt context), instead of seeing when "bash" itself gets migrated. This fix requires a couple of changes. 1) Change the look up order for filter predicates to look at the events fields before looking at the generic filters. 2) Instead of basing the filter function off of the "comm" name, have the generic "comm" filter have its own filter_type (FILTER_COMM). Test against the type instead of the name to assign the filter function. 3) Add a new "COMM" filter that works just like "comm" but will filter based on the current task, even if the trace event contains a "comm" field. Do the same for "cpu" field, adding a FILTER_CPU and a filter "CPU". Cc: stable@vger.kernel.org # v4.3+ Fixes: 9f61668073a8d "tracing: Allow triggers to filter for CPU ids and process names" Reported-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-03-03	Btrfs: fix loading of orphan roots leading to BUG_ON	Filipe Manana	1	-1/+9
	When looking for orphan roots during mount we can end up hitting a BUG_ON() (at root-item.c:btrfs_find_orphan_roots()) if a log tree is replayed and qgroups are enabled. This is because after a log tree is replayed, a transaction commit is made, which triggers qgroup extent accounting which in turn does backref walking which ends up reading and inserting all roots in the radix tree fs_info->fs_root_radix, including orphan roots (deleted snapshots). So after the log tree is replayed, when finding orphan roots we hit the BUG_ON with the following trace: [118209.182438] ------------[ cut here ]------------ [118209.183279] kernel BUG at fs/btrfs/root-tree.c:314! [118209.184074] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [118209.185123] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic ppdev xor raid6_pq evdev sg parport_pc parport acpi_cpufreq tpm_tis tpm psmouse processor i2c_piix4 serio_raw pcspkr i2c_core button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring virtio scsi_mod e1000 floppy [last unloaded: btrfs] [118209.186318] CPU: 14 PID: 28428 Comm: mount Tainted: G W 4.5.0-rc5-btrfs-next-24+ #1 [118209.186318] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014 [118209.186318] task: ffff8801ec131040 ti: ffff8800af34c000 task.ti: ffff8800af34c000 [118209.186318] RIP: 0010:[<ffffffffa04237d7>] [<ffffffffa04237d7>] btrfs_find_orphan_roots+0x1fc/0x244 [btrfs] [118209.186318] RSP: 0018:ffff8800af34faa8 EFLAGS: 00010246 [118209.186318] RAX: 00000000ffffffef RBX: 00000000ffffffef RCX: 0000000000000001 [118209.186318] RDX: 0000000080000000 RSI: 0000000000000001 RDI: 00000000ffffffff [118209.186318] RBP: ffff8800af34fb08 R08: 0000000000000001 R09: 0000000000000000 [118209.186318] R10: ffff8800af34f9f0 R11: 6db6db6db6db6db7 R12: ffff880171b97000 [118209.186318] R13: ffff8801ca9d65e0 R14: ffff8800afa2e000 R15: 0000160000000000 [118209.186318] FS: 00007f5bcb914840(0000) GS:ffff88023edc0000(0000) knlGS:0000000000000000 [118209.186318] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [118209.186318] CR2: 00007f5bcaceb5d9 CR3: 00000000b49b5000 CR4: 00000000000006e0 [118209.186318] Stack: [118209.186318] fffffbffffffffff 010230ffffffffff 0101000000000000 ff84000000000000 [118209.186318] fbffffffffffffff 30ffffffffffffff 0000000000000101 ffff880082348000 [118209.186318] 0000000000000000 ffff8800afa2e000 ffff8800afa2e000 0000000000000000 [118209.186318] Call Trace: [118209.186318] [<ffffffffa042e2db>] open_ctree+0x1e37/0x21b9 [btrfs] [118209.186318] [<ffffffffa040a753>] btrfs_mount+0x97e/0xaed [btrfs] [118209.186318] [<ffffffff8108e1c0>] ? trace_hardirqs_on+0xd/0xf [118209.186318] [<ffffffff8117b87e>] mount_fs+0x67/0x131 [118209.186318] [<ffffffff81192d2b>] vfs_kern_mount+0x6c/0xde [118209.186318] [<ffffffffa0409f81>] btrfs_mount+0x1ac/0xaed [btrfs] [118209.186318] [<ffffffff8108e1c0>] ? trace_hardirqs_on+0xd/0xf [118209.186318] [<ffffffff8108c26b>] ? lockdep_init_map+0xb9/0x1b3 [118209.186318] [<ffffffff8117b87e>] mount_fs+0x67/0x131 [118209.186318] [<ffffffff81192d2b>] vfs_kern_mount+0x6c/0xde [118209.186318] [<ffffffff81195637>] do_mount+0x8a6/0x9e8 [118209.186318] [<ffffffff8119598d>] SyS_mount+0x77/0x9f [118209.186318] [<ffffffff81493017>] entry_SYSCALL_64_fastpath+0x12/0x6b [118209.186318] Code: 64 00 00 85 c0 89 c3 75 24 f0 41 80 4c 24 20 20 49 8b bc 24 f0 01 00 00 4c 89 e6 e8 e8 65 00 00 85 c0 89 c3 74 11 83 f8 ef 75 02 <0f> 0b 4c 89 e7 e8 da 72 00 00 eb 1c 41 83 bc 24 00 01 00 00 00 [118209.186318] RIP [<ffffffffa04237d7>] btrfs_find_orphan_roots+0x1fc/0x244 [btrfs] [118209.186318] RSP <ffff8800af34faa8> [118209.230735] ---[ end trace 83938f987d85d477 ]--- So fix this by not treating the error -EEXIST, returned when attempting to insert a root already inserted by the backref walking code, as an error. The following test case for xfstests reproduces the bug: seq=`basename $0` seqres=$RESULT_DIR/$seq echo "QA output created by $seq" tmp=/tmp/$$ status=1 # failure is the default! trap "_cleanup; exit \$status" 0 1 2 3 15 _cleanup() { _cleanup_flakey cd / rm -f $tmp.* } # get standard environment, filters and checks . ./common/rc . ./common/filter . ./common/dmflakey # real QA test starts here _supported_fs btrfs _supported_os Linux _require_scratch _require_dm_target flakey _require_metadata_journaling $SCRATCH_DEV rm -f $seqres.full _scratch_mkfs >>$seqres.full 2>&1 _init_flakey _mount_flakey _run_btrfs_util_prog quota enable $SCRATCH_MNT # Create 2 directories with one file in one of them. # We use these just to trigger a transaction commit later, moving the file from # directory a to directory b and doing an fsync against directory a. mkdir $SCRATCH_MNT/a mkdir $SCRATCH_MNT/b touch $SCRATCH_MNT/a/f sync # Create our test file with 2 4K extents. $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 8K" $SCRATCH_MNT/foobar \| _filter_xfs_io # Create a snapshot and delete it. This doesn't really delete the snapshot # immediately, just makes it inaccessible and invisible to user space, the # snapshot is deleted later by a dedicated kernel thread (cleaner kthread) # which is woke up at the next transaction commit. # A root orphan item is inserted into the tree of tree roots, so that if a # power failure happens before the dedicated kernel thread does the snapshot # deletion, the next time the filesystem is mounted it resumes the snapshot # deletion. _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap _run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap # Now overwrite half of the extents we wrote before. Because we made a snapshpot # before, which isn't really deleted yet (since no transaction commit happened # after we did the snapshot delete request), the non overwritten extents get # referenced twice, once by the default subvolume and once by the snapshot. $XFS_IO_PROG -c "pwrite -S 0xbb 4K 8K" $SCRATCH_MNT/foobar \| _filter_xfs_io # Now move file f from directory a to directory b and fsync directory a. # The fsync on the directory a triggers a transaction commit (because a file # was moved from it to another directory) and the file fsync leaves a log tree # with file extent items to replay. mv $SCRATCH_MNT/a/f $SCRATCH_MNT/a/b $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/a $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar echo "File digest before power failure:" md5sum $SCRATCH_MNT/foobar \| _filter_scratch # Now simulate a power failure and mount the filesystem to replay the log tree. # After the log tree was replayed, we used to hit a BUG_ON() when processing # the root orphan item for the deleted snapshot. This is because when processing # an orphan root the code expected to be the first code inserting the root into # the fs_info->fs_root_radix radix tree, while in reallity it was the second # caller attempting to do it - the first caller was the transaction commit that # took place after replaying the log tree, when updating the qgroup counters. _flakey_drop_and_remount echo "File digest before after failure:" # Must match what he got before the power failure. md5sum $SCRATCH_MNT/foobar \| _filter_scratch _unmount_flakey status=0 exit Fixes: 2d9e97761087 ("Btrfs: use btrfs_get_fs_root in resolve_indirect_ref") Cc: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2016-03-03	block: support large requests in blk_rq_map_user_iov	Christoph Hellwig	1	-30/+61
	This patch adds support for larger requests in blk_rq_map_user_iov by allowing it to build multiple bios for a request. This functionality used to exist for the non-vectored blk_rq_map_user in the past, and this patch reuses the existing functionality for it on the unmap side, which stuck around. Thanks to the iov_iter API supporting multiple bios is fairly trivial, as we can just iterate the iov until we've consumed the whole iov_iter. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Jeff Lien <Jeff.Lien@hgst.com> Tested-by: Jeff Lien <Jeff.Lien@hgst.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03	block: fix blk_rq_get_max_sectors for driver private requests	Christoph Hellwig	1	-1/+1
	Driver private request types should not get the artifical cap for the FS requests. This is important to use the full device capabilities for internal command or NVMe pass through commands. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Jeff Lien <Jeff.Lien@hgst.com> Tested-by: Jeff Lien <Jeff.Lien@hgst.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Updated by me to use an explicit check for the one command type that does support extended checking, instead of relying on the ordering of the enum command values - as suggested by Keith. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03	nvme: fix max_segments integer truncation	Christoph Hellwig	1	-2/+4
	The block layer uses an unsigned short for max_segments. The way we calculate the value for NVMe tends to generate very large 32-bit values, which after integer truncation may lead to a zero value instead of the desired outcome. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Jeff Lien <Jeff.Lien@hgst.com> Tested-by: Jeff Lien <Jeff.Lien@hgst.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03	nvme: set queue limits for the admin queue	Christoph Hellwig	1	-10/+19
	Factor out a helper to set all the device specific queue limits and apply them to the admin queue in addition to the I/O queues. Without this the command size on the admin queue is arbitrarily low, and the missing other limitations are just minefields waiting for victims. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Jeff Lien <Jeff.Lien@hgst.com> Tested-by: Jeff Lien <Jeff.Lien@hgst.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03	writeback: flush inode cgroup wb switches instead of pinning super_block	Tejun Heo	3	-13/+47
	If cgroup writeback is in use, inodes can be scheduled for asynchronous wb switching. Before 5ff8eaac1636 ("writeback: keep superblock pinned during cgroup writeback association switches"), this could race with umount leading to super_block being destroyed while inodes are pinned for wb switching. 5ff8eaac1636 fixed it by bumping s_active while wb switches are in flight; however, this allowed in-flight wb switches to make umounts asynchronous when the userland expected synchronosity - e.g. fsck immediately following umount may fail because the device is still busy. This patch removes the problematic super_block pinning and instead makes generic_shutdown_super() flush in-flight wb switches. wb switches are now executed on a dedicated isw_wq so that they can be flushed and isw_nr_in_flight keeps track of the number of in-flight wb switches so that flushing can be avoided in most cases. v2: Move cgroup_writeback_umount() further below and add MS_ACTIVE check in inode_switch_wbs() as Jan an Al suggested. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Tahsin Erdogan <tahsin@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@ZenIV.linux.org.uk> Link: http://lkml.kernel.org/g/CAAeU0aNCq7LGODvVGRU-oU_o-6enii5ey0p1c26D1ZzYwkDc5A@mail.gmail.com Fixes: 5ff8eaac1636 ("writeback: keep superblock pinned during cgroup writeback association switches") Cc: stable@vger.kernel.org #v4.5 Reviewed-by: Jan Kara <jack@suse.cz> Tested-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03	NVMe: Fix 0-length integrity payload	Keith Busch	1	-1/+1
	A user could send a passthrough IO command with a metadata pointer to a namespace without metadata. With metadata length of 0, kmalloc returns ZERO_SIZE_PTR. Since that is not NULL, the driver would have set this as the bio's integrity payload, which causes an access fault on completion. This patch ignores the users metadata buffer if the namespace format does not support separate metadata. Reported-by: Stephen Bates <stephen.bates@microsemi.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03	NVMe: Don't allow unsupported flags	Keith Busch	1	-0/+4
	The command flags can change the meaning of other fields in the command that the driver is not prepared to handle. Specifically, the user could passthrough an SGL flag, causing the controller to misinterpret the PRP list the driver created, potentially corrupting memory or data. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Jon Derrick <jonathan.derrick@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03	NVMe: Move error handling to failed reset handler	Keith Busch	3	-18/+50
	This moves failed queue handling out of the namespace removal path and into the reset failure path, fixing a hanging condition if the controller fails or link down during del_gendisk. Previously the driver had to see the controller as degraded prior to calling del_gendisk to setup the queues to fail. But, if the controller happened to fail after this, there was no task to end outstanding requests. On failure, all namespace states are set to dead. This has capacity revalidate to 0, and ends all new requests with error status. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03	NVMe: Simplify device reset failure	Keith Busch	1	-27/+21
	A reset failure schedules the device to unbind from the driver through the pci driver's remove. This cleans up all intialization, so there is no need to duplicate the potentially racy cleanup. To help understand why a reset failed, the status is logged with the existing warning message. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03	NVMe: Fix namespace removal deadlock	Keith Busch	3	-7/+26
	This patch makes nvme namespace removal lockless. It is up to the caller to ensure no active namespace scanning is occuring. To ensure no scan work occurs, the nvme pci driver adds a removing state to the controller device to avoid queueing scan work during removal. The work is flushed after setting the state, so no new scan work can be queued. The lockless removal allows the driver to cleanup a namespace request_queue if the controller fails during removal. Previously this could deadlock trying to acquire the namespace mutex in order to handle such events. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03	NVMe: Use IDA for namespace disk naming	Keith Busch	2	-3/+14
	A namespace may be detached from a controller, but a user may be holding a reference to it. Attaching a new namespace with the same NSID will create duplicate names when using the NSID to name the disk. This patch uses an IDA that is released only when the last reference is released instead of using the namespace ID. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03	NVMe: Don't unmap controller registers on reset	Keith Busch	1	-29/+42
	Unmapping the registers on reset or shutdown is not necessary. Keeping the mapping simplifies reset handling. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03	block: merge: get the 1st and last bvec via helpers	Ming Lei	1	-6/+2
	This patch applies the two introduced helpers to figure out the 1st and last bvec. Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03	block: get the 1st and last bvec via helpers	Ming Lei	1	-4/+9
	This patch applies the two introduced helpers to figure out the 1st and last bvec, and fixes the original way after bio splitting. Cc: stable@vger.kernel.org Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03	block: check virt boundary in bio_will_gap()	Ming Lei	1	-5/+11
	In the following patch, the way for figuring out the last bvec will be changed with a bit cost introduced, so return immediately if the queue doesn't have virt boundary limit. Actually most of devices have not this limit. Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03	block: bio: introduce helpers to get the 1st and last bvec	Ming Lei	1	-0/+37
	The bio passed to bio_will_gap() may be fast cloned from upper layer(dm, md, bcache, fs, ...), or from bio splitting in block core. Unfortunately bio_will_gap() just figures out the last bvec via 'bi_io_vec[prev->bi_vcnt - 1]' directly, and this way is obviously wrong. This patch introduces two helpers for getting the first and last bvec of one bio for fixing the issue. Cc: stable@vger.kernel.org Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03	IB/core: Use GRH when the path hop-limit > 0	Or Gerlitz	1	-1/+1
	According to IBTA spec v1.3 section 12.7.19, QPs should use GRH when the path returned by the SA has hop-limit > 0. Currently, we do that only for the > 1 case, fix that. Fixes: 6d969a471ba1 ('IB/sa: Add ib_init_ah_from_path()') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-03	IB/{core, mlx5}: Fix input len in vendor part of create_qp/srq	Majd Dibbiny	2	-11/+9
	Currently, the inlen field of the vendor's part of the command doesn't match the command buffer. This happens because the inlen accommodates ib_uverbs_cmd_hdr which is deducted from the in buffer. This is problematic since the vendor function could be called either from the legacy verb (where the input length mismatches the actual length) or by the extended verb (where the length matches). The vendor has no idea which function calls it and therefore has no way to know how the length variable should be treated. Fixing this by aligning the inlen to the correct length. All vendor drivers either assumed that inlen >= sizeof(vendor_uhw_cmd) or just failed wrongly (mlx5) and fixed in this patch. Fixes: cfb5e088e26a ('IB/mlx5: Add CQE version 1 support to user QPs and SRQs') Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-03	IB/mlx5: Avoid using user-index for SRQs	Majd Dibbiny	1	-11/+19
	Normal SRQs, unlike XRC SRQs, don't have user-index, therefore avoid verifying it and using it. Fixes: cfb5e088e26a ('IB/mlx5: Add CQE version 1 support to user QPs and SRQs') Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-03	PM / sleep / x86: Fix crash on graph trace through x86 suspend	Todd E Brandt	1	-0/+7
	Pause/unpause graph tracing around do_suspend_lowlevel as it has inconsistent call/return info after it jumps to the wakeup vector. The graph trace buffer will otherwise become misaligned and may eventually crash and hang on suspend. To reproduce the issue and test the fix: Run a function_graph trace over suspend/resume and set the graph function to suspend_devices_and_enter. This consistently hangs the system without this fix. Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-02	userfaultfd: don't block on the last VM updates at exit time	Linus Torvalds	1	-0/+6
	The exit path will do some final updates to the VM of an exiting process to inform others of the fact that the process is going away. That happens, for example, for robust futex state cleanup, but also if the parent has asked for a TID update when the process exits (we clear the child tid field in user space). However, at the time we do those final VM accesses, we've already stopped accepting signals, so the usual "stop waiting for userfaults on signal" code in fs/userfaultfd.c no longer works, and the process can become an unkillable zombie waiting for something that will never happen. To solve this, just make handle_userfault() abort any user fault handling if we're already in the exit path past the signal handling state being dead (marked by PF_EXITING). This VM special case is pretty ugly, and it is possible that we should look at finalizing signals later (or move the VM final accesses earlier). But in the meantime this is a fairly minimally intrusive fix. Reported-and-tested-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-02	drm/amdgpu: return from atombios_dp_get_dpcd only when error	Arindam Nath	1	-1/+1
	In amdgpu_connector_hotplug(), we need to start DP link training only after we have received DPCD. The function amdgpu_atombios_dp_get_dpcd() returns non-zero value only when an error condition is met, otherwise returns zero. So in case the function encounters an error, we need to skip rest of the code and return from amdgpu_connector_hotplug() immediately. Only when we are successfull in reading DPCD pin, we should carry on with turning-on the monitor. Signed-off-by: Arindam Nath <arindam.nath@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2016-03-02	drm/amdgpu/cz: remove commented out call to enable vce pg	Alex Deucher	1	-2/+1
	This code path is not currently enabled now that we properly respect the vce pg flags, so uncomment the actual pg calls so the code is as it should be we are eventually able to enable vce pg. Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2016-03-02	drm/amdgpu/powerplay/cz: enable/disable vce dpm independent of vce pg	Alex Deucher	1	-1/+1
	If we don't disable it when vce is not in use, we use extra power if vce pg is disabled. Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2016-03-02	drm/amdgpu/cz: enable/disable vce dpm even if vce pg is disabled	Alex Deucher	1	-3/+1
	I missed this when cleaning up the vce pg handling. Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2016-03-02	drm/amdgpu/gfx8: specify which engine to wait before vm flush	Chunming Zhou	1	-1/+2
	Select between me and pfp properly. Signed-off-by: Chunming Zhou <David1.Zhou@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2016-03-02	drm/amdgpu: apply gfx_v8 fixes to gfx_v7 as well	Christian König	1	-0/+13
	We never ported that back to CIK, so we could run into VM faults here. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2016-03-02	drm/amd/powerplay: send event to notify powerplay all modules are initialized.	Rex Zhu	1	-1/+3
	with this event, powerplay can adjust current power state if needed. Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-03-02	drm/amd/powerplay: export AMD_PP_EVENT_COMPLETE_INIT task to amdgpu.	Rex Zhu	2	-1/+5
	This is needed to init the dynamic states without a display. To be used in the next commit. Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-03-02	drm/radeon/pm: update current crtc info after setting the powerstate	Alex Deucher	1	-4/+4
	On CI, we need to see if the number of crtcs changes to determine whether or not we need to upload the mclk table again. In practice we don't currently upload the mclk table again after the initial load. The only reason you would would be to add new states, e.g., for arbitrary mclk setting which is not currently supported. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2016-03-02	drm/amdgpu/pm: update current crtc info after setting the powerstate	Alex Deucher	1	-3/+3
	On CI, we need to see if the number of crtcs changes to determine whether or not we need to upload the mclk table again. In practice we don't currently upload the mclk table again after the initial load. The only reason you would would be to add new states, e.g., for arbitrary mclk setting which is not currently supported. Acked-by: Jordan Lazare <Jordan.Lazare@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2016-03-02	vhost: fix error path in vhost_init_used()	Greg Kurz	1	-4/+11
	We don't want side effects. If something fails, we rollback vq->is_le to its previous value. Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-02	virtio-pci: read the right virtio_pci_notify_cap field	Ladi Prosek	1	-1/+1
	Looks like a copy-paste bug. The value is used as an optimization and a wrong value probably isn't causing any serious damage. Found when porting this code to Windows. Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-02	drm/i915: Balance assert_rpm_wakelock_held() for !IS_ENABLED(CONFIG_PM)	Chris Wilson	1	-14/+12
	commit 09731280028ce03e6a27e1998137f1775a2839f3 Author: Imre Deak <imre.deak@intel.com> Date: Wed Feb 17 14:17:42 2016 +0200 drm/i915: Add helper to get a display power ref if it was already enabled left the rpm wakelock assertions unbalanced if CONFIG_PM was disabled as intel_runtime_pm_get_if_in_use() would return true without incrementing the local bookkeeping required for the assertions. Fixes: 09731280028c ("drm/i915: Add helper to get a display power ref if it was already enabled") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> CC: Mika Kuoppala <mika.kuoppala@intel.com> CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> CC: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Imre Deak <imre.deak@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1456434628-22574-1-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Imre Deak <imre.deak@intel.com> (cherry picked from commit 135dc79efbc119ea5fb34475996983159e6ca31c) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-03-02	drm/i915/skl: Fix power domain suspend sequence	Imre Deak	1	-3/+3
	During system suspend we need to first disable power wells then unitialize the display core. In case power well support is disabled we did this in the wrong order, so fix this up. Fixes: d314cd43 ("drm/i915: fix handling of the disable_power_well module option") CC: stable@vger.kernel.org CC: Patrik Jakobsson <patrik.jakobsson@linux.intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Patrik Jakobsson <patrik.jakobsson@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1456778945-5411-1-git-send-email-imre.deak@intel.com (cherry picked from commit 2622d79bd9d18fd04b650234e6a218c5f95cf308) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-03-02	kvm: x86: Update tsc multiplier on change.	Owen Hofmann	1	-5/+9
	vmx.c writes the TSC_MULTIPLIER field in vmx_vcpu_load, but only when a vcpu has migrated physical cpus. Record the last value written and update in vmx_vcpu_load on any change, otherwise a cpu migration must occur for TSC frequency scaling to take effect. Cc: stable@vger.kernel.org Fixes: ff2c3a1803775cc72dc6f624b59554956396b0ee Signed-off-by: Owen Hofmann <osh@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-02	mips/kvm: fix ioctl error handling	Michael S. Tsirkin	1	-2/+2
	Returning directly whatever copy_to_user(...) or copy_from_user(...) returns may not do the right thing if there's a pagefault: copy_to_user/copy_from_user return the number of bytes not copied in this case, but ioctls need to return -EFAULT instead. Fix up kvm on mips to do return copy_to_user(...)) ? -EFAULT : 0; and return copy_from_user(...)) ? -EFAULT : 0; everywhere. Cc: stable@vger.kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-02	drm/ast: Fix incorrect register check for DRAM width	Timothy Pearson	1	-1/+1
	During DRAM initialization on certain ASpeed devices, an incorrect bit (bit 10) was checked in the "SDRAM Bus Width Status" register to determine DRAM width. Query bit 6 instead in accordance with the Aspeed AST2050 datasheet v1.05. Signed-off-by: Timothy Pearson <tpearson@raptorengineeringinc.com> Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-03-01	parisc: Wire up copy_file_range syscall	Helge Deller	2	-1/+3
	Signed-off-by: Helge Deller <deller@gmx.de>
2016-03-01	parisc: Fix ptrace syscall number and return value modification	Helge Deller	2	-6/+15
	Mike Frysinger reported that his ptrace testcase showed strange behaviour on parisc: It was not possible to avoid a syscall and the return value of a syscall couldn't be changed. To modify a syscall number, we were missing to save the new syscall number to gr20 which is then picked up later in assembly again. The effect that the return value couldn't be changed is a side-effect of another bug in the assembly code. When a process is ptraced, userspace expects each syscall to report entrance and exit of a syscall. If a syscall number was given which doesn't exist, we jumped to the normal syscall exit code instead of informing userspace that the (non-existant) syscall exits. This unexpected behaviour confuses userspace and thus the bug was misinterpreted as if we can't change the return value. This patch fixes both problems and was tested on 64bit kernel with 32bit userspace. Signed-off-by: Helge Deller <deller@gmx.de> Cc: Mike Frysinger <vapier@gentoo.org> Cc: stable@vger.kernel.org # v4.0+ Tested-by: Mike Frysinger <vapier@gentoo.org>
2016-03-01	parisc: Use parentheses around expression in floppy.h	Helge Deller	1	-1/+1
	David Binderman reported a style issue in the floppy.h header file: arch/parisc/include/asm/floppy.h:221: (style) Boolean result is used in bitwise operation. Clarify expression with parentheses. Reported-by: David Binderman <dcb314@hotmail.com> Cc: David Binderman <dcb314@hotmail.com> Signed-off-by: Helge Deller <deller@gmx.de>
2016-03-01	CIFS: Fix duplicate line introduced by clone_file_range patch	Steve French	1	-1/+0
	Commit 04b38d601239b4 ("vfs: pull btrfs clone API to vfs layer") added a duplicated line (in cifsfs.c) which causes a sparse compile warning. Signed-off-by: Steve French <steve.french@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-03-01	sparc64: Fix sparc64_set_context stack handling.	David S. Miller	1	-1/+1
	Like a signal return, we should use synchronize_user_stack() rather than flush_user_windows(). Reported-by: Ilya Malakhov <ilmalakhovthefirst@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01	sparc32: Add -Wa,-Av8 to KBUILD_CFLAGS.	David S. Miller	1	-0/+6
	Binutils used to be (erroneously) extremely permissive about instruction usage. But that got fixed and if you don't properly tell it to accept classes of instructions it will fail. This uncovered a specs bug on sparc in gcc where it wouldn't pass the proper options to binutils options. Deal with this in the kernel build by adding -Wa,-Av8 to KBUILD_CFLAGS. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01	drm/i915: Balance assert_rpm_wakelock_held() for !IS_ENABLED(CONFIG_PM)	Chris Wilson	1	-14/+12
	commit 09731280028ce03e6a27e1998137f1775a2839f3 Author: Imre Deak <imre.deak@intel.com> Date: Wed Feb 17 14:17:42 2016 +0200 drm/i915: Add helper to get a display power ref if it was already enabled left the rpm wakelock assertions unbalanced if CONFIG_PM was disabled as intel_runtime_pm_get_if_in_use() would return true without incrementing the local bookkeeping required for the assertions. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> CC: Mika Kuoppala <mika.kuoppala@intel.com> CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> CC: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-03-01	cpufreq: mediatek: allow building as a module	Arnd Bergmann	2	-2/+3
	The MT8173 cpufreq driver can currently only be built-in, but it has a Kconfig dependency on the thermal core. THERMAL can be a loadable module, which in turn makes this driver impossible to build. It is nicer to make the cpufreq driver a module as well, so this patch turns the option in to a 'tristate' and adapts the dependency accordingly. The driver has no module_exit() function, so it will continue to not support unloading, but it can be built as a module and loaded at runtime now. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 5269e7067cd6 (cpufreq: Add ARM_MT8173_CPUFREQ dependency on THERMAL) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-01	cpufreq: qoriq: allow building as module with THERMAL=m	Arnd Bergmann	1	-0/+1
	My previous patch to avoid link errors with the qoriq cpufreq driver disallowed all of the broken cases, but also prevented the driver from being built when CONFIG_THERMAL is a module. This changes the dependency to allow the cpufreq driver to also be a module in this case, just not built-in. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 8ae1702a0df5 (cpufreq: qoriq: Register cooling device based on device tree) Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>