path: root/block/noop-iosched.c
Age  Commit message  Author  Files  Lines
2014-07-31  mnt: Change the default remount atime from relatime to the existing value  (Eric W. Biederman, 1 file, -0/+8)
Since March 2009 the kernel has behaved such that if no MS_...ATIME flags are passed, it defaults to relatime. Defaulting to relatime instead of the existing atime state during a remount is silly, and causes problems in practice for people who don't specify any MS_...ATIME flags and expect to get the existing filesystem atime setting. Those users may encounter a permission error because the default atime setting does not work. A default that does not work and causes permission problems is ridiculous, so preserve the existing value to have a default atime setting that is always guaranteed to work. Using the default atime setting in this way is particularly interesting for applications built to run in restricted userspace environments without /proc mounted, as the existing atime mount options of a filesystem cannot be read from /proc/mounts. In practice this fixes user space that uses the default atime setting on remount and is broken by the permission checks that keep less privileged users from changing more privileged users' atime settings. Cc: stable@vger.kernel.org Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2014-07-31  mnt: Correct permission checks in do_remount  (Eric W. Biederman, 2 files, -3/+38)
While investigating the issue where "mount --bind -oremount,ro ..." would result in a later "mount --bind -oremount,rw" succeeding even if the mount started off locked, I realized that there are several additional mount flags that should be locked and are not. In particular MNT_NOSUID, MNT_NODEV, MNT_NOEXEC, and the atime flags, in addition to MNT_READONLY, should all be locked. These flags are all per superblock, can all be changed with MS_BIND, and should not be changeable if set by a more privileged user. This patch adds the following restrictions to the current logic. - nosuid may not be clearable by a less privileged user. - nodev may not be clearable by a less privileged user. - noexec may not be clearable by a less privileged user. - atime flags may not be changeable by a less privileged user. The logic with atime is that always setting atime on access is a global policy, and backup software and auditing software could break if atime bits are not updated (when they are configured to be updated), and serious performance degradation could result (DOS attack) if atime updates happen when they have been explicitly disabled. Therefore an unprivileged user should not be able to mess with the atime bits set by a more privileged user. The additional restrictions are implemented with the addition of the MNT_LOCK_NOSUID, MNT_LOCK_NODEV, MNT_LOCK_NOEXEC, and MNT_LOCK_ATIME mnt flags. Taken together, these changes and the fixes for MNT_LOCK_READONLY should make it safe for an unprivileged user to create a user namespace and to call "mount --bind -o remount,... ..." without the danger of mount flags being changed maliciously. Cc: stable@vger.kernel.org Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2014-07-31  mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount  (Eric W. Biederman, 1 file, -3/+10)
There are no races as locked mount flags are guaranteed to never change. Moving the test into do_remount makes it more visible, and ensures all filesystem remounts pass the MNT_LOCK_READONLY permission check. This second case is not an issue today as filesystem remounts are guarded by capable(CAP_SYS_ADMIN) and thus will always fail in less privileged mount namespaces, but it could become an issue in the future. Cc: stable@vger.kernel.org Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2014-07-31  mnt: Only change user settable mount flags in remount  (Eric W. Biederman, 2 files, -2/+4)
Kenton Varda <kenton@sandstorm.io> discovered that by remounting a read-only bind mount read-only in a user namespace the MNT_LOCK_READONLY bit would be cleared, allowing an unprivileged user to then remount a read-only mount read-write. Correct this by replacing the mask of mount flags to preserve with a mask of mount flags that may be changed, and preserve all others. This ensures that any future bugs with this mask and remount will fail in an easy to detect way where new mount flags simply won't change. Cc: stable@vger.kernel.org Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
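The mask inversion is easy to see in isolation. A minimal standalone C sketch with made-up flag values (not the kernel's MNT_* definitions): only user-settable bits are taken from the request, everything else is carried over, so locks and any future flags are preserved by default.

#include <stdio.h>

/* Illustrative flag values only -- not the kernel's definitions. */
#define MNT_READONLY      0x01
#define MNT_NOSUID        0x02
#define MNT_LOCK_READONLY 0x10   /* not user settable */

#define MNT_USER_SETTABLE_MASK (MNT_READONLY | MNT_NOSUID)

static unsigned int remount_flags(unsigned int old_flags, unsigned int requested)
{
        /* Take only the user-settable bits from the request... */
        unsigned int new_flags = requested & MNT_USER_SETTABLE_MASK;

        /* ...and preserve every other bit, locks included. */
        new_flags |= old_flags & ~MNT_USER_SETTABLE_MASK;
        return new_flags;
}

int main(void)
{
        unsigned int old_flags = MNT_READONLY | MNT_LOCK_READONLY;

        /* A remount request that tries to drop everything cannot clear the lock. */
        printf("%#x\n", remount_flags(old_flags, 0)); /* prints 0x10: lock survives */
        return 0;
}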
2014-07-29  namespaces: Use task_lock and not rcu to protect nsproxy  (Eric W. Biederman, 8 files, -40/+31)
The synchronous synchronize_rcu in switch_task_namespaces makes setns a sufficiently expensive system call that people have complained. Upon inspection, nsproxy no longer needs rcu protection for remote reads, and remote reads are rare. So optimize for same-process reads and writes by using task_lock instead. This yields a simpler-to-understand lock and a faster setns system call. In particular this fixes a performance regression observed by Rafael David Tinoco <rafael.tinoco@canonical.com>. This is effectively a revert of Pavel Emelyanov's commit cf7b708c8d1d7a27736771bcf4c457b332b0f818 ("Make access to task's nsproxy lighter") from 2007. The race this originally fixed no longer exists as do_notify_parent uses task_active_pid_ns(parent) instead of parent->nsproxy. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
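A minimal sketch of the resulting locking pattern, assuming the usual task_lock()/task_unlock() helpers and a refcount-dropping put_nsproxy(); illustrative only, not the patch itself:

/* Sketch: update a task's nsproxy under the task's own lock instead of
 * relying on RCU; no synchronize_rcu() is needed before freeing the old one. */
static void switch_nsproxy(struct task_struct *p, struct nsproxy *new)
{
        struct nsproxy *old;

        task_lock(p);
        old = p->nsproxy;
        p->nsproxy = new;
        task_unlock(p);

        if (old)
                put_nsproxy(old);       /* drop the reference to the old nsproxy */
}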
2014-07-20  Linux 3.16-rc6  (Linus Torvalds, 1 file, -1/+1)
2014-07-20  um: segv: Save regs only in case of a kernel mode fault  (Richard Weinberger, 1 file, -1/+1)
...otherwise we lose the user mode regs and the resulting stack trace is useless. Signed-off-by: Richard Weinberger <richard@nod.at>
2014-07-20  um: Fix hung task in fix_range_common()  (Richard Weinberger, 1 file, -1/+5)
If do_ops() fails we have to release current->mm->mmap_sem, otherwise the failing task will never terminate. Reported-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: Richard Weinberger <richard@nod.at>
2014-07-20  um: Ensure that a stub page cannot get unmapped  (Richard Weinberger, 1 file, -0/+3)
Trinity discovered an execution path such that a task can unmap its own stub page. Reported-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: Richard Weinberger <richard@nod.at>
2014-07-20  Revert "um: Fix wait_stub_done() error handling"  (Richard Weinberger, 1 file, -7/+2)
This reverts commit 0974a9cadc7886f7baaa458bb0c89f5c5f9d458e. The real fix for that issue is to release current->mm->mmap_sem in fix_range_common(). Signed-off-by: Richard Weinberger <richard@nod.at>
2014-07-19  btrfs: test for valid bdev before kobj removal in btrfs_rm_device  (Eric Sandeen, 1 file, -4/+4)
commit 99994cd ("btrfs: dev delete should remove sysfs entry") added btrfs_kobj_rm_device(), which dereferences device->bdev... right after we check whether device->bdev might be NULL. I don't honestly know if it's possible to have a NULL device->bdev here, but assuming that it is (given the test), we need to move the kobject removal to be under that test. (Coverity spotted this) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-19  Btrfs: fix abnormal long waiting in fsync  (Liu Bo, 1 file, -0/+11)
xfstests generic/127 detected this problem. Since commit 7fc34a62ca4434a79c68e23e70ed26111b7a4cf8, fsync will only flush data within the passed range. This is the cause of the above problem: btrfs's fsync has a stage called 'sync log' which waits for all the ordered extents it has recorded to finish. In xfstests/generic/127, with mixed operations such as truncate, fallocate, punch hole, and mapwrite, we get some pre-allocated extents, and mapwrite will mmap and then msync. I found that msync waits for quite a long time (about 20s in my case). Thanks to ftrace, it turns out that the previous fallocate calls 'btrfs_wait_ordered_range()' to flush dirty pages, but as the range of dirty pages may be larger than 'btrfs_wait_ordered_range()' wants, some ordered extents can be created without their corresponding pages getting flushed. They are then left in memory until we fsync, which runs into the 'sync log' stage and just waits for the system writeback thread to flush those pages and finish the ordered extents, so the latency is inevitable. This adds a flush similar to btrfs_start_ordered_extent() in btrfs_wait_logged_extents() to fix that. Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-19  random: check for increase of entropy_count because of signed conversion  (Hannes Frederic Sowa, 1 file, -3/+14)
The expression entropy_count -= ibytes << (ENTROPY_SHIFT + 3) could actually increase entropy_count if during assignment of the unsigned expression on the RHS (mind the -=) we reduce the value modulo 2^width(int) and assign it to entropy_count. Trinity found this. [ Commit modified by tytso to add an additional safety check for a negative entropy_count -- which should never happen, and to also add an additional paranoia check to prevent overly large count values to be passed into urandom_read(). ] Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
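The wrap-around is easy to reproduce in plain C: the subtraction is carried out in unsigned arithmetic, so a large unsigned "debit" can leave the signed count bigger than before. A standalone demo with made-up values, not the driver code:

#include <stdio.h>

int main(void)
{
        int entropy_count = 100;            /* small positive credit */
        unsigned int debit = 0xC0000000u;   /* huge unsigned amount to subtract */

        /* The int operand is converted to unsigned, the subtraction wraps
         * modulo 2^32, and the result is assigned back -- so the "count"
         * grows instead of shrinking.  This is the bug class the added
         * sanity check catches. */
        entropy_count -= debit;

        printf("entropy_count = %d\n", entropy_count); /* 1073741924, not < 100 */
        return 0;
}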
2014-07-18  ARM: EXYNOS: Fix core ID used by platsmp and hotplug code  (Tomasz Figa, 2 files, -19/+25)
When CPU topology is specified in device tree, cpu_logical_map() does not return core ID anymore, but rather full MPIDR value. This breaks existing calculation of PMU register offsets on Exynos SoCs. This patch fixes the problem by adjusting the code to use only core ID bits of the value returned by cpu_logical_map() to allow CPU topology to be specified in device tree on Exynos SoCs. Signed-off-by: Tomasz Figa <t.figa@samsung.com> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com> Signed-off-by: Olof Johansson <olof@lixom.net>
2014-07-18  ARM: at91/dt: add missing clocks property to pwm node in sam9x5.dtsi  (Boris BREZILLON, 1 file, -0/+1)
The pwm driver requires a clocks property referencing the pwm peripheral clk. Signed-off-by: Boris BREZILLON <boris.brezillon@free-electrons.com> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2014-07-18  ARM: at91/dt: fix usb0 clocks definition in sam9n12 dtsi  (Boris BREZILLON, 1 file, -1/+1)
udphs_clk (USB Device Controller clock) is referenced instead of uhphs_clk (USB Host Controller clock). Signed-off-by: Boris BREZILLON <boris.brezillon@free-electrons.com> Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2014-07-18  ARM: at91: at91sam9x5: correct typo error for ohci clock  (Bo Shen, 1 file, -2/+1)
Correct the typo in the second "uhphs_clk". Signed-off-by: Bo Shen <voice.shen@atmel.com> Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2014-07-18  irqchip: gic: Fix core ID calculation when topology is read from DT  (Tomasz Figa, 1 file, -1/+4)
Certain GIC implementations, namely those found on earlier, single-cluster Exynos SoCs, have registers mapped without per-CPU banking, which means that the driver needs to use a different offset for each CPU. Currently the driver calculates the offset by multiplying the value returned by cpu_logical_map() by the CPU offset parsed from DT. This is correct when CPU topology is not specified in DT and the aforementioned function returns the core ID alone. However when DT contains CPU topology, the function changes to return the cluster ID as well, which is non-zero on the mentioned SoCs and so breaks the calculation in the GIC driver. This patch fixes this by masking out the cluster ID in the CPU offset calculation so that only the core ID is considered. Multi-cluster Exynos SoCs already have banked GIC implementations, so this simple fix should be enough. Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reported-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Tomasz Figa <t.figa@samsung.com> Fixes: db0d4db22a78d ("ARM: gic: allow GIC to support non-banked setups") Cc: <stable@vger.kernel.org> # v3.3+ Link: https://lkml.kernel.org/r/1405610624-18722-1-git-send-email-t.figa@samsung.com Signed-off-by: Jason Cooper <jason@lakedaemon.net>
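The fix amounts to keeping only affinity level 0 (the core ID, the lowest byte of an ARM MPIDR) of the cpu_logical_map() value before multiplying by the per-CPU offset. A standalone illustration with example numbers, not the driver code:

#include <stdio.h>

/* Affinity level 0 of an ARM MPIDR is its lowest byte. */
#define MPIDR_AFF0_MASK 0xffu

int main(void)
{
        unsigned int mpidr = 0x0100;          /* cluster 1, core 0 (example value) */
        unsigned int percpu_offset = 0x4000;  /* example DT-provided CPU offset   */

        /* Multiplying the full MPIDR by the per-CPU offset gives a bogus
         * register offset once cluster bits are present... */
        unsigned int broken = mpidr * percpu_offset;

        /* ...so mask the value down to the core ID first. */
        unsigned int fixed = (mpidr & MPIDR_AFF0_MASK) * percpu_offset;

        printf("broken=%#x fixed=%#x\n", broken, fixed); /* broken=0x400000 fixed=0 */
        return 0;
}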
2014-07-18  GFS2: fs/gfs2/rgrp.c: kernel-doc warning fixes  (Fabian Frederick, 1 file, -2/+2)
Cc: cluster-devel@redhat.com Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-07-18  GFS2: memcontrol: Spelling s/invlidate/invalidate/  (Geert Uytterhoeven, 1 file, -2/+2)
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: cluster-devel@redhat.com Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-07-18  GFS2: Allow caching of glocks for flock  (Bob Peterson, 1 file, -1/+1)
This patch removes the GLF_NOCACHE flag from the glocks associated with flocks. There should be no good reason not to cache glocks for flocks: they only force the glock to be demoted before they can be reacquired, which can slow down performance and even cause glock hangs, especially in cases where the flocks are held in Shared (SH) mode. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-07-18  GFS2: Allow flocks to use normal glock dq rather than dq_wait  (Bob Peterson, 2 files, -4/+2)
This patch allows flock glocks to use a non-blocking dequeue rather than dq_wait. It also reverts the previous patch I had posted regarding dq_wait. The reverted patch isn't necessarily a bad idea, but I decided this might avoid unforeseen side effects, and was therefore safer. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-07-18  GFS2: replace count*size kzalloc by kcalloc  (Fabian Frederick, 1 file, -2/+2)
kcalloc manages count*sizeof overflow. Cc: cluster-devel@redhat.com Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
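The point of the conversion, shown with the userspace analogue calloc() (kcalloc() is the kernel counterpart): the open-coded multiplication silently wraps, while the checked allocator refuses. A standalone demo, not GFS2 code:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void)
{
        size_t count = SIZE_MAX / 2 + 2;   /* deliberately absurd element count */
        size_t size  = 2;

        /* Open-coded count * size silently wraps around... */
        size_t bytes = count * size;
        printf("count*size wrapped to %zu bytes\n", bytes);   /* 2 on 64-bit */

        /* ...whereas calloc() -- like kcalloc() in the kernel -- checks the
         * multiplication and (on common libcs) fails instead of handing back
         * a tiny buffer. */
        void *p = calloc(count, size);
        printf("calloc: %s\n", p ? "succeeded (!)" : "rejected the overflow");
        free(p);
        return 0;
}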
2014-07-18  GFS2: Use GFP_NOFS when allocating glocks  (Steven Whitehouse, 1 file, -2/+2)
Normally GFP_KERNEL is ok here, but there is now a rarely used code path relating to deallocation of unlinked inodes (in certain corner cases) which if hit at times of memory shortage can cause recursion while trying to free memory. One solution would be to try and move the gfs2_glock_get() call so that it is no longer called while another glock is held, but that doesn't look at all easy, so GFP_NOFS is the best solution for the time being. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-07-18  GFS2: Fix race in glock lru glock disposal  (Steven Whitehouse, 1 file, -3/+7)
We must not leave items on the LRU list with GLF_LOCK set, since they can be removed if the glock is brought back into use, which may then potentially result in a hang, waiting for GLF_LOCK to clear. It doesn't happen very often, since it requires a glock that has not been used for a long time to be brought back into use at the same moment that the shrinker is part way through disposing of glocks. The fix is to set GLF_LOCK at a later time, when we already know that the other locks can be obtained. Also, we now only release the lru_lock in case a resched is needed, rather than on every iteration. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-07-18  GFS2: Only wait for demote when last holder is dequeued  (Bob Peterson, 1 file, -1/+3)
Function gfs2_glock_dq_wait is supposed to dequeue a glock and then wait for the lock to be demoted. The problem is, if this is a shared lock, its demote will depend on the other holders, which means you might end up waiting forever because the other process is blocked. This problem is especially apparent when dealing with nested flocks. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-07-18  ARM: clk-imx6q: parent lvds_sel input from upstream clock gates  (Lucas Stach, 1 file, -2/+2)
The i.MX6 reference manual doesn't make a clear distinction between the fixed clock divider and the enable gate for the pcie and sata reference clocks. This led to the lvds mux inputs in the imx6q clk driver being parented to the ref clock (which is the divider) instead of the actual gate, which in turn prevents the upstream clock from actually being enabled when lvds clk out is active. This fixes a hard machine hang regression in kernel 3.16 for boards where only pcie is active but no sata, as with this kernel version the imx6-pcie driver is no longer enabling the upstream clock directly but only lvds clk out. Reported-by: Arne Ruhnau <arne.ruhnau@target-sg.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Tested-by: Arne Ruhnau <arne.ruhnau@target-sg.com> Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
2014-07-17  Drivers: hv: hv_fcopy: fix a race condition for SMP guest  (Dexuan Cui, 1 file, -1/+1)
We should schedule the 5s "timer work" before starting the data transfer; otherwise, the data transfer code may finish so fast on another virtual cpu that the code (fcopy_write()) trying to cancel the 5s "timer work" can occasionally fail because the "timer work" may not have been scheduled yet, and as a result the fcopy process will be wrongly aborted by fcopy_work_func() after 5s. Thanks to Liz Zhang <lizzha@microsoft.com> for the initial investigation of the bug. This addresses https://bugzilla.redhat.com/show_bug.cgi?id=1118123 Tested-by: Liz Zhang <lizzha@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: stable@vger.kernel.org Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
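A hedged sketch of the reordering, with illustrative function names around the standard schedule_delayed_work() API; not the exact driver code:

/* Sketch: arm the timeout before kicking off the transfer, so a transfer
 * that completes almost immediately on another vCPU still finds the work
 * queued when it tries to cancel it. */
static void fcopy_handle_message(void)
{
        schedule_delayed_work(&fcopy_timeout_work, 5 * HZ);  /* timeout first */
        fcopy_send_data();                                   /* ...then start the transfer */
}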
2014-07-18  cpufreq: make table sentinel macros unsigned to match use  (Brian W Hart, 1 file, -2/+2)
Commit 5eeaf1f18973 (cpufreq: Fix build error on some platforms that use cpufreq_for_each_*) moved function cpufreq_next_valid() to a public header. Warnings are now generated when objects including that header are built with -Wsign-compare (as an out-of-tree module might be): .../include/linux/cpufreq.h: In function ‘cpufreq_next_valid’: .../include/linux/cpufreq.h:519:27: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] while ((*pos)->frequency != CPUFREQ_TABLE_END) ^ .../include/linux/cpufreq.h:520:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] if ((*pos)->frequency != CPUFREQ_ENTRY_INVALID) ^ Constants CPUFREQ_ENTRY_INVALID and CPUFREQ_TABLE_END are signed, but are used with unsigned member 'frequency' of cpufreq_frequency_table. Update the macro definitions to be explicitly unsigned to match their use. This also corrects potentially wrong behavior of clk_rate_table_iter() if unsigned long is wider than unsigned int. Fixes: 5eeaf1f18973 (cpufreq: Fix build error on some platforms that use cpufreq_for_each_*) Signed-off-by: Brian W Hart <hartb@linux.vnet.ibm.com> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
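The warning and the fix can be reproduced outside the kernel; the struct, macros, and values below are stand-ins, not the cpufreq definitions:

#include <stdio.h>

struct freq_entry {
        unsigned int frequency;              /* unsigned, like the cpufreq table member */
};

#define TABLE_END_SIGNED    ~0               /* int (-1): mixes signedness with the field */
#define TABLE_END_UNSIGNED  (~0u)            /* explicitly unsigned: matches the field    */

int main(void)
{
        struct freq_entry e = { .frequency = 1000000 };

        /* With gcc -Wsign-compare this comparison draws the warning quoted above. */
        if (e.frequency != TABLE_END_SIGNED)
                puts("not a sentinel (signed macro)");

        /* No warning once the sentinel is explicitly unsigned. */
        if (e.frequency != TABLE_END_UNSIGNED)
                puts("not a sentinel (unsigned macro)");
        return 0;
}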
2014-07-17  usb: Check if port status is equal to RxDetect  (Gavin Guo, 1 file, -0/+19)
When using a USB 3.0 pen drive with the [AMD] FCH USB XHCI Controller [1022:7814], on the second hotplug the USB 3.0 pen drive is recognized as a high-speed device. After bisecting the kernel, I found that commit 41e7e056cdc662f704fa9262e5c6e213b4ab45dd (USB: Allow USB 3.0 ports to be disabled.) causes the bug. After doing some experiments, the bug can be fixed by avoiding executing the function hub_usb3_port_disable(). Because the port status with the [AMD] FCH USB XHCI Controller [1022:7814] is already RxDetect (I tried printing out the port status before setting it to the Disabled state), it's reasonable to check the port status before really executing hub_usb3_port_disable(). Fixes: 41e7e056cdc6 (USB: Allow USB 3.0 ports to be disabled.) Signed-off-by: Gavin Guo <gavin.guo@canonical.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
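A standalone sketch of the check described above; the two constants are quoted from include/uapi/linux/usb/ch11.h as a working assumption, and the helper is made up for illustration:

#include <stdio.h>

/* Values as defined in include/uapi/linux/usb/ch11.h (copied here for the demo). */
#define USB_PORT_STAT_LINK_STATE   0x01e0
#define USB_SS_PORT_LS_RX_DETECT   0x00a0

/* Decide whether a USB3 port should be forced into the Disabled link state;
 * if the link already sits in RxDetect, leave it alone. */
static int should_disable_port(unsigned int portstatus)
{
        if ((portstatus & USB_PORT_STAT_LINK_STATE) == USB_SS_PORT_LS_RX_DETECT)
                return 0;
        return 1;
}

int main(void)
{
        unsigned int portstatus = USB_SS_PORT_LS_RX_DETECT; /* as seen on the AMD xHCI */
        printf("disable port? %s\n", should_disable_port(portstatus) ? "yes" : "no");
        return 0;
}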
2014-07-17  usb: chipidea: udc: Disable auto ZLP generation on ep0  (Abbas Raza, 1 file, -2/+2)
There are 2 methods for ZLP (zero-length packet) generation: 1) In software 2) Automatic generation by the device controller 1) is implemented in the UDC driver and it attaches a ZLP to the IN packet if descriptor->size < wLength 2) can be enabled/disabled by setting the ZLT bit in the QH When gadget ffs is connected to an Ubuntu host, the host sends a get descriptor request and wLength in the setup packet is 255, while the size of the descriptor which will be sent by the gadget in the IN packet is 64 bytes. So the composite driver sets req->zero = 1. In the UDC driver the following code will then be executed: if (hwreq->req.zero && hwreq->req.length && (hwreq->req.length % hwep->ep.maxpacket == 0)) add_td_to_list(hwep, hwreq, 0); Case-A: So in case of the Ubuntu host, the UDC driver will attach a ZLP to the IN packet. The Ubuntu host will request 255 bytes in the IN request, the gadget will send 64 bytes with a ZLP and the host will come to know that there is no more data. But hold on, by default ZLT=0 for endpoint 0, so the hardware also tries to automatically generate the ZLP, which blocks enumeration for ~6 seconds due to an endpoint 0 STALL; NAKs are sent to the host for any requests (OUT/PING) Case-B: In case when gadget ffs is connected to an Apple device, the Apple device sends a setup packet with wLength=64. So descriptor->size = 64 and wLength=64, therefore req->zero = 0 and the UDC driver will not attach any ZLP to the IN packet. The Apple device requests 64 bytes, gets 64 bytes and doesn't request further IN data. But ZLT=0 by default for endpoint 0, so the hardware tries to automatically generate the ZLP, which blocks enumeration for ~6 seconds due to an endpoint 0 STALL; NAKs are sent to the host for any requests (OUT/PING) According to the USB 2.0 spec: 8.5.3.2 Variable-length Data Stage A control pipe may have a variable-length data phase in which the host requests more data than is contained in the specified data structure. When all of the data structure is returned to the host, the function should indicate that the Data stage is ended by returning a packet that is shorter than the MaxPacketSize for the pipe. If the data structure is an exact multiple of wMaxPacketSize for the pipe, the function will return a zero-length packet to indicate the end of the Data stage. In Case-A mentioned above: If we disable software ZLP generation & ZLT=0 for endpoint 0, OR if software ZLP generation is not disabled but we set ZLT=1 for endpoint 0, then enumeration doesn't block for 6 seconds. In Case-B mentioned above: If we disable software ZLP generation & ZLT=0 for the endpoint, then enumeration still blocks due to the ZLP automatically generated by hardware and not needed by the host. But if we keep software ZLP generation enabled and set ZLT=1 for endpoint 0, then enumeration doesn't block for 6 seconds. So the proper solution for this issue seems to be to disable automatic ZLP generation by hardware (i.e. by setting ZLT=1 for endpoint 0) and let software (the UDC driver) handle the ZLP generation based on the req->zero field. Cc: stable@vger.kernel.org Signed-off-by: Abbas Raza <Abbas_Raza@mentor.com> Signed-off-by: Peter Chen <peter.chen@freescale.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-17  drm/radeon: Make classic pageflip completion path less racy.  (Mario Kleiner, 1 file, -3/+3)
Need to protect the mmio flip programming with the event lock as well. Need to also first enable the pflip irq, then do the mmio programming; otherwise a flip completion may go unnoticed in the vblank of actual completion if the flip is programmed but radeon_flip_work_func gets preempted immediately after the mmio programming and before vblank. In that case the vblank irq handler wouldn't run radeon_crtc_handle_vblank() with the completion check routine, would miss the completed flip, and would only notice it one vblank after actual completion, causing a false/delayed report of flip completion. Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2014-07-17  drm/radeon: Add missing vblank_put in pageflip ioctl error path.  (Mario Kleiner, 1 file, -1/+4)
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2014-07-17  drm/radeon: Remove redundant fence unref in pageflip path.  (Mario Kleiner, 1 file, -1/+0)
Not needed anymore, as it is already unreffed within radeon_flip_work_func() after its only use. Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2014-07-17  drm/radeon: Complete page flip even if waiting on the BO fence fails  (Michel Dänzer, 1 file, -15/+9)
Otherwise the DRM core and userspace will be confused about which BO the CRTC is scanning out. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2014-07-17  drm/radeon: Move pinning the BO back to radeon_crtc_page_flip()  (Michel Dänzer, 2 files, -91/+93)
As well as enabling the vblank interrupt. These shouldn't take any significant amount of time, but at least pinning the BO has actually been seen to fail in practice before, in which case we need to let userspace know about it. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2014-07-17  drm/radeon: Prevent too early kms-pageflips triggered by vblank.  (Mario Kleiner, 4 files, -9/+10)
Since 3.16-rc1 we have this new failure: When the userspace XOrg ddx schedules vblank events to trigger deferred kms-pageflips, e.g., via the OML_sync_control extension call glXSwapBuffersMscOML(), or if a glXSwapBuffers() is called immediately after completion of a previous swapbuffers call, e.g., in a tight rendering loop with minimal rendering, it happens frequently that the pageflip ioctl() is executed within the same vblank in which a previous kms-pageflip completed, or - for deferred swaps - always one vblank earlier than requested by the client app. This causes premature pageflips and detection of failure by the ddx, e.g., XOrg log warnings like... "(WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc 201025 < target_msc 201026" ... and error/invalid return values of glXWaitForSbcOML() and the Intel_swap_events extension. The reason is the new way in which kms-pageflips are programmed since 3.16. This commit changes the time window in which the hw can execute pending programmed pageflips. Before, a pending flip would get executed anywhere within the vblank interval. Now a pending flip only gets executed at the leading edge of vblank (start of front porch), making sure that an invocation of the pageflip ioctl() within a given vblank interval will only lead to pageflip completion in the following vblank. Tested to death on a DCE-4 card. Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2014-07-17  drm/radeon: set default bl level to something reasonable  (Alex Deucher, 1 file, -3/+7)
If the value in the scratch register is 0, set it to the max level. This fixes an issue where the console fb blanking code calls back into the backlight driver on unblank and then sets the backlight level to 0 after the driver has already set the mode and enabled the backlight. bugs: https://bugs.freedesktop.org/show_bug.cgi?id=81382 https://bugs.freedesktop.org/show_bug.cgi?id=70207 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: David Heidelberger <david.heidelberger@ixit.cz> Cc: stable@vger.kernel.org
2014-07-17  drm/radeon: avoid leaking edid data  (Alex Deucher, 1 file, -0/+5)
In some cases we fetch the edid in the detect() callback in order to determine what sort of monitor is connected. If that happens, don't fetch the edid again in the get_modes() callback or we will leak the edid. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2014-07-17  irqchip: gic: Add binding probe for ARM GIC400  (Suravee Suthikulpanit, 1 file, -0/+1)
Commit 3ab72f9156bb "dt-bindings: add GIC-400 binding" added the "arm,gic-400" compatible string, but the corresponding IRQCHIP_DECLARE was never added to the gic driver. Therefore add the missing irqchip declaration for it. Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Removed additional empty line and adapted commit message to mark it as fixing an issue. Signed-off-by: Heiko Stuebner <heiko@sntech.de> Acked-by: Will Deacon <will.deacon@arm.com> Fixes: 3ab72f9156bb ("dt-bindings: add GIC-400 binding") Cc: <stable@vger.kernel.org> # v3.14+ Link: https://lkml.kernel.org/r/2621565.f5eISveXXJ@diego Signed-off-by: Jason Cooper <jason@lakedaemon.net>
2014-07-17  cpufreq: move policy kobj to policy->cpu at resume  (Viresh Kumar, 1 file, -2/+4)
This is only relevant to implementations with multiple clusters, where clusters have separate clock lines but all CPUs within a cluster share it. Consider a dual cluster platform with 2 cores per cluster. During suspend we start hot unplugging CPUs in order 1 to 3. When CPU2 is removed, policy->kobj would be moved to CPU3, and when CPU3 goes down we wouldn't free the policy or its kobj as we want to retain permissions/values/etc. Now on resume, we will get CPU2 before CPU3 and will call __cpufreq_add_dev(). We will recover the old policy and update policy->cpu from 3 to 2 from update_policy_cpu(). But the kobj is still tied to CPU3 and isn't moved to CPU2. We wouldn't create a link for CPU2, but would try that for CPU3 while bringing it online, which will report errors as CPU3 already has a kobj assigned to it. This bug got introduced with commit 42f921a, which overlooked this scenario. To fix this, let's move the kobj to the new policy->cpu while bringing the first CPU of a cluster back. Also do a WARN_ON() if kobject_move() fails, as we would reach here only for the first CPU of a non-boot cluster, and we can't recover from this situation if kobject_move() fails. Fixes: 42f921a6f10c (cpufreq: remove sysfs files for CPUs which failed to come back after resume) Cc: 3.13+ <stable@vger.kernel.org> # 3.13+ Reported-and-tested-by: Bu Yitian <ybu@qti.qualcomm.com> Reported-by: Saravana Kannan <skannan@codeaurora.org> Reviewed-by: Srivatsa S. Bhat <srivatsa@mit.edu> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-16  IB/mlx5: Enable "block multicast loopback" for kernel consumers  (Or Gerlitz, 1 file, -1/+1)
In commit f360d88a2efd, we advertise blocking multicast loopback to both kernel and userspace consumers, but don't allow kernel consumers (e.g IPoIB) to use it with their UD QPs. Fix that. Fixes: f360d88a2efd ("IB/mlx5: Add block multicast loopback support") Reported-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-07-16  hwmon: (adt7470) Fix writes to temperature limit registers  (Guenter Roeck, 1 file, -3/+3)
Temperature limit registers are signed. Limits therefore need to be clamped to (-128, 127) degrees C and not to (0, 255) degrees C. Without this fix, writing a limit of 128 degrees C sets the actual limit to -128 degrees C. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org Reviewed-by: Axel Lin <axel.lin@ingics.com>
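A standalone illustration of the corrected clamping (made-up helper, not the driver code): the value written via sysfs is in millidegrees and must be clamped to what a signed 8-bit register can hold, not to 0..255.

#include <stdio.h>

/* Clamp a millidegree value to the range of a signed 8-bit limit register. */
static signed char temp_limit_to_reg(long val_mC)
{
        if (val_mC < -128000)
                val_mC = -128000;
        if (val_mC > 127000)
                val_mC = 127000;
        return (signed char)(val_mC / 1000);
}

int main(void)
{
        /* With the old 0..255 clamp, writing 128 degrees C would be stored as
         * 0x80, which the signed register reads back as -128 degrees C. */
        printf("128 C -> reg %d\n", temp_limit_to_reg(128000)); /* 127 */
        printf("-40 C -> reg %d\n", temp_limit_to_reg(-40000)); /* -40 */
        return 0;
}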
2014-07-17  drm/qxl: return IRQ_NONE if it was not our irq  (Jason Wang, 1 file, -0/+3)
Return IRQ_NONE if it was not our irq. This is necessary for the case when qxl is sharing an irq line with a device A in a crash kernel. If qxl is initialized before A and A's irq was raised during this gap, returning IRQ_HANDLED in this case will cause this irq to be raised again after EOI since the kernel thinks it was handled, but in fact it was not. Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
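A hedged sketch of the shared-interrupt convention the fix relies on (illustrative device structure and names, not the actual qxl handler): a handler on a shared line must claim the interrupt only when its own device raised it.

/* Sketch: only claim the interrupt if our device actually has work pending. */
static irqreturn_t my_irq_handler(int irq, void *arg)
{
        struct my_device *dev = arg;
        u32 pending = xchg(&dev->ram_header->int_pending, 0);

        if (!pending)
                return IRQ_NONE;        /* not ours -- let other handlers on the line run */

        /* ... acknowledge and handle the 'pending' bits ... */
        return IRQ_HANDLED;
}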
2014-07-16  cpufreq: cpu0: OPPs can be populated at runtime  (Viresh Kumar, 2 files, -7/+6)
OPPs can be populated statically, via DT, or added at run time with dev_pm_opp_add(). While this driver handles the first case correctly, it would fail to populate OPPs added at runtime, because the call to of_init_opp_table() would fail as there are no OPPs in DT, and probe would return early. To fix this, remove the error checking and call dev_pm_opp_init_cpufreq_table() unconditionally. Update the bindings as well. Suggested-by: Stephen Boyd <sboyd@codeaurora.org> Tested-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-16  cpufreq: kirkwood: Reinstate cpufreq driver for ARCH_KIRKWOOD  (Quentin Armitage, 1 file, -1/+1)
Commit ff1f0018cf66080d8e6f59791e552615648a033a ("drivers: Enable building of Kirkwood drivers for mach-mvebu") added Kirkwood into mach-mvebu, adding MACH_KIRKWOOD to ARCH_KIRKWOOD in the KConfig files. The change for ARM_KIRKWOOD_CPUFREQ replaced ARCH_KIRKWOOD with MACH_KIRKWOOD, whereas all the other changes were ARCH_KIRKWOOD || MACH_KIRKWOOD. As a consequence of this change, the cpufreq driver is no longer enabled for ARCH_KIRKWOOD. This patch reinstates ARM_KIRKWOOD_CPUFREQ for ARCH_KIRKWOOD. Signed-off-by: Quentin Armitage <quentin@armitage.org.uk> Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-16  locking/rwsem: Add CONFIG_RWSEM_SPIN_ON_OWNER  (Davidlohr Bueso, 4 files, -5/+11)
Just like with mutexes (CONFIG_MUTEX_SPIN_ON_OWNER), encapsulate the dependencies for rwsem optimistic spinning. No logical changes here as it continues to depend on both SMP and the XADD algorithm variant. Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Acked-by: Jason Low <jason.low2@hp.com> [ Also make it depend on ARCH_SUPPORTS_ATOMIC_RMW. ] Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1405112406-13052-2-git-send-email-davidlohr@hp.com Cc: aswin@hp.com Cc: Chris Mason <clm@fb.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Josef Bacik <jbacik@fusionio.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-16  locking/mutex: Disable optimistic spinning on some architectures  (Peter Zijlstra, 6 files, -1/+9)
The optimistic spin code assumes regular stores and cmpxchg() play nice; this is found to not be true for at least: parisc, sparc32, tile32, metag-lock1, arc-!llsc and hexagon. There is further wreckage, but this in particular seemed easy to trigger, so blacklist this. Opt in for known good archs. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Reported-by: Mikulas Patocka <mpatocka@redhat.com> Cc: David Miller <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Jason Low <jason.low2@hp.com> Cc: Waiman Long <waiman.long@hp.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: John David Anglin <dave.anglin@bell.net> Cc: James Hogan <james.hogan@imgtec.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: stable@vger.kernel.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: sparclinux@vger.kernel.org Link: http://lkml.kernel.org/r/20140606175316.GV13930@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-16  locking/rwsem: Reduce the size of struct rw_semaphore  (Jason Low, 1 file, -14/+11)
Recent optimistic spinning additions to rwsem provide significant performance benefits on many workloads on large machines. The cost of it was increasing the size of the rwsem structure by up to 128 bits. However, now that the previous patches in this series bring the overhead of struct optimistic_spin_queue to 32 bits, this patch reorders some fields in struct rw_semaphore such that we can reduce the overhead of the rwsem structure by 64 bits (on 64 bit systems). The extra overhead required for rwsem optimistic spinning would now be up to 8 additional bytes instead of up to 16 bytes. Additionally, the size of rwsem would now be more in line with mutexes. Signed-off-by: Jason Low <jason.low2@hp.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Scott Norton <scott.norton@hp.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Waiman Long <waiman.long@hp.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fusionio.com> Link: http://lkml.kernel.org/r/1405358872-3732-6-git-send-email-jason.low2@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
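Field ordering matters because of 64-bit alignment padding; a standalone demonstration with stand-in types rather than the real rw_semaphore members:

#include <stdio.h>

/* Stand-ins: a 4-byte lock-like field mixed with 8-byte counters/pointers. */
struct padded {
        long count;
        int  raw_lock;        /* 4 bytes, then 4 bytes of padding before the pointer */
        void *owner;
        int  osq;             /* 4 bytes + 4 bytes of padding again */
        void *wait_list;
};

struct reordered {
        long count;
        int  raw_lock;
        int  osq;             /* shares the 8-byte slot with raw_lock */
        void *owner;
        void *wait_list;
};

int main(void)
{
        printf("padded:    %zu bytes\n", sizeof(struct padded));    /* typically 40 on 64-bit */
        printf("reordered: %zu bytes\n", sizeof(struct reordered)); /* typically 32 on 64-bit */
        return 0;
}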
2014-07-16  locking/rwsem: Rename 'activity' to 'count'  (Peter Zijlstra, 2 files, -18/+18)
There are two definitions of struct rw_semaphore, one in linux/rwsem.h and one in linux/rwsem-spinlock.h. For some reason they have different names for the initial field. This makes it impossible to use C99 named initialization for __RWSEM_INITIALIZER() -- or we have to duplicate that entire thing along with the structure definitions. The simpler patch is renaming the rwsem-spinlock variant to match the regular rwsem. This allows us to switch to C99 named initialization. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-bmrZolsbGmautmzrerog27io@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
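Why the rename matters for C99 designated initializers, shown with stand-in structures rather than the real rwsem definitions:

#include <stdio.h>

/* Stand-ins for the two rw_semaphore variants; before the rename the
 * spinlock flavour called its first field 'activity', so a single
 * '.count = ...' designated initializer could not serve both. */
struct rwsem_xadd     { long count; int wait_lock; };
struct rwsem_spinlock { long count; int wait_lock; };  /* was: long activity; */

#define RWSEM_INITIALIZER { .count = 0, .wait_lock = 0 }

int main(void)
{
        struct rwsem_xadd a     = RWSEM_INITIALIZER;
        struct rwsem_spinlock b = RWSEM_INITIALIZER;  /* only compiles once the field names match */
        printf("%ld %ld\n", a.count, b.count);
        return 0;
}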