Age | Commit message (Collapse) | Author | Files | Lines |
|
Although we don't expect to take alignment faults on access to normal
memory, misbehaving (i.e. buggy) user code can pass MMIO pointers into
system calls, leading to things like get_user accessing device memory.
Rather than OOPS the kernel, allow any exception fixups to run and
return something like -EFAULT back to userspace. This makes the
behaviour more consistent with userspace, even though applications with
access to device mappings can easily cause other issues if they try
hard enough.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Eun Taik Lee <eun.taik.lee@samsung.com>
[will: dropped __kprobes annotation and rewrote commit mesage]
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
For the same reason as commit 19514fc665ff ("arm, kbuild: make "make
install" not depend on vmlinux"), the install targets should never
trigger the rebuild of the kernel.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Competing overwrite DIO in dioread_nolock mode will just overwrite
pointer to io_end in the inode. This may result in data corruption or
extent conversion happening from IO completion interrupt because we
don't properly set buffer_defer_completion() when unlocked DIO races
with locked DIO to unwritten extent.
Since unlocked DIO doesn't need io_end for anything, just avoid
allocating it and corrupting pointer from inode for locked DIO.
A cleaner fix would be to avoid these games with io_end pointer from the
inode but that requires more intrusive changes so we leave that for
later.
Cc: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
ext4 can update bh->b_state non-atomically in _ext4_get_block() and
ext4_da_get_block_prep(). Usually this is fine since bh is just a
temporary storage for mapping information on stack but in some cases it
can be fully living bh attached to a page. In such case non-atomic
update of bh->b_state can race with an atomic update which then gets
lost. Usually when we are mapping bh and thus updating bh->b_state
non-atomically, nobody else touches the bh and so things work out fine
but there is one case to especially worry about: ext4_finish_bio() uses
BH_Uptodate_Lock on the first bh in the page to synchronize handling of
PageWriteback state. So when blocksize < pagesize, we can be atomically
modifying bh->b_state of a buffer that actually isn't under IO and thus
can race e.g. with delalloc trying to map that buffer. The result is
that we can mistakenly set / clear BH_Uptodate_Lock bit resulting in the
corruption of PageWriteback state or missed unlock of BH_Uptodate_Lock.
Fix the problem by always updating bh->b_state bits atomically.
CC: stable@vger.kernel.org
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When slub_debug alloc_calls_show is enabled we will try to track
location and user of slab object on each online node, kmem_cache_node
structure and cpu_cache/cpu_slub shouldn't be freed till there is the
last reference to sysfs file.
This fixes the following panic:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: list_locations+0x169/0x4e0
PGD 257304067 PUD 438456067 PMD 0
Oops: 0000 [#1] SMP
CPU: 3 PID: 973074 Comm: cat ve: 0 Not tainted 3.10.0-229.7.2.ovz.9.30-00007-japdoll-dirty #2 9.30
Hardware name: DEPO Computers To Be Filled By O.E.M./H67DE3, BIOS L1.60c 07/14/2011
task: ffff88042a5dc5b0 ti: ffff88037f8d8000 task.ti: ffff88037f8d8000
RIP: list_locations+0x169/0x4e0
Call Trace:
alloc_calls_show+0x1d/0x30
slab_attr_show+0x1b/0x30
sysfs_read_file+0x9a/0x1a0
vfs_read+0x9c/0x170
SyS_read+0x58/0xb0
system_call_fastpath+0x16/0x1b
Code: 5e 07 12 00 b9 00 04 00 00 3d 00 04 00 00 0f 4f c1 3d 00 04 00 00 89 45 b0 0f 84 c3 00 00 00 48 63 45 b0 49 8b 9c c4 f8 00 00 00 <48> 8b 43 20 48 85 c0 74 b6 48 89 df e8 46 37 44 00 48 8b 53 10
CR2: 0000000000000020
Separated __kmem_cache_release from __kmem_cache_shutdown which now
called on slab_kmem_cache_release (after the last reference to sysfs
file object has dropped).
Reintroduced locking in free_partial as sysfs file might access cache's
partial list after shutdowning - partial revert of the commit
69cb8e6b7c29 ("slub: free slabs without holding locks"). Zap
__remove_partial and use remove_partial (w/o underscores) as
free_partial now takes list_lock which s partial revert for commit
1e4dd9461fab ("slub: do not assert not having lock in removing freed
partial")
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Suggested-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
remap_file_pages(2) emulation can reach file which represents removed
IPC ID as long as a memory segment is mapped. It breaks expectations of
IPC subsystem.
Test case (rewritten to be more human readable, originally autogenerated
by syzkaller[1]):
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/shm.h>
#define PAGE_SIZE 4096
int main()
{
int id;
void *p;
id = shmget(IPC_PRIVATE, 3 * PAGE_SIZE, 0);
p = shmat(id, NULL, 0);
shmctl(id, IPC_RMID, NULL);
remap_file_pages(p, 3 * PAGE_SIZE, 0, 7, 0);
return 0;
}
The patch changes shm_mmap() and code around shm_lock() to propagate
locking error back to caller of shm_mmap().
[1] http://github.com/google/syzkaller
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Kselftest Framework now has a dedicated mailing list linux-kselftest.
Update the entry in MAINTAINERS file.
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The pmem driver calls devm_memremap() to map a persistent memory range.
When the pmem driver is unloaded, this memremap'd range is not released
so the kernel will leak a vma.
Fix devm_memremap_release() to handle a given memremap'd address
properly.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Currently incorrect default hugepage pool size is reported by proc
nr_hugepages when number of pages for the default huge page size is
specified twice.
When multiple huge page sizes are supported, /proc/sys/vm/nr_hugepages
indicates the current number of pre-allocated huge pages of the default
size. Basically /proc/sys/vm/nr_hugepages displays default_hstate->
max_huge_pages and after boot time pre-allocation, max_huge_pages should
equal the number of pre-allocated pages (nr_hugepages).
Test case:
Note that this is specific to x86 architecture.
Boot the kernel with command line option 'default_hugepagesz=1G
hugepages=X hugepagesz=2M hugepages=Y hugepagesz=1G hugepages=Z'. After
boot, 'cat /proc/sys/vm/nr_hugepages' and 'sysctl -a | grep hugepages'
returns the value X. However, dmesg output shows that Z huge pages were
pre-allocated.
So, the root cause of the problem here is that the global variable
default_hstate_max_huge_pages is set if a default huge page size is
specified (directly or indirectly) on the command line. After the command
line processing in hugetlb_init, if default_hstate_max_huge_pages is set,
the value is assigned to default_hstae.max_huge_pages. However,
default_hstate.max_huge_pages may have already been set based on the
number of pre-allocated huge pages of default_hstate size.
The solution to this problem is if hstate->max_huge_pages is already set
then it should not set as a result of global max_huge_pages value.
Basically if the value of the variable hugepages is set multiple times on
a command line for a specific supported hugepagesize then proc layer
should consider the last specified value.
Signed-off-by: Vaishali Thakkar <vaishali.thakkar@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings") has
moved up the pte_page(pte) in x86's fast gup_pte_range(), for no
discernible reason: put it back where it belongs, after the pte_flags
check and the pfn_valid cross-check.
That may be the cause of the NULL pointer dereference in
gup_pte_range(), seen when vfio called vaddr_get_pfn() when starting a
qemu-kvm based VM.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Michael Long <Harn-Solo@gmx.de>
Tested-by: Michael Long <Harn-Solo@gmx.de>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We don't require a dedicated thread for fsnotify cleanup. Switch it
over to a workqueue job instead that runs on the system_unbound_wq.
In the interest of not thrashing the queued job too often when there are
a lot of marks being removed, we delay the reaper job slightly when
queueing it, to allow several to gather on the list.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Tested-by: Eryu Guan <guaneryu@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This reverts commit c510eff6beba ("fsnotify: destroy marks with
call_srcu instead of dedicated thread").
Eryu reported that he was seeing some OOM kills kick in when running a
testcase that adds and removes inotify marks on a file in a tight loop.
The above commit changed the code to use call_srcu to clean up the
marks. While that does (in principle) work, the srcu callback job is
limited to cleaning up entries in small batches and only once per jiffy.
It's easily possible to overwhelm that machinery with too many call_srcu
callbacks, and Eryu's reproduer did just that.
There's also another potential problem with using call_srcu here. While
you can obviously sleep while holding the srcu_read_lock, the callbacks
run under local_bh_disable, so you can't sleep there.
It's possible when putting the last reference to the fsnotify_mark that
we'll end up putting a chain of references including the fsnotify_group,
uid, and associated keys. While I don't see any obvious ways that that
could occurs, it's probably still best to avoid using call_srcu here
after all.
This patch reverts the above patch. A later patch will take a different
approach to eliminated the dedicated thread here.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reported-by: Eryu Guan <guaneryu@gmail.com>
Tested-by: Eryu Guan <guaneryu@gmail.com>
Cc: Jan Kara <jack@suse.com>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Grazvydas Ignotas has reported a regression in remap_file_pages()
emulation.
Testcase:
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#define SIZE (4096 * 3)
int main(int argc, char **argv)
{
unsigned long *p;
long i;
p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_ANONYMOUS, -1, 0);
if (p == MAP_FAILED) {
perror("mmap");
return -1;
}
for (i = 0; i < SIZE / 4096; i++)
p[i * 4096 / sizeof(*p)] = i;
if (remap_file_pages(p, 4096, 0, 1, 0)) {
perror("remap_file_pages");
return -1;
}
if (remap_file_pages(p, 4096 * 2, 0, 1, 0)) {
perror("remap_file_pages");
return -1;
}
assert(p[0] == 1);
munmap(p, SIZE);
return 0;
}
The second remap_file_pages() fails with -EINVAL.
The reason is that remap_file_pages() emulation assumes that the target
vma covers whole area we want to over map. That assumption is broken by
first remap_file_pages() call: it split the area into two vma.
The solution is to check next adjacent vmas, if they map the same file
with the same flags.
Fixes: c8d78c1823f4 ("mm: replace remap_file_pages() syscall with emulation")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Cc: <stable@vger.kernel.org> [4.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
DAX doesn't deposit pgtables when it maps huge pages: nothing to
withdraw. It can lead to crash.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
A non-atomic PCM stream may take snd_pcm_link_rwsem rw semaphore twice
in the same code path, e.g. one in snd_pcm_action_nonatomic() and
another in snd_pcm_stream_lock(). Usually this is OK, but when a
write lock is issued between these two read locks, the problem
happens: the write lock is blocked due to the first reade lock, and
the second read lock is also blocked by the write lock. This
eventually deadlocks.
The reason is the way rwsem manages waiters; it's queued like FIFO, so
even if the writer itself doesn't take the lock yet, it blocks all the
waiters (including reads) queued after it.
As a workaround, in this patch, we replace the standard down_write()
with an spinning loop. This is far from optimal, but it's good
enough, as the spinning time is supposed to be relatively short for
normal PCM operations, and the code paths requiring the write lock
aren't called so often.
Reported-by: Vinod Koul <vinod.koul@intel.com>
Tested-by: Ramesh Babu <ramesh.babu@intel.com>
Cc: <stable@vger.kernel.org> # v3.18+
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Remove the ftrace module notifier in favor of directly calling
ftrace_module_enable() and ftrace_release_mod() in the module loader.
Hard-coding the function calls directly in the module loader removes
dependence on the module notifier call chain and provides better
visibility and control over what gets called when, which is important
to kernel utilities such as livepatch.
This fixes a notifier ordering issue in which the ftrace module notifier
(and hence ftrace_module_enable()) for coming modules was being called
after klp_module_notify(), which caused livepatch modules to initialize
incorrectly. This patch removes dependence on the module notifier call
chain in favor of hard coding the corresponding function calls in the
module loader. This ensures that ftrace and livepatch code get called in
the correct order on patch module load and unload.
Fixes: 5156dca34a3e ("ftrace: Fix the race between ftrace and insmod")
Signed-off-by: Jessica Yu <jeyu@redhat.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
We get this right for queue_discard_max_show but not max_hw_show. Follow the
same pattern as queue_discard_max_show instead so that we don't truncate.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This patch ensures that devices, which got registered before arch_initcall
will be handled correctly by IOMMU-based DMA-mapping code.
Cc: <stable@vger.kernel.org>
Fixes: 13b8629f6511 ("arm64: Add IOMMU dma_ops")
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Commit ca369d51b ("sd: Fix device-imposed transfer length limits")
introduced a new queue limit max_dev_sectors which limits the maximum
sectors for requests. The default value leads to small dasd requests
and therefor to a performance drop.
Set the max_dev_sectors value to the same value as the max_hw_sectors
to use the maximum available request size for DASD devices.
Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # 4.4+
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
When fixing the DAT off bug ("s390: fix DAT off memory access, e.g.
on kdump") both Christian and I missed that we can save an additional
stnsm instruction.
This saves us a couple of cycles which could improve the speed of
memcpy_real.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
The qxl_gem_prime_mmap() function returns ENOSYS instead of -ENOSYS
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
In the display resume path, move the calls to drm_vblank_on()
after the point when the display engine is running again.
Since changes were made to drm_update_vblank_count() in Linux 4.4+
to emulate hw vblank counters via vblank timestamping, the function
drm_vblank_on() now needs working high precision vblank timestamping
and therefore working scanout position queries at time of call.
These don't work before the display engine gets restarted, causing
miscalculation of vblank counter increments and thereby large forward
jumps in vblank count at display resume. These jumps can cause client
hangs on resume, or desktop hangs in the case of composited desktops.
Fix this Linux 4.4 regression by reordering calls accordingly.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: <stable@vger.kernel.org> # 4.4+
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: ville.syrjala@linux.intel.com
Cc: daniel.vetter@ffwll.ch
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Make sure that drm_vblank_get/put() stay balanced in
case drm_vblank_get fails, by skipping the corresponding
put.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: michel@daenzer.net
Cc: dri-devel@lists.freedesktop.org
Cc: alexander.deucher@amd.com
Cc: christian.koenig@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
drm_vblank_offdelay can have three different types of values:
< 0 is to be always treated the same as dev->vblank_disable_immediate
= 0 is to be treated as "never disable vblanks"
> 0 is to be treated as disable immediate if kms driver wants it
that way via dev->vblank_disable_immediate. Otherwise it is
a disable timeout in msecs.
This got broken in Linux 3.18+ for the implementation of
drm_vblank_on. If the user specified a value of zero which should
always reenable vblank irqs in this function, a kms driver could
override the users choice by setting vblank_disable_immediate
to true. This patch fixes the regression and keeps the user in
control.
v2: Only reenable vblank if there are clients left or the user
requested to "never disable vblanks" via offdelay 0. Enabling
vblanks even in the "delayed disable" case (offdelay > 0) was
specifically added by Ville in commit cd19e52aee922
("drm: Kick start vblank interrupts at drm_vblank_on()"),
but after discussion it turns out that this was done by accident.
Citing Ville: "I think it just ended up as a mess due to changing
some of the semantics of offdelay<0 vs. offdelay==0 vs.
disable_immediate during the review of the series. So yeah, given
how drm_vblank_put() works now, I'd just make this check for
offdelay==0."
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: <stable@vger.kernel.org> # 3.18+
Cc: michel@daenzer.net
Cc: vbabka@suse.cz
Cc: ville.syrjala@linux.intel.com
Cc: daniel.vetter@ffwll.ch
Cc: dri-devel@lists.freedesktop.org
Cc: alexander.deucher@amd.com
Cc: christian.koenig@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Changes to drm_update_vblank_count() in Linux 4.4 broke the
behaviour of the pre/post modeset functions as the new update
code doesn't deal with hw vblank counter resets inbetween calls
to drm_vblank_pre_modeset an drm_vblank_post_modeset, as it
should.
This causes mistreatment of such hw counter resets as counter
wraparound, and thereby large forward jumps of the software
vblank counter which in turn cause vblank event dispatching
and vblank waits to fail/hang --> userspace clients hang.
This symptom was reported on radeon-kms to cause a infinite
hang of KDE Plasma 5 shell's login procedure, preventing users
from logging in.
Fix this by detecting when drm_update_vblank_count() is called
inside a pre->post modeset interval. If so, clamp valid vblank
increments to the safe values 0 and 1, pretty much restoring
the update behavior of the old update code of Linux 4.3 and
earlier. Also reset the last recorded hw vblank count at call
to drm_vblank_post_modeset() to be safe against hw that after
modesetting, dpms on etc. only fires its first vblank irq after
drm_vblank_post_modeset() was already called.
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org> # 4.4+
Cc: michel@daenzer.net
Cc: vbabka@suse.cz
Cc: ville.syrjala@linux.intel.com
Cc: daniel.vetter@ffwll.ch
Cc: dri-devel@lists.freedesktop.org
Cc: alexander.deucher@amd.com
Cc: christian.koenig@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
This fixes a regression introduced by the new drm_update_vblank_count()
implementation in Linux 4.4:
Restrict the bump of the software vblank counter in drm_update_vblank_count()
to a safe maximum value of +1 whenever there is the possibility that
concurrent readers of vblank timestamps could be active at the moment,
as the current implementation of the timestamp caching and updating is
not safe against concurrent readers for calls to store_vblank() with a
bump of anything but +1. A bump != 1 would very likely return corrupted
timestamps to userspace, because the same slot in the cache could
be concurrently written by store_vblank() and read by one of those
readers in a non-atomic fashion and without the read-retry logic
detecting this collision.
Concurrent readers can exist while drm_update_vblank_count() is called
from the drm_vblank_off() or drm_vblank_on() functions or other non-vblank-
irq callers. However, all those calls are happening with the vbl_lock
locked thereby preventing a drm_vblank_get(), so the vblank refcount
can't increase while drm_update_vblank_count() is executing. Therefore
a zero vblank refcount during execution of that function signals that
is safe for arbitrary counter bumps if called from outside vblank irq,
whereas a non-zero count is not safe.
Whenever the function is called from vblank irq, we have to assume concurrent
readers could show up any time during its execution, even if the refcount
is currently zero, as vblank irqs are usually only enabled due to the
presence of readers, and because when it is called from vblank irq it
can't hold the vbl_lock to protect it from sudden bumps in vblank refcount.
Therefore also restrict bumps to +1 when the function is called from vblank
irq.
Such bumps of more than +1 can happen at other times than reenabling
vblank irqs, e.g., when regular vblank interrupts get delayed by more
than 1 frame due to long held locks, long irq off periods, realtime
preemption on RT kernels, or system management interrupts.
A better solution would be to rewrite the timestamp caching to use
full seqlocks to allow concurrent writes and reads for arbitrary
vblank counter increments.
v2: Add code comment that this is essentially a hack and should
be replaced by a full seqlock implementation for caching of
timestamps.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: <stable@vger.kernel.org> # 4.4+
Cc: michel@daenzer.net
Cc: vbabka@suse.cz
Cc: ville.syrjala@linux.intel.com
Cc: daniel.vetter@ffwll.ch
Cc: dri-devel@lists.freedesktop.org
Cc: alexander.deucher@amd.com
Cc: christian.koenig@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Otherwise if a kms driver calls into drm_vblank_off() more than once
before calling drm_vblank_on() again, the redundant calls to
vblank_disable_and_save() will call drm_update_vblank_count()
while hw vblank counters and vblank timestamping are in a undefined
state during modesets, dpms off etc.
At least with the legacy drm helpers it is not unusual to
get multiple calls to drm_vblank_off and drm_vblank_on, e.g.,
half a dozen calls to drm_vblank_off and two calls to drm_vblank_on
were observed on radeon-kms during dpms-off -> dpms-on transition.
We don't no-op calls from atomic modesetting drivers, as they
should do a proper job of tracking hw state.
Fixes large jumps of the software maintained vblank counter due to
the hardware vblank counter resetting to zero during dpms off or
modeset, e.g., if radeon-kms is modified to use drm_vblank_off/on
instead of drm_vblank_pre/post_modeset().
This fixes a regression caused by the changes made to
drm_update_vblank_count() in Linux 4.4.
v2: Don't no-op on atomic modesetting drivers, per suggestion
of Daniel Vetter.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: <stable@vger.kernel.org> # 4.4+
Cc: michel@daenzer.net
Cc: vbabka@suse.cz
Cc: ville.syrjala@linux.intel.com
Cc: alexander.deucher@amd.com
Cc: christian.koenig@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
This avoids integer overflows on 32bit machines when calculating
reloc_info size, as reported by Alan Cox.
Cc: stable@vger.kernel.org
Cc: gnomes@lxorguk.ukuu.org.uk
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
This reverts commit cfcfa086d43ced33e1099b9befb12f17fca102e1.
This causes the tiling properties to break in some unexpected ways,
Revert it for now.
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
inode struct members that track cgroup writeback information
should be reinitialized when inode gets allocated from
kmem_cache. Otherwise, their values remain and get used by the
new inode.
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Fixes: d10c80955265 ("writeback: implement foreign cgroup inode bdi_writeback switching")
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
If cgroup writeback is in use, an inode is associated with a cgroup
for writeback. If the inode's main dirtier changes to another cgroup,
the association gets updated asynchronously. Nothing was pinning the
superblock while such switches are in progress and superblock could go
away while async switching is pending or in progress leading to
crashes like the following.
kernel BUG at fs/jbd2/transaction.c:319!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
CPU: 1 PID: 29158 Comm: kworker/1:10 Not tainted 4.5.0-rc3 #51
Hardware name: Google Google, BIOS Google 01/01/2011
Workqueue: events inode_switch_wbs_work_fn
task: ffff880213dbbd40 ti: ffff880209264000 task.ti: ffff880209264000
RIP: 0010:[<ffffffff803e6922>] [<ffffffff803e6922>] start_this_handle+0x382/0x3e0
RSP: 0018:ffff880209267c30 EFLAGS: 00010202
...
Call Trace:
[<ffffffff803e6be4>] jbd2__journal_start+0xf4/0x190
[<ffffffff803cfc7e>] __ext4_journal_start_sb+0x4e/0x70
[<ffffffff803b31ec>] ext4_evict_inode+0x12c/0x3d0
[<ffffffff8035338b>] evict+0xbb/0x190
[<ffffffff80354190>] iput+0x130/0x190
[<ffffffff80360223>] inode_switch_wbs_work_fn+0x343/0x4c0
[<ffffffff80279819>] process_one_work+0x129/0x300
[<ffffffff80279b16>] worker_thread+0x126/0x480
[<ffffffff8027ed14>] kthread+0xc4/0xe0
[<ffffffff809771df>] ret_from_fork+0x3f/0x70
Fix it by bumping s_active while cgroup association switching is in
flight.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Tahsin Erdogan <tahsin@google.com>
Link: http://lkml.kernel.org/g/CAAeU0aNCq7LGODvVGRU-oU_o-6enii5ey0p1c26D1ZzYwkDc5A@mail.gmail.com
Fixes: d10c80955265 ("writeback: implement foreign cgroup inode bdi_writeback switching")
Cc: stable@vger.kernel.org #v4.5+
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
The commit [7f0973e973cd: ALSA: seq: Fix lockdep warnings due to
double mutex locks] split the management of two linked lists (source
and destination) into two individual calls for avoiding the AB/BA
deadlock. However, this may leave the possible double deletion of one
of two lists when the counterpart is being deleted concurrently.
It ends up with a list corruption, as revealed by syzkaller fuzzer.
This patch fixes it by checking the list emptiness and skipping the
deletion and the following process.
BugLink: http://lkml.kernel.org/r/CACT4Y+bay9qsrz6dQu31EcGaH9XwfW7o3oBzSQUG9fMszoh=Sg@mail.gmail.com
Fixes: 7f0973e973cd ('ALSA: seq: Fix lockdep warnings due to 'double mutex locks)
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Changes introduced in the upstream version of libfdt pulled in by commit
91feabc2e224 ("scripts/dtc: Update to upstream commit b06e55c88b9b") use
the strnlen() function, which isn't currently available to the EFI name-
space. Add it to the EFI namespace to avoid a linker error.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Rob Herring <robh@kernel.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
When ext4_bread() fails, fname_crypto_str remains
allocated after return. Fix that.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
CC: Dmitry Monakhov <dmonakhov@virtuozzo.com>
|
|
If a bio for a direct IO request fails, we were not setting the error in
the parent bio (the main DIO bio), making us not return the error to
user space in btrfs_direct_IO(), that is, it made __blockdev_direct_IO()
return the number of bytes issued for IO and not the error a bio created
and submitted by btrfs_submit_direct() got from the block layer.
This essentially happens because when we call:
dio_end_io(dio_bio, bio->bi_error);
It does not set dio_bio->bi_error to the value of the second argument.
So just add this missing assignment in endio callbacks, just as we do in
the error path at btrfs_submit_direct() when we fail to clone the dio bio
or allocate its private object. This follows the convention of what is
done with other similar APIs such as bio_endio() where the caller is
responsible for setting the bi_error field in the bio it passes as an
argument to bio_endio().
This was detected by the new generic test cases in xfstests: 271, 272,
276 and 278. Which essentially setup a dm error target, then load the
error table, do a direct IO write and unload the error table. They
expect the write to fail with -EIO, which was not getting reported
when testing against btrfs.
Cc: stable@vger.kernel.org # 4.3+
Fixes: 4246a0b63bd8 ("block: add a bi_error field to struct bio")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
|
|
Previously, samsung_gpio_drection_in/output function were not covered
with a spinlock.
For example, samsung_gpio_direction_output function consists of
two functions.
1. samsung_gpio_set
2. samsung_gpio_set_direction
When 2 CPUs try to control the same gpio pin heavily,
(situation like i2c control with gpio emulation)
This situation can cause below problem.
CPU 0 | CPU1
|
samsung_gpio_direction_output |
samsung_gpio_set(pin A as 1) | samsung_gpio_direction_output
| samsung_gpio_set(pin A as 0)
samsung_gpio_set_direction |
| samsung_gpio_set_direction
The initial value of pin A will be set as 0 while we wanted to set pin A as 1.
This patch modifies samsung_gpio_direction_in/output function
to be done in one spinlock to fix race condition.
Additionally, the new samsung_gpio_set_value was added to implement
gpio set callback(samsung_gpio_set) with spinlock using this function.
Cc: stable@vger.kernel.org
Signed-off-by: Youngmin Nam <ym0914@gmail.com>
Acked-by: Tomasz Figa <tomasz.figa@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
In my randconfig tests, I came across a bug that involves several
components:
* gcc-4.9 through at least 5.3
* CONFIG_GCOV_PROFILE_ALL enabling -fprofile-arcs for all files
* CONFIG_PROFILE_ALL_BRANCHES overriding every if()
* The optimized implementation of do_div() that tries to
replace a library call with an division by multiplication
* code in drivers/media/dvb-frontends/zl10353.c doing
u32 adc_clock = 450560; /* 45.056 MHz */
if (state->config.adc_clock)
adc_clock = state->config.adc_clock;
do_div(value, adc_clock);
In this case, gcc fails to determine whether the divisor
in do_div() is __builtin_constant_p(). In particular, it
concludes that __builtin_constant_p(adc_clock) is false, while
__builtin_constant_p(!!adc_clock) is true.
That in turn throws off the logic in do_div() that also uses
__builtin_constant_p(), and instead of picking either the
constant- optimized division, and the code in ilog2() that uses
__builtin_constant_p() to figure out whether it knows the answer at
compile time. The result is a link error from failing to find
multiple symbols that should never have been called based on
the __builtin_constant_p():
dvb-frontends/zl10353.c:138: undefined reference to `____ilog2_NaN'
dvb-frontends/zl10353.c:138: undefined reference to `__aeabi_uldivmod'
ERROR: "____ilog2_NaN" [drivers/media/dvb-frontends/zl10353.ko] undefined!
ERROR: "__aeabi_uldivmod" [drivers/media/dvb-frontends/zl10353.ko] undefined!
This patch avoids the problem by changing __trace_if() to check
whether the condition is known at compile-time to be nonzero, rather
than checking whether it is actually a constant.
I see this one link error in roughly one out of 1600 randconfig builds
on ARM, and the patch fixes all known instances.
Link: http://lkml.kernel.org/r/1455312410-1058841-1-git-send-email-arnd@arndb.de
Acked-by: Nicolas Pitre <nico@linaro.org>
Fixes: ab3c9c686e22 ("branch tracer, intel-iommu: fix build with CONFIG_BRANCH_TRACER=y")
Cc: stable@vger.kernel.org # v2.6.30+
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
The tracepoint infrastructure uses RCU sched protection to enable and
disable tracepoints safely. There are some instances where tracepoints are
used in infrastructure code (like kfree()) that get called after a CPU is
going offline, and perhaps when it is coming back online but hasn't been
registered yet.
This can probuce the following warning:
[ INFO: suspicious RCU usage. ]
4.4.0-00006-g0fe53e8-dirty #34 Tainted: G S
-------------------------------
include/trace/events/kmem.h:141 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/8/0.
stack backtrace:
CPU: 8 PID: 0 Comm: swapper/8 Tainted: G S 4.4.0-00006-g0fe53e8-dirty #34
Call Trace:
[c0000005b76c78d0] [c0000000008b9540] .dump_stack+0x98/0xd4 (unreliable)
[c0000005b76c7950] [c00000000010c898] .lockdep_rcu_suspicious+0x108/0x170
[c0000005b76c79e0] [c00000000029adc0] .kfree+0x390/0x440
[c0000005b76c7a80] [c000000000055f74] .destroy_context+0x44/0x100
[c0000005b76c7b00] [c0000000000934a0] .__mmdrop+0x60/0x150
[c0000005b76c7b90] [c0000000000e3ff0] .idle_task_exit+0x130/0x140
[c0000005b76c7c20] [c000000000075804] .pseries_mach_cpu_die+0x64/0x310
[c0000005b76c7cd0] [c000000000043e7c] .cpu_die+0x3c/0x60
[c0000005b76c7d40] [c0000000000188d8] .arch_cpu_idle_dead+0x28/0x40
[c0000005b76c7db0] [c000000000101e6c] .cpu_startup_entry+0x50c/0x560
[c0000005b76c7ed0] [c000000000043bd8] .start_secondary+0x328/0x360
[c0000005b76c7f90] [c000000000008a6c] start_secondary_prolog+0x10/0x14
This warning is not a false positive either. RCU is not protecting code that
is being executed while the CPU is offline.
Instead of playing "whack-a-mole(TM)" and adding conditional statements to
the tracepoints we find that are used in this instance, simply add a
cpu_online() test to the tracepoint code where the tracepoint will be
ignored if the CPU is offline.
Use of raw_smp_processor_id() is fine, as there should never be a case where
the tracepoint code goes from running on a CPU that is online and suddenly
gets migrated to a CPU that is offline.
Link: http://lkml.kernel.org/r/1455387773-4245-1-git-send-email-kda@linux-powerpc.org
Reported-by: Denis Kirjanov <kda@linux-powerpc.org>
Fixes: 97e1c18e8d17b ("tracing: Kernel Tracepoints")
Cc: stable@vger.kernel.org # v2.6.28+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
ECB modes don't use an initialization vector. The kernel
/proc/crypto interface doesn't reflect this properly.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The commit [991f86d7ae4e: ALSA: hda - Flush the pending probe work at
remove] introduced the sync of async probe work at remove for fixing
the race. However, this may lead to another hangup when the module
removal is performed quickly before starting the probe work, because
it issues flush_work() and it's blocked forever.
The workaround is to use cancel_work_sync() instead of flush_work()
there.
Fixes: 991f86d7ae4e ('ALSA: hda - Flush the pending probe work at remove')
Cc: <stable@vger.kernel.org> # v3.17+
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
When multiple concurrent writes happen on the ALSA sequencer device
right after the open, it may try to allocate vmalloc buffer for each
write and leak some of them. It's because the presence check and the
assignment of the buffer is done outside the spinlock for the pool.
The fix is to move the check and the assignment into the spinlock.
(The current implementation is suboptimal, as there can be multiple
unnecessary vmallocs because the allocation is done before the check
in the spinlock. But the pool size is already checked beforehand, so
this isn't a big problem; that is, the only possible path is the
multiple writes before any pool assignment, and practically seen, the
current coverage should be "good enough".)
The issue was triggered by syzkaller fuzzer.
BugLink: http://lkml.kernel.org/r/CACT4Y+bSzazpXNvtAr=WXaL8hptqjHwqEyFA+VN2AWEx=aurkg@mail.gmail.com
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Decon requires that clocks should be disabled in reverse order. Otherwise
system hangs.
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
|
|
dma_mmap_attrs() should be called with cpu address returned by
dma_alloc_attrs(). Existing code however passed pages array base as cpu
address. This worked only by a pure luck on ARM architecture. This patch
fixes this issue.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
|
|
BIT_IRQS_ENABLED was never set because of incorrect test in
decon_vlank_enable() function, what resulted in lack of enabling vblank
support. This patch fixes this issue.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
|
|
Patch ebf3fd403b79ba6561bd1a4bb5a7cacc99da08e5 ("drm/exynos: add
pm_runtime to DECON 5433") removed some code from decon_enable()
function, but it left set_bit(BIT_SUSPENDED, &ctx->flags) call, which
was earlier called only in error path. This patch removes it, what
finally lets driver to go out of suspended state.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
|
|
This patch fixes issue introduced by commit
cf67cc9a29ac19c98bc4fa0e6d14b0c1f592d322 ("drm/exynos: remove struct
exynos_drm_display"), which removed assigning of drm bridge to drm
encoder. Lack of it caused that no bridge callbacks were called on
encoder enable/disable actions.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
|
|
There is no point exposing all internal functions to global kernel name
space, so make all internals functions static.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
|
|
MIC is SoC component and important part of kms pipeline on Exynos5433,
so convert it to use component framework like other KMS/CRTC drivers.
MIC driver is already listed on KMS component driver list in Exynos DRM
core, so without this conversion, initialization of Exynos DRM core
fails on Exynos 5433 SoC.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
|
|
Drivers should use devm_clk* interface instead of of_clk* functions.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
|
|
This patch fixes compilation warnings (on 64bit architectures) and bugs
related to casting pointers through 32bit integers.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
|