aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/kvm/x86.c (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2019-07-02ARM: davinci: da830-evm: fix GPIO lookup for OHCIBartosz Golaszewski1-1/+1
The fixed regulator driver doesn't specify any con_id for gpio lookup so it must be NULL in the table entry. Fixes: 274e4c336192 ("ARM: davinci: da830-evm: add a fixed regulator for ohci-da8xx") Cc: stable@vger.kernel.org Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com>
2019-07-02ARM: davinci: omapl138-hawk: add missing regulator constraints for OHCIBartosz Golaszewski1-0/+3
We need to enable status changes for the fixed power supply for the USB controller. Fixes: 1d272894ec4f ("ARM: davinci: omapl138-hawk: add a fixed regulator for ohci-da8xx") Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com>
2019-07-02ARM: davinci: da830-evm: add missing regulator constraints for OHCIBartosz Golaszewski1-0/+3
We need to enable status changes for the fixed power supply for the USB controller. Fixes: 274e4c336192 ("ARM: davinci: da830-evm: add a fixed regulator for ohci-da8xx") Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com>
2019-07-02drm/i915/ringbuffer: EMIT_INVALIDATE *before* switch contextChris Wilson1-3/+3
Despite what I think the prm recommends, commit f2253bd9859b ("drm/i915/ringbuffer: EMIT_INVALIDATE after switch context") turned out to be a huge mistake when enabling Ironlake contexts as the GPU would hang on either a MI_FLUSH or PIPE_CONTROL immediately following the MI_SET_CONTEXT of an active mesa context (more vanilla contexts, e.g. simple rendercopies with igt, do not suffer). Ville found the following clue, "[DevCTG+]: For the invalidate operation of the pipe control, the following pointers are affected. The invalidate operation affects the restore of these packets. If the pipe control invalidate operation is completed before the context save, the indirect pointers will not be restored from memory. 1. Pipeline State Pointer 2. Media State Pointer 3. Constant Buffer Packet" which suggests by us emitting the INVALIDATE prior to the MI_SET_CONTEXT, we prevent the context-restore from chasing the dangling pointers within the image, and explains why this likely prevents the GPU hang. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190419111749.3910-1-chris@chris-wilson.co.uk (cherry picked from commit 928f8f42310f244501a7c70daac82c196112c190 in drm-intel-next) Cc: stable@vger.kernel.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111014 Fixes: f2253bd9859b ("drm/i915/ringbuffer: EMIT_INVALIDATE after switch context") Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2019-07-01soc: ti: fix irq-ti-sci link errorArnd Bergmann2-3/+3
The irqchip driver depends on the SoC specific driver, but we want to be able to compile-test it elsewhere: WARNING: unmet direct dependencies detected for TI_SCI_INTA_MSI_DOMAIN Depends on [n]: SOC_TI [=n] Selected by [y]: - TI_SCI_INTA_IRQCHIP [=y] && TI_SCI_PROTOCOL [=y] drivers/irqchip/irq-ti-sci-inta.o: In function `ti_sci_inta_irq_domain_probe': irq-ti-sci-inta.c:(.text+0x204): undefined reference to `ti_sci_inta_msi_create_irq_domain' Rearrange the Kconfig and Makefile so we build the soc driver whenever its users are there, regardless of the SOC_TI option. Fixes: 49b323157bf1 ("soc: ti: Add MSI domain bus support for Interrupt Aggregator") Fixes: f011df6179bd ("irqchip/ti-sci-inta: Add msi domain support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Lokesh Vutla <lokeshvutla@ti.com> Acked-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Olof Johansson <olof@lixom.net>
2019-07-01ALSA: hda: Fix widget_mutex incomplete protectionEvan Green1-6/+12
The widget_mutex was introduced to serialize callers to hda_widget_sysfs_{re}init. However, its protection of the sysfs widget array is incomplete. For example, it is acquired around the call to hda_widget_sysfs_reinit(), which actually creates the new array, but isn't still acquired when codec->num_nodes and codec->start_nid is updated. So the lock ensures one thread sets up the new array at a time, but doesn't ensure which thread's value will end up in codec->num_nodes. If a larger num_nodes wins but a smaller array was set up, the next call to refresh_widgets() will touch free memory as it iterates over codec->num_nodes that aren't there. The widget_lock really protects both the tree as well as codec->num_nodes, start_nid, and end_nid, so make sure it's held across that update. It should also be held during snd_hdac_get_sub_nodes(), so that a very old read from that function doesn't end up clobbering a later update. Fixes: ed180abba7f1 ("ALSA: hda: Fix race between creating and refreshing sysfs entries") Signed-off-by: Evan Green <evgreen@chromium.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-07-01drm/amdgpu/gfx9: use reset default for PA_SC_FIFO_SIZEAlex Deucher1-19/+0
Recommended by the hw team. Reviewed-and-Tested-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2019-07-01ALSA: firewire-lib/fireworks: fix miss detection of received MIDI messagesTakashi Sakamoto1-1/+1
In IEC 61883-6, 8 MIDI data streams are multiplexed into single MIDI conformant data channel. The index of stream is calculated by modulo 8 of the value of data block counter. In fireworks, the value of data block counter in CIP header has a quirk with firmware version v5.0.0, v5.7.3 and v5.8.0. This brings ALSA IEC 61883-1/6 packet streaming engine to miss detection of MIDI messages. This commit fixes the miss detection to modify the value of data block counter for the modulo calculation. For maintainers, this bug exists since a commit 18f5ed365d3f ("ALSA: fireworks/firewire-lib: add support for recent firmware quirk") in Linux kernel v4.2. There're many changes since the commit. This fix can be backported to Linux kernel v4.4 or later. I tagged a base commit to the backport for your convenience. Besides, my work for Linux kernel v5.3 brings heavy code refactoring and some structure members are renamed in 'sound/firewire/amdtp-stream.h'. The content of this patch brings conflict when merging -rc tree with this patch and the latest tree. I request maintainers to solve the conflict to replace 'tx_first_dbc' with 'ctx_data.tx.first_dbc'. Fixes: df075feefbd3 ("ALSA: firewire-lib: complete AM824 data block processing layer") Cc: <stable@vger.kernel.org> # v4.4+ Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-07-01vfs: move_mount: reject moving kernel internal mountsEric Biggers1-3/+4
sys_move_mount() crashes by dereferencing the pointer MNT_NS_INTERNAL, a.k.a. ERR_PTR(-EINVAL), if the old mount is specified by fd for a kernel object with an internal mount, such as a pipe or memfd. Fix it by checking for this case and returning -EINVAL. [AV: what we want is is_mounted(); use that instead of making the condition even more convoluted] Reproducer: #include <unistd.h> #define __NR_move_mount 429 #define MOVE_MOUNT_F_EMPTY_PATH 0x00000004 int main() { int fds[2]; pipe(fds); syscall(__NR_move_mount, fds[0], "", -1, "/", MOVE_MOUNT_F_EMPTY_PATH); } Reported-by: syzbot+6004acbaa1893ad013f0@syzkaller.appspotmail.com Fixes: 2db154b3ea8e ("vfs: syscall: Add move_mount(2) to move mounts around") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-01fork: return proper negative error codeChristian Brauner1-0/+1
Make sure to return a proper negative error code from copy_process() when anon_inode_getfile() fails with CLONE_PIDFD. Otherwise _do_fork() will not detect an error and get_task_pid() will operator on a nonsensical pointer: R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dbc2c R13: 00007ffc15fbb0ff R14: 00007ff07e47e9c0 R15: 0000000000000000 kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 7990 Comm: syz-executor290 Not tainted 5.2.0-rc6+ #9 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__read_once_size include/linux/compiler.h:194 [inline] RIP: 0010:get_task_pid+0xe1/0x210 kernel/pid.c:372 Code: 89 ff e8 62 27 5f 00 49 8b 07 44 89 f1 4c 8d bc c8 90 01 00 00 eb 0c e8 0d fe 25 00 49 81 c7 38 05 00 00 4c 89 f8 48 c1 e8 03 <80> 3c 18 00 74 08 4c 89 ff e8 31 27 5f 00 4d 8b 37 e8 f9 47 12 00 RSP: 0018:ffff88808a4a7d78 EFLAGS: 00010203 RAX: 00000000000000a7 RBX: dffffc0000000000 RCX: ffff888088180600 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88808a4a7d90 R08: ffffffff814fb3a8 R09: ffffed1015d66bf8 R10: ffffed1015d66bf8 R11: 1ffff11015d66bf7 R12: 0000000000041ffc R13: 1ffff11011494fbc R14: 0000000000000000 R15: 000000000000053d FS: 00007ff07e47e700(0000) GS:ffff8880aeb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004b5100 CR3: 0000000094df2000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: _do_fork+0x1b9/0x5f0 kernel/fork.c:2360 __do_sys_clone kernel/fork.c:2454 [inline] __se_sys_clone kernel/fork.c:2448 [inline] __x64_sys_clone+0xc1/0xd0 kernel/fork.c:2448 do_syscall_64+0xfe/0x140 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe Link: https://lore.kernel.org/lkml/000000000000e0dc0d058c9e7142@google.com Reported-and-tested-by: syzbot+002e636502bc4b64eb5c@syzkaller.appspotmail.com Fixes: 6fd2fe494b17 ("copy_process(): don't use ksys_close() on cleanups") Cc: Jann Horn <jannh@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christian Brauner <christian@brauner.io>
2019-07-01drm/amdgpu: Don't skip display settings in hwmgr_resume()Lyude Paul1-1/+1
I'm not entirely sure why this is, but for some reason: 921935dc6404 ("drm/amd/powerplay: enforce display related settings only on needed") Breaks runtime PM resume on the Radeon PRO WX 3100 (Lexa) in one the pre-production laptops I have. The issue manifests as the following messages in dmesg: [drm] UVD and UVD ENC initialized successfully. amdgpu 0000:3b:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vce1 test failed (-110) [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <vce_v3_0> failed -110 [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-110). And happens after about 6-10 runtime PM suspend/resume cycles (sometimes sooner, if you're lucky!). Unfortunately I can't seem to pin down precisely which part in psm_adjust_power_state_dynamic that is causing the issue, but not skipping the display setting setup seems to fix it. Hopefully if there is a better fix for this, this patch will spark discussion around it. Fixes: 921935dc6404 ("drm/amd/powerplay: enforce display related settings only on needed") Cc: Evan Quan <evan.quan@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Huang Rui <ray.huang@amd.com> Cc: Rex Zhu <Rex.Zhu@amd.com> Cc: Likun Gao <Likun.Gao@amd.com> Cc: <stable@vger.kernel.org> # v5.1+ Signed-off-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-01drm/amd/powerplay: use hardware fan control if no powerplay fan tableEvan Quan3-1/+8
Otherwise, you may get divided-by-zero error or corrput the SMU fan control feature. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Slava Abramov <slava.abramov@amd.com> Acked-by: Slava Abramov <slava.abramov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2019-07-01mtd: rawnand: ingenic: Fix ingenic_ecc dependencyPaul Cercueil4-11/+4
If MTD_NAND_JZ4780 is y and MTD_NAND_JZ4780_BCH is m, which select CONFIG_MTD_NAND_INGENIC_ECC to m, building fails: drivers/mtd/nand/raw/ingenic/ingenic_nand.o: In function `ingenic_nand_remove': ingenic_nand.c:(.text+0x177): undefined reference to `ingenic_ecc_release' drivers/mtd/nand/raw/ingenic/ingenic_nand.o: In function `ingenic_nand_ecc_correct': ingenic_nand.c:(.text+0x2ee): undefined reference to `ingenic_ecc_correct' To fix that, the ingenic_nand and ingenic_ecc modules have been fused into one single module. - The ingenic_ecc.c code is now compiled in only if $(CONFIG_MTD_NAND_INGENIC_ECC) is set. This is now a boolean instead of tristate. - To avoid changing the module name, the ingenic_nand.c file is moved to ingenic_nand_drv.c. Then the module name is still ingenic_nand. - Since ingenic_ecc.c is no more a module, the module-specific macros have been dropped, and the functions are no more exported for use by the ingenic_nand driver. Fixes: 15de8c6efd0e ("mtd: rawnand: ingenic: Separate top-level and SoC specific code") Signed-off-by: Paul Cercueil <paul@crapouillou.net> Reported-by: Arnd Bergmann <arnd@arndb.de> Reported-by: Hulk Robot <hulkci@huawei.com> Cc: YueHaibing <yuehaibing@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2019-07-01mtd: spinand: Fix max_bad_eraseblocks_per_lun info in memorgFrieder Schrempf2-3/+3
The 1Gb Macronix chip can have a maximum of 20 bad blocks, while the 2Gb version has twice as many blocks and therefore the maximum number of bad blocks is 40. The 4Gb GigaDevice GD5F4GQ4xA has twice as many blocks as its 2Gb counterpart and therefore a maximum of 80 bad blocks. Fixes: 377e517b5fa5 ("mtd: nand: Add max_bad_eraseblocks_per_lun info to memorg") Reported-by: Emil Lenngren <emil.lenngren@gmail.com> Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2019-06-30Linux 5.2-rc7Linus Torvalds1-2/+2
2019-06-29linux/kernel.h: fix overflow for DIV_ROUND_UP_ULLVinod Koul1-1/+2
DIV_ROUND_UP_ULL adds the two arguments and then invokes DIV_ROUND_DOWN_ULL. But on a 32bit system the addition of two 32 bit values can overflow. DIV_ROUND_DOWN_ULL does it correctly and stashes the addition into a unsigned long long so cast the result to unsigned long long here to avoid the overflow condition. [akpm@linux-foundation.org: DIV_ROUND_UP_ULL must be an rval] Link: http://lkml.kernel.org/r/20190625100518.30753-1-vkoul@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-29mm, swap: fix THP swap outHuang Ying1-5/+2
0-Day test system reported some OOM regressions for several THP (Transparent Huge Page) swap test cases. These regressions are bisected to 6861428921b5 ("block: always define BIO_MAX_PAGES as 256"). In the commit, BIO_MAX_PAGES is set to 256 even when THP swap is enabled. So the bio_alloc(gfp_flags, 512) in get_swap_bio() may fail when swapping out THP. That causes the OOM. As in the patch description of 6861428921b5 ("block: always define BIO_MAX_PAGES as 256"), THP swap should use multi-page bvec to write THP to swap space. So the issue is fixed via doing that in get_swap_bio(). BTW: I remember I have checked the THP swap code when 6861428921b5 ("block: always define BIO_MAX_PAGES as 256") was merged, and thought the THP swap code needn't to be changed. But apparently, I was wrong. I should have done this at that time. Link: http://lkml.kernel.org/r/20190624075515.31040-1-ying.huang@intel.com Fixes: 6861428921b5 ("block: always define BIO_MAX_PAGES as 256") Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-29fork,memcg: alloc_thread_stack_node needs to set tsk->stackAndrea Arcangeli1-1/+5
Commit 5eed6f1dff87 ("fork,memcg: fix crash in free_thread_stack on memcg charge fail") corrected two instances, but there was a third instance of this bug. Without setting tsk->stack, if memcg_charge_kernel_stack fails, it'll execute free_thread_stack() on a dangling pointer. Enterprise kernels are compiled with VMAP_STACK=y so this isn't critical, but custom VMAP_STACK=n builds should have some performance advantage, with the drawback of risking to fail fork because compaction didn't succeed. So as long as VMAP_STACK=n is a supported option it's worth fixing it upstream. Link: http://lkml.kernel.org/r/20190619011450.28048-1-aarcange@redhat.com Fixes: 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Rik van Riel <riel@surriel.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-29MAINTAINERS: add CLANG/LLVM BUILD SUPPORT infoNick Desaulniers1-0/+8
Add keyword support so that our mailing list gets cc'ed for clang/llvm patches. We're pretty active on our mailing list so far as code review. There are numerous Googlers like myself that are paid to support building the Linux kernel with Clang and LLVM. Link: http://lkml.kernel.org/r/20190620001907.255803-1-ndesaulniers@google.com Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-29mm/vmalloc.c: avoid bogus -Wmaybe-uninitialized warningArnd Bergmann1-2/+2
gcc gets confused in pcpu_get_vm_areas() because there are too many branches that affect whether 'lva' was initialized before it gets used: mm/vmalloc.c: In function 'pcpu_get_vm_areas': mm/vmalloc.c:991:4: error: 'lva' may be used uninitialized in this function [-Werror=maybe-uninitialized] insert_vmap_area_augment(lva, &va->rb_node, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ &free_vmap_area_root, &free_vmap_area_list); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ mm/vmalloc.c:916:20: note: 'lva' was declared here struct vmap_area *lva; ^~~ Add an intialization to NULL, and check whether this has changed before the first use. [akpm@linux-foundation.org: tweak comments] Link: http://lkml.kernel.org/r/20190618092650.2943749-1-arnd@arndb.de Fixes: 68ad4a330433 ("mm/vmalloc.c: keep track of free blocks for vmap allocation") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Joel Fernandes <joelaf@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-29mm/page_idle.c: fix oops because end_pfn is larger than max_pfnColin Ian King1-2/+2
Currently the calcuation of end_pfn can round up the pfn number to more than the actual maximum number of pfns, causing an Oops. Fix this by ensuring end_pfn is never more than max_pfn. This can be easily triggered when on systems where the end_pfn gets rounded up to more than max_pfn using the idle-page stress-ng stress test: sudo stress-ng --idle-page 0 BUG: unable to handle kernel paging request at 00000000000020d8 #PF error: [normal kernel read fault] PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 1 PID: 11039 Comm: stress-ng-idle- Not tainted 5.0.0-5-generic #6-Ubuntu Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:page_idle_get_page+0xc8/0x1a0 Code: 0f b1 0a 75 7d 48 8b 03 48 89 c2 48 c1 e8 33 83 e0 07 48 c1 ea 36 48 8d 0c 40 4c 8d 24 88 49 c1 e4 07 4c 03 24 d5 00 89 c3 be <49> 8b 44 24 58 48 8d b8 80 a1 02 00 e8 07 d5 77 00 48 8b 53 08 48 RSP: 0018:ffffafd7c672fde8 EFLAGS: 00010202 RAX: 0000000000000005 RBX: ffffe36341fff700 RCX: 000000000000000f RDX: 0000000000000284 RSI: 0000000000000275 RDI: 0000000001fff700 RBP: ffffafd7c672fe00 R08: ffffa0bc34056410 R09: 0000000000000276 R10: ffffa0bc754e9b40 R11: ffffa0bc330f6400 R12: 0000000000002080 R13: ffffe36341fff700 R14: 0000000000080000 R15: ffffa0bc330f6400 FS: 00007f0ec1ea5740(0000) GS:ffffa0bc7db00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000020d8 CR3: 0000000077d68000 CR4: 00000000000006e0 Call Trace: page_idle_bitmap_write+0x8c/0x140 sysfs_kf_bin_write+0x5c/0x70 kernfs_fop_write+0x12e/0x1b0 __vfs_write+0x1b/0x40 vfs_write+0xab/0x1b0 ksys_write+0x55/0xc0 __x64_sys_write+0x1a/0x20 do_syscall_64+0x5a/0x110 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Link: http://lkml.kernel.org/r/20190618124352.28307-1-colin.king@canonical.com Fixes: 33c3fc71c8cf ("mm: introduce idle page tracking") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-29initramfs: fix populate_initrd_image() section mismatchGeert Uytterhoeven1-2/+2
With gcc-4.6.3: WARNING: vmlinux.o(.text.unlikely+0x140): Section mismatch in reference from the function populate_initrd_image() to the variable .init.ramfs.info:__initramfs_size The function populate_initrd_image() references the variable __init __initramfs_size. This is often because populate_initrd_image lacks a __init annotation or the annotation of __initramfs_size is wrong. WARNING: vmlinux.o(.text.unlikely+0x14c): Section mismatch in reference from the function populate_initrd_image() to the function .init.text:unpack_to_rootfs() The function populate_initrd_image() references the function __init unpack_to_rootfs(). This is often because populate_initrd_image lacks a __init annotation or the annotation of unpack_to_rootfs is wrong. WARNING: vmlinux.o(.text.unlikely+0x198): Section mismatch in reference from the function populate_initrd_image() to the function .init.text:xwrite() The function populate_initrd_image() references the function __init xwrite(). This is often because populate_initrd_image lacks a __init annotation or the annotation of xwrite is wrong. Indeed, if the compiler decides not to inline populate_initrd_image(), a warning is generated. Fix this by adding the missing __init annotations. Link: http://lkml.kernel.org/r/20190617074340.12779-1-geert@linux-m68k.org Fixes: 7c184ecd262fe64f ("initramfs: factor out a helper to populate the initrd image") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-29mm/oom_kill.c: fix uninitialized oc->constraintYafang Shao1-7/+5
In dump_oom_summary() oc->constraint is used to show oom_constraint_text, but it hasn't been set before. So the value of it is always the default value 0. We should inititialize it before. Bellow is the output when memcg oom occurs, before this patch: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null), cpuset=/,mems_allowed=0,oom_memcg=/foo,task_memcg=/foo,task=bash,pid=7997,uid=0 after this patch: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null), cpuset=/,mems_allowed=0,oom_memcg=/foo,task_memcg=/foo,task=bash,pid=13681,uid=0 Link: http://lkml.kernel.org/r/1560522038-15879-1-git-send-email-laoar.shao@gmail.com Fixes: ef8444ea01d7 ("mm, oom: reorganize the oom report in dump_header") Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Wind Yu <yuzhoujian@didichuxing.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-29mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHugeNaoya Horiguchi2-13/+21
madvise(MADV_SOFT_OFFLINE) often returns -EBUSY when calling soft offline for hugepages with overcommitting enabled. That was caused by the suboptimal code in current soft-offline code. See the following part: ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL, MIGRATE_SYNC, MR_MEMORY_FAILURE); if (ret) { ... } else { /* * We set PG_hwpoison only when the migration source hugepage * was successfully dissolved, because otherwise hwpoisoned * hugepage remains on free hugepage list, then userspace will * find it as SIGBUS by allocation failure. That's not expected * in soft-offlining. */ ret = dissolve_free_huge_page(page); if (!ret) { if (set_hwpoison_free_buddy_page(page)) num_poisoned_pages_inc(); } } return ret; Here dissolve_free_huge_page() returns -EBUSY if the migration source page was freed into buddy in migrate_pages(), but even in that case we actually has a chance that set_hwpoison_free_buddy_page() succeeds. So that means current code gives up offlining too early now. dissolve_free_huge_page() checks that a given hugepage is suitable for dissolving, where we should return success for !PageHuge() case because the given hugepage is considered as already dissolved. This change also affects other callers of dissolve_free_huge_page(), which are cleaned up together. [n-horiguchi@ah.jp.nec.com: v3] Link: http://lkml.kernel.org/r/1560761476-4651-3-git-send-email-n-horiguchi@ah.jp.nec.comLink: http://lkml.kernel.org/r/1560154686-18497-3-git-send-email-n-horiguchi@ah.jp.nec.com Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining") Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reported-by: Chen, Jerry T <jerry.t.chen@intel.com> Tested-by: Chen, Jerry T <jerry.t.chen@intel.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Xishi Qiu <xishi.qiuxishi@alibaba-inc.com> Cc: "Chen, Jerry T" <jerry.t.chen@intel.com> Cc: "Zhuo, Qiuxu" <qiuxu.zhuo@intel.com> Cc: <stable@vger.kernel.org> [4.19+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-29mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() failsNaoya Horiguchi1-0/+2
The pass/fail of soft offline should be judged by checking whether the raw error page was finally contained or not (i.e. the result of set_hwpoison_free_buddy_page()), but current code do not work like that. It might lead us to misjudge the test result when set_hwpoison_free_buddy_page() fails. Without this fix, there are cases where madvise(MADV_SOFT_OFFLINE) may not offline the original page and will not return an error. Link: http://lkml.kernel.org/r/1560154686-18497-2-git-send-email-n-horiguchi@ah.jp.nec.com Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining") Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Xishi Qiu <xishi.qiuxishi@alibaba-inc.com> Cc: "Chen, Jerry T" <jerry.t.chen@intel.com> Cc: "Zhuo, Qiuxu" <qiuxu.zhuo@intel.com> Cc: <stable@vger.kernel.org> [4.19+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-29signal: remove the wrong signal_pending() check in restore_user_sigmask()Oleg Nesterov6-28/+36
This is the minimal fix for stable, I'll send cleanups later. Commit 854a6ed56839 ("signal: Add restore_user_sigmask()") introduced the visible change which breaks user-space: a signal temporary unblocked by set_user_sigmask() can be delivered even if the caller returns success or timeout. Change restore_user_sigmask() to accept the additional "interrupted" argument which should be used instead of signal_pending() check, and update the callers. Eric said: : For clarity. I don't think this is required by posix, or fundamentally to : remove the races in select. It is what linux has always done and we have : applications who care so I agree this fix is needed. : : Further in any case where the semantic change that this patch rolls back : (aka where allowing a signal to be delivered and the select like call to : complete) would be advantage we can do as well if not better by using : signalfd. : : Michael is there any chance we can get this guarantee of the linux : implementation of pselect and friends clearly documented. The guarantee : that if the system call completes successfully we are guaranteed that no : signal that is unblocked by using sigmask will be delivered? Link: http://lkml.kernel.org/r/20190604134117.GA29963@redhat.com Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add restore_user_sigmask()") Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Eric Wong <e@80x24.org> Tested-by: Eric Wong <e@80x24.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jason Baron <jbaron@akamai.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: David Laight <David.Laight@ACULAB.COM> Cc: <stable@vger.kernel.org> [5.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-29fs/binfmt_flat.c: make load_flat_shared_library() workJann Horn1-16/+7
load_flat_shared_library() is broken: It only calls load_flat_file() if prepare_binprm() returns zero, but prepare_binprm() returns the number of bytes read - so this only happens if the file is empty. Instead, call into load_flat_file() if the number of bytes read is non-negative. (Even if the number of bytes is zero - in that case, load_flat_file() will see nullbytes and return a nice -ENOEXEC.) In addition, remove the code related to bprm creds and stop using prepare_binprm() - this code is loading a library, not a main executable, and it only actually uses the members "buf", "file" and "filename" of the linux_binprm struct. Instead, call kernel_read() directly. Link: http://lkml.kernel.org/r/20190524201817.16509-1-jannh@google.com Fixes: 287980e49ffc ("remove lots of IS_ERR_VALUE abuses") Signed-off-by: Jann Horn <jannh@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-29mm/mempolicy.c: fix an incorrect rebind node in mpol_rebind_nodemaskzhong jiang1-1/+1
mpol_rebind_nodemask() is called for MPOL_BIND and MPOL_INTERLEAVE mempoclicies when the tasks's cpuset's mems_allowed changes. For policies created without MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES, it works by remapping the policy's allowed nodes (stored in v.nodes) using the previous value of mems_allowed (stored in w.cpuset_mems_allowed) as the domain of map and the new mems_allowed (passed as nodes) as the range of the map (see the comment of bitmap_remap() for details). The result of remapping is stored back as policy's nodemask in v.nodes, and the new value of mems_allowed should be stored in w.cpuset_mems_allowed to facilitate the next rebind, if it happens. However, 213980c0f23b ("mm, mempolicy: simplify rebinding mempolicies when updating cpusets") introduced a bug where the result of remapping is stored in w.cpuset_mems_allowed instead. Thus, a mempolicy's allowed nodes can evolve in an unexpected way after a series of rebinding due to cpuset mems_allowed changes, possibly binding to a wrong node or a smaller number of nodes which may e.g. overload them. This patch fixes the bug so rebinding again works as intended. [vbabka@suse.cz: new changlog] Link: http://lkml.kernel.org/r/ef6a69c6-c052-b067-8f2c-9d615c619bb9@suse.cz Link: http://lkml.kernel.org/r/1558768043-23184-1-git-send-email-zhongjiang@huawei.com Fixes: 213980c0f23b ("mm, mempolicy: simplify rebinding mempolicies when updating cpusets") Signed-off-by: zhong jiang <zhongjiang@huawei.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Oscar Salvador <osalvador@suse.de> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-29fs/proc/array.c: allow reporting eip/esp for all coredumping threadsJohn Ogness1-1/+1
0a1eb2d474ed ("fs/proc: Stop reporting eip and esp in /proc/PID/stat") stopped reporting eip/esp and fd7d56270b52 ("fs/proc: Report eip/esp in /prod/PID/stat for coredumping") reintroduced the feature to fix a regression with userspace core dump handlers (such as minicoredumper). Because PF_DUMPCORE is only set for the primary thread, this didn't fix the original problem for secondary threads. Allow reporting the eip/esp for all threads by checking for PF_EXITING as well. This is set for all the other threads when they are killed. coredump_wait() waits for all the tasks to become inactive before proceeding to invoke a core dumper. Link: http://lkml.kernel.org/r/87y32p7i7a.fsf@linutronix.de Link: http://lkml.kernel.org/r/20190522161614.628-1-jlu@pengutronix.de Fixes: fd7d56270b526ca3 ("fs/proc: Report eip/esp in /prod/PID/stat for coredumping") Signed-off-by: John Ogness <john.ogness@linutronix.de> Reported-by: Jan Luebbe <jlu@pengutronix.de> Tested-by: Jan Luebbe <jlu@pengutronix.de> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-29mm/dev_pfn: exclude MEMORY_DEVICE_PRIVATE while computing virtual addressAnshuman Khandual1-1/+1
The presence of struct page does not guarantee linear mapping for the pfn physical range. Device private memory which is non-coherent is excluded from linear mapping during devm_memremap_pages() though they will still have struct page coverage. Change pfn_t_to_virt() to just check for device private memory before giving out virtual address for a given pfn. pfn_t_to_virt() actually has no callers. Let's fix it for the 5.2 kernel and remove it in 5.3. Link: http://lkml.kernel.org/r/1558089514-25067-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-28drm/panfrost: Fix a double-free errorBoris Brezillon1-1/+1
drm_gem_shmem_create_with_handle() returns a GEM object and attach a handle to it. When the user closes the DRM FD, the core releases all GEM handles along with their backing GEM objs, which can lead to a double-free issue if panfrost_ioctl_create_bo() failed and went through the err_free path where drm_gem_object_put_unlocked() is called without deleting the associate handle. Replace this drm_gem_object_put_unlocked() call by a drm_gem_handle_delete() one to fix that. Fixes: f3ba91228e8e ("drm/panfrost: Add initial panfrost driver") Cc: <stable@vger.kernel.org> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20190627172414.27231-1-boris.brezillon@collabora.com
2019-06-28tracing/snapshot: Resize spare buffer if size changedEiichi Tsukata1-4/+6
Current snapshot implementation swaps two ring_buffers even though their sizes are different from each other, that can cause an inconsistency between the contents of buffer_size_kb file and the current buffer size. For example: # cat buffer_size_kb 7 (expanded: 1408) # echo 1 > events/enable # grep bytes per_cpu/cpu0/stats bytes: 1441020 # echo 1 > snapshot // current:1408, spare:1408 # echo 123 > buffer_size_kb // current:123, spare:1408 # echo 1 > snapshot // current:1408, spare:123 # grep bytes per_cpu/cpu0/stats bytes: 1443700 # cat buffer_size_kb 123 // != current:1408 And also, a similar per-cpu case hits the following WARNING: Reproducer: # echo 1 > per_cpu/cpu0/snapshot # echo 123 > buffer_size_kb # echo 1 > per_cpu/cpu0/snapshot WARNING: WARNING: CPU: 0 PID: 1946 at kernel/trace/trace.c:1607 update_max_tr_single.part.0+0x2b8/0x380 Modules linked in: CPU: 0 PID: 1946 Comm: bash Not tainted 5.2.0-rc6 #20 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014 RIP: 0010:update_max_tr_single.part.0+0x2b8/0x380 Code: ff e8 dc da f9 ff 0f 0b e9 88 fe ff ff e8 d0 da f9 ff 44 89 ee bf f5 ff ff ff e8 33 dc f9 ff 41 83 fd f5 74 96 e8 b8 da f9 ff <0f> 0b eb 8d e8 af da f9 ff 0f 0b e9 bf fd ff ff e8 a3 da f9 ff 48 RSP: 0018:ffff888063e4fca0 EFLAGS: 00010093 RAX: ffff888066214380 RBX: ffffffff99850fe0 RCX: ffffffff964298a8 RDX: 0000000000000000 RSI: 00000000fffffff5 RDI: 0000000000000005 RBP: 1ffff1100c7c9f96 R08: ffff888066214380 R09: ffffed100c7c9f9b R10: ffffed100c7c9f9a R11: 0000000000000003 R12: 0000000000000000 R13: 00000000ffffffea R14: ffff888066214380 R15: ffffffff99851060 FS: 00007f9f8173c700(0000) GS:ffff88806d000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000714dc0 CR3: 0000000066fa6000 CR4: 00000000000006f0 Call Trace: ? trace_array_printk_buf+0x140/0x140 ? __mutex_lock_slowpath+0x10/0x10 tracing_snapshot_write+0x4c8/0x7f0 ? trace_printk_init_buffers+0x60/0x60 ? selinux_file_permission+0x3b/0x540 ? tracer_preempt_off+0x38/0x506 ? trace_printk_init_buffers+0x60/0x60 __vfs_write+0x81/0x100 vfs_write+0x1e1/0x560 ksys_write+0x126/0x250 ? __ia32_sys_read+0xb0/0xb0 ? do_syscall_64+0x1f/0x390 do_syscall_64+0xc1/0x390 entry_SYSCALL_64_after_hwframe+0x49/0xbe This patch adds resize_buffer_duplicate_size() to check if there is a difference between current/spare buffer sizes and resize a spare buffer if necessary. Link: http://lkml.kernel.org/r/20190625012910.13109-1-devel@etsukata.com Cc: stable@vger.kernel.org Fixes: ad909e21bbe69 ("tracing: Add internal tracing_snapshot() functions") Signed-off-by: Eiichi Tsukata <devel@etsukata.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-28tracing: Fix memory leak in tracing_err_log_open()Takeshi Misawa1-1/+13
When tracing_err_log_open() calls seq_open(), allocated memory is not freed. kmemleak report: unreferenced object 0xffff92c0781d1100 (size 128): comm "tail", pid 15116, jiffies 4295163855 (age 22.704s) hex dump (first 32 bytes): 00 f0 08 e5 c0 92 ff ff 00 10 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000000d0687d5>] kmem_cache_alloc+0x11f/0x1e0 [<000000003e3039a8>] seq_open+0x2f/0x90 [<000000008dd36b7d>] tracing_err_log_open+0x67/0x140 [<000000005a431ae2>] do_dentry_open+0x1df/0x3a0 [<00000000a2910603>] vfs_open+0x2f/0x40 [<0000000038b0a383>] path_openat+0x2e8/0x1690 [<00000000fe025bda>] do_filp_open+0x9b/0x110 [<00000000483a5091>] do_sys_open+0x1ba/0x260 [<00000000c558b5fd>] __x64_sys_openat+0x20/0x30 [<000000006881ec07>] do_syscall_64+0x5a/0x130 [<00000000571c2e94>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fix this by calling seq_release() in tracing_err_log_fops.release(). Link: http://lkml.kernel.org/r/20190628105640.GA1863@DESKTOP Fixes: 8a062902be725 ("tracing: Add tracing error log") Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Takeshi Misawa <jeliantsurux@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-28ftrace/x86: Add a comment to why we take text_mutex in ftrace_arch_code_modify_prepare()Steven Rostedt (VMware)1-0/+5
Taking the text_mutex in ftrace_arch_code_modify_prepare() is to fix a race against module loading and live kernel patching that might try to change the text permissions while ftrace has it as read/write. This really needs to be documented in the code. Add a comment that does such. Link: http://lkml.kernel.org/r/20190627211819.5a591f52@gandalf.local.home Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-28ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code()Petr Mladek2-9/+4
The commit 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module text permissions race") causes a possible deadlock between register_kprobe() and ftrace_run_update_code() when ftrace is using stop_machine(). The existing dependency chain (in reverse order) is: -> #1 (text_mutex){+.+.}: validate_chain.isra.21+0xb32/0xd70 __lock_acquire+0x4b8/0x928 lock_acquire+0x102/0x230 __mutex_lock+0x88/0x908 mutex_lock_nested+0x32/0x40 register_kprobe+0x254/0x658 init_kprobes+0x11a/0x168 do_one_initcall+0x70/0x318 kernel_init_freeable+0x456/0x508 kernel_init+0x22/0x150 ret_from_fork+0x30/0x34 kernel_thread_starter+0x0/0xc -> #0 (cpu_hotplug_lock.rw_sem){++++}: check_prev_add+0x90c/0xde0 validate_chain.isra.21+0xb32/0xd70 __lock_acquire+0x4b8/0x928 lock_acquire+0x102/0x230 cpus_read_lock+0x62/0xd0 stop_machine+0x2e/0x60 arch_ftrace_update_code+0x2e/0x40 ftrace_run_update_code+0x40/0xa0 ftrace_startup+0xb2/0x168 register_ftrace_function+0x64/0x88 klp_patch_object+0x1a2/0x290 klp_enable_patch+0x554/0x980 do_one_initcall+0x70/0x318 do_init_module+0x6e/0x250 load_module+0x1782/0x1990 __s390x_sys_finit_module+0xaa/0xf0 system_call+0xd8/0x2d0 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(text_mutex); lock(cpu_hotplug_lock.rw_sem); lock(text_mutex); lock(cpu_hotplug_lock.rw_sem); It is similar problem that has been solved by the commit 2d1e38f56622b9b ("kprobes: Cure hotplug lock ordering issues"). Many locks are involved. To be on the safe side, text_mutex must become a low level lock taken after cpu_hotplug_lock.rw_sem. This can't be achieved easily with the current ftrace design. For example, arm calls set_all_modules_text_rw() already in ftrace_arch_code_modify_prepare(), see arch/arm/kernel/ftrace.c. This functions is called: + outside stop_machine() from ftrace_run_update_code() + without stop_machine() from ftrace_module_enable() Fortunately, the problematic fix is needed only on x86_64. It is the only architecture that calls set_all_modules_text_rw() in ftrace path and supports livepatching at the same time. Therefore it is enough to move text_mutex handling from the generic kernel/trace/ftrace.c into arch/x86/kernel/ftrace.c: ftrace_arch_code_modify_prepare() ftrace_arch_code_modify_post_process() This patch basically reverts the ftrace part of the problematic commit 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module text permissions race"). And provides x86_64 specific-fix. Some refactoring of the ftrace code will be needed when livepatching is implemented for arm or nds32. These architectures call set_all_modules_text_rw() and use stop_machine() at the same time. Link: http://lkml.kernel.org/r/20190627081334.12793-1-pmladek@suse.com Fixes: 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module text permissions race") Acked-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Petr Mladek <pmladek@suse.com> [ As reviewed by Miroslav Benes <mbenes@suse.cz>, removed return value of ftrace_run_update_code() as it is a void function. ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-28NFS/flexfiles: Use the correct TCP timeout for flexfiles I/OTrond Myklebust1-1/+1
Fix a typo where we're confusing the default TCP retrans value (NFS_DEF_TCP_RETRANS) for the default TCP timeout value. Fixes: 15d03055cf39f ("pNFS/flexfiles: Set reasonable default ...") Cc: stable@vger.kernel.org # 4.8+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-06-28SUNRPC: Fix up calculation of client message lengthTrond Myklebust1-8/+8
In the case where a record marker was used, xs_sendpages() needs to return the length of the payload + record marker so that we operate correctly in the case of a partial transmission. When the callers check return value, they therefore need to take into account the record marker length. Fixes: 06b5fc3ad94e ("Merge tag 'nfs-rdma-for-5.1-1'...") Cc: stable@vger.kernel.org # 5.1+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-06-28ALSA: seq: fix incorrect order of dest_client/dest_ports argumentsColin Ian King2-2/+2
There are two occurrances of a call to snd_seq_oss_fill_addr where the dest_client and dest_port arguments are in the wrong order. Fix this by swapping them around. Addresses-Coverity: ("Arguments in wrong order") Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-06-28ALSA: hda/realtek - Change front mic location for Lenovo M710qDennis Wassenberg1-0/+1
On M710q Lenovo ThinkCentre machine, there are two front mics, we change the location for one of them to avoid conflicts. Signed-off-by: Dennis Wassenberg <dennis.wassenberg@secunet.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-06-28drm/etnaviv: add missing failure path to destroy suballocLucas Stach1-2/+5
When something goes wrong in the GPU init after the cmdbuf suballocator has been constructed, we fail to destroy it properly. This causes havok later when the GPU is unbound due to a module unload or similar. Fixes: e66774dd6f6a (drm/etnaviv: add cmdbuf suballocator) Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Tested-by: Russell King <rmk+kernel@armlinux.org.uk>
2019-06-28ALSA: usb-audio: fix sign unintended sign extension on left shiftsColin Ian King1-2/+2
There are a couple of left shifts of unsigned 8 bit values that first get promoted to signed ints and hence get sign extended on the shift if the top bit of the 8 bit values are set. Fix this by casting the 8 bit values to unsigned ints to stop the unintentional sign extension. Addresses-Coverity: ("Unintended sign extension") Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-06-28cifs: fix crash querying symlinks stored as reparse-pointsRonnie Sahlberg2-5/+73
We never parsed/returned any data from .get_link() when the object is a windows reparse-point containing a symlink. This results in the VFS layer oopsing accessing an uninitialized buffer: ... [ 171.407172] Call Trace: [ 171.408039] readlink_copy+0x29/0x70 [ 171.408872] vfs_readlink+0xc1/0x1f0 [ 171.409709] ? readlink_copy+0x70/0x70 [ 171.410565] ? simple_attr_release+0x30/0x30 [ 171.411446] ? getname_flags+0x105/0x2a0 [ 171.412231] do_readlinkat+0x1b7/0x1e0 [ 171.412938] ? __ia32_compat_sys_newfstat+0x30/0x30 ... Fix this by adding code to handle these buffers and make sure we do return a valid buffer to .get_link() CC: Stable <stable@vger.kernel.org> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-06-28x86/unwind/orc: Fall back to using frame pointers for generated codeJosh Poimboeuf1-4/+22
The ORC unwinder can't unwind through BPF JIT generated code because there are no ORC entries associated with the code. If an ORC entry isn't available, try to fall back to frame pointers. If BPF and other generated code always do frame pointer setup (even with CONFIG_FRAME_POINTERS=n) then this will allow ORC to unwind through most generated code despite there being no corresponding ORC entries. Fixes: d15d356887e7 ("perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER") Reported-by: Song Liu <songliubraving@fb.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kairui Song <kasong@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/b6f69208ddff4343d56b7bfac1fc7cfcd62689e8.1561595111.git.jpoimboe@redhat.com
2019-06-28perf/x86: Always store regs->ip in perf_callchain_kernel()Song Liu1-5/+5
The stacktrace_map_raw_tp BPF selftest is failing because the RIP saved by perf_arch_fetch_caller_regs() isn't getting saved by perf_callchain_kernel(). This was broken by the following commit: d15d356887e7 ("perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER") With that change, when starting with non-HW regs, the unwinder starts with the current stack frame and unwinds until it passes up the frame which called perf_arch_fetch_caller_regs(). So regs->ip needs to be saved deliberately. Fixes: d15d356887e7 ("perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER") Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kairui Song <kasong@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/3975a298fa52b506fea32666d8ff6a13467eee6d.1561595111.git.jpoimboe@redhat.com
2019-06-27ceph: fix ceph_mdsc_build_path to not stop on first componentJeff Layton1-1/+2
When ceph_mdsc_build_path is handed a positive dentry, it will return a zero-length path string with the base set to that dentry. This is not what we want. Always include at least one path component in the string. ceph_mdsc_build_path has behaved this way for a long time but it didn't matter until recent d_name handling rework. Fixes: 964fff7491e4 ("ceph: use ceph_mdsc_build_path instead of clone_dentry_name") Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-06-27ARM: dts: armada-xp-98dx3236: Switch to armada-38x-uart serial nodeJoshua Scott1-0/+8
Switch to the "marvell,armada-38x-uart" driver variant to empty the UART buffer before writing to the UART_LCR register. Signed-off-by: Joshua Scott <joshua.scott@alliedtelesis.co.nz> Tested-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com>. Cc: stable@vger.kernel.org Fixes: 43e28ba87708 ("ARM: dts: Use armada-370-xp as a base for armada-xp-98dx3236") Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
2019-06-27pinctrl: mediatek: Update cur_mask in mask/mask opsNicolas Boichat1-14/+4
During suspend/resume, mtk_eint_mask may be called while wake_mask is active. For example, this happens if a wake-source with an active interrupt handler wakes the system: irq/pm.c:irq_pm_check_wakeup would disable the interrupt, so that it can be handled later on in the resume flow. However, this may happen before mtk_eint_do_resume is called: in this case, wake_mask is loaded, and cur_mask is restored from an older copy, re-enabling the interrupt, and causing an interrupt storm (especially for level interrupts). Step by step, for a line that has both wake and interrupt enabled: 1. cur_mask[irq] = 1; wake_mask[irq] = 1; EINT_EN[irq] = 1 (interrupt enabled at hardware level) 2. System suspends, resumes due to that line (at this stage EINT_EN == wake_mask) 3. irq_pm_check_wakeup is called, and disables the interrupt => EINT_EN[irq] = 0, but we still have cur_mask[irq] = 1 4. mtk_eint_do_resume is called, and restores EINT_EN = cur_mask, so it reenables EINT_EN[irq] = 1 => interrupt storm as the driver is not yet ready to handle the interrupt. This patch fixes the issue in step 3, by recording all mask/unmask changes in cur_mask. This also avoids the need to read the current mask in eint_do_suspend, and we can remove mtk_eint_chip_read_mask function. The interrupt will be re-enabled properly later on, sometimes after mtk_eint_do_resume, when the driver is ready to handle it. Fixes: 58a5e1b64bb0 ("pinctrl: mediatek: Implement wake handler and suspend resume") Signed-off-by: Nicolas Boichat <drinkcat@chromium.org> Acked-by: Sean Wang <sean.wang@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-06-27proc: remove useless d_is_dir() checkChristian Brauner1-2/+1
Remove the d_is_dir() check from tgid_pidfd_to_pid(). It is pointless since you should never get &proc_tgid_base_operations for f_op on a non-directory. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christian Brauner <christian@brauner.io>
2019-06-27copy_process(): don't use ksys_close() on cleanupsAl Viro1-28/+18
anon_inode_getfd() should be used *ONLY* in situations when we are guaranteed to be past the last failure point (including copying the descriptor number to userland, at that). And ksys_close() should not be used for cleanups at all. anon_inode_getfile() is there for all nontrivial cases like that. Just use that... Fixes: b3e583825266 ("clone: add CLONE_PIDFD") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Jann Horn <jannh@google.com> Signed-off-by: Christian Brauner <christian@brauner.io>
2019-06-27cpu/hotplug: Fix out-of-bounds read when setting fail stateEiichi Tsukata1-0/+3
Setting invalid value to /sys/devices/system/cpu/cpuX/hotplug/fail can control `struct cpuhp_step *sp` address, results in the following global-out-of-bounds read. Reproducer: # echo -2 > /sys/devices/system/cpu/cpu0/hotplug/fail KASAN report: BUG: KASAN: global-out-of-bounds in write_cpuhp_fail+0x2cd/0x2e0 Read of size 8 at addr ffffffff89734438 by task bash/1941 CPU: 0 PID: 1941 Comm: bash Not tainted 5.2.0-rc6+ #31 Call Trace: write_cpuhp_fail+0x2cd/0x2e0 dev_attr_store+0x58/0x80 sysfs_kf_write+0x13d/0x1a0 kernfs_fop_write+0x2bc/0x460 vfs_write+0x1e1/0x560 ksys_write+0x126/0x250 do_syscall_64+0xc1/0x390 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f05e4f4c970 The buggy address belongs to the variable: cpu_hotplug_lock+0x98/0xa0 Memory state around the buggy address: ffffffff89734300: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00 ffffffff89734380: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00 >ffffffff89734400: 00 00 00 00 fa fa fa fa 00 00 00 00 fa fa fa fa ^ ffffffff89734480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffffff89734500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Add a sanity check for the value written from user space. Fixes: 1db49484f21ed ("smp/hotplug: Hotplug state fail injection") Signed-off-by: Eiichi Tsukata <devel@etsukata.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: peterz@infradead.org Link: https://lkml.kernel.org/r/20190627024732.31672-1-devel@etsukata.com