linux-dev - Linux kernel development work

Age	Commit message	Author	Files	Lines
2020-04-03	ipmi: kcs: aspeed: Implement v2 bindings	Andrew Jeffery	1	-23/+121
	The v2 bindings allow us to extract the resources from the devicetree. The table in the driver is retained to derive the channel index, which removes the need for kcs_chan property from the v1 bindings. The v2 bindings allow us to reduce the number of warnings generated by the existing devicetree nodes. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Reviewed-by: Joel Stanley <joel@jms.id.au> Reviewed-by: Haiyue Wang <haiyue.wang@linux.intel.com> Message-Id: <01ef3787e9ddaa9d87cfd55a2ac793053b5a69de.1576462051.git-series.andrew@aj.id.au> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2020-04-03	ipmi: kcs: Finish configuring ASPEED KCS device before enable	Andrew Jeffery	1	-3/+4
	The interrupts were configured after the channel was enabled. Configure them beforehand so they will work. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Reviewed-by: Joel Stanley <joel@jms.id.au> Reviewed-by: Haiyue Wang <haiyue.wang@linux.intel.com> Message-Id: <c0aba2c9dfe2d0525e9cefd37995983ead0ec242.1576462051.git-series.andrew@aj.id.au> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2020-04-03	dt-bindings: ipmi: aspeed: Introduce a v2 binding for KCS	Andrew Jeffery	1	-6/+14
	The v2 binding utilises reg and renames some of the v1 properties. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Message-Id: <8aec8994bbe1186d257b0a712e13cf914c5ebe35.1576462051.git-series.andrew@aj.id.au> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2020-04-03	ipmi: fix hung processes in __get_guid()	Wen Yang	1	-2/+2
	The wait_event() function is used to detect command completion. When send_guid_cmd() returns an error, smi_send() has not been called to send data. Therefore, wait_event() should not be used on the error path, otherwise it will cause the following warning: [ 1361.588808] systemd-udevd D 0 1501 1436 0x00000004 [ 1361.588813] ffff883f4b1298c0 0000000000000000 ffff883f4b188000 ffff887f7e3d9f40 [ 1361.677952] ffff887f64bd4280 ffffc90037297a68 ffffffff8173ca3b ffffc90000000010 [ 1361.767077] 00ffc90037297ad0 ffff887f7e3d9f40 0000000000000286 ffff883f4b188000 [ 1361.856199] Call Trace: [ 1361.885578] [<ffffffff8173ca3b>] ? __schedule+0x23b/0x780 [ 1361.951406] [<ffffffff8173cfb6>] schedule+0x36/0x80 [ 1362.010979] [<ffffffffa071f178>] get_guid+0x118/0x150 [ipmi_msghandler] [ 1362.091281] [<ffffffff810d5350>] ? prepare_to_wait_event+0x100/0x100 [ 1362.168533] [<ffffffffa071f755>] ipmi_register_smi+0x405/0x940 [ipmi_msghandler] [ 1362.258337] [<ffffffffa0230ae9>] try_smi_init+0x529/0x950 [ipmi_si] [ 1362.334521] [<ffffffffa022f350>] ? std_irq_setup+0xd0/0xd0 [ipmi_si] [ 1362.411701] [<ffffffffa0232bd2>] init_ipmi_si+0x492/0x9e0 [ipmi_si] [ 1362.487917] [<ffffffffa0232740>] ? ipmi_pci_probe+0x280/0x280 [ipmi_si] [ 1362.568219] [<ffffffff810021a0>] do_one_initcall+0x50/0x180 [ 1362.636109] [<ffffffff812231b2>] ? kmem_cache_alloc_trace+0x142/0x190 [ 1362.714330] [<ffffffff811b2ae1>] do_init_module+0x5f/0x200 [ 1362.781208] [<ffffffff81123ca8>] load_module+0x1898/0x1de0 [ 1362.848069] [<ffffffff811202e0>] ? __symbol_put+0x60/0x60 [ 1362.913886] [<ffffffff8130696b>] ? security_kernel_post_read_file+0x6b/0x80 [ 1362.998514] [<ffffffff81124465>] SYSC_finit_module+0xe5/0x120 [ 1363.068463] [<ffffffff81124465>] ? SYSC_finit_module+0xe5/0x120 [ 1363.140513] [<ffffffff811244be>] SyS_finit_module+0xe/0x10 [ 1363.207364] [<ffffffff81003c04>] do_syscall_64+0x74/0x180 Fixes: 50c812b2b951 ("[PATCH] ipmi: add full sysfs support") Signed-off-by: Wen Yang <wenyang@linux.alibaba.com> Cc: Corey Minyard <minyard@acm.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: openipmi-developer@lists.sourceforge.net Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org # 2.6.17- Message-Id: <20200403090408.58745-1-wenyang@linux.alibaba.com> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2020-03-12	drivers: char: ipmi: ipmi_msghandler: Pass lockdep expression to RCU lists	Amol Grover	1	-4/+10
	intf->cmd_rcvrs is traversed with list_for_each_entry_rcu outside an RCU read-side critical section but under the protection of intf->cmd_rcvrs_mutex. ipmi_interfaces is traversed using list_for_each_entry_rcu outside an RCU read-side critical section but under the protection of ipmi_interfaces_mutex. Hence, add the corresponding lockdep expression to the list traversal primitive to silence false-positive lockdep warnings, and harden RCU lists. Add macro for the corresponding lockdep expression to make the code clean and concise. Signed-off-by: Amol Grover <frextrite@gmail.com> Message-Id: <20200117132521.31020-1-frextrite@gmail.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: John Garry <john.garry@huawei.com> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2020-03-11	ipmi_si: Avoid spurious errors for optional IRQs	Takashi Iwai	1	-2/+2
	Although the IRQ assignment in ipmi_si driver is optional, platform_get_irq() spews error messages unnecessarily: ipmi_si dmi-ipmi-si.0: IRQ index 0 not found Fix this by switching to platform_get_irq_optional(). Cc: stable@vger.kernel.org # 5.4.x Cc: John Donnelly <john.p.donnelly@oracle.com> Fixes: 7723f4c5ecdb ("driver core: platform: Add an error message to platform_get_irq*()") Reported-and-tested-by: Patrick Vo <patrick.vo@hpe.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Message-Id: <20200205093146.1352-1-tiwai@suse.de> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2020-03-11	driver code: clarify and fix platform device DMA mask allocation	Christoph Hellwig	2	-20/+7
	This does three inter-related things to clarify the usage of the platform device dma_mask field. In the process, fix the bug introduced by cdfee5623290 ("driver core: initialize a default DMA mask for platform device") that caused Artem Tashkinov's laptop to not boot with newer Fedora kernels. This does: - First off, rename the field to "platform_dma_mask" to make it greppable. We have way too many different random fields called "dma_mask" in various data structures, where some of them are actual masks, and some of them are just pointers to the mask. And the structures all have pointers to each other, or embed each other inside themselves, and "pdev" sometimes means "platform device" and sometimes it means "PCI device". So to make it clear in the code when you actually use this new field, give it a unique name (it really should be something even more unique like "platform_device_dma_mask", since it's per platform device, not per platform, but that gets old really fast, and this is unique enough in context). To further clarify when the field gets used, initialize it when we actually start using it with the default value. - Then, use this field instead of the random one-off allocation in platform_device_register_full() that is now unnecessary since we now already have a perfectly fine allocation for it in the platform device structure. - The above then allows us to fix the actual bug, where the error path of platform_device_register_full() would unconditionally free the platform device DMA allocation with 'kfree()'. That kfree() was done regardless of whether the allocation had been done earlier with the (now removed) kmalloc, or whether setup_pdev_dma_masks() had already been used and the dma_mask pointer pointed to the mask that was part of the platform device. 
It seems most people never triggered the error path, or only triggered it from a call chain that set an explicit pdevinfo->dma_mask value (and thus caused the unnecessary allocation that was "cleaned up" in the error path) before calling platform_device_register_full(). Robin Murphy points out that in Artem's case the wdat_wdt driver failed in platform_device_add(), and that was the one that had called platform_device_register_full() with pdevinfo.dma_mask = 0, and would have caused that kfree() of pdev.dma_mask corrupting the heap. A later unrelated kmalloc() then oopsed due to the heap corruption. Fixes: cdfee5623290 ("driver core: initialize a default DMA mask for platform device") Reported-bisected-and-tested-by: Artem S. Tashkinov <aros@gmx.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-11	ftrace: Return the first found result in lookup_rec()	Artem Savkov	1	-0/+2
	It appears that ip ranges can overlap. In that case lookup_rec() returns whatever result it got last, even if it found nothing in the last searched page. This breaks an obscure livepatch late module patching usecase: - load livepatch - load the patched module - unload livepatch - try to load livepatch again To fix this, return from lookup_rec() as soon as it finds the record containing the searched-for ip. This is how it worked prior to the introduction of lookup_rec(). Link: http://lkml.kernel.org/r/20200306174317.21699-1-asavkov@redhat.com Cc: stable@vger.kernel.org Fixes: 7e16f581a817 ("ftrace: Separate out functionality from ftrace_location_range()") Signed-off-by: Artem Savkov <asavkov@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
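The early-return idea in this commit can be sketched in plain C. This is a minimal illustration, not the ftrace code: `struct rec`, `struct page`, and `lookup_first()` are hypothetical names, and a linear scan stands in for whatever search the kernel performs within each page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for ftrace records and record pages. */
struct rec {
    unsigned long start;
    unsigned long end;
};

struct page {
    const struct rec *recs;
    size_t nrecs;
};

/* Return the first record whose [start, end] range contains ip, or NULL. */
static const struct rec *lookup_first(const struct page *pages, size_t npages,
                                      unsigned long ip)
{
    assert(pages != NULL || npages == 0);

    for (size_t p = 0; p < npages; p++) {
        for (size_t i = 0; i < pages[p].nrecs; i++) {
            const struct rec *r = &pages[p].recs[i];

            /* Return immediately so a later page that holds no match
             * cannot replace a result that was already found. */
            if (ip >= r->start && ip <= r->end)
                return r;
        }
    }
    return NULL;
}
```

Returning from inside the loop means the result never depends on which page happened to be searched last.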
2020-03-10	workqueue: don't use wq_select_unbound_cpu() for bound works	Hillf Danton	1	-6/+8
	wq_select_unbound_cpu() is designed for unbound workqueues only, but it's wrongly called when using a bound workqueue too. Fixing this ensures work queued to a bound workqueue with cpu=WORK_CPU_UNBOUND always runs on the local CPU. Before, that would happen only if wq_unbound_cpumask happened to include it (likely almost always the case), or was empty, or we got lucky with forced round-robin placement. So restricting /sys/devices/virtual/workqueue/cpumask to a small subset of a machine's CPUs would cause some bound work items to run unexpectedly there. Fixes: ef557180447f ("workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs") Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: Hillf Danton <hdanton@sina.com> [dj: massage changelog] Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Tejun Heo <tj@kernel.org>
2020-03-09	pid: make ENOMEM return value more obvious	Christian Brauner	1	-0/+8
	The alloc_pid() codepath used to be simpler. With the introduction of the ability to choose specific pids in 49cb2fc42ce4 ("fork: extend clone3() to support setting a PID") it got more complex. It hasn't been super obvious that ENOMEM is returned when the pid namespace init process/child subreaper of the pid namespace has died. This can be seen from multiple attempts to improve it, e.g. [1] and most recently [2]. We regressed returning ENOMEM in [3] and [2] restored it. Let's add a comment on top explaining that this is historic and documented behavior and cannot easily be changed. [1]: 35f71bc0a09a ("fork: report pid reservation failure properly") [2]: b26ebfe12f34 ("pid: Fix error return value in some cases") [3]: 49cb2fc42ce4 ("fork: extend clone3() to support setting a PID") Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-09	ktest: Fix typos in ktest.pl	Masanari Iida	1	-6/+6
	This patch fixes multiple spelling typos found in ktest.pl. Link: http://lkml.kernel.org/r/20200309115430.57540-1-standby24x7@gmail.com Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-09	ktest: Add timeout for ssh sync testing	Steven Rostedt (VMware)	1	-1/+1
	Before rebooting the box, a "ssh sync" is called to the test machine to see if it is alive or not. But if the test machine is in a partial state, that ssh may never actually finish, and the ktest test hangs. Add a 10 second timeout to the sync test, which will fail after 10 seconds and then cause the test to reboot the test machine. Cc: stable@vger.kernel.org Fixes: 6474ace999edd ("ktest.pl: Powercycle the box on reboot if no connection can be made") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-09	ktest: Make default build option oldconfig not randconfig	Steven Rostedt (VMware)	2	-2/+2
	For the last time, I screwed up my ktest config file, and the build went into the default "randconfig", blowing away the .config that I had set up. The reason for the default randconfig was because when this was first written, I wanted to do a bunch of randconfigs. But as time progressed, ktest isn't about randconfig anymore, and because randconfig destroys the config in the build directory, it's a dangerous default to have. Use oldconfig as the default. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-09	ktest: Fix some typos in sample.conf	Masanari Iida	1	-10/+10
	This patch fixes some spelling typos in sample.conf. Link: http://lkml.kernel.org/r/20190930124925.20250-1-standby24x7@gmail.com Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-09	pinctrl: qcom: Assign irq_eoi conditionally	Linus Walleij	1	-2/+1
	The hierarchical parts of MSM pinctrl/GPIO are only used when the device tree has a "wakeup-parent" as a phandle, but the .irq_eoi is assigned regardless, leading to semantic problems on older Qualcomm chipsets. When the drivers/mfd/qcom-pm8xxx.c driver calls chained_irq_exit() that call will in turn call chip->irq_eoi() which is set to irq_chip_eoi_parent() by default on a hierarchical IRQ chip, and the parent is pinctrl-msm.c so that will in turn unconditionally call irq_chip_eoi_parent() again, but its parent is invalid so we get the following crash: Unable to handle kernel NULL pointer dereference at virtual address 00000010 pgd = (ptrval) [00000010] *pgd=00000000 Internal error: Oops: 5 [#1] PREEMPT SMP ARM (...) PC is at irq_chip_eoi_parent+0x4/0x10 LR is at pm8xxx_irq_handler+0x1b4/0x2d8 If we solve this crash by avoiding the call up to irq_chip_eoi_parent(), the machine will hang and get reset by the watchdog, because of semantic issues, probably inside irq_chip. As a solution, just assign the .irq_eoi conditionally if we are actually using a wakeup parent. Cc: David Heidelberg <david@ixit.cz> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Lina Iyer <ilina@codeaurora.org> Cc: Stephen Boyd <swboyd@chromium.org> Cc: stable@vger.kernel.org Fixes: e35a6ae0eb3a ("pinctrl/msm: Setup GPIO chip in hierarchy") Link: https://lore.kernel.org/r/20200306121221.1231296-1-linus.walleij@linaro.org Link: https://lore.kernel.org/r/20200309125207.571840-1-linus.walleij@linaro.org Link: https://lore.kernel.org/r/20200309152604.585112-1-linus.walleij@linaro.org Tested-by: David Heidelberg <david@ixit.cz> Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2020-03-09	pinctrl: falcon: fix syntax error	Mathias Kresin	1	-1/+1
	Add the missing semicolon after of_node_put to get the file compiled. Fixes: f17d2f54d36d ("pinctrl: falcon: Add of_node_put() before return") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Mathias Kresin <dev@kresin.me> Link: https://lore.kernel.org/r/20200305182245.9636-1-dev@kresin.me Acked-by: Thomas Langer <thomas.langer@intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2020-03-09	pinctrl: qcom: ssbi-gpio: Fix fwspec parsing bug	Linus Walleij	1	-1/+1
	We are parsing SSBI gpios as fourcell fwspecs but they are twocell. Probably a simple copy-and-paste bug. Tested on the APQ8060 DragonBoard; with this fix, ethernet and MMC card detection work again. Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: stable@vger.kernel.org Reviewed-by: Brian Masney <masneyb@onstation.org> Fixes: ae436fe81053 ("pinctrl: ssbi-gpio: convert to hierarchical IRQ helpers in gpio core") Link: https://lore.kernel.org/r/20200306143416.1476250-1-linus.walleij@linaro.org Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2020-03-09	MAINTAINERS: Correct MIPS patchwork URL	Thomas Bogendoerfer	1	-1/+1
	MIPS patchwork has lived on patchwork.kernel.org for quite some time. Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-03-08	Linux 5.6-rc5	Linus Torvalds	1	-1/+1

2020-03-08	pid: Fix error return value in some cases	Corey Minyard	1	-0/+2
	Recent changes to alloc_pid() allow the pid number to be specified on the command line. If set_tid_size is set, then the code scanning the levels will hard-set retval to -EPERM, overriding its previous -ENOMEM value. After the code scanning the levels, there are error returns that do not set retval, assuming it is still set to -ENOMEM. So set retval back to -ENOMEM after scanning the levels. Fixes: 49cb2fc42ce4 ("fork: extend clone3() to support setting a PID") Signed-off-by: Corey Minyard <cminyard@mvista.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Andrei Vagin <avagin@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Adrian Reber <areber@redhat.com> Cc: <stable@vger.kernel.org> # 5.5 Link: https://lore.kernel.org/r/20200306172314.12232-1-minyard@acm.org [christian.brauner@ubuntu.com: fixup commit message] Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
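The pattern this commit describes, a default error code that gets clobbered and must be restored, can be modelled in a few lines of userspace C. This is only a sketch of the control flow: `alloc_sketch()` and its three boolean parameters are hypothetical and do not mirror the real alloc_pid() signature.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the control flow described in the commit message.
 * The parameters are illustrative only.
 */
static int alloc_sketch(bool tid_requested, bool tid_permitted, bool reaper_alive)
{
    int retval = -ENOMEM;

    if (tid_requested) {
        /* The level scan hard-sets its own default error. */
        retval = -EPERM;
        if (!tid_permitted)
            return retval;
    }

    /* Restore the default so the bare error exits below report -ENOMEM
     * instead of a stale -EPERM left over from the scan. */
    retval = -ENOMEM;

    if (!reaper_alive)
        return retval;

    return 0;
}
```

Without the second assignment, a caller that requested a tid and then hit the dead-reaper exit would see -EPERM rather than -ENOMEM.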
2020-03-08	virtio_balloon: Adjust label in virtballoon_probe	Nathan Chancellor	1	-1/+1
	Clang warns when CONFIG_BALLOON_COMPACTION is unset: ../drivers/virtio/virtio_balloon.c:963:1: warning: unused label 'out_del_vqs' [-Wunused-label] out_del_vqs: ^~~~~~~~~~~~ 1 warning generated. Move the label within the preprocessor block since it is only used when CONFIG_BALLOON_COMPACTION is set. Fixes: 1ad6f58ea936 ("virtio_balloon: Fix memory leaks on errors in virtballoon_probe()") Link: https://github.com/ClangBuiltLinux/linux/issues/886 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Link: https://lore.kernel.org/r/20200216004039.23464-1-natechancellor@gmail.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com>
2020-03-08	virtio-blk: improve virtqueue error to BLK_STS	Halil Pasic	1	-2/+7
	Let's change the mapping between virtqueue_add errors to BLK_STS statuses, so that -ENOSPC, which indicates virtqueue full is still mapped to BLK_STS_DEV_RESOURCE, but -ENOMEM which indicates non-device specific resource outage is mapped to BLK_STS_RESOURCE. Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Link: https://lore.kernel.org/r/20200213123728.61216-3-pasic@linux.ibm.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
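The mapping described here amounts to a small switch statement. The sketch below uses a hypothetical `enum sketch_status` in place of the kernel's BLK_STS values, and the fallback to an I/O error for any other code is an assumption that the commit message does not state.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the kernel's BLK_STS_* status values. */
enum sketch_status {
    STS_IOERR,        /* assumed fallback for every other error */
    STS_RESOURCE,     /* resource shortage not specific to the device */
    STS_DEV_RESOURCE, /* device-specific shortage, here a full virtqueue */
};

/* Map a negative errno from the (modelled) virtqueue add to a status. */
static enum sketch_status map_add_error(int err)
{
    assert(err < 0);

    switch (err) {
    case -ENOSPC:
        return STS_DEV_RESOURCE;
    case -ENOMEM:
        return STS_RESOURCE;
    default:
        return STS_IOERR;
    }
}
```

Keeping the two shortage cases distinct lets the caller tell "the device queue is full" apart from "the system is out of a shared resource".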
2020-03-08	virtio-blk: fix hw_queue stopped on arbitrary error	Halil Pasic	1	-3/+5
	Since nobody else is going to restart our hw_queue for us, the blk_mq_start_stopped_hw_queues() call in virtblk_done() is not necessarily sufficient to ensure that the queue will get started again. In case of a global resource outage (-ENOMEM because of a mapping failure, because swiotlb is full) our virtqueue may be empty and we can get stuck with a stopped hw_queue. Let us not stop the queue on arbitrary errors, but only on -ENOSPC which indicates a full virtqueue, where the hw_queue is guaranteed to get started by virtblk_done() when it makes sense to carry on submitting requests. Let us also remove a stale comment. Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Cc: Jens Axboe <axboe@kernel.dk> Fixes: f7728002c1c7 ("virtio_ring: fix return code on DMA mapping fails") Link: https://lore.kernel.org/r/20200213123728.61216-2-pasic@linux.ibm.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-03-08	virtio_ring: Fix mem leak with vring_new_virtqueue()	Suman Anna	1	-2/+2
	The functions vring_new_virtqueue() and __vring_new_virtqueue() are used with split rings, and any allocations within these functions are managed outside of the .we_own_ring flag. The commit cbeedb72b97a ("virtio_ring: allocate desc state for split ring separately") allocates the desc state within the __vring_new_virtqueue() but frees it only when the .we_own_ring flag is set. This leads to a memory leak when freeing such allocated virtqueues with the vring_del_virtqueue() function. Fix this by moving the desc_state free code outside the flag and only for split rings. Issue was discovered during testing with remoteproc and virtio_rpmsg. Fixes: cbeedb72b97a ("virtio_ring: allocate desc state for split ring separately") Signed-off-by: Suman Anna <s-anna@ti.com> Link: https://lore.kernel.org/r/20200224212643.30672-1-s-anna@ti.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2020-03-07	fscrypt: don't evict dirty inodes after removing key	Eric Biggers	1	-0/+9
	After FS_IOC_REMOVE_ENCRYPTION_KEY removes a key, it syncs the filesystem and tries to get and put all inodes that were unlocked by the key so that unused inodes get evicted via fscrypt_drop_inode(). Normally, the inodes are all clean due to the sync. However, after the filesystem is sync'ed, userspace can modify and close one of the files. (Userspace is supposed to close the files before removing the key. But it doesn't always happen, and the kernel can't assume it.) This causes the inode to be dirtied and have i_count == 0. Then, fscrypt_drop_inode() failed to consider this case and indicated that the inode can be dropped, causing the write to be lost. On f2fs, other problems such as a filesystem freeze could occur due to the inode being freed while still on f2fs's dirty inode list. Fix this bug by making fscrypt_drop_inode() only drop clean inodes. I've written an xfstest which detects this bug on ext4, f2fs, and ubifs. Fixes: b1c0ec3599f4 ("fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl") Cc: <stable@vger.kernel.org> # v5.4+ Link: https://lore.kernel.org/r/20200305084138.653498-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-03-07	io_uring: fix lockup with timeouts	Pavel Begunkov	1	-0/+1
	There is a recipe to deadlock the kernel: submit a timeout sqe with a linked_timeout (e.g. test_single_link_timeout_ception() from liburing), and SIGKILL the process. Then, io_kill_timeouts() takes @ctx->completion_lock, but the timeout isn't flagged with REQ_F_COMP_LOCKED, and will try to double grab it during io_put_free() to cancel the linked timeout. Probably, the same can happen with another io_kill_timeout() call site, that is io_commit_cqring(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-07	MIPS: DTS: CI20: fix interrupt for pcf8563 RTC	H. Nikolaus Schaller	1	-1/+4
	Interrupts should not be specified by interrupt line but by gpio parent and reference. Fixes: 73f2b940474d ("MIPS: CI20: DTS: Add I2C nodes") Cc: stable@vger.kernel.org Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Reviewed-by: Paul Cercueil <paul@crapouillou.net> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-03-07	MIPS: DTS: CI20: fix PMU definitions for ACT8600	H. Nikolaus Schaller	1	-15/+24
	There is an ACT8600 on the CI20 board and the bindings of the ACT8865 driver have changed without updating the CI20 device tree. Therefore the PMU cannot be probed successfully and is running in power-on reset state. Fix DT to match the latest act8865-regulator bindings. Fixes: 73f2b940474d ("MIPS: CI20: DTS: Add I2C nodes") Cc: stable@vger.kernel.org Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Reviewed-by: Paul Cercueil <paul@crapouillou.net> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-03-06	parse-maintainers: Mark as executable	Jonathan Neuschäfer	1	-0/+0
	This makes the script more convenient to run. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-06	auxdisplay: charlcd: replace zero-length array with flexible-array member	Gustavo A. R. Silva	1	-1/+1
	The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
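A short C99 example shows how a structure ending in a flexible array member is usually allocated. It reuses the `struct foo` / `struct boo` naming from the commit message; the field names and the `foo_alloc()` helper are hypothetical additions for illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct boo {
    int value;
};

struct foo {
    int count;
    struct boo array[]; /* flexible array member: must be the last field */
};

/* Allocate a struct foo with room for count trailing elements. */
static struct foo *foo_alloc(int count)
{
    assert(count >= 0);

    /* sizeof(*f) covers only the fixed part; the trailing elements are
     * added explicitly. */
    struct foo *f = malloc(sizeof(*f) + (size_t)count * sizeof(f->array[0]));
    if (f == NULL)
        return NULL;

    f->count = count;
    for (int i = 0; i < count; i++)
        f->array[i].value = i;
    return f;
}
```

A caller releases the whole object with a single `free(f)`, since the header and the array live in one allocation.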
2020-03-06	auxdisplay: img-ascii-lcd: convert to devm_platform_ioremap_resource	Yangtao Li	1	-3/+1
	Use devm_platform_ioremap_resource() to simplify code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
2020-03-06	auxdisplay: Fix Kconfig indentation	Krzysztof Kozlowski	1	-8/+8
	Adjust indentation from spaces to tab (+optional two spaces) as in coding style with command like: $ sed -e 's/^ /\t/' -i */Kconfig Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
2020-03-06	clang-format: Update with the latest for_each macro list	Miguel Ojeda	1	-4/+21
	Re-run the shell fragment that generated the original list. Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
2020-03-06	vgacon: Fix a UAF in vgacon_invert_region	Zhang Xiaoxu	1	-0/+3
	When syzkaller tests, there is a UAF: BUG: KASan: use after free in vgacon_invert_region+0x9d/0x110 at addr ffff880000100000 Read of size 2 by task syz-executor.1/16489 page:ffffea0000004000 count:0 mapcount:-127 mapping: (null) index:0x0 page flags: 0xfffff00000000() page dumped because: kasan: bad access detected CPU: 1 PID: 16489 Comm: syz-executor.1 Not tainted Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014 Call Trace: [<ffffffffb119f309>] dump_stack+0x1e/0x20 [<ffffffffb04af957>] kasan_report+0x577/0x950 [<ffffffffb04ae652>] __asan_load2+0x62/0x80 [<ffffffffb090f26d>] vgacon_invert_region+0x9d/0x110 [<ffffffffb0a39d95>] invert_screen+0xe5/0x470 [<ffffffffb0a21dcb>] set_selection+0x44b/0x12f0 [<ffffffffb0a3bfae>] tioclinux+0xee/0x490 [<ffffffffb0a1d114>] vt_ioctl+0xff4/0x2670 [<ffffffffb0a0089a>] tty_ioctl+0x46a/0x1a10 [<ffffffffb052db3d>] do_vfs_ioctl+0x5bd/0xc40 [<ffffffffb052e2f2>] SyS_ioctl+0x132/0x170 [<ffffffffb11c9b1b>] system_call_fastpath+0x22/0x27 Memory state around the buggy address: ffff8800000fff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8800000fff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff880000100000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff It can be reproduced in mainline Linux with this program: #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <fcntl.h> #include <sys/types.h> #include <sys/stat.h> #include <sys/ioctl.h> #include <linux/vt.h> struct tiocl_selection { unsigned short xs; /* X start */ unsigned short ys; /* Y start */ unsigned short xe; /* X end */ unsigned short ye; /* Y end */ unsigned short sel_mode; /* selection mode */ }; #define TIOCL_SETSEL 2 struct tiocl { unsigned char type; unsigned char pad; struct tiocl_selection sel; }; int main() { int fd = 0; const char *dev = "/dev/char/4:1"; struct vt_consize v = {0}; struct tiocl tioc = {0}; fd = open(dev, O_RDWR, 0); v.v_rows = 3346; ioctl(fd, VT_RESIZEX, &v); tioc.type = TIOCL_SETSEL; ioctl(fd, TIOCLINUX, &tioc); return 0; } When the screen is resized, 'vc->vc_size_row' is updated to the new_row_size, but in 'set_origin' in 'vgacon_set_origin', vgacon uses 'vga_vram_base' for 'vc_origin' and 'vc_visible_origin', not 'vc_screenbuf'. The VGA memory may be smaller than 'vc_screenbuf'. On TIOCLINUX, the new_row_size is used to calculate the offset, which may be larger than the vga_vram_size in the vgacon driver, resulting in a bad access. Also, if a larger screenbuf is set first and then an even larger one, a bad access may happen when old_origin is copied to new_origin. So, if the screen size is larger than vga_vram, resizing the screen should fail. This also fixes CVE-2020-8649 and CVE-2020-8647. Linus pointed out that overflow checking seems absent. We're saved by the existing bounds checks in vc_do_resize() with rather strict limits: if (cols > VC_RESIZE_MAXCOL || lines > VC_RESIZE_MAXROW) return -EINVAL; Fixes: 0aec4867dca14 ("[PATCH] SVGATextMode fix") Reference: CVE-2020-8647 and CVE-2020-8649 Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> [danvet: augment commit message to point out overflow safety] Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20200304022429.37738-1-zhangxiaoxu5@huawei.com
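The rule stated in this commit, that a resize must fail when the requested screen does not fit in VGA memory, can be sketched as a standalone check. Everything named here is hypothetical: `resize_check()`, the `SKETCH_MAX_*` limits, and the assumption of two bytes per text cell are illustrative and not taken from the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical limits standing in for VC_RESIZE_MAXCOL / VC_RESIZE_MAXROW. */
#define SKETCH_MAX_COLS 32767u
#define SKETCH_MAX_ROWS 32767u

/* Assumption: each text cell takes two bytes (character plus attribute). */
#define SKETCH_BYTES_PER_CELL 2u

/* Return 0 if a cols x rows text screen fits in vram_size bytes. */
static int resize_check(unsigned int cols, unsigned int rows, size_t vram_size)
{
    assert(vram_size > 0);

    /* Strict per-dimension limits keep the product below from overflowing. */
    if (cols > SKETCH_MAX_COLS || rows > SKETCH_MAX_ROWS)
        return -EINVAL;

    if ((size_t)cols * SKETCH_BYTES_PER_CELL * rows > vram_size)
        return -EINVAL;

    return 0;
}
```

With the reproducer's 3346 rows and a hypothetical 32 KiB of video memory, even an 80-column screen needs far more than the available space, so the check rejects the resize.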
2020-03-06	dt-bindings: arm: Fixup the DT bindings for hierarchical PSCI states	Ulf Hansson	1	-15/+13
	The hierarchical topology with power-domain should be described through child nodes, rather than as currently described in the PSCI root node. Fix this by adding a patternProperties with a corresponding reference to the power-domain DT binding. Additionally, update the example to conform to the new pattern, but also to the adjusted domain-idle-state DT binding. Fixes: a3f048b5424e ("dt: psci: Update DT bindings to support hierarchical PSCI states") Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> [robh: Add missing allOf, tweak power-domain node name] Signed-off-by: Rob Herring <robh@kernel.org>
2020-03-06	dt-bindings: power: Extend nodename pattern for power-domain providers	Ulf Hansson	1	-1/+1
	The existing binding requires the nodename to have a '@', which is a bit limiting for the wider use case. Therefore, let's extend the pattern to allow either '@' or '-'. Fixes: a3f048b5424e ("dt: psci: Update DT bindings to support hierarchical PSCI states") Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> [robh: drop example change] Signed-off-by: Rob Herring <robh@kernel.org>
2020-03-06	io_uring: free fixed_file_data after RCU grace period	Jens Axboe	1	-2/+22
	The percpu refcount protects this structure, and we can have an atomic switch in progress when exiting. This makes it unsafe to just free the struct normally, and can trigger the following KASAN warning: BUG: KASAN: use-after-free in percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0 Read of size 1 at addr ffff888181a19a30 by task swapper/0/0 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.0-rc4+ #5747 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 Call Trace: <IRQ> dump_stack+0x76/0xa0 print_address_description.constprop.0+0x3b/0x60 ? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0 ? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0 __kasan_report.cold+0x1a/0x3d ? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0 percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0 rcu_core+0x370/0x830 ? percpu_ref_exit+0x50/0x50 ? rcu_note_context_switch+0x7b0/0x7b0 ? run_rebalance_domains+0x11d/0x140 __do_softirq+0x10a/0x3e9 irq_exit+0xd5/0xe0 smp_apic_timer_interrupt+0x86/0x200 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:default_idle+0x26/0x1f0 Fix this by punting the final exit and free of the struct to RCU, then we know that it's safe to do so. Jann suggested the approach of using a double rcu callback to achieve this. It's important that we do a nested call_rcu() callback, as otherwise the free could be ordered before the atomic switch, even if the latter was already queued. Reported-by: syzbot+e017e49c39ab484ac87a@syzkaller.appspotmail.com Suggested-by: Jann Horn <jannh@google.com> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-06	locks: fix a potential use-after-free problem when wakeup a waiter	yangerkun	1	-14/+0
	'16306a61d3b7 ("fs/locks: always delete_block after waiting.")' added logic to check waiter->fl_blocker without holding blocked_lock_lock. This can trigger a UAF when we try to wake up a waiter: Thread 1 has created a write flock a on a file; now thread 2 tries to unlock and delete flock a, while thread 3 tries to add flock b on the same file. Thread2 Thread3 flock syscall(create flock b) ...flock_lock_inode_wait flock_lock_inode(will insert our fl_blocked_member list to flock a's fl_blocked_requests) sleep flock syscall(unlock) ...flock_lock_inode_wait locks_delete_lock_ctx ...__locks_wake_up_blocks __locks_delete_blocks( b->fl_blocker = NULL) ... break by a signal locks_delete_block b->fl_blocker == NULL && list_empty(&b->fl_blocked_requests) success, return directly locks_free_lock b wake_up(&b->fl_waiter) trigger UAF Fix it by removing this logic. This patch may also fix CVE-2019-19769. Cc: stable@vger.kernel.org Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.") Signed-off-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2020-03-06	block, bfq: fix overwrite of bfq_group pointer in bfq_find_set_group()	Carlo Nonato	1	-4/+5
	The bfq_find_set_group() function takes as input a blkcg (which represents a cgroup) and retrieves the corresponding bfq_group, then it updates the bfq internal group hierarchy (see comments inside the function for why this is needed) and finally it returns the bfq_group. In the hierarchy update cycle, the pointer holding the correct bfq_group that has to be returned is mistakenly used to traverse the hierarchy bottom to top, meaning that in each iteration it gets overwritten with the parent of the current group. Since the update cycle stops at root's children (depth = 2), the overwrite becomes a problem only if the blkcg describes a cgroup at a hierarchy level deeper than that (depth > 2). In this case the root's child that happens to be also an ancestor of the correct bfq_group is returned. The main consequence is that processes contained in a cgroup at depth greater than 2 are wrongly placed by BFQ in the group described above. This commit fixes this problem by using a different bfq_group pointer in the update cycle in order to avoid the overwrite of the variable holding the original group reference. Reported-by: Kwon Je Oh <kwonje.oh2@gmail.com> Signed-off-by: Carlo Nonato <carlo.nonato95@gmail.com> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
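The overwrite described above can be illustrated with a small userspace sketch. It is not the BFQ code: `struct group`, `find_set_group_buggy()` and `find_set_group_fixed()` are hypothetical names, and the walk is reduced to a bare parent-pointer traversal that stops at a child of the root.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup-like node; not the real struct bfq_group. */
struct group {
    struct group *parent;   /* NULL for the root */
    int linked;             /* set when the update walk passes through this node */
};

/* Buggy shape: the variable that is returned doubles as the walk cursor. */
static struct group *find_set_group_buggy(struct group *leaf)
{
    struct group *g = leaf;

    assert(leaf != NULL);
    /* Walk bottom to top and stop once g is a child of the root. */
    while (g->parent && g->parent->parent) {
        g->linked = 1;
        g = g->parent;      /* overwrites the value we meant to return */
    }
    return g;               /* an ancestor of leaf when leaf is deeper than depth 2 */
}

/* Fixed shape: a separate cursor walks the hierarchy, leaf stays intact. */
static struct group *find_set_group_fixed(struct group *leaf)
{
    struct group *cursor = leaf;

    assert(leaf != NULL);
    while (cursor->parent && cursor->parent->parent) {
        cursor->linked = 1;
        cursor = cursor->parent;
    }
    return leaf;
}
```

With a three-level chain (root, child, grandchild), the buggy variant returns the child for the grandchild, while the fixed variant returns the grandchild itself.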
2020-03-06	tty: serial: fsl_lpuart: free IDs allocated by IDA	Michael Walle	1	-15/+24
	Since commit 3bc3206e1c0f ("serial: fsl_lpuart: Remove the alias node dependence") the port line number can also be allocated by IDA, but in case of an error the ID will not be removed again. More importantly, any ID will be freed in remove(), even if it wasn't allocated but instead fetched by of_alias_get_id(). If it was not allocated by IDA there will be a warning: WARN(1, "ida_free called for id=%d which is not allocated.\n", id); Move the ID allocation more to the end of the probe() so that we still can use plain return in the first error cases. Fixes: 3bc3206e1c0f ("serial: fsl_lpuart: Remove the alias node dependence") Signed-off-by: Michael Walle <michael@walle.cc> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200303174306.6015-3-michael@walle.cc Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
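One way to avoid freeing an ID that was never allocated is to record where the line number came from. The sketch below is a userspace model under stated assumptions: the tiny bitmap allocator stands in for IDA, and `struct port`, `line_from_allocator`, `port_probe()` and `port_remove()` are hypothetical names, not the driver's actual fields or functions.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_IDS 8

/* Toy allocator standing in for IDA; purely illustrative. */
static bool id_in_use[MAX_IDS];

static int id_alloc(void)
{
    for (int i = 0; i < MAX_IDS; i++) {
        if (!id_in_use[i]) {
            id_in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Returns -1 when asked to free an ID that is not allocated. */
static int id_free(int id)
{
    assert(id >= 0 && id < MAX_IDS);
    if (!id_in_use[id])
        return -1;
    id_in_use[id] = false;
    return 0;
}

struct port {
    int line;
    bool line_from_allocator;   /* hypothetical ownership flag */
};

/* alias_id >= 0 models a line number fetched from an alias. */
static int port_probe(struct port *p, int alias_id)
{
    if (alias_id >= 0) {
        p->line = alias_id;
        p->line_from_allocator = false;
        return 0;
    }
    p->line = id_alloc();
    if (p->line < 0)
        return -1;
    p->line_from_allocator = true;
    return 0;
}

/* Only give back what the allocator handed out. */
static void port_remove(struct port *p)
{
    if (p->line_from_allocator)
        id_free(p->line);
}
```

Doing the allocation as the last step of the probe path, as the commit message describes, keeps the earlier error paths free of cleanup for the ID.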
2020-03-06	Revert "tty: serial: fsl_lpuart: drop EARLYCON_DECLARE"	Michael Walle	1	-0/+2
	This reverts commit a659652f6169240a5818cb244b280c5a362ef5a4. This broke the earlycon on LS1021A processors because the order of the earlycon_setup() functions was changed. Before the commit the normal lpuart32_early_console_setup() was called. After the commit the lpuart32_imx_early_console_setup() is called instead. Fixes: a659652f6169 ("tty: serial: fsl_lpuart: drop EARLYCON_DECLARE") Signed-off-by: Michael Walle <michael@walle.cc> Link: https://lore.kernel.org/r/20200303174306.6015-2-michael@walle.cc Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-06	serdev: Fix detection of UART devices on Apple machines.	Ronald Tschalär	1	-0/+10
	On Apple devices the _CRS method returns an empty resource template, and the resource settings are instead provided by the _DSM method. But commit 33364d63c75d6182fa369cea80315cf1bb0ee38e (serdev: Add ACPI devices by ResourceSource field) changed the search for serdev devices to require valid, non-empty resource template, thereby breaking Apple devices and causing bluetooth devices to not be found. This expands the check so that if we don't find a valid template, and we're on an Apple machine, then just check for the device being an immediate child of the controller and having a "baud" property. Cc: <stable@vger.kernel.org> # 5.5 Fixes: 33364d63c75d ("serdev: Add ACPI devices by ResourceSource field") Signed-off-by: Ronald Tschalär <ronald@innovation.ch> Link: https://lore.kernel.org/r/20200211194723.486217-1-ronald@innovation.ch Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-06	arch/Kconfig: update HAVE_RELIABLE_STACKTRACE description	Miroslav Benes	1	-2/+3
	save_stack_trace_tsk_reliable() is no longer the only function providing reliable stack traces. An architecture might define ARCH_STACKWALK, which provides a newer stack walking interface and has the arch_stack_walk_reliable() function. Update the description accordingly. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: http://lkml.kernel.org/r/20200120154042.9934-1-mbenes@suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-06	mm, hotplug: fix page online with DEBUG_PAGEALLOC compiled but not enabled	Vlastimil Babka	2	-1/+11
	Commit cd02cf1aceea ("mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC") fixed memory hotplug with debug_pagealloc enabled, where onlining a page goes through page freeing, which removes the direct mapping. Some arches don't like when the page is not mapped in the first place, so generic_online_page() maps it first. This is somewhat wasteful, but better than special casing page freeing fast paths. The commit however missed that DEBUG_PAGEALLOC configured doesn't mean it's actually enabled. One has to test debug_pagealloc_enabled() since 031bc5743f15 ("mm/debug-pagealloc: make debug-pagealloc boottime configurable"), or alternatively debug_pagealloc_enabled_static() since 8e57f8acbbd1 ("mm, debug_pagealloc: don't rely on static keys too early"), but this is not done. As a result, a s390 kernel with DEBUG_PAGEALLOC configured but not enabled will crash: Unable to handle kernel pointer dereference in virtual kernel address space Failing address: 0000000000000000 TEID: 0000000000000483 Fault in home space mode while using kernel ASCE. 
AS:0000001ece13400b R2:000003fff7fd000b R3:000003fff7fcc007 S:000003fff7fd7000 P:000000000000013d Oops: 0004 ilc:2 [#1] SMP CPU: 1 PID: 26015 Comm: chmem Kdump: loaded Tainted: GX 5.3.18-5-default #1 SLE15-SP2 (unreleased) Krnl PSW : 0704e00180000000 0000001ecd281b9e (__kernel_map_pages+0x166/0x188) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 Krnl GPRS: 0000000000000000 0000000000000800 0000400b00000000 0000000000000100 0000000000000001 0000000000000000 0000000000000002 0000000000000100 0000001ece139230 0000001ecdd98d40 0000400b00000100 0000000000000000 000003ffa17e4000 001fffe0114f7d08 0000001ecd4d93ea 001fffe0114f7b20 Krnl Code: 0000001ecd281b8e: ec17ffff00d8 ahik %r1,%r7,-1 0000001ecd281b94: ec111dbc0355 risbg %r1,%r1,29,188,3 >0000001ecd281b9e: 94fb5006 ni 6(%r5),251 0000001ecd281ba2: 41505008 la %r5,8(%r5) 0000001ecd281ba6: ec51fffc6064 cgrj %r5,%r1,6,1ecd281b9e 0000001ecd281bac: 1a07 ar %r0,%r7 0000001ecd281bae: ec03ff584076 crj %r0,%r3,4,1ecd281a5e Call Trace: [<0000001ecd281b9e>] __kernel_map_pages+0x166/0x188 [<0000001ecd4d9516>] online_pages_range+0xf6/0x128 [<0000001ecd2a8186>] walk_system_ram_range+0x7e/0xd8 [<0000001ecda28aae>] online_pages+0x2fe/0x3f0 [<0000001ecd7d02a6>] memory_subsys_online+0x8e/0xc0 [<0000001ecd7add42>] device_online+0x5a/0xc8 [<0000001ecd7d0430>] state_store+0x88/0x118 [<0000001ecd5b9f62>] kernfs_fop_write+0xc2/0x200 [<0000001ecd5064b6>] vfs_write+0x176/0x1e0 [<0000001ecd50676a>] ksys_write+0xa2/0x100 [<0000001ecda315d4>] system_call+0xd8/0x2c8 Fix this by checking debug_pagealloc_enabled_static() before calling kernel_map_pages(). Backports for kernel before 5.5 should use debug_pagealloc_enabled() instead. Also add comments. 
Fixes: cd02cf1aceea ("mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC") Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: <stable@vger.kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Qian Cai <cai@lca.pw> Link: http://lkml.kernel.org/r/20200224094651.18257-1-vbabka@suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
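The distinction between "configured" and "enabled" can be shown with a minimal userspace sketch. All names here are assumptions made for illustration: `FEATURE_COMPILED_IN` stands in for a Kconfig option, `feature_enabled` for a boot-time switch, and `map_page()` for the mapping call that must only run when the feature is really active.

```c
#include <assert.h>
#include <stdbool.h>

#define FEATURE_COMPILED_IN 1   /* hypothetical stand-in for a Kconfig option */

static bool feature_enabled;    /* stand-in for a boot-time switch; off by default */
static int map_calls;           /* counts how often the mapping path ran */

static void map_page(void)
{
    map_calls++;
}

/* Buggy shape: only the build-time option is consulted. */
static void online_page_buggy(void)
{
#if FEATURE_COMPILED_IN
    map_page();
#endif
}

/* Fixed shape: the runtime state is checked before mapping. */
static void online_page_fixed(void)
{
    if (FEATURE_COMPILED_IN && feature_enabled)
        map_page();
}
```

In this model the buggy variant maps the page whenever the option is built in, which corresponds to the crashing case of a kernel that has the feature configured but not enabled.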
2020-03-06	mm/z3fold.c: do not include rwlock.h directly	Sebastian Andrzej Siewior	1	-1/+0
	rwlock.h should not be included directly. Instead linux/spinlock.h should be included. Among other things, the direct include breaks the RT build. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20200224133631.1510569-1-bigeasy@linutronix.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-06	fat: fix uninit-memory access for partial initialized inode	OGAWA Hirofumi	1	-12/+7
	When we get an error in the middle of reading an inode, some fields in the inode might still be uninitialized. The evict_inode path may then access those fields via iput(). To fix this, make sure that the inode fields are initialized. Reported-by: syzbot+9d82b8de2992579da5d0@syzkaller.appspotmail.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/871rqnreqx.fsf@mail.parknet.co.jp Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-06	mm: avoid data corruption on CoW fault into PFN-mapped VMA	Kirill A. Shutemov	1	-8/+27
	Jeff Moyer has reported that one of the xfstests triggers a warning when run on a DAX-enabled filesystem: WARNING: CPU: 76 PID: 51024 at mm/memory.c:2317 wp_page_copy+0xc40/0xd50 ... wp_page_copy+0x98c/0xd50 (unreliable) do_wp_page+0xd8/0xad0 __handle_mm_fault+0x748/0x1b90 handle_mm_fault+0x120/0x1f0 __do_page_fault+0x240/0xd70 do_page_fault+0x38/0xd0 handle_page_fault+0x10/0x30 The warning happens on a failed __copy_from_user_inatomic() which tries to copy data into a CoW page. This happens because of a race between MADV_DONTNEED and a CoW page fault: CPU0 CPU1 handle_mm_fault() do_wp_page() wp_page_copy() do_wp_page() madvise(MADV_DONTNEED) zap_page_range() zap_pte_range() ptep_get_and_clear_full() <TLB flush> __copy_from_user_inatomic() sees empty PTE and fails WARN_ON_ONCE(1) clear_page() The solution is to retry __copy_from_user_inatomic() under PTL after checking that the PTE matches the orig_pte. The second copy attempt can still fail, for example due to a non-readable PTE, but there's nothing reasonable we can do about it, except clearing the CoW page. Reported-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Jeff Moyer <jmoyer@redhat.com> Cc: <stable@vger.kernel.org> Cc: Justin He <Justin.He@arm.com> Cc: Dan Williams <dan.j.williams@intel.com> Link: http://lkml.kernel.org/r/20200218154151.13349-1-kirill.shutemov@linux.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-06	mm: fix possible PMD dirty bit lost in set_pmd_migration_entry()	Huang Ying	1	-2/+1
	In set_pmd_migration_entry(), pmdp_invalidate() is used to change the PMD atomically. But the PMD is read before that with an ordinary memory read. If the THP (transparent huge page) is written between the PMD read and pmdp_invalidate(), the PMD dirty bit may be lost, causing data corruption. The race window is quite small, but still possible in theory, so it needs to be fixed. The race is fixed by using the return value of pmdp_invalidate() to get the original content of the PMD, which is an atomic read/modify/write operation, so no THP write can occur in between. The race was introduced when THP migration support was added in commit 616b8371539a ("mm: thp: enable thp migration in generic path"). But this fix depends on commit d52605d7cb30 ("mm: do not lose dirty and accessed bits in pmdp_invalidate()"), so it is easy to backport to kernels after v4.16. But the race window is really small, so it may be fine not to backport the fix at all. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Link: http://lkml.kernel.org/r/20200220075220.2327056-1-ying.huang@intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-06	mm, numa: fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot_numa	Mel Gorman	1	-2/+36
	: A user reported a bug against a distribution kernel while running a : proprietary workload described as "memory intensive that is not swapping" : that is expected to apply to mainline kernels. The workload is : read/write/modifying ranges of memory and checking the contents. They : reported that within a few hours a bad PMD would be reported followed : by a memory corruption where expected data was all zeros. A partial : report of the bad PMD looked like : : [ 5195.338482] ../mm/pgtable-generic.c:33: bad pmd ffff8888157ba008(000002e0396009e2) : [ 5195.341184] ------------[ cut here ]------------ : [ 5195.356880] kernel BUG at ../mm/pgtable-generic.c:35! : .... : [ 5195.410033] Call Trace: : [ 5195.410471] [<ffffffff811bc75d>] change_protection_range+0x7dd/0x930 : [ 5195.410716] [<ffffffff811d4be8>] change_prot_numa+0x18/0x30 : [ 5195.410918] [<ffffffff810adefe>] task_numa_work+0x1fe/0x310 : [ 5195.411200] [<ffffffff81098322>] task_work_run+0x72/0x90 : [ 5195.411246] [<ffffffff81077139>] exit_to_usermode_loop+0x91/0xc2 : [ 5195.411494] [<ffffffff81003a51>] prepare_exit_to_usermode+0x31/0x40 : [ 5195.411739] [<ffffffff815e56af>] retint_user+0x8/0x10 : : Decoding revealed that the PMD was a valid prot_numa PMD and the bad PMD : was a false detection. The bug does not trigger if automatic NUMA : balancing or transparent huge pages is disabled. : : The bug is due to a race in change_pmd_range between a pmd_trans_huge and : pmd_none_or_clear_bad check without any locks held. During the : pmd_trans_huge check, a parallel protection update under lock can have : cleared the PMD and filled it with a prot_numa entry between the transhuge : check and the pmd_none_or_clear_bad check. : : While this could be fixed with heavy locking, it's only necessary to make : a copy of the PMD on the stack during change_pmd_range and avoid races. A : new helper is created for this as the check is quite subtle and the : existing similar helper is not suitable. 
This passed 154 hours of : testing (usually triggers between 20 minutes and 24 hours) without : detecting bad PMDs or corruption. A basic test of an autonuma-intensive : workload showed no significant change in behaviour. Although Mel withdrew the patch in the face of the LKML comment https://lkml.org/lkml/2017/4/10/922 the aforementioned race window is still open, and we have reports of the Linpack test reporting bad residuals after the bad PMD warning is observed. In addition to that, bad rss-counter and non-zero pgtables assertions are triggered on mm teardown for the task hitting the bad PMD. host kernel: mm/pgtable-generic.c:40: bad pmd 00000000b3152f68(8000000d2d2008e7) .... host kernel: BUG: Bad rss-counter state mm:00000000b583043d idx:1 val:512 host kernel: BUG: non-zero pgtables_bytes on freeing mm: 4096 The issue is observed on a v4.18-based distribution kernel, but the race window is expected to be applicable to mainline kernels as well. [akpm@linux-foundation.org: fix comment typo, per Rafael] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rafael Aquini <aquini@redhat.com> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Cc: <stable@vger.kernel.org> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.com> Link: http://lkml.kernel.org/r/20200216191800.22423-1-aquini@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-06	ALSA: sgio2audio: Remove usage of dropped hw_params/hw_free functions	Thomas Bogendoerfer	1	-6/+0
	Commit ee88f4ebe575 ("ALSA: mips: Use managed buffer allocation") removed superfluous hw_params/hw_free callbacks, but forgot to remove them where they were used. Fixes: ee88f4ebe575 ("ALSA: mips: Use managed buffer allocation") Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Link: https://lore.kernel.org/r/20200306105837.31523-1-tsbogend@alpha.franken.de Signed-off-by: Takashi Iwai <tiwai@suse.de>