linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2019-08-19	selftests: use "$(MAKE)" instead of "make"	Ilya Leoshkevich	1	-11/+11
	When doing "make kselftest TARGETS=bpf -j12", bpf progs end up being compiled sequentially and thus slowly. The reason is that parent make (tools/testing/selftests/Makefile) does not share its jobserver with child make (tools/testing/selftests/bpf/Makefile), therefore the latter runs with -j1. Change all instances of "make" to "$(MAKE)", so that the whole make hierarchy runs using a single jobserver. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2019-08-18	Linux 5.3-rc5	Linus Torvalds	1	-1/+1

2019-08-17	MAINTAINERS: Fix Hyperv vIOMMU driver file name	Lan Tianyu	1	-1/+1
	The Hyperv vIOMMU file name should be "hyperv-iommu.c" rather than "hyperv_iommu.c". This patch is to fix it. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-17	tools: hv: Use the correct style for SPDX License Identifier	Nishad Kamdar	1	-1/+1
	This patch corrects the SPDX License Identifier style in the trace header file related to Microsoft Hyper-V client drivers. For C header files Documentation/process/license-rules.rst mandates C-like comments (opposed to C source files where C++ style should be used) Changes made by using a script provided by Joe Perches here: https://lkml.org/lkml/2019/2/7/46 Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-17	tools: hv: fix typos in toolchain	Adrian Vladu	4	-6/+6
	Fix typos in the HyperV toolchain. Signed-off-by: Adrian Vladu <avladu@cloudbasesolutions.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sasha Levin <sashal@kernel.org> Cc: Alessandro Pilotti <apilotti@cloudbasesolutions.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-17	tools: hv: fix KVP and VSS daemons exit code	Adrian Vladu	2	-0/+4
	HyperV KVP and VSS daemons should exit with 0 when the '--help' or '-h' flags are used. Signed-off-by: Adrian Vladu <avladu@cloudbasesolutions.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sasha Levin <sashal@kernel.org> Cc: Alessandro Pilotti <apilotti@cloudbasesolutions.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-17	tools: hv: fixed Python pep8/flake8 warnings for lsvmbus	Adrian Vladu	1	-33/+42
	Fixed pep8/flake8 python style code for lsvmbus tool. The TAB indentation was on purpose ignored (pep8 rule W191) to make sure the code is complying with the Linux code guideline. The following command doe not show any warnings now: pep8 --ignore=W191 lsvmbus flake8 --ignore=W191 lsvmbus Signed-off-by: Adrian Vladu <avladu@cloudbasesolutions.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sasha Levin <sashal@kernel.org> Cc: Dexuan Cui <decui@microsoft.com> Cc: Alessandro Pilotti <apilotti@cloudbasesolutions.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-16	arm64: ftrace: Ensure module ftrace trampoline is coherent with I-side	Will Deacon	1	-9/+13
	The initial support for dynamic ftrace trampolines in modules made use of an indirect branch which loaded its target from the beginning of a special section (e71a4e1bebaf7 ("arm64: ftrace: add support for far branches to dynamic ftrace")). Since no instructions were being patched, no cache maintenance was needed. However, later in be0f272bfc83 ("arm64: ftrace: emit ftrace-mod.o contents through code") this code was reworked to output the trampoline instructions directly into the PLT entry but, unfortunately, the necessary cache maintenance was overlooked. Add a call to __flush_icache_range() after writing the new trampoline instructions but before patching in the branch to the trampoline. Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: James Morse <james.morse@arm.com> Cc: <stable@vger.kernel.org> Fixes: be0f272bfc83 ("arm64: ftrace: emit ftrace-mod.o contents through code") Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-08-16	x86/boot: Save fields explicitly, zero out everything else	John Hubbard	1	-15/+48
	Recent gcc compilers (gcc 9.1) generate warnings about an out of bounds memset, if the memset goes accross several fields of a struct. This generated a couple of warnings on x86_64 builds in sanitize_boot_params(). Fix this by explicitly saving the fields in struct boot_params that are intended to be preserved, and zeroing all the rest. [ tglx: Tagged for stable as it breaks the warning free build there as well ] Suggested-by: Thomas Gleixner <tglx@linutronix.de> Suggested-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190731054627.5627-2-jhubbard@nvidia.com
2019-08-15	ALSA: usb-audio: Fix a stack buffer overflow bug in check_input_term	Hui Peng	1	-8/+27
	`check_input_term` recursively calls itself with input from device side (e.g., uac_input_terminal_descriptor.bCSourceID) as argument (id). In `check_input_term`, if `check_input_term` is called with the same `id` argument as the caller, it triggers endless recursive call, resulting kernel space stack overflow. This patch fixes the bug by adding a bitmap to `struct mixer_build` to keep track of the checked ids and stop the execution if some id has been checked (similar to how parse_audio_unit handles unitid argument). Reported-by: Hui Peng <benquike@gmail.com> Reported-by: Mathias Payer <mathias.payer@nebelwelt.net> Signed-off-by: Hui Peng <benquike@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-08-15	io_uring: fix an issue when IOSQE_IO_LINK is inserted into defer list	Jackie Liu	1	-7/+9
	This patch may fix two issues: First, when IOSQE_IO_DRAIN set, the next IOs need to be inserted into defer list to delay execution, but link io will be actively scheduled to run by calling io_queue_sqe. Second, when multiple LINK_IOs are inserted together with defer_list, the LINK_IO is no longer keep order. \|-------------\| \| LINK_IO \| ----> insert to defer_list ----------- \|-------------\| \| \| LINK_IO \| ----> insert to defer_list ----------\| \|-------------\| \| \| LINK_IO \| ----> insert to defer_list ----------\| \|-------------\| \| \| NORMAL_IO \| ----> insert to defer_list ----------\| \|-------------\| \| \| queue_work at same time <-----\| Fixes: 9e645e1105c ("io_uring: add support for sqe links") Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-15	block: remove REQ_NOWAIT_INLINE	Jens Axboe	3	-54/+8
	We had a few issues with this code, and there's still a problem around how we deal with error handling for chained/split bios. For now, just revert the code and we'll try again with a thoroug solution. This reverts commits: e15c2ffa1091 ("block: fix O_DIRECT error handling for bio fragments") 0eb6ddfb865c ("block: Fix __blkdev_direct_IO() for bio fragments") 6a43074e2f46 ("block: properly handle IOCB_NOWAIT for async O_DIRECT IO") 893a1c97205a ("blk-mq: allow REQ_NOWAIT to return an error inline") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-15	io_uring: fix manual setup of iov_iter for fixed buffers	Aleix Roca Nonell	1	-3/+1
	Commit bd11b3a391e3 ("io_uring: don't use iov_iter_advance() for fixed buffers") introduced an optimization to avoid using the slow iov_iter_advance by manually populating the iov_iter iterator in some cases. However, the computation of the iterator count field was erroneous: The first bvec was always accounted for an extent of page size even if the bvec length was smaller. In consequence, some I/O operations on fixed buffers were unable to operate on the full extent of the buffer, consistently skipping some bytes at the end of it. Fixes: bd11b3a391e3 ("io_uring: don't use iov_iter_advance() for fixed buffers") Cc: stable@vger.kernel.org Signed-off-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-15	misc: xilinx-sdfec: fix dependency and build error	Randy Dunlap	1	-0/+1
	lib/devres.c, which implements devm_ioremap_resource(), is only built when CONFIG_HAS_IOMEM is set/enabled, so XILINX_SDFEC should depend on HAS_IOMEM. Fixes this build error (as seen on UML builds): ERROR: "devm_ioremap_resource" [drivers/misc/xilinx_sdfec.ko] undefined! Fixes: 76d83e1c3233 ("misc: xilinx-sdfec: add core driver") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Derek Kiernan <derek.kiernan@xilinx.com> Cc: Dragan Cvetic <dragan.cvetic@xilinx.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/f9004be5-9925-327b-3ec2-6506e46fe565@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-15	usb: add a hcd_uses_dma helper	Christoph Hellwig	5	-11/+10
	The USB buffer allocation code is the only place in the usb core (and in fact the whole kernel) that uses is_device_dma_capable, while the URB mapping code uses the uses_dma flag in struct usb_bus. Switch the buffer allocation to use the uses_dma flag used by the rest of the USB code, and create a helper in hcd.h that checks this flag as well as the CONFIG_HAS_DMA to simplify the caller a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20190811080520.21712-3-hch@lst.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-15	usb: don't create dma pools for HCDs with a localmem_pool	Christoph Hellwig	1	-3/+3
	If the HCD provides a localmem pool we will never use the DMA pools, so don't create them. Fixes: b0310c2f09bb ("USB: use genalloc for USB HCs with local memory") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20190811080520.21712-2-hch@lst.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-15	usb: chipidea: imx: fix EPROBE_DEFER support during driver probe	André Draszik	1	-7/+12
	If driver probe needs to be deferred, e.g. because ci_hdrc_add_device() isn't ready yet, this driver currently misbehaves badly: a) success is still reported to the driver core (meaning a 2nd probe attempt will never be done), leaving the driver in a dysfunctional state and the hardware unusable b) driver remove / shutdown OOPSes: [ 206.786916] Unable to handle kernel paging request at virtual address fffffdff [ 206.794148] pgd = 880b9f82 [ 206.796890] [fffffdff] pgd=abf5e861, pte=00000000, ppte=00000000 [ 206.803179] Internal error: Oops: 37 [#1] PREEMPT SMP ARM [ 206.808581] Modules linked in: wl18xx evbug [ 206.813308] CPU: 1 PID: 1 Comm: systemd-shutdow Not tainted 4.19.35+gf345c93b4195 #1 [ 206.821053] Hardware name: Freescale i.MX7 Dual (Device Tree) [ 206.826813] PC is at ci_hdrc_remove_device+0x4/0x20 [ 206.831699] LR is at ci_hdrc_imx_remove+0x20/0xe8 [ 206.836407] pc : [<805cd4b0>] lr : [<805d62cc>] psr: 20000013 [ 206.842678] sp : a806be40 ip : 00000001 fp : 80adbd3c [ 206.847906] r10: 80b1b794 r9 : 80d5dfe0 r8 : a8192c44 [ 206.853136] r7 : 80db93a0 r6 : a8192c10 r5 : a8192c00 r4 : a93a4a00 [ 206.859668] r3 : 00000000 r2 : a8192ce4 r1 : ffffffff r0 : fffffdfb [ 206.866201] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 206.873341] Control: 10c5387d Table: a9e0c06a DAC: 00000051 [ 206.879092] Process systemd-shutdow (pid: 1, stack limit = 0xb271353c) [ 206.885624] Stack: (0xa806be40 to 0xa806c000) [ 206.889992] be40: a93a4a00 805d62cc a8192c1c a8170e10 a8192c10 8049a490 80d04d08 00000000 [ 206.898179] be60: 00000000 80d0da2c fee1dead 00000000 a806a000 00000058 00000000 80148b08 [ 206.906366] be80: 01234567 80148d8c a9858600 00000000 00000000 00000000 00000000 80d04d08 [ 206.914553] bea0: 00000000 00000000 a82741e0 a9858600 00000024 00000002 a9858608 00000005 [ 206.922740] bec0: 0000001e 8022c058 00000000 00000000 a806bf14 a9858600 00000000 a806befc [ 206.930927] bee0: a806bf78 00000000 7ee12c30 8022c18c a806bef8 a806befc 00000000 00000001 [ 206.939115] bf00: 00000000 00000024 a806bf14 00000005 7ee13b34 7ee12c68 00000004 7ee13f20 [ 206.947302] bf20: 00000010 7ee12c7c 00000005 7ee12d04 0000000a 76e7dc00 00000001 80d0f140 [ 206.955490] bf40: ab637880 a974de40 60000013 80d0f140 ab6378a0 80d04d08 a8080470 a9858600 [ 206.963677] bf60: a9858600 00000000 00000000 8022c24c 00000000 80144310 00000000 00000000 [ 206.971864] bf80: 80101204 80d04d08 00000000 80d04d08 00000000 00000000 00000003 00000058 [ 206.980051] bfa0: 80101204 80101000 00000000 00000000 fee1dead 28121969 01234567 00000000 [ 206.988237] bfc0: 00000000 00000000 00000003 00000058 00000000 00000000 00000000 00000000 [ 206.996425] bfe0: 0049ffb0 7ee13d58 0048a84b 76f245a6 60000030 fee1dead 00000000 00000000 [ 207.004622] [<805cd4b0>] (ci_hdrc_remove_device) from [<805d62cc>] (ci_hdrc_imx_remove+0x20/0xe8) [ 207.013509] [<805d62cc>] (ci_hdrc_imx_remove) from [<8049a490>] (device_shutdown+0x16c/0x218) [ 207.022050] [<8049a490>] (device_shutdown) from [<80148b08>] (kernel_restart+0xc/0x50) [ 207.029980] [<80148b08>] (kernel_restart) from [<80148d8c>] (sys_reboot+0xf4/0x1f0) [ 207.037648] [<80148d8c>] (sys_reboot) from [<80101000>] (ret_fast_syscall+0x0/0x54) [ 207.045308] Exception stack(0xa806bfa8 to 0xa806bff0) [ 207.050368] bfa0: 00000000 00000000 fee1dead 28121969 01234567 00000000 [ 207.058554] bfc0: 00000000 00000000 00000003 00000058 00000000 00000000 00000000 00000000 [ 207.066737] bfe0: 0049ffb0 7ee13d58 0048a84b 76f245a6 [ 207.071799] Code: ebffffa8 e3a00000 e8bd8010 e92d4010 (e5904004) [ 207.078021] ---[ end trace be47424e3fd46e9f ]--- [ 207.082647] Kernel panic - not syncing: Fatal exception [ 207.087894] ---[ end Kernel panic - not syncing: Fatal exception ]--- c) the error path in combination with driver removal causes imbalanced calls to the clk_() and pm_()* APIs a) happens because the original intended return value is overwritten (with 0) by the return code of regulator_disable() in ci_hdrc_imx_probe()'s error path b) happens because ci_pdev is -EPROBE_DEFER, which causes ci_hdrc_remove_device() to OOPS Fix a) by being more careful in ci_hdrc_imx_probe()'s error path and not overwriting the real error code Fix b) by calling the respective cleanup functions during remove only when needed (when ci_pdev != NULL, i.e. when everything was initialised correctly). This also has the side effect of not causing imbalanced clk_() and pm_() API calls as part of the error code path. Fixes: 7c8e8909417e ("usb: chipidea: imx: add HSIC support") Signed-off-by: André Draszik <git@andred.net> Cc: stable <stable@vger.kernel.org> CC: Peter Chen <Peter.Chen@nxp.com> CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> CC: Shawn Guo <shawnguo@kernel.org> CC: Sascha Hauer <s.hauer@pengutronix.de> CC: Pengutronix Kernel Team <kernel@pengutronix.de> CC: Fabio Estevam <festevam@gmail.com> CC: NXP Linux Team <linux-imx@nxp.com> CC: linux-usb@vger.kernel.org CC: linux-arm-kernel@lists.infradead.org CC: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/20190810150758.17694-1-git@andred.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-15	usb: host: fotg2: restart hcd after port reset	Hans Ulli Kroll	1	-0/+4
	On the Gemini SoC the FOTG2 stalls after port reset so restart the HCD after each port reset. Signed-off-by: Hans Ulli Kroll <ulli.kroll@googlemail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20190810150458.817-1-linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-15	USB: CDC: fix sanity checks in CDC union parser	Oliver Neukum	1	-2/+2
	A few checks checked for the size of the pointer to a structure instead of the structure itself. Copy & paste issue presumably. Fixes: e4c6fb7794982 ("usbnet: move the CDC parser into USB core") Cc: stable <stable@vger.kernel.org> Reported-by: syzbot+45a53506b65321c1fe91@syzkaller.appspotmail.com Signed-off-by: Oliver Neukum <oneukum@suse.com> Link: https://lore.kernel.org/r/20190813093541.18889-1-oneukum@suse.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-15	usb: cdc-acm: make sure a refcount is taken early enough	Oliver Neukum	1	-5/+7
	destroy() will decrement the refcount on the interface, so that it needs to be taken so early that it never undercounts. Fixes: 7fb57a019f94e ("USB: cdc-acm: Fix potential deadlock (lockdep warning)") Cc: stable <stable@vger.kernel.org> Reported-and-tested-by: syzbot+1b2449b7b5dc240d107a@syzkaller.appspotmail.com Signed-off-by: Oliver Neukum <oneukum@suse.com> Link: https://lore.kernel.org/r/20190808142119.7998-1-oneukum@suse.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-15	USB: serial: option: add the BroadMobi BM818 card	Bob Ham	1	-0/+2
	Add a VID:PID for the BroadMobi BM818 M.2 card T: Bus=01 Lev=03 Prnt=40 Port=03 Cnt=01 Dev#= 44 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=2020 ProdID=2060 Rev=00.00 S: Manufacturer=Qualcomm, Incorporated S: Product=Qualcomm CDMA Technologies MSM C: #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA I: If#=0x0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none) I: If#=0x1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none) I: If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none) I: If#=0x3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fe Prot=ff Driver=(none) I: If#=0x4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none) Signed-off-by: Bob Ham <bob.ham@puri.sm> Signed-off-by: Angus Ainslie (Purism) <angus@akkea.ca> Cc: stable <stable@vger.kernel.org> [ johan: use USB_DEVICE_INTERFACE_CLASS() ] Signed-off-by: Johan Hovold <johan@kernel.org>
2019-08-15	USB: serial: option: Add Motorola modem UARTs	Tony Lindgren	1	-0/+5
	On Motorola Mapphone devices such as Droid 4 there are five USB ports that do not use the same layout as Gobi 1K/2K/etc devices listed in qcserial.c. So we should use qcaux.c or option.c as noted by Dan Williams <dan.j.williams@intel.com>. As the Motorola USB serial ports have an interrupt endpoint as shown with lsusb -v, we should use option.c instead of qcaux.c as pointed out by Johan Hovold <johan@kernel.org>. The ff/ff/ff interfaces seem to always be UARTs on Motorola devices. For the other interfaces, class 0x0a (CDC Data) should not in general be added as they are typically part of a multi-interface function as noted earlier by Bjørn Mork <bjorn@mork.no>. However, looking at the Motorola mapphone kernel code, the mdm6600 0x0a class is only used for flashing the modem firmware, and there are no other interfaces. So I've added that too with more details below as it works just fine. The ttyUSB ports on Droid 4 are: ttyUSB0 DIAG, CQDM-capable ttyUSB1 MUX or NMEA, no response ttyUSB2 MUX or NMEA, no response ttyUSB3 TCMD ttyUSB4 AT-capable The ttyUSB0 is detected as QCDM capable by ModemManager. I think it's only used for debugging with ModemManager --debug for sending custom AT commands though. ModemManager already can manage data connection using the USB QMI ports that are already handled by the qmi_wwan.c driver. To enable the MUX or NMEA ports, it seems that something needs to be done additionally to enable them, maybe via the DIAG or TCMD port. It might be just a NVRAM setting somewhere, but I have no idea what NVRAM settings may need changing for that. The TCMD port seems to be a Motorola custom protocol for testing the modem and to configure it's NVRAM and seems to work just fine based on a quick test with a minimal tcmdrw tool I wrote. The voice modem AT-capable port seems to provide only partial support, and no PM support compared to the TS 27.010 based UART wired directly to the modem. The UARTs added with this change are the same product IDs as the Motorola Mapphone Android Linux kernel mdm6600_id_table. I don't have any mdm9600 based devices, so I have only tested these on mdm6600 based droid 4. Then for the class 0x0a (CDC Data) mode, the Motorola Mapphone Android Linux kernel driver moto_flashqsc.c just seems to change the port->bulk_out_size to 8K from the default. And is only used for flashing the modem firmware it seems. I've verified that flashing the modem with signed firmware works just fine with the option driver after manually toggling the GPIO pins, so I've added droid 4 modem flashing mode to the option driver. I've not added the other devices listed in moto_flashqsc.c in case they really need different port->bulk_out_size. Those can be added as they get tested to work for flashing the modem. After this patch the output of /sys/kernel/debug/usb/devices has the following for normal 22b8:2a70 mode including the related qmi_wwan interfaces: T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=12 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=22b8 ProdID=2a70 Rev= 0.00 S: Manufacturer=Motorola, Incorporated S: Product=Flash MZ600 C:* #Ifs= 9 Cfg#= 1 Atr=e0 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option E: Ad=81(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option E: Ad=82(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option E: Ad=83(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option E: Ad=84(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=04(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option E: Ad=85(I) Atr=03(Int.) MxPS= 64 Ivl=5ms E: Ad=86(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=05(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 5 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fb Prot=ff Driver=qmi_wwan E: Ad=87(I) Atr=03(Int.) MxPS= 64 Ivl=5ms E: Ad=88(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=06(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 6 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fb Prot=ff Driver=qmi_wwan E: Ad=89(I) Atr=03(Int.) MxPS= 64 Ivl=5ms E: Ad=8a(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=07(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 7 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fb Prot=ff Driver=qmi_wwan E: Ad=8b(I) Atr=03(Int.) MxPS= 64 Ivl=5ms E: Ad=8c(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=08(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 8 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fb Prot=ff Driver=qmi_wwan E: Ad=8d(I) Atr=03(Int.) MxPS= 64 Ivl=5ms E: Ad=8e(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=09(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms In 22b8:900e "qc_dload" mode the device shows up as: T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=12 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=22b8 ProdID=900e Rev= 0.00 S: Manufacturer=Motorola, Incorporated S: Product=Flash MZ600 C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option E: Ad=81(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms And in 22b8:4281 "ram_downloader" mode the device shows up as: T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=12 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=22b8 ProdID=4281 Rev= 0.00 S: Manufacturer=Motorola, Incorporated S: Product=Flash MZ600 C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=fc Driver=option E: Ad=81(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms Cc: Bjørn Mork <bjorn@mork.no> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Lars Melin <larsm17@gmail.com> Cc: Marcel Partap <mpartap@gmx.net> Cc: Merlijn Wajer <merlijn@wizzup.org> Cc: Michael Scott <hashcode0f@gmail.com> Cc: NeKit <nekit1000@gmail.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Sebastian Reichel <sre@kernel.org> Tested-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Tony Lindgren <tony@atomide.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Johan Hovold <johan@kernel.org>
2019-08-15	MAINTAINERS, x86/CPU: Tony Luck will maintain asm/intel-family.h	Tony Luck	1	-0/+7
	There are a few different subsystems in the kernel that depend on model specific behaviour (perf, EDAC, power, ...). Easier for just one person to have the task to get new model numbers included instead of having these groups trip over each other to do it. [ bp: s/Cpu/CPU/ and add x86@kernel.org so that it gets CCed too as FYI. ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190814234030.30817-1-tony.luck@intel.com
2019-08-15	drm/nouveau: Only recalculate PBN/VCPI on mode/connector changes	Lyude Paul	1	-9/+13
	I -thought- I had fixed this entirely, but it looks like that I didn't test this thoroughly enough as we apparently still make one big mistake with nv50_msto_atomic_check() - we don't handle the following scenario: * CRTC #1 has n VCPI allocated to it, is attached to connector DP-4 which is attached to encoder #1. enabled=y active=n * CRTC #1 is changed from DP-4 to DP-5, causing: * DP-4 crtc=#1→NULL (VCPI n→0) * DP-5 crtc=NULL→#1 * CRTC #1 steals encoder #1 back from DP-4 and gives it to DP-5 * CRTC #1 maintains the same mode as before, just with a different connector * mode_changed=n connectors_changed=y (we _SHOULD_ do VCPI 0→n here, but don't) Once the above scenario is repeated once, we'll attempt freeing VCPI from the connector that we didn't allocate due to the connectors changing, but the mode staying the same. Sigh. Since nv50_msto_atomic_check() has broken a few times now, let's rethink things a bit to be more careful: limit both VCPI/PBN allocations to mode_changed \|\| connectors_changed, since neither VCPI or PBN should ever need to change outside of routing and mode changes. Changes since v1: * Fix accidental reversal of clock and bpp arguments in drm_dp_calc_pbn_mode() - William Lewis Signed-off-by: Lyude Paul <lyude@redhat.com> Reported-by: Bohdan Milar <bmilar@redhat.com> Tested-by: Bohdan Milar <bmilar@redhat.com> Fixes: 232c9eec417a ("drm/nouveau: Use atomic VCPI helpers for MST") References: 412e85b60531 ("drm/nouveau: Only release VCPI slots on mode changes") Cc: Lyude Paul <lyude@redhat.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@redhat.com> Cc: Jerry Zuo <Jerry.Zuo@amd.com> Cc: Harry Wentland <harry.wentland@amd.com> Cc: Juston Li <juston.li@intel.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: Karol Herbst <karolherbst@gmail.com> Cc: Ilia Mirkin <imirkin@alum.mit.edu> Cc: <stable@vger.kernel.org> # v5.1+ Acked-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190809005307.18391-1-lyude@redhat.com
2019-08-15	drm/ast: Fixed reboot test may cause system hanged	Y.C. Chen	3	-3/+6
	There is another thread still access standard VGA I/O while loading drm driver. Disable standard VGA I/O decode to avoid this issue. Signed-off-by: Y.C. Chen <yc_chen@aspeedtech.com> Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/1523410059-18415-1-git-send-email-yc_chen@aspeedtech.com
2019-08-14	of: irq: fix a trivial typo in a doc comment	Lubomir Rintel	1	-1/+1
	Diverged from what the code does with commit 530210c7814e ("of/irq: Replace of_irq with of_phandle_args"). Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Signed-off-by: Rob Herring <robh@kernel.org>
2019-08-14	dt-bindings: pinctrl: stm32: Fix 'st,syscfg' schema	Rob Herring	1	-1/+2
	The proper way to add additional contraints to an existing json-schema is using 'allOf' to reference the base schema. Using just '$ref' doesn't work. Fix this for the 'st,syscfg' property. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Cc: linux-gpio@vger.kernel.org Cc: linux-stm32@st-md-mailman.stormreply.com Cc: linux-arm-kernel@lists.infradead.org Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Rob Herring <robh@kernel.org>
2019-08-14	drm/scheduler: use job count instead of peek	Christian König	1	-2/+2
	The spsc_queue_peek function is accessing queue->head which belongs to the consumer thread and shouldn't be accessed by the producer This is fixing a rare race condition when destroying entities. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Monk.liu@amd.com Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-14	riscv: Make __fstate_clean() work correctly.	Vincent Chen	1	-1/+1
	Make the __fstate_clean() function correctly set the state of sstatus.FS in pt_regs to SR_FS_CLEAN. Fixes: 7db91e57a0acd ("RISC-V: Task implementation") Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Christoph Hellwig <hch@lst.de> [paul.walmsley@sifive.com: expanded "Fixes" commit ID] Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-08-14	riscv: Correct the initialized flow of FP register	Vincent Chen	2	-2/+15
	The following two reasons cause FP registers are sometimes not initialized before starting the user program. 1. Currently, the FP context is initialized in flush_thread() function and we expect these initial values to be restored to FP register when doing FP context switch. However, the FP context switch only occurs in switch_to function. Hence, if this process does not be scheduled out and scheduled in before entering the user space, the FP registers have no chance to initialize. 2. In flush_thread(), the state of reg->sstatus.FS inherits from the parent. Hence, the state of reg->sstatus.FS may be dirty. If this process is scheduled out during flush_thread() and initializing the FP register, the fstate_save() in switch_to will corrupt the FP context which has been initialized until flush_thread(). To solve the 1st case, the initialization of the FP register will be completed in start_thread(). It makes sure all FP registers are initialized before starting the user program. For the 2nd case, the state of reg->sstatus.FS in start_thread will be set to SR_FS_OFF to prevent this process from corrupting FP context in doing context save. The FP state is set to SR_FS_INITIAL in start_trhead(). Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Fixes: 7db91e57a0acd ("RISC-V: Task implementation") Cc: stable@vger.kernel.org [paul.walmsley@sifive.com: fixed brace alignment issue reported by checkpatch] Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-08-14	ALSA: usb-audio: Fix an OOB bug in parse_audio_mixer_unit	Hui Peng	1	-0/+2
	The `uac_mixer_unit_descriptor` shown as below is read from the device side. In `parse_audio_mixer_unit`, `baSourceID` field is accessed from index 0 to `bNrInPins` - 1, the current implementation assumes that descriptor is always valid (the length of descriptor is no shorter than 5 + `bNrInPins`). If a descriptor read from the device side is invalid, it may trigger out-of-bound memory access. ``` struct uac_mixer_unit_descriptor { __u8 bLength; __u8 bDescriptorType; __u8 bDescriptorSubtype; __u8 bUnitID; __u8 bNrInPins; __u8 baSourceID[]; } ``` This patch fixes the bug by add a sanity check on the length of the descriptor. Reported-by: Hui Peng <benquike@gmail.com> Reported-by: Mathias Payer <mathias.payer@nebelwelt.net> Cc: <stable@vger.kernel.org> Signed-off-by: Hui Peng <benquike@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-08-14	i2c: stm32: Use the correct style for SPDX License Identifier	Nishad Kamdar	1	-1/+1
	This patch corrects the SPDX License Identifier style in header file related to STM32 Driver for I2C hardware bus support. For C header files Documentation/process/license-rules.rst mandates C-like comments (opposed to C source files where C++ style should be used) Changes made by using a script provided by Joe Perches here: https://lkml.org/lkml/2019/2/7/46 Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2019-08-14	i2c: emev2: avoid race when unregistering slave client	Wolfram Sang	1	-4/+12
	After we disabled interrupts, there might still be an active one running. Sync before clearing the pointer to the slave device. Fixes: c31d0a00021d ("i2c: emev2: add slave support") Reported-by: Krzysztof Adamski <krzysztof.adamski@nokia.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Krzysztof Adamski <krzysztof.adamski@nokia.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2019-08-14	i2c: rcar: avoid race when unregistering slave client	Wolfram Sang	1	-4/+7
	After we disabled interrupts, there might still be an active one running. Sync before clearing the pointer to the slave device. Fixes: de20d1857dd6 ("i2c: rcar: add slave support") Reported-by: Krzysztof Adamski <krzysztof.adamski@nokia.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Krzysztof Adamski <krzysztof.adamski@nokia.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2019-08-14	MAINTAINERS: i2c-imx: take over maintainership	Oleksij Rempel	1	-0/+8
	I would like to maintain the i2c-imx driver. Since I work with different i.MX variants and have access to the hardware, I can spend some time on the reviewing of this driver. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2019-08-14	Revert "i2c: imx: improve the error handling in i2c_imx_dma_request()"	Fabio Estevam	1	-12/+6
	Since commit e1ab9a468e3b ("i2c: imx: improve the error handling in i2c_imx_dma_request()") when booting with the DMA driver as module (such as CONFIG_FSL_EDMA=m) the following endless clk warnings are seen: [ 153.077831] ------------[ cut here ]------------ [ 153.082528] WARNING: CPU: 0 PID: 15 at drivers/clk/clk.c:924 clk_core_disable_lock+0x18/0x24 [ 153.093077] i2c0 already disabled [ 153.096416] Modules linked in: [ 153.099521] CPU: 0 PID: 15 Comm: kworker/0:1 Tainted: G W 5.2.0+ #321 [ 153.107290] Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree) [ 153.113772] Workqueue: events deferred_probe_work_func [ 153.118979] [<c0019560>] (unwind_backtrace) from [<c0014734>] (show_stack+0x10/0x14) [ 153.126778] [<c0014734>] (show_stack) from [<c083f8dc>] (dump_stack+0x9c/0xd4) [ 153.134051] [<c083f8dc>] (dump_stack) from [<c0031154>] (__warn+0xf8/0x124) [ 153.141056] [<c0031154>] (__warn) from [<c0031248>] (warn_slowpath_fmt+0x38/0x48) [ 153.148580] [<c0031248>] (warn_slowpath_fmt) from [<c040fde0>] (clk_core_disable_lock+0x18/0x24) [ 153.157413] [<c040fde0>] (clk_core_disable_lock) from [<c058f520>] (i2c_imx_probe+0x554/0x6ec) [ 153.166076] [<c058f520>] (i2c_imx_probe) from [<c04b9178>] (platform_drv_probe+0x48/0x98) [ 153.174297] [<c04b9178>] (platform_drv_probe) from [<c04b7298>] (really_probe+0x1d8/0x2c0) [ 153.182605] [<c04b7298>] (really_probe) from [<c04b7554>] (driver_probe_device+0x5c/0x174) [ 153.190909] [<c04b7554>] (driver_probe_device) from [<c04b58c8>] (bus_for_each_drv+0x44/0x8c) [ 153.199480] [<c04b58c8>] (bus_for_each_drv) from [<c04b746c>] (__device_attach+0xa0/0x108) [ 153.207782] [<c04b746c>] (__device_attach) from [<c04b65a4>] (bus_probe_device+0x88/0x90) [ 153.215999] [<c04b65a4>] (bus_probe_device) from [<c04b6a04>] (deferred_probe_work_func+0x60/0x90) [ 153.225003] [<c04b6a04>] (deferred_probe_work_func) from [<c004f190>] (process_one_work+0x204/0x634) [ 153.234178] [<c004f190>] (process_one_work) from [<c004f618>] (worker_thread+0x20/0x484) [ 153.242315] [<c004f618>] (worker_thread) from [<c0055c2c>] (kthread+0x118/0x150) [ 153.249758] [<c0055c2c>] (kthread) from [<c00090b4>] (ret_from_fork+0x14/0x20) [ 153.257006] Exception stack(0xdde43fb0 to 0xdde43ff8) [ 153.262095] 3fa0: 00000000 00000000 00000000 00000000 [ 153.270306] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 153.278520] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000 [ 153.285159] irq event stamp: 3323022 [ 153.288787] hardirqs last enabled at (3323021): [<c0861c4c>] _raw_spin_unlock_irq+0x24/0x2c [ 153.297261] hardirqs last disabled at (3323022): [<c040d7a0>] clk_enable_lock+0x10/0x124 [ 153.305392] softirqs last enabled at (3322092): [<c000a504>] __do_softirq+0x344/0x540 [ 153.313352] softirqs last disabled at (3322081): [<c00385c0>] irq_exit+0x10c/0x128 [ 153.320946] ---[ end trace a506731ccd9bd703 ]--- This endless clk warnings behaviour is well explained by Andrey Smirnov: "Allocating DMA after registering I2C adapter can lead to infinite probing loop, for example, consider the following scenario: 1. i2c_imx_probe() is called and successfully registers an I2C adapter via i2c_add_numbered_adapter() 2. As a part of i2c_add_numbered_adapter() new I2C slave devices are added from DT which results in a call to driver_deferred_probe_trigger() 3. i2c_imx_probe() continues and calls i2c_imx_dma_request() which due to lack of proper DMA driver returns -EPROBE_DEFER 4. i2c_imx_probe() fails, removes I2C adapter and returns -EPROBE_DEFER, which places it into deferred probe list 5. Deferred probe work triggered in #2 above kicks in and calls i2c_imx_probe() again thus bringing us to step #1" So revert commit e1ab9a468e3b ("i2c: imx: improve the error handling in i2c_imx_dma_request()") and restore the old behaviour, in order to avoid regressions on existing setups. Cc: <stable@vger.kernel.org> Reported-by: Andrey Smirnov <andrew.smirnov@gmail.com> Reported-by: Russell King <linux@armlinux.org.uk> Fixes: e1ab9a468e3b ("i2c: imx: improve the error handling in i2c_imx_dma_request()") Signed-off-by: Fabio Estevam <festevam@gmail.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2019-08-14	ALSA: hda - Add a generic reboot_notify	Hui Wang	4	-15/+22
	Make codec enter D3 before rebooting or poweroff can fix the noise issue on some laptops. And in theory it is harmless for all codecs to enter D3 before rebooting or poweroff, let us add a generic reboot_notify, then realtek and conexant drivers can call this function. Cc: stable@vger.kernel.org Signed-off-by: Hui Wang <hui.wang@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-08-14	ALSA: hda - Let all conexant codec enter D3 when rebooting	Hui Wang	1	-9/+0
	We have 3 new lenovo laptops which have conexant codec 0x14f11f86, these 3 laptops also have the noise issue when rebooting, after letting the codec enter D3 before rebooting or poweroff, the noise disappers. Instead of adding a new ID again in the reboot_notify(), let us make this function apply to all conexant codec. In theory make codec enter D3 before rebooting or poweroff is harmless, and I tested this change on a couple of other Lenovo laptops which have different conexant codecs, there is no side effect so far. Cc: stable@vger.kernel.org Signed-off-by: Hui Wang <hui.wang@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-08-13	riscv: defconfig: Update the defconfig	Alistair Francis	1	-0/+2
	Update the defconfig: - Add CONFIG_HW_RANDOM=y and CONFIG_HW_RANDOM_VIRTIO=y to enable VirtIORNG when running on QEMU Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-08-13	riscv: rv32_defconfig: Update the defconfig	Alistair Francis	1	-0/+3
	Update the rv32_defconfig: - Add 'CONFIG_DEVTMPFS_MOUNT=y' to match the RISC-V defconfig - Add CONFIG_HW_RANDOM=y and CONFIG_HW_RANDOM_VIRTIO=y to enable VirtIORNG when running on QEMU Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-08-13	hugetlbfs: fix hugetlb page migration/fault race causing SIGBUS	Mike Kravetz	1	-0/+19
	Li Wang discovered that LTP/move_page12 V2 sometimes triggers SIGBUS in the kernel-v5.2.3 testing. This is caused by a race between hugetlb page migration and page fault. If a hugetlb page can not be allocated to satisfy a page fault, the task is sent SIGBUS. This is normal hugetlbfs behavior. A hugetlb fault mutex exists to prevent two tasks from trying to instantiate the same page. This protects against the situation where there is only one hugetlb page, and both tasks would try to allocate. Without the mutex, one would fail and SIGBUS even though the other fault would be successful. There is a similar race between hugetlb page migration and fault. Migration code will allocate a page for the target of the migration. It will then unmap the original page from all page tables. It does this unmap by first clearing the pte and then writing a migration entry. The page table lock is held for the duration of this clear and write operation. However, the beginnings of the hugetlb page fault code optimistically checks the pte without taking the page table lock. If clear (as it can be during the migration unmap operation), a hugetlb page allocation is attempted to satisfy the fault. Note that the page which will eventually satisfy this fault was already allocated by the migration code. However, the allocation within the fault path could fail which would result in the task incorrectly being sent SIGBUS. Ideally, we could take the hugetlb fault mutex in the migration code when modifying the page tables. However, locks must be taken in the order of hugetlb fault mutex, page lock, page table lock. This would require significant rework of the migration code. Instead, the issue is addressed in the hugetlb fault code. After failing to allocate a huge page, take the page table lock and check for huge_pte_none before returning an error. This is the same check that must be made further in the code even if page allocation is successful. Link: http://lkml.kernel.org/r/20190808000533.7701-1-mike.kravetz@oracle.com Fixes: 290408d4a250 ("hugetlb: hugepage migration core") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reported-by: Li Wang <liwang@redhat.com> Tested-by: Li Wang <liwang@redhat.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Cyril Hrubis <chrubis@suse.cz> Cc: Xishi Qiu <xishi.qiuxishi@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13	mm, vmscan: do not special-case slab reclaim when watermarks are boosted	Mel Gorman	1	-11/+2
	Dave Chinner reported a problem pointing a finger at commit 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs"). The report is extensive: https://lore.kernel.org/linux-mm/20190807091858.2857-1-david@fromorbit.com/ and it's worth recording the most relevant parts (colorful language and typos included). When running a simple, steady state 4kB file creation test to simulate extracting tarballs larger than memory full of small files into the filesystem, I noticed that once memory fills up the cache balance goes to hell. The workload is creating one dirty cached inode for every dirty page, both of which should require a single IO each to clean and reclaim, and creation of inodes is throttled by the rate at which dirty writeback runs at (via balance dirty pages). Hence the ingest rate of new cached inodes and page cache pages is identical and steady. As a result, memory reclaim should quickly find a steady balance between page cache and inode caches. The moment memory fills, the page cache is reclaimed at a much faster rate than the inode cache, and evidence suggests that the inode cache shrinker is not being called when large batches of pages are being reclaimed. In roughly the same time period that it takes to fill memory with 50% pages and 50% slab caches, memory reclaim reduces the page cache down to just dirty pages and slab caches fill the entirety of memory. The LRU is largely full of dirty pages, and we're getting spikes of random writeback from memory reclaim so it's all going to shit. Behaviour never recovers, the page cache remains pinned at just dirty pages, and nothing I could tune would make any difference. vfs_cache_pressure makes no difference - I would set it so high it should trim the entire inode caches in a single pass, yet it didn't do anything. It was clear from tracing and live telemetry that the shrinkers were pretty much not running except when there was absolutely no memory free at all, and then they did the minimum necessary to free memory to make progress. So I went looking at the code, trying to find places where pages got reclaimed and the shrinkers weren't called. There's only one - kswapd doing boosted reclaim as per commit 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs"). The watermark boosting introduced by the commit is triggered in response to an allocation "fragmentation event". The boosting was not intended to target THP specifically and triggers even if THP is disabled. However, with Dave's perfectly reasonable workload, fragmentation events can be very common given the ratio of slab to page cache allocations so boosting remains active for long periods of time. As high-order allocations might use compaction and compaction cannot move slab pages the decision was made in the commit to special-case kswapd when watermarks are boosted -- kswapd avoids reclaiming slab as reclaiming slab does not directly help compaction. As Dave notes, this decision means that slab can be artificially protected for long periods of time and messes up the balance with slab and page caches. Removing the special casing can still indirectly help avoid fragmentation by avoiding fragmentation-causing events due to slab allocation as pages from a slab pageblock will have some slab objects freed. Furthermore, with the special casing, reclaim behaviour is unpredictable as kswapd sometimes examines slab and sometimes does not in a manner that is tricky to tune or analyse. This patch removes the special casing. The downside is that this is not a universal performance win. Some benchmarks that depend on the residency of data when rereading metadata may see a regression when slab reclaim is restored to its original behaviour. Similarly, some benchmarks that only read-once or write-once may perform better when page reclaim is too aggressive. The primary upside is that slab shrinker is less surprising (arguably more sane but that's a matter of opinion), behaves consistently regardless of the fragmentation state of the system and properly obeys VM sysctls. A fsmark benchmark configuration was constructed similar to what Dave reported and is codified by the mmtest configuration config-io-fsmark-small-file-stream. It was evaluated on a 1-socket machine to avoid dealing with NUMA-related issues and the timing of reclaim. The storage was an SSD Samsung Evo and a fresh trimmed XFS filesystem was used for the test data. This is not an exact replication of Dave's setup. The configuration scales its parameters depending on the memory size of the SUT to behave similarly across machines. The parameters mean the first sample reported by fs_mark is using 50% of RAM which will barely be throttled and look like a big outlier. Dave used fake NUMA to have multiple kswapd instances which I didn't replicate. Finally, the number of iterations differ from Dave's test as the target disk was not large enough. While not identical, it should be representative. fsmark 5.3.0-rc3 5.3.0-rc3 vanilla shrinker-v1r1 Min 1-files/sec 4444.80 ( 0.00%) 4765.60 ( 7.22%) 1st-qrtle 1-files/sec 5005.10 ( 0.00%) 5091.70 ( 1.73%) 2nd-qrtle 1-files/sec 4917.80 ( 0.00%) 4855.60 ( -1.26%) 3rd-qrtle 1-files/sec 4667.40 ( 0.00%) 4831.20 ( 3.51%) Max-1 1-files/sec 11421.50 ( 0.00%) 9999.30 ( -12.45%) Max-5 1-files/sec 11421.50 ( 0.00%) 9999.30 ( -12.45%) Max-10 1-files/sec 11421.50 ( 0.00%) 9999.30 ( -12.45%) Max-90 1-files/sec 4649.60 ( 0.00%) 4780.70 ( 2.82%) Max-95 1-files/sec 4491.00 ( 0.00%) 4768.20 ( 6.17%) Max-99 1-files/sec 4491.00 ( 0.00%) 4768.20 ( 6.17%) Max 1-files/sec 11421.50 ( 0.00%) 9999.30 ( -12.45%) Hmean 1-files/sec 5004.75 ( 0.00%) 5075.96 ( 1.42%) Stddev 1-files/sec 1778.70 ( 0.00%) 1369.66 ( 23.00%) CoeffVar 1-files/sec 33.70 ( 0.00%) 26.05 ( 22.71%) BHmean-99 1-files/sec 5053.72 ( 0.00%) 5101.52 ( 0.95%) BHmean-95 1-files/sec 5053.72 ( 0.00%) 5101.52 ( 0.95%) BHmean-90 1-files/sec 5107.05 ( 0.00%) 5131.41 ( 0.48%) BHmean-75 1-files/sec 5208.45 ( 0.00%) 5206.68 ( -0.03%) BHmean-50 1-files/sec 5405.53 ( 0.00%) 5381.62 ( -0.44%) BHmean-25 1-files/sec 6179.75 ( 0.00%) 6095.14 ( -1.37%) 5.3.0-rc3 5.3.0-rc3 vanillashrinker-v1r1 Duration User 501.82 497.29 Duration System 4401.44 4424.08 Duration Elapsed 8124.76 8358.05 This is showing a slight skew for the max result representing a large outlier for the 1st, 2nd and 3rd quartile are similar indicating that the bulk of the results show little difference. Note that an earlier version of the fsmark configuration showed a regression but that included more samples taken while memory was still filling. Note that the elapsed time is higher. Part of this is that the configuration included time to delete all the test files when the test completes -- the test automation handles the possibility of testing fsmark with multiple thread counts. Without the patch, many of these objects would be memory resident which is part of what the patch is addressing. There are other important observations that justify the patch. 1. With the vanilla kernel, the number of dirty pages in the system is very low for much of the test. With this patch, dirty pages is generally kept at 10% which matches vm.dirty_background_ratio which is normal expected historical behaviour. 2. With the vanilla kernel, the ratio of Slab/Pagecache is close to 0.95 for much of the test i.e. Slab is being left alone and dominating memory consumption. With the patch applied, the ratio varies between 0.35 and 0.45 with the bulk of the measured ratios roughly half way between those values. This is a different balance to what Dave reported but it was at least consistent. 3. Slabs are scanned throughout the entire test with the patch applied. The vanille kernel has periods with no scan activity and then relatively massive spikes. 4. Without the patch, kswapd scan rates are very variable. With the patch, the scan rates remain quite steady. 4. Overall vmstats are closer to normal expectations 5.3.0-rc3 5.3.0-rc3 vanilla shrinker-v1r1 Ops Direct pages scanned 99388.00 328410.00 Ops Kswapd pages scanned 45382917.00 33451026.00 Ops Kswapd pages reclaimed 30869570.00 25239655.00 Ops Direct pages reclaimed 74131.00 5830.00 Ops Kswapd efficiency % 68.02 75.45 Ops Kswapd velocity 5585.75 4002.25 Ops Page reclaim immediate 1179721.00 430927.00 Ops Slabs scanned 62367361.00 73581394.00 Ops Direct inode steals 2103.00 1002.00 Ops Kswapd inode steals 570180.00 5183206.00 o Vanilla kernel is hitting direct reclaim more frequently, not very much in absolute terms but the fact the patch reduces it is interesting o "Page reclaim immediate" in the vanilla kernel indicates dirty pages are being encountered at the tail of the LRU. This is generally bad and means in this case that the LRU is not long enough for dirty pages to be cleaned by the background flush in time. This is much reduced by the patch. o With the patch, kswapd is reclaiming 10 times more slab pages than with the vanilla kernel. This is indicative of the watermark boosting over-protecting slab A more complete set of tests were run that were part of the basis for introducing boosting and while there are some differences, they are well within tolerances. Bottom line, the special casing kswapd to avoid slab behaviour is unpredictable and can lead to abnormal results for normal workloads. This patch restores the expected behaviour that slab and page cache is balanced consistently for a workload with a steady allocation ratio of slab/pagecache pages. It also means that if there are workloads that favour the preservation of slab over pagecache that it can be tuned via vm.vfs_cache_pressure where as the vanilla kernel effectively ignores the parameter when boosting is active. Link: http://lkml.kernel.org/r/20190808182946.GM2739@techsingularity.net Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs") Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Dave Chinner <dchinner@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: <stable@vger.kernel.org> [5.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13	Revert "mm, thp: restore node-local hugepage allocations"	Andrea Arcangeli	3	-17/+29
	This reverts commit 2f0799a0ffc033b ("mm, thp: restore node-local hugepage allocations"). commit 2f0799a0ffc033b was rightfully applied to avoid the risk of a severe regression that was reported by the kernel test robot at the end of the merge window. Now we understood the regression was a false positive and was caused by a significant increase in fairness during a swap trashing benchmark. So it's safe to re-apply the fix and continue improving the code from there. The benchmark that reported the regression is very useful, but it provides a meaningful result only when there is no significant alteration in fairness during the workload. The removal of __GFP_THISNODE increased fairness. __GFP_THISNODE cannot be used in the generic page faults path for new memory allocations under the MPOL_DEFAULT mempolicy, or the allocation behavior significantly deviates from what the MPOL_DEFAULT semantics are supposed to be for THP and 4k allocations alike. Setting THP defrag to "always" or using MADV_HUGEPAGE (with THP defrag set to "madvise") has never meant to provide an implicit MPOL_BIND on the "current" node the task is running on, causing swap storms and providing a much more aggressive behavior than even zone_reclaim_node = 3. Any workload who could have benefited from __GFP_THISNODE has now to enable zone_reclaim_mode=1\|\|2\|\|3. __GFP_THISNODE implicitly provided the zone_reclaim_mode behavior, but it only did so if THP was enabled: if THP was disabled, there would have been no chance to get any 4k page from the current node if the current node was full of pagecache, which further shows how this __GFP_THISNODE was misplaced in MADV_HUGEPAGE. MADV_HUGEPAGE has never been intended to provide any zone_reclaim_mode semantics, in fact the two are orthogonal, zone_reclaim_mode = 1\|2\|3 must work exactly the same with MADV_HUGEPAGE set or not. The performance characteristic of memory depends on the hardware details. The numbers below are obtained on Naples/EPYC architecture and the N/A projection extends them to show what we should aim for in the future as a good THP NUMA locality default. The benchmark used exercises random memory seeks (note: the cost of the page faults is not part of the measurement). D0 THP \| D0 4k \| D1 THP \| D1 4k \| D2 THP \| D2 4k \| D3 THP \| D3 4k \| ... 0% \| +43% \| +45% \| +106% \| +131% \| +224% \| N/A \| N/A D0 means distance zero (i.e. local memory), D1 means distance one (i.e. intra socket memory), D2 means distance two (i.e. inter socket memory), etc... For the guest physical memory allocated by qemu and for guest mode kernel the performance characteristic of RAM is more complex and an ideal default could be: D0 THP \| D1 THP \| D0 4k \| D2 THP \| D1 4k \| D3 THP \| D2 4k \| D3 4k \| ... 0% \| +58% \| +101% \| N/A \| +222% \| N/A \| N/A \| N/A NOTE: the N/A are projections and haven't been measured yet, the measurement in this case is done on a 1950x with only two NUMA nodes. The THP case here means THP was used both in the host and in the guest. After applying this commit the THP NUMA locality order that we'll get out of MADV_HUGEPAGE is this: D0 THP \| D1 THP \| D2 THP \| D3 THP \| ... \| D0 4k \| D1 4k \| D2 4k \| D3 4k \| ... Before this commit it was: D0 THP \| D0 4k \| D1 4k \| D2 4k \| D3 4k \| ... Even if we ignore the breakage of large workloads that can't fit in a single node that the __GFP_THISNODE implicit "current node" mbind caused, the THP NUMA locality order provided by __GFP_THISNODE was still not the one we shall aim for in the long term (i.e. the first one at the top). After this commit is applied, we can introduce a new allocator multi order API and to replace those two alloc_pages_vmas calls in the page fault path, with a single multi order call: unsigned int order = (1 << HPAGE_PMD_ORDER) \| (1 << 0); page = alloc_pages_multi_order(..., &order); if (!page) goto out; if (!(order & (1 << 0))) { VM_WARN_ON(order != 1 << HPAGE_PMD_ORDER); /* THP fault / } else { VM_WARN_ON(order != 1 << 0); / 4k fallback */ } The page allocator logic has to be altered so that when it fails on any zone with order 9, it has to try again with a order 0 before falling back to the next zone in the zonelist. After that we need to do more measurements and evaluate if adding an opt-in feature for guest mode is worth it, to swap "DN 4k \| DN+1 THP" with "DN+1 THP \| DN 4k" at every NUMA distance crossing. Link: http://lkml.kernel.org/r/20190503223146.2312-3-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13	Revert "Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask""	Andrea Arcangeli	4	-51/+22
	Patch series "reapply: relax __GFP_THISNODE for MADV_HUGEPAGE mappings". The fixes for what was originally reported as "pathological THP behavior" we rightfully reverted to be sure not to introduced regressions at end of a merge window after a severe regression report from the kernel bot. We can safely re-apply them now that we had time to analyze the problem. The mm process worked fine, because the good fixes were eventually committed upstream without excessive delay. The regression reported by the kernel bot however forced us to revert the good fixes to be sure not to introduce regressions and to give us the time to analyze the issue further. The silver lining is that this extra time allowed to think more at this issue and also plan for a future direction to improve things further in terms of THP NUMA locality. This patch (of 2): This reverts commit 356ff8a9a78fb35d ("Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"). So it reapplies 89c83fb539f954 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"). Consolidation of the THP allocation flags at the same place was meant to be a clean up to easier handle otherwise scattered code which is imposing a maintenance burden. There were no real problems observed with the gfp mask consolidation but the reversion was rushed through without a larger consensus regardless. This patch brings the consolidation back because this should make the long term maintainability easier as well as it should allow future changes to be less error prone. [mhocko@kernel.org: changelog additions] Link: http://lkml.kernel.org/r/20190503223146.2312-2-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13	include/asm-generic/5level-fixup.h: fix variable 'p4d' set but not used	Qian Cai	1	-3/+18
	A compiler throws a warning on an arm64 system since commit 9849a5697d3d ("arch, mm: convert all architectures to use 5level-fixup.h"), mm/kasan/init.c: In function 'kasan_free_p4d': mm/kasan/init.c:344:9: warning: variable 'p4d' set but not used [-Wunused-but-set-variable] p4d_t *p4d; ^~~ because p4d_none() in "5level-fixup.h" is compiled away while it is a static inline function in "pgtable-nopud.h". However, if converted p4d_none() to a static inline there, powerpc would be unhappy as it reads those in assembler language in "arch/powerpc/include/asm/book3s/64/pgtable.h", so it needs to skip assembly include for the static inline C function. While at it, converted a few similar functions to be consistent with the ones in "pgtable-nopud.h". Link: http://lkml.kernel.org/r/20190806232917.881-1-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13	seq_file: fix problem when seeking mid-record	NeilBrown	1	-1/+1
	If you use lseek or similar (e.g. pread) to access a location in a seq_file file that is within a record, rather than at a record boundary, then the first read will return the remainder of the record, and the second read will return the whole of that same record (instead of the next record). When seeking to a record boundary, the next record is correctly returned. This bug was introduced by a recent patch (identified below). Before that patch, seq_read() would increment m->index when the last of the buffer was returned (m->count == 0). After that patch, we rely on ->next to increment m->index after filling the buffer - but there was one place where that didn't happen. Link: https://lkml.kernel.org/lkml/877e7xl029.fsf@notabene.neil.brown.name/ Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface") Signed-off-by: NeilBrown <neilb@suse.com> Reported-by: Sergei Turchanov <turchanov@farpost.com> Tested-by: Sergei Turchanov <turchanov@farpost.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Markus Elfring <Markus.Elfring@web.de> Cc: <stable@vger.kernel.org> [4.19+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13	mm: workingset: fix vmstat counters for shadow nodes	Roman Gushchin	3	-6/+43
	Memcg counters for shadow nodes are broken because the memcg pointer is obtained in a wrong way. The following approach is used: virt_to_page(xa_node)->mem_cgroup Since commit 4d96ba353075 ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages") page->mem_cgroup pointer isn't set for slab pages, so memcg_from_slab_page() should be used instead. Also I doubt that it ever worked correctly: virt_to_head_page() should be used instead of virt_to_page(). Otherwise objects residing on tail pages are not accounted, because only the head page contains a valid mem_cgroup pointer. That was a case since the introduction of these counters by the commit 68d48e6a2df5 ("mm: workingset: add vmstat counter for shadow nodes"). Link: http://lkml.kernel.org/r/20190801233532.138743-1-guro@fb.com Fixes: 4d96ba353075 ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages") Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13	mm/usercopy: use memory range to be accessed for wraparound check	Isaac J. Manjarres	1	-1/+1
	Currently, when checking to see if accessing n bytes starting at address "ptr" will cause a wraparound in the memory addresses, the check in check_bogus_address() adds an extra byte, which is incorrect, as the range of addresses that will be accessed is [ptr, ptr + (n - 1)]. This can lead to incorrectly detecting a wraparound in the memory address, when trying to read 4 KB from memory that is mapped to the the last possible page in the virtual address space, when in fact, accessing that range of memory would not cause a wraparound to occur. Use the memory range that will actually be accessed when considering if accessing a certain amount of bytes will cause the memory address to wrap around. Link: http://lkml.kernel.org/r/1564509253-23287-1-git-send-email-isaacm@codeaurora.org Fixes: f5509cc18daa ("mm: Hardened usercopy") Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org> Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org> Co-developed-by: Prasad Sodagudi <psodagud@codeaurora.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Trilok Soni <tsoni@codeaurora.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13	mm: kmemleak: disable early logging in case of error	Catalin Marinas	1	-1/+1
	If an error occurs during kmemleak_init() (e.g. kmem cache cannot be created), kmemleak is disabled but kmemleak_early_log remains enabled. Subsequently, when the .init.text section is freed, the log_early() function no longer exists. To avoid a page fault in such scenario, ensure that kmemleak_disable() also disables early logging. Link: http://lkml.kernel.org/r/20190731152302.42073-1-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13	mm/vmalloc.c: fix percpu free VM area search criteria	Kuppuswamy Sathyanarayanan	1	-1/+11
	Recent changes to the vmalloc code by commit 68ad4a330433 ("mm/vmalloc.c: keep track of free blocks for vmap allocation") can cause spurious percpu allocation failures. These, in turn, can result in panic()s in the slub code. One such possible panic was reported by Dave Hansen in following link https://lkml.org/lkml/2019/6/19/939. Another related panic observed is, RIP: 0033:0x7f46f7441b9b Call Trace: dump_stack+0x61/0x80 pcpu_alloc.cold.30+0x22/0x4f mem_cgroup_css_alloc+0x110/0x650 cgroup_apply_control_enable+0x133/0x330 cgroup_mkdir+0x41b/0x500 kernfs_iop_mkdir+0x5a/0x90 vfs_mkdir+0x102/0x1b0 do_mkdirat+0x7d/0xf0 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 VMALLOC memory manager divides the entire VMALLOC space (VMALLOC_START to VMALLOC_END) into multiple VM areas (struct vm_areas), and it mainly uses two lists (vmap_area_list & free_vmap_area_list) to track the used and free VM areas in VMALLOC space. And pcpu_get_vm_areas(offsets[], sizes[], nr_vms, align) function is used for allocating congruent VM areas for percpu memory allocator. In order to not conflict with VMALLOC users, pcpu_get_vm_areas allocates VM areas near the end of the VMALLOC space. So the search for free vm_area for the given requirement starts near VMALLOC_END and moves upwards towards VMALLOC_START. Prior to commit 68ad4a330433, the search for free vm_area in pcpu_get_vm_areas() involves following two main steps. Step 1: Find a aligned "base" adress near VMALLOC_END. va = free vm area near VMALLOC_END Step 2: Loop through number of requested vm_areas and check, Step 2.1: if (base < VMALLOC_START) 1. fail with error Step 2.2: // end is offsets[area] + sizes[area] if (base + end > va->vm_end) 1. Move the base downwards and repeat Step 2 Step 2.3: if (base + start < va->vm_start) 1. Move to previous free vm_area node, find aligned base address and repeat Step 2 But Commit 68ad4a330433 removed Step 2.2 and modified Step 2.3 as below: Step 2.3: if (base + start < va->vm_start \|\| base + end > va->vm_end) 1. Move to previous free vm_area node, find aligned base address and repeat Step 2 Above change is the root cause of spurious percpu memory allocation failures. For example, consider a case where a relatively large vm_area (~ 30 TB) was ignored in free vm_area search because it did not pass the base + end < vm->vm_end boundary check. Ignoring such large free vm_area's would lead to not finding free vm_area within boundary of VMALLOC_start to VMALLOC_END which in turn leads to allocation failures. So modify the search algorithm to include Step 2.2. Link: http://lkml.kernel.org/r/20190729232139.91131-1-sathyanarayanan.kuppuswamy@linux.intel.com Fixes: 68ad4a330433 ("mm/vmalloc.c: keep track of free blocks for vmap allocation") Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reported-by: Dave Hansen <dave.hansen@intel.com> Acked-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Roman Gushchin <guro@fb.com> Cc: sathyanarayanan kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>