linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2021-06-16	ata: pata_sc1200: sc1200_sht'Avoid overwriting initialised field in '	Lee Jones	1	-1/+2
	Fixes the following W=1 kernel build warning(s): drivers/ata/pata_sc1200.c:197:18: warning: initialized field overwritten [-Woverride-init] drivers/ata/pata_sc1200.c:197:18: note: (near initialization for ‘sc1200_sht.sg_tablesize’) Cc: Jens Axboe <axboe@kernel.dk> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Mark Lord <mlord@pobox.com> Cc: linux-ide@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Link: https://lore.kernel.org/r/20210528090502.1799866-10-lee.jones@linaro.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16	ata: pata_cs5530: Avoid overwriting initialised field in 'cs5530_sht'	Lee Jones	1	-1/+2
	Fixes the following W=1 kernel build warning(s): drivers/ata/pata_cs5530.c:151:18: warning: initialized field overwritten [-Woverride-init] drivers/ata/pata_cs5530.c:151:18: note: (near initialization for ‘cs5530_sht.sg_tablesize’) Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-ide@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Link: https://lore.kernel.org/r/20210528090502.1799866-9-lee.jones@linaro.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16	ata: pata_cs5520: Avoid overwriting initialised field in 'cs5520_sht'	Lee Jones	1	-1/+2
	Fixes the following W=1 kernel build warning(s): drivers/ata/pata_cs5520.c:99:19: warning: initialized field overwritten [-Woverride-init] drivers/ata/pata_cs5520.c:99:19: note: (near initialization for ‘cs5520_sht.sg_tablesize’) Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-ide@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Link: https://lore.kernel.org/r/20210528090502.1799866-8-lee.jones@linaro.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16	ata: pata_atiixp: Avoid overwriting initialised field in 'atiixp_sht'	Lee Jones	1	-1/+2
	Fixes the following W=1 kernel build warning(s): drivers/ata/pata_atiixp.c:256:19: warning: initialized field overwritten [-Woverride-init] drivers/ata/pata_atiixp.c:256:19: note: (near initialization for ‘atiixp_sht.sg_tablesize’) Cc: Jens Axboe <axboe@kernel.dk> Cc: ATI Inc <hyu@ati.com> Cc: linux-ide@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Link: https://lore.kernel.org/r/20210528090502.1799866-7-lee.jones@linaro.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16	ata: sata_nv: Do not over-write initialise fields in 'nv_adma_sht' and 'nv_swncq_sht'	Lee Jones	1	-2/+8
	Fixes the following W=1 kernel build warning(s): drivers/ata/sata_nv.c:379:16: warning: initialized field overwritten [-Woverride-init] drivers/ata/sata_nv.c:379:16: note: (near initialization for ‘nv_adma_sht.can_queue’) drivers/ata/sata_nv.c:382:21: warning: initialized field overwritten [-Woverride-init] drivers/ata/sata_nv.c:382:21: note: (near initialization for ‘nv_adma_sht.slave_configure’) drivers/ata/sata_nv.c:387:16: warning: initialized field overwritten [-Woverride-init] drivers/ata/sata_nv.c:387:16: note: (near initialization for ‘nv_swncq_sht.can_queue’) drivers/ata/sata_nv.c:390:21: warning: initialized field overwritten [-Woverride-init] drivers/ata/sata_nv.c:390:21: note: (near initialization for ‘nv_swncq_sht.slave_configure’) Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-ide@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Link: https://lore.kernel.org/r/20210528090502.1799866-6-lee.jones@linaro.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16	ata: sata_mv: Do not over-write initialise fields in 'mv6_sht'	Lee Jones	1	-1/+5
	Fixes the following W=1 kernel build warning(s): drivers/ata/sata_mv.c:670:16: warning: initialized field overwritten [-Woverride-init] drivers/ata/sata_mv.c:670:16: note: (near initialization for ‘mv6_sht.can_queue’) Cc: Jens Axboe <axboe@kernel.dk> Cc: Mark Lord <mlord@pobox.com> Cc: ALWAYS copy <linux-ide@vger.kernel.org> Cc: linux-ide@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Link: https://lore.kernel.org/r/20210528090502.1799866-5-lee.jones@linaro.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16	ata: sata_sil24: Do not over-write initialise fields in 'sil24_sht'	Lee Jones	1	-1/+4
	Fixes the following W=1 kernel build warning(s): In file included from drivers/ata/sata_sil24.c:14: drivers/ata/sata_sil24.c:378:16: warning: initialized field overwritten [-Woverride-init] drivers/ata/sata_sil24.c:378:16: note: (near initialization for ‘sil24_sht.can_queue’) Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-ide@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Link: https://lore.kernel.org/r/20210528090502.1799866-4-lee.jones@linaro.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16	ata: ahci: Ensure initialised fields are not overwritten in AHCI_SHT()	Lee Jones	1	-2/+5
	Fixes the following W=1 kernel build warning(s): In file included from drivers/ata/ahci_platform.c:21: drivers/ata/ahci.h:388:16: warning: initialized field overwritten [-Woverride-init] drivers/ata/ahci_platform.c:40:2: note: in expansion of macro ‘AHCI_SHT’ drivers/ata/ahci.h:388:16: note: (near initialization for ‘ahci_platform_sht.can_queue’) drivers/ata/ahci_platform.c:40:2: note: in expansion of macro ‘AHCI_SHT’ drivers/ata/ahci.h:392:17: warning: initialized field overwritten [-Woverride-init] drivers/ata/ahci_platform.c:40:2: note: in expansion of macro ‘AHCI_SHT’ drivers/ata/ahci.h:392:17: note: (near initialization for ‘ahci_platform_sht.sdev_attrs’) drivers/ata/ahci_platform.c:40:2: note: in expansion of macro ‘AHCI_SHT’ In file included from drivers/ata/ahci_mtk.c:18: drivers/ata/ahci.h:388:16: warning: initialized field overwritten [-Woverride-init] drivers/ata/ahci_mtk.c:41:2: note: in expansion of macro ‘AHCI_SHT’ drivers/ata/ahci.h:388:16: note: (near initialization for ‘ahci_platform_sht.can_queue’) drivers/ata/ahci_mtk.c:41:2: note: in expansion of macro ‘AHCI_SHT’ drivers/ata/ahci.h:392:17: warning: initialized field overwritten [-Woverride-init] drivers/ata/ahci_mtk.c:41:2: note: in expansion of macro ‘AHCI_SHT’ drivers/ata/ahci.h:392:17: note: (near initialization for ‘ahci_platform_sht.sdev_attrs’) drivers/ata/ahci_mtk.c:41:2: note: in expansion of macro ‘AHCI_SHT’ In file included from drivers/ata/ahci_dm816.c:16: drivers/ata/ahci.h:388:16: warning: initialized field overwritten [-Woverride-init] drivers/ata/ahci_dm816.c:138:2: note: in expansion of macro ‘AHCI_SHT’ drivers/ata/ahci.h:388:16: note: (near initialization for ‘ahci_dm816_platform_sht.can_queue’) drivers/ata/ahci_dm816.c:138:2: note: in expansion of macro ‘AHCI_SHT’ drivers/ata/ahci.h:392:17: warning: initialized field overwritten [-Woverride-init] drivers/ata/ahci_dm816.c:138:2: note: in expansion of macro ‘AHCI_SHT’ drivers/ata/ahci.h:392:17: note: (near initialization for ‘ahci_dm816_platform_sht.sdev_attrs’) drivers/ata/ahci_dm816.c:138:2: note: in expansion of macro ‘AHCI_SHT’ NB: Snipped 150 lines of this for brevity! Cc: Jens Axboe <axboe@kernel.dk> Cc: Philipp Zabel <p.zabel@pengutronix.de> Cc: Tejun Heo <tj@kernel.org> Cc: ALWAYS copy <linux-ide@vger.kernel.org> Cc: linux-ide@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Link: https://lore.kernel.org/r/20210528090502.1799866-3-lee.jones@linaro.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16	ata: include: libata: Move fields commonly over-written to separate MACRO	Lee Jones	1	-5/+8
	This is a pre-cursor to some upcoming W=1 fix-ups. Fixes the following W=1 kernel build warning(s): Cc: Jens Axboe <axboe@kernel.dk> Cc: Mark Lord <mlord@pobox.com> Cc: Philipp Zabel <p.zabel@pengutronix.de> Cc: Tejun Heo <tj@kernel.org> Cc: linux-ide@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Link: https://lore.kernel.org/r/20210528090502.1799866-2-lee.jones@linaro.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-15	ahci: Add support for Dell S140 and later controllers	Charles Rose	2	-0/+6
	This patch enables support for Dell S140 and later controllers that use Intel's PCHs configured as PCI_CLASS_STORAGE_RAID. Reviewed-by: Mika Westerberg <mika.westerberg@intel.com> Signed-off-by: Charles Rose <charles.rose@dell.com> Link: https://lore.kernel.org/r/20210615190801.1744466-1-charles.rose@dell.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-14	ata: ahci_sunxi: Disable DIPM	Timo Sigurdsson	1	-1/+1
	DIPM is unsupported or broken on sunxi. Trying to enable the power management policy med_power_with_dipm on an Allwinner A20 SoC based board leads to immediate I/O errors and the attached SATA disk disappears from the /dev filesystem. A reset (power cycle) is required to make the SATA controller or disk work again. The A10 and A20 SoC data sheets and manuals don't mention DIPM at all [1], so it's fair to assume that it's simply not supported. But even if it was, it should be considered broken and best be disabled in the ahci_sunxi driver. [1] https://github.com/allwinner-zh/documents/tree/master/ Fixes: c5754b5220f0 ("ARM: sunxi: Add support for Allwinner SUNXi SoCs sata to ahci_platform") Cc: stable@vger.kernel.org Signed-off-by: Timo Sigurdsson <public_timo.s@silentcreek.de> Tested-by: Timo Sigurdsson <public_timo.s@silentcreek.de> Link: https://lore.kernel.org/r/20210614072539.3307-1-public_timo.s@silentcreek.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-10	m68k/q40: Replace q40ide driver with pata_falcon and falconide	Finn Thain	10	-245/+141
	This allows m68k q40 systems to switch from the deprecated IDE subsystem to libata. Enhance the byte-swapping falconide and pata_falcon platform drivers to accept an irq resource, for use on q40. Atari ST-DMA IRQ arrangements seem to co-exist with q40 IRQ arrangements without too much mess. The new IO resources were added solely for the purpose of making request_region() reservations identical to those made by q40ide: these regions aren't used for actual IO. Cc: Michael Schmitz <schmitzmic@gmail.com> Cc: Richard Zidlicky <rz@linux-m68k.org> Reviewed-and-tested-by: Michael Schmitz <schmitzmic@gmail.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Finn Thain <fthain@linux-m68k.org> Link: https://lore.kernel.org/r/eefb7e9c2291e09fb4e065ce06bc105f05bb9e06.1623287706.git.fthain@linux-m68k.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-10	m68k/mac: Replace macide driver with generic platform drivers	Finn Thain	6	-188/+14
	This allows m68k mac systems to switch from the deprecated IDE subsystem to libata. This was tested on my Quadra 630. I haven't tested it on my PowerBook 150 because I don't have a RAM adapter board for it. It appears that the hardware I tested doesn't need macide_clear_irq() or macide_test_irq(). If it did, the generic driver would not have worked. It's possible that those routines are needed for the PowerBook 150 but we can cross that bridge if and when we come to it. BTW, macide_clear_irq() appears to suffer from a race condition. The write to the interrupt flags register could have unintended side effects as it may alter other flag bits. Fortunately, all of the other bits are unused by Linux. When tested on my Quadra 630, the assignment *ide_ifr &= ~0x20 was observed to have no effect on bit 5, so it may be redundant anyway. Cc: Michael Schmitz <schmitzmic@gmail.com> Cc: Joshua Thompson <funaho@jurai.org> Reviewed-by: Michael Schmitz <schmitzmic@gmail.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Finn Thain <fthain@linux-m68k.org> Link: https://lore.kernel.org/r/11a56b3317df3bb2ddc15fd29b40b6820e9c7444.1623287706.git.fthain@linux-m68k.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-27	pata_ep93xx: fix deferred probing	Sergey Shtylyov	1	-1/+1
	The driver overrides the error codes returned by platform_get_irq() to -ENXIO, so if it returns -EPROBE_DEFER, the driver would fail the probe permanently instead of the deferred probing. Propagate the error code upstream, as it should have been done from the start... Fixes: 2fff27512600 ("PATA host controller driver for ep93xx") Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru> Link: https://lore.kernel.org/r/509fda88-2e0d-2cc7-f411-695d7e94b136@omprussia.ru Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-19	pata_octeon_cf: avoid WARN_ON() in ata_host_activate()	Sergey Shtylyov	1	-2/+3
	Iff platform_get_irq() fails (or returns IRQ0) and thus the polling mode has to be used, ata_host_activate() hits the WARN_ON() due to 'irq_handler' parameter being non-NULL if the polling mode is selected. Let's only set the pointer to the driver's IRQ handler if platform_get_irq() returns a valid IRQ # -- this should avoid the unnecessary WARN_ON()... Fixes: 43f01da0f279 ("MIPS/OCTEON/ata: Convert pata_octeon_cf.c to use device tree.") Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://lore.kernel.org/r/3a241167-f84d-1d25-5b9b-be910afbe666@omp.ru Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-18	pata_rb532_cf: fix deferred probing	Sergey Shtylyov	1	-2/+4
	The driver overrides the error codes returned by platform_get_irq() to -ENOENT, so if it returns -EPROBE_DEFER, the driver would fail the probe permanently instead of the deferred probing. Switch to propagating the error code upstream, still checking/overriding IRQ0 as libata regards it as "no IRQ" (thus polling) anyway... Fixes: 9ec36cafe43b ("of/irq: do irq resolution in platform_get_irq") Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru> Link: https://lore.kernel.org/r/771ced55-3efb-21f5-f21c-b99920aae611@omprussia.ru Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-18	sata_highbank: fix deferred probing	Sergey Shtylyov	1	-2/+4
	The driver overrides the error codes returned by platform_get_irq() to -EINVAL, so if it returns -EPROBE_DEFER, the driver would fail the probe permanently instead of the deferred probing. Switch to propagating the error code upstream, still checking/overriding IRQ0 as libata regards it as "no IRQ" (thus polling) anyway... Fixes: 9ec36cafe43b ("of/irq: do irq resolution in platform_get_irq") Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru> Link: https://lore.kernel.org/r/105b456d-1199-f6e9-ceb7-ffc5ba551d1a@omprussia.ru Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-17	sata: nv: fix debug format string mismatch	Arnd Bergmann	1	-1/+1
	Turning on debugging in this this driver reveals a type mismatch: In file included from include/linux/kernel.h:17, from drivers/ata/sata_nv.c:23: drivers/ata/sata_nv.c: In function 'nv_swncq_sdbfis': drivers/ata/sata_nv.c:2121:10: error: format '%x' expects argument of type 'unsigned int', but argument 3 has type 'u64' {aka 'long long unsigned int'} [-Werror=format=] 2121 \| DPRINTK("id 0x%x QC: qc_active 0x%x," \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ...... 2124 \| ap->print_id, ap->qc_active, pp->qc_active, \| ~~~~~~~~~~~~~ \| \| \| u64 {aka long long unsigned int} include/linux/printk.h:142:10: note: in definition of macro 'no_printk' 142 \| printk(fmt, ##__VA_ARGS__); \ \| ^~~ drivers/ata/sata_nv.c:2121:2: note: in expansion of macro 'DPRINTK' 2121 \| DPRINTK("id 0x%x QC: qc_active 0x%x," \| ^~~~~~~ drivers/ata/sata_nv.c:2121:36: note: format string is defined here 2121 \| DPRINTK("id 0x%x QC: qc_active 0x%x," \| ~^ \| \| \| unsigned int \| %llx Use the correct format string for the u64 type. Fixes: e3ed89396441 ("libata: bump ->qc_active to a 64-bit type") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20210514140105.3080580-1-arnd@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-17	sata: fsl: fix DPRINTK format string	Arnd Bergmann	1	-1/+1
	Printing an __iomem pointer as %x produces a warning: drivers/ata/sata_fsl.c: In function 'fsl_sata_set_irq_coalescing': drivers/ata/sata_fsl.c:316:17: error: format '%x' expects argument of type 'unsigned int', but argument 2 has type 'void ' [-Werror=format=] 316 \| DPRINTK("ICC register status: (hcr base: 0x%x) = 0x%x\n", \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 317 \| hcr_base, ioread32(hcr_base + ICC)); \| ~~~~~~~~ \| \| \| void It's not clear why that pointer should be printed here, but if we do, then using %p is the way to avoid the warnings. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20210514213402.691436-1-arnd@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-17	ata: Replace inclusion of kernel.h by bits.h in the header	Andy Shevchenko	1	-1/+1
	ata.h uses BIT() macro, hence bits.h must be included. Otherwise there is no need to have kernel.h included, I do not see any direct users of it in ata.h. Hence replace inclusion of kernel.h. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20210409153456.87798-1-andriy.shevchenko@linux.intel.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-16	Linux 5.13-rc2	Linus Torvalds	1	-1/+1

2021-05-15	tty: vt: always invoke vc->vc_sw->con_resize callback	Tetsuo Handa	2	-2/+2
	syzbot is reporting OOB write at vga16fb_imageblit() [1], for resize_screen() from ioctl(VT_RESIZE) returns 0 without checking whether requested rows/columns fit the amount of memory reserved for the graphical screen if current mode is KD_GRAPHICS. ---------- #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <sys/ioctl.h> #include <linux/kd.h> #include <linux/vt.h> int main(int argc, char *argv[]) { const int fd = open("/dev/char/4:1", O_RDWR); struct vt_sizes vt = { 0x4100, 2 }; ioctl(fd, KDSETMODE, KD_GRAPHICS); ioctl(fd, VT_RESIZE, &vt); ioctl(fd, KDSETMODE, KD_TEXT); return 0; } ---------- Allow framebuffer drivers to return -EINVAL, by moving vc->vc_mode != KD_GRAPHICS check from resize_screen() to fbcon_resize(). Link: https://syzkaller.appspot.com/bug?extid=1f29e126cf461c4de3b3 [1] Reported-by: syzbot <syzbot+1f29e126cf461c4de3b3@syzkaller.appspotmail.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: syzbot <syzbot+1f29e126cf461c4de3b3@syzkaller.appspotmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14	mm/ioremap: fix iomap_max_page_shift	Christophe Leroy	1	-3/+3
	iomap_max_page_shift is expected to contain a page shift, so it can't be a 'bool', has to be an 'unsigned int' And fix the default values: P4D_SHIFT is when huge iomap is allowed. However, on some architectures (eg: powerpc book3s/64), P4D_SHIFT is not a constant so it can't be used to initialise a static variable. So, initialise iomap_max_page_shift with a maximum shift supported by the architecture, it is gated by P4D_SHIFT in vmap_try_huge_p4d() anyway. Link: https://lkml.kernel.org/r/ad2d366015794a9f21320dcbdd0a8eb98979e9df.1620898113.git.christophe.leroy@csgroup.eu Fixes: bbc180a5adb0 ("mm: HUGE_VMAP arch support cleanup") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14	docs: admin-guide: update description for kernel.modprobe sysctl	Rasmus Villemoes	1	-4/+5
	When I added CONFIG_MODPROBE_PATH, I neglected to update Documentation/. It's still true that this defaults to /sbin/modprobe, but now via a level of indirection. So document that the kernel might have been built with something other than /sbin/modprobe as the initial value. Link: https://lkml.kernel.org/r/20210420125324.1246826-1-linux@rasmusvillemoes.dk Fixes: 17652f4240f7a ("modules: add CONFIG_MODPROBE_PATH") Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jessica Yu <jeyu@kernel.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14	hfsplus: prevent corruption in shrinking truncate	Jouni Roivas	1	-3/+4
	I believe there are some issues introduced by commit 31651c607151 ("hfsplus: avoid deadlock on file truncation") HFS+ has extent records which always contains 8 extents. In case the first extent record in catalog file gets full, new ones are allocated from extents overflow file. In case shrinking truncate happens to middle of an extent record which locates in extents overflow file, the logic in hfsplus_file_truncate() was changed so that call to hfs_brec_remove() is not guarded any more. Right action would be just freeing the extents that exceed the new size inside extent record by calling hfsplus_free_extents(), and then check if the whole extent record should be removed. However since the guard (blk_cnt > start) is now after the call to hfs_brec_remove(), this has unfortunate effect that the last matching extent record is removed unconditionally. To reproduce this issue, create a file which has at least 10 extents, and then perform shrinking truncate into middle of the last extent record, so that the number of remaining extents is not under or divisible by 8. This causes the last extent record (8 extents) to be removed totally instead of truncating into middle of it. Thus this causes corruption, and lost data. Fix for this is simply checking if the new truncated end is below the start of this extent record, making it safe to remove the full extent record. However call to hfs_brec_remove() can't be moved to it's previous place since we're dropping ->tree_lock and it can cause a race condition and the cached info being invalidated possibly corrupting the node data. Another issue is related to this one. When entering into the block (blk_cnt > start) we are not holding the ->tree_lock. We break out from the loop not holding the lock, but hfs_find_exit() does unlock it. Not sure if it's possible for someone else to take the lock under our feet, but it can cause hard to debug errors and premature unlocking. Even if there's no real risk of it, the locking should still always be kept in balance. Thus taking the lock now just before the check. Link: https://lkml.kernel.org/r/20210429165139.3082828-1-jouni.roivas@tuxera.com Fixes: 31651c607151f ("hfsplus: avoid deadlock on file truncation") Signed-off-by: Jouni Roivas <jouni.roivas@tuxera.com> Reviewed-by: Anton Altaparmakov <anton@tuxera.com> Cc: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Cc: Viacheslav Dubeyko <slava@dubeyko.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14	mm/filemap: fix readahead return types	Matthew Wilcox (Oracle)	2	-5/+5
	A readahead request will not allocate more memory than can be represented by a size_t, even on systems that have HIGHMEM available. Change the length functions from returning an loff_t to a size_t. Link: https://lkml.kernel.org/r/20210510201201.1558972-1-willy@infradead.org Fixes: 32c0a6bcaa1f57 ("btrfs: add and use readahead_batch_length") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14	kasan: fix unit tests with CONFIG_UBSAN_LOCAL_BOUNDS enabled	Peter Collingbourne	1	-6/+23
	These tests deliberately access these arrays out of bounds, which will cause the dynamic local bounds checks inserted by CONFIG_UBSAN_LOCAL_BOUNDS to fail and panic the kernel. To avoid this problem, access the arrays via volatile pointers, which will prevent the compiler from being able to determine the array bounds. These accesses use volatile pointers to char (char volatile) rather than the more conventional pointers to volatile char (volatile char ) because we want to prevent the compiler from making inferences about the pointer itself (i.e. its array bounds), not the data that it refers to. Link: https://lkml.kernel.org/r/20210507025915.1464056-1-pcc@google.com Link: https://linux-review.googlesource.com/id/I90b1713fbfa1bf68ff895aef099ea77b98a7c3b9 Signed-off-by: Peter Collingbourne <pcc@google.com> Tested-by: Alexander Potapenko <glider@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Peter Collingbourne <pcc@google.com> Cc: George Popescu <georgepope@android.com> Cc: Elena Petrova <lenaptr@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14	mm: fix struct page layout on 32-bit systems	Matthew Wilcox (Oracle)	3	-8/+20
	32-bit architectures which expect 8-byte alignment for 8-byte integers and need 64-bit DMA addresses (arm, mips, ppc) had their struct page inadvertently expanded in 2019. When the dma_addr_t was added, it forced the alignment of the union to 8 bytes, which inserted a 4 byte gap between 'flags' and the union. Fix this by storing the dma_addr_t in one or two adjacent unsigned longs. This restores the alignment to that of an unsigned long. We always store the low bits in the first word to prevent the PageTail bit from being inadvertently set on a big endian platform. If that happened, get_user_pages_fast() racing against a page which was freed and reallocated to the page_pool could dereference a bogus compound_head(), which would be hard to trace back to this cause. Link: https://lkml.kernel.org/r/20210510153211.1504886-1-willy@infradead.org Fixes: c25fff7171be ("mm: add dma_addr_t to struct page") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Matteo Croce <mcroce@linux.microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14	ksm: revert "use GET_KSM_PAGE_NOLOCK to get ksm page in remove_rmap_item_from_tree()"	Hugh Dickins	1	-1/+2
	This reverts commit 3e96b6a2e9ad929a3230a22f4d64a74671a0720b. General Protection Fault in rmap_walk_ksm() under memory pressure: remove_rmap_item_from_tree() needs to take page lock, of course. Link: https://lkml.kernel.org/r/alpine.LSU.2.11.2105092253500.1127@eggly.anvils Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14	userfaultfd: release page in error path to avoid BUG_ON	Axel Rasmussen	1	-1/+11
	Consider the following sequence of events: 1. Userspace issues a UFFD ioctl, which ends up calling into shmem_mfill_atomic_pte(). We successfully account the blocks, we shmem_alloc_page(), but then the copy_from_user() fails. We return -ENOENT. We don't release the page we allocated. 2. Our caller detects this error code, tries the copy_from_user() after dropping the mmap_lock, and retries, calling back into shmem_mfill_atomic_pte(). 3. Meanwhile, let's say another process filled up the tmpfs being used. 4. So shmem_mfill_atomic_pte() fails to account blocks this time, and immediately returns - without releasing the page. This triggers a BUG_ON in our caller, which asserts that the page should always be consumed, unless -ENOENT is returned. To fix this, detect if we have such a "dangling" page when accounting fails, and if so, release it before returning. Link: https://lkml.kernel.org/r/20210428230858.348400-1-axelrasmussen@google.com Fixes: cb658a453b93 ("userfaultfd: shmem: avoid leaking blocks and used blocks in UFFDIO_COPY") Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Reported-by: Hugh Dickins <hughd@google.com> Acked-by: Hugh Dickins <hughd@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14	squashfs: fix divide error in calculate_skip()	Phillip Lougher	1	-3/+3
	Sysbot has reported a "divide error" which has been identified as being caused by a corrupted file_size value within the file inode. This value has been corrupted to a much larger value than expected. Calculate_skip() is passed i_size_read(inode) >> msblk->block_log. Due to the file_size value corruption this overflows the int argument/variable in that function, leading to the divide error. This patch changes the function to use u64. This will accommodate any unexpectedly large values due to corruption. The value returned from calculate_skip() is clamped to be never more than SQUASHFS_CACHED_BLKS - 1, or 7. So file_size corruption does not lead to an unexpectedly large return result here. Link: https://lkml.kernel.org/r/20210507152618.9447-1-phillip@squashfs.org.uk Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reported-by: <syzbot+e8f781243ce16ac2f962@syzkaller.appspotmail.com> Reported-by: <syzbot+7b98870d4fec9447b951@syzkaller.appspotmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14	kernel/resource: fix return code check in __request_free_mem_region	Alistair Popple	1	-1/+1
	Splitting an earlier version of a patch that allowed calling __request_region() while holding the resource lock into a series of patches required changing the return code for the newly introduced __request_region_locked(). Unfortunately this change was not carried through to a subsequent commit 56fd94919b8b ("kernel/resource: fix locking in request_free_mem_region") in the series. This resulted in a use-after-free due to freeing the struct resource without properly releasing it. Fix this by correcting the return code check so that the struct is not freed if the request to add it was successful. Link: https://lkml.kernel.org/r/20210512073528.22334-1-apopple@nvidia.com Fixes: 56fd94919b8b ("kernel/resource: fix locking in request_free_mem_region") Signed-off-by: Alistair Popple <apopple@nvidia.com> Reported-by: kernel test robot <oliver.sang@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Muchun Song <smuchun@gmail.com> Cc: Oliver Sang <oliver.sang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14	mm, slub: move slub_debug static key enabling outside slab_mutex	Vlastimil Babka	2	-9/+10
	Paul E. McKenney reported [1] that commit 1f0723a4c0df ("mm, slub: enable slub_debug static key when creating cache with explicit debug flags") results in the lockdep complaint: ====================================================== WARNING: possible circular locking dependency detected 5.12.0+ #15 Not tainted ------------------------------------------------------ rcu_torture_sta/109 is trying to acquire lock: ffffffff96063cd0 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_enable+0x9/0x20 but task is already holding lock: ffffffff96173c28 (slab_mutex){+.+.}-{3:3}, at: kmem_cache_create_usercopy+0x2d/0x250 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (slab_mutex){+.+.}-{3:3}: lock_acquire+0xb9/0x3a0 __mutex_lock+0x8d/0x920 slub_cpu_dead+0x15/0xf0 cpuhp_invoke_callback+0x17a/0x7c0 cpuhp_invoke_callback_range+0x3b/0x80 _cpu_down+0xdf/0x2a0 cpu_down+0x2c/0x50 device_offline+0x82/0xb0 remove_cpu+0x1a/0x30 torture_offline+0x80/0x140 torture_onoff+0x147/0x260 kthread+0x10a/0x140 ret_from_fork+0x22/0x30 -> #0 (cpu_hotplug_lock){++++}-{0:0}: check_prev_add+0x8f/0xbf0 __lock_acquire+0x13f0/0x1d80 lock_acquire+0xb9/0x3a0 cpus_read_lock+0x21/0xa0 static_key_enable+0x9/0x20 __kmem_cache_create+0x38d/0x430 kmem_cache_create_usercopy+0x146/0x250 kmem_cache_create+0xd/0x10 rcu_torture_stats+0x79/0x280 kthread+0x10a/0x140 ret_from_fork+0x22/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(slab_mutex); lock(cpu_hotplug_lock); lock(slab_mutex); lock(cpu_hotplug_lock); * DEADLOCK * 1 lock held by rcu_torture_sta/109: #0: ffffffff96173c28 (slab_mutex){+.+.}-{3:3}, at: kmem_cache_create_usercopy+0x2d/0x250 stack backtrace: CPU: 3 PID: 109 Comm: rcu_torture_sta Not tainted 5.12.0+ #15 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Call Trace: dump_stack+0x6d/0x89 check_noncircular+0xfe/0x110 ? lock_is_held_type+0x98/0x110 check_prev_add+0x8f/0xbf0 __lock_acquire+0x13f0/0x1d80 lock_acquire+0xb9/0x3a0 ? static_key_enable+0x9/0x20 ? mark_held_locks+0x49/0x70 cpus_read_lock+0x21/0xa0 ? static_key_enable+0x9/0x20 static_key_enable+0x9/0x20 __kmem_cache_create+0x38d/0x430 kmem_cache_create_usercopy+0x146/0x250 ? rcu_torture_stats_print+0xd0/0xd0 kmem_cache_create+0xd/0x10 rcu_torture_stats+0x79/0x280 ? rcu_torture_stats_print+0xd0/0xd0 kthread+0x10a/0x140 ? kthread_park+0x80/0x80 ret_from_fork+0x22/0x30 This is because there's one order of locking from the hotplug callbacks: lock(cpu_hotplug_lock); // from hotplug machinery itself lock(slab_mutex); // in e.g. slab_mem_going_offline_callback() And commit 1f0723a4c0df made the reverse sequence possible: lock(slab_mutex); // in kmem_cache_create_usercopy() lock(cpu_hotplug_lock); // kmem_cache_open() -> static_key_enable() The simplest fix is to move static_key_enable() to a place before slab_mutex is taken. That means kmem_cache_create_usercopy() in mm/slab_common.c which is not ideal for SLUB-specific code, but the #ifdef CONFIG_SLUB_DEBUG makes it at least self-contained and obvious. [1] https://lore.kernel.org/lkml/20210502171827.GA3670492@paulmck-ThinkPad-P17-Gen-1/ Link: https://lkml.kernel.org/r/20210504120019.26791-1-vbabka@suse.cz Fixes: 1f0723a4c0df ("mm, slub: enable slub_debug static key when creating cache with explicit debug flags") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14	mm/hugetlb: fix cow where page writtable in child	Peter Xu	1	-0/+1
	When rework early cow of pinned hugetlb pages, we moved huge_ptep_get() upper but overlooked a side effect that the huge_ptep_get() will fetch the pte after wr-protection. After moving it upwards, we need explicit wr-protect of child pte or we will keep the write bit set in the child process, which could cause data corrution where the child can write to the original page directly. This issue can also be exposed by "memfd_test hugetlbfs" kselftest. Link: https://lkml.kernel.org/r/20210503234356.9097-3-peterx@redhat.com Fixes: 4eae4efa2c299 ("hugetlb: do early cow when page pinned on src mm") Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14	mm/hugetlb: fix F_SEAL_FUTURE_WRITE	Peter Xu	3	-18/+41
	Patch series "mm/hugetlb: Fix issues on file sealing and fork", v2. Hugh reported issue with F_SEAL_FUTURE_WRITE not applied correctly to hugetlbfs, which I can easily verify using the memfd_test program, which seems that the program is hardly run with hugetlbfs pages (as by default shmem). Meanwhile I found another probably even more severe issue on that hugetlb fork won't wr-protect child cow pages, so child can potentially write to parent private pages. Patch 2 addresses that. After this series applied, "memfd_test hugetlbfs" should start to pass. This patch (of 2): F_SEAL_FUTURE_WRITE is missing for hugetlb starting from the first day. There is a test program for that and it fails constantly. $ ./memfd_test hugetlbfs memfd-hugetlb: CREATE memfd-hugetlb: BASIC memfd-hugetlb: SEAL-WRITE memfd-hugetlb: SEAL-FUTURE-WRITE mmap() didn't fail as expected Aborted (core dumped) I think it's probably because no one is really running the hugetlbfs test. Fix it by checking FUTURE_WRITE also in hugetlbfs_file_mmap() as what we do in shmem_mmap(). Generalize a helper for that. Link: https://lkml.kernel.org/r/20210503234356.9097-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20210503234356.9097-2-peterx@redhat.com Fixes: ab3948f58ff84 ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd") Signed-off-by: Peter Xu <peterx@redhat.com> Reported-by: Hugh Dickins <hughd@google.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14	arm64: Fix race condition on PG_dcache_clean in __sync_icache_dcache()	Catalin Marinas	1	-1/+3
	To ensure that instructions are observable in a new mapping, the arm64 set_pte_at() implementation cleans the D-cache and invalidates the I-cache to the PoU. As an optimisation, this is only done on executable mappings and the PG_dcache_clean page flag is set to avoid future cache maintenance on the same page. When two different processes map the same page (e.g. private executable file or shared mapping) there's a potential race on checking and setting PG_dcache_clean via set_pte_at() -> __sync_icache_dcache(). While on the fault paths the page is locked (PG_locked), mprotect() does not take the page lock. The result is that one process may see the PG_dcache_clean flag set but the I/D cache maintenance not yet performed. Avoid test_and_set_bit(PG_dcache_clean) in favour of separate test_bit() and set_bit(). In the rare event of a race, the cache maintenance is done twice. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: <stable@vger.kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Steven Price <steven.price@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210514095001.13236-1-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-05-14	block/partitions/efi.c: Fix the efi_partition() kernel-doc header	Bart Van Assche	1	-1/+1
	Fix the following kernel-doc warning: block/partitions/efi.c:685: warning: wrong kernel-doc identifier on line: * efi_partition(struct parsed_partitions *state) Cc: Alexander Viro <viro@math.psu.edu> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20210513171708.8391-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-14	blk-mq: Swap two calls in blk_mq_exit_queue()	Bart Van Assche	1	-2/+4
	If a tag set is shared across request queues (e.g. SCSI LUNs) then the block layer core keeps track of the number of active request queues in tags->active_queues. blk_mq_tag_busy() and blk_mq_tag_idle() update that atomic counter if the hctx flag BLK_MQ_F_TAG_QUEUE_SHARED is set. Make sure that blk_mq_exit_queue() calls blk_mq_tag_idle() before that flag is cleared by blk_mq_del_queue_tag_set(). Cc: Christoph Hellwig <hch@infradead.org> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Fixes: 0d2602ca30e4 ("blk-mq: improve support for shared tags maps") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210513171529.7977-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-14	blk-mq: plug request for shared sbitmap	Ming Lei	1	-2/+3
	In case of shared sbitmap, request won't be held in plug list any more sine commit 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per tagset"), this way makes request merge from flush plug list & batching submission not possible, so cause performance regression. Yanhui reports performance regression when running sequential IO test(libaio, 16 jobs, 8 depth for each job) in VM, and the VM disk is emulated with image stored on xfs/megaraid_sas. Fix the issue by recovering original behavior to allow to hold request in plug list. Cc: Yanhui Ma <yama@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: kashyap.desai@broadcom.com Fixes: 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per tagset") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210514022052.1047665-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-14	xen/swiotlb: check if the swiotlb has already been initialized	Stefano Stabellini	2	-1/+12
	xen_swiotlb_init calls swiotlb_late_init_with_tbl, which fails with -ENOMEM if the swiotlb has already been initialized. Add an explicit check io_tlb_default_mem != NULL at the beginning of xen_swiotlb_init. If the swiotlb is already initialized print a warning and return -EEXIST. On x86, the error propagates. On ARM, we don't actually need a special swiotlb buffer (yet), any buffer would do. So ignore the error and continue. CC: boris.ostrovsky@oracle.com CC: jgross@suse.com Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Reviewed-by: Boris Ostrovsky <boris.ostrvsky@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210512201823.1963-3-sstabellini@kernel.org Signed-off-by: Juergen Gross <jgross@suse.com>
2021-05-14	arm64: do not set SWIOTLB_NO_FORCE when swiotlb is required	Christoph Hellwig	1	-1/+2
	Although SWIOTLB_NO_FORCE is meant to allow later calls to swiotlb_init, today dma_direct_map_page returns error if SWIOTLB_NO_FORCE. For now, without a larger overhaul of SWIOTLB_NO_FORCE, the best we can do is to avoid setting SWIOTLB_NO_FORCE in mem_init when we know that it is going to be required later (e.g. Xen requires it). CC: boris.ostrovsky@oracle.com CC: jgross@suse.com CC: catalin.marinas@arm.com CC: will@kernel.org CC: linux-arm-kernel@lists.infradead.org Fixes: 2726bf3ff252 ("swiotlb: Make SWIOTLB_NO_FORCE perform no allocation") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20210512201823.1963-2-sstabellini@kernel.org Signed-off-by: Juergen Gross <jgross@suse.com>
2021-05-14	xen/arm: move xen_swiotlb_detect to arm/swiotlb-xen.h	Stefano Stabellini	2	-13/+14
	Move xen_swiotlb_detect to a static inline function to make it available to !CONFIG_XEN builds. CC: boris.ostrovsky@oracle.com CC: jgross@suse.com Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20210512201823.1963-1-sstabellini@kernel.org Signed-off-by: Juergen Gross <jgross@suse.com>
2021-05-14	clocksource/drivers/hyper-v: Re-enable VDSO_CLOCKMODE_HVCLOCK on X86	Vitaly Kuznetsov	2	-2/+4
	Mohammed reports (https://bugzilla.kernel.org/show_bug.cgi?id=213029) the commit e4ab4658f1cf ("clocksource/drivers/hyper-v: Handle vDSO differences inline") broke vDSO on x86. The problem appears to be that VDSO_CLOCKMODE_HVCLOCK is an enum value in 'enum vdso_clock_mode' and '#ifdef VDSO_CLOCKMODE_HVCLOCK' branch evaluates to false (it is not a define). Use a dedicated HAVE_VDSO_CLOCKMODE_HVCLOCK define instead. Fixes: e4ab4658f1cf ("clocksource/drivers/hyper-v: Handle vDSO differences inline") Reported-by: Mohammed Gamal <mgamal@redhat.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210513073246.1715070-1-vkuznets@redhat.com
2021-05-14	io_uring: increase max number of reg buffers	Pavel Begunkov	1	-1/+3
	Since recent changes instead of storing a large array of struct io_mapped_ubuf, we store pointers to them, that is 4 times slimmer and we should not to so worry about restricting max number of registererd buffer slots, increase the limit 4 times. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d3dee1da37f46da416aa96a16bf9e5094e10584d.1620990371.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-14	io_uring: further remove sqpoll limits on opcodes	Pavel Begunkov	1	-4/+2
	There are three types of requests that left disabled for sqpoll, namely epoll ctx, statx, and resources update. Since SQPOLL task is now closely mimics a userspace thread, remove the restrictions. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/909b52d70c45636d8d7897582474ea5aab5eed34.1620990306.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-14	io_uring: fix ltout double free on completion race	Pavel Begunkov	1	-3/+4
	Always remove linked timeout on io_link_timeout_fn() from the master request link list, otherwise we may get use-after-free when first io_link_timeout_fn() puts linked timeout in the fail path, and then will be found and put on master's free. Cc: stable@vger.kernel.org # 5.10+ Fixes: 90cd7e424969d ("io_uring: track link timeout's master explicitly") Reported-and-tested-by: syzbot+5a864149dd970b546223@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/69c46bf6ce37fec4fdcd98f0882e18eb07ce693a.1620990121.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-14	powerpc/64e/interrupt: Fix nvgprs being clobbered	Nicholas Piggin	1	-14/+24
	Some interrupt handlers have an "extra" that saves 1 or 2 registers (r14, r15) in the paca save area and makes them available to use by the handler. The change to always save nvgprs in exception handlers lead to some interrupt handlers saving those scratch r14 / r15 registers into the interrupt frame's GPR saves, which get restored on interrupt exit. Fix this by always reloading those scratch registers from paca before the EXCEPTION_COMMON that saves nvgprs. Fixes: 4228b2c3d20e ("powerpc/64e/interrupt: always save nvgprs on interrupt") Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210514044008.1955783-1-npiggin@gmail.com
2021-05-14	powerpc/64s: Make NMI record implicitly soft-masked code as irqs disabled	Nicholas Piggin	1	-0/+7
	scv support introduced the notion of code that implicitly soft-masks irqs due to the instruction addresses. This is required because scv enters the kernel with MSR[EE]=1. If a NMI (including soft-NMI) interrupt hits when we are implicitly soft-masked then its regs->softe does not reflect this because it is derived from the explicit soft mask state (paca->irq_soft_mask). This makes arch_irq_disabled_regs(regs) return false. This can trigger a warning in the soft-NMI watchdog code (shown below). Fix it by having NMI interrupts set regs->softe to disabled in case of interrupting an implicit soft-masked region. ------------[ cut here ]------------ WARNING: CPU: 41 PID: 1103 at arch/powerpc/kernel/watchdog.c:259 soft_nmi_interrupt+0x3e4/0x5f0 CPU: 41 PID: 1103 Comm: (spawn) Not tainted NIP: c000000000039534 LR: c000000000039234 CTR: c000000000009a00 REGS: c000007fffbcf940 TRAP: 0700 Not tainted MSR: 9000000000021033 <SF,HV,ME,IR,DR,RI,LE> CR: 22042482 XER: 200400ad CFAR: c000000000039260 IRQMASK: 3 GPR00: c000000000039204 c000007fffbcfbe0 c000000001d6c300 0000000000000003 GPR04: 00007ffffa45d078 0000000000000000 0000000000000008 0000000000000020 GPR08: 0000007ffd4e0000 0000000000000000 c000007ffffceb00 7265677368657265 GPR12: 9000000000009033 c000007ffffceb00 00000f7075bf4480 000000000000002a GPR16: 00000f705745a528 00007ffffa45ddd8 00000f70574d0008 0000000000000000 GPR20: 00000f7075c58d70 00000f7057459c38 0000000000000001 0000000000000040 GPR24: 0000000000000000 0000000000000029 c000000001dae058 0000000000000029 GPR28: 0000000000000000 0000000000000800 0000000000000009 c000007fffbcfd60 NIP [c000000000039534] soft_nmi_interrupt+0x3e4/0x5f0 LR [c000000000039234] soft_nmi_interrupt+0xe4/0x5f0 Call Trace: [c000007fffbcfbe0] [c000000000039204] soft_nmi_interrupt+0xb4/0x5f0 (unreliable) [c000007fffbcfcf0] [c00000000000c0e8] soft_nmi_common+0x138/0x1c4 --- interrupt: 900 at end_real_trampolines+0x0/0x1000 NIP: c000000000003000 LR: 00007ca426adb03c CTR: 900000000280f033 REGS: c000007fffbcfd60 TRAP: 0900 MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 44042482 XER: 200400ad CFAR: 00007ca426946020 IRQMASK: 0 GPR00: 00000000000000ad 00007ffffa45d050 00007ca426b07f00 0000000000000035 GPR04: 00007ffffa45d078 0000000000000000 0000000000000008 0000000000000020 GPR08: 0000000000000000 0000000000100000 0000000010000000 00007ffffa45d110 GPR12: 0000000000000001 00007ca426d4e680 00000f7075bf4480 000000000000002a GPR16: 00000f705745a528 00007ffffa45ddd8 00000f70574d0008 0000000000000000 GPR20: 00000f7075c58d70 00000f7057459c38 0000000000000001 0000000000000040 GPR24: 0000000000000000 00000f7057473f68 0000000000000003 000000000000041b GPR28: 00007ffffa45d4c4 0000000000000035 0000000000000000 00000f7057473f68 NIP [c000000000003000] end_real_trampolines+0x0/0x1000 LR [00007ca426adb03c] 0x7ca426adb03c --- interrupt: 900 Instruction dump: 60000000 60000000 60420000 38600001 482b3ae5 60000000 e93f0138 a36d0008 7daa6b78 71290001 7f7907b4 4082fd34 <0fe00000> 4bfffd2c 60420000 ea6100a8 ---[ end trace dc75f67d819779da ]--- Fixes: 118178e62e2e ("powerpc: move NMI entry/exit code into wrapper") Reported-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210503111708.758261-1-npiggin@gmail.com
2021-05-14	powerpc/64s: Fix stf mitigation patching w/strict RWX & hash	Michael Ellerman	1	-10/+10
	The stf entry barrier fallback is unsafe to execute in a semi-patched state, which can happen when enabling/disabling the mitigation with strict kernel RWX enabled and using the hash MMU. See the previous commit for more details. Fix it by changing the order in which we patch the instructions. Note the stf barrier fallback is only used on Power6 or earlier. Fixes: bd573a81312f ("powerpc/mm/64s: Allow STRICT_KERNEL_RWX again") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210513140800.1391706-2-mpe@ellerman.id.au
2021-05-14	powerpc/64s: Fix entry flush patching w/strict RWX & hash	Michael Ellerman	1	-16/+43
	The entry flush mitigation can be enabled/disabled at runtime. When this happens it results in the kernel patching its own instructions to enable/disable the mitigation sequence. With strict kernel RWX enabled instruction patching happens via a secondary mapping of the kernel text, so that we don't have to make the primary mapping writable. With the hash MMU this leads to a hash fault, which causes us to execute the exception entry which contains the entry flush mitigation. This means we end up executing the entry flush in a semi-patched state, ie. after we have patched the first instruction but before we patch the second or third instruction of the sequence. On machines with updated firmware the entry flush is a series of special nops, and it's safe to to execute in a semi-patched state. However when using the fallback flush the sequence is mflr/branch/mtlr, and so it's not safe to execute if we have patched out the mflr but not the other two instructions. Doing so leads to us corrputing LR, leading to an oops, for example: # echo 0 > /sys/kernel/debug/powerpc/entry_flush kernel tried to execute exec-protected page (c000000002971000) - exploit attempt? (uid: 0) BUG: Unable to handle kernel instruction fetch Faulting instruction address: 0xc000000002971000 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries CPU: 0 PID: 2215 Comm: bash Not tainted 5.13.0-rc1-00010-gda3bb206c9ce #1 NIP: c000000002971000 LR: c000000002971000 CTR: c000000000120c40 REGS: c000000013243840 TRAP: 0400 Not tainted (5.13.0-rc1-00010-gda3bb206c9ce) MSR: 8000000010009033 <SF,EE,ME,IR,DR,RI,LE> CR: 48428482 XER: 00000000 ... NIP 0xc000000002971000 LR 0xc000000002971000 Call Trace: do_patch_instruction+0xc4/0x340 (unreliable) do_entry_flush_fixups+0x100/0x3b0 entry_flush_set+0x50/0xe0 simple_attr_write+0x160/0x1a0 full_proxy_write+0x8c/0x110 vfs_write+0xf0/0x340 ksys_write+0x84/0x140 system_call_exception+0x164/0x2d0 system_call_common+0xec/0x278 The simplest fix is to change the order in which we patch the instructions, so that the sequence is always safe to execute. For the non-fallback flushes it doesn't matter what order we patch in. Fixes: bd573a81312f ("powerpc/mm/64s: Allow STRICT_KERNEL_RWX again") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210513140800.1391706-1-mpe@ellerman.id.au