linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2019-07-15	vfio-ccw: Fix memory leak and don't call cp_free in cp_init	Farhan Ali	1	-4/+7
	We don't set cp->initialized to true so calling cp_free will just return and not do anything. Also fix a memory leak where we fail to free a ccwchain on an error. Fixes: 812271b910 ("s390/cio: Squash cp_free() and cp_unpin_free()") Signed-off-by: Farhan Ali <alifm@linux.ibm.com> Message-Id: <3173c4216f4555d9765eb6e4922534982bc820e4.1562854091.git.alifm@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2019-07-15	vfio-ccw: Fix misleading comment when setting orb.cmd.c64	Farhan Ali	1	-6/+7
	The comment is misleading because it tells us that we should set orb.cmd.c64 before calling ccwchain_calc_length, otherwise the function ccwchain_calc_length would return an error. This is not completely accurate. We want to allow an orb without cmd.c64, and this is fine as long as the channel program does not use IDALs. But we do want to reject any channel program that uses IDALs and does not set the flag, which is what we do in ccwchain_calc_length. After we have done the ccw processing, we need to set cmd.c64, as we use IDALs for all translated channel programs. Also for better code readability let's move the setting of cmd.c64 within the non error path. Fixes: fb9e7880af35 ("vfio: ccw: push down unsupported IDA check") Signed-off-by: Farhan Ali <alifm@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <f68636106aef0faeb6ce9712584d102d1b315ff8.1562854091.git.alifm@linux.ibm.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2019-07-11	s390/unwind: avoid int overflow in outside_of_stack	Vasily Gorbik	1	-1/+1
	When current task is interrupted in-between stack frame allocation and backchain write instructions new stack frame backchain pointer is left uninitialized. That invalid backchain value is passed into outside_of_stack for sanity check. Make sure int overflow does not happen by subtracting stack_frame size from the stack "end" rather than adding it to "random" backchain value. Fixes: 41b0474c1b1c ("s390/unwind: introduce stack unwind API") Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-11	s390/zcrypt: remove the exporting of ap_query_configuration	Denis Efremov	1	-1/+0
	The function ap_query_configuration is declared static and marked EXPORT_SYMBOL, which is at best an odd combination. Because the function is not used outside of the drivers/s390/crypto/ap_bus.c file it is defined in, this commit removes the EXPORT_SYMBOL() marking. Link: http://lkml.kernel.org/r/20190709122507.11158-1-efremov@linux.com Fixes: f1b0a4343c41 ("s390/zcrypt: Integrate ap_asm.h into include/asm/ap.h.") Fixes: 050349b5b71d ("s390/zcrypt: externalize AP config info query") Signed-off-by: Denis Efremov <efremov@linux.com> Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-11	s390/pci: add mio_enabled attribute	Sebastian Ott	1	-0/+10
	Provide an attribute to query the usage of mio instructions. Signed-off-by: Sebastian Ott <sebott@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-11	s390: fix setting of mio addressing control	Sebastian Ott	3	-13/+3
	Move enablement of mio addressing control from detect_machine_facilities to pci_base_init. detect_machine_facilities runs so early that the static branches have not been toggled yet, thus mio addressing control was always off. In pci_base_init we have to use the SMP aware ctl_set_bit though. Fixes: 833b441ec0f6 ("s390: enable processes for mio instructions") Signed-off-by: Sebastian Ott <sebott@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-11	s390/ipl: Fix detection of has_secure attribute	Philipp Rudo	3	-8/+1
	Use the correct bit for detection of the machine capability associated with the has_secure attribute. It is expected that the underlying platform (including hypervisors) unsets the bit when they don't provide secure ipl for their guests. Fixes: c9896acc7851 ("s390/ipl: Provide has_secure sysfs attribute") Cc: stable@vger.kernel.org # 5.2 Signed-off-by: Philipp Rudo <prudo@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-11	s390: vfio-ap: fix irq registration	Christian Borntraeger	1	-2/+1
	vfio_ap_free_aqic_resources is called in two places: - during registration to have a "known state" - during interrupt disable We must not clear q->matrix_mdev in the registration phase as this will mess up the reference counting and can lead to some warning and other bugs. Fixes: ec89b55e3bce ("s390: ap: implement PAPQ AQIC interception in kernel") Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-11	s390/cpumf: Add extended counter set definitions for model 8561 and 8562	Thomas Richter	1	-0/+2
	Add the extended counter set definitions for s390 machine types 8561 and 8262. They are identical with machine types 3906 and 3907. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-11	s390/dasd: Handle out-of-space constraint	Jan Höppner	5	-3/+243
	The storage server issues three different types of out-of-space messages whenever the Extent Pool or Extent Repository space runs short. When a configured warning watermark is reached, the physical space is completeley exhausted, or the capacity constraints have been relieved, a message is received. A log entry for the sysadmin to react to is generated in any case. In case the physical space is completely exhausted, sense data that reads "no space left on device" is received. In this case, currently running I/O will be blocked until space has either been released or added to the extent pool, and a relieve message was received via an attention interrupt. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-11	s390/dasd: Add discard support for ESE volumes	Jan Höppner	1	-3/+54
	ESE (Extent Space Efficient) volumes are thin-provisioned and therefore space is only occupied with real data. In order to make previously used space available for re-allocation again, discard support is enabled for ESE volumes allowing the DASD driver to release said space. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-11	s390/dasd: Use ALIGN_DOWN macro	Jan Höppner	1	-1/+1
	There is now an ALIGN_DOWN macro available. Let's rather use kernel provided macros that do the things we want. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-11	s390/dasd: Make dasd_setup_queue() a discipline function	Jan Höppner	7	-80/+103
	ECKD, FBA, and the DIAG discipline use slightly different block layer settings. In preparation of even more diverse queue settings, make dasd_setup_queue() a discipline function. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-11	s390/dasd: Add new ioctl to release space	Jan Höppner	5	-0/+364
	Userspace tools might have the need to release space for Extent Space Efficient (ESE) volumes when working with such a device. Provide the necessarry interface for such a task by implementing a new ioctl BIODASDRAS. The ioctl uses the format_data_t data structure for data input: typedef struct format_data_t { unsigned int start_unit; /* from track / unsigned int stop_unit; / to track / unsigned int blksize; / sectorsize */ unsigned int intensity; } format_data_t; If the intensity is set to 0x40, start_unit and stop_unit are ignored and space for the entire volume is released. Otherwise, if intensity is set to 0, the respective range is released (if possible). Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-11	s390/dasd: Add dasd_sleep_on_queue_interruptible()	Jan Höppner	2	-0/+10
	There is dasd_sleep_on() and dasd_sleep_on_interruptible() to start CCW requests uninterruptible and interruptible. However, there is only dasd_sleep_on_queue() to start requests from CCW queues uninterruptible. Add dasd_sleep_on_queue_interruptible() to provide a way to start requests from CCW queues interruptible. _dasd_sleep_on_queue() already provides this functionality. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-11	s390/dasd: Add missing intensity definition	Jan Höppner	1	-0/+1
	The definition for the bit that removes the write permission for record zero when formatting was missing. Add it to complete the list. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-11	s390/dasd: Fix whitespace	Jan Höppner	1	-75/+75
	Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-11	s390/dasd: Add dynamic formatting support for ESE volumes	Jan Höppner	3	-14/+239
	A dynamic formatting is issued whenever a write request returns with either a No Record Found error (Command Mode), Incorrect Length error (Transport Mode), or File Protected error (Transport Mode). All three cases mean that the tracks in question haven't been initialized in a desired format yet. The part of the volume that was tried to be written on is then formatted and the original request is re-queued. As the formatting will happen during normal I/O operations, it is quite likely that there won't be any memory available to build the respective request. Another two pages of memory are allocated per volume specifically for the dynamic formatting. The dasd_eckd_build_format() function is extended to make sure that the original startdev is reused. Also, all formatting and format check functions use the new memory pool exclusively now to reduce complexity. Read operations will always return zero data when unformatted areas are read. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-11	s390/dasd: Recognise data for ESE volumes	Jan Höppner	4	-4/+430
	In order to work with Extent Space Efficient (ESE) volumes, certain viable information about those volumes and the corresponding extent pool (such as extent size, configured space, allocated space, etc.) can be provided. Use the CCW commands Volume Storage Query and Logical Configuration Query to receive detailed information about ESE volumes and the extent pool respectively. These information are made accessible via internal functions for subsequent users, and via sysfs attributes for userpsace usage. The new sysfs attributes reside in separate directories called capacity and extent_pool. attributes: ese: 0/1 depending on whether the volume is an ESE volume Capacity related attributes: space_allocated: Space currently allocated by the volume (in cyl) space_configured: Remaining space in the extent pool (in cyl) logical_capacity: The entire addressable space for this volume (in cyl) Extent Pool related attributes: pool_id: ID of the extent pool the volume in question resides in pool_oos: Extent pool is out-of-space extent_size: Size of a single extent in this pool cap_at_warnlevel Extent pool capacity at warn level warn_threshold: Threshold at which percentage of remaining extent pool space a warning message is issued Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-11	s390/dasd: Put sub-order definitions in a separate section	Jan Höppner	1	-2/+6
	There are orders and sub-orders. Put them in different sections for a better overview. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-11	s390/dasd: Make layout analysis ESE compatible	Jan Höppner	1	-6/+6
	The disk layout and volume information of a DASD reside in the first two tracks of cylinder 0. When a DASD is set online, currently the first three tracks are read and analysed to confirm an expected layout. For CDL (Compatible Disk Layout) only count area data of the first track is evaluated and checked against expected key and data lengths. For LDL (Linux Disk Layout) the first and third track is evaluated. However, an LDL formatted volume is expected to be in the same format across all tracks. Checking the third track therefore doesn't have any more value than checking any other track at random. Now, an Extent Space Efficient (ESE) DASD is initialised by only formatting the first two tracks, as those tracks always contain all information necessarry. Checking the third track on an ESE volume will therefore most likely fail with a record not found error, as the third track will be empty. This in turn leads to the device being recognised with a volume size of 0. Attempts to write volume information on the first two tracks then fail with "no space left on device" errors. Initialising the first three tracks for an ESE volume is not a viable solution, because the third track is already a regular track and could contain user data. With that there is potential for data corruption. Instead, always only analyse the first two tracks, as it is sufficiant for both CDL and LDL, and allow ESE volumes to be recognised as well. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-11	s390/dasd: Remove old defines and function	Jan Höppner	1	-21/+0
	Commit 4d284cac76d0 ("[S390] Avoid excessive inlining.") removed bytes_per_record() which was the only user of the defines ECKD_C0 and ECKD_F*, and round_up_multiple(). Let's get rid of those. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-11	s390/dasd: Remove unused structs and function prototypes	Jan Höppner	1	-25/+0
	There are structs that have never been used. There are also two function prototypes which were forgotton in commit f9f8d02fae0d ("[S390] dasd: revert LCU optimization"). Clean up and keep the header file tidy. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-07	Linux 5.2	Linus Torvalds	1	-1/+1

2019-07-06	blk-mq: fix up placement of debugfs directory of queue files	Greg Kroah-Hartman	1	-0/+7
	When the blk-mq debugfs file creation logic was "cleaned up" it was cleaned up too much, causing the queue file to not be created in the correct location. Turns out the check for the directory being present is needed as if that has not happened yet, the files should not be created, and the function will be called later on in the initialization code so that the files can be created in the correct location. Fixes: 6cfc0081b046 ("blk-mq: no need to check return value of debugfs_create functions") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: linux-block@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-05	Revert "mm: page cache: store only head pages in i_pages"	Linus Torvalds	8	-82/+94
	This reverts commit 5fd4ca2d84b249f0858ce28cf637cf25b61a398f. Mikhail Gavrilov reports that it causes the VM_BUG_ON_PAGE() in __delete_from_swap_cache() to trigger: page:ffffd6d34dff0000 refcount:1 mapcount:1 mapping:ffff97812323a689 index:0xfecec363 anon flags: 0x17fffe00080034(uptodate\|lru\|active\|swapbacked) raw: 0017fffe00080034 ffffd6d34c67c508 ffffd6d3504b8d48 ffff97812323a689 raw: 00000000fecec363 0000000000000000 0000000100000000 ffff978433ace000 page dumped because: VM_BUG_ON_PAGE(entry != page) page->mem_cgroup:ffff978433ace000 ------------[ cut here ]------------ kernel BUG at mm/swap_state.c:170! invalid opcode: 0000 [#1] SMP NOPTI CPU: 1 PID: 221 Comm: kswapd0 Not tainted 5.2.0-0.rc2.git0.1.fc31.x86_64 #1 Hardware name: System manufacturer System Product Name/ROG STRIX X470-I GAMING, BIOS 2202 04/11/2019 RIP: 0010:__delete_from_swap_cache+0x20d/0x240 Code: 30 65 48 33 04 25 28 00 00 00 75 4a 48 83 c4 38 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c6 2f dc 0f 8a 48 89 c7 e8 93 1b fd ff <0f> 0b 48 c7 c6 a8 74 0f 8a e8 85 1b fd ff 0f 0b 48 c7 c6 a8 7d 0f RSP: 0018:ffffa982036e7980 EFLAGS: 00010046 RAX: 0000000000000021 RBX: 0000000000000040 RCX: 0000000000000006 RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff97843d657900 RBP: 0000000000000001 R08: ffffa982036e7835 R09: 0000000000000535 R10: ffff97845e21a46c R11: ffffa982036e7835 R12: ffff978426387120 R13: 0000000000000000 R14: ffffd6d34dff0040 R15: ffffd6d34dff0000 FS: 0000000000000000(0000) GS:ffff97843d640000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00002cba88ef5000 CR3: 000000078a97c000 CR4: 00000000003406e0 Call Trace: delete_from_swap_cache+0x46/0xa0 try_to_free_swap+0xbc/0x110 swap_writepage+0x13/0x70 pageout.isra.0+0x13c/0x350 shrink_page_list+0xc14/0xdf0 shrink_inactive_list+0x1e5/0x3c0 shrink_node_memcg+0x202/0x760 shrink_node+0xe0/0x470 balance_pgdat+0x2d1/0x510 kswapd+0x220/0x420 kthread+0xfb/0x130 ret_from_fork+0x22/0x40 and it's not immediately obvious why it happens. It's too late in the rc cycle to do anything but revert for now. Link: https://lore.kernel.org/lkml/CABXGCsN9mYmBD-4GaaeW_NrDu+FDXLzr_6x+XNxfmFV6QkYCDg@mail.gmail.com/ Reported-and-bisected-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Suggested-by: Jan Kara <jack@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Kirill Shutemov <kirill@shutemov.name> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-05	mtd: rawnand: sunxi: Add A23/A33 DMA support with extra MBUS configuration	Miquel Raynal	1	-0/+24
	Allwinner NAND controllers can make use of DMA to enhance the I/O throughput thanks to ECC pipelining. DMA handling with A23/A33 NAND IP is a bit different than with the older SoCs, hence the introduction of a new compatible to handle: * the differences between register offsets, * the burst length change from 4 to minimum 8, * manage SRAM accesses through MBUS with extra configuration. Fixes: c49836f05aa1 ("mtd: rawnand: sunxi: Add A23/A33 DMA support") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2019-07-05	Revert "mtd: rawnand: sunxi: Add A23/A33 DMA support"	Miquel Raynal	1	-36/+2
	This reverts commit c49836f05aa15282f7280e06ede3f6f8a6324833. The commit is wrong and its approach actually does not work. Let's revert it in order to add the feature with a clean patch. Fixes: c49836f05aa1 ("mtd: rawnand: sunxi: Add A23/A33 DMA support") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2019-07-05	i2c: tegra: Add Dmitry as a reviewer	Dmitry Osipenko	1	-0/+1
	I'm contributing to Tegra's upstream development in general and happened to review the Tegra's I2C patches for awhile because I'm actively using upstream kernel on all of my Tegra-powered devices and initially some of the submitted patches were getting my attention since they were causing problems. Recently Wolfram Sang asked whether I'm interested in becoming a reviewer for the driver and I don't mind at all. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> [wsa: ack was expressed by Thierry Reding in a mail thread] Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2019-07-05	fs: VALIDATE_FS_PARSER should default to n	Geert Uytterhoeven	1	-1/+0
	CONFIG_VALIDATE_FS_PARSER is a debugging tool to check that the parser tables are vaguely sane. It was set to default to 'Y' for the moment to catch errors in upcoming fs conversion development. Make sure it is not enabled by default in the final release of v5.1. Fixes: 31d921c7fb969172 ("vfs: Add configuration parser helpers") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-05	docs: s390: s390dbf: typos and formatting, update crash command	Steffen Maier	1	-54/+68
	Signed-off-by: Steffen Maier <maier@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Message-Id: <1562149189-1417-4-git-send-email-maier@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-05	docs: s390: unify and update s390dbf kdocs at debug.c	Steffen Maier	3	-116/+102
	For non-static-inlines, debug.c already had non-compliant function header docs. So move the pure prototype kdocs of ("s390: include/asm/debug.h add kerneldoc markups") from debug.h to debug.c and merge them with the old function docs. Also, I had the impression that kdoc typically is at the implementation in the compile unit rather than at the prototype in the header file. While at it, update the short kdoc description to distinguish the different functions. And a few more consistency cleanups. Added a new kdoc for debug_set_critical() since debug.h comments it as part of the API. Signed-off-by: Steffen Maier <maier@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Message-Id: <1562149189-1417-3-git-send-email-maier@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-05	docs: s390: restore important non-kdoc parts of s390dbf.rst	Steffen Maier	1	-0/+339
	Complements previous ("s390: include/asm/debug.h add kerneldoc markups") which seemed to have dropped important non-kdoc parts such as user space interface (level, size, flush) as well as views and caution regarding strings in the sprintf view. Signed-off-by: Steffen Maier <maier@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Message-Id: <1562149189-1417-2-git-send-email-maier@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-05	KVM: arm64/sve: Fix vq_present() macro to yield a bool	Zhang Lei	1	-1/+1
	The original implementation of vq_present() relied on aggressive inlining in order for the compiler to know that the code is correct, due to some const-casting issues. This was causing sparse and clang to complain, while GCC compiled cleanly. Commit 0c529ff789bc addressed this problem, but since vq_present() is no longer a function, there is now no implicit casting of the returned value to the return type (bool). In set_sve_vls(), this uncast bit value is compared against a bool, and so may spuriously compare as unequal when both are nonzero. As a result, KVM may reject valid SVE vector length configurations as invalid, and vice versa. Fix it by forcing the returned value to a bool. Signed-off-by: Zhang Lei <zhang.lei@jp.fujitsu.com> Fixes: 0c529ff789bc ("KVM: arm64: Implement vq_present() as a macro") Signed-off-by: Dave Martin <Dave.Martin@arm.com> [commit message rewrite] Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-05	dmaengine: qcom: bam_dma: Fix completed descriptors count	Sricharan R	1	-0/+3
	One space is left unused in circular FIFO to differentiate 'full' and 'empty' cases. So take that in to account while counting for the descriptors completed. Fixes the issue reported here, https://lkml.org/lkml/2019/6/18/669 Cc: stable@vger.kernel.org Reported-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Sricharan R <sricharan@codeaurora.org> Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-07-05	dmaengine: imx-sdma: remove BD_INTR for channel0	Robin Gong	1	-2/+2
	It is possible for an irq triggered by channel0 to be received later after clks are disabled once firmware loaded during sdma probe. If that happens then clearing them by writing to SDMA_H_INTR won't work and the kernel will hang processing infinite interrupts. Actually, don't need interrupt triggered on channel0 since it's pollling SDMA_H_STATSTOP to know channel0 done rather than interrupt in current code, just clear BD_INTR to disable channel0 interrupt to avoid the above case. This issue was brought by commit 1d069bfa3c78 ("dmaengine: imx-sdma: ack channel 0 IRQ in the interrupt handler") which didn't take care the above case. Fixes: 1d069bfa3c78 ("dmaengine: imx-sdma: ack channel 0 IRQ in the interrupt handler") Cc: stable@vger.kernel.org #5.0+ Signed-off-by: Robin Gong <yibin.gong@nxp.com> Reported-by: Sven Van Asbroeck <thesven73@gmail.com> Tested-by: Sven Van Asbroeck <thesven73@gmail.com> Reviewed-by: Michael Olbrich <m.olbrich@pengutronix.de> Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-07-05	dmaengine: imx-sdma: fix use-after-free on probe error path	Sven Van Asbroeck	1	-21/+27
	If probe() fails anywhere beyond the point where sdma_get_firmware() is called, then a kernel oops may occur. Problematic sequence of events: 1. probe() calls sdma_get_firmware(), which schedules the firmware callback to run when firmware becomes available, using the sdma instance structure as the context 2. probe() encounters an error, which deallocates the sdma instance structure 3. firmware becomes available, firmware callback is called with deallocated sdma instance structure 4. use after free - kernel oops ! Solution: only attempt to load firmware when we're certain that probe() will succeed. This guarantees that the firmware callback's context will remain valid. Note that the remove() path is unaffected by this issue: the firmware loader will increment the driver module's use count, ensuring that the module cannot be unloaded while the firmware callback is pending or running. Signed-off-by: Sven Van Asbroeck <TheSven73@gmail.com> Reviewed-by: Robin Gong <yibin.gong@nxp.com> [vkoul: fixed braces for if condition] Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-07-05	dmaengine: jz4780: Fix an endian bug in IRQ handler	Dan Carpenter	1	-2/+3
	The "pending" variable was a u32 but we cast it to an unsigned long pointer when we do the for_each_set_bit() loop. The problem is that on big endian 64bit systems that results in an out of bounds read. Fixes: 4e4106f5e942 ("dmaengine: jz4780: Fix transfers being ACKed too soon") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-07-05	vfio-ccw: Fix the conversion of Format-0 CCWs to Format-1	Eric Farman	1	-1/+1
	When processing Format-0 CCWs, we use the "len" variable as the number of CCWs to convert to Format-1. But that variable contains zero here, and is not a meaningful CCW count until ccwchain_calc_length() returns. Since that routine requires and expects Format-1 CCWs to identify the chaining behavior, the format conversion must be done first. Convert the 2KB we copied even if it's more than we need. Fixes: 7f8e89a8f2fd ("vfio-ccw: Factor out the ccw0-to-ccw1 transition") Reported-by: Farhan Ali <alifm@linux.ibm.com> Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20190702180928.18113-1-farman@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2019-07-05	swap_readpage(): avoid blk_wake_io_task() if !synchronous	Oleg Nesterov	1	-5/+8
	swap_readpage() sets waiter = bio->bi_private even if synchronous = F, this means that the caller can get the spurious wakeup after return. This can be fatal if blk_wake_io_task() does set_current_state(TASK_RUNNING) after the caller does set_special_state(), in the worst case the kernel can crash in do_task_dead(). Link: http://lkml.kernel.org/r/20190704160301.GA5956@redhat.com Fixes: 0619317ff8baa2d ("block: add polled wakeup task helper") Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Qian Cai <cai@lca.pw> Acked-by: Hugh Dickins <hughd@google.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-05	devres: allow const resource arguments	Arnd Bergmann	2	-2/+4
	devm_ioremap_resource() does not currently take 'const' arguments, which results in a warning from the first driver trying to do it anyway: drivers/gpio/gpio-amd-fch.c: In function 'amd_fch_gpio_probe': drivers/gpio/gpio-amd-fch.c:171:49: error: passing argument 2 of 'devm_ioremap_resource' discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers] priv->base = devm_ioremap_resource(&pdev->dev, &amd_fch_gpio_iores); ^~~~~~~~~~~~~~~~~~~ Change the prototype to allow it, as there is no real reason not to. Link: http://lkml.kernel.org/r/20190628150049.1108048-1-arnd@arndb.de Fixes: 9bb2e0452508 ("gpio: amd: Make resource struct const") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-05	mm/vmscan.c: prevent useless kswapd loops	Shakeel Butt	1	-12/+15
	In production we have noticed hard lockups on large machines running large jobs due to kswaps hoarding lru lock within isolate_lru_pages when sc->reclaim_idx is 0 which is a small zone. The lru was couple hundred GiBs and the condition (page_zonenum(page) > sc->reclaim_idx) in isolate_lru_pages() was basically skipping GiBs of pages while holding the LRU spinlock with interrupt disabled. On further inspection, it seems like there are two issues: (1) If kswapd on the return from balance_pgdat() could not sleep (i.e. node is still unbalanced), the classzone_idx is unintentionally set to 0 and the whole reclaim cycle of kswapd will try to reclaim only the lowest and smallest zone while traversing the whole memory. (2) Fundamentally isolate_lru_pages() is really bad when the allocation has woken kswapd for a smaller zone on a very large machine running very large jobs. It can hoard the LRU spinlock while skipping over 100s of GiBs of pages. This patch only fixes (1). (2) needs a more fundamental solution. To fix (1), in the kswapd context, if pgdat->kswapd_classzone_idx is invalid use the classzone_idx of the previous kswapd loop otherwise use the one the waker has requested. Link: http://lkml.kernel.org/r/20190701201847.251028-1-shakeelb@google.com Fixes: e716f2eb24de ("mm, vmscan: prevent kswapd sleeping prematurely due to mismatched classzone_idx") Signed-off-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Hillf Danton <hdanton@sina.com> Cc: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-05	fs/userfaultfd.c: disable irqs for fault_pending and event locks	Eric Biggers	1	-16/+26
	When IOCB_CMD_POLL is used on a userfaultfd, aio_poll() disables IRQs and takes kioctx::ctx_lock, then userfaultfd_ctx::fd_wqh.lock. This may have to wait for userfaultfd_ctx::fd_wqh.lock to be released by userfaultfd_ctx_read(), which in turn can be waiting for userfaultfd_ctx::fault_pending_wqh.lock or userfaultfd_ctx::event_wqh.lock. But elsewhere the fault_pending_wqh and event_wqh locks are taken with IRQs enabled. Since the IRQ handler may take kioctx::ctx_lock, lockdep reports that a deadlock is possible. Fix it by always disabling IRQs when taking the fault_pending_wqh and event_wqh locks. Commit ae62c16e105a ("userfaultfd: disable irqs when taking the waitqueue lock") didn't fix this because it only accounted for the fd_wqh lock, not the other locks nested inside it. Link: http://lkml.kernel.org/r/20190627075004.21259-1-ebiggers@kernel.org Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL") Signed-off-by: Eric Biggers <ebiggers@google.com> Reported-by: syzbot+fab6de82892b6b9c6191@syzkaller.appspotmail.com Reported-by: syzbot+53c0b767f7ca0dc0c451@syzkaller.appspotmail.com Reported-by: syzbot+a3accb352f9c22041cfa@syzkaller.appspotmail.com Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> [4.19+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-05	mm/page_alloc.c: fix regression with deferred struct page init	Juergen Gross	1	-1/+2
	Commit 0e56acae4b4d ("mm: initialize MAX_ORDER_NR_PAGES at a time instead of doing larger sections") is causing a regression on some systems when the kernel is booted as Xen dom0. The system will just hang in early boot. Reason is an endless loop in get_page_from_freelist() in case the first zone looked at has no free memory. deferred_grow_zone() is always returning true due to the following code snipplet: /* If the zone is empty somebody else may have cleared out the zone */ if (!deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn, first_deferred_pfn)) { pgdat->first_deferred_pfn = ULONG_MAX; pgdat_resize_unlock(pgdat, &flags); return true; } This in turn results in the loop as get_page_from_freelist() is assuming forward progress can be made by doing some more struct page initialization. Link: http://lkml.kernel.org/r/20190620160821.4210-1-jgross@suse.com Fixes: 0e56acae4b4d ("mm: initialize MAX_ORDER_NR_PAGES at a time instead of doing larger sections") Signed-off-by: Juergen Gross <jgross@suse.com> Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Acked-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-05	ptrace: Fix ->ptracer_cred handling for PTRACE_TRACEME	Jann Horn	1	-3/+1
	Fix two issues: When called for PTRACE_TRACEME, ptrace_link() would obtain an RCU reference to the parent's objective credentials, then give that pointer to get_cred(). However, the object lifetime rules for things like struct cred do not permit unconditionally turning an RCU reference into a stable reference. PTRACE_TRACEME records the parent's credentials as if the parent was acting as the subject, but that's not the case. If a malicious unprivileged child uses PTRACE_TRACEME and the parent is privileged, and at a later point, the parent process becomes attacker-controlled (because it drops privileges and calls execve()), the attacker ends up with control over two processes with a privileged ptrace relationship, which can be abused to ptrace a suid binary and obtain root privileges. Fix both of these by always recording the credentials of the process that is requesting the creation of the ptrace relationship: current_cred() can't change under us, and current is the proper subject for access control. This change is theoretically userspace-visible, but I am not aware of any code that it will actually break. Fixes: 64b875f7ac8a ("ptrace: Capture the ptracer's creds not PT_PTRACE_CAP") Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-04	s390/pci: correctly handle MIO opt-out	Sebastian Ott	1	-1/+1
	Do not issue CLP_SET_ENABLE_MIO after opting out of MIO instruction usage. This should not fix a bug but reduce overhead within firmware. Signed-off-by: Sebastian Ott <sebott@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-04	s390/pci: deal with devices that have no support for MIO instructions	Sebastian Ott	2	-7/+13
	Unfortunately we have to handle a class of devices that don't support the new MIO instructions. Adjust resource assignment and mapping accordingly. Signed-off-by: Sebastian Ott <sebott@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-04	drm/imx: only send event on crtc disable if kept disabled	Robert Beckett	1	-1/+1
	The event will be sent as part of the vblank enable during the modeset if the crtc is not being kept disabled. Fixes: 5f2f911578fb ("drm/imx: atomic phase 3 step 1: Use atomic configuration") Signed-off-by: Robert Beckett <bob.beckett@collabora.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2019-07-04	drm/imx: notify drm core before sending event during crtc disable	Robert Beckett	1	-2/+2
	Notify drm core before sending pending events during crtc disable. This fixes the first event after disable having an old stale timestamp by having drm_crtc_vblank_off update the timestamp to now. This was seen while debugging weston log message: Warning: computed repaint delay is insane: -8212 msec This occurred due to: 1. driver starts up 2. fbcon comes along and restores fbdev, enabling vblank 3. vblank_disable_fn fires via timer disabling vblank, keeping vblank seq number and time set at current value (some time later) 4. weston starts and does a modeset 5. atomic commit disables crtc while it does the modeset 6. ipu_crtc_atomic_disable sends vblank with old seq number and time Fixes: a474478642d5 ("drm/imx: fix crtc vblank state regression") Signed-off-by: Robert Beckett <bob.beckett@collabora.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2019-07-03	nfsd: Fix overflow causing non-working mounts on 1 TB machines	Paul Menzel	1	-1/+1
	Since commit 10a68cdf10 (nfsd: fix performance-limiting session calculation) (Linux 5.1-rc1 and 4.19.31), shares from NFS servers with 1 TB of memory cannot be mounted anymore. The mount just hangs on the client. The gist of commit 10a68cdf10 is the change below. -avail = clamp_t(int, avail, slotsize, avail/3); +avail = clamp_t(int, avail, slotsize, total_avail/3); Here are the macros. #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <) #define clamp_t(type, val, lo, hi) min_t(type, max_t(type, val, lo), hi) `total_avail` is 8,434,659,328 on the 1 TB machine. `clamp_t()` casts the values to `int`, which for 32-bit integers can only hold values −2,147,483,648 (−2^31) through 2,147,483,647 (2^31 − 1). `avail` (in the function signature) is just 65536, so that no overflow was happening. Before the commit the assignment would result in 21845, and `num = 4`. When using `total_avail`, it is causing the assignment to be 18446744072226137429 (printed as %lu), and `num` is then 4164608182. My next guess is, that `nfsd_drc_mem_used` is then exceeded, and the server thinks there is no memory available any more for this client. Updating the arguments of `clamp_t()` and `min_t()` to `unsigned long` fixes the issue. Now, `avail = 65536` (before commit 10a68cdf10 `avail = 21845`), but `num = 4` remains the same. Fixes: c54f24e338ed (nfsd: fix performance-limiting session calculation) Cc: stable@vger.kernel.org Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>