aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2020-05-20s390/mm: fix set_huge_pte_at() for empty ptesGerald Schaefer1-3/+6
On s390, the layout of normal and large ptes (i.e. pmds/puds) differs. Therefore, set_huge_pte_at() does a conversion from a normal pte to the corresponding large pmd/pud. So, when converting an empty pte, this should result in an empty pmd/pud, which would return true for pmd/pud_none(). However, after conversion we also mark the pmd/pud as large, and therefore present. For empty ptes, this will result in an empty pmd/pud that is also marked as large, and pmd/pud_none() would not return true. There is currently no issue with this behaviour, as set_huge_pte_at() does not seem to be called for empty ptes. It would be valid though, so let's fix this by not marking empty ptes as large in set_huge_pte_at(). This was found by testing a patch from from Anshuman Khandual, which is currently discussed on LKML ("mm/debug: Add more arch page table helper tests"). Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-05-14s390/kexec_file: fix initrd location for kdump kernelPhilipp Rudo1-1/+1
initrd_start must not point at the location the initrd is loaded into the crashkernel memory but at the location it will be after the crashkernel memory is swapped with the memory at 0. Fixes: ee337f5469fd ("s390/kexec_file: Add crash support to image loader") Reported-by: Lianbo Jiang <lijiang@redhat.com> Signed-off-by: Philipp Rudo <prudo@linux.ibm.com> Tested-by: Lianbo Jiang <lijiang@redhat.com> Link: https://lore.kernel.org/r/20200512193956.15ae3f23@laptop2-ibm.local Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-05-14s390/pci: Fix s390_mmio_read/write with MIONiklas Schnelle2-4/+219
The s390_mmio_read/write syscalls are currently broken when running with MIO. The new pcistb_mio/pcstg_mio/pcilg_mio instructions are executed similiarly to normal load/store instructions and do address translation in the current address space. That means inside the kernel they are aware of mappings into kernel address space while outside the kernel they use user space mappings (usually created through mmap'ing a PCI device file). Now when existing user space applications use the s390_pci_mmio_write and s390_pci_mmio_read syscalls, they pass I/O addresses that are mapped into user space so as to be usable with the new instructions without needing a syscall. Accessing these addresses with the old instructions as done currently leads to a kernel panic. Also, for such a user space mapping there may not exist an equivalent kernel space mapping which means we can't just use the new instructions in kernel space. Instead of replicating user mappings in the kernel which then might collide with other mappings, we can conceptually execute the new instructions as if executed by the user space application using the secondary address space. This even allows us to directly store to the user pointer without the need for copy_to/from_user(). Cc: stable@vger.kernel.org Fixes: 71ba41c9b1d9 ("s390/pci: provide support for MIO instructions") Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-05-10Linux 5.7-rc5Linus Torvalds1-1/+1
2020-05-10Merge tag 'x86-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds11-90/+138
Pull x86 fixes from Thomas Gleixner: "A set of fixes for x86: - Ensure that direct mapping alias is always flushed when changing page attributes. The optimization for small ranges failed to do so when the virtual address was in the vmalloc or module space. - Unbreak the trace event registration for syscalls without arguments caused by the refactoring of the SYSCALL_DEFINE0() macro. - Move the printk in the TSC deadline timer code to a place where it is guaranteed to only be called once during boot and cannot be rearmed by clearing warn_once after boot. If it's invoked post boot then lockdep rightfully complains about a potential deadlock as the calling context is different. - A series of fixes for objtool and the ORC unwinder addressing variety of small issues: - Stack offset tracking for indirect CFAs in objtool ignored subsequent pushs and pops - Repair the unwind hints in the register clearing entry ASM code - Make the unwinding in the low level exit to usermode code stop after switching to the trampoline stack. The unwind hint is no longer valid and the ORC unwinder emits a warning as it can't find the registers anymore. - Fix unwind hints in switch_to_asm() and rewind_stack_do_exit() which caused objtool to generate bogus ORC data. - Prevent unwinder warnings when dumping the stack of a non-current task as there is no way to be sure about the validity because the dumped stack can be a moving target. - Make the ORC unwinder behave the same way as the frame pointer unwinder when dumping an inactive tasks stack and do not skip the first frame. - Prevent ORC unwinding before ORC data has been initialized - Immediately terminate unwinding when a unknown ORC entry type is found. - Prevent premature stop of the unwinder caused by IRET frames. - Fix another infinite loop in objtool caused by a negative offset which was not catched. - Address a few build warnings in the ORC unwinder and add missing static/ro_after_init annotations" * tag 'x86-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/unwind/orc: Move ORC sorting variables under !CONFIG_MODULES x86/apic: Move TSC deadline timer debug printk ftrace/x86: Fix trace event registration for syscalls without arguments x86/mm/cpa: Flush direct map alias during cpa objtool: Fix infinite loop in for_offset_range() x86/unwind/orc: Fix premature unwind stoppage due to IRET frames x86/unwind/orc: Fix error path for bad ORC entry type x86/unwind/orc: Prevent unwinding before ORC initialization x86/unwind/orc: Don't skip the first frame for inactive tasks x86/unwind: Prevent false warnings for non-current tasks x86/unwind/orc: Convert global variables to static x86/entry/64: Fix unwind hints in rewind_stack_do_exit() x86/entry/64: Fix unwind hints in __switch_to_asm() x86/entry/64: Fix unwind hints in kernel exit path x86/entry/64: Fix unwind hints in register clearing code objtool: Fix stack offset tracking for indirect CFAs
2020-05-10Merge tag 'objtool-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-2/+13
Pull objtool fix from Thomas Gleixner: "A single fix for objtool to prevent an infinite loop in the jump table search which can be triggered when building the kernel with '-ffunction-sections'" * tag 'objtool-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Fix infinite loop in find_jump_table()
2020-05-10Merge tag 'locking-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-2/+7
Pull locking fix from Thomas Gleixner: "A single fix for the fallout of the recent futex uacess rework. With those changes GCC9 fails to analyze arch_futex_atomic_op_inuser() correctly and emits a 'maybe unitialized' warning. While we usually ignore compiler stupidity the conditional store is pointless anyway because the correct case has to store. For the fault case the extra store does no harm" * tag 'locking-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ARM: futex: Address build warning
2020-05-10Merge tag 'iommu-fixes-v5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommuLinus Torvalds3-47/+162
Pull iommu fixes from Joerg Roedel: - Race condition fixes for the AMD IOMMU driver. These are five patches fixing two race conditions around increase_address_space(). The first race condition was around the non-atomic update of the domain page-table root pointer and the variable containing the page-table depth (called mode). This is fixed now be merging page-table root and mode into one 64-bit field which is read/written atomically. The second race condition was around updating the page-table root pointer and making it public before the hardware caches were flushed. This could cause addresses to be mapped and returned to drivers which are not reachable by IOMMU hardware yet, causing IO page-faults. This is fixed too by adding the necessary flushes before a new page-table root is published. Related to the race condition fixes these patches also add a missing domain_flush_complete() barrier to update_domain() and a fix to bail out of the loop which tries to increase the address space when the call to increase_address_space() fails. Qian was able to trigger the race conditions under high load and memory pressure within a few days of testing. He confirmed that he has seen no issues anymore with the fixes included here. - Fix for a list-handling bug in the VirtIO IOMMU driver. * tag 'iommu-fixes-v5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/virtio: Reverse arguments to list_add iommu/amd: Do not flush Device Table in iommu_map_page() iommu/amd: Update Device Table in increase_address_space() iommu/amd: Call domain_flush_complete() in update_domain() iommu/amd: Do not loop forever when trying to increase address space iommu/amd: Fix race in increase_address_space()/fetch_pte()
2020-05-10Merge tag 'block-5.7-2020-05-09' of git://git.kernel.dk/linux-blockLinus Torvalds12-68/+107
Pull block fixes from Jens Axboe: - a small series fixing a use-after-free of bdi name (Christoph,Yufen) - NVMe fix for a regression with the smaller CQ update (Alexey) - NVMe fix for a hang at namespace scanning error recovery (Sagi) - fix race with blk-iocost iocg->abs_vdebt updates (Tejun) * tag 'block-5.7-2020-05-09' of git://git.kernel.dk/linux-block: nvme: fix possible hang when ns scanning fails during error recovery nvme-pci: fix "slimmer CQ head update" bdi: add a ->dev_name field to struct backing_dev_info bdi: use bdi_dev_name() to get device name bdi: move bdi_dev_name out of line vboxsf: don't use the source name in the bdi name iocost: protect iocg->abs_vdebt with iocg->waitq.lock
2020-05-09gcc-10: mark more functions __init to avoid section mismatch warningsLinus Torvalds2-2/+2
It seems that for whatever reason, gcc-10 ends up not inlining a couple of functions that used to be inlined before. Even if they only have one single callsite - it looks like gcc may have decided that the code was unlikely, and not worth inlining. The code generation difference is harmless, but caused a few new section mismatch errors, since the (now no longer inlined) function wasn't in the __init section, but called other init functions: Section mismatch in reference from the function kexec_free_initrd() to the function .init.text:free_initrd_mem() Section mismatch in reference from the function tpm2_calc_event_log_size() to the function .init.text:early_memremap() Section mismatch in reference from the function tpm2_calc_event_log_size() to the function .init.text:early_memunmap() So add the appropriate __init annotation to make modpost not complain. In both cases there were trivially just a single callsite from another __init function. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09Merge tag 'riscv-for-linus-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linuxLinus Torvalds9-34/+121
Pull RISC-V fixes from Palmer Dabbelt: "A smattering of fixes and cleanups: - Dead code removal. - Exporting riscv_cpuid_to_hartid_mask for modules. - Per-CPU tracking of ISA features. - Setting max_pfn correctly when probing memory. - Adding a note to the VDSO so glibc can check the kernel's version without a uname(). - A fix to force the bootloader to initialize the boot spin tables, which still get used as a fallback when SBI-0.1 is enabled" * tag 'riscv-for-linus-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: RISC-V: Remove unused code from STRICT_KERNEL_RWX riscv: force __cpu_up_ variables to put in data section riscv: add Linux note to vdso riscv: set max_pfn to the PFN of the last page RISC-V: Remove N-extension related defines RISC-V: Add bitmap reprensenting ISA features common across CPUs RISC-V: Export riscv_cpuid_to_hartid_mask() API
2020-05-09gcc-10: avoid shadowing standard library 'free()' in cryptoLinus Torvalds2-6/+6
gcc-10 has started warning about conflicting types for a few new built-in functions, particularly 'free()'. This results in warnings like: crypto/xts.c:325:13: warning: conflicting types for built-in function ‘free’; expected ‘void(void *)’ [-Wbuiltin-declaration-mismatch] because the crypto layer had its local freeing functions called 'free()'. Gcc-10 is in the wrong here, since that function is marked 'static', and thus there is no chance of confusion with any standard library function namespace. But the simplest thing to do is to just use a different name here, and avoid this gcc mis-feature. [ Side note: gcc knowing about 'free()' is in itself not the mis-feature: the semantics of 'free()' are special enough that a compiler can validly do special things when seeing it. So the mis-feature here is that gcc thinks that 'free()' is some restricted name, and you can't shadow it as a local static function. Making the special 'free()' semantics be a function attribute rather than tied to the name would be the much better model ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09gcc-10: disable 'restrict' warning for nowLinus Torvalds1-0/+3
gcc-10 now warns about passing aliasing pointers to functions that take restricted pointers. That's actually a great warning, and if we ever start using 'restrict' in the kernel, it might be quite useful. But right now we don't, and it turns out that the only thing this warns about is an idiom where we have declared a few functions to be "printf-like" (which seems to make gcc pick up the restricted pointer thing), and then we print to the same buffer that we also use as an input. And people do that as an odd concatenation pattern, with code like this: #define sysfs_show_gen_prop(buffer, fmt, ...) \ snprintf(buffer, PAGE_SIZE, "%s"fmt, buffer, __VA_ARGS__) where we have 'buffer' as both the destination of the final result, and as the initial argument. Yes, it's a bit questionable. And outside of the kernel, people do have standard declarations like int snprintf( char *restrict buffer, size_t bufsz, const char *restrict format, ... ); where that output buffer is marked as a restrict pointer that cannot alias with any other arguments. But in the context of the kernel, that 'use snprintf() to concatenate to the end result' does work, and the pattern shows up in multiple places. And we have not marked our own version of snprintf() as taking restrict pointers, so the warning is incorrect for now, and gcc picks it up on its own. If we do start using 'restrict' in the kernel (and it might be a good idea if people find places where it matters), we'll need to figure out how to avoid this issue for snprintf and friends. But in the meantime, this warning is not useful. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09gcc-10: disable 'stringop-overflow' warning for nowLinus Torvalds1-0/+1
This is the final array bounds warning removal for gcc-10 for now. Again, the warning is good, and we should re-enable all these warnings when we have converted all the legacy array declaration cases to flexible arrays. But in the meantime, it's just noise. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09nvme: fix possible hang when ns scanning fails during error recoverySagi Grimberg1-1/+1
When the controller is reconnecting, the host fails I/O and admin commands as the host cannot reach the controller. ns scanning may revalidate namespaces during that period and it is wrong to remove namespaces due to these failures as we may hang (see 205da2434301). One command that may fail is nvme_identify_ns_descs. Since we return success due to having ns identify descriptor list optional, we continue to compare ns identifiers in nvme_revalidate_disk, obviously fail and return -ENODEV to nvme_validate_ns, which will remove the namespace. Exactly what we don't want to happen. Fixes: 22802bf742c2 ("nvme: Namepace identification descriptor list is optional") Tested-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09nvme-pci: fix "slimmer CQ head update"Alexey Dobriyan1-1/+5
Pre-incrementing ->cq_head can't be done in memory because OOB value can be observed by another context. This devalues space savings compared to original code :-\ $ ./scripts/bloat-o-meter ../vmlinux-000 ../obj/vmlinux add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-32 (-32) Function old new delta nvme_poll_irqdisable 464 456 -8 nvme_poll 455 447 -8 nvme_irq 388 380 -8 nvme_dev_disable 955 947 -8 But the code is minimal now: one read for head, one read for q_depth, one increment, one comparison, single instruction phase bit update and one write for new head. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reported-by: John Garry <john.garry@huawei.com> Tested-by: John Garry <john.garry@huawei.com> Fixes: e2a366a4b0feaeb ("nvme-pci: slimmer CQ head update") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09bdi: add a ->dev_name field to struct backing_dev_infoChristoph Hellwig2-2/+4
Cache a copy of the name for the life time of the backing_dev_info structure so that we can reference it even after unregistering. Fixes: 68f23b89067f ("memcg: fix a crash in wb_workfn when a device disappears") Reported-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09bdi: use bdi_dev_name() to get device nameYufen Yu4-8/+10
Use the common interface bdi_dev_name() to get device name. Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Add missing <linux/backing-dev.h> include BFQ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09gcc-10: disable 'array-bounds' warning for nowLinus Torvalds1-0/+1
This is another fine warning, related to the 'zero-length-bounds' one, but hitting the same historical code in the kernel. Because C didn't historically support flexible array members, we have code that instead uses a one-sized array, the same way we have cases of zero-sized arrays. The one-sized arrays come from either not wanting to use the gcc zero-sized array extension, or from a slight convenience-feature, where particularly for strings, the size of the structure now includes the allocation for the final NUL character. So with a "char name[1];" at the end of a structure, you can do things like v = my_malloc(sizeof(struct vendor) + strlen(name)); and avoid the "+1" for the terminator. Yes, the modern way to do that is with a flexible array, and using 'offsetof()' instead of 'sizeof()', and adding the "+1" by hand. That also technically gets the size "more correct" in that it avoids any alignment (and thus padding) issues, but this is another long-term cleanup thing that will not happen for 5.7. So disable the warning for now, even though it's potentially quite useful. Having a slew of warnings that then hide more urgent new issues is not an improvement. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09gcc-10: disable 'zero-length-bounds' warning for nowLinus Torvalds1-0/+3
This is a fine warning, but we still have a number of zero-length arrays in the kernel that come from the traditional gcc extension. Yes, they are getting converted to flexible arrays, but in the meantime the gcc-10 warning about zero-length bounds is very verbose, and is hiding other issues. I missed one actual build failure because it was hidden among hundreds of lines of warning. Thankfully I caught it on the second go before pushing things out, but it convinced me that I really need to disable the new warnings for now. We'll hopefully be all done with our conversion to flexible arrays in the not too distant future, and we can then re-enable this warning. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09Stop the ad-hoc games with -Wno-maybe-initializedLinus Torvalds3-23/+3
We have some rather random rules about when we accept the "maybe-initialized" warnings, and when we don't. For example, we consider it unreliable for gcc versions < 4.9, but also if -O3 is enabled, or if optimizing for size. And then various kernel config options disabled it, because they know that they trigger that warning by confusing gcc sufficiently (ie PROFILE_ALL_BRANCHES). And now gcc-10 seems to be introducing a lot of those warnings too, so it falls under the same heading as 4.9 did. At the same time, we have a very straightforward way to _enable_ that warning when wanted: use "W=2" to enable more warnings. So stop playing these ad-hoc games, and just disable that warning by default, with the known and straight-forward "if you want to work on the extra compiler warnings, use W=123". Would it be great to have code that is always so obvious that it never confuses the compiler whether a variable is used initialized or not? Yes, it would. In a perfect world, the compilers would be smarter, and our source code would be simpler. That's currently not the world we live in, though. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09Merge tag 'io_uring-5.7-2020-05-08' of git://git.kernel.dk/linux-blockLinus Torvalds2-70/+40
Pull io_uring fixes from Jens Axboe: - Fix finish_wait() balancing in file cancelation (Xiaoguang) - Ensure early cleanup of resources in ring map failure (Xiaoguang) - Ensure IORING_OP_SLICE does the right file mode checks (Pavel) - Remove file opening from openat/openat2/statx, it's not needed and messes with O_PATH * tag 'io_uring-5.7-2020-05-08' of git://git.kernel.dk/linux-block: io_uring: don't use 'fd' for openat/openat2/statx splice: move f_mode checks to do_{splice,tee}() io_uring: handle -EFAULT properly in io_uring_setup() io_uring: fix mismatched finish_wait() calls in io_uring_cancel_files()
2020-05-08Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds4-6/+7
Pull SCSI fixes from James Bottomley: "Four minor fixes, all in drivers (qla2xxx, ibmvfc, ibmvscsi)" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ibmvscsi: Fix WARN_ON during event pool release scsi: ibmvfc: Don't send implicit logouts prior to NPIV login scsi: qla2xxx: Delete all sessions before unregister local nvme port scsi: qla2xxx: Fix hang when issuing nvme disconnect-all in NPIV
2020-05-08Merge tag 'ceph-for-5.7-rc5' of git://github.com/ceph/ceph-clientLinus Torvalds4-14/+7
Pull ceph fixes from Ilya Dryomov: "Fixes for an endianness handling bug that prevented mounts on big-endian arches, a spammy log message and a couple error paths. Also included a MAINTAINERS update" * tag 'ceph-for-5.7-rc5' of git://github.com/ceph/ceph-client: ceph: demote quotarealm lookup warning to a debug message MAINTAINERS: remove myself as ceph co-maintainer ceph: fix double unlock in handle_cap_export() ceph: fix special error code in ceph_try_get_caps() ceph: fix endianness bug when handling MDS session feature bits
2020-05-08ceph: demote quotarealm lookup warning to a debug messageLuis Henriques1-2/+2
A misconfigured cephx can easily result in having the kernel client flooding the logs with: ceph: Can't lookup inode 1 (err: -13) Change this message to debug level. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/44546 Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-05-08Merge tag 'char-misc-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-miscLinus Torvalds14-51/+77
Pull char/misc driver fixes from Greg KH: "Here are some small driver fixes for 5.7-rc5 that resolve a number of minor reported issues: - mhi bus driver fixes found as people actually use the code - phy driver fixes and compat string additions - most driver fix due to link order changing when the core moved out of staging - mei driver fix - interconnect build warning fix All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: bus: mhi: core: Fix channel device name conflict bus: mhi: core: Fix typo in comment bus: mhi: core: Offload register accesses to the controller bus: mhi: core: Remove link_status() callback bus: mhi: core: Make sure to powerdown if mhi_sync_power_up fails bus: mhi: Fix parsing of mhi_flags mei: me: disable mei interface on LBG servers. phy: qualcomm: usb-hs-28nm: Prepare clocks in init MAINTAINERS: Add Vinod Koul as Generic PHY co-maintainer interconnect: qcom: Move the static keyword to the front of declaration most: core: use function subsys_initcall() bus: mhi: core: Fix a NULL vs IS_ERR check in mhi_create_devices() phy: qcom-qusb2: Re add "qcom,sdm845-qusb2-phy" compat string phy: tegra: Select USB_COMMON for usb_get_maximum_speed()
2020-05-08Merge tag 'driver-core-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-coreLinus Torvalds10-30/+48
Pull driver core fixes from Greg KH: "Here are a number of small driver core fixes for 5.7-rc5 to resolve a bunch of reported issues with the current tree. Biggest here are the reverts and patches from John Stultz to resolve a bunch of deferred probe regressions we have been seeing in 5.7-rc right now. Along with those are some other smaller fixes: - coredump crash fix - devlink fix for when permissive mode was enabled - amba and platform device dma_parms fixes - component error silenced for when deferred probe happens All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: regulator: Revert "Use driver_deferred_probe_timeout for regulator_init_complete_work" driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires driver core: Use dev_warn() instead of dev_WARN() for deferred_probe_timeout warnings driver core: Revert default driver_deferred_probe_timeout value to 0 component: Silence bind error on -EPROBE_DEFER driver core: Fix handling of fw_devlink=permissive coredump: fix crash when umh is disabled amba: Initialize dma_parms for amba devices driver core: platform: Initialize dma_parms for platform devices
2020-05-08Merge tag 'staging-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/stagingLinus Torvalds3-6/+4
Pull staging driver fixes from Greg KH: "Here are three small driver fixes for 5.7-rc5. Two of these are documentation fixes: - MAINTAINERS update due to removed driver - removing Wolfram from the ks7010 driver TODO file The other patch is a real fix: - fix gasket driver to proper check the return value of a call All of these have been in linux-next for a while with no reported issues" * tag 'staging-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: gasket: Check the return value of gasket_get_bar_index() staging: ks7010: remove me from CC list MAINTAINERS: remove entry after hp100 driver removal
2020-05-08Merge tag 'tty-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/ttyLinus Torvalds3-5/+9
Pull tty/serial fixes from Greg KH: "Here are three small TTY/Serial/VT fixes for 5.7-rc5: - revert for the bcm63xx driver "fix" that was incorrect - vt unicode console bugfix - xilinx_uartps console driver fix All of these have been in linux next with no reported issues" * tag 'tty-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: tty: xilinx_uartps: Fix missing id assignment to the console vt: fix unicode console freeing with a common interface Revert "tty: serial: bcm63xx: fix missing clk_put() in bcm63xx_uart"
2020-05-08Merge tag 'usb-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usbLinus Torvalds8-10/+24
Pull USB fixes from Greg KH: "Here are some small USB fixes for 5.7-rc5 to resolve some reported issues: - syzbot found problems fixed - usbfs dma mapping fix - typec bugfixs - chipidea bugfix - usb4/thunderbolt fix - new device ids/quirks All of these have been in linux-next for a while with no reported issues" * tag 'usb-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: chipidea: msm: Ensure proper controller reset using role switch API usb: typec: mux: intel: Handle alt mode HPD_HIGH usb: usbfs: correct kernel->user page attribute mismatch usb: typec: intel_pmc_mux: Fix the property names USB: core: Fix misleading driver bug report USB: serial: qcserial: Add DW5816e support USB: uas: add quirk for LaCie 2Big Quadra thunderbolt: Check return value of tb_sw_read() in usb4_switch_op() USB: serial: garmin_gps: add sanity checking for data length
2020-05-08Merge tag 'drm-fixes-2020-05-08' of git://anongit.freedesktop.org/drm/drmLinus Torvalds12-31/+57
Pull drm fixes from Dave Airlie: "Another pretty normal week. I didn't get any i915 fixes yet, so next week I'd expect double the usual i915, but otherwise a bunch of amdgpu and some scattered other fixes. hdcp: - fix HDCP regression amdgpu: - Runtime PM fixes - DC fix for PPC - Misc DC fixes virtio: - fix context ordering issue sun4i: - old gcc warning fix ingenic-drm: - missing module support" * tag 'drm-fixes-2020-05-08' of git://anongit.freedesktop.org/drm/drm: drm/amd/display: Prevent dpcd reads with passive dongles drm/amd/display: fix counter in wait_for_no_pipes_pending drm/amd/display: Update DCN2.1 DV Code Revision drm: Fix HDCP failures when SRM fw is missing sun6i: dsi: fix gcc-4.8 drm: ingenic-drm: add MODULE_DEVICE_TABLE drm/virtio: create context before RESOURCE_CREATE_2D in 3D mode drm/amd/display: work around fp code being emitted outside of DC_FP_START/END drm/amdgpu/dc: Use WARN_ON_ONCE for ASSERT drm/amdgpu: drop redundant cg/pg ungate on runpm enter drm/amdgpu: move kfd suspend after ip_suspend_phase1
2020-05-08Merge branch 'akpm' (patches from Andrew)Linus Torvalds14-78/+275
Merge misc fixes from Andrew Morton: "14 fixes and one selftest to verify the ipc fixes herein" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: limit boost_watermark on small zones ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST mm/vmscan: remove unnecessary argument description of isolate_lru_pages() epoll: atomically remove wait entry on wake up kselftests: introduce new epoll60 testcase for catching lost wakeups percpu: make pcpu_alloc() aware of current gfp context mm/slub: fix incorrect interpretation of s->offset scripts/gdb: repair rb_first() and rb_last() eventpoll: fix missing wakeup for ovflist in ep_poll_callback arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory() scripts/decodecode: fix trapping instruction formatting kernel/kcov.c: fix typos in kcov_remote_start documentation mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous() mm, memcg: fix error return value of mem_cgroup_css_alloc() ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()
2020-05-08iommu/virtio: Reverse arguments to list_addJulia Lawall1-1/+1
Elsewhere in the file, there is a list_for_each_entry with &vdev->resv_regions as the second argument, suggesting that &vdev->resv_regions is the list head. So exchange the arguments on the list_add call to put the list head in the second argument. Fixes: 2a5a31487445 ("iommu/virtio: Add probe request") Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/1588704467-13431-1-git-send-email-Julia.Lawall@inria.fr Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-05-08Merge tag 'drm-misc-fixes-2020-05-07' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixesDave Airlie6-4/+14
A few minor fixes for an ordering issue in virtio, an (old) gcc warning in sun4i, a probe issue in ingenic-drm and a regression in the HDCP support. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20200507160130.id64niqgf5wsha4u@gilmour.lan
2020-05-08Merge tag 'amd-drm-fixes-5.7-2020-05-06' of git://people.freedesktop.org/~agd5f/linux into drm-fixesDave Airlie6-27/+43
amd-drm-fixes-5.7-2020-05-06: amdgpu: - Runtime PM fixes - DC fix for PPC - Misc DC fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200506212257.3893-1-alexander.deucher@amd.com
2020-05-07Merge branch 'for-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-securityLinus Torvalds1-1/+1
Pull security subsystem fix from James Morris: "Fix the default value of fs_context_parse_param hook" * 'for-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: security: Fix the default value of fs_context_parse_param hook
2020-05-07mm: limit boost_watermark on small zonesHenry Willard1-0/+8
Commit 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs") adds a boost_watermark() function which increases the min watermark in a zone by at least pageblock_nr_pages or the number of pages in a page block. On Arm64, with 64K pages and 512M huge pages, this is 8192 pages or 512M. It does this regardless of the number of managed pages managed in the zone or the likelihood of success. This can put the zone immediately under water in terms of allocating pages from the zone, and can cause a small machine to fail immediately due to OoM. Unlike set_recommended_min_free_kbytes(), which substantially increases min_free_kbytes and is tied to THP, boost_watermark() can be called even if THP is not active. The problem is most likely to appear on architectures such as Arm64 where pageblock_nr_pages is very large. It is desirable to run the kdump capture kernel in as small a space as possible to avoid wasting memory. In some architectures, such as Arm64, there are restrictions on where the capture kernel can run, and therefore, the space available. A capture kernel running in 768M can fail due to OoM immediately after boost_watermark() sets the min in zone DMA32, where most of the memory is, to 512M. It fails even though there is over 500M of free memory. With boost_watermark() suppressed, the capture kernel can run successfully in 448M. This patch limits boost_watermark() to boosting a zone's min watermark only when there are enough pages that the boost will produce positive results. In this case that is estimated to be four times as many pages as pageblock_nr_pages. Mel said: : There is no harm in marking it stable. Clearly it does not happen very : often but it's not impossible. 32-bit x86 is a lot less common now : which would previously have been vulnerable to triggering this easily. : ppc64 has a larger base page size but typically only has one zone. : arm64 is likely the most vulnerable, particularly when CMA is : configured with a small movable zone. Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs") Signed-off-by: Henry Willard <henry.willard@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1588294148-6586-1-git-send-email-henry.willard@oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07ubsan: disable UBSAN_ALIGNMENT under COMPILE_TESTKees Cook2-10/+6
The documentation for UBSAN_ALIGNMENT already mentions that it should not be used on all*config builds (and for efficient-unaligned-access architectures), so just refactor the Kconfig to correctly implement this so randconfigs will stop creating insane images that freak out objtool under CONFIG_UBSAN_TRAP (due to the false positives producing functions that never return, etc). Link: http://lkml.kernel.org/r/202005011433.C42EA3E2D@keescook Fixes: 0887a7ebc977 ("ubsan: add trap instrumentation option") Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/linux-next/202004231224.D6B3B650@keescook/ Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07mm/vmscan: remove unnecessary argument description of isolate_lru_pages()Qiwu Chen1-1/+0
Since commit a9e7c39fa9fd9 ("mm/vmscan.c: remove 7th argument of isolate_lru_pages()"), the explanation of 'mode' argument has been unnecessary. Let's remove it. Signed-off-by: Qiwu Chen <chenqiwu@xiaomi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200501090346.2894-1-chenqiwu@xiaomi.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07epoll: atomically remove wait entry on wake upRoman Penyaev1-19/+24
This patch does two things: - fixes a lost wakeup introduced by commit 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll") - improves performance for events delivery. The description of the problem is the following: if N (>1) threads are waiting on ep->wq for new events and M (>1) events come, it is quite likely that >1 wakeups hit the same wait queue entry, because there is quite a big window between __add_wait_queue_exclusive() and the following __remove_wait_queue() calls in ep_poll() function. This can lead to lost wakeups, because thread, which was woken up, can handle not all the events in ->rdllist. (in better words the problem is described here: https://lkml.org/lkml/2019/10/7/905) The idea of the current patch is to use init_wait() instead of init_waitqueue_entry(). Internally init_wait() sets autoremove_wake_function as a callback, which removes the wait entry atomically (under the wq locks) from the list, thus the next coming wakeup hits the next wait entry in the wait queue, thus preventing lost wakeups. Problem is very well reproduced by the epoll60 test case [1]. Wait entry removal on wakeup has also performance benefits, because there is no need to take a ep->lock and remove wait entry from the queue after the successful wakeup. Here is the timing output of the epoll60 test case: With explicit wakeup from ep_scan_ready_list() (the state of the code prior 339ddb53d373): real 0m6.970s user 0m49.786s sys 0m0.113s After this patch: real 0m5.220s user 0m36.879s sys 0m0.019s The other testcase is the stress-epoll [2], where one thread consumes all the events and other threads produce many events: With explicit wakeup from ep_scan_ready_list() (the state of the code prior 339ddb53d373): threads events/ms run-time ms 8 5427 1474 16 6163 2596 32 6824 4689 64 7060 9064 128 6991 18309 After this patch: threads events/ms run-time ms 8 5598 1429 16 7073 2262 32 7502 4265 64 7640 8376 128 7634 16767 (number of "events/ms" represents event bandwidth, thus higher is better; number of "run-time ms" represents overall time spent doing the benchmark, thus lower is better) [1] tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c [2] https://github.com/rouming/test-tools/blob/master/stress-epoll.c Signed-off-by: Roman Penyaev <rpenyaev@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jason Baron <jbaron@akamai.com> Cc: Khazhismel Kumykov <khazhy@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Heiher <r@hev.cc> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200430130326.1368509-2-rpenyaev@suse.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07kselftests: introduce new epoll60 testcase for catching lost wakeupsRoman Penyaev1-0/+146
This test case catches lost wake up introduced by commit 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll") The test is simple: we have 10 threads and 10 event fds. Each thread can harvest only 1 event. 1 producer fires all 10 events at once and waits that all 10 events will be observed by 10 threads. In case of lost wakeup epoll_wait() will timeout and 0 will be returned. Test case catches two sort of problems: forgotten wakeup on event, which hits the ->ovflist list, this problem was fixed by: 5a2513239750 ("eventpoll: fix missing wakeup for ovflist in ep_poll_callback") the other problem is when several sequential events hit the same waiting thread, thus other waiters get no wakeups. Problem is fixed in the following patch. Signed-off-by: Roman Penyaev <rpenyaev@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Khazhismel Kumykov <khazhy@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Heiher <r@hev.cc> Cc: Jason Baron <jbaron@akamai.com> Link: http://lkml.kernel.org/r/20200430130326.1368509-1-rpenyaev@suse.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07percpu: make pcpu_alloc() aware of current gfp contextFilipe Manana1-4/+10
Since 5.7-rc1, on btrfs we have a percpu counter initialization for which we always pass a GFP_KERNEL gfp_t argument (this happens since commit 2992df73268f78 ("btrfs: Implement DREW lock")). That is safe in some contextes but not on others where allowing fs reclaim could lead to a deadlock because we are either holding some btrfs lock needed for a transaction commit or holding a btrfs transaction handle open. Because of that we surround the call to the function that initializes the percpu counter with a NOFS context using memalloc_nofs_save() (this is done at btrfs_init_fs_root()). However it turns out that this is not enough to prevent a possible deadlock because percpu_alloc() determines if it is in an atomic context by looking exclusively at the gfp flags passed to it (GFP_KERNEL in this case) and it is not aware that a NOFS context is set. Because percpu_alloc() thinks it is in a non atomic context it locks the pcpu_alloc_mutex. This can result in a btrfs deadlock when pcpu_balance_workfn() is running, has acquired that mutex and is waiting for reclaim, while the btrfs task that called percpu_counter_init() (and therefore percpu_alloc()) is holding either the btrfs commit_root semaphore or a transaction handle (done fs/btrfs/backref.c: iterate_extent_inodes()), which prevents reclaim from finishing as an attempt to commit the current btrfs transaction will deadlock. Lockdep reports this issue with the following trace: ====================================================== WARNING: possible circular locking dependency detected 5.6.0-rc7-btrfs-next-77 #1 Not tainted ------------------------------------------------------ kswapd0/91 is trying to acquire lock: ffff8938a3b3fdc8 (&delayed_node->mutex){+.+.}, at: __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs] but task is already holding lock: ffffffffb4f0dbc0 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #4 (fs_reclaim){+.+.}: fs_reclaim_acquire.part.0+0x25/0x30 __kmalloc+0x5f/0x3a0 pcpu_create_chunk+0x19/0x230 pcpu_balance_workfn+0x56a/0x680 process_one_work+0x235/0x5f0 worker_thread+0x50/0x3b0 kthread+0x120/0x140 ret_from_fork+0x3a/0x50 -> #3 (pcpu_alloc_mutex){+.+.}: __mutex_lock+0xa9/0xaf0 pcpu_alloc+0x480/0x7c0 __percpu_counter_init+0x50/0xd0 btrfs_drew_lock_init+0x22/0x70 [btrfs] btrfs_get_fs_root+0x29c/0x5c0 [btrfs] resolve_indirect_refs+0x120/0xa30 [btrfs] find_parent_nodes+0x50b/0xf30 [btrfs] btrfs_find_all_leafs+0x60/0xb0 [btrfs] iterate_extent_inodes+0x139/0x2f0 [btrfs] iterate_inodes_from_logical+0xa1/0xe0 [btrfs] btrfs_ioctl_logical_to_ino+0xb4/0x190 [btrfs] btrfs_ioctl+0x165a/0x3130 [btrfs] ksys_ioctl+0x87/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x5c/0x260 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #2 (&fs_info->commit_root_sem){++++}: down_write+0x38/0x70 btrfs_cache_block_group+0x2ec/0x500 [btrfs] find_free_extent+0xc6a/0x1600 [btrfs] btrfs_reserve_extent+0x9b/0x180 [btrfs] btrfs_alloc_tree_block+0xc1/0x350 [btrfs] alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs] __btrfs_cow_block+0x122/0x5a0 [btrfs] btrfs_cow_block+0x106/0x240 [btrfs] commit_cowonly_roots+0x55/0x310 [btrfs] btrfs_commit_transaction+0x509/0xb20 [btrfs] sync_filesystem+0x74/0x90 generic_shutdown_super+0x22/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x93/0xc0 exit_to_usermode_loop+0xf9/0x100 do_syscall_64+0x20d/0x260 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #1 (&space_info->groups_sem){++++}: down_read+0x3c/0x140 find_free_extent+0xef6/0x1600 [btrfs] btrfs_reserve_extent+0x9b/0x180 [btrfs] btrfs_alloc_tree_block+0xc1/0x350 [btrfs] alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs] __btrfs_cow_block+0x122/0x5a0 [btrfs] btrfs_cow_block+0x106/0x240 [btrfs] btrfs_search_slot+0x50c/0xd60 [btrfs] btrfs_lookup_inode+0x3a/0xc0 [btrfs] __btrfs_update_delayed_inode+0x90/0x280 [btrfs] __btrfs_commit_inode_delayed_items+0x81f/0x870 [btrfs] __btrfs_run_delayed_items+0x8e/0x180 [btrfs] btrfs_commit_transaction+0x31b/0xb20 [btrfs] iterate_supers+0x87/0xf0 ksys_sync+0x60/0xb0 __ia32_sys_sync+0xa/0x10 do_syscall_64+0x5c/0x260 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #0 (&delayed_node->mutex){+.+.}: __lock_acquire+0xef0/0x1c80 lock_acquire+0xa2/0x1d0 __mutex_lock+0xa9/0xaf0 __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs] btrfs_evict_inode+0x40d/0x560 [btrfs] evict+0xd9/0x1c0 dispose_list+0x48/0x70 prune_icache_sb+0x54/0x80 super_cache_scan+0x124/0x1a0 do_shrink_slab+0x176/0x440 shrink_slab+0x23a/0x2c0 shrink_node+0x188/0x6e0 balance_pgdat+0x31d/0x7f0 kswapd+0x238/0x550 kthread+0x120/0x140 ret_from_fork+0x3a/0x50 other info that might help us debug this: Chain exists of: &delayed_node->mutex --> pcpu_alloc_mutex --> fs_reclaim Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); lock(pcpu_alloc_mutex); lock(fs_reclaim); lock(&delayed_node->mutex); *** DEADLOCK *** 3 locks held by kswapd0/91: #0: (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30 #1: (shrinker_rwsem){++++}, at: shrink_slab+0x12f/0x2c0 #2: (&type->s_umount_key#43){++++}, at: trylock_super+0x16/0x50 stack backtrace: CPU: 1 PID: 91 Comm: kswapd0 Not tainted 5.6.0-rc7-btrfs-next-77 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x8f/0xd0 check_noncircular+0x170/0x190 __lock_acquire+0xef0/0x1c80 lock_acquire+0xa2/0x1d0 __mutex_lock+0xa9/0xaf0 __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs] btrfs_evict_inode+0x40d/0x560 [btrfs] evict+0xd9/0x1c0 dispose_list+0x48/0x70 prune_icache_sb+0x54/0x80 super_cache_scan+0x124/0x1a0 do_shrink_slab+0x176/0x440 shrink_slab+0x23a/0x2c0 shrink_node+0x188/0x6e0 balance_pgdat+0x31d/0x7f0 kswapd+0x238/0x550 kthread+0x120/0x140 ret_from_fork+0x3a/0x50 This could be fixed by making btrfs pass GFP_NOFS instead of GFP_KERNEL to percpu_counter_init() in contextes where it is not reclaim safe, however that type of approach is discouraged since memalloc_[nofs|noio]_save() were introduced. Therefore this change makes pcpu_alloc() look up into an existing nofs/noio context before deciding whether it is in an atomic context or not. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Link: http://lkml.kernel.org/r/20200430164356.15543-1-fdmanana@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07mm/slub: fix incorrect interpretation of s->offsetWaiman Long1-15/+30
In a couple of places in the slub memory allocator, the code uses "s->offset" as a check to see if the free pointer is put right after the object. That check is no longer true with commit 3202fa62fb43 ("slub: relocate freelist pointer to middle of object"). As a result, echoing "1" into the validate sysfs file, e.g. of dentry, may cause a bunch of "Freepointer corrupt" error reports like the following to appear with the system in panic afterwards. ============================================================================= BUG dentry(666:pmcd.service) (Tainted: G B): Freepointer corrupt ----------------------------------------------------------------------------- To fix it, use the check "s->offset == s->inuse" in the new helper function freeptr_outside_object() instead. Also add another helper function get_info_end() to return the end of info block (inuse + free pointer if not overlapping with object). Fixes: 3202fa62fb43 ("slub: relocate freelist pointer to middle of object") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Rafael Aquini <aquini@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Vitaly Nikolenko <vnik@duasynt.com> Cc: Silvio Cesare <silvio.cesare@gmail.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Markus Elfring <Markus.Elfring@web.de> Cc: Changbin Du <changbin.du@gmail.com> Link: http://lkml.kernel.org/r/20200429135328.26976-1-longman@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07scripts/gdb: repair rb_first() and rb_last()Aymeric Agon-Rambosson1-2/+2
The current implementations of the rb_first() and rb_last() gdb functions have a variable that references itself in its instanciation, which causes the function to throw an error if a specific condition on the argument is met. The original author rather intended to reference the argument and made a typo. Referring the argument instead makes the function work as intended. Signed-off-by: Aymeric Agon-Rambosson <aymeric.agon@yandex.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Kieran Bingham <kbingham@kernel.org> Cc: Douglas Anderson <dianders@chromium.org> Cc: Nikolay Borisov <n.borisov.lkml@gmail.com> Cc: Jackie Liu <liuyun01@kylinos.cn> Cc: Jason Wessel <jason.wessel@windriver.com> Link: http://lkml.kernel.org/r/20200427051029.354840-1-aymeric.agon@yandex.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07eventpoll: fix missing wakeup for ovflist in ep_poll_callbackKhazhismel Kumykov1-9/+9
In the event that we add to ovflist, before commit 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll") we would be woken up by ep_scan_ready_list, and did no wakeup in ep_poll_callback. With that wakeup removed, if we add to ovflist here, we may never wake up. Rather than adding back the ep_scan_ready_list wakeup - which was resulting in unnecessary wakeups, trigger a wake-up in ep_poll_callback. We noticed that one of our workloads was missing wakeups starting with 339ddb53d373 and upon manual inspection, this wakeup seemed missing to me. With this patch added, we no longer see missing wakeups. I haven't yet tried to make a small reproducer, but the existing kselftests in filesystem/epoll passed for me with this patch. [khazhy@google.com: use if/elif instead of goto + cleanup suggested by Roman] Link: http://lkml.kernel.org/r/20200424190039.192373-1-khazhy@google.com Fixes: 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll") Signed-off-by: Khazhismel Kumykov <khazhy@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Roman Penyaev <rpenyaev@suse.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Roman Penyaev <rpenyaev@suse.de> Cc: Heiher <r@hev.cc> Cc: Jason Baron <jbaron@akamai.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200424025057.118641-1-khazhy@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory()Janakarajan Natarajan1-1/+1
When trying to lock read-only pages, sev_pin_memory() fails because FOLL_WRITE is used as the flag for get_user_pages_fast(). Commit 73b0140bf0fe ("mm/gup: change GUP fast to use flags rather than a write 'bool'") updated the get_user_pages_fast() call sites to use flags, but incorrectly updated the call in sev_pin_memory(). As the original coding of this call was correct, revert the change made by that commit. Fixes: 73b0140bf0fe ("mm/gup: change GUP fast to use flags rather than a write 'bool'") Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Mike Marshall <hubcap@omnibond.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Link: http://lkml.kernel.org/r/20200423152419.87202-1-Janakarajan.Natarajan@amd.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07scripts/decodecode: fix trapping instruction formattingIvan Delalande1-1/+1
If the trapping instruction contains a ':', for a memory access through segment registers for example, the sed substitution will insert the '*' marker in the middle of the instruction instead of the line address: 2b: 65 48 0f c7 0f cmpxchg16b %gs:*(%rdi) <-- trapping instruction I started to think I had forgotten some quirk of the assembly syntax before noticing that it was actually coming from the script. Fix it to add the address marker at the right place for these instructions: 28: 49 8b 06 mov (%r14),%rax 2b:* 65 48 0f c7 0f cmpxchg16b %gs:(%rdi) <-- trapping instruction 30: 0f 94 c0 sete %al Fixes: 18ff44b189e2 ("scripts/decodecode: make faulting insn ptr more robust") Signed-off-by: Ivan Delalande <colona@arista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20200419223653.GA31248@visor Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07kernel/kcov.c: fix typos in kcov_remote_start documentationMaciej Grochowski1-2/+2
Signed-off-by: Maciej Grochowski <maciej.grochowski@pm.me> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Link: http://lkml.kernel.org/r/20200420030259.31674-1-maciek.grochowski@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()David Hildenbrand1-0/+1
Without CONFIG_PREEMPT, it can happen that we get soft lockups detected, e.g., while booting up. watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-next-20200331+ #4 Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014 RIP: __pageblock_pfn_to_page+0x134/0x1c0 Call Trace: set_zone_contiguous+0x56/0x70 page_alloc_init_late+0x166/0x176 kernel_init_freeable+0xfa/0x255 kernel_init+0xa/0x106 ret_from_fork+0x35/0x40 The issue becomes visible when having a lot of memory (e.g., 4TB) assigned to a single NUMA node - a system that can easily be created using QEMU. Inside VMs on a hypervisor with quite some memory overcommit, this is fairly easy to trigger. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Reviewed-by: Baoquan He <bhe@redhat.com> Reviewed-by: Shile Zhang <shile.zhang@linux.alibaba.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Shile Zhang <shile.zhang@linux.alibaba.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Baoquan He <bhe@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200416073417.5003-1-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07mm, memcg: fix error return value of mem_cgroup_css_alloc()Yafang Shao1-6/+9
When I run my memcg testcase which creates lots of memcgs, I found there're unexpected out of memory logs while there're still enough available free memory. The error log is mkdir: cannot create directory 'foo.65533': Cannot allocate memory The reason is when we try to create more than MEM_CGROUP_ID_MAX memcgs, an -ENOMEM errno will be set by mem_cgroup_css_alloc(), but the right errno should be -ENOSPC "No space left on device", which is an appropriate errno for userspace's failed mkdir. As the errno really misled me, we should make it right. After this patch, the error log will be mkdir: cannot create directory 'foo.65533': No space left on device [akpm@linux-foundation.org: s/EBUSY/ENOSPC/, per Michal] [akpm@linux-foundation.org: s/EBUSY/ENOSPC/, per Michal] Fixes: 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after many small jobs") Suggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Link: http://lkml.kernel.org/r/20200407063621.GA18914@dhcp22.suse.cz Link: http://lkml.kernel.org/r/1586192163-20099-1-git-send-email-laoar.shao@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>