aboutsummaryrefslogtreecommitdiffstats
path: root/include/linux/page_idle.h (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2019-08-19lockdown: Lock down tracing and perf kprobes when in confidentiality modeDavid Howells3-0/+7
Disallow the creation of perf and ftrace kprobes when the kernel is locked down in confidentiality mode by preventing their registration. This prevents kprobes from being used to access kernel memory to steal crypto data, but continues to allow the use of kprobes from signed modules. Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Garrett <mjg59@google.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: davem@davemloft.net Cc: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19lockdown: Lock down /proc/kcoreDavid Howells3-0/+7
Disallow access to /proc/kcore when the kernel is locked down to prevent access to cryptographic data. This is limited to lockdown confidentiality mode and is still permitted in integrity mode. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Garrett <mjg59@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19x86/mmiotrace: Lock down the testmmiotrace moduleDavid Howells3-0/+7
The testmmiotrace module shouldn't be permitted when the kernel is locked down as it can be used to arbitrarily read and write MMIO space. This is a runtime check rather than buildtime in order to allow configurations where the same kernel may be run in both locked down or permissive modes depending on local policy. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David Howells <dhowells@redhat.com Signed-off-by: Matthew Garrett <mjg59@google.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Kees Cook <keescook@chromium.org> cc: Thomas Gleixner <tglx@linutronix.de> cc: Steven Rostedt <rostedt@goodmis.org> cc: Ingo Molnar <mingo@kernel.org> cc: "H. Peter Anvin" <hpa@zytor.com> cc: x86@kernel.org Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19lockdown: Lock down module params that specify hardware parameters (eg. ioport)David Howells3-5/+18
Provided an annotation for module parameters that specify hardware parameters (such as io ports, iomem addresses, irqs, dma channels, fixed dma buffers and other types). Suggested-by: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Garrett <mjg59@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Jessica Yu <jeyu@kernel.org> Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19lockdown: Lock down TIOCSSERIALDavid Howells3-0/+7
Lock down TIOCSSERIAL as that can be used to change the ioport and irq settings on a serial port. This only appears to be an issue for the serial drivers that use the core serial code. All other drivers seem to either ignore attempts to change port/irq or give an error. Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Garrett <mjg59@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> cc: Jiri Slaby <jslaby@suse.com> Cc: linux-serial@vger.kernel.org Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19lockdown: Prohibit PCMCIA CIS storage when the kernel is locked downDavid Howells3-0/+7
Prohibit replacement of the PCMCIA Card Information Structure when the kernel is locked down. Suggested-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Garrett <mjg59@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19acpi: Disable ACPI table override if the kernel is locked downLinn Crosetto1-0/+6
>From the kernel documentation (initrd_table_override.txt): If the ACPI_INITRD_TABLE_OVERRIDE compile option is true, it is possible to override nearly any ACPI table provided by the BIOS with an instrumented, modified one. When lockdown is enabled, the kernel should disallow any unauthenticated changes to kernel space. ACPI tables contain code invoked by the kernel, so do not allow ACPI tables to be overridden if the kernel is locked down. Signed-off-by: Linn Crosetto <lcrosetto@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Garrett <mjg59@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> cc: linux-acpi@vger.kernel.org Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19acpi: Ignore acpi_rsdp kernel param when the kernel has been locked downJosh Boyer7-7/+49
This option allows userspace to pass the RSDP address to the kernel, which makes it possible for a user to modify the workings of hardware. Reject the option when the kernel is locked down. This requires some reworking of the existing RSDP command line logic, since the early boot code also makes use of a command-line passed RSDP when locating the SRAT table before the lockdown code has been initialised. This is achieved by separating the command line RSDP path in the early boot code from the generic RSDP path, and then copying the command line RSDP into boot params in the kernel proper if lockdown is not enabled. If lockdown is enabled and an RSDP is provided on the command line, this will only be used when parsing SRAT (which shouldn't permit kernel code execution) and will be ignored in the rest of the kernel. (Modified by Matthew Garrett in order to handle the early boot RSDP environment) Signed-off-by: Josh Boyer <jwboyer@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Garrett <mjg59@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> cc: Dave Young <dyoung@redhat.com> cc: linux-acpi@vger.kernel.org Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19ACPI: Limit access to custom_method when the kernel is locked downMatthew Garrett3-0/+8
custom_method effectively allows arbitrary access to system memory, making it possible for an attacker to circumvent restrictions on module loading. Disable it if the kernel is locked down. Signed-off-by: Matthew Garrett <mjg59@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> cc: linux-acpi@vger.kernel.org Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19x86/msr: Restrict MSR access when the kernel is locked downMatthew Garrett3-0/+10
Writing to MSRs should not be allowed if the kernel is locked down, since it could lead to execution of arbitrary code in kernel mode. Based on a patch by Kees Cook. Signed-off-by: Matthew Garrett <mjg59@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> cc: x86@kernel.org Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19x86: Lock down IO port access when the kernel is locked downMatthew Garrett3-2/+7
IO port access would permit users to gain access to PCI configuration registers, which in turn (on a lot of hardware) give access to MMIO register space. This would potentially permit root to trigger arbitrary DMA, so lock it down by default. This also implicitly locks down the KDADDIO, KDDELIO, KDENABIO and KDDISABIO console ioctls. Signed-off-by: Matthew Garrett <mjg59@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> cc: x86@kernel.org Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19PCI: Lock down BAR access when the kernel is locked downMatthew Garrett5-3/+33
Any hardware that can potentially generate DMA has to be locked down in order to avoid it being possible for an attacker to modify kernel code, allowing them to circumvent disabled module loading or module signing. Default to paranoid - in future we can potentially relax this for sufficiently IOMMU-isolated devices. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Garrett <mjg59@google.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> cc: linux-pci@vger.kernel.org Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19hibernate: Disable when the kernel is locked downJosh Boyer3-1/+4
There is currently no way to verify the resume image when returning from hibernate. This might compromise the signed modules trust model, so until we can work with signed hibernate images we disable it when the kernel is locked down. Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Garrett <mjg59@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: rjw@rjwysocki.net Cc: pavel@ucw.cz cc: linux-pm@vger.kernel.org Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19kexec_file: Restrict at runtime if the kernel is locked downJiri Bohac1-1/+1
When KEXEC_SIG is not enabled, kernel should not load images through kexec_file systemcall if the kernel is locked down. [Modified by David Howells to fit with modifications to the previous patch and to return -EPERM if the kernel is locked down for consistency with other lockdowns. Modified by Matthew Garrett to remove the IMA integration, which will be replaced by integrating with the IMA architecture policy patches.] Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Garrett <mjg59@google.com> cc: kexec@lists.infradead.org Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCEJiri Bohac15-34/+88
This is a preparatory patch for kexec_file_load() lockdown. A locked down kernel needs to prevent unsigned kernel images from being loaded with kexec_file_load(). Currently, the only way to force the signature verification is compiling with KEXEC_VERIFY_SIG. This prevents loading usigned images even when the kernel is not locked down at runtime. This patch splits KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCE. Analogous to the MODULE_SIG and MODULE_SIG_FORCE for modules, KEXEC_SIG turns on the signature verification but allows unsigned images to be loaded. KEXEC_SIG_FORCE disallows images without a valid signature. Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Garrett <mjg59@google.com> cc: kexec@lists.infradead.org Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19lockdown: Copy secure_boot flag in boot params across kexec rebootDave Young1-0/+1
Kexec reboot in case secure boot being enabled does not keep the secure boot mode in new kernel, so later one can load unsigned kernel via legacy kexec_load. In this state, the system is missing the protections provided by secure boot. Adding a patch to fix this by retain the secure_boot flag in original kernel. secure_boot flag in boot_params is set in EFI stub, but kexec bypasses the stub. Fixing this issue by copying secure_boot flag across kexec reboot. Signed-off-by: Dave Young <dyoung@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Garrett <mjg59@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> cc: kexec@lists.infradead.org Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19kexec_load: Disable at runtime if the kernel is locked downMatthew Garrett3-0/+10
The kexec_load() syscall permits the loading and execution of arbitrary code in ring 0, which is something that lock-down is meant to prevent. It makes sense to disable kexec_load() in this situation. This does not affect kexec_file_load() syscall which can check for a signature on the image to be booted. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Garrett <mjg59@google.com> Acked-by: Dave Young <dyoung@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> cc: kexec@lists.infradead.org Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19lockdown: Restrict /dev/{mem,kmem,port} when the kernel is locked downMatthew Garrett3-2/+7
Allowing users to read and write to core kernel memory makes it possible for the kernel to be subverted, avoiding module loading restrictions, and also to steal cryptographic information. Disallow /dev/mem and /dev/kmem from being opened this when the kernel has been locked down to prevent this. Also disallow /dev/port from being opened to prevent raw ioport access and thus DMA from being used to accomplish the same thing. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Garrett <mjg59@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: x86@kernel.org Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19lockdown: Enforce module signatures if the kernel is locked downDavid Howells5-7/+38
If the kernel is locked down, require that all modules have valid signatures that we can verify. I have adjusted the errors generated: (1) If there's no signature (ENODATA) or we can't check it (ENOPKG, ENOKEY), then: (a) If signatures are enforced then EKEYREJECTED is returned. (b) If there's no signature or we can't check it, but the kernel is locked down then EPERM is returned (this is then consistent with other lockdown cases). (2) If the signature is unparseable (EBADMSG, EINVAL), the signature fails the check (EKEYREJECTED) or a system error occurs (eg. ENOMEM), we return the error we got. Note that the X.509 code doesn't check for key expiry as the RTC might not be valid or might not have been transferred to the kernel's clock yet. [Modified by Matthew Garrett to remove the IMA integration. This will be replaced with integration with the IMA architecture policy patchset.] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Garrett <matthewgarrett@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Jessica Yu <jeyu@kernel.org> Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19security: Add a static lockdown policy LSMMatthew Garrett7-5/+236
While existing LSMs can be extended to handle lockdown policy, distributions generally want to be able to apply a straightforward static policy. This patch adds a simple LSM that can be configured to reject either integrity or all lockdown queries, and can be configured at runtime (through securityfs), boot time (via a kernel parameter) or build time (via a kconfig option). Based on initial code by David Howells. Signed-off-by: Matthew Garrett <mjg59@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19security: Add a "locked down" LSM hookMatthew Garrett3-0/+45
Add a mechanism to allow LSMs to make a policy decision around whether kernel functionality that would allow tampering with or examining the runtime state of the kernel should be permitted. Signed-off-by: Matthew Garrett <mjg59@google.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19security: Support early LSMsMatthew Garrett5-9/+62
The lockdown module is intended to allow for kernels to be locked down early in boot - sufficiently early that we don't have the ability to kmalloc() yet. Add support for early initialisation of some LSMs, and then add them to the list of names when we do full initialisation later. Early LSMs are initialised in link order and cannot be overridden via boot parameters, and cannot make use of kmalloc() (since the allocator isn't initialised yet). (Fixed by Stephen Rothwell to include a stub to fix builds when !CONFIG_SECURITY) Signed-off-by: Matthew Garrett <mjg59@google.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: James Morris <jmorris@namei.org>
2019-07-07Linux 5.2Linus Torvalds1-1/+1
2019-07-06blk-mq: fix up placement of debugfs directory of queue filesGreg Kroah-Hartman1-0/+7
When the blk-mq debugfs file creation logic was "cleaned up" it was cleaned up too much, causing the queue file to not be created in the correct location. Turns out the check for the directory being present is needed as if that has not happened yet, the files should not be created, and the function will be called later on in the initialization code so that the files can be created in the correct location. Fixes: 6cfc0081b046 ("blk-mq: no need to check return value of debugfs_create functions") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: linux-block@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-05Revert "mm: page cache: store only head pages in i_pages"Linus Torvalds8-82/+94
This reverts commit 5fd4ca2d84b249f0858ce28cf637cf25b61a398f. Mikhail Gavrilov reports that it causes the VM_BUG_ON_PAGE() in __delete_from_swap_cache() to trigger: page:ffffd6d34dff0000 refcount:1 mapcount:1 mapping:ffff97812323a689 index:0xfecec363 anon flags: 0x17fffe00080034(uptodate|lru|active|swapbacked) raw: 0017fffe00080034 ffffd6d34c67c508 ffffd6d3504b8d48 ffff97812323a689 raw: 00000000fecec363 0000000000000000 0000000100000000 ffff978433ace000 page dumped because: VM_BUG_ON_PAGE(entry != page) page->mem_cgroup:ffff978433ace000 ------------[ cut here ]------------ kernel BUG at mm/swap_state.c:170! invalid opcode: 0000 [#1] SMP NOPTI CPU: 1 PID: 221 Comm: kswapd0 Not tainted 5.2.0-0.rc2.git0.1.fc31.x86_64 #1 Hardware name: System manufacturer System Product Name/ROG STRIX X470-I GAMING, BIOS 2202 04/11/2019 RIP: 0010:__delete_from_swap_cache+0x20d/0x240 Code: 30 65 48 33 04 25 28 00 00 00 75 4a 48 83 c4 38 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c6 2f dc 0f 8a 48 89 c7 e8 93 1b fd ff <0f> 0b 48 c7 c6 a8 74 0f 8a e8 85 1b fd ff 0f 0b 48 c7 c6 a8 7d 0f RSP: 0018:ffffa982036e7980 EFLAGS: 00010046 RAX: 0000000000000021 RBX: 0000000000000040 RCX: 0000000000000006 RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff97843d657900 RBP: 0000000000000001 R08: ffffa982036e7835 R09: 0000000000000535 R10: ffff97845e21a46c R11: ffffa982036e7835 R12: ffff978426387120 R13: 0000000000000000 R14: ffffd6d34dff0040 R15: ffffd6d34dff0000 FS: 0000000000000000(0000) GS:ffff97843d640000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00002cba88ef5000 CR3: 000000078a97c000 CR4: 00000000003406e0 Call Trace: delete_from_swap_cache+0x46/0xa0 try_to_free_swap+0xbc/0x110 swap_writepage+0x13/0x70 pageout.isra.0+0x13c/0x350 shrink_page_list+0xc14/0xdf0 shrink_inactive_list+0x1e5/0x3c0 shrink_node_memcg+0x202/0x760 shrink_node+0xe0/0x470 balance_pgdat+0x2d1/0x510 kswapd+0x220/0x420 kthread+0xfb/0x130 ret_from_fork+0x22/0x40 and it's not immediately obvious why it happens. It's too late in the rc cycle to do anything but revert for now. Link: https://lore.kernel.org/lkml/CABXGCsN9mYmBD-4GaaeW_NrDu+FDXLzr_6x+XNxfmFV6QkYCDg@mail.gmail.com/ Reported-and-bisected-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Suggested-by: Jan Kara <jack@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Kirill Shutemov <kirill@shutemov.name> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-05mtd: rawnand: sunxi: Add A23/A33 DMA support with extra MBUS configurationMiquel Raynal1-0/+24
Allwinner NAND controllers can make use of DMA to enhance the I/O throughput thanks to ECC pipelining. DMA handling with A23/A33 NAND IP is a bit different than with the older SoCs, hence the introduction of a new compatible to handle: * the differences between register offsets, * the burst length change from 4 to minimum 8, * manage SRAM accesses through MBUS with extra configuration. Fixes: c49836f05aa1 ("mtd: rawnand: sunxi: Add A23/A33 DMA support") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2019-07-05Revert "mtd: rawnand: sunxi: Add A23/A33 DMA support"Miquel Raynal1-36/+2
This reverts commit c49836f05aa15282f7280e06ede3f6f8a6324833. The commit is wrong and its approach actually does not work. Let's revert it in order to add the feature with a clean patch. Fixes: c49836f05aa1 ("mtd: rawnand: sunxi: Add A23/A33 DMA support") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2019-07-05i2c: tegra: Add Dmitry as a reviewerDmitry Osipenko1-0/+1
I'm contributing to Tegra's upstream development in general and happened to review the Tegra's I2C patches for awhile because I'm actively using upstream kernel on all of my Tegra-powered devices and initially some of the submitted patches were getting my attention since they were causing problems. Recently Wolfram Sang asked whether I'm interested in becoming a reviewer for the driver and I don't mind at all. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> [wsa: ack was expressed by Thierry Reding in a mail thread] Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2019-07-05fs: VALIDATE_FS_PARSER should default to nGeert Uytterhoeven1-1/+0
CONFIG_VALIDATE_FS_PARSER is a debugging tool to check that the parser tables are vaguely sane. It was set to default to 'Y' for the moment to catch errors in upcoming fs conversion development. Make sure it is not enabled by default in the final release of v5.1. Fixes: 31d921c7fb969172 ("vfs: Add configuration parser helpers") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-05KVM: arm64/sve: Fix vq_present() macro to yield a boolZhang Lei1-1/+1
The original implementation of vq_present() relied on aggressive inlining in order for the compiler to know that the code is correct, due to some const-casting issues. This was causing sparse and clang to complain, while GCC compiled cleanly. Commit 0c529ff789bc addressed this problem, but since vq_present() is no longer a function, there is now no implicit casting of the returned value to the return type (bool). In set_sve_vls(), this uncast bit value is compared against a bool, and so may spuriously compare as unequal when both are nonzero. As a result, KVM may reject valid SVE vector length configurations as invalid, and vice versa. Fix it by forcing the returned value to a bool. Signed-off-by: Zhang Lei <zhang.lei@jp.fujitsu.com> Fixes: 0c529ff789bc ("KVM: arm64: Implement vq_present() as a macro") Signed-off-by: Dave Martin <Dave.Martin@arm.com> [commit message rewrite] Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-05dmaengine: qcom: bam_dma: Fix completed descriptors countSricharan R1-0/+3
One space is left unused in circular FIFO to differentiate 'full' and 'empty' cases. So take that in to account while counting for the descriptors completed. Fixes the issue reported here, https://lkml.org/lkml/2019/6/18/669 Cc: stable@vger.kernel.org Reported-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Sricharan R <sricharan@codeaurora.org> Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-07-05dmaengine: imx-sdma: remove BD_INTR for channel0Robin Gong1-2/+2
It is possible for an irq triggered by channel0 to be received later after clks are disabled once firmware loaded during sdma probe. If that happens then clearing them by writing to SDMA_H_INTR won't work and the kernel will hang processing infinite interrupts. Actually, don't need interrupt triggered on channel0 since it's pollling SDMA_H_STATSTOP to know channel0 done rather than interrupt in current code, just clear BD_INTR to disable channel0 interrupt to avoid the above case. This issue was brought by commit 1d069bfa3c78 ("dmaengine: imx-sdma: ack channel 0 IRQ in the interrupt handler") which didn't take care the above case. Fixes: 1d069bfa3c78 ("dmaengine: imx-sdma: ack channel 0 IRQ in the interrupt handler") Cc: stable@vger.kernel.org #5.0+ Signed-off-by: Robin Gong <yibin.gong@nxp.com> Reported-by: Sven Van Asbroeck <thesven73@gmail.com> Tested-by: Sven Van Asbroeck <thesven73@gmail.com> Reviewed-by: Michael Olbrich <m.olbrich@pengutronix.de> Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-07-05dmaengine: imx-sdma: fix use-after-free on probe error pathSven Van Asbroeck1-21/+27
If probe() fails anywhere beyond the point where sdma_get_firmware() is called, then a kernel oops may occur. Problematic sequence of events: 1. probe() calls sdma_get_firmware(), which schedules the firmware callback to run when firmware becomes available, using the sdma instance structure as the context 2. probe() encounters an error, which deallocates the sdma instance structure 3. firmware becomes available, firmware callback is called with deallocated sdma instance structure 4. use after free - kernel oops ! Solution: only attempt to load firmware when we're certain that probe() will succeed. This guarantees that the firmware callback's context will remain valid. Note that the remove() path is unaffected by this issue: the firmware loader will increment the driver module's use count, ensuring that the module cannot be unloaded while the firmware callback is pending or running. Signed-off-by: Sven Van Asbroeck <TheSven73@gmail.com> Reviewed-by: Robin Gong <yibin.gong@nxp.com> [vkoul: fixed braces for if condition] Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-07-05dmaengine: jz4780: Fix an endian bug in IRQ handlerDan Carpenter1-2/+3
The "pending" variable was a u32 but we cast it to an unsigned long pointer when we do the for_each_set_bit() loop. The problem is that on big endian 64bit systems that results in an out of bounds read. Fixes: 4e4106f5e942 ("dmaengine: jz4780: Fix transfers being ACKed too soon") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-07-05swap_readpage(): avoid blk_wake_io_task() if !synchronousOleg Nesterov1-5/+8
swap_readpage() sets waiter = bio->bi_private even if synchronous = F, this means that the caller can get the spurious wakeup after return. This can be fatal if blk_wake_io_task() does set_current_state(TASK_RUNNING) after the caller does set_special_state(), in the worst case the kernel can crash in do_task_dead(). Link: http://lkml.kernel.org/r/20190704160301.GA5956@redhat.com Fixes: 0619317ff8baa2d ("block: add polled wakeup task helper") Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Qian Cai <cai@lca.pw> Acked-by: Hugh Dickins <hughd@google.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-05devres: allow const resource argumentsArnd Bergmann2-2/+4
devm_ioremap_resource() does not currently take 'const' arguments, which results in a warning from the first driver trying to do it anyway: drivers/gpio/gpio-amd-fch.c: In function 'amd_fch_gpio_probe': drivers/gpio/gpio-amd-fch.c:171:49: error: passing argument 2 of 'devm_ioremap_resource' discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers] priv->base = devm_ioremap_resource(&pdev->dev, &amd_fch_gpio_iores); ^~~~~~~~~~~~~~~~~~~ Change the prototype to allow it, as there is no real reason not to. Link: http://lkml.kernel.org/r/20190628150049.1108048-1-arnd@arndb.de Fixes: 9bb2e0452508 ("gpio: amd: Make resource struct const") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-05mm/vmscan.c: prevent useless kswapd loopsShakeel Butt1-12/+15
In production we have noticed hard lockups on large machines running large jobs due to kswaps hoarding lru lock within isolate_lru_pages when sc->reclaim_idx is 0 which is a small zone. The lru was couple hundred GiBs and the condition (page_zonenum(page) > sc->reclaim_idx) in isolate_lru_pages() was basically skipping GiBs of pages while holding the LRU spinlock with interrupt disabled. On further inspection, it seems like there are two issues: (1) If kswapd on the return from balance_pgdat() could not sleep (i.e. node is still unbalanced), the classzone_idx is unintentionally set to 0 and the whole reclaim cycle of kswapd will try to reclaim only the lowest and smallest zone while traversing the whole memory. (2) Fundamentally isolate_lru_pages() is really bad when the allocation has woken kswapd for a smaller zone on a very large machine running very large jobs. It can hoard the LRU spinlock while skipping over 100s of GiBs of pages. This patch only fixes (1). (2) needs a more fundamental solution. To fix (1), in the kswapd context, if pgdat->kswapd_classzone_idx is invalid use the classzone_idx of the previous kswapd loop otherwise use the one the waker has requested. Link: http://lkml.kernel.org/r/20190701201847.251028-1-shakeelb@google.com Fixes: e716f2eb24de ("mm, vmscan: prevent kswapd sleeping prematurely due to mismatched classzone_idx") Signed-off-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Hillf Danton <hdanton@sina.com> Cc: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-05fs/userfaultfd.c: disable irqs for fault_pending and event locksEric Biggers1-16/+26
When IOCB_CMD_POLL is used on a userfaultfd, aio_poll() disables IRQs and takes kioctx::ctx_lock, then userfaultfd_ctx::fd_wqh.lock. This may have to wait for userfaultfd_ctx::fd_wqh.lock to be released by userfaultfd_ctx_read(), which in turn can be waiting for userfaultfd_ctx::fault_pending_wqh.lock or userfaultfd_ctx::event_wqh.lock. But elsewhere the fault_pending_wqh and event_wqh locks are taken with IRQs enabled. Since the IRQ handler may take kioctx::ctx_lock, lockdep reports that a deadlock is possible. Fix it by always disabling IRQs when taking the fault_pending_wqh and event_wqh locks. Commit ae62c16e105a ("userfaultfd: disable irqs when taking the waitqueue lock") didn't fix this because it only accounted for the fd_wqh lock, not the other locks nested inside it. Link: http://lkml.kernel.org/r/20190627075004.21259-1-ebiggers@kernel.org Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL") Signed-off-by: Eric Biggers <ebiggers@google.com> Reported-by: syzbot+fab6de82892b6b9c6191@syzkaller.appspotmail.com Reported-by: syzbot+53c0b767f7ca0dc0c451@syzkaller.appspotmail.com Reported-by: syzbot+a3accb352f9c22041cfa@syzkaller.appspotmail.com Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> [4.19+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-05mm/page_alloc.c: fix regression with deferred struct page initJuergen Gross1-1/+2
Commit 0e56acae4b4d ("mm: initialize MAX_ORDER_NR_PAGES at a time instead of doing larger sections") is causing a regression on some systems when the kernel is booted as Xen dom0. The system will just hang in early boot. Reason is an endless loop in get_page_from_freelist() in case the first zone looked at has no free memory. deferred_grow_zone() is always returning true due to the following code snipplet: /* If the zone is empty somebody else may have cleared out the zone */ if (!deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn, first_deferred_pfn)) { pgdat->first_deferred_pfn = ULONG_MAX; pgdat_resize_unlock(pgdat, &flags); return true; } This in turn results in the loop as get_page_from_freelist() is assuming forward progress can be made by doing some more struct page initialization. Link: http://lkml.kernel.org/r/20190620160821.4210-1-jgross@suse.com Fixes: 0e56acae4b4d ("mm: initialize MAX_ORDER_NR_PAGES at a time instead of doing larger sections") Signed-off-by: Juergen Gross <jgross@suse.com> Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Acked-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-05ptrace: Fix ->ptracer_cred handling for PTRACE_TRACEMEJann Horn1-3/+1
Fix two issues: When called for PTRACE_TRACEME, ptrace_link() would obtain an RCU reference to the parent's objective credentials, then give that pointer to get_cred(). However, the object lifetime rules for things like struct cred do not permit unconditionally turning an RCU reference into a stable reference. PTRACE_TRACEME records the parent's credentials as if the parent was acting as the subject, but that's not the case. If a malicious unprivileged child uses PTRACE_TRACEME and the parent is privileged, and at a later point, the parent process becomes attacker-controlled (because it drops privileges and calls execve()), the attacker ends up with control over two processes with a privileged ptrace relationship, which can be abused to ptrace a suid binary and obtain root privileges. Fix both of these by always recording the credentials of the process that is requesting the creation of the ptrace relationship: current_cred() can't change under us, and current is the proper subject for access control. This change is theoretically userspace-visible, but I am not aware of any code that it will actually break. Fixes: 64b875f7ac8a ("ptrace: Capture the ptracer's creds not PT_PTRACE_CAP") Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-04drm/imx: only send event on crtc disable if kept disabledRobert Beckett1-1/+1
The event will be sent as part of the vblank enable during the modeset if the crtc is not being kept disabled. Fixes: 5f2f911578fb ("drm/imx: atomic phase 3 step 1: Use atomic configuration") Signed-off-by: Robert Beckett <bob.beckett@collabora.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2019-07-04drm/imx: notify drm core before sending event during crtc disableRobert Beckett1-2/+2
Notify drm core before sending pending events during crtc disable. This fixes the first event after disable having an old stale timestamp by having drm_crtc_vblank_off update the timestamp to now. This was seen while debugging weston log message: Warning: computed repaint delay is insane: -8212 msec This occurred due to: 1. driver starts up 2. fbcon comes along and restores fbdev, enabling vblank 3. vblank_disable_fn fires via timer disabling vblank, keeping vblank seq number and time set at current value (some time later) 4. weston starts and does a modeset 5. atomic commit disables crtc while it does the modeset 6. ipu_crtc_atomic_disable sends vblank with old seq number and time Fixes: a474478642d5 ("drm/imx: fix crtc vblank state regression") Signed-off-by: Robert Beckett <bob.beckett@collabora.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2019-07-03nfsd: Fix overflow causing non-working mounts on 1 TB machinesPaul Menzel1-1/+1
Since commit 10a68cdf10 (nfsd: fix performance-limiting session calculation) (Linux 5.1-rc1 and 4.19.31), shares from NFS servers with 1 TB of memory cannot be mounted anymore. The mount just hangs on the client. The gist of commit 10a68cdf10 is the change below. -avail = clamp_t(int, avail, slotsize, avail/3); +avail = clamp_t(int, avail, slotsize, total_avail/3); Here are the macros. #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <) #define clamp_t(type, val, lo, hi) min_t(type, max_t(type, val, lo), hi) `total_avail` is 8,434,659,328 on the 1 TB machine. `clamp_t()` casts the values to `int`, which for 32-bit integers can only hold values −2,147,483,648 (−2^31) through 2,147,483,647 (2^31 − 1). `avail` (in the function signature) is just 65536, so that no overflow was happening. Before the commit the assignment would result in 21845, and `num = 4`. When using `total_avail`, it is causing the assignment to be 18446744072226137429 (printed as %lu), and `num` is then 4164608182. My next guess is, that `nfsd_drc_mem_used` is then exceeded, and the server thinks there is no memory available any more for this client. Updating the arguments of `clamp_t()` and `min_t()` to `unsigned long` fixes the issue. Now, `avail = 65536` (before commit 10a68cdf10 `avail = 21845`), but `num = 4` remains the same. Fixes: c54f24e338ed (nfsd: fix performance-limiting session calculation) Cc: stable@vger.kernel.org Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03crypto: user - prevent operating on larval algorithmsEric Biggers1-0/+3
Michal Suchanek reported [1] that running the pcrypt_aead01 test from LTP [2] in a loop and holding Ctrl-C causes a NULL dereference of alg->cra_users.next in crypto_remove_spawns(), via crypto_del_alg(). The test repeatedly uses CRYPTO_MSG_NEWALG and CRYPTO_MSG_DELALG. The crash occurs when the instance that CRYPTO_MSG_DELALG is trying to unregister isn't a real registered algorithm, but rather is a "test larval", which is a special "algorithm" added to the algorithms list while the real algorithm is still being tested. Larvals don't have initialized cra_users, so that causes the crash. Normally pcrypt_aead01 doesn't trigger this because CRYPTO_MSG_NEWALG waits for the algorithm to be tested; however, CRYPTO_MSG_NEWALG returns early when interrupted. Everything else in the "crypto user configuration" API has this same bug too, i.e. it inappropriately allows operating on larval algorithms (though it doesn't look like the other cases can cause a crash). Fix this by making crypto_alg_match() exclude larval algorithms. [1] https://lkml.kernel.org/r/20190625071624.27039-1-msuchanek@suse.de [2] https://github.com/linux-test-project/ltp/blob/20190517/testcases/kernel/crypto/pcrypt_aead01.c Reported-by: Michal Suchanek <msuchanek@suse.de> Fixes: a38f7907b926 ("crypto: Add userspace configuration API") Cc: <stable@vger.kernel.org> # v3.2+ Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-03crypto: cryptd - Fix skcipher instance memory leakVincent Whitchurch1-0/+1
cryptd_skcipher_free() fails to free the struct skcipher_instance allocated in cryptd_create_skcipher(), leading to a memory leak. This is detected by kmemleak on bootup on ARM64 platforms: unreferenced object 0xffff80003377b180 (size 1024): comm "cryptomgr_probe", pid 822, jiffies 4294894830 (age 52.760s) backtrace: kmem_cache_alloc_trace+0x270/0x2d0 cryptd_create+0x990/0x124c cryptomgr_probe+0x5c/0x1e8 kthread+0x258/0x318 ret_from_fork+0x10/0x1c Fixes: 4e0958d19bd8 ("crypto: cryptd - Add support for skcipher") Cc: <stable@vger.kernel.org> Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-03lib/mpi: Fix karactx leak in mpi_powmHerbert Xu1-4/+2
Sometimes mpi_powm will leak karactx because a memory allocation failure causes a bail-out that skips the freeing of karactx. This patch moves the freeing of karactx to the end of the function like everything else so that it can't be skipped. Reported-by: syzbot+f7baccc38dcc1e094e77@syzkaller.appspotmail.com Fixes: cdec9cb5167a ("crypto: GnuPG based MPI lib - source files...") Cc: <stable@vger.kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-03Bluetooth: Fix faulty expression for minimum encryption key size checkMatias Karhumaa1-1/+1
Fix minimum encryption key size check so that HCI_MIN_ENC_KEY_SIZE is also allowed as stated in the comment. This bug caused connection problems with devices having maximum encryption key size of 7 octets (56-bit). Fixes: 693cd8ce3f88 ("Bluetooth: Fix regression with minimum encryption key size alignment") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203997 Signed-off-by: Matias Karhumaa <matias.karhumaa@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-02scsi: iscsi: set auth_protocol back to NULL if CHAP_A value is not supportedMaurizio Lombardi1-8/+8
If the CHAP_A value is not supported, the chap_server_open() function should free the auth_protocol pointer and set it to NULL, or we will leave a dangling pointer around. [ 66.010905] Unsupported CHAP_A value [ 66.011660] Security negotiation failed. [ 66.012443] iSCSI Login negotiation failed. [ 68.413924] general protection fault: 0000 [#1] SMP PTI [ 68.414962] CPU: 0 PID: 1562 Comm: targetcli Kdump: loaded Not tainted 4.18.0-80.el8.x86_64 #1 [ 68.416589] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 68.417677] RIP: 0010:__kmalloc_track_caller+0xc2/0x210 Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-02scsi: target/iblock: Fix overrun in WRITE SAME emulationRoman Bolshakov1-1/+1
WRITE SAME corrupts data on the block device behind iblock if the command is emulated. The emulation code issues (M - 1) * N times more bios than requested, where M is the number of 512 blocks per real block size and N is the NUMBER OF LOGICAL BLOCKS specified in WRITE SAME command. So, for a device with 4k blocks, 7 * N more LBAs gets written after the requested range. The issue happens because the number of 512 byte sectors to be written is decreased one by one while the real bios are typically from 1 to 8 512 byte sectors per bio. Fixes: c66ac9db8d4a ("[SCSI] target: Add LIO target core v4.0.0-rc6") Cc: <stable@vger.kernel.org> Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-02gpio/spi: Fix spi-gpio regression on active high CSLinus Walleij1-1/+8
I ran into an intriguing bug caused by commit ""spi: gpio: Don't request CS GPIO in DT use-case" affecting all SPI GPIO devices with an active high chip select line. The commit switches the CS gpio handling over to the GPIO core, which will parse and handle "cs-gpios" from the OF node without even calling down to the driver to get the job done. However the GPIO core handles the standard bindings in Documentation/devicetree/bindings/spi/spi-controller.yaml that specifies that active high CS needs to be specified using "spi-cs-high" in the DT node. The code in drivers/spi/spi-gpio.c never respected this and never tried to inspect subnodes to see if they contained "spi-cs-high" like the gpiolib OF quirks does. Instead the only way to get an active high CS was to tag it in the device tree using the flags cell such as cs-gpios = <&gpio 4 GPIO_ACTIVE_HIGH>; This alters the quirks to not inspect the subnodes of SPI masters on "spi-gpio" for the standard attribute "spi-cs-high", making old device trees work as expected. This semantic is a bit ambigous, but just allowing the flags on the GPIO descriptor to modify polarity is what the kernel at large mostly uses so let's encourage that. Fixes: 249e2632dcd0 ("spi: gpio: Don't request CS GPIO in DT use-case") Cc: Andrey Smirnov <andrew.smirnov@gmail.com> Cc: linux-gpio@vger.kernel.org Cc: linux-spi@vger.kernel.org Signed-off-by: Linus Walleij <linus.walleij@linaro.org>