aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2019-07-10cifs: fix parsing of symbolic link error responseRonnie Sahlberg2-4/+12
RHBZ: 1672539 In smb2_query_symlink(), if we are parsing the error buffer but it is not something we recognize as a symlink we should return -EINVAL and not -ENOENT. I.e. the entry does exist, it is just not something we recognize. Additionally, add check to verify that that the errortag and the reparsetag all make sense. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Acked-by: Paulo Alcantara <palcantara@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-07-07cifs: refactor and clean up arguments in the reparse point parsingRonnie Sahlberg1-35/+31
Will be helpful as we improve handling of special file types. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-07-07SMB3: query inode number on open via create contextSteve French2-5/+60
We can cut the number of roundtrips on open (may also help some rename cases as well) by returning the inode number in the SMB2 open request itself instead of querying it afterwards via a query FILE_INTERNAL_INFO. This should significantly improve the performance of posix open. Add SMB2_CREATE_QUERY_ON_DISK_ID create context request on open calls so that when server supports this we can save a roundtrip for QUERY_INFO on every open. Follow on patch will add the response processing for SMB2_CREATE_QUERY_ON_DISK_ID context and optimize smb2_open_file to avoid the extra network roundtrip on every posix open. This patch adds the context on SMB2/SMB3 open requests. Signed-off-by: Steve French <stfrench@microsoft.com>
2019-07-07smb3: Send netname context during negotiate protocolSteve French2-2/+29
See MS-SMB2 2.2.3.1.4 Allows hostname to be used by load balancers Signed-off-by: Steve French <stfrench@microsoft.com>
2019-07-07smb3: do not send compression info by defaultSteve French3-10/+21
Since in theory a server could respond with compressed read responses even if not requested on read request (assuming that a compression negcontext is sent in negotiate protocol) - do not send compression information during negotiate protocol unless the user asks for compression explicitly (compression is experimental), and add a mount warning that compression is experimental. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-07-07smb3: add new mount option to retrieve mode from special ACESteve French4-2/+12
There is a special ACE used by some servers to allow the mode bits to be stored. This can be especially helpful in scenarios in which the client is trusted, and access checking on the client vs the POSIX mode bits is sufficient. Add mount option to allow enabling this behavior. Follow on patch will add support for chmod and queryinfo (stat) by retrieving the POSIX mode bits from the special ACE, SID: S-1-5-88-3 See e.g. https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2008-R2-and-2008/hh509017(v=ws.10) Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-07-07smb3: Allow query of symlinks stored as reparse pointsSteve French1-6/+54
The 'NFS' style symlinks (see MS-FSCC 2.1.2.4) were not being queried properly in query_symlink. Fix this. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-07-07cifs: Fix a race condition with cifs_echo_requestRonnie Sahlberg1-4/+4
There is a race condition with how we send (or supress and don't send) smb echos that will cause the client to incorrectly think the server is unresponsive and thus needs to be reconnected. Summary of the race condition: 1) Daisy chaining scheduling creates a gap. 2) If traffic comes unfortunate shortly after the last echo, the planned echo is suppressed. 3) Due to the gap, the next echo transmission is delayed until after the timeout, which is set hard to twice the echo interval. This is fixed by changing the timeouts from 2 to three times the echo interval. Detailed description of the bug: https://lutz.donnerhacke.de/eng/Blog/Groundhog-Day-with-SMB-remount Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-07-07cifs: always add credits back for unsolicited PDUsRonnie Sahlberg1-1/+1
not just if CONFIG_CIFS_DEBUG2 is enabled. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-07-07fs: cifs: cifsssmb: Change return type of convert_ace_to_cifs_aceHariprasad Kelam1-11/+3
Change return from int to void of convert_ace_to_cifs_ace as it never fails. fixes below issue reported by coccicheck fs/cifs/cifssmb.c:3606:7-9: Unneeded variable: "rc". Return "0" on line 3620 Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-07-07add some missing definitionsSteve French1-0/+8
query on disk id structure definition was missing Signed-off-by: Steve French <stfrench@microsoft.com>
2019-07-07cifs: fix typo in debug message with struct field ia_validColin Ian King1-1/+1
Field ia_valid is being debugged with the field name iavalid, fix this. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-07-07smb3: minor cleanup of compound_send_recvAurelien Aptel1-22/+24
Trivial cleanup. Will make future multichannel code smaller as well. Signed-off-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-07-07CIFS: Fix module dependencySteve French1-2/+3
KEYS is required not that CONFIG_CIFS_ACL is always on and the ifdef for it removed. Signed-off-by: Steve French <stfrench@microsoft.com>
2019-07-07cifs: simplify code by removing CONFIG_CIFS_ACL ifdefSteve French10-46/+1
SMB3 ACL support is needed for many use cases now and should not be ifdeffed out, even for SMB1 (CIFS). Remove the CONFIG_CIFS_ACL ifdef so ACL support is always built into cifs.ko Signed-off-by: Steve French <stfrench@microsoft.com>
2019-07-07cifs: Fix check for matching with existing mountSteve French1-0/+1
If we mount the same share twice, we check the flags to see if the second mount matches the earlier mount, but we left some flags out. Signed-off-by: Steve French <stfrench@microsoft.com>
2019-07-07cifs: Properly handle auto disabling of serverino optionPaulo Alcantara (SUSE)3-2/+12
Fix mount options comparison when serverino option is turned off later in cifs_autodisable_serverino() and thus avoiding mismatch of new cifs mounts. Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (SUSE) <paulo@paulo.ac> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilove@microsoft.com>
2019-07-07smb3: if max_credits is specified then display it in /proc/mountsSteve French1-0/+5
If "max_credits" is overridden from its default by specifying it on the smb3 mount then display it in /proc/mounts Signed-off-by: Steve French <stfrench@microsoft.com>
2019-07-07Fix match_server check to allow for auto dialect negotiateSteve French3-2/+11
When using multidialect negotiate (default or specifying vers=3.0 which allows any smb3 dialect), fix how we check for an existing server session. Before this fix if you mounted a second time to the same server (e.g. a different share on the same server) we would only reuse the existing smb session if a single dialect were requested (e.g. specifying vers=2.1 or vers=3.0 or vers=3.1.1 on the mount command). If a default mount (e.g. not specifying vers=) is done then would always create a new socket connection and SMB3 (or SMB3.1.1) session each time we connect to a different share on the same server rather than reusing the existing one. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2019-07-07cifs: add missing GCM module dependencyAurelien Aptel2-0/+2
Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-07-07SMB3.1.1: Add GCM crypto to the encrypt and decrypt functionsSteve French2-7/+21
SMB3.1.1 GCM performs much better than the older CCM default: more than twice as fast in the write patch (copy to the Samba server on localhost for example) and 80% faster on the read patch (copy from the server). Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2019-07-07SMB3: Add SMB3.1.1 GCM to negotiated crypto algorigthmsSteve French3-8/+8
GCM is faster. Request it during negotiate protocol. Followon patch will add callouts to GCM crypto Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2019-07-07fs: cifs: Drop unlikely before IS_ERR(_OR_NULL)Kefeng Wang1-1/+1
IS_ERR(_OR_NULL) already contain an 'unlikely' compiler flag, so no need to do that again from its callers. Drop it. Cc: linux-cifs@vger.kernel.org Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Paulo Alcantara <palcantara@suse.de>
2019-07-07cifs: Use kmemdup in SMB2_ioctl_init()YueHaibing1-2/+1
Use kmemdup rather than duplicating its implementation This was reported by coccinelle. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-07-07Linux 5.2Linus Torvalds1-1/+1
2019-07-06blk-mq: fix up placement of debugfs directory of queue filesGreg Kroah-Hartman1-0/+7
When the blk-mq debugfs file creation logic was "cleaned up" it was cleaned up too much, causing the queue file to not be created in the correct location. Turns out the check for the directory being present is needed as if that has not happened yet, the files should not be created, and the function will be called later on in the initialization code so that the files can be created in the correct location. Fixes: 6cfc0081b046 ("blk-mq: no need to check return value of debugfs_create functions") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: linux-block@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-05Revert "mm: page cache: store only head pages in i_pages"Linus Torvalds8-82/+94
This reverts commit 5fd4ca2d84b249f0858ce28cf637cf25b61a398f. Mikhail Gavrilov reports that it causes the VM_BUG_ON_PAGE() in __delete_from_swap_cache() to trigger: page:ffffd6d34dff0000 refcount:1 mapcount:1 mapping:ffff97812323a689 index:0xfecec363 anon flags: 0x17fffe00080034(uptodate|lru|active|swapbacked) raw: 0017fffe00080034 ffffd6d34c67c508 ffffd6d3504b8d48 ffff97812323a689 raw: 00000000fecec363 0000000000000000 0000000100000000 ffff978433ace000 page dumped because: VM_BUG_ON_PAGE(entry != page) page->mem_cgroup:ffff978433ace000 ------------[ cut here ]------------ kernel BUG at mm/swap_state.c:170! invalid opcode: 0000 [#1] SMP NOPTI CPU: 1 PID: 221 Comm: kswapd0 Not tainted 5.2.0-0.rc2.git0.1.fc31.x86_64 #1 Hardware name: System manufacturer System Product Name/ROG STRIX X470-I GAMING, BIOS 2202 04/11/2019 RIP: 0010:__delete_from_swap_cache+0x20d/0x240 Code: 30 65 48 33 04 25 28 00 00 00 75 4a 48 83 c4 38 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c6 2f dc 0f 8a 48 89 c7 e8 93 1b fd ff <0f> 0b 48 c7 c6 a8 74 0f 8a e8 85 1b fd ff 0f 0b 48 c7 c6 a8 7d 0f RSP: 0018:ffffa982036e7980 EFLAGS: 00010046 RAX: 0000000000000021 RBX: 0000000000000040 RCX: 0000000000000006 RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff97843d657900 RBP: 0000000000000001 R08: ffffa982036e7835 R09: 0000000000000535 R10: ffff97845e21a46c R11: ffffa982036e7835 R12: ffff978426387120 R13: 0000000000000000 R14: ffffd6d34dff0040 R15: ffffd6d34dff0000 FS: 0000000000000000(0000) GS:ffff97843d640000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00002cba88ef5000 CR3: 000000078a97c000 CR4: 00000000003406e0 Call Trace: delete_from_swap_cache+0x46/0xa0 try_to_free_swap+0xbc/0x110 swap_writepage+0x13/0x70 pageout.isra.0+0x13c/0x350 shrink_page_list+0xc14/0xdf0 shrink_inactive_list+0x1e5/0x3c0 shrink_node_memcg+0x202/0x760 shrink_node+0xe0/0x470 balance_pgdat+0x2d1/0x510 kswapd+0x220/0x420 kthread+0xfb/0x130 ret_from_fork+0x22/0x40 and it's not immediately obvious why it happens. It's too late in the rc cycle to do anything but revert for now. Link: https://lore.kernel.org/lkml/CABXGCsN9mYmBD-4GaaeW_NrDu+FDXLzr_6x+XNxfmFV6QkYCDg@mail.gmail.com/ Reported-and-bisected-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Suggested-by: Jan Kara <jack@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Kirill Shutemov <kirill@shutemov.name> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-05mtd: rawnand: sunxi: Add A23/A33 DMA support with extra MBUS configurationMiquel Raynal1-0/+24
Allwinner NAND controllers can make use of DMA to enhance the I/O throughput thanks to ECC pipelining. DMA handling with A23/A33 NAND IP is a bit different than with the older SoCs, hence the introduction of a new compatible to handle: * the differences between register offsets, * the burst length change from 4 to minimum 8, * manage SRAM accesses through MBUS with extra configuration. Fixes: c49836f05aa1 ("mtd: rawnand: sunxi: Add A23/A33 DMA support") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2019-07-05Revert "mtd: rawnand: sunxi: Add A23/A33 DMA support"Miquel Raynal1-36/+2
This reverts commit c49836f05aa15282f7280e06ede3f6f8a6324833. The commit is wrong and its approach actually does not work. Let's revert it in order to add the feature with a clean patch. Fixes: c49836f05aa1 ("mtd: rawnand: sunxi: Add A23/A33 DMA support") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2019-07-05i2c: tegra: Add Dmitry as a reviewerDmitry Osipenko1-0/+1
I'm contributing to Tegra's upstream development in general and happened to review the Tegra's I2C patches for awhile because I'm actively using upstream kernel on all of my Tegra-powered devices and initially some of the submitted patches were getting my attention since they were causing problems. Recently Wolfram Sang asked whether I'm interested in becoming a reviewer for the driver and I don't mind at all. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> [wsa: ack was expressed by Thierry Reding in a mail thread] Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2019-07-05fs: VALIDATE_FS_PARSER should default to nGeert Uytterhoeven1-1/+0
CONFIG_VALIDATE_FS_PARSER is a debugging tool to check that the parser tables are vaguely sane. It was set to default to 'Y' for the moment to catch errors in upcoming fs conversion development. Make sure it is not enabled by default in the final release of v5.1. Fixes: 31d921c7fb969172 ("vfs: Add configuration parser helpers") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-05KVM: arm64/sve: Fix vq_present() macro to yield a boolZhang Lei1-1/+1
The original implementation of vq_present() relied on aggressive inlining in order for the compiler to know that the code is correct, due to some const-casting issues. This was causing sparse and clang to complain, while GCC compiled cleanly. Commit 0c529ff789bc addressed this problem, but since vq_present() is no longer a function, there is now no implicit casting of the returned value to the return type (bool). In set_sve_vls(), this uncast bit value is compared against a bool, and so may spuriously compare as unequal when both are nonzero. As a result, KVM may reject valid SVE vector length configurations as invalid, and vice versa. Fix it by forcing the returned value to a bool. Signed-off-by: Zhang Lei <zhang.lei@jp.fujitsu.com> Fixes: 0c529ff789bc ("KVM: arm64: Implement vq_present() as a macro") Signed-off-by: Dave Martin <Dave.Martin@arm.com> [commit message rewrite] Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-05dmaengine: qcom: bam_dma: Fix completed descriptors countSricharan R1-0/+3
One space is left unused in circular FIFO to differentiate 'full' and 'empty' cases. So take that in to account while counting for the descriptors completed. Fixes the issue reported here, https://lkml.org/lkml/2019/6/18/669 Cc: stable@vger.kernel.org Reported-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Sricharan R <sricharan@codeaurora.org> Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-07-05dmaengine: imx-sdma: remove BD_INTR for channel0Robin Gong1-2/+2
It is possible for an irq triggered by channel0 to be received later after clks are disabled once firmware loaded during sdma probe. If that happens then clearing them by writing to SDMA_H_INTR won't work and the kernel will hang processing infinite interrupts. Actually, don't need interrupt triggered on channel0 since it's pollling SDMA_H_STATSTOP to know channel0 done rather than interrupt in current code, just clear BD_INTR to disable channel0 interrupt to avoid the above case. This issue was brought by commit 1d069bfa3c78 ("dmaengine: imx-sdma: ack channel 0 IRQ in the interrupt handler") which didn't take care the above case. Fixes: 1d069bfa3c78 ("dmaengine: imx-sdma: ack channel 0 IRQ in the interrupt handler") Cc: stable@vger.kernel.org #5.0+ Signed-off-by: Robin Gong <yibin.gong@nxp.com> Reported-by: Sven Van Asbroeck <thesven73@gmail.com> Tested-by: Sven Van Asbroeck <thesven73@gmail.com> Reviewed-by: Michael Olbrich <m.olbrich@pengutronix.de> Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-07-05dmaengine: imx-sdma: fix use-after-free on probe error pathSven Van Asbroeck1-21/+27
If probe() fails anywhere beyond the point where sdma_get_firmware() is called, then a kernel oops may occur. Problematic sequence of events: 1. probe() calls sdma_get_firmware(), which schedules the firmware callback to run when firmware becomes available, using the sdma instance structure as the context 2. probe() encounters an error, which deallocates the sdma instance structure 3. firmware becomes available, firmware callback is called with deallocated sdma instance structure 4. use after free - kernel oops ! Solution: only attempt to load firmware when we're certain that probe() will succeed. This guarantees that the firmware callback's context will remain valid. Note that the remove() path is unaffected by this issue: the firmware loader will increment the driver module's use count, ensuring that the module cannot be unloaded while the firmware callback is pending or running. Signed-off-by: Sven Van Asbroeck <TheSven73@gmail.com> Reviewed-by: Robin Gong <yibin.gong@nxp.com> [vkoul: fixed braces for if condition] Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-07-05dmaengine: jz4780: Fix an endian bug in IRQ handlerDan Carpenter1-2/+3
The "pending" variable was a u32 but we cast it to an unsigned long pointer when we do the for_each_set_bit() loop. The problem is that on big endian 64bit systems that results in an out of bounds read. Fixes: 4e4106f5e942 ("dmaengine: jz4780: Fix transfers being ACKed too soon") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-07-05swap_readpage(): avoid blk_wake_io_task() if !synchronousOleg Nesterov1-5/+8
swap_readpage() sets waiter = bio->bi_private even if synchronous = F, this means that the caller can get the spurious wakeup after return. This can be fatal if blk_wake_io_task() does set_current_state(TASK_RUNNING) after the caller does set_special_state(), in the worst case the kernel can crash in do_task_dead(). Link: http://lkml.kernel.org/r/20190704160301.GA5956@redhat.com Fixes: 0619317ff8baa2d ("block: add polled wakeup task helper") Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Qian Cai <cai@lca.pw> Acked-by: Hugh Dickins <hughd@google.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-05devres: allow const resource argumentsArnd Bergmann2-2/+4
devm_ioremap_resource() does not currently take 'const' arguments, which results in a warning from the first driver trying to do it anyway: drivers/gpio/gpio-amd-fch.c: In function 'amd_fch_gpio_probe': drivers/gpio/gpio-amd-fch.c:171:49: error: passing argument 2 of 'devm_ioremap_resource' discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers] priv->base = devm_ioremap_resource(&pdev->dev, &amd_fch_gpio_iores); ^~~~~~~~~~~~~~~~~~~ Change the prototype to allow it, as there is no real reason not to. Link: http://lkml.kernel.org/r/20190628150049.1108048-1-arnd@arndb.de Fixes: 9bb2e0452508 ("gpio: amd: Make resource struct const") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-05mm/vmscan.c: prevent useless kswapd loopsShakeel Butt1-12/+15
In production we have noticed hard lockups on large machines running large jobs due to kswaps hoarding lru lock within isolate_lru_pages when sc->reclaim_idx is 0 which is a small zone. The lru was couple hundred GiBs and the condition (page_zonenum(page) > sc->reclaim_idx) in isolate_lru_pages() was basically skipping GiBs of pages while holding the LRU spinlock with interrupt disabled. On further inspection, it seems like there are two issues: (1) If kswapd on the return from balance_pgdat() could not sleep (i.e. node is still unbalanced), the classzone_idx is unintentionally set to 0 and the whole reclaim cycle of kswapd will try to reclaim only the lowest and smallest zone while traversing the whole memory. (2) Fundamentally isolate_lru_pages() is really bad when the allocation has woken kswapd for a smaller zone on a very large machine running very large jobs. It can hoard the LRU spinlock while skipping over 100s of GiBs of pages. This patch only fixes (1). (2) needs a more fundamental solution. To fix (1), in the kswapd context, if pgdat->kswapd_classzone_idx is invalid use the classzone_idx of the previous kswapd loop otherwise use the one the waker has requested. Link: http://lkml.kernel.org/r/20190701201847.251028-1-shakeelb@google.com Fixes: e716f2eb24de ("mm, vmscan: prevent kswapd sleeping prematurely due to mismatched classzone_idx") Signed-off-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Hillf Danton <hdanton@sina.com> Cc: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-05fs/userfaultfd.c: disable irqs for fault_pending and event locksEric Biggers1-16/+26
When IOCB_CMD_POLL is used on a userfaultfd, aio_poll() disables IRQs and takes kioctx::ctx_lock, then userfaultfd_ctx::fd_wqh.lock. This may have to wait for userfaultfd_ctx::fd_wqh.lock to be released by userfaultfd_ctx_read(), which in turn can be waiting for userfaultfd_ctx::fault_pending_wqh.lock or userfaultfd_ctx::event_wqh.lock. But elsewhere the fault_pending_wqh and event_wqh locks are taken with IRQs enabled. Since the IRQ handler may take kioctx::ctx_lock, lockdep reports that a deadlock is possible. Fix it by always disabling IRQs when taking the fault_pending_wqh and event_wqh locks. Commit ae62c16e105a ("userfaultfd: disable irqs when taking the waitqueue lock") didn't fix this because it only accounted for the fd_wqh lock, not the other locks nested inside it. Link: http://lkml.kernel.org/r/20190627075004.21259-1-ebiggers@kernel.org Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL") Signed-off-by: Eric Biggers <ebiggers@google.com> Reported-by: syzbot+fab6de82892b6b9c6191@syzkaller.appspotmail.com Reported-by: syzbot+53c0b767f7ca0dc0c451@syzkaller.appspotmail.com Reported-by: syzbot+a3accb352f9c22041cfa@syzkaller.appspotmail.com Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> [4.19+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-05mm/page_alloc.c: fix regression with deferred struct page initJuergen Gross1-1/+2
Commit 0e56acae4b4d ("mm: initialize MAX_ORDER_NR_PAGES at a time instead of doing larger sections") is causing a regression on some systems when the kernel is booted as Xen dom0. The system will just hang in early boot. Reason is an endless loop in get_page_from_freelist() in case the first zone looked at has no free memory. deferred_grow_zone() is always returning true due to the following code snipplet: /* If the zone is empty somebody else may have cleared out the zone */ if (!deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn, first_deferred_pfn)) { pgdat->first_deferred_pfn = ULONG_MAX; pgdat_resize_unlock(pgdat, &flags); return true; } This in turn results in the loop as get_page_from_freelist() is assuming forward progress can be made by doing some more struct page initialization. Link: http://lkml.kernel.org/r/20190620160821.4210-1-jgross@suse.com Fixes: 0e56acae4b4d ("mm: initialize MAX_ORDER_NR_PAGES at a time instead of doing larger sections") Signed-off-by: Juergen Gross <jgross@suse.com> Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Acked-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-05ptrace: Fix ->ptracer_cred handling for PTRACE_TRACEMEJann Horn1-3/+1
Fix two issues: When called for PTRACE_TRACEME, ptrace_link() would obtain an RCU reference to the parent's objective credentials, then give that pointer to get_cred(). However, the object lifetime rules for things like struct cred do not permit unconditionally turning an RCU reference into a stable reference. PTRACE_TRACEME records the parent's credentials as if the parent was acting as the subject, but that's not the case. If a malicious unprivileged child uses PTRACE_TRACEME and the parent is privileged, and at a later point, the parent process becomes attacker-controlled (because it drops privileges and calls execve()), the attacker ends up with control over two processes with a privileged ptrace relationship, which can be abused to ptrace a suid binary and obtain root privileges. Fix both of these by always recording the credentials of the process that is requesting the creation of the ptrace relationship: current_cred() can't change under us, and current is the proper subject for access control. This change is theoretically userspace-visible, but I am not aware of any code that it will actually break. Fixes: 64b875f7ac8a ("ptrace: Capture the ptracer's creds not PT_PTRACE_CAP") Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-04drm/imx: only send event on crtc disable if kept disabledRobert Beckett1-1/+1
The event will be sent as part of the vblank enable during the modeset if the crtc is not being kept disabled. Fixes: 5f2f911578fb ("drm/imx: atomic phase 3 step 1: Use atomic configuration") Signed-off-by: Robert Beckett <bob.beckett@collabora.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2019-07-04drm/imx: notify drm core before sending event during crtc disableRobert Beckett1-2/+2
Notify drm core before sending pending events during crtc disable. This fixes the first event after disable having an old stale timestamp by having drm_crtc_vblank_off update the timestamp to now. This was seen while debugging weston log message: Warning: computed repaint delay is insane: -8212 msec This occurred due to: 1. driver starts up 2. fbcon comes along and restores fbdev, enabling vblank 3. vblank_disable_fn fires via timer disabling vblank, keeping vblank seq number and time set at current value (some time later) 4. weston starts and does a modeset 5. atomic commit disables crtc while it does the modeset 6. ipu_crtc_atomic_disable sends vblank with old seq number and time Fixes: a474478642d5 ("drm/imx: fix crtc vblank state regression") Signed-off-by: Robert Beckett <bob.beckett@collabora.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2019-07-03nfsd: Fix overflow causing non-working mounts on 1 TB machinesPaul Menzel1-1/+1
Since commit 10a68cdf10 (nfsd: fix performance-limiting session calculation) (Linux 5.1-rc1 and 4.19.31), shares from NFS servers with 1 TB of memory cannot be mounted anymore. The mount just hangs on the client. The gist of commit 10a68cdf10 is the change below. -avail = clamp_t(int, avail, slotsize, avail/3); +avail = clamp_t(int, avail, slotsize, total_avail/3); Here are the macros. #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <) #define clamp_t(type, val, lo, hi) min_t(type, max_t(type, val, lo), hi) `total_avail` is 8,434,659,328 on the 1 TB machine. `clamp_t()` casts the values to `int`, which for 32-bit integers can only hold values −2,147,483,648 (−2^31) through 2,147,483,647 (2^31 − 1). `avail` (in the function signature) is just 65536, so that no overflow was happening. Before the commit the assignment would result in 21845, and `num = 4`. When using `total_avail`, it is causing the assignment to be 18446744072226137429 (printed as %lu), and `num` is then 4164608182. My next guess is, that `nfsd_drc_mem_used` is then exceeded, and the server thinks there is no memory available any more for this client. Updating the arguments of `clamp_t()` and `min_t()` to `unsigned long` fixes the issue. Now, `avail = 65536` (before commit 10a68cdf10 `avail = 21845`), but `num = 4` remains the same. Fixes: c54f24e338ed (nfsd: fix performance-limiting session calculation) Cc: stable@vger.kernel.org Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03crypto: user - prevent operating on larval algorithmsEric Biggers1-0/+3
Michal Suchanek reported [1] that running the pcrypt_aead01 test from LTP [2] in a loop and holding Ctrl-C causes a NULL dereference of alg->cra_users.next in crypto_remove_spawns(), via crypto_del_alg(). The test repeatedly uses CRYPTO_MSG_NEWALG and CRYPTO_MSG_DELALG. The crash occurs when the instance that CRYPTO_MSG_DELALG is trying to unregister isn't a real registered algorithm, but rather is a "test larval", which is a special "algorithm" added to the algorithms list while the real algorithm is still being tested. Larvals don't have initialized cra_users, so that causes the crash. Normally pcrypt_aead01 doesn't trigger this because CRYPTO_MSG_NEWALG waits for the algorithm to be tested; however, CRYPTO_MSG_NEWALG returns early when interrupted. Everything else in the "crypto user configuration" API has this same bug too, i.e. it inappropriately allows operating on larval algorithms (though it doesn't look like the other cases can cause a crash). Fix this by making crypto_alg_match() exclude larval algorithms. [1] https://lkml.kernel.org/r/20190625071624.27039-1-msuchanek@suse.de [2] https://github.com/linux-test-project/ltp/blob/20190517/testcases/kernel/crypto/pcrypt_aead01.c Reported-by: Michal Suchanek <msuchanek@suse.de> Fixes: a38f7907b926 ("crypto: Add userspace configuration API") Cc: <stable@vger.kernel.org> # v3.2+ Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-03crypto: cryptd - Fix skcipher instance memory leakVincent Whitchurch1-0/+1
cryptd_skcipher_free() fails to free the struct skcipher_instance allocated in cryptd_create_skcipher(), leading to a memory leak. This is detected by kmemleak on bootup on ARM64 platforms: unreferenced object 0xffff80003377b180 (size 1024): comm "cryptomgr_probe", pid 822, jiffies 4294894830 (age 52.760s) backtrace: kmem_cache_alloc_trace+0x270/0x2d0 cryptd_create+0x990/0x124c cryptomgr_probe+0x5c/0x1e8 kthread+0x258/0x318 ret_from_fork+0x10/0x1c Fixes: 4e0958d19bd8 ("crypto: cryptd - Add support for skcipher") Cc: <stable@vger.kernel.org> Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-03lib/mpi: Fix karactx leak in mpi_powmHerbert Xu1-4/+2
Sometimes mpi_powm will leak karactx because a memory allocation failure causes a bail-out that skips the freeing of karactx. This patch moves the freeing of karactx to the end of the function like everything else so that it can't be skipped. Reported-by: syzbot+f7baccc38dcc1e094e77@syzkaller.appspotmail.com Fixes: cdec9cb5167a ("crypto: GnuPG based MPI lib - source files...") Cc: <stable@vger.kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-03Bluetooth: Fix faulty expression for minimum encryption key size checkMatias Karhumaa1-1/+1
Fix minimum encryption key size check so that HCI_MIN_ENC_KEY_SIZE is also allowed as stated in the comment. This bug caused connection problems with devices having maximum encryption key size of 7 octets (56-bit). Fixes: 693cd8ce3f88 ("Bluetooth: Fix regression with minimum encryption key size alignment") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203997 Signed-off-by: Matias Karhumaa <matias.karhumaa@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-02scsi: iscsi: set auth_protocol back to NULL if CHAP_A value is not supportedMaurizio Lombardi1-8/+8
If the CHAP_A value is not supported, the chap_server_open() function should free the auth_protocol pointer and set it to NULL, or we will leave a dangling pointer around. [ 66.010905] Unsupported CHAP_A value [ 66.011660] Security negotiation failed. [ 66.012443] iSCSI Login negotiation failed. [ 68.413924] general protection fault: 0000 [#1] SMP PTI [ 68.414962] CPU: 0 PID: 1562 Comm: targetcli Kdump: loaded Not tainted 4.18.0-80.el8.x86_64 #1 [ 68.416589] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 68.417677] RIP: 0010:__kmalloc_track_caller+0xc2/0x210 Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>