|
Nadav reports running into the below splat on re-enabling softirqs:
WARNING: CPU: 2 PID: 1777 at kernel/softirq.c:364 __local_bh_enable_ip+0xaa/0xe0
Modules linked in:
CPU: 2 PID: 1777 Comm: umem Not tainted 5.13.1+ #161
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/22/2020
RIP: 0010:__local_bh_enable_ip+0xaa/0xe0
Code: a9 00 ff ff 00 74 38 65 ff 0d a2 21 8c 7a e8 ed 1a 20 00 fb 66 0f 1f 44 00 00 5b 41 5c 5d c3 65 8b 05 e6 2d 8c 7a 85 c0 75 9a <0f> 0b eb 96 e8 2d 1f 20 00 eb a5 4c 89 e7 e8 73 4f 0c 00 eb ae 65
RSP: 0018:ffff88812e58fcc8 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000201 RCX: dffffc0000000000
RDX: 0000000000000007 RSI: 0000000000000201 RDI: ffffffff8898c5ac
RBP: ffff88812e58fcd8 R08: ffffffff8575dbbf R09: ffffed1028ef14f9
R10: ffff88814778a7c3 R11: ffffed1028ef14f8 R12: ffffffff85c9e9ae
R13: ffff88814778a000 R14: ffff88814778a7b0 R15: ffff8881086db890
FS: 00007fbcfee17700(0000) GS:ffff8881e0300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000c0402a5008 CR3: 000000011c1ac003 CR4: 00000000003706e0
Call Trace:
_raw_spin_unlock_bh+0x31/0x40
io_rsrc_node_ref_zero+0x13e/0x190
io_dismantle_req+0x215/0x220
io_req_complete_post+0x1b8/0x720
__io_complete_rw.isra.0+0x16b/0x1f0
io_complete_rw+0x10/0x20
where it's clear we end up calling the percpu count release directly
from the completion path, as it's in atomic mode and we drop the last
ref. For file/block IO, this can be from IRQ context already, and the
softirq locking for rsrc isn't enough.
Just make the lock fully IRQ safe, and ensure we correctly save and
restore the IRQ state in the release path, as we don't know the full
context there.
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Tested-by: Nadav Amit <nadav.amit@gmail.com>
Link: https://lore.kernel.org/io-uring/C187C836-E78B-4A31-B24C-D16919ACA093@gmail.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The compiler should be forbidden from making any strange optimizations to
asynchronous writes to user-visible data structures. Without proper
protection, the compiler can tear writes or invent writes that would
confuse userspace.
However, there are writes to sq_flags which are not protected by
WRITE_ONCE(). Use WRITE_ONCE() for these writes.
This is purely a theoretical issue. Presumably, any compiler is very
unlikely to do such optimizations.
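A minimal sketch of the pattern (field and flag names as in the io_uring
shared rings; IORING_SQ_CQ_OVERFLOW is just one example flag):

    /* Before: a plain read-modify-write the compiler may tear or split */
    ctx->rings->sq_flags |= IORING_SQ_CQ_OVERFLOW;

    /* After: WRITE_ONCE() forces a single store of the final value,
     * which is all that userspace reading the mapped ring may observe */
    WRITE_ONCE(ctx->rings->sq_flags,
               ctx->rings->sq_flags | IORING_SQ_CQ_OVERFLOW);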
Fixes: 75b28affdd6a ("io_uring: allocate the two rings together")
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Link: https://lore.kernel.org/r/20210808001342.964634-3-namit@vmware.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When using SQPOLL, the submission queue polling thread calls
task_work_run() to run queued work. However, when work is added with
TWA_SIGNAL - as done by io_uring itself - the TIF_NOTIFY_SIGNAL remains
set afterwards and is never cleared.
Consequently, when the submission queue polling thread checks
signal_pending(), it may always find a pending signal if
task_work_add() was ever called before.
The impact of this bug might be different on different kernel versions.
It appears that on 5.14 it would only cause unnecessary calculation and
prevent the polling thread from sleeping. On 5.13, where the bug was
found, it stops the polling thread from finding newly submitted work.
Instead of task_work_run(), use tracehook_notify_signal() that clears
TIF_NOTIFY_SIGNAL. Test for TIF_NOTIFY_SIGNAL in addition to
current->task_works to avoid a race in which task_works is cleared but
the TIF_NOTIFY_SIGNAL is set.
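A sketch of the resulting helper (assuming a 5.13/5.14-era tree where
tracehook_notify_signal() exists):

    static inline bool io_run_task_work(void)
    {
        /* test the flag too: task_works may be cleared while
         * TIF_NOTIFY_SIGNAL is still set */
        if (test_thread_flag(TIF_NOTIFY_SIGNAL) || current->task_works) {
            __set_current_state(TASK_RUNNING);
            /* runs queued task_work and clears TIF_NOTIFY_SIGNAL */
            tracehook_notify_signal();
            return true;
        }
        return false;
    }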
Fixes: 685fe7feedb96 ("io-wq: eliminate the need for a manager thread")
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Link: https://lore.kernel.org/r/20210808001342.964634-2-namit@vmware.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The acct->nr_workers < acct->max_workers judgement should be made before
we create an io-worker; otherwise we can exceed the max worker limit.
Fixes: 685fe7feedb9 ("io-wq: eliminate the need for a manager thread")
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There is an acct->nr_worker access without lock protection. Consider the
case where two callers call io_wqe_wake_worker() on two CPUs in parallel,
one from the original context and the other from an io-worker (via
io_wqe_enqueue(wqe, linked)); this may cause nr_worker to become larger
than max_worker.
Fix it by adding a lock for it, and do nr_workers++ before
create_io_worker. There may be an edge case where the first caller fails
to create an io-worker, but the second caller doesn't know that and also
gives up creating an io-worker:
say nr_worker = max_worker - 1
        cpu 0                          cpu 1
io_wqe_wake_worker()            io_wqe_wake_worker()
  nr_worker < max_worker
  nr_worker++
  create_io_worker()              nr_worker == max_worker
    failed                        return
  return
But the chance of this case is very slim.
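A sketch of the fixed wake-up path (simplified; lock and field names as
in 5.14-era io-wq.c):

    bool do_create = false;

    raw_spin_lock_irq(&wqe->lock);
    if (acct->nr_workers < acct->max_workers) {
        acct->nr_workers++;     /* reserve the slot while holding the lock */
        do_create = true;
    }
    raw_spin_unlock_irq(&wqe->lock);
    if (do_create)
        create_io_worker(wqe->wq, wqe, acct->index);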
Fixes: 685fe7feedb9 ("io-wq: eliminate the need for a manager thread")
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
[axboe: fix unconditional create_io_worker() call]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Nadav correctly reports that we have a race between a worker exiting,
and new work being queued. This can lead to work being queued behind
an existing worker that could be sleeping on an event before it can
run to completion, hence introducing potentially large latency gaps
if we hit this race condition:
cpu0                                    cpu1
----                                    ----
io_wqe_worker()
schedule_timeout()
 // timed out
                                        io_wqe_enqueue()
                                        io_wqe_wake_worker()
                                        // work_flags & IO_WQ_WORK_CONCURRENT
                                        io_wqe_activate_free_worker()
io_worker_exit()
Fix this by having the exiting worker go through the normal decrement
of a running worker, which will spawn a new one if needed.
The free worker activation is modified to only return success if we
were able to find a sleeping worker - if not, we keep looking through
the list. If we fail, we create a new worker as per usual.
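A sketch of the modified activation (simplified from the patch; only a
worker we actually woke counts as success):

    static bool io_wqe_activate_free_worker(struct io_wqe *wqe)
    {
        struct hlist_nulls_node *n;
        struct io_worker *worker;

        hlist_nulls_for_each_entry_rcu(worker, n, &wqe->free_list, nulls_node) {
            if (!io_worker_get(worker))
                continue;
            /* wake_up_process() returns 0 if the worker was not
             * sleeping (e.g. on its way out) - keep looking then */
            if (wake_up_process(worker->task)) {
                io_worker_release(worker);
                return true;
            }
            io_worker_release(worker);
        }
        return false;   /* caller falls back to creating a new worker */
    }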
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/io-uring/BFF746C0-FEDE-4646-A253-3021C57C26C9@gmail.com/
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Tested-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
For pure poll requests, the second poll wait entry is not removed when
the request is done, neither after vfs_poll() nor in the poll completion
handler. We should remove the second poll wait entry.
Use io_poll_remove_double() rather than io_poll_remove_waitqs(), since
the latter has some redundant logic.
Fixes: 88e41cf928a6 ("io_uring: add multishot mode for IORING_OP_POLL_ADD")
Cc: stable@vger.kernel.org # 5.13+
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210728030322.12307-1-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Some setups, like SCSI, can throw spurious -EAGAIN off the softirq
completion path. Normally we expect this to happen inline as part
of submission, but apparently SCSI has a weird corner case where it
can happen as part of normal completions.
This should be solved by having the -EAGAIN bubble back up the stack
as part of submission, but previous attempts at this failed and we're
just not quite there yet. Instead, we currently use REQ_F_REISSUE to
handle this case.
For now, catch it in io_rw_should_reissue() and prevent a reissue
from a bogus path.
Cc: stable@vger.kernel.org
Reported-by: Fabian Ebner <f.ebner@proxmox.com>
Tested-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
As a safeguard, if we're going to queue async work, do it from task_work
on the original task. This ensures that we can always sanely create
threads, regardless of what the reissue context may be.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We use a bit to manage if we need to add the shared task_work, but
a list + lock for the pending work. Before aborting a current run
of the task_work we check if the list is empty, but we do so without
grabbing the lock that protects it. This can lead to races where
we think we have nothing left to run, while in practice we could be
racing with a task adding new work to the list. If we do hit that
race condition, we could be left with work items that need processing,
but the shared task_work is not active.
Ensure that we grab the lock before checking if the list is empty,
so we know if it's safe to exit the run or not.
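A sketch of the fixed loop exit (field names as in the 5.14-era
tctx_task_work()):

    spin_lock_irq(&tctx->task_lock);
    node = tctx->task_list.first;
    INIT_WQ_LIST(&tctx->task_list);
    if (!node)
        tctx->task_running = false;  /* flip the bit under the lock */
    spin_unlock_irq(&tctx->task_lock);
    if (!node)
        break;  /* nothing could have raced in before we flipped it */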
Link: https://lore.kernel.org/io-uring/c6bd5987-e9ae-cd02-49d0-1b3ac1ef65b1@tnonline.net/
Cc: stable@vger.kernel.org # 5.11+
Reported-by: Forza <forza@tnonline.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_prep_async_link() may be called after arming a linked timeout,
automatically making it unsafe to traverse the linked list. Guard
with completion_lock if there was a linked timeout.
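A sketch of the guarded traversal (matching the shape of the patch):

    static void io_prep_async_link(struct io_kiocb *req)
    {
        struct io_kiocb *cur;

        if (req->flags & REQ_F_LINK_TIMEOUT) {
            struct io_ring_ctx *ctx = req->ctx;

            /* an armed timeout may unlink entries concurrently */
            spin_lock_irq(&ctx->completion_lock);
            io_for_each_link(cur, req)
                io_prep_async_work(cur);
            spin_unlock_irq(&ctx->completion_lock);
        } else {
            io_for_each_link(cur, req)
                io_prep_async_work(cur);
        }
    }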
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/93f7c617e2b4f012a2a175b3dab6bc2f27cebc48.1627304436.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Catch an illegal case to queue async from an unrelated task that got
the ring fd passed to it. This should not be possible to hit, but
better be proactive and catch it explicitly. io-wq is extended to
check for early IO_WQ_WORK_CANCEL being set on a work item as well,
so it can run the request through the normal cancelation path.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There are two reasons why this shouldn't be done:
1) The ring is exiting and we're canceling requests anyway, so any such
request will be canceled regardless. In theory, this could iterate for a
number of times if someone else is also driving the target block
queue into request starvation, but the likelihood of this
happening is minuscule.
2) If the original task decided to pass the ring to another task, then
we don't want to be reissuing from this context as it may be an
unrelated task or context. No assumptions should be made about
the context in which ->release() is run. This can only happen for pure
read/write, and we'll get -EFAULT on them anyway.
Link: https://lore.kernel.org/io-uring/YPr4OaHv0iv0KTOc@zeniv-ca.linux.org.uk/
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
A previous commit shuffled some code around and inadvertently used
struct file after fdput() had been called on it. Since fdput() drops
our reference and we can't touch the file after that, move the fdput()
to after the last use of the file.
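A sketch of the corrected ordering (simplified from
io_sq_offload_create()):

    struct fd f = fdget(p->wq_fd);

    if (!f.file)
        return -ENXIO;
    if (f.file->f_op != &io_uring_fops) {   /* our reference is still held */
        fdput(f);
        return -EINVAL;
    }
    fdput(f);   /* only dropped after the last f.file access */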
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/io-uring/YPnqM0fY3nM5RdRI@zeniv-ca.linux.org.uk/
Fixes: f2a48dd09b8e ("io_uring: refactor io_sq_offload_create()")
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
I got a memory leak report when doing fuzz testing:
BUG: memory leak
unreferenced object 0xffff888107310a80 (size 96):
comm "syz-executor.6", pid 4610, jiffies 4295140240 (age 20.135s)
hex dump (first 32 bytes):
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00 .....N..........
backtrace:
[<000000001974933b>] kmalloc include/linux/slab.h:591 [inline]
[<000000001974933b>] kzalloc include/linux/slab.h:721 [inline]
[<000000001974933b>] io_init_wq_offload fs/io_uring.c:7920 [inline]
[<000000001974933b>] io_uring_alloc_task_context+0x466/0x640 fs/io_uring.c:7955
[<0000000039d0800d>] __io_uring_add_tctx_node+0x256/0x360 fs/io_uring.c:9016
[<000000008482e78c>] io_uring_add_tctx_node fs/io_uring.c:9052 [inline]
[<000000008482e78c>] __do_sys_io_uring_enter fs/io_uring.c:9354 [inline]
[<000000008482e78c>] __se_sys_io_uring_enter fs/io_uring.c:9301 [inline]
[<000000008482e78c>] __x64_sys_io_uring_enter+0xabc/0xc20 fs/io_uring.c:9301
[<00000000b875f18f>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
[<00000000b875f18f>] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
[<000000006b0a8484>] entry_SYSCALL_64_after_hwframe+0x44/0xae
CPU0                                 CPU1
io_uring_enter                       io_uring_enter
 io_uring_add_tctx_node               io_uring_add_tctx_node
  __io_uring_add_tctx_node             __io_uring_add_tctx_node
   io_uring_alloc_task_context          io_uring_alloc_task_context
    io_init_wq_offload                   io_init_wq_offload
     hash = kzalloc                       hash = kzalloc
     ctx->hash_map = hash                 ctx->hash_map = hash
                                          <- one of the hashes is leaked
When io_uring_enter() is called in parallel, one of the 'hash_map'
allocations is leaked. Add uring_lock to protect 'hash_map'.
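A sketch of the serialized allocation (shaped like io_init_wq_offload()
after the fix; uring_lock is the per-ctx mutex):

    struct io_wq_hash *hash;

    /* uring_lock makes the check-then-allocate atomic vs parallel enters */
    mutex_lock(&ctx->uring_lock);
    hash = ctx->hash_map;
    if (!hash) {
        hash = kzalloc(sizeof(*hash), GFP_KERNEL);
        if (!hash) {
            mutex_unlock(&ctx->uring_lock);
            return ERR_PTR(-ENOMEM);
        }
        refcount_set(&hash->refs, 1);
        init_waitqueue_head(&hash->wait);
        ctx->hash_map = hash;
    }
    mutex_unlock(&ctx->uring_lock);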
Fixes: e941894eae31 ("io-wq: make buffered file write hashed work map per-ctx")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20210720083805.3030730-1-yangyingliang@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
__io_queue_proc() can enqueue both poll entries and still fail
afterwards, so the callers trying to cancel it should also try to remove
the second poll entry (if any).
For example, it may leave the request alive, referencing an io_uring
context, but not accessible for cancellation:
[ 282.599913][ T1620] task:iou-sqp-23145 state:D stack:28720 pid:23155 ppid: 8844 flags:0x00004004
[ 282.609927][ T1620] Call Trace:
[ 282.613711][ T1620] __schedule+0x93a/0x26f0
[ 282.634647][ T1620] schedule+0xd3/0x270
[ 282.638874][ T1620] io_uring_cancel_generic+0x54d/0x890
[ 282.660346][ T1620] io_sq_thread+0xaac/0x1250
[ 282.696394][ T1620] ret_from_fork+0x1f/0x30
Cc: stable@vger.kernel.org
Fixes: 18bceab101add ("io_uring: allow POLL_ADD with double poll_wait() users")
Reported-and-tested-by: syzbot+ac957324022b7132accf@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0ec1228fc5eda4cb524eeda857da8efdc43c331c.1626774457.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If __io_queue_proc() fails to add a second poll entry, e.g. kmalloc()
failed, but it goes on with a third waitqueue, it may succeed and
overwrite the error status. Count the number of poll entries we added,
so we can set pt->error to zero at the beginning and find out when the
mentioned scenario happens.
Cc: stable@vger.kernel.org
Fixes: 18bceab101add ("io_uring: allow POLL_ADD with double poll_wait() users")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9d6b9e561f88bcc0163623b74a76c39f712151c3.1626774457.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_drain_req() returns whether the request has been consumed or not, not
an error code. Fix a stupid mistake that slipped in from optimisation
patches.
Reported-by: syzbot+ba6fcd859210f4e9e109@syzkaller.appspotmail.com
Fixes: 76cc33d79175a ("io_uring: refactor io_req_defer()")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4d3c53c4274ffff307c8ae062fc7fda63b978df2.1626039606.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When we use delayed_work for fallback execution of requests, current
will not be the submitter task, and so checks in io_req_task_submit()
may not behave as expected. Currently, it leaves inline completions
unflushed, making io_ring_exit_work() hang. Use the submitter task
for all those checks.
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cb413c715bed0bc9c98b169059ea9c8a2c770715.1625881431.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
I know nothing about zone_device pages and !device_private pages; but if
try_to_migrate_one() will do nothing for them, then it's better that
try_to_migrate() filter them out first than trawl through all their vmas.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Link: https://lore.kernel.org/lkml/1241d356-8ec9-f47b-a5ec-9b2bf66d242@google.com/
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In the unlikely race case that page_mlock_one() finds VM_LOCKED has been
cleared by the time it got page table lock, page_vma_mapped_walk_done()
must be called before returning, either explicitly, or by a final call
to page_vma_mapped_walk() - otherwise the page table remains locked.
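A simplified sketch of the rule (not the verbatim patch): every early
exit from a page_vma_mapped_walk() loop must release the page table
lock the walk is holding:

    while (page_vma_mapped_walk(&pvmw)) {
        if (!(vma->vm_flags & VM_LOCKED)) {
            /* VM_LOCKED was cleared under us: the walk still holds
             * the page table lock, so drop it before bailing out */
            page_vma_mapped_walk_done(&pvmw);
            return true;
        }
        /* ... normal mlock handling, still under the lock ... */
        page_vma_mapped_walk_done(&pvmw);
        return true;
    }
    /* loop ended normally: the final walk call already unlocked */
    return true;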
Fixes: cd62734ca60d ("mm/rmap: split try_to_munlock from try_to_unmap")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/lkml/20210711151446.GB4070@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/lkml/f71f8523-cba7-3342-40a7-114abc5d1f51@google.com/
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The kernel recovers in due course from missing Mlocked pages: but there
was no point in calling page_mlock() (formerly known as
try_to_munlock()) on a THP, because nothing got done even when it was
found to be mapped in another VM_LOCKED vma.
It's true that we need to be careful: Mlocked accounting of pte-mapped
THPs is too difficult (so consistently avoided); but Mlocked accounting
of only-pmd-mapped THPs is supposed to work, even when multiple mappings
are mlocked and munlocked or munmapped. Refine the tests.
There is already a VM_BUG_ON_PAGE(PageDoubleMap) in page_mlock(), so
page_mlock_one() does not even have to worry about that complication.
(I said the kernel recovers: but would page reclaim be likely to split a
THP before rediscovering that it's VM_LOCKED? I've not followed that up.)
Fixes: 9a73f61bdb8a ("thp, mlock: do not mlock PTE-mapped file huge pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/lkml/cfa154c-d595-406-eb7d-eb9df730f944@google.com/
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Parallel developments in mm/rmap.c have left behind some out-of-date
comments: try_to_migrate_one() also accepts TTU_SYNC (already commented
in try_to_migrate() itself), and try_to_migrate() returns nothing at
all.
TTU_SPLIT_FREEZE has just been deleted, so reword the comment about it
in mm/huge_memory.c; and TTU_IGNORE_ACCESS was removed in 5.11, so
delete the "recently referenced" comment from try_to_unmap_one() (once
upon a time the comment was near the removed codeblock, but they drifted
apart).
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Link: https://lore.kernel.org/lkml/563ce5b2-7a44-5b4d-1dfd-59a0e65932a9@google.com/
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit dbbee9d5cd83 ("mm/page_alloc: convert per-cpu list protection to
local_lock") folded in a workaround patch for pahole that was unable to
deal with zero-sized percpu structures.
A superior workaround is achieved with commit a0b8200d06ad ("kbuild:
skip per-CPU BTF generation for pahole v1.18-v1.21").
This patch reverts the dummy field and the pahole version check.
Fixes: dbbee9d5cd83 ("mm/page_alloc: convert per-cpu list protection to local_lock")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
arch/arm/mach-ixp4xx/include/mach/platform.h now gets included indirectly
and defines REG_OFFSET. Rename the register and bit definition to something
specific to the driver.
Fixes: 7fd70c65faac ("ARM: irqstat: Get rid of duplicated declaration")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20210710211431.1393589-1-alexandre.belloni@bootlin.com
|
|
commit 03623b4b041c ("rtc: pcf2127: add tamper detection support")
added support for timestamp interrupts. However, they are not handled
in the IRQ handler. If a timestamp interrupt occurs, it results in the
kernel disabling the interrupt and displaying the call trace:
[ 121.145580] irq 78: nobody cared (try booting with the "irqpoll" option)
...
[ 121.238087] [<00000000c4d69393>] irq_default_primary_handler threaded [<000000000a90d25b>] pcf2127_rtc_irq [rtc_pcf2127]
[ 121.248971] Disabling IRQ #78
Handle timestamp interrupts in pcf2127_rtc_irq(). Save the timestamp
before clearing the TSF1 and TSF2 flags so that it can't be overwritten.
Set a flag to mark whether the timestamp is valid, and only report to
sysfs if the flag is set. To mimic the hardware behavior, don't save
another timestamp until the first one has been read by userspace.
However, if the alarm irq is not configured, keep the old way of
handling timestamp interrupts in the timestamp0 sysfs calls.
Signed-off-by: Mian Yousaf Kaukab <ykaukab@suse.de>
Reviewed-by: Bruno Thomsen <bruno.thomsen@gmail.com>
Tested-by: Bruno Thomsen <bruno.thomsen@gmail.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20210629150643.31551-1-ykaukab@suse.de
|
|
The offset variable is checked in at91_rtc_readalarm(), but this check
is unnecessary because a previous check already established that its
value is not 0.
Remove the unnecessary offset check.
Cc: Nicolas Ferre <nicolas.ferre@microchip.com>
Cc: Ludovic Desroches <ludovic.desroches@microchip.com>
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20210708051340.341345-1-nobuhiro1.iwamatsu@toshiba.co.jp
|
|
s5m_rtc_read_alarm() obtains the return value of
s5m_check_peding_alarm_interrupt(), but doesn't use it.
Use the return value of s5m_check_peding_alarm_interrupt() as the
return value of s5m_rtc_read_alarm().
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: linux-samsung-soc@vger.kernel.org
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20210708051304.341278-1-nobuhiro1.iwamatsu@toshiba.co.jp
|
|
Use SPDX-License-Identifier instead of a verbose license text.
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20210707075804.337458-11-nobuhiro1.iwamatsu@toshiba.co.jp
|
|
Use SPDX-License-Identifier instead of a verbose license text.
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20210707075804.337458-9-nobuhiro1.iwamatsu@toshiba.co.jp
|
|
Use SPDX-License-Identifier instead of a verbose license text.
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20210707075804.337458-8-nobuhiro1.iwamatsu@toshiba.co.jp
|
|
For C files, use the C99 format (//).
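For illustration, a converted file starts like this (GPL-2.0 is just an
example; each file keeps whatever license its old boilerplate declared):

    // SPDX-License-Identifier: GPL-2.0
    /*
     * The single line above replaces the multi-paragraph license text;
     * checkpatch expects the C99 (//) form on the first line of .c files.
     */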
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20210707075804.337458-7-nobuhiro1.iwamatsu@toshiba.co.jp
|
|
For C files, use the C99 format (//).
Cc: Orson Zhai <orsonzhai@gmail.com>
Cc: Baolin Wang <baolin.wang7@gmail.com>
Cc: Chunyan Zhang <zhang.lyra@gmail.com>
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20210707075804.337458-6-nobuhiro1.iwamatsu@toshiba.co.jp
|
|
Use SPDX-License-Identifier instead of a verbose license text.
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20210707075804.337458-5-nobuhiro1.iwamatsu@toshiba.co.jp
|
|
Use SPDX-License-Identifier instead of a verbose license text.
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20210707075804.337458-4-nobuhiro1.iwamatsu@toshiba.co.jp
|
|
Use SPDX-License-Identifier instead of a verbose license text.
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20210707075804.337458-3-nobuhiro1.iwamatsu@toshiba.co.jp
|
|
Use SPDX-License-Identifier instead of a verbose license text.
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20210707075804.337458-2-nobuhiro1.iwamatsu@toshiba.co.jp
|
|
This reverts commit 65db04053efea3f3e412a7e0cc599962999c96b4.
Guenter reported that after 65db04053efe, the ppc:sam460ex qemu emulation
no longer boots from nvme:
nvme nvme0: Device not ready; aborting initialisation, CSTS=0x0
nvme nvme0: Removing after probe failure status: -19
Link: https://lore.kernel.org/r/20210709231529.GA3270116@roeck-us.net
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
After updating the datasheet URL, the PCF85063A datasheet revision
has changed.
Adjust it accordingly.
Reported-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20210624120953.2313378-1-festevam@gmail.com
|
|
Take over maintainership of the binding, as Pavel said he doesn't have
the hardware anymore.
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Link: https://lore.kernel.org/r/20210620224030.1115356-1-alexandre.belloni@bootlin.com
|
|
ASan reported a memory leak for the items of the entlist returned from scandir().
In fact, scandir() returns a malloc'd array of malloc'd dirents.
This patch adds the missing (z)frees.
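A minimal standalone sketch of the pattern being fixed (not the exact
perf code):

    #include <dirent.h>
    #include <stdlib.h>

    static void walk_tests(const char *path)
    {
        struct dirent **entlist;
        int n = scandir(path, &entlist, NULL, alphasort);

        if (n < 0)
            return;
        for (int i = 0; i < n; i++) {
            /* ... use entlist[i]->d_name ... */
            free(entlist[i]);   /* each dirent is malloc'd */
        }
        free(entlist);          /* and so is the array itself */
    }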
Fixes: da963834fe6975a1 ("perf test: Iterate over shell tests in alphabetical order")
Signed-off-by: Riccardo Mancini <rickyman7@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Fabian Hemmer <copy@copy.sh>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Remi Bernon <rbernon@codeweavers.com>
Link: http://lore.kernel.org/lkml/20210709163454.672082-1-rickyman7@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add a test for the newly added perf_evlist__set_leader() function.
Committer testing:
$ cd tools/lib/perf/
$ sudo make tests
[sudo] password for acme:
running static:
- running tests/test-cpumap.c...OK
- running tests/test-threadmap.c...OK
- running tests/test-evlist.c...OK
- running tests/test-evsel.c...OK
running dynamic:
- running tests/test-cpumap.c...OK
- running tests/test-threadmap.c...OK
- running tests/test-evlist.c...OK
- running tests/test-evsel.c...OK
$
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Requested-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210706151704.73662-8-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
We shouldn't just panic; return a value that doesn't clash with what
perf_evsel__open() was already returning in case of error, i.e. errno
when sys_perf_event_open() fails.
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Link: http://lore.kernel.org/lkml/YOiOA5zOtVH9IBbE@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Update the internal cifs module version number to 2.33.
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
The optional @ref parameter might contain a NULL node_name, so
prevent dereferencing it in cifs_compose_mount_options().
Addresses-Coverity: 1476408 ("Explicit null dereferenced")
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Add support to set group_fd in perf_evsel__open() and make it follow the
group setup.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Requested-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210706151704.73662-7-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Support for faster packet signing (using GMAC instead of CMAC) can
now be negotiated with some newer servers, including Windows.
See MS-SMB2 section 2.2.3.17.
This patch adds support for sending the new negotiate context
with the first of three supported signing algorithms (AES-CMAC)
and decoding the response. A followon patch will add support
for sending the other two (including AES-GMAC, which is fastest)
and changing the signing algorithm used based on what was
negotiated.
To allow the client to request GMAC signing set module parameter
"enable_negotiate_signing" to 1.
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Some different PMU types may have the same substring. For example, on
Icelake server we have PMU types "uncore_imc" and
"uncore_imc_free_running". Both PMU types have the substring
"uncore_imc". But the parser wrongly thinks they are the same PMU type.
We enable an imc event,
perf stat -e uncore_imc/event=0xe3/ -a -- sleep 1
Perf actually expands the event to:
uncore_imc_0/event=0xe3/
uncore_imc_1/event=0xe3/
uncore_imc_2/event=0xe3/
uncore_imc_3/event=0xe3/
uncore_imc_4/event=0xe3/
uncore_imc_5/event=0xe3/
uncore_imc_6/event=0xe3/
uncore_imc_7/event=0xe3/
uncore_imc_free_running_0/event=0xe3/
uncore_imc_free_running_1/event=0xe3/
uncore_imc_free_running_3/event=0xe3/
uncore_imc_free_running_4/event=0xe3/
That's because "uncore_imc_free_running" matches the pattern
"uncore_imc*".
Now we check that the last characters of the PMU name are '_<digit>'.
For example, for the pattern "uncore_imc*", "uncore_imc_0" is parsed OK,
but "uncore_imc_free_running_0" fails.
Fixes: b2b9d3a3f0211c5d ("perf pmu: Support wildcards on pmu name in dynamic pmu events")
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Agustin Vega-Frias <agustinv@codeaurora.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210701064253.1175-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Some symbols may not be resolved if a user only monitors one type of
PMU.
$ sudo perf record -e cpu_atom/branch-instructions/ ./big_small_workload
$ sudo perf report --stdio
# Overhead  Command    Shared Object  Symbol
# ........  .........  .............  ......................
#
    28.02%  perf-exec  [unknown]      [.] 0x0000000000401cf6
    11.32%  perf-exec  [unknown]      [.] 0x0000000000401d04
    10.90%  perf-exec  [unknown]      [.] 0x0000000000401d11
    10.61%  perf-exec  [unknown]      [.] 0x0000000000401cfc
To resolve symbols, the metadata records generated by the kernel, e.g.
PERF_RECORD_COMM, are required.
To decide whether to generate the metadata records, the kernel relies on
the event_filter_match() to filter the unrelated events.
On a hybrid system, event_filter_match() further checks the CPU mask of
the current enabled PMU. If an event is collected on the CPU which
doesn't have an enabled PMU, it's treated as an unrelated event.
The "big_small_workload" is created in a big core, but runs on a small
core. The metadata records are filtered, because the user only monitors
the PMU of the small core. The big core PMU is not enabled.
For a hybrid system, a dummy event is required to generate the complete
side-band events.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/1625760212-18441-1-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|