wireguard-linux - WireGuard for the Linux kernel

Age	Commit message (Collapse)	Author	Files	Lines
2020-04-04	nvmet-rdma: fix bonding failover possible NULL deref	Sagi Grimberg	1	-56/+119
	RDMA_CM_EVENT_ADDR_CHANGE event occur in the case of bonding failover on normal as well as on listening cm_ids. Hence this event will immediately trigger a NULL dereference trying to disconnect a queue for a cm_id that actually belongs to the port. To fix this we provide a different handler for the listener cm_ids that will defer a work to disable+(re)enable the port which essentially destroys and setups another listener cm_id Reported-by: Alex Lyakas <alex@zadara.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Tested-by: Alex Lyakas <alex@zadara.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-04-02	nvmet: fix NULL dereference when removing a referral	Sagi Grimberg	1	-1/+9
	When item release is called, the parent is already null. We need the parent to pass to nvmet_referral_disable so hook it up to ->disconnect_notify. Reported-by: Tony Asleson <tasleson@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-04-02	nvme: inherit stable pages constraint in the mpath stack device	Sagi Grimberg	1	-0/+7
	If the backing device require stable pages, we need to set it on the stack mpath device as well. This applies to rdma/fc transports when doing data integrity and tcp transport calculating digests. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-04-01	nvme-tcp: fix possible crash in recv error flow	Sagi Grimberg	1	-1/+1
	If the target misbehaves and sends us unexpected payload we need to make sure to fail the controller and stop processing the input stream. We clear the rd_enabled flag and stop the io_work, but we may still requeue it if we still have pending sends and then in the next invocation we will process the input stream as the check is only in the .data_ready upcall. To fix this we need to make sure not to self-requeue io_work upon a recv flow error. This fixes the crash: nvme nvme2: receive failed: -22 BUG: unable to handle page fault for address: ffffbeb5816c3b48 nvme_ns_head_make_request: 29 callbacks suppressed block nvme0n5: no usable path - requeuing I/O block nvme0n5: no usable path - requeuing I/O block nvme0n7: no usable path - requeuing I/O block nvme0n7: no usable path - requeuing I/O block nvme0n3: no usable path - requeuing I/O block nvme0n3: no usable path - requeuing I/O block nvme0n3: no usable path - requeuing I/O block nvme0n7: no usable path - requeuing I/O block nvme0n3: no usable path - requeuing I/O block nvme0n3: no usable path - requeuing I/O #PF: supervisor read access inkernel mode #PF: error_code(0x0000) - not-present page PGD 1039157067 P4D 1039157067 PUD 103915a067 PMD 102719f067 PTE 0 Oops: 0000 [#1] SMP PTI CPU: 8 PID: 411 Comm: kworker/8:1H Not tainted 5.3.0-40-generic #32~18.04.1-Ubuntu Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0 12/17/2015 Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp] RIP: 0010:nvme_tcp_recv_skb+0x2ae/0xb50 [nvme_tcp] RSP: 0018:ffffbeb5806cfd10 EFLAGS: 00010246 RAX: ffffbeb5816c3b48 RBX: 00000000000003d0 RCX: 0000000000000008 RDX: 00000000000003d0 RSI: 0000000000000001 RDI: ffff9a3040684b40 RBP: ffffbeb5806cfd90 R08: 0000000000000000 R09: ffffffff946e6900 R10: ffffbeb5806cfce0 R11: 0000000000000001 R12: 0000000000000000 R13: ffff9a2ff86501c0 R14: 00000000000003d0 R15: ffff9a30b85f2798 FS: 0000000000000000(0000) GS:ffff9a30bf800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffbeb5816c3b48 CR3: 000000088400a006 CR4: 00000000003626e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: tcp_read_sock+0x8c/0x290 ? __release_sock+0x9d/0xe0 ? nvme_tcp_write_space+0xb0/0xb0 [nvme_tcp] nvme_tcp_io_work+0x4b4/0x830 [nvme_tcp] ? finish_task_switch+0x163/0x270 process_one_work+0x1fd/0x3f0 worker_thread+0x34/0x410 kthread+0x121/0x140 ? process_one_work+0x3f0/0x3f0 ? kthread_park+0xb0/0xb0 ret_from_fork+0x35/0x40 Reported-by: Roy Shterman <roys@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-03-31	nvme-tcp: don't poll a non-live queue	Sagi Grimberg	1	-0/+3
	In error recovery we might be removing the queue so check we can actually poll before we do. Reported-by: Mark Wunderlich <mark.wunderlich@intel.com> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-03-31	nvme-tcp: fix possible crash in write_zeroes processing	Sagi Grimberg	1	-6/+7
	We cannot look at blk_rq_payload_bytes without first checking that the request has a mappable physical segments first (e.g. blk_rq_nr_phys_segments(rq) != 0) and only then to take the request payload bytes. This caused us to send a wrong sgl to the target or even dereference a non-existing buffer in case we actually got to the data send sequence (if it was in-capsule). Reported-by: Tony Asleson <tasleson@redhat.com> Suggested-by: Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-03-31	nvmet-fc: fix typo in comment	James Smart	1	-1/+1
	Fix typo in comment: about should be abort Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chiatanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Himanshu Madhani <hmadhani@marvell.com> Reviewed-by: Hannes Reinecke <hare@suse.de>
2020-03-31	nvme-rdma: Replace comma with a semicolon	Israel Rukshin	1	-1/+1
	Use a semicolon at the end of an assignment expression. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-03-31	nvme-fcloop: fix deallocation of working context	James Smart	1	-24/+52
	There's been a longstanding bug of LS completions which freed ls ops, particularly the disconnect LS, while executing on a work context that is in the memory being free. Not a good thing to do. Rework LS handling to make callbacks in the rport context rather than the ls_request context. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-03-31	nvme: fix compat address handling in several ioctls	Nick Bowler	1	-7/+20
	On a 32-bit kernel, the upper bits of userspace addresses passed via various ioctls are silently ignored by the nvme driver. However on a 64-bit kernel running a compat task, these upper bits are not ignored and are in fact required to be zero for the ioctls to work. Unfortunately, this difference matters. 32-bit smartctl submits the NVME_IOCTL_ADMIN_CMD ioctl with garbage in these upper bits because it seems the pointer value it puts into the nvme_passthru_cmd structure is sign extended. This works fine on 32-bit kernels but fails on a 64-bit one because (at least on my setup) the addresses smartctl uses are consistently above 2G. For example: # smartctl -x /dev/nvme0n1 smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.5.11] (local build) Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Bad address Since changing 32-bit kernels to actually check all of the submitted address bits now would break existing userspace, this patch fixes the compat problem by explicitly zeroing the upper bits in the compat case. This enables 32-bit smartctl to work on a 64-bit kernel. Signed-off-by: Nick Bowler <nbowler@draconx.ca> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-03-30	staging/octeon: fix up merge error	Randy Dunlap	1	-1/+1
	There's a semantic conflict in the Octeon staging network driver, which used the skb_reset_tc() function to reset skb state when re-using an skb. But that inline helper function was removed in mainline by commit 2c64605b590e ("net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} build"). Fix it by using skb_reset_redirect() instead. Also move it out of the This code path only ends up triggering if REUSE_SKBUFFS_WITHOUT_FREE is enabled, which in turn only happens if you don't have CONFIG_NETFILTER configured. Which was how this wasn't caught by the usual allmodconfig builds. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-30	ACPICA: Update version to 20200214	Bob Moore	1	-1/+1
	ACPICA commit ac0c1b8a43a317702bb11e11fd5067a1c59e3002 Version 20200214 Link: https://github.com/acpica/acpica/commit/ac0c1b8a Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Erik Kaneda <erik.kaneda@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-03-30	media: venus: firmware: Ignore secure call error on first resume	Stanimir Varbanov	1	-2/+8
	With the latest cleanup in qcom scm driver the secure monitor call for setting the remote processor state returns EINVAL when it is called for the first time and after another scm call auth_and_reset. The error returned from scm call could be ignored because the state transition is already done in auth_and_reset. Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org> Cc: stable@vger.kernel.org Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-03-29	seccomp: Add missing compat_ioctl for notify	Sven Schnelle	1	-0/+1
	Executing the seccomp_bpf testsuite under a 64-bit kernel with 32-bit userland (both s390 and x86) doesn't work because there's no compat_ioctl handler defined. Add the handler. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200310123332.42255-1-svens@linux.ibm.com Signed-off-by: Kees Cook <keescook@chromium.org>
2020-03-29	Linux 5.6	Linus Torvalds	1	-1/+1

2020-03-29	unicore32: Replace setup_irq() by request_irq()	afzal mohammed	1	-8/+3
	request_irq() is preferred over setup_irq(). Invocations of setup_irq() occur after memory allocators are ready. setup_irq() was required in older kernels as the memory allocator was not available during early boot. Hence replace setup_irq() by request_irq(). Signed-off-by: afzal mohammed <afzal.mohd.ma@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/82667ae23520611b2a9d8db77e1d8aeb982f08e5.1585320721.git.afzal.mohd.ma@gmail.com
2020-03-29	sh: Replace setup_irq() by request_irq()	afzal mohammed	2	-18/+9
	request_irq() is preferred over setup_irq(). Invocations of setup_irq() occur after memory allocators are ready. setup_irq() was required in older kernels as the memory allocator was not available during early boot. Hence replace setup_irq() by request_irq(). Signed-off-by: afzal mohammed <afzal.mohd.ma@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/b060312689820559121ee0a6456bbc1202fb7ee5.1585320721.git.afzal.mohd.ma@gmail.com
2020-03-29	hexagon: Replace setup_irq() by request_irq()	afzal mohammed	2	-19/+14
	request_irq() is preferred over setup_irq(). Invocations of setup_irq() occur after memory allocators are ready. setup_irq() was required in older kernels as the memory allocator was not available during early boot. Hence replace setup_irq() by request_irq(). Signed-off-by: afzal mohammed <afzal.mohd.ma@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/e84ac60de8f747d49ce082659e51595f708c29d4.1585320721.git.afzal.mohd.ma@gmail.com
2020-03-29	c6x: Replace setup_irq() by request_irq()	afzal mohammed	1	-8/+3
	request_irq() is preferred over setup_irq(). Invocations of setup_irq() occur after memory allocators are ready. setup_irq() was required in older kernels as the memory allocator was not available during early boot. Hence replace setup_irq() by request_irq(). Signed-off-by: afzal mohammed <afzal.mohd.ma@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/56e991e920ce5806771fab892574cba89a3d413f.1585320721.git.afzal.mohd.ma@gmail.com
2020-03-29	alpha: Replace setup_irq() by request_irq()	afzal mohammed	14	-55/+31
	request_irq() is preferred over setup_irq(). Invocations of setup_irq() occur after memory allocators are ready. setup_irq() was required in older kernels as the memory allocator was not available during early boot. Hence replace setup_irq() by request_irq(). Signed-off-by: afzal mohammed <afzal.mohd.ma@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Matt Turner <mattst88@gmail.com> Link: https://lkml.kernel.org/r/51f8ae7da9f47a23596388141933efa2bdef317b.1585320721.git.afzal.mohd.ma@gmail.com
2020-03-29	mm/sparse: fix kernel crash with pfn_section_valid check	Aneesh Kumar K.V	1	-0/+6
	Fix the crash like this: BUG: Kernel NULL pointer dereference on read at 0x00000000 Faulting instruction address: 0xc000000000c3447c Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1 ... NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0 LR [c000000000088354] vmemmap_free+0x144/0x320 Call Trace: section_deactivate+0x220/0x240 __remove_pages+0x118/0x170 arch_remove_memory+0x3c/0x150 memunmap_pages+0x1cc/0x2f0 devm_action_release+0x30/0x50 release_nodes+0x2f8/0x3e0 device_release_driver_internal+0x168/0x270 unbind_store+0x130/0x170 drv_attr_store+0x44/0x60 sysfs_kf_write+0x68/0x80 kernfs_fop_write+0x100/0x290 __vfs_write+0x3c/0x70 vfs_write+0xcc/0x240 ksys_write+0x7c/0x140 system_call+0x5c/0x68 The crash is due to NULL dereference at test_bit(idx, ms->usage->subsection_map); due to ms->usage = NULL in pfn_section_valid() With commit d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM\|!VMEMMAP case") section_mem_map is set to NULL after depopulate_section_mem(). This was done so that pfn_page() can work correctly with kernel config that disables SPARSEMEM_VMEMMAP. With that config pfn_to_page does __section_mem_map_addr(__sec) + __pfn; where static inline struct page __section_mem_map_addr(struct mem_section section) { unsigned long map = section->section_mem_map; map &= SECTION_MAP_MASK; return (struct page )map; } Now with SPASEMEM_VMEMAP enabled, mem_section->usage->subsection_map is used to check the pfn validity (pfn_valid()). Since section_deactivate release mem_section->usage if a section is fully deactivated, pfn_valid() check after a subsection_deactivate cause a kernel crash. static inline int pfn_valid(unsigned long pfn) { ... return early_section(ms) \|\| pfn_section_valid(ms, pfn); } where static inline int pfn_section_valid(struct mem_section ms, unsigned long pfn) { int idx = subsection_map_index(pfn); return test_bit(idx, ms->usage->subsection_map); } Avoid this by clearing SECTION_HAS_MEM_MAP when mem_section->usage is freed. For architectures like ppc64 where large pages are used for vmmemap mapping (16MB), a specific vmemmap mapping can cover multiple sections. Hence before a vmemmap mapping page can be freed, the kernel needs to make sure there are no valid sections within that mapping. Clearing the section valid bit before depopulate_section_memap enables this. [aneesh.kumar@linux.ibm.com: add comment] Link: http://lkml.kernel.org/r/20200326133235.343616-1-aneesh.kumar@linux.ibm.comLink: http://lkml.kernel.org/r/20200325031914.107660-1-aneesh.kumar@linux.ibm.com Fixes: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM\|!VMEMMAP case") Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Reviewed-by: Baoquan He <bhe@redhat.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-29	mm: fork: fix kernel_stack memcg stats for various stack implementations	Roman Gushchin	3	-2/+52
	Depending on CONFIG_VMAP_STACK and the THREAD_SIZE / PAGE_SIZE ratio the space for task stacks can be allocated using __vmalloc_node_range(), alloc_pages_node() and kmem_cache_alloc_node(). In the first and the second cases page->mem_cgroup pointer is set, but in the third it's not: memcg membership of a slab page should be determined using the memcg_from_slab_page() function, which looks at page->slab_cache->memcg_params.memcg . In this case, using mod_memcg_page_state() (as in account_kernel_stack()) is incorrect: page->mem_cgroup pointer is NULL even for pages charged to a non-root memory cgroup. It can lead to kernel_stack per-memcg counters permanently showing 0 on some architectures (depending on the configuration). In order to fix it, let's introduce a mod_memcg_obj_state() helper, which takes a pointer to a kernel object as a first argument, uses mem_cgroup_from_obj() to get a RCU-protected memcg pointer and calls mod_memcg_state(). It allows to handle all possible configurations (CONFIG_VMAP_STACK and various THREAD_SIZE/PAGE_SIZE values) without spilling any memcg/kmem specifics into fork.c . Note: This is a special version of the patch created for stable backports. It contains code from the following two patches: - mm: memcg/slab: introduce mem_cgroup_from_obj() - mm: fork: fix kernel_stack memcg stats for various stack implementations [guro@fb.com: introduce mem_cgroup_from_obj()] Link: http://lkml.kernel.org/r/20200324004221.GA36662@carbon.dhcp.thefacebook.com Fixes: 4d96ba353075 ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages") Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Bharata B Rao <bharata@linux.ibm.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200303233550.251375-1-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-29	hugetlb_cgroup: fix illegal access to memory	Mina Almasry	1	-2/+1
	This appears to be a mistake in commit faced7e0806cf ("mm: hugetlb controller for cgroups v2"). Essentially that commit does a hugetlb_cgroup_from_counter assuming that page_counter_try_charge has initialized counter. But if that has failed then it seems will not initialize counter, so hugetlb_cgroup_from_counter(counter) ends up pointing to random memory, causing kasan to complain. The solution is to simply use 'h_cg', instead of hugetlb_cgroup_from_counter(counter), since that is a reference to the hugetlb_cgroup anyway. After this change kasan ceases to complain. Fixes: faced7e0806cf ("mm: hugetlb controller for cgroups v2") Reported-by: syzbot+cac0c4e204952cf449b1@syzkaller.appspotmail.com Signed-off-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Giuseppe Scrivano <gscrivan@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/20200313223920.124230-1-almasrymina@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-29	drivers/base/memory.c: indicate all memory blocks as removable	David Hildenbrand	1	-20/+3
	We see multiple issues with the implementation/interface to compute whether a memory block can be offlined (exposed via /sys/devices/system/memory/memoryX/removable) and would like to simplify it (remove the implementation). 1. It runs basically lockless. While this might be good for performance, we see possible races with memory offlining that will require at least some sort of locking to fix. 2. Nowadays, more false positives are possible. No arch-specific checks are performed that validate if memory offlining will not be denied right away (and such check will require locking). For example, arm64 won't allow to offline any memory block that was added during boot - which will imply a very high error rate. Other archs have other constraints. 3. The interface is inherently racy. E.g., if a memory block is detected to be removable (and was not a false positive at that time), there is still no guarantee that offlining will actually succeed. So any caller already has to deal with false positives. 4. It is unclear which performance benefit this interface actually provides. The introducing commit 5c755e9fd813 ("memory-hotplug: add sysfs removable attribute for hotplug memory remove") mentioned "A user-level agent must be able to identify which sections of memory are likely to be removable before attempting the potentially expensive operation." However, no actual performance comparison was included. Known users: - lsmem: Will group memory blocks based on the "removable" property. [1] - chmem: Indirect user. It has a RANGE mode where one can specify removable ranges identified via lsmem to be offlined. However, it also has a "SIZE" mode, which allows a sysadmin to skip the manual "identify removable blocks" step. [2] - powerpc-utils: Uses the "removable" attribute to skip some memory blocks right away when trying to find some to offline+remove. However, with ballooning enabled, it already skips this information completely (because it once resulted in many false negatives). Therefore, the implementation can deal with false positives properly already. [3] According to Nathan Fontenot, DLPAR on powerpc is nowadays no longer driven from userspace via the drmgr command (powerpc-utils). Nowadays it's managed in the kernel - including onlining/offlining of memory blocks - triggered by drmgr writing to /sys/kernel/dlpar. So the affected legacy userspace handling is only active on old kernels. Only very old versions of drmgr on a new kernel (unlikely) might execute slower - totally acceptable. With CONFIG_MEMORY_HOTREMOVE, always indicating "removable" should not break any user space tool. We implement a very bad heuristic now. Without CONFIG_MEMORY_HOTREMOVE we cannot offline anything, so report "not removable" as before. Original discussion can be found in [4] ("[PATCH RFC v1] mm: is_mem_section_removable() overhaul"). Other users of is_mem_section_removable() will be removed next, so that we can remove is_mem_section_removable() completely. [1] http://man7.org/linux/man-pages/man1/lsmem.1.html [2] http://man7.org/linux/man-pages/man8/chmem.8.html [3] https://github.com/ibm-power-utilities/powerpc-utils [4] https://lkml.kernel.org/r/20200117105759.27905-1-david@redhat.com Also, this patch probably fixes a crash reported by Steve. http://lkml.kernel.org/r/CAPcyv4jpdaNvJ67SkjyUJLBnBnXXQv686BiVW042g03FUmWLXw@mail.gmail.com Reported-by: "Scargall, Steve" <steve.scargall@intel.com> Suggested-by: Michal Hocko <mhocko@kernel.org> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Nathan Fontenot <ndfont@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Robert Jennings <rcj@linux.vnet.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Karel Zak <kzak@redhat.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200128093542.6908-1-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-29	mm/swapfile.c: move inode_lock out of claim_swapfile	Naohiro Aota	1	-21/+20
	claim_swapfile() currently keeps the inode locked when it is successful, or the file is already swapfile (with -EBUSY). And, on the other error cases, it does not lock the inode. This inconsistency of the lock state and return value is quite confusing and actually causing a bad unlock balance as below in the "bad_swap" section of __do_sys_swapon(). This commit fixes this issue by moving the inode_lock() and IS_SWAPFILE check out of claim_swapfile(). The inode is unlocked in "bad_swap_unlock_inode" section, so that the inode is ensured to be unlocked at "bad_swap". Thus, error handling codes after the locking now jumps to "bad_swap_unlock_inode" instead of "bad_swap". ===================================== WARNING: bad unlock balance detected! 5.5.0-rc7+ #176 Not tainted ------------------------------------- swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at: __do_sys_swapon+0x94b/0x3550 but there are no more locks to release! other info that might help us debug this: no locks held by swapon/4294. stack backtrace: CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176 Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014 Call Trace: dump_stack+0xa1/0xea print_unlock_imbalance_bug.cold+0x114/0x123 lock_release+0x562/0xed0 up_write+0x2d/0x490 __do_sys_swapon+0x94b/0x3550 __x64_sys_swapon+0x54/0x80 do_syscall_64+0xa4/0x4b0 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f15da0a0dc7 Fixes: 1638045c3677 ("mm: set S_SWAPFILE on blockdev swap devices") Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Qais Youef <qais.yousef@arm.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200206090132.154869-1-naohiro.aota@wdc.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-29	block: return NULL in blk_alloc_queue() on error	Chaitanya Kulkarni	1	-1/+1
	This patch fixes follwoing warning: block/blk-core.c: In function ‘blk_alloc_queue’: block/blk-core.c:558:10: warning: returning ‘int’ from a function with return type ‘struct request_queue *’ makes pointer from integer without a cast [-Wint-conversion] return -EINVAL; Fixes: 3d745ea5b095a ("block: simplify queue allocation") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-29	efi/libstub/arm: Fix spurious message that an initrd was loaded	Ard Biesheuvel	1	-1/+1
	Commit: ec93fc371f014a6f ("efi/libstub: Add support for loading the initrd from a device path") added a diagnostic print to the ARM version of the EFI stub that reports whether an initrd has been loaded that was passed via the command line using initrd=. However, it failed to take into account that, for historical reasons, the file loading routines return EFI_SUCCESS when no file was found, and the only way to decide whether a file was loaded is to inspect the 'size' argument that is passed by reference. So let's inspect this returned size, to prevent the print from being emitted even if no initrd was loaded at all. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-efi@vger.kernel.org
2020-03-29	efi/libstub/arm64: Avoid image_base value from efi_loaded_image	Ard Biesheuvel	1	-1/+6
	Commit: 9f9223778ef3 ("efi/libstub/arm: Make efi_entry() an ordinary PE/COFF entrypoint") did some code refactoring to get rid of the EFI entry point assembler code, and in the process, it got rid of the assignment of image_addr to the value of _text. Instead, it switched to using the image_base field of the efi_loaded_image struct provided by UEFI, which should contain the same value. However, Michael reports that this is not the case: older GRUB builds corrupt this value in some way, and since we can easily switch back to referring to _text to discover this value, let's simply do that. While at it, fix another issue in commit 9f9223778ef3, which may result in the unassigned image_addr to be misidentified as the preferred load offset of the kernel, which is unlikely but will cause a boot crash if it does occur. Finally, let's add a warning if the _text vs. image_base discrepancy is detected, so we can tell more easily how widespread this issue actually is. Reported-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-efi@vger.kernel.org
2020-03-29	i3c: convert to use i2c_new_client_device()	Wolfram Sang	1	-1/+1
	Move away from the deprecated API. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Link: https://lore.kernel.org/linux-i3c/20200326211002.13241-2-wsa+renesas@sang-engineering.com
2020-03-28	fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t	Thomas Gleixner	4	-26/+16
	Bit spinlocks are problematic if PREEMPT_RT is enabled, because they disable preemption, which is undesired for latency reasons and breaks when regular spinlocks are taken within the bit_spinlock locked region because regular spinlocks are converted to 'sleeping spinlocks' on RT. PREEMPT_RT replaced the bit spinlocks with regular spinlocks to avoid this problem. The replacement was done conditionaly at compile time, but Christoph requested to do an unconditional conversion. Jan suggested to move the spinlock into a existing padding hole which avoids a size increase of struct buffer_head on production kernels. As a benefit the lock gains lockdep coverage. [ bigeasy: Remove the wrapper and use always spinlock_t and move it into the padding hole ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@infradead.org> Link: https://lkml.kernel.org/r/20191118132824.rclhrbujqh4b4g4d@linutronix.de
2020-03-28	thermal/x86_pkg_temp: Make pkg_temp_lock a raw_spinlock_t	Clark Williams	1	-12/+12
	The pkg_temp_lock spinlock is acquired in the thermal vector handler which is truly atomic context even on PREEMPT_RT kernels. The critical sections are tiny, so change it to a raw spinlock. Signed-off-by: Clark Williams <williams@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191008110021.2j44ayunal7fkb7i@linutronix.de
2020-03-28	Documentation/locking/locktypes: Minor copy editor fixes	Randy Dunlap	1	-11/+11
	Minor editorial fixes: - remove 'enabled' from PREEMPT_RT enabled kernels for consistency - add some periods for consistency - add "'" for possessive CPU's - spell out interrupts [ tglx: Picked up Paul's suggestions ] Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lkml.kernel.org/r/ac615f36-0b44-408d-aeab-d76e4241add4@infradead.org
2020-03-28	Documentation/locking/locktypes: Further clarifications and wordsmithing	Thomas Gleixner	1	-50/+98
	The documentation of rw_semaphores is wrong as it claims that the non-owner reader release is not supported by RT. That's just history biased memory distortion. Split the 'Owner semantics' section up and add separate sections for semaphore and rw_semaphore to reflect reality. Aside of that the following updates are done: - Add pseudo code to document the spinlock state preserving mechanism on PREEMPT_RT - Wordsmith the bitspinlock and lock nesting sections Co-developed-by: Paul McKenney <paulmck@kernel.org> Signed-off-by: Paul McKenney <paulmck@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lkml.kernel.org/r/87wo78y5yy.fsf@nanos.tec.linutronix.de
2020-03-28	m68knommu: Remove mm.h include from uaccess_no.h	Thomas Gleixner	1	-1/+0
	In file included from include/linux/huge_mm.h:8, from include/linux/mm.h:567, from arch/m68k/include/asm/uaccess_no.h:8, from arch/m68k/include/asm/uaccess.h:3, from include/linux/uaccess.h:11, from include/linux/sched/task.h:11, from include/linux/sched/signal.h:9, from include/linux/rcuwait.h:6, from include/linux/percpu-rwsem.h:7, from kernel/locking/percpu-rwsem.c:6: include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore' 1422 \| struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS]; Removing the include of linux/mm.h from the uaccess header solves the problem and various build tests of nommu configurations still work. Fixes: 80fbaf1c3f29 ("rcuwait: Add @state argument to rcuwait_wait_event()") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lkml.kernel.org/r/87fte1qzh0.fsf@nanos.tec.linutronix.de
2020-03-28	cpu/hotplug: Ignore pm_wakeup_pending() for disable_nonboot_cpus()	Thomas Gleixner	2	-5/+11
	A recent change to freeze_secondary_cpus() which added an early abort if a wakeup is pending missed the fact that the function is also invoked for shutdown, reboot and kexec via disable_nonboot_cpus(). In case of disable_nonboot_cpus() the wakeup event needs to be ignored as the purpose is to terminate the currently running kernel. Add a 'suspend' argument which is only set when the freeze is in context of a suspend operation. If not set then an eventually pending wakeup event is ignored. Fixes: a66d955e910a ("cpu/hotplug: Abort disabling secondary CPUs if wakeup is pending") Reported-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Pavankumar Kondeti <pkondeti@codeaurora.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/874kuaxdiz.fsf@nanos.tec.linutronix.de
2020-03-28	Revert "clocksource/drivers/timer-probe: Avoid creating dead devices"	Thomas Gleixner	1	-2/+0
	This reverts commit 4f41fe386a94639cd9a1831298d4f85db5662f1e. The change breaks systems on which the DT node of a device is used by multiple drivers. The proposed workaround to clear OF_POPULATED is just a band aid and this needs to be cleaned up at the root of the problem. Revert this for now. Reported-by: Ionela Voinescu <ionela.voinescu@arm.com> Reported-by: Jon Hunter <jonathanh@nvidia.com> Requested-by: Rob Herring <robh+dt@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Saravana Kannan <saravanak@google.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200324175955.GA16972@arm.com
2020-03-28	MAINTAINERS: erofs: update my email address	Gao Xiang	1	-1/+1
	This email address will not be available in a few days. Update my own email address to xiang@kernel.org, which should be available all the time. Link: https://lore.kernel.org/r/20200328040036.117974-1-gaoxiang25@huawei.com Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
2020-03-28	i2c: pca-platform: Use platform_irq_get_optional	Chris Packham	1	-1/+1
	The interrupt is not required so use platform_irq_get_optional() to avoid error messages like i2c-pca-platform 22080000.i2c: IRQ index 0 not found Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2020-03-28	i2c: st: fix missing struct parameter description	Alain Volmat	1	-0/+1
	Fix a missing struct parameter description to allow warning free W=1 compilation. Signed-off-by: Alain Volmat <avolmat@me.com> Reviewed-by: Patrice Chotard <patrice.chotard@st.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2020-03-27	x86: get rid of user_atomic_cmpxchg_inatomic()	Al Viro	2	-94/+19
	Only one user left; the thing had been made polymorphic back in 2013 for the sake of MPX. No point keeping it now that MPX is gone. Convert futex_atomic_cmpxchg_inatomic() to user_access_{begin,end}() while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-27	generic arch_futex_atomic_op_inuser() doesn't need access_ok()	Al Viro	1	-2/+0
	uses get_user() and put_user() for memory accesses Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-27	x86: don't reload after cmpxchg in unsafe_atomic_op2() loop	Al Viro	1	-8/+8
	lock cmpxchg leaves the current value in eax; no need to reload it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-27	x86: convert arch_futex_atomic_op_inuser() to user_access_begin/user_access_end()	Al Viro	1	-26/+36
	Lift stac/clac pairs from __futex_atomic_op{1,2} into arch_futex_atomic_op_inuser(), fold them with access_ok() in there. The switch in arch_futex_atomic_op_inuser() is what has required the previous (objtool) commit... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-27	objtool: whitelist __sanitizer_cov_trace_switch()	Al Viro	1	-0/+1
	it's not really different from e.g. __sanitizer_cov_trace_cmp4(); as it is, the switches that generate an array of labels get rejected by objtool, while slightly different set of cases that gets compiled into a series of comparisons is accepted. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-27	[parisc, s390, sparc64] no need for access_ok() in futex handling	Al Viro	3	-7/+0
	access_ok() is always true on those Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-27	sh: no need of access_ok() in arch_futex_atomic_op_inuser()	Al Viro	1	-3/+0
	everything it uses is doing access_ok() already Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-27	futex: arch_futex_atomic_op_inuser() calling conventions change	Al Viro	20	-57/+43
	Move access_ok() in and pagefault_enable()/pagefault_disable() out. Mechanical conversion only - some instances don't really need a separate access_ok() at all (e.g. the ones only using get_user()/put_user(), or architectures where access_ok() is always true); we'll deal with that in followups. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-27	r8169: fix PHY driver check on platforms w/o module softdeps	Heiner Kallweit	1	-9/+7
	On Android/x86 the module loading infrastructure can't deal with softdeps. Therefore the check for presence of the Realtek PHY driver module fails. mdiobus_register() will try to load the PHY driver module, therefore move the check to after this call and explicitly check that a dedicated PHY driver is bound to the PHY device. Fixes: f32593773549 ("r8169: check that Realtek PHY driver module is loaded") Reported-by: Chih-Wei Huang <cwhuang@android-x86.org> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-27	null_blk: add trace in null_blk_zoned.c	Chaitanya Kulkarni	1	-1/+11
	With the help of previously added tracepoints we can now trace report-zones, zone-write and zone-mgmt ops in null_blk_zoned.c. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-27	null_blk: add tracepoint helpers for zoned mode	Chaitanya Kulkarni	3	-0/+106
	This patch adds two new tracpoints for null_blk_zoned.c that allows us to trace report-zones, zone-mgmt-op and zone-write operations which has direct effect on the zone condition state machine. Also, we update drivers/block/Makefile so that new null_blk related tracefiles can be compiled. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>