|
Let's clean up empty page tables. Consider only page tables that fully
fall into the identity mapping and the vmemmap range.
As there are no valid accesses to vmem/vmemmap within non-populated ranges,
the single tlb flush at the end should be sufficient.
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20200722094558.9828-7-david@redhat.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Let's synchronize all accesses to the 1:1 and vmemmap mappings. This will
be especially relevant when we want to clean up empty page tables that could
be shared by both. Avoid races when removing tables that might be just
about to get reused.
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20200722094558.9828-6-david@redhat.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Clean up what we partially added in case vmemmap_populate() fails. For
vmem, this is already handled by vmem_add_mapping().
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20200722094558.9828-5-david@redhat.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Extend our shiny new modify_pagetable() to handle !direct (vmemmap)
mappings. Convert vmemmap_populate() and implement vmemmap_free().
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20200722094558.9828-4-david@redhat.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
We want to have only a single pagetable walker and reuse the same
functionality for vmemmap handling. Let's start by consolidating
vmem_add_range() and vmem_remove_range(), converting it into a
recursive implementation.
A recursive implementation makes it easier to expand individual cases
without harming readability. In addition, we minimize traversing the
whole hierarchy over and over again.
One change is that we don't unmap large PMDs/PUDs when they are not
completely covered by the request, something that should never happen
with direct mappings, unless one would be removing at a different
granularity than was added, which would already be broken.
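The shape of such a walker, as a toy user-space sketch (assumed two levels
and made-up table geometry, not the actual s390 code): one recursive routine
serves both add and remove, and each level only descends for the part of the
range its entry covers.

  #include <stdbool.h>
  #include <stdio.h>

  #define PAGE_SHIFT     12
  #define PTRS_PER_TABLE 4
  #define LEVELS         2

  /* size of the range covered by one entry at a given table level */
  static unsigned long entry_size(int level)
  {
          unsigned long size = 1UL << PAGE_SHIFT;

          while (level--)
                  size *= PTRS_PER_TABLE;
          return size;
  }

  static void modify_range(unsigned long start, unsigned long end,
                           int level, bool add)
  {
          unsigned long step = entry_size(level);
          unsigned long addr, next;

          for (addr = start & ~(step - 1); addr < end; addr = next) {
                  next = (addr + step < end) ? addr + step : end;
                  if (level == 0) {
                          printf("%s page at 0x%lx\n", add ? "map" : "unmap", addr);
                          continue;
                  }
                  /* descend only into the overlapping part of this entry */
                  modify_range(addr > start ? addr : start, next, level - 1, add);
          }
  }

  int main(void)
  {
          modify_range(0x1000, 0x9000, LEVELS - 1, true);  /* "vmem_add_range"    */
          modify_range(0x1000, 0x9000, LEVELS - 1, false); /* "vmem_remove_range" */
          return 0;
  }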
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20200722094558.9828-3-david@redhat.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Let's match the name to vmem_remove_range().
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20200722094558.9828-2-david@redhat.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
This kernel feature is required for enabling BPF_KPROBE_OVERRIDE.
Define override_function_with_return() and regs_set_return_value()
functions, and fix compile errors in syscall_wrapper.h.
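The two helpers most likely boil down to something like the following sketch,
based on the s390 calling convention (%r2 carries the function return value,
%r14 the return address); the actual implementation may differ in detail:

  static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc)
  {
          regs->gprs[2] = rc;                     /* %r2 holds the return value */
  }

  void override_function_with_return(struct pt_regs *regs)
  {
          regs->psw.addr = regs->gprs[14];        /* emulate 'br %r14' */
  }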
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The existing comment talked about reading in the write part
and vice versa. While we are at it, make it clearer why restricting
the syscalls to MIO-capable devices is okay.
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
It doesn't make sense to add zero shifted by 15. It's still zero.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The value returned by read_tod_clock() will overflow on September 17th 2042.
To avoid that system time jumps back, select CLOCKSOURCE_VALIDATE_LAST_CYCLE,
which enables a sanity check that prevents negative "delta" values.
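For context, the arithmetic behind the date: bit 51 of the 64-bit TOD clock
corresponds to 1 microsecond, so the register wraps after 2^52 microseconds,
i.e. about 4.5 * 10^9 seconds or roughly 142.7 years; counted from the TOD
epoch of 1900-01-01 that lands in September 2042.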
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Make use of CLOCKSOURCE_MASK instead of open-coding it.
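Typical usage in a clocksource definition (illustrative sketch, not
necessarily the exact hunk touched here):

  static struct clocksource clocksource_tod = {
          .name = "tod",
          /* ... */
          .mask = CLOCKSOURCE_MASK(64),   /* all 64 TOD bits are valid, instead of
                                             an open-coded 0xffffffffffffffffULL */
  };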
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
This is an s390 port of x86 commit 3dec541b2e63 ("bpf: Add support for BTF
pointers to x86 JIT").
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
This is an s390 port of commit 548acf19234d ("x86/mm: Expand the
exception table logic to allow new handling options"), which is needed
for implementing BPF_PROBE_MEM on s390.
The new handler field is made 64-bit in order to allow pointing from
dynamically allocated entries to handlers in kernel text. Unlike on x86,
NULL is used instead of ex_handler_default. This is because exception
tables are used by boot/text_dma.S, and it would be a pain to preserve
ex_handler_default.
The new infrastructure is ignored in early_pgm_check_handler, since
there is no pt_regs.
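Rough shape of the extended entry as described above (illustrative only; the
real s390 field names and layout may differ):

  struct exception_table_entry {
          int insn;       /* relative offset of the faulting instruction */
          int fixup;      /* relative offset of the fixup code */
          long handler;   /* 64-bit: can point from dynamically allocated (BPF)
                             entries to handlers in kernel text; NULL means
                             "apply the default fixup" */
  };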
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Replace three implementations with one, using the __stringify_in_c
macro conveniently "borrowed" from powerpc and microblaze.
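The borrowed macros essentially turn their arguments into a string usable
inside inline assembly (as on powerpc's asm-const.h; the constant below is
hypothetical and shown only for illustration):

  #ifdef __ASSEMBLY__
  #  define stringify_in_c(...)   __VA_ARGS__
  #else
  #  define __stringify_in_c(...) #__VA_ARGS__
  #  define stringify_in_c(...)   __stringify_in_c(__VA_ARGS__) " "
  #endif

  /* paste a C-visible constant into an inline-assembly string: */
  #define MY_MAGIC 0x42
  asm volatile(".long " stringify_in_c(MY_MAGIC));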
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Get rid of FORCE_MAX_ZONEORDER which limited allocations to order 8 (= 1MB)
and use the default, which allows for order 10 (= 4MB) allocations.
Given that s390 allowed less than the default, this caused memory
allocation problems that were more or less unique to s390 from time to time.
Note: this limit was originally introduced with commit 684de39bd795 ("[S390]
Fix IPL from NSS.") in order to support Named Saved Segments, which
could start/end at an arbitrary 1 megabyte boundary, and also before
support for sparsemem vmemmap was enabled.
Since NSS support is gone and sparsemem vmemmap support is available,
this limitation can go away.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Trimming to MAX_ORDER was originally done in order to avoid setting
HOLES_IN_ZONE, which in turn would enable a quite expensive
pfn_valid() check. pfn_valid() however only checks if a struct page
exists for a given pfn.
With sparsemem vmemmap there are always struct pages, since memmaps
are allocated for whole sections. Therefore remove the HOLES_IN_ZONE
comment and the trimming.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
For non-thinint devices in LPAR, qdio polls an idle Input Queue for a
little while to catch more work. But platform support for thinints has
been around practically _forever_ by now, so this micro-optimization is
seeing 0 actual use. Remove it to reduce the overall complexity of the
hot path.
In the meantime we also grew support for driver-level polling
(e.g. NAPI in qeth), so it's quite questionable how useful this would
actually be on current kernels.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The comment is inaccurate: qdio_inbound_q_moved() and/or its callers no
longer get confused by a count of 128 completed SBALs.
Scanning all 128 SBALs at once can improve IRQ reduction (as we now
place the ACK at the right spot), and reduce the amount of processing
needed to handle all completed SBALs.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Old code would only scan up to 127 SBALs at once. So the last statistics
bucket was set aside to count "discovered 127 SBALs with new work"
events.
But nowadays we allow scanning all 128 SBALs for Output Queues, and a
subsequent patch will introduce the same for Input Queues.
So fix up the accounting to use the last bucket only when all 128 SBALs
have been discovered with new work.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Helpful for debugging.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
With the removal of the critical section cleanup, we now enter the svc
interrupt handler with interrupts disabled.
Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S")
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Rework of the QCI crypto info and how it is used.
This is only an internal rework and does not affect the
way the ap bus acts with ap card and queue devices and
handles domains.
Tested on z15, z14, z12 (QCI support) and z196 (no QCI support).
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Commit 50be63450728 ("s390/mm: Convert bootmem to memblock") mentions
"The original bootmem allocator is getting replaced by memblock. To
cover the needs of the s390 kdump implementation the physical
memory list is used."
As we can now reference "physmem" managed in the memblock allocator after
init even without ARCH_KEEP_MEMBLOCK, and s390x no longer needs other
memblock metadata after boot (especially since the zcore memmap device that
used it got removed), we can stop setting ARCH_KEEP_MEMBLOCK.
With this change, we no longer create memblocks for standby/hotplugged
memory (added via add_memory()) and free up memblock metadata (except
physmem) after boot.
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Philipp Rudo <prudo@linux.ibm.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20200701141830.18749-3-david@redhat.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
"physmem" in the memblock allocator is somewhat weird: it's not actually
used for allocation, it's simply information collected during boot, which
describes the unmodified physical memory map at boot time, without any
standby/hotplugged memory. It's only used on s390 and is currently the
only reason s390 keeps using CONFIG_ARCH_KEEP_MEMBLOCK.
Physmem isn't numa aware and current users don't specify any flags. Let's
hide it from the user, exposing only for_each_physmem(), and simplify. The
interface for physmem is now really minimalistic:
- memblock_physmem_add() to add ranges
- for_each_physmem() / __next_physmem_range() to walk physmem ranges
Don't place it into an __init section and don't discard it without
CONFIG_ARCH_KEEP_MEMBLOCK. As we're reusing __next_mem_range(), remove
the __meminit notifier to avoid section mismatch warnings once
CONFIG_ARCH_KEEP_MEMBLOCK is no longer used with
CONFIG_HAVE_MEMBLOCK_PHYS_MAP.
While fixing up the documentation, sneak in some related cleanups. We can
stop setting CONFIG_ARCH_KEEP_MEMBLOCK for s390 next.
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Message-Id: <20200701141830.18749-2-david@redhat.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
This patch introduces the sysfs attributes serialnr and
mkvps for cex2c and cex3c cards. These sysfs attributes
are available for cex4c and higher since
commit 7c4e91c0959b ("s390/zcrypt: new sysfs attributes serialnr and mkvps"),
and this patch now provides the same for the older cex2
and cex3 cards.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
There is a state machine held for each ap queue device.
The states and functions related to it were sometimes
marked with _sm_, sometimes not. This patch clarifies
and renames all the ap queue state machine related functions,
enums and defines to have _sm_ in the name.
There is no functional change coming with this patch - it's
only beautifying code.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
The zcrypt_unlocked_ioctl() function has become large. So split the
4 ioctls ICARSAMODEXPO, ICARSACRT, ZSECSENDCPRB and ZSENDEP11CPRB out
into new static functions. This makes the code more readable and
is a preparation step for further improvements needed on these ioctls.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
Some beautifications related to the internally used struct ap_message
and related code. Instead of one int carrying only the special flag,
a u32 flags field is now used.
In struct CPRBX the pointers to additional data are now marked
with __user. This required some changes to code where
these structs are also used within the zcrypt misc functions.
The ica_rsa_* structs now use the generic types __u8, __u32, ...
instead of char and unsigned int.
zcrypt_msg6 and zcrypt_msg50 use min_t() instead of min().
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
Fix these smatch warnings:
zcrypt_api.c:986 _zcrypt_send_ep11_cprb() error: uninitialized symbol 'pref_weight'.
zcrypt_api.c:1008 _zcrypt_send_ep11_cprb() error: uninitialized symbol 'weight'.
zcrypt_api.c:676 zcrypt_rsa_modexpo() error: uninitialized symbol 'pref_weight'.
zcrypt_api.c:694 zcrypt_rsa_modexpo() error: uninitialized symbol 'weight'.
zcrypt_api.c:760 zcrypt_rsa_crt() error: uninitialized symbol 'pref_weight'.
zcrypt_api.c:778 zcrypt_rsa_crt() error: uninitialized symbol 'weight'.
zcrypt_api.c:824 _zcrypt_send_cprb() warn: always true condition '(tdom >= 0) => (0-u16max >= 0)'
zcrypt_api.c:846 _zcrypt_send_cprb() error: uninitialized symbol 'pref_weight'.
zcrypt_api.c:867 _zcrypt_send_cprb() error: uninitialized symbol 'weight'.
zcrypt_api.c:1065 zcrypt_rng() error: uninitialized symbol 'pref_weight'.
zcrypt_api.c:1079 zcrypt_rng() error: uninitialized symbol 'weight'.
zcrypt_cex4.c:251 ep11_card_op_modes_show() warn: should '(1 << ep11_op_modes[i]->mode_bit)' be a 64 bit type?
zcrypt_cex4.c:346 ep11_queue_op_modes_show() warn: should '(1 << ep11_op_modes[i]->mode_bit)' be a 64 bit type?
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
Fix smatch warnings:
pkey_api.c:1606 pkey_ccacipher_aes_attr_read() warn: inconsistent indenting
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
segment_load() will no longer return -ENOSPC. If a segment overlaps with
storage, we now also return -EBUSY. Remove the stale comment from
__segment_load() and the stale handling from segment_warning().
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Suggested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20200630084240.8283-1-david@redhat.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
Saves us a couple of bytes.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
In an effort to enable -Wcast-function-type in the top-level Makefile to
support Control Flow Integrity builds, remove all the function callback
casts.
To do this, modify the function prototypes accordingly.
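A generic before/after illustration of the pattern (hypothetical names, not
the s390 code touched here):

  #include <stdio.h>

  typedef void (*cb_t)(void *data);

  static void run_callback(cb_t cb, void *data) { cb(data); }

  /* Before: print_int() took an int *, so registering it needed a cast:
   *   run_callback((cb_t)print_int, &x);   // -Wcast-function-type, breaks CFI
   * After: the prototype itself matches cb_t, so no cast is required.
   */
  static void print_int(void *data)
  {
          printf("%d\n", *(int *)data);
  }

  int main(void)
  {
          int x = 42;

          run_callback(print_int, &x);    /* types line up, CFI-friendly */
          return 0;
  }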
Signed-off-by: Oscar Carter <oscar.carter@gmx.com>
Message-Id: <20200627125417.18887-1-oscar.carter@gmx.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
[heiko.carstens@de.ibm.com: coding style changes]
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
I can't come up with a satisfying reason why we still need the memory
segment list. We used to represent in the list:
- boot memory
- standby memory added via add_memory()
- loaded dcss segments
When loading/unloading dcss segments, we already track them in a
separate list and check for overlaps
(arch/s390/mm/extmem.c:segment_overlaps_others()) when loading segments.
The overlap check was introduced for some segments in
commit b2300b9efe1b ("[S390] dcssblk: add >2G DCSSs support and stacked
contiguous DCSSs support.")
and was extended to cover all dcss segments in
commit ca57114609d1 ("s390/extmem: remove code for 31 bit addressing
mode").
Although I doubt that overlaps with boot memory and standby memory
are relevant, let's reshuffle the checks in load_segment() to request
the resource first. This will bail out in case we have overlaps with
other resources (esp. boot memory and standby memory). The order
is now different compared to segment_unload(), but
that should not matter.
This smells like a leftover from ancient times; let's get rid of it. We
can now convert vmem_remove_mapping() into a void function - everybody
ignored the return value already.
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20200625150029.45019-1-david@redhat.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [DCSS]
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
There are no secrets in these files, so allow all users
to read them.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes.
This code was detected with the help of Coccinelle, and audited and
fixed manually.
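A hypothetical before/after of such a conversion (not the code touched by
this patch):

  struct report {
          u16 count;
          struct item entries[];  /* flexible array member */
  };

  /* open-coded, easy to get the element type wrong: */
  r = kzalloc(sizeof(*r) + count * sizeof(struct item), GFP_KERNEL);

  /* with the helper from <linux/overflow.h>, which also guards against overflow: */
  r = kzalloc(struct_size(r, entries, count), GFP_KERNEL);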
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Message-Id: <20200617212930.GA11728@embeddedor>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
There is no interface to userspace which exposes anything that would
require the struct __debug_entry definition. Therefore remove it from
uapi. This allows changing the definition, since it is only used
kernel internally.
The only exception is the crash utility, however that tool must handle
changes all the time anyway.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
There is not a single user of the debug raw view. Therefore remove it
before anybody uses it. If anybody were to make use of the view, it would
expose the struct __debug_entry definition to userspace and really
would make it uapi. This wouldn't be good, since the definition is
suboptimal and needs to be changed.
Right now the structure definition is only defined to be uapi, however
there is no user.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Acked-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
Remove unused /sys/kernel/debug/zcore/memmap device.
Since at least version 1.24.0 of s390-tools zfcpdump no longer
needs it and reads /proc/vmcore instead.
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
Instead of using the old 'jiffies + HZ {/,*} something' calculation,
use msecs_to_jiffies(), as that makes the code more readable.
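For example (hypothetical timer, not the specific call sites changed here):

  mod_timer(&my_timer, jiffies + HZ / 10);                /* old: 100ms, only obvious if you know HZ */
  mod_timer(&my_timer, jiffies + msecs_to_jiffies(100));  /* new: the interval is explicit */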
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
|
|
Some performance regressions on the reaim benchmark have been reported with
commit 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group")
The problem comes from the init value of runnable_avg, which is initialized
with the max value. This can be a problem if the newly forked task turns out
to be a short task, because the group of CPUs is wrongly classified as
overloaded and tasks are pulled less aggressively.
Set the initial value of runnable_avg equal to util_avg to reflect that there
is no waiting time so far.
Fixes: 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200624154422.29166-1-vincent.guittot@linaro.org
|
|
Instead of relying on BUG_ON() to ensure the various data structures
line up, use a bunch of horrible unions to make it all automatic.
Much of the union magic is to ensure irq_work and smp_call_function do
not (yet) see the members of their respective data structures change
name.
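A generic illustration of the trick (hypothetical structs, not the actual
irq_work/call_single_data definitions):

  #include <stddef.h>

  /* Two legacy structs keep their member names... */
  struct legacy_irq_work { void (*func)(void *); void *arg;  };
  struct legacy_csd      { void (*fn)(void *);   void *info; };

  /* ...while a union guarantees they overlay the same storage: */
  union shared_entry_demo {
          struct legacy_irq_work irq;     /* irq_work users still say ->func / ->arg */
          struct legacy_csd      csd;     /* smp_call users still say ->fn / ->info  */
  };

  /* The "lines up" requirement becomes a compile-time property instead of a BUG_ON(): */
  _Static_assert(offsetof(union shared_entry_demo, irq.func) ==
                 offsetof(union shared_entry_demo, csd.fn),
                 "function pointer members must overlay");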
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lkml.kernel.org/r/20200622100825.844455025@infradead.org
|
|
Use a better name for this poorly named flag, to avoid confusion...
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lkml.kernel.org/r/20200622100825.785115830@infradead.org
|
|
Paul reported rcutorture occasionally hitting a NULL deref:
sched_ttwu_pending()
  ttwu_do_wakeup()
    check_preempt_curr() := check_preempt_wakeup()
      find_matching_se()
        is_same_group()
          if (se->cfs_rq == pse->cfs_rq) <-- *BOOM*
Debugging showed that this only appears to happen when we take the new
code-path from commit:
2ebb17717550 ("sched/core: Offload wakee task activation if it the wakee is descheduling")
and only when @cpu == smp_processor_id(). Something which should not
be possible, because p->on_cpu can only be true for remote tasks.
Similarly, without the new code-path from commit:
c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
this would've unconditionally hit:
smp_cond_load_acquire(&p->on_cpu, !VAL);
and if: 'cpu == smp_processor_id() && p->on_cpu' is possible, this
would result in an instant live-lock (with IRQs disabled), something
that hasn't been reported.
The NULL deref can be explained however if the task_cpu(p) load at the
beginning of try_to_wake_up() returns an old value, and this old value
happens to be smp_processor_id(). Further assume that the p->on_cpu
load accurately returns 1, it really is still running, just not here.
Then, when we enqueue the task locally, we can crash in exactly the
observed manner because p->se.cfs_rq != rq->cfs_rq, because p's cfs_rq
is from the wrong CPU, therefore we'll iterate into the non-existent
parents and NULL deref.
The closest semi-plausible scenario I've managed to contrive is
somewhat elaborate (then again, actual reproduction takes many CPU
hours of rcutorture, so it can't be anything obvious):
X->cpu = 1
rq(1)->curr = X

CPU1:                   // switch away from X
      LOCK rq(1)->lock
      smp_mb__after_spinlock
      dequeue_task(X)
      X->on_rq = 9
      switch_to(Z)
      X->on_cpu = 0
      UNLOCK rq(1)->lock

CPU2:                   // migrate X to cpu 0
      LOCK rq(1)->lock
      dequeue_task(X)
      set_task_cpu(X, 0)
      X->cpu = 0
      UNLOCK rq(1)->lock

      LOCK rq(0)->lock
      enqueue_task(X)
      X->on_rq = 1
      UNLOCK rq(0)->lock

CPU0:                   // switch to X
      LOCK rq(0)->lock
      smp_mb__after_spinlock
      switch_to(X)
      X->on_cpu = 1
      UNLOCK rq(0)->lock

CPU0:                   // X goes sleep
      X->state = TASK_UNINTERRUPTIBLE
      smp_mb();

CPU1:                   // wake X
      ttwu()
      LOCK X->pi_lock
      smp_mb__after_spinlock
      if (p->state)
        cpu = X->cpu; // =? 1
      smp_rmb()

CPU0:                   // X calls schedule()
      LOCK rq(0)->lock
      smp_mb__after_spinlock
      dequeue_task(X)
      X->on_rq = 0

CPU1:
      if (p->on_rq)
        smp_rmb();
      if (p->on_cpu && ttwu_queue_wakelist(..)) [*]
        smp_cond_load_acquire(&p->on_cpu, !VAL)
      cpu = select_task_rq(X, X->wake_cpu, ...)
      if (X->cpu != cpu)

CPU0:
      switch_to(Y)
      X->on_cpu = 0
      UNLOCK rq(0)->lock
However I'm having trouble convincing myself that's actually possible
on x86_64 -- after all, every LOCK implies an smp_mb() there, so if ttwu
observes ->state != RUNNING, it must also observe ->cpu != 1.
(Most of the previous ttwu() races were found on very large PowerPC)
Nevertheless, this fully explains the observed failure case.
Fix it by ordering the task_cpu(p) load after the p->on_cpu load,
which is easy since nothing actually uses @cpu before this.
Fixes: c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200622125649.GC576871@hirez.programming.kicks-ass.net
|
|
syzbot reported the following warning:
WARNING: CPU: 1 PID: 6351 at kernel/sched/deadline.c:628
enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
At deadline.c:628 we have:
623 static inline void setup_new_dl_entity(struct sched_dl_entity *dl_se)
624 {
625         struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
626         struct rq *rq = rq_of_dl_rq(dl_rq);
627
628         WARN_ON(dl_se->dl_boosted);
629         WARN_ON(dl_time_before(rq_clock(rq), dl_se->deadline));
    [...]
    }
Which means that setup_new_dl_entity() has been called on a task
currently boosted. This shouldn't happen though, as setup_new_dl_entity()
is only called when the 'dynamic' deadline of the new entity
is in the past w.r.t. rq_clock, and boosted tasks shouldn't satisfy this
condition.
Digging through the PI code I noticed that the above might in fact happen
if an RT task blocks on an rt_mutex held by a DEADLINE task. In the
first branch of the boosting conditions we only check whether the pi_task's
'dynamic' deadline is earlier than the mutex holder's, and in this case we
set the mutex holder to be dl_boosted. However, since RT 'dynamic' deadlines
are only initialized if such tasks get boosted at some point (or if they
become DEADLINE of course), in general RT 'dynamic' deadlines are usually
equal to 0 and this satisfies the aforementioned condition.
Fix it by checking that the potential donor task is actually (even if
temporarily, because boosted in turn) running at DEADLINE priority before
using its 'dynamic' deadline value.
Fixes: 2d3d891d3344 ("sched/deadline: Add SCHED_DEADLINE inheritance logic")
Reported-by: syzbot+119ba87189432ead09b4@syzkaller.appspotmail.com
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Tested-by: Daniel Wagner <dwagner@suse.de>
Link: https://lkml.kernel.org/r/20181119153201.GB2119@localhost.localdomain
|