| Age | Commit message (Collapse) | Author | Files | Lines |
|
This reverts commit 55a5542a5462 ("s390/hibernate: fix error handling when
suspend cpu != resume cpu"). It added sclp_early_printk_force() which
is no longer used since commit 394216275c7d ("s390: remove broken
hibernate / power management support"). No hibernate - no problem.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Since commit 980d5f9ab36b ("s390/boot: enable .bss section for compressed
kernel") .bss section usage is no longer restricted. .bss section is a
part of the decompressor's image and is zeroed by the linker. For that
reason clean up now unneeded .data section usage.
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
.bss section is a part of the decompressor's image now, linker fills it
with zeros already. No need to do it with memset additionally.
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
The spinlock ap_poll_timer_lock is initialized statically. It is
unnecessary to initialize by spin_lock_init().
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code.
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
This patch reworks the zcrypt device driver so that the set_fs()
invocation is not needed any more. Instead there is a new flag bool
userspace passed through all the functions which tells if the pointer
arguments are userspace or kernelspace. Together with the two new
inline functions z_copy_from_user() and z_copy_to_user() which either
invoke copy_from_user (userspace is true) or memcpy (userspace is
false) the zcrypt dd and the AP bus now have no requirement for
the set_fs() functionality any more.
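The helper pair described above can be sketched in user space roughly as follows. The kernel's copy_from_user() and copy_to_user() are not available outside the kernel, so fake_copy_from_user() and fake_copy_to_user() are hypothetical stand-ins, and the argument order of the z_copy_*() helpers is an assumption. The return convention (number of bytes not copied, 0 on success) mirrors the kernel helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical stand-ins for the kernel's copy_from_user()/copy_to_user().
 * Like the real ones they return the number of bytes NOT copied.
 */
unsigned long fake_copy_from_user(void *to, const void *from, unsigned long n)
{
	memcpy(to, from, n);
	return 0;
}

unsigned long fake_copy_to_user(void *to, const void *from, unsigned long n)
{
	memcpy(to, from, n);
	return 0;
}

/* Pick the copy routine based on where the source pointer lives. */
unsigned long z_copy_from_user(bool userspace, void *to,
			       const void *from, unsigned long n)
{
	if (userspace)
		return fake_copy_from_user(to, from, n);
	memcpy(to, from, n);	/* kernel-space source: plain memcpy */
	return 0;
}

/* Same idea for the destination pointer. */
unsigned long z_copy_to_user(bool userspace, void *to,
			     const void *from, unsigned long n)
{
	if (userspace)
		return fake_copy_to_user(to, from, n);
	memcpy(to, from, n);
	return 0;
}
```

Because the flag travels with every call, a caller never has to switch the address limit globally just to reuse a code path written for user-space pointers.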
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Currently the kernel crashes in Kasan instrumentation code if
CONFIG_KASAN_S390_4_LEVEL_PAGING is used on protected virtualization
capable machine where the ultravisor imposes addressing limitations on
the host and those limitations are lower than KASAN_SHADOW_OFFSET.
The problem is that Kasan has to know in advance where vmalloc/modules
areas would be. With protected virtualization enabled vmalloc/modules
areas are moved down to the ultravisor secure storage limit while kasan
still expects them at the very end of 4-level paging address space.
To fix that make Kasan recognize when protected virtualization is enabled
and predefine vmalloc/modules areas position which are compliant with
ultravisor secure storage limit.
Kasan shadow itself stays in place and might reside above that ultravisor
secure storage limit.
One slight difference compared to a kernel without Kasan enabled is that
vmalloc/modules areas position is not reverted to default if ultravisor
initialization fails. It would still be below the ultravisor secure
storage limit.
Kernel layout with kasan, 4-level paging and protected virtualization
enabled (ultravisor secure storage limit is at 0x0000800000000000):
---[ vmemmap Area Start ]---
0x0000400000000000-0x0000400080000000
---[ vmemmap Area End ]---
---[ vmalloc Area Start ]---
0x00007fe000000000-0x00007fff80000000
---[ vmalloc Area End ]---
---[ Modules Area Start ]---
0x00007fff80000000-0x0000800000000000
---[ Modules Area End ]---
---[ Kasan Shadow Start ]---
0x0018000000000000-0x001c000000000000
---[ Kasan Shadow End ]---
0x001c000000000000-0x0020000000000000 1P PGD I
Kernel layout with kasan, 4-level paging and protected virtualization
disabled/unsupported:
---[ vmemmap Area Start ]---
0x0000400000000000-0x0000400060000000
---[ vmemmap Area End ]---
---[ Kasan Shadow Start ]---
0x0018000000000000-0x001c000000000000
---[ Kasan Shadow End ]---
---[ vmalloc Area Start ]---
0x001fffe000000000-0x001fffff80000000
---[ vmalloc Area End ]---
---[ Modules Area Start ]---
0x001fffff80000000-0x0020000000000000
---[ Modules Area End ]---
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Avoid potential crash due to lack of secure storage limit. Check that
max_sec_stor_addr is not 0 before adjusting vmalloc position.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
To make early kernel address space layout definition possible parse
prot_virt option in the decompressor and pass it to the uncompressed
kernel. This enables kasan to take ultravisor secure storage limit into
consideration and pre-define vmalloc position correctly.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Currently vmemmap area is unconditionally moved beyond Kasan shadow
memory. When Kasan is not enabled vmemmap area position is calculated
in setup_memory_end() and depends on limiting factors like ultravisor
secure storage limit. Try to follow the same logic with Kasan enabled
as well and avoid unnecessary vmemmap area position changes unless it
really intersects with Kasan shadow.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Kasan configuration options and size of physical memory present could
affect kernel memory layout. In particular vmemmap, vmalloc and modules
might come before kasan shadow or after it. To make ptdump correctly
output markers in the right order markers have to be sorted.
To preserve the original order of markers with the same start address
avoid using sort() from lib/sort.c (which is not a stable sorting algorithm)
and sort markers in place.
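As an illustration of a stable in-place sort, here is a minimal insertion sort over a marker array. The struct layout and the function name are assumptions made for this sketch, and insertion sort is just one simple stable choice, not necessarily the algorithm the patch uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical marker layout: a start address plus a label. */
struct addr_marker {
	unsigned long start_address;
	const char *name;
};

/*
 * Insertion sort. It is stable because an element is only moved past
 * entries whose key is strictly greater, so markers that share a start
 * address keep their original relative order.
 */
void sort_address_markers(struct addr_marker *m, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		struct addr_marker tmp = m[i];
		size_t j = i;

		while (j > 0 && m[j - 1].start_address > tmp.start_address) {
			m[j] = m[j - 1];
			j--;
		}
		m[j] = tmp;
	}
}
```

Stability matters here because an "End" marker of one area and the "Start" marker of the next area often share the same address and must still be printed in that order.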
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
This fixes a missing prototype compiler warning spotted by the kernel
test robot.
Fixes: abb95b7550f8 ("s390/pci: consolidate SR-IOV specific code")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Use ifdefs instead of IS_ENABLED() to avoid compile error
for !PTDUMP_DEBUGFS:
arch/s390/mm/dump_pagetables.c: In function ‘pt_dump_init’:
arch/s390/mm/dump_pagetables.c:248:64: error: ‘ptdump_fops’ undeclared (first use in this function); did you mean ‘pidfd_fops’?
debugfs_create_file("kernel_page_tables", 0400, NULL, NULL, &ptdump_fops);
Reported-by: Julian Wiedmann <jwi@linux.ibm.com>
Fixes: 08c8e685c7c9 ("s390: add ARCH_HAS_DEBUG_WX support")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
- Support static uninitialized variables in compressed kernel.
- Remove chkbss script
- Get rid of workarounds for not having .bss section
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
We don't need to export pages if we destroy the VM configuration
afterwards anyway. Instead we can destroy the page which will zero it
and then make it accessible to the host.
Destroying is about twice as fast as the export.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/kvm/20200907124700.10374-2-frankja@linux.ibm.com/
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
[hca@linux.ibm.com: add more markers, rename some markers]
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
ARCH_HAS_DEBUG_WX feature support brought attention to the fact that
currently the initial kasan shadow memory is mapped without the noexec flag. So fix that.
Temporary initial identity mapping is still created without noexec, but
it is replaced by properly set up paging later.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Checks the whole kernel address space for W+X mappings. Note that
currently the first lowcore page unfortunately has to be mapped
W+X. Therefore this is not reported as an insecure mapping.
For the very same reason the wording is also different to other
architectures if the test passes:
On s390 it is "no unexpected W+X pages found" instead of
"no W+X pages found".
Tested-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
s390 version of ae5d1cf358a5 ("arm64: dump: Make the page table
dumping seq_file optional").
Tested-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Make sure that kprobe insn pages are not writable anymore.
Tested-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
clp_rescan_pci_devices_simple() is neither simpler than
clp_scan_pci_devices() nor does it really scan PCI devices, in particular
it will neither add newly discovered devices nor remove those which
disappeared.
Instead it only refreshes PCI function handles and also
has just a single callsite in the same translation unit left which
in fact only refreshes one specific function handle identified by
a FID.
Clarify this by renaming the function and its helper to
clp_refresh_fh() respectively __clp_refresh_fh() and make it take
a fid directly which saves us dealing with the NULL case which
updated all function handles but is not used anymore.
Furthermore since the only callsite is in the same translation unit
make it static.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
There is only one call site of clp_rescan_pci_devices() and
all the function does is call zpci_remove_reserved_devices()
followed by a duplicating clp_scan_pci_devices().
So inline the single call as a call to zpci_remove_reserved_devices()
and clp_scan_pci_devices() and remove the function.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
The only caller of this was removed as part of the suspend/resume
removal so no need to keep this function around.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Currently we have multiple #ifdef CONFIG_PCI_IOV blocks spread over
different compilation units and headers, all dealing with SR-IOV
specific behavior.
This violates the style guide which discourages conditionally compiled
code blocks and hinders maintainability by spreading SR-IOV functionality
over many files.
Let's move all of this into a conditionally compiled pci_iov.c file and
local header and prefix SR-IOV specific functions with zpci_iov_*.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
This is currently only preventing that outdated information is
provided to user space. A concurrent split of huge/large pages does
modify the kernel page tables, however either the huge/large mapping
is reported or the split area is being walked.
This "fixes" also only a potential future bug, since split pages could
also be merged again if page permissions are the same for larger
memory areas.
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
This is the s390 variant of commit bf2b59f60ee1 ("arm64/mm: Hold
memory hotplug lock while walking for kernel page table dump").
Right now this doesn't fix any real bug, however as soon as kvm
patches get merged which make use of memory remove we might end up
dereferencing/accessing freed page tables.
Therefore fix this potential bug already now.
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Make use of generic ptdump infrastructure.
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Instead of going through the list of available AP devices twice
(which may be up to 256 * 256 entries), this patch reworks the code to
run through it only once. The price is that instead of reporting all
possible devices to the caller, only the first 256 devices are collected.
However, 256 AP devices to choose from is plenty and should fulfill
the caller's requirements. On the other hand, the loop code is much
simpler and easier to maintain.
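A single-pass collection with a fixed cap could look like the sketch below. All names (MAX_COLLECTED, struct ap_dev_id, collect_ap_devices(), every_third_usable()) are hypothetical and chosen only for illustration. The predicate stands in for whatever per-device check the real code performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define AP_CARDS	256
#define AP_QUEUES	256
#define MAX_COLLECTED	256	/* hypothetical cap on reported devices */

struct ap_dev_id {
	unsigned char card;
	unsigned char queue;
};

/* Hypothetical predicate: pretend every third device is usable. */
bool every_third_usable(unsigned int card, unsigned int queue)
{
	return (card * AP_QUEUES + queue) % 3 == 0;
}

/*
 * Walk the card/queue space exactly once and stop as soon as the output
 * array is full, instead of counting first and collecting in a second pass.
 */
size_t collect_ap_devices(bool (*usable)(unsigned int, unsigned int),
			  struct ap_dev_id *out, size_t max)
{
	size_t n = 0;

	for (unsigned int card = 0; card < AP_CARDS; card++) {
		for (unsigned int queue = 0; queue < AP_QUEUES; queue++) {
			if (n == max)
				return n;
			if (!usable(card, queue))
				continue;
			out[n].card = (unsigned char)card;
			out[n].queue = (unsigned char)queue;
			n++;
		}
	}
	return n;
}
```

The fixed-size output buffer is what removes the need for the first counting pass: the caller can allocate it up front without knowing how many devices exist.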
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Passing a custom name from the device driver is nice - but in practice
it's only zfcp who has been using this. So we might as well hard-code
a naming scheme in the qdio layer, so that qeth also benefits from it.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
With our current support for the new MIO PCI instructions, write
combining/write back MMIO memory can be obtained via the pci_iomap_wc()
and pci_iomap_wc_range() functions.
This is achieved by using the write back address for a specific bar
as provided in clp_store_query_pci_fn().
These functions are however not widely used and instead drivers often
rely on ioremap_wc() and ioremap_prot(), which on other platforms enable
write combining using a PTE flag set through the pgprot value.
While we do not have a write combining flag in the low order flag bits
of the PTE like x86_64 does, with MIO support, there is a write back bit
in the physical address (bit 1 on z15) and thus also the PTE.
Which bit is used to toggle write back and whether it is available at
all, is however not fixed in the architecture. Instead we get this
information from the CLP Store Logical Processor Characteristics for PCI
command. When the write back bit is not provided we fall back to the
existing behavior.
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
__qdio_allocate_fill_qdr() is meant to set up one specific queue
descriptor in the QDR. But for this simple task, it gets passed a bunch
of global structs and offsets - and then navigates through the structs
to find its actual operands.
Clean up all the complicated pointer chasing & index calculation, and
just pass a descriptor and its associated queue struct.
While at it also add some virt_to_phys() translations, to clarify that
addresses in the QDR are meant to be absolute.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
When processing a PENDING buffer with no attached aob, the current code
would get stuck on this buffer (as the 'continue' causes us to not
advance the buffer index) and process it repeatedly until the loop
terminates eventually.
Luckily this should never happen - the HW must not use the PENDING state
when no aob was provided. But we can still make this code path less
fragile and protect against buggy devices.
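The failure mode and its fix can be shown with a small ring-buffer loop. The types and names below (struct buf, handle_pending(), NUM_BUFS) are hypothetical and much simpler than the real qdio structures. The point is only that the buffer index advances on every iteration, including the one where a PENDING buffer has no aob.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BUFS 4	/* hypothetical ring size, far smaller than a real queue */

enum buf_state { BUF_EMPTY, BUF_PENDING };

struct buf {
	enum buf_state state;
	void *aob;		/* NULL means no aob was attached */
	int completions;	/* how often this buffer was completed */
};

/* Process 'count' buffers starting at 'start'; return how many were handled. */
int handle_pending(struct buf *ring, int start, int count)
{
	int b = start;
	int handled = 0;

	while (count--) {
		if (ring[b].state == BUF_PENDING && ring[b].aob != NULL) {
			ring[b].completions++;
			handled++;
		}
		/*
		 * Advance unconditionally. Skipping this step for a PENDING
		 * buffer without aob (e.g. via an early 'continue') would
		 * make the loop look at the same buffer again and again.
		 */
		b = (b + 1) % NUM_BUFS;
	}
	return handled;
}
```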
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
When branch profiling is enabled, if () gets annotated with code to
instrument the hit/miss ratio. This doesn't work for VDSO as we can't
access kernel code. Add -DDISABLE_BRANCH_PROFILING to fix this.
Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Convert s390 to generic vDSO. There are a few special things on s390:
- vDSO can be called without a stack frame - glibc did this in the past.
So we need to allocate a stackframe on our own.
- The former assembly code used stcke to get the TOD clock and applied
time steering to it. We need to do the same in the new code. This is done
in the architecture specific __arch_get_hw_counter function. The steering
information is stored in an architecture specific area in the vDSO data.
- CPUCLOCK_VIRT is now handled with a syscall fallback, which might
be slower/less accurate than the old implementation.
The getcpu() function stays as an assembly function because there is no
generic implementation and the code is just a few lines.
Performance numbers from my system for 100 million gettimeofday() calls:
Plain syscall: 8.6s
Generic VDSO: 1.3s
old ASM VDSO: 1s
So it's a bit slower but still much faster than syscalls.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Add some coding style changes which hopefully make the code
look a bit less odd.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Use "|" instead of "+" within csum_fold() for consistency reasons,
like in the rest of the file.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Convert ip_fast_csum() so it doesn't call csum_partial(), but instead
open code the checksum calculation. The problem with csum_partial() is
that it makes use of the cksm instruction, which has high startup
costs and therefore is only very fast if used on larger memory
regions.
IPv4 headers however are small in size (5-16 32-bit words). The open
coded variant calculates the checksum in ~30% of the time compared to
the old variant (z14, march=z196).
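For readers unfamiliar with the calculation, here is a portable open-coded sketch of an IPv4 header checksum. It is plain C, not the s390 implementation: the byte-wise load_be32() helper and the 64-bit accumulator are assumptions of this sketch, chosen so that it behaves the same on any host endianness. The result is returned as a host-order integer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read one big-endian 32-bit word byte by byte. */
uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Fold a 32-bit ones' complement sum to 16 bits and complement it. */
uint16_t csum_fold(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Checksum an IPv4 header of 'ihl' 32-bit words (5..15 for valid headers).
 * The words are simply added into a 64-bit accumulator, which cannot
 * overflow for so few words, and the carries are folded back in at the end.
 */
uint16_t ip_fast_csum(const uint8_t *iph, unsigned int ihl)
{
	uint64_t sum = 0;

	for (unsigned int i = 0; i < ihl; i++)
		sum += load_be32(iph + 4 * i);
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffffffff) + (sum >> 32);
	return csum_fold((uint32_t)sum);
}
```

A header whose checksum field is already filled in correctly sums to zero, which is how a receiver verifies it.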
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Rewrite csum_tcpudp_nofold() so that the generated code will not
contain branches. The old implementation was also optimized for
machines which came with "add logical with carry" instructions,
however the compiler doesn't generate them anymore. This is most
likely because those instructions are slower.
However with the old code the compiler generates a lot of branches,
which isn't too helpful usually. Therefore rewrite the code.
In a tight loop this doesn't make any difference since the branch
prediction unit does its job.
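One way to write such a sum without branches is to accumulate into a 64-bit value and fold the upper half into the lower half arithmetically. The sketch below is a portable illustration under that assumption and uses plain integer types instead of the kernel's __wsum/__be32. It is not claimed to match the code generated for s390.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Ones' complement sum of the TCP/UDP pseudo header fields, not folded
 * to 16 bits. No conditional is needed: the 64-bit accumulator keeps all
 * carries in its upper half, and adding the value rotated by 32 bits
 * leaves "low + high + end-around carry" in the upper 32 bits.
 */
uint32_t csum_tcpudp_nofold(uint32_t saddr, uint32_t daddr,
			    uint32_t len, uint8_t proto, uint32_t sum)
{
	uint64_t csum = sum;

	csum += saddr;
	csum += daddr;
	csum += len;
	csum += proto;
	csum += (csum >> 32) | (csum << 32);
	return (uint32_t)(csum >> 32);
}
```

The carry-based alternative would test for overflow after each 32-bit addition, which is where the branches in the older style of code come from.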
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
This implementation needs only ~30% of the time to calculate the
checksum compared to the generic variant. In addition the compiler
also generates only ~30% of the instructions compared to the generic
variant (on z14, compiled with march=z196).
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
When adding a new fd to an epoll instance, and this new fd is itself
an epoll fd, we recursively scan the fds attached to it
to detect cycles, and add non-epoll files to a "check list"
that gets subsequently parsed.
However, this check list isn't completely safe when deletions
can happen concurrently. To sidestep the issue, make sure that
a struct file placed on the check list sees its f_count increased,
ensuring that a concurrent deletion won't result in the file
disappearing from under our feet.
Cc: stable@vger.kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Currently the nexthop code will use an empty NHA_GROUP attribute, but it
requires at least 1 entry in order to function properly. Otherwise we
end up dereferencing null or random pointers all over the place due to not
having any nh_grp_entry members allocated; nexthop code relies on having at
least the first member present. Empty NHA_GROUP doesn't make any sense so
just disallow it.
Also add a WARN_ON for any future users of nexthop_create_group().
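The kind of length check that rejects an empty group can be sketched as follows. The struct mirrors the layout of the uapi struct nexthop_grp as far as recalled here (treat the field list as an assumption), and check_group_len() is a hypothetical helper name, not a function from the nexthop code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of one group entry in the NHA_GROUP attribute payload. */
struct nexthop_grp {
	uint32_t id;
	uint8_t weight;
	uint8_t resvd1;
	uint16_t resvd2;
};

/*
 * Reject an attribute payload that is empty or not a whole number of
 * entries. A zero-length payload would otherwise pass a pure
 * "multiple of the entry size" test.
 */
int check_group_len(size_t len)
{
	if (len == 0 || len % sizeof(struct nexthop_grp) != 0)
		return -EINVAL;
	return 0;
}
```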
BUG: kernel NULL pointer dereference, address: 0000000000000080
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP
CPU: 0 PID: 558 Comm: ip Not tainted 5.9.0-rc1+ #93
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
RIP: 0010:fib_check_nexthop+0x4a/0xaa
Code: 0f 84 83 00 00 00 48 c7 02 80 03 f7 81 c3 40 80 fe fe 75 12 b8 ea ff ff ff 48 85 d2 74 6b 48 c7 02 40 03 f7 81 c3 48 8b 40 10 <48> 8b 80 80 00 00 00 eb 36 80 78 1a 00 74 12 b8 ea ff ff ff 48 85
RSP: 0018:ffff88807983ba00 EFLAGS: 00010213
RAX: 0000000000000000 RBX: ffff88807983bc00 RCX: 0000000000000000
RDX: ffff88807983bc00 RSI: 0000000000000000 RDI: ffff88807bdd0a80
RBP: ffff88807983baf8 R08: 0000000000000dc0 R09: 000000000000040a
R10: 0000000000000000 R11: ffff88807bdd0ae8 R12: 0000000000000000
R13: 0000000000000000 R14: ffff88807bea3100 R15: 0000000000000001
FS: 00007f10db393700(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000080 CR3: 000000007bd0f004 CR4: 00000000003706f0
Call Trace:
fib_create_info+0x64d/0xaf7
fib_table_insert+0xf6/0x581
? __vma_adjust+0x3b6/0x4d4
inet_rtm_newroute+0x56/0x70
rtnetlink_rcv_msg+0x1e3/0x20d
? rtnl_calcit.isra.0+0xb8/0xb8
netlink_rcv_skb+0x5b/0xac
netlink_unicast+0xfa/0x17b
netlink_sendmsg+0x334/0x353
sock_sendmsg_nosec+0xf/0x3f
____sys_sendmsg+0x1a0/0x1fc
? copy_msghdr_from_user+0x4c/0x61
___sys_sendmsg+0x63/0x84
? handle_mm_fault+0xa39/0x11b5
? sockfd_lookup_light+0x72/0x9a
__sys_sendmsg+0x50/0x6e
do_syscall_64+0x54/0xbe
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f10dacc0bb7
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 8b 05 9a 4b 2b 00 85 c0 75 2e 48 63 ff 48 63 d2 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 b1 f2 2a 00 f7 d8 64 89 02 48
RSP: 002b:00007ffcbe628bf8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007ffcbe628f80 RCX: 00007f10dacc0bb7
RDX: 0000000000000000 RSI: 00007ffcbe628c60 RDI: 0000000000000003
RBP: 000000005f41099c R08: 0000000000000001 R09: 0000000000000008
R10: 00000000000005e9 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffcbe628d70 R15: 0000563a86c6e440
Modules linked in:
CR2: 0000000000000080
CC: David Ahern <dsahern@gmail.com>
Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Reported-by: syzbot+a61aa19b0c14c8770bd9@syzkaller.appspotmail.com
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes: f516fb704d02fff2 ("dt-bindings: Whitespace clean-ups in schema files")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20200819092058.1526-1-geert+renesas@glider.be
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
When an MMU notifier call results in unmapping a range that spans multiple
PGDs, we end up calling into cond_resched_lock() when crossing a PGD boundary,
since this avoids running into RCU stalls during VM teardown. Unfortunately,
if the VM is destroyed as a result of OOM, then blocking is not permitted
and the call to the scheduler triggers the following BUG():
| BUG: sleeping function called from invalid context at arch/arm64/kvm/mmu.c:394
| in_atomic(): 1, irqs_disabled(): 0, non_block: 1, pid: 36, name: oom_reaper
| INFO: lockdep is turned off.
| CPU: 3 PID: 36 Comm: oom_reaper Not tainted 5.8.0 #1
| Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
| Call trace:
| dump_backtrace+0x0/0x284
| show_stack+0x1c/0x28
| dump_stack+0xf0/0x1a4
| ___might_sleep+0x2bc/0x2cc
| unmap_stage2_range+0x160/0x1ac
| kvm_unmap_hva_range+0x1a0/0x1c8
| kvm_mmu_notifier_invalidate_range_start+0x8c/0xf8
| __mmu_notifier_invalidate_range_start+0x218/0x31c
| mmu_notifier_invalidate_range_start_nonblock+0x78/0xb0
| __oom_reap_task_mm+0x128/0x268
| oom_reap_task+0xac/0x298
| oom_reaper+0x178/0x17c
| kthread+0x1e4/0x1fc
| ret_from_fork+0x10/0x30
Use the new 'flags' argument to kvm_unmap_hva_range() to ensure that we
only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is set in the notifier
flags.
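The shape of the fix can be sketched in user space like this. fake_cond_resched_lock(), unmap_range() and the 1 GiB PGDIR_SIZE are hypothetical stand-ins for this sketch. MMU_NOTIFIER_RANGE_BLOCKABLE is defined locally with an assumed bit value so the example compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MMU_NOTIFIER_RANGE_BLOCKABLE	(1u << 0)	/* assumed bit value */
#define PGDIR_SIZE			(UINT64_C(1) << 30)	/* hypothetical */

int resched_calls;	/* counts how often we offered to reschedule */

/* Hypothetical stand-in for cond_resched_lock(&kvm->mmu_lock). */
void fake_cond_resched_lock(void)
{
	resched_calls++;
}

/* Walk [start, start + size) one PGD-sized chunk at a time; size must be > 0. */
void unmap_range(uint64_t start, uint64_t size, unsigned int flags)
{
	bool may_block = flags & MMU_NOTIFIER_RANGE_BLOCKABLE;
	uint64_t end = start + size;
	uint64_t addr = start;
	uint64_t next;

	do {
		/* End of the current PGD-sized chunk, clamped to 'end'. */
		next = (addr + PGDIR_SIZE) & ~(PGDIR_SIZE - 1);
		if (next > end)
			next = end;
		/* ... unmap [addr, next) here ... */

		/* Only yield when the notifier allows blocking. */
		if (may_block && next != end)
			fake_cond_resched_lock();
		addr = next;
	} while (addr != end);
}
```

With the flag cleared, as in the oom_reaper path, the walk never reaches the scheduler. With it set, the walk yields at each PGD boundary as before.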
Cc: <stable@vger.kernel.org>
Fixes: 8b3405e345b5 ("kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd")
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Message-Id: <20200811102725.7121-3-will@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The 'flags' field of 'struct mmu_notifier_range' is used to indicate
whether invalidate_range_{start,end}() are permitted to block. In the
case of kvm_mmu_notifier_invalidate_range_start(), this field is not
forwarded on to the architecture-specific implementation of
kvm_unmap_hva_range() and therefore the backend cannot sensibly decide
whether or not to block.
Add an extra 'flags' parameter to kvm_unmap_hva_range() so that
architectures are aware as to whether or not they are permitted to block.
Cc: <stable@vger.kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Message-Id: <20200811102725.7121-2-will@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The phy-connection-type parameter is described in ePAPR 1.1:
Specifies interface type between the Ethernet device and a physical
layer (PHY) device. The value of this property is specific to the
implementation.
Signed-off-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Link: https://lore.kernel.org/r/1597917724-11127-1-git-send-email-madalin.bucur@oss.nxp.com
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
The intel,lgm-pcie binding is matching on all snps,dw-pcie instances
which is wrong. Add a custom 'select' entry to fix this.
Fixes: e54ea45a4955 ("dt-bindings: PCI: intel: Add YAML schemas for the PCIe RC controller")
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Reviewed-by: Dilip Kota <eswara.kota@linux.intel.com>
Signed-off-by: Rob Herring <robh@kernel.org>
|