linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2022-05-17	xtensa: improve call0 ABI probing	Max Filippov	4	-0/+24
	When call0 userspace ABI support by probing is enabled instructions that cause illegal instruction exception when PS.WOE is clear are retried with PS.WOE set before calling c-level exception handler. Record user pc at which PS.WOE was set in the fast exception handler and clear PS.WOE in the c-level exception handler if we get there from the same address. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-17	xtensa: support artificial division by 0 exception	Max Filippov	1	-0/+23
	On xtensa cores wihout hardware division option division support functions from libgcc react to division by 0 attempt by executing illegal instruction followed by the characters 'DIV0'. Recognize this pattern in illegal instruction exception handler and convert it to division by 0. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-13	xtensa: add trap handler for division by zero	Max Filippov	1	-1/+7
	Add c-level handler for the division by zero exception and kill the task if it was thrown from the kernel space or send SIGFPE otherwise. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-11	xtensa/simdisk: fix proc_read_simdisk()	Yi Yang	1	-6/+12
	The commit a69755b18774 ("xtensa simdisk: switch to proc_create_data()") split read operation into two parts, first retrieving the path when it's non-null and second retrieving the trailing '\n'. However when the path is non-null the first simple_read_from_buffer updates ppos, and the second simple_read_from_buffer returns 0 if ppos is greater than 1 (i.e. almost always). As a result reading from that proc file is almost always empty. Fix it by making a temporary copy of the path with the trailing '\n' and using simple_read_from_buffer on that copy. Cc: stable@vger.kernel.org Fixes: a69755b18774 ("xtensa simdisk: switch to proc_create_data()") Signed-off-by: Yi Yang <yiyang13@huawei.com> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-11	xtensa: no need to initialise statics to 0	Jason Wang	1	-1/+1
	Static variables do not need to be initialised to 0, because compiler will initialise all uninitialised statics to 0. Thus, remove the unneeded initializations. Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Message-Id: <20220508022910.98481-1-wangborong@cdjrlc.com> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: clean up labels in the kernel entry assembly	Max Filippov	1	-47/+53
	Don't use numeric labels for long jumps, use named local labels instead. Avoid conditional label definition. No functional changes. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: don't leave invalid TLB entry in fast_store_prohibited	Max Filippov	1	-1/+5
	When fast_store_prohibited needs to go to the C-level exception handler it leaves TLB entry that caused page fault in the TLB. If the faulting task gets switched to a different CPU and completes page table update there the TLB entry will get out of sync with the page table which may cause a livelock on access to that page. Invalidate faulting TLB entry on a slow path exit from the fast_store_prohibited. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: fix declaration of _SecondaryResetVector_text_*	Max Filippov	1	-1/+1
	Secondary reset vector is defined, compiled and used when CONFIG_SECONDARY_RESET_VECTOR is enabled, not only on SMP. Make declarations of _SecondaryResetVector_text_* symbols available accordingly. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	irqchip: irq-xtensa-mx: fix initial IRQ affinity	Max Filippov	1	-4/+14
	When irq-xtensa-mx chip is used in non-SMP configuration its irq_set_affinity callback is not called leaving IRQ affinity set empty. As a result IRQ delivery does not work in that configuration. Initialize IRQ affinity of the xtensa MX interrupt distributor to CPU 0 for all external IRQ lines. Cc: stable@vger.kernel.org Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: enable ARCH_HAS_DEBUG_VM_PGTABLE	Max Filippov	2	-1/+2
	xtensa kernels successfully build and run with CONFIG_DEBUG_VM_PGTABLE=y, enable arch support for it. Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: add hibernation support	Max Filippov	5	-0/+129
	Define ARCH_HIBERNATION_POSSIBLE in Kconfig and implement hibernation callbacks. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: support coprocessors on SMP	Max Filippov	11	-67/+230
	Current coprocessor support on xtensa only works correctly on uniprocessor configurations. Make it work on SMP too and keep it lazy. Make coprocessor_owner array per-CPU and move it to struct exc_table for easy access from the fast_coprocessor exception handler. Allow task to have live coprocessors only on single CPU, record this CPU number in the struct thread_info::cp_owner_cpu. Change struct thread_info::cpenable meaning to be 'coprocessors live on cp_owner_cpu'. Introduce C-level coprocessor exception handler that flushes and releases live coprocessors of the task taking 'coprocessor disabled' exception and call it from the fast_coprocessor handler when the task has live coprocessors on other CPU. Make coprocessor_flush_all and coprocessor_release_all work correctly when called from any CPU by sending IPI to the cp_owner_cpu. Add function coprocessor_flush_release_all to do flush followed by release atomically. Add function local_coprocessors_flush_release_all to flush and release all coprocessors on the local CPU and use it to flush coprocessor contexts from the CPU that goes offline. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: get rid of stack frame in coprocessor_flush	Max Filippov	1	-6/+5
	coprocessor_flush is an ordinary function, it can use all registers. Don't reserve stack frame for it and use a7 to preserve a0 around the context saving call. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: merge SAVE_CP_REGS_TAB and LOAD_CP_REGS_TAB	Max Filippov	1	-48/+37
	Both tables share the same offset field but the different function pointers. Merge them into single table with 3-element entries to reduce code and data duplication. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: add xtensa_xsr macro	Max Filippov	1	-0/+7
	xtensa_xsr does the XSR instruction for the specified special register. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: handle coprocessor exceptions in kernel mode	Max Filippov	1	-1/+1
	In order to let drivers use xtensa coprocessors on behalf of the calling process the kernel must handle coprocessor exceptions from the kernel mode the same way as from the user mode. This is not sufficient to allow using coprocessors transparently in IRQ or softirq context. Should such users exist they must be aware of the context and do the right thing, e.g. preserve the coprocessor state and resore it after use. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: use callx0 opcode in fast_coprocessor	Max Filippov	1	-10/+8
	Instead of emulating call0 in fast_coprocessor use that opcode directly. Use 'ret' instead of 'jx a0'. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: clean up excsave1 initialization	Max Filippov	2	-4/+3
	Use xtensa_set_sr instead of inline assembly. Rename local variable exc_table in early_trap_init to avoid conflict with per-CPU variable of the same name. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: clean up declarations in coprocessor.h	Max Filippov	1	-4/+3
	Drop 'extern' from all function declarations. Add parameter names in declarations. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: clean up exception handler prototypes	Max Filippov	3	-15/+13
	Exception handlers are currently passed as void pointers because they may have one or two parameters. Only two handlers uses the second parameter and it is available in the struct pt_regs anyway. Make all handlers have only one parameter, introduce xtensa_exception_handler type for handlers and use it in trap_set_handler. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: clean up function declarations in traps.c	Max Filippov	2	-32/+32
	Drop 'extern' from all function declarations and move those that need to be visible from traps.c to traps.h. Add 'asmlinkage' to declarations of fucntions defined in assembly. Add 'static' to declarations and definitions only used locally. Add argument names in declarations. Drop unused second argument from do_multihit and do_page_fault. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: enable KCSAN	Max Filippov	6	-7/+73
	Prefix arch-specific barrier macros with '__' to make use of instrumented generic macros. Prefix arch-specific bitops with 'arch_' to make use of instrumented generic functions. Provide stubs for 64-bit atomics when building with KCSAN. Disable KCSAN instrumentation in arch/xtensa/boot. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Acked-by: Marco Elver <elver@google.com>
2022-05-01	xtensa: enable HAVE_VIRT_CPU_ACCOUNTING_GEN	Max Filippov	2	-1/+2
	There's no direct cputime_t manipulation in the xtensa arch code, so generic virt CPU accounting may be enabled. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: enable context tracking	Max Filippov	3	-1/+11
	Put user exit context tracking call on the common kernel entry/exit path (function calls are impossible at earlier kernel entry stages because PS.EXCM is not cleared yet). Put user entry context tracking call on the user exit path. Syscalls go through this common code too, so nothing specific needs to be done for them. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: use abi_* register names in the kernel exit code	Max Filippov	1	-40/+42
	Using plain register names is prone to errors when code is changed and new calls are added between the register load and use. Change plain register names to abi_* names in the call-heavy part of the kernel exit code to clearly indicate what's supposed to be preserved and what's not. Re-align code while at it. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: move trace_hardirqs_off call back to entry.S	Max Filippov	2	-15/+15
	Context tracking call must be done after hardirq tracking call, otherwise lockdep_assert_irqs_disabled called from rcu_eqs_exit gives a warning. To avoid context tracking logic duplication for IRQ/exception entry paths move trace_hardirqs_off call back to common entry code. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: drop dead code from entry.S	Max Filippov	1	-18/+0
	KERNEL_STACK_OVERFLOW_CHECK is incomplete and have never been enabled. Remove it. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: noMMU: allow handling protection faults	Max Filippov	4	-11/+27
	Many xtensa CPU cores without full MMU still have memory protection features capable of raising exceptions for invalid instruction fetches/data access. Allow handling such exceptions. This improves behavior of processes that pass invalid memory pointers to syscalls in noMMU configs: in case of exception the kernel instead of killing the process is now able to return -EINVAL from a syscall. Introduce CONFIG_PFAULT that controls whether protection fault code is enabled and register handlers for common memory protection exceptions when it is enabled. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: extract vmalloc_fault code into a function	Max Filippov	1	-53/+54
	Move full MMU-specific code into a separate function to isolate it from more generic do_page_fault code. No functional changes. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: move asid_cache from fault.c to mmu.c	Max Filippov	2	-1/+2
	asid_cache is only useful with full MMU, but fault.c is also useful with MPU. Move asid_cache definition to MMU-specific source file. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: iss: extract and constify network callbacks	Max Filippov	1	-19/+28
	Instead of storing pointers to callback functions in the struct iss_net_private::tp move them to struct struct iss_net_ops and store a const pointer to it. Make static const tuntap_ops structure with tuntap callbacks and initialize tp.net_ops with it in the tuntap_probe. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: iss: clean up per-device locking in network driver	Max Filippov	1	-21/+18
	Per-device locking in the ISS network driver is used to protect poll timer and stats updates. Stat collection is not protected. Remove per-device locking everywhere except the stats updates. Replace ndo_get_stats callback with ndo_get_stats64 and use proper locking there as well. As a side effect this fixes possible deadlock between iss_net_close and iss_net_timer. Reported by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: iss: replace iss_net_set_mac with eth_mac_addr	Max Filippov	1	-14/+1
	iss_net_set_mac is just a copy of eth_mac_addr with pointless locking. Drop this function and replace it with eth_mac_addr. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: iss: drop opened_list logic from the network driver	Max Filippov	1	-39/+14
	opened_list is used to poll all opened devices in the timer callback, but there's individual timer that is associated with each device. Drop opened_list and only poll the device that is associated with the timer in the timer callback. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	xtensa: localize labels used in memmove	Max Filippov	1	-10/+10
	Internal labels in the memmove implementation don't need to be visible, localize them by prefixing their names with '.L'. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-01	Linux 5.18-rc5	Linus Torvalds	1	-1/+1

2022-04-29	Revert "arm: dts: at91: Fix boolean properties with values"	Arnd Bergmann	2	-2/+2
	This reverts commit 0dc23d1a8e17, which caused another regression as the pinctrl code actually expects an integer value of 0 or 1 rather than a simple boolean property. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-04-29	KVM: x86: work around QEMU issue with synthetic CPUID leaves	Paolo Bonzini	1	-5/+14
	Synthesizing AMD leaves up to 0x80000021 caused problems with QEMU, which assumes the host CPUID[0x80000000].EAX is higher or equal to what KVM_GET_SUPPORTED_CPUID reports. This causes QEMU to issue bogus host CPUIDs when preparing the input to KVM_SET_CPUID2. It can even get into an infinite loop, which is only terminated by an abort(): cpuid_data is full, no space for cpuid(eax:0x8000001d,ecx:0x3e) To work around this, only synthesize those leaves if 0x8000001d exists on the host. The synthetic 0x80000021 leaf is mostly useful on Zen2, which satisfies the condition. Fixes: f144c49e8c39 ("KVM: x86: synthesize CPUID leaf 0x80000021h if useful") Reported-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	Revert "x86/mm: Introduce lookup_address_in_mm()"	Sean Christopherson	2	-15/+0
	Drop lookup_address_in_mm() now that KVM is providing it's own variant of lookup_address_in_pgd() that is safe for use with user addresses, e.g. guards against page tables being torn down. A variant that provides a non-init mm is inherently dangerous and flawed, as the only reason to use an mm other than init_mm is to walk a userspace mapping, and lookup_address_in_pgd() does not play nice with userspace mappings, e.g. doesn't disable IRQs to block TLB shootdowns and doesn't use READ_ONCE() to ensure an upper level entry isn't converted to a huge page between checking the PAGE_SIZE bit and grabbing the address of the next level down. This reverts commit 13c72c060f1ba6f4eddd7b1c4f52a8aded43d6d9. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <YmwIi3bXr/1yhYV/@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: x86/mmu: fix potential races when walking host page table	Mingwei Zhang	1	-5/+42
	KVM uses lookup_address_in_mm() to detect the hugepage size that the host uses to map a pfn. The function suffers from several issues: - no usage of READ_ONCE(*). This allows multiple dereference of the same page table entry. The TOCTOU problem because of that may cause KVM to incorrectly treat a newly generated leaf entry as a nonleaf one, and dereference the content by using its pfn value. - the information returned does not match what KVM needs; for non-present entries it returns the level at which the walk was terminated, as long as the entry is not 'none'. KVM needs level information of only 'present' entries, otherwise it may regard a non-present PXE entry as a present large page mapping. - the function is not safe for mappings that can be torn down, because it does not disable IRQs and because it returns a PTE pointer which is never safe to dereference after the function returns. So implement the logic for walking host page tables directly in KVM, and stop using lookup_address_in_mm(). Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> Message-Id: <20220429031757.2042406-1-mizhang@google.com> [Inline in host_pfn_mapping_level, ensure no semantic change for its callers. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: fix bad user ABI for KVM_EXIT_SYSTEM_EVENT	Paolo Bonzini	6	-11/+34
	When KVM_EXIT_SYSTEM_EVENT was introduced, it included a flags member that at the time was unused. Unfortunately this extensibility mechanism has several issues: - x86 is not writing the member, so it would not be possible to use it on x86 except for new events - the member is not aligned to 64 bits, so the definition of the uAPI struct is incorrect for 32- on 64-bit userspace. This is a problem for RISC-V, which supports CONFIG_KVM_COMPAT, but fortunately usage of flags was only introduced in 5.18. Since padding has to be introduced, place a new field in there that tells if the flags field is valid. To allow further extensibility, in fact, change flags to an array of 16 values, and store how many of the values are valid. The availability of the new ndata field is tied to a system capability; all architectures are changed to fill in the field. To avoid breaking compilation of userspace that was using the flags field, provide a userspace-only union to overlap flags with data[0]. The new field is placed at the same offset for both 32- and 64-bit userspace. Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Peter Gonda <pgonda@google.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reported-by: kernel test robot <lkp@intel.com> Message-Id: <20220422103013.34832-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: x86/mmu: Do not create SPTEs for GFNs that exceed host.MAXPHYADDR	Sean Christopherson	5	-16/+45
	Disallow memslots and MMIO SPTEs whose gpa range would exceed the host's MAXPHYADDR, i.e. don't create SPTEs for gfns that exceed host.MAXPHYADDR. The TDP MMU bounds its zapping based on host.MAXPHYADDR, and so if the guest, possibly with help from userspace, manages to coerce KVM into creating a SPTE for an "impossible" gfn, KVM will leak the associated shadow pages (page tables): WARNING: CPU: 10 PID: 1122 at arch/x86/kvm/mmu/tdp_mmu.c:57 kvm_mmu_uninit_tdp_mmu+0x4b/0x60 [kvm] Modules linked in: kvm_intel kvm irqbypass CPU: 10 PID: 1122 Comm: set_memory_regi Tainted: G W 5.18.0-rc1+ #293 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x4b/0x60 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2d0 [kvm] kvm_vm_release+0x1d/0x30 [kvm] __fput+0x82/0x240 task_work_run+0x5b/0x90 exit_to_user_mode_prepare+0xd2/0xe0 syscall_exit_to_user_mode+0x1d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> On bare metal, encountering an impossible gpa in the page fault path is well and truly impossible, barring CPU bugs, as the CPU will signal #PF during the gva=>gpa translation (or a similar failure when stuffing a physical address into e.g. the VMCS/VMCB). But if KVM is running as a VM itself, the MAXPHYADDR enumerated to KVM may not be the actual MAXPHYADDR of the underlying hardware, in which case the hardware will not fault on the illegal-from-KVM's-perspective gpa. Alternatively, KVM could continue allowing the dodgy behavior and simply zap the max possible range. But, for hosts with MAXPHYADDR < 52, that's a (minor) waste of cycles, and more importantly, KVM can't reasonably support impossible memslots when running on bare metal (or with an accurate MAXPHYADDR as a VM). Note, limiting the overhead by checking if KVM is running as a guest is not a safe option as the host isn't required to announce itself to the guest in any way, e.g. doesn't need to set the HYPERVISOR CPUID bit. A second alternative to disallowing the memslot behavior would be to disallow creating a VM with guest.MAXPHYADDR > host.MAXPHYADDR. That restriction is undesirable as there are legitimate use cases for doing so, e.g. using the highest host.MAXPHYADDR out of a pool of heterogeneous systems so that VMs can be migrated between hosts with different MAXPHYADDRs without running afoul of the allow_smaller_maxphyaddr mess. Note that any guest.MAXPHYADDR is valid with shadow paging, and it is even useful in order to test KVM with MAXPHYADDR=52 (i.e. without any reserved physical address bits). The now common kvm_mmu_max_gfn() is inclusive instead of exclusive. The memslot and TDP MMU code want an exclusive value, but the name implies the returned value is inclusive, and the MMIO path needs an inclusive check. Fixes: faaf05b00aec ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU") Fixes: 524a1e4e381f ("KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs") Cc: stable@vger.kernel.org Cc: Maxim Levitsky <mlevitsk@redhat.com> Cc: Ben Gardon <bgardon@google.com> Cc: David Matlack <dmatlack@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220428233416.2446833-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	io_uring: check that data field is 0 in ringfd unregister	Eugene Syromiatnikov	1	-1/+1
	Only allow data field to be 0 in struct io_uring_rsrc_update user arguments to allow for future possible usage. Fixes: e7a6c00dc77a ("io_uring: add support for registering ring file descriptors") Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com> Link: https://lore.kernel.org/r/20220429142218.GA28696@asgard.redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-29	bfq: Fix warning in bfqq_request_over_limit()	Jan Kara	1	-3/+9
	People are occasionally reporting a warning bfqq_request_over_limit() triggering reporting that BFQ's idea of cgroup hierarchy (and its depth) does not match what generic blkcg code thinks. This can actually happen when bfqq gets moved between BFQ groups while bfqq_request_over_limit() is running. Make sure the code is safe against BFQ queue being moved to a different BFQ group. Fixes: 76f1df88bbc2 ("bfq: Limit number of requests consumed by each cgroup") CC: stable@vger.kernel.org Link: https://lore.kernel.org/all/CAJCQCtTw_2C7ZSz7as5Gvq=OmnDiio=HRkQekqWpKot84sQhFA@mail.gmail.com/ Reported-by: Chris Murphy <lists@colorremedies.com> Reported-by: "yukuai (C)" <yukuai3@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220407140738.9723-1-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-29	x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests	Thomas Gleixner	1	-1/+5
	When a XEN_HVM guest uses the XEN PIRQ/Eventchannel mechanism, then PCI/MSI[-X] masking is solely controlled by the hypervisor, but contrary to XEN_PV guests this does not disable PCI/MSI[-X] masking in the PCI/MSI layer. This can lead to a situation where the PCI/MSI layer masks an MSI[-X] interrupt and the hypervisor grants the write despite the fact that it already requested the interrupt. As a consequence interrupt delivery on the affected device is not happening ever. Set pci_msi_ignore_mask to prevent that like it's done for XEN_PV guests already. Fixes: 809f9267bbab ("xen: map MSIs into pirqs") Reported-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com> Reported-by: Dusty Mabe <dustymabe@redhat.com> Reported-by: Salvatore Bonaccorso <carnil@debian.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Noah Meyerhans <noahm@debian.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87tuaduxj5.ffs@tglx
2022-04-28	io_uring: fix uninitialized field in rw io_kiocb	Joseph Ravichandran	1	-0/+1
	io_rw_init_file does not initialize kiocb->private, so when iocb_bio_iopoll reads kiocb->private it can contain uninitialized data. Fixes: 3e08773c3841 ("block: switch polling to be bio based") Signed-off-by: Joseph Ravichandran <jravi@mit.edu> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-28	tcp: fix F-RTO may not work correctly when receiving DSACK	Pengcheng Yang	1	-1/+2
	Currently DSACK is regarded as a dupack, which may cause F-RTO to incorrectly enter "loss was real" when receiving DSACK. Packetdrill to demonstrate: // Enable F-RTO and TLP 0 `sysctl -q net.ipv4.tcp_frto=2` 0 `sysctl -q net.ipv4.tcp_early_retrans=3` 0 `sysctl -q net.ipv4.tcp_congestion_control=cubic` // Establish a connection +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 // RTT 10ms, RTO 210ms +.1 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> +0 > S. 0:0(0) ack 1 <...> +.01 < . 1:1(0) ack 1 win 257 +0 accept(3, ..., ...) = 4 // Send 2 data segments +0 write(4, ..., 2000) = 2000 +0 > P. 1:2001(2000) ack 1 // TLP +.022 > P. 1001:2001(1000) ack 1 // Continue to send 8 data segments +0 write(4, ..., 10000) = 10000 +0 > P. 2001:10001(8000) ack 1 // RTO +.188 > . 1:1001(1000) ack 1 // The original data is acked and new data is sent(F-RTO step 2.b) +0 < . 1:1(0) ack 2001 win 257 +0 > P. 10001:12001(2000) ack 1 // D-SACK caused by TLP is regarded as a dupack, this results in // the incorrect judgment of "loss was real"(F-RTO step 3.a) +.022 < . 1:1(0) ack 2001 win 257 <sack 1001:2001,nop,nop> // Never-retransmitted data(3001:4001) are acked and // expect to switch to open state(F-RTO step 3.b) +0 < . 1:1(0) ack 4001 win 257 +0 %{ assert tcpi_ca_state == 0, tcpi_ca_state }% Fixes: e33099f96d99 ("tcp: implement RFC5682 F-RTO") Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/1650967419-2150-1-git-send-email-yangpc@wangsu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-28	Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits"	Dany Madden	2	-100/+35
	This reverts commit 723ad916134784b317b72f3f6cf0f7ba774e5dae When client requests channel or ring size larger than what the server can support the server will cap the request to the supported max. So, the client would not be able to successfully request resources that exceed the server limit. Fixes: 723ad9161347 ("ibmvnic: Add ethtool private flag for driver-defined queue limits") Signed-off-by: Dany Madden <drt@linux.ibm.com> Link: https://lore.kernel.org/r/20220427235146.23189-1-drt@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-28	net: enetc: allow tc-etf offload even with NETIF_F_CSUM_MASK	Vladimir Oltean	1	-4/+0
	The Time-Specified Departure feature is indeed mutually exclusive with TX IP checksumming in ENETC, but TX checksumming in itself is broken and was removed from this driver in commit 82728b91f124 ("enetc: Remove Tx checksumming offload code"). The blamed commit declared NETIF_F_HW_CSUM in dev->features to comply with software TSO's expectations, and still did the checksumming in software by calling skb_checksum_help(). So there isn't any restriction for the Time-Specified Departure feature. However, enetc_setup_tc_txtime() doesn't understand that, and blindly looks for NETIF_F_CSUM_MASK. Instead of checking for things which can literally never happen in the current code base, just remove the check and let the driver offload tc-etf qdiscs. Fixes: acede3c5dad5 ("net: enetc: declare NETIF_F_HW_CSUM and do it in software") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220427203017.1291634-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-28	ixgbe: ensure IPsec VF<->PF compatibility	Leon Romanovsky	1	-1/+2
	The VF driver can forward any IPsec flags and such makes the function is not extendable and prone to backward/forward incompatibility. If new software runs on VF, it won't know that PF configured something completely different as it "knows" only XFRM_OFFLOAD_INBOUND flag. Fixes: eda0333ac293 ("ixgbe: add VF IPsec management") Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Shannon Nelson <snelson@pensando.io> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220427173152.443102-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>