Age | Commit message (Collapse) | Author | Files | Lines |
|
The bug is not easily reproducable, as it may occur very infrequently
(we had machines with 20minutes heavy downloading before it occurred)
However, on a virual machine (VMWare on Windows 10 host) it occurred
pretty frequently (1-2 seconds after a speedtest was started)
dev->tx_skb mab be freed via dev_kfree_skb_irq on a callback
before it is set.
This causes the following problems:
- double free of the skb or potential memory leak
- in dmesg: 'recvmsg bug' and 'recvmsg bug 2' and eventually
general protection fault
Example dmesg output:
[ 134.841986] ------------[ cut here ]------------
[ 134.841987] recvmsg bug: copied 9C24A555 seq 9C24B557 rcvnxt 9C25A6B3 fl 0
[ 134.841993] WARNING: CPU: 7 PID: 2629 at /build/linux-hwe-On9fm7/linux-hwe-4.15.0/net/ipv4/tcp.c:1865 tcp_recvmsg+0x44d/0xab0
[ 134.841994] Modules linked in: ipheth(OE) kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd vmw_balloon intel_rapl_perf joydev input_leds serio_raw vmw_vsock_vmci_transport vsock shpchp i2c_piix4 mac_hid binfmt_misc vmw_vmci parport_pc ppdev lp parport autofs4 vmw_pvscsi vmxnet3 hid_generic usbhid hid vmwgfx ttm drm_kms_helper syscopyarea sysfillrect mptspi mptscsih sysimgblt ahci psmouse fb_sys_fops pata_acpi mptbase libahci e1000 drm scsi_transport_spi
[ 134.842046] CPU: 7 PID: 2629 Comm: python Tainted: G W OE 4.15.0-34-generic #37~16.04.1-Ubuntu
[ 134.842046] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[ 134.842048] RIP: 0010:tcp_recvmsg+0x44d/0xab0
[ 134.842048] RSP: 0018:ffffa6630422bcc8 EFLAGS: 00010286
[ 134.842049] RAX: 0000000000000000 RBX: ffff997616f4f200 RCX: 0000000000000006
[ 134.842049] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff9976257d6490
[ 134.842050] RBP: ffffa6630422bd98 R08: 0000000000000001 R09: 000000000004bba4
[ 134.842050] R10: 0000000001e00c6f R11: 000000000004bba4 R12: ffff99760dee3000
[ 134.842051] R13: 0000000000000000 R14: ffff99760dee3514 R15: 0000000000000000
[ 134.842051] FS: 00007fe332347700(0000) GS:ffff9976257c0000(0000) knlGS:0000000000000000
[ 134.842052] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 134.842053] CR2: 0000000001e41000 CR3: 000000020e9b4006 CR4: 00000000003606e0
[ 134.842055] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 134.842055] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 134.842057] Call Trace:
[ 134.842060] ? aa_sk_perm+0x53/0x1a0
[ 134.842064] inet_recvmsg+0x51/0xc0
[ 134.842066] sock_recvmsg+0x43/0x50
[ 134.842070] SYSC_recvfrom+0xe4/0x160
[ 134.842072] ? __schedule+0x3de/0x8b0
[ 134.842075] ? ktime_get_ts64+0x4c/0xf0
[ 134.842079] SyS_recvfrom+0xe/0x10
[ 134.842082] do_syscall_64+0x73/0x130
[ 134.842086] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 134.842086] RIP: 0033:0x7fe331f5a81d
[ 134.842088] RSP: 002b:00007ffe8da98398 EFLAGS: 00000246 ORIG_RAX: 000000000000002d
[ 134.842090] RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 00007fe331f5a81d
[ 134.842094] RDX: 00000000000003fb RSI: 0000000001e00874 RDI: 0000000000000003
[ 134.842095] RBP: 00007fe32f642c70 R08: 0000000000000000 R09: 0000000000000000
[ 134.842097] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe332347698
[ 134.842099] R13: 0000000001b7e0a0 R14: 0000000001e00874 R15: 0000000000000000
[ 134.842103] Code: 24 fd ff ff e9 cc fe ff ff 48 89 d8 41 8b 8c 24 10 05 00 00 44 8b 45 80 48 c7 c7 08 bd 59 8b 48 89 85 68 ff ff ff e8 b3 c4 7d ff <0f> 0b 48 8b 85 68 ff ff ff e9 e9 fe ff ff 41 8b 8c 24 10 05 00
[ 134.842126] ---[ end trace b7138fc08c83147f ]---
[ 134.842144] general protection fault: 0000 [#1] SMP PTI
[ 134.842145] Modules linked in: ipheth(OE) kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd vmw_balloon intel_rapl_perf joydev input_leds serio_raw vmw_vsock_vmci_transport vsock shpchp i2c_piix4 mac_hid binfmt_misc vmw_vmci parport_pc ppdev lp parport autofs4 vmw_pvscsi vmxnet3 hid_generic usbhid hid vmwgfx ttm drm_kms_helper syscopyarea sysfillrect mptspi mptscsih sysimgblt ahci psmouse fb_sys_fops pata_acpi mptbase libahci e1000 drm scsi_transport_spi
[ 134.842161] CPU: 7 PID: 2629 Comm: python Tainted: G W OE 4.15.0-34-generic #37~16.04.1-Ubuntu
[ 134.842162] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[ 134.842164] RIP: 0010:tcp_close+0x2c6/0x440
[ 134.842165] RSP: 0018:ffffa6630422bde8 EFLAGS: 00010202
[ 134.842167] RAX: 0000000000000000 RBX: ffff99760dee3000 RCX: 0000000180400034
[ 134.842168] RDX: 5c4afd407207a6c4 RSI: ffffe868495bd300 RDI: ffff997616f4f200
[ 134.842169] RBP: ffffa6630422be08 R08: 0000000016f4d401 R09: 0000000180400034
[ 134.842169] R10: ffffa6630422bd98 R11: 0000000000000000 R12: 000000000000600c
[ 134.842170] R13: 0000000000000000 R14: ffff99760dee30c8 R15: ffff9975bd44fe00
[ 134.842171] FS: 00007fe332347700(0000) GS:ffff9976257c0000(0000) knlGS:0000000000000000
[ 134.842173] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 134.842174] CR2: 0000000001e41000 CR3: 000000020e9b4006 CR4: 00000000003606e0
[ 134.842177] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 134.842178] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 134.842179] Call Trace:
[ 134.842181] inet_release+0x42/0x70
[ 134.842183] __sock_release+0x42/0xb0
[ 134.842184] sock_close+0x15/0x20
[ 134.842187] __fput+0xea/0x220
[ 134.842189] ____fput+0xe/0x10
[ 134.842191] task_work_run+0x8a/0xb0
[ 134.842193] exit_to_usermode_loop+0xc4/0xd0
[ 134.842195] do_syscall_64+0xf4/0x130
[ 134.842197] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 134.842197] RIP: 0033:0x7fe331f5a560
[ 134.842198] RSP: 002b:00007ffe8da982e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[ 134.842200] RAX: 0000000000000000 RBX: 00007fe32f642c70 RCX: 00007fe331f5a560
[ 134.842201] RDX: 00000000008f5320 RSI: 0000000001cd4b50 RDI: 0000000000000003
[ 134.842202] RBP: 00007fe32f6500f8 R08: 000000000000003c R09: 00000000009343c0
[ 134.842203] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe32f6500d0
[ 134.842204] R13: 00000000008f5320 R14: 00000000008f5320 R15: 0000000001cd4770
[ 134.842205] Code: c8 00 00 00 45 31 e4 49 39 fe 75 4d eb 50 83 ab d8 00 00 00 01 48 8b 17 48 8b 47 08 48 c7 07 00 00 00 00 48 c7 47 08 00 00 00 00 <48> 89 42 08 48 89 10 0f b6 57 34 8b 47 2c 2b 47 28 83 e2 01 80
[ 134.842226] RIP: tcp_close+0x2c6/0x440 RSP: ffffa6630422bde8
[ 134.842227] ---[ end trace b7138fc08c831480 ]---
The proposed patch eliminates a potential racing condition.
Before, usb_submit_urb was called and _after_ that, the skb was attached
(dev->tx_skb). So, on a callback it was possible, however unlikely that the
skb was freed before it was set. That way (because dev->tx_skb was not set
to NULL after it was freed), it could happen that a skb from a earlier
transmission was freed a second time (and the skb we should have freed did
not get freed at all)
Now we free the skb directly in ipheth_tx(). It is not passed to the
callback anymore, eliminating the posibility of a double free of the same
skb. Depending on the retval of usb_submit_urb() we use dev_kfree_skb_any()
respectively dev_consume_skb_any() to free the skb.
Signed-off-by: Oliver Zweigle <Oliver.Zweigle@faro.com>
Signed-off-by: Bernd Eckstein <3ernd.Eckstein@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2018-11-27
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix several bugs in BPF sparc JIT, that is, convergence for fused
branches, initialization of frame pointer register, and moving all
arguments into output registers from input registers in prologue
to fix BPF to BPF calls, from David.
2) Fix a bug in arm64 JIT for fetching BPF to BPF call addresses where
they are not guaranteed to fit into imm field and therefore must be
retrieved through prog aux data, from Daniel.
3) Explicitly add all JITs to MAINTAINERS file with developers able to
help out in feature development, fixes, review, etc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Previously, we only called indirect_branch_prediction_barrier on the
logical CPU that freed a vmcb. This function should be called on all
logical CPUs that last loaded the vmcb in question.
Fixes: 15d45071523d ("KVM/x86: Add IBPB support")
Reported-by: Neel Natu <neelnatu@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
When a guest page table is updated via an emulated write,
kvm_mmu_pte_write() is called to update the shadow PTE using the just
written guest PTE value. But if two emulated guest PTE writes happened
concurrently, it is possible that the guest PTE and the shadow PTE end
up being out of sync. Emulated writes do not mark the shadow page as
unsync-ed, so this inconsistency will not be resolved even by a guest TLB
flush (unless the page was marked as unsync-ed at some other point).
This is fixed by re-reading the current value of the guest PTE after the
MMU lock has been acquired instead of just using the value that was
written prior to calling kvm_mmu_pte_write().
Signed-off-by: Junaid Shahid <junaids@google.com>
Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
vmcs12 represents the per-CPU cache of L1 active vmcs12.
This cache can be loaded by one of the following:
1) Guest making a vmcs12 active by exeucting VMPTRLD
2) Guest specifying eVMCS in VP assist page and executing
VMLAUNCH/VMRESUME.
Either way, vmcs12 should have revision_id of VMCS12_REVISION.
Which is not equal to eVMCS revision_id which specifies used
VersionNumber of eVMCS struct (e.g. KVM_EVMCS_VERSION).
Specifically, this causes an issue in restoring a nested VM state
because vmx_set_nested_state() verifies that vmcs12->revision_id
is equal to VMCS12_REVISION which was not true in case vmcs12
was populated from an eVMCS by vmx_get_nested_state() which calls
copy_enlightened_to_vmcs12().
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
According to TLFS section 16.11.2 Enlightened VMCS, the first u32
field of eVMCS should specify eVMCS VersionNumber.
This version should be in the range of supported eVMCS versions exposed
to guest via CPUID.0x4000000A.EAX[0:15].
The range which KVM expose to guest in this CPUID field should be the
same as the value returned in vmcs_version by nested_enable_evmcs().
According to the above, eVMCS VMPTRLD should verify that version specified
in given eVMCS is in the supported range. However, current code
mistakenly verfies this field against VMCS12_REVISION.
One can also see that when KVM use eVMCS, it makes sure that
alloc_vmcs_cpu() sets allocated eVMCS revision_id to KVM_EVMCS_VERSION.
Obvious fix should just change eVMCS VMPTRLD to verify first u32 field
of eVMCS is equal to KVM_EVMCS_VERSION.
However, it turns out that Microsoft Hyper-V fails to comply to their
own invented interface: When Hyper-V use eVMCS, it just sets first u32
field of eVMCS to revision_id specified in MSR_IA32_VMX_BASIC (In our
case: VMCS12_REVISION). Instead of used eVMCS version number which is
one of the supported versions specified in CPUID.0x4000000A.EAX[0:15].
To overcome Hyper-V bug, we accept either a supported eVMCS version
or VMCS12_REVISION as valid values for first u32 field of eVMCS.
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Since commit e79f245ddec1 ("X86/KVM: Properly update 'tsc_offset' to
represent the running guest"), vcpu->arch.tsc_offset meaning was
changed to always reflect the tsc_offset value set on active VMCS.
Regardless if vCPU is currently running L1 or L2.
However, above mentioned commit failed to also change
kvm_vcpu_write_tsc_offset() to set vcpu->arch.tsc_offset correctly.
This is because vmx_write_tsc_offset() could set the tsc_offset value
in active VMCS to given offset parameter *plus vmcs12->tsc_offset*.
However, kvm_vcpu_write_tsc_offset() just sets vcpu->arch.tsc_offset
to given offset parameter. Without taking into account the possible
addition of vmcs12->tsc_offset. (Same is true for SVM case).
Fix this issue by changing kvm_x86_ops->write_tsc_offset() to return
actually set tsc_offset in active VMCS and modify
kvm_vcpu_write_tsc_offset() to set returned value in
vcpu->arch.tsc_offset.
In addition, rename write_tsc_offset() callback to write_l1_tsc_offset()
to make it clear that it is meant to set L1 TSC offset.
Fixes: e79f245ddec1 ("X86/KVM: Properly update 'tsc_offset' to represent the running guest")
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Leonid Shatz <leonid.shatz@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The inline keyword which is not at the beginning of the function
declaration may trigger the following build warnings, so let's fix it:
arch/x86/kvm/vmx.c:1309:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
arch/x86/kvm/vmx.c:5947:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
arch/x86/kvm/vmx.c:5985:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
arch/x86/kvm/vmx.c:6023:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
We get the following warnings about empty statements when building
with 'W=1':
arch/x86/kvm/lapic.c:632:53: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
arch/x86/kvm/lapic.c:1907:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
arch/x86/kvm/lapic.c:1936:65: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
arch/x86/kvm/lapic.c:1975:44: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
Rework the debug helper macro to get rid of these warnings.
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
When guest transitions from/to long-mode by modifying MSR_EFER.LMA,
the list of shared MSRs to be saved/restored on guest<->host
transitions is updated (See vmx_set_efer() call to setup_msrs()).
On every entry to guest, vcpu_enter_guest() calls
vmx_prepare_switch_to_guest(). This function should also take care
of setting the shared MSRs to be saved/restored. However, the
function does nothing in case we are already running with loaded
guest state (vmx->loaded_cpu_state != NULL).
This means that even when guest modifies MSR_EFER.LMA which results
in updating the list of shared MSRs, it isn't being taken into account
by vmx_prepare_switch_to_guest() because it happens while we are
running with loaded guest state.
To fix above mentioned issue, add a flag to mark that the list of
shared MSRs has been updated and modify vmx_prepare_switch_to_guest()
to set shared MSRs when running with host state *OR* list of shared
MSRs has been updated.
Note that this issue was mistakenly introduced by commit
678e315e78a7 ("KVM: vmx: add dedicated utility to access guest's
kernel_gs_base") because previously vmx_set_efer() always called
vmx_load_host_state() which resulted in vmx_prepare_switch_to_guest() to
set shared MSRs.
Fixes: 678e315e78a7 ("KVM: vmx: add dedicated utility to access guest's kernel_gs_base")
Reported-by: Eyal Moscovici <eyal.moscovici@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
kvm_pv_clock_pairing() allocates local var
"struct kvm_clock_pairing clock_pairing" on stack and initializes
all it's fields besides padding (clock_pairing.pad[]).
Because clock_pairing var is written completely (including padding)
to guest memory, failure to init struct padding results in kernel
info-leak.
Fix the issue by making sure to also init the padding with zeroes.
Fixes: 55dd00a73a51 ("KVM: x86: add KVM_HC_CLOCK_PAIRING hypercall")
Reported-by: syzbot+a8ef68d71211ba264f56@syzkaller.appspotmail.com
Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Consider the case that userspace enables KVM_CAP_HYPERV_ENLIGHTENED_VMCS twice:
1) kvm_vcpu_ioctl_enable_cap() is called to enable
KVM_CAP_HYPERV_ENLIGHTENED_VMCS which calls nested_enable_evmcs().
2) nested_enable_evmcs() sets enlightened_vmcs_enabled to true and fills
vmcs_version which is then copied to userspace.
3) kvm_vcpu_ioctl_enable_cap() is called again to enable
KVM_CAP_HYPERV_ENLIGHTENED_VMCS which calls nested_enable_evmcs().
4) This time nested_enable_evmcs() just returns 0 as
enlightened_vmcs_enabled is already true. *Without filling
vmcs_version*.
5) kvm_vcpu_ioctl_enable_cap() continues as usual and copies
*uninitialized* vmcs_version to userspace which leads to kernel info-leak.
Fix this issue by simply changing nested_enable_evmcs() to always fill
vmcs_version output argument. Even when enlightened_vmcs_enabled is
already set to true.
Note that SVM's nested_enable_evmcs() should not be modified because it
always returns a non-zero value (-ENODEV) which results in
kvm_vcpu_ioctl_enable_cap() skipping the copy of vmcs_version to
userspace (as it should).
Fixes: 57b119da3594 ("KVM: nVMX: add KVM_CAP_HYPERV_ENLIGHTENED_VMCS capability")
Reported-by: syzbot+cfbc368e283d381f8cef@syzkaller.appspotmail.com
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
There is a race condition when accessing kvm->arch.apic_access_page_done.
Due to it, x86_set_memory_region will fail when creating the second vcpu
for a svm guest.
Add a mutex_lock to serialize the accesses to apic_access_page_done.
This lock is also used by vmx for the same purpose.
Signed-off-by: Wei Wang <wawei@amazon.de>
Signed-off-by: Amadeusz Juskowiak <ajusk@amazon.de>
Signed-off-by: Julian Stecklina <jsteckli@amazon.de>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Reported by syzkaller:
BUG: unable to handle kernel NULL pointer dereference at 00000000000001c8
PGD 80000003ec4da067 P4D 80000003ec4da067 PUD 3f7bfa067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 7 PID: 5059 Comm: debug Tainted: G OE 4.19.0-rc5 #16
RIP: 0010:__lock_acquire+0x1a6/0x1990
Call Trace:
lock_acquire+0xdb/0x210
_raw_spin_lock+0x38/0x70
kvm_ioapic_scan_entry+0x3e/0x110 [kvm]
vcpu_enter_guest+0x167e/0x1910 [kvm]
kvm_arch_vcpu_ioctl_run+0x35c/0x610 [kvm]
kvm_vcpu_ioctl+0x3e9/0x6d0 [kvm]
do_vfs_ioctl+0xa5/0x690
ksys_ioctl+0x6d/0x80
__x64_sys_ioctl+0x1a/0x20
do_syscall_64+0x83/0x6e0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
The reason is that the testcase writes hyperv synic HV_X64_MSR_SINT6 msr
and triggers scan ioapic logic to load synic vectors into EOI exit bitmap.
However, irqchip is not initialized by this simple testcase, ioapic/apic
objects should not be accessed.
This can be triggered by the following program:
#define _GNU_SOURCE
#include <endian.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
uint64_t r[3] = {0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff};
int main(void)
{
syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
long res = 0;
memcpy((void*)0x20000040, "/dev/kvm", 9);
res = syscall(__NR_openat, 0xffffffffffffff9c, 0x20000040, 0, 0);
if (res != -1)
r[0] = res;
res = syscall(__NR_ioctl, r[0], 0xae01, 0);
if (res != -1)
r[1] = res;
res = syscall(__NR_ioctl, r[1], 0xae41, 0);
if (res != -1)
r[2] = res;
memcpy(
(void*)0x20000080,
"\x01\x00\x00\x00\x00\x5b\x61\xbb\x96\x00\x00\x40\x00\x00\x00\x00\x01\x00"
"\x08\x00\x00\x00\x00\x00\x0b\x77\xd1\x78\x4d\xd8\x3a\xed\xb1\x5c\x2e\x43"
"\xaa\x43\x39\xd6\xff\xf5\xf0\xa8\x98\xf2\x3e\x37\x29\x89\xde\x88\xc6\x33"
"\xfc\x2a\xdb\xb7\xe1\x4c\xac\x28\x61\x7b\x9c\xa9\xbc\x0d\xa0\x63\xfe\xfe"
"\xe8\x75\xde\xdd\x19\x38\xdc\x34\xf5\xec\x05\xfd\xeb\x5d\xed\x2e\xaf\x22"
"\xfa\xab\xb7\xe4\x42\x67\xd0\xaf\x06\x1c\x6a\x35\x67\x10\x55\xcb",
106);
syscall(__NR_ioctl, r[2], 0x4008ae89, 0x20000080);
syscall(__NR_ioctl, r[2], 0xae80, 0);
return 0;
}
This patch fixes it by bailing out scan ioapic if ioapic is not initialized in
kernel.
Reported-by: Wei Wu <ww9210@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Wei Wu <ww9210@gmail.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Reported by syzkaller:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
PGD 800000040410c067 P4D 800000040410c067 PUD 40410d067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 3 PID: 2567 Comm: poc Tainted: G OE 4.19.0-rc5 #16
RIP: 0010:kvm_pv_send_ipi+0x94/0x350 [kvm]
Call Trace:
kvm_emulate_hypercall+0x3cc/0x700 [kvm]
handle_vmcall+0xe/0x10 [kvm_intel]
vmx_handle_exit+0xc1/0x11b0 [kvm_intel]
vcpu_enter_guest+0x9fb/0x1910 [kvm]
kvm_arch_vcpu_ioctl_run+0x35c/0x610 [kvm]
kvm_vcpu_ioctl+0x3e9/0x6d0 [kvm]
do_vfs_ioctl+0xa5/0x690
ksys_ioctl+0x6d/0x80
__x64_sys_ioctl+0x1a/0x20
do_syscall_64+0x83/0x6e0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
The reason is that the apic map has not yet been initialized, the testcase
triggers pv_send_ipi interface by vmcall which results in kvm->arch.apic_map
is dereferenced. This patch fixes it by checking whether or not apic map is
NULL and bailing out immediately if that is the case.
Fixes: 4180bf1b65 (KVM: X86: Implement "send IPI" hypercall)
Reported-by: Wei Wu <ww9210@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Wei Wu <ww9210@gmail.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Apparently, the ple_gap parameter was accidentally removed
by commit c8e88717cfc6b36bedea22368d97667446318291. Add it
back.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Cc: stable@vger.kernel.org
Fixes: c8e88717cfc6b36bedea22368d97667446318291
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Move all arguments into output registers from input registers.
This path is exercised by test_verifier.c's "calls: two calls with
args" test. Adjust BPF_TAILCALL_PROLOGUE_SKIP as needed.
Let's also make the prologue length a constant size regardless of
the combination of ->saw_frame_pointer and ->saw_tail_call
settings.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Layout of coprocessor registers in the elf_xtregs_t and
xtregs_coprocessor_t may be different due to alignment. Thus it is not
always possible to copy data between the xtregs_coprocessor_t structure
and the elf_xtregs_t and get correct values for all registers.
Use a table of offsets and sizes of individual coprocessor register
groups to do coprocessor context copying in the ptrace_getxregs and
ptrace_setxregs.
This fixes incorrect coprocessor register values reading from the user
process by the native gdb on an xtensa core with multiple coprocessors
and registers with high alignment requirements.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
|
|
Coprocessor context offsets are used by the assembly code that moves
coprocessor context between the individual fields of the
thread_info::xtregs_cp structure and coprocessor registers.
This fixes coprocessor context clobbering on flushing and reloading
during normal user code execution and user process debugging in the
presence of more than one coprocessor in the core configuration.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
|
|
coprocessor_flush_all may be called from a context of a thread that is
different from the thread being flushed. In that case contents of the
cpenable special register may not match ti->cpenable of the target
thread, resulting in unhandled coprocessor exception in the kernel
context.
Set cpenable special register to the ti->cpenable of the target register
for the duration of the flush and restore it afterwards.
This fixes the following crash caused by coprocessor register inspection
in native gdb:
(gdb) p/x $w0
Illegal instruction in kernel: sig: 9 [#1] PREEMPT
Call Trace:
___might_sleep+0x184/0x1a4
__might_sleep+0x41/0xac
exit_signals+0x14/0x218
do_exit+0xc9/0x8b8
die+0x99/0xa0
do_illegal_instruction+0x18/0x6c
common_exception+0x77/0x77
coprocessor_flush+0x16/0x3c
arch_ptrace+0x46c/0x674
sys_ptrace+0x2ce/0x3b4
system_call+0x54/0x80
common_exception+0x77/0x77
note: gdb[100] exited with preempt_count 1
Killed
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
|
|
The numa_slit variable used by node_distance is available to a
module as long as it is linked at compile-time. However, it is
not available to loadable modules. Leading to errors such as:
ERROR: "numa_slit" [drivers/nvme/host/nvme-core.ko] undefined!
The error above is caused by the nvme multipath code that makes
use of node_distance for its path calculation. When the patch was
added, the lightnvm subsystem would select nvme and always compile
it in, leading to the node_distance call to always succeed.
However, when this requirement was removed, nvme could be compiled
in as a module, which exposed this bug.
This patch extracts node_distance to a function and exports it.
Since ACPI is depending on node_distance being a simple lookup to
numa_slit, the previous behavior is exposed as slit_distance and its
users updated.
Fixes: f333444708f82 "nvme: take node locality into account when selecting a path"
Fixes: 73569e11032f "lightnvm: remove dependencies on BLK_DEV_NVME and PCI"
Signed-off-by: Matias Bjøring <mb@lightnvm.io>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Make the high-level BPF JIT entry a general 'catch-all' and add
architecture specific entries to make it more clear who actively
maintains which BPF JIT compiler. The list (L) address implies
that this eventually lands in the bpf patchwork bucket. Goal is
that this set of responsible developers listed here is always up
to date and a point of contact for helping out in e.g. feature
development, fixes, review or testing patches in order to help
long-term in ensuring quality of the BPF JITs and therefore BPF
core under a given architecture. Every new JIT in future /must/
have an entry here as well.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Sandipan Das <sandipan@linux.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Zi Shen Lim <zlim.lnx@gmail.com>
Acked-by: Paul Burton <paul.burton@mips.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Wang YanQing <udknight@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
We need to initialize the frame pointer register not just if it is
seen as a source operand, but also if it is seen as the destination
operand of a store or an atomic instruction (which effectively is a
source operand).
This is exercised by test_verifier's "non-invalid fp arithmetic"
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
On T4 and later sparc64 cpus we can use the fused compare and branch
instruction.
However, it can only be used if the branch destination is in the range
of a signed 10-bit immediate offset. This amounts to 1024
instructions forwards or backwards.
After the commit referenced in the Fixes: tag, the largest possible
size program seen by the JIT explodes by a significant factor.
As a result of this convergance takes many more passes since the
expanded "BPF_LDX | BPF_MSH | BPF_B" code sequence, for example,
contains several embedded branch on condition instructions.
On each pass, as suddenly new fused compare and branch instances
become valid, this makes thousands more in range for the next pass.
And so on and so forth.
This is most greatly exemplified by "BPF_MAXINSNS: exec all MSH" which
takes 35 passes to converge, and shrinks the image by about 64K.
To decrease the cost of this number of convergance passes, do the
convergance pass before we have the program image allocated, just like
other JITs (such as x86) do.
Fixes: e0cea7ce988c ("bpf: implement ld_abs/ld_ind in native bpf")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Daniel Borkmann says:
====================
This set contains a fix for arm64 BPF JIT. First patch generalizes
ppc64 way of retrieving subprog into bpf_jit_get_func_addr() as core
code and uses the same on arm64 in second patch. Tested on both arm64
and ppc64.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The arm64 JIT has the same issue as ppc64 JIT in that the relative BPF
to BPF call offset can be too far away from core kernel in that relative
encoding into imm is not sufficient and could potentially be truncated,
see also fd045f6cd98e ("arm64: add support for module PLTs") which adds
spill-over space for module_alloc() and therefore bpf_jit_binary_alloc().
Therefore, use the recently added bpf_jit_get_func_addr() helper for
properly fetching the address through prog->aux->func[off]->bpf_func
instead. This also has the benefit to optimize normal helper calls since
their address can use the optimized emission. Tested on Cavium ThunderX
CN8890.
Fixes: db496944fdaa ("bpf: arm64: add JIT support for multi-function programs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Make fetching of the BPF call address from ppc64 JIT generic. ppc64
was using a slightly different variant rather than through the insns'
imm field encoding as the target address would not fit into that space.
Therefore, the target subprog number was encoded into the insns' offset
and fetched through fp->aux->func[off]->bpf_func instead. Given there
are other JITs with this issue and the mechanism of fetching the address
is JIT-generic, move it into the core as a helper instead. On the JIT
side, we get information on whether the retrieved address is a fixed
one, that is, not changing through JIT passes, or a dynamic one. For
the former, JITs can optimize their imm emission because this doesn't
change jump offsets throughout JIT process.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
Tested-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
All lists that reach the tree_nodes_free() function have both zero
counter and true dead flag. The reason for this is that lists to be
release are selected by nf_conncount_gc_list() which already decrements
the list counter and sets on the dead flag. Therefore, this if statement
in tree_nodes_free() is unnecessary and wrong.
Fixes: 31568ec09ea0 ("netfilter: nf_conncount: fix list_del corruption in conn_free")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
There is a reference counter to ensure that masquerade modules register
notifiers only once. However, the existing reference counter approach is
not safe, test commands are:
while :
do
modprobe ip6t_MASQUERADE &
modprobe nft_masq_ipv6 &
modprobe -rv ip6t_MASQUERADE &
modprobe -rv nft_masq_ipv6 &
done
numbers below represent the reference counter.
--------------------------------------------------------
CPU0 CPU1 CPU2 CPU3 CPU4
[insmod] [insmod] [rmmod] [rmmod] [insmod]
--------------------------------------------------------
0->1
register 1->2
returns 2->1
returns 1->0
0->1
register <--
unregister
--------------------------------------------------------
The unregistation of CPU3 should be processed before the
registration of CPU4.
In order to fix this, use a mutex instead of reference counter.
splat looks like:
[ 323.869557] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:1381]
[ 323.869574] Modules linked in: nf_tables(+) nf_nat_ipv6(-) nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 n]
[ 323.869574] irq event stamp: 194074
[ 323.898930] hardirqs last enabled at (194073): [<ffffffff90004a0d>] trace_hardirqs_on_thunk+0x1a/0x1c
[ 323.898930] hardirqs last disabled at (194074): [<ffffffff90004a29>] trace_hardirqs_off_thunk+0x1a/0x1c
[ 323.898930] softirqs last enabled at (182132): [<ffffffff922006ec>] __do_softirq+0x6ec/0xa3b
[ 323.898930] softirqs last disabled at (182109): [<ffffffff90193426>] irq_exit+0x1a6/0x1e0
[ 323.898930] CPU: 0 PID: 1381 Comm: modprobe Not tainted 4.20.0-rc2+ #27
[ 323.898930] RIP: 0010:raw_notifier_chain_register+0xea/0x240
[ 323.898930] Code: 3c 03 0f 8e f2 00 00 00 44 3b 6b 10 7f 4d 49 bc 00 00 00 00 00 fc ff df eb 22 48 8d 7b 10 488
[ 323.898930] RSP: 0018:ffff888101597218 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
[ 323.898930] RAX: 0000000000000000 RBX: ffffffffc04361c0 RCX: 0000000000000000
[ 323.898930] RDX: 1ffffffff26132ae RSI: ffffffffc04aa3c0 RDI: ffffffffc04361d0
[ 323.898930] RBP: ffffffffc04361c8 R08: 0000000000000000 R09: 0000000000000001
[ 323.898930] R10: ffff8881015972b0 R11: fffffbfff26132c4 R12: dffffc0000000000
[ 323.898930] R13: 0000000000000000 R14: 1ffff110202b2e44 R15: ffffffffc04aa3c0
[ 323.898930] FS: 00007f813ed41540(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000
[ 323.898930] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 323.898930] CR2: 0000559bf2c9f120 CR3: 000000010bc80000 CR4: 00000000001006f0
[ 323.898930] Call Trace:
[ 323.898930] ? atomic_notifier_chain_register+0x2d0/0x2d0
[ 323.898930] ? down_read+0x150/0x150
[ 323.898930] ? sched_clock_cpu+0x126/0x170
[ 323.898930] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
[ 323.898930] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
[ 323.898930] register_netdevice_notifier+0xbb/0x790
[ 323.898930] ? __dev_close_many+0x2d0/0x2d0
[ 323.898930] ? __mutex_unlock_slowpath+0x17f/0x740
[ 323.898930] ? wait_for_completion+0x710/0x710
[ 323.898930] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
[ 323.898930] ? up_write+0x6c/0x210
[ 323.898930] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
[ 324.127073] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
[ 324.127073] nft_chain_filter_init+0x1e/0xe8a [nf_tables]
[ 324.127073] nf_tables_module_init+0x37/0x92 [nf_tables]
[ ... ]
Fixes: 8dd33cc93ec9 ("netfilter: nf_nat: generalize IPv4 masquerading support for nf_tables")
Fixes: be6b635cd674 ("netfilter: nf_nat: generalize IPv6 masquerading support for nf_tables")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
register_{netdevice/inetaddr/inet6addr}_notifier may return an error
value, this patch adds the code to handle these error paths.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
When ip6_route_me_harder is invoked, it resets outgoing interface of:
- link-local scoped packets sent by neighbor discovery
- multicast packets sent by MLD host
- multicast packets send by MLD proxy daemon that sets outgoing
interface through IPV6_PKTINFO ipi6_ifindex
Link-local and multicast packets must keep their original oif after
ip6_route_me_harder is called.
Signed-off-by: Alin Nastac <alin.nastac@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Pull hwmon fixes from Guenter Roeck:
- fix temp4_type attribute permissions in w83795 driver
- fix tacho fault detection in mlxreg-fan driver
- fix current value calculations in ina2xx driver
- fix initial notification/warning in raspberrypi driver
- fix a NULL pointer access in ina2xx
* tag 'hwmon-for-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (w83795) temp4_type has writable permission
hwmon: (mlxreg-fan) Fix macros for tacho fault reading
hwmon: (ina2xx) Fix current value calculation
hwmon: (raspberrypi) Fix initial notify
hwmon (ina2xx) Fix NULL id pointer in probe()
|
|
syzbot was able to trigger the WARN in cttimeout_default_get() by
passing UDPLITE as l4protocol. Alias UDPLITE to UDP, both use
same timeout values.
Furthermore, also fetch GRE timeouts. GRE is a bit more complicated,
as it still can be a module and its netns_proto_gre struct layout isn't
visible outside of the gre module. Can't move timeouts around, it
appears conntrack sysctl unregister assumes net_generic() returns
nf_proto_net, so we get crash. Expose layout of netns_proto_gre instead.
A followup nf-next patch could make gre tracker be built-in as well
if needed, its not that large.
Last, make the WARN() mention the missing protocol value in case
anything else is missing.
Reported-by: syzbot+2fae8fa157dd92618cae@syzkaller.appspotmail.com
Fixes: 8866df9264a3 ("netfilter: nfnetlink_cttimeout: pass default timeout policy to obj_to_nlattr")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
ip_vs_dst_event is supposed to clean up all dst used in ipvs'
destinations when a net dev is going down. But it works only
when the dst's dev is the same as the dev from the event.
Now with the same priority but late registration,
ip_vs_dst_notifier is always called later than ipv6_dev_notf
where the dst's dev is set to lo for NETDEV_DOWN event.
As the dst's dev lo is not the same as the dev from the event
in ip_vs_dst_event, ip_vs_dst_notifier doesn't actually work.
Also as these dst have to wait for dest_trash_timer to clean
them up. It would cause some non-permanent kernel warnings:
unregister_netdevice: waiting for br0 to become free. Usage count = 3
To fix it, call ip_vs_dst_notifier earlier than ipv6_dev_notf
by increasing its priority to ADDRCONF_NOTIFY_PRIORITY + 5.
Note that for ipv4 route fib_netdev_notifier doesn't set dst's
dev to lo in NETDEV_DOWN event, so this fix is only needed when
IP_VS_IPV6 is defined.
Fixes: 7a4f0761fce3 ("IPVS: init and cleanup restructuring")
Reported-by: Li Shuang <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2018-11-25
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix an off-by-one bug when adjusting subprog start offsets after
patching, from Edward.
2) Fix several bugs such as overflow in size allocation in queue /
stack map creation, from Alexei.
3) Fix wrong IPv6 destination port byte order in bpf_sk_lookup_udp
helper, from Andrey.
4) Fix several bugs in bpftool such as preventing an infinite loop
in get_fdinfo, error handling and man page references, from Quentin.
5) Fix a warning in bpf_trace_printk() that wasn't catching an
invalid format string, from Martynas.
6) Fix a bug in BPF cgroup local storage where non-atomic allocation
was used in atomic context, from Roman.
7) Fix a NULL pointer dereference bug in bpftool from reallocarray()
error handling, from Jakub and Wen.
8) Add a copy of pkt_cls.h and tc_bpf.h uapi headers to the tools
include infrastructure so that bpftool compiles on older RHEL7-like
user space which does not ship these headers, from Yonghong.
9) Fix BPF kselftests for user space where to get ping test working
with ping6 and ping -6, from Li.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
PPC KVM fixes for 4.20
This has a single 1-line patch which fixes a bug in the recently-merged
nested HV KVM support.
|
|
Pull dma-mapping fixes from Christoph Hellwig:
"Two dma-direct / swiotlb regressions fixes:
- zero is a valid physical address on some arm boards, we can't use
it as the error value
- don't try to cache flush the error return value (no matter what it
is)"
* tag 'dma-mapping-4.20-3' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: Skip cache maintenance on map error
dma-direct: Make DIRECT_MAPPING_ERROR viable for SWIOTLB
|
|
Pull NFS client bugfixes from Trond Myklebust:
- Fix a NFSv4 state manager deadlock when returning a delegation
- NFSv4.2 copy do not allocate memory under the lock
- flexfiles: Use the correct stateid for IO in the tightly coupled case
* tag 'nfs-for-4.20-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
flexfiles: use per-mirror specified stateid for IO
NFSv4.2 copy do not allocate memory under the lock
NFSv4: Fix a NFSv4 state manager deadlock
|
|
I'm taking over the maintainance of Sparse so add myself as
maintainer and move Christopher's info to CREDITS.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull XArray updates from Matthew Wilcox:
"We found some bugs in the DAX conversion to XArray (and one bug which
predated the XArray conversion). There were a couple of bugs in some
of the higher-level functions, which aren't actually being called in
today's kernel, but surfaced as a result of converting existing radix
tree & IDR users over to the XArray.
Some of the other changes to how the higher-level APIs work were also
motivated by converting various users; again, they're not in use in
today's kernel, so changing them has a low probability of introducing
a bug.
Dan can still trigger a bug in the DAX code with hot-offline/online,
and we're working on tracking that down"
* tag 'xarray-4.20-rc4' of git://git.infradead.org/users/willy/linux-dax:
XArray tests: Add missing locking
dax: Avoid losing wakeup in dax_lock_mapping_entry
dax: Fix huge page faults
dax: Fix dax_unlock_mapping_entry for PMD pages
dax: Reinstate RCU protection of inode
dax: Make sure the unlocking entry isn't locked
dax: Remove optimisation from dax_lock_mapping_entry
XArray tests: Correct some 64-bit assumptions
XArray: Correct xa_store_range
XArray: Fix Documentation
XArray: Handle NULL pointers differently for allocation
XArray: Unify xa_store and __xa_store
XArray: Add xa_store_bh() and xa_store_irq()
XArray: Turn xa_erase into an exported function
XArray: Unify xa_cmpxchg and __xa_cmpxchg
XArray: Regularise xa_reserve
nilfs2: Use xa_erase_irq
XArray: Export __xa_foo to non-GPL modules
XArray: Fix xa_for_each with a single element at 0
|
|
In ip packet generation, pagedlen is initialized for each skb at the
start of the loop in __ip(6)_append_data, before label alloc_new_skb.
Depending on compiler options, code can be generated that jumps to
this label, triggering use of an an uninitialized variable.
In practice, at -O2, the generated code moves the initialization below
the label. But the code should not rely on that for correctness.
Fixes: 15e36f5b8e98 ("udp: paged allocation with gso")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a qdisc setup including pacing FQ is dismantled and recreated,
some TCP packets are sent earlier than instructed by TCP stack.
TCP can be fooled when ACK comes back, because the following
operation can return a negative value.
tcp_time_stamp(tp) - tp->rx_opt.rcv_tsecr;
Some paths in TCP stack were not dealing properly with this,
this patch addresses four of them.
Fixes: ab408b6dc744 ("tcp: switch tcp and sch_fq to new earliest departure time model")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull HID fixes from Jiri Kosina:
- revert of the high-resolution scrolling feature, as it breaks certain
hardware due to incompatibilities between Logitech and Microsoft
worlds. Peter Hutterer is working on a fixed implementation. Until
that is finished, revert by Benjamin Tissoires.
- revert of incorrect strncpy->strlcpy conversion in uhid, from David
Herrmann
- fix for buggy sendfile() implementation on uhid device node, from
Eric Biggers
- a few assorted device-ID specific quirks
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
Revert "Input: Add the `REL_WHEEL_HI_RES` event code"
Revert "HID: input: Create a utility class for counting scroll events"
Revert "HID: logitech: Add function to enable HID++ 1.0 "scrolling acceleration""
Revert "HID: logitech: Enable high-resolution scrolling on Logitech mice"
Revert "HID: logitech: Use LDJ_DEVICE macro for existing Logitech mice"
Revert "HID: logitech: fix a used uninitialized GCC warning"
Revert "HID: input: simplify/fix high-res scroll event handling"
HID: Add quirk for Primax PIXART OEM mice
HID: i2c-hid: Disable runtime PM for LG touchscreen
HID: multitouch: Add pointstick support for Cirque Touchpad
HID: steam: remove input device when a hid client is running.
Revert "HID: uhid: use strlcpy() instead of strncpy()"
HID: uhid: forbid UHID_CREATE under KERNEL_DS or elevated privileges
HID: input: Ignore battery reported by Symbol DS4308
HID: Add quirk for Microsoft PIXART OEM mouse
|
|
Pull arm64 fixes from Catalin Marinas::
- Fix wrong conflict resolution around CONFIG_ARM64_SSBD
- Fix sparse warning on unsigned long constant
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: cpufeature: Fix mismerge of CONFIG_ARM64_SSBD block
arm64: sysreg: fix sparse warnings
|
|
Pull networking fixes from David Miller:
1) Need to take mutex in ath9k_add_interface(), from Dan Carpenter.
2) Fix mt76 build without CONFIG_LEDS_CLASS, from Arnd Bergmann.
3) Fix socket wmem accounting in SCTP, from Xin Long.
4) Fix failed resume crash in ena driver, from Arthur Kiyanovski.
5) qed driver passes bytes instead of bits into second arg of
bitmap_weight(). From Denis Bolotin.
6) Fix reset deadlock in ibmvnic, from Juliet Kim.
7) skb_scrube_packet() needs to scrub the fwd marks too, from Petr
Machata.
8) Make sure older TCP stacks see enough dup ACKs, and avoid doing SACK
compression during this period, from Eric Dumazet.
9) Add atomicity to SMC protocol cursor handling, from Ursula Braun.
10) Don't leave dangling error pointer if bpf_prog_add() fails in
thunderx driver, from Lorenzo Bianconi. Also, when we unmap TSO
headers, set sq->tso_hdrs to NULL.
11) Fix race condition over state variables in act_police, from Davide
Caratti.
12) Disable guest csum in the presence of XDP in virtio_net, from Jason
Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (64 commits)
net: gemini: Fix copy/paste error
net: phy: mscc: fix deadlock in vsc85xx_default_config
dt-bindings: dsa: Fix typo in "probed"
net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue
net: amd: add missing of_node_put()
team: no need to do team_notify_peers or team_mcast_rejoin when disabling port
virtio-net: fail XDP set if guest csum is negotiated
virtio-net: disable guest csum during XDP set
net/sched: act_police: add missing spinlock initialization
net: don't keep lonely packets forever in the gro hash
net/ipv6: re-do dad when interface has IFF_NOARP flag change
packet: copy user buffers before orphan or clone
ibmvnic: Update driver queues after change in ring size support
ibmvnic: Fix RX queue buffer cleanup
net: thunderx: set xdp_prog to NULL if bpf_prog_add fails
net/dim: Update DIM start sample after each DIM iteration
net: faraday: ftmac100: remove netif_running(netdev) check before disabling interrupts
net/smc: use after free fix in smc_wr_tx_put_slot()
net/smc: atomic SMCD cursor handling
net/smc: add SMC-D shutdown signal
...
|
|
Pull xfs fixes from Darrick Wong:
"Dave and I have continued our work fixing corruption problems that can
be found when running long-term burn-in exercisers on xfs. Here are
some patches fixing most of the problems, but there will likely be
more. :/
- Numerous corruption fixes for copy on write
- Numerous corruption fixes for blocksize < pagesize writes
- Don't miscalculate AG reservations for small final AGs
- Fix page cache truncation to work properly for reflink and extent
shifting
- Fix use-after-free when retrying failed inode/dquot buffer logging
- Fix corruptions seen when using copy_file_range in directio mode"
* tag 'xfs-4.20-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: readpages doesn't zero page tail beyond EOF
vfs: vfs_dedupe_file_range() doesn't return EOPNOTSUPP
iomap: dio data corruption and spurious errors when pipes fill
iomap: sub-block dio needs to zeroout beyond EOF
iomap: FUA is wrong for DIO O_DSYNC writes into unwritten extents
xfs: delalloc -> unwritten COW fork allocation can go wrong
xfs: flush removing page cache in xfs_reflink_remap_prep
xfs: extent shifting doesn't fully invalidate page cache
xfs: finobt AG reserves don't consider last AG can be a runt
xfs: fix transient reference count error in xfs_buf_resubmit_failed_buffers
xfs: uncached buffer tracing needs to print bno
xfs: make xfs_file_remap_range() static
xfs: fix shared extent data corruption due to missing cow reservation
|
|
The TX stats should be started with the tx_stats_syncp,
there seems to be a copy/paste error in the driver.
Signed-off-by: Andreas Fiedler <andreas.fiedler@gmx.net>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The vsc85xx_default_config function called in the vsc85xx_config_init
function which is used by VSC8530, VSC8531, VSC8540 and VSC8541 PHYs
mistakenly calls phy_read and phy_write in-between phy_select_page and
phy_restore_page.
phy_select_page and phy_restore_page actually take and release the MDIO
bus lock and phy_write and phy_read take and release the lock to write
or read to a PHY register.
Let's fix this deadlock by using phy_modify_paged which handles
correctly a read followed by a write in a non-standard page.
Fixes: 6a0bfbbe20b0 ("net: phy: mscc: migrate to phy_select/restore_page functions")
Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The correct form is "can be probed", so fix the typo.
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|