aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2021-08-13KVM: x86: Allow guest to set EFER.NX=1 on non-PAE 32-bit kernelsSean Christopherson1-27/+1
Remove an ancient restriction that disallowed exposing EFER.NX to the guest if EFER.NX=0 on the host, even if NX is fully supported by the CPU. The motivation of the check, added by commit 2cc51560aed0 ("KVM: VMX: Avoid saving and restoring msr_efer on lightweight vmexit"), was to rule out the case of host.EFER.NX=0 and guest.EFER.NX=1 so that KVM could run the guest with the host's EFER.NX and thus avoid context switching EFER if the only divergence was the NX bit. Fast forward to today, and KVM has long since stopped running the guest with the host's EFER.NX. Not only does KVM context switch EFER if host.EFER.NX=1 && guest.EFER.NX=0, KVM also forces host.EFER.NX=0 && guest.EFER.NX=1 when using shadow paging (to emulate SMEP). Furthermore, the entire motivation for the restriction was made obsolete over a decade ago when Intel added dedicated host and guest EFER fields in the VMCS (Nehalem timeframe), which reduced the overhead of context switching EFER from 400+ cycles (2 * WRMSR + 1 * RDMSR) to a mere ~2 cycles. In practice, the removed restriction only affects non-PAE 32-bit kernels, as EFER.NX is set during boot if NX is supported and the kernel will use PAE paging (32-bit or 64-bit), regardless of whether or not the kernel will actually use NX itself (mark PTEs non-executable). Alternatively and/or complementarily, startup_32_smp() in head_32.S could be modified to set EFER.NX=1 regardless of paging mode, thus eliminating the scenario where NX is supported but not enabled. However, that runs the risk of breaking non-KVM non-PAE kernels (though the risk is very, very low as there are no known EFER.NX errata), and also eliminates an easy-to-use mechanism for stressing KVM's handling of guest vs. host EFER across nested virtualization transitions. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210805183804.1221554-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-10KVM: VMX: Use current VMCS to query WAITPKG support for MSR emulationSean Christopherson1-1/+1
Use the secondary_exec_controls_get() accessor in vmx_has_waitpkg() to effectively get the controls for the current VMCS, as opposed to using vmx->secondary_exec_controls, which is the cached value of KVM's desired controls for vmcs01 and truly not reflective of any particular VMCS. While the waitpkg control is not dynamic, i.e. vmcs01 will always hold the same waitpkg configuration as vmx->secondary_exec_controls, the same does not hold true for vmcs02 if the L1 VMM hides the feature from L2. If L1 hides the feature _and_ does not intercept MSR_IA32_UMWAIT_CONTROL, L2 could incorrectly read/write L1's virtual MSR instead of taking a #GP. Fixes: 6e3ba4abcea5 ("KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210810171952.2758100-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-05KVM: x86/mmu: Fix per-cpu counter corruption on 32-bit buildsSean Christopherson1-1/+1
Take a signed 'long' instead of an 'unsigned long' for the number of pages to add/subtract to the total number of pages used by the MMU. This fixes a zero-extension bug on 32-bit kernels that effectively corrupts the per-cpu counter used by the shrinker. Per-cpu counters take a signed 64-bit value on both 32-bit and 64-bit kernels, whereas kvm_mod_used_mmu_pages() takes an unsigned long and thus an unsigned 32-bit value on 32-bit kernels. As a result, the value used to adjust the per-cpu counter is zero-extended (unsigned -> signed), not sign-extended (signed -> signed), and so KVM's intended -1 gets morphed to 4294967295 and effectively corrupts the counter. This was found by a staggering amount of sheer dumb luck when running kvm-unit-tests on a 32-bit KVM build. The shrinker just happened to kick in while running tests and do_shrink_slab() logged an error about trying to free a negative number of objects. The truly lucky part is that the kernel just happened to be a slightly stale build, as the shrinker no longer yells about negative objects as of commit 18bb473e5031 ("mm: vmscan: shrink deferred objects proportional to priority"). vmscan: shrink_slab: mmu_shrink_scan+0x0/0x210 [kvm] negative objects to delete nr=-858993460 Fixes: bc8a3d8925a8 ("kvm: mmu: Fix overflow on kvm mmu page limit calculation") Cc: stable@vger.kernel.org Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210804214609.1096003-1-seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-04KVM: selftests: fix hyperv_clock testMaxim Levitsky1-1/+1
The test was mistakenly using addr_gpa2hva on a gva and that happened to work accidentally. Commit 106a2e766eae ("KVM: selftests: Lower the min virtual address for misc page allocations") revealed this bug. Fixes: 2c7f76b4c42b ("selftests: kvm: Add basic Hyper-V clocksources tests", 2021-03-18) Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210804112057.409498-1-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-04KVM: SVM: improve the code readability for ASID managementMingwei Zhang1-19/+24
KVM SEV code uses bitmaps to manage ASID states. ASID 0 was always skipped because it is never used by VM. Thus, in existing code, ASID value and its bitmap postion always has an 'offset-by-1' relationship. Both SEV and SEV-ES shares the ASID space, thus KVM uses a dynamic range [min_asid, max_asid] to handle SEV and SEV-ES ASIDs separately. Existing code mixes the usage of ASID value and its bitmap position by using the same variable called 'min_asid'. Fix the min_asid usage: ensure that its usage is consistent with its name; allocate extra size for ASID 0 to ensure that each ASID has the same value with its bitmap position. Add comments on ASID bitmap allocation to clarify the size change. Signed-off-by: Mingwei Zhang <mizhang@google.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Marc Orr <marcorr@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Alper Gun <alpergun@google.com> Cc: Dionna Glaze <dionnaglaze@google.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Vipin Sharma <vipinsh@google.com> Cc: Peter Gonda <pgonda@google.com> Cc: Joerg Roedel <joro@8bytes.org> Message-Id: <20210802180903.159381-1-mizhang@google.com> [Fix up sev_asid_free to also index by ASID, as suggested by Sean Christopherson, and use nr_asids in sev_cpu_init. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-04KVM: SVM: Fix off-by-one indexing when nullifying last used SEV VMCBSean Christopherson1-1/+1
Use the raw ASID, not ASID-1, when nullifying the last used VMCB when freeing an SEV ASID. The consumer, pre_sev_run(), indexes the array by the raw ASID, thus KVM could get a false negative when checking for a different VMCB if KVM manages to reallocate the same ASID+VMCB combo for a new VM. Note, this cannot cause a functional issue _in the current code_, as pre_sev_run() also checks which pCPU last did VMRUN for the vCPU, and last_vmentry_cpu is initialized to -1 during vCPU creation, i.e. is guaranteed to mismatch on the first VMRUN. However, prior to commit 8a14fe4f0c54 ("kvm: x86: Move last_cpu into kvm_vcpu_arch as last_vmentry_cpu"), SVM tracked pCPU on its own and zero-initialized the last_cpu variable. Thus it's theoretically possible that older versions of KVM could miss a TLB flush if the first VMRUN is on pCPU0 and the ASID and VMCB exactly match those of a prior VM. Fixes: 70cd94e60c73 ("KVM: SVM: VMRUN should use associated ASID when SEV is enabled") Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-04KVM: Do not leak memory for duplicate debugfs directoriesPaolo Bonzini1-2/+16
KVM creates a debugfs directory for each VM in order to store statistics about the virtual machine. The directory name is built from the process pid and a VM fd. While generally unique, it is possible to keep a file descriptor alive in a way that causes duplicate directories, which manifests as these messages: [ 471.846235] debugfs: Directory '20245-4' with parent 'kvm' already present! Even though this should not happen in practice, it is more or less expected in the case of KVM for testcases that call KVM_CREATE_VM and close the resulting file descriptor repeatedly and in parallel. When this happens, debugfs_create_dir() returns an error but kvm_create_vm_debugfs() goes on to allocate stat data structs which are later leaked. The slow memory leak was spotted by syzkaller, where it caused OOM reports. Since the issue only affects debugfs, do a lookup before calling debugfs_create_dir, so that the message is downgraded and rate-limited. While at it, ensure kvm->debugfs_dentry is NULL rather than an error if it is not created. This fixes kvm_destroy_vm_debugfs, which was not checking IS_ERR_OR_NULL correctly. Cc: stable@vger.kernel.org Fixes: 536a6f88c49d ("KVM: Create debugfs dir and stat files for each VM") Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-03KVM: selftests: Test access to XMM fast hypercallsVitaly Kuznetsov2-4/+42
Check that #UD is raised if bit 16 is clear in HYPERV_CPUID_FEATURES.EDX and an 'XMM fast' hypercall is issued. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Siddharth Chandrasekaran <sidcha@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210730122625.112848-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-03KVM: x86: hyper-v: Check if guest is allowed to use XMM registers for hypercall inputVitaly Kuznetsov1-2/+11
TLFS states that "Availability of the XMM fast hypercall interface is indicated via the “Hypervisor Feature Identification” CPUID Leaf (0x40000003, see section 2.4.4) ... Any attempt to use this interface when the hypervisor does not indicate availability will result in a #UD fault." Implement the check for 'strict' mode (KVM_CAP_HYPERV_ENFORCE_CPUID). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Siddharth Chandrasekaran <sidcha@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210730122625.112848-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-03KVM: x86: Introduce trace_kvm_hv_hypercall_done()Vitaly Kuznetsov2-0/+16
Hypercall failures are unusual with potentially far going consequences so it would be useful to see their results when tracing. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Siddharth Chandrasekaran <sidcha@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210730122625.112848-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-03KVM: x86: hyper-v: Check access to hypercall before reading XMM registersVitaly Kuznetsov1-3/+3
In case guest doesn't have access to the particular hypercall we can avoid reading XMM registers. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Siddharth Chandrasekaran <sidcha@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210730122625.112848-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-30KVM: x86: accept userspace interrupt only if no event is injectedPaolo Bonzini1-2/+11
Once an exception has been injected, any side effects related to the exception (such as setting CR2 or DR6) have been taked place. Therefore, once KVM sets the VM-entry interruption information field or the AMD EVENTINJ field, the next VM-entry must deliver that exception. Pending interrupts are processed after injected exceptions, so in theory it would not be a problem to use KVM_INTERRUPT when an injected exception is present. However, DOSEMU is using run->ready_for_interrupt_injection to detect interrupt windows and then using KVM_SET_SREGS/KVM_SET_REGS to inject the interrupt manually. For this to work, the interrupt window must be delayed after the completion of the previous event injection. Cc: stable@vger.kernel.org Reported-by: Stas Sergeev <stsp2@yandex.ru> Tested-by: Stas Sergeev <stsp2@yandex.ru> Fixes: 71cc849b7093 ("KVM: x86: Fix split-irqchip vs interrupt injection window request") Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-27KVM: add missing compat KVM_CLEAR_DIRTY_LOGPaolo Bonzini1-0/+28
The arguments to the KVM_CLEAR_DIRTY_LOG ioctl include a pointer, therefore it needs a compat ioctl implementation. Otherwise, 32-bit userspace fails to invoke it on 64-bit kernels; for x86 it might work fine by chance if the padding is zero, but not on big-endian architectures. Reported-by: Thomas Sattler Cc: stable@vger.kernel.org Fixes: 2a31b9db1535 ("kvm: introduce manual dirty log reprotect") Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-27KVM: use cpu_relax when halt pollingLi RongQing1-0/+1
SMT siblings share caches and other hardware, and busy halt polling will degrade its sibling performance if its sibling is working Sean Christopherson suggested as below: "Rather than disallowing halt-polling entirely, on x86 it should be sufficient to simply have the hardware thread yield to its sibling(s) via PAUSE. It probably won't get back all performance, but I would expect it to be close. This compiles on all KVM architectures, and AFAICT the intended usage of cpu_relax() is identical for all architectures." Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Message-Id: <20210727111247.55510-1-lirongqing@baidu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-27KVM: SVM: use vmcb01 in svm_refresh_apicv_exec_ctrlMaxim Levitsky1-1/+1
Currently when SVM is enabled in guest CPUID, AVIC is inhibited as soon as the guest CPUID is set. AVIC happens to be fully disabled on all vCPUs by the time any guest entry starts (if after migration the entry can be nested). The reason is that currently we disable avic right away on vCPU from which the kvm_request_apicv_update was called and for this case, it happens to be called on all vCPUs (by svm_vcpu_after_set_cpuid). After we stop doing this, AVIC will end up being disabled only when KVM_REQ_APICV_UPDATE is processed which is after we done switching to the nested guest. Fix this by just using vmcb01 in svm_refresh_apicv_exec_ctrl for avic (which is a right thing to do anyway). Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210713142023.106183-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-27KVM: SVM: tweak warning about enabled AVIC on nested entryMaxim Levitsky1-1/+1
It is possible that AVIC was requested to be disabled but not yet disabled, e.g if the nested entry is done right after svm_vcpu_after_set_cpuid. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210713142023.106183-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-27KVM: SVM: svm_set_vintr don't warn if AVIC is active but is about to be deactivatedMaxim Levitsky1-2/+5
It is possible for AVIC inhibit and AVIC active state to be mismatched. Currently we disable AVIC right away on vCPU which started the AVIC inhibit request thus this warning doesn't trigger but at least in theory, if svm_set_vintr is called at the same time on multiple vCPUs, the warning can happen. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210713142023.106183-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-27KVM: s390: restore old debugfs namesChristian Borntraeger3-27/+27
commit bc9e9e672df9 ("KVM: debugfs: Reuse binary stats descriptors") did replace the old definitions with the binary ones. While doing that it missed that some files are names different than the counters. This is especially important for kvm_stat which does have special handling for counters named instruction_*. Fixes: commit bc9e9e672df9 ("KVM: debugfs: Reuse binary stats descriptors") CC: Jing Zhang <jingzhangos@google.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Message-Id: <20210726150108.5603-1-borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-27KVM: SVM: delay svm_vcpu_init_msrpm after svm->vmcb is initializedPaolo Bonzini2-3/+3
Right now, svm_hv_vmcb_dirty_nested_enlightenments has an incorrect dereference of vmcb->control.reserved_sw before the vmcb is checked for being non-NULL. The compiler is usually sinking the dereference after the check; instead of doing this ourselves in the source, ensure that svm_hv_vmcb_dirty_nested_enlightenments is only called with a non-NULL VMCB. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Vineeth Pillai <viremana@linux.microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [Untested for now due to issues with my AMD machine. - Paolo]
2021-07-27KVM: selftests: Introduce access_tracking_perf_testDavid Matlack3-0/+431
This test measures the performance effects of KVM's access tracking. Access tracking is driven by the MMU notifiers test_young, clear_young, and clear_flush_young. These notifiers do not have a direct userspace API, however the clear_young notifier can be triggered by marking a pages as idle in /sys/kernel/mm/page_idle/bitmap. This test leverages that mechanism to enable access tracking on guest memory. To measure performance this test runs a VM with a configurable number of vCPUs that each touch every page in disjoint regions of memory. Performance is measured in the time it takes all vCPUs to finish touching their predefined region. Example invocation: $ ./access_tracking_perf_test -v 8 Testing guest mode: PA-bits:ANY, VA-bits:48, 4K pages guest physical test memory offset: 0xffdfffff000 Populating memory : 1.337752570s Writing to populated memory : 0.010177640s Reading from populated memory : 0.009548239s Mark memory idle : 23.973131748s Writing to idle memory : 0.063584496s Mark memory idle : 24.924652964s Reading from idle memory : 0.062042814s Breaking down the results: * "Populating memory": The time it takes for all vCPUs to perform the first write to every page in their region. * "Writing to populated memory" / "Reading from populated memory": The time it takes for all vCPUs to write and read to every page in their region after it has been populated. This serves as a control for the later results. * "Mark memory idle": The time it takes for every vCPU to mark every page in their region as idle through page_idle. * "Writing to idle memory" / "Reading from idle memory": The time it takes for all vCPUs to write and read to every page in their region after it has been marked idle. This test should be portable across architectures but it is only enabled for x86_64 since that's all I have tested. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210713220957.3493520-7-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-27KVM: selftests: Fix missing break in dirty_log_perf_test arg parsingDavid Matlack1-0/+1
There is a missing break statement which causes a fallthrough to the next statement where optarg will be null and a segmentation fault will be generated. Fixes: 9e965bb75aae ("KVM: selftests: Add backing src parameter to dirty_log_perf_test") Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210713220957.3493520-6-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-27x86/kvm: fix vcpu-id indexed array sizesJuergen Gross2-3/+3
KVM_MAX_VCPU_ID is the maximum vcpu-id of a guest, and not the number of vcpu-ids. Fix array indexed by vcpu-id to have KVM_MAX_VCPU_ID+1 elements. Note that this is currently no real problem, as KVM_MAX_VCPU_ID is an odd number, resulting in always enough padding being available at the end of those arrays. Nevertheless this should be fixed in order to avoid rare problems in case someone is using an even number for KVM_MAX_VCPU_ID. Signed-off-by: Juergen Gross <jgross@suse.com> Message-Id: <20210701154105.23215-2-jgross@suse.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-26KVM: x86: Check the right feature bit for MSR_KVM_ASYNC_PF_ACK accessVitaly Kuznetsov1-2/+2
MSR_KVM_ASYNC_PF_ACK MSR is part of interrupt based asynchronous page fault interface and not the original (deprecated) KVM_FEATURE_ASYNC_PF. This is stated in Documentation/virt/kvm/msr.rst. Fixes: 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Oliver Upton <oupton@google.com> Message-Id: <20210722123018.260035-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-26docs: virt: kvm: api.rst: replace some charactersMauro Carvalho Chehab1-14/+14
The conversion tools used during DocBook/LaTeX/html/Markdown->ReST conversion and some cut-and-pasted text contain some characters that aren't easily reachable on standard keyboards and/or could cause troubles when parsed by the documentation build system. Replace the occurences of the following characters: - U+00a0 (' '): NO-BREAK SPACE as it can cause lines being truncated on PDF output Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Message-Id: <ff70cb42d63f3a1da66af1b21b8d038418ed5189.1626947264.git.mchehab+huawei@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-26KVM: Documentation: Fix KVM_CAP_ENFORCE_PV_FEATURE_CPUID nameVitaly Kuznetsov1-1/+1
'KVM_CAP_ENFORCE_PV_CPUID' doesn't match the define in include/uapi/linux/kvm.h. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210722092628.236474-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-26KVM: nSVM: Swap the parameter order for svm_copy_vmrun_state()/svm_copy_vmloadsave_state()Vitaly Kuznetsov3-13/+13
Make svm_copy_vmrun_state()/svm_copy_vmloadsave_state() interface match 'memcpy(dest, src)' to avoid any confusion. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210719090322.625277-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-26KVM: nSVM: Rename nested_svm_vmloadsave() to svm_copy_vmloadsave_state()Vitaly Kuznetsov3-5/+6
To match svm_copy_vmrun_state(), rename nested_svm_vmloadsave() to svm_copy_vmloadsave_state(). Opportunistically add missing braces to 'else' branch in vmload_vmsave_interception(). No functional change intended. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210716144104.465269-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-18Linux 5.14-rc2Linus Torvalds1-1/+1
2021-07-18Documentation: Fix intiramfs script nameRobert Richter2-5/+5
Documentation was not changed when renaming the script in commit 80e715a06c2d ("initramfs: rename gen_initramfs_list.sh to gen_initramfs.sh"). Fixing this. Basically does: $ sed -i -e s/gen_initramfs_list.sh/gen_initramfs.sh/g $(git grep -l gen_initramfs_list.sh) Fixes: 80e715a06c2d ("initramfs: rename gen_initramfs_list.sh to gen_initramfs.sh") Signed-off-by: Robert Richter <rrichter@amd.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-07-18Kbuild: lto: fix module versionings mismatch in GNU make 3.XLecopzer Chen1-1/+1
When building modules(CONFIG_...=m), I found some of module versions are incorrect and set to 0. This can be found in build log for first clean build which shows WARNING: EXPORT symbol "XXXX" [drivers/XXX/XXX.ko] version generation failed, symbol will not be versioned. But in second build(incremental build), the WARNING disappeared and the module version becomes valid CRC and make someone who want to change modules without updating kernel image can't insert their modules. The problematic code is + $(foreach n, $(filter-out FORCE,$^), \ + $(if $(wildcard $(n).symversions), \ + ; cat $(n).symversions >> $@.symversions)) For example: rm -f fs/notify/built-in.a.symversions ; rm -f fs/notify/built-in.a; \ llvm-ar cDPrST fs/notify/built-in.a fs/notify/fsnotify.o \ fs/notify/notification.o fs/notify/group.o ... `foreach n` shows nothing to `cat` into $(n).symversions because `if $(wildcard $(n).symversions)` return nothing, but actually they do exist during this line was executed. -rw-r--r-- 1 root root 168580 Jun 13 19:10 fs/notify/fsnotify.o -rw-r--r-- 1 root root 111 Jun 13 19:10 fs/notify/fsnotify.o.symversions The reason is the $(n).symversions are generated at runtime, but Makefile wildcard function expends and checks the file exist or not during parsing the Makefile. Thus fix this by use `test` shell command to check the file existence in runtime. Rebase from both: 1. [https://lore.kernel.org/lkml/20210616080252.32046-1-lecopzer.chen@mediatek.com/] 2. [https://lore.kernel.org/lkml/20210702032943.7865-1-lecopzer.chen@mediatek.com/] Fixes: 38e891849003 ("kbuild: lto: fix module versioning") Co-developed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-07-18kbuild: do not suppress Kconfig prompts for silent buildMasahiro Yamada1-4/+5
When a new CONFIG option is available, Kbuild shows a prompt to get the user input. $ make [ snip ] Core Scheduling for SMT (SCHED_CORE) [N/y/?] (NEW) This is the only interactive place in the build process. Commit 174a1dcc9642 ("kbuild: sink stdout from cmd for silent build") suppressed Kconfig prompts as well because syncconfig is invoked by the 'cmd' macro. You cannot notice the fact that Kconfig is waiting for the user input. Use 'kecho' to show the equivalent short log without suppressing stdout from sub-make. Fixes: 174a1dcc9642 ("kbuild: sink stdout from cmd for silent build") Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
2021-07-18scripts/setlocalversion: fix a bug when LOCALVERSION is emptyMikulas Patocka1-5/+8
The commit 042da426f8eb ("scripts/setlocalversion: simplify the short version part") reduces indentation. Unfortunately, it also changes behavior in a subtle way - if the user has empty "LOCALVERSION" variable, the plus sign is appended to the kernel version. It wasn't appended before. This patch reverts to the old behavior - we append the plus sign only if the LOCALVERSION variable is not set. Fixes: 042da426f8eb ("scripts/setlocalversion: simplify the short version part") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-07-18perf sched: Fix record failure when CONFIG_SCHEDSTATS is not setYang Jihong1-4/+29
The tracepoints trace_sched_stat_{wait, sleep, iowait} are not exposed to user if CONFIG_SCHEDSTATS is not set, "perf sched record" records the three events. As a result, the command fails. Before: #perf sched record sleep 1 event syntax error: 'sched:sched_stat_wait' \___ unknown tracepoint Error: File /sys/kernel/tracing/events/sched/sched_stat_wait not found. Hint: Perhaps this kernel misses some CONFIG_ setting to enable this feature?. Run 'perf list' for a list of valid events Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -e, --event <event> event selector. use 'perf list' to list available events Solution: Check whether schedstat tracepoints are exposed. If no, these events are not recorded. After: # perf sched record sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.163 MB perf.data (1091 samples) ] # perf sched report run measurement overhead: 4736 nsecs sleep measurement overhead: 9059979 nsecs the run test took 999854 nsecs the sleep test took 8945271 nsecs nr_run_events: 716 nr_sleep_events: 785 nr_wakeup_events: 0 ... ------------------------------------------------------------ Fixes: 2a09b5de235a6 ("sched/fair: do not expose some tracepoints to user if CONFIG_SCHEDSTATS is not set") Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Yafang Shao <laoar.shao@gmail.com> Link: http://lore.kernel.org/lkml/20210713112358.194693-1-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-18perf probe: Fix add event failure when running 32-bit perf in a 64-bit kernelYang Jihong6-42/+38
The "address" member of "struct probe_trace_point" uses long data type. If kernel is 64-bit and perf program is 32-bit, size of "address" variable is 32 bits. As a result, upper 32 bits of address read from kernel are truncated, an error occurs during address comparison in kprobe_warn_out_range(). Before: # perf probe -a schedule schedule is out of .text, skip it. Error: Failed to add events. Solution: Change data type of "address" variable to u64 and change corresponding address printing and value assignment. After: # perf.new.new probe -a schedule Added new event: probe:schedule (on schedule) You can now use it in all perf tools, such as: perf record -e probe:schedule -aR sleep 1 # perf probe -l probe:schedule (on schedule@kernel/sched/core.c) # perf record -e probe:schedule -aR sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.156 MB perf.data (1366 samples) ] # perf report --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 1K of event 'probe:schedule' # Event count (approx.): 1366 # # Overhead Command Shared Object Symbol # ........ ............... ................. ............ # 6.22% migration/0 [kernel.kallsyms] [k] schedule 6.22% migration/1 [kernel.kallsyms] [k] schedule 6.22% migration/2 [kernel.kallsyms] [k] schedule 6.22% migration/3 [kernel.kallsyms] [k] schedule 6.15% migration/10 [kernel.kallsyms] [k] schedule 6.15% migration/11 [kernel.kallsyms] [k] schedule 6.15% migration/12 [kernel.kallsyms] [k] schedule 6.15% migration/13 [kernel.kallsyms] [k] schedule 6.15% migration/14 [kernel.kallsyms] [k] schedule 6.15% migration/15 [kernel.kallsyms] [k] schedule 6.15% migration/4 [kernel.kallsyms] [k] schedule 6.15% migration/5 [kernel.kallsyms] [k] schedule 6.15% migration/6 [kernel.kallsyms] [k] schedule 6.15% migration/7 [kernel.kallsyms] [k] schedule 6.15% migration/8 [kernel.kallsyms] [k] schedule 6.15% migration/9 [kernel.kallsyms] [k] schedule 0.22% rcu_sched [kernel.kallsyms] [k] schedule ... # # (Cannot load tips.txt file, please install perf!) # Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Frank Ch. Eigler <fche@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Jianlin Lv <jianlin.lv@arm.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Li Huafei <lihuafei1@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lore.kernel.org/lkml/20210715063723.11926-1-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-18perf data: Close all files in close_dir()Riccardo Mancini1-1/+1
When using 'perf report' in directory mode, the first file is not closed on exit, causing a memory leak. The problem is caused by the iterating variable never reaching 0. Fixes: 145520631130bd64 ("perf data: Add perf_data__(create_dir|close_dir) functions") Signed-off-by: Riccardo Mancini <rickyman7@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Zhen Lei <thunder.leizhen@huawei.com> Link: http://lore.kernel.org/lkml/20210716141122.858082-1-rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-18perf probe-file: Delete namelist in del_events() on the error pathRiccardo Mancini1-2/+2
ASan reports some memory leaks when running: # perf test "42: BPF filter" This second leak is caused by a strlist not being dellocated on error inside probe_file__del_events. This patch adds a goto label before the deallocation and makes the error path jump to it. Signed-off-by: Riccardo Mancini <rickyman7@gmail.com> Fixes: e7895e422e4da63d ("perf probe: Split del_perf_probe_events()") Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/174963c587ae77fa108af794669998e4ae558338.1626343282.git.rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-17Revert "mm/slub: use stackdepot to save stack trace in objects"Linus Torvalds2-50/+30
This reverts commit 788691464c29455346dc613a3b43c2fb9e5757a4. It's not clear why, but it causes unexplained problems in entirely unrelated xfs code. The most likely explanation is some slab corruption, possibly triggered due to CONFIG_SLUB_DEBUG_ON. See [1]. It ends up having a few other problems too, like build errors on arch/arc, and Geert reporting it using much more memory on m68k [3] (it probably does so elsewhere too, but it is probably just more noticeable on m68k). The architecture issues (both build and memory use) are likely just because this change effectively force-enabled STACKDEPOT (along with a very bad default value for the stackdepot hash size). But together with the xfs issue, this all smells like "this commit was not ready" to me. Link: https://lore.kernel.org/linux-xfs/YPE3l82acwgI2OiV@infradead.org/ [1] Link: https://lore.kernel.org/lkml/202107150600.LkGNb4Vb-lkp@intel.com/ [2] Link: https://lore.kernel.org/lkml/CAMuHMdW=eoVzM1Re5FVoEN87nKfiLmM2+Ah7eNu2KXEhCvbZyA@mail.gmail.com/ [3] Reported-by: Christoph Hellwig <hch@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-16ARM: dts: versatile: Fix up interrupt controller node namesSudeep Holla2-4/+3
Once the new schema interrupt-controller/arm,vic.yaml is added, we get the below warnings: arch/arm/boot/dts/versatile-ab.dt.yaml: intc@10140000: $nodename:0: 'intc@10140000' does not match '^interrupt-controller(@[0-9a-f,]+)*$' arch/arm/boot/dts/versatile-ab.dt.yaml: intc@10140000: 'clear-mask' does not match any of the regexes Fix the node names for the interrupt controller to conform to the standard node name interrupt-controller@.. Also drop invalid clear-mask property. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20210701132118.759454-1-sudeep.holla@arm.com' Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-07-16ARM: multi_v7_defconfig: Make NOP_USB_XCEIV driver built-inStefan Wahren1-1/+1
The usage of usb-nop-xceiv PHY on Raspberry Pi boards with BCM283x has been a "regression source" a lot of times. The last case is breakage of USB mass storage boot has been commit e590474768f1 ("driver core: Set fw_devlink=on by default") for multi_v7_defconfig. As long as NOP_USB_XCEIV is configured as module, the dwc2 USB driver defer probing endlessly and prevent booting from USB mass storage device. So make the driver built-in as in bcm2835_defconfig and arm64/defconfig. Fixes: e590474768f1 ("driver core: Set fw_devlink=on by default") Reported-by: Ojaswin Mujoo <ojaswin98@gmail.com> Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Link: https://lore.kernel.org/r/1625915095-23077-1-git-send-email-stefan.wahren@i2se.com' Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-07-16ARM: configs: Update u8500_defconfigLinus Walleij1-0/+5
The platform lost the framebuffer due to a commit solving a circular dependency in v5.14-rc1, so add it back in by explicitly selecting the framebuffer. The U8500 has also gained a few systems using touchscreens from Cypress, Melfas and Zinitix so add these at the same time as we're updating the defconfig anyway. Fixes: f611b1e7624c ("drm: Avoid circular dependencies for CONFIG_FB") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Cc: phone-devel@vger.kernel.org Cc: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@kernel.org> Cc: Stephan Gerhold <stephan@gerhold.net> Cc: newbyte@disroot.org Link: https://lore.kernel.org/r/20210712085522.672482-1-linus.walleij@linaro.org' Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-07-16ARM: configs: Update Vexpress defconfigLinus Walleij1-10/+7
This updates the Versatile Express defconfig for the changes in the v5.14-rc1 kernel: - The Framebuffer CONFIG_FB needs to be explicitly selected or we don't get any framebuffer anymore. DRM has stopped to select FB because of circular dependency. - CONFIG_CMA options were moved around. - CONFIG_MODULES options were moved around. - CONFIG_CRYPTO_HW was moved around. Fixes: f611b1e7624c ("drm: Avoid circular dependencies for CONFIG_FB") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Sudeep Holla <sudeep.holla@arm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/20210713133708.94397-1-linus.walleij@linaro.org' Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-07-16ARM: configs: Update Versatile defconfigLinus Walleij1-3/+1
This updates the Versatile defconfig for the changes in the v5.14-rc1 kernel: - The Framebuffer CONFIG_FB needs to be explicitly selected or we don't get any framebuffer anymore. DRM has stopped to select FB because of circular dependency. - The CONFIG_FB_MODE_HELPERS are not needed when using DRM framebuffer emulation as DRM does. - The Acorn fonts are removed, the default framebuffer font works fine. I don't know why this was selected in the first place or how the Kconfig was altered so it was removed. Fixes: f611b1e7624c ("drm: Avoid circular dependencies for CONFIG_FB") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210714081819.139210-1-linus.walleij@linaro.org' Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-07-16ARM: configs: Update RealView defconfigLinus Walleij1-3/+1
This updates the RealView defconfig for the changes in the v5.14-rc1 kernel: - The Framebuffer CONFIG_FB needs to be explicitly selected or we don't get any framebuffer anymore. DRM has stopped to select FB because of circular dependency. - The CONFIG_FB_MODE_HELPERS are not needed when using DRM framebuffer emulation as DRM does. - Drop two unused penguin logos. Fixes: f611b1e7624c ("drm: Avoid circular dependencies for CONFIG_FB") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210714090040.182381-1-linus.walleij@linaro.org' Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-07-16ARM: configs: Update Integrator defconfigLinus Walleij1-4/+1
This updates the Integrator defconfig for the changes in the v5.14-rc1 kernel: - The Framebuffer CONFIG_FB needs to be explicitly selected or we don't get any framebuffer anymore. DRM has stopped to select FB because of circular dependency. - Drop the unused Matrox FB drivers that are only used with specific PCI cards. Fixes: f611b1e7624c ("drm: Avoid circular dependencies for CONFIG_FB") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210714122703.212609-1-linus.walleij@linaro.org' Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-07-16arm: Typo s/PCI_IXP4XX_LEGACY/IXP4XX_PCI_LEGACY/Geert Uytterhoeven1-1/+1
Kconfig symbol PCI_IXP4XX_LEGACY does not exist, but IXP4XX_PCI_LEGACY does. Fixes: d5d9f7ac58ea1041 ("ARM/ixp4xx: Make NEED_MACH_IO_H optional") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/82ce37c617293521f095a945a255456b9512769c.1626255077.git.geert+renesas@glider.be' Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-07-16dt-bindings: display: renesas,du: Make resets optional on R-Car H1Geert Uytterhoeven1-1/+0
The "resets" property is not present on R-Car Gen1 SoCs. Supporting it would require migrating from renesas,cpg-clocks to renesas,cpg-mssr. Reflect this in the DT bindings by removing the global "required: resets". All SoCs that do have "resets" properties already have SoC-specific rules making it required. Fixes: 99d66127fad25ebb ("dt-bindings: display: renesas,du: Convert binding to YAML") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Link: https://lore.kernel.org/r/98575791b154d80347d5b78132c1d53f5315ee62.1626257936.git.geert+renesas@glider.be Signed-off-by: Rob Herring <robh@kernel.org>
2021-07-16perf test bpf: Free obj_bufRiccardo Mancini1-0/+2
ASan reports some memory leaks when running: # perf test "42: BPF filter" The first of these leaks is caused by obj_buf never being deallocated in __test__bpf. This patch adds the missing free. Signed-off-by: Riccardo Mancini <rickyman7@gmail.com> Fixes: ba1fae431e74bb42 ("perf test: Add 'perf test BPF'") Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lore.kernel.org/lkml/60f3ca935fe6672e7e866276ce6264c9e26e4c87.1626343282.git.rickyman7@gmail.com [ Added missing stdlib.h include ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-16cifs: do not share tcp sessions of dfs connectionsPaulo Alcantara2-3/+38
Make sure that we do not share tcp sessions of dfs mounts when mounting regular shares that connect to same server. DFS connections rely on a single instance of tcp in order to do failover properly in cifs_reconnect(). Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-16zonefs: remove redundant null bio checkXianting Tian1-3/+0
bio_alloc() with __GFP_DIRECT_RECLAIM, which is included in GFP_NOFS, never fails, see comments in bio_alloc_bioset(). Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
2021-07-15Revert "Makefile: Enable -Wimplicit-fallthrough for Clang"Linus Torvalds1-3/+6
This reverts commit b7eb335e26a9c7f258c96b3962c283c379d3ede0. It turns out that the problem with the clang -Wimplicit-fallthrough warning is not about the kernel source code, but about clang itself, and that the warning is unusable until clang fixes its broken ways. In particular, when you enable this warning for clang, you not only get warnings about implicit fallthroughs. You also get this: warning: fallthrough annotation in unreachable code [-Wimplicit-fallthrough] which is completely broken becasue it (a) doesn't even tell you where the problem is (seriously: no line numbers, no filename, no nothing). (b) is fundamentally broken anyway, because there are perfectly valid reasons to have a fallthrough statement even if it turns out that it can perhaps not be reached. In the kernel, an example of that second case is code in the scheduler: switch (state) { case cpuset: if (IS_ENABLED(CONFIG_CPUSETS)) { cpuset_cpus_allowed_fallback(p); state = possible; break; } fallthrough; case possible: where if CONFIG_CPUSETS is enabled you actually never hit the fallthrough case at all. But that in no way makes the fallthrough wrong. So the warning is completely broken, and enabling it for clang is a very bad idea. In the meantime, we can keep the gcc option enabled, and make the gcc build use -Wimplicit-fallthrough=5 which means that we will at least continue to require a proper fallthrough statement, and that gcc won't silently accept the magic comment versions. Because gcc does this all correctly, and while the odd "=5" part is kind of obscure, it's documented in [1]: "-Wimplicit-fallthrough=5 doesn’t recognize any comments as fallthrough comments, only attributes disable the warning" so if clang ever fixes its bad behavior we can try enabling it there again. Link: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html [1] Cc: Kees Cook <keescook@chromium.org> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>