aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86 (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-11-26x86/insn: Add some more Intel instructions to the opcode mapAdrian Hunter1-12/+32
Add to the opcode map the following instructions: v4fmaddps v4fmaddss v4fnmaddps v4fnmaddss vaesdec vaesdeclast vaesenc vaesenclast vcvtne2ps2bf16 vcvtneps2bf16 vdpbf16ps gf2p8affineinvqb vgf2p8affineinvqb gf2p8affineqb vgf2p8affineqb gf2p8mulb vgf2p8mulb vp2intersectd vp2intersectq vp4dpwssd vp4dpwssds vpclmulqdq vpcompressb vpcompressw vpdpbusd vpdpbusds vpdpwssd vpdpwssds vpexpandb vpexpandw vpopcntb vpopcntd vpopcntq vpopcntw vpshldd vpshldq vpshldvd vpshldvq vpshldvw vpshldw vpshrdd vpshrdq vpshrdvd vpshrdvq vpshrdvw vpshrdw vpshufbitqmb For information about the instructions, refer Intel SDM May 2019 (325462-070US) and Intel Architecture Instruction Set Extensions May 2019 (319433-037). The instruction decoding can be tested using the perf tools' "x86 instruction decoder - new instructions" test e.g. $ perf test -v "new " 2>&1 | grep -i 'v4fmaddps' Decoded ok: 62 f2 7f 48 9a 20 v4fmaddps (%eax),%zmm0,%zmm4 Decoded ok: 62 f2 7f 48 9a a4 c8 78 56 34 12 v4fmaddps 0x12345678(%eax,%ecx,8),%zmm0,%zmm4 Decoded ok: 62 f2 7f 48 9a 20 v4fmaddps (%rax),%zmm0,%zmm4 Decoded ok: 67 62 f2 7f 48 9a 20 v4fmaddps (%eax),%zmm0,%zmm4 Decoded ok: 62 f2 7f 48 9a a4 c8 78 56 34 12 v4fmaddps 0x12345678(%rax,%rcx,8),%zmm0,%zmm4 Decoded ok: 67 62 f2 7f 48 9a a4 c8 78 56 34 12 v4fmaddps 0x12345678(%eax,%ecx,8),%zmm0,%zmm4 Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20191125125044.31879-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-26Merge remote-tracking branch 'torvalds/master' into perf/coreArnaldo Carvalho de Melo63-857/+5365
To pick up BPF changes we'll need. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds7-111/+730
Pull networking updates from David Miller: "Another merge window, another pull full of stuff: 1) Support alternative names for network devices, from Jiri Pirko. 2) Introduce per-netns netdev notifiers, also from Jiri Pirko. 3) Support MSG_PEEK in vsock/virtio, from Matias Ezequiel Vara Larsen. 4) Allow compiling out the TLS TOE code, from Jakub Kicinski. 5) Add several new tracepoints to the kTLS code, also from Jakub. 6) Support set channels ethtool callback in ena driver, from Sameeh Jubran. 7) New SCTP events SCTP_ADDR_ADDED, SCTP_ADDR_REMOVED, SCTP_ADDR_MADE_PRIM, and SCTP_SEND_FAILED_EVENT. From Xin Long. 8) Add XDP support to mvneta driver, from Lorenzo Bianconi. 9) Lots of netfilter hw offload fixes, cleanups and enhancements, from Pablo Neira Ayuso. 10) PTP support for aquantia chips, from Egor Pomozov. 11) Add UDP segmentation offload support to igb, ixgbe, and i40e. From Josh Hunt. 12) Add smart nagle to tipc, from Jon Maloy. 13) Support L2 field rewrite by TC offloads in bnxt_en, from Venkat Duvvuru. 14) Add a flow mask cache to OVS, from Tonghao Zhang. 15) Add XDP support to ice driver, from Maciej Fijalkowski. 16) Add AF_XDP support to ice driver, from Krzysztof Kazimierczak. 17) Support UDP GSO offload in atlantic driver, from Igor Russkikh. 18) Support it in stmmac driver too, from Jose Abreu. 19) Support TIPC encryption and auth, from Tuong Lien. 20) Introduce BPF trampolines, from Alexei Starovoitov. 21) Make page_pool API more numa friendly, from Saeed Mahameed. 22) Introduce route hints to ipv4 and ipv6, from Paolo Abeni. 23) Add UDP segmentation offload to cxgb4, Rahul Lakkireddy" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1857 commits) libbpf: Fix usage of u32 in userspace code mm: Implement no-MMU variant of vmalloc_user_node_flags slip: Fix use-after-free Read in slip_open net: dsa: sja1105: fix sja1105_parse_rgmii_delays() macvlan: schedule bc_work even if error enetc: add support Credit Based Shaper(CBS) for hardware offload net: phy: add helpers phy_(un)lock_mdio_bus mdio_bus: don't use managed reset-controller ax88179_178a: add ethtool_op_get_ts_info() mlxsw: spectrum_router: Fix use of uninitialized adjacency index mlxsw: spectrum_router: After underlay moves, demote conflicting tunnels bpf: Simplify __bpf_arch_text_poke poke type handling bpf: Introduce BPF_TRACE_x helper for the tracing tests bpf: Add bpf_jit_blinding_enabled for !CONFIG_BPF_JIT bpf, testing: Add various tail call test cases bpf, x86: Emit patchable direct jump as tail call bpf: Constant map key tracking for prog array pokes bpf: Add poke dependency tracking for prog array maps bpf: Add initial poke descriptor table for jit images bpf: Move owner type, jited info into array auxiliary data ...
2019-11-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds7-162/+3192
Pull crypto updates from Herbert Xu: "API: - Add library interfaces of certain crypto algorithms for WireGuard - Remove the obsolete ablkcipher and blkcipher interfaces - Move add_early_randomness() out of rng_mutex Algorithms: - Add blake2b shash algorithm - Add blake2s shash algorithm - Add curve25519 kpp algorithm - Implement 4 way interleave in arm64/gcm-ce - Implement ciphertext stealing in powerpc/spe-xts - Add Eric Biggers's scalar accelerated ChaCha code for ARM - Add accelerated 32r2 code from Zinc for MIPS - Add OpenSSL/CRYPTOGRAMS poly1305 implementation for ARM and MIPS Drivers: - Fix entropy reading failures in ks-sa - Add support for sam9x60 in atmel - Add crypto accelerator for amlogic GXL - Add sun8i-ce Crypto Engine - Add sun8i-ss cryptographic offloader - Add a host of algorithms to inside-secure - Add NPCM RNG driver - add HiSilicon HPRE accelerator - Add HiSilicon TRNG driver" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (285 commits) crypto: vmx - Avoid weird build failures crypto: lib/chacha20poly1305 - use chacha20_crypt() crypto: x86/chacha - only unregister algorithms if registered crypto: chacha_generic - remove unnecessary setkey() functions crypto: amlogic - enable working on big endian kernel crypto: sun8i-ce - enable working on big endian crypto: mips/chacha - select CRYPTO_SKCIPHER, not CRYPTO_BLKCIPHER hwrng: ks-sa - Enable COMPILE_TEST crypto: essiv - remove redundant null pointer check before kfree crypto: atmel-aes - Change data type for "lastc" buffer crypto: atmel-tdes - Set the IV after {en,de}crypt crypto: sun4i-ss - fix big endian issues crypto: sun4i-ss - hide the Invalid keylen message crypto: sun4i-ss - use crypto_ahash_digestsize crypto: sun4i-ss - remove dependency on not 64BIT crypto: sun4i-ss - Fix 64-bit size_t warnings on sun4i-ss-hash.c MAINTAINERS: Add maintainer for HiSilicon SEC V2 driver crypto: hisilicon - add DebugFS for HiSilicon SEC Documentation: add DebugFS doc for HiSilicon SEC crypto: hisilicon - add SRIOV for HiSilicon SEC ...
2019-11-25Merge tag 'livepatching-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatchingLinus Torvalds1-4/+0
Pull livepatching updates from Petr Mladek: - New API to track system state changes done be livepatch callbacks. It helps to maintain compatibility between livepatches. - Update Kconfig help text. ORC is another reliable unwinder. - Disable generic selftest timeout. Livepatch selftests have their own per-operation fine-grained timeouts. * tag 'livepatching-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching: x86/stacktrace: update kconfig help text for reliable unwinders livepatch: Selftests of the API for tracking system state changes livepatch: Documentation of the new API for tracking system state changes livepatch: Allow to distinguish different version of system state changes livepatch: Basic API to track system state changes livepatch: Keep replaced patches until post_patch callback is called selftests/livepatch: Disable the timeout
2019-11-25Merge tag 'printk-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printkLinus Torvalds14-65/+57
Pull printk updates from Petr Mladek: - Allow to print symbolic error names via new %pe modifier. - Use pr_warn() instead of the remaining pr_warning() calls. Fix formatting of the related lines. - Add VSPRINTF entry to MAINTAINERS. * tag 'printk-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: (32 commits) checkpatch: don't warn about new vsprintf pointer extension '%pe' MAINTAINERS: Add VSPRINTF tools lib api: Renaming pr_warning to pr_warn ASoC: samsung: Use pr_warn instead of pr_warning lib: cpu_rmap: Use pr_warn instead of pr_warning trace: Use pr_warn instead of pr_warning dma-debug: Use pr_warn instead of pr_warning vgacon: Use pr_warn instead of pr_warning fs: afs: Use pr_warn instead of pr_warning sh/intc: Use pr_warn instead of pr_warning scsi: Use pr_warn instead of pr_warning platform/x86: intel_oaktrail: Use pr_warn instead of pr_warning platform/x86: asus-laptop: Use pr_warn instead of pr_warning platform/x86: eeepc-laptop: Use pr_warn instead of pr_warning oprofile: Use pr_warn instead of pr_warning of: Use pr_warn instead of pr_warning macintosh: Use pr_warn instead of pr_warning idsn: Use pr_warn instead of pr_warning ide: Use pr_warn instead of pr_warning crypto: n2: Use pr_warn instead of pr_warning ...
2019-11-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds24-452/+1033
Pull KVM updates from Paolo Bonzini: "ARM: - data abort report and injection - steal time support - GICv4 performance improvements - vgic ITS emulation fixes - simplify FWB handling - enable halt polling counters - make the emulated timer PREEMPT_RT compliant s390: - small fixes and cleanups - selftest improvements - yield improvements PPC: - add capability to tell userspace whether we can single-step the guest - improve the allocation of XIVE virtual processor IDs - rewrite interrupt synthesis code to deliver interrupts in virtual mode when appropriate. - minor cleanups and improvements. x86: - XSAVES support for AMD - more accurate report of nested guest TSC to the nested hypervisor - retpoline optimizations - support for nested 5-level page tables - PMU virtualization optimizations, and improved support for nested PMU virtualization - correct latching of INITs for nested virtualization - IOAPIC optimization - TSX_CTRL virtualization for more TAA happiness - improved allocation and flushing of SEV ASIDs - many bugfixes and cleanups" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits) kvm: nVMX: Relax guest IA32_FEATURE_CONTROL constraints KVM: x86: Grab KVM's srcu lock when setting nested state KVM: x86: Open code shared_msr_update() in its only caller KVM: Fix jump label out_free_* in kvm_init() KVM: x86: Remove a spurious export of a static function KVM: x86: create mmu/ subdirectory KVM: nVMX: Remove unnecessary TLB flushes on L1<->L2 switches when L1 use apic-access-page KVM: x86: remove set but not used variable 'called' KVM: nVMX: Do not mark vmcs02->apic_access_page as dirty when unpinning KVM: vmx: use MSR_IA32_TSX_CTRL to hard-disable TSX on guest that lack it KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality KVM: x86: implement MSR_IA32_TSX_CTRL effect on CPUID KVM: x86: do not modify masked bits of shared MSRs KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES KVM: PPC: Book3S HV: XIVE: Fix potential page leak on error path KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one KVM: nVMX: Assume TLB entries of L1 and L2 are tagged differently if L0 use EPT KVM: x86: Unexport kvm_vcpu_reload_apic_access_page() KVM: nVMX: add CR4_LA57 bit to nested CR4_FIXED1 KVM: nVMX: Use semi-colon instead of comma for exit-handlers initialization ...
2019-11-25Merge tag 'for-linus-5.5a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tipLinus Torvalds1-0/+2
Pull xen updates from Juergen Gross: - a small series to remove the build constraint of Xen x86 MCE handling to 64-bit only - a bunch of minor cleanups * tag 'for-linus-5.5a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: Fix Kconfig indentation xen/mcelog: also allow building for 32-bit kernels xen/mcelog: add PPIN to record when available xen/mcelog: drop __MC_MSR_MCGCAP xen/gntdev: Use select for DMA_SHARED_BUFFER xen: mm: make xen_mm_init static xen: mm: include <xen/xen-ops.h> for missing declarations
2019-11-25Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds5-44/+319
Pull RAS updates from Borislav Petkov: - Fully reworked thermal throttling notifications, there should be no more spamming of dmesg (Srinivas Pandruvada and Benjamin Berg) - More enablement for the Intel-compatible CPUs Zhaoxin (Tony W Wang-oc) - PPIN support for Icelake (Tony Luck) * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce/therm_throt: Optimize notifications of thermal throttle x86/mce: Add Xeon Icelake to list of CPUs that support PPIN x86/mce: Lower throttling MCE messages' priority to warning x86/mce: Add Zhaoxin LMCE support x86/mce: Add Zhaoxin CMCI support x86/mce: Add Zhaoxin MCE support x86/mce/amd: Make disable_err_thresholding() static
2019-11-25Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds3-19/+26
Pull x86 microcode updates from Borislav Petkov: "This converts the late loading method to load the microcode in parallel (vs sequentially currently). The patch remained in linux-next for the maximum amount of time so that any potential and hard to debug fallout be minimized. Now cloud folks have their milliseconds back but all the normal people should use early loading anyway :-)" * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode/intel: Issue the revision updated message only on the BSP x86/microcode: Update late microcode in parallel x86/microcode/amd: Fix two -Wunused-but-set-variable warnings
2019-11-25Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds1-0/+6
Pull arm64 updates from Catalin Marinas: "Apart from the arm64-specific bits (core arch and perf, new arm64 selftests), it touches the generic cow_user_page() (reviewed by Kirill) together with a macro for x86 to preserve the existing behaviour on this architecture. Summary: - On ARMv8 CPUs without hardware updates of the access flag, avoid failing cow_user_page() on PFN mappings if the pte is old. The patches introduce an arch_faults_on_old_pte() macro, defined as false on x86. When true, cow_user_page() makes the pte young before attempting __copy_from_user_inatomic(). - Covert the synchronous exception handling paths in arch/arm64/kernel/entry.S to C. - FTRACE_WITH_REGS support for arm64. - ZONE_DMA re-introduced on arm64 to support Raspberry Pi 4 - Several kselftest cases specific to arm64, together with a MAINTAINERS update for these files (moved to the ARM64 PORT entry). - Workaround for a Neoverse-N1 erratum where the CPU may fetch stale instructions under certain conditions. - Workaround for Cortex-A57 and A72 errata where the CPU may speculatively execute an AT instruction and associate a VMID with the wrong guest page tables (corrupting the TLB). - Perf updates for arm64: additional PMU topologies on HiSilicon platforms, support for CCN-512 interconnect, AXI ID filtering in the IMX8 DDR PMU, support for the CCPI2 uncore PMU in ThunderX2. - GICv3 optimisation to avoid a heavy barrier when accessing the ICC_PMR_EL1 register. - ELF HWCAP documentation updates and clean-up. - SMC calling convention conduit code clean-up. - KASLR diagnostics printed during boot - NVIDIA Carmel CPU added to the KPTI whitelist - Some arm64 mm clean-ups: use generic free_initrd_mem(), remove stale macro, simplify calculation in __create_pgd_mapping(), typos. - Kconfig clean-ups: CMDLINE_FORCE to depend on CMDLINE, choice for endinanness to help with allmodconfig" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits) arm64: Kconfig: add a choice for endianness kselftest: arm64: fix spelling mistake "contiguos" -> "contiguous" arm64: Kconfig: make CMDLINE_FORCE depend on CMDLINE MAINTAINERS: Add arm64 selftests to the ARM64 PORT entry arm64: kaslr: Check command line before looking for a seed arm64: kaslr: Announce KASLR status on boot kselftest: arm64: fake_sigreturn_misaligned_sp kselftest: arm64: fake_sigreturn_bad_size kselftest: arm64: fake_sigreturn_duplicated_fpsimd kselftest: arm64: fake_sigreturn_missing_fpsimd kselftest: arm64: fake_sigreturn_bad_size_for_magic0 kselftest: arm64: fake_sigreturn_bad_magic kselftest: arm64: add helper get_current_context kselftest: arm64: extend test_init functionalities kselftest: arm64: mangle_pstate_invalid_mode_el[123][ht] kselftest: arm64: mangle_pstate_invalid_daif_bits kselftest: arm64: mangle_pstate_invalid_compat_toggle and common utils kselftest: arm64: extend toplevel skeleton Makefile drivers/perf: hisi: update the sccl_id/ccl_id for certain HiSilicon platform arm64: mm: reserve CMA and crashkernel in ZONE_DMA32 ...
2019-11-25Merge branch 'x86/core' into perf/core, to resolve conflicts and to pick up completed topic treeIngo Molnar7-11/+70
Conflicts: tools/perf/check-headers.sh Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-25Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar20-94/+863
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-24bpf: Simplify __bpf_arch_text_poke poke type handlingDaniel Borkmann1-60/+25
Given that we have BPF_MOD_NOP_TO_{CALL,JUMP}, BPF_MOD_{CALL,JUMP}_TO_NOP and BPF_MOD_{CALL,JUMP}_TO_{CALL,JUMP} poke types and that we also pass in old_addr as well as new_addr, it's a bit redundant and unnecessarily complicates __bpf_arch_text_poke() itself since we can derive the same from the *_addr that were passed in. Hence simplify and use BPF_MOD_{CALL,JUMP} as types which also allows to clean up call-sites. In addition to that, __bpf_arch_text_poke() currently verifies that text matches expected old_insn before we invoke text_poke_bp(). Also add a check on new_insn and skip rewrite if it already matches. Reason why this is rather useful is that it avoids making any special casing in prog_array_map_poke_run() when old and new prog were NULL and has the benefit that also for this case we perform a check on text whether it really matches our expectations. Suggested-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/fcb00a2b0b288d6c73de4ef58116a821c8fe8f2f.1574555798.git.daniel@iogearbox.net
2019-11-24bpf, x86: Emit patchable direct jump as tail callDaniel Borkmann1-95/+187
Add initial code emission for *direct* jumps for tail call maps in order to avoid the retpoline overhead from a493a87f38cf ("bpf, x64: implement retpoline for tail call") for situations that allow for it, meaning, for known constant keys at verification time which are used as index into the tail call map. In case of Cilium which makes heavy use of tail calls, constant keys are used in the vast majority, only for a single occurrence we use a dynamic key. High level outline is that if the target prog is NULL in the map, we emit a 5-byte nop for the fall-through case and if not, we emit a 5-byte direct relative jmp to the target bpf_func + skipped prologue offset. Later during runtime, we patch these 5-byte nop/jmps upon tail call map update or deletions dynamically. Note that on x86-64 the direct jmp works as we reuse the same stack frame and skip prologue (as opposed to some other JIT implementations). One of the issues is that the tail call map slots can change at any given time even during JITing. Therefore, we have two passes: i) emit nops for all patchable locations during main JITing phase until we declare prog->jited = 1 eventually. At this point the image is stable, not public yet and with all jmps disabled. While JITing, we collect additional info like poke->ip in order to remember the patch location for later modifications. In ii) bpf_tail_call_direct_fixup() walks over the progs poke_tab, locks the tail call maps poke_mutex to prevent from parallel updates and patches in the right locations via __bpf_arch_text_poke(). Note, the main bpf_arch_text_poke() cannot be used at this point since we're not yet exposed to kallsyms. For the update we use plain memcpy() since the image is not public and still in read-write mode. After patching, we activate that poke entry through poke->ip_stable. Meaning, at this point any tail call map updates/deletions are not going to ignore that poke entry anymore. Then, bpf_arch_text_poke() might still occur on the read-write image until we finally locked it as read-only. Both modifications on the given image are under text_mutex to avoid interference with each other when update requests come in in parallel for different tail call maps (current one we have locked in JIT and different one where poke->ip_stable was already set). Example prog: # ./bpftool p d x i 1655 0: (b7) r3 = 0 1: (18) r2 = map[id:526] 3: (85) call bpf_tail_call#12 4: (b7) r0 = 1 5: (95) exit Before: # ./bpftool p d j i 1655 0xffffffffc076e55c: 0: nopl 0x0(%rax,%rax,1) 5: push %rbp 6: mov %rsp,%rbp 9: sub $0x200,%rsp 10: push %rbx 11: push %r13 13: push %r14 15: push %r15 17: pushq $0x0 _ 19: xor %edx,%edx |_ index (arg 3) 1b: movabs $0xffff88d95cc82600,%rsi |_ map (arg 2) 25: mov %edx,%edx | index >= array->map.max_entries 27: cmp %edx,0x24(%rsi) | 2a: jbe 0x0000000000000066 |_ 2c: mov -0x224(%rbp),%eax | tail call limit check 32: cmp $0x20,%eax | 35: ja 0x0000000000000066 | 37: add $0x1,%eax | 3a: mov %eax,-0x224(%rbp) |_ 40: mov 0xd0(%rsi,%rdx,8),%rax |_ prog = array->ptrs[index] 48: test %rax,%rax | prog == NULL check 4b: je 0x0000000000000066 |_ 4d: mov 0x30(%rax),%rax | goto *(prog->bpf_func + prologue_size) 51: add $0x19,%rax | 55: callq 0x0000000000000061 | retpoline for indirect jump 5a: pause | 5c: lfence | 5f: jmp 0x000000000000005a | 61: mov %rax,(%rsp) | 65: retq |_ 66: mov $0x1,%eax 6b: pop %rbx 6c: pop %r15 6e: pop %r14 70: pop %r13 72: pop %rbx 73: leaveq 74: retq After; state after JIT: # ./bpftool p d j i 1655 0xffffffffc08e8930: 0: nopl 0x0(%rax,%rax,1) 5: push %rbp 6: mov %rsp,%rbp 9: sub $0x200,%rsp 10: push %rbx 11: push %r13 13: push %r14 15: push %r15 17: pushq $0x0 _ 19: xor %edx,%edx |_ index (arg 3) 1b: movabs $0xffff9d8afd74c000,%rsi |_ map (arg 2) 25: mov -0x224(%rbp),%eax | tail call limit check 2b: cmp $0x20,%eax | 2e: ja 0x000000000000003e | 30: add $0x1,%eax | 33: mov %eax,-0x224(%rbp) |_ 39: jmpq 0xfffffffffffd1785 |_ [direct] goto *(prog->bpf_func + prologue_size) 3e: mov $0x1,%eax 43: pop %rbx 44: pop %r15 46: pop %r14 48: pop %r13 4a: pop %rbx 4b: leaveq 4c: retq After; state after map update (target prog): # ./bpftool p d j i 1655 0xffffffffc08e8930: 0: nopl 0x0(%rax,%rax,1) 5: push %rbp 6: mov %rsp,%rbp 9: sub $0x200,%rsp 10: push %rbx 11: push %r13 13: push %r14 15: push %r15 17: pushq $0x0 19: xor %edx,%edx 1b: movabs $0xffff9d8afd74c000,%rsi 25: mov -0x224(%rbp),%eax 2b: cmp $0x20,%eax . 2e: ja 0x000000000000003e . 30: add $0x1,%eax . 33: mov %eax,-0x224(%rbp) |_ 39: jmpq 0xffffffffffb09f55 |_ goto *(prog->bpf_func + prologue_size) 3e: mov $0x1,%eax 43: pop %rbx 44: pop %r15 46: pop %r14 48: pop %r13 4a: pop %rbx 4b: leaveq 4c: retq After; state after map update (no prog): # ./bpftool p d j i 1655 0xffffffffc08e8930: 0: nopl 0x0(%rax,%rax,1) 5: push %rbp 6: mov %rsp,%rbp 9: sub $0x200,%rsp 10: push %rbx 11: push %r13 13: push %r14 15: push %r15 17: pushq $0x0 19: xor %edx,%edx 1b: movabs $0xffff9d8afd74c000,%rsi 25: mov -0x224(%rbp),%eax 2b: cmp $0x20,%eax . 2e: ja 0x000000000000003e . 30: add $0x1,%eax . 33: mov %eax,-0x224(%rbp) |_ 39: nopl 0x0(%rax,%rax,1) |_ fall-through nop 3e: mov $0x1,%eax 43: pop %rbx 44: pop %r15 46: pop %r14 48: pop %r13 4a: pop %rbx 4b: leaveq 4c: retq Nice bonus is that this also shrinks the code emission quite a bit for every tail call invocation. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/6ada4c1c9d35eeb5f4ecfab94593dafa6b5c4b09.1574452833.git.daniel@iogearbox.net
2019-11-24bpf, x86: Generalize and extend bpf_arch_text_poke for direct jumpsDaniel Borkmann1-18/+46
Add BPF_MOD_{NOP_TO_JUMP,JUMP_TO_JUMP,JUMP_TO_NOP} patching for x86 JIT in order to be able to patch direct jumps or nop them out. We need this facility in order to patch tail call jumps and in later work also BPF static keys. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/aa4784196a8e5e985af4b30a4fe5336bce6e9643.1574452833.git.daniel@iogearbox.net
2019-11-23kvm: nVMX: Relax guest IA32_FEATURE_CONTROL constraintsJim Mattson1-1/+3
Commit 37e4c997dadf ("KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL") broke the KVM_SET_MSRS ABI by instituting new constraints on the data values that kvm would accept for the guest MSR, IA32_FEATURE_CONTROL. Perhaps these constraints should have been opt-in via a new KVM capability, but they were applied indiscriminately, breaking at least one existing hypervisor. Relax the constraints to allow either or both of FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX and FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX to be set when nVMX is enabled. This change is sufficient to fix the aforementioned breakage. Fixes: 37e4c997dadf ("KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL") Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-23KVM: x86: Grab KVM's srcu lock when setting nested stateSean Christopherson1-0/+3
Acquire kvm->srcu for the duration of ->set_nested_state() to fix a bug where nVMX derefences ->memslots without holding ->srcu or ->slots_lock. The other half of nested migration, ->get_nested_state(), does not need to acquire ->srcu as it is a purely a dump of internal KVM (and CPU) state to userspace. Detected as an RCU lockdep splat that is 100% reproducible by running KVM's state_test selftest with CONFIG_PROVE_LOCKING=y. Note that the failing function, kvm_is_visible_gfn(), is only checking the validity of a gfn, it's not actually accessing guest memory (which is more or less unsupported during vmx_set_nested_state() due to incorrect MMU state), i.e. vmx_set_nested_state() itself isn't fundamentally broken. In any case, setting nested state isn't a fast path so there's no reason to go out of our way to avoid taking ->srcu. ============================= WARNING: suspicious RCU usage 5.4.0-rc7+ #94 Not tainted ----------------------------- include/linux/kvm_host.h:626 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by evmcs_test/10939: #0: ffff88826ffcb800 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0x85/0x630 [kvm] stack backtrace: CPU: 1 PID: 10939 Comm: evmcs_test Not tainted 5.4.0-rc7+ #94 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: dump_stack+0x68/0x9b kvm_is_visible_gfn+0x179/0x180 [kvm] mmu_check_root+0x11/0x30 [kvm] fast_cr3_switch+0x40/0x120 [kvm] kvm_mmu_new_cr3+0x34/0x60 [kvm] nested_vmx_load_cr3+0xbd/0x1f0 [kvm_intel] nested_vmx_enter_non_root_mode+0xab8/0x1d60 [kvm_intel] vmx_set_nested_state+0x256/0x340 [kvm_intel] kvm_arch_vcpu_ioctl+0x491/0x11a0 [kvm] kvm_vcpu_ioctl+0xde/0x630 [kvm] do_vfs_ioctl+0xa2/0x6c0 ksys_ioctl+0x66/0x70 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x54/0x200 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f59a2b95f47 Fixes: 8fcc4b5923af5 ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-23KVM: x86: Open code shared_msr_update() in its only callerSean Christopherson1-20/+9
Fold shared_msr_update() into its sole user to eliminate its pointless bounds check, its godawful printk, its misleading comment (it's called under a global lock), and its woefully inaccurate name. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-23KVM: x86: Remove a spurious export of a static functionSean Christopherson1-1/+0
A recent change inadvertently exported a static function, which results in modpost throwing a warning. Fix it. Fixes: cbbaa2727aa3 ("KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-22crypto: x86/chacha - only unregister algorithms if registeredEric Biggers1-1/+2
It's not valid to call crypto_unregister_skciphers() without a prior call to crypto_register_skciphers(). Fixes: 84e03fa39fbe ("crypto: x86/chacha - expose SIMD ChaCha routine as library function") Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-11-21KVM: x86: create mmu/ subdirectoryPaolo Bonzini4-2/+2
Preparatory work for shattering mmu.c into multiple files. Besides making it easier to follow, this will also make it possible to write unit tests for various parts. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-21KVM: nVMX: Remove unnecessary TLB flushes on L1<->L2 switches when L1 use apic-access-pageLiran Alon1-7/+0
According to Intel SDM section 28.3.3.3/28.3.3.4 Guidelines for Use of the INVVPID/INVEPT Instruction, the hypervisor needs to execute INVVPID/INVEPT X in case CPU executes VMEntry with VPID/EPTP X and either: "Virtualize APIC accesses" VM-execution control was changed from 0 to 1, OR the value of apic_access_page was changed. In the nested case, the burden falls on L1, unless L0 enables EPT in vmcs02 but L1 enables neither EPT nor VPID in vmcs12. For this reason prepare_vmcs02() and load_vmcs12_host_state() have special code to request a TLB flush in case L1 does not use EPT but it uses "virtualize APIC accesses". This special case however is not necessary. On a nested vmentry the physical TLB will already be flushed except if all the following apply: * L0 uses VPID * L1 uses VPID * L0 can guarantee TLB entries populated while running L1 are tagged differently than TLB entries populated while running L2. If the first condition is false, the processor will flush the TLB on vmentry to L2. If the second or third condition are false, prepare_vmcs02() will request KVM_REQ_TLB_FLUSH. However, even if both are true, no extra TLB flush is needed to handle the APIC access page: * if L1 doesn't use VPID, the second condition doesn't hold and the TLB will be flushed anyway. * if L1 uses VPID, it has to flush the TLB itself with INVVPID and section 28.3.3.3 doesn't apply to L0. * even INVEPT is not needed because, if L0 uses EPT, it uses different EPTP when running L2 than L1 (because guest_mode is part of mmu-role). In this case SDM section 28.3.3.4 doesn't apply. Similarly, examining nested_vmx_vmexit()->load_vmcs12_host_state(), one could note that L0 won't flush TLB only in cases where SDM sections 28.3.3.3 and 28.3.3.4 don't apply. In particular, if L0 uses different VPIDs for L1 and L2 (i.e. vmx->vpid != vmx->nested.vpid02), section 28.3.3.3 doesn't apply. Thus, remove this flush from prepare_vmcs02() and nested_vmx_vmexit(). Side-note: This patch can be viewed as removing parts of commit fb6c81984313 ("kvm: vmx: Flush TLB when the APIC-access address changes”) that is not relevant anymore since commit 1313cc2bd8f6 ("kvm: mmu: Add guest_mode to kvm_mmu_page_role”). i.e. The first commit assumes that if L0 use EPT and L1 doesn’t use EPT, then L0 will use same EPTP for both L0 and L1. Which indeed required L0 to execute INVEPT before entering L2 guest. This assumption is not true anymore since when guest_mode was added to mmu-role. Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-21KVM: x86: remove set but not used variable 'called'Mao Wenan1-3/+2
Fixes gcc '-Wunused-but-set-variable' warning: arch/x86/kvm/x86.c: In function kvm_make_scan_ioapic_request_mask: arch/x86/kvm/x86.c:7911:7: warning: variable called set but not used [-Wunused-but-set-variable] It is not used since commit 7ee30bc132c6 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs") Signed-off-by: Mao Wenan <maowenan@huawei.com> Fixes: 7ee30bc132c6 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-21KVM: nVMX: Do not mark vmcs02->apic_access_page as dirty when unpinningLiran Alon1-3/+3
vmcs->apic_access_page is simply a token that the hypervisor puts into the PFN of a 4KB EPTE (or PTE if using shadow-paging) that triggers APIC-access VMExit or APIC virtualization logic whenever a CPU running in VMX non-root mode read/write from/to this PFN. As every write either triggers an APIC-access VMExit or write is performed on vmcs->virtual_apic_page, the PFN pointed to by vmcs->apic_access_page should never actually be touched by CPU. Therefore, there is no need to mark vmcs02->apic_access_page as dirty after unpin it on L2->L1 emulated VMExit or when L1 exit VMX operation. Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-21Merge branch 'kvm-tsx-ctrl' into HEADPaolo Bonzini35-207/+1098
Conflicts: arch/x86/kvm/vmx/vmx.c
2019-11-21KVM: vmx: use MSR_IA32_TSX_CTRL to hard-disable TSX on guest that lack itPaolo Bonzini1-14/+30
If X86_FEATURE_RTM is disabled, the guest should not be able to access MSR_IA32_TSX_CTRL. We can therefore use it in KVM to force all transactions from the guest to abort. Tested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-21KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionalityPaolo Bonzini2-21/+40
The current guest mitigation of TAA is both too heavy and not really sufficient. It is too heavy because it will cause some affected CPUs (those that have MDS_NO but lack TAA_NO) to fall back to VERW and get the corresponding slowdown. It is not really sufficient because it will cause the MDS_NO bit to disappear upon microcode update, so that VMs started before the microcode update will not be runnable anymore afterwards, even with tsx=on. Instead, if tsx=on on the host, we can emulate MSR_IA32_TSX_CTRL for the guest and let it run without the VERW mitigation. Even though MSR_IA32_TSX_CTRL is quite heavyweight, and we do not want to write it on every vmentry, we can use the shared MSR functionality because the host kernel need not protect itself from TSX-based side-channels. Tested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-21KVM: x86: implement MSR_IA32_TSX_CTRL effect on CPUIDPaolo Bonzini3-4/+9
Because KVM always emulates CPUID, the CPUID clear bit (bit 1) of MSR_IA32_TSX_CTRL must be emulated "manually" by the hypervisor when performing said emulation. Right now neither kvm-intel.ko nor kvm-amd.ko implement MSR_IA32_TSX_CTRL but this will change in the next patch. Reviewed-by: Jim Mattson <jmattson@google.com> Tested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-21KVM: x86: do not modify masked bits of shared MSRsPaolo Bonzini1-2/+3
"Shared MSRs" are guest MSRs that are written to the host MSRs but keep their value until the next return to userspace. They support a mask, so that some bits keep the host value, but this mask is only used to skip an unnecessary MSR write and the value written to the MSR is always the guest MSR. Fix this and, while at it, do not update smsr->values[slot].curr if for whatever reason the wrmsr fails. This should only happen due to reserved bits, so the value written to smsr->values[slot].curr will not match when the user-return notifier and the host value will always be restored. However, it is untidy and in rare cases this can actually avoid spurious WRMSRs on return to userspace. Cc: stable@vger.kernel.org Reviewed-by: Jim Mattson <jmattson@google.com> Tested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-21KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIESPaolo Bonzini1-2/+8
KVM does not implement MSR_IA32_TSX_CTRL, so it must not be presented to the guests. It is also confusing to have !ARCH_CAP_TSX_CTRL_MSR && !RTM && ARCH_CAP_TAA_NO: lack of MSR_IA32_TSX_CTRL suggests TSX was not hidden (it actually was), yet the value says that TSX is not vulnerable to microarchitectural data sampling. Fix both. Cc: stable@vger.kernel.org Tested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller5-104/+496
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-11-20 The following pull-request contains BPF updates for your *net-next* tree. We've added 81 non-merge commits during the last 17 day(s) which contain a total of 120 files changed, 4958 insertions(+), 1081 deletions(-). There are 3 trivial conflicts, resolve it by always taking the chunk from 196e8ca74886c433: <<<<<<< HEAD ======= void *bpf_map_area_mmapable_alloc(u64 size, int numa_node); >>>>>>> 196e8ca74886c433dcfc64a809707074b936aaf5 <<<<<<< HEAD void *bpf_map_area_alloc(u64 size, int numa_node) ======= static void *__bpf_map_area_alloc(u64 size, int numa_node, bool mmapable) >>>>>>> 196e8ca74886c433dcfc64a809707074b936aaf5 <<<<<<< HEAD if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) { ======= /* kmalloc()'ed memory can't be mmap()'ed */ if (!mmapable && size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) { >>>>>>> 196e8ca74886c433dcfc64a809707074b936aaf5 The main changes are: 1) Addition of BPF trampoline which works as a bridge between kernel functions, BPF programs and other BPF programs along with two new use cases: i) fentry/fexit BPF programs for tracing with practically zero overhead to call into BPF (as opposed to k[ret]probes) and ii) attachment of the former to networking related programs to see input/output of networking programs (covering xdpdump use case), from Alexei Starovoitov. 2) BPF array map mmap support and use in libbpf for global data maps; also a big batch of libbpf improvements, among others, support for reading bitfields in a relocatable manner (via libbpf's CO-RE helper API), from Andrii Nakryiko. 3) Extend s390x JIT with usage of relative long jumps and loads in order to lift the current 64/512k size limits on JITed BPF programs there, from Ilya Leoshkevich. 4) Add BPF audit support and emit messages upon successful prog load and unload in order to have a timeline of events, from Daniel Borkmann and Jiri Olsa. 5) Extension to libbpf and xdpsock sample programs to demo the shared umem mode (XDP_SHARED_UMEM) as well as RX-only and TX-only sockets, from Magnus Karlsson. 6) Several follow-up bug fixes for libbpf's auto-pinning code and a new API call named bpf_get_link_xdp_info() for retrieving the full set of prog IDs attached to XDP, from Toke Høiland-Jørgensen. 7) Add BTF support for array of int, array of struct and multidimensional arrays and enable it for skb->cb[] access in kfree_skb test, from Martin KaFai Lau. 8) Fix AF_XDP by using the correct number of channels from ethtool, from Luigi Rizzo. 9) Two fixes for BPF selftest to get rid of a hang in test_tc_tunnel and to avoid xdping to be run as standalone, from Jiri Benc. 10) Various BPF selftest fixes when run with latest LLVM trunk, from Yonghong Song. 11) Fix a memory leak in BPF fentry test run data, from Colin Ian King. 12) Various smaller misc cleanups and improvements mostly all over BPF selftests and samples, from Daniel T. Lee, Andre Guedes, Anders Roxell, Mao Wenan, Yue Haibing. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20KVM: nVMX: Assume TLB entries of L1 and L2 are tagged differently if L0 use EPTLiran Alon1-2/+4
Since commit 1313cc2bd8f6 ("kvm: mmu: Add guest_mode to kvm_mmu_page_role"), guest_mode was added to mmu-role and therefore if L0 use EPT, it will always run L1 and L2 with different EPTP. i.e. EPTP01!=EPTP02. Because TLB entries are tagged with EP4TA, KVM can assume TLB entries populated while running L2 are tagged differently than TLB entries populated while running L1. Therefore, update nested_has_guest_tlb_tag() to consider if L0 use EPT instead of if L1 use EPT. Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-20KVM: x86: Unexport kvm_vcpu_reload_apic_access_page()Liran Alon1-1/+0
The function is only used in kvm.ko module. Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-20KVM: nVMX: add CR4_LA57 bit to nested CR4_FIXED1Chenyi Qiang1-0/+1
When L1 guest uses 5-level paging, it fails vm-entry to L2 due to invalid host-state. It needs to add CR4_LA57 bit to nested CR4_FIXED1 MSR. Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-20KVM: nVMX: Use semi-colon instead of comma for exit-handlers initializationLiran Alon1-13/+13
Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-20KVM: x86: Zero the IOAPIC scan request dest vCPUs bitmapNitesh Narayan Lal1-0/+1
Not zeroing the bitmap used for identifying the destination vCPUs for an IOAPIC scan request in fixed delivery mode could lead to waking up unwanted vCPUs. This patch zeroes the vCPU bitmap before passing it to kvm_bitmap_or_dest_vcpus(), which is responsible for setting the bitmap with the bits corresponding to the destination vCPUs. Fixes: 7ee30bc132c6("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs") Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-19Merge tag 'perf-core-for-mingo-5.5-20191119' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/coreIngo Molnar1-6/+12
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: x86/insn: Adrian Hunter: - Add some more Intel instructions to the opcode map: cldemote, encls, enclu, enclv, enqcmd, enqcmds, movdir64b, movdiri, pconfig, tpause, umonitor, umwait, wbnoinvd. - The instruction decoding can be tested using the perf tools' "x86 instruction decoder - new instructions" test as folllows: $ perf test -v "new " 2>&1 | grep -i cldemote Decoded ok: 0f 1c 00 cldemote (%eax) Decoded ok: 0f 1c 05 78 56 34 12 cldemote 0x12345678 Decoded ok: 0f 1c 84 c8 78 56 34 12 cldemote 0x12345678(%eax,%ecx,8) Decoded ok: 0f 1c 00 cldemote (%rax) Decoded ok: 41 0f 1c 00 cldemote (%r8) Decoded ok: 0f 1c 04 25 78 56 34 12 cldemote 0x12345678 Decoded ok: 0f 1c 84 c8 78 56 34 12 cldemote 0x12345678(%rax,%rcx,8) Decoded ok: 41 0f 1c 84 c8 78 56 34 12 cldemote 0x12345678(%r8,%rcx,8) $ perf test -v "new " 2>&1 | grep -i tpause Decoded ok: 66 0f ae f3 tpause %ebx Decoded ok: 66 0f ae f3 tpause %ebx Decoded ok: 66 41 0f ae f0 tpause %r8d callchains: Adrian Hunter: - Fix segfault in thread__resolve_callchain_sample(). perf probe: - Line fixes to show only lines where probes can be used with 'perf probe -L', and when reporting them via 'perf probe -l'. - Support multiprobe events. perf scripts python: Adrian Hunter: - Fix use of TRUE with SQLite < 3.23 in exported-sql-viewer.py. perf maps: - Trim 'struct map' by removing the rb_node member for sorting by map name, as that is only needed for processing kernel maps, and only when classifying symbols by section at load time. Sort them by name using qsort() and do lookups using bsearch() when map_groups__find_by_name() is used. perf parse: Ian Rogers: - Report initial event parsing error, providing a less cryptic message to state that a PMU wasn't found in the system. perf vendor events: James Clark: - Fix commas so that PMU event files for arm64, power8 and power nine become valid JSON. libtraceevent: Konstantin Khlebnikov: - Fix parsing of event %o and %X argument types. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-18x86/insn: Add some Intel instructions to the opcode mapAdrian Hunter1-6/+12
Add to the opcode map the following instructions: cldemote tpause umonitor umwait movdiri movdir64b enqcmd enqcmds encls enclu enclv pconfig wbnoinvd For information about the instructions, refer Intel SDM May 2019 (325462-070US) and Intel Architecture Instruction Set Extensions May 2019 (319433-037). The instruction decoding can be tested using the perf tools' "x86 instruction decoder - new instructions" test as folllows: $ perf test -v "new " 2>&1 | grep -i cldemote Decoded ok: 0f 1c 00 cldemote (%eax) Decoded ok: 0f 1c 05 78 56 34 12 cldemote 0x12345678 Decoded ok: 0f 1c 84 c8 78 56 34 12 cldemote 0x12345678(%eax,%ecx,8) Decoded ok: 0f 1c 00 cldemote (%rax) Decoded ok: 41 0f 1c 00 cldemote (%r8) Decoded ok: 0f 1c 04 25 78 56 34 12 cldemote 0x12345678 Decoded ok: 0f 1c 84 c8 78 56 34 12 cldemote 0x12345678(%rax,%rcx,8) Decoded ok: 41 0f 1c 84 c8 78 56 34 12 cldemote 0x12345678(%r8,%rcx,8) $ perf test -v "new " 2>&1 | grep -i tpause Decoded ok: 66 0f ae f3 tpause %ebx Decoded ok: 66 0f ae f3 tpause %ebx Decoded ok: 66 41 0f ae f0 tpause %r8d $ perf test -v "new " 2>&1 | grep -i umonitor Decoded ok: 67 f3 0f ae f0 umonitor %ax Decoded ok: f3 0f ae f0 umonitor %eax Decoded ok: 67 f3 0f ae f0 umonitor %eax Decoded ok: f3 0f ae f0 umonitor %rax Decoded ok: 67 f3 41 0f ae f0 umonitor %r8d $ perf test -v "new " 2>&1 | grep -i umwait Decoded ok: f2 0f ae f0 umwait %eax Decoded ok: f2 0f ae f0 umwait %eax Decoded ok: f2 41 0f ae f0 umwait %r8d $ perf test -v "new " 2>&1 | grep -i movdiri Decoded ok: 0f 38 f9 03 movdiri %eax,(%ebx) Decoded ok: 0f 38 f9 88 78 56 34 12 movdiri %ecx,0x12345678(%eax) Decoded ok: 48 0f 38 f9 03 movdiri %rax,(%rbx) Decoded ok: 48 0f 38 f9 88 78 56 34 12 movdiri %rcx,0x12345678(%rax) $ perf test -v "new " 2>&1 | grep -i movdir64b Decoded ok: 66 0f 38 f8 18 movdir64b (%eax),%ebx Decoded ok: 66 0f 38 f8 88 78 56 34 12 movdir64b 0x12345678(%eax),%ecx Decoded ok: 67 66 0f 38 f8 1c movdir64b (%si),%bx Decoded ok: 67 66 0f 38 f8 8c 34 12 movdir64b 0x1234(%si),%cx Decoded ok: 66 0f 38 f8 18 movdir64b (%rax),%rbx Decoded ok: 66 0f 38 f8 88 78 56 34 12 movdir64b 0x12345678(%rax),%rcx Decoded ok: 67 66 0f 38 f8 18 movdir64b (%eax),%ebx Decoded ok: 67 66 0f 38 f8 88 78 56 34 12 movdir64b 0x12345678(%eax),%ecx $ perf test -v "new " 2>&1 | grep -i enqcmd Decoded ok: f2 0f 38 f8 18 enqcmd (%eax),%ebx Decoded ok: f2 0f 38 f8 88 78 56 34 12 enqcmd 0x12345678(%eax),%ecx Decoded ok: 67 f2 0f 38 f8 1c enqcmd (%si),%bx Decoded ok: 67 f2 0f 38 f8 8c 34 12 enqcmd 0x1234(%si),%cx Decoded ok: f3 0f 38 f8 18 enqcmds (%eax),%ebx Decoded ok: f3 0f 38 f8 88 78 56 34 12 enqcmds 0x12345678(%eax),%ecx Decoded ok: 67 f3 0f 38 f8 1c enqcmds (%si),%bx Decoded ok: 67 f3 0f 38 f8 8c 34 12 enqcmds 0x1234(%si),%cx Decoded ok: f2 0f 38 f8 18 enqcmd (%rax),%rbx Decoded ok: f2 0f 38 f8 88 78 56 34 12 enqcmd 0x12345678(%rax),%rcx Decoded ok: 67 f2 0f 38 f8 18 enqcmd (%eax),%ebx Decoded ok: 67 f2 0f 38 f8 88 78 56 34 12 enqcmd 0x12345678(%eax),%ecx Decoded ok: f3 0f 38 f8 18 enqcmds (%rax),%rbx Decoded ok: f3 0f 38 f8 88 78 56 34 12 enqcmds 0x12345678(%rax),%rcx Decoded ok: 67 f3 0f 38 f8 18 enqcmds (%eax),%ebx Decoded ok: 67 f3 0f 38 f8 88 78 56 34 12 enqcmds 0x12345678(%eax),%ecx $ perf test -v "new " 2>&1 | grep -i enqcmds Decoded ok: f3 0f 38 f8 18 enqcmds (%eax),%ebx Decoded ok: f3 0f 38 f8 88 78 56 34 12 enqcmds 0x12345678(%eax),%ecx Decoded ok: 67 f3 0f 38 f8 1c enqcmds (%si),%bx Decoded ok: 67 f3 0f 38 f8 8c 34 12 enqcmds 0x1234(%si),%cx Decoded ok: f3 0f 38 f8 18 enqcmds (%rax),%rbx Decoded ok: f3 0f 38 f8 88 78 56 34 12 enqcmds 0x12345678(%rax),%rcx Decoded ok: 67 f3 0f 38 f8 18 enqcmds (%eax),%ebx Decoded ok: 67 f3 0f 38 f8 88 78 56 34 12 enqcmds 0x12345678(%eax),%ecx $ perf test -v "new " 2>&1 | grep -i encls Decoded ok: 0f 01 cf encls Decoded ok: 0f 01 cf encls $ perf test -v "new " 2>&1 | grep -i enclu Decoded ok: 0f 01 d7 enclu Decoded ok: 0f 01 d7 enclu $ perf test -v "new " 2>&1 | grep -i enclv Decoded ok: 0f 01 c0 enclv Decoded ok: 0f 01 c0 enclv $ perf test -v "new " 2>&1 | grep -i pconfig Decoded ok: 0f 01 c5 pconfig Decoded ok: 0f 01 c5 pconfig $ perf test -v "new " 2>&1 | grep -i wbnoinvd Decoded ok: f3 0f 09 wbnoinvd Decoded ok: f3 0f 09 wbnoinvd Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20191115135447.6519-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller24-107/+892
Lots of overlapping changes and parallel additions, stuff like that. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-17crypto: curve25519 - x86_64 library and KPP implementationsJason A. Donenfeld2-0/+2476
This implementation is the fastest available x86_64 implementation, and unlike Sandy2x, it doesn't requie use of the floating point registers at all. Instead it makes use of BMI2 and ADX, available on recent microarchitectures. The implementation was written by Armando Faz-Hernández with contributions (upstream) from Samuel Neves and me, in addition to further changes in the kernel implementation from us. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Co-developed-by: Samuel Neves <sneves@dei.uc.pt> [ardb: - move to arch/x86/crypto - wire into lib/crypto framework - implement crypto API KPP hooks ] Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-11-17crypto: blake2s - x86_64 SIMD implementationJason A. Donenfeld3-0/+493
These implementations from Samuel Neves support AVX and AVX-512VL. Originally this used AVX-512F, but Skylake thermal throttling made AVX-512VL more attractive and possible to do with negligable difference. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Co-developed-by: Samuel Neves <sneves@dei.uc.pt> [ardb: move to arch/x86/crypto, wire into lib/crypto framework] Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-11-17int128: move __uint128_t compiler test to KconfigArd Biesheuvel1-1/+1
In order to use 128-bit integer arithmetic in C code, the architecture needs to have declared support for it by setting ARCH_SUPPORTS_INT128, and it requires a version of the toolchain that supports this at build time. This is why all existing tests for ARCH_SUPPORTS_INT128 also test whether __SIZEOF_INT128__ is defined, since this is only the case for compilers that can support 128-bit integers. Let's fold this additional test into the Kconfig declaration of ARCH_SUPPORTS_INT128 so that we can also use the symbol in Makefiles, e.g., to decide whether a certain object needs to be included in the first place. Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-11-17crypto: x86/poly1305 - expose existing driver as poly1305 libraryArd Biesheuvel1-16/+41
Implement the arch init/update/final Poly1305 library routines in the accelerated SIMD driver for x86 so they are accessible to users of the Poly1305 library interface as well. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-11-17crypto: x86/poly1305 - depend on generic library not generic shashArd Biesheuvel1-11/+55
Remove the dependency on the generic Poly1305 driver. Instead, depend on the generic library so that we only reuse code without pulling in the generic skcipher implementation as well. While at it, remove the logic that prefers the non-SIMD path for short inputs - this is no longer necessary after recent FPU handling changes on x86. Since this removes the last remaining user of the routines exported by the generic shash driver, unexport them and make them static. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-11-17crypto: x86/poly1305 - unify Poly1305 state struct with generic codeArd Biesheuvel1-59/+29
In preparation of exposing a Poly1305 library interface directly from the accelerated x86 driver, align the state descriptor of the x86 code with the one used by the generic driver. This is needed to make the library interface unified between all implementations. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-11-17crypto: poly1305 - move core routines into a separate libraryArd Biesheuvel1-1/+1
Move the core Poly1305 routines shared between the generic Poly1305 shash driver and the Adiantum and NHPoly1305 drivers into a separate library so that using just this pieces does not pull in the crypto API pieces of the generic Poly1305 routine. In a subsequent patch, we will augment this generic library with init/update/final routines so that Poyl1305 algorithm can be used directly without the need for using the crypto API's shash abstraction. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-11-17crypto: x86/chacha - expose SIMD ChaCha routine as library functionArd Biesheuvel1-25/+66
Wire the existing x86 SIMD ChaCha code into the new ChaCha library interface, so that users of the library interface will get the accelerated version when available. Given that calls into the library API will always go through the routines in this module if it is enabled, switch to static keys to select the optimal implementation available (which may be none at all, in which case we defer to the generic implementation for all invocations). Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-11-17crypto: x86/chacha - depend on generic chacha library instead of crypto driverArd Biesheuvel1-55/+35
In preparation of extending the x86 ChaCha driver to also expose the ChaCha library interface, drop the dependency on the chacha_generic crypto driver as a non-SIMD fallback, and depend on the generic ChaCha library directly. This way, we only pull in the code we actually need, without registering a set of ChaCha skciphers that we will never use. Since turning the FPU on and off is cheap these days, simplify the SIMD routine by dropping the per-page yield, which makes for a cleaner switch to the library API as well. This also allows use to invoke the skcipher walk routines in non-atomic mode. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-11-17crypto: chacha - move existing library code into lib/cryptoArd Biesheuvel1-1/+1
Currently, our generic ChaCha implementation consists of a permute function in lib/chacha.c that operates on the 64-byte ChaCha state directly [and which is always included into the core kernel since it is used by the /dev/random driver], and the crypto API plumbing to expose it as a skcipher. In order to support in-kernel users that need the ChaCha streamcipher but have no need [or tolerance] for going through the abstractions of the crypto API, let's expose the streamcipher bits via a library API as well, in a way that permits the implementation to be superseded by an architecture specific one if provided. So move the streamcipher code into a separate module in lib/crypto, and expose the init() and crypt() routines to users of the library. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>