| Commit message | Author | Age | Files | Lines |
If the CPU has the new VERW behavior then that is used; otherwise the
proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.
Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.
VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@
This diff adds support for loading a randomly linked kernel VA
(subject to some range restrictions). This change has been in snaps for
a few days without any fallout.
ok deraadt@
Emulate kvm pvclock in vmm(4). Compatible with pvclock(4) in OpenBSD. Linux
does not attach to this (yet).
Fixes by reyk@ and tested extensively by reyk@, tb@ and phessler@
ok mlarkin@ phessler@ reyk@
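The kvm pvclock time info page lets a guest convert a raw TSC reading into nanoseconds using host-supplied scaling values. The struct layout below follows the published pvclock ABI; the conversion helper is an illustrative sketch, not the vmm(4) code itself.

```c
#include <stdint.h>

/* Layout of the kvm pvclock time info page (per the pvclock ABI). */
struct pvclock_time_info {
	uint32_t version;           /* odd while the host updates the page */
	uint32_t pad0;
	uint64_t tsc_timestamp;     /* host TSC at the time of update */
	uint64_t system_time;       /* guest time in ns at tsc_timestamp */
	uint32_t tsc_to_system_mul; /* 32.32 fixed-point TSC->ns multiplier */
	int8_t   tsc_shift;         /* power-of-two pre-scale of the delta */
	uint8_t  flags;
	uint8_t  pad[2];
};

/* Convert a raw TSC reading to guest nanoseconds. */
static uint64_t
pvclock_tsc_to_ns(const struct pvclock_time_info *ti, uint64_t tsc)
{
	uint64_t delta = tsc - ti->tsc_timestamp;

	if (ti->tsc_shift < 0)
		delta >>= -ti->tsc_shift;
	else
		delta <<= ti->tsc_shift;
	/* 32.32 fixed point multiply: ns = delta * mul / 2^32 */
	return ti->system_time +
	    ((delta * (uint64_t)ti->tsc_to_system_mul) >> 32);
}
```

The guest checks `version` before and after reading the page and retries if it changed (or is odd), since the host may update the fields at any time.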
cpu_idle_cycle()
ok mpi@ kettenis@
Add a first cut of an x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.
With help from Mike Larkin
ok mlarkin@
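A long-mode walk resolves a guest-virtual address through four table levels, consuming 9 index bits per level and stopping early at a large-page entry. This is a self-contained sketch of the idea, with a flat buffer standing in for guest-physical memory (the real code would read through the guest's memory ranges):

```c
#include <stdint.h>
#include <string.h>

#define PG_V	 0x1ULL			/* present */
#define PG_PS	 0x80ULL		/* large page (valid at L3/L2) */
#define PG_FRAME 0x000ffffffffff000ULL

/* Read a 64-bit page table entry from "guest-physical" memory. */
static uint64_t
read_gpte(const uint8_t *pmem, uint64_t gpa)
{
	uint64_t pte;

	memcpy(&pte, pmem + gpa, sizeof(pte));
	return pte;
}

/*
 * Translate a guest-virtual address through 4-level long mode page
 * tables rooted at cr3.  Returns 0 on success with the guest-physical
 * address in *gpa, -1 if a level is not present.
 */
static int
walk_pt(const uint8_t *pmem, uint64_t cr3, uint64_t va, uint64_t *gpa)
{
	uint64_t table = cr3 & PG_FRAME;
	int level;

	for (level = 3; level >= 0; level--) {
		/* 9 index bits per level, starting at bit 12. */
		uint64_t idx = (va >> (12 + 9 * level)) & 0x1ff;
		uint64_t pte = read_gpte(pmem, table + idx * 8);

		if ((pte & PG_V) == 0)
			return -1;
		if (level > 0 && (pte & PG_PS)) {
			/* 1GB/2MB page: low va bits pass through. */
			uint64_t mask = (1ULL << (12 + 9 * level)) - 1;
			*gpa = (pte & PG_FRAME & ~mask) | (va & mask);
			return 0;
		}
		table = pte & PG_FRAME;
	}
	*gpa = table | (va & 0xfff);
	return 0;
}
```

Emulating string instructions like OUTSB/INSB needs exactly this: the guest hands over a virtual address, and the monitor must resolve it to guest-physical memory itself.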
when we have a serial console by introducing the notion of a "primary"
graphics device. The primary graphics device is the one set up and
used by firmware (BIOS, UEFI).
The goal is to make sure that wsdisplay0 and drm0 reliably attach to
the primary graphics device such that X works out of the box even
if you have multiple cards or if you are using a serial console.
This also fixes the situation where inteldrm(4) or radeondrm(4) would
take over the console on UEFI systems even if the kernel was booted
with a serial console.
ok jsg@
ok deraadt
details from the ELF header instead of faking it.
Proposal from mlarkin, tested on most architectures already
following return.
This change adds a constraint that the name passed to the RETGUARD_* macros
must correspond to the name in the corresponding ENTRY which starts the
function (or a function which appears beforehand in the same file). Since
we use the distance from the ENTRY definition to calculate how much padding
to insert, the ENTRY symbol must be in scope at assembly time. This is
almost always the case already, since it is the natural way to name the
retguard symbols so they remain unique.
ok deraadt@
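The retguard scheme itself is assembly in the function prologue/epilogue macros; the C model below only illustrates the invariant those macros enforce. A per-function random cookie is XORed with the return address at entry, and the XOR is undone and compared before returning, so a corrupted return address no longer matches.

```c
#include <stdint.h>

/* XOR the return address with the function's random cookie at entry. */
static uint64_t
retguard_encode(uint64_t cookie, uint64_t retaddr)
{
	return cookie ^ retaddr;
}

/*
 * Before returning: undo the XOR and compare with the return address
 * currently on the stack.  Nonzero means it was tampered with, and the
 * real epilogue would trap instead of returning.
 */
static int
retguard_check(uint64_t cookie, uint64_t encoded, uint64_t retaddr_on_stack)
{
	return retguard_encode(cookie, encoded) != retaddr_on_stack;
}
```

The naming constraint in the commit exists because the cookie symbol's padding is computed from its distance to the `ENTRY` label at assembly time, so the label must already be in scope.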
Ensure the TLB is flushed to avoid stale entries when uvm removes
entries from a guest VM's EPT. This is done on VM teardown and when uvm
pages out pages in low memory situations. Prompted by a conversation with
Maxime from NetBSD a few months back.
ok deraadt
control features on AMD. Linux tries to use them and since these are not
fully implemented yet, it results in an OOPS during boot on recent
hardware.
When these are properly passed through, we can restore advertising
support for this feature.
ok deraadt@
the MSRs to support them. Fixes an OOPS during Linux guest VM boot on
Ryzen.
ok deraadt
the guest VM. By default, VMX sets the limits to 0xFFFF on exit, which is
larger than what we want and can lead to security issues.
While here, reset the LDT as well. We don't use this in OpenBSD, and
VMX loads a null LDT selector on exit anyway, but resetting it here
prevents any future surprises.
Pointed out by Maxime Villard from NetBSD - thanks!
ok deraadt@
- use an IPI to notify other CPUs to update CR4 and the MSRs
- use the cpu(4) resume callback to restore the pctr(4) settings after
suspend/hibernate
ok kettenis@ deraadt@
ok kettenis@ deraadt@
that could leave `ddb_mp_mutex' locked if one CPU incremented
`db_active' while another CPU was in the critical section. When the race
hit, the debugger was unable to resume execution or switch between CPUs.
Race analyzed by patrick@
OK mpi@ patrick@
ok mlarkin@
Allow save/restore of %drX registers during VM exit and entry
discussed with deraadt@
fixes kernel core dump to be readable by savecore. From fukaumi at
soum.co.jp
ok mlarkin
parts on the fly.
no words or punctuation were modified.
This change expands the direct map to 4 slots (512GB each), to support
machines with up to 2TB physical memory. Should further expansion be
required, this change provides the means to do that with a single #define
change.
with help from and ok guenther
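Each PML4 slot maps 512GB (2^39 bytes), so 4 slots cover the 2TB mentioned above, and growing the map really is a one-#define change. A small sketch of the arithmetic (the constant names are illustrative, not the pmap's):

```c
#include <stdint.h>

/* One PML4 slot maps 512GB (2^39 bytes); 4 slots cover 2TB. */
#define NBPD_L4			(1ULL << 39)
#define NUM_L4_SLOT_DIRECT	4

/* Which direct-map PML4 slot a physical address falls in. */
static int
dmap_slot(uint64_t pa)
{
	if (pa >= NUM_L4_SLOT_DIRECT * NBPD_L4)
		return -1;	/* beyond the mapped 2TB */
	return (int)(pa / NBPD_L4);
}
```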
We currently ignore MSR_SMBASE and MSR_SMM_MONITOR_CTL, but the SDM says
accessing the former for read and the latter for write while not in SMM mode
should produce a #GP. This change detects those operations and injects
a #GP as the documentation says. The previous behaviour was harmless, just
not correct.
ok pd
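The decision the commit describes reduces to a small predicate. The MSR numbers are from the Intel SDM; the helper and enum names are illustrative only:

```c
#include <stdint.h>

#define MSR_SMM_MONITOR_CTL	0x9b	/* numbers per the Intel SDM */
#define MSR_SMBASE		0x9e

enum msr_op { MSR_OP_RDMSR, MSR_OP_WRMSR };

/*
 * Per the SDM rules described above: outside SMM, reading MSR_SMBASE
 * or writing MSR_SMM_MONITOR_CTL must have a #GP injected into the
 * guest rather than being silently ignored.
 */
static int
msr_smm_access_faults(uint32_t msr, enum msr_op op, int in_smm)
{
	if (in_smm)
		return 0;
	if (msr == MSR_SMBASE && op == MSR_OP_RDMSR)
		return 1;
	if (msr == MSR_SMM_MONITOR_CTL && op == MSR_OP_WRMSR)
		return 1;
	return 0;
}
```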
rdmsr_safe is used when reading potentially missing MSRs, to avoid
triggering #GPs in the kernel.
ok guenther
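In the kernel the "catch" is an entry in the #GP handler's fault-recovery path; the userland sketch below models that with setjmp/longjmp and a fake MSR table, purely to show the attempt-and-recover shape of the interface.

```c
#include <setjmp.h>
#include <stdint.h>

static jmp_buf rdmsr_env;

/* Pretend MSR space: only MSR 0x10 exists in this model. */
static uint64_t
fake_rdmsr(uint32_t msr)
{
	if (msr != 0x10)
		longjmp(rdmsr_env, 1);	/* model the #GP */
	return 0x123456789abcdefULL;
}

/* Returns 0 and stores the value on success, 1 if the MSR faulted. */
static int
rdmsr_safe(uint32_t msr, uint64_t *v)
{
	if (setjmp(rdmsr_env))
		return 1;		/* trapped */
	*v = fake_rdmsr(msr);
	return 0;
}
```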
By default, nothing changes -- shutdown is initiated. But this allows turning
the power button into a sleep button if desired.
(grudging) ok from a few parties
An earlier diff moved the top level page, this diff finishes the lower
layers. New pages are allocated for the existing hierarchy (which thus
benefit from random placement from pmemrange/etc). Existing managed
pages are returned to uvm (a small number of bootstrap pages are not
returned as they are allocated in locore0 and thus aren't managed).
ok deraadt
This change moves the PML4 for pmap_kernel elsewhere during early boot.
Lower levels of pmap_kernel will be moved in subsequent changes, but there
are other pmap changes coming that need to be integrated first.
In snaps for 3 days, no fallout seen.
ok deraadt and substantial input and help from guenther@
suggested by and ok kettenis@
Bump the number of L2 page table entries reserved for the kernel from 16
to 64, to allow for larger kernels. This diff was in snaps for 21 days
without any reported fallout.
ok deraadt
compiled with retpoline enabled are even piggier now.
diagnosed with robert kettenis and drahn
including cpu.h machine/intr.h etc without first including param.h when
MULTIPROCESSOR is defined.
ok visa@
ok patrick@, naddy@
the values, just try it and handle the #GP if it faults.
Problem reported by Maxime Villard (max(at)m00nbsd.net)
ok mlarkin@
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.
lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@
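With PCID, each %cr3 load carries a 12-bit context ID in the low bits, and bit 63 asks the CPU not to flush that context's TLB entries on the switch. A sketch of the composition the commit implies; the four PCID constants mirror the roles listed above, but the names and helper are illustrative:

```c
#include <stdint.h>

#define PCID_KERN	0	/* kernel threads */
#define PCID_PROC	1	/* normal process, U+K tables */
#define PCID_PROC_INTEL	2	/* matching U-K tables (Meltdown) */
#define PCID_TEMP	3	/* temporary mappings when poking others */

#define CR3_PCID_MASK	0xfffULL
#define CR3_NOFLUSH	(1ULL << 63)	/* keep this PCID's TLB entries */

/* Compose a %cr3 value from a page table base, a PCID, and noflush. */
static uint64_t
make_cr3(uint64_t pdirpa, unsigned pcid, int noflush)
{
	return pdirpa | (pcid & CR3_PCID_MASK) |
	    (noflush ? CR3_NOFLUSH : 0);
}
```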
Use inline functions instead of GNU C statement expressions, and
make them available to userland. With clues from guenther@.
ok guenther@ kettenis@
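The shape of that change, on a made-up example rather than the actual functions touched: a statement expression is a gcc/clang extension that cannot appear in headers meant for arbitrary userland compilers, while a static inline says the same thing in standard C and evaluates its argument exactly once.

```c
#include <stdint.h>

/*
 * GNU C statement expression version (extension, kernel-only):
 *
 *	#define bswap32(x) ({ uint32_t v = (x);		\
 *	    (v << 24) | ((v & 0xff00) << 8) |		\
 *	    ((v >> 8) & 0xff00) | (v >> 24); })
 *
 * The same interface as a standard C inline function, usable from
 * userland headers:
 */
static inline uint32_t
bswap32(uint32_t v)
{
	return (v << 24) | ((v & 0xff00) << 8) |
	    ((v >> 8) & 0xff00) | (v >> 24);
}
```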
ok deraadt@
which speeds things up considerably compared to an uncached mapping.
ok deraadt@
like we already do for MWAIT/MONITOR. Also match Intel here by not
exposing the SVM capability to AMD guests.
Allows Linux guests to boot in vmd(8) on Ryzen CPUs.
ok mlarkin@
though AMD only provides public redistributable updates for >= family 10h.
ok mlarkin@
ok deraadt@, krw@, jca@
(1) Future CPUs which don't have the bug, (2) CPUs with microcode
containing an L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts
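A rough, portable model of option (3) only, assuming nothing beyond what the commit states: walk a dedicated 64K region so its lines displace whatever guest data the L1D holds, with one pass to prime the TLBs (since TLB loads flow through the D cache) and a second for the data. The real region is PA-linear, filled with trapsleds, and driven from assembly; the checksum here only keeps the loads from being optimized away.

```c
#include <stdint.h>
#include <string.h>

#define L1D_FLUSH_SIZE	(64 * 1024)	/* 64K region, as in the commit */

static uint8_t l1d_flush_region[L1D_FLUSH_SIZE];

static uint64_t
l1d_flush_sw(void)
{
	volatile uint8_t *p = l1d_flush_region;
	uint64_t sum = 0;
	size_t i;

	/* First pass: load the TLB entries covering the region. */
	for (i = 0; i < L1D_FLUSH_SIZE; i += 64)
		sum += p[i];
	/* Second pass: pull every 64-byte cache line into the L1D. */
	for (i = 0; i < L1D_FLUSH_SIZE; i += 64)
		sum += p[i];
	return sum;
}
```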
OK deraadt@ mpi@
for now as amd64/i386 firmware still caters for legacy OSes that only
support a single PCI segment.
ok patrick@