wireguard-openbsd - WireGuard implementation for the OpenBSD kernel

	Commit message (Collapse)	Author	Age	Files	Lines
*	Make sure that all CPUs end up with the same bits set in SCTLR_EL1.	kettenis	2021-03-27	1	-24/+24
\| \| \| \| \| \| \| \| \| \|	Do this by clearing all the bits marked RES0 and set all the bits marked RES1 for the ARMv8.0. Any optional features introduced in later revisions of the architecture (such as PAN) will be enabled after SCTLR_EL1 is initialized. ok patrick@
*	Add ARMv8.5 instruction set related CPU features.	kettenis	2021-03-27	1	-3/+78
\| \| \| \|	ok patrick@
*	Load MSI pages through bus_dma(9). Our interrupt controllers for MSIs	patrick	2021-03-22	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	typically pass the physical address, however retrieved, to our PCIe controller code. This physical address can in practice be directly given to the PCIe, but it is not a given that the CPU and the PCIe controller are able to use the same physical addresses. This is even more obvious with an smmu(4) in between, which can change the world view by introducing I/O virtual addresses. Hence for this it is indeed necessary to map those pages, which thanks to integration with bus_dma(9) works easily. For this we remember the PCI devices' DMA tag in the interrupt handle during the MSI map, so that we can use the smmu(4)-hooked DMA tag to load the physical address. While some systems might prefer to implement "trapping" pages for MSIs, to make sure devices cannot trigger other devices' interrupts, we only make sure the whole page is mapped. Having the IOMMU create a mapping for each MSI is a bit wasteful, but for now it's the simplest way to implement it. Discussed with and ok kettenis@
*	spelling	jsg	2021-03-11	2	-4/+4
\|
*	Revise the ASID allocation scheme to avoid a hang when running out of free	kettenis	2021-03-08	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	ASIDs. This should only happen on systems with 8-bit ASIDs, which are currently unsupported in OpenBSD. The new scheme uses "generations". Whenever we run out of ASIDs we bump the generation and flush the complete TLB. The pmaps of processes that are currently on the CPU are carried over into the new generation. This implementation relies on the scheduler lock to make sure this happens without any (known) races. ok patrick@, mpi@
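The generation scheme described in this commit can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the pmap code from the commit: the pool size `NUM_ASID`, the types `asid_pool` and `pmap_sketch`, and all function names are hypothetical, the full TLB flush is replaced by a counter, and the locking the commit relies on (the scheduler lock) is left out.

```c
#include <stdint.h>
#include <string.h>

#define NUM_ASID 8	/* hypothetical, tiny pool so a rollover is easy to trigger */

struct asid_pool {
	uint64_t generation;		/* current generation, starts at 1 */
	uint8_t used[NUM_ASID];		/* nonzero if taken in this generation */
	unsigned int tlb_flushes;	/* counts simulated full TLB flushes */
};

struct pmap_sketch {
	uint64_t generation;		/* generation of the ASID below, 0 = none */
	unsigned int asid;
};

void
asid_pool_init(struct asid_pool *p)
{
	memset(p, 0, sizeof(*p));
	p->generation = 1;
}

/* Bump the generation, carrying over pmaps that are on a CPU right now. */
static void
asid_rollover(struct asid_pool *p, struct pmap_sketch **running, int nrunning)
{
	uint64_t old = p->generation;
	int i;

	memset(p->used, 0, sizeof(p->used));
	p->generation = old + 1;
	for (i = 0; i < nrunning; i++) {
		if (running[i] != NULL && running[i]->generation == old) {
			p->used[running[i]->asid] = 1;
			running[i]->generation = p->generation;
		}
	}
	p->tlb_flushes++;	/* stands in for flushing the complete TLB */
}

/* Return a valid ASID for pm, or NUM_ASID if running pmaps hold them all. */
unsigned int
asid_get(struct asid_pool *p, struct pmap_sketch *pm,
    struct pmap_sketch **running, int nrunning)
{
	unsigned int a;
	int pass;

	if (pm->generation == p->generation)
		return pm->asid;	/* ASID is still valid */

	for (pass = 0; pass < 2; pass++) {
		for (a = 0; a < NUM_ASID; a++) {
			if (!p->used[a]) {
				p->used[a] = 1;
				pm->asid = a;
				pm->generation = p->generation;
				return a;
			}
		}
		if (pass == 0)
			asid_rollover(p, running, nrunning);
	}
	return NUM_ASID;
}
```

In this model a pmap whose generation is stale simply receives a fresh ASID the next time it is looked up, so running out of ASIDs costs one flush instead of a hang.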
*	Add memory attributes for stage-2 pagetables.	patrick	2021-02-28	1	-1/+7
\| \| \| \|	ok kettenis@
*	Add some infrastructure in the PCI chipset tag for pci_probe_device_hook()	patrick	2021-02-25	1	-2/+4
\| \| \| \| \| \|	so that we can provide IOMMU-hooked bus DMA tags for each PCI device. ok kettenis@
*	remove some unused includes	jsg	2021-02-23	1	-10/+0
\|
*	On CPUs that implement the VHE extension and have the E2H bit set, keep	kettenis	2021-02-21	1	-1/+2
\| \| \| \| \| \|	running the kernel in EL2. ok patrick@
*	Add support for FIQs. We need these to support agtimer(4) on Apple M1 SoCs	kettenis	2021-02-17	3	-18/+17
\| \| \| \| \| \| \| \|	since its interrupts seem to be hardwired to trigger an FIQ instead of an IRQ. This means we need to manipulate both the F and the I bit in the DAIF register when enabling and disabling interrupts. ok patrick@
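To make the F and I handling concrete, here is a minimal sketch that models the DAIF register as a plain integer so it can run anywhere. The function names are hypothetical and this is not the code from the commit. The real kernel manipulates the register with msr/mrs instructions. The bit positions (I at bit 7, F at bit 6) follow the architectural DAIF layout.

```c
#include <stdint.h>

#define PSR_F	0x00000040u	/* FIQ mask bit in DAIF */
#define PSR_I	0x00000080u	/* IRQ mask bit in DAIF */

/*
 * Mask both IRQs and FIQs and return the previous state.
 * `daif` points at a simulated register value (an assumption of this sketch).
 */
uint32_t
intr_disable_sketch(uint32_t *daif)
{
	uint32_t old = *daif;

	*daif |= PSR_I | PSR_F;
	return old;
}

/* Restore the I and F bits saved by intr_disable_sketch(). */
void
intr_restore_sketch(uint32_t *daif, uint32_t old)
{
	*daif = (*daif & ~(PSR_I | PSR_F)) | (old & (PSR_I | PSR_F));
}

/* Unmask both IRQs and FIQs. */
void
intr_enable_sketch(uint32_t *daif)
{
	*daif &= ~(PSR_I | PSR_F);
}
```

Touching only the I bit would leave a timer that raises FIQs either permanently masked or never masked, which is why both bits are handled as a pair.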
*	Introduce BUS_SPACE_MAP_POSTED such that we can distinguish between	kettenis	2021-02-16	3	-13/+14
\| \| \| \| \| \| \| \|	posted and non-posted device memory mappings and set the right memory attributes for them. Needed because on the Apple M1 using the wrong mapping will fault. ok patrick@, dlg@
*	While it should be possible to use "normal uncachable" mappings for	kettenis	2021-02-15	1	-1/+2
\| \| \| \| \| \| \| \|	write-combining on arm64 as Linux does, this doesn't seem to work on NXP's LX2160A SoC. So switch to using "device" mappings for now to make amdgpu(4) work better. ok patrick@
*	last argument to pmap_fault_fixup() is unused, delete it	deraadt	2020-10-21	1	-2/+2
\| \| \| \|	noticed by kettenis
*	Add code to print CPU features.	kettenis	2020-10-18	1	-6/+51
\| \| \| \|	ok naddy@
*	Enable PAN (Privileged Access Never) on CPUs that support it. This means	kettenis	2020-08-17	1	-1/+3
\| \| \| \| \| \| \| \|	that user-space access from the kernel is not allowed for "normal" load/store instructions. Only the special "unprivileged" load/store instructions are allowed. We already use those in copyin(9) and copyout(9). ok patrick@, drahn@, jsg@
*	Remove "for all XXX platforms" from comment. Fixes the issue pointed out	kettenis	2020-08-14	1	-2/+2
\| \| \| \| \| \|	by miod@ where the powerpc64 claimed to be "for all AArch64 platforms". ok patrick@
*	Re-work intr_barrier(9) on arm64 to remove layer violation. So far we	patrick	2020-07-17	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	have stored the struct cpu_info * in the wrapper around the interrupt handler cookie, but since we can have a few layers in between, this does not seem very nice. Instead have each and every interrupt controller provide a barrier function. This means that intr_barrier(9) will in the end be executed by the interrupt controller that actually wired the pin to a core. And that's the only place where the information is stored. ok kettenis@
*	Store struct cpu_info * in arm64's interrupt wrap. intr_barrier() can	patrick	2020-07-16	1	-1/+2
\| \| \| \| \| \| \| \| \|	already assume every cookie is wrapped and simply retrieve the pointer from it. It's a bit of a layer violation though, since only the intc should actually store that kind of information. This is good enough for now, but I'm already cooking up a diff to resolve this. ok dlg@
*	To be able to have intr_barrier() on arm64, we need to be able to	patrick	2020-07-16	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	somehow gain access to the struct cpu_info * used to establish the interrupt. One possibility is to store the pointer in the cookie returned by the establish methods. A better way would be to ask the interrupt controller directly to do barrier. This means that all external facing interrupt establish functions need to wrap the cookie in a common way. We already do this for FDT-based interrupts. Also most PCI controllers already return the cookie from the FDT API, which is already wrapped. So arm64's acpi_intr_establish() and acpipci(4) now need to explicitly wrap it, since they call ic->ic_establish directly, which does not wrap. ok dlg@
*	Userland timecounter implementation for arm64.	kettenis	2020-07-15	1	-2/+2
\| \| \| \|	ok naddy@
*	Implement pci_intr_establish_cpu() on arm64 and armv7. The function pointer	patrick	2020-07-14	1	-3/+8
\| \| \| \| \| \| \| \| \| \| \| \|	in the chipset tag for establishing interrupts now takes a struct cpu_info *. The normal pci_intr_establish() macro passes NULL as ci, which indicates that the primary CPU is to be used. The PCI controller drivers can then simply pass the ci on to our arm64/armv7 interrupt establish "framework". Prompted by dlg@ ok kettenis@
*	Extend the interrupt API on arm64 and armv7 to be able to pass around	patrick	2020-07-14	2	-7/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	a struct cpu_info *. From a driver's point of view, the fdt_intr_establish_* API now also has variants with a _cpu suffix. Internally the "old" functions now call their _cpu counterparts, passing NULL as ci. NULL will be interpreted as primary CPU in the interrupt controller code. The internal framework for interrupt controllers has been changed so that the establish method provided by an interrupt controller always takes a struct cpu_info *. Some drivers, like imxgpio(4) and rkgpio(4), only have a single interrupt line for multiple pins. On those we simply disallow trying to establish an interrupt on a non-primary CPU, returning NULL. Since we do not have MP yet on armv7, all armv7 interrupt controllers do return NULL if an attempt is made to establish an interrupt on a different CPU. That said, so far there's no way this can happen. If we ever gain MP support, this is a reminder that the interrupt controller drivers have to be adjusted. Prompted by dlg@ ok kettenis@
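The wrapper pattern this commit describes can be sketched as follows. Everything here is a simplified, hypothetical stand-in: `struct cpu_info` is reduced to one field, and `gpio_intr_establish_sketch`, `example_intr_establish` and `example_intr_establish_cpu` are invented names, not the fdt_intr_establish_* functions themselves. The sketch only shows the NULL-means-primary convention and the refusal of non-primary CPUs on a controller with a single shared interrupt line.

```c
#include <stddef.h>

struct cpu_info {
	int ci_cpuid;		/* stand-in for the real structure */
};

struct cpu_info primary_cpu = { 0 };

struct intr_handle_sketch {
	int irq;
	struct cpu_info *ci;
};

/*
 * Controller establish method: always takes a ci.
 * This models a controller with one shared interrupt line, so
 * only the primary CPU is accepted.
 */
void *
gpio_intr_establish_sketch(struct intr_handle_sketch *ih, int irq,
    struct cpu_info *ci)
{
	if (ci == NULL)
		ci = &primary_cpu;	/* NULL is interpreted as primary CPU */
	if (ci != &primary_cpu)
		return NULL;		/* disallow non-primary CPUs */
	ih->irq = irq;
	ih->ci = ci;
	return ih;
}

/* New-style entry point with an explicit CPU. */
void *
example_intr_establish_cpu(struct intr_handle_sketch *ih, int irq,
    struct cpu_info *ci)
{
	return gpio_intr_establish_sketch(ih, irq, ci);
}

/* "Old" entry point: forwards to the _cpu variant with ci == NULL. */
void *
example_intr_establish(struct intr_handle_sketch *ih, int irq)
{
	return example_intr_establish_cpu(ih, irq, NULL);
}
```

Existing callers keep their signature and behaviour, while new callers can name a CPU and must be prepared for a NULL return.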
*	Add support for timecounting in userland.	pirofti	2020-07-06	1	-0/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This diff exposes parts of clock_gettime(2) and gettimeofday(2) to userland via libc, liberating processes from the need for a context switch every time they want to count the passage of time. If a timecounter clock can be exposed to userland then it needs to set its tc_user member to a non-zero value. Tested with one or multiple counters per architecture. The timing data is shared through a pointer found in the new ELF auxiliary vector AUX_openbsd_timekeep containing timehands information that is frequently updated by the kernel. Timing differences between the last kernel update and the current time are adjusted in userland by the tc_get_timecount() function inside the MD usertc.c file. This permits a much more responsive environment, quite visible in browsers, office programs and gaming (apparently one is able to fly in Minecraft now). Tested by robert@, sthen@, naddy@, kmos@, phessler@, and many others! OK from at least kettenis@, cheloha@, naddy@, sthen@
*	Remove obsolete <machine/stdarg.h> header. Nowadays the vararg	visa	2020-06-30	1	-56/+0
\| \| \| \| \| \| \| \|	functionality is provided by <sys/stdarg.h> using compiler builtins. Tested in a ports bulk build on amd64 by naddy@ OK naddy@ mpi@
*	Implement cpu_rnd_messybits() as a read of the virtual counter xored	naddy	2020-06-05	1	-2/+11
\| \| \| \| \| \| \| \|	with a bit-reversed copy of itself. There is progressively less entropy in the higher bits of a counter than in the lower bits, so bit-reverse one half in order to extract maximal entropy. style fixes and ok kettenis@
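The mixing step can be tried out in portable C. This is a sketch, not the code from the commit: the counter value is passed in as an argument instead of being read from the virtual counter register, `rbit64` and `messybits_sketch` are hypothetical names, and truncating the result to 32 bits is an assumption of this example.

```c
#include <stdint.h>

/* Reverse the bit order of a 64-bit value. */
static uint64_t
rbit64(uint64_t v)
{
	uint64_t r = 0;
	int i;

	for (i = 0; i < 64; i++) {
		r = (r << 1) | (v & 1);
		v >>= 1;
	}
	return r;
}

/*
 * XOR the counter with a bit-reversed copy of itself and keep the
 * low 32 bits. The low word then holds the counter's fast-changing
 * low bits mixed with its high word in reversed order.
 */
uint32_t
messybits_sketch(uint64_t counter)
{
	return (uint32_t)(counter ^ rbit64(counter));
}
```

For example, `messybits_sketch(0x100000000ULL)` yields `0x80000000`: bit 32 of the counter lands in bit 31 of the result, so the slow-moving high bits end up next to the top of the output rather than being discarded.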
*	Allow userland access to the virtual counter.	kettenis	2020-06-05	1	-1/+4
\| \| \| \|	ok patrick@, deraadt@
*	introduce "cpu_rnd_messybits" for use instead of nanotime in dev/rnd.c.	dlg	2020-05-31	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	rnd.c uses nanotime to get access to some bits that change quickly between events that it can mix into the entropy pool. it doesn't use nanotime to get a monotonically increasing set of ordered and accurate timestamps, it just wants something with bits that change. there's been discussions for years about letting rnd use a clock that's super fast to read, but not necessarily accurate, but it wasn't until recently that i figured out it wasn't interested in time at all, so things like keeping a fast clock coherent between cpu cores or correct according to ntp is unnecessary. this means we can just let rnd read the cycle counters on cpus and things will be fine. cpus with cycle counters that vary in their speed and aren't kept consistent between cores may even be desirable in this context. so this is the first step in converting rnd.c to reading cycle counters. it copies the nanotime backend to each arch, and they can replace it with something MD as a second step later on. djm@ suggested rnd_messybytes, but we landed on cpu_rnd_messybits. thanks to visa for his eyes. ok deraadt@ visa@ deraadt@ says he will help handle any MD fallout that occurs.
*	Fix typo in comment.	kettenis	2020-05-17	1	-2/+2
\|
*	Add machdep.compatible.	kettenis	2020-05-17	1	-4/+8
\| \| \| \|	ok jsg@
*	Fix some of the more esoteric bus_space functions. The	kettenis	2020-04-13	1	-7/+8
\| \| \| \| \| \| \| \| \|	bus_space_read_region_n, bus_space_write_region_n and bus_space_set_region_n functions were all broken. Fixes mvneta(4) on arm64. ok patrick@
*	controler -> controller	jsg	2020-02-20	1	-3/+3
\|
*	Convert db_addr_t -> vaddr_t but leave the typedef for now.	mpi	2019-11-07	1	-3/+3
\|
*	Cache flush operations on arm64 were being incorrectly treated as write	drahn	2019-10-17	1	-4/+4
\| \| \| \| \| \| \|	operations, however they should be treated as read per the design. Switch to using bit defines, correct said defines. Fixes cache flushing causing Firefox to abort. ok kettenis@ kurt@
*	Newer ARMv8 processors now include a new CSV2 field in their processor	kettenis	2019-09-01	1	-1/+7
\| \| \| \| \| \| \| \| \|	feature register that can indicate that a processor is not vulnerable to Spectre v2 attacks. Use this field in favour of adding specific processors to a whitelist. Continue to whitelist the few processors that are known not to be vulnerable but don't set the appropriate value in the CSV2 field. ok jsg@
*	Fix a typo I noticed reviewing the smbios code cleanup diff.	kmos	2019-08-04	1	-2/+2
\| \| \| \| \| \|	(stirng -> string) ok kettenis@ who pointed out I should fix the new arm64 smbiosvar.h too
*	Cleanup the bios(4)/smbios(4) code a bit. Fix some KNF issues, reduce	kettenis	2019-08-04	1	-111/+111
\| \| \| \| \| \| \|	differences between the i386 and amd64 versions of the code and switch to using the standard C exact-width integer types. ok deraadt@
*	Implement smbios support on arm64.	kettenis	2019-08-04	1	-0/+278
\| \| \| \|	ok deraadt@, jsg@
*	Register cpu(4) as a cooling device. This supports passive cooling by	kettenis	2019-07-02	1	-1/+2
\| \| \| \| \| \|	clamping the maximum DVFS state. ok mlarkin@, patrick@
*	Implement suspend/resume support for MSI-X interrupts. Loosely based on	kettenis	2019-06-25	1	-1/+8
\| \| \| \| \| \|	an earlier diff from sf@. ok jmatthew@, also ok mlarkin@, sf@ for a slightly different earlier version
*	Remove the unused pvh_attrs attribute from struct vm_page_md.	patrick	2019-06-04	1	-4/+2
\| \| \| \|	ok kettenis@
*	Bump MAXCPUS to 32 so that we can use all cores on the Ampere eMAG.	patrick	2019-06-04	1	-2/+2
\| \| \| \|	ok kettenis@
*	Map the raw bus space operations to the regular ones.	patrick	2019-06-03	1	-1/+15
\| \| \| \|	ok kettenis@
*	Change pci_intr_handle_t into a struct and replace duplicated code that	kettenis	2019-06-02	1	-2/+16
\| \| \| \| \| \| \| \|	implements mapping of MSI and MSI-X interrupts with new generic functions. Fixes a use-after-free in some PCI device drivers that call pci_intr_string(9) after pci_intr_establish(9). ok deraadt@
*	Bump VM_MAX_KERNEL_ADDRESS so that we have about 16G of KVA. Since	patrick	2019-06-01	1	-2/+2
\| \| \| \| \| \| \| \| \|	we need KVA to keep track of all the RAM pages, machines with a lot of memory easily exhaust our KVA space. We need about 1G of KVA per 32G of memory, so with 16G of KVA we can maintain close to 512G of memory. ok kettenis@
*	Add MSI-X support for acpipci(4). This splits out some generic code into	kettenis	2019-05-31	1	-2/+6
\| \| \| \| \| \| \|	a new pci_machdep.c file such that it can be re-used by other arm64 PCI host bridge drivers in the future. ok patrick@
*	Add the needed ICC_PMR_EL1 register bit defines for the previous	patrick	2019-05-13	1	-2/+6
\| \| \| \| \| \| \|	commit to unbreak the build. from kettenis@ ok drahn@
*	Remove some junk that we don't use.	kettenis	2019-05-04	1	-12/+1
\| \| \| \|	ok patrick@
*	change marks[] array to uint64_t, so the code can track full 64-bit	deraadt	2019-04-10	1	-2/+2
\| \| \| \| \|	details from the ELF header instead of faking it. Proposal from mlarkin, tested on most architectures already
*	Setting and getting the rounding mode on our arm64 FPU has not worked	patrick	2019-03-12	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	in libm since the rounding mode is in fpcr, not fpsr. Since both FPU registers are 32-bit we can store them in the 64-bit fenv_t to make handling the bits easier. While there add FE_DENORMAL, which also exists on x86. Also make sure that whenever we are being passed an exception mask, we only allow the bits that are supported by hardware. Found by regression tests Debugged with Moritz Buhl ok kettenis@
*	Sprinkle a few ifdefs for _LOCORE and _KERNEL and reorder a few lines	patrick	2019-02-16	1	-21/+26
\| \| \| \| \| \| \|	so that pmap.h can be included as part of the mmap_hint regression test. From Moritz Buhl ok bluhm@