path: root/sys/arch
* Add a guard page between I/O virtual address space allocations. (patrick, 2021-04-03; 1 file, -3/+4)
  The idea is that IOVA allocations always have a gap in-between which
  produces a fault on access. If a transfer to a given allocation runs
  further than expected we should be able to see it. We pre-allocate
  IOVA on bus DMA map creation, and as long as we don't allocate a PTE
  descriptor, this comes with no cost. We have plenty of address space
  anyway, so adding a page-sized gap does not hurt at all and can only
  have positive effects.
  Idea from kettenis@
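  A minimal sketch of the idea, assuming an extent(9)-backed IOVA
  allocator; iova_alloc() and its layout are illustrative, not the
  actual smmu(4) code:

      /*
       * Pad every IOVA allocation with one trailing guard page.
       * The guard never gets a PTE descriptor, so a device that runs
       * past the end of its buffer faults instead of reaching a
       * neighbouring allocation.
       */
      int
      iova_alloc(struct extent *ex, bus_size_t size, bus_addr_t *iovap)
      {
              u_long addr;
              int error;

              /* Reserve the rounded size plus one unmapped page. */
              error = extent_alloc(ex, round_page(size) + PAGE_SIZE,
                  PAGE_SIZE, 0, 0, EX_NOWAIT, &addr);
              if (error)
                      return (error);

              /* PTEs are only installed for [addr, addr + size). */
              *iovap = (bus_addr_t)addr;
              return (0);
      }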
* Exclude the first page from I/O virtual address space, which is the NULL (patrick, 2021-04-03; 1 file, -3/+4)
  pointer address. Not allowing this one to be allocated might help
  find driver bugs, where the device is programmed with a NULL
  pointer. We have plenty of address space anyway, so excluding this
  single page does not hurt at all and can only have positive effects.
  Idea from kettenis@
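  A short sketch of the exclusion, assuming the same extent(9)-style
  allocator as in the previous sketch: reserving page zero once at
  attach time means the allocator can never hand address 0 to a
  device.

      int error;

      /* Carve the NULL page out of the IOVA space, at attach time. */
      error = extent_alloc_region(ex, 0, PAGE_SIZE, EX_WAITOK);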
* Fix Dale's email address (tb, 2021-04-02; 4 files, -8/+8)
  ok drahn
* Turns out the PCIe DARTs support a full 32-bit device virtual address space. (kettenis, 2021-03-29; 1 file, -4/+9)
  Adjust the region managed by the extent accordingly but avoid the
  first and last page. The last page collides with the MSI address
  used by the PCIe controller and not using the first page helps
  finding bugs.
  ok patrick@
* Fix IA32_EPT_VPID_CAP_XO_TRANSLATIONS specification (dv, 2021-03-29; 1 file, -2/+2)
  Per Intel SDM (Vol 3D, App. A.10) bit 0 should be read as a 1 if
  enabled.
  From Adam Steen. ok mlarkin@
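  A hedged sketch of the corrected definition and check (MSR layout as
  in the Intel SDM; rdmsr() as on OpenBSD/amd64):

      /*
       * IA32_VMX_EPT_VPID_CAP (MSR 0x48C), bit 0: reads as 1 when
       * EPT execute-only translations are supported.
       */
      #define IA32_EPT_VPID_CAP_XO_TRANSLATIONS       (1ULL << 0)

      uint64_t cap = rdmsr(IA32_VMX_EPT_VPID_CAP);
      int has_xo = (cap & IA32_EPT_VPID_CAP_XO_TRANSLATIONS) != 0;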
* Make sure that all CPUs end up with the same bits set in SCTLR_EL1. (kettenis, 2021-03-27; 2 files, -27/+28)
  Do this by clearing all the bits marked RES0 and setting all the
  bits marked RES1 for ARMv8.0. Any optional features introduced in
  later revisions of the architecture (such as PAN) will be enabled
  after SCTLR_EL1 is initialized.
  ok patrick@
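  An illustrative sketch of the initialization; SCTLR_RES0/SCTLR_RES1
  stand in for the full ARMv8.0 reserved-bit masks, and the
  READ/WRITE_SPECIALREG accessors are as in arm64 armreg.h:

      uint64_t sctlr;

      sctlr = READ_SPECIALREG(sctlr_el1);
      sctlr &= ~SCTLR_RES0;   /* clear every bit reserved-as-zero */
      sctlr |= SCTLR_RES1;    /* set every bit reserved-as-one */
      WRITE_SPECIALREG(sctlr_el1, sctlr);
      __asm volatile("isb");  /* make the new state visible */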
* Add ARMv8.5 instruction set related CPU features. (kettenis, 2021-03-27; 2 files, -4/+184)
  ok patrick@
* Return EOPNOTSUPP for unsupported ioctls (kn, 2021-03-26; 1 file, -16/+6)
  Match what apm(4/macppc) says and make apmd(8) log an appropriate
  warning when unsupported power actions are requested. Merge
  identical cases while here.
  This syncs with the apm ioctl handlers on loongson and arm64.
* Fix "mach dtb" return code to avoid bogus boot (kn, 2021-03-26; 1 file, -6/+8)
  Bootloader command functions must return zero in case of failure;
  returning 1 tells the bootloader to boot the currently set kernel
  image. "machine dtb" is the wrong way around, so using it triggers a
  boot. Fix this and print a brief usage (like other commands such as
  "hexdump" do) while here.
  Feedback OK patrick
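  A sketch of the convention; the command body is abbreviated and
  load_dtb() is a hypothetical helper:

      int
      mdtb(int argc, char *argv[])
      {
              if (argc != 2) {
                      printf("dtb file\n");   /* brief usage, as "hexdump" does */
                      return (0);             /* 0: stay at the boot> prompt */
              }
              load_dtb(argv[1]);              /* hypothetical helper */
              return (0);     /* only a command that wants to boot returns 1 */
      }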
* Fix errno, merge ioctl cases (kn, 2021-03-26; 1 file, -13/+5)
  The EBADF error is always overwritten for the standby, suspend and
  hibernate ioctls; only the mode ioctl has it right. Merge the now
  identical cases while here.
  OK patrick
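  A sketch of the merged cases combined with the EOPNOTSUPP change
  above (apm(4) ioctl names; the surrounding handler is omitted):
  EBADF for a read-only fd must stay the final error instead of being
  overwritten later.

      switch (cmd) {
      case APM_IOC_STANDBY:
      case APM_IOC_SUSPEND:
      case APM_IOC_HIBERNATE:
              if ((flag & FWRITE) == 0)
                      error = EBADF;          /* fd not open for writing */
              else
                      error = EOPNOTSUPP;     /* not supported on this platform */
              break;
      }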
* remove unneeded includes in md armv7 files (jsg, 2021-03-25; 69 files, -307/+71)
  based on include-what-you-use suggestions
* The logic in mmrw() to check whether an address is within direct (bluhm, 2021-03-24; 1 file, -4/+4)
  map was the wrong way around. The && prevented an EFAULT error and
  could pass userland addresses as kernel source to copyout(9). The
  kernel could crash with protection fault due to an invalid offset
  when reading /dev/kmem. Also make the range checks stricter. Not
  only the start address must be valid, but also the end address must
  be within the region to be copied. Note that sysctl kern.allowkmem=0
  makes the bug unreachable by default.
  OK deraadt@
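  A sketch of the corrected bounds check (amd64-style direct-map
  constants; v is the source address, c the transfer length):

      /*
       * Both ends of the range must lie inside the direct map; the
       * old "&&" condition could never be true, so the EFAULT path
       * was unreachable and userland addresses slipped through.
       */
      if (v < DMAP_MIN_ADDRESS || v + c > DMAP_MAX_ADDRESS)
              return (EFAULT);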
* Pack the SPCR struct definition since the struct isn't naturally aligned (patrick, 2021-03-23; 1 file, -2/+2)
  or padded, and hence e.g. the access to the PCI vendor/device id
  would be broken. The structs for the other tables all seem to be
  packed as well.
  ok kettenis@
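  An abbreviated sketch of the effect; the field list is shortened and
  only indicative of the SPCR layout. Without __packed the compiler
  may insert padding, shifting every later member away from its offset
  in the firmware table.

      struct acpi_spcr {
              struct acpi_table_header hdr;
              uint8_t         interface_type;
              uint8_t         reserved[3];
              /* ... intermediate fields elided ... */
              uint16_t        pci_device_id;  /* must sit at its ACPI offset */
              uint16_t        pci_vendor_id;
      } __packed;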
* Now that MSI pages are properly mapped, all that debug code in smmu(4) (patrick, 2021-03-22; 1 file, -34/+2)
  can be removed. The only thing left to implement for smmu(4) to work
  out of the box with PCIe devices is to reserve the PCIe MMIO
  windows. Let's see how we can do this properly.
* Load MSI pages through bus_dma(9). (patrick, 2021-03-22; 4 files, -18/+100)
  Our interrupt controllers for MSIs typically pass the physical
  address, however retrieved, to our PCIe controller code. This
  physical address can in practice be directly given to the PCIe
  controller, but it is not a given that the CPU and the PCIe
  controller are able to use the same physical addresses. This is even
  more obvious with an smmu(4) in between, which can change the world
  view by introducing I/O virtual addresses. Hence it is indeed
  necessary to map those pages, which thanks to integration with
  bus_dma(9) works easily. For this we remember the PCI devices' DMA
  tag in the interrupt handle during the MSI map, so that we can use
  the smmu(4)-hooked DMA tag to load the physical address.
  While some systems might prefer to implement "trapping" pages for
  MSIs, to make sure devices cannot trigger other devices' interrupts,
  we only make sure the whole page is mapped. Having the IOMMU create
  a mapping for each MSI is a bit wasteful, but for now it's the
  simplest way to implement it.
  Discussed with and ok kettenis@
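  An illustrative sketch of loading the MSI doorbell page; error
  handling is omitted, and msi_pa and dmat stand for the doorbell's
  physical address and the device's possibly smmu(4)-hooked DMA tag:

      bus_dmamap_t map;
      bus_dma_segment_t seg;

      seg.ds_addr = trunc_page(msi_pa);       /* whole doorbell page */
      seg.ds_len = PAGE_SIZE;

      bus_dmamap_create(dmat, PAGE_SIZE, 1, PAGE_SIZE, 0,
          BUS_DMA_WAITOK, &map);
      bus_dmamap_load_raw(dmat, map, &seg, 1, PAGE_SIZE, BUS_DMA_WAITOK);

      /*
       * The address to program into the device is now
       * map->dm_segs[0].ds_addr plus the doorbell's page offset.
       */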
* Disambiguate expressions. (visa, 2021-03-21; 1 file, -3/+3)
* another unfortunate action to cope with relentless kernel growth (deraadt, 2021-03-19; 1 file, -2/+2)
* Add missing memory clobbers to "data" barriers. (kettenis, 2021-03-17; 3 files, -11/+11)
* Always use an allocated buffer for {Read,Write}Blocks() to make (yasuoka, 2021-03-17; 2 files, -80/+34)
  efid_io() simpler. This also fixes a problem on some machines when
  booting from CD-ROM, which happened because the previous version
  passed unaligned pointers to the functions even though that is
  restricted by the IoAlign property of the media.
  idea from kettenis, work with asou
  ok kettenis
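  A sketch of the bounce-buffer pattern, using libsa's alloc()/free()
  and the EFI_BLOCK_IO protocol names; that alloc() returns memory
  aligned enough for IoAlign is an assumption of this sketch:

      char *buf;
      EFI_STATUS status;
      UINTN sz = nblks * bio->Media->BlockSize;

      buf = alloc(sz);                /* aligned bounce buffer */
      status = bio->ReadBlocks(bio, bio->Media->MediaId, lba, sz, buf);
      if (status == EFI_SUCCESS)
              memcpy(dest, buf, sz);  /* copy to the caller's buffer */
      free(buf, sz);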
* Nodes without a "status" property should be considered enabled as well. (kettenis, 2021-03-16; 1 file, -3/+3)
  Same change made to arm64 a week ago.
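  A sketch of the check; OF_getprop() is the usual sys/dev/ofw
  accessor, the function name is illustrative:

      int
      node_enabled(int node)
      {
              char status[32];

              /* No "status" property at all means enabled. */
              if (OF_getprop(node, "status", status, sizeof(status)) <= 0)
                      return (1);
              return (strcmp(status, "okay") == 0 ||
                  strcmp(status, "ok") == 0);
      }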
* Make sure that switching the console from serial to framebuffer works (kettenis, 2021-03-16; 2 files, -22/+25)
  for framebuffer nodes under / and /chosen. Same change made to arm64
  last month.
* acpi_intr_disestablish() should free its own cookie. (patrick, 2021-03-16; 1 file, -1/+2)
  ok kettenis@
* Bump MAXTSIZ to 256MB on i386. (kurt, 2021-03-16; 1 file, -2/+2)
  okay deraadt@
* Fix some correctness issues in the low-level kernel bringup code. (kettenis, 2021-03-16; 3 files, -5/+20)
  - Make sure we install a dummy page table in TTBR0_EL1 before we
    change the size of the VA space in TCR_EL1.
  - Flush the TLB after updating TCR_EL1.
  - Flush the TLB after installing the real kernel page table in
    TTBR1_EL1.
  - Add some barriers around TLB flushes to make it consistent with
    other places where we do TLB flushes.
  ok drahn@, patrick@
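  A C-level sketch of the first items above, with inline assembly;
  dummy_pt and tcr are illustrative values:

      /* Point TTBR0_EL1 at a dummy table before resizing the VA space. */
      __asm volatile("msr ttbr0_el1, %0; isb" :: "r"(dummy_pt));
      /* Now it is safe to change TCR_EL1. */
      __asm volatile("msr tcr_el1, %0; isb" :: "r"(tcr));
      /* Flush the TLB, bracketed by barriers. */
      __asm volatile("dsb nshst; tlbi vmalle1; dsb nsh; isb");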
* Add code to acpiiort(4) to look up named components in the IORT and (patrick, 2021-03-15; 3 files, -4/+80)
  map them. This makes ACPI's call to acpi_iommu_device_map() do work
  through acpiiort(4).
  ok kettenis@
* Change API of acpiiort(4). (patrick, 2021-03-15; 7 files, -33/+23)
  It was written as a hook before, taking the PCI attach args and
  replacing the DMA tag inside. Our other IOMMU API though takes a DMA
  tag and returns the old one or a new one. To have acpiiort(4)
  integrate better with non-PCI ACPI devices, change the API so that
  it is more similar to the other API. This also makes the code easier
  to understand.
  ok kettenis@
* Add acpi_iommu_device_map(), which replaces the DMA tag with one that (patrick, 2021-03-15; 3 files, -3/+21)
  is blessed with IOMMU magic, if available. This is mainly for arm64,
  since on amd64 and i386 the IOMMU only captures PCIe devices, as far
  as I know, which uses the pci_probe_device_hook(). This though is
  for non-PCI devices attached through ACPI.
  ok kettenis@
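  A sketch of the intended usage, with the signature assumed from the
  description above and in the acpiiort(4) API change (a DMA tag goes
  in, the old or an IOMMU-backed tag comes out):

      bus_dma_tag_t dmat;

      /* node: the device's ACPI node; sc->sc_dmat: its current tag */
      dmat = acpi_iommu_device_map(node, sc->sc_dmat);
      /* dmat is either unchanged or routes DMA through the IOMMU */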
* Don't put an extern variable (ppc_kvm_stolen) into vmparam.h; other instances (deraadt, 2021-03-15; 2 files, -6/+3)
  of this file are only doing cpp #define.
* We can use memory marked as EfiBootServicesCode or EfiBootServicesData (kettenis, 2021-03-13; 1 file, -3/+6)
  as well.
  ok drahn@, kn@
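  A sketch of the memory-map filter (standard EFI names; the map walk
  itself is omitted): once ExitBootServices() has run, the two
  boot-services types are ordinary reclaimable RAM.

      int
      efi_usable_memory(EFI_MEMORY_DESCRIPTOR *desc)
      {
              switch (desc->Type) {
              case EfiConventionalMemory:
              case EfiBootServicesCode:       /* firmware is done with */
              case EfiBootServicesData:       /* these after exit */
                      return (1);
              default:
                      return (0);
              }
      }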
* spelling (jsg, 2021-03-11; 140 files, -331/+331)
* Add SMP support. (kettenis, 2021-03-11; 1 file, -14/+99)
  ok patrick@
* grow media a little (deraadt, 2021-03-11; 1 file, -2/+2)
* Let MAIR comment catch up with reality. (kettenis, 2021-03-10; 1 file, -2/+5)
* pmap_avail_setup() is the only place physmem is calculated; delete a bunch (deraadt, 2021-03-10; 1 file, -9/+2)
  of code which thinks it could be done elsewhere.
  ok kurt
* Nodes without a "status" property should be considered enabled as well. (kettenis, 2021-03-09; 1 file, -3/+3)
  ok patrick@
* Recognize Apple Firestorm cores. (kettenis, 2021-03-09; 1 file, -1/+3)
* Add initial bits for Check Point UTM-1 EDGE N. (visa, 2021-03-09; 3 files, -3/+15)
  From Thaison Nguyen
* ofw_read_mem_regions() can skip calculation of physmem. (deraadt, 2021-03-09; 1 file, -5/+1)
  pmap.c already calculates _usable_ memory and updates physmem (if it
  is 0), whereas ofw_read_mem_regions() was counting usable+unusable
  memory, i.e. 4G or more on some machines. powerpc's 32-bit pagetable
  cannot use memory beyond 4G phys addr. (On a 4G machine, physmem64
  was calculated as 0, which caused the installer's auto-disklabel
  code to place /tmp on the b partition.)
  ok gkoehler, works for kurt also
* Enable ixl(4). (patrick, 2021-03-08; 2 files, -2/+4)
* Revise the ASID allocation scheme to avoid a hang when running out of free (kettenis, 2021-03-08; 2 files, -31/+120)
  ASIDs. This should only happen on systems with 8-bit ASIDs, which
  are currently unsupported in OpenBSD.
  The new scheme uses "generations". Whenever we run out of ASIDs we
  bump the generation and flush the complete TLB. The pmaps of
  processes that are currently on the CPU are carried over into the
  new generation. This implementation relies on the scheduler lock to
  make sure this happens without any (known) races.
  ok patrick@, mpi@
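  A simplified sketch of the generation scheme; all helper names are
  hypothetical, and the real code works on an ASID bitmap under the
  scheduler lock:

      int
      pmap_asid_alloc(struct pmap *pm)
      {
              int asid;

              asid = find_free_asid();        /* scan the ASID bitmap */
              if (asid < 0) {
                      /* Out of ASIDs: start a new generation. */
                      asid_generation++;
                      clear_asid_bitmap();
                      reserve_active_asids(); /* keep on-CPU pmaps */
                      flush_entire_tlb();     /* drop stale entries */
                      asid = find_free_asid();
              }
              pm->pm_asid = asid;
              pm->pm_asid_gen = asid_generation;
              return (asid);
      }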
* Explicitly align kernel text. (mortimer, 2021-03-07; 2 files, -5/+6)
  lld11 no longer quietly aligns this when given an address, so we do
  the alignment explicitly.
  ok kettenis@
* Since with the current design there's one device per domain, and one (patrick, 2021-03-06; 1 file, -17/+11)
  domain per pagetable, there's no need for a backpointer to the
  domain in the pagetable entry descriptor. There can't be any other
  domain. Also since there's no list, no list entry member is needed
  either. This reduces early allocation to half of the previous size.
  I think it's possible to reduce it even further and not need a
  pagetable entry descriptor at all, but I need to think about that a
  bit more.
* One major issue talked about in research papers is reducing the overhead (patrick, 2021-03-06; 1 file, -61/+103)
  of the IOVA allocation. As far as I can see the current "best
  solution" is to cache IOVA ranges in percpu magazines. I don't think
  we have this issue at all thanks to bus_dmamap_create(9). The map is
  created ahead of time, and we know the maximum size of the DMA
  transfer. Since with smmu(4) we have IOVA per domain, allocating
  IOVA 'early' is essentially free. But pagetable mapping also incurs
  a performance penalty, since we allocate pagetable entry descriptors
  through pools. Since we have the IOVA early, we can allocate those
  early as well. This allocation is a bit more expensive though, but
  can be optimized further. All this means that there is no allocation
  overhead in hot code paths. The "only" thing remaining is assigning
  IOVA to the segments, adjusting the pagetable mappings, and flushing
  the IOTLB on unload. Maybe there's a way to do a combined flush for
  NICs, because we give a list of mbufs to the network stack and we
  could do the IOTLB invalidation only once right before we hand over
  the mbuf list to the upper layers.
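  A hypothetical sketch of the early descriptor allocation; the names
  (pted_pool, sms, pted_entry) are illustrative, while pool(9) and
  queue(3) primitives are the usual kernel ones:

      /*
       * At bus_dmamap_create(9) time the maximum transfer size is
       * known, so grab the worst-case number of PTE descriptors now
       * and never allocate in the hot load/unload path.
       */
      u_long i, npages = atop(round_page(maxsize));

      for (i = 0; i < npages; i++) {
              struct pte_desc *pted = pool_get(&pted_pool, PR_WAITOK);
              LIST_INSERT_HEAD(&sms->sms_pted, pted, pted_entry);
      }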
* ansi (jsg, 2021-03-06; 2 files, -6/+4)
* Improve readability of softc accesses. (patrick, 2021-03-05; 1 file, -13/+20)
* Introduce an IOVA allocator instead of mapping pages 1:1. (patrick, 2021-03-05; 2 files, -106/+129)
  Mapping pages 1:1 obviously reduces the overhead of IOVA allocation,
  but instead you have the problem of doubly mapped pages, and making
  sure a page is only unmapped once the last user is gone. My initial
  attempt, modeled after apldart(4), calls the allocator for each
  segment. Unfortunately this introduces a performance penalty which
  reduces throughput from around 700 Mbit/s to about 20 Mbit/s, or
  even less, in a simple single stream tcpbench scenario. Most mbufs
  from userland seem to have at least 3 segments. Calculating the
  needed IOVA space upfront reduces this penalty. IOVA allocation
  overhead could be reduced once and for all if it were possible to
  reserve IOVA during bus_dmamap_create(9), as it is only called upon
  creation and basically never for each DMA cycle. This needs some
  more thought.
  With this we now put the pressure on the PTED pools instead.
  Additionally, but not part of this diff, percpu pools for the PTEDs
  seem to reduce the overhead for that single stream tcpbench scenario
  to 0.3%. Right now this means we're hitting a different bottleneck,
  not related to the IOMMU. The next bottleneck will be discovered
  once forwarding is unlocked. Though it should be possible to
  benchmark the current implementation, and different designs, using a
  cycle counter.
  With IOVA allocation it's not easily possible to correlate memory
  passed to bus_dmamem_map(9) with memory passed to bus_dmamap_load(9).
  So far my code tries to use the same cacheability attributes as the
  kernel uses for its userland mappings. For the devices we support,
  there seems to be no need so far. If this ever gives us any trouble
  in the future, I'll have a look and fix it.
  While drivers should call bus_dmamap_unload(9) before
  bus_dmamap_destroy(9), the API explicitly states that
  bus_dmamap_destroy(9) should unload the map if it is still loaded.
  Hence we need to do exactly that. I actually have found one network
  driver which behaved that way, and the developer intends to change
  the driver's behaviour.
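  A sketch of the upfront calculation (extent(9)-style; ex, segs and
  nsegs are illustrative, and per-segment page rounding is
  simplified): one allocator call per transfer instead of one per
  segment.

      bus_size_t len = 0;
      u_long iova;
      int i, error;

      for (i = 0; i < nsegs; i++)
              len += round_page(segs[i].ds_len);

      /* One IOVA range for the whole transfer. */
      error = extent_alloc(ex, len, PAGE_SIZE, 0, 0, EX_NOWAIT, &iova);
      if (error)
              return (error);
      /*
       * Parcel [iova, iova + len) out to the segments and install
       * the pagetable mappings.
       */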
* Extend the commented code that shows which additional mappings are needed, (patrick, 2021-03-05; 1 file, -6/+24)
  or which regions need to be reserved. As it turns out, a region we
  should not map is the PCIe address space. Making a PCIe device try
  to do DMA to an address in PCIe address space will obviously not
  make its way to the SMMU and host memory. We'll probably have to add
  an API for that.
* Turns out the cores on Apple's M1 SoC only support 8-bit ASIDs. (kettenis, 2021-03-04; 1 file, -52/+57)
  Thank you Apple (not)!
  Add an initial attempt to support such systems. This isn't good
  enough since the kernel will hang once you create more than 127
  processes. But it makes things work reasonably well until you reach
  that limit, which is good enough to build things on the machine
  itself.
  ok patrick@
* Print feature that indicates a CPU core supports 16-bit ASIDs. (kettenis, 2021-03-04; 1 file, -1/+13)
  ok patrick@
* Tweak whitespace and adjust prototypes. (visa, 2021-03-04; 1 file, -23/+21)