| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
is that IOVA allocations always have a gap in-between which produces a fault
on access. If a transfer to a given allocation runs further than expected
we should be able to see it. We pre-allocate IOVA on bus DMA map creation,
and as long as we don't allocate a PTE descriptor, this comes with no cost.
We have plenty of address space anyway, so adding a page-sized gap does not
hurt at all and can only have positive effects.
Idea from kettenis@
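A minimal sketch of the guard-gap idea, assuming an extent(9)-backed per-domain
IOVA map; the names (dom->sd_iovamap, maxsize) are illustrative, not the actual
smmu(4) code:

    /*
     * Reserve one extra page behind every IOVA allocation.  The guard
     * page never gets a PTE, so a transfer that runs past its buffer
     * faults instead of silently hitting the next mapping.
     */
    u_long iova;
    int error;

    error = extent_alloc(dom->sd_iovamap,
        round_page(maxsize) + PAGE_SIZE,        /* size + guard page */
        PAGE_SIZE, 0, 0, EX_NOWAIT, &iova);
    if (error)
        return (error);
    /* only [iova, iova + round_page(maxsize)) ever gets mapped */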
|
|
|
|
|
|
|
|
|
| |
pointer address. Not allowing this one to be allocated might help find
driver bugs, where the device is programmed with a NULL pointer. We have
plenty of address space anyway, so excluding this single page does not
hurt at all and can only have positive effects.
Idea from kettenis@
|
|
|
|
| |
ok drahn
|
|
|
|
|
|
|
|
| |
Adjust the region managed by the extent accordingly but avoid the first
and last page. The last page collides with the MSI address used by the
PCIe controller and not using the first page helps find bugs.
ok patrick@
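A hedged sketch of what skipping the first and last page can look like when
the extent is created; the variable names and the address-space bounds are
illustrative:

    /*
     * Manage the DVA window with extent(9), but keep page 0 out of it
     * (catches devices programmed with a NULL pointer) and drop the
     * last page, which collides with the MSI address used by the PCIe
     * controller.
     */
    sc->sc_dvamap = extent_create("dva", start + PAGE_SIZE,
        end - PAGE_SIZE, M_DEVBUF, NULL, 0, EX_WAITOK);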
|
|
|
|
|
|
| |
Per Intel SDM (Vol 3D, App. A.10) bit 0 should be read as a 1 if enabled.
From Adam Steen. ok mlarkin@
|
|
|
|
|
|
|
|
|
|
| |
Do this by clearing all the bits marked RES0 and setting all the bits
marked RES1 for ARMv8.0.
Any optional features introduced in later revisions of the architecture
(such as PAN) will be enabled after SCTLR_EL1 is initialized.
ok patrick@
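A hedged C-level model of the RES0/RES1 handling; the real change sits in the
early startup code, and the mask names below are invented placeholders for the
ARMv8.0 bit definitions:

    uint64_t sctlr = READ_SPECIALREG(sctlr_el1);

    sctlr &= ~SCTLR_RES0;    /* placeholder: all bits RES0 in ARMv8.0 */
    sctlr |= SCTLR_RES1;     /* placeholder: all bits RES1 in ARMv8.0 */
    WRITE_SPECIALREG(sctlr_el1, sctlr);
    __asm volatile("isb");
    /* optional features from later revisions (e.g. PAN) are enabled
     * separately, after this baseline value is in place */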
|
|
|
|
| |
ok patrick@
|
|
|
|
|
|
|
|
|
| |
Match what apm(4/macppc) says and make apmd(8) log an appropriate warning when
unsupported power actions are requested.
Merge identical cases while here.
This syncs with the apm ioctl handlers on loongson and arm64.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Bootloader command functions must return zero in case of failure,
returning 1 tells the bootloader to boot the currently set kernel image.
"machine dtb" is the wrong way around, so using it triggers a boot.
Fix this and print a brief usage (like other commands such as "hexdump" do)
while here.
Feedback OK patrick
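A hedged sketch of the command-handler convention described above; the handler
name, the usage string and the cmd.argc check are illustrative, not the literal
"machine dtb" code:

    int
    Xdtb(void)
    {
        if (cmd.argc != 2) {
            printf("dtb file\n");
            return (0);    /* failure: stay at the boot> prompt */
        }
        /* ... load and register the device tree blob ... */
        return (0);        /* done; 1 would boot the current kernel */
    }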
|
|
|
|
|
|
|
|
|
| |
The EBADF error is always overwritten for the standby, suspend and
hibernate ioctls, only the mode ioctl has it right.
Merge the now identical cases while here.
OK patrick
|
|
|
|
| |
based on include-what-you-use suggestions
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
map was the wrong way around. The && prevented an EFAULT error and
could pass userland addresses as kernel source to copyout(9). The
kernel could crash with a protection fault due to an invalid offset
when reading /dev/kmem.
Also make the range checks stricter. Not only the start address
must be valid, but also the end address must be within the region
to be copied.
Note that sysctl kern.allowkmem=0 makes the bug unreachable by
default.
OK deraadt@
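An illustrative helper showing the stricter check (generic names, not the
literal /dev/kmem code): both the start and the end of the requested range
have to fall inside the permitted region, and the length must not wrap.

    static int
    range_ok(vaddr_t addr, size_t len, vaddr_t start, vaddr_t end)
    {
        /* reject wrap-around and anything outside [start, end) */
        return (addr >= start && addr + len >= addr && addr + len <= end);
    }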
|
|
|
|
|
|
|
| |
or padded, and hence e.g. the access to the PCI vendor/device id would be
broken. The structs for the other tables all seem to be packed as well.
ok kettenis@
|
|
|
|
|
|
| |
can be removed. The only thing left to implement for smmu(4) to work
out of the box with PCIe devices is to reserve the PCIe MMIO windows.
Let's see how we can do this properly.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
typically pass the physical address, however retrieved, to our PCIe
controller code. This physical address can in practice be given directly
to the PCIe controller, but it is not a given that the CPU and the PCIe
controller are able to use the same physical addresses.
This is even more obvious with an smmu(4) in between, which can change
the world view by introducing I/O virtual addresses. Hence for this
it is indeed necessary to map those pages, which thanks to integration
with bus_dma(9) works easily.
For this we remember the PCI devices' DMA tag in the interrupt handle
during the MSI map, so that we can use the smmu(4)-hooked DMA tag to
load the physical address.
While some systems might prefer to implement "trapping" pages for MSIs,
to make sure devices cannot trigger other devices' interrupts, we only
make sure the whole page is mapped.
Having the IOMMU create a mapping for each MSI is a bit wasteful, but
for now it's the simplest way to implement it.
Discussed with and ok kettenis@
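A hedged sketch of loading the MSI doorbell page through the device's
(smmu(4)-hooked) DMA tag, so the address written into the device is valid in
its I/O address space; the variable names are illustrative and the actual
controller code may differ:

    bus_dma_segment_t seg;
    bus_dmamap_t map;
    bus_addr_t addr;

    seg.ds_addr = trunc_page(msi_doorbell_pa);  /* physical doorbell page */
    seg.ds_len = PAGE_SIZE;

    if (bus_dmamap_create(dmat, PAGE_SIZE, 1, PAGE_SIZE, 0,
        BUS_DMA_WAITOK, &map) == 0 &&
        bus_dmamap_load_raw(dmat, map, &seg, 1, PAGE_SIZE,
        BUS_DMA_WAITOK) == 0)
            /* what the device gets to see: IOVA (or plain PA without smmu) */
            addr = map->dm_segs[0].ds_addr + (msi_doorbell_pa & PAGE_MASK);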
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
efid_io() simpler. Also fixes the problem on some machines when booting
from CD-ROM. It happened because the previous version passed
unaligned pointers to the functions even though alignment is required by the
IoAlign property of the media. idea from kettenis, work with asou
ok kettenis
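A hedged sketch of bouncing a read through a buffer that honours the media's
IoAlign requirement (illustrative only; the real efid_io() is organised
differently):

    static EFI_STATUS
    read_aligned(EFI_BLOCK_IO *blkio, EFI_LBA lba, UINTN size, void *dst)
    {
        UINT32 align = blkio->Media->IoAlign ? blkio->Media->IoAlign : 1;
        UINT8 *buf, *p;
        EFI_STATUS status;

        if ((buf = alloc(size + align)) == NULL)
            return (EFI_OUT_OF_RESOURCES);
        /* round the bounce buffer up to the required alignment */
        p = (UINT8 *)(((unsigned long)buf + align - 1) &
            ~(unsigned long)(align - 1));
        status = blkio->ReadBlocks(blkio, blkio->Media->MediaId, lba,
            size, p);
        if (status == EFI_SUCCESS)
            memcpy(dst, p, size);
        free(buf, size + align);
        return (status);
    }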
|
|
|
|
| |
Same change made to arm64 a week ago.
|
|
|
|
|
|
| |
for framebuffer nodes under / and /chosen.
Same change made to arm64 last month.
|
|
|
|
| |
ok kettenis@
|
|
|
|
| |
okay deraadt@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Make sure we install a dummy page table in TTBR0_EL1 before we change
the size of the VA space in TCR_EL1.
- Flush the TLB after updating TCR_EL1.
- Flush TLB after installing the real kernel page table in TTBR1_EL1.
- Add some barriers around TLB flushes to make it consistent with
other places where we do TLB flushes.
ok drahn@, patrick@
|
|
|
|
|
|
|
| |
map them. This makes ACPI's call to acpi_iommu_device_map() do work
through acpiiort(4).
ok kettenis@
|
|
|
|
|
|
|
|
|
|
| |
PCI attach args and replacing the DMA tag inside. Our other IOMMU API
though takes a DMA tag and returns the old one or a new one. To have
acpiiort(4) integrate better with non-PCI ACPI devices, change the API
so that it is more similar to the other API. This also makes the code
easier to understand.
ok kettenis@
|
|
|
|
|
|
|
|
|
| |
is blessed with IOMMU magic, if available. This is mainly for arm64,
since on amd64 and i386 the IOMMU only captures PCIe devices, as far
as I know, which uses the pci_probe_device_hook(). This though is for
non-PCI devices attached through ACPI.
ok kettenis@
|
|
|
|
| |
of this file are only doing cpp #define
|
|
|
|
|
|
| |
as well.
ok drahn@, kn@
|
| |
|
|
|
|
| |
ok patrick@
|
| |
|
| |
|
|
|
|
|
| |
of code which thinks it could be done elsewhere.
ok kurt
|
|
|
|
| |
ok patrick@
|
| |
|
|
|
|
| |
From Thaison Nguyen
|
|
|
|
|
|
|
|
|
|
| |
already calculates _usable_ memory and updates physmem (if it is 0),
whereas ofw_read_mem_regions() was counting usable+unusable memory,
i.e. 4G or more on some machines. powerpc's 32-bit pagetable cannot use memory
beyond 4G phys addr.
(On a 4G machine, physmem64 was calculated as 0, which caused the installer's
auto-disklabel code to place /tmp on the b partition).
ok gkoehler, works for kurt also
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ASIDs. This should only happen on systems with 8-bit ASIDs, which are
currently unsupported in OpenBSD.
The new scheme uses "generations". Whenever we run out of ASIDs we bump
the generation and flush the complete TLB. The pmaps of processes that
are currently on the CPU are carried over into the new generation. This
implementation relies on the scheduler lock to make sure this happens
without any (known) races.
ok patrick@, mpi@
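A hedged, simplified model of the generation scheme (all names are invented
and flush_entire_tlb() is a placeholder; the real code lives in arm64's pmap
and relies on the scheduler lock for carrying over active pmaps):

    #include <stdint.h>

    #define NUM_ASID        (1U << 16)      /* 16-bit ASIDs */

    struct toy_pmap { uint32_t pm_asid, pm_gen; };

    extern void flush_entire_tlb(void);     /* stand-in for tlbi vmalle1is */

    static uint32_t cur_gen = 1, next_asid = 1;     /* ASID 0 stays reserved */

    static void
    asid_alloc(struct toy_pmap *pm)
    {
        if (next_asid == NUM_ASID) {
            cur_gen++;              /* out of ASIDs: start a new generation */
            next_asid = 1;
            flush_entire_tlb();
            /* pmaps currently running on a CPU are given a fresh ASID
             * in the new generation before they execute again */
        }
        pm->pm_asid = next_asid++;
        pm->pm_gen = cur_gen;
    }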
|
|
|
|
|
|
|
| |
lld11 no longer quietly aligns this when given an address, so we do the
alignment explicitly.
ok kettenis@
|
|
|
|
|
|
|
|
|
|
| |
domain per pagetable, there's no need for a backpointer to the domain
in the pagetable entry descriptor. There can't be any other domain.
Also since there's no list, no list entry member is needed either.
This reduces early allocation to half of the previous size. I think
it's possible to reduce it even further and not need a pagetable entry
descriptor at all, but I need to think about that a bit more.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
of the IOVA allocation. As far as I can see the current "best solution"
is to cache IOVA ranges in percpu magazines. I don't think we have this
issue at all thanks to bus_dmamap_create(9). The map is created ahead
of time, and we know the maximum size of the DMA transfer. Since with
smmu(4) we have IOVA per domain, allocating IOVA 'early' is essentially
free. But pagetable mapping also incurs a performance penalty, since we
allocate pagetable entry descriptors through pools. Since we have the
IOVA early, we can allocate those early as well. This allocation is a
bit more expensive, but it can be optimized further.
All this means that there is no allocation overhead in hot code paths.
The "only" thing remaining is assigning IOVA to the segments, adjusting
the pagetable mappings, and flushing the IOTLB on unload. Maybe there's
a way to do a combined flush for NICs, because we give a list of mbufs
to the network stack and we could do the IOTLB invalidation only once
right before we hand over the mbuf list to the upper layers.
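A hedged sketch of the early descriptor allocation; the pool name and the
insert helper are invented, the point being that pool_get(9) happens at
bus_dmamap_create(9) time rather than per transfer:

    /* one pagetable entry descriptor per page of the reserved IOVA range */
    for (off = 0; off < len; off += PAGE_SIZE) {
        pted = pool_get(&sc->sc_pted_pool, PR_WAITOK | PR_ZERO);
        smmu_pted_insert(dom, iova + off, pted);    /* invented helper */
    }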
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
obviously reduces the overhead of IOVA allocation, but instead you have the
problem of doubly mapped pages, and making sure a page is only unmapped once
the last user is gone. My initial attempt, modeled after apldart(4), calls
the allocator for each segment. Unfortunately this introduces a performance
penalty which reduces performance from around 700 Mbit/s to about 20 Mbit/s,
or even less, in a simple single stream tcpbench scenario. Most mbufs from
userland seem to have at least 3 segments. Calculating the needed IOVA space
upfront reduces this penalty. IOVA allocation overhead could be reduced once
and for all if it is possible to reserve IOVA during bus_dmamap_create(9), as
it is only called upon creation and basically never for each DMA cycle. This
needs some more thought.
With this we now put the pressure on the PTED pools instead. Additionally, but
not part of this diff, percpu pools for the PTEDs seem to reduce the overhead
for that single stream tcpbench scenario to 0.3%. Right now this means we're
hitting a different bottleneck, not related to the IOMMU. The next bottleneck
will be discovered once forwarding is unlocked. Though it should be possible
to benchmark the current implementation, and different designs, using a cycles
counter.
With IOVA allocation it's not easily possible to correlate memory passed to
bus_dmamem_map(9) with memory passed to bus_dmamap_load(9). So far my code
tries to use the same cacheability attributes as the kernel uses for its userland
mappings. For the devices we support, there seems to be no need so far. If
this ever gives us any trouble in the future, I'll have a look and fix it.
While drivers should call bus_dmamap_unload(9) before bus_dmamap_destroy(9),
the API explicitly states that bus_dmamap_destroy(9) should unload the map
if it is still loaded. Hence we need to do exactly that. I actually have
found one network driver which behaves that way, and the developer intends
to change the network driver's behaviour.
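A hedged sketch of the upfront calculation (illustrative; dva_alloc() stands in
for whatever allocator ends up being used): sum the page-rounded sizes of all
segments first, do a single allocation, then carve per-segment addresses out of
that range.

    bus_size_t len = 0;
    u_long dva;
    int i;

    /* total IOVA needed for the whole transfer, computed upfront */
    for (i = 0; i < nsegs; i++)
        len += round_page(segs[i].ds_len + (segs[i].ds_addr & PAGE_MASK));

    if (dva_alloc(dom, len, &dva) != 0)     /* single allocator call per load */
        return (ENOMEM);
    /* per-segment addresses are then carved out of [dva, dva + len) */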
|
|
|
|
|
|
|
| |
or which regions need to be reserved. As it turns out, a region we should
not map is the PCIe address space. DMA from a PCIe device to an address
inside the PCIe address space will obviously never make its way to the SMMU
and host memory. We'll probably have to add an API for that.
|
|
|
|
|
|
|
|
|
|
|
| |
Thank you Apple (not)!
Add an initial attempt to support such systems. This isn't good enough
since the kernel will hang once you create more than 127 processes.
But it makes things work reasonably well until you reach that limit,
which is good enough to build things on the machine itself.
ok patrick@
|
|
|
|
| |
ok patrick@
|
| |
|