aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/PCI
diff options
context:
space:
mode:
Diffstat (limited to '')
-rw-r--r--Documentation/PCI/acpi-info.rst18
-rw-r--r--Documentation/PCI/boot-interrupts.rst159
-rw-r--r--Documentation/PCI/endpoint/function/binding/pci-ntb.rst38
-rw-r--r--Documentation/PCI/endpoint/function/binding/pci-test.rst26
-rw-r--r--Documentation/PCI/endpoint/function/binding/pci-test.txt19
-rw-r--r--Documentation/PCI/endpoint/index.rst7
-rw-r--r--Documentation/PCI/endpoint/pci-endpoint-cfs.rst26
-rw-r--r--Documentation/PCI/endpoint/pci-endpoint.rst18
-rw-r--r--Documentation/PCI/endpoint/pci-ntb-function.rst348
-rw-r--r--Documentation/PCI/endpoint/pci-ntb-howto.rst161
-rw-r--r--Documentation/PCI/endpoint/pci-vntb-function.rst129
-rw-r--r--Documentation/PCI/endpoint/pci-vntb-howto.rst167
-rw-r--r--Documentation/PCI/index.rst2
-rw-r--r--Documentation/PCI/pci-error-recovery.rst12
-rw-r--r--Documentation/PCI/pci-iov-howto.rst7
-rw-r--r--Documentation/PCI/pci.rst36
-rw-r--r--Documentation/PCI/pcieaer-howto.rst23
-rw-r--r--Documentation/PCI/sysfs-pci.rst138
18 files changed, 1249 insertions, 85 deletions
diff --git a/Documentation/PCI/acpi-info.rst b/Documentation/PCI/acpi-info.rst
index 060217081c79..34c64a5a66ec 100644
--- a/Documentation/PCI/acpi-info.rst
+++ b/Documentation/PCI/acpi-info.rst
@@ -22,9 +22,9 @@ or if the device has INTx interrupts connected by platform interrupt
controllers and a _PRT is needed to describe those connections.
ACPI resource description is done via _CRS objects of devices in the ACPI
-namespace [2].   The _CRS is like a generalized PCI BAR: the OS can read
+namespace [2]. The _CRS is like a generalized PCI BAR: the OS can read
_CRS and figure out what resource is being consumed even if it doesn't have
-a driver for the device [3].  That's important because it means an old OS
+a driver for the device [3]. That's important because it means an old OS
can work correctly even on a system with new devices unknown to the OS.
The new devices might not do anything, but the OS can at least make sure no
resources conflict with them.
@@ -41,15 +41,15 @@ ACPI, that device will have a specific _HID/_CID that tells the OS what
driver to bind to it, and the _CRS tells the OS and the driver where the
device's registers are.
-PCI host bridges are PNP0A03 or PNP0A08 devices.  Their _CRS should
-describe all the address space they consume.  This includes all the windows
+PCI host bridges are PNP0A03 or PNP0A08 devices. Their _CRS should
+describe all the address space they consume. This includes all the windows
they forward down to the PCI bus, as well as registers of the host bridge
-itself that are not forwarded to PCI.  The host bridge registers include
+itself that are not forwarded to PCI. The host bridge registers include
things like secondary/subordinate bus registers that determine the bus
range below the bridge, window registers that describe the apertures, etc.
These are all device-specific, non-architected things, so the only way a
PNP0A03/PNP0A08 driver can manage them is via _PRS/_CRS/_SRS, which contain
-the device-specific details.  The host bridge registers also include ECAM
+the device-specific details. The host bridge registers also include ECAM
space, since it is consumed by the host bridge.
ACPI defines a Consumer/Producer bit to distinguish the bridge registers
@@ -66,7 +66,7 @@ the PNP0A03/PNP0A08 device itself. The workaround was to describe the
bridge registers (including ECAM space) in PNP0C02 catch-all devices [6].
With the exception of ECAM, the bridge register space is device-specific
anyway, so the generic PNP0A03/PNP0A08 driver (pci_root.c) has no need to
-know about it.  
+know about it.
New architectures should be able to use "Consumer" Extended Address Space
descriptors in the PNP0A03 device for bridge registers, including ECAM,
@@ -75,9 +75,9 @@ ia64 kernels assume all address space descriptors, including "Consumer"
Extended Address Space ones, are windows, so it would not be safe to
describe bridge registers this way on those architectures.
-PNP0C02 "motherboard" devices are basically a catch-all.  There's no
+PNP0C02 "motherboard" devices are basically a catch-all. There's no
programming model for them other than "don't use these resources for
-anything else."  So a PNP0C02 _CRS should claim any address space that is
+anything else." So a PNP0C02 _CRS should claim any address space that is
(1) not claimed by _CRS under any other device object in the ACPI namespace
and (2) should not be assigned by the OS to something else.
diff --git a/Documentation/PCI/boot-interrupts.rst b/Documentation/PCI/boot-interrupts.rst
new file mode 100644
index 000000000000..2ec70121bfca
--- /dev/null
+++ b/Documentation/PCI/boot-interrupts.rst
@@ -0,0 +1,159 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+Boot Interrupts
+===============
+
+:Author: - Sean V Kelley <sean.v.kelley@linux.intel.com>
+
+Overview
+========
+
+On PCI Express, interrupts are represented with either MSI or inbound
+interrupt messages (Assert_INTx/Deassert_INTx). The integrated IO-APIC in a
+given Core IO converts the legacy interrupt messages from PCI Express to
+MSI interrupts. If the IO-APIC is disabled (via the mask bits in the
+IO-APIC table entries), the messages are routed to the legacy PCH. This
+in-band interrupt mechanism was traditionally necessary for systems that
+did not support the IO-APIC and for boot. Intel in the past has used the
+term "boot interrupts" to describe this mechanism. Further, the PCI Express
+protocol describes this in-band legacy wire-interrupt INTx mechanism for
+I/O devices to signal PCI-style level interrupts. The subsequent paragraphs
+describe problems with the Core IO handling of INTx message routing to the
+PCH and mitigation within BIOS and the OS.
+
+
+Issue
+=====
+
+When in-band legacy INTx messages are forwarded to the PCH, they in turn
+trigger a new interrupt for which the OS likely lacks a handler. When an
+interrupt goes unhandled over time, they are tracked by the Linux kernel as
+Spurious Interrupts. The IRQ will be disabled by the Linux kernel after it
+reaches a specific count with the error "nobody cared". This disabled IRQ
+now prevents valid usage by an existing interrupt which may happen to share
+the IRQ line::
+
+ irq 19: nobody cared (try booting with the "irqpoll" option)
+ CPU: 0 PID: 2988 Comm: irq/34-nipalk Tainted: 4.14.87-rt49-02410-g4a640ec-dirty #1
+ Hardware name: National Instruments NI PXIe-8880/NI PXIe-8880, BIOS 2.1.5f1 01/09/2020
+ Call Trace:
+
+ <IRQ>
+ ? dump_stack+0x46/0x5e
+ ? __report_bad_irq+0x2e/0xb0
+ ? note_interrupt+0x242/0x290
+ ? nNIKAL100_memoryRead16+0x8/0x10 [nikal]
+ ? handle_irq_event_percpu+0x55/0x70
+ ? handle_irq_event+0x4f/0x80
+ ? handle_fasteoi_irq+0x81/0x180
+ ? handle_irq+0x1c/0x30
+ ? do_IRQ+0x41/0xd0
+ ? common_interrupt+0x84/0x84
+ </IRQ>
+
+ handlers:
+ irq_default_primary_handler threaded usb_hcd_irq
+ Disabling IRQ #19
+
+
+Conditions
+==========
+
+The use of threaded interrupts is the most likely condition to trigger
+this problem today. Threaded interrupts may not be reenabled after the IRQ
+handler wakes. These "one shot" conditions mean that the threaded interrupt
+needs to keep the interrupt line masked until the threaded handler has run.
+Especially when dealing with high data rate interrupts, the thread needs to
+run to completion; otherwise some handlers will end up in stack overflows
+since the interrupt of the issuing device is still active.
+
+Affected Chipsets
+=================
+
+The legacy interrupt forwarding mechanism exists today in a number of
+devices including but not limited to chipsets from AMD/ATI, Broadcom, and
+Intel. Changes made through the mitigations below have been applied to
+drivers/pci/quirks.c
+
+Starting with ICX there are no longer any IO-APICs in the Core IO's
+devices. IO-APIC is only in the PCH. Devices connected to the Core IO's
+PCIe Root Ports will use native MSI/MSI-X mechanisms.
+
+Mitigations
+===========
+
+The mitigations take the form of PCI quirks. The preference has been to
+first identify and make use of a means to disable the routing to the PCH.
+In such a case a quirk to disable boot interrupt generation can be
+added. [1]_
+
+Intel® 6300ESB I/O Controller Hub
+ Alternate Base Address Register:
+ BIE: Boot Interrupt Enable
+
+ == ===========================
+ 0 Boot interrupt is enabled.
+ 1 Boot interrupt is disabled.
+ == ===========================
+
+Intel® Sandy Bridge through Sky Lake based Xeon servers:
+ Coherent Interface Protocol Interrupt Control
+ dis_intx_route2pch/dis_intx_route2ich/dis_intx_route2dmi2:
+ When this bit is set. Local INTx messages received from the
+ Intel® Quick Data DMA/PCI Express ports are not routed to legacy
+ PCH - they are either converted into MSI via the integrated IO-APIC
+ (if the IO-APIC mask bit is clear in the appropriate entries)
+ or cause no further action (when mask bit is set)
+
+In the absence of a way to directly disable the routing, another approach
+has been to make use of PCI Interrupt pin to INTx routing tables for
+purposes of redirecting the interrupt handler to the rerouted interrupt
+line by default. Therefore, on chipsets where this INTx routing cannot be
+disabled, the Linux kernel will reroute the valid interrupt to its legacy
+interrupt. This redirection of the handler will prevent the occurrence of
+the spurious interrupt detection which would ordinarily disable the IRQ
+line due to excessive unhandled counts. [2]_
+
+The config option X86_REROUTE_FOR_BROKEN_BOOT_IRQS exists to enable (or
+disable) the redirection of the interrupt handler to the PCH interrupt
+line. The option can be overridden by either pci=ioapicreroute or
+pci=noioapicreroute. [3]_
+
+
+More Documentation
+==================
+
+There is an overview of the legacy interrupt handling in several datasheets
+(6300ESB and 6700PXH below). While largely the same, it provides insight
+into the evolution of its handling with chipsets.
+
+Example of disabling of the boot interrupt
+------------------------------------------
+
+ - Intel® 6300ESB I/O Controller Hub (Document # 300641-004US)
+ 5.7.3 Boot Interrupt
+ https://www.intel.com/content/dam/doc/datasheet/6300esb-io-controller-hub-datasheet.pdf
+
+ - Intel® Xeon® Processor E5-1600/2400/2600/4600 v3 Product Families
+ Datasheet - Volume 2: Registers (Document # 330784-003)
+ 6.6.41 cipintrc Coherent Interface Protocol Interrupt Control
+ https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf
+
+Example of handler rerouting
+----------------------------
+
+ - Intel® 6700PXH 64-bit PCI Hub (Document # 302628)
+ 2.15.2 PCI Express Legacy INTx Support and Boot Interrupt
+ https://www.intel.com/content/dam/doc/datasheet/6700pxh-64-bit-pci-hub-datasheet.pdf
+
+
+If you have any legacy PCI interrupt questions that aren't answered, email me.
+
+Cheers,
+ Sean V Kelley
+ sean.v.kelley@linux.intel.com
+
+.. [1] https://lore.kernel.org/r/12131949181903-git-send-email-sassmann@suse.de/
+.. [2] https://lore.kernel.org/r/12131949182094-git-send-email-sassmann@suse.de/
+.. [3] https://lore.kernel.org/r/487C8EA7.6020205@suse.de/
diff --git a/Documentation/PCI/endpoint/function/binding/pci-ntb.rst b/Documentation/PCI/endpoint/function/binding/pci-ntb.rst
new file mode 100644
index 000000000000..40253d3d5163
--- /dev/null
+++ b/Documentation/PCI/endpoint/function/binding/pci-ntb.rst
@@ -0,0 +1,38 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================
+PCI NTB Endpoint Function
+==========================
+
+1) Create a subdirectory to pci_epf_ntb directory in configfs.
+
+Standard EPF Configurable Fields:
+
+================ ===========================================================
+vendorid should be 0x104c
+deviceid should be 0xb00d for TI's J721E SoC
+revid don't care
+progif_code don't care
+subclass_code should be 0x00
+baseclass_code should be 0x5
+cache_line_size don't care
+subsys_vendor_id don't care
+subsys_id don't care
+interrupt_pin don't care
+msi_interrupts don't care
+msix_interrupts don't care
+================ ===========================================================
+
+2) Create a subdirectory to directory created in 1
+
+NTB EPF specific configurable fields:
+
+================ ===========================================================
+db_count Number of doorbells; default = 4
+mw1 size of memory window1
+mw2 size of memory window2
+mw3 size of memory window3
+mw4 size of memory window4
+num_mws Number of memory windows; max = 4
+spad_count Number of scratchpad registers; default = 64
+================ ===========================================================
diff --git a/Documentation/PCI/endpoint/function/binding/pci-test.rst b/Documentation/PCI/endpoint/function/binding/pci-test.rst
new file mode 100644
index 000000000000..57ee866fb165
--- /dev/null
+++ b/Documentation/PCI/endpoint/function/binding/pci-test.rst
@@ -0,0 +1,26 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================
+PCI Test Endpoint Function
+==========================
+
+name: Should be "pci_epf_test" to bind to the pci_epf_test driver.
+
+Configurable Fields:
+
+================ ===========================================================
+vendorid should be 0x104c
+deviceid should be 0xb500 for DRA74x and 0xb501 for DRA72x
+revid don't care
+progif_code don't care
+subclass_code don't care
+baseclass_code should be 0xff
+cache_line_size don't care
+subsys_vendor_id don't care
+subsys_id don't care
+interrupt_pin Should be 1 - INTA, 2 - INTB, 3 - INTC, 4 -INTD
+msi_interrupts Should be 1 to 32 depending on the number of MSI interrupts
+ to test
+msix_interrupts Should be 1 to 2048 depending on the number of MSI-X
+ interrupts to test
+================ ===========================================================
diff --git a/Documentation/PCI/endpoint/function/binding/pci-test.txt b/Documentation/PCI/endpoint/function/binding/pci-test.txt
deleted file mode 100644
index cd76ba47394b..000000000000
--- a/Documentation/PCI/endpoint/function/binding/pci-test.txt
+++ /dev/null
@@ -1,19 +0,0 @@
-PCI TEST ENDPOINT FUNCTION
-
-name: Should be "pci_epf_test" to bind to the pci_epf_test driver.
-
-Configurable Fields:
-vendorid : should be 0x104c
-deviceid : should be 0xb500 for DRA74x and 0xb501 for DRA72x
-revid : don't care
-progif_code : don't care
-subclass_code : don't care
-baseclass_code : should be 0xff
-cache_line_size : don't care
-subsys_vendor_id : don't care
-subsys_id : don't care
-interrupt_pin : Should be 1 - INTA, 2 - INTB, 3 - INTC, 4 -INTD
-msi_interrupts : Should be 1 to 32 depending on the number of MSI interrupts
- to test
-msix_interrupts : Should be 1 to 2048 depending on the number of MSI-X
- interrupts to test
diff --git a/Documentation/PCI/endpoint/index.rst b/Documentation/PCI/endpoint/index.rst
index d114ea74b444..4d2333e7ae06 100644
--- a/Documentation/PCI/endpoint/index.rst
+++ b/Documentation/PCI/endpoint/index.rst
@@ -11,3 +11,10 @@ PCI Endpoint Framework
pci-endpoint-cfs
pci-test-function
pci-test-howto
+ pci-ntb-function
+ pci-ntb-howto
+ pci-vntb-function
+ pci-vntb-howto
+
+ function/binding/pci-test
+ function/binding/pci-ntb
diff --git a/Documentation/PCI/endpoint/pci-endpoint-cfs.rst b/Documentation/PCI/endpoint/pci-endpoint-cfs.rst
index b6d39cdec56e..fb73345cfb8a 100644
--- a/Documentation/PCI/endpoint/pci-endpoint-cfs.rst
+++ b/Documentation/PCI/endpoint/pci-endpoint-cfs.rst
@@ -24,7 +24,7 @@ Directory Structure
The pci_ep configfs has two directories at its root: controllers and
functions. Every EPC device present in the system will have an entry in
-the *controllers* directory and and every EPF driver present in the system
+the *controllers* directory and every EPF driver present in the system
will have an entry in the *functions* directory.
::
@@ -43,6 +43,7 @@ entries corresponding to EPF driver will be created by the EPF core.
.. <EPF Driver1>/
... <EPF Device 11>/
... <EPF Device 21>/
+ ... <EPF Device 31>/
.. <EPF Driver2>/
... <EPF Device 12>/
... <EPF Device 22>/
@@ -68,6 +69,24 @@ created)
... subsys_vendor_id
... subsys_id
... interrupt_pin
+ ... <Symlink EPF Device 31>/
+ ... primary/
+ ... <Symlink EPC Device1>/
+ ... secondary/
+ ... <Symlink EPC Device2>/
+
+If an EPF device has to be associated with 2 EPCs (like in the case of
+Non-transparent bridge), symlink of endpoint controller connected to primary
+interface should be added in 'primary' directory and symlink of endpoint
+controller connected to secondary interface should be added in 'secondary'
+directory.
+
+The <EPF Device> directory can have a list of symbolic links
+(<Symlink EPF Device 31>) to other <EPF Device>. These symbolic links should
+be created by the user to represent the virtual functions that are bound to
+the physical function. In the above directory structure <EPF Device 11> is a
+physical function and <EPF Device 31> is a virtual function. An EPF device once
+it's linked to another EPF device, cannot be linked to a EPC device.
EPC Device
==========
@@ -88,7 +107,8 @@ entries corresponding to EPC device will be created by the EPC core.
The <EPC Device> directory will have a list of symbolic links to
<EPF Device>. These symbolic links should be created by the user to
-represent the functions present in the endpoint device.
+represent the functions present in the endpoint device. Only <EPF Device>
+that represents a physical function can be linked to a EPC device.
The <EPC Device> directory will also have a *start* field. Once
"1" is written to this field, the endpoint device will be ready to
@@ -115,4 +135,4 @@ all the EPF devices are created and linked with the EPC device.
| interrupt_pin
| function
-[1] :doc:`pci-endpoint`
+[1] Documentation/PCI/endpoint/pci-endpoint.rst
diff --git a/Documentation/PCI/endpoint/pci-endpoint.rst b/Documentation/PCI/endpoint/pci-endpoint.rst
index 0e2311b5617b..4f5622a65555 100644
--- a/Documentation/PCI/endpoint/pci-endpoint.rst
+++ b/Documentation/PCI/endpoint/pci-endpoint.rst
@@ -78,8 +78,8 @@ by the PCI controller driver.
Cleanup the pci_epc_mem structure allocated during pci_epc_mem_init().
-APIs for the PCI Endpoint Function Driver
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+EPC APIs for the PCI Endpoint Function Driver
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This section lists the APIs that the PCI Endpoint core provides to be used
by the PCI endpoint function driver.
@@ -117,8 +117,8 @@ by the PCI endpoint function driver.
The PCI endpoint function driver should use pci_epc_mem_free_addr() to
free the memory space allocated using pci_epc_mem_alloc_addr().
-Other APIs
-~~~~~~~~~~
+Other EPC APIs
+~~~~~~~~~~~~~~
There are other APIs provided by the EPC library. These are used for binding
the EPF device with EPC device. pci-ep-cfs.c can be used as reference for
@@ -160,8 +160,8 @@ PCI Endpoint Function(EPF) Library
The EPF library provides APIs to be used by the function driver and the EPC
library to provide endpoint mode functionality.
-APIs for the PCI Endpoint Function Driver
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+EPF APIs for the PCI Endpoint Function Driver
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This section lists the APIs that the PCI Endpoint core provides to be used
by the PCI endpoint function driver.
@@ -204,8 +204,8 @@ by the PCI endpoint controller library.
The PCI endpoint controller library invokes pci_epf_linkup() when the
EPC device has established the connection to the host.
-Other APIs
-~~~~~~~~~~
+Other EPF APIs
+~~~~~~~~~~~~~~
There are other APIs provided by the EPF library. These are used to notify
the function driver when the EPF device is bound to the EPC device.
@@ -214,7 +214,7 @@ pci-ep-cfs.c can be used as reference for using these APIs.
* pci_epf_create()
Create a new PCI EPF device by passing the name of the PCI EPF device.
- This name will be used to bind the the EPF device to a EPF driver.
+ This name will be used to bind the EPF device to a EPF driver.
* pci_epf_destroy()
diff --git a/Documentation/PCI/endpoint/pci-ntb-function.rst b/Documentation/PCI/endpoint/pci-ntb-function.rst
new file mode 100644
index 000000000000..3b9d836a4924
--- /dev/null
+++ b/Documentation/PCI/endpoint/pci-ntb-function.rst
@@ -0,0 +1,348 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=================
+PCI NTB Function
+=================
+
+:Author: Kishon Vijay Abraham I <kishon@ti.com>
+
+PCI Non-Transparent Bridges (NTB) allow two host systems to communicate
+with each other by exposing each host as a device to the other host.
+NTBs typically support the ability to generate interrupts on the remote
+machine, expose memory ranges as BARs, and perform DMA. They also support
+scratchpads, which are areas of memory within the NTB that are accessible
+from both machines.
+
+PCI NTB Function allows two different systems (or hosts) to communicate
+with each other by configuring the endpoint instances in such a way that
+transactions from one system are routed to the other system.
+
+In the below diagram, PCI NTB function configures the SoC with multiple
+PCI Endpoint (EP) instances in such a way that transactions from one EP
+controller are routed to the other EP controller. Once PCI NTB function
+configures the SoC with multiple EP instances, HOST1 and HOST2 can
+communicate with each other using SoC as a bridge.
+
+.. code-block:: text
+
+ +-------------+ +-------------+
+ | | | |
+ | HOST1 | | HOST2 |
+ | | | |
+ +------^------+ +------^------+
+ | |
+ | |
+ +---------|-------------------------------------------------|---------+
+ | +------v------+ +------v------+ |
+ | | | | | |
+ | | EP | | EP | |
+ | | CONTROLLER1 | | CONTROLLER2 | |
+ | | <-----------------------------------> | |
+ | | | | | |
+ | | | | | |
+ | | | SoC With Multiple EP Instances | | |
+ | | | (Configured using NTB Function) | | |
+ | +-------------+ +-------------+ |
+ +---------------------------------------------------------------------+
+
+Constructs used for Implementing NTB
+====================================
+
+ 1) Config Region
+ 2) Self Scratchpad Registers
+ 3) Peer Scratchpad Registers
+ 4) Doorbell (DB) Registers
+ 5) Memory Window (MW)
+
+
+Config Region:
+--------------
+
+Config Region is a construct that is specific to NTB implemented using NTB
+Endpoint Function Driver. The host and endpoint side NTB function driver will
+exchange information with each other using this region. Config Region has
+Control/Status Registers for configuring the Endpoint Controller. Host can
+write into this region for configuring the outbound Address Translation Unit
+(ATU) and to indicate the link status. Endpoint can indicate the status of
+commands issued by host in this region. Endpoint can also indicate the
+scratchpad offset and number of memory windows to the host using this region.
+
+The format of Config Region is given below. All the fields here are 32 bits.
+
+.. code-block:: text
+
+ +------------------------+
+ | COMMAND |
+ +------------------------+
+ | ARGUMENT |
+ +------------------------+
+ | STATUS |
+ +------------------------+
+ | TOPOLOGY |
+ +------------------------+
+ | ADDRESS (LOWER 32) |
+ +------------------------+
+ | ADDRESS (UPPER 32) |
+ +------------------------+
+ | SIZE |
+ +------------------------+
+ | NO OF MEMORY WINDOW |
+ +------------------------+
+ | MEMORY WINDOW1 OFFSET |
+ +------------------------+
+ | SPAD OFFSET |
+ +------------------------+
+ | SPAD COUNT |
+ +------------------------+
+ | DB ENTRY SIZE |
+ +------------------------+
+ | DB DATA |
+ +------------------------+
+ | : |
+ +------------------------+
+ | : |
+ +------------------------+
+ | DB DATA |
+ +------------------------+
+
+
+ COMMAND:
+
+ NTB function supports three commands:
+
+ CMD_CONFIGURE_DOORBELL (0x1): Command to configure doorbell. Before
+ invoking this command, the host should allocate and initialize
+ MSI/MSI-X vectors (i.e., initialize the MSI/MSI-X Capability in the
+ Endpoint). The endpoint on receiving this command will configure
+ the outbound ATU such that transactions to Doorbell BAR will be routed
+ to the MSI/MSI-X address programmed by the host. The ARGUMENT
+ register should be populated with number of DBs to configure (in the
+ lower 16 bits) and if MSI or MSI-X should be configured (BIT 16).
+
+ CMD_CONFIGURE_MW (0x2): Command to configure memory window (MW). The
+ host invokes this command after allocating a buffer that can be
+ accessed by remote host. The allocated address should be programmed
+ in the ADDRESS register (64 bit), the size should be programmed in
+ the SIZE register and the memory window index should be programmed
+ in the ARGUMENT register. The endpoint on receiving this command
+ will configure the outbound ATU such that transactions to MW BAR
+ are routed to the address provided by the host.
+
+ CMD_LINK_UP (0x3): Command to indicate an NTB application is
+ bound to the EP device on the host side. Once the endpoint
+ receives this command from both the hosts, the endpoint will
+ raise a LINK_UP event to both the hosts to indicate the host
+ NTB applications can start communicating with each other.
+
+ ARGUMENT:
+
+ The value of this register is based on the commands issued in
+ command register. See COMMAND section for more information.
+
+ TOPOLOGY:
+
+ Set to NTB_TOPO_B2B_USD for Primary interface
+ Set to NTB_TOPO_B2B_DSD for Secondary interface
+
+ ADDRESS/SIZE:
+
+ Address and Size to be used while configuring the memory window.
+ See "CMD_CONFIGURE_MW" for more info.
+
+ MEMORY WINDOW1 OFFSET:
+
+ Memory Window 1 and Doorbell registers are packed together in the
+ same BAR. The initial portion of the region will have doorbell
+ registers and the latter portion of the region is for memory window 1.
+ This register will specify the offset of the memory window 1.
+
+ NO OF MEMORY WINDOW:
+
+ Specifies the number of memory windows supported by the NTB device.
+
+ SPAD OFFSET:
+
+ Self scratchpad region and config region are packed together in the
+ same BAR. The initial portion of the region will have config region
+ and the latter portion of the region is for self scratchpad. This
+ register will specify the offset of the self scratchpad registers.
+
+ SPAD COUNT:
+
+ Specifies the number of scratchpad registers supported by the NTB
+ device.
+
+ DB ENTRY SIZE:
+
+ Used to determine the offset within the DB BAR that should be written
+ in order to raise doorbell. EPF NTB can use either MSI or MSI-X to
+ ring doorbell (MSI-X support will be added later). MSI uses same
+ address for all the interrupts and MSI-X can provide different
+ addresses for different interrupts. The MSI/MSI-X address is provided
+ by the host and the address it gives is based on the MSI/MSI-X
+ implementation supported by the host. For instance, ARM platform
+ using GIC ITS will have the same MSI-X address for all the interrupts.
+ In order to support all the combinations and use the same mechanism
+ for both MSI and MSI-X, EPF NTB allocates a separate region in the
+ Outbound Address Space for each of the interrupts. This region will
+ be mapped to the MSI/MSI-X address provided by the host. If a host
+ provides the same address for all the interrupts, all the regions
+ will be translated to the same address. If a host provides different
+ addresses, the regions will be translated to different addresses. This
+ will ensure there is no difference while raising the doorbell.
+
+ DB DATA:
+
+ EPF NTB supports 32 interrupts, so there are 32 DB DATA registers.
+ This holds the MSI/MSI-X data that has to be written to MSI address
+ for raising doorbell interrupt. This will be populated by EPF NTB
+ while invoking CMD_CONFIGURE_DOORBELL.
+
+Scratchpad Registers:
+---------------------
+
+ Each host has its own register space allocated in the memory of NTB endpoint
+ controller. They are both readable and writable from both sides of the bridge.
+ They are used by applications built over NTB and can be used to pass control
+ and status information between both sides of a device.
+
+ Scratchpad registers has 2 parts
+ 1) Self Scratchpad: Host's own register space
+ 2) Peer Scratchpad: Remote host's register space.
+
+Doorbell Registers:
+-------------------
+
+ Doorbell Registers are used by the hosts to interrupt each other.
+
+Memory Window:
+--------------
+
+ Actual transfer of data between the two hosts will happen using the
+ memory window.
+
+Modeling Constructs:
+====================
+
+There are 5 or more distinct regions (config, self scratchpad, peer
+scratchpad, doorbell, one or more memory windows) to be modeled to achieve
+NTB functionality. At least one memory window is required while more than
+one is permitted. All these regions should be mapped to BARs for hosts to
+access these regions.
+
+If one 32-bit BAR is allocated for each of these regions, the scheme would
+look like this:
+
+====== ===============
+BAR NO CONSTRUCTS USED
+====== ===============
+BAR0 Config Region
+BAR1 Self Scratchpad
+BAR2 Peer Scratchpad
+BAR3 Doorbell
+BAR4 Memory Window 1
+BAR5 Memory Window 2
+====== ===============
+
+However if we allocate a separate BAR for each of the regions, there would not
+be enough BARs for all the regions in a platform that supports only 64-bit
+BARs.
+
+In order to be supported by most of the platforms, the regions should be
+packed and mapped to BARs in a way that provides NTB functionality and
+also makes sure the host doesn't access any region that it is not supposed
+to.
+
+The following scheme is used in EPF NTB Function:
+
+====== ===============================
+BAR NO CONSTRUCTS USED
+====== ===============================
+BAR0 Config Region + Self Scratchpad
+BAR1 Peer Scratchpad
+BAR2 Doorbell + Memory Window 1
+BAR3 Memory Window 2
+BAR4 Memory Window 3
+BAR5 Memory Window 4
+====== ===============================
+
+With this scheme, for the basic NTB functionality 3 BARs should be sufficient.
+
+Modeling Config/Scratchpad Region:
+----------------------------------
+
+.. code-block:: text
+
+ +-----------------+------->+------------------+ +-----------------+
+ | BAR0 | | CONFIG REGION | | BAR0 |
+ +-----------------+----+ +------------------+<-------+-----------------+
+ | BAR1 | | |SCRATCHPAD REGION | | BAR1 |
+ +-----------------+ +-->+------------------+<-------+-----------------+
+ | BAR2 | Local Memory | BAR2 |
+ +-----------------+ +-----------------+
+ | BAR3 | | BAR3 |
+ +-----------------+ +-----------------+
+ | BAR4 | | BAR4 |
+ +-----------------+ +-----------------+
+ | BAR5 | | BAR5 |
+ +-----------------+ +-----------------+
+ EP CONTROLLER 1 EP CONTROLLER 2
+
+Above diagram shows Config region + Scratchpad region for HOST1 (connected to
+EP controller 1) allocated in local memory. The HOST1 can access the config
+region and scratchpad region (self scratchpad) using BAR0 of EP controller 1.
+The peer host (HOST2 connected to EP controller 2) can also access this
+scratchpad region (peer scratchpad) using BAR1 of EP controller 2. This
+diagram shows the case where Config region and Scratchpad regions are allocated
+for HOST1, however the same is applicable for HOST2.
+
+Modeling Doorbell/Memory Window 1:
+----------------------------------
+
+.. code-block:: text
+
+ +-----------------+ +----->+----------------+-----------+-----------------+
+ | BAR0 | | | Doorbell 1 +-----------> MSI-X ADDRESS 1 |
+ +-----------------+ | +----------------+ +-----------------+
+ | BAR1 | | | Doorbell 2 +---------+ | |
+ +-----------------+----+ +----------------+ | | |
+ | BAR2 | | Doorbell 3 +-------+ | +-----------------+
+ +-----------------+----+ +----------------+ | +-> MSI-X ADDRESS 2 |
+ | BAR3 | | | Doorbell 4 +-----+ | +-----------------+
+ +-----------------+ | |----------------+ | | | |
+ | BAR4 | | | | | | +-----------------+
+ +-----------------+ | | MW1 +---+ | +-->+ MSI-X ADDRESS 3||
+ | BAR5 | | | | | | +-----------------+
+ +-----------------+ +----->-----------------+ | | | |
+ EP CONTROLLER 1 | | | | +-----------------+
+ | | | +---->+ MSI-X ADDRESS 4 |
+ +----------------+ | +-----------------+
+ EP CONTROLLER 2 | | |
+ (OB SPACE) | | |
+ +-------> MW1 |
+ | |
+ | |
+ +-----------------+
+ | |
+ | |
+ | |
+ | |
+ | |
+ +-----------------+
+ PCI Address Space
+ (Managed by HOST2)
+
+Above diagram shows how the doorbell and memory window 1 is mapped so that
+HOST1 can raise doorbell interrupt on HOST2 and also how HOST1 can access
+buffers exposed by HOST2 using memory window1 (MW1). Here doorbell and
+memory window 1 regions are allocated in EP controller 2 outbound (OB) address
+space. Allocating and configuring BARs for doorbell and memory window1
+is done during the initialization phase of NTB endpoint function driver.
+Mapping from EP controller 2 OB space to PCI address space is done when HOST2
+sends CMD_CONFIGURE_MW/CMD_CONFIGURE_DOORBELL.
+
+Modeling Optional Memory Windows:
+---------------------------------
+
+This is modeled the same was as MW1 but each of the additional memory windows
+is mapped to separate BARs.
diff --git a/Documentation/PCI/endpoint/pci-ntb-howto.rst b/Documentation/PCI/endpoint/pci-ntb-howto.rst
new file mode 100644
index 000000000000..1884bf29caba
--- /dev/null
+++ b/Documentation/PCI/endpoint/pci-ntb-howto.rst
@@ -0,0 +1,161 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================================================
+PCI Non-Transparent Bridge (NTB) Endpoint Function (EPF) User Guide
+===================================================================
+
+:Author: Kishon Vijay Abraham I <kishon@ti.com>
+
+This document is a guide to help users use pci-epf-ntb function driver
+and ntb_hw_epf host driver for NTB functionality. The list of steps to
+be followed in the host side and EP side is given below. For the hardware
+configuration and internals of NTB using configurable endpoints see
+Documentation/PCI/endpoint/pci-ntb-function.rst
+
+Endpoint Device
+===============
+
+Endpoint Controller Devices
+---------------------------
+
+For implementing NTB functionality at least two endpoint controller devices
+are required.
+
+To find the list of endpoint controller devices in the system::
+
+ # ls /sys/class/pci_epc/
+ 2900000.pcie-ep 2910000.pcie-ep
+
+If PCI_ENDPOINT_CONFIGFS is enabled::
+
+ # ls /sys/kernel/config/pci_ep/controllers
+ 2900000.pcie-ep 2910000.pcie-ep
+
+
+Endpoint Function Drivers
+-------------------------
+
+To find the list of endpoint function drivers in the system::
+
+ # ls /sys/bus/pci-epf/drivers
+ pci_epf_ntb pci_epf_ntb
+
+If PCI_ENDPOINT_CONFIGFS is enabled::
+
+ # ls /sys/kernel/config/pci_ep/functions
+ pci_epf_ntb pci_epf_ntb
+
+
+Creating pci-epf-ntb Device
+----------------------------
+
+PCI endpoint function device can be created using the configfs. To create
+pci-epf-ntb device, the following commands can be used::
+
+ # mount -t configfs none /sys/kernel/config
+ # cd /sys/kernel/config/pci_ep/
+ # mkdir functions/pci_epf_ntb/func1
+
+The "mkdir func1" above creates the pci-epf-ntb function device that will
+be probed by pci_epf_ntb driver.
+
+The PCI endpoint framework populates the directory with the following
+configurable fields::
+
+ # ls functions/pci_epf_ntb/func1
+ baseclass_code deviceid msi_interrupts pci-epf-ntb.0
+ progif_code secondary subsys_id vendorid
+ cache_line_size interrupt_pin msix_interrupts primary
+ revid subclass_code subsys_vendor_id
+
+The PCI endpoint function driver populates these entries with default values
+when the device is bound to the driver. The pci-epf-ntb driver populates
+vendorid with 0xffff and interrupt_pin with 0x0001::
+
+ # cat functions/pci_epf_ntb/func1/vendorid
+ 0xffff
+ # cat functions/pci_epf_ntb/func1/interrupt_pin
+ 0x0001
+
+
+Configuring pci-epf-ntb Device
+-------------------------------
+
+The user can configure the pci-epf-ntb device using its configfs entry. In order
+to change the vendorid and the deviceid, the following
+commands can be used::
+
+ # echo 0x104c > functions/pci_epf_ntb/func1/vendorid
+ # echo 0xb00d > functions/pci_epf_ntb/func1/deviceid
+
+In order to configure NTB specific attributes, a new sub-directory to func1
+should be created::
+
+ # mkdir functions/pci_epf_ntb/func1/pci_epf_ntb.0/
+
+The NTB function driver will populate this directory with various attributes
+that can be configured by the user::
+
+ # ls functions/pci_epf_ntb/func1/pci_epf_ntb.0/
+ db_count mw1 mw2 mw3 mw4 num_mws
+ spad_count
+
+A sample configuration for NTB function is given below::
+
+ # echo 4 > functions/pci_epf_ntb/func1/pci_epf_ntb.0/db_count
+ # echo 128 > functions/pci_epf_ntb/func1/pci_epf_ntb.0/spad_count
+ # echo 2 > functions/pci_epf_ntb/func1/pci_epf_ntb.0/num_mws
+ # echo 0x100000 > functions/pci_epf_ntb/func1/pci_epf_ntb.0/mw1
+ # echo 0x100000 > functions/pci_epf_ntb/func1/pci_epf_ntb.0/mw2
+
+Binding pci-epf-ntb Device to EP Controller
+--------------------------------------------
+
+NTB function device should be attached to two PCI endpoint controllers
+connected to the two hosts. Use the 'primary' and 'secondary' entries
+inside NTB function device to attach one PCI endpoint controller to
+primary interface and the other PCI endpoint controller to the secondary
+interface::
+
+ # ln -s controllers/2900000.pcie-ep/ functions/pci-epf-ntb/func1/primary
+ # ln -s controllers/2910000.pcie-ep/ functions/pci-epf-ntb/func1/secondary
+
+Once the above step is completed, both the PCI endpoint controllers are ready to
+establish a link with the host.
+
+
+Start the Link
+--------------
+
+In order for the endpoint device to establish a link with the host, the _start_
+field should be populated with '1'. For NTB, both the PCI endpoint controllers
+should establish link with the host::
+
+ # echo 1 > controllers/2900000.pcie-ep/start
+ # echo 1 > controllers/2910000.pcie-ep/start
+
+
+RootComplex Device
+==================
+
+lspci Output
+------------
+
+Note that the devices listed here correspond to the values populated in
+"Creating pci-epf-ntb Device" section above::
+
+ # lspci
+ 0000:00:00.0 PCI bridge: Texas Instruments Device b00d
+ 0000:01:00.0 RAM memory: Texas Instruments Device b00d
+
+
+Using ntb_hw_epf Device
+-----------------------
+
+The host side software follows the standard NTB software architecture in Linux.
+All the existing client side NTB utilities like NTB Transport Client and NTB
+Netdev, NTB Ping Pong Test Client and NTB Tool Test Client can be used with NTB
+function device.
+
+For more information on NTB see
+:doc:`Non-Transparent Bridge <../../driver-api/ntb>`
diff --git a/Documentation/PCI/endpoint/pci-vntb-function.rst b/Documentation/PCI/endpoint/pci-vntb-function.rst
new file mode 100644
index 000000000000..0c51f53ab972
--- /dev/null
+++ b/Documentation/PCI/endpoint/pci-vntb-function.rst
@@ -0,0 +1,129 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=================
+PCI vNTB Function
+=================
+
+:Author: Frank Li <Frank.Li@nxp.com>
+
+The difference between PCI NTB function and PCI vNTB function is
+
+PCI NTB function need at two endpoint instances and connect HOST1
+and HOST2.
+
+PCI vNTB function only use one host and one endpoint(EP), use NTB
+connect EP and PCI host
+
+.. code-block:: text
+
+
+ +------------+ +---------------------------------------+
+ | | | |
+ +------------+ | +--------------+
+ | NTB | | | NTB |
+ | NetDev | | | NetDev |
+ +------------+ | +--------------+
+ | NTB | | | NTB |
+ | Transfer | | | Transfer |
+ +------------+ | +--------------+
+ | | | | |
+ | PCI NTB | | | |
+ | EPF | | | |
+ | Driver | | | PCI Virtual |
+ | | +---------------+ | NTB Driver |
+ | | | PCI EP NTB |<------>| |
+ | | | FN Driver | | |
+ +------------+ +---------------+ +--------------+
+ | | | | | |
+ | PCI BUS | <-----> | PCI EP BUS | | Virtual PCI |
+ | | PCI | | | BUS |
+ +------------+ +---------------+--------+--------------+
+ PCI RC PCI EP
+
+Constructs used for Implementing vNTB
+=====================================
+
+ 1) Config Region
+ 2) Self Scratchpad Registers
+ 3) Peer Scratchpad Registers
+ 4) Doorbell (DB) Registers
+ 5) Memory Window (MW)
+
+
+Config Region:
+--------------
+
+It is same as PCI NTB Function driver
+
+Scratchpad Registers:
+---------------------
+
+It is appended after Config region.
+
+.. code-block:: text
+
+
+ +--------------------------------------------------+ Base
+ | |
+ | |
+ | |
+ | Common Config Register |
+ | |
+ | |
+ | |
+ +-----------------------+--------------------------+ Base + span_offset
+ | | |
+ | Peer Span Space | Span Space |
+ | | |
+ | | |
+ +-----------------------+--------------------------+ Base + span_offset
+ | | | + span_count * 4
+ | | |
+ | Span Space | Peer Span Space |
+ | | |
+ +-----------------------+--------------------------+
+ Virtual PCI Pcie Endpoint
+ NTB Driver NTB Driver
+
+
+Doorbell Registers:
+-------------------
+
+ Doorbell Registers are used by the hosts to interrupt each other.
+
+Memory Window:
+--------------
+
+ Actual transfer of data between the two hosts will happen using the
+ memory window.
+
+Modeling Constructs:
+====================
+
+32-bit BARs.
+
+====== ===============
+BAR NO CONSTRUCTS USED
+====== ===============
+BAR0 Config Region
+BAR1 Doorbell
+BAR2 Memory Window 1
+BAR3 Memory Window 2
+BAR4 Memory Window 3
+BAR5 Memory Window 4
+====== ===============
+
+64-bit BARs.
+
+====== ===============================
+BAR NO CONSTRUCTS USED
+====== ===============================
+BAR0 Config Region + Scratchpad
+BAR1
+BAR2 Doorbell
+BAR3
+BAR4 Memory Window 1
+BAR5
+====== ===============================
+
+
diff --git a/Documentation/PCI/endpoint/pci-vntb-howto.rst b/Documentation/PCI/endpoint/pci-vntb-howto.rst
new file mode 100644
index 000000000000..4ab8e4a26d4b
--- /dev/null
+++ b/Documentation/PCI/endpoint/pci-vntb-howto.rst
@@ -0,0 +1,167 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================================================
+PCI Non-Transparent Bridge (NTB) Endpoint Function (EPF) User Guide
+===================================================================
+
+:Author: Frank Li <Frank.Li@nxp.com>
+
+This document is a guide to help users use pci-epf-vntb function driver
+and ntb_hw_epf host driver for NTB functionality. The list of steps to
+be followed in the host side and EP side is given below. For the hardware
+configuration and internals of NTB using configurable endpoints see
+Documentation/PCI/endpoint/pci-vntb-function.rst
+
+Endpoint Device
+===============
+
+Endpoint Controller Devices
+---------------------------
+
+To find the list of endpoint controller devices in the system::
+
+ # ls /sys/class/pci_epc/
+ 5f010000.pcie_ep
+
+If PCI_ENDPOINT_CONFIGFS is enabled::
+
+ # ls /sys/kernel/config/pci_ep/controllers
+ 5f010000.pcie_ep
+
+Endpoint Function Drivers
+-------------------------
+
+To find the list of endpoint function drivers in the system::
+
+ # ls /sys/bus/pci-epf/drivers
+ pci_epf_ntb pci_epf_test pci_epf_vntb
+
+If PCI_ENDPOINT_CONFIGFS is enabled::
+
+ # ls /sys/kernel/config/pci_ep/functions
+ pci_epf_ntb pci_epf_test pci_epf_vntb
+
+
+Creating pci-epf-vntb Device
+----------------------------
+
+PCI endpoint function device can be created using the configfs. To create
+pci-epf-vntb device, the following commands can be used::
+
+ # mount -t configfs none /sys/kernel/config
+ # cd /sys/kernel/config/pci_ep/
+ # mkdir functions/pci_epf_vntb/func1
+
+The "mkdir func1" above creates the pci-epf-ntb function device that will
+be probed by pci_epf_vntb driver.
+
+The PCI endpoint framework populates the directory with the following
+configurable fields::
+
+ # ls functions/pci_epf_ntb/func1
+ baseclass_code deviceid msi_interrupts pci-epf-ntb.0
+ progif_code secondary subsys_id vendorid
+ cache_line_size interrupt_pin msix_interrupts primary
+ revid subclass_code subsys_vendor_id
+
+The PCI endpoint function driver populates these entries with default values
+when the device is bound to the driver. The pci-epf-vntb driver populates
+vendorid with 0xffff and interrupt_pin with 0x0001::
+
+ # cat functions/pci_epf_vntb/func1/vendorid
+ 0xffff
+ # cat functions/pci_epf_vntb/func1/interrupt_pin
+ 0x0001
+
+
+Configuring pci-epf-vntb Device
+-------------------------------
+
+The user can configure the pci-epf-vntb device using its configfs entry. In order
+to change the vendorid and the deviceid, the following
+commands can be used::
+
+ # echo 0x1957 > functions/pci_epf_vntb/func1/vendorid
+ # echo 0x0809 > functions/pci_epf_vntb/func1/deviceid
+
+In order to configure NTB specific attributes, a new sub-directory to func1
+should be created::
+
+ # mkdir functions/pci_epf_vntb/func1/pci_epf_vntb.0/
+
+The NTB function driver will populate this directory with various attributes
+that can be configured by the user::
+
+ # ls functions/pci_epf_vntb/func1/pci_epf_vntb.0/
+ db_count mw1 mw2 mw3 mw4 num_mws
+ spad_count
+
+A sample configuration for NTB function is given below::
+
+ # echo 4 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/db_count
+ # echo 128 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/spad_count
+ # echo 1 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/num_mws
+ # echo 0x100000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1
+
+A sample configuration for virtual NTB driver for virutal PCI bus::
+
+ # echo 0x1957 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_vid
+ # echo 0x080A > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_pid
+ # echo 0x10 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vbus_number
+
+Binding pci-epf-ntb Device to EP Controller
+--------------------------------------------
+
+NTB function device should be attached to PCI endpoint controllers
+connected to the host.
+
+ # ln -s controllers/5f010000.pcie_ep functions/pci-epf-ntb/func1/primary
+
+Once the above step is completed, the PCI endpoint controllers are ready to
+establish a link with the host.
+
+
+Start the Link
+--------------
+
+In order for the endpoint device to establish a link with the host, the _start_
+field should be populated with '1'. For NTB, both the PCI endpoint controllers
+should establish link with the host (imx8 don't need this steps)::
+
+ # echo 1 > controllers/5f010000.pcie_ep/start
+
+RootComplex Device
+==================
+
+lspci Output at Host side
+-------------------------
+
+Note that the devices listed here correspond to the values populated in
+"Creating pci-epf-ntb Device" section above::
+
+ # lspci
+ 00:00.0 PCI bridge: Freescale Semiconductor Inc Device 0000 (rev 01)
+ 01:00.0 RAM memory: Freescale Semiconductor Inc Device 0809
+
+Endpoint Device / Virtual PCI bus
+=================================
+
+lspci Output at EP Side / Virtual PCI bus
+-----------------------------------------
+
+Note that the devices listed here correspond to the values populated in
+"Creating pci-epf-ntb Device" section above::
+
+ # lspci
+ 10:00.0 Unassigned class [ffff]: Dawicontrol Computersysteme GmbH Device 1234 (rev ff)
+
+Using ntb_hw_epf Device
+-----------------------
+
+The host side software follows the standard NTB software architecture in Linux.
+All the existing client side NTB utilities like NTB Transport Client and NTB
+Netdev, NTB Ping Pong Test Client and NTB Tool Test Client can be used with NTB
+function device.
+
+For more information on NTB see
+:doc:`Non-Transparent Bridge <../../driver-api/ntb>`
diff --git a/Documentation/PCI/index.rst b/Documentation/PCI/index.rst
index 6768305e4c26..c17c87af1968 100644
--- a/Documentation/PCI/index.rst
+++ b/Documentation/PCI/index.rst
@@ -12,7 +12,9 @@ Linux PCI Bus Subsystem
pciebus-howto
pci-iov-howto
msi-howto
+ sysfs-pci
acpi-info
pci-error-recovery
pcieaer-howto
endpoint/index
+ boot-interrupts
diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst
index 13beee23cb04..187f43a03200 100644
--- a/Documentation/PCI/pci-error-recovery.rst
+++ b/Documentation/PCI/pci-error-recovery.rst
@@ -79,7 +79,7 @@ This structure has the form::
struct pci_error_handlers
{
- int (*error_detected)(struct pci_dev *dev, enum pci_channel_state);
+ int (*error_detected)(struct pci_dev *dev, pci_channel_state_t);
int (*mmio_enabled)(struct pci_dev *dev);
int (*slot_reset)(struct pci_dev *dev);
void (*resume)(struct pci_dev *dev);
@@ -87,11 +87,11 @@ This structure has the form::
The possible channel states are::
- enum pci_channel_state {
+ typedef enum {
pci_channel_io_normal, /* I/O channel is in normal state */
pci_channel_io_frozen, /* I/O to channel is blocked */
pci_channel_io_perm_failure, /* PCI card is dead */
- };
+ } pci_channel_state_t;
Possible return values are::
@@ -248,7 +248,7 @@ STEP 4: Slot Reset
------------------
In response to a return value of PCI_ERS_RESULT_NEED_RESET, the
-the platform will perform a slot reset on the requesting PCI device(s).
+platform will perform a slot reset on the requesting PCI device(s).
The actual steps taken by a platform to perform a slot reset
will be platform-dependent. Upon completion of slot reset, the
platform will call the device slot_reset() callback.
@@ -295,7 +295,7 @@ and let the driver restart normal I/O processing.
A driver can still return a critical failure for this function if
it can't get the device operational after reset. If the platform
previously tried a soft reset, it might now try a hard reset (power
-cycle) and then call slot_reset() again. It the device still can't
+cycle) and then call slot_reset() again. If the device still can't
be recovered, there is nothing more that can be done; the platform
will typically report a "permanent failure" in such a case. The
device will be considered "dead" in this case.
@@ -348,7 +348,7 @@ STEP 6: Permanent Failure
-------------------------
A "permanent failure" has occurred, and the platform cannot recover
the device. The platform will call error_detected() with a
-pci_channel_state value of pci_channel_io_perm_failure.
+pci_channel_state_t value of pci_channel_io_perm_failure.
The device driver should, at this point, assume the worst. It should
cancel all pending I/O, refuse all new I/O, returning -EIO to
diff --git a/Documentation/PCI/pci-iov-howto.rst b/Documentation/PCI/pci-iov-howto.rst
index b9fd003206f1..27d35933cea2 100644
--- a/Documentation/PCI/pci-iov-howto.rst
+++ b/Documentation/PCI/pci-iov-howto.rst
@@ -125,14 +125,14 @@ Following piece of code illustrates the usage of the SR-IOV API.
...
}
- static int dev_suspend(struct pci_dev *dev, pm_message_t state)
+ static int dev_suspend(struct device *dev)
{
...
return 0;
}
- static int dev_resume(struct pci_dev *dev)
+ static int dev_resume(struct device *dev)
{
...
@@ -165,8 +165,7 @@ Following piece of code illustrates the usage of the SR-IOV API.
.id_table = dev_id_table,
.probe = dev_probe,
.remove = dev_remove,
- .suspend = dev_suspend,
- .resume = dev_resume,
+ .driver.pm = &dev_pm_ops,
.shutdown = dev_shutdown,
.sriov_configure = dev_sriov_configure,
};
diff --git a/Documentation/PCI/pci.rst b/Documentation/PCI/pci.rst
index 6864f9a70f5f..cced568d78e9 100644
--- a/Documentation/PCI/pci.rst
+++ b/Documentation/PCI/pci.rst
@@ -17,7 +17,7 @@ PCI device drivers.
A more complete resource is the third edition of "Linux Device Drivers"
by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman.
LDD3 is available for free (under Creative Commons License) from:
-http://lwn.net/Kernel/LDD3/.
+https://lwn.net/Kernel/LDD3/.
However, keep in mind that all documents are subject to "bit rot".
Refer to the source code if things are not working as described here.
@@ -103,6 +103,7 @@ need pass only as many optional fields as necessary:
- subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF)
- class and classmask fields default to 0
- driver_data defaults to 0UL.
+ - override_only field defaults to 0.
Note that driver_data must match the value used by any of the pci_device_id
entries defined in the driver. This makes the driver_data field mandatory
@@ -209,12 +210,12 @@ the PCI device by calling pci_enable_device(). This will:
OS BUG: we don't check resource allocations before enabling those
resources. The sequence would make more sense if we called
pci_request_resources() before calling pci_enable_device().
- Currently, the device drivers can't detect the bug when when two
+ Currently, the device drivers can't detect the bug when two
devices have been allocated the same range. This is not a common
problem and unlikely to get fixed soon.
This has been discussed before but not changed as of 2.6.19:
- http://lkml.org/lkml/2006/3/2/194
+ https://lore.kernel.org/r/20060302180025.GC28895@flint.arm.linux.org.uk/
pci_set_master() will enable DMA by setting the bus master bit
@@ -239,7 +240,7 @@ from the PCI device config space. Use the values in the pci_dev structure
as the PCI "bus address" might have been remapped to a "host physical"
address by the arch/chip-set specific kernel support.
-See Documentation/io-mapping.txt for how to access device registers
+See Documentation/driver-api/io-mapping.rst for how to access device registers
or device memory.
The device driver needs to call pci_request_region() to verify
@@ -265,33 +266,33 @@ Set the DMA mask size
---------------------
.. note::
If anything below doesn't make sense, please refer to
- Documentation/DMA-API.txt. This section is just a reminder that
+ Documentation/core-api/dma-api.rst. This section is just a reminder that
drivers need to indicate DMA capabilities of the device and is not
an authoritative source for DMA interfaces.
While all drivers should explicitly indicate the DMA capability
(e.g. 32 or 64 bit) of the PCI bus master, devices with more than
32-bit bus master capability for streaming data need the driver
-to "register" this capability by calling pci_set_dma_mask() with
+to "register" this capability by calling dma_set_mask() with
appropriate parameters. In general this allows more efficient DMA
on systems where System RAM exists above 4G _physical_ address.
Drivers for all PCI-X and PCIe compliant devices must call
-pci_set_dma_mask() as they are 64-bit DMA devices.
+dma_set_mask() as they are 64-bit DMA devices.
Similarly, drivers must also "register" this capability if the device
-can directly address "consistent memory" in System RAM above 4G physical
-address by calling pci_set_consistent_dma_mask().
+can directly address "coherent memory" in System RAM above 4G physical
+address by calling dma_set_coherent_mask().
Again, this includes drivers for all PCI-X and PCIe compliant devices.
Many 64-bit "PCI" devices (before PCI-X) and some PCI-X devices are
64-bit DMA capable for payload ("streaming") data but not control
-("consistent") data.
+("coherent") data.
Setup shared control data
-------------------------
-Once the DMA masks are set, the driver can allocate "consistent" (a.k.a. shared)
-memory. See Documentation/DMA-API.txt for a full description of
+Once the DMA masks are set, the driver can allocate "coherent" (a.k.a. shared)
+memory. See Documentation/core-api/dma-api.rst for a full description of
the DMA APIs. This section is just a reminder that it needs to be done
before enabling DMA on the device.
@@ -366,7 +367,7 @@ steps need to be performed:
- Disable the device from generating IRQs
- Release the IRQ (free_irq())
- Stop all DMA activity
- - Release DMA buffers (both streaming and consistent)
+ - Release DMA buffers (both streaming and coherent)
- Unregister from other subsystems (e.g. scsi or netdev)
- Disable device from responding to MMIO/IO Port addresses
- Release MMIO/IO Port resource(s)
@@ -419,9 +420,9 @@ Once DMA is stopped, clean up streaming DMA first.
I.e. unmap data buffers and return buffers to "upstream"
owners if there is one.
-Then clean up "consistent" buffers which contain the control data.
+Then clean up "coherent" buffers which contain the control data.
-See Documentation/DMA-API.txt for details on unmapping interfaces.
+See Documentation/core-api/dma-api.rst for details on unmapping interfaces.
Unregister from other subsystems
@@ -514,9 +515,8 @@ your driver if they're helpful, or just use plain hex constants.
The device IDs are arbitrary hex numbers (vendor controlled) and normally used
only in a single location, the pci_device_id table.
-Please DO submit new vendor/device IDs to http://pci-ids.ucw.cz/.
-There are mirrors of the pci.ids file at http://pciids.sourceforge.net/
-and https://github.com/pciutils/pciids.
+Please DO submit new vendor/device IDs to https://pci-ids.ucw.cz/.
+There's a mirror of the pci.ids file at https://github.com/pciutils/pciids.
Obsolete functions
diff --git a/Documentation/PCI/pcieaer-howto.rst b/Documentation/PCI/pcieaer-howto.rst
index 18bdefaafd1a..0b36b9ebfa4b 100644
--- a/Documentation/PCI/pcieaer-howto.rst
+++ b/Documentation/PCI/pcieaer-howto.rst
@@ -156,12 +156,6 @@ default reset_link function, but different upstream ports might
have different specifications to reset pci express link, so all
upstream ports should provide their own reset_link functions.
-In struct pcie_port_service_driver, a new pointer, reset_link, is
-added.
-::
-
- pci_ers_result_t (*reset_link) (struct pci_dev *dev);
-
Section 3.2.2.2 provides more detailed info on when to call
reset_link.
@@ -212,15 +206,10 @@ error_detected(dev, pci_channel_io_frozen) to all drivers within
a hierarchy in question. Then, performing link reset at upstream is
necessary. As different kinds of devices might use different approaches
to reset link, AER port service driver is required to provide the
-function to reset link. Firstly, kernel looks for if the upstream
-component has an aer driver. If it has, kernel uses the reset_link
-callback of the aer driver. If the upstream component has no aer driver
-and the port is downstream port, we will perform a hot reset as the
-default by setting the Secondary Bus Reset bit of the Bridge Control
-register associated with the downstream port. As for upstream ports,
-they should provide their own aer service drivers with reset_link
-function. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER and
-reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes
+function to reset link via callback parameter of pcie_do_recovery()
+function. If reset_link is not NULL, recovery function will use it
+to reset the link. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER
+and reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes
to mmio_enabled.
helper functions
@@ -243,9 +232,9 @@ messages to root port when an error is detected.
::
- int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);`
+ int pci_aer_clear_nonfatal_status(struct pci_dev *dev);`
-pci_cleanup_aer_uncorrect_error_status cleanups the uncorrectable
+pci_aer_clear_nonfatal_status clears non-fatal errors in the uncorrectable
error status register.
Frequent Asked Questions
diff --git a/Documentation/PCI/sysfs-pci.rst b/Documentation/PCI/sysfs-pci.rst
new file mode 100644
index 000000000000..f495185aa88a
--- /dev/null
+++ b/Documentation/PCI/sysfs-pci.rst
@@ -0,0 +1,138 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================================
+Accessing PCI device resources through sysfs
+============================================
+
+sysfs, usually mounted at /sys, provides access to PCI resources on platforms
+that support it. For example, a given bus might look like this::
+
+ /sys/devices/pci0000:17
+ |-- 0000:17:00.0
+ | |-- class
+ | |-- config
+ | |-- device
+ | |-- enable
+ | |-- irq
+ | |-- local_cpus
+ | |-- remove
+ | |-- resource
+ | |-- resource0
+ | |-- resource1
+ | |-- resource2
+ | |-- revision
+ | |-- rom
+ | |-- subsystem_device
+ | |-- subsystem_vendor
+ | `-- vendor
+ `-- ...
+
+The topmost element describes the PCI domain and bus number. In this case,
+the domain number is 0000 and the bus number is 17 (both values are in hex).
+This bus contains a single function device in slot 0. The domain and bus
+numbers are reproduced for convenience. Under the device directory are several
+files, each with their own function.
+
+ =================== =====================================================
+ file function
+ =================== =====================================================
+ class PCI class (ascii, ro)
+ config PCI config space (binary, rw)
+ device PCI device (ascii, ro)
+ enable Whether the device is enabled (ascii, rw)
+ irq IRQ number (ascii, ro)
+ local_cpus nearby CPU mask (cpumask, ro)
+ remove remove device from kernel's list (ascii, wo)
+ resource PCI resource host addresses (ascii, ro)
+ resource0..N PCI resource N, if present (binary, mmap, rw\ [1]_)
+ resource0_wc..N_wc PCI WC map resource N, if prefetchable (binary, mmap)
+ revision PCI revision (ascii, ro)
+ rom PCI ROM resource, if present (binary, ro)
+ subsystem_device PCI subsystem device (ascii, ro)
+ subsystem_vendor PCI subsystem vendor (ascii, ro)
+ vendor PCI vendor (ascii, ro)
+ =================== =====================================================
+
+::
+
+ ro - read only file
+ rw - file is readable and writable
+ wo - write only file
+ mmap - file is mmapable
+ ascii - file contains ascii text
+ binary - file contains binary data
+ cpumask - file contains a cpumask type
+
+.. [1] rw for IORESOURCE_IO (I/O port) regions only
+
+The read only files are informational, writes to them will be ignored, with
+the exception of the 'rom' file. Writable files can be used to perform
+actions on the device (e.g. changing config space, detaching a device).
+mmapable files are available via an mmap of the file at offset 0 and can be
+used to do actual device programming from userspace. Note that some platforms
+don't support mmapping of certain resources, so be sure to check the return
+value from any attempted mmap. The most notable of these are I/O port
+resources, which also provide read/write access.
+
+The 'enable' file provides a counter that indicates how many times the device
+has been enabled. If the 'enable' file currently returns '4', and a '1' is
+echoed into it, it will then return '5'. Echoing a '0' into it will decrease
+the count. Even when it returns to 0, though, some of the initialisation
+may not be reversed.
+
+The 'rom' file is special in that it provides read-only access to the device's
+ROM file, if available. It's disabled by default, however, so applications
+should write the string "1" to the file to enable it before attempting a read
+call, and disable it following the access by writing "0" to the file. Note
+that the device must be enabled for a rom read to return data successfully.
+In the event a driver is not bound to the device, it can be enabled using the
+'enable' file, documented above.
+
+The 'remove' file is used to remove the PCI device, by writing a non-zero
+integer to the file. This does not involve any kind of hot-plug functionality,
+e.g. powering off the device. The device is removed from the kernel's list of
+PCI devices, the sysfs directory for it is removed, and the device will be
+removed from any drivers attached to it. Removal of PCI root buses is
+disallowed.
+
+Accessing legacy resources through sysfs
+----------------------------------------
+
+Legacy I/O port and ISA memory resources are also provided in sysfs if the
+underlying platform supports them. They're located in the PCI class hierarchy,
+e.g.::
+
+ /sys/class/pci_bus/0000:17/
+ |-- bridge -> ../../../devices/pci0000:17
+ |-- cpuaffinity
+ |-- legacy_io
+ `-- legacy_mem
+
+The legacy_io file is a read/write file that can be used by applications to
+do legacy port I/O. The application should open the file, seek to the desired
+port (e.g. 0x3e8) and do a read or a write of 1, 2 or 4 bytes. The legacy_mem
+file should be mmapped with an offset corresponding to the memory offset
+desired, e.g. 0xa0000 for the VGA frame buffer. The application can then
+simply dereference the returned pointer (after checking for errors of course)
+to access legacy memory space.
+
+Supporting PCI access on new platforms
+--------------------------------------
+
+In order to support PCI resource mapping as described above, Linux platform
+code should ideally define ARCH_GENERIC_PCI_MMAP_RESOURCE and use the generic
+implementation of that functionality. To support the historical interface of
+mmap() through files in /proc/bus/pci, platforms may also set HAVE_PCI_MMAP.
+
+Alternatively, platforms which set HAVE_PCI_MMAP may provide their own
+implementation of pci_mmap_resource_range() instead of defining
+ARCH_GENERIC_PCI_MMAP_RESOURCE.
+
+Platforms which support write-combining maps of PCI resources must define
+arch_can_pci_mmap_wc() which shall evaluate to non-zero at runtime when
+write-combining is permitted. Platforms which support maps of I/O resources
+define arch_can_pci_mmap_io() similarly.
+
+Legacy resources are protected by the HAVE_PCI_LEGACY define. Platforms
+wishing to support legacy functionality should define it and provide
+pci_legacy_read, pci_legacy_write and pci_mmap_legacy_page_range functions.