aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/networking/page_pool.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/networking/page_pool.rst')
-rw-r--r--Documentation/networking/page_pool.rst223
1 files changed, 223 insertions, 0 deletions
diff --git a/Documentation/networking/page_pool.rst b/Documentation/networking/page_pool.rst
new file mode 100644
index 000000000000..5db8c263b0c6
--- /dev/null
+++ b/Documentation/networking/page_pool.rst
@@ -0,0 +1,223 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============
+Page Pool API
+=============
+
+The page_pool allocator is optimized for the XDP mode that uses one frame
+per-page, but it can fallback on the regular page allocator APIs.
+
+Basic use involves replacing alloc_pages() calls with the
+page_pool_alloc_pages() call. Drivers should use page_pool_dev_alloc_pages()
+replacing dev_alloc_pages().
+
+API keeps track of inflight pages, in order to let API user know
+when it is safe to free a page_pool object. Thus, API users
+must run page_pool_release_page() when a page is leaving the page_pool or
+call page_pool_put_page() where appropriate in order to maintain correct
+accounting.
+
+API user must call page_pool_put_page() once on a page, as it
+will either recycle the page, or in case of refcnt > 1, it will
+release the DMA mapping and inflight state accounting.
+
+Architecture overview
+=====================
+
+.. code-block:: none
+
+ +------------------+
+ | Driver |
+ +------------------+
+ ^
+ |
+ |
+ |
+ v
+ +--------------------------------------------+
+ | request memory |
+ +--------------------------------------------+
+ ^ ^
+ | |
+ | Pool empty | Pool has entries
+ | |
+ v v
+ +-----------------------+ +------------------------+
+ | alloc (and map) pages | | get page from cache |
+ +-----------------------+ +------------------------+
+ ^ ^
+ | |
+ | cache available | No entries, refill
+ | | from ptr-ring
+ | |
+ v v
+ +-----------------+ +------------------+
+ | Fast cache | | ptr-ring cache |
+ +-----------------+ +------------------+
+
+API interface
+=============
+The number of pools created **must** match the number of hardware queues
+unless hardware restrictions make that impossible. This would otherwise beat the
+purpose of page pool, which is allocate pages fast from cache without locking.
+This lockless guarantee naturally comes from running under a NAPI softirq.
+The protection doesn't strictly have to be NAPI, any guarantee that allocating
+a page will cause no race conditions is enough.
+
+* page_pool_create(): Create a pool.
+ * flags: PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV
+ * order: 2^order pages on allocation
+ * pool_size: size of the ptr_ring
+ * nid: preferred NUMA node for allocation
+ * dev: struct device. Used on DMA operations
+ * dma_dir: DMA direction
+ * max_len: max DMA sync memory size
+ * offset: DMA address offset
+
+* page_pool_put_page(): The outcome of this depends on the page refcnt. If the
+ driver bumps the refcnt > 1 this will unmap the page. If the page refcnt is 1
+ the allocator owns the page and will try to recycle it in one of the pool
+ caches. If PP_FLAG_DMA_SYNC_DEV is set, the page will be synced for_device
+ using dma_sync_single_range_for_device().
+
+* page_pool_put_full_page(): Similar to page_pool_put_page(), but will DMA sync
+ for the entire memory area configured in area pool->max_len.
+
+* page_pool_recycle_direct(): Similar to page_pool_put_full_page() but caller
+ must guarantee safe context (e.g NAPI), since it will recycle the page
+ directly into the pool fast cache.
+
+* page_pool_release_page(): Unmap the page (if mapped) and account for it on
+ inflight counters.
+
+* page_pool_dev_alloc_pages(): Get a page from the page allocator or page_pool
+ caches.
+
+* page_pool_get_dma_addr(): Retrieve the stored DMA address.
+
+* page_pool_get_dma_dir(): Retrieve the stored DMA direction.
+
+* page_pool_put_page_bulk(): Tries to refill a number of pages into the
+ ptr_ring cache holding ptr_ring producer lock. If the ptr_ring is full,
+ page_pool_put_page_bulk() will release leftover pages to the page allocator.
+ page_pool_put_page_bulk() is suitable to be run inside the driver NAPI tx
+ completion loop for the XDP_REDIRECT use case.
+ Please note the caller must not use data area after running
+ page_pool_put_page_bulk(), as this function overwrites it.
+
+* page_pool_get_stats(): Retrieve statistics about the page_pool. This API
+ is only available if the kernel has been configured with
+ ``CONFIG_PAGE_POOL_STATS=y``. A pointer to a caller allocated ``struct
+ page_pool_stats`` structure is passed to this API which is filled in. The
+ caller can then report those stats to the user (perhaps via ethtool,
+ debugfs, etc.). See below for an example usage of this API.
+
+Stats API and structures
+------------------------
+If the kernel is configured with ``CONFIG_PAGE_POOL_STATS=y``, the API
+``page_pool_get_stats()`` and structures described below are available. It
+takes a pointer to a ``struct page_pool`` and a pointer to a ``struct
+page_pool_stats`` allocated by the caller.
+
+The API will fill in the provided ``struct page_pool_stats`` with
+statistics about the page_pool.
+
+The stats structure has the following fields::
+
+ struct page_pool_stats {
+ struct page_pool_alloc_stats alloc_stats;
+ struct page_pool_recycle_stats recycle_stats;
+ };
+
+
+The ``struct page_pool_alloc_stats`` has the following fields:
+ * ``fast``: successful fast path allocations
+ * ``slow``: slow path order-0 allocations
+ * ``slow_high_order``: slow path high order allocations
+ * ``empty``: ptr ring is empty, so a slow path allocation was forced.
+ * ``refill``: an allocation which triggered a refill of the cache
+ * ``waive``: pages obtained from the ptr ring that cannot be added to
+ the cache due to a NUMA mismatch.
+
+The ``struct page_pool_recycle_stats`` has the following fields:
+ * ``cached``: recycling placed page in the page pool cache
+ * ``cache_full``: page pool cache was full
+ * ``ring``: page placed into the ptr ring
+ * ``ring_full``: page released from page pool because the ptr ring was full
+ * ``released_refcnt``: page released (and not recycled) because refcnt > 1
+
+Coding examples
+===============
+
+Registration
+------------
+
+.. code-block:: c
+
+ /* Page pool registration */
+ struct page_pool_params pp_params = { 0 };
+ struct xdp_rxq_info xdp_rxq;
+ int err;
+
+ pp_params.order = 0;
+ /* internal DMA mapping in page_pool */
+ pp_params.flags = PP_FLAG_DMA_MAP;
+ pp_params.pool_size = DESC_NUM;
+ pp_params.nid = NUMA_NO_NODE;
+ pp_params.dev = priv->dev;
+ pp_params.dma_dir = xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE;
+ page_pool = page_pool_create(&pp_params);
+
+ err = xdp_rxq_info_reg(&xdp_rxq, ndev, 0);
+ if (err)
+ goto err_out;
+
+ err = xdp_rxq_info_reg_mem_model(&xdp_rxq, MEM_TYPE_PAGE_POOL, page_pool);
+ if (err)
+ goto err_out;
+
+NAPI poller
+-----------
+
+
+.. code-block:: c
+
+ /* NAPI Rx poller */
+ enum dma_data_direction dma_dir;
+
+ dma_dir = page_pool_get_dma_dir(dring->page_pool);
+ while (done < budget) {
+ if (some error)
+ page_pool_recycle_direct(page_pool, page);
+ if (packet_is_xdp) {
+ if XDP_DROP:
+ page_pool_recycle_direct(page_pool, page);
+ } else (packet_is_skb) {
+ page_pool_release_page(page_pool, page);
+ new_page = page_pool_dev_alloc_pages(page_pool);
+ }
+ }
+
+Stats
+-----
+
+.. code-block:: c
+
+ #ifdef CONFIG_PAGE_POOL_STATS
+ /* retrieve stats */
+ struct page_pool_stats stats = { 0 };
+ if (page_pool_get_stats(page_pool, &stats)) {
+ /* perhaps the driver reports statistics with ethool */
+ ethtool_print_allocation_stats(&stats.alloc_stats);
+ ethtool_print_recycle_stats(&stats.recycle_stats);
+ }
+ #endif
+
+Driver unload
+-------------
+
+.. code-block:: c
+
+ /* Driver unload */
+ page_pool_put_full_page(page_pool, page, false);
+ xdp_rxq_info_unreg(&xdp_rxq);