Age | Commit message (Collapse) | Author | Files | Lines |
|
Make generic XDP processing attribute packets to their actual
queues instead of queue #0. This improves AF_XDP performance
considerably since softirq threads no longer fight over single
AF_XDP socket spinlock.
Signed-off-by: Andrey Turkin <andrey.turkin@gmail.com>
Link: https://lore.kernel.org/r/20220717022050.822766-2-andrey.turkin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Remove locked versions of functions that are no longer used by anyone.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Prepare for devlink reload being called with devlink->lock held and
convert the netdevsim driver to use unlocked devlink API during init and
fini flows. Take devl_lock() in reload_down() and reload_up() ops in the
meantime before reload cmd is converted to take the lock itself.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add unlocked variants of devlink_region_create/destroy() functions
to be used in drivers called-in with devlink->lock held.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Prepare for devlink reload being called with devlink->lock held and
convert the mlxsw driver to use unlocked devlink API during init and
fini flows. Take devl_lock() in reload_down() and reload_up() ops in the
meantime before reload cmd is converted to take the lock itself.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add unlocked variants of devlink_dpipe*() functions to be used
in drivers called-in with devlink->lock held.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add unlocked variants of devlink_sb*() functions to be used
in drivers called-in with devlink->lock held.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add unlocked variants of devlink_resource*() functions to be used
in drivers called-in with devlink->lock held.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add unlocked variants of devl_trap*() functions to be used in drivers
called-in with devlink->lock held.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a lock_class_key per devlink instance to avoid DEADLOCK warning by
lockdep, while locking more than one devlink instance in driver code,
for example in opening VFs flow.
Kernel log:
[ 101.433802] ============================================
[ 101.433803] WARNING: possible recursive locking detected
[ 101.433810] 5.19.0-rc1+ #35 Not tainted
[ 101.433812] --------------------------------------------
[ 101.433813] bash/892 is trying to acquire lock:
[ 101.433815] ffff888127bfc2f8 (&devlink->lock){+.+.}-{3:3}, at: probe_one+0x3c/0x690 [mlx5_core]
[ 101.433909]
but task is already holding lock:
[ 101.433910] ffff888118f4c2f8 (&devlink->lock){+.+.}-{3:3}, at: mlx5_core_sriov_configure+0x62/0x280 [mlx5_core]
[ 101.433989]
other info that might help us debug this:
[ 101.433990] Possible unsafe locking scenario:
[ 101.433991] CPU0
[ 101.433991] ----
[ 101.433992] lock(&devlink->lock);
[ 101.433993] lock(&devlink->lock);
[ 101.433995]
*** DEADLOCK ***
[ 101.433996] May be due to missing lock nesting notation
[ 101.433996] 6 locks held by bash/892:
[ 101.433998] #0: ffff88810eb50448 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0xf3/0x1d0
[ 101.434009] #1: ffff888114777c88 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x20d/0x520
[ 101.434017] #2: ffff888102b58660 (kn->active#231){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x230/0x520
[ 101.434023] #3: ffff888102d70198 (&dev->mutex){....}-{3:3}, at: sriov_numvfs_store+0x132/0x310
[ 101.434031] #4: ffff888118f4c2f8 (&devlink->lock){+.+.}-{3:3}, at: mlx5_core_sriov_configure+0x62/0x280 [mlx5_core]
[ 101.434108] #5: ffff88812adce198 (&dev->mutex){....}-{3:3}, at: __device_attach+0x76/0x430
[ 101.434116]
stack backtrace:
[ 101.434118] CPU: 5 PID: 892 Comm: bash Not tainted 5.19.0-rc1+ #35
[ 101.434120] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 101.434130] Call Trace:
[ 101.434133] <TASK>
[ 101.434135] dump_stack_lvl+0x57/0x7d
[ 101.434145] __lock_acquire.cold+0x1df/0x3e7
[ 101.434151] ? register_lock_class+0x1880/0x1880
[ 101.434157] lock_acquire+0x1c1/0x550
[ 101.434160] ? probe_one+0x3c/0x690 [mlx5_core]
[ 101.434229] ? lockdep_hardirqs_on_prepare+0x400/0x400
[ 101.434232] ? __xa_alloc+0x1ed/0x2d0
[ 101.434236] ? ksys_write+0xf3/0x1d0
[ 101.434239] __mutex_lock+0x12c/0x14b0
[ 101.434243] ? probe_one+0x3c/0x690 [mlx5_core]
[ 101.434312] ? probe_one+0x3c/0x690 [mlx5_core]
[ 101.434380] ? devlink_alloc_ns+0x11b/0x910
[ 101.434385] ? mutex_lock_io_nested+0x1320/0x1320
[ 101.434388] ? lockdep_init_map_type+0x21a/0x7d0
[ 101.434391] ? lockdep_init_map_type+0x21a/0x7d0
[ 101.434393] ? __init_swait_queue_head+0x70/0xd0
[ 101.434397] probe_one+0x3c/0x690 [mlx5_core]
[ 101.434467] pci_device_probe+0x1b4/0x480
[ 101.434471] really_probe+0x1e0/0xaa0
[ 101.434474] __driver_probe_device+0x219/0x480
[ 101.434478] driver_probe_device+0x49/0x130
[ 101.434481] __device_attach_driver+0x1b8/0x280
[ 101.434484] ? driver_allows_async_probing+0x140/0x140
[ 101.434487] bus_for_each_drv+0x123/0x1a0
[ 101.434489] ? bus_for_each_dev+0x1a0/0x1a0
[ 101.434491] ? lockdep_hardirqs_on_prepare+0x286/0x400
[ 101.434494] ? trace_hardirqs_on+0x2d/0x100
[ 101.434498] __device_attach+0x1a3/0x430
[ 101.434501] ? device_driver_attach+0x1e0/0x1e0
[ 101.434503] ? pci_bridge_d3_possible+0x1e0/0x1e0
[ 101.434506] ? pci_create_resource_files+0xeb/0x190
[ 101.434511] pci_bus_add_device+0x6c/0xa0
[ 101.434514] pci_iov_add_virtfn+0x9e4/0xe00
[ 101.434517] ? trace_hardirqs_on+0x2d/0x100
[ 101.434521] sriov_enable+0x64a/0xca0
[ 101.434524] ? pcibios_sriov_disable+0x10/0x10
[ 101.434528] mlx5_core_sriov_configure+0xab/0x280 [mlx5_core]
[ 101.434602] sriov_numvfs_store+0x20a/0x310
[ 101.434605] ? sriov_totalvfs_show+0xc0/0xc0
[ 101.434608] ? sysfs_file_ops+0x170/0x170
[ 101.434611] ? sysfs_file_ops+0x117/0x170
[ 101.434614] ? sysfs_file_ops+0x170/0x170
[ 101.434616] kernfs_fop_write_iter+0x348/0x520
[ 101.434619] new_sync_write+0x2e5/0x520
[ 101.434621] ? new_sync_read+0x520/0x520
[ 101.434624] ? lock_acquire+0x1c1/0x550
[ 101.434626] ? lockdep_hardirqs_on_prepare+0x400/0x400
[ 101.434630] vfs_write+0x5cb/0x8d0
[ 101.434633] ksys_write+0xf3/0x1d0
[ 101.434635] ? __x64_sys_read+0xb0/0xb0
[ 101.434638] ? lockdep_hardirqs_on_prepare+0x286/0x400
[ 101.434640] ? syscall_enter_from_user_mode+0x1d/0x50
[ 101.434643] do_syscall_64+0x3d/0x90
[ 101.434647] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 101.434650] RIP: 0033:0x7f5ff536b2f7
[ 101.434658] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f
1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f
05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 101.434661] RSP: 002b:00007ffd9ea85d58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 101.434664] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f5ff536b2f7
[ 101.434666] RDX: 0000000000000002 RSI: 000055c4c279e230 RDI: 0000000000000001
[ 101.434668] RBP: 000055c4c279e230 R08: 000000000000000a R09: 0000000000000001
[ 101.434669] R10: 000055c4c283cbf0 R11: 0000000000000246 R12: 0000000000000002
[ 101.434670] R13: 00007f5ff543d500 R14: 0000000000000002 R15: 00007f5ff543d700
[ 101.434673] </TASK>
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use netif_napi_add_tx() for NAPI in Tx direction instead of the regular
netif_napi_add() function.
Signed-off-by: Sieng-Piaw Liew <liew.s.piaw@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch removes the of_match_ptr() pointer when dereferencing the
ksz_dt_ids which produce the unused variable warning.
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We currently CoW Rx skbs whenever we can't decrypt to a user
space buffer. The skbs can be enormous (64kB) and CoW does
a linear alloc which has a strong chance of failing under
memory pressure. Or even without, skb_cow_data() assumes
GFP_ATOMIC.
Allocate a new frag'd skb and decrypt into it. We finally
take advantage of the decrypted skb getting returned via
darg.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The "zero-copy" path in SW TLS will engage either for no skbs or
for all but last. If the recvmsg parameters are right and the
socket can do ZC we'll ZC until the iterator can't fit a full
record at which point we'll decrypt one more record and copy
over the necessary bits to fill up the request.
The only reason we hold onto the ZC skbs which went thru the async
path until the end of recvmsg() is to count bytes. We need an accurate
count of zc'ed bytes so that we can calculate how much of the non-zc'd
data to copy. To allow freeing input skbs on the ZC path count only
how much of the list we'll need to consume.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Async crypto currently benefits from the fact that we decrypt
in place. When we allow input and output to be different skbs
we will have to hang onto the input while we move to the next
record. Clone the inputs and keep them on a list.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Async crypto TLS Rx currently waits for crypto to be done
in order to strip the TLS header and tailer. Simplify
the code by moving the pointers immediately, since only
TLS 1.2 is supported here there is no message padding.
This simplifies the decryption into a new skb in the next
patch as we don't have to worry about input vs output
skb in the decrypt_done() handler any more.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Instead of using ctx->recv_pkt after decryption read the skb
from darg.skb. This moves the decision of what the "output skb"
is to the decrypt handlers. For now after decrypt handler returns
successfully ctx->recv_pkt is simply moved to darg.skb, but it
will change soon.
Note that tls_decrypt_sg() cannot clear the ctx->recv_pkt
because it gets called to re-encrypt (i.e. by the device offload).
So we need an awkward temporary if() in tls_rx_one_record().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Callers always pass ctx->recv_pkt into decrypt_skb_update(),
and it propagates it to its callees. This may give someone
the false impression that those functions can accept any valid
skb containing a TLS record. That's not the case, the record
sequence number is read from the context, and they can only
take the next record coming out of the strp.
Let the functions get the skb from the context instead of
passing it in. This will also make it cleaner to return
a different skb than ctx->recv_pkt as the decrypted one
later on.
Since we're touching the definition of decrypt_skb_update()
use this as an opportunity to rename it.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I already forgot to transform darg from input to output
semantics once on the NIC inline crypto fastpath. To
avoid this happening again create a device equivalent
of decrypt_internal(). A function responsible for decryption
and transforming darg.
While at it rename decrypt_internal() to a hopefully slightly
more meaningful name.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We no longer allow a decrypted skb to remain linked to ctx->recv_pkt.
Anything on the list is decrypted, anything on ctx->recv_pkt needs
to be decrypted.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Detach the skb from ctx->recv_pkt after decryption is done,
even if we can't consume it.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I thought that having the skb either always on the ctx->rx_list
or ctx->recv_pkt will simplify the handling, as we would not
have to remember to flip it from one to the other on exit paths.
This became a little harder to justify with the fix for BPF
sockmaps. Subsequent changes will make the situation even worse.
Queue the skbs only when really needed.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
recvmsg() in TLS gets data from the skb list (rx_list) or fresh
skbs we read from TCP via strparser. The former holds skbs which were
already decrypted for peek or decrypted and partially consumed.
tls_wait_data() only notices appearance of fresh skbs coming out
of TCP (or psock). It is possible, if there is a concurrent call
to peek() and recv() that the peek() will move the data from input
to rx_list without recv() noticing. recv() will then read data out
of order or never wake up.
This is not a practical use case/concern, but it makes the self
tests less reliable. This patch solves the problem by allowing
only one reader in.
Because having multiple processes calling read()/peek() is not
normal avoid adding a lock and try to fast-path the single reader
case.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Extend SMC-R link group netlink attribute SMC_GEN_LGR_SMCR.
Introduce SMC_NLA_LGR_R_BUF_TYPE to show the buffer type of
SMC-R link group.
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On long-running enterprise production servers, high-order contiguous
memory pages are usually very rare and in most cases we can only get
fragmented pages.
When replacing TCP with SMC-R in such production scenarios, attempting
to allocate high-order physically contiguous sndbufs and RMBs may result
in frequent memory compaction, which will cause unexpected hung issue
and further stability risks.
So this patch is aimed to allow SMC-R link group to use virtually
contiguous sndbufs and RMBs to avoid potential issues mentioned above.
Whether to use physically or virtually contiguous buffers can be set
by sysctl smcr_buf_type.
Note that using virtually contiguous buffers will bring an acceptable
performance regression, which can be mainly divided into two parts:
1) regression in data path, which is brought by additional address
translation of sndbuf by RNIC in Tx. But in general, translating
address through MTT is fast.
Taking 256KB sndbuf and RMB as an example, the comparisons in qperf
latency and bandwidth test with physically and virtually contiguous
buffers are as follows:
- client:
smc_run taskset -c <cpu> qperf <server> -oo msg_size:1:64K:*2\
-t 5 -vu tcp_{bw|lat}
- server:
smc_run taskset -c <cpu> qperf
[latency]
msgsize tcp smcr smcr-use-virt-buf
1 11.17 us 7.56 us 7.51 us (-0.67%)
2 10.65 us 7.74 us 7.56 us (-2.31%)
4 11.11 us 7.52 us 7.59 us ( 0.84%)
8 10.83 us 7.55 us 7.51 us (-0.48%)
16 11.21 us 7.46 us 7.51 us ( 0.71%)
32 10.65 us 7.53 us 7.58 us ( 0.61%)
64 10.95 us 7.74 us 7.80 us ( 0.76%)
128 11.14 us 7.83 us 7.87 us ( 0.47%)
256 10.97 us 7.94 us 7.92 us (-0.28%)
512 11.23 us 7.94 us 8.20 us ( 3.25%)
1024 11.60 us 8.12 us 8.20 us ( 0.96%)
2048 14.04 us 8.30 us 8.51 us ( 2.49%)
4096 16.88 us 9.13 us 9.07 us (-0.64%)
8192 22.50 us 10.56 us 11.22 us ( 6.26%)
16384 28.99 us 12.88 us 13.83 us ( 7.37%)
32768 40.13 us 16.76 us 16.95 us ( 1.16%)
65536 68.70 us 24.68 us 24.85 us ( 0.68%)
[bandwidth]
msgsize tcp smcr smcr-use-virt-buf
1 1.65 MB/s 1.59 MB/s 1.53 MB/s (-3.88%)
2 3.32 MB/s 3.17 MB/s 3.08 MB/s (-2.67%)
4 6.66 MB/s 6.33 MB/s 6.09 MB/s (-3.85%)
8 13.67 MB/s 13.45 MB/s 11.97 MB/s (-10.99%)
16 25.36 MB/s 27.15 MB/s 24.16 MB/s (-11.01%)
32 48.22 MB/s 54.24 MB/s 49.41 MB/s (-8.89%)
64 106.79 MB/s 107.32 MB/s 99.05 MB/s (-7.71%)
128 210.21 MB/s 202.46 MB/s 201.02 MB/s (-0.71%)
256 400.81 MB/s 416.81 MB/s 393.52 MB/s (-5.59%)
512 746.49 MB/s 834.12 MB/s 809.99 MB/s (-2.89%)
1024 1292.33 MB/s 1641.96 MB/s 1571.82 MB/s (-4.27%)
2048 2007.64 MB/s 2760.44 MB/s 2717.68 MB/s (-1.55%)
4096 2665.17 MB/s 4157.44 MB/s 4070.76 MB/s (-2.09%)
8192 3159.72 MB/s 4361.57 MB/s 4270.65 MB/s (-2.08%)
16384 4186.70 MB/s 4574.13 MB/s 4501.17 MB/s (-1.60%)
32768 4093.21 MB/s 4487.42 MB/s 4322.43 MB/s (-3.68%)
65536 4057.14 MB/s 4735.61 MB/s 4555.17 MB/s (-3.81%)
2) regression in buffer initialization and destruction path, which is
brought by additional MR operations of sndbufs. But thanks to link
group buffer reuse mechanism, the impact of this kind of regression
decreases as times of buffer reuse increases.
Taking 256KB sndbuf and RMB as an example, latency of some key SMC-R
buffer-related function obtained by bpftrace are as follows:
Function Phys-bufs Virt-bufs
smcr_new_buf_create() 67154 ns 79164 ns
smc_ib_buf_map_sg() 525 ns 928 ns
smc_ib_get_memory_region() 162294 ns 161191 ns
smc_wr_reg_send() 9957 ns 9635 ns
smc_ib_put_memory_region() 203548 ns 198374 ns
smc_ib_buf_unmap_sg() 508 ns 1158 ns
------------
Test environment notes:
1. Above tests run on 2 VMs within the same Host.
2. The NIC is ConnectX-4Lx, using SRIOV and passing through 2 VFs to
the each VM respectively.
3. VMs' vCPUs are binded to different physical CPUs, and the binded
physical CPUs are isolated by `isolcpus=xxx` cmdline.
4. NICs' queue number are set to 1.
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch introduces a new SMC-R specific element buf_type
in struct smc_link_group, for recording the value of sysctl
smcr_buf_type when link group is created.
New created link group will create and reuse buffers of the
type specified by buf_type.
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch introduces the sysctl smcr_buf_type for setting
the type of SMC-R sndbufs and RMBs.
Valid values includes:
- SMCR_PHYS_CONT_BUFS, which means use physically contiguous
buffers for better performance and is the default value.
- SMCR_VIRT_CONT_BUFS, which means use virtually contiguous
buffers in case of physically contiguous memory is scarce.
- SMCR_MIXED_BUFS, which means first try to use physically
contiguous buffers. If not available, then use virtually
contiguous buffers.
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some CPU, such as Xeon, can guarantee DMA cache coherency.
So it is no need to use dma sync APIs to flush cache on such CPUs.
In order to avoid calling dma sync APIs on the IO path, use the
dma_need_sync to check whether smc_buf_desc needs dma sync when
creating smc_buf_desc.
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
smc_ib_sync_sg_for_cpu/device are the ops used for dma memory cache
consistency. Smc sndbufs are dma buffers, where CPU writes data to
it and PCIE device reads data from it. So for sndbufs,
smc_ib_sync_sg_for_device is needed and smc_ib_sync_sg_for_cpu is
redundant as PCIE device will not write the buffers. Smc rmbs
are dma buffers, where PCIE device write data to it and CPU read
data from it. So for rmbs, smc_ib_sync_sg_for_cpu is needed and
smc_ib_sync_sg_for_device is redundant as CPU will not write the buffers.
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ipv4 arp_accept has a new option '2' to create new neighbor entries
only if the src ip is in the same subnet as an address configured on
the interface that received the garp message. This selftest tests all
options in arp_accept.
ipv6 has a sysctl endpoint, accept_untracked_na, that defines the
behavior for accepting untracked neighbor advertisements. A new option
similar to that of arp_accept for learning only from the same subnet is
added to accept_untracked_na. This selftest tests this new feature.
Signed-off-by: Jaehee Park <jhpark1013@gmail.com>
Suggested-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch adds a third knob, '2', which extends the
accept_untracked_na option to learn a neighbor only if the src ip is
in the same subnet as an address configured on the interface that
received the neighbor advertisement. This is similar to the arp_accept
configuration for ipv4.
Signed-off-by: Jaehee Park <jhpark1013@gmail.com>
Suggested-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In many deployments, we want the option to not learn a neighbor from
garp if the src ip is not in the same subnet as an address configured
on the interface that received the garp message. net.ipv4.arp_accept
sysctl is currently used to control creation of a neigh from a
received garp packet. This patch adds a new option '2' to
net.ipv4.arp_accept which extends option '1' by including the subnet
check.
Signed-off-by: Jaehee Park <jhpark1013@gmail.com>
Suggested-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When number of LMACs active on a CGX/RPM are 3, then
current NIX link credit config based on per lmac fifo
length which inturn is calculated as
'lmac_fifo_len = total_fifo_len / 3', is incorrect. In HW
one of the LMAC gets half of the FIFO and rest gets 1/4th.
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Geetha Sowjanya <gakula@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes smatch static tool warning reported by smatch tool.
rvu_npc_hash.c:1232 rvu_npc_exact_del_table_entry_by_id() error:
uninitialized symbol 'drop_mcam_idx'.
rvu_npc_hash.c:1312 rvu_npc_exact_add_table_entry() error:
uninitialized symbol 'drop_mcam_idx'.
rvu_npc_hash.c:1391 rvu_npc_exact_update_table_entry() error:
uninitialized symbol 'hash_index'.
rvu_npc_hash.c:1428 rvu_npc_exact_promisc_disable() error:
uninitialized symbol 'drop_mcam_idx'.
rvu_npc_hash.c:1473 rvu_npc_exact_promisc_enable() error:
uninitialized symbol 'drop_mcam_idx'.
otx2_dmac_flt.c:191 otx2_dmacflt_update() error: 'rsp'
dereferencing possible ERR_PTR()
otx2_dmac_flt.c:60 otx2_dmacflt_add_pfmac() error: 'rsp'
dereferencing possible ERR_PTR()
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move qca8k driver to qca dir in preparation for code split and
introduction of ipq4019 switch based on qca8k.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
delay_timer has been unused since commit c3498d34dd36 ("cbq: remove
TCA_CBQ_OVL_STRATEGY support"). Delete it.
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Return directly without intermediate value store at the end of
devlink_port_new_notify() function.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix the typo in a name of devlink_port_new_notifiy() function.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The return value is not used, so change the return value type to void.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Clang warns:
arch/x86/kernel/cpu/bugs.c:58:21: error: section attribute is specified on redeclared variable [-Werror,-Wsection]
DEFINE_PER_CPU(u64, x86_spec_ctrl_current);
^
arch/x86/include/asm/nospec-branch.h:283:12: note: previous declaration is here
extern u64 x86_spec_ctrl_current;
^
1 error generated.
The declaration should be using DECLARE_PER_CPU instead so all
attributes stay in sync.
Cc: stable@vger.kernel.org
Fixes: fc02735b14ff ("KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The UML function names to_virt() and to_phys() are exposed by UML
headers, and are very generic and may be defined by drivers. As it
turns out, commit 9409c9b6709e ("pmem: refactor pmem_clear_poison()")
did exactly that.
This results in build errors such as the following when trying to build
um:allmodconfig:
drivers/nvdimm/pmem.c: In function ‘pmem_dax_zero_page_range’:
./arch/um/include/asm/page.h:105:20: error: too few arguments to function ‘to_phys’
105 | #define __pa(virt) to_phys((void *) (unsigned long) (virt))
| ^~~~~~~
Use less generic function names for the um specific to_phys() and
to_virt() functions to fix the problem and to avoid similar problems in
the future.
Fixes: 9409c9b6709e ("pmem: refactor pmem_clear_poison()")
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
nfp_tun_write_neigh() function will configure a tunnel neighbour when
calling nfp_tun_neigh_event_handler() or nfp_flower_cmsg_process_one_rx()
(with no tunnel neighbour type) from firmware.
When configuring IP on physical port as a tunnel endpoint, no operation
will be performed after receiving the cmsg mentioned above.
Therefore, add a progress to configure tunnel neighbour in this case.
v2: Correct format of fixes tag.
Fixes: f1df7956c11f ("nfp: flower: rework tunnel neighbour configuration")
Signed-off-by: Tianyu Yuan <tianyu.yuan@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Reviewed-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20220714081915.148378-1-simon.horman@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add missing error checks in tls_device_init.
Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220714070754.1428-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add Shyam Sundar S K as an additional maintainer to support the AMD XGBE
network device driver.
Cc: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/db367f24089c2bbbcd1cec8e21af49922017a110.1657751501.git.thomas.lendacky@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
xenvif_rx_next_skb() is expecting the rx queue not being empty, but
in case the loop in xenvif_rx_action() is doing multiple iterations,
the availability of another skb in the rx queue is not being checked.
This can lead to crashes:
[40072.537261] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
[40072.537407] IP: xenvif_rx_skb+0x23/0x590 [xen_netback]
[40072.537534] PGD 0 P4D 0
[40072.537644] Oops: 0000 [#1] SMP NOPTI
[40072.537749] CPU: 0 PID: 12505 Comm: v1-c40247-q2-gu Not tainted 4.12.14-122.121-default #1 SLE12-SP5
[40072.537867] Hardware name: HP ProLiant DL580 Gen9/ProLiant DL580 Gen9, BIOS U17 11/23/2021
[40072.537999] task: ffff880433b38100 task.stack: ffffc90043d40000
[40072.538112] RIP: e030:xenvif_rx_skb+0x23/0x590 [xen_netback]
[40072.538217] RSP: e02b:ffffc90043d43de0 EFLAGS: 00010246
[40072.538319] RAX: 0000000000000000 RBX: ffffc90043cd7cd0 RCX: 00000000000000f7
[40072.538430] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffffc90043d43df8
[40072.538531] RBP: 000000000000003f R08: 000077ff80000000 R09: 0000000000000008
[40072.538644] R10: 0000000000007ff0 R11: 00000000000008f6 R12: ffffc90043ce2708
[40072.538745] R13: 0000000000000000 R14: ffffc90043d43ed0 R15: ffff88043ea748c0
[40072.538861] FS: 0000000000000000(0000) GS:ffff880484600000(0000) knlGS:0000000000000000
[40072.538988] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[40072.539088] CR2: 0000000000000080 CR3: 0000000407ac8000 CR4: 0000000000040660
[40072.539211] Call Trace:
[40072.539319] xenvif_rx_action+0x71/0x90 [xen_netback]
[40072.539429] xenvif_kthread_guest_rx+0x14a/0x29c [xen_netback]
Fix that by stopping the loop in case the rx queue becomes empty.
Cc: stable@vger.kernel.org
Fixes: 98f6d57ced73 ("xen-netback: process guest rx packets in batches")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Link: https://lore.kernel.org/r/20220713135322.19616-1-jgross@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The DRM_AMD_DC_DCN display engine support (Raven, Navi, and newer) has
not been building cleanly on powerpc and causes link errors due to
mixing hard- and soft-float object files:
powerpc64-linux-ld: drivers/gpu/drm/amd/amdgpu/../display/dc/dml/display_mode_lib.o uses hard float, drivers/gpu/drm/amd/amdgpu/../display/dc/dcn31/dcn31_resource.o uses soft float
powerpc64-linux-ld: failed to merge target specific data of file drivers/gpu/drm/amd/amdgpu/../display/dc/dcn31/dcn31_resource.o
[..]
and while patches are floating around, it's not exactly obvious what is
going on.
The problem bisects to commit 41b7a347bf14 ("powerpc: Book3S 64-bit
outline-only KASAN support") but that is probably more about changing
config variables than the fundamental cause.
Despite the bisection result, a more directly related commit seems to be
26f4712aedbd ("drm/amd/display: move FPU related code from dcn31 to
dml/dcn31 folder"). It's probably a combination of the two.
This has been going on since the merge window, without any final word.
So instead of blindly applying patches that may or may not be the right
thing, let's disable this for now.
As Michael Ellerman says:
"IIUIC this code was never enabled on ppc before, so disabling it seems
like a reasonable fix to get the build clean"
and once we have more actual feedback (and find any potential users) we
can always re-enable it with the patch that fixes the issues and
back-port as necessary.
Fixes: 41b7a347bf14 ("powerpc: Book3S 64-bit outline-only KASAN support")
Fixes: 26f4712aedbd ("drm/amd/display: move FPU related code from dcn31 to dml/dcn31 folder")
Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/all/20220606153910.GA1773067@roeck-us.net/
Link: https://lore.kernel.org/all/20220618232737.2036722-1-linux@roeck-us.net/
Link: https://lore.kernel.org/all/20220713050724.GA2471738@roeck-us.net/
Acked-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This test implement the scenario described in the commit
"ip: fix dflt addr selection for connected nexthop".
The test configures a nexthop object with an output device only (no gateway
address) and a route that uses this nexthop. The goal is to check if the
kernel selects a valid source address.
Link: https://lore.kernel.org/netdev/20220712095545.10947-1-nicolas.dichtel@6wind.com/
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20220713114853.29406-2-nicolas.dichtel@6wind.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When a nexthop is added, without a gw address, the default scope was set
to 'host'. Thus, when a source address is selected, 127.0.0.1 may be chosen
but rejected when the route is used.
When using a route without a nexthop id, the scope can be configured in the
route, thus the problem doesn't exist.
To explain more deeply: when a user creates a nexthop, it cannot specify
the scope. To create it, the function nh_create_ipv4() calls fib_check_nh()
with scope set to 0. fib_check_nh() calls fib_check_nh_nongw() wich was
setting scope to 'host'. Then, nh_create_ipv4() calls
fib_info_update_nhc_saddr() with scope set to 'host'. The src addr is
chosen before the route is inserted.
When a 'standard' route (ie without a reference to a nexthop) is added,
fib_create_info() calls fib_info_update_nhc_saddr() with the scope set by
the user. iproute2 set the scope to 'link' by default.
Here is a way to reproduce the problem:
ip netns add foo
ip -n foo link set lo up
ip netns add bar
ip -n bar link set lo up
sleep 1
ip -n foo link add name eth0 type dummy
ip -n foo link set eth0 up
ip -n foo address add 192.168.0.1/24 dev eth0
ip -n foo link add name veth0 type veth peer name veth1 netns bar
ip -n foo link set veth0 up
ip -n bar link set veth1 up
ip -n bar address add 192.168.1.1/32 dev veth1
ip -n bar route add default dev veth1
ip -n foo nexthop add id 1 dev veth0
ip -n foo route add 192.168.1.1 nhid 1
Try to get/use the route:
> $ ip -n foo route get 192.168.1.1
> RTNETLINK answers: Invalid argument
> $ ip netns exec foo ping -c1 192.168.1.1
> ping: connect: Invalid argument
Try without nexthop group (iproute2 sets scope to 'link' by dflt):
ip -n foo route del 192.168.1.1
ip -n foo route add 192.168.1.1 dev veth0
Try to get/use the route:
> $ ip -n foo route get 192.168.1.1
> 192.168.1.1 dev veth0 src 192.168.0.1 uid 0
> cache
> $ ip netns exec foo ping -c1 192.168.1.1
> PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
> 64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.039 ms
>
> --- 192.168.1.1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.039/0.039/0.039/0.000 ms
CC: stable@vger.kernel.org
Fixes: 597cfe4fc339 ("nexthop: Add support for IPv4 nexthops")
Reported-by: Edwin Brossette <edwin.brossette@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20220713114853.29406-1-nicolas.dichtel@6wind.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
LKP reports a build issue on Clang, related to a literal load of
__current issued through the ldr_va macro. This turns out to be due to
the fact that group relocations are disabled when CONFIG_COMPILE_TEST=y,
which means that the ldr_va macro resolves to a pair of LDR
instructions, the first one being a literal load issued too far from its
literal pool.
Due to the introduction of a couple of new uses of this macro in commit
508074607c7b95b2 ("ARM: 9195/1: entry: avoid explicit literal loads"),
the literal pools end up getting rearranged in a way that causes the
literal for __current to go out of range. Let's fix this up by putting a
.ltorg directive in a suitable place in the code.
Link: https://lore.kernel.org/all/202205290805.1vZLAr36-lkp@intel.com/
Fixes: 508074607c7b95b2 ("ARM: 9195/1: entry: avoid explicit literal loads")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
"ARM: 9192/1: amba: fix memory leak in amba_device_try_add()" leads
to a refcount underflow if amba_device_add() fails, which called by
of_amba_device_create(), the of_amba_device_create() already exists
the error handling, so amba_put_device() only need to be added into
amba_deferred_retry().
Fixes: 7719a68b2fa4 ("ARM: 9192/1: amba: fix memory leak in amba_device_try_add()")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|