The test_run code detects whether a page has been modified and
re-initialises the xdp_frame structure if it has, using
xdp_update_frame_from_buff(). However, xdp_update_frame_from_buff()
doesn't touch frame->mem, so that wasn't correctly re-initialised, which
led to the pages from page_pool not being returned correctly. Syzbot
noticed this as a memory leak.
Fix this by also copying the frame->mem structure when re-initialising
the frame, like we do on initialisation of a new page from page_pool.
Fixes: e5995bc7e2ba ("bpf, test_run: fix crashes due to XDP frame overwriting/corruption")
Fixes: b530e9e1063e ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
Reported-by: syzbot+d121e098da06af416d23@syzkaller.appspotmail.com
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: syzbot+d121e098da06af416d23@syzkaller.appspotmail.com
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://lore.kernel.org/bpf/20241030-test-run-mem-fix-v1-1-41e88e8cae43@redhat.com
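The shape of this fix can be illustrated with a minimal userspace sketch. The struct layouts and the helper names update_frame_from_buff() and reset_frame() below are simplified stand-ins invented for illustration, not the kernel definitions. The only point is that the refresh helper leaves the mem field alone, so the caller has to copy it explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (hypothetical layouts). */
struct mem_info { int type; int id; };
struct buff  { unsigned int len; struct mem_info mem; };
struct frame { unsigned int len; struct mem_info mem; };

/* Mimics a refresh helper that updates the frame but never touches frame->mem. */
static void update_frame_from_buff(const struct buff *b, struct frame *f)
{
    f->len = b->len;
}

/* Re-initialise a modified frame; copying mem as well is the step the fix adds. */
static void reset_frame(const struct buff *b, struct frame *f)
{
    assert(b != NULL && f != NULL);
    update_frame_from_buff(b, f);
    f->mem = b->mem;
}
```

Without the last assignment in reset_frame(), the frame would keep whatever memory-model information it had before, which is the situation the commit message describes.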
|
|
Like the DP83826, the DP83825 can also be configured as an RMII master or
slave via a control register. The existing function responsible for this
configuration is renamed to a general dp8382x function. The DP83825 only
supports RMII so nothing more needs to be configured.
With this change, the dp83822_driver list is reorganized according to the
device name.
Signed-off-by: Erik Schumacher <erik.schumacher@iris-sensing.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/aa62d081804f44b5af0e8de2372ae6bfe1affd34.camel@iris-sensing.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Currently, HIP08 devices do not register PTP devices, so hdev->ptp is
NULL. However, the tx path still tries to set hardware timestamp info when
the SKBTX_HW_TSTAMP flag is set, which causes a kernel crash.
[ 128.087798] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000018
...
[ 128.280251] pc : hclge_ptp_set_tx_info+0x2c/0x140 [hclge]
[ 128.286600] lr : hclge_ptp_set_tx_info+0x20/0x140 [hclge]
[ 128.292938] sp : ffff800059b93140
[ 128.297200] x29: ffff800059b93140 x28: 0000000000003280
[ 128.303455] x27: ffff800020d48280 x26: ffff0cb9dc814080
[ 128.309715] x25: ffff0cb9cde93fa0 x24: 0000000000000001
[ 128.315969] x23: 0000000000000000 x22: 0000000000000194
[ 128.322219] x21: ffff0cd94f986000 x20: 0000000000000000
[ 128.328462] x19: ffff0cb9d2a166c0 x18: 0000000000000000
[ 128.334698] x17: 0000000000000000 x16: ffffcf1fc523ed24
[ 128.340934] x15: 0000ffffd530a518 x14: 0000000000000000
[ 128.347162] x13: ffff0cd6bdb31310 x12: 0000000000000368
[ 128.353388] x11: ffff0cb9cfbc7070 x10: ffff2cf55dd11e02
[ 128.359606] x9 : ffffcf1f85a212b4 x8 : ffff0cd7cf27dab0
[ 128.365831] x7 : 0000000000000a20 x6 : ffff0cd7cf27d000
[ 128.372040] x5 : 0000000000000000 x4 : 000000000000ffff
[ 128.378243] x3 : 0000000000000400 x2 : ffffcf1f85a21294
[ 128.384437] x1 : ffff0cb9db520080 x0 : ffff0cb9db500080
[ 128.390626] Call trace:
[ 128.393964] hclge_ptp_set_tx_info+0x2c/0x140 [hclge]
[ 128.399893] hns3_nic_net_xmit+0x39c/0x4c4 [hns3]
[ 128.405468] xmit_one.constprop.0+0xc4/0x200
[ 128.410600] dev_hard_start_xmit+0x54/0xf0
[ 128.415556] sch_direct_xmit+0xe8/0x634
[ 128.420246] __dev_queue_xmit+0x224/0xc70
[ 128.425101] dev_queue_xmit+0x1c/0x40
[ 128.429608] ovs_vport_send+0xac/0x1a0 [openvswitch]
[ 128.435409] do_output+0x60/0x17c [openvswitch]
[ 128.440770] do_execute_actions+0x898/0x8c4 [openvswitch]
[ 128.446993] ovs_execute_actions+0x64/0xf0 [openvswitch]
[ 128.453129] ovs_dp_process_packet+0xa0/0x224 [openvswitch]
[ 128.459530] ovs_vport_receive+0x7c/0xfc [openvswitch]
[ 128.465497] internal_dev_xmit+0x34/0xb0 [openvswitch]
[ 128.471460] xmit_one.constprop.0+0xc4/0x200
[ 128.476561] dev_hard_start_xmit+0x54/0xf0
[ 128.481489] __dev_queue_xmit+0x968/0xc70
[ 128.486330] dev_queue_xmit+0x1c/0x40
[ 128.490856] ip_finish_output2+0x250/0x570
[ 128.495810] __ip_finish_output+0x170/0x1e0
[ 128.500832] ip_finish_output+0x3c/0xf0
[ 128.505504] ip_output+0xbc/0x160
[ 128.509654] ip_send_skb+0x58/0xd4
[ 128.513892] udp_send_skb+0x12c/0x354
[ 128.518387] udp_sendmsg+0x7a8/0x9c0
[ 128.522793] inet_sendmsg+0x4c/0x8c
[ 128.527116] __sock_sendmsg+0x48/0x80
[ 128.531609] __sys_sendto+0x124/0x164
[ 128.536099] __arm64_sys_sendto+0x30/0x5c
[ 128.540935] invoke_syscall+0x50/0x130
[ 128.545508] el0_svc_common.constprop.0+0x10c/0x124
[ 128.551205] do_el0_svc+0x34/0xdc
[ 128.555347] el0_svc+0x20/0x30
[ 128.559227] el0_sync_handler+0xb8/0xc0
[ 128.563883] el0_sync+0x160/0x180
Fixes: 0bf5eb788512 ("net: hns3: add support for PTP")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
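One plausible form of such a guard is sketched below in plain C. The structures and the function name set_tx_tstamp_info() are hypothetical simplifications, not the hns3 definitions. The sketch only shows a tx helper bailing out when no PTP state was ever registered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver structures. */
struct ptp_state { unsigned int tx_pending; };
struct dev_state { struct ptp_state *ptp; /* NULL when PTP was never registered */ };

/* Returns true if timestamp bookkeeping was done, false if it was skipped. */
static bool set_tx_tstamp_info(struct dev_state *dev)
{
    assert(dev != NULL);
    if (!dev->ptp)          /* guard that avoids the NULL dereference */
        return false;
    dev->ptp->tx_pending++;
    return true;
}
```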
|
|
The TQP BAR space is divided into two segments. TQPs 0-1023 and TQPs
1024-1279 are in different BAR space addresses. However,
hclge_fetch_pf_reg does not distinguish between the two segments when
reading the TQP registers. When the number of TQPs is greater than 1024,
an out-of-bounds BAR space access occurs.
The two segments are already taken into account when tqp.io_base is
initialized. Therefore, use tqp.io_base directly when reading the queue
registers in hclge_fetch_pf_reg.
The error message:
Unable to handle kernel paging request at virtual address ffff800037200000
pc : hclge_fetch_pf_reg+0x138/0x250 [hclge]
lr : hclge_get_regs+0x84/0x1d0 [hclge]
Call trace:
hclge_fetch_pf_reg+0x138/0x250 [hclge]
hclge_get_regs+0x84/0x1d0 [hclge]
hns3_get_regs+0x2c/0x50 [hns3]
ethtool_get_regs+0xf4/0x270
dev_ethtool+0x674/0x8a0
dev_ioctl+0x270/0x36c
sock_do_ioctl+0x110/0x2a0
sock_ioctl+0x2ac/0x530
__arm64_sys_ioctl+0xa8/0x100
invoke_syscall+0x4c/0x124
el0_svc_common.constprop.0+0x140/0x15c
do_el0_svc+0x30/0xd0
el0_svc+0x1c/0x2c
el0_sync_handler+0xb0/0xb4
el0_sync+0x168/0x180
Fixes: 939ccd107ffc ("net: hns3: move dump regs function to a separate file")
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
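The two-segment layout can be sketched as a small address calculation. All constants and the function name tqp_reg_offset() below are assumptions made up for illustration. Only the split at queue 1024 and the upper bound of 1280 queues come from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* All constants below are hypothetical, chosen only to show the two-segment layout. */
#define TOTAL_QUEUES     1280u
#define SEG0_QUEUES      1024u      /* queues 0-1023 live in the first segment */
#define SEG0_BASE        0x080000u
#define SEG1_BASE        0x180000u  /* queues 1024-1279 live in the second segment */
#define QUEUE_REG_SIZE   0x200u

/* Byte offset of a queue's register block within the BAR. */
static size_t tqp_reg_offset(unsigned int queue)
{
    assert(queue < TOTAL_QUEUES);
    if (queue < SEG0_QUEUES)
        return SEG0_BASE + (size_t)queue * QUEUE_REG_SIZE;
    return SEG1_BASE + (size_t)(queue - SEG0_QUEUES) * QUEUE_REG_SIZE;
}
```

Computing this once per queue at initialization and reusing the stored base, as the commit does with tqp.io_base, avoids repeating the segment logic in every reader.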
|
|
Currently the misc irq is initialized before reset_timer setup. But
it will access the reset_timer in the irq handler. So initialize
the reset_timer earlier.
Fixes: ff200099d271 ("net: hns3: remove unnecessary work in hclgevf_main")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Currently, there is a time window between the misc irq being enabled
and the service task being initialized. If an interrupt is reported
during this window, it causes a warning like the one below:
[ 16.324639] Call trace:
[ 16.324641] __queue_delayed_work+0xb8/0xe0
[ 16.324643] mod_delayed_work_on+0x78/0xd0
[ 16.324655] hclge_errhand_task_schedule+0x58/0x90 [hclge]
[ 16.324662] hclge_misc_irq_handle+0x168/0x240 [hclge]
[ 16.324666] __handle_irq_event_percpu+0x64/0x1e0
[ 16.324667] handle_irq_event+0x80/0x170
[ 16.324670] handle_fasteoi_edge_irq+0x110/0x2bc
[ 16.324671] __handle_domain_irq+0x84/0xfc
[ 16.324673] gic_handle_irq+0x88/0x2c0
[ 16.324674] el1_irq+0xb8/0x140
[ 16.324677] arch_cpu_idle+0x18/0x40
[ 16.324679] default_idle_call+0x5c/0x1bc
[ 16.324682] cpuidle_idle_call+0x18c/0x1c4
[ 16.324684] do_idle+0x174/0x17c
[ 16.324685] cpu_startup_entry+0x30/0x6c
[ 16.324687] secondary_start_kernel+0x1a4/0x280
[ 16.324688] ---[ end trace 6aa0bff672a964aa ]---
So don't auto-enable the misc vector when requesting the irq.
Fixes: 7be1b9f3e99f ("net: hns3: make hclge_service use delayed workqueue")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This patch modifies the implementation of debugfs:
When the user process stops unexpectedly, not all of the file's data has
been read. In this case, the save_buf pointer is not released. The next
time a user process reads the file, save_buf is used to copy the cached
data to user space. As a result, the queried data is inconsistent. To solve
this problem, determine whether the function is invoked for the first time
based on the value of *ppos. If *ppos is 0, obtain the actual data.
Fixes: 5e69ea7ee2a6 ("net: hns3: refactor the debugfs process")
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Guangwei Zhang <zhangwangwei6@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
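The *ppos rule can be sketched with a userspace read helper. The cache structure, the generation counter and the names fill_cache() and dbg_read() are invented for this illustration and do not reflect the driver's code. The sketch only shows that a read starting at offset 0 refreshes the cached data instead of reusing what an interrupted reader left behind.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical cache kept between read() calls on the same file. */
struct dbg_cache { char buf[64]; size_t len; int valid; };

/* Stand-in for querying fresh data; `generation` is an invented counter. */
static void fill_cache(struct dbg_cache *c, int generation)
{
    c->len = (size_t)snprintf(c->buf, sizeof(c->buf), "gen=%d\n", generation);
    c->valid = 1;
}

/* Copy up to `count` bytes at *ppos into `out`; refresh the cache when *ppos is 0. */
static size_t dbg_read(struct dbg_cache *c, int generation,
                       char *out, size_t count, size_t *ppos)
{
    assert(c && out && ppos);
    if (*ppos == 0 || !c->valid)
        fill_cache(c, generation);   /* first call of a new read sequence */
    if (*ppos >= c->len)
        return 0;                    /* end of data */
    size_t n = c->len - *ppos;
    if (n > count)
        n = count;
    memcpy(out, c->buf + *ppos, n);
    *ppos += n;
    return n;
}
```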
|
|
Currently, netdev->features is configured in hns3_nic_set_features.
As a result, __netdev_update_features considers that there is no feature
difference, and the processing of the real feature changes is skipped.
Fixes: 2a7556bb2b73 ("net: hns3: implement ndo_features_check ops for hns3 driver")
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When a reset type that is not supported by the driver is input, a reset
pending flag bit of the HNAE3_NONE_RESET type is generated in
reset_pending. The driver does not have a mechanism to clear this type
of error. As a result, the driver considers that the reset is not
complete. This patch provides a mechanism to clear the
HNAE3_NONE_RESET flag and validates the parameter of
hnae3_ae_ops.set_default_reset_request.
The error message:
hns3 0000:39:01.0: cmd failed -16
hns3 0000:39:01.0: hclge device re-init failed, VF is disabled!
hns3 0000:39:01.0: failed to reset VF stack
hns3 0000:39:01.0: failed to reset VF(4)
hns3 0000:39:01.0: prepare reset(2) wait done
hns3 0000:39:01.0 eth4: already uninitialized
Use the crash tool to view struct hclgevf_dev:
struct hclgevf_dev {
...
default_reset_request = 0x20,
reset_level = HNAE3_NONE_RESET,
reset_pending = 0x100,
reset_type = HNAE3_NONE_RESET,
...
};
Fixes: 720bd5837e37 ("net: hns3: add set_default_reset_request in the hnae3_ae_ops")
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
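The two halves of this fix can be sketched with plain bitmask handling. The enum values, the supported set and the function names below are assumptions for illustration. Only NONE_RESET = 8 is tied to the dump above, where reset_pending = 0x100 equals 1 << 8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative reset types; only NONE_RESET = 8 is tied to the dump (0x100 == 1 << 8). */
enum reset_type { VF_RESET = 0, VF_FUNC_RESET = 1, FUNC_RESET = 5, NONE_RESET = 8 };

/* Hypothetical set of reset types this driver can handle. */
#define SUPPORTED_RESETS ((1ul << VF_RESET) | (1ul << VF_FUNC_RESET))

/* Reject unsupported requests up front instead of recording them. */
static bool set_default_reset_request(unsigned long *request, enum reset_type t)
{
    assert(request != NULL);
    if (!(SUPPORTED_RESETS & (1ul << t)))
        return false;
    *request |= 1ul << t;
    return true;
}

/* Drop a stale NONE_RESET bit so the reset is not seen as pending forever. */
static void clear_none_reset(unsigned long *pending)
{
    assert(pending != NULL);
    *pending &= ~(1ul << NONE_RESET);
}
```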
|
|
To avoid errors in pgtable prefetch, add a sync command to sync the
io-pgtable.
This is a supplement to the previous patch.
We want all tx packets to be handled by the tx bounce buffer path.
But that depends on the remaining space in the spare buffer, checked by
hns3_can_use_tx_bounce(). In most cases, maybe 99.99%, it returns true.
But once it returns false because no space is available, the packet is
handled by the former path, which maps/unmaps the skb buffer.
Then the driver faces the smmu prefetch risk again.
So add a sync command in this case to avoid smmu prefetch;
this only protects corner cases.
Fixes: 295ba232a8c3 ("net: hns3: add device version to replace pci revision")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The SMMU engine on the HIP09 chip has a hardware issue.
The SMMU pagetable prefetch feature may prefetch and use an invalid PTE
even though the PTE is valid at that time. This causes the device to trigger
fake pagefaults. The solution is to avoid prefetching by adding a
SYNC command when the smmu maps an iova. But this causes a sharp drop in
nic performance. So we use this workaround instead: always enable the tx
bounce buffer to avoid mapping/unmapping on the TX path.
This issue only affects HNS3, so we always enable the
tx bounce buffer when the smmu is enabled to improve performance.
Fixes: 295ba232a8c3 ("net: hns3: add device version to replace pci revision")
Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com>
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
If access to offset + length is larger than the skbuff length, then
skb_checksum() triggers BUG_ON().
skb_checksum() internally subtracts the length parameter while iterating
over skbuff, BUG_ON(len) at the end of it checks that the expected
length to be included in the checksum calculation is fully consumed.
Fixes: 7ec3f7b47b8d ("netfilter: nft_payload: add packet mangling support")
Reported-by: Slavin Liu <slavin-ayu@qq.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
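A bounds check of this kind can be written so that it never computes offset + length directly, which avoids integer wraparound. The function name csum_range_ok() and its parameters are hypothetical. This is a generic sketch of the check, not the nft_payload code.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if [offset, offset + len) lies inside a packet of pkt_len bytes.
 * Written without computing offset + len, so it cannot wrap around. */
static bool csum_range_ok(uint32_t pkt_len, uint32_t offset, uint32_t len)
{
    return offset <= pkt_len && len <= pkt_len - offset;
}
```

A caller would run this test before asking for a checksum over the range and reject the request when it fails, so the checksum routine never sees a length it cannot fully consume.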
|
|
Add compatible for the MAC controller on qcs8300 platforms.
Since qcs8300 shares the same EMAC as sa8775p, it falls back to that
compatible.
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Yijie Yang <quic_yijiyang@quicinc.com>
Link: https://patch.msgid.link/20241029-schema-v3-2-fbde519eaf00@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add compatible for the MAC controller on qcs615 platform.
Since qcs615 shares the same EMAC as sm8150, it falls back to that
compatible.
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Yijie Yang <quic_yijiyang@quicinc.com>
Link: https://patch.msgid.link/20241029-schema-v3-1-fbde519eaf00@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Function debugfs_remove() recursively removes a directory, including all
files created by debugfs_create_file(). Therefore, there is no need to
explicitly record each file with member ->bnad_dentry_files[] and
explicitly delete them at the end. Remove field bnad_dentry_files[] and
its related processing code for simplification.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241028020943.507-3-thunder.leizhen@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Driver bna can work fine even if any previous call to debugfs create
APIs failed. All return value checks of them should be dropped, as
debugfs APIs say.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241028020943.507-2-thunder.leizhen@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When some code was moved in the commit cited in Fixes, some "return err;"
statements were correctly changed into goto <some_where_in_the_error_handling_path>,
but this one was missed.
Should "ops->maxtype > RTNL_MAX_TYPE" happen, then some resources would
leak.
Go through the error handling path to fix these leaks.
Fixes: 0d3008d1a9ae ("rtnetlink: Move ops->validate to rtnl_newlink().")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/eca90eeb4d9e9a0545772b68aeaab883d9fe2279.1729952228.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
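The pattern behind this fix is the usual single-exit cleanup via goto. The sketch below is a generic userspace illustration. MAX_TYPE, live_allocs and newlink_sketch() are invented names, and the allocation stands in for whatever resources the real function holds at that point.

```c
#include <errno.h>
#include <stdlib.h>

#define MAX_TYPE 4   /* hypothetical limit standing in for RTNL_MAX_TYPE */

static int live_allocs;   /* counts outstanding allocations for the demonstration */

/* Every failure after the allocation leaves through the same cleanup label. */
static int newlink_sketch(int maxtype)
{
    int err = 0;
    void *tbs = malloc(64);

    if (!tbs)
        return -ENOMEM;
    live_allocs++;

    if (maxtype > MAX_TYPE) {
        err = -EINVAL;
        goto out;            /* a bare "return err;" here would leak tbs */
    }

    /* ... normal processing would go here ... */
out:
    free(tbs);
    live_allocs--;
    return err;
}
```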
|
|
linux-firmware commit 808cba84 ("mtk_wed: add firmware for mt7988
Wireless Ethernet Dispatcher") added mt7988_wo_{0,1}.bin in the
'mediatek/mt7988' directory while the driver currently expects the files in
the 'mediatek' directory.
Change path in the driver header now that the firmware has been added.
Fixes: e2f64db13aa1 ("net: ethernet: mtk_wed: introduce WED support for MT7988")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://patch.msgid.link/Zxz0GWTR5X5LdWPe@pidgin.makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Test that after changing the remote address of an ip6gre net device
traffic is forwarded as expected. Test with both flat and hierarchical
topologies and with and without input / output keys.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/02b05246d2cdada0cf2fccffc0faa8a424d0f51b.1729866134.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The device stores IPv6 addresses that are used for encapsulation in
linear memory that is managed by the driver.
Changing the remote address of an ip6gre net device never worked
properly, but since the cited commit the following reproducer [1] would
result in a warning [2] and a memory leak [3]. The problem is that the
new remote address is never added by the driver to its hash table (and
therefore the device) and the old address is never removed from it.
Fix by programming the new address when the configuration of the ip6gre
net device changes and removing the old one. If the address did not
change, then the above would result in increasing the reference count of
the address and then decreasing it.
[1]
# ip link add name bla up type ip6gre local 2001:db8:1::1 remote 2001:db8:2::1 tos inherit ttl inherit
# ip link set dev bla type ip6gre remote 2001:db8:3::1
# ip link del dev bla
# devlink dev reload pci/0000:01:00.0
[2]
WARNING: CPU: 0 PID: 1682 at drivers/net/ethernet/mellanox/mlxsw/spectrum.c:3002 mlxsw_sp_ipv6_addr_put+0x140/0x1d0
Modules linked in:
CPU: 0 UID: 0 PID: 1682 Comm: ip Not tainted 6.12.0-rc3-custom-g86b5b55bc835 #151
Hardware name: Nvidia SN5600/VMOD0013, BIOS 5.13 05/31/2023
RIP: 0010:mlxsw_sp_ipv6_addr_put+0x140/0x1d0
[...]
Call Trace:
<TASK>
mlxsw_sp_router_netdevice_event+0x55f/0x1240
notifier_call_chain+0x5a/0xd0
call_netdevice_notifiers_info+0x39/0x90
unregister_netdevice_many_notify+0x63e/0x9d0
rtnl_dellink+0x16b/0x3a0
rtnetlink_rcv_msg+0x142/0x3f0
netlink_rcv_skb+0x50/0x100
netlink_unicast+0x242/0x390
netlink_sendmsg+0x1de/0x420
____sys_sendmsg+0x2bd/0x320
___sys_sendmsg+0x9a/0xe0
__sys_sendmsg+0x7a/0xd0
do_syscall_64+0x9e/0x1a0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
[3]
unreferenced object 0xffff898081f597a0 (size 32):
comm "ip", pid 1626, jiffies 4294719324
hex dump (first 32 bytes):
20 01 0d b8 00 02 00 00 00 00 00 00 00 00 00 01 ...............
21 49 61 83 80 89 ff ff 00 00 00 00 01 00 00 00 !Ia.............
backtrace (crc fd9be911):
[<00000000df89c55d>] __kmalloc_cache_noprof+0x1da/0x260
[<00000000ff2a1ddb>] mlxsw_sp_ipv6_addr_kvdl_index_get+0x281/0x340
[<000000009ddd445d>] mlxsw_sp_router_netdevice_event+0x47b/0x1240
[<00000000743e7757>] notifier_call_chain+0x5a/0xd0
[<000000007c7b9e13>] call_netdevice_notifiers_info+0x39/0x90
[<000000002509645d>] register_netdevice+0x5f7/0x7a0
[<00000000c2e7d2a9>] ip6gre_newlink_common.isra.0+0x65/0x130
[<0000000087cd6d8d>] ip6gre_newlink+0x72/0x120
[<000000004df7c7cc>] rtnl_newlink+0x471/0xa20
[<0000000057ed632a>] rtnetlink_rcv_msg+0x142/0x3f0
[<0000000032e0d5b5>] netlink_rcv_skb+0x50/0x100
[<00000000908bca63>] netlink_unicast+0x242/0x390
[<00000000cdbe1c87>] netlink_sendmsg+0x1de/0x420
[<0000000011db153e>] ____sys_sendmsg+0x2bd/0x320
[<000000003b6d53eb>] ___sys_sendmsg+0x9a/0xe0
[<00000000cae27c62>] __sys_sendmsg+0x7a/0xd0
Fixes: cf42911523e0 ("mlxsw: spectrum_ipip: Use common hash table for IPv6 address mapping")
Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/e91012edc5a6cb9df37b78fd377f669381facfcb.1729866134.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
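The "add the new address, then remove the old one" ordering can be sketched with a tiny refcounted table. The table layout and the names addr_find(), addr_get(), addr_put() and remote_change() are hypothetical and far simpler than the driver's hash table. The sketch shows why an unchanged address ends up with the same reference count it started with.

```c
#include <assert.h>
#include <string.h>

#define SLOTS 4

/* Hypothetical refcounted address table standing in for the driver's hash table. */
struct addr_entry { char addr[40]; int refs; };
static struct addr_entry table[SLOTS];

static struct addr_entry *addr_find(const char *addr)
{
    for (int i = 0; i < SLOTS; i++)
        if (table[i].refs > 0 && strcmp(table[i].addr, addr) == 0)
            return &table[i];
    return NULL;
}

/* Take a reference, creating the entry on first use. Returns 0 on success. */
static int addr_get(const char *addr)
{
    struct addr_entry *e = addr_find(addr);

    if (e) {
        e->refs++;
        return 0;
    }
    for (int i = 0; i < SLOTS; i++) {
        if (table[i].refs == 0) {
            strncpy(table[i].addr, addr, sizeof(table[i].addr) - 1);
            table[i].addr[sizeof(table[i].addr) - 1] = '\0';
            table[i].refs = 1;
            return 0;
        }
    }
    return -1;   /* table full */
}

/* Drop a reference; the slot becomes free when refs reaches zero. */
static void addr_put(const char *addr)
{
    struct addr_entry *e = addr_find(addr);

    assert(e != NULL);
    e->refs--;
}

/* Take the new address before dropping the old one. */
static int remote_change(const char *old_addr, const char *new_addr)
{
    if (addr_get(new_addr) != 0)
        return -1;
    addr_put(old_addr);
    return 0;
}
```

Taking the new reference first means that if old and new are the same address, the count goes up by one and back down by one, and the entry is never freed in between.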
|
|
Non-coherent architectures, like ARM, may require invalidating caches
before the device can use the DMA mapped memory, which means that before
posting pages to device, drivers should sync the memory for device.
Sync for device can be configured as page pool responsibility. Set the
relevant flag and define max_len for sync.
Cc: Jiri Pirko <jiri@resnulli.us>
Fixes: b5b60bb491b2 ("mlxsw: pci: Use page pool for Rx buffers allocation")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/92e01f05c4f506a4f0a9b39c10175dcc01994910.1729866134.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When an Rx packet is received, drivers should sync the pages for CPU, to
ensure the CPU reads the data written by the device and not stale
data from its cache.
Add the missing sync call in the Rx path, syncing the actual length of
data for each fragment.
Cc: Jiri Pirko <jiri@resnulli.us>
Fixes: b5b60bb491b2 ("mlxsw: pci: Use page pool for Rx buffers allocation")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/461486fac91755ca4e04c2068c102250026dcd0b.1729866134.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Tx header should be pushed for each packet which is transmitted via
Spectrum ASICs. The cited commit moved the call to skb_cow_head() from
mlxsw_sp_port_xmit() to functions which handle Tx header.
In case that mlxsw_sp->ptp_ops->txhdr_construct() is used to handle Tx
header, and txhdr_construct() is mlxsw_sp_ptp_txhdr_construct(), there is
no call for skb_cow_head() before pushing Tx header size to SKB. This flow
is relevant for Spectrum-1 and Spectrum-4, for PTP packets.
Add the missing call to skb_cow_head() to make sure that there is both
enough room to push the Tx header and that the SKB header is not cloned and
can be modified.
An additional set will be sent to net-next to centralize the handling of
the Tx header by pushing it to every packet just before transmission.
Cc: Richard Cochran <richardcochran@gmail.com>
Fixes: 24157bc69f45 ("mlxsw: Send PTP packets as data packets to overcome a limitation")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/5145780b07ebbb5d3b3570f311254a3a2d554a44.1729866134.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
make dtbs_check:
arch/arm64/boot/dts/renesas/r8a77980-condor.dtb: ethernet@e7400000: 'iommus' does not match any of the regexes: '@[0-9a-f]$', 'pinctrl-[0-9]+'
from schema $id: http://devicetree.org/schemas/net/renesas,ether.yaml#
Ethernet Controllers on R-Car Gen2/Gen3 SoCs can make use of the IOMMU,
so add the missing iommus property.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/2ca890323477a21c22e13f6a1328288f4ee816f9.1729868894.git.geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As documented in skbuff.h, devices with NETIF_F_IPV6_CSUM capability
can only checksum TCP and UDP over IPv6 if the IP header does not
contain extension headers.
This is enforced for UDP packets emitted from user-space to an IPv6
address as they go through ip6_make_skb(), which calls
__ip6_append_data() where a check is done on the header size before
setting CHECKSUM_PARTIAL.
But the introduction of UDP encapsulation with fou6 added a code-path
where it is possible to get an skb with a partial UDP checksum and an
IPv6 header with extension headers:
* fou6 adds a UDP header with a partial checksum if the inner packet
does not contain a valid checksum.
* ip6_tunnel adds an IPv6 header with a destination option extension
header if encap_limit is non-zero (the default value is 4).
The thread linked below describes in more detail how to reproduce the
problem with a GRE-in-UDP tunnel.
Add a check on the network header size in skb_csum_hwoffload_help() to
make sure no IPv6 packet with an extension header is handed to a network
device with NETIF_F_IPV6_CSUM capability.
Link: https://lore.kernel.org/netdev/26548921.1r3eYUQgxm@benoit.monin/T/#u
Fixes: aa3463d65e7b ("fou: Add encap ops for IPv6 tunnels")
Signed-off-by: Benoît Monin <benoit.monin@gmx.fr>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/5fbeecfc311ea182aa1d1c771725ab8b4cac515e.1729778144.git.benoit.monin@gmx.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
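The idea of the header-size check can be sketched as follows. The struct pkt_info and the function ipv6_csum_offload_ok() are hypothetical simplifications, not the kernel's skb handling. The one hard fact used is that the fixed IPv6 header is 40 bytes, so a longer network header implies extension headers.

```c
#include <stdbool.h>
#include <stddef.h>

#define IPV6_FIXED_HDR_LEN 40u   /* size of the fixed IPv6 header in bytes */

/* Hypothetical summary of the fields the decision depends on. */
struct pkt_info {
    bool   is_ipv6;
    size_t network_hdr_len;   /* fixed header plus any extension headers */
};

/* True if a device that only advertises IPv6 TCP/UDP checksum offload
 * can be trusted with this packet; otherwise fall back to software. */
static bool ipv6_csum_offload_ok(const struct pkt_info *p)
{
    if (!p->is_ipv6)
        return false;
    return p->network_hdr_len == IPV6_FIXED_HDR_LEN;
}
```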
|
|
Lan969x supports a number of different features, depending on the
target. Add new field sparx5->features and initialize the features based
on the target. Also add the function sparx5_has_feature() and use it
throughout. For now, we only need to handle features: PSFP and PTP -
more will come in the future.
[1] https://www.microchip.com/en-us/product/lan9698
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-15-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
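A per-target feature mask of this kind might look roughly like the sketch below. The enum values, the two targets, and the names chip_init_features() and chip_has_feature() are invented for illustration. Which features each real target supports is not stated in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits and targets, for illustration only. */
enum feature { FEATURE_PSFP = 1u << 0, FEATURE_PTP = 1u << 1 };
enum target  { TARGET_A, TARGET_B };

struct chip { enum target target; uint32_t features; };

/* Derive the feature mask from the target once, at init time. */
static void chip_init_features(struct chip *c)
{
    c->features = 0;
    switch (c->target) {
    case TARGET_A:
        c->features = FEATURE_PSFP | FEATURE_PTP;
        break;
    case TARGET_B:
        c->features = FEATURE_PTP;
        break;
    }
}

/* Single query point used by the rest of the code. */
static bool chip_has_feature(const struct chip *c, enum feature f)
{
    return (c->features & (uint32_t)f) != 0;
}
```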
|
|
Add lan9691-switch compatible string to mchp_sparx5_match. Guard it with
IS_ENABLED(CONFIG_LAN969X_SWITCH) to make sure Sparx5 can be compiled on
its own.
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-14-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add compatible strings for the twelve different lan969x targets that we
support. Either a sparx5-switch or lan9691-switch compatible string
provided on their own, or any lan969x-switch compatible string with a
fallback to lan9691-switch.
Also, add myself as a maintainer.
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-13-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use the is_sparx5() macro (introduced in earlier series [1]), in places
where we need to handle things a bit differently on lan969x.
These places are:
- in sparx5_dsm_calendar_update() we need to switch the calendar
from a to b on lan969x.
- in sparx5_start() we need to make sure the HSCH_SYS_CLK_PER
register is only touched on Sparx5.
- in sparx5_start() we need to disable VCAP and FDMA for lan969x
(will come in later series).
- in sparx5_mirror_port_get() we must make sure the
ANA_AC_PROBE_PORT_CFG1 register is only read on Sparx5.
- in sparx5_netdev.c and sparx5_packet.c we need to use different IFH
(Internal Frame Header) offsets for lan969x.
- in sparx5_port_fifo_sz() we must bail out on lan969x.
- in sparx5_port_config_low_set() we must configure the phase
detection registers.
- in sparx5_port_config() and sparx5_port_init() we must do some
additional configuration of the port devices.
- in sparx5_dwrr_conf_set() we must derive the scheduling layer
[1] https://lore.kernel.org/netdev/20241004-b4-sparx5-lan969x-switch-driver-v2-8-d3290f581663@microchip.com/
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-12-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Lan969x has support for RedBox / HSR / PRP (not implemented yet). In
order to accommodate this in the future, we need to give lan969x its
own function for calculating the DSM calendar.
The function calculates the calendar for each taxi bus. The calendar is
used for bandwidth allocation towards the ports attached to the taxi
bus. A calendar configuration consists of up-to 64 slots, which may be
allocated to ports or left unused. Each slot accounts for 1 clock cycle.
Also expose sparx5_cal_speed_to_value(), sparx5_get_port_cal_speed,
sparx5_cal_bw and SPX5_DSM_CAL_EMPTY for use by lan969x.
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-11-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add PTP IRQ handler for lan969x. This is required, as the PTP registers
are placed in two different targets on Sparx5 and lan969x. The
implementation is otherwise the same as on Sparx5.
Also, expose sparx5_get_hwtimestamp() for use by lan969x.
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-10-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a bunch of small lan969x ops in bulk. These ops are explained in
detail in a previous series [1].
[1] https://lore.kernel.org/netdev/20241004-b4-sparx5-lan969x-switch-driver-v2-8-d3290f581663@microchip.com/
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-9-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add the lan969x constants to match data. These are already used
throughout the Sparx5 code (introduced in earlier series [1]), so no
need to update any code use.
[1] https://lore.kernel.org/netdev/20241004-b4-sparx5-lan969x-switch-driver-v2-0-d3290f581663@microchip.com/
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-8-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add new file lan969x_regs.c that defines all the register differences
for lan969x, and add it to the lan969x match data.
GW_DEV2G5_PHASE_DETECTOR_CTRL, FP_DEV2G5_PHAD_CTRL_PHAD_ENA and
FP_DEV2G5_PHAD_CTRL_PHAD_FAILED are required by the new register macros
which were introduced earlier. Add these for Sparx5 as well.
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-7-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add match data for lan969x, with initial fields for iomap, iomap_size
and ioranges. Add new Kconfig symbol CONFIG_LAN969X_SWITCH for compiling
the lan969x driver.
It has been decided to give lan969x its own Kconfig symbol, as a
considerable amount of code is needed, besides the Sparx5 code, to add
full chip support (and more will be added in future series). Also this
makes it possible to compile Sparx5 without lan969x.
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-6-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Lan969x will require a few additional registers for certain operations.
Some are shared, some are not. Add these.
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-5-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In preparation for lan969x, add the sparx5 context pointer to certain
IFH (Internal Frame Header) functions. This is required, as the
is_sparx5() function will be used here in a subsequent patch.
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-4-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In preparation for lan969x, rework the function that calculates the SDLB
(Service Dual Leaky Bucket) clock. This is required, as the
HSCH_SYS_CLK_PER register is Sparx5-exclusive. Instead derive the clock
from the core clock, using the sparx5_clk_period() function. The clock
stays the same before and after this patch, only now,
sparx5_sdlb_clk_hz_get() can be used for lan969x too.
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-3-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In preparation for lan969x, use spx5_rmw() for enabling the update of
the calendar. This is required to not overwrite the DSM_TAXI_CAL_CFG
register, as an additional write will be added before this one, in a
subsequent patch.
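
Why a read-modify-write matters here can be sketched outside the kernel.
The register variable, the helpers and both bit masks below are
hypothetical stand-ins for illustration; they are not the spx5_rmw()
implementation nor the real DSM_TAXI_CAL_CFG layout:

```c
#include <stdint.h>

/* Hypothetical stand-in for a memory-mapped 32-bit register. */
static uint32_t fake_reg;

/* Plain write: replaces every bit in the register. */
static void reg_write(uint32_t val)
{
	fake_reg = val;
}

/* Read-modify-write: only the bits selected by mask are changed. */
static void reg_rmw(uint32_t val, uint32_t mask)
{
	uint32_t cur = fake_reg;

	cur = (cur & ~mask) | (val & mask);
	fake_reg = cur;
}

#define DEMO_FIELD_A   0x0000000eu /* hypothetical field set by an earlier write */
#define DEMO_UPDATE_EN 0x00000001u /* hypothetical "enable update" bit */

/* Earlier write followed by rmw: the earlier field survives. */
static uint32_t demo_rmw(void)
{
	reg_write(0x6u & DEMO_FIELD_A);
	reg_rmw(DEMO_UPDATE_EN, DEMO_UPDATE_EN);
	return fake_reg;
}

/* Earlier write followed by a plain write: the earlier field is lost. */
static uint32_t demo_write(void)
{
	reg_write(0x6u & DEMO_FIELD_A);
	reg_write(DEMO_UPDATE_EN);
	return fake_reg;
}
```

In this model demo_rmw() leaves both the earlier field and the enable bit
set, while demo_write() keeps only the enable bit.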
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-2-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In preparation for lan969x, add lan969x targets to
sparx5_target_chiptype and set the core clock frequency for these
throughout. Lan969x only supports a core clock frequency of 328 MHz.
Also, set the policer update interval (pol_upd_int) matching the 328 MHz
frequency of the lan969x targets.
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-1-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The driver currently uses page_pool_put_page() to recycle
page pool pages. Since gve uses split pages, if the fragment
being recycled is not the last fragment in the page, there
is no dma sync operation. When the last fragment is recycled,
dma sync is performed by page pool infra according to the
value passed as dma_sync_size, which right now is set to the
size of the fragment.
But the correct thing to do is to dma sync the entire page when
the last fragment is recycled. Hence change to using
page_pool_put_full_page().
Link: https://lore.kernel.org/netdev/89d7ce83-cc1d-4791-87b5-6f7af29a031d@huawei.com/
Suggested-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Fixes: ebdfae0d377b ("gve: adopt page pool for DQ RDA mode")
Link: https://patch.msgid.link/20241023221141.3008011-1-pkaligineedi@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move mbox, hw resources and interrupt configuration functions to a common
header file, so that they can be used later by the RVU representor driver.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241023161843.15543-5-gakula@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Reuse the maximum supported HW MTU value that is fetched during probe,
instead of fetching it through mbox each time the MTU is changed, as the
value is fixed for the interface.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241023161843.15543-4-gakula@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Group the queue (RX/TX/CQ) memory allocation and free code into single APIs.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241023161843.15543-3-gakula@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Define new API "otx2_init_rsrc" and move the HW blocks
NIX/NPA resources configuration code under this API, so that
it can be used by the RVU representor driver, which has
resources similar to those of the RVU NIC.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241023161843.15543-2-gakula@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Debugging certain flows in the offloaded switch data path can be done by
installing two tc-mirred filters for mirroring: one in the hardware data
path, which copies the frames to the CPU, and one which takes the frame
from there and mirrors it to a virtual interface like a dummy device,
where it can be seen with tcpdump.
The effect of having 2 filters run on the same packet can be obtained by
default using tc, by not specifying either the 'skip_sw' or 'skip_hw'
keywords.
Instead of refusing to offload mirroring/redirecting packets towards
interfaces that aren't switch ports, just treat every other destination
for what it is: something that is handled in software, behind the CPU
port.
Usage:
$ ip link add dummy0 type dummy; ip link set dummy0 up
$ tc qdisc add dev swp0 clsact
$ tc filter add dev swp0 ingress protocol ip flower action mirred ingress mirror dev dummy0
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241023135251.1752488-7-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If the CPU bandwidth capacity permits, it may be useful to mirror the
entire ingress of a user port to software.
This is in fact possible to express even if there is no net_device
representation for the CPU port. In fact, that approach was already
exhausted and that representation wouldn't have even helped [1].
The idea behind implementing this is that currently, we refuse to
offload any mirroring towards a non-DSA target net_device. But if we
acknowledge the fact that to reach any foreign net_device, the switch
must send the packet to the CPU anyway, then we can simply offload just
that part, and let the software do the rest. There is only one condition
we need to uphold: the filter needs to be present in the software data
path as well (no skip_sw).
There are 2 actions to consider: FLOW_ACTION_MIRRED (redirect to egress
of target interface) and FLOW_ACTION_MIRRED_INGRESS (redirect to ingress
of target interface). We don't have the ability/API to offload
FLOW_ACTION_MIRRED_INGRESS when the target port is also a DSA user port,
but we could also permit that through mirred to the CPU + software.
Example:
$ ip link add dummy0 type dummy; ip link set dummy0 up
$ tc qdisc add dev swp0 clsact
$ tc filter add dev swp0 ingress matchall action mirred ingress mirror dev dummy0
Any DSA driver with a ds->ops->port_mirror_add() implementation can now
make use of this with no additional change.
[1] https://lore.kernel.org/netdev/20191002233750.13566-1-olteanv@gmail.com/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241023135251.1752488-6-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Do not leave -EOPNOTSUPP errors without an explanation. It is confusing
for the user to figure out what is wrong otherwise.
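
The idea can be illustrated with a simplified userspace model. The
struct, the helpers and the message text below are hypothetical; in the
kernel the reason would be attached to the netlink extended ack rather
than to this stand-in type:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an extended ack object. */
struct demo_extack {
	const char *msg;
};

/* Record a human-readable reason, if the caller provided a place for it. */
static void demo_set_err(struct demo_extack *extack, const char *msg)
{
	if (extack)
		extack->msg = msg;
}

/* Hypothetical offload check: reject with a reason, not a bare errno. */
static int demo_offload_check(int is_supported, struct demo_extack *extack)
{
	if (!is_supported) {
		demo_set_err(extack, "Offload not supported for this filter");
		return -EOPNOTSUPP;
	}

	return 0;
}
```

The return value is unchanged; the only difference is that the caller can
now report why the request was refused.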
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241023135251.1752488-5-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We already have an "extack" stack variable in
dsa_user_add_cls_matchall_police() and
dsa_user_add_cls_matchall_mirred(), there is no need to retrieve
it again from cls->common.extack.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241023135251.1752488-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The body is a bit hard to read, hard to extend, and has duplicated
conditions.
Clean up the "if (many conditions) else if (many conditions, some
of them repeated)" pattern by:
- Moving the repeated conditions out
- Replacing the repeated tests for the same variable with a switch/case
- Moving the protocol check inside the dsa_user_add_cls_matchall_mirred()
function call.
This is pure refactoring, no logic has been changed, though some tests
were reordered. The order does not matter - they are independent things
to be tested for.
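
The shape of this cleanup can be sketched as follows. The enum, the
single_action flag and the return codes are hypothetical and only mirror
the pattern; they are not the DSA code:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical action identifiers, for illustration only. */
enum demo_act { DEMO_ACT_MIRRED, DEMO_ACT_POLICE, DEMO_ACT_OTHER };

/* Before: the shared condition is repeated in every branch. */
static int demo_add_before(bool single_action, enum demo_act id)
{
	if (single_action && id == DEMO_ACT_MIRRED)
		return 1;
	else if (single_action && id == DEMO_ACT_POLICE)
		return 2;

	return -EOPNOTSUPP;
}

/* After: shared condition tested once, then a switch on the variable. */
static int demo_add_after(bool single_action, enum demo_act id)
{
	if (!single_action)
		return -EOPNOTSUPP;

	switch (id) {
	case DEMO_ACT_MIRRED:
		return 1;
	case DEMO_ACT_POLICE:
		return 2;
	default:
		return -EOPNOTSUPP;
	}
}
```

Both variants return the same result for every input, which is what makes
the change a pure refactoring; adding a new action now means adding one
case label.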
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241023135251.1752488-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|