|
traceback.print_exception() seems tricky to call, we're missing
some argument, so re-raise instead.
Reported-by: Chuck Lever III <chuck.lever@oracle.com>
Fixes: 3aacf8281336 ("tools: ynl: add an object hierarchy to represent parsed spec")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Chuck ran into an issue with a single-element attr-set which
only has an attr with value of 0. The search for max attr in
a struct records attrs with value larger than 0 only (max_val
is set to 0 at the start). Adjust the comparison, alternatively
max_val could be init'ed to -1. Somehow picking the last attr
of a value seems like a good idea in general.
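The effect of the comparison change can be sketched in C. This is only an illustration, assuming non-negative attr values: the actual generator is a Python script, and `struct attr`, `find_max_strict()` and `find_max_inclusive()` are hypothetical names that do not appear in the tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one attribute of a parsed attr-set. */
struct attr {
	const char *name;
	int value;	/* assumed to be non-negative */
};

/*
 * Strict search: max_val starts at 0, so an attr whose value is 0 never
 * satisfies "value > max_val" and NULL is returned for a single-element
 * set that only contains that attr.
 */
static const struct attr *find_max_strict(const struct attr *attrs, size_t n)
{
	const struct attr *max = NULL;
	int max_val = 0;

	assert(attrs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (attrs[i].value > max_val) {
			max_val = attrs[i].value;
			max = &attrs[i];
		}
	}
	return max;
}

/*
 * Inclusive search: ">=" also records an attr with value 0 and, on a
 * tie, keeps the last attr that carries the maximum value.
 */
static const struct attr *find_max_inclusive(const struct attr *attrs, size_t n)
{
	const struct attr *max = NULL;
	int max_val = 0;

	assert(attrs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (attrs[i].value >= max_val) {
			max_val = attrs[i].value;
			max = &attrs[i];
		}
	}
	return max;
}
```

Initialising max_val to -1 and keeping the strict comparison would also find the value-0 attr, but on ties it would keep the first attr rather than the last.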
Reported-by: Chuck Lever III <chuck.lever@oracle.com>
Fixes: be5bea1cc0bf ("net: add basic C code generators for Netlink")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
dev_kfree_skb() is aliased to consume_skb().
When a driver is dropping a packet by calling dev_kfree_skb_any()
we should propagate the drop reason instead of pretending
the packet was consumed.
Note: Now that we have enum skb_drop_reason we could remove
enum skb_free_reason (for linux-6.4)
v2: added an unlikely(), suggested by Yunsheng Lin.
Fixes: e6247027e517 ("net: introduce dev_consume_skb_any()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix a repeated copy/paste typo.
Fixes: d3d854fd6a1d ("netdev-genl: create a simple family for netdev stuff")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Per the discussion in [1], hairpin parameters will be exposed using
devlink, remove the debugfs files.
[1] https://lore.kernel.org/all/20230111194608.7f15b9a1@kernel.org/
Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/all/20230222230202.523667-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix the typo/copy-paste error by replacing struct variable ah_esp_mask name
by ah_esp_hdr.
Issue identified using doublebitand.cocci Coccinelle semantic patch.
Fixes: b7cf966126eb ("octeontx2-pf: Add flow classification using IP next level protocol")
Link: https://lore.kernel.org/all/20210111112537.3277-1-naveenm@marvell.com/
Signed-off-by: Deepak R Varma <drv@mailo.com>
Link: https://lore.kernel.org/r/Y/YYkKddeHOt80cO@ubun2204.myguest.virtualbox.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
With this refcnt added in sctp_stream_priorities, we don't need to
traverse all streams to check if the prio is used by other streams
when freeing one stream's prio in sctp_sched_prio_free_sid(). This
can avoid a nested loop (up to 65535 * 65535 iterations), which may
cause a soft lockup, as Ying reported:
watchdog: BUG: soft lockup - CPU#23 stuck for 26s! [ksoftirqd/23:136]
Call Trace:
<TASK>
sctp_sched_prio_free_sid+0xab/0x100 [sctp]
sctp_stream_free_ext+0x64/0xa0 [sctp]
sctp_stream_free+0x31/0x50 [sctp]
sctp_association_free+0xa5/0x200 [sctp]
Note that this counter doesn't need to use the refcount_t type,
as access to it is always protected by the sock lock.
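The idea of counting users instead of scanning all streams can be sketched as follows. This is a simplified userspace illustration, not the SCTP code: `struct prio_head`, `struct stream` and the three helpers are hypothetical, and the caller is assumed to hold whatever lock protects the counter.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified priority head shared by several streams. */
struct prio_head {
	int prio;
	unsigned int users;	/* plain counter, protected by the caller's lock */
};

struct stream {
	struct prio_head *prio_head;
};

/* Allocate a priority head with no users yet; may return NULL. */
static struct prio_head *prio_head_new(int prio)
{
	struct prio_head *p = calloc(1, sizeof(*p));

	if (p)
		p->prio = prio;
	return p;
}

/* Attach a stream to a priority head and count the new user. */
static void stream_set_prio(struct stream *s, struct prio_head *p)
{
	p->users++;
	s->prio_head = p;
}

/*
 * Detach a stream from its priority head in O(1), without looking at
 * any other stream. Returns 1 if the head was freed, 0 otherwise.
 */
static int stream_free_prio(struct stream *s)
{
	struct prio_head *p = s->prio_head;

	if (!p)
		return 0;
	s->prio_head = NULL;
	assert(p->users > 0);
	if (--p->users)
		return 0;
	free(p);
	return 1;
}
```

Freeing N streams is then N constant-time decrements, rather than N scans over up to N streams each.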
v1->v2:
- add a check in sctp_sched_prio_set to avoid the possible prio_head
refcnt overflow.
Fixes: 9ed7bfc79542 ("sctp: fix memory leak in sctp_stream_outq_migrate()")
Reported-by: Ying Xu <yinxu@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/825eb0c905cb864991eba335f4a2b780e543f06b.1677085641.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
devm_request_region is for I/O regions. Use devm_request_mem_region
instead. This fixes the driver failing to probe since 99df45c9e0a4
("sunhme: fix an IS_ERR() vs NULL check in probe"), which checked the
result.
Fixes: 914d9b2711dd ("sunhme: switch to devres")
Signed-off-by: Sean Anderson <seanga2@gmail.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/20230222204242.2658247-1-seanga2@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When checksum offload is disabled in the driver via ethtool,
the PTP 1-step sync packets contain an incorrect checksum, since
the stack calculates the checksum before the driver updates the
PTP timestamp field in the packet. This results in PTP packets
getting dropped at the other end. This patch fixes the issue by
re-calculating the UDP checksum after updating the PTP
timestamp field in the driver.
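Why the checksum has to be recalculated can be shown with a toy packet. The sketch below uses the generic Internet checksum (RFC 1071) over a flat buffer; it leaves out the UDP pseudo-header, and the offsets `CSUM_OFF` and `TS_OFF` as well as all function names are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CSUM_OFF 0	/* hypothetical offset of the 16-bit checksum */
#define TS_OFF   2	/* hypothetical offset of a 32-bit timestamp */

/* Internet checksum: one's-complement sum of big-endian 16-bit words. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;

	while (len > 1) {
		sum += ((uint32_t)data[0] << 8) | data[1];
		data += 2;
		len -= 2;
	}
	if (len)
		sum += (uint32_t)data[0] << 8;	/* pad the odd trailing byte */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Zero the checksum field, compute the checksum, and store it. */
static void set_checksum(uint8_t *pkt, size_t len)
{
	uint16_t csum;

	pkt[CSUM_OFF] = 0;
	pkt[CSUM_OFF + 1] = 0;
	csum = inet_checksum(pkt, len);
	pkt[CSUM_OFF] = (uint8_t)(csum >> 8);
	pkt[CSUM_OFF + 1] = (uint8_t)(csum & 0xff);
}

/* A packet with a correct checksum sums to zero. */
static int checksum_ok(const uint8_t *pkt, size_t len)
{
	return inet_checksum(pkt, len) == 0;
}

/* Write the timestamp, then recalculate the checksum over the new data. */
static void update_timestamp(uint8_t *pkt, size_t len, uint32_t ts)
{
	assert(len >= (size_t)TS_OFF + 4);
	pkt[TS_OFF] = (uint8_t)(ts >> 24);
	pkt[TS_OFF + 1] = (uint8_t)(ts >> 16);
	pkt[TS_OFF + 2] = (uint8_t)(ts >> 8);
	pkt[TS_OFF + 3] = (uint8_t)ts;
	set_checksum(pkt, len);
}
```

Writing the timestamp bytes without the final set_checksum() call leaves the stale checksum in place, which is the situation the receiver rejects.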
Fixes: 2958d17a8984 ("octeontx2-pf: Add support for ptp 1-step mode on CN10K silicon")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
Link: https://lore.kernel.org/r/20230222113600.1965116-1-saikrishnag@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Currently, it is possible to make some PHYs advertise unsupported
EEE link modes. So, validate them before overwriting the existing
configuration.
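A minimal sketch of such a validation, assuming a plain 32-bit mask instead of the kernel's link mode bitmaps. The `EEE_*` flags, `struct phy_eee` and `eee_set_advertised()` are hypothetical, and rejecting the request with -EINVAL is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit flags standing in for EEE link modes. */
#define EEE_100TX	(1u << 0)
#define EEE_1000T	(1u << 1)
#define EEE_10GT	(1u << 2)

struct phy_eee {
	uint32_t supported;	/* modes the PHY can do */
	uint32_t advertised;	/* modes currently advertised */
};

/*
 * Reject a request that contains any mode outside the supported set,
 * and only then overwrite the existing advertisement.
 */
static int eee_set_advertised(struct phy_eee *phy, uint32_t requested)
{
	assert(phy != NULL);
	if (requested & ~phy->supported)
		return -EINVAL;
	phy->advertised = requested;
	return 0;
}
```

Checking before writing keeps the previous advertisement intact when the request is refused.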
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
With the following patches:
commit 9b01c885be36 ("net: phy: c22: migrate to genphy_c45_write_eee_adv()")
commit 5827b168125d ("net: phy: c45: migrate to genphy_c45_write_eee_adv()")
we set the advertisement to potentially supported values. This behavior
may introduce new regressions on systems where EEE was disabled by
default (by BIOS or boot loader configuration, or in other ways).
At the same time, with these patches, we would overwrite the EEE
advertisement configuration made over ethtool.
To avoid these issues, we need to cache the initial and ethtool
advertisement configuration and store it for later use.
Fixes: 9b01c885be36 ("net: phy: c22: migrate to genphy_c45_write_eee_adv()")
Fixes: 5827b168125d ("net: phy: c45: migrate to genphy_c45_write_eee_adv()")
Fixes: 022c3f87f88e ("net: phy: add genphy_c45_ethtool_get/set_eee() support")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add new genphy_c45_an_config_eee_aneg() function and replace some of
genphy_c45_write_eee_adv() calls. This will be needed by the next patch.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Make sure we use the proper variable to validate access to potentially
unsupported registers. Otherwise we will get false read/write errors.
Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/oe-lkp/202302211644.c12d19de-yujie.liu@intel.com
Fixes: 022c3f87f88e ("net: phy: add genphy_c45_ethtool_get/set_eee() support")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
During IPsec RoCE TX creation a struct for the flow group creation is
allocated, but never freed. Free that struct once it is no longer in use.
Fixes: 22551e77e550 ("net/mlx5: Configure IPsec steering for egress RoCEv2 traffic")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/a69739482cca7176d3a466f87bbf5af1250b09bb.1677056384.git.leon@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add tests to check whether the total fib info length is calculated
correctly in the route notify process.
Signed-off-by: Lu Wei <luwei32@huawei.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230222083629.335683-3-luwei32@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In function rt6_nlmsg_size(), the length of nexthop is calculated
by multiplying the nexthop length of fib6_info by the number of
siblings. However if the fib6_info has no lwtunnel but the siblings
have lwtunnels, the nexthop length is less than it should be, and
it will trigger a warning in inet6_rt_notify() as follows:
WARNING: CPU: 0 PID: 6082 at net/ipv6/route.c:6180 inet6_rt_notify+0x120/0x130
......
Call Trace:
<TASK>
fib6_add_rt2node+0x685/0xa30
fib6_add+0x96/0x1b0
ip6_route_add+0x50/0xd0
inet6_rtm_newroute+0x97/0xa0
rtnetlink_rcv_msg+0x156/0x3d0
netlink_rcv_skb+0x5a/0x110
netlink_unicast+0x246/0x350
netlink_sendmsg+0x250/0x4c0
sock_sendmsg+0x66/0x70
___sys_sendmsg+0x7c/0xd0
__sys_sendmsg+0x5d/0xb0
do_syscall_64+0x3f/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
This bug can be reproduced by script:
ip -6 addr add 2002::2/64 dev ens2
ip -6 route add 100::/64 via 2002::1 dev ens2 metric 100
for i in 10 20 30 40 50 60 70;
do
ip link add link ens2 name ipv_$i type ipvlan
ip -6 addr add 2002::$i/64 dev ipv_$i
ifconfig ipv_$i up
done
for i in 10 20 30 40 50 60;
do
ip -6 route append 100::/64 encap ip6 dst 2002::$i via 2002::1
dev ipv_$i metric 100
done
ip -6 route append 100::/64 via 2002::1 dev ipv_70 metric 100
This patch fixes it by adding up the nexthop_len of every sibling using
rt6_nh_nlmsg_size().
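The difference between the two size estimates can be illustrated with a toy model. The structure and function names below are hypothetical and the byte counts are made up; only the multiply-versus-sum logic mirrors the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified nexthop; encap_len is 0 without an lwtunnel. */
struct nexthop {
	size_t base_len;
	size_t encap_len;
};

static size_t nh_size(const struct nexthop *nh)
{
	return nh->base_len + nh->encap_len;
}

/* Old estimate: the first nexthop's size times the number of nexthops. */
static size_t size_by_multiply(const struct nexthop *nhs, size_t n)
{
	assert(nhs != NULL || n == 0);
	return n ? nh_size(&nhs[0]) * n : 0;
}

/* Fixed estimate: add up the size of every nexthop individually. */
static size_t size_by_sum(const struct nexthop *nhs, size_t n)
{
	size_t total = 0;

	assert(nhs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		total += nh_size(&nhs[i]);
	return total;
}
```

With a first nexthop that has no encapsulation and siblings that do, the multiplied estimate comes out smaller than the summed one, which is the undersized buffer that triggers the warning.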
Fixes: beb1afac518d ("net: ipv6: Add support to dump multipath routes via RTA_MULTIPATH attribute")
Signed-off-by: Lu Wei <luwei32@huawei.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230222083629.335683-2-luwei32@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
vclocks were using spinlocks to protect access to their timecounter and
cyclecounter. Access to timecounter/cyclecounter is backed by the same
driver callbacks that are used for non-virtual PHCs, but the usage of
the spinlock imposes a new limitation that didn't exist previously: now
they're called in atomic context so they mustn't sleep.
Some drivers like sfc or ice may sleep on these callbacks, causing
errors like "BUG: scheduling while atomic: ptp5/25223/0x00000002"
Fix it by replacing the vclock's spinlock with a mutex. This fixes the
mentioned bug and doesn't introduce longer delays.
I've tested synchronizing various different combinations of clocks:
- vclock->sysclock
- sysclock->vclock
- vclock->vclock
- hardware PHC in different NIC -> vclock
- created 4 vclocks and launch 4 parallel phc2sys processes with
lockdep enabled
In all cases, comparing the delays reported by phc2sys, they are in the
same range of values as before applying the patch.
Link: https://lore.kernel.org/netdev/69d0ff33-bd32-6aa5-d36c-fbdc3c01337c@redhat.com/
Fixes: 5d43f951b1ac ("ptp: add ptp virtual clock driver framework")
Reported-by: Yalin Li <yalli@redhat.com>
Suggested-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20230221130616.21837-1-ihuguet@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Here is the stack where we allocate percpu counter block:
+-< __alloc_percpu
+-< xt_percpu_counter_alloc
+-< find_check_entry # {arp,ip,ip6}_tables.c
+-< translate_table
And it can be leaked on this code path:
+-> ip6t_register_table
+-> translate_table # allocates percpu counter block
+-> xt_register_table # fails
there is no freeing of the counter block on xt_register_table fail.
Note: xt_percpu_counter_free should be called to free it like we do in
do_replace through cleanup_entry helper (or in __ip6t_unregister_table).
Probability of hitting this error path is low AFAICS (xt_register_table
can only return ENOMEM here, as it is not replacing anything, as we are
creating new netns, and it is hard to imagine that all previous
allocations succeeded and after that one in xt_register_table failed).
But it's worth fixing even the rare leak.
Fixes: 71ae0dff02d7 ("netfilter: xtables: use percpu rule counters")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
pernet tracking doesn't work correctly because other netns might have
set NETLINK_LISTEN_ALL_NSID on its event socket.
In this case it's expected that events originating in other net
namespaces are also received.
Making pernet-tracking work while also honoring NETLINK_LISTEN_ALL_NSID
requires much more intrusive changes both in netlink and nfnetlink,
f.e. adding a 'setsockopt' callback that lets nfnetlink know that the
event socket entered (or left) ALL_NSID mode.
Move to global tracking instead: if there is an event socket anywhere
on the system, all net namespaces which have conntrack enabled and
use autobind mode will allocate the ecache extension.
netlink_has_listeners() returns false only if the given group has no
subscribers in any net namespace, the 'net' argument passed to
nfnetlink_has_listeners is only used to derive the protocol (nfnetlink),
it has no other effect.
For proper NETLINK_LISTEN_ALL_NSID-aware pernet tracking of event
listeners a new netlink_has_net_listeners() is also needed.
Fixes: 90d1daa45849 ("netfilter: conntrack: add nf_conntrack_events autodetect mode")
Reported-by: Bryce Kahle <bryce.kahle@datadoghq.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
For IPv6 Jumbo packets, the ipv6_hdr(skb)->payload_len is always 0,
and its real payload_len ( > 65535) is saved in hbh exthdr. With 0
length for the jumbo packets, it may mismatch.
To fix this, we can just use skb->len instead of parsing exthdrs, as
the hbh exthdr parsing has been done before coming to length_mt6 in
ip6_rcv_core() and br_validate_ipv6() and also the packet has been
trimmed according to the correct IPv6 (ext)hdr length there, and skb
len is trustable in length_mt6().
Note that this patch is especially needed after the IPv6 BIG TCP was
supported in kernel, which is using IPv6 Jumbo packets. Besides, to
match the packets greater than 65535 more properly, a v1 revision of
xt_length may be needed to extend "min, max" to u32 in the future,
and for now the IPv6 Jumbo packets can be matched by:
# ip6tables -m length ! --length 0:65535
Fixes: 7c4e983c4f3c ("net: allow gso_max_size to exceed 65536")
Fixes: 0fe79f28bfaf ("net: allow gro_max_size to exceed 65536")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
We are not allowed to return an error at this point.
Looking at the code it looks like ret is always 0 at this
point, but it's not.
t = find_table_lock(net, repl->name, &ret, &ebt_mutex);
... this can return a valid table, with ret != 0.
This bug causes update of table->private with the new
blob, but then frees the blob right away in the caller.
Syzbot report:
BUG: KASAN: vmalloc-out-of-bounds in __ebt_unregister_table+0xc00/0xcd0 net/bridge/netfilter/ebtables.c:1168
Read of size 4 at addr ffffc90005425000 by task kworker/u4:4/74
Workqueue: netns cleanup_net
Call Trace:
kasan_report+0xbf/0x1f0 mm/kasan/report.c:517
__ebt_unregister_table+0xc00/0xcd0 net/bridge/netfilter/ebtables.c:1168
ebt_unregister_table+0x35/0x40 net/bridge/netfilter/ebtables.c:1372
ops_exit_list+0xb0/0x170 net/core/net_namespace.c:169
cleanup_net+0x4ee/0xb10 net/core/net_namespace.c:613
...
ip(6)tables appears to be ok (ret should be 0 at this point) but make
this more obvious.
Fixes: c58dd2dd443c ("netfilter: Can't fail and free after table replacement")
Reported-by: syzbot+f61594de72d6705aea03@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
When calling ip6_route_lookup() for the packet arriving on the VRF
interface, the result is always the real (slave) interface. Expect this
when validating the result.
Fixes: acc641ab95b66 ("netfilter: rpfilter/fib: Populate flowic_l3mdev field")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
nf_conntrack_hash_check_insert() callers free the ct entry directly, via
nf_conntrack_free.
This isn't safe anymore because
nf_conntrack_hash_check_insert() might place the entry into the conntrack
table and then delete the entry again because it found that a conntrack
extension has been removed at the same time.
In this case, the just-added entry is removed again and an error is
returned to the caller.
Problem is that another cpu might have picked up this entry and
incremented its reference count.
This results in a use-after-free/double-free, once by the other cpu and
once by the caller of nf_conntrack_hash_check_insert().
Fix this by making nf_conntrack_hash_check_insert() not fail anymore
after the insertion, just like before the 'Fixes' commit.
This is safe because a racing nf_ct_iterate() has to wait for us
to release the conntrack hash spinlocks.
While at it, make the function return -EAGAIN in the rmmod (genid
changed) case, this makes nfnetlink replay the command (suggested
by Pablo Neira).
Fixes: c56716c69ce1 ("netfilter: extensions: introduce extension genid count")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
nf_ct_put() needs to be called to put the refcount got by
nf_conntrack_find_get() to avoid refcount leak when
nf_conntrack_hash_check_insert() fails.
Fixes: 7d367e06688d ("netfilter: ctnetlink: fix soft lockup when netlink adds new entries (v2)")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The results of "access_ok()" can be mis-speculated. The result is that
you can end up speculatively executing:
if (access_ok(from, size))
// Right here
even for bad from/size combinations. On first glance, it would be ideal
to just add a speculation barrier to "access_ok()" so that its results
can never be mis-speculated.
But there are lots of system calls just doing access_ok() via
"copy_to_user()" and friends (example: fstat() and friends). Those are
generally not problematic because they do not _consume_ data from
userspace other than the pointer. They are also very quick and common
system calls that should not be needlessly slowed down.
"copy_from_user()" on the other hand uses a user-controlled pointer and
is frequently followed up with code that might affect caches. Take
something like this:
if (!copy_from_user(&kernelvar, uptr, size))
do_something_with(kernelvar);
If userspace passes in an evil 'uptr' that *actually* points to a kernel
address, and then do_something_with() has cache (or other)
side-effects, it could allow userspace to infer kernel data values.
Add a barrier to the common copy_from_user() code to prevent
mis-speculated values which happen after the copy.
Also add a stub for architectures that do not define barrier_nospec().
This makes the macro usable in generic code.
Since the barrier is now usable in generic code, the x86 #ifdef in the
BPF code can also go away.
Reported-by: Jordy Zomer <jordyzomer@google.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Daniel Borkmann <daniel@iogearbox.net> # BPF bits
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When reading the page_pool code the first impression is that keeping
two separate counters, one being the page refcnt and the other being
fragment pp_frag_count, is counter-intuitive.
However without that fragment counter we don't know when to reliably
destroy or sync the outstanding DMA mappings. So let's add a comment
explaining this part.
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/r/20230217222130.85205-1-ilias.apalodimas@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The MAC Merge layer is supported when ops->get_mm() returns 0.
The implementation was changed during review, and in this process, a bug
was introduced.
Link: https://lore.kernel.org/netdev/20230111161706.1465242-5-vladimir.oltean@nxp.com/
Fixes: 04692c9020b7 ("net: ethtool: netlink: retrieve stats from multiple sources (eMAC, pMAC)")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu>
Link: https://lore.kernel.org/all/20230220122343.1156614-2-vladimir.oltean@nxp.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Remove the repeated word "for" in comments.
Signed-off-by: Bo Liu <liubo03@inspur.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20230221083036.2414-1-liubo03@inspur.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix the compilation failure on sh4.
The earlier introduction of remap_vmalloc_range() caused the
compilation failure on the sh4 platform. So include the
linux/vmalloc.h header file.
config: sh-allmodconfig (https://download.01.org/0day-ci/archive/20230221/202302210041.kpPQLlNQ-lkp@intel.com/config)
compiler: sh4-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=9f78bf330a66cd400b3e00f370f597e9fa939207
git remote add net-next https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
git fetch --no-tags net-next master
git checkout 9f78bf330a66cd400b3e00f370f597e9fa939207
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=sh olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=sh SHELL=/bin/bash net/
Fixes: 9f78bf330a66 ("xsk: support use vaddr as ring")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202302210041.kpPQLlNQ-lkp@intel.com/
Link: https://lore.kernel.org/r/20230221075140.46988-1-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When devlink instance is put into network namespace and that network
namespace gets deleted, devlink instance is moved back into init_ns.
This is done as a part of cleanup_net() routine. Since cleanup_net()
is called asynchronously from workqueue, there is no guarantee that
the devlink instance move is done after "ip netns del" returns.
So fix this race by making sure that the devlink instance is present
before any other operation.
Reported-by: Amir Tzin <amirtz@nvidia.com>
Fixes: b74c37fd35a2 ("selftests: netdevsim: add tests for devlink reload with resources")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/20230220132336.198597-1-jiri@resnulli.us
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Usage of `set -e` before executing a command causes immediate exit
on failure, without cleaning up the resources allocated at setup.
This can affect the next tests that use the same resources,
leading to a chain of failures.
A simple fix is to always call the cleanup function when the script exits.
This approach is already used by other existing tests.
Fixes: 1056691b2680 ("selftests: fib_tests: Make test results more verbose")
Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
Link: https://lore.kernel.org/r/20230220110400.26737-2-roxana.nicolescu@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Hardware requires an alignment to 64 bytes to return ASO data. Missing
this alignment caused unpredictable results when ASO events were
generated.
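One way to guarantee such an alignment can be sketched in userspace C11. This is not the driver change itself: `struct aso_ctx`, `ASO_ALIGN` and both helpers are hypothetical, and the buffer size is chosen only to keep the example small.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>

#define ASO_ALIGN 64	/* alignment the hardware is said to require */

/* Hypothetical buffer that the device writes its ASO data into. */
struct aso_ctx {
	alignas(ASO_ALIGN) uint8_t data[ASO_ALIGN];
};

/* Returns 1 if the pointer is aligned to ASO_ALIGN bytes. */
static int is_aso_aligned(const void *p)
{
	return (uintptr_t)p % ASO_ALIGN == 0;
}

/*
 * Allocate a context with the required alignment. aligned_alloc()
 * expects the size to be a multiple of the alignment, which holds here
 * because alignas() pads the structure to ASO_ALIGN bytes.
 */
static struct aso_ctx *aso_ctx_alloc(void)
{
	struct aso_ctx *ctx = aligned_alloc(ASO_ALIGN, sizeof(*ctx));

	assert(!ctx || is_aso_aligned(ctx));
	return ctx;
}
```

Putting the alignment on the member means that any structure embedding `struct aso_ctx` inherits it, so the requirement cannot be lost by a later layout change.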
Fixes: 8518d05b8f9a ("net/mlx5e: Create Advanced Steering Operation object for IPsec")
Reported-by: Emeel Hakim <ehakim@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/de0302c572b90c9224a72868d4e0d657b6313c4b.1676797613.git.leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, CT misses restore the missed chain on the tc skb extension so
tc will continue from the relevant chain. Instead, restore the CT action's
miss cookie on the extension, which will instruct tc to continue from
this specific CT action instance on the relevant filter's action list.
Map the CT action's miss_cookie to a new miss object (ACT_MISS), and use
this miss mapping instead of the current chain miss object (CHAIN_MISS)
for CT action misses.
To restore this new miss mapping value, add a RX restore rule for each
such mapping value.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Sholmo <ozsh@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This reg usage is always a mapped object, not necessarily
containing chain info.
Rename to properly convey what it stores.
This patch doesn't change any functionality.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move tc miss handling code to en_tc.c, and remove
duplicate code.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Tc skb extension is a basic requirement for using tc
offload to support correct restoration on action miss.
Depend on it.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
To support hardware miss to tc action in actions on the flower
classifier, implement the required getting of filter actions,
and setup filter exts (actions) miss by giving it the filter's
handle and actions.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
To support miss to action during hardware offload the filter's
handle is needed when setting up the actions (tcf_exts_init()),
and before offloading.
Move filter handle initialization earlier.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For drivers to support partial offload of a filter's action list,
add support for action miss to specify an action instance to
continue from in sw.
CT action in particular can't be fully offloaded, as new connections
need to be handled in software. This imposes other limitations on
the actions that can be offloaded together with the CT action, such
as packet modifications.
Assign each action on a filter's action list a unique miss_cookie
which drivers can then use to fill action_miss part of the tc skb
extension. On getting back this miss_cookie, find the action
instance with relevant cookie and continue classifying from there.
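The cookie assignment and lookup can be sketched as below. This is a simplified model, not the tc implementation: `struct action`, the global counter and both functions are hypothetical, and cookie value 0 is assumed to mean "no cookie".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified action on a filter's action list. */
struct action {
	const char *name;
	uint64_t miss_cookie;	/* 0 means "not assigned" in this sketch */
};

static uint64_t next_cookie = 1;

/* Give every action on the list its own unique, non-zero cookie. */
static void assign_miss_cookies(struct action *acts, size_t n)
{
	assert(acts != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		acts[i].miss_cookie = next_cookie++;
}

/*
 * Find the action that the hardware missed on. Returns the index to
 * continue software processing from, or -1 if no action matches.
 */
static long find_resume_index(const struct action *acts, size_t n,
			      uint64_t cookie)
{
	if (!cookie)
		return -1;
	for (size_t i = 0; i < n; i++) {
		if (acts[i].miss_cookie == cookie)
			return (long)i;
	}
	return -1;
}
```

A driver would report the cookie of the action it could not complete, and software would resume at the returned index instead of restarting the whole action list.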
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
struct tc_action->act_cookie is a user defined cookie,
and the related struct flow_action_entry->act_cookie is
used as a handle similar to struct flow_cls_offload->cookie.
Rename tc_action->act_cookie to user_cookie, and
flow_action_entry->act_cookie to cookie so their names
would better fit their usage.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add an embarrassingly missed semicolon plus an embarrassingly missed
parenthesis, which broke the kernel build when CONFIG_RTC_LIB is not set,
as reported with the ia64 config.
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202302170047.EjCPizu3-lkp@intel.com/
Fixes: 14743ddd2495 ("sfc: add devlink info support for ef100")
Signed-off-by: Alejandro Lucero <alejandro.lucero-palau@amd.com>
Link: https://lore.kernel.org/r/20230220110133.29645-1-alejandro.lucero-palau@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix some indentation and remove the warning below:
drivers/net/ethernet/sfc/mae.c:657 efx_mae_enumerate_mports() warn: inconsistent indenting
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4117
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20230220065958.52941-1-yang.lee@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The call "skb_copy_from_linear_data(skb, inl + 1, spc)" triggers a FORTIFY
memcpy() warning on ppc64 platform:
In function ‘fortify_memcpy_chk’,
inlined from ‘skb_copy_from_linear_data’ at ./include/linux/skbuff.h:4029:2,
inlined from ‘build_inline_wqe’ at drivers/net/ethernet/mellanox/mlx4/en_tx.c:722:4,
inlined from ‘mlx4_en_xmit’ at drivers/net/ethernet/mellanox/mlx4/en_tx.c:1066:3:
./include/linux/fortify-string.h:513:25: error: call to ‘__write_overflow_field’ declared with
attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()?
[-Werror=attribute-warning]
513 | __write_overflow_field(p_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The same behaviour can be seen on x86 if you use "__always_inline" instead
of "inline" for skb_copy_from_linear_data() in skbuff.h
The call here copies data into the inlined tx descriptor, which has 104
bytes (MAX_INLINE) of space for data payload. In this case "spc" is known
at compile time but the destination is used with hidden knowledge
(the real structure of the destination is different from what the
compiler can see). That causes the fortify warning because the compiler
can check bounds, but the real bounds are different. "spc" can't be bigger
than 64 bytes (MLX4_INLINE_ALIGN), so the data can always fit into the
inlined tx descriptor. The fact that "inl" points into the inlined tx
descriptor is determined earlier in mlx4_en_xmit().
Avoid confusing the compiler with "inl + 1" constructions to get to past
the inl header by introducing a flexible array "data" to the struct so
that the compiler can see that we are not dealing with an array of inl
structs, but rather, arbitrary data following the structure. There are
no changes to the structure layout reported by pahole, and the resulting
machine code is actually smaller.
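The layout claim can be checked with a small standalone sketch. `struct inline_seg` and `fill_inline()` are hypothetical simplifications of the driver structure; the point is only that a flexible array member names the bytes after the header without changing the structure's size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified inline segment header. */
struct inline_seg {
	uint32_t byte_count;
	uint8_t data[];		/* payload that follows the header */
};

/* The flexible array member adds nothing to the structure size. */
_Static_assert(sizeof(struct inline_seg) == sizeof(uint32_t),
	       "flexible array must not change the layout");
_Static_assert(offsetof(struct inline_seg, data) == sizeof(uint32_t),
	       "data must start right after the header");

/*
 * Copy the payload through inl->data rather than through "inl + 1", so
 * the destination is visibly a byte array following the header and not
 * a second struct inline_seg. The caller guarantees there is room.
 */
static void fill_inline(struct inline_seg *inl, const void *src, uint32_t len)
{
	assert(inl != NULL && src != NULL);
	memcpy(inl->data, src, len);
	inl->byte_count = len;
}
```

Both spellings refer to the same address, so the change only affects what the compiler believes about the destination object.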
Reported-by: Josef Oskera <joskera@redhat.com>
Link: https://lore.kernel.org/lkml/20230217094541.2362873-1-joskera@redhat.com
Fixes: f68f2ff91512 ("fortify: Detect struct member overflows in memcpy() at compile-time")
Cc: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20230218183842.never.954-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When doing timestamping in lan966x and having PROVE_LOCKING
enabled, the following warning is shown.
========================================================
WARNING: possible irq lock inversion dependency detected
6.2.0-rc7-01749-gc54e1f7f7e36 #2786 Tainted: G N
--------------------------------------------------------
swapper/0/0 just changed the state of lock:
c2609f50 (_xmit_ETHER#2){+.-.}-{2:2}, at: sch_direct_xmit+0x16c/0x2e8
but this lock took another, SOFTIRQ-unsafe lock in the past:
(&lan966x->ptp_ts_id_lock){+.+.}-{2:2}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&lan966x->ptp_ts_id_lock);
local_irq_disable();
lock(_xmit_ETHER#2);
lock(&lan966x->ptp_ts_id_lock);
<Interrupt>
lock(_xmit_ETHER#2);
*** DEADLOCK ***
5 locks held by swapper/0/0:
#0: c1001e18 ((&ndev->rs_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x33c
#1: c105e7c4 (rcu_read_lock){....}-{1:2}, at: ndisc_send_skb+0x134/0x81c
#2: c105e7d8 (rcu_read_lock_bh){....}-{1:2}, at: ip6_finish_output2+0x17c/0xc64
#3: c105e7d8 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x4c/0x1224
#4: c3056174 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_queue_xmit+0x354/0x1224
the shortest dependencies between 2nd lock and 1st lock:
-> (&lan966x->ptp_ts_id_lock){+.+.}-{2:2} {
HARDIRQ-ON-W at:
lock_acquire.part.0+0xb0/0x248
_raw_spin_lock+0x38/0x48
lan966x_ptp_irq_handler+0x164/0x2a8
irq_thread_fn+0x1c/0x78
irq_thread+0x130/0x278
kthread+0xec/0x110
ret_from_fork+0x14/0x28
SOFTIRQ-ON-W at:
lock_acquire.part.0+0xb0/0x248
_raw_spin_lock+0x38/0x48
lan966x_ptp_irq_handler+0x164/0x2a8
irq_thread_fn+0x1c/0x78
irq_thread+0x130/0x278
kthread+0xec/0x110
ret_from_fork+0x14/0x28
INITIAL USE at:
lock_acquire.part.0+0xb0/0x248
_raw_spin_lock_irqsave+0x4c/0x68
lan966x_ptp_txtstamp_request+0x128/0x1cc
lan966x_port_xmit+0x224/0x43c
dev_hard_start_xmit+0xa8/0x2f0
sch_direct_xmit+0x108/0x2e8
__dev_queue_xmit+0x41c/0x1224
packet_sendmsg+0xdb4/0x134c
__sys_sendto+0xd0/0x154
sys_send+0x18/0x20
ret_fast_syscall+0x0/0x1c
}
... key at: [<c174ba0c>] __key.2+0x0/0x8
... acquired at:
_raw_spin_lock_irqsave+0x4c/0x68
lan966x_ptp_txtstamp_request+0x128/0x1cc
lan966x_port_xmit+0x224/0x43c
dev_hard_start_xmit+0xa8/0x2f0
sch_direct_xmit+0x108/0x2e8
__dev_queue_xmit+0x41c/0x1224
packet_sendmsg+0xdb4/0x134c
__sys_sendto+0xd0/0x154
sys_send+0x18/0x20
ret_fast_syscall+0x0/0x1c
-> (_xmit_ETHER#2){+.-.}-{2:2} {
HARDIRQ-ON-W at:
lock_acquire.part.0+0xb0/0x248
_raw_spin_lock+0x38/0x48
netif_freeze_queues+0x38/0x68
dev_deactivate_many+0xac/0x388
dev_deactivate+0x38/0x6c
linkwatch_do_dev+0x70/0x8c
__linkwatch_run_queue+0xd4/0x1e8
linkwatch_event+0x24/0x34
process_one_work+0x284/0x744
worker_thread+0x28/0x4bc
kthread+0xec/0x110
ret_from_fork+0x14/0x28
IN-SOFTIRQ-W at:
lock_acquire.part.0+0xb0/0x248
_raw_spin_lock+0x38/0x48
sch_direct_xmit+0x16c/0x2e8
__dev_queue_xmit+0x41c/0x1224
ip6_finish_output2+0x5f4/0xc64
ndisc_send_skb+0x4cc/0x81c
addrconf_rs_timer+0xb0/0x2f8
call_timer_fn+0xb4/0x33c
expire_timers+0xb4/0x10c
run_timer_softirq+0xf8/0x2a8
__do_softirq+0xd4/0x5fc
__irq_exit_rcu+0x138/0x17c
irq_exit+0x8/0x28
__irq_svc+0x90/0xbc
arch_cpu_idle+0x30/0x3c
default_idle_call+0x44/0xac
do_idle+0xc8/0x138
cpu_startup_entry+0x18/0x1c
rest_init+0xcc/0x168
arch_post_acpi_subsys_init+0x0/0x8
INITIAL USE at:
lock_acquire.part.0+0xb0/0x248
_raw_spin_lock+0x38/0x48
netif_freeze_queues+0x38/0x68
dev_deactivate_many+0xac/0x388
dev_deactivate+0x38/0x6c
linkwatch_do_dev+0x70/0x8c
__linkwatch_run_queue+0xd4/0x1e8
linkwatch_event+0x24/0x34
process_one_work+0x284/0x744
worker_thread+0x28/0x4bc
kthread+0xec/0x110
ret_from_fork+0x14/0x28
}
... key at: [<c175974c>] netdev_xmit_lock_key+0x8/0x1c8
... acquired at:
__lock_acquire+0x978/0x2978
lock_acquire.part.0+0xb0/0x248
_raw_spin_lock+0x38/0x48
sch_direct_xmit+0x16c/0x2e8
__dev_queue_xmit+0x41c/0x1224
ip6_finish_output2+0x5f4/0xc64
ndisc_send_skb+0x4cc/0x81c
addrconf_rs_timer+0xb0/0x2f8
call_timer_fn+0xb4/0x33c
expire_timers+0xb4/0x10c
run_timer_softirq+0xf8/0x2a8
__do_softirq+0xd4/0x5fc
__irq_exit_rcu+0x138/0x17c
irq_exit+0x8/0x28
__irq_svc+0x90/0xbc
arch_cpu_idle+0x30/0x3c
default_idle_call+0x44/0xac
do_idle+0xc8/0x138
cpu_startup_entry+0x18/0x1c
rest_init+0xcc/0x168
arch_post_acpi_subsys_init+0x0/0x8
stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G N 6.2.0-rc7-01749-gc54e1f7f7e36 #2786
Hardware name: Generic DT based system
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x58/0x70
dump_stack_lvl from mark_lock.part.0+0x59c/0x93c
mark_lock.part.0 from __lock_acquire+0x978/0x2978
__lock_acquire from lock_acquire.part.0+0xb0/0x248
lock_acquire.part.0 from _raw_spin_lock+0x38/0x48
_raw_spin_lock from sch_direct_xmit+0x16c/0x2e8
sch_direct_xmit from __dev_queue_xmit+0x41c/0x1224
__dev_queue_xmit from ip6_finish_output2+0x5f4/0xc64
ip6_finish_output2 from ndisc_send_skb+0x4cc/0x81c
ndisc_send_skb from addrconf_rs_timer+0xb0/0x2f8
addrconf_rs_timer from call_timer_fn+0xb4/0x33c
call_timer_fn from expire_timers+0xb4/0x10c
expire_timers from run_timer_softirq+0xf8/0x2a8
run_timer_softirq from __do_softirq+0xd4/0x5fc
__do_softirq from __irq_exit_rcu+0x138/0x17c
__irq_exit_rcu from irq_exit+0x8/0x28
irq_exit from __irq_svc+0x90/0xbc
Exception stack(0xc1001f20 to 0xc1001f68)
1f20: ffffffff ffffffff 00000001 c011f840 c100e000 c100e000 c1009314 c1009370
1f40: c10f0c1a c0d5e564 c0f5da8c 00000000 00000000 c1001f70 c010f0bc c010f0c0
1f60: 600f0013 ffffffff
__irq_svc from arch_cpu_idle+0x30/0x3c
arch_cpu_idle from default_idle_call+0x44/0xac
default_idle_call from do_idle+0xc8/0x138
do_idle from cpu_startup_entry+0x18/0x1c
cpu_startup_entry from rest_init+0xcc/0x168
rest_init from arch_post_acpi_subsys_init+0x0/0x8
Fix this by using spin_lock_irqsave/spin_unlock_irqrestore also
inside lan966x_ptp_irq_handler.
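The locking pattern can be illustrated with a userspace toy model. Everything below is an assumption made for illustration: the model_* helpers only mimic the save/disable/restore semantics of the kernel primitives (the real spin_lock_irqsave() is a macro that takes flags by name, not by pointer), and struct ptp_model is not the driver's data structure. The point is that the handler's critical section runs with interrupts off, so a softirq that takes the xmit lock cannot preempt it on the same CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the per-CPU interrupt state; NOT kernel code. */
bool model_irqs_enabled = true;

typedef struct { bool locked; } model_spinlock_t;

/* Save the current interrupt state, disable interrupts, then take the lock. */
void model_spin_lock_irqsave(model_spinlock_t *lock, unsigned long *flags)
{
	*flags = model_irqs_enabled;
	model_irqs_enabled = false;
	assert(!lock->locked);	/* single-threaded model: lock must be free */
	lock->locked = true;
}

/* Drop the lock, then restore whatever interrupt state was saved. */
void model_spin_unlock_irqrestore(model_spinlock_t *lock, unsigned long flags)
{
	lock->locked = false;
	model_irqs_enabled = (flags != 0);
}

/* Hypothetical state shared between the xmit path and the IRQ thread. */
struct ptp_model {
	model_spinlock_t ts_id_lock;
	int pending;			/* timestamps waiting to be matched */
	bool irqs_on_under_lock;	/* recorded for inspection only */
};

/* Models the threaded handler: returns 1 if a timestamp was handled. */
int ptp_irq_thread_model(struct ptp_model *p)
{
	unsigned long flags;
	int handled = 0;

	model_spin_lock_irqsave(&p->ts_id_lock, &flags);
	p->irqs_on_under_lock = model_irqs_enabled;
	if (p->pending > 0) {
		p->pending--;
		handled = 1;
	}
	model_spin_unlock_irqrestore(&p->ts_id_lock, flags);
	return handled;
}
```

Because the interrupt state is saved and restored rather than unconditionally re-enabled, the same helper pair is safe whether or not the caller already runs with interrupts disabled.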
Fixes: e85a96e48e33 ("net: lan966x: Add support for ptp interrupts")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20230217210917.2649365-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 2c02d41d71f9 ("net/ulp: prevent ULP without clone op from entering
the LISTEN status") guarantees that all ULP listeners have clone() op, so
we no longer need to test it in inet_clone_ulp().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230217200920.85306-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We are growing the maintainer team for ieee802154 to spread the load for
review and general maintenance. Miquel has been driving the subsystem
forward over the last year and we would like to welcome him as a
maintainer.
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230218211317.284889-4-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Alan Ott has not been actively working on the driver or reviewing
patches for several years. I have been taking odd fixes in through the
wpan/ieee802154 tree. Update the MAINTAINERS file to reflect this
reality. I wanted to thank Alan for his work on the driver.
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Link: https://lore.kernel.org/r/20230218211317.284889-3-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Xue Liu has not been actively working on the driver or reviewing
patches for several years. I have been taking odd fixes in through the
wpan/ieee802154 tree. Update the MAINTAINERS file to reflect this
reality. I wanted to thank Xue Liu for his work on the driver.
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Link: https://lore.kernel.org/r/20230218211317.284889-2-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Varka Bhadram has not been actively working on the driver or reviewing
patches for several years. I have been taking odd fixes in through the
wpan/ieee802154 tree. Update the MAINTAINERS file to reflect this
reality. I wanted to thank Varka for his work on the driver.
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Link: https://lore.kernel.org/r/20230218211317.284889-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Although prev_hop is used only when cur_hop is not the first hop,
it is initialized unconditionally. Because the initialization
involves a dereference, the code may dereference uninitialized
memory, which KASAN has spotted. Fix it by reorganizing the
hop_cmp() logic.
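The reorganized ordering can be sketched with a simplified userspace comparator for bsearch(). This is a model under stated assumptions, not the scheduler code: the searched array holds hypothetical cumulative CPU counts per hop instead of cpumask pointers, and struct cmp_key, hop_cmp_model() and find_hop() are illustrative names. The key point is that the previous element is read only after checking that the current element is not the first one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical search key; `w` is an output filled in by the comparator. */
struct cmp_key {
	const unsigned int *base;	/* first element of the searched array */
	unsigned int cpu;		/* index of the CPU being looked up */
	unsigned int w;			/* CPUs already covered by the previous hop */
};

/* bsearch() comparator: `a` is the key, `b` points at the current hop. */
static int hop_cmp_model(const void *a, const void *b)
{
	struct cmp_key *k = (struct cmp_key *)a;
	const unsigned int *cur = b;

	if (*cur <= k->cpu)
		return 1;		/* wanted hop lies further right */

	if (cur == k->base) {		/* first hop: nothing before it to read */
		k->w = 0;
		return 0;
	}

	k->w = cur[-1];			/* safe: cur is not the first element */
	if (k->w <= k->cpu)
		return 0;

	return -1;			/* wanted hop lies further left */
}

/* Return the hop index containing `cpu`, or -1; *w gets the previous count. */
int find_hop(const unsigned int *cum, size_t n, unsigned int cpu, unsigned int *w)
{
	struct cmp_key k = { .base = cum, .cpu = cpu, .w = 0 };
	const unsigned int *hit;

	assert(cum != NULL && w != NULL);
	hit = bsearch(&k, cum, n, sizeof(*cum), hop_cmp_model);
	if (!hit)
		return -1;
	*w = k.w;
	return (int)(hit - cum);
}
```

With cumulative counts {2, 4, 8}, looking up CPU 0 matches the first hop without ever touching the element in front of the array.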
Reported-by: Bruno Goncalves <bgoncalv@redhat.com>
Fixes: cd7f55359c90 ("sched: add sched_numa_find_nth_cpu()")
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/Y+7avK6V9SyAWsXi@yury-laptop/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|