Age | Commit message (Collapse) | Author | Files | Lines |
|
Stafford and I have been using this to shake out OpenRISC bugs, and it's
been a great help, so it's time OpenRISC support for the WireGuard test
suite is made into a proper commit. The QEMU changes necessary for this
to work should also be around the corner now, and they seem some what
stationary in their interface too.
Cc: Stafford Horne <shorne@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This shoud open up various possibilities like time travel execution, and
is also just another platform to help shake out bugs.
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
In case push_rcu() and related functions are buggy, there's a
WARN_ON(len >= 128), which the selftest tries to hit by being tricky. In
case it is hit, we shouldn't corrupt the kernel's stack, though;
otherwise it may be hard to even receive the report that it's buggy. So
conditionalize the stack write based on that WARN_ON()'s return value.
Note that this never *actually* happens anyway. The WARN_ON() in the
first place is bounded by IS_ENABLED(DEBUG), and isn't expected to ever
actually hit. This is just a debugging sanity check.
Additionally, hoist the constant 128 into a named enum,
MAX_ALLOWEDIPS_BITS, so that it's clear why this value is chosen.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/all/CAHk-=wjJZGA6w_DxA+k7Ejbqsq+uGK==koPai3sqdsfJqemvag@mail.gmail.com/
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
The kernel.config and debug.config fragments in wireguard selftests mention
some config symbols that have been reworked:
Commit c5665868183f ("mm: kmemleak: use the memory pool for early
allocations") removes the config DEBUG_KMEMLEAK_EARLY_LOG_SIZE and since
then, the config's feature is available without further configuration.
Commit 4675ff05de2d ("kmemcheck: rip it out") removes kmemcheck and the
corresponding arch config HAVE_ARCH_KMEMCHECK. There is no need for this
config.
Commit 3bf195ae6037 ("netfilter: nat: merge nf_nat_ipv4,6 into nat core")
removes the config NF_NAT_IPV4 and since then, the config's feature is
available without further configuration.
Commit 41a2901e7d22 ("rcu: Remove SPARSE_RCU_POINTER Kconfig option")
removes the config SPARSE_RCU_POINTER and since then, the config's feature
is enabled by default.
Commit dfb4357da6dd ("time: Remove CONFIG_TIMER_STATS") removes the feature
and config CONFIG_TIMER_STATS without any replacement.
Commit 3ca17b1f3628 ("lib/ubsan: remove null-pointer checks") removes the
check and config UBSAN_NULL without any replacement.
Adjust the config fragments to those changes in configs.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Using msleep() is problematic because it's compared against
ratelimiter.c's ktime_get_coarse_boottime_ns(), which means on systems
with slow jiffies (such as UML's forced HZ=100), the result is
inaccurate. So switch to using schedule_hrtimeout().
However, hrtimer gives us access only to the traditional posix timers,
and none of the _COARSE variants. So now, rather than being too
imprecise like jiffies, it's too precise.
One solution would be to give it a large "range" value, but this will
still fire early on a loaded system. A better solution is to align the
timeout to the actual coarse timer, and then round up to the nearest
tick, plus change.
So add the timeout to the current coarse time, and then
schedule_hrtimer() until the absolute computed time.
This should hopefully reduce flakes in CI as well. Note that we keep the
retry loop in case the entire function is running behind, because the
test could still be scheduled out, by either the kernel or by the
hypervisor's kernel, in which case restarting the test and hoping to not
be scheduled out still helps.
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Ido Schimmel says:
====================
netdevsim: fib: Fix reference count leak on route deletion failure
Fix a recently reported netdevsim bug found using syzkaller.
Patch #1 fixes the bug.
Patch #2 adds a debugfs knob to allow us to test the fix.
Patch #3 adds test cases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add IPv4 and IPv6 test cases that ensure that we are not leaking a
reference on the nexthop device when we are unable to delete its
associated route.
Without the fix in a previous patch ("netdevsim: fib: Fix reference
count leak on route deletion failure") both test cases get stuck,
waiting for the reference to be released from the dummy device [1][2].
[1]
unregister_netdevice: waiting for dummy1 to become free. Usage count = 5
leaked reference.
fib_check_nh+0x275/0x620
fib_create_info+0x237c/0x4d30
fib_table_insert+0x1dd/0x1d20
inet_rtm_newroute+0x11b/0x200
rtnetlink_rcv_msg+0x43b/0xd20
netlink_rcv_skb+0x15e/0x430
netlink_unicast+0x53b/0x800
netlink_sendmsg+0x945/0xe40
____sys_sendmsg+0x747/0x960
___sys_sendmsg+0x11d/0x190
__sys_sendmsg+0x118/0x1e0
do_syscall_64+0x34/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
[2]
unregister_netdevice: waiting for dummy1 to become free. Usage count = 5
leaked reference.
fib6_nh_init+0xc46/0x1ca0
ip6_route_info_create+0x1167/0x19a0
ip6_route_add+0x27/0x150
inet6_rtm_newroute+0x161/0x170
rtnetlink_rcv_msg+0x43b/0xd20
netlink_rcv_skb+0x15e/0x430
netlink_unicast+0x53b/0x800
netlink_sendmsg+0x945/0xe40
____sys_sendmsg+0x747/0x960
___sys_sendmsg+0x11d/0x190
__sys_sendmsg+0x118/0x1e0
do_syscall_64+0x34/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The previous patch ("netdevsim: fib: Fix reference count leak on route
deletion failure") fixed a reference count leak that happens on route
deletion failure.
Such failures can only be simulated by injecting slab allocation
failures, which cannot be surgically injected.
In order to be able to specifically test this scenario, add a debugfs
knob that allows user space to fail route deletion requests when
enabled.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As part of FIB offload simulation, netdevsim stores IPv4 and IPv6 routes
and holds a reference on FIB info structures that in turn hold a
reference on the associated nexthop device(s).
In the unlikely case where we are unable to allocate memory to process a
route deletion request, netdevsim will not release the reference from
the associated FIB info structure, thereby preventing the associated
nexthop device(s) from ever being removed [1].
Fix this by scheduling a work item that will flush netdevsim's FIB table
upon route deletion failure. This will cause netdevsim to release its
reference from all the FIB info structures in its table.
Reported by Lucas Leong of Trend Micro Zero Day Initiative.
Fixes: 0ae3eb7b4611 ("netdevsim: fib: Perform the route programming in a non-atomic context")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The commit 3c82a21f4320 ("net: allow binding socket in a VRF when
there's an unbound socket") changed the inet socket lookup to avoid
packets in a VRF from matching an unbound socket. This is to ensure the
necessary isolation between the default and other VRFs for routing and
forwarding. VRF-unaware processes running in the default VRF cannot
access another VRF and have to be run with 'ip vrf exec <vrf>'. This is
to be expected with tcp_l3mdev_accept disabled, but could be reallowed
when this sysctl option is enabled. So instead of directly checking dif
and sdif in inet[6]_match, here call inet_sk_bound_dev_eq(). This
allows a match on unbound socket for non-zero sdif i.e. for packets in
a VRF, if tcp_l3mdev_accept is enabled.
Fixes: 3c82a21f4320 ("net: allow binding socket in a VRF when there's an unbound socket")
Signed-off-by: Mike Manning <mvrmanning@gmail.com>
Link: https://lore.kernel.org/netdev/a54c149aed38fded2d3b5fdb1a6c89e36a083b74.camel@lasnet.de/
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
While investigating a separate rose issue [1], and enabling
CONFIG_NET_DEV_REFCNT_TRACKER=y, Bernard reported an orthogonal ax25 issue [2]
An ax25_dev can be used by one (or many) struct ax25_cb.
We thus need different dev_tracker, one per struct ax25_cb.
After this patch is applied, we are able to focus on rose.
[1] https://lore.kernel.org/netdev/fb7544a1-f42e-9254-18cc-c9b071f4ca70@free.fr/
[2]
[ 205.798723] reference already released.
[ 205.798732] allocated in:
[ 205.798734] ax25_bind+0x1a2/0x230 [ax25]
[ 205.798747] __sys_bind+0xea/0x110
[ 205.798753] __x64_sys_bind+0x18/0x20
[ 205.798758] do_syscall_64+0x5c/0x80
[ 205.798763] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 205.798768] freed in:
[ 205.798770] ax25_release+0x115/0x370 [ax25]
[ 205.798778] __sock_release+0x42/0xb0
[ 205.798782] sock_close+0x15/0x20
[ 205.798785] __fput+0x9f/0x260
[ 205.798789] ____fput+0xe/0x10
[ 205.798792] task_work_run+0x64/0xa0
[ 205.798798] exit_to_user_mode_prepare+0x18b/0x190
[ 205.798804] syscall_exit_to_user_mode+0x26/0x40
[ 205.798808] do_syscall_64+0x69/0x80
[ 205.798812] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 205.798827] ------------[ cut here ]------------
[ 205.798829] WARNING: CPU: 2 PID: 2605 at lib/ref_tracker.c:136 ref_tracker_free.cold+0x60/0x81
[ 205.798837] Modules linked in: rose netrom mkiss ax25 rfcomm cmac algif_hash algif_skcipher af_alg bnep snd_hda_codec_hdmi nls_iso8859_1 i915 rtw88_8821ce rtw88_8821c x86_pkg_temp_thermal rtw88_pci intel_powerclamp rtw88_core snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio coretemp snd_hda_intel kvm_intel snd_intel_dspcfg mac80211 snd_hda_codec kvm i2c_algo_bit drm_buddy drm_dp_helper btusb drm_kms_helper snd_hwdep btrtl snd_hda_core btbcm joydev crct10dif_pclmul btintel crc32_pclmul ghash_clmulni_intel mei_hdcp btmtk intel_rapl_msr aesni_intel bluetooth input_leds snd_pcm crypto_simd syscopyarea processor_thermal_device_pci_legacy sysfillrect cryptd intel_soc_dts_iosf snd_seq sysimgblt ecdh_generic fb_sys_fops rapl libarc4 processor_thermal_device intel_cstate processor_thermal_rfim cec snd_timer ecc snd_seq_device cfg80211 processor_thermal_mbox mei_me processor_thermal_rapl mei rc_core at24 snd intel_pch_thermal intel_rapl_common ttm soundcore int340x_thermal_zone video
[ 205.798948] mac_hid acpi_pad sch_fq_codel ipmi_devintf ipmi_msghandler drm msr parport_pc ppdev lp parport ramoops pstore_blk reed_solomon pstore_zone efi_pstore ip_tables x_tables autofs4 hid_generic usbhid hid i2c_i801 i2c_smbus r8169 xhci_pci ahci libahci realtek lpc_ich xhci_pci_renesas [last unloaded: ax25]
[ 205.798992] CPU: 2 PID: 2605 Comm: ax25ipd Not tainted 5.18.11-F6BVP #3
[ 205.798996] Hardware name: To be filled by O.E.M. To be filled by O.E.M./CK3, BIOS 5.011 09/16/2020
[ 205.798999] RIP: 0010:ref_tracker_free.cold+0x60/0x81
[ 205.799005] Code: e8 d2 01 9b ff 83 7b 18 00 74 14 48 c7 c7 2f d7 ff 98 e8 10 6e fc ff 8b 7b 18 e8 b8 01 9b ff 4c 89 ee 4c 89 e7 e8 5d fd 07 00 <0f> 0b b8 ea ff ff ff e9 30 05 9b ff 41 0f b6 f7 48 c7 c7 a0 fa 4e
[ 205.799008] RSP: 0018:ffffaf5281073958 EFLAGS: 00010286
[ 205.799011] RAX: 0000000080000000 RBX: ffff9a0bd687ebe0 RCX: 0000000000000000
[ 205.799014] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 00000000ffffffff
[ 205.799016] RBP: ffffaf5281073a10 R08: 0000000000000003 R09: fffffffffffd5618
[ 205.799019] R10: 0000000000ffff10 R11: 000000000000000f R12: ffff9a0bc53384d0
[ 205.799022] R13: 0000000000000282 R14: 00000000ae000001 R15: 0000000000000001
[ 205.799024] FS: 0000000000000000(0000) GS:ffff9a0d0f300000(0000) knlGS:0000000000000000
[ 205.799028] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 205.799031] CR2: 00007ff6b8311554 CR3: 000000001ac10004 CR4: 00000000001706e0
[ 205.799033] Call Trace:
[ 205.799035] <TASK>
[ 205.799038] ? ax25_dev_device_down+0xd9/0x1b0 [ax25]
[ 205.799047] ? ax25_device_event+0x9f/0x270 [ax25]
[ 205.799055] ? raw_notifier_call_chain+0x49/0x60
[ 205.799060] ? call_netdevice_notifiers_info+0x52/0xa0
[ 205.799065] ? dev_close_many+0xc8/0x120
[ 205.799070] ? unregister_netdevice_many+0x13d/0x890
[ 205.799073] ? unregister_netdevice_queue+0x90/0xe0
[ 205.799076] ? unregister_netdev+0x1d/0x30
[ 205.799080] ? mkiss_close+0x7c/0xc0 [mkiss]
[ 205.799084] ? tty_ldisc_close+0x2e/0x40
[ 205.799089] ? tty_ldisc_hangup+0x137/0x210
[ 205.799092] ? __tty_hangup.part.0+0x208/0x350
[ 205.799098] ? tty_vhangup+0x15/0x20
[ 205.799103] ? pty_close+0x127/0x160
[ 205.799108] ? tty_release+0x139/0x5e0
[ 205.799112] ? __fput+0x9f/0x260
[ 205.799118] ax25_dev_device_down+0xd9/0x1b0 [ax25]
[ 205.799126] ax25_device_event+0x9f/0x270 [ax25]
[ 205.799135] raw_notifier_call_chain+0x49/0x60
[ 205.799140] call_netdevice_notifiers_info+0x52/0xa0
[ 205.799146] dev_close_many+0xc8/0x120
[ 205.799152] unregister_netdevice_many+0x13d/0x890
[ 205.799157] unregister_netdevice_queue+0x90/0xe0
[ 205.799161] unregister_netdev+0x1d/0x30
[ 205.799165] mkiss_close+0x7c/0xc0 [mkiss]
[ 205.799170] tty_ldisc_close+0x2e/0x40
[ 205.799173] tty_ldisc_hangup+0x137/0x210
[ 205.799178] __tty_hangup.part.0+0x208/0x350
[ 205.799184] tty_vhangup+0x15/0x20
[ 205.799188] pty_close+0x127/0x160
[ 205.799193] tty_release+0x139/0x5e0
[ 205.799199] __fput+0x9f/0x260
[ 205.799203] ____fput+0xe/0x10
[ 205.799208] task_work_run+0x64/0xa0
[ 205.799213] do_exit+0x33b/0xab0
[ 205.799217] ? __handle_mm_fault+0xc4f/0x15f0
[ 205.799224] do_group_exit+0x35/0xa0
[ 205.799228] __x64_sys_exit_group+0x18/0x20
[ 205.799232] do_syscall_64+0x5c/0x80
[ 205.799238] ? handle_mm_fault+0xba/0x290
[ 205.799242] ? debug_smp_processor_id+0x17/0x20
[ 205.799246] ? fpregs_assert_state_consistent+0x26/0x50
[ 205.799251] ? exit_to_user_mode_prepare+0x49/0x190
[ 205.799256] ? irqentry_exit_to_user_mode+0x9/0x20
[ 205.799260] ? irqentry_exit+0x33/0x40
[ 205.799263] ? exc_page_fault+0x87/0x170
[ 205.799268] ? asm_exc_page_fault+0x8/0x30
[ 205.799273] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 205.799277] RIP: 0033:0x7ff6b80eaca1
[ 205.799281] Code: Unable to access opcode bytes at RIP 0x7ff6b80eac77.
[ 205.799283] RSP: 002b:00007fff6dfd4738 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[ 205.799287] RAX: ffffffffffffffda RBX: 00007ff6b8215a00 RCX: 00007ff6b80eaca1
[ 205.799290] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000001
[ 205.799293] RBP: 0000000000000001 R08: ffffffffffffff80 R09: 0000000000000028
[ 205.799295] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ff6b8215a00
[ 205.799298] R13: 0000000000000000 R14: 00007ff6b821aee8 R15: 00007ff6b821af00
[ 205.799304] </TASK>
Fixes: feef318c855a ("ax25: fix UAF bugs of net_device caused by rebinding operation")
Reported-by: Bernard F6BVP <f6bvp@free.fr>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Duoming Zhou <duoming@zju.edu.cn>
Link: https://lore.kernel.org/r/20220728051821.3160118-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth and netfilter, no known blockers for
the release.
Current release - regressions:
- wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop(), fix
taking the lock before its initialized
- Bluetooth: mgmt: fix double free on error path
Current release - new code bugs:
- eth: ice: fix tunnel checksum offload with fragmented traffic
Previous releases - regressions:
- tcp: md5: fix IPv4-mapped support after refactoring, don't take the
pure v6 path
- Revert "tcp: change pingpong threshold to 3", improving detection
of interactive sessions
- mld: fix netdev refcount leak in mld_{query | report}_work() due to
a race
- Bluetooth:
- always set event mask on suspend, avoid early wake ups
- L2CAP: fix use-after-free caused by l2cap_chan_put
- bridge: do not send empty IFLA_AF_SPEC attribute
Previous releases - always broken:
- ping6: fix memleak in ipv6_renew_options()
- sctp: prevent null-deref caused by over-eager error paths
- virtio-net: fix the race between refill work and close, resulting
in NAPI scheduled after close and a BUG()
- macsec:
- fix three netlink parsing bugs
- avoid breaking the device state on invalid change requests
- fix a memleak in another error path
Misc:
- dt-bindings: net: ethernet-controller: rework 'fixed-link' schema
- two more batches of sysctl data race adornment"
* tag 'net-5.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (67 commits)
stmmac: dwmac-mediatek: fix resource leak in probe
ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr
net: ping6: Fix memleak in ipv6_renew_options().
net/funeth: Fix fun_xdp_tx() and XDP packet reclaim
sctp: leave the err path free in sctp_stream_init to sctp_stream_free
sfc: disable softirqs for ptp TX
ptp: ocp: Select CRC16 in the Kconfig.
tcp: md5: fix IPv4-mapped support
virtio-net: fix the race between refill work and close
mptcp: Do not return EINPROGRESS when subflow creation succeeds
Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put
Bluetooth: Always set event mask on suspend
Bluetooth: mgmt: Fix double free on error path
wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()
ice: do not setup vlan for loopback VSI
ice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS)
ice: Fix VSIs unable to share unicast MAC
ice: Fix tunnel checksum offload with fragmented traffic
ice: Fix max VLANs available for VF
netfilter: nft_queue: only allow supported familes and hooks
...
|
|
If mediatek_dwmac_clks_config() fails, then call stmmac_remove_config_dt()
before returning. Otherwise it is a resource leak.
Fixes: fa4b3ca60e80 ("stmmac: dwmac-mediatek: fix clock issue")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/YuJ4aZyMUlG6yGGa@kili
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Change net device's MTU to smaller than IPV6_MIN_MTU or unregister
device while matching route. That may trigger null-ptr-deref bug
for ip6_ptr probability as following.
=========================================================
BUG: KASAN: null-ptr-deref in find_match.part.0+0x70/0x134
Read of size 4 at addr 0000000000000308 by task ping6/263
CPU: 2 PID: 263 Comm: ping6 Not tainted 5.19.0-rc7+ #14
Call trace:
dump_backtrace+0x1a8/0x230
show_stack+0x20/0x70
dump_stack_lvl+0x68/0x84
print_report+0xc4/0x120
kasan_report+0x84/0x120
__asan_load4+0x94/0xd0
find_match.part.0+0x70/0x134
__find_rr_leaf+0x408/0x470
fib6_table_lookup+0x264/0x540
ip6_pol_route+0xf4/0x260
ip6_pol_route_output+0x58/0x70
fib6_rule_lookup+0x1a8/0x330
ip6_route_output_flags_noref+0xd8/0x1a0
ip6_route_output_flags+0x58/0x160
ip6_dst_lookup_tail+0x5b4/0x85c
ip6_dst_lookup_flow+0x98/0x120
rawv6_sendmsg+0x49c/0xc70
inet_sendmsg+0x68/0x94
Reproducer as following:
Firstly, prepare conditions:
$ip netns add ns1
$ip netns add ns2
$ip link add veth1 type veth peer name veth2
$ip link set veth1 netns ns1
$ip link set veth2 netns ns2
$ip netns exec ns1 ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1
$ip netns exec ns2 ip -6 addr add 2001:0db8:0:f101::2/64 dev veth2
$ip netns exec ns1 ifconfig veth1 up
$ip netns exec ns2 ifconfig veth2 up
$ip netns exec ns1 ip -6 route add 2000::/64 dev veth1 metric 1
$ip netns exec ns2 ip -6 route add 2001::/64 dev veth2 metric 1
Secondly, execute the following two commands in two ssh windows
respectively:
$ip netns exec ns1 sh
$while true; do ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1; ip -6 route add 2000::/64 dev veth1 metric 1; ping6 2000::2; done
$ip netns exec ns1 sh
$while true; do ip link set veth1 mtu 1000; ip link set veth1 mtu 1500; sleep 5; done
It is because ip6_ptr has been assigned to NULL in addrconf_ifdown() firstly,
then ip6_ignore_linkdown() accesses ip6_ptr directly without NULL check.
cpu0 cpu1
fib6_table_lookup
__find_rr_leaf
addrconf_notify [ NETDEV_CHANGEMTU ]
addrconf_ifdown
RCU_INIT_POINTER(dev->ip6_ptr, NULL)
find_match
ip6_ignore_linkdown
So we can add NULL check for ip6_ptr before using in ip6_ignore_linkdown() to
fix the null-ptr-deref bug.
Fixes: dcd1f572954f ("net/ipv6: Remove fib6_idev")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220728013307.656257-1-william.xuanziyang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When we close ping6 sockets, some resources are left unfreed because
pingv6_prot is missing sk->sk_prot->destroy(). As reported by
syzbot [0], just three syscalls leak 96 bytes and easily cause OOM.
struct ipv6_sr_hdr *hdr;
char data[24] = {0};
int fd;
hdr = (struct ipv6_sr_hdr *)data;
hdr->hdrlen = 2;
hdr->type = IPV6_SRCRT_TYPE_4;
fd = socket(AF_INET6, SOCK_DGRAM, NEXTHDR_ICMP);
setsockopt(fd, IPPROTO_IPV6, IPV6_RTHDR, data, 24);
close(fd);
To fix memory leaks, let's add a destroy function.
Note the socket() syscall checks if the GID is within the range of
net.ipv4.ping_group_range. The default value is [1, 0] so that no
GID meets the condition (1 <= GID <= 0). Thus, the local DoS does
not succeed until we change the default value. However, at least
Ubuntu/Fedora/RHEL loosen it.
$ cat /usr/lib/sysctl.d/50-default.conf
...
-net.ipv4.ping_group_range = 0 2147483647
Also, there could be another path reported with these options, and
some of them require CAP_NET_RAW.
setsockopt
IPV6_ADDRFORM (inet6_sk(sk)->pktoptions)
IPV6_RECVPATHMTU (inet6_sk(sk)->rxpmtu)
IPV6_HOPOPTS (inet6_sk(sk)->opt)
IPV6_RTHDRDSTOPTS (inet6_sk(sk)->opt)
IPV6_RTHDR (inet6_sk(sk)->opt)
IPV6_DSTOPTS (inet6_sk(sk)->opt)
IPV6_2292PKTOPTIONS (inet6_sk(sk)->opt)
getsockopt
IPV6_FLOWLABEL_MGR (inet6_sk(sk)->ipv6_fl_list)
For the record, I left a different splat with syzbot's one.
unreferenced object 0xffff888006270c60 (size 96):
comm "repro2", pid 231, jiffies 4294696626 (age 13.118s)
hex dump (first 32 bytes):
01 00 00 00 44 00 00 00 00 00 00 00 00 00 00 00 ....D...........
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000f6bc7ea9>] sock_kmalloc (net/core/sock.c:2564 net/core/sock.c:2554)
[<000000006d699550>] do_ipv6_setsockopt.constprop.0 (net/ipv6/ipv6_sockglue.c:715)
[<00000000c3c3b1f5>] ipv6_setsockopt (net/ipv6/ipv6_sockglue.c:1024)
[<000000007096a025>] __sys_setsockopt (net/socket.c:2254)
[<000000003a8ff47b>] __x64_sys_setsockopt (net/socket.c:2265 net/socket.c:2262 net/socket.c:2262)
[<000000007c409dcb>] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[<00000000e939c4a9>] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[0]: https://syzkaller.appspot.com/bug?extid=a8430774139ec3ab7176
Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.")
Reported-by: syzbot+a8430774139ec3ab7176@syzkaller.appspotmail.com
Reported-by: Ayushman Dutta <ayudutta@amazon.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220728012220.46918-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If a watch is being added to a queue, it needs to guard against
interference from addition of a new watch, manual removal of a watch and
removal of a watch due to some other queue being destroyed.
KEYCTL_WATCH_KEY guards against this for the same {key,queue} pair by
holding the key->sem writelocked and by holding refs on both the key and
the queue - but that doesn't prevent interaction from other {key,queue}
pairs.
While add_watch_to_object() does take the spinlock on the event queue,
it doesn't take the lock on the source's watch list. The assumption was
that the caller would prevent that (say by taking key->sem) - but that
doesn't prevent interference from the destruction of another queue.
Fix this by locking the watcher list in add_watch_to_object().
Fixes: c73be61cede5 ("pipe: Add general notification queue support")
Reported-by: syzbot+03d7b43290037d1f87ca@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: keyrings@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Since __post_watch_notification() walks wlist->watchers with only the
RCU read lock held, we need to use RCU methods to add to the list (we
already use RCU methods to remove from the list).
Fix add_watch_to_object() to use hlist_add_head_rcu() instead of
hlist_add_head() for that list.
Fixes: c73be61cede5 ("pipe: Add general notification queue support")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The current implementation of fun_xdp_tx(), used for XPD_TX, is
incorrect in that it takes an address/length pair and later releases it
with page_frag_free(). It is OK for XDP_TX but the same code is used by
ndo_xdp_xmit. In that case it loses the XDP memory type and releases the
packet incorrectly for some of the types. Assorted breakage follows.
Change fun_xdp_tx() to take xdp_frame and rely on xdp_return_frame() in
reclaim.
Fixes: db37bc177dae ("net/funeth: add the data path")
Signed-off-by: Dimitris Michailidis <dmichail@fungible.com>
Link: https://lore.kernel.org/r/20220726215923.7887-1-dmichail@fungible.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-07-26
This series contains updates to ice driver only.
Przemyslaw corrects accounting for VF VLANs to allow for correct number
of VLANs for untrusted VF. He also correct issue with checksum offload
on VXLAN tunnels.
Ani allows for two VSIs to share the same MAC address.
Maciej corrects checked bits for descriptor completion of loopback
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: do not setup vlan for loopback VSI
ice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS)
ice: Fix VSIs unable to share unicast MAC
ice: Fix tunnel checksum offload with fragmented traffic
ice: Fix max VLANs available for VF
====================
Link: https://lore.kernel.org/r/20220726204646.2171589-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A NULL pointer dereference was reported by Wei Chen:
BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:__list_del_entry_valid+0x26/0x80
Call Trace:
<TASK>
sctp_sched_dequeue_common+0x1c/0x90
sctp_sched_prio_dequeue+0x67/0x80
__sctp_outq_teardown+0x299/0x380
sctp_outq_free+0x15/0x20
sctp_association_free+0xc3/0x440
sctp_do_sm+0x1ca7/0x2210
sctp_assoc_bh_rcv+0x1f6/0x340
This happens when calling sctp_sendmsg without connecting to server first.
In this case, a data chunk already queues up in send queue of client side
when processing the INIT_ACK from server in sctp_process_init() where it
calls sctp_stream_init() to alloc stream_in. If it fails to alloc stream_in
all stream_out will be freed in sctp_stream_init's err path. Then in the
asoc freeing it will crash when dequeuing this data chunk as stream_out
is missing.
As we can't free stream out before dequeuing all data from send queue, and
this patch is to fix it by moving the err path stream_out/in freeing in
sctp_stream_init() to sctp_stream_free() which is eventually called when
freeing the asoc in sctp_association_free(). This fix also makes the code
in sctp_process_init() more clear.
Note that in sctp_association_init() when it fails in sctp_stream_init(),
sctp_association_free() will not be called, and in that case it should
go to 'stream_free' err path to free stream instead of 'fail_init'.
Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations")
Reported-by: Wei Chen <harperchen1110@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/831a3dc100c4908ff76e5bcc363be97f2778bc0b.1658787066.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Sending a PTP packet can imply to use the normal TX driver datapath but
invoked from the driver's ptp worker. The kernel generic TX code
disables softirqs and preemption before calling specific driver TX code,
but the ptp worker does not. Although current ptp driver functionality
does not require it, there are several reasons for doing so:
1) The invoked code is always executed with softirqs disabled for non
PTP packets.
2) Better if a ptp packet transmission is not interrupted by softirq
handling which could lead to high latencies.
3) netdev_xmit_more used by the TX code requires preemption to be
disabled.
Indeed a solution for dealing with kernel preemption state based on static
kernel configuration is not possible since the introduction of dynamic
preemption level configuration at boot time using the static calls
functionality.
Fixes: f79c957a0b537 ("drivers: net: sfc: use netdev_xmit_more helper")
Signed-off-by: Alejandro Lucero <alejandro.lucero-palau@amd.com>
Link: https://lore.kernel.org/r/20220726064504.49613-1-alejandro.lucero-palau@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The crc16() function is used to check the firmware validity, but
the library was not explicitly selected.
Fixes: 3c3673bde50c ("ptp: ocp: Add firmware header checks")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Vadim Fedorenko <vadfed@fb.com>
Link: https://lore.kernel.org/r/20220726220604.1339972-1-jonathan.lemon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After the blamed commit, IPv4 SYN packets handled
by a dual stack IPv6 socket are dropped, even if
perfectly valid.
$ nstat | grep MD5
TcpExtTCPMD5Failure 5 0.0
For a dual stack listener, an incoming IPv4 SYN packet
would call tcp_inbound_md5_hash() with @family == AF_INET,
while tp->af_specific is pointing to tcp_sock_ipv6_specific.
Only later when an IPv4-mapped child is created, tp->af_specific
is changed to tcp_sock_ipv6_mapped_specific.
Fixes: 7bbb765b7349 ("net/tcp: Merge TCP-MD5 inbound callbacks")
Reported-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Tested-by: Leonard Crestez <cdleonard@gmail.com>
Link: https://lore.kernel.org/r/20220726115743.2759832-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull asm-generic fixes from Arnd Bergmann:
"Two more bug fixes for asm-generic, one addressing an incorrect
Kconfig symbol reference and another one fixing a build failure for
the perf tool on mips and possibly others"
* tag 'asm-generic-fixes-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: remove a broken and needless ifdef conditional
tools: Fixed MIPS builds due to struct flock re-definition
|
|
Pull ARM SoC fixes from Arnd Bergmann:
"One last set of changes for the soc tree:
- fix clock frequency on lan966x
- fix incorrect GPIO numbers on some pxa machines
- update Baolin's email address"
* tag 'soc-fixes-5.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
ARM: pxa2xx: Fix GPIO descriptor tables
mailmap: update Baolin Wang's email
ARM: dts: lan966x: fix sys_clk frequency
|
|
We try using cancel_delayed_work_sync() to prevent the work from
enabling NAPI. This is insufficient since we don't disable the source
of the refill work scheduling. This means an NAPI poll callback after
cancel_delayed_work_sync() can schedule the refill work then can
re-enable the NAPI that leads to use-after-free [1].
Since the work can enable NAPI, we can't simply disable NAPI before
calling cancel_delayed_work_sync(). So fix this by introducing a
dedicated boolean to control whether or not the work could be
scheduled from NAPI.
[1]
==================================================================
BUG: KASAN: use-after-free in refill_work+0x43/0xd4
Read of size 2 at addr ffff88810562c92e by task kworker/2:1/42
CPU: 2 PID: 42 Comm: kworker/2:1 Not tainted 5.19.0-rc1+ #480
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: events refill_work
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x44
print_report.cold+0xbb/0x6ac
? _printk+0xad/0xde
? refill_work+0x43/0xd4
kasan_report+0xa8/0x130
? refill_work+0x43/0xd4
refill_work+0x43/0xd4
process_one_work+0x43d/0x780
worker_thread+0x2a0/0x6f0
? process_one_work+0x780/0x780
kthread+0x167/0x1a0
? kthread_exit+0x50/0x50
ret_from_fork+0x22/0x30
</TASK>
...
Fixes: b2baed69e605c ("virtio_net: set/cancel work on ndo_open/ndo_stop")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
New subflows are created within the kernel using O_NONBLOCK, so
EINPROGRESS is the expected return value from kernel_connect().
__mptcp_subflow_connect() has the correct logic to consider EINPROGRESS
to be a successful case, but it has also used that error code as its
return value.
Before v5.19 this was benign: all the callers ignored the return
value. Starting in v5.19 there is a MPTCP_PM_CMD_SUBFLOW_CREATE generic
netlink command that does use the return value, so the EINPROGRESS gets
propagated to userspace.
Make __mptcp_subflow_connect() always return 0 on success instead.
Fixes: ec3edaa7ca6c ("mptcp: Add handling of outgoing MP_JOIN requests")
Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Link: https://lore.kernel.org/r/20220725205231.87529-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Florian Westphal says:
====================
netfilter updates for net
Three late fixes for netfilter:
1) If nf_queue user requests packet truncation below size of l3 header,
we corrupt the skb, then crash. Reject such requests.
2) add cond_resched() calls when doing cycle detection in the
nf_tables graph. This avoids softlockup warning with certain
rulesets.
3) Reject rulesets that use nftables 'queue' expression in family/chain
combinations other than those that are supported. Currently the ruleset
will load, but when userspace attempts to reinject you get WARN splat +
packet drops.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_queue: only allow supported familes and hooks
netfilter: nf_tables: add rescheduling points during loop detection walks
netfilter: nf_queue: do not allow packet truncation below transport header offset
====================
Link: https://lore.kernel.org/r/20220726192056.13497-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- Fix early wakeup after suspend
- Fix double free on error
- Fix use-after-free on l2cap_chan_put
* tag 'for-net-2022-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put
Bluetooth: Always set event mask on suspend
Bluetooth: mgmt: Fix double free on error path
====================
Link: https://lore.kernel.org/r/20220726221328.423714-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull misc fixes from Andrew Morton:
"Thirteen hotfixes.
Eight are cc:stable and the remainder are for post-5.18 issues or are
too minor to warrant backporting"
* tag 'mm-hotfixes-stable-2022-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mailmap: update Gao Xiang's email addresses
userfaultfd: provide properly masked address for huge-pages
Revert "ocfs2: mount shared volume without ha stack"
hugetlb: fix memoryleak in hugetlb_mcopy_atomic_pte
fs: sendfile handles O_NONBLOCK of out_fd
ntfs: fix use-after-free in ntfs_ucsncmp()
secretmem: fix unhandled fault in truncate
mm/hugetlb: separate path for hwpoison entry in copy_hugetlb_page_range()
mm: fix missing wake-up event for FSDAX pages
mm: fix page leak with multiple threads mapping the same page
mailmap: update Seth Forshee's email address
tmpfs: fix the issue that the mount and remount results are inconsistent.
mm: kfence: apply kmemleak_ignore_phys on early allocated pool
|
|
I've been in Alibaba Cloud for more than one year, mainly to address
cloud-native challenges (such as high-performance container images) for
open source communities.
Update my email addresses on behalf of my current employer (Alibaba Cloud)
to support all my (team) work in this area. Also add an outdated
@redhat.com address of me.
Link: https://lkml.kernel.org/r/20220719154246.62970-1-xiang@kernel.org
Signed-off-by: Gao Xiang <xiang@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Commit 824ddc601adc ("userfaultfd: provide unmasked address on
page-fault") was introduced to fix an old bug, in which the offset in the
address of a page-fault was masked. Concerns were raised - although were
never backed by actual code - that some userspace code might break because
the bug has been around for quite a while. To address these concerns a
new flag was introduced, and only when this flag is set by the user,
userfaultfd provides the exact address of the page-fault.
The commit however had a bug, and if the flag is unset, the offset was
always masked based on a base-page granularity. Yet, for huge-pages, the
behavior prior to the commit was that the address is masked to the
huge-page granulrity.
While there are no reports on real breakage, fix this issue. If the flag
is unset, use the address with the masking that was done before.
Link: https://lkml.kernel.org/r/20220711165906.2682-1-namit@vmware.com
Fixes: 824ddc601adc ("userfaultfd: provide unmasked address on page-fault")
Signed-off-by: Nadav Amit <namit@vmware.com>
Reported-by: James Houghton <jthoughton@google.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: James Houghton <jthoughton@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This fixes the following trace which is caused by hci_rx_work starting up
*after* the final channel reference has been put() during sock_close() but
*before* the references to the channel have been destroyed, so instead
the code now rely on kref_get_unless_zero/l2cap_chan_hold_unless_zero to
prevent referencing a channel that is about to be destroyed.
refcount_t: increment on 0; use-after-free.
BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0
Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705
CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W
4.14.234-00003-g1fb6d0bd49a4-dirty #28
Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150
Google Inc. MSM sm8150 Flame DVT (DT)
Workqueue: hci0 hci_rx_work
Call trace:
dump_backtrace+0x0/0x378
show_stack+0x20/0x2c
dump_stack+0x124/0x148
print_address_description+0x80/0x2e8
__kasan_report+0x168/0x188
kasan_report+0x10/0x18
__asan_load4+0x84/0x8c
refcount_dec_and_test+0x20/0xd0
l2cap_chan_put+0x48/0x12c
l2cap_recv_frame+0x4770/0x6550
l2cap_recv_acldata+0x44c/0x7a4
hci_acldata_packet+0x100/0x188
hci_rx_work+0x178/0x23c
process_one_work+0x35c/0x95c
worker_thread+0x4cc/0x960
kthread+0x1a8/0x1c4
ret_from_fork+0x10/0x18
Cc: stable@kernel.org
Reported-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Tested-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
When suspending, always set the event mask once disconnects are
successful. Otherwise, if wakeup is disallowed, the event mask is not
set before suspend continues and can result in an early wakeup.
Fixes: 182ee45da083 ("Bluetooth: hci_sync: Rework hci_suspend_notifier")
Cc: stable@vger.kernel.org
Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Don't call mgmt_pending_remove() twice (double free).
Fixes: 6b88eff43704 ("Bluetooth: hci_sync: Refactor remove Adv Monitor")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
lockdep complains use of uninitialized spinlock at ieee80211_do_stop() [1],
for commit f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif
that is being stopped") guards clear_bit() using fq.lock even before
fq_init() from ieee80211_txq_setup_flows() initializes this spinlock.
According to discussion [2], Toke was not happy with expanding usage of
fq.lock. Since __ieee80211_wake_txqs() is called under RCU read lock, we
can instead use synchronize_rcu() for flushing ieee80211_wake_txqs().
Link: https://syzkaller.appspot.com/bug?extid=eceab52db7c4b961e9d6 [1]
Link: https://lkml.kernel.org/r/874k0zowh2.fsf@toke.dk [2]
Reported-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif that is being stopped")
Tested-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@kernel.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/9cc9b81d-75a3-3925-b612-9d0ad3cab82b@I-love.SAKURA.ne.jp
[ pick up commit 3598cb6e1862 ("wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()") from -next]
Link: https://lore.kernel.org/all/87o7xcq6qt.fsf@kernel.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently loopback test is failiing due to the error returned from
ice_vsi_vlan_setup(). Skip calling it when preparing loopback VSI.
Fixes: 0e674aeb0b77 ("ice: Add handler for ethtool selftest")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Tx side sets EOP and RS bits on descriptors to indicate that a
particular descriptor is the last one and needs to generate an irq when
it was sent. These bits should not be checked on completion path
regardless whether it's the Tx or the Rx. DD bit serves this purpose and
it indicates that a particular descriptor is either for Rx or was
successfully Txed. EOF is also set as loopback test does not xmit
fragmented frames.
Look at (DD | EOF) bits setting in ice_lbtest_receive_frames() instead
of EOP and RS pair.
Fixes: 0e674aeb0b77 ("ice: Add handler for ethtool selftest")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The driver currently does not allow two VSIs in the same PF domain
to have the same unicast MAC address. This is incorrect in the sense
that a policy decision is being made in the driver when it must be
left to the user. This approach was causing issues when rebooting
the system with VFs spawned not being able to change their MAC addresses.
Such errors were present in dmesg:
[ 7921.068237] ice 0000:b6:00.2 ens2f2: Unicast MAC 6a:0d:e4:70:ca:d1 already
exists on this PF. Preventing setting VF 7 unicast MAC address to 6a:0d:e4:70:ca:d1
Fix that by removing this restriction. Doing this also allows
us to remove some additional code that's checking if a unicast MAC
filter already exists.
Fixes: 47ebc7b02485 ("ice: Check if unicast MAC exists before setting VF MAC")
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Fix checksum offload on VXLAN tunnels.
In case, when mpls protocol is not used, set l4 header to transport
header of skb. This fixes case, when user tries to offload checksums
of VXLAN tunneled traffic.
Steps for reproduction (requires link partner with tunnels):
ip l s enp130s0f0 up
ip a f enp130s0f0
ip a a 10.10.110.2/24 dev enp130s0f0
ip l s enp130s0f0 mtu 1600
ip link add vxlan12_sut type vxlan id 12 group 238.168.100.100 dev enp130s0f0 dstport 4789
ip l s vxlan12_sut up
ip a a 20.10.110.2/24 dev vxlan12_sut
iperf3 -c 20.10.110.1 #should connect
Offload params: td_offset, cd_tunnel_params were
corrupted, due to l4 header pointing wrong address. NIC would then drop
those packets internally, due to incorrect TX descriptor data,
which increased GLV_TEPC register.
Fixes: 69e66c04c672 ("ice: Add mpls+tso support")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Legacy VLAN implementation allows for untrusted VF to have 8 VLAN
filters, not counting VLAN 0 filters. Current VLAN_V2 implementation
lowers available filters for VF, by counting in VLAN 0 filter for both
TPIDs.
Fix this by counting only non zero VLAN filters.
Without this patch, untrusted VF would not be able to access 8 VLAN
filters.
Fixes: cc71de8fa133 ("ice: Add support for VIRTCHNL_VF_OFFLOAD_VLAN_V2")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Trying to use 'queue' statement in ingress (for example)
triggers a splat on reinject:
WARNING: CPU: 3 PID: 1345 at net/netfilter/nf_queue.c:291
... because nf_reinject cannot find the ruleset head.
The netdev family doesn't support async resume at the moment anyway,
so disallow loading such rulesets with a more appropriate
error message.
v2: add 'validate' callback and also check hook points, v1 did
allow ingress use in 'table inet', but that doesn't work either. (Pablo)
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Add explicit rescheduling points during ruleset walk.
Switching to a faster algorithm is possible but this is a much
smaller change, suitable for nf tree.
Link: https://bugzilla.netfilter.org/show_bug.cgi?id=1460
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Domingo Dirutigliano and Nicola Guerrera report kernel panic when
sending nf_queue verdict with 1-byte nfta_payload attribute.
The IP/IPv6 stack pulls the IP(v6) header from the packet after the
input hook.
If user truncates the packet below the header size, this skb_pull() will
result in a malformed skb (skb->len < 0).
Fixes: 7af4cc3fa158 ("[NETFILTER]: Add "nfnetlink_queue" netfilter queue handler over nfnetlink")
Reported-by: Domingo Dirutigliano <pwnzer0tt1@proton.me>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Pull s390 fix from Alexander GordeevL
- Prevent relatively slow PRNO TRNG random number operation from being
called from interrupt context. That could for example cause some
network loads to timeout.
* tag 's390-5.19-7' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/archrandom: prevent CPACF trng invocations in interrupt context
|
|
The vmf->page can be NULL when the wp_page_reuse() is invoked by
wp_pfn_shared(), it will cause the following panic:
BUG: kernel NULL pointer dereference, address: 000000000000008
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 18 PID: 923 Comm: Xorg Not tainted 5.19.0-rc8.bm.1-amd64 #263
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g14
RIP: 0010:_compound_head+0x0/0x40
[...]
Call Trace:
wp_page_reuse+0x1c/0xa0
do_wp_page+0x1a5/0x3f0
__handle_mm_fault+0x8cf/0xd20
handle_mm_fault+0xd5/0x2a0
do_user_addr_fault+0x1d0/0x680
exc_page_fault+0x78/0x170
asm_exc_page_fault+0x22/0x30
To fix it, this patch performs a NULL pointer check before dereferencing
the vmf->page.
Fixes: 6c287605fd56 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive")
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
After commit b6c02ef54913 ("bridge: Netlink interface fix."),
br_fill_ifinfo() started to send an empty IFLA_AF_SPEC attribute when a
bridge vlan dump is requested but an interface does not have any vlans
configured.
iproute2 ignores such an empty attribute since commit b262a9becbcb
("bridge: Fix output with empty vlan lists") but older iproute2 versions as
well as other utilities have their output changed by the cited kernel
commit, resulting in failed test cases. Regardless, emitting an empty
attribute is pointless and inefficient.
Avoid this change by canceling the attribute if no AF_SPEC data was added.
Fixes: b6c02ef54913 ("bridge: Netlink interface fix.")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20220725001236.95062-1-bpoirier@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Subbaraya Sundeep says:
====================
Octeontx2 minor tc fixes
This patch set fixes two problems found in tc code
wrt to ratelimiting and when installing UDP/TCP filters.
Patch 1: CN10K has different register format compared to
CN9xx hence fixes that.
Patch 2: Check flow mask also before installing a src/dst
port filter, otherwise installing for one port installs for other one too.
====================
Link: https://lore.kernel.org/r/1658650874-16459-1-git-send-email-sbhatta@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Check the mask for non-zero value before installing tc filters
for L4 source and destination ports. Otherwise installing a
filter for source port installs destination port too and
vice-versa.
Fixes: 1d4d9e42c240 ("octeontx2-pf: Add tc flower hardware offload on ingress traffic")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
NIX_AF_TLXX_PIR/CIR register format has changed from OcteonTx2
to CN10K. CN10K supports larger burst size. Fix burst exponent
and burst mantissa configuration for CN10K.
Also fixed 'maxrate' from u32 to u64 since 'police.rate_bytes_ps'
passed by stack is also u64.
Fixes: e638a83f167e ("octeontx2-pf: TC_MATCHALL egress ratelimiting offload")
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|