Age | Commit message (Collapse) | Author | Files | Lines |
|
It seems that rtnl_group_changelink() can call do_setlink
while a prior call to validate_linkmsg(dev = NULL, ...) could
not validate IFLA_ADDRESS / IFLA_BROADCAST
Make sure do_setlink() calls validate_linkmsg() instead
of letting its callers having this responsibility.
With help from Dmitry Vyukov, thanks a lot !
BUG: KMSAN: uninit-value in is_valid_ether_addr include/linux/etherdevice.h:199 [inline]
BUG: KMSAN: uninit-value in eth_prepare_mac_addr_change net/ethernet/eth.c:275 [inline]
BUG: KMSAN: uninit-value in eth_mac_addr+0x203/0x2b0 net/ethernet/eth.c:308
CPU: 1 PID: 8695 Comm: syz-executor3 Not tainted 4.17.0-rc5+ #103
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x185/0x1d0 lib/dump_stack.c:113
kmsan_report+0x149/0x260 mm/kmsan/kmsan.c:1084
__msan_warning_32+0x6e/0xc0 mm/kmsan/kmsan_instr.c:686
is_valid_ether_addr include/linux/etherdevice.h:199 [inline]
eth_prepare_mac_addr_change net/ethernet/eth.c:275 [inline]
eth_mac_addr+0x203/0x2b0 net/ethernet/eth.c:308
dev_set_mac_address+0x261/0x530 net/core/dev.c:7157
do_setlink+0xbc3/0x5fc0 net/core/rtnetlink.c:2317
rtnl_group_changelink net/core/rtnetlink.c:2824 [inline]
rtnl_newlink+0x1fe9/0x37a0 net/core/rtnetlink.c:2976
rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646
netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448
rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664
netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336
netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901
sock_sendmsg_nosec net/socket.c:629 [inline]
sock_sendmsg net/socket.c:639 [inline]
___sys_sendmsg+0xec0/0x1310 net/socket.c:2117
__sys_sendmsg net/socket.c:2155 [inline]
__do_sys_sendmsg net/socket.c:2164 [inline]
__se_sys_sendmsg net/socket.c:2162 [inline]
__x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x455a09
RSP: 002b:00007fc07480ec68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fc07480f6d4 RCX: 0000000000455a09
RDX: 0000000000000000 RSI: 00000000200003c0 RDI: 0000000000000014
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000000005d0 R14: 00000000006fdc20 R15: 0000000000000000
Uninit was stored to memory at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline]
kmsan_save_stack mm/kmsan/kmsan.c:294 [inline]
kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:685
kmsan_memcpy_origins+0x11d/0x170 mm/kmsan/kmsan.c:527
__msan_memcpy+0x109/0x160 mm/kmsan/kmsan_instr.c:478
do_setlink+0xb84/0x5fc0 net/core/rtnetlink.c:2315
rtnl_group_changelink net/core/rtnetlink.c:2824 [inline]
rtnl_newlink+0x1fe9/0x37a0 net/core/rtnetlink.c:2976
rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646
netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448
rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664
netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336
netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901
sock_sendmsg_nosec net/socket.c:629 [inline]
sock_sendmsg net/socket.c:639 [inline]
___sys_sendmsg+0xec0/0x1310 net/socket.c:2117
__sys_sendmsg net/socket.c:2155 [inline]
__do_sys_sendmsg net/socket.c:2164 [inline]
__se_sys_sendmsg net/socket.c:2162 [inline]
__x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline]
kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:189
kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:315
kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan.c:322
slab_post_alloc_hook mm/slab.h:446 [inline]
slab_alloc_node mm/slub.c:2753 [inline]
__kmalloc_node_track_caller+0xb32/0x11b0 mm/slub.c:4395
__kmalloc_reserve net/core/skbuff.c:138 [inline]
__alloc_skb+0x2cb/0x9e0 net/core/skbuff.c:206
alloc_skb include/linux/skbuff.h:988 [inline]
netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
netlink_sendmsg+0x76e/0x1350 net/netlink/af_netlink.c:1876
sock_sendmsg_nosec net/socket.c:629 [inline]
sock_sendmsg net/socket.c:639 [inline]
___sys_sendmsg+0xec0/0x1310 net/socket.c:2117
__sys_sendmsg net/socket.c:2155 [inline]
__do_sys_sendmsg net/socket.c:2164 [inline]
__se_sys_sendmsg net/socket.c:2162 [inline]
__x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: e7ed828f10bd ("netlink: support setting devgroup parameters")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Return messages in extack for port split/unsplit errors. e.g.,
$ devlink port split swp1s1 count 4
Error: mlxsw_spectrum: Port cannot be split further.
devlink answers: Invalid argument
$ devlink port unsplit swp4
Error: mlxsw_spectrum: Port was not split.
devlink answers: Invalid argument
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
devlink reset command can fail if a FIB resource limit is set to a value
lower than the current occupancy. Return a proper message indicating the
reason for the failure.
$ devlink resource sh netdevsim/netdevsim0
netdevsim/netdevsim0:
name IPv4 size unlimited unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
resources:
name fib size unlimited occ 43 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
name fib-rules size unlimited occ 4 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
name IPv6 size unlimited unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
resources:
name fib size unlimited occ 54 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
name fib-rules size unlimited occ 3 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
$ devlink resource set netdevsim/netdevsim0 path /IPv4/fib size 40
$ devlink dev reload netdevsim/netdevsim0
Error: netdevsim: New size is less than current occupancy.
devlink answers: Invalid argument
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add extack argument to reload, port_split and port_unsplit operations.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Before using nla_get_u32(), better make sure the attribute
is of the proper size.
Code recently was changed, but bug has been there from beginning
of git.
BUG: KMSAN: uninit-value in rtnetlink_put_metrics+0x553/0x960 net/core/rtnetlink.c:746
CPU: 1 PID: 14139 Comm: syz-executor6 Not tainted 4.17.0-rc5+ #103
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x185/0x1d0 lib/dump_stack.c:113
kmsan_report+0x149/0x260 mm/kmsan/kmsan.c:1084
__msan_warning_32+0x6e/0xc0 mm/kmsan/kmsan_instr.c:686
rtnetlink_put_metrics+0x553/0x960 net/core/rtnetlink.c:746
fib_dump_info+0xc42/0x2190 net/ipv4/fib_semantics.c:1361
rtmsg_fib+0x65f/0x8c0 net/ipv4/fib_semantics.c:419
fib_table_insert+0x2314/0x2b50 net/ipv4/fib_trie.c:1287
inet_rtm_newroute+0x210/0x340 net/ipv4/fib_frontend.c:779
rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646
netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448
rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664
netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336
netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901
sock_sendmsg_nosec net/socket.c:629 [inline]
sock_sendmsg net/socket.c:639 [inline]
___sys_sendmsg+0xec0/0x1310 net/socket.c:2117
__sys_sendmsg net/socket.c:2155 [inline]
__do_sys_sendmsg net/socket.c:2164 [inline]
__se_sys_sendmsg net/socket.c:2162 [inline]
__x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x455a09
RSP: 002b:00007faae5fd8c68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007faae5fd96d4 RCX: 0000000000455a09
RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000013
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000000005d0 R14: 00000000006fdc20 R15: 0000000000000000
Uninit was stored to memory at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline]
kmsan_save_stack mm/kmsan/kmsan.c:294 [inline]
kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:685
__msan_chain_origin+0x69/0xc0 mm/kmsan/kmsan_instr.c:529
fib_convert_metrics net/ipv4/fib_semantics.c:1056 [inline]
fib_create_info+0x2d46/0x9dc0 net/ipv4/fib_semantics.c:1150
fib_table_insert+0x3e4/0x2b50 net/ipv4/fib_trie.c:1146
inet_rtm_newroute+0x210/0x340 net/ipv4/fib_frontend.c:779
rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646
netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448
rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664
netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336
netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901
sock_sendmsg_nosec net/socket.c:629 [inline]
sock_sendmsg net/socket.c:639 [inline]
___sys_sendmsg+0xec0/0x1310 net/socket.c:2117
__sys_sendmsg net/socket.c:2155 [inline]
__do_sys_sendmsg net/socket.c:2164 [inline]
__se_sys_sendmsg net/socket.c:2162 [inline]
__x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline]
kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:189
kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:315
kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan.c:322
slab_post_alloc_hook mm/slab.h:446 [inline]
slab_alloc_node mm/slub.c:2753 [inline]
__kmalloc_node_track_caller+0xb32/0x11b0 mm/slub.c:4395
__kmalloc_reserve net/core/skbuff.c:138 [inline]
__alloc_skb+0x2cb/0x9e0 net/core/skbuff.c:206
alloc_skb include/linux/skbuff.h:988 [inline]
netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
netlink_sendmsg+0x76e/0x1350 net/netlink/af_netlink.c:1876
sock_sendmsg_nosec net/socket.c:629 [inline]
sock_sendmsg net/socket.c:639 [inline]
___sys_sendmsg+0xec0/0x1310 net/socket.c:2117
__sys_sendmsg net/socket.c:2155 [inline]
__do_sys_sendmsg net/socket.c:2164 [inline]
__se_sys_sendmsg net/socket.c:2162 [inline]
__x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: a919525ad832 ("net: Move fib_convert_metrics to metrics file")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
commit 0bbbf0e7d0e7 ("ipmr, ip6mr: Unite creation of new mr_table")
refactored ipmr_new_table, so that it now returns NULL when
mr_table_alloc fails. Unfortunately, all callers of ipmr_new_table
expect an ERR_PTR.
This can result in NULL deref, for example when ipmr_rules_exit calls
ipmr_free_table with NULL net->ipv4.mrt in the
!CONFIG_IP_MROUTE_MULTIPLE_TABLES version.
This patch makes mr_table_alloc return errors, and changes
ip6mr_new_table and its callers to return/expect error pointers as
well. It also removes the version of mr_table_alloc defined under
!CONFIG_IP_MROUTE_COMMON, since it is never used.
Fixes: 0bbbf0e7d0e7 ("ipmr, ip6mr: Unite creation of new mr_table")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, raw6_sk(sk)->ip6mr_table is set unconditionally during
ip6_mroute_setsockopt(MRT6_TABLE). A subsequent attempt at the same
setsockopt will fail with -ENOENT, since we haven't actually created
that table.
A similar fix for ipv4 was included in commit 5e1859fbcc3c ("ipv4: ipmr:
various fixes and cleanups").
Fixes: d1db275dd3f6 ("ipv6: ip6mr: support multiple tables")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The last patch apparently added a complete replacement for this
function, but left the old one in place, which now causes a
harmless warning:
drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c:731:12: 'hclgevf_cfg_func_mta_filter' defined but not used
I assume it can be removed.
Fixes: 3a678b5806e6 ("net: hns3: Optimize the VF's process of updating multicast MAC")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It is now possible to enable the libified nf_tproxy modules without
also enabling NETFILTER_XT_TARGET_TPROXY, which throws off the
ifdef logic in the udp core code:
net/ipv6/netfilter/nf_tproxy_ipv6.o: In function `nf_tproxy_get_sock_v6':
nf_tproxy_ipv6.c:(.text+0x1a8): undefined reference to `udp6_lib_lookup'
net/ipv4/netfilter/nf_tproxy_ipv4.o: In function `nf_tproxy_get_sock_v4':
nf_tproxy_ipv4.c:(.text+0x3d0): undefined reference to `udp4_lib_lookup'
We can actually simplify the conditions now to provide the two functions
exactly when they are needed.
Fixes: 45ca4e0cf273 ("netfilter: Libify xt_TPROXY")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This FW contains several fixes and features.
RDMA
- Several modifications and fixes for Memory Windows
- drop vlan and tcp timestamp from mss calculation in driver for
this FW
- Fix SQ completion flow when local ack timeout is infinite
- Modifications in t10dif support
ETH
- Fix aRFS for tunneled traffic without inner IP.
- Fix chip configuration which may fail under heavy traffic conditions.
- Support receiving any-VNI in VXLAN and GENEVE RX classification.
iSCSI / FcoE
- Fix iSCSI recovery flow
- Drop vlan and tcp timestamp from mss calc for fw 8.37.2.0
Misc
- Several registers (split registers) won't read correctly with
ethtool -d
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Tested: 'git grep tw_timeout' comes up empty and it builds :-)
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When using a vxlan device as the ingress dev, we count it as a
"no offload dev", so when such a rule comes and err stop is true,
we fail early and don't try the egdev route which can offload it
through the egress device.
Fix that by not calling the block offload if one of the devices
attached to it is not offload capable, but make sure egress on such case
is capable instead.
Fixes: caa7260156eb ("net: sched: keep track of offloaded filters [..]")
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzbot reported a rcu_sched self-detected stall on CPU which is caused
by too small value set on rto_min with SCTP_RTOINFO sockopt. With this
value, hb_timer will get stuck there, as in its timer handler it starts
this timer again with this value, then goes to the timer handler again.
This problem is there since very beginning, and thanks to Eric for the
reproducer shared from a syzbot mail.
This patch fixes it by not allowing sctp_transport_timeout to return a
smaller value than HZ/5 for hb_timer, which is based on TCP's min rto.
Note that it doesn't fix this issue by limiting rto_min, as some users
are still using small rto and no proper value was found for it yet.
Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com
Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
check that CC can build executables and use that compiler instead of HOSTCC
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix to return error code -ENOMEM from the kvzalloc_node() error handling
case instead of 0, as done elsewhere in this function.
Fixes: 069d11465a80 ("net/mlx5e: RX, Enhance legacy Receive Queue memory scheme")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes the following sparse warning:
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:903:5: warning:
symbol 'mlx5e_change_rep_mtu' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When sending an ack to a command packet, the skb is still referenced
after it is sent to the real device. Since the real device could
free the skb, the device pointer would be invalid.
Also, remove an unnecessary variable.
Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
RAWIP devices such as rmnet do not have a hardware address and
instead require the kernel to generate a random IID for the
IPv6 addresses.
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Refactor tcp_ecn_check_ce and __tcp_ecn_check_ce to accept struct sock*
instead of tcp_sock* to clean up type casts. This is a pure refactor
patch.
Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzbot reported a use-after-free:
BUG: KASAN: use-after-free in ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180
Read of size 4 at addr ffff8801bf789cf0 by task syz-executor756/4555
CPU: 1 PID: 4555 Comm: syz-executor756 Not tainted 4.17.0-rc7+ #78
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1b9/0x294 lib/dump_stack.c:113
print_address_description+0x6c/0x20b mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
__asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180
ip6_route_multipath_add+0x615/0x1910 net/ipv6/route.c:4303
inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
...
Allocated by task 4555:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
dst_alloc+0xbb/0x1d0 net/core/dst.c:104
__ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:361
ip6_dst_alloc+0x29/0xb0 net/ipv6/route.c:376
ip6_route_info_create+0x4d4/0x3a30 net/ipv6/route.c:2834
ip6_route_multipath_add+0xc7e/0x1910 net/ipv6/route.c:4240
inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
...
Freed by task 4555:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
__kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
__cache_free mm/slab.c:3498 [inline]
kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
dst_destroy+0x267/0x3c0 net/core/dst.c:140
dst_release_immediate+0x71/0x9e net/core/dst.c:205
fib6_add+0xa40/0x1650 net/ipv6/ip6_fib.c:1305
__ip6_ins_rt+0x6c/0x90 net/ipv6/route.c:1011
ip6_route_multipath_add+0x513/0x1910 net/ipv6/route.c:4267
inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
...
The problem is that rt_last can point to a deleted route if the insert
fails.
One reproducer is to insert a route and then add a multipath route that
has a duplicate nexthop.e.g,:
$ ip -6 ro add vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::2
$ ip -6 ro append vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::4 nexthop via 2001:db8:1::2
Fix by not setting rt_last until the it is verified the insert succeeded.
Fixes: 3b1137fe7482 ("net: ipv6: Change notifications for multipath add to RTA_MULTIPATH")
Cc: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make sure that XDP_SKB also uses the skb Tx path.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Here we add the functionality required to support zero-copy Tx, and
also exposes various zero-copy related functions for the netdevs.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Added ndo_xsk_async_xmit. This ndo "kicks" the netdev to start to pull
userland AF_XDP Tx frames from a NAPI context.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Extend the xsk_rcv to support the new MEM_TYPE_ZERO_COPY memory, and
wireup ndo_bpf call in bind.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Here, a new type of allocator support is added to the XDP return
API. A zero-copy allocated xdp_buff cannot be converted to an
xdp_frame. Instead is the buff has to be copied. This is not supported
at all in this commit.
Also, an opaque "handle" is added to xdp_buff. This can be used as a
context for the zero-copy allocator implementation.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Extend ndo_bpf with two new commands used for query zero-copy support
and register an UMEM to a queue_id of a netdev.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The xdp_umem_page holds the address for a page. Trade memory for
faster lookup. Later, we'll add DMA address here as well.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Moved struct xdp_umem to xdp_sock.h, in order to prepare for zero-copy
support.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
BCM54612E have 4 multi-functional LED pins that can be configured
through register setting; the LED4 pin can be configured to a 125MHz
reference clock output by setting the spare register. Since the dedicated
CLK125 reference clock pin is not brought out on the 48-Pin MLP, the LED4
pin is the only pin to provide such function in this package, and therefore
it is beneficial to just enable the reference clock by default.
Signed-off-by: Kun Yi <kunyi@google.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit d02ba2a6110c ("l2tp: fix race in pppol2tp_release with session
object destroy") tried to fix a race condition where a PPPoL2TP socket
would disappear while the L2TP session was still using it. However, it
missed the root issue which is that an L2TP session may accept to be
reconnected if its associated socket has entered the release process.
The tentative fix makes the session hold the socket it is connected to.
That saves the kernel from crashing, but introduces refcount leakage,
preventing the socket from completing the release process. Once stalled,
everything the socket depends on can't be released anymore, including
the L2TP session and the l2tp_ppp module.
The root issue is that, when releasing a connected PPPoL2TP socket, the
session's ->sk pointer (RCU-protected) is reset to NULL and we have to
wait for a grace period before destroying the socket. The socket drops
the session in its ->sk_destruct callback function, so the session
will exist until the last reference on the socket is dropped.
Therefore, there is a time frame where pppol2tp_connect() may accept
reconnecting a session, as it only checks ->sk to figure out if the
session is connected. This time frame is shortened by the fact that
pppol2tp_release() calls l2tp_session_delete(), making the session
unreachable before resetting ->sk. However, pppol2tp_connect() may
grab the session before it gets unhashed by l2tp_session_delete(), but
it may test ->sk after the later got reset. The race is not so hard to
trigger and syzbot found a pretty reliable reproducer:
https://syzkaller.appspot.com/bug?id=418578d2a4389074524e04d641eacb091961b2cf
Before d02ba2a6110c, another race could let pppol2tp_release()
overwrite the ->__sk pointer of an L2TP session, thus tricking
pppol2tp_put_sk() into calling sock_put() on a socket that is different
than the one for which pppol2tp_release() was originally called. To get
there, we had to trigger the race described above, therefore having one
PPPoL2TP socket being released, while the session it is connected to is
reconnecting to a different PPPoL2TP socket. When releasing this new
socket fast enough, pppol2tp_release() overwrites the session's
->__sk pointer with the address of the new socket, before the first
pppol2tp_put_sk() call gets scheduled. Then the pppol2tp_put_sk() call
invoked by the original socket will sock_put() the new socket,
potentially dropping its last reference. When the second
pppol2tp_put_sk() finally runs, its socket has already been freed.
With d02ba2a6110c, the session takes a reference on both sockets.
Furthermore, the session's ->sk pointer is reset in the
pppol2tp_session_close() callback function rather than in
pppol2tp_release(). Therefore, ->__sk can't be overwritten and
pppol2tp_put_sk() is called only once (l2tp_session_delete() will only
run pppol2tp_session_close() once, to protect the session against
concurrent deletion requests). Now pppol2tp_put_sk() will properly
sock_put() the original socket, but the new socket will remain, as
l2tp_session_delete() prevented the release process from completing.
Here, we don't depend on the ->__sk race to trigger the bug. Getting
into the pppol2tp_connect() race is enough to leak the reference, no
matter when new socket is released.
So it all boils down to pppol2tp_connect() failing to realise that the
session has already been connected. This patch drops the unneeded extra
reference counting (mostly reverting d02ba2a6110c) and checks that
neither ->sk nor ->__sk is set before allowing a session to be
connected.
Fixes: d02ba2a6110c ("l2tp: fix race in pppol2tp_release with session object destroy")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Current implementation of MDIO bus PM ops doesn't actually implement
bus-specific PM ops but just calls PM ops defined on a device level
what doesn't seem to be fully in line with the core PM model.
When looking e.g. at __device_suspend() the PM core looks for PM ops
of a device in a specific order:
1. device PM domain
2. device type
3. device class
4. device bus
I think it has good reason that there's no PM ops on device level.
Now that a device type representation of PHY's as special type of MDIO
devices was added (only user of MDIO bus PM ops), the MDIO bus
PM ops can be removed including member pm of struct mdio_device.
If for some other type of MDIO device PM ops are needed, it should be
modeled as struct device_type as well.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A PHY is a type of MDIO device, so let's model it as struct device_type
and place PM ops, attribute groups and release callback on device type
level. For this the attribute definitions have to be moved.
This change allows us to get rid of the PM ops on a bus level in a second
step.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
All drivers are cleaned up and no references to ndo_xdp_flush
are left in drivers, it is time to remove the net_device_ops
operation ndo_xdp_flush.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Remove the ndo_xdp_flush call implementation tun_xdp_flush
as no callers of ndo_xdp_flush are left.
The tun drivers XDP_TX implementation also used tun_xdp_flush (and
tun_xdp_xmit). This is easily solved by passing the XDP_XMIT_FLUSH
flag to tun_xdp_xmit in tun_xdp_tx.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Remove the ndo_xdp_flush call implementation virtnet_xdp_flush
as no callers of ndo_xdp_flush are left.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Remove the ndo_xdp_flush call implementation ixgbe_xdp_flush
as no callers of ndo_xdp_flush are left.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Remove the ndo_xdp_flush call implementation i40e_xdp_flush
as no callers of ndo_xdp_flush are left.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The names for BPF_ALU64 | BPF_ARSH are emit_a32_arsh_*,
the names for BPF_ALU64 | BPF_LSH are emit_a32_lsh_*, but
the names for BPF_ALU64 | BPF_RSH are emit_a32_lsr_*.
For consistence reason, let's rename emit_a32_lsr_* to
emit_a32_rsh_*.
This patch also corrects a wrong comment.
Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler")
Signed-off-by: Wang YanQing <udknight@gmail.com>
Cc: Shubham Bansal <illusionist.neo@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux@armlinux.org.uk
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
imm24 is signed, so the right range is:
[-(1<<(24 - 1)), (1<<(24 - 1)) - 1]
Note: this patch also fix a typo.
Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler")
Signed-off-by: Wang YanQing <udknight@gmail.com>
Cc: Shubham Bansal <illusionist.neo@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux@armlinux.org.uk
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
If the multicast mask value in device tree is configured not all
0xff, the broadcast mac will be lost from tcam table after the
execution of command 'ifconfig up'. The address is appended by
hns_ae_start, but will be clear later by hns_nic_set_rx_mode
called in dev_open process.
This patch fixed it by not use the multicast mask when add a
broadcast address.
Fixes: b5996f11ea54 ("net: add Hisilicon Network Subsystem basic ethernet support")
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If requested tcf proto is not found, get and del filter netlink protocol
handlers output error message to extack, but do not return actual error
code. Add check to return ENOENT when result of tp find function is NULL
pointer.
Fixes: c431f89b18a2 ("net: sched: split tc_ctl_tfilter into three handlers")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This code was introduced in 2011 around the same time that we made
netdev_features_t a u64 type. These days a u32 is not big enough to
hold all the potential features.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The features mask needs to be a netdev_features_t (u64) because a u32
is not big enough.
Fixes: cfc80d9a1163 ("net: Introduce net_failover driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use dma_zalloc_coherent instead of dma_alloc_coherent
followed by memset 0.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use dma_zalloc_coherent instead of dma_alloc_coherent
followed by memset 0.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch fixes some typos/misspelling errors in the
Documentation/networking files.
Signed-off-by: Olivier Gayot <olivier.gayot@sigexec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It is not safe to do so because such sockets are already in the
hash tables and changing these options can result in invalidating
the tb->fastreuse(port) caching.
This can have later far reaching consequences wrt. bind conflict checks
which rely on these caches (for optimization purposes).
Not to mention that you can currently end up with two identical
non-reuseport listening sockets bound to the same local ip:port
by clearing reuseport on them after they've already both been bound.
There is unfortunately no EISBOUND error or anything similar,
and EISCONN seems to be misleading for a bound-but-not-connected
socket, so use EUCLEAN 'Structure needs cleaning' which AFAICT
is the closest you can get to meaning 'socket in bad state'.
(although perhaps EINVAL wouldn't be a bad choice either?)
This does unfortunately run the risk of breaking buggy
userspace programs...
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Change-Id: I77c2b3429b2fdf42671eee0fa7a8ba721c94963b
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This changes the /proc/sys/net/ipv4/tcp_tw_reuse from a boolean
to an integer.
It now takes the values 0, 1 and 2, where 0 and 1 behave as before,
while 2 enables timewait socket reuse only for sockets that we can
prove are loopback connections:
ie. bound to 'lo' interface or where one of source or destination
IPs is 127.0.0.0/8, ::ffff:127.0.0.0/104 or ::1.
This enables quicker reuse of ephemeral ports for loopback connections
- where tcp_tw_reuse is 100% safe from a protocol perspective
(this assumes no artificially induced packet loss on 'lo').
This also makes estblishing many loopback connections *much* faster
(allocating ports out of the first half of the ephemeral port range
is significantly faster, then allocating from the second half)
Without this change in a 32K ephemeral port space my sample program
(it just establishes and closes [::1]:ephemeral -> [::1]:server_port
connections in a tight loop) fails after 32765 connections in 24 seconds.
With it enabled 50000 connections only take 4.7 seconds.
This is particularly problematic for IPv6 where we only have one local
address and cannot play tricks with varying source IP from 127.0.0.0/8
pool.
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Wei Wang <weiwan@google.com>
Change-Id: I0377961749979d0301b7b62871a32a4b34b654e1
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds support for configuring SRQ and provides the necessary
APIs for rdma upper layer driver (qedr) to enable the SRQ feature.
Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Yuval Bason <yuval.bason@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch fixes the checkpatch issue of NULL comparison. Replace x == NULL
with !x, by using the following coccinelle script:
@disable is_null@
expression e;
@@
-e==NULL
+!e
Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|