aboutsummaryrefslogtreecommitdiffstats
path: root/net (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-08-19net: flow_offload: convert block_ing_cb_list to regular list typeVlad Buslov1-6/+7
RCU list block_ing_cb_list is protected by rcu read lock in flow_block_ing_cmd() and with flow_indr_block_ing_cb_lock mutex in all functions that use it. However, flow_block_ing_cmd() needs to call blocking functions while iterating block_ing_cb_list which leads to following suspicious RCU usage warning: [ 401.510948] ============================= [ 401.510952] WARNING: suspicious RCU usage [ 401.510993] 5.3.0-rc3+ #589 Not tainted [ 401.510996] ----------------------------- [ 401.511001] include/linux/rcupdate.h:265 Illegal context switch in RCU read-side critical section! [ 401.511004] other info that might help us debug this: [ 401.511008] rcu_scheduler_active = 2, debug_locks = 1 [ 401.511012] 7 locks held by test-ecmp-add-v/7576: [ 401.511015] #0: 00000000081d71a5 (sb_writers#4){.+.+}, at: vfs_write+0x166/0x1d0 [ 401.511037] #1: 000000002bd338c3 (&of->mutex){+.+.}, at: kernfs_fop_write+0xef/0x1b0 [ 401.511051] #2: 00000000c921c634 (kn->count#317){.+.+}, at: kernfs_fop_write+0xf7/0x1b0 [ 401.511062] #3: 00000000a19cdd56 (&dev->mutex){....}, at: sriov_numvfs_store+0x6b/0x130 [ 401.511079] #4: 000000005425fa52 (pernet_ops_rwsem){++++}, at: unregister_netdevice_notifier+0x30/0x140 [ 401.511092] #5: 00000000c5822793 (rtnl_mutex){+.+.}, at: unregister_netdevice_notifier+0x35/0x140 [ 401.511101] #6: 00000000c2f3507e (rcu_read_lock){....}, at: flow_block_ing_cmd+0x5/0x130 [ 401.511115] stack backtrace: [ 401.511121] CPU: 21 PID: 7576 Comm: test-ecmp-add-v Not tainted 5.3.0-rc3+ #589 [ 401.511124] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017 [ 401.511127] Call Trace: [ 401.511138] dump_stack+0x85/0xc0 [ 401.511146] ___might_sleep+0x100/0x180 [ 401.511154] __mutex_lock+0x5b/0x960 [ 401.511162] ? find_held_lock+0x2b/0x80 [ 401.511173] ? __tcf_get_next_chain+0x1d/0xb0 [ 401.511179] ? mark_held_locks+0x49/0x70 [ 401.511194] ? __tcf_get_next_chain+0x1d/0xb0 [ 401.511198] __tcf_get_next_chain+0x1d/0xb0 [ 401.511251] ? uplink_rep_async_event+0x70/0x70 [mlx5_core] [ 401.511261] tcf_block_playback_offloads+0x39/0x160 [ 401.511276] tcf_block_setup+0x1b0/0x240 [ 401.511312] ? mlx5e_rep_indr_setup_tc_cb+0xca/0x290 [mlx5_core] [ 401.511347] ? mlx5e_rep_indr_tc_block_unbind+0x50/0x50 [mlx5_core] [ 401.511359] tc_indr_block_get_and_ing_cmd+0x11b/0x1e0 [ 401.511404] ? mlx5e_rep_indr_tc_block_unbind+0x50/0x50 [mlx5_core] [ 401.511414] flow_block_ing_cmd+0x7e/0x130 [ 401.511453] ? mlx5e_rep_indr_tc_block_unbind+0x50/0x50 [mlx5_core] [ 401.511462] __flow_indr_block_cb_unregister+0x7f/0xf0 [ 401.511502] mlx5e_nic_rep_netdevice_event+0x75/0xb0 [mlx5_core] [ 401.511513] unregister_netdevice_notifier+0xe9/0x140 [ 401.511554] mlx5e_cleanup_rep_tx+0x6f/0xe0 [mlx5_core] [ 401.511597] mlx5e_detach_netdev+0x4b/0x60 [mlx5_core] [ 401.511637] mlx5e_vport_rep_unload+0x71/0xc0 [mlx5_core] [ 401.511679] esw_offloads_disable+0x5b/0x90 [mlx5_core] [ 401.511724] mlx5_eswitch_disable.cold+0xdf/0x176 [mlx5_core] [ 401.511759] mlx5_device_disable_sriov+0xab/0xb0 [mlx5_core] [ 401.511794] mlx5_core_sriov_configure+0xaf/0xd0 [mlx5_core] [ 401.511805] sriov_numvfs_store+0xf8/0x130 [ 401.511817] kernfs_fop_write+0x122/0x1b0 [ 401.511826] vfs_write+0xdb/0x1d0 [ 401.511835] ksys_write+0x65/0xe0 [ 401.511847] do_syscall_64+0x5c/0xb0 [ 401.511857] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 401.511862] RIP: 0033:0x7fad892d30f8 [ 401.511868] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 96 0d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 60 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89 [ 401.511871] RSP: 002b:00007ffca2a9fad8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 401.511875] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fad892d30f8 [ 401.511878] RDX: 0000000000000002 RSI: 000055afeb072a90 RDI: 0000000000000001 [ 401.511881] RBP: 000055afeb072a90 R08: 00000000ffffffff R09: 000000000000000a [ 401.511884] R10: 000055afeb058710 R11: 0000000000000246 R12: 0000000000000002 [ 401.511887] R13: 00007fad893a8780 R14: 0000000000000002 R15: 00007fad893a3740 To fix the described incorrect RCU usage, convert block_ing_cb_list from RCU list to regular list and protect it with flow_indr_block_ing_cb_lock mutex in flow_block_ing_cmd(). Fixes: 1150ab0f1b33 ("flow_offload: support get multi-subsystem block") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller39-196/+429
Merge conflict of mlx5 resolved using instructions in merge commit 9566e650bf7fdf58384bb06df634f7531ca3a97e. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-18netfilter: nf_tables: map basechain priority to hardware priorityPablo Neira Ayuso2-3/+18
This patch adds initial support for offloading basechains using the priority range from 1 to 65535. This is restricting the netfilter priority range to 16-bit integer since this is what most drivers assume so far from tc. It should be possible to extend this range of supported priorities later on once drivers are updated to support for 32-bit integer priorities. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-18tipc: clean up skb list lock handling on send pathJon Maloy6-25/+26
The policy for handling the skb list locks on the send and receive paths is simple. - On the send path we never need to grab the lock on the 'xmitq' list when the destination is an exernal node. - On the receive path we always need to grab the lock on the 'inputq' list, irrespective of source node. However, when transmitting node local messages those will eventually end up on the receive path of a local socket, meaning that the argument 'xmitq' in tipc_node_xmit() will become the 'ínputq' argument in the function tipc_sk_rcv(). This has been handled by always initializing the spinlock of the 'xmitq' list at message creation, just in case it may end up on the receive path later, and despite knowing that the lock in most cases never will be used. This approach is inaccurate and confusing, and has also concealed the fact that the stated 'no lock grabbing' policy for the send path is violated in some cases. We now clean up this by never initializing the lock at message creation, instead doing this at the moment we find that the message actually will enter the receive path. At the same time we fix the four locations where we incorrectly access the spinlock on the send/error path. This patch also reverts commit d12cffe9329f ("tipc: ensure head->lock is initialised") which has now become redundant. CC: Eric Dumazet <edumazet@google.com> Reported-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17devlink: Add generic packet traps and groupsIdo Schimmel1-0/+12
Add generic packet traps and groups that can report dropped packets as well as exceptions such as TTL error. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17devlink: Add packet trap infrastructureIdo Schimmel2-5/+1064
Add the basic packet trap infrastructure that allows device drivers to register their supported packet traps and trap groups with devlink. Each driver is expected to provide basic information about each supported trap, such as name and ID, but also the supported metadata types that will accompany each packet trapped via the trap. The currently supported metadata type is just the input port, but more will be added in the future. For example, output port and traffic class. Trap groups allow users to set the action of all member traps. In addition, users can retrieve per-group statistics in case per-trap statistics are too narrow. In the future, the trap group object can be extended with more attributes, such as policer settings which will limit the amount of traffic generated by member traps towards the CPU. Beside registering their packet traps with devlink, drivers are also expected to report trapped packets to devlink along with relevant metadata. devlink will maintain packets and bytes statistics for each packet trap and will potentially report the trapped packet with its metadata to user space via drop monitor netlink channel. The interface towards the drivers is simple and allows devlink to set the action of the trap. Currently, only two actions are supported: 'trap' and 'drop'. When set to 'trap', the device is expected to provide the sole copy of the packet to the driver which will pass it to devlink. When set to 'drop', the device is expected to drop the packet and not send a copy to the driver. In the future, more actions can be added, such as 'mirror'. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17drop_monitor: Allow user to start monitoring hardware dropsIdo Schimmel1-3/+121
Drop monitor has start and stop commands, but so far these were only used to start and stop monitoring of software drops. Now that drop monitor can also monitor hardware drops, we should allow the user to control these as well. Do that by adding SW and HW flags to these commands. If no flag is specified, then only start / stop monitoring software drops. This is done in order to maintain backward-compatibility with existing user space applications. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17drop_monitor: Add support for summary alert mode for hardware dropsIdo Schimmel1-2/+193
In summary alert mode a notification is sent with a list of recent drop reasons and a count of how many packets were dropped due to this reason. To avoid expensive operations in the context in which packets are dropped, each CPU holds an array whose number of entries is the maximum number of drop reasons that can be encoded in the netlink notification. Each entry stores the drop reason and a count. When a packet is dropped the array is traversed and a new entry is created or the count of an existing entry is incremented. Later, in process context, the array is replaced with a newly allocated copy and the old array is encoded in a netlink notification. To avoid breaking user space, the notification includes the ancillary header, which is 'struct net_dm_alert_msg' with number of entries set to '0'. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17drop_monitor: Add support for packet alert mode for hardware dropsIdo Schimmel1-4/+300
In a similar fashion to software drops, extend drop monitor to send netlink events when packets are dropped by the underlying hardware. The main difference is that instead of encoding the program counter (PC) from which kfree_skb() was called in the netlink message, we encode the hardware trap name. The two are mostly equivalent since they should both help the user understand why the packet was dropped. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17drop_monitor: Consider all monitoring states before performing configurationIdo Schimmel1-2/+7
The drop monitor configuration (e.g., alert mode) is global, but user will be able to enable monitoring of only software or hardware drops. Therefore, ensure that monitoring of both software and hardware drops are disabled before allowing drop monitor configuration to take place. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17drop_monitor: Add basic infrastructure for hardware dropsIdo Schimmel1-0/+28
Export a function that can be invoked in order to report packets that were dropped by the underlying hardware along with metadata. Subsequent patches will add support for the different alert modes. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17drop_monitor: Initialize hardware per-CPU dataIdo Schimmel1-2/+23
Like software drops, hardware drops also need the same type of per-CPU data. Therefore, initialize it during module initialization and de-initialize it during module exit. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17drop_monitor: Move per-CPU data init/fini to separate functionsIdo Schimmel1-17/+36
Currently drop monitor only reports software drops to user space, but subsequent patches are going to add support for hardware drops. Like software drops, the per-CPU data of hardware drops needs to be initialized and de-initialized upon module initialization and exit. To avoid code duplication, break this code into separate functions, so that these could be re-used for hardware drops. No functional changes intended. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetoothDavid S. Miller4-3/+40
Johan Hedberg says: ==================== pull request: bluetooth 2019-08-17 Here's a set of Bluetooth fixes for the 5.3-rc series: - Multiple fixes for Qualcomm (btqca & hci_qca) drivers - Minimum encryption key size debugfs setting (this is required for Bluetooth Qualification) - Fix hidp_send_message() to have a meaningful return value ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17net: bridge: mdb: allow add/delete for host-joined groupsNikolay Aleksandrov3-30/+80
Currently this is needed only for user-space compatibility, so similar object adds/deletes as the dumped ones would succeed. Later it can be used for L2 mcast MAC add/delete. v3: fix compiler warning (DaveM) v2: don't send a notification when used from user-space, arm the group timer if no ports are left after host entry del Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17net: bridge: mdb: dump host-joined entries as wellNikolay Aleksandrov1-10/+31
Currently we dump only the port mdb entries but we can have host-joined entries on the bridge itself and they should be treated as normal temp mdbs, they're already notified: $ bridge monitor all [MDB]dev br0 port br0 grp ff02::8 temp The group will not be shown in the bridge mdb output, but it takes 1 slot and it's timing out. If it's only host-joined then the mdb show output can even be empty. After this patch we show the host-joined groups: $ bridge mdb show dev br0 port br0 grp ff02::8 temp Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17net: bridge: mdb: factor out mdb fillingNikolay Aleksandrov1-31/+37
We have to factor out the mdb fill portion in order to re-use it later for the bridge mdb entries. No functional changes intended. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17net: bridge: mdb: move vlan commentsNikolay Aleksandrov1-6/+6
Trivial patch to move the vlan comments in their proper places above the vid 0 checks. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17net: dsa: remove calls to genphy_config_initHeiner Kallweit1-5/+0
Supported PHY features are either auto-detected or explicitly set. In both cases calling genphy_config_init isn't needed. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17Bluetooth: Add debug setting for changing minimum encryption key sizeMarcel Holtmann3-1/+33
For testing and qualification purposes it is useful to allow changing the minimum encryption key size value that the host stack is going to enforce. This adds a new debugfs setting min_encrypt_key_size to achieve this functionality. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2019-08-16tipc: fix false detection of retransmit failuresTuong Lien2-43/+57
This commit eliminates the use of the link 'stale_limit' & 'prev_from' (besides the already removed - 'stale_cnt') variables in the detection of repeated retransmit failures as there is no proper way to initialize them to avoid a false detection, i.e. it is not really a retransmission failure but due to a garbage values in the variables. Instead, a jiffies variable will be added to individual skbs (like the way we restrict the skb retransmissions) in order to mark the first skb retransmit time. Later on, at the next retransmissions, the timestamp will be checked to see if the skb in the link transmq is "too stale", that is, the link tolerance time has passed, so that a link reset will be ordered. Note, just checking on the first skb in the queue is fine enough since it must be the oldest one. A counter is also added to keep track the actual skb retransmissions' number for later checking when the failure happens. The downside of this approach is that the skb->cb[] buffer is about to be exhausted, however it is always able to allocate another memory area and keep a reference to it when needed. Fixes: 77cf8edbc0e7 ("tipc: simplify stale link failure criteria") Reported-by: Hoang Le <hoang.h.le@dektech.com.au> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-15Merge tag 'rxrpc-fixes-20190814' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fsDavid S. Miller1-10/+11
David Howells says: ==================== rxrpc: Fix local endpoint handling Here's a pair of patches that fix two issues in the handling of local endpoints (rxrpc_local structs): (1) Use list_replace_init() rather than list_replace() if we're going to unconditionally delete the replaced item later, lest the list get corrupted. (2) Don't access the rxrpc_local object after passing our ref to the workqueue, not even to illuminate tracepoints, as the work function may cause the object to be freed. We have to cache the information beforehand. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller5-28/+98
Pablo Neira Ayuso says: ==================== Netfilter fixes for net This patchset contains Netfilter fixes for net: 1) Extend selftest to cover flowtable with ipsec, from Florian Westphal. 2) Fix interaction of ipsec with flowtable, also from Florian. 3) User-after-free with bound set to rule that fails to load. 4) Adjust state and timeout for flows that expire. 5) Timeout update race with flows in teardown state. 6) Ensure conntrack id hash calculation use invariants as input, from Dirk Morris. 7) Do not push flows into flowtable for TCP fin/rst packets. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-15net/packet: fix race in tpacket_snd()Eric Dumazet1-0/+7
packet_sendmsg() checks tx_ring.pg_vec to decide if it must call tpacket_snd(). Problem is that the check is lockless, meaning another thread can issue a concurrent setsockopt(PACKET_TX_RING ) to flip tx_ring.pg_vec back to NULL. Given that tpacket_snd() grabs pg_vec_lock mutex, we can perform the check again to solve the race. syzbot reported : kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 11429 Comm: syz-executor394 Not tainted 5.3.0-rc4+ #101 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:packet_lookup_frame+0x8d/0x270 net/packet/af_packet.c:474 Code: c1 ee 03 f7 73 0c 80 3c 0e 00 0f 85 cb 01 00 00 48 8b 0b 89 c0 4c 8d 24 c1 48 b8 00 00 00 00 00 fc ff df 4c 89 e1 48 c1 e9 03 <80> 3c 01 00 0f 85 94 01 00 00 48 8d 7b 10 4d 8b 3c 24 48 b8 00 00 RSP: 0018:ffff88809f82f7b8 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff8880a45c7030 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 1ffff110148b8e06 RDI: ffff8880a45c703c RBP: ffff88809f82f7e8 R08: ffff888087aea200 R09: fffffbfff134ae50 R10: fffffbfff134ae4f R11: ffffffff89a5727f R12: 0000000000000000 R13: 0000000000000001 R14: ffff8880a45c6ac0 R15: 0000000000000000 FS: 00007fa04716f700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa04716edb8 CR3: 0000000091eb4000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: packet_current_frame net/packet/af_packet.c:487 [inline] tpacket_snd net/packet/af_packet.c:2667 [inline] packet_sendmsg+0x590/0x6250 net/packet/af_packet.c:2975 sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:657 ___sys_sendmsg+0x3e2/0x920 net/socket.c:2311 __sys_sendmmsg+0x1bf/0x4d0 net/socket.c:2413 __do_sys_sendmmsg net/socket.c:2442 [inline] __se_sys_sendmmsg net/socket.c:2439 [inline] __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2439 do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: 69e3c75f4d54 ("net: TX_RING and packet mmap") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-15Merge tag 'linux-can-next-for-5.4-20190814' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-nextDavid S. Miller5-362/+436
Marc Kleine-Budde says: ==================== pull-request: can-next 2019-08-14 this is a pull request for net-next/master consisting of 41 patches. The first two patches are for the kvaser_pciefd driver: Christer Beskow removes unnecessary code in the kvaser_pciefd_pwm_stop() function, YueHaibing removes the unused including of <linux/version.h>. In the next patch YueHaibing also removes the unused including of <linux/version.h> in the f81601 driver. In the ti_hecc driver the next 6 patches are by me and fix checkpatch warnings. YueHaibing's patch removes an unused variable in the ti_hecc_mailbox_read() function. The next 6 patches all target the xilinx_can driver. Anssi Hannula's patch fixes a chip start failure with an invalid bus. The patch by Venkatesh Yadav Abbarapu skips an error message in case of a deferred probe. The 3 patches by Appana Durga Kedareswara rao fix the RX and TX path for CAN-FD frames. Srinivas Neeli's patch fixes the bit timing calculations for CAN-FD. The next 12 patches are by me and several checkpatch warnings in the af_can, raw and bcm components. Thomas Gleixner provides a patch for the bcm, which switches the timer to HRTIMER_MODE_SOFT and removes the hrtimer_tasklet. Then 6 more patches by me for the gw component, which fix checkpatch warnings, followed by 2 patches by Oliver Hartkopp to add CAN-FD support. The vcan driver gets 3 patches by me, fixing checkpatch warnings. And finally a patch by Andre Hartmann to fix typos in CAN's netlink header. ====================
2019-08-15net: tls, fix sk_write_space NULL write when tx disabledJohn Fastabend1-1/+2
The ctx->sk_write_space pointer is only set when TLS tx mode is enabled. When running without TX mode its a null pointer but we still set the sk sk_write_space pointer on close(). Fix the close path to only overwrite sk->sk_write_space when the current pointer is to the tls_write_space function indicating the tls module should clean it up properly as well. Reported-by: Hillf Danton <hdanton@sina.com> Cc: Ying Xue <ying.xue@windriver.com> Cc: Andrey Konovalov <andreyknvl@google.com> Fixes: 57c722e932cfb ("net/tls: swap sk_write_space on close") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-15page_pool: fix logic in __page_pool_get_cachedJonathan Lemon1-23/+16
__page_pool_get_cached() will return NULL when the ring is empty, even if there are pages present in the lookaside cache. It is also possible to refill the cache, and then return a NULL page. Restructure the logic so eliminate both cases. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-15rds: check for excessive looping in rds_send_xmitAndy Grover3-1/+14
Original commit from 2011 updated to include a change by Yuval Shaia <yuval.shaia@oracle.com> that adds a new statistic counter "send_stuck_rm" to capture the messages looping exessively in the send path. Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-15net/rds: Add a few missing rds_stat_names entriesGerd Rausch1-0/+2
In a previous commit, fields were added to "struct rds_statistics" but array "rds_stat_names" was not updated accordingly. Please note the inconsistent naming of the string representations that is done in the name of compatibility with the Oracle internal code-base. s_recv_bytes_added_to_socket -> "recv_bytes_added_to_sock" s_recv_bytes_removed_from_socket -> "recv_bytes_freed_fromsock" Fixes: 192a798f5299 ("RDS: add stat for socket recv memory usage") Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-15RDS: don't use GFP_ATOMIC for sk_alloc in rds_createChris Mason1-1/+1
Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com> Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-15RDS: limit the number of times we loop in rds_send_xmitChris Mason1-1/+11
This will kick the RDS worker thread if we have been looping too long. Original commit from 2012 updated to include a change by Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> that triggers "must_wake" if "rds_ib_recv_refill_one" fails. Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-15page_pool: remove unnecessary variable initYunsheng Lin1-1/+1
Remove variable initializations in functions that are followed by assignments before use Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-15net/rds: Add RDS6_INFO_SOCKETS and RDS6_INFO_RECV_MESSAGES optionsKa-Cheong Poon1-3/+90
Add support of the socket options RDS6_INFO_SOCKETS and RDS6_INFO_RECV_MESSAGES which update the RDS_INFO_SOCKETS and RDS_INFO_RECV_MESSAGES options respectively. The old options work for IPv4 sockets only. Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-14netfilter: nft_bitwise: Adjust parentheses to fix memcmp size argumentNathan Chancellor1-2/+2
clang warns: net/netfilter/nft_bitwise.c:138:50: error: size argument in 'memcmp' call is a comparison [-Werror,-Wmemsize-comparison] if (memcmp(&priv->xor, &zero, sizeof(priv->xor) || ~~~~~~~~~~~~~~~~~~^~ net/netfilter/nft_bitwise.c:138:6: note: did you mean to compare the result of 'memcmp' instead? if (memcmp(&priv->xor, &zero, sizeof(priv->xor) || ^ ) net/netfilter/nft_bitwise.c:138:32: note: explicitly cast the argument to size_t to silence this warning if (memcmp(&priv->xor, &zero, sizeof(priv->xor) || ^ (size_t)( 1 error generated. Adjust the parentheses so that the result of the sizeof is used for the size argument in memcmp, rather than the result of the comparison (which would always be true because sizeof is a non-zero number). Fixes: bd8699e9e292 ("netfilter: nft_bitwise: add offload support") Link: https://github.com/ClangBuiltLinux/linux/issues/638 Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-14rxrpc: Fix read-after-free in rxrpc_queue_local()David Howells1-9/+10
rxrpc_queue_local() attempts to queue the local endpoint it is given and then, if successful, prints a trace line. The trace line includes the current usage count - but we're not allowed to look at the local endpoint at this point as we passed our ref on it to the workqueue. Fix this by reading the usage count before queuing the work item. Also fix the reading of local->debug_id for trace lines, which must be done with the same consideration as reading the usage count. Fixes: 09d2bf595db4 ("rxrpc: Add a tracepoint to track rxrpc_local refcounting") Reported-by: syzbot+78e71c5bab4f76a6a719@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com>
2019-08-14rxrpc: Fix local endpoint replacementDavid Howells1-1/+1
When a local endpoint (struct rxrpc_local) ceases to be in use by any AF_RXRPC sockets, it starts the process of being destroyed, but this doesn't cause it to be removed from the namespace endpoint list immediately as tearing it down isn't trivial and can't be done in softirq context, so it gets deferred. If a new socket comes along that wants to bind to the same endpoint, a new rxrpc_local object will be allocated and rxrpc_lookup_local() will use list_replace() to substitute the new one for the old. Then, when the dying object gets to rxrpc_local_destroyer(), it is removed unconditionally from whatever list it is on by calling list_del_init(). However, list_replace() doesn't reset the pointers in the replaced list_head and so the list_del_init() will likely corrupt the local endpoints list. Fix this by using list_replace_init() instead. Fixes: 730c5fd42c1e ("rxrpc: Fix local endpoint refcounting") Reported-by: syzbot+193e29e9387ea5837f1d@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com>
2019-08-14netfilter: nft_flow_offload: skip tcp rst and fin packetsPablo Neira Ayuso1-3/+6
TCP rst and fin packets do not qualify to place a flow into the flowtable. Most likely there will be no more packets after connection closure. Without this patch, this flow entry expires and connection tracking picks up the entry in ESTABLISHED state using the fixup timeout, which makes this look inconsistent to the user for a connection that is actually already closed. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-13sctp: fix memleak in sctp_send_reset_streamszhengbin1-0/+1
If the stream outq is not empty, need to kfree nstr_list. Fixes: d570a59c5b5f ("sctp: only allow the out stream reset when the stream outq is empty") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-08-13sctp: fix the transport error_count checkXin Long1-1/+1
As the annotation says in sctp_do_8_2_transport_strike(): "If the transport error count is greater than the pf_retrans threshold, and less than pathmaxrtx ..." It should be transport->error_count checked with pathmaxrxt, instead of asoc->pf_retrans. Fixes: 5aa93bcf66f4 ("sctp: Implement quick failover draft from tsvwg") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-08-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextJakub Kicinski24-91/+117
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next: 1) Rename mss field to mss_option field in synproxy, from Fernando Mancera. 2) Use SYSCTL_{ZERO,ONE} definitions in conntrack, from Matteo Croce. 3) More strict validation of IPVS sysctl values, from Junwei Hu. 4) Remove unnecessary spaces after on the right hand side of assignments, from yangxingwu. 5) Add offload support for bitwise operation. 6) Extend the nft_offload_reg structure to store immediate date. 7) Collapse several ip_set header files into ip_set.h, from Jeremy Sowden. 8) Make netfilter headers compile with CONFIG_KERNEL_HEADER_TEST=y, from Jeremy Sowden. 9) Fix several sparse warnings due to missing prototypes, from Valdis Kletnieks. 10) Use static lock initialiser to ensure connlabel spinlock is initialized on boot time to fix sched/act_ct.c, patch from Florian Westphal. ==================== Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-08-13net: devlink: remove redundant rtnl lock assertVlad Buslov1-3/+2
It is enough for caller of devlink_compat_switch_id_get() to hold the net device to guarantee that devlink port is not destroyed concurrently. Remove rtnl lock assertion and modify comment to warn user that they must hold either rtnl lock or reference to net device. This is necessary to accommodate future implementation of rtnl-unlocked TC offloads driver callbacks. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-08-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski7-14/+257
Daniel Borkmann says: ==================== The following pull-request contains BPF updates for your *net-next* tree. There is a small merge conflict in libbpf (Cc Andrii so he's in the loop as well): for (i = 1; i <= btf__get_nr_types(btf); i++) { t = (struct btf_type *)btf__type_by_id(btf, i); if (!has_datasec && btf_is_var(t)) { /* replace VAR with INT */ t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0); <<<<<<< HEAD /* * using size = 1 is the safest choice, 4 will be too * big and cause kernel BTF validation failure if * original variable took less than 4 bytes */ t->size = 1; *(int *)(t+1) = BTF_INT_ENC(0, 0, 8); } else if (!has_datasec && kind == BTF_KIND_DATASEC) { ======= t->size = sizeof(int); *(int *)(t + 1) = BTF_INT_ENC(0, 0, 32); } else if (!has_datasec && btf_is_datasec(t)) { >>>>>>> 72ef80b5ee131e96172f19e74b4f98fa3404efe8 /* replace DATASEC with STRUCT */ Conflict is between the two commits 1d4126c4e119 ("libbpf: sanitize VAR to conservative 1-byte INT") and b03bc6853c0e ("libbpf: convert libbpf code to use new btf helpers"), so we need to pick the sanitation fixup as well as use the new btf_is_datasec() helper and the whitespace cleanup. Looks like the following: [...] if (!has_datasec && btf_is_var(t)) { /* replace VAR with INT */ t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0); /* * using size = 1 is the safest choice, 4 will be too * big and cause kernel BTF validation failure if * original variable took less than 4 bytes */ t->size = 1; *(int *)(t + 1) = BTF_INT_ENC(0, 0, 8); } else if (!has_datasec && btf_is_datasec(t)) { /* replace DATASEC with STRUCT */ [...] The main changes are: 1) Addition of core parts of compile once - run everywhere (co-re) effort, that is, relocation of fields offsets in libbpf as well as exposure of kernel's own BTF via sysfs and loading through libbpf, from Andrii. More info on co-re: http://vger.kernel.org/bpfconf2019.html#session-2 and http://vger.kernel.org/lpc-bpf2018.html#session-2 2) Enable passing input flags to the BPF flow dissector to customize parsing and allowing it to stop early similar to the C based one, from Stanislav. 3) Add a BPF helper function that allows generating SYN cookies from XDP and tc BPF, from Petar. 4) Add devmap hash-based map type for more flexibility in device lookup for redirects, from Toke. 5) Improvements to XDP forwarding sample code now utilizing recently enabled devmap lookups, from Jesper. 6) Add support for reporting the effective cgroup progs in bpftool, from Jakub and Takshak. 7) Fix reading kernel config from bpftool via /proc/config.gz, from Peter. 8) Fix AF_XDP umem pages mapping for 32 bit architectures, from Ivan. 9) Follow-up to add two more BPF loop tests for the selftest suite, from Alexei. 10) Add perf event output helper also for other skb-based program types, from Allan. 11) Fix a co-re related compilation error in selftests, from Yonghong. ==================== Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-08-13devlink: send notifications for deleted snapshots on region destroyJiri Pirko1-11/+12
Currently the notifications for deleted snapshots are sent only in case user deletes a snapshot manually. Send the notifications in case region is destroyed too. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-08-13netfilter: conntrack: Use consistent ct id hash calculationDirk Morris1-8/+8
Change ct id hash calculation to only use invariants. Currently the ct id hash calculation is based on some fields that can change in the lifetime on a conntrack entry in some corner cases. The current hash uses the whole tuple which contains an hlist pointer which will change when the conntrack is placed on the dying list resulting in a ct id change. This patch also removes the reply-side tuple and extension pointer from the hash calculation so that the ct id will will not change from initialization until confirmation. Fixes: 3c79107631db1f7 ("netfilter: ctnetlink: don't use conntrack/expect object addresses as id") Signed-off-by: Dirk Morris <dmorris@metaloft.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-13can: gw: add support for CAN FD framesOliver Hartkopp1-27/+187
Introduce CAN FD support which needs an extension of the netlink API to pass CAN FD type content to the kernel which has a different size to Classic CAN. Additionally the struct canfd_frame has a new 'flags' element that can now be modified with can-gw. The new CGW_FLAGS_CAN_FD option flag defines whether the routing job handles Classic CAN or CAN FD frames. This setting is very strict at reception time and enables the new possibilities, e.g. CGW_FDMOD_* and modifying the flags element of struct canfd_frame, only when CGW_FLAGS_CAN_FD is set. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2019-08-13can: gw: use struct canfd_frame as internal data structureOliver Hartkopp1-105/+117
To prepare the CAN FD support this patch implements the first adaptions in data structures for CAN FD without changing the current functionality. Additionally some code at the end of this patch is moved or indented to simplify the review of the next implementation step. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2019-08-13can: gw: cgw_parse_attr(): remove unnecessary braces for single statement blockMarc Kleine-Budde1-2/+1
This patch removes some unnecessary braces for a single statement block. Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2019-08-13can: gw: cgw_dump_jobs(): avoid long linesMarc Kleine-Budde1-2/+3
This patch rewraps the arguments of cgw_put_job() to avoid long lines, which also fixes the indention. Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2019-08-13can: gw: can_can_gw_rcv(): remove return at end of void functionMarc Kleine-Budde1-1/+0
This patch remove the return at the end of the void function can_can_gw_rcv(). Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2019-08-13can: gw: add missing spaces around operatorsMarc Kleine-Budde1-18/+18
This patch add missing spaces around the '^' and '+' operators. Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>