path: root/net/core/dev.c
Age | Commit message | Author | Files | Lines
2018-05-18 | net: Fix a bug in removing queues from XPS map | Amritha Nambiar | 1 | -1/+1
While removing queues from the XPS map, the individual CPU ID alone was used to index the CPUs map, this should be changed to also factor in the traffic class mapping for the CPU-to-queue lookup. Fixes: 184c449f91fe ("net: Add support for XPS with QoS via traffic classes") Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
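As a rough illustration of the indexing issue described above, the sketch below flattens a per-CPU, per-traffic-class map into one array and indexes it with `cpu * num_tc + tc`. The array layout, the `SLOT_EMPTY` marker, and both function names are assumptions made for this example; they are not the kernel's XPS data structures.

```c
#include <stddef.h>

#define SLOT_EMPTY (-1)	/* hypothetical marker for "no queue mapped" */

/* Index of the slot for a given CPU and traffic class in a flattened map. */
static size_t xps_slot_index(size_t cpu, size_t tc, size_t num_tc)
{
	return cpu * num_tc + tc;
}

/*
 * Remove `queue` from every traffic-class slot that belongs to `cpu`.
 * Indexing with the CPU ID alone would only ever touch one slot and,
 * for num_tc > 1, usually the wrong one.
 * Returns the number of slots that were cleared.
 */
static int xps_remove_queue_for_cpu(int *slots, size_t cpu, size_t num_tc,
				    int queue)
{
	int removed = 0;
	size_t tc;

	for (tc = 0; tc < num_tc; tc++) {
		size_t idx = xps_slot_index(cpu, tc, num_tc);

		if (slots[idx] == queue) {
			slots[idx] = SLOT_EMPTY;
			removed++;
		}
	}
	return removed;
}
```

With two CPUs and two traffic classes, CPU 0 owns slots 0 and 1 and CPU 1 owns slots 2 and 3, so removing a queue for CPU 0 never touches CPU 1's entries.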
2018-04-17 | vlan: Fix reading memory beyond skb->tail in skb_vlan_tagged_multi | Toshiaki Makita | 1 | -1/+1
Syzkaller spotted an old bug which leads to reading skb beyond tail by 4 bytes on vlan tagged packets. This is caused because skb_vlan_tagged_multi() did not check skb_headlen. BUG: KMSAN: uninit-value in eth_type_vlan include/linux/if_vlan.h:283 [inline] BUG: KMSAN: uninit-value in skb_vlan_tagged_multi include/linux/if_vlan.h:656 [inline] BUG: KMSAN: uninit-value in vlan_features_check include/linux/if_vlan.h:672 [inline] BUG: KMSAN: uninit-value in dflt_features_check net/core/dev.c:2949 [inline] BUG: KMSAN: uninit-value in netif_skb_features+0xd1b/0xdc0 net/core/dev.c:3009 CPU: 1 PID: 3582 Comm: syzkaller435149 Not tainted 4.16.0+ #82 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676 eth_type_vlan include/linux/if_vlan.h:283 [inline] skb_vlan_tagged_multi include/linux/if_vlan.h:656 [inline] vlan_features_check include/linux/if_vlan.h:672 [inline] dflt_features_check net/core/dev.c:2949 [inline] netif_skb_features+0xd1b/0xdc0 net/core/dev.c:3009 validate_xmit_skb+0x89/0x1320 net/core/dev.c:3084 __dev_queue_xmit+0x1cb2/0x2b60 net/core/dev.c:3549 dev_queue_xmit+0x4b/0x60 net/core/dev.c:3590 packet_snd net/packet/af_packet.c:2944 [inline] packet_sendmsg+0x7c57/0x8a10 net/packet/af_packet.c:2969 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg net/socket.c:640 [inline] sock_write_iter+0x3b9/0x470 net/socket.c:909 do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776 do_iter_write+0x30d/0xd40 fs/read_write.c:932 vfs_writev fs/read_write.c:977 [inline] do_writev+0x3c9/0x830 fs/read_write.c:1012 SYSC_writev+0x9b/0xb0 fs/read_write.c:1085 SyS_writev+0x56/0x80 fs/read_write.c:1082 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x43ffa9 RSP: 002b:00007fff2cff3948 
EFLAGS: 00000217 ORIG_RAX: 0000000000000014 RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043ffa9 RDX: 0000000000000001 RSI: 0000000020000080 RDI: 0000000000000003 RBP: 00000000006cb018 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000217 R12: 00000000004018d0 R13: 0000000000401960 R14: 0000000000000000 R15: 0000000000000000 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314 kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321 slab_post_alloc_hook mm/slab.h:445 [inline] slab_alloc_node mm/slub.c:2737 [inline] __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369 __kmalloc_reserve net/core/skbuff.c:138 [inline] __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206 alloc_skb include/linux/skbuff.h:984 [inline] alloc_skb_with_frags+0x1d4/0xb20 net/core/skbuff.c:5234 sock_alloc_send_pskb+0xb56/0x1190 net/core/sock.c:2085 packet_alloc_skb net/packet/af_packet.c:2803 [inline] packet_snd net/packet/af_packet.c:2894 [inline] packet_sendmsg+0x6444/0x8a10 net/packet/af_packet.c:2969 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg net/socket.c:640 [inline] sock_write_iter+0x3b9/0x470 net/socket.c:909 do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776 do_iter_write+0x30d/0xd40 fs/read_write.c:932 vfs_writev fs/read_write.c:977 [inline] do_writev+0x3c9/0x830 fs/read_write.c:1012 SYSC_writev+0x9b/0xb0 fs/read_write.c:1085 SyS_writev+0x56/0x80 fs/read_write.c:1082 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Fixes: 58e998c6d239 ("offloading: Force software GSO for multiple vlan tags.") Reported-and-tested-by: syzbot+0bbe42c764feafa82c5a@syzkaller.appspotmail.com Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
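The kind of length check this fix is about can be sketched in userspace C as follows. This is only an illustration under stated assumptions: the frame is a flat byte buffer with in-band VLAN tags, `headlen` is the number of readable bytes, and the helper names are made up for the example. An Ethernet header is 14 bytes and a VLAN-tagged one is 18, so reading the inner protocol without a check touches 4 bytes past a 14-byte buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HEADER_LEN      14	/* dst MAC + src MAC + EtherType */
#define VLAN_ETH_HEADER_LEN 18	/* Ethernet header plus one 4-byte VLAN tag */

static bool is_vlan_ethertype(uint16_t proto)
{
	return proto == 0x8100 || proto == 0x88a8;	/* 802.1Q, 802.1ad */
}

static uint16_t read_be16(const unsigned char *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* True if the frame carries at least two in-band VLAN tags. */
static bool frame_is_multi_tagged(const unsigned char *frame, size_t headlen)
{
	if (headlen < ETH_HEADER_LEN ||
	    !is_vlan_ethertype(read_be16(frame + 12)))
		return false;

	/* Check the readable length before touching the encapsulated protocol. */
	if (headlen < VLAN_ETH_HEADER_LEN)
		return false;

	return is_vlan_ethertype(read_be16(frame + 16));
}
```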
2018-04-05 | net: fool proof dev_valid_name() | Eric Dumazet | 1 | -1/+1
We want to use dev_valid_name() to validate tunnel names, so better use strnlen(name, IFNAMSIZ) than strlen(name) to make sure to not upset KASAN. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
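A userspace sketch of the idea: `strnlen()` stops after a fixed number of bytes, so the length check never scans past the buffer even if the name is not NUL-terminated within it. The function name `name_is_valid()` and the constant `NAME_BUF_SIZE` are hypothetical, and the extra character checks are assumptions added to make the example complete rather than a copy of the kernel function.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for strnlen() */
#endif
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define NAME_BUF_SIZE 16	/* assumed buffer size, matching IFNAMSIZ on Linux */

/*
 * Validate an interface-style name held in a NAME_BUF_SIZE-byte buffer.
 * The later checks only run once the bounded length check has proven
 * that a NUL terminator exists inside the buffer.
 */
static bool name_is_valid(const char *name)
{
	if (*name == '\0')
		return false;
	/* Bounded scan: at most NAME_BUF_SIZE bytes are examined. */
	if (strnlen(name, NAME_BUF_SIZE) == NAME_BUF_SIZE)
		return false;
	if (!strcmp(name, ".") || !strcmp(name, ".."))
		return false;

	for (; *name; name++) {
		if (*name == '/' || *name == ':' ||
		    isspace((unsigned char)*name))
			return false;
	}
	return true;
}
```

A 15-character name is accepted because the terminator fits in the buffer; a 16-character name is rejected by the bounded length check before any unbounded string function is called.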
2018-04-01 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 1 | -1/+1
Minor conflicts in drivers/net/ethernet/mellanox/mlx5/core/en_rep.c, we had some overlapping changes: 1) In 'net' MLX5E_PARAMS_LOG_{SQ,RQ}_SIZE --> MLX5E_REP_PARAMS_LOG_{SQ,RQ}_SIZE 2) In 'net-next' params->log_rq_size is renamed to be params->log_rq_mtu_frames. 3) In 'net-next' params->hard_mtu is added. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31 | net: Remove net_rwsem from {, un}register_netdevice_notifier() | Kirill Tkhai | 1 | -5/+0
These functions take net_rwsem, while wireless_nlevent_flush() also takes it. But down_read() cannot be taken recursively: the rw_semaphore design prevents the lock from being held by readers forever, so a nested down_read() can block behind a waiting writer. Since we take pernet_ops_rwsem in {,un}register_netdevice_notifier(), the net list can't change, so these down_read()/up_read() calls can be removed. Fixes: f0b07bb151b0 "net: Introduce net_rwsem to protect net_namespace_list" Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 | net: Close race between {un, }register_netdevice_notifier() and setup_net()/cleanup_net() | Kirill Tkhai | 1 | -0/+6
{un,}register_netdevice_notifier() iterate over all net namespaces hashed to net_namespace_list. But pernet_operations register and unregister netdevices in unhashed net namespace, and they are not seen for netdevice notifiers. This results in asymmetry: 1)Race with register_netdevice_notifier() pernet_operations::init(net) ... register_netdevice() ... call_netdevice_notifiers() ... ... nb is not called ... ... register_netdevice_notifier(nb) -> net skipped ... ... list_add_tail(&net->list, ..) ... Then, userspace stops using net, and it's destructed: pernet_operations::exit(net) unregister_netdevice() call_netdevice_notifiers() ... nb is called ... This always happens with net::loopback_dev, but it may be not the only device. 2)Race with unregister_netdevice_notifier() pernet_operations::init(net) register_netdevice() call_netdevice_notifiers() ... nb is called ... Then, userspace stops using net, and it's destructed: list_del_rcu(&net->list) ... pernet_operations::exit(net) unregister_netdevice_notifier(nb) -> net skipped dev_change_net_namespace() ... call_netdevice_notifiers() ... nb is not called ... unregister_netdevice() call_netdevice_notifiers() ... nb is not called ... This race is more danger, since dev_change_net_namespace() moves real network devices, which use not trivial netdevice notifiers, and if this will happen, the system will be left in unpredictable state. The patch closes the race. During the testing I found two places, where register_netdevice_notifier() is called from pernet init/exit methods (which led to deadlock) and fixed them (see previous patches). The review moved me to one more unusual registration place: raw_init() (can driver). It may be a reason of problems, if someone creates in-kernel CAN_RAW sockets, since they will be destroyed in exit method and raw_release() will call unregister_netdevice_notifier(). But grep over kernel tree does not show, someone creates such sockets from kernel space. 
Theoretically, there can be more places like this that are hidden from review, but they would be found the first time they are hit (since this is not a race, the problem is 100% reproducible). Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 | net: Call add/kill vid ndo on vlan filter feature toggling | Gal Pressman | 1 | -0/+20
NETIF_F_HW_VLAN_[CS]TAG_FILTER features require more than just a bit flip in dev->features in order to keep the driver in a consistent state. These features notify the driver of each added/removed vlan, but toggling of vlan-filter does not notify the driver accordingly for each of the existing vlans. This patch implements a similar solution to NETIF_F_RX_UDP_TUNNEL_PORT behavior (which notifies the driver about UDP ports in the same manner that vids are reported). Each toggling of the features propagates to the 8021q module, which iterates over the vlans and call add/kill ndo accordingly. Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 | net: Introduce net_rwsem to protect net_namespace_list | Kirill Tkhai | 1 | -0/+5
rtnl_lock() is used everywhere, and contention is very high. When someone wants to iterate over live net namespaces, there is no way to do that without an exclusive lock. But the exclusive rtnl_lock() in such places is overkill, and it just increases the contention. Yes, there is already for_each_net_rcu() in the kernel, but it requires rcu_read_lock(), so the iteration cannot sleep. Also, sometimes it is necessary to really prevent net_namespace_list from growing, so for_each_net_rcu() does not fit there. This patch introduces a new rw_semaphore, which will be used instead of rtnl_mutex to protect net_namespace_list. It is sleepable and allows non-exclusive iteration over the net namespace list. It allows us to stop using rtnl_lock() in several places (done in the next patches) and reduces the time we hold rtnl_mutex. Here we just add the new lock; the explanations of why rtnl_lock() can be removed are in the next patches. Fine-grained locks are generally better than one big lock, so let's do that with net_namespace_list while the situation allows it. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27 | net: Drop pernet_operations::async | Kirill Tkhai | 1 | -2/+0
Synchronous pernet_operations are not allowed anymore. All are asynchronous. So, drop the structure member. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27 | net: fix possible out-of-bound read in skb_network_protocol() | Eric Dumazet | 1 | -1/+1
skb mac header is not necessarily set at the time skb_network_protocol() is called. Use skb->data instead. BUG: KASAN: slab-out-of-bounds in skb_network_protocol+0x46b/0x4b0 net/core/dev.c:2739 Read of size 2 at addr ffff8801b3097a0b by task syz-executor5/14242 CPU: 1 PID: 14242 Comm: syz-executor5 Not tainted 4.16.0-rc6+ #280 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x24d lib/dump_stack.c:53 print_address_description+0x73/0x250 mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report+0x23c/0x360 mm/kasan/report.c:412 __asan_report_load_n_noabort+0xf/0x20 mm/kasan/report.c:443 skb_network_protocol+0x46b/0x4b0 net/core/dev.c:2739 harmonize_features net/core/dev.c:2924 [inline] netif_skb_features+0x509/0x9b0 net/core/dev.c:3011 validate_xmit_skb+0x81/0xb00 net/core/dev.c:3084 validate_xmit_skb_list+0xbf/0x120 net/core/dev.c:3142 packet_direct_xmit+0x117/0x790 net/packet/af_packet.c:256 packet_snd net/packet/af_packet.c:2944 [inline] packet_sendmsg+0x3aed/0x60b0 net/packet/af_packet.c:2969 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg+0xca/0x110 net/socket.c:639 ___sys_sendmsg+0x767/0x8b0 net/socket.c:2047 __sys_sendmsg+0xe5/0x210 net/socket.c:2081 Fixes: 19acc327258a ("gso: Handle Trans-Ether-Bridging protocol in skb_network_protocol()") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Pravin B Shelar <pshelar@ovn.org> Reported-by: Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-26 | net: Drop NETDEV_UNREGISTER_FINAL | Kirill Tkhai | 1 | -7/+0
The last user is gone after bdf5bd7f2132 "rds: tcp: remove register_netdevice_notifier infrastructure.", so we can remove this netdevice command. This allows deleting rtnl_lock() in netdev_run_todo(), which is a hot path for net namespace unregistration. dev_change_net_namespace() and netdev_wait_allrefs() have rcu_barrier() before the NETDEV_UNREGISTER_FINAL call, and the source commits say they were introduced to delimit the call from NETDEV_UNREGISTER, but this patch leaves them in place, since additional analysis is required to determine whether they are needed for something else. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-26 | net: Make NETDEV_XXX commands enum { } | Kirill Tkhai | 1 | -0/+20
This patch is preparation for dropping NETDEV_UNREGISTER_FINAL. Since the cmd is used in usnic_ib_netdev_event_to_string() to get the cmd name, simply removing NETDEV_UNREGISTER_FINAL everywhere would leave holes in event2str[] in this function. Instead, let's make the NETDEV_XXX command names available to everyone, and define netdev_cmd_to_name() in such a way that we won't have to shuffle names after their numbers are changed. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
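One way to keep names independent of the numeric values is to generate the name from the enumerator itself with a small macro, so renumbering the enum never requires touching a lookup table. The sketch below assumes a reduced, illustrative subset of commands with made-up values; the type name `netdev_cmd_sketch` and the function `cmd_to_name()` are hypothetical.

```c
/* Illustrative subset only; the values here are not the kernel's. */
enum netdev_cmd_sketch {
	NETDEV_UP = 1,
	NETDEV_DOWN,
	NETDEV_REGISTER,
	NETDEV_UNREGISTER,
};

/* Expand N(UP) into: case NETDEV_UP: return "NETDEV_UP"; */
#define N(val) case NETDEV_##val: return "NETDEV_" #val;

static const char *cmd_to_name(enum netdev_cmd_sketch cmd)
{
	switch (cmd) {
	N(UP) N(DOWN) N(REGISTER) N(UNREGISTER)
	}
	/* Values without a case label fall through to a fixed string. */
	return "NETDEV_UNKNOWN";
}
#undef N
```

Unlike an array indexed by command number, this form has no holes when a command is removed: deleting an enumerator only requires deleting its `N(...)` entry.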
2018-03-23 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 1 | -7/+15
Fun set of conflict resolutions here... For the mac80211 stuff, these were fortunately just parallel adds. Trivially resolved. In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the function phy_disable_interrupts() earlier in the file, whilst in 'net-next' the phy_error() call from this function was removed. In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the 'rt_table_id' member of rtable collided with a bug fix in 'net' that added a new struct member "rt_mtu_locked" which needs to be copied over here. The mlxsw driver conflict consisted of net-next separating the span code and definitions into separate files, whilst a 'net' bug fix made some changes to that moved code. The mlx5 infiniband conflict resolution was quite non-trivial, the RDMA tree's merge commit was used as a guide here, and here are their notes: ==================== Due to bug fixes found by the syzkaller bot and taken into the for-rc branch after development for the 4.17 merge window had already started being taken into the for-next branch, there were fairly non-trivial merge issues that would need to be resolved between the for-rc branch and the for-next branch. This merge resolves those conflicts and provides a unified base upon which ongoing development for 4.17 can be based. Conflicts: drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f9524 (IB/mlx5: Fix cleanup order on unload) added to for-rc and commit b5ca15ad7e61 (IB/mlx5: Add proper representors support) add as part of the devel cycle both needed to modify the init/de-init functions used by mlx5. To support the new representors, the new functions added by the cleanup patch needed to be made non-static, and the init/de-init list added by the representors patch needed to be modified to match the init/de-init list changes made by the cleanup patch. 
Updates: drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function prototypes added by representors patch to reflect new function names as changed by cleanup patch drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init stage list to match new order from cleanup patch ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-16 | net: Use rtnl_lock_killable() in register_netdev() | Kirill Tkhai | 1 | -1/+2
This patch uses rtnl_lock_killable() in one of the hot paths that take rtnl_lock(). Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-14 | net: use skb_to_full_sk() in skb_update_prio() | Eric Dumazet | 1 | -7/+15
Andrei Vagin reported a KASAN: slab-out-of-bounds error in skb_update_prio() Since SYNACK might be attached to a request socket, we need to get back to the listener socket. Since this listener is manipulated without locks, add const qualifiers to sock_cgroup_prioidx() so that the const can also be used in skb_update_prio() Also add the const qualifier to sock_cgroup_classid() for consistency. Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-12 | net: Make RX-FCS and HW GRO mutually exclusive | Gal Pressman | 1 | -4/+11
Same as LRO, hardware GRO cannot be enabled with RX-FCS. When both are requested, hardware GRO will be dropped. Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-09 | net: introduce IFF_NO_RX_HANDLER | Paolo Abeni | 1 | -0/+3
Some network devices - notably ipvlan slave - are not compatible with any kind of rx_handler. Currently the hook can be installed but any configuration (bridge, bond, macsec, ...) is nonfunctional. This change allocates a priv_flag bit to mark such devices and explicitly forbid installing a rx_handler if such bit is set. The new bit is used by ipvlan slave device. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-06 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 1 | -5/+9
All of the conflicts were cases of overlapping changes. In net/core/devlink.c, we have to take care that the resource size_params have become a struct member rather than a pointer to such an object. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-05 | net: Make RX-FCS and LRO mutually exclusive | Gal Pressman | 1 | -0/+6
LRO and RX-FCS offloads cannot be enabled at the same time since it is not clear what should happen to the FCS of each coalesced packet. The FCS is not really part of the TCP payload, hence cannot be merged into one big packet. On the other hand, providing one big LRO packet with one FCS contradicts the RX-FCS feature goal. Use the fix features mechanism in order to prevent intersection of the features and drop LRO in case RX-FCS is requested. Enabling RX-FCS while LRO is enabled will result in:
$ ethtool -K ens6 rx-fcs on
Actual changes:
large-receive-offload: off [requested on]
rx-fcs: on
Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
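The rule described above amounts to a small filter over the requested feature mask. The sketch below is a standalone illustration: the bit positions `F_LRO` and `F_RXFCS` and the function `fix_features_sketch()` are invented for the example and do not correspond to the kernel's feature flag values.

```c
#include <stdint.h>

/* Hypothetical feature bits for this example only. */
#define F_LRO   (UINT64_C(1) << 0)
#define F_RXFCS (UINT64_C(1) << 1)

/*
 * Resolve conflicts in a requested feature set: when RX-FCS is
 * requested, LRO is dropped from the result, since a coalesced
 * packet has no single meaningful FCS.
 */
static uint64_t fix_features_sketch(uint64_t requested)
{
	if ((requested & F_RXFCS) && (requested & F_LRO))
		requested &= ~F_LRO;
	return requested;
}
```

This mirrors the ethtool output in the commit message: a request for both features comes back with rx-fcs on and large-receive-offload off.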
2018-03-01 | net: allow interface to be set into VRF if VLAN interface in same VRF | Mike Manning | 1 | -5/+9
Setting an interface into a VRF fails with 'RTNETLINK answers: File exists' if one of its VLAN interfaces is already in the same VRF. As the VRF is an upper device of the VLAN interface, it is also showing up as an upper device of the interface itself. The solution is to restrict this check to devices other than master. As only one master device can be linked to a device, the check in this case is that the upper device (VRF) being linked to is not the same as the master device instead of it not being any one of the upper devices. The following example shows an interface ens12 (with a VLAN interface ens12.10) being set into VRF green, which behaves as expected: # ip link add link ens12 ens12.10 type vlan id 10 # ip link set dev ens12 master vrfgreen # ip link show dev ens12 3: ens12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master vrfgreen state UP mode DEFAULT group default qlen 1000 link/ether 52:54:00:4c:a0:45 brd ff:ff:ff:ff:ff:ff But if the VLAN interface has previously been set into the same VRF, then setting the interface into the VRF fails: # ip link set dev ens12 nomaster # ip link set dev ens12.10 master vrfgreen # ip link show dev ens12.10 39: ens12.10@ens12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vrfgreen state UP mode DEFAULT group default qlen 1000 link/ether 52:54:00:4c:a0:45 brd ff:ff:ff:ff:ff:ff # ip link set dev ens12 master vrfgreen RTNETLINK answers: File exists The workaround is to move the VLAN interface back into the default VRF beforehand, but it has to be shut first so as to avoid the risk of traffic leaking from the VRF. This fix avoids needing this workaround. Signed-off-by: Mike Manning <mmanning@att.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 | net: Fix spelling mistake "greater then" -> "greater than" | Gal Pressman | 1 | -1/+1
Fix trivial spelling mistake "greater then" -> "greater than". Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-19 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 1 | -2/+9
2018-02-14 | net: fix race on decreasing number of TX queues | Jakub Kicinski | 1 | -2/+9
netif_set_real_num_tx_queues() can be called when netdev is up. That usually happens when user requests change of number of channels/rings with ethtool -L. The procedure for changing the number of queues involves resetting the qdiscs and setting dev->num_tx_queues to the new value. When the new value is lower than the old one, extra care has to be taken to ensure ordering of accesses to the number of queues vs qdisc reset. Currently the queues are reset before new dev->num_tx_queues is assigned, leaving a window of time where packets can be enqueued onto the queues going down, leading to a likely crash in the drivers, since most drivers don't check if TX skbs are assigned to an active queue. Fixes: e6484930d7c7 ("net: allocate tx queues in register_netdevice") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
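The ordering requirement can be shown with a deliberately simplified, single-threaded model: publish the smaller queue count first, then reset the queues that have just become inactive. The structure `dev_sketch`, its fields, and `set_num_tx_sketch()` are hypothetical; a real implementation also has to wait for concurrent transmitters between the two steps, which this sketch only marks with a comment.

```c
#define MAX_TXQ 8

/* Hypothetical device model: an active queue count and per-queue backlog. */
struct dev_sketch {
	unsigned int num_tx;
	int backlog[MAX_TXQ];
};

/* Returns 0 on success, -1 if the requested count is out of range. */
static int set_num_tx_sketch(struct dev_sketch *dev, unsigned int txq)
{
	unsigned int old = dev->num_tx;
	unsigned int i;

	if (txq < 1 || txq > MAX_TXQ)
		return -1;

	/* Step 1: publish the new count so new packets avoid dying queues. */
	dev->num_tx = txq;

	if (txq < old) {
		/*
		 * Step 2 (not modeled here): wait until senders that may
		 * still have seen the old count are finished.
		 */

		/* Step 3: only now reset the queues that went away. */
		for (i = txq; i < old; i++)
			dev->backlog[i] = 0;
	}
	return 0;
}
```

Doing step 3 before step 1 recreates the window described in the commit message, where packets can still be enqueued onto queues that are going down.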
2018-02-14 | net: Make dn_ptr depend on CONFIG_DECNET | David Ahern | 1 | -1/+2
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13 | net: Convert default_device_ops | Kirill Tkhai | 1 | -0/+1
These pernet operations consist of exit() and exit_batch() methods. default_device_exit() moves not-local and virtual devices to init_net. There is nothing special here, because this may happen at any time on a working system, and rtnl_lock() and synchronize_net() protect us from all cases of external dereference. The same goes for default_device_exit_batch(): a similar unregistration may happen at any time on a system. It uses several lists (like todo_list), which are accessed under rtnl_lock(). After rtnl_unlock() and netdev_run_todo() all the devices are flushed. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13 | net: Convert pernet_subsys ops, registered via net_dev_init() | Kirill Tkhai | 1 | -0/+1
There are: 1) dev_proc_ops and dev_mc_net_ops, which create and destroy pernet proc files and are of no interest to other net namespaces; 2) netdev_net_ops, which creates pernet hashes that are not touched by other pernet_operations. So, make them async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 | net_sched: plug in qdisc ops change_tx_queue_len | Cong Wang | 1 | -0/+1
Introduce a new qdisc ops ->change_tx_queue_len() so that each qdisc could decide how to implement this if it wants. Previously we simply read dev->tx_queue_len; after pfifo_fast switches to skb array, we need this API to resize the skb array when we change dev->tx_queue_len. To avoid handling race conditions with TX BH, we need to deactivate all TX queues before changing the value and bring them back after we are done. This also makes the implementation easier. Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 | net: introduce helper dev_change_tx_queue_len() | Cong Wang | 1 | -0/+28
This patch promotes the local change_tx_queue_len() to a core helper function, dev_change_tx_queue_len(), so that rtnetlink and net-sysfs could share the code. This also prepares for the following patch. Note, the -EFAULT in the original code doesn't make sense, we should propagate the errno from notifiers. Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 | dev: advertise the new ifindex when the netns iface changes | Nicolas Dichtel | 1 | -7/+12
The goal is to let the user follow an interface that moves to another netns. CC: Jiri Benc <jbenc@redhat.com> CC: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 | dev: always advertise the new nsid when the netns iface changes | Nicolas Dichtel | 1 | -4/+1
The user should be able to follow any interface that moves to another netns. There is no reason to hide physical interfaces. CC: Jiri Benc <jbenc@redhat.com> CC: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 1 | -4/+15
en_rx_am.c was deleted in 'net-next' but had a bug fixed in it in 'net'. The esp{4,6}_offload.c conflicts were overlapping changes. The 'out' label is removed so we just return ERR_PTR(-EINVAL) directly. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23 | net: core: Fix kernel-doc for netdev_upper_link() | Florian Fainelli | 1 | -0/+2
Fixes the following warnings: ./net/core/dev.c:6438: warning: No description found for parameter 'extack' ./net/core/dev.c:6461: warning: No description found for parameter 'extack' Fixes: 42ab19ee9029 ("net: Add extack to upper device linking") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23 | net: core: Fix kernel-doc for call_netdevice_notifiers_info() | Florian Fainelli | 1 | -1/+0
Remove the @dev comment, since we do not have a net_device argument, fixes the following kernel doc warning: /net/core/dev.c:1707: warning: Excess function parameter 'dev' description in 'call_netdevice_notifiers_info' Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-22 | net: qdisc_pkt_len_init() should be more robust | Eric Dumazet | 1 | -4/+15
Without proper validation of DODGY packets, we might very well feed qdisc_pkt_len_init() with invalid GSO packets. tcp_hdrlen() might access out-of-bound data, so let's use skb_header_pointer() and proper checks. Whole story is described in commit d0c081b49137 ("flow_dissector: properly cap thoff field") We have the goal of validating DODGY packets earlier in the stack, so we might very well revert this fix in the future. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Jason Wang <jasowang@redhat.com> Reported-by: syzbot+9da69ebac7dddd804552@syzkaller.appspotmail.com Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
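The pattern of fetching a header through a bounds-checked accessor, instead of dereferencing a computed pointer, can be sketched in userspace as follows. This is an assumption-laden model: the packet is one flat buffer, `header_pointer_sketch()` always copies (the kernel helper named in the message behaves differently for data that is already linear), and the minimum data-offset check is an addition made for this example.

```c
#include <stddef.h>
#include <string.h>

#define TCP_MIN_HEADER_LEN 20

/*
 * Copy `len` bytes starting at `offset` into `buf` and return `buf`,
 * or return NULL if the requested range is not fully inside the packet.
 */
static const void *header_pointer_sketch(const unsigned char *pkt,
					 size_t pkt_len, size_t offset,
					 size_t len, void *buf)
{
	if (offset > pkt_len || len > pkt_len - offset)
		return NULL;
	memcpy(buf, pkt + offset, len);
	return buf;
}

/*
 * Return the TCP header length in bytes for a header at `thoff`,
 * or -1 if the fixed header is truncated or the data offset is invalid.
 */
static int tcp_header_len_sketch(const unsigned char *pkt, size_t pkt_len,
				 size_t thoff)
{
	unsigned char copy[TCP_MIN_HEADER_LEN];
	const unsigned char *th;
	unsigned int doff;

	th = header_pointer_sketch(pkt, pkt_len, thoff, sizeof(copy), copy);
	if (!th)
		return -1;

	doff = th[12] >> 4;	/* data offset, in 32-bit words */
	if (doff < 5)
		return -1;
	return (int)(doff * 4);
}
```

A bogus transport offset from an untrusted packet then produces an error instead of an out-of-bounds read.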
2018-01-12 | net: Cap number of queues even with accel_priv | Alexander Duyck | 1 | -2/+1
With the recent fix to ixgbe we can cap the number of queues always regardless of if accel_priv is being used or not since the actual number of queues are being reported via real_num_tx_queues. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-11 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next | David S. Miller | 1 | -5/+4
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-01-11 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Various BPF related improvements and fixes to nfp driver: i) do not register XDP RXQ structure to control queues, ii) round up program stack size to word size for nfp, iii) restrict MTU changes when BPF offload is active, iv) add more fully featured relocation support to JIT, v) add support for signed compare instructions to the nfp JIT, vi) export and reuse verifier log routine for nfp, and many more, from Jakub, Quentin and Nic. 2) Fix a syzkaller reported GPF in BPF's copy_verifier_state() when we hit kmalloc failure path, from Alexei. 3) Add two follow-up fixes for the recent XDP RXQ series: i) kvzalloc() allocated memory was only kfree()'ed, and ii) fix a memory leak where RX queue was not freed in netif_free_rx_queues(), from Jakub. 4) Add a sample for transferring XDP meta data into the skb, here it is used for setting skb->mark with the buffer from XDP, from Jesper. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 | net: free RX queue structures | Jakub Kicinski | 1 | -4/+3
Looks like commit e817f85652c1 ("xdp: generic XDP handling of xdp_rxq_info") replaced kvfree(dev->_rx) in free_netdev() with a call to netif_free_rx_queues() which doesn't actually free the rings? While at it remove the unnecessary temporary variable. Fixes: e817f85652c1 ("xdp: generic XDP handling of xdp_rxq_info") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-10 | net: use the right variant of kfree | Jakub Kicinski | 1 | -1/+1
kvzalloc'ed memory should be kvfree'd. Fixes: e817f85652c1 ("xdp: generic XDP handling of xdp_rxq_info") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-09 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 1 | -1/+13
2018-01-05 | xdp: generic XDP handling of xdp_rxq_info | Jesper Dangaard Brouer | 1 | -10/+59
Hook points for xdp_rxq_info:
* reg  : netif_alloc_rx_queues
* unreg: netif_free_rx_queues
The net_device has some members (num_rx_queues + real_num_rx_queues) and a data area (dev->_rx with struct netdev_rx_queue's) that were primarily used for exporting information about RPS (CONFIG_RPS) queues to sysfs (CONFIG_SYSFS). For generic XDP, extend struct netdev_rx_queue with the xdp_rxq_info, and remove some of the CONFIG_SYSFS ifdefs. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-02 | Revert "net: core: dev_get_valid_name is now the same as dev_alloc_name_ns" | David S. Miller | 1 | -1/+13
This reverts commit 87c320e51519a83c496ab7bfb4e96c8f9c001e89. Changing the error return code in some situations turns out to be harmful in practice. In particular Michael Ellerman reports that DHCP fails on his powerpc machines, and this revert gets things working again. Johannes Berg agrees that this revert is the best course of action for now. Fixes: 029b6d140550 ("Revert "net: core: maybe return -EEXIST in __dev_alloc_name"") Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27 | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next | David S. Miller | 1 | -7/+12
Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2017-12-22 1) Separate ESP handling from segmentation for GRO packets. This unifies the IPsec GSO and non GSO codepath. 2) Add asynchronous callbacks for xfrm on layer 2. This adds the necessary infrastructure to core networking. 3) Allow to use the layer2 IPsec GSO codepath for software crypto, all infrastructure is there now. 4) Also allow IPsec GSO with software crypto for local sockets. 5) Don't require synchronous crypto fallback on IPsec offloading, it is not needed anymore. 6) Check for xdo_dev_state_free and only call it if implemented. From Shannon Nelson. 7) Check for the required add and delete functions when a driver registers xdo_dev_ops. From Shannon Nelson. 8) Define xfrmdev_ops only with offload config. From Shannon Nelson. 9) Update the xfrm stats documentation. From Shannon Nelson. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-22  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  David S. Miller  1  -1/+1
Lots of overlapping changes. Also on the net-next side the XDP state management is handled more in the generic layers so undo the 'net' nfp fix which isn't applicable in net-next. Include a necessary change by Jakub Kicinski, with log message: ==================== cls_bpf no longer takes care of offload tracking. Make sure netdevsim performs necessary checks. This fixes a warning caused by TC trying to remove a filter it has not added. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20  net: Add asynchronous callbacks for xfrm on layer 2.  Steffen Klassert  1  -5/+11
This patch implements asynchronous crypto callbacks and a backlog handler that can be used when IPsec is done at layer 2 in the TX path. It also extends the skb validate functions so that we can update the driver transmit return codes based on async crypto operation or to indicate that we queued the packet in a backlog queue. Joint work with: Aviv Heller <avivh@mellanox.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-20  xfrm: Separate ESP handling from segmentation for GRO packets.  Steffen Klassert  1  -3/+2
We change the ESP GSO handlers to only segment the packets. The ESP handling and encryption is deferred to validate_xmit_xfrm() where this is done for non GRO packets too. This makes the code more robust and prepares for asynchronous crypto handling. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-19  net: Disable GRO_HW when generic XDP is installed on a device.  Michael Chan  1  -0/+18
Hardware should not aggregate any packets when generic XDP is installed. Cc: Ariel Elior <Ariel.Elior@cavium.com> Cc: everest-linux-l2@cavium.com Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
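As a rough illustration of the rule above, the following userspace sketch masks a hardware-GRO bit out of the active feature set while a generic XDP program is attached. All names and bit values here (`MODEL_F_GRO`, `MODEL_F_GRO_HW`, `struct model_dev`, the `model_*` helpers) are hypothetical and are not the kernel's definitions. Restoring the bit on detach is a choice made by this sketch, not something the commit message states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits for this sketch; not the kernel's real values. */
#define MODEL_F_GRO    (UINT64_C(1) << 0)
#define MODEL_F_GRO_HW (UINT64_C(1) << 1)

struct model_dev {
	uint64_t wanted_features;  /* what the user asked for */
	uint64_t features;         /* what is currently active */
	bool generic_xdp_attached; /* true while generic XDP is installed */
};

/* Recompute active features; hardware GRO is masked under generic XDP. */
static void model_update_features(struct model_dev *dev)
{
	uint64_t features = dev->wanted_features;

	if (dev->generic_xdp_attached)
		features &= ~MODEL_F_GRO_HW;

	dev->features = features;
}

/* Attach or detach generic XDP and re-evaluate the feature set. */
static void model_set_generic_xdp(struct model_dev *dev, bool attached)
{
	dev->generic_xdp_attached = attached;
	model_update_features(dev);
}
```

Software GRO is left untouched in this model; only the hardware aggregation bit is cleared.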
2017-12-19  net: Introduce NETIF_F_GRO_HW.  Michael Chan  1  -0/+12
Introduce NETIF_F_GRO_HW feature flag for NICs that support hardware GRO. With this flag, we can now independently turn on or off hardware GRO when GRO is on. Previously, drivers were using NETIF_F_GRO to control hardware GRO and so it cannot be independently turned on or off without affecting GRO.

Hardware GRO (just like GRO) guarantees that packets can be re-segmented by TSO/GSO to reconstruct the original packet stream. Logically, GRO_HW should depend on GRO since it is a subset, but we will let individual drivers enforce this dependency as they see fit.

Since NETIF_F_GRO is not propagated between upper and lower devices, NETIF_F_GRO_HW should follow suit since it is a subset of GRO. In other words, a lower device can independently have GRO/GRO_HW enabled or disabled and no feature propagation is required. This will preserve the current GRO behavior. This can be changed later if we decide to propagate GRO/GRO_HW/RXCSUM from upper to lower devices.

Cc: Ariel Elior <Ariel.Elior@cavium.com>
Cc: everest-linux-l2@cavium.com
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
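The message leaves the "GRO_HW depends on GRO" rule to individual drivers. A minimal sketch of what such a driver-side fix-up could look like follows, assuming hypothetical bit values (`DEMO_F_GRO`, `DEMO_F_GRO_HW`) and a hypothetical helper `demo_driver_fix_features()`; none of these are kernel definitions.

```c
#include <stdint.h>

/* Hypothetical feature bits for this sketch; not the kernel's real values. */
#define DEMO_F_GRO    (UINT64_C(1) << 0)
#define DEMO_F_GRO_HW (UINT64_C(1) << 1)

/*
 * One possible driver policy: hardware GRO is only allowed while GRO
 * itself is requested. A driver that does not care about the dependency
 * could simply return the requested mask unchanged.
 */
static uint64_t demo_driver_fix_features(uint64_t requested)
{
	if (!(requested & DEMO_F_GRO))
		requested &= ~DEMO_F_GRO_HW;
	return requested;
}
```

With two separate bits, GRO_HW can be switched off while GRO stays on, which a single shared flag could not express.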
2017-12-15  xdp: linearize skb in netif_receive_generic_xdp()  Song Liu  1  -1/+1
In netif_receive_generic_xdp(), it is necessary to linearize all nonlinear skb. However, in current implementation, skb with troom <= 0 are not linearized. This patch fixes this by calling skb_linearize() for all nonlinear skb. Fixes: de8f3a83b0a0 ("bpf: add meta pointer for direct access") Signed-off-by: Song Liu <songliubraving@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
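The difference between the two conditions can be shown with a toy model. This is a sketch only: `struct toy_skb` and the two predicates are hypothetical and do not reproduce the kernel code. They encode the behaviour as the message describes it, where the old logic skipped nonlinear buffers whose `troom` was zero or negative.

```c
#include <stdbool.h>

/* Hypothetical, simplified buffer description; not struct sk_buff. */
struct toy_skb {
	unsigned int data_len; /* bytes in fragments; nonzero means nonlinear */
	int troom;             /* extra tailroom needed; <= 0 means none */
};

static bool toy_is_nonlinear(const struct toy_skb *skb)
{
	return skb->data_len != 0;
}

/* Behaviour described as the bug: linearize only when troom > 0. */
static bool toy_old_would_linearize(const struct toy_skb *skb)
{
	return toy_is_nonlinear(skb) && skb->troom > 0;
}

/* Behaviour after the fix: linearize every nonlinear buffer. */
static bool toy_new_would_linearize(const struct toy_skb *skb)
{
	return toy_is_nonlinear(skb);
}
```

A nonlinear buffer with spare tailroom (`troom <= 0`) is exactly the case the old predicate misses and the new one catches.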
2017-12-13  net: avoid skb_warn_bad_offload on IS_ERR  Willem de Bruijn  1  -1/+1
skb_warn_bad_offload warns when packets enter the GSO stack that require skb_checksum_help or vice versa. Do not warn on arbitrary bad packets. Packet sockets can craft many. Syzkaller was able to demonstrate another one with eth_type games. In particular, suppress the warning when segmentation returns an error, which is for reasons other than checksum offload. See also commit 36c92474498a ("net: WARN if skb_checksum_help() is called on skb requiring segmentation") for context on this warning. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
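The suppression rule can be sketched in userspace as a predicate over the segmentation result. The helpers below are hypothetical stand-ins: `sketch_err_ptr()` and `sketch_is_err()` imitate the common convention of encoding a small negative errno in a pointer, and `sketch_should_warn_bad_offload()` is an assumed name, not the kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed upper bound on errno values encoded in a pointer. */
#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno as a pointer value (hypothetical helper). */
static void *sketch_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True if the pointer falls in the range reserved for encoded errors. */
static bool sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-SKETCH_MAX_ERRNO);
}

/*
 * Warn about a checksum/offload mismatch only when segmentation did not
 * itself fail; an error result means the packet was bad for other reasons.
 */
static bool sketch_should_warn_bad_offload(bool needs_check, const void *segs)
{
	return needs_check && !sketch_is_err(segs);
}
```

Keeping the warning for successful segmentation only avoids log noise from arbitrary malformed packets, such as those crafted through packet sockets.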
2017-12-09  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  David S. Miller  1  -1/+1
Conflict was two parallel additions of include files to sch_generic.c, no biggie. Signed-off-by: David S. Miller <davem@davemloft.net>