aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/net/macvlan.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-01-19Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-2/+3
2020-01-16macvlan: use skb_reset_mac_header() in macvlan_queue_xmit()Eric Dumazet1-2/+3
I missed the fact that macvlan_broadcast() can be used both in RX and TX. skb_eth_hdr() makes only sense in TX paths, so we can not use it blindly in macvlan_broadcast() Fixes: 96cc4b69581d ("macvlan: do not assume mac_header is set in macvlan_broadcast()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jurgen Van Ham <juvanham@gmail.com> Tested-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-1/+1
The ungrafting from PRIO bug fixes in net, when merged into net-next, merge cleanly but create a build failure. The resolution used here is from Petr Machata. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-08macvlan: do not assume mac_header is set in macvlan_broadcast()Eric Dumazet1-1/+1
Use of eth_hdr() in tx path is error prone. Many drivers call skb_reset_mac_header() before using it, but others do not. Commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()") attempted to fix this generically, but commit d346a3fae3ff ("packet: introduce PACKET_QDISC_BYPASS socket option") brought back the macvlan bug. Lets add a new helper, so that tx paths no longer have to call skb_reset_mac_header() only to get a pointer to skb->data. Hopefully we will be able to revert 6d1ccff62780 ("net: reset mac header in dev_start_xmit()") and save few cycles in transmit fast path. BUG: KASAN: use-after-free in __get_unaligned_cpu32 include/linux/unaligned/packed_struct.h:19 [inline] BUG: KASAN: use-after-free in mc_hash drivers/net/macvlan.c:251 [inline] BUG: KASAN: use-after-free in macvlan_broadcast+0x547/0x620 drivers/net/macvlan.c:277 Read of size 4 at addr ffff8880a4932401 by task syz-executor947/9579 CPU: 0 PID: 9579 Comm: syz-executor947 Not tainted 5.5.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x197/0x210 lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374 __kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506 kasan_report+0x12/0x20 mm/kasan/common.c:639 __asan_report_load_n_noabort+0xf/0x20 mm/kasan/generic_report.c:145 __get_unaligned_cpu32 include/linux/unaligned/packed_struct.h:19 [inline] mc_hash drivers/net/macvlan.c:251 [inline] macvlan_broadcast+0x547/0x620 drivers/net/macvlan.c:277 macvlan_queue_xmit drivers/net/macvlan.c:520 [inline] macvlan_start_xmit+0x402/0x77f drivers/net/macvlan.c:559 __netdev_start_xmit include/linux/netdevice.h:4447 [inline] netdev_start_xmit include/linux/netdevice.h:4461 [inline] dev_direct_xmit+0x419/0x630 net/core/dev.c:4079 packet_direct_xmit+0x1a9/0x250 net/packet/af_packet.c:240 packet_snd net/packet/af_packet.c:2966 [inline] packet_sendmsg+0x260d/0x6220 net/packet/af_packet.c:2991 sock_sendmsg_nosec net/socket.c:639 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:659 __sys_sendto+0x262/0x380 net/socket.c:1985 __do_sys_sendto net/socket.c:1997 [inline] __se_sys_sendto net/socket.c:1993 [inline] __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1993 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x442639 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 5b 10 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffc13549e08 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000442639 RDX: 000000000000000e RSI: 0000000020000080 RDI: 0000000000000003 RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000403bb0 R14: 0000000000000000 R15: 0000000000000000 Allocated by task 9389: save_stack+0x23/0x90 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] __kasan_kmalloc mm/kasan/common.c:513 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:486 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:527 __do_kmalloc mm/slab.c:3656 [inline] __kmalloc+0x163/0x770 mm/slab.c:3665 kmalloc include/linux/slab.h:561 [inline] tomoyo_realpath_from_path+0xc5/0x660 security/tomoyo/realpath.c:252 tomoyo_get_realpath security/tomoyo/file.c:151 [inline] tomoyo_path_perm+0x230/0x430 security/tomoyo/file.c:822 tomoyo_inode_getattr+0x1d/0x30 security/tomoyo/tomoyo.c:129 security_inode_getattr+0xf2/0x150 security/security.c:1222 vfs_getattr+0x25/0x70 fs/stat.c:115 vfs_statx_fd+0x71/0xc0 fs/stat.c:145 vfs_fstat include/linux/fs.h:3265 [inline] __do_sys_newfstat+0x9b/0x120 fs/stat.c:378 __se_sys_newfstat fs/stat.c:375 [inline] __x64_sys_newfstat+0x54/0x80 fs/stat.c:375 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 9389: save_stack+0x23/0x90 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] kasan_set_free_info mm/kasan/common.c:335 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:474 kasan_slab_free+0xe/0x10 mm/kasan/common.c:483 __cache_free mm/slab.c:3426 [inline] kfree+0x10a/0x2c0 mm/slab.c:3757 tomoyo_realpath_from_path+0x1a7/0x660 security/tomoyo/realpath.c:289 tomoyo_get_realpath security/tomoyo/file.c:151 [inline] tomoyo_path_perm+0x230/0x430 security/tomoyo/file.c:822 tomoyo_inode_getattr+0x1d/0x30 security/tomoyo/tomoyo.c:129 security_inode_getattr+0xf2/0x150 security/security.c:1222 vfs_getattr+0x25/0x70 fs/stat.c:115 vfs_statx_fd+0x71/0xc0 fs/stat.c:145 vfs_fstat include/linux/fs.h:3265 [inline] __do_sys_newfstat+0x9b/0x120 fs/stat.c:378 __se_sys_newfstat fs/stat.c:375 [inline] __x64_sys_newfstat+0x54/0x80 fs/stat.c:375 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff8880a4932000 which belongs to the cache kmalloc-4k of size 4096 The buggy address is located 1025 bytes inside of 4096-byte region [ffff8880a4932000, ffff8880a4933000) The buggy address belongs to the page: page:ffffea0002924c80 refcount:1 mapcount:0 mapping:ffff8880aa402000 index:0x0 compound_mapcount: 0 raw: 00fffe0000010200 ffffea0002846208 ffffea00028f3888 ffff8880aa402000 raw: 0000000000000000 ffff8880a4932000 0000000100000001 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8880a4932300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880a4932380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff8880a4932400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8880a4932480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880a4932500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: b863ceb7ddce ("[NET]: Add macvlan driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25net: macvlan: Use the PHY time stamping interface.Richard Cochran1-2/+2
The macvlan layer tests fields of the phy_device in order to determine whether to invoke the PHY's tsinfo ethtool callback. This patch replaces the open coded logic with an invocation of the proper methods. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-25macvlan: schedule bc_work even if errorMenglong Dong1-1/+2
While enqueueing a broadcast skb to port->bc_queue, schedule_work() is called to add port->bc_work, which processes the skbs in bc_queue, to "events" work queue. If port->bc_queue is full, the skb will be discarded and schedule_work(&port->bc_work) won't be called. However, if port->bc_queue is full and port->bc_work is not running or pending, port->bc_queue will keep full and schedule_work() won't be called any more, and all broadcast skbs to macvlan will be discarded. This case can happen: macvlan_process_broadcast() is the pending function of port->bc_work, it moves all the skbs in port->bc_queue to the queue "list", and processes the skbs in "list". During this, new skbs will keep being added to port->bc_queue in macvlan_broadcast_enqueue(), and port->bc_queue may already full when macvlan_process_broadcast() return. This may happen, especially when there are a lot of real-time threads and the process is preempted. Fix this by calling schedule_work(&port->bc_work) even if port->bc_work is full in macvlan_broadcast_enqueue(). Fixes: 412ca1550cbe ("macvlan: Move broadcasts into a work queue") Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-24net: remove unnecessary variables and callbackTaehee Yoo1-7/+0
This patch removes variables and callback these are related to the nested device structure. devices that can be nested have their own nest_level variable that represents the depth of nested devices. In the previous patch, new {lower/upper}_level variables are added and they replace old private nest_level variable. So, this patch removes all 'nest_level' variables. In order to avoid lockdep warning, ->ndo_get_lock_subclass() was added to get lockdep subclass value, which is actually lower nested depth value. But now, they use the dynamic lockdep key to avoid lockdep warning instead of the subclass. So, this patch removes ->ndo_get_lock_subclass() callback. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-24net: core: add generic lockdep keysTaehee Yoo1-12/+0
Some interface types could be nested. (VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, VIRT_WIFI, VXLAN, etc..) These interface types should set lockdep class because, without lockdep class key, lockdep always warn about unexisting circular locking. In the current code, these interfaces have their own lockdep class keys and these manage itself. So that there are so many duplicate code around the /driver/net and /net/. This patch adds new generic lockdep keys and some helper functions for it. This patch does below changes. a) Add lockdep class keys in struct net_device - qdisc_running, xmit, addr_list, qdisc_busylock - these keys are used as dynamic lockdep key. b) When net_device is being allocated, lockdep keys are registered. - alloc_netdev_mqs() c) When net_device is being free'd llockdep keys are unregistered. - free_netdev() d) Add generic lockdep key helper function - netdev_register_lockdep_key() - netdev_unregister_lockdep_key() - netdev_update_lockdep_key() e) Remove unnecessary generic lockdep macro and functions f) Remove unnecessary lockdep code of each interfaces. After this patch, each interface modules don't need to maintain their lockdep keys. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-5/+1
Some ISDN files that got removed in net-next had some changes done in mainline, take the removals. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152Thomas Gleixner1-5/+1
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-29macvlan: Replace strncpy() by strscpy()Gustavo A. R. Silva1-1/+1
The strncpy() function is being deprecated. Replace it by the safer strscpy() and fix the following Coverity warning: "Calling strncpy with a maximum size argument of 16 bytes on destination array ifrr.ifr_ifrn.ifrn_name of size 16 bytes might leave the destination string unterminated." Notice that, unlike strncpy(), strscpy() always null-terminates the destination string. Addresses-Coverity-ID: 1445537 ("Buffer not null terminated") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-20macvlan: Mark expected switch fall-throughGustavo A. R. Silva1-0/+1
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. This patch fixes the following warning: drivers/net/macvlan.c: In function ‘macvlan_do_ioctl’: drivers/net/macvlan.c:839:6: warning: this statement may fall through [-Wimplicit-fallthrough=] if (!net_eq(dev_net(dev), &init_net)) ^ drivers/net/macvlan.c:841:2: note: here case SIOCGHWTSTAMP: ^~~~ Warning level 3 was used: -Wimplicit-fallthrough=3 This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-09macvlan: disable SIOCSHWTSTAMP in containerHangbin Liu1-0/+2
Miroslav pointed that with NET_ADMIN enabled in container, a normal user could be mapped to root and is able to change the real device's rx filter via ioctl on macvlan, which would affect the other ptp process on host. Fix it by disabling SIOCSHWTSTAMP in container. Fixes: 254c0a2bfedb ("macvlan: pass get_ts_info and SIOC[SG]HWTSTAMP ioctl to real device") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-27netlink: make nla_nest_start() add NLA_F_NESTED flagMichal Kubecek1-1/+1
Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most netlink based interfaces (including recently added ones) are still not setting it in kernel generated messages. Without the flag, message parsers not aware of attribute semantics (e.g. wireshark dissector or libmnl's mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display the structure of their contents. Unfortunately we cannot just add the flag everywhere as there may be userspace applications which check nlattr::nla_type directly rather than through a helper masking out the flags. Therefore the patch renames nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start() as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually are rewritten to use nla_nest_start(). Except for changes in include/net/netlink.h, the patch was generated using this semantic patch: @@ expression E1, E2; @@ -nla_nest_start(E1, E2) +nla_nest_start_noflag(E1, E2) @@ expression E1, E2; @@ -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED) +nla_nest_start(E1, E2) Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20macvlan: pass get_ts_info and SIOC[SG]HWTSTAMP ioctl to real deviceHangbin Liu1-0/+48
Similiar to commit a6111d3c93d0 ("vlan: Pass SIOC[SG]HWTSTAMP ioctls to real device") and commit 37dd9255b2f6 ("vlan: Pass ethtool get_ts_info queries to real device."), add MACVlan HW ptp support. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24macvlan: add ndo_change_proto_down supportAndy Roulin1-0/+1
Add ndo_change_proto_down support through dev_change_proto_down_generic for use by control protocols like VRRPD. Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-31macvlan: use netif_is_macvlan_port()Julian Wiedmann1-4/+2
Replace the macvlan_port_exists() macro with its twin from netdevice.h Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+1
Completely minor snmp doc conflict. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17macvlan: replace kfree_skb by consume_skb for drop profilesYang Wei1-1/+1
Replace the kfree_skb() by consume_skb() to be drop monitor(dropwatch, perf) friendly. Signed-off-by: Yang Wei <yang.wei9@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17net: Add extack argument to ndo_fdb_add()Petr Machata1-1/+2
Drivers may not be able to support certain FDB entries, and an error code is insufficient to give clear hints as to the reasons of rejection. In order to make it possible to communicate the rejection reason, extend ndo_fdb_add() with an extack argument. Adapt the existing implementations of ndo_fdb_add() to take the parameter (and ignore it). Pass the extack parameter when invoking ndo_fdb_add() from rtnl_fdb_add(). Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-13net: dev: Add extack argument to dev_set_mac_address()Petr Machata1-2/+2
A follow-up patch will add a notifier type NETDEV_PRE_CHANGEADDR, which allows vetoing of MAC address changes. One prominent path to that notification is through dev_set_mac_address(). Therefore give this function an extack argument, so that it can be packed together with the notification. Thus a textual reason for rejection (or a warning) can be communicated back to the user. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03macvlan: return correct error valueMatteo Croce1-2/+5
A MAC address must be unique among all the macvlan devices with the same lower device. The only exception is the passthru [sic] mode, which shares the lower device address. When duplicate addresses are detected, EBUSY is returned when bringing the interface up: # ip link add macvlan0 link eth0 type macvlan # read addr </sys/class/net/eth0/address # ip link set macvlan0 address $addr # ip link set macvlan0 up RTNETLINK answers: Device or resource busy Use correct error code which is EADDRINUSE, and do the check also earlier, on address change: # ip link set macvlan0 address $addr RTNETLINK answers: Address already in use Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-19netpoll: allow cleanup to be synchronousDebabrata Banerjee1-1/+1
This fixes a problem introduced by: commit 2cde6acd49da ("netpoll: Fix __netpoll_rcu_free so that it can hold the rtnl lock") When using netconsole on a bond, __netpoll_cleanup can asynchronously recurse multiple times, each __netpoll_free_async call can result in more __netpoll_free_async's. This means there is now a race between cleanup_work queues on multiple netpoll_info's on multiple devices and the configuration of a new netpoll. For example if a netconsole is set to enable 0, reconfigured, and enable 1 immediately, this netconsole will likely not work. Given the reason for __netpoll_free_async is it can be called when rtnl is not locked, if it is locked, we should be able to execute synchronously. It appears to be locked everywhere it's called from. Generalize the design pattern from the teaming driver for current callers of __netpoll_free_async. CC: Neil Horman <nhorman@tuxdriver.com> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11macvlan: Change status when lower device goes downTravis Brown1-0/+1
Today macvlan ignores the notification when a lower device goes administratively down, preventing the lack of connectivity from bubbling up. Processing NETDEV_DOWN results in a macvlan state of LOWERLAYERDOWN with NO-CARRIER which should be easy to interpret in userspace. 2: lower: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000 3: macvlan@lower: <NO-CARRIER,BROADCAST,MULTICAST,UP,M-DOWN> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000 Signed-off-by: Suresh Krishnan <skrishnan@arista.com> Signed-off-by: Travis Brown <travisb@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-09net: Add support for subordinate traffic classes to netdev_pick_txAlexander Duyck1-8/+2
This change makes it so that we can support the concept of subordinate device traffic classes to the core networking code. In doing this we can start pulling out the driver specific bits needed to support selecting a queue based on an upper device. The solution at is currently stands is only partially implemented. I have the start of some XPS bits in here, but I would still need to allow for configuration of the XPS maps on the queues reserved for the subordinate devices. For now I am using the reference to the sb_dev XPS map as just a way to skip the lookup of the lower device XPS map for now as that would result in the wrong queue being picked. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25macvlan: Use software path for offloaded local, broadcast, and multicast trafficAlexander Duyck1-26/+21
This change makes it so that we use a software path for packets that are going to be locally switched between two macvlan interfaces on the same device. In addition we resort to software replication of broadcast and multicast packets instead of offloading that to hardware. The general idea is that using the device for east/west traffic local to the system is extremely inefficient. We can only support up to whatever the PCIe limit is for any given device so this caps us at somewhere around 20G for devices supported by ixgbe. This is compounded even further when you take broadcast and multicast into account as a single 10G port can come to a crawl as a packet is replicated up to 60+ times in some cases. In order to get away from that I am implementing changes so that we handle broadcast/multicast replication and east/west local traffic all in software. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25macvlan: Rename fwd_priv to accel_priv and add accessor functionAlexander Duyck1-12/+19
This change renames the fwd_priv member to accel_priv as this more accurately reflects the actual purpose of this value. In addition I am adding an accessor which will allow us to further abstract this in the future if needed. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-11macvlan: filter out unsupported feature flagsShannon Nelson1-1/+1
Adding a macvlan device on top of a lowerdev that supports the xfrm offloads fails with a new regression: # ip link add link ens1f0 mv0 type macvlan RTNETLINK answers: Operation not permitted Tracing down the failure shows that the macvlan device inherits the NETIF_F_HW_ESP and NETIF_F_HW_ESP_TX_CSUM feature flags from the lowerdev, but with no dev->xfrmdev_ops API filled in, it doesn't actually support xfrm. When the request is made to add the new macvlan device, the XFRM listener for NETDEV_REGISTER calls xfrm_api_check() which fails the new registration because dev->xfrmdev_ops is NULL. The macvlan creation succeeds when we filter out the ESP feature flags in macvlan_fix_features(), so let's filter them out like we're already filtering out ~NETIF_F_NETNS_LOCAL. When XFRM support is added in the future, we can add the flags into MACVLAN_FEATURES. This same problem could crop up in the future with any other new feature flags, so let's filter out any flags that aren't defined as supported in macvlan. Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") Reported-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-23macvlan: fix use-after-free in macvlan_common_newlink()Alexey Kodanev1-1/+1
The following use-after-free was reported by KASan when running LTP macvtap01 test on 4.16-rc2: [10642.528443] BUG: KASAN: use-after-free in macvlan_common_newlink+0x12ef/0x14a0 [macvlan] [10642.626607] Read of size 8 at addr ffff880ba49f2100 by task ip/18450 ... [10642.963873] Call Trace: [10642.994352] dump_stack+0x5c/0x7c [10643.035325] print_address_description+0x75/0x290 [10643.092938] kasan_report+0x28d/0x390 [10643.137971] ? macvlan_common_newlink+0x12ef/0x14a0 [macvlan] [10643.207963] macvlan_common_newlink+0x12ef/0x14a0 [macvlan] [10643.275978] macvtap_newlink+0x171/0x260 [macvtap] [10643.334532] rtnl_newlink+0xd4f/0x1300 ... [10646.256176] Allocated by task 18450: [10646.299964] kasan_kmalloc+0xa6/0xd0 [10646.343746] kmem_cache_alloc_trace+0xf1/0x210 [10646.397826] macvlan_common_newlink+0x6de/0x14a0 [macvlan] [10646.464386] macvtap_newlink+0x171/0x260 [macvtap] [10646.522728] rtnl_newlink+0xd4f/0x1300 ... [10647.022028] Freed by task 18450: [10647.061549] __kasan_slab_free+0x138/0x180 [10647.111468] kfree+0x9e/0x1c0 [10647.147869] macvlan_port_destroy+0x3db/0x650 [macvlan] [10647.211411] rollback_registered_many+0x5b9/0xb10 [10647.268715] rollback_registered+0xd9/0x190 [10647.319675] register_netdevice+0x8eb/0xc70 [10647.370635] macvlan_common_newlink+0xe58/0x14a0 [macvlan] [10647.437195] macvtap_newlink+0x171/0x260 [macvtap] Commit d02fd6e7d293 ("macvlan: Fix one possible double free") handles the case when register_netdevice() invokes ndo_uninit() on error and as a result free the port. But 'macvlan_port_get_rtnl(dev))' check (returns dev->rx_handler_data), which was added by this commit in order to prevent double free, is not quite correct: * for macvlan it always returns NULL because 'lowerdev' is the one that was used to register rx handler (port) in macvlan_port_create() as well as to unregister it in macvlan_port_destroy(). * for macvtap it always returns a valid pointer because macvtap registers its own rx handler before macvlan_common_newlink(). Fixes: d02fd6e7d293 ("macvlan: Fix one possible double free") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02macvlan: Fix one possible double freeGao Feng1-1/+6
Because the macvlan_uninit would free the macvlan port, so there is one double free case in macvlan_common_newlink. When the macvlan port is just created, then register_netdevice or netdev_upper_dev_link failed and they would invoke macvlan_uninit. Then it would reach the macvlan_port_destroy which triggers the double free. Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-19macvlan/macvtap: Add support for L2 forwarding offloads with macvtapAlexander Duyck1-4/+1
This patch reverts earlier commit b13ba1b83f52 ("macvlan: forbid L2 fowarding offload for macvtap"). The reason for reverting this is because the original patch no longer fixes what it previously did as the underlying structure has changed for macvtap. Specifically macvtap originally pulled packets directly off of the lowerdev. However in commit 6acf54f1cf0a ("macvtap: Add support of packet capture on macvtap device.") that code was changed and instead macvtap would listen directly on the macvtap device itself instead of the lower device. As such, the L2 forwarding offload should now be able to provide a performance advantage of skipping the checks on the lower dev while not introducing any sort of regression. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-14macvlan: Only update pkt_type if destination MAC address matchesAlexander Duyck1-1/+3
This patch updates the pkt_type to PACKET_HOST only if the destination MAC address matches on the on the source based macvlan. It didn't make sense to be updating broadcast, multicast, and non-local destined frames with PACKET_HOST. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-14macvlan: Only deliver one copy of the frame to the macvlan interfaceAlexander Duyck1-1/+1
This patch intoduces a slight adjustment for macvlan to address the fact that in source mode I was seeing two copies of any packet addressed to the macvlan interface being delivered where there should have been only one. The issue appears to be that one copy was delivered based on the source MAC address and then the second copy was being delivered based on the destination MAC address. To fix it I am just treating a unicast address match as though it is not a match since source based macvlan isn't supposed to be matching based on the destination MAC anyway. Fixes: 79cf79abce71 ("macvlan: add source mode") Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04net: Add extack to upper device linkingDavid Ahern1-3/+4
Add extack arg to netdev_upper_dev_link and netdev_master_upper_dev_link Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-20macvlan: code refine to check data before usingZhang Shengju1-5/+8
This patch checks data first at one place, return if it's null. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-18macvlan: add offload features for encapsulationDimitris Michailidis1-1/+2
Currently macvlan devices do not set their hw_enc_features making encapsulated Tx packets resort to SW fallbacks. Add encapsulation GSO offloads to ->features as is done for the other GSOs and set ->hw_enc_features. Signed-off-by: Dimitris Michailidis <dmichail@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17macvlan/macvtap: Remove NETIF_F_UFO advertisement.David S. Miller1-1/+1
It is going away. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-15/+73
A set of overlapping changes in macvlan and the rocker driver, nothing serious. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26net: add netlink_ext_ack argument to rtnl_link_ops.validateMatthias Schiffer1-1/+2
Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26net: add netlink_ext_ack argument to rtnl_link_ops.changelinkMatthias Schiffer1-1/+2
Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26net: add netlink_ext_ack argument to rtnl_link_ops.newlinkMatthias Schiffer1-1/+2
Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-22macvlan: Let passthru macvlan correctly restore lower mac addressVlad Yasevich1-3/+44
Passthru macvlans directly change the mac address of the lower level device. That's OK, but after the macvlan is deleted, the lower device is left with changed address and one needs to reboot to bring back the origina HW addresses. This scenario is actually quite common with passthru macvtap devices. This patch attempts to solve this, by storing the mac address of the lower device in macvlan_port structure and keeping track of it through the changes. After this patch, any changes to the lower device mac address done trough the macvlan device, will be reverted back. Any changs done directly to the lower device mac address will be kept. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-22macvlan: convert port passthru to flags.Vlad Yasevich1-13/+24
Convert the port passthru boolean into flags with accesor functions. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-22macvlan: Fix passthru macvlan mac address inheritanceVlad Yasevich1-1/+2
When a lower device of the passthru macvlan changes it's address, passthru macvlan is supposed to change it's own address as well. However, that doesn't happen correctly because the check in macvlan_addr_busy() will catch the fact that the lower level (port) mac address is the same as the address we are trying to assign to the macvlan, and return an error. As a reasult, the address of the passthru macvlan device is never changed. The same thing happens when the user attempts to change the mac address of the passthru macvlan. The simple solution appers to be to not check against the lower device in case of passthru macvlan device, since the 2 addresses are _supposed_ to be the same. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-22macvlan: Do not return error when setting the same mac addressVlad Yasevich1-0/+4
The user currently gets an EBUSY error when attempting to set the mac address on a macvlan device to the same value. This should really be a no-op as nothing changes. Catch the condition and return early. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+1
The conflicts were two cases of overlapping changes in batman-adv and the qed driver. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14macvlan: propagate the mac address change status for lowerdevZhang Shengju1-4/+2
The macvlan dev should propagate the return value of mac address change for lower device in the passthru mode, instead of always return 0. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07net: Fix inconsistent teardown and release of private netdev state.David S. Miller1-1/+1
Network devices can allocate reasources and private memory using netdev_ops->ndo_init(). However, the release of these resources can occur in one of two different places. Either netdev_ops->ndo_uninit() or netdev->destructor(). The decision of which operation frees the resources depends upon whether it is necessary for all netdev refs to be released before it is safe to perform the freeing. netdev_ops->ndo_uninit() presumably can occur right after the NETDEV_UNREGISTER notifier completes and the unicast and multicast address lists are flushed. netdev->destructor(), on the other hand, does not run until the netdev references all go away. Further complicating the situation is that netdev->destructor() almost universally does also a free_netdev(). This creates a problem for the logic in register_netdevice(). Because all callers of register_netdevice() manage the freeing of the netdev, and invoke free_netdev(dev) if register_netdevice() fails. If netdev_ops->ndo_init() succeeds, but something else fails inside of register_netdevice(), it does call ndo_ops->ndo_uninit(). But it is not able to invoke netdev->destructor(). This is because netdev->destructor() will do a free_netdev() and then the caller of register_netdevice() will do the same. However, this means that the resources that would normally be released by netdev->destructor() will not be. Over the years drivers have added local hacks to deal with this, by invoking their destructor parts by hand when register_netdevice() fails. Many drivers do not try to deal with this, and instead we have leaks. Let's close this hole by formalizing the distinction between what private things need to be freed up by netdev->destructor() and whether the driver needs unregister_netdevice() to perform the free_netdev(). netdev->priv_destructor() performs all actions to free up the private resources that used to be freed by netdev->destructor(), except for free_netdev(). netdev->needs_free_netdev is a boolean that indicates whether free_netdev() should be done at the end of unregister_netdevice(). Now, register_netdevice() can sanely release all resources after ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit() and netdev->priv_destructor(). And at the end of unregister_netdevice(), we invoke netdev->priv_destructor() and optionally call free_netdev(). Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-15macvlan: Fix performance issues with vlan tagged packetsVlad Yasevich1-2/+5
Macvlan always turns on offload features that have sofware fallback (NETIF_GSO_SOFTWARE). This allows much higher guest-guest communications over macvtap. However, macvtap does not turn on these features for vlan tagged traffic. As a result, depending on the HW that mactap is configured on, the performance of guest-guest communication over a vlan is very inconsistent. If the HW supports TSO/UFO over vlans, then the performance will be fine. If not, the the performance will suffer greatly since the VM may continue using TSO/UFO, and will force the host segment the traffic and possibly overlow the macvtap queue. This patch adds the always on offloads to vlan_features. This makes sure that any vlan tagged traffic between 2 guest will not be segmented needlessly. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-25macvlan: Fix device ref leak when purging bc_queueHerbert Xu1-1/+10
When a parent macvlan device is destroyed we end up purging its broadcast queue without dropping the device reference count on the packet source device. This causes the source device to linger. This patch drops that reference count. Fixes: 260916dfb48c ("macvlan: Fix potential use-after free for...") Reported-by: Joe Ghalam <Joe.Ghalam@dell.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>