aboutsummaryrefslogtreecommitdiffstats
path: root/net/core/rtnetlink.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-03-31net: Do not take net_rwsem in __rtnl_link_unregister()Kirill Tkhai1-3/+3
This function calls call_netdevice_notifier(), which also may take net_rwsem. So, we can't use net_rwsem here. This patch makes callers of this functions take pernet_ops_rwsem, like register_netdevice_notifier() does. This will protect the modifications of net_namespace_list, and allows notifiers to take it (they won't have to care about context). Since __rtnl_link_unregister() is used on module load and unload (which are not frequent operations), this looks for me better, than make all call_netdevice_notifier() always executing in "protected net_namespace_list" context. Also, this fixes the problem we had a deal in 328fbe747ad4 "Close race between {un, }register_netdevice_notifier and ...", and guarantees __rtnl_link_unregister() does not skip exitting net. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29net: Introduce net_rwsem to protect net_namespace_listKirill Tkhai1-0/+5
rtnl_lock() is used everywhere, and contention is very high. When someone wants to iterate over alive net namespaces, he/she has no a possibility to do that without exclusive lock. But the exclusive rtnl_lock() in such places is overkill, and it just increases the contention. Yes, there is already for_each_net_rcu() in kernel, but it requires rcu_read_lock(), and this can't be sleepable. Also, sometimes it may be need really prevent net_namespace_list growth, so for_each_net_rcu() is not fit there. This patch introduces new rw_semaphore, which will be used instead of rtnl_mutex to protect net_namespace_list. It is sleepable and allows not-exclusive iterations over net namespaces list. It allows to stop using rtnl_lock() in several places (what is made in next patches) and makes less the time, we keep rtnl_mutex. Here we just add new lock, while the explanation of we can remove rtnl_lock() there are in next patches. Fine grained locks generally are better, then one big lock, so let's do that with net_namespace_list, while the situation allows that. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27net: Add more commentsKirill Tkhai1-1/+1
This adds comments to different places to improve readability. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27net: Rename net_sem to pernet_ops_rwsemKirill Tkhai1-2/+2
net_sem is some undefined area name, so it will be better to make the area more defined. Rename it to pernet_ops_rwsem for better readability and better intelligibility. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27net: Drop pernet_operations::asyncKirill Tkhai1-1/+0
Synchronous pernet_operations are not allowed anymore. All are asynchronous. So, drop the structure member. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-16net: Add rtnl_lock_killable()Kirill Tkhai1-0/+6
rtnl_lock() is widely used mutex in kernel. Some of kernel code does memory allocations under it. In case of memory deficit this may invoke OOM killer, but the problem is a killed task can't exit if it's waiting for the mutex. This may be a reason of deadlock and panic. This patch adds a new primitive, which responds on SIGKILL, and it allows to use it in the places, where we don't want to sleep forever. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13net: Convert rtnetlink_net_opsKirill Tkhai1-0/+1
rtnetlink_net_init() and rtnetlink_net_exit() create and destroy netlink socket net::rtnl. The socket is used to send rtnl notification via rtnl_net_notifyid(). There is no a problem to create and destroy it in parallel with other pernet operations, as we link net in setup_net() after the socket is created, and destroy in cleanup_net() after net is unhashed from all the lists and there is no RCU references on it. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13net: Introduce net_sem for protection of pernet_listKirill Tkhai1-2/+2
Currently, the mutex is mostly used to protect pernet operations list. It orders setup_net() and cleanup_net() with parallel {un,}register_pernet_operations() calls, so ->exit{,batch} methods of the same pernet operations are executed for a dying net, as were used to call ->init methods, even after the net namespace is unlinked from net_namespace_list in cleanup_net(). But there are several problems with scalability. The first one is that more than one net can't be created or destroyed at the same moment on the node. For big machines with many cpus running many containers it's very sensitive. The second one is that it's need to synchronize_rcu() after net is removed from net_namespace_list(): Destroy net_ns: cleanup_net() mutex_lock(&net_mutex) list_del_rcu(&net->list) synchronize_rcu() <--- Sleep there for ages list_for_each_entry_reverse(ops, &pernet_list, list) ops_exit_list(ops, &net_exit_list) list_for_each_entry_reverse(ops, &pernet_list, list) ops_free_list(ops, &net_exit_list) mutex_unlock(&net_mutex) This primitive is not fast, especially on the systems with many processors and/or when preemptible RCU is enabled in config. So, all the time, while cleanup_net() is waiting for RCU grace period, creation of new net namespaces is not possible, the tasks, who makes it, are sleeping on the same mutex: Create net_ns: copy_net_ns() mutex_lock_killable(&net_mutex) <--- Sleep there for ages I observed 20-30 seconds hangs of "unshare -n" on ordinary 8-cpu laptop with preemptible RCU enabled after CRIU tests round is finished. The solution is to convert net_mutex to the rw_semaphore and add fine grain locks to really small number of pernet_operations, what really need them. Then, pernet_operations::init/::exit methods, modifying the net-related data, will require down_read() locking only, while down_write() will be used for changing pernet_list (i.e., when modules are being loaded and unloaded). This gives signify performance increase, after all patch set is applied, like you may see here: %for i in {1..10000}; do unshare -n bash -c exit; done *before* real 1m40,377s user 0m9,672s sys 0m19,928s *after* real 0m17,007s user 0m5,311s sys 0m11,779 (5.8 times faster) This patch starts replacing net_mutex to net_sem. It adds rw_semaphore, describes the variables it protects, and makes to use, where appropriate. net_mutex is still present, and next patches will kick it out step-by-step. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-08rtnetlink: require unique netns identifierChristian Brauner1-0/+48
Since we've added support for IFLA_IF_NETNSID for RTM_{DEL,GET,SET,NEW}LINK it is possible for userspace to send us requests with three different properties to identify a target network namespace. This affects at least RTM_{NEW,SET}LINK. Each of them could potentially refer to a different network namespace which is confusing. For legacy reasons the kernel will pick the IFLA_NET_NS_PID property first and then look for the IFLA_NET_NS_FD property but there is no reason to extend this type of behavior to network namespace ids. The regression potential is quite minimal since the rtnetlink requests in question either won't allow IFLA_IF_NETNSID requests before 4.16 is out (RTM_{NEW,SET}LINK) or don't support IFLA_NET_NS_{PID,FD} (RTM_{DEL,GET}LINK) in the first place. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-01rtnetlink: remove check for IFLA_IF_NETNSIDChristian Brauner1-3/+0
RTM_NEWLINK supports the IFLA_IF_NETNSID property since 5bb8ed075428b71492734af66230aa0c07fcc515 so we should not error out when it is passed. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-31rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINKChristian Brauner1-5/+1
- Backwards Compatibility: If userspace wants to determine whether RTM_NEWLINK supports the IFLA_IF_NETNSID property they should first send an RTM_GETLINK request with IFLA_IF_NETNSID on lo. If either EACCESS is returned or the reply does not include IFLA_IF_NETNSID userspace should assume that IFLA_IF_NETNSID is not supported on this kernel. If the reply does contain an IFLA_IF_NETNSID property userspace can send an RTM_NEWLINK with a IFLA_IF_NETNSID property. If they receive EOPNOTSUPP then the kernel does not support the IFLA_IF_NETNSID property with RTM_NEWLINK. Userpace should then fallback to other means. - Security: Callers must have CAP_NET_ADMIN in the owning user namespace of the target network namespace. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29net: introduce helper dev_change_tx_queue_len()Cong Wang1-13/+5
This patch promotes the local change_tx_queue_len() to a core helper function, dev_change_tx_queue_len(), so that rtnetlink and net-sysfs could share the code. This also prepares for the following patch. Note, the -EFAULT in the original code doesn't make sense, we should propagate the errno from notifiers. Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29dev: advertise the new ifindex when the netns iface changesNicolas Dichtel1-11/+20
The goal is to let the user follow an interface that moves to another netns. CC: Jiri Benc <jbenc@redhat.com> CC: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29rtnetlink: enable IFLA_IF_NETNSID for RTM_DELLINKChristian Brauner1-11/+28
- Backwards Compatibility: If userspace wants to determine whether RTM_DELLINK supports the IFLA_IF_NETNSID property they should first send an RTM_GETLINK request with IFLA_IF_NETNSID on lo. If either EACCESS is returned or the reply does not include IFLA_IF_NETNSID userspace should assume that IFLA_IF_NETNSID is not supported on this kernel. If the reply does contain an IFLA_IF_NETNSID property userspace can send an RTM_DELLINK with a IFLA_IF_NETNSID property. If they receive EOPNOTSUPP then the kernel does not support the IFLA_IF_NETNSID property with RTM_DELLINK. Userpace should then fallback to other means. - Security: Callers must have CAP_NET_ADMIN in the owning user namespace of the target network namespace. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29rtnetlink: enable IFLA_IF_NETNSID for RTM_SETLINKChristian Brauner1-3/+0
- Backwards Compatibility: If userspace wants to determine whether RTM_SETLINK supports the IFLA_IF_NETNSID property they should first send an RTM_GETLINK request with IFLA_IF_NETNSID on lo. If either EACCESS is returned or the reply does not include IFLA_IF_NETNSID userspace should assume that IFLA_IF_NETNSID is not supported on this kernel. If the reply does contain an IFLA_IF_NETNSID property userspace can send an RTM_SETLINK with a IFLA_IF_NETNSID property. If they receive EOPNOTSUPP then the kernel does not support the IFLA_IF_NETNSID property with RTM_SETLINK. Userpace should then fallback to other means. To retain backwards compatibility the kernel will first check whether a IFLA_NET_NS_PID or IFLA_NET_NS_FD property has been passed. If either one is found it will be used to identify the target network namespace. This implies that users who do not care whether their running kernel supports IFLA_IF_NETNSID with RTM_SETLINK can pass both IFLA_NET_NS_{FD,PID} and IFLA_IF_NETNSID referring to the same network namespace. - Security: Callers must have CAP_NET_ADMIN in the owning user namespace of the target network namespace. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29rtnetlink: enable IFLA_IF_NETNSID in do_setlink()Christian Brauner1-7/+47
RTM_{NEW,SET}LINK already allow operations on other network namespaces by identifying the target network namespace through IFLA_NET_NS_{FD,PID} properties. This is done by looking for the corresponding properties in do_setlink(). Extend do_setlink() to also look for the IFLA_IF_NETNSID property. This introduces no functional changes since all callers of do_setlink() currently block IFLA_IF_NETNSID by reporting an error before they reach do_setlink(). This introduces the helpers: static struct net *rtnl_link_get_net_by_nlattr(struct net *src_net, struct nlattr *tb[]) static struct net *rtnl_link_get_net_capable(const struct sk_buff *skb, struct net *src_net, struct nlattr *tb[], int cap) to simplify permission checks and target network namespace retrieval for RTM_* requests that already support IFLA_NET_NS_{FD,PID} but get extended to IFLA_IF_NETNSID. To perserve backwards compatibility the helpers look for IFLA_NET_NS_{FD,PID} properties first before checking for IFLA_IF_NETNSID. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-22net: core: Expose number of link up/down transitionsDavid Decotigny1-2/+11
Expose the number of times the link has been going UP or DOWN, and update the "carrier_changes" counter to be the sum of these two events. While at it, also update the sysfs-class-net documentation to cover: carrier_changes (3.15), carrier_up_count (4.16) and carrier_down_count (4.16) Signed-off-by: David Decotigny <decot@googlers.com> [Florian: * rebase * add documentation * merge carrier_changes with up/down counters] Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10Merge tag 'mlx5-updates-2018-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linuxDavid S. Miller1-1/+9
mlx5-updates-2018-01-08 Four patches from Or that add Hairpin support to mlx5: =========================================================== From: Or Gerlitz <ogerlitz@mellanox.com> We refer the ability of NIC HW to fwd packet received on one port to the other port (also from a port to itself) as hairpin. The application API is based on ingress tc/flower rules set on the NIC with the mirred redirect action. Other actions can apply to packets during the redirect. Hairpin allows to offload the data-path of various SW DDoS gateways, load-balancers, etc to HW. Packets go through all the required processing in HW (header re-write, encap/decap, push/pop vlan) and then forwarded, CPU stays at practically zero usage. HW Flow counters are used by the control plane for monitoring and accounting. Hairpin is implemented by pairing a receive queue (RQ) to send queue (SQ). All the flows that share <recv NIC, mirred NIC> are redirected through the same hairpin pair. Currently, only header-rewrite is supported as a packet modification action. I'd like to thanks Elijah Shakkour <elijahs@mellanox.com> for implementing this functionality on HW simulator, before it was avail in the FW so the driver code could be tested early. =========================================================== From Feras three patches that provide very small changes that allow IPoIB to support RX timestamping for child interfaces, simply by hooking the mlx5e timestamping PTP ioctl to IPoIB child interface netdev profile. One patch from Gal to fix a spilling mistake. Two patches from Eugenia adds drop counters to VF statistics to be reported as part of VF statistics in netlink (iproute2) and implemented them in mlx5 eswitch. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-5/+5
2018-01-09net/core: Add drop counters to VF statisticsEugenia Emantayev1-1/+9
Modern hardware can decide to drop packets going to/from a VF. Add receive and transmit drop counters to be displayed at hypervisor layer in iproute2 per VF statistics. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-04rtnetlink: give a user socket to get_target_net()Andrei Vagin1-5/+5
This function is used from two places: rtnl_dump_ifinfo and rtnl_getlink. In rtnl_getlink(), we give a request skb into get_target_net(), but in rtnl_dump_ifinfo, we give a response skb into get_target_net(). The problem here is that NETLINK_CB() isn't initialized for the response skb. In both cases we can get a user socket and give it instead of skb into get_target_net(). This bug was found by syzkaller with this call-trace: kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Modules linked in: CPU: 1 PID: 3149 Comm: syzkaller140561 Not tainted 4.15.0-rc4-mm1+ #47 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__netlink_ns_capable+0x8b/0x120 net/netlink/af_netlink.c:868 RSP: 0018:ffff8801c880f348 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff8443f900 RDX: 000000000000007b RSI: ffffffff86510f40 RDI: 00000000000003d8 RBP: ffff8801c880f360 R08: 0000000000000000 R09: 1ffff10039101e4f R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff86510f40 R13: 000000000000000c R14: 0000000000000004 R15: 0000000000000011 FS: 0000000001a1a880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020151000 CR3: 00000001c9511005 CR4: 00000000001606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: netlink_ns_capable+0x26/0x30 net/netlink/af_netlink.c:886 get_target_net+0x9d/0x120 net/core/rtnetlink.c:1765 rtnl_dump_ifinfo+0x2e5/0xee0 net/core/rtnetlink.c:1806 netlink_dump+0x48c/0xce0 net/netlink/af_netlink.c:2222 __netlink_dump_start+0x4f0/0x6d0 net/netlink/af_netlink.c:2319 netlink_dump_start include/linux/netlink.h:214 [inline] rtnetlink_rcv_msg+0x7f0/0xb10 net/core/rtnetlink.c:4485 netlink_rcv_skb+0x21e/0x460 net/netlink/af_netlink.c:2441 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4540 netlink_unicast_kernel net/netlink/af_netlink.c:1308 [inline] netlink_unicast+0x4be/0x6a0 net/netlink/af_netlink.c:1334 netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1897 Cc: Jiri Benc <jbenc@redhat.com> Fixes: 79e1ad148c84 ("rtnetlink: use netnsid to query interface") Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-11rtnetlink: fix typo in GSO max segmentsStephen Hemminger1-1/+1
Fixes: 46e6b992c250 ("rtnetlink: allow GSO maximums to be set on device creation") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08rtnetlink: allow GSO maximums to be set on device creationStephen Hemminger1-0/+34
Netlink device already allows changing GSO sizes with ip set command. The part that is missing is allowing overriding GSO settings on device creation. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05rtnetlink: fix rtnl_link msghandler rcu annotationsFlorian Westphal1-2/+3
Incorrect/missing annotations caused a few sparse warnings: rtnetlink.c:155:15: incompatible types .. (different address spaces) rtnetlink.c:157:23: incompatible types .. (different address spaces) rtnetlink.c:185:15: incompatible types .. (different address spaces) rtnetlink.c:285:15: incompatible types .. (different address spaces) rtnetlink.c:317:9: incompatible types .. (different address spaces) rtnetlink.c:3054:23: incompatible types .. (different address spaces) no change in generated code. Fixes: addf9b90de22f7 ("net: rtnetlink: use rcu to free rtnl message handlers") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-04rtnetlink: ipv6: convert remaining users to rtnl_register_moduleFlorian Westphal1-1/+0
convert remaining users of rtnl_register to rtnl_register_module and un-export rtnl_register. Requested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller1-1/+5
Daniel Borkmann says: ==================== pull-request: bpf-next 2017-12-03 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Addition of a software model for BPF offloads in order to ease testing code changes in that area and make semantics more clear. This is implemented in a new driver called netdevsim, which can later also be extended for other offloads. SR-IOV support is added as well to netdevsim. BPF kernel selftests for offloading are added so we can track basic functionality as well as exercising all corner cases around BPF offloading, from Jakub. 2) Today drivers have to drop the reference on BPF progs they hold due to XDP on device teardown themselves. Change this in order to make XDP handling inside the drivers less error prone, and move disabling XDP to the core instead, also from Jakub. 3) Misc set of BPF verifier improvements and cleanups as preparatory work for upcoming BPF-to-BPF calls. Among others, this set also improves liveness marking such that pruning can be slightly more effective. Register and stack liveness information is now included in the verifier log as well, from Alexei. 4) nfp JIT improvements in order to identify load/store sequences in the BPF prog e.g. coming from memcpy lowering and optimizing them through the NPU's command push pull (CPP) instruction, from Jiong. 5) Cleanups to test_cgrp2_attach2.c BPF sample code in oder to remove bpf_prog_attach() magic values and replacing them with actual proper attach flag instead, from David. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-04rtnetlink: remove __rtnl_registerFlorian Westphal1-25/+8
This removes __rtnl_register and switches callers to either rtnl_register or rtnl_register_module. Also, rtnl_register() will now print an error if memory allocation failed rather than panic the kernel. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-04rtnetlink: get reference on module before invoking handlersFlorian Westphal1-35/+78
Add yet another rtnl_register function. It will be used by modules that can be removed. The passed module struct is used to prevent module unload while a netlink dump is in progress or when a DOIT_UNLOCKED doit callback is called. Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-04net: rtnetlink: use rcu to free rtnl message handlersFlorian Westphal1-53/+101
rtnetlink is littered with READ_ONCE() because we can have read accesses while another cpu can write to the structure we're reading by (un)registering doit or dumpit handlers. This patch changes this so that (un)registering cpu allocates a new structure and then publishes it via rcu_assign_pointer, i.e. once another cpu can see such pointer no modifications will occur anymore. based on initial patch from Peter Zijlstra. Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03net: xdp: avoid output parameters when querying XDP progJakub Kicinski1-1/+5
The output parameters will get unwieldy if we want to add more information about the program. Simply pass the entire struct netdev_bpf in. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-08rtnetlink: fix missing size for IFLA_IF_NETNSIDColin Ian King1-1/+1
The size for IFLA_IF_NETNSID is missing from the size calculation because the proceeding semicolon was not removed. Fix this by removing the semicolon. Detected by CoverityScan, CID#1461135 ("Structurally dead code") Fixes: 79e1ad148c84 ("rtnetlink: use netnsid to query interface") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: bpf: rename ndo_xdp to ndo_bpfJakub Kicinski1-2/+2
ndo_xdp is a control path callback for setting up XDP in the driver. We can reuse it for other forms of communication between the eBPF stack and the drivers. Rename the callback and associated structures and definitions. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05rtnetlink: use netnsid to query interfaceJiri Benc1-18/+85
Currently, when an application gets netnsid from the kernel (for example as the result of RTM_GETLINK call on one end of the veth pair), it's not much useful. There's no reliable way to get to the netns fd from the netnsid, nor does any kernel API accept netnsid. Extend the RTM_GETLINK call to also accept netnsid. It will operate on the netns with the given netnsid in such case. Of course, the calling process needs to have enough capabilities in the target name space; for now, require CAP_NET_ADMIN. This can be relaxed in the future. To signal to the calling process that the kernel understood the new IFLA_IF_NETNSID attribute in the query, it will include it in the response. This is needed to detect older kernels, as they will just ignore IFLA_IF_NETNSID and query in the current name space. This patch implemetns IFLA_IF_NETNSID only for get and dump. For set operations, this can be extended later. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-25bonding: remove rtmsg_ifinfo called after bond_lower_state_changedXin Long1-1/+0
After the patch 'rtnetlink: bring NETDEV_CHANGELOWERSTATE event process back to rtnetlink_event', bond_lower_state_changed would generate NETDEV_CHANGEUPPER event which would send a notification to userspace in rtnetlink_event. There's no need to call rtmsg_ifinfo to send the notification any more. So this patch is to remove it from these places after bond_lower_state_changed. Besides, after this, rtmsg_ifinfo is not needed to be exported. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-25rtnetlink: bring NETDEV_CHANGELOWERSTATE event process back to rtnetlink_eventXin Long1-0/+1
This patch is to bring NETDEV_CHANGELOWERSTATE event process back to rtnetlink_event so that bonding could use it instead of calling rtmsg_ifinfo to send a notification to userspace after netdev lower state is changed in the later patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-23net: core: rtnetlink: use BUG_ON instead of if condition followed by BUGGustavo A. R. Silva1-2/+1
Use BUG_ON instead of if condition followed by BUG in do_setlink. This issue was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-3/+10
There were quite a few overlapping sets of changes here. Daniel's bug fix for off-by-ones in the new BPF branch instructions, along with the added allowances for "data_end > ptr + x" forms collided with the metadata additions. Along with those three changes came veritifer test cases, which in their final form I tried to group together properly. If I had just trimmed GIT's conflict tags as-is, this would have split up the meta tests unnecessarily. In the socketmap code, a set of preemption disabling changes overlapped with the rename of bpf_compute_data_end() to bpf_compute_data_pointers(). Changes were made to the mv88e6060.c driver set addr method which got removed in net-next. The hyperv transport socket layer had a locking change in 'net' which overlapped with a change of socket state macro usage in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16net: core: rcu-ify rtnl af_opsFlorian Westphal1-16/+46
rtnl af_ops currently rely on rtnl mutex: unregister (called from module exit functions) takes the rtnl mutex and all users that do af_ops lookup also take the rtnl mutex. IOW, parallel rmmod will block until doit() callback is done. As none of the af_ops implementation sleep we can use rcu instead. doit functions that need the af_ops can now use rcu instead of the rtnl mutex provided the mutex isn't needed for other reasons. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16rtnetlink: place link af dump into own helperFlorian Westphal1-30/+42
next patch will rcu-ify rtnl af_ops, i.e. allow af_ops lookup and function calls with rcu read lock held instead of rtnl mutex. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16net: enable interface alias removal via rtnlNicolas Dichtel1-1/+4
IFLA_IFALIAS is defined as NLA_STRING. It means that the minimal length of the attribute is 1 ("\0"). However, to remove an alias, the attribute length must be 0 (see dev_set_alias()). Let's define the type to NLA_BINARY to allow 0-length string, so that the alias can be removed. Example: $ ip l s dummy0 alias foo $ ip l l dev dummy0 5: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether ae:20:30:4f:a7:f3 brd ff:ff:ff:ff:ff:ff alias foo Before the patch: $ ip l s dummy0 alias "" RTNETLINK answers: Numerical result out of range After the patch: $ ip l s dummy0 alias "" $ ip l l dev dummy0 5: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether ae:20:30:4f:a7:f3 brd ff:ff:ff:ff:ff:ff CC: Oliver Hartkopp <oliver@hartkopp.net> CC: Stephen Hemminger <stephen@networkplumber.org> Fixes: 96ca4a2cc145 ("net: remove ifalias on empty given alias") Reported-by: Julien FLoret <julien.floret@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16rtnetlink: do not set notification for tx_queue_len in do_setlinkXin Long1-1/+1
NETDEV_CHANGE_TX_QUEUE_LEN event process in rtnetlink_event would send a notification for userspace and tx_queue_len's setting in do_setlink would trigger NETDEV_CHANGE_TX_QUEUE_LEN. So it shouldn't set DO_SETLINK_NOTIFY status for this change to send a notification any more. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16rtnetlink: check DO_SETLINK_NOTIFY correctly in do_setlinkXin Long1-1/+1
The check 'status & DO_SETLINK_NOTIFY' in do_setlink doesn't really work after status & DO_SETLINK_MODIFIED, as: DO_SETLINK_MODIFIED 0x1 DO_SETLINK_NOTIFY 0x3 Considering that notifications are suppposed to be sent only when status have the flag DO_SETLINK_NOTIFY, the right check would be: (status & DO_SETLINK_NOTIFY) == DO_SETLINK_NOTIFY This would avoid lots of duplicated notifications when setting some properties of a link. Fixes: ba9989069f4e ("rtnl/do_setlink(): notify when a netdev is modified") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: David Ahern <dsahern@gmail.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16rtnetlink: bring NETDEV_CHANGEUPPER event process back in rtnetlink_eventXin Long1-0/+1
libteam needs this event notification in userspace when dev's master dev has been changed. After this, the redundant notifications issue would be fixed in the later patch 'rtnetlink: check DO_SETLINK_NOTIFY correctly in do_setlink'. Fixes: b6b36eb23a46 ("rtnetlink: Do not generate notifications for NETDEV_CHANGEUPPER event") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16rtnetlink: bring NETDEV_POST_TYPE_CHANGE event process back in rtnetlink_eventXin Long1-0/+1
As I said in patch 'rtnetlink: bring NETDEV_CHANGEMTU event process back in rtnetlink_event', removing NETDEV_POST_TYPE_CHANGE event was not the right fix for the redundant notifications issue. So bring this event process back to rtnetlink_event and the old redundant notifications issue would be fixed in the later patch 'rtnetlink: check DO_SETLINK_NOTIFY correctly in do_setlink'. Fixes: aef091ae58aa ("rtnetlink: Do not generate notifications for POST_TYPE_CHANGE event") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16rtnetlink: bring NETDEV_CHANGE_TX_QUEUE_LEN event process back in rtnetlink_eventXin Long1-0/+1
The same fix for changing mtu in the patch 'rtnetlink: bring NETDEV_CHANGEMTU event process back in rtnetlink_event' is needed for changing tx_queue_len. Note that the redundant notifications issue for tx_queue_len will be fixed in the later patch 'rtnetlink: do not send notification for tx_queue_len in do_setlink'. Fixes: 27b3b551d8a7 ("rtnetlink: Do not generate notifications for NETDEV_CHANGE_TX_QUEUE_LEN event") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16rtnetlink: bring NETDEV_CHANGEMTU event process back in rtnetlink_eventXin Long1-0/+1
Commit 085e1a65f04f ("rtnetlink: Do not generate notifications for MTU events") tried to fix the redundant notifications issue when ip link set mtu by removing NETDEV_CHANGEMTU event process in rtnetlink_event. But it also resulted in no notification generated when dev's mtu is changed via other methods, like: 'ifconfig eth1 mtu 1400' or 'echo 1400 > /sys/class/net/eth1/mtu' It would cause users not to be notified by this change. This patch is to fix it by bringing NETDEV_CHANGEMTU event back into rtnetlink_event, and the redundant notifications issue will be fixed in the later patch 'rtnetlink: check DO_SETLINK_NOTIFY correctly in do_setlink'. Fixes: 085e1a65f04f ("rtnetlink: Do not generate notifications for MTU events") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-10rtnetlink: bridge: use ext_ack instead of printkFlorian Westphal1-14/+14
We can now piggyback error strings to userspace via extended acks rather than using printk. Before: bridge fdb add 01:02:03:04:05:06 dev br0 vlan 4095 RTNETLINK answers: Invalid argument After: bridge fdb add 01:02:03:04:05:06 dev br0 vlan 4095 Error: invalid vlan id. v3: drop 'RTM_' prefixes, suggested by David Ahern, they are not useful, the add/del in bridge command line is enough. Also reword error in response to malformed/bad vlan id attribute size. Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-0/+3
Just simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04net: Add extack to ndo_add_slaveDavid Ahern1-4/+6
Pass extack to do_set_master and down to ndo_add_slave Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04dev: advertise the new nsid when the netns iface changesNicolas Dichtel1-9/+22
x-netns interfaces are bound to two netns: the link netns and the upper netns. Usually, this kind of interfaces is created in the link netns and then moved to the upper netns. At the end, the interface is visible only in the upper netns. The link nsid is advertised via netlink in the upper netns, thus the user always knows where is the link part. There is no such mechanism in the link netns. When the interface is moved to another netns, the user cannot "follow" it. This patch adds a new netlink attribute which helps to follow an interface which moves to another netns. When the interface is unregistered, the new nsid is advertised. If the interface is a x-netns interface (ie rtnl_link_ops->get_link_net is defined), the nsid is allocated if needed. CC: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>