aboutsummaryrefslogtreecommitdiffstats
path: root/net (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-11-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller22-119/+183
Trivial conflict in net/core/filter.c, a locally computed 'sdif' is now an argument to the function. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller16-105/+157
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Disable BH while holding list spinlock in nf_conncount, from Taehee Yoo. 2) List corruption in nf_conncount, also from Taehee. 3) Fix race that results in leaving around an empty list node in nf_conncount, from Taehee Yoo. 4) Proper chain handling for inactive chains from the commit path, from Florian Westphal. This includes a selftest for this. 5) Do duplicate rule handles when replacing rules, also from Florian. 6) Remove net_exit path in xt_RATEEST that results in splat, from Taehee. 7) Possible use-after-free in nft_compat when releasing extensions. From Florian. 8) Memory leak in xt_hashlimit, from Taehee. 9) Call ip_vs_dst_notifier after ipv6_dev_notf, from Xin Long. 10) Fix cttimeout with udplite and gre, from Florian. 11) Preserve oif for IPv6 link-local generated traffic from mangle table, from Alin Nastac. 12) Missing error handling in masquerade notifiers, from Taehee Yoo. 13) Use mutex to protect registration/unregistration of masquerade extensions in order to prevent a race, from Taehee. 14) Incorrect condition check in tree_nodes_free(), also from Taehee. 15) Fix chain counter leak in rule replacement path, from Taehee. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-28netfilter: nf_tables: deactivate expressions in rule replecement routineTaehee Yoo1-11/+4
There is no expression deactivation call from the rule replacement path, hence, chain counter is not decremented. A few steps to reproduce the problem: %nft add table ip filter %nft add chain ip filter c1 %nft add chain ip filter c1 %nft add rule ip filter c1 jump c2 %nft replace rule ip filter c1 handle 3 accept %nft flush ruleset <jump c2> expression means immediate NFT_JUMP to chain c2. Reference count of chain c2 is increased when the rule is added. When rule is deleted or replaced, the reference counter of c2 should be decreased via nft_rule_expr_deactivate() which calls nft_immediate_deactivate(). Splat looks like: [ 214.396453] WARNING: CPU: 1 PID: 21 at net/netfilter/nf_tables_api.c:1432 nf_tables_chain_destroy.isra.38+0x2f9/0x3a0 [nf_tables] [ 214.398983] Modules linked in: nf_tables nfnetlink [ 214.398983] CPU: 1 PID: 21 Comm: kworker/1:1 Not tainted 4.20.0-rc2+ #44 [ 214.398983] Workqueue: events nf_tables_trans_destroy_work [nf_tables] [ 214.398983] RIP: 0010:nf_tables_chain_destroy.isra.38+0x2f9/0x3a0 [nf_tables] [ 214.398983] Code: 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 8e 00 00 00 48 8b 7b 58 e8 e1 2c 4e c6 48 89 df e8 d9 2c 4e c6 eb 9a <0f> 0b eb 96 0f 0b e9 7e fe ff ff e8 a7 7e 4e c6 e9 a4 fe ff ff e8 [ 214.398983] RSP: 0018:ffff8881152874e8 EFLAGS: 00010202 [ 214.398983] RAX: 0000000000000001 RBX: ffff88810ef9fc28 RCX: ffff8881152876f0 [ 214.398983] RDX: dffffc0000000000 RSI: 1ffff11022a50ede RDI: ffff88810ef9fc78 [ 214.398983] RBP: 1ffff11022a50e9d R08: 0000000080000000 R09: 0000000000000000 [ 214.398983] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff11022a50eba [ 214.398983] R13: ffff888114446e08 R14: ffff8881152876f0 R15: ffffed1022a50ed6 [ 214.398983] FS: 0000000000000000(0000) GS:ffff888116400000(0000) knlGS:0000000000000000 [ 214.398983] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 214.398983] CR2: 00007fab9bb5f868 CR3: 000000012aa16000 CR4: 00000000001006e0 [ 214.398983] Call Trace: [ 214.398983] ? nf_tables_table_destroy.isra.37+0x100/0x100 [nf_tables] [ 214.398983] ? __kasan_slab_free+0x145/0x180 [ 214.398983] ? nf_tables_trans_destroy_work+0x439/0x830 [nf_tables] [ 214.398983] ? kfree+0xdb/0x280 [ 214.398983] nf_tables_trans_destroy_work+0x5f5/0x830 [nf_tables] [ ... ] Fixes: bb7b40aecbf7 ("netfilter: nf_tables: bogus EBUSY in chain deletions") Reported by: Christoph Anton Mitterer <calestyo@scientia.net> Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914505 Link: https://bugzilla.kernel.org/show_bug.cgi?id=201791 Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-27net/ipv4: Fix missing raw_init when CONFIG_PROC_FS is disabledDavid Ahern1-1/+1
Randy reported when CONFIG_PROC_FS is not enabled: ld: net/ipv4/af_inet.o: in function `inet_init': af_inet.c:(.init.text+0x42d): undefined reference to `raw_init' Fix by moving the endif up to the end of the proc entries Fixes: 6897445fb194c ("net: provide a sysctl raw_l3mdev_accept for raw socket lookup with VRFs") Reported-by: Randy Dunlap <rdunlap@infradead.org> Cc: Mike Manning <mmanning@vyatta.att-mail.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-27tcp: remove hdrlen argument from tcp_queue_rcv()Eric Dumazet1-7/+6
Only one caller needs to pull TCP headers, so lets move __skb_pull() to the caller side. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-27net/ncsi: Add NCSI Mellanox OEM commandVijay Khemka4-2/+81
This patch adds OEM Mellanox commands and response handling. It also defines OEM Get MAC Address handler to get and configure the device. ncsi_oem_gma_handler_mlx: This handler send NCSI mellanox command for getting mac address. ncsi_rsp_handler_oem_mlx: This handles response received for all mellanox OEM commands. ncsi_rsp_handler_oem_mlx_gma: This handles get mac address response and set it to device. Signed-off-by: Vijay Khemka <vijaykhemka@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-27tipc: fix lockdep warning during node deleteJon Maloy1-2/+5
We see the following lockdep warning: [ 2284.078521] ====================================================== [ 2284.078604] WARNING: possible circular locking dependency detected [ 2284.078604] 4.19.0+ #42 Tainted: G E [ 2284.078604] ------------------------------------------------------ [ 2284.078604] rmmod/254 is trying to acquire lock: [ 2284.078604] 00000000acd94e28 ((&n->timer)#2){+.-.}, at: del_timer_sync+0x5/0xa0 [ 2284.078604] [ 2284.078604] but task is already holding lock: [ 2284.078604] 00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x190 [tipc] [ 2284.078604] [ 2284.078604] which lock already depends on the new lock. [ 2284.078604] [ 2284.078604] [ 2284.078604] the existing dependency chain (in reverse order) is: [ 2284.078604] [ 2284.078604] -> #1 (&(&tn->node_list_lock)->rlock){+.-.}: [ 2284.078604] tipc_node_timeout+0x20a/0x330 [tipc] [ 2284.078604] call_timer_fn+0xa1/0x280 [ 2284.078604] run_timer_softirq+0x1f2/0x4d0 [ 2284.078604] __do_softirq+0xfc/0x413 [ 2284.078604] irq_exit+0xb5/0xc0 [ 2284.078604] smp_apic_timer_interrupt+0xac/0x210 [ 2284.078604] apic_timer_interrupt+0xf/0x20 [ 2284.078604] default_idle+0x1c/0x140 [ 2284.078604] do_idle+0x1bc/0x280 [ 2284.078604] cpu_startup_entry+0x19/0x20 [ 2284.078604] start_secondary+0x187/0x1c0 [ 2284.078604] secondary_startup_64+0xa4/0xb0 [ 2284.078604] [ 2284.078604] -> #0 ((&n->timer)#2){+.-.}: [ 2284.078604] del_timer_sync+0x34/0xa0 [ 2284.078604] tipc_node_delete+0x1a/0x40 [tipc] [ 2284.078604] tipc_node_stop+0xcb/0x190 [tipc] [ 2284.078604] tipc_net_stop+0x154/0x170 [tipc] [ 2284.078604] tipc_exit_net+0x16/0x30 [tipc] [ 2284.078604] ops_exit_list.isra.8+0x36/0x70 [ 2284.078604] unregister_pernet_operations+0x87/0xd0 [ 2284.078604] unregister_pernet_subsys+0x1d/0x30 [ 2284.078604] tipc_exit+0x11/0x6f2 [tipc] [ 2284.078604] __x64_sys_delete_module+0x1df/0x240 [ 2284.078604] do_syscall_64+0x66/0x460 [ 2284.078604] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 2284.078604] [ 2284.078604] other info that might help us debug this: [ 2284.078604] [ 2284.078604] Possible unsafe locking scenario: [ 2284.078604] [ 2284.078604] CPU0 CPU1 [ 2284.078604] ---- ---- [ 2284.078604] lock(&(&tn->node_list_lock)->rlock); [ 2284.078604] lock((&n->timer)#2); [ 2284.078604] lock(&(&tn->node_list_lock)->rlock); [ 2284.078604] lock((&n->timer)#2); [ 2284.078604] [ 2284.078604] *** DEADLOCK *** [ 2284.078604] [ 2284.078604] 3 locks held by rmmod/254: [ 2284.078604] #0: 000000003368be9b (pernet_ops_rwsem){+.+.}, at: unregister_pernet_subsys+0x15/0x30 [ 2284.078604] #1: 0000000046ed9c86 (rtnl_mutex){+.+.}, at: tipc_net_stop+0x144/0x170 [tipc] [ 2284.078604] #2: 00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x19 [...} The reason is that the node timer handler sometimes needs to delete a node which has been disconnected for too long. To do this, it grabs the lock 'node_list_lock', which may at the same time be held by the generic node cleanup function, tipc_node_stop(), during module removal. Since the latter is calling del_timer_sync() inside the same lock, we have a potential deadlock. We fix this letting the timer cleanup function use spin_trylock() instead of just spin_lock(), and when it fails to grab the lock it just returns so that the timer handler can terminate its execution. This is safe to do, since tipc_node_stop() anyway is about to delete both the timer and the node instance. Fixes: 6a939f365bdb ("tipc: Auto removal of peer down node instance") Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-27netns: enable to dump full nsid translation tableNicolas Dichtel1-6/+26
Like the previous patch, the goal is to ease to convert nsids from one netns to another netns. A new attribute (NETNSA_CURRENT_NSID) is added to the kernel answer when NETNSA_TARGET_NSID is provided, thus the user can easily convert nsids. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-27netns: enable to specify a nsid for a get requestNicolas Dichtel1-0/+5
Combined with NETNSA_TARGET_NSID, it enables to "translate" a nsid from one netns to a nsid of another netns. This is useful when using NETLINK_F_LISTEN_ALL_NSID because it helps the user to interpret a nsid received from an other netns. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-27netns: add support of NETNSA_TARGET_NSIDNicolas Dichtel1-11/+75
Like it was done for link and address, add the ability to perform get/dump in another netns by specifying a target nsid attribute. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-27netns: introduce 'struct net_fill_args'Nicolas Dichtel1-14/+34
This is a preparatory work. To avoid having to much arguments for the function rtnl_net_fill(), a new structure is defined. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-27netns: remove net arg from rtnl_net_fill()Nicolas Dichtel1-6/+4
This argument is not used anymore. Fixes: cab3c8ec8d57 ("netns: always provide the id to rtnl_net_fill()") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-27sctp: increase sk_wmem_alloc when head->truesize is increasedXin Long1-0/+1
I changed to count sk_wmem_alloc by skb truesize instead of 1 to fix the sk_wmem_alloc leak caused by later truesize's change in xfrm in Commit 02968ccf0125 ("sctp: count sk_wmem_alloc by skb truesize in sctp_packet_transmit"). But I should have also increased sk_wmem_alloc when head->truesize is increased in sctp_packet_gso_append() as xfrm does. Otherwise, sctp gso packet will cause sk_wmem_alloc underflow. Fixes: 02968ccf0125 ("sctp: count sk_wmem_alloc by skb truesize in sctp_packet_transmit") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-27net: bridge: export supported booloptsNikolay Aleksandrov1-1/+1
Now that we have at least one bool option, we can export all of the supported bool options via optmask when dumping them. v2: new patch Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-27net: bridge: add no_linklocal_learn bool optionNikolay Aleksandrov4-1/+31
Use the new boolopt API to add an option which disables learning from link-local packets. The default is kept as before and learning is enabled. This is a simple map from a boolopt bit to a bridge private flag that is tested before learning. v2: pass NULL for extack via sysfs Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-27net: bridge: add support for user-controlled bool optionsNikolay Aleksandrov4-2/+96
We have been adding many new bridge options, a big number of which are boolean but still take up netlink attribute ids and waste space in the skb. Recently we discussed learning from link-local packets[1] and decided yet another new boolean option will be needed, thus introducing this API to save some bridge nl space. The API supports changing the value of multiple boolean options at once via the br_boolopt_multi struct which has an optmask (which options to set, bit per opt) and optval (options' new values). Future boolean options will only be added to the br_boolopt_id enum and then will have to be handled in br_boolopt_toggle/get. The API will automatically add the ability to change and export them via netlink, sysfs can use the single boolopt function versions to do the same. The behaviour with failing/succeeding is the same as with normal netlink option changing. If an option requires mapping to internal kernel flag or needs special configuration to be enabled then it should be handled in br_boolopt_toggle. It should also be able to retrieve an option's current state via br_boolopt_get. v2: WARN_ON() on unsupported option as that shouldn't be possible and also will help catch people who add new options without handling them for both set and get. Pass down extack so if an option desires it could set it on error and be more user-friendly. [1] https://www.spinics.net/lists/netdev/msg532698.html Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-27netfilter: nf_conncount: remove wrong condition check routineTaehee Yoo1-5/+2
All lists that reach the tree_nodes_free() function have both zero counter and true dead flag. The reason for this is that lists to be release are selected by nf_conncount_gc_list() which already decrements the list counter and sets on the dead flag. Therefore, this if statement in tree_nodes_free() is unnecessary and wrong. Fixes: 31568ec09ea0 ("netfilter: nf_conncount: fix list_del corruption in conn_free") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-27netfilter: nat: fix double register in masquerade modulesTaehee Yoo2-14/+32
There is a reference counter to ensure that masquerade modules register notifiers only once. However, the existing reference counter approach is not safe, test commands are: while : do modprobe ip6t_MASQUERADE & modprobe nft_masq_ipv6 & modprobe -rv ip6t_MASQUERADE & modprobe -rv nft_masq_ipv6 & done numbers below represent the reference counter. -------------------------------------------------------- CPU0 CPU1 CPU2 CPU3 CPU4 [insmod] [insmod] [rmmod] [rmmod] [insmod] -------------------------------------------------------- 0->1 register 1->2 returns 2->1 returns 1->0 0->1 register <-- unregister -------------------------------------------------------- The unregistation of CPU3 should be processed before the registration of CPU4. In order to fix this, use a mutex instead of reference counter. splat looks like: [ 323.869557] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:1381] [ 323.869574] Modules linked in: nf_tables(+) nf_nat_ipv6(-) nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 n] [ 323.869574] irq event stamp: 194074 [ 323.898930] hardirqs last enabled at (194073): [<ffffffff90004a0d>] trace_hardirqs_on_thunk+0x1a/0x1c [ 323.898930] hardirqs last disabled at (194074): [<ffffffff90004a29>] trace_hardirqs_off_thunk+0x1a/0x1c [ 323.898930] softirqs last enabled at (182132): [<ffffffff922006ec>] __do_softirq+0x6ec/0xa3b [ 323.898930] softirqs last disabled at (182109): [<ffffffff90193426>] irq_exit+0x1a6/0x1e0 [ 323.898930] CPU: 0 PID: 1381 Comm: modprobe Not tainted 4.20.0-rc2+ #27 [ 323.898930] RIP: 0010:raw_notifier_chain_register+0xea/0x240 [ 323.898930] Code: 3c 03 0f 8e f2 00 00 00 44 3b 6b 10 7f 4d 49 bc 00 00 00 00 00 fc ff df eb 22 48 8d 7b 10 488 [ 323.898930] RSP: 0018:ffff888101597218 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13 [ 323.898930] RAX: 0000000000000000 RBX: ffffffffc04361c0 RCX: 0000000000000000 [ 323.898930] RDX: 1ffffffff26132ae RSI: ffffffffc04aa3c0 RDI: ffffffffc04361d0 [ 323.898930] RBP: ffffffffc04361c8 R08: 0000000000000000 R09: 0000000000000001 [ 323.898930] R10: ffff8881015972b0 R11: fffffbfff26132c4 R12: dffffc0000000000 [ 323.898930] R13: 0000000000000000 R14: 1ffff110202b2e44 R15: ffffffffc04aa3c0 [ 323.898930] FS: 00007f813ed41540(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000 [ 323.898930] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 323.898930] CR2: 0000559bf2c9f120 CR3: 000000010bc80000 CR4: 00000000001006f0 [ 323.898930] Call Trace: [ 323.898930] ? atomic_notifier_chain_register+0x2d0/0x2d0 [ 323.898930] ? down_read+0x150/0x150 [ 323.898930] ? sched_clock_cpu+0x126/0x170 [ 323.898930] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables] [ 323.898930] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables] [ 323.898930] register_netdevice_notifier+0xbb/0x790 [ 323.898930] ? __dev_close_many+0x2d0/0x2d0 [ 323.898930] ? __mutex_unlock_slowpath+0x17f/0x740 [ 323.898930] ? wait_for_completion+0x710/0x710 [ 323.898930] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables] [ 323.898930] ? up_write+0x6c/0x210 [ 323.898930] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables] [ 324.127073] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables] [ 324.127073] nft_chain_filter_init+0x1e/0xe8a [nf_tables] [ 324.127073] nf_tables_module_init+0x37/0x92 [nf_tables] [ ... ] Fixes: 8dd33cc93ec9 ("netfilter: nf_nat: generalize IPv4 masquerading support for nf_tables") Fixes: be6b635cd674 ("netfilter: nf_nat: generalize IPv6 masquerading support for nf_tables") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-27netfilter: add missing error handling code for register functionsTaehee Yoo7-20/+61
register_{netdevice/inetaddr/inet6addr}_notifier may return an error value, this patch adds the code to handle these error paths. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-27netfilter: ipv6: Preserve link scope traffic original oifAlin Nastac1-1/+2
When ip6_route_me_harder is invoked, it resets outgoing interface of: - link-local scoped packets sent by neighbor discovery - multicast packets sent by MLD host - multicast packets send by MLD proxy daemon that sets outgoing interface through IPV6_PKTINFO ipi6_ifindex Link-local and multicast packets must keep their original oif after ip6_route_me_harder is called. Signed-off-by: Alin Nastac <alin.nastac@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller1-22/+184
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-11-26 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Extend BTF to support function call types and improve the BPF symbol handling with this info for kallsyms and bpftool program dump to make debugging easier, from Martin and Yonghong. 2) Optimize LPM lookups by making longest_prefix_match() handle multiple bytes at a time, from Eric. 3) Adds support for loading and attaching flow dissector BPF progs from bpftool, from Stanislav. 4) Extend the sk_lookup() helper to be supported from XDP, from Nitin. 5) Enable verifier to support narrow context loads with offset > 0 to adapt to LLVM code generation (currently only offset of 0 was supported). Add test cases as well, from Andrey. 6) Simplify passing device functions for offloaded BPF progs by adding callbacks to bpf_prog_offload_ops instead of ndo_bpf. Also convert nfp and netdevsim to make use of them, from Quentin. 7) Add support for sock_ops based BPF programs to send events to the perf ring-buffer through perf_event_output helper, from Sowmini and Daniel. 8) Add read / write support for skb->tstamp from tc BPF and cg BPF programs to allow for supporting rate-limiting in EDT qdiscs like fq from BPF side, from Vlad. 9) Extend libbpf API to support map in map types and add test cases for it as well to BPF kselftests, from Nikita. 10) Account the maximum packet offset accessed by a BPF program in the verifier and use it for optimizing nfp JIT, from Jiong. 11) Fix error handling regarding kprobe_events in BPF sample loader, from Daniel T. 12) Add support for queue and stack map type in bpftool, from David. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-26netfilter: nfnetlink_cttimeout: fetch timeouts for udplite and gre, tooFlorian Westphal2-14/+15
syzbot was able to trigger the WARN in cttimeout_default_get() by passing UDPLITE as l4protocol. Alias UDPLITE to UDP, both use same timeout values. Furthermore, also fetch GRE timeouts. GRE is a bit more complicated, as it still can be a module and its netns_proto_gre struct layout isn't visible outside of the gre module. Can't move timeouts around, it appears conntrack sysctl unregister assumes net_generic() returns nf_proto_net, so we get crash. Expose layout of netns_proto_gre instead. A followup nf-next patch could make gre tracker be built-in as well if needed, its not that large. Last, make the WARN() mention the missing protocol value in case anything else is missing. Reported-by: syzbot+2fae8fa157dd92618cae@syzkaller.appspotmail.com Fixes: 8866df9264a3 ("netfilter: nfnetlink_cttimeout: pass default timeout policy to obj_to_nlattr") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-26ipvs: call ip_vs_dst_notifier earlier than ipv6_dev_notfXin Long1-0/+3
ip_vs_dst_event is supposed to clean up all dst used in ipvs' destinations when a net dev is going down. But it works only when the dst's dev is the same as the dev from the event. Now with the same priority but late registration, ip_vs_dst_notifier is always called later than ipv6_dev_notf where the dst's dev is set to lo for NETDEV_DOWN event. As the dst's dev lo is not the same as the dev from the event in ip_vs_dst_event, ip_vs_dst_notifier doesn't actually work. Also as these dst have to wait for dest_trash_timer to clean them up. It would cause some non-permanent kernel warnings: unregister_netdevice: waiting for br0 to become free. Usage count = 3 To fix it, call ip_vs_dst_notifier earlier than ipv6_dev_notf by increasing its priority to ADDRCONF_NOTIFY_PRIORITY + 5. Note that for ipv4 route fib_netdev_notifier doesn't set dst's dev to lo in NETDEV_DOWN event, so this fix is only needed when IP_VS_IPV6 is defined. Fixes: 7a4f0761fce3 ("IPVS: init and cleanup restructuring") Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller1-3/+2
Daniel Borkmann says: ==================== pull-request: bpf 2018-11-25 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix an off-by-one bug when adjusting subprog start offsets after patching, from Edward. 2) Fix several bugs such as overflow in size allocation in queue / stack map creation, from Alexei. 3) Fix wrong IPv6 destination port byte order in bpf_sk_lookup_udp helper, from Andrey. 4) Fix several bugs in bpftool such as preventing an infinite loop in get_fdinfo, error handling and man page references, from Quentin. 5) Fix a warning in bpf_trace_printk() that wasn't catching an invalid format string, from Martynas. 6) Fix a bug in BPF cgroup local storage where non-atomic allocation was used in atomic context, from Roman. 7) Fix a NULL pointer dereference bug in bpftool from reallocarray() error handling, from Jakub and Wen. 8) Add a copy of pkt_cls.h and tc_bpf.h uapi headers to the tools include infrastructure so that bpftool compiles on older RHEL7-like user space which does not ship these headers, from Yonghong. 9) Fix BPF kselftests for user space where to get ping test working with ping6 and ping -6, from Li. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-25net: remove unsafe skb_insert()Eric Dumazet1-22/+0
I do not see how one can effectively use skb_insert() without holding some kind of lock. Otherwise other cpus could have changed the list right before we have a chance of acquiring list->lock. Only existing user is in drivers/infiniband/hw/nes/nes_mgt.c and this one probably meant to use __skb_insert() since it appears nesqp->pau_list is protected by nesqp->pau_lock. This looks like nesqp->pau_lock could be removed, since nesqp->pau_list.lock could be used instead. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Faisal Latif <faisal.latif@intel.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: linux-rdma <linux-rdma@vger.kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-25net: bridge: remove redundant checks for null p->dev and p->brColin Ian King1-3/+0
A recent change added a null check on p->dev after p->dev was being dereferenced by the ns_capable check on p->dev. It turns out that neither the p->dev and p->br null checks are necessary, and can be removed, which cleans up a static analyis warning. As Nikolay Aleksandrov noted, these checks can be removed because: "My reasoning of why it shouldn't be possible: - On port add new_nbp() sets both p->dev and p->br before creating kobj/sysfs - On port del (trickier) del_nbp() calls kobject_del() before call_rcu() to destroy the port which in turn calls sysfs_remove_dir() which uses kernfs_remove() which deactivates (shouldn't be able to open new files) and calls kernfs_drain() to drain current open/mmaped files in the respective dir before continuing, thus making it impossible to open a bridge port sysfs file with p->dev and p->br equal to NULL. So I think it's safe to remove those checks altogether. It'd be nice to get a second look over my reasoning as I might be missing something in sysfs/kernfs call path." Thanks to Nikolay Aleksandrov's suggestion to remove the check and David Miller for sanity checking this. Detected by CoverityScan, CID#751490 ("Dereference before null check") Fixes: a5f3ea54f3cc ("net: bridge: add support for raw sysfs port options") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-24net: always initialize pagedlenWillem de Bruijn2-2/+4
In ip packet generation, pagedlen is initialized for each skb at the start of the loop in __ip(6)_append_data, before label alloc_new_skb. Depending on compiler options, code can be generated that jumps to this label, triggering use of an an uninitialized variable. In practice, at -O2, the generated code moves the initialization below the label. But the code should not rely on that for correctness. Fixes: 15e36f5b8e98 ("udp: paged allocation with gso") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-24tcp: address problems caused by EDT misshapsEric Dumazet2-10/+16
When a qdisc setup including pacing FQ is dismantled and recreated, some TCP packets are sent earlier than instructed by TCP stack. TCP can be fooled when ACK comes back, because the following operation can return a negative value. tcp_time_stamp(tp) - tp->rx_opt.rcv_tsecr; Some paths in TCP stack were not dealing properly with this, this patch addresses four of them. Fixes: ab408b6dc744 ("tcp: switch tcp and sch_fq to new earliest departure time model") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller19-125/+197
2018-11-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds19-122/+193
Pull networking fixes from David Miller: 1) Need to take mutex in ath9k_add_interface(), from Dan Carpenter. 2) Fix mt76 build without CONFIG_LEDS_CLASS, from Arnd Bergmann. 3) Fix socket wmem accounting in SCTP, from Xin Long. 4) Fix failed resume crash in ena driver, from Arthur Kiyanovski. 5) qed driver passes bytes instead of bits into second arg of bitmap_weight(). From Denis Bolotin. 6) Fix reset deadlock in ibmvnic, from Juliet Kim. 7) skb_scrube_packet() needs to scrub the fwd marks too, from Petr Machata. 8) Make sure older TCP stacks see enough dup ACKs, and avoid doing SACK compression during this period, from Eric Dumazet. 9) Add atomicity to SMC protocol cursor handling, from Ursula Braun. 10) Don't leave dangling error pointer if bpf_prog_add() fails in thunderx driver, from Lorenzo Bianconi. Also, when we unmap TSO headers, set sq->tso_hdrs to NULL. 11) Fix race condition over state variables in act_police, from Davide Caratti. 12) Disable guest csum in the presence of XDP in virtio_net, from Jason Wang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (64 commits) net: gemini: Fix copy/paste error net: phy: mscc: fix deadlock in vsc85xx_default_config dt-bindings: dsa: Fix typo in "probed" net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue net: amd: add missing of_node_put() team: no need to do team_notify_peers or team_mcast_rejoin when disabling port virtio-net: fail XDP set if guest csum is negotiated virtio-net: disable guest csum during XDP set net/sched: act_police: add missing spinlock initialization net: don't keep lonely packets forever in the gro hash net/ipv6: re-do dad when interface has IFF_NOARP flag change packet: copy user buffers before orphan or clone ibmvnic: Update driver queues after change in ring size support ibmvnic: Fix RX queue buffer cleanup net: thunderx: set xdp_prog to NULL if bpf_prog_add fails net/dim: Update DIM start sample after each DIM iteration net: faraday: ftmac100: remove netif_running(netdev) check before disabling interrupts net/smc: use after free fix in smc_wr_tx_put_slot() net/smc: atomic SMCD cursor handling net/smc: add SMC-D shutdown signal ...
2018-11-23rocker, dsa, ethsw: Don't filter VLAN events on bridge itselfPetr Machata1-3/+0
Due to an explicit check in rocker_world_port_obj_vlan_add(), dsa_slave_switchdev_event() resp. port_switchdev_event(), VLAN objects that are added to a device that is not a front-panel port device are ignored. Therefore this check is immaterial. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23switchdev: Replace port obj add/del SDO with a notificationPetr Machata2-44/+25
Drop switchdev_ops.switchdev_port_obj_add and _del. Drop the uses of this field from all clients, which were migrated to use switchdev notification in the previous patches. Add a new function switchdev_port_obj_notify() that sends the switchdev notifications SWITCHDEV_PORT_OBJ_ADD and _DEL. Update switchdev_port_obj_del_now() to dispatch to this new function. Drop __switchdev_port_obj_add() and update switchdev_port_obj_add() likewise. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23switchdev: Add helpers to aid traversal through lower devicesPetr Machata1-0/+100
After the transition from switchdev operations to notifier chain (which will take place in following patches), the onus is on the driver to find its own devices below possible layer of LAG or other uppers. The logic to do so is fairly repetitive: each driver is looking for its own devices among the lowers of the notified device. For those that it finds, it calls a handler. To indicate that the event was handled, struct switchdev_notifier_port_obj_info.handled is set. The differences lie only in what constitutes an "own" device and what handler to call. Therefore abstract this logic into two helpers, switchdev_handle_port_obj_add() and switchdev_handle_port_obj_del(). If a driver only supports physical ports under a bridge device, it will simply avoid this layer of indirection. One area where this helper diverges from the current switchdev behavior is the case of mixed lowers, some of which are switchdev ports and some of which are not. Previously, such scenario would fail with -EOPNOTSUPP. The helper could do that for lowers for which the passed-in predicate doesn't hold. That would however break the case that switchdev ports from several different drivers are stashed under one master, a scenario that switchdev currently happily supports. Therefore tolerate any and all unknown netdevices, whether they are backed by a switchdev driver or not. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net: dsa: slave: Handle SWITCHDEV_PORT_OBJ_ADD/_DELPetr Machata1-0/+56
Following patches will change the way of distributing port object changes from a switchdev operation to a switchdev notifier. The switchdev code currently recursively descends through layers of lower devices, eventually calling the op on a front-panel port device. The notifier will instead be sent referencing the bridge port device, which may be a stacking device that's one of front-panel ports uppers, or a completely unrelated device. DSA currently doesn't support any other uppers than bridge. SWITCHDEV_OBJ_ID_HOST_MDB and _PORT_MDB objects are always notified on the bridge port device. Thus the only case that a stacked device could be validly referenced by port object notifications are bridge notifications for VLAN objects added to the bridge itself. But the driver explicitly rejects such notifications in dsa_port_vlan_add(). It is therefore safe to assume that the only interesting case is that the notification is on a front-panel port netdevice. Therefore keep the filtering by dsa_slave_dev_check() in place. To handle SWITCHDEV_PORT_OBJ_ADD and _DEL, subscribe to the blocking notifier chain. Dispatch to rocker_port_obj_add() resp. _del() to maintain the behavior that the switchdev operation based code currently has. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23switchdev: Add a blocking notifier chainPetr Machata1-0/+26
In general one can't assume that a switchdev notifier is called in a non-atomic context, and correspondingly, the switchdev notifier chain is an atomic one. However, port object addition and deletion messages are delivered from a process context. Even the MDB addition messages, whose delivery is scheduled from atomic context, are queued and the delivery itself takes place in blocking context. For VLAN messages in particular, keeping the blocking nature is important for error reporting. Therefore introduce a blocking notifier chain and related service functions to distribute the notifications for which a blocking context can be assumed. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net/smc: unregister rkeys of unused bufferKarsten Graul3-13/+18
When an rmb is no longer in use by a connection, unregister its rkey at the remote peer with an LLC DELETE RKEY message. With this change, unused buffers held in the buffer pool are no longer registered at the remote peer. They are registered before the buffer is actually used and unregistered when they are no longer used by a connection. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net/smc: add infrastructure to send delete rkey messagesKarsten Graul3-1/+57
Add the infrastructure to send LLC messages of type DELETE RKEY to unregister a shared memory region at the peer. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net/smc: avoid a delay by waiting for nothingKarsten Graul1-1/+3
When a send failed then don't start to wait for a response in smc_llc_do_confirm_rkey. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net/smc: cleanup listen worker mutex unlockingUrsula Braun1-2/+3
For easier reading move the unlock of mutex smc_create_lgr_pending into smc_listen_work(), i.e. into the function the mutex has been locked. No functional change. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net/smc: short wait for late smc_clc_wait_msgUrsula Braun3-11/+13
After sending one of the initial LLC messages CONFIRM LINK or ADD LINK, there is already a wait for the LLC response. It does not make sense to wait another long time for a CLC DECLINE. Thus this patch introduces a shorter wait time for these cases. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net/smc: no link delete for a never active linkUrsula Braun1-1/+4
If a link is terminated that has never reached the active state, there is no need to trigger an LLC DELETE LINK. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net/smc: allow fallback after clc timeoutsUrsula Braun2-6/+11
If connection initialization fails for the LLC CONFIRM LINK or the LLC ADD LINK step, fallback to TCP should be enabled. Thus the negative return code -EAGAIN should switch to a positive timeout reason code in these cases, and the internal CLC socket should not have a set sk_err. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net/smc: remove sock_error detour in clc-functionsUrsula Braun1-13/+5
There is no need to store the return value in sk_err, if it is afterwards cleared again with sock_error(). This patch sets the return value directly. Just cleanup, no functional change. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net/smc: make smc_lgr_free() staticUrsula Braun2-2/+3
smc_lgr_free() is just called inside smc_core.c. Make it static. Just cleanup, no functional change. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net/smc: cleanup tcp_listen_worker initializationUrsula Braun1-1/+0
The tcp_listen_worker is already initialized when socket is created (in smc_sock_alloc()). Get rid of the duplicate initialization in smc_listen(). No functional change. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net-gro: use ffs() to speedup napi_gro_flush()Eric Dumazet1-4/+6
We very often have few flows/chains to look at, and we might increase GRO_HASH_BUCKETS to 32 or 64 in the future. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23Merge tag 'ceph-for-4.20-rc4' of https://github.com/ceph/ceph-clientLinus Torvalds1-3/+9
Pullk ceph fix from Ilya Dryomov: "A messenger fix, marked for stable" * tag 'ceph-for-4.20-rc4' of https://github.com/ceph/ceph-client: libceph: fall back to sendmsg for slab pages
2018-11-23net/sched: act_police: add missing spinlock initializationDavide Caratti1-0/+1
commit f2cbd4852820 ("net/sched: act_police: fix race condition on state variables") introduces a new spinlock, but forgets its initialization. Ensure that tcf_police_init() initializes 'tcfp_lock' every time a 'police' action is newly created, to avoid the following lockdep splat: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. <...> Call Trace: dump_stack+0x85/0xcb register_lock_class+0x581/0x590 __lock_acquire+0xd4/0x1330 ? tcf_police_init+0x2fa/0x650 [act_police] ? lock_acquire+0x9e/0x1a0 lock_acquire+0x9e/0x1a0 ? tcf_police_init+0x2fa/0x650 [act_police] ? tcf_police_init+0x55a/0x650 [act_police] _raw_spin_lock_bh+0x34/0x40 ? tcf_police_init+0x2fa/0x650 [act_police] tcf_police_init+0x2fa/0x650 [act_police] tcf_action_init_1+0x384/0x4c0 tcf_action_init+0xf6/0x160 tcf_action_add+0x73/0x170 tc_ctl_action+0x122/0x160 rtnetlink_rcv_msg+0x2a4/0x490 ? netlink_deliver_tap+0x99/0x400 ? validate_linkmsg+0x370/0x370 netlink_rcv_skb+0x4d/0x130 netlink_unicast+0x196/0x230 netlink_sendmsg+0x2e5/0x3e0 sock_sendmsg+0x36/0x40 ___sys_sendmsg+0x280/0x2f0 ? _raw_spin_unlock+0x24/0x30 ? handle_pte_fault+0xafe/0xf30 ? find_held_lock+0x2d/0x90 ? syscall_trace_enter+0x1df/0x360 ? __sys_sendmsg+0x5e/0xa0 __sys_sendmsg+0x5e/0xa0 do_syscall_64+0x60/0x210 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f1841c7cf10 Code: c3 48 8b 05 82 6f 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d 8d d0 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24 RSP: 002b:00007ffcf9df4d68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f1841c7cf10 RDX: 0000000000000000 RSI: 00007ffcf9df4dc0 RDI: 0000000000000003 RBP: 000000005bf56105 R08: 0000000000000002 R09: 00007ffcf9df8edc R10: 00007ffcf9df47e0 R11: 0000000000000246 R12: 0000000000671be0 R13: 00007ffcf9df4e84 R14: 0000000000000008 R15: 0000000000000000 Fixes: f2cbd4852820 ("net/sched: act_police: fix race condition on state variables") Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net: don't keep lonely packets forever in the gro hashPaolo Abeni1-2/+5
Eric noted that with UDP GRO and NAPI timeout, we could keep a single UDP packet inside the GRO hash forever, if the related NAPI instance calls napi_gro_complete() at an higher frequency than the NAPI timeout. Willem noted that even TCP packets could be trapped there, till the next retransmission. This patch tries to address the issue, flushing the old packets - those with a NAPI_GRO_CB age before the current jiffy - before scheduling the NAPI timeout. The rationale is that such a timeout should be well below a jiffy and we are not flushing packets eligible for sane GRO. v1 -> v2: - clarified the commit message and comment RFC -> v1: - added 'Fixes tags', cleaned-up the wording. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Fixes: 3b47d30396ba ("net: gro: add a per device gro flush timer") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23net/ipv6: re-do dad when interface has IFF_NOARP flag changeHangbin Liu1-6/+13
When we add a new IPv6 address, we should also join corresponding solicited-node multicast address, unless the interface has IFF_NOARP flag, as function addrconf_join_solict() did. But if we remove IFF_NOARP flag later, we do not do dad and add the mcast address. So we will drop corresponding neighbour discovery message that came from other nodes. A typical example is after creating a ipvlan with mode l3, setting up an ipv6 address and changing the mode to l2. Then we will not be able to ping this address as the interface doesn't join related solicited-node mcast address. Fix it by re-doing dad when interface changed IFF_NOARP flag. Then we will add corresponding mcast group and check if there is a duplicate address on the network. Reported-by: Jianlin Shi <jishi@redhat.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>