aboutsummaryrefslogtreecommitdiffstats
path: root/net (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-09-02net: dsa: Fix off-by-one number of calls to devlink_port_unregisterVladimir Oltean1-10/+29
When a function such as dsa_slave_create fails, currently the following stack trace can be seen: [ 2.038342] sja1105 spi0.1: Probed switch chip: SJA1105T [ 2.054556] sja1105 spi0.1: Reset switch and programmed static config [ 2.063837] sja1105 spi0.1: Enabled switch tagging [ 2.068706] fsl-gianfar soc:ethernet@2d90000 eth2: error -19 setting up slave phy [ 2.076371] ------------[ cut here ]------------ [ 2.080973] WARNING: CPU: 1 PID: 21 at net/core/devlink.c:6184 devlink_free+0x1b4/0x1c0 [ 2.088954] Modules linked in: [ 2.092005] CPU: 1 PID: 21 Comm: kworker/1:1 Not tainted 5.3.0-rc6-01360-g41b52e38d2b6-dirty #1746 [ 2.100912] Hardware name: Freescale LS1021A [ 2.105162] Workqueue: events deferred_probe_work_func [ 2.110287] [<c03133a4>] (unwind_backtrace) from [<c030d8cc>] (show_stack+0x10/0x14) [ 2.117992] [<c030d8cc>] (show_stack) from [<c10b08d8>] (dump_stack+0xb4/0xc8) [ 2.125180] [<c10b08d8>] (dump_stack) from [<c0349d04>] (__warn+0xe0/0xf8) [ 2.132018] [<c0349d04>] (__warn) from [<c0349e34>] (warn_slowpath_null+0x40/0x48) [ 2.139549] [<c0349e34>] (warn_slowpath_null) from [<c0f19d74>] (devlink_free+0x1b4/0x1c0) [ 2.147772] [<c0f19d74>] (devlink_free) from [<c1064fc0>] (dsa_switch_teardown+0x60/0x6c) [ 2.155907] [<c1064fc0>] (dsa_switch_teardown) from [<c1065950>] (dsa_register_switch+0x8e4/0xaa8) [ 2.164821] [<c1065950>] (dsa_register_switch) from [<c0ba7fe4>] (sja1105_probe+0x21c/0x2ec) [ 2.173216] [<c0ba7fe4>] (sja1105_probe) from [<c0b35948>] (spi_drv_probe+0x80/0xa4) [ 2.180920] [<c0b35948>] (spi_drv_probe) from [<c0a4c1cc>] (really_probe+0x108/0x400) [ 2.188711] [<c0a4c1cc>] (really_probe) from [<c0a4c694>] (driver_probe_device+0x78/0x1bc) [ 2.196933] [<c0a4c694>] (driver_probe_device) from [<c0a4a3dc>] (bus_for_each_drv+0x58/0xb8) [ 2.205414] [<c0a4a3dc>] (bus_for_each_drv) from [<c0a4c024>] (__device_attach+0xd0/0x168) [ 2.213637] [<c0a4c024>] (__device_attach) from [<c0a4b1d0>] (bus_probe_device+0x84/0x8c) [ 2.221772] [<c0a4b1d0>] (bus_probe_device) from [<c0a4b72c>] (deferred_probe_work_func+0x84/0xc4) [ 2.230686] [<c0a4b72c>] (deferred_probe_work_func) from [<c03650a4>] (process_one_work+0x218/0x510) [ 2.239772] [<c03650a4>] (process_one_work) from [<c03660d8>] (worker_thread+0x2a8/0x5c0) [ 2.247908] [<c03660d8>] (worker_thread) from [<c036b348>] (kthread+0x148/0x150) [ 2.255265] [<c036b348>] (kthread) from [<c03010e8>] (ret_from_fork+0x14/0x2c) [ 2.262444] Exception stack(0xea965fb0 to 0xea965ff8) [ 2.267466] 5fa0: 00000000 00000000 00000000 00000000 [ 2.275598] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 2.283729] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 [ 2.290333] ---[ end trace ca5d506728a0581a ]--- devlink_free is complaining right here: WARN_ON(!list_empty(&devlink->port_list)); This happens because devlink_port_unregister is no longer done right away in dsa_port_setup when a DSA_PORT_TYPE_USER has failed. Vivien said about this change that: Also no need to call devlink_port_unregister from within dsa_port_setup as this step is inconditionally handled by dsa_port_teardown on error. which is not really true. The devlink_port_unregister function _is_ being called unconditionally from within dsa_port_setup, but not for this port that just failed, just for the previous ones which were set up. ports_teardown: for (i = 0; i < port; i++) dsa_port_teardown(&ds->ports[i]); Initially I was tempted to fix this by extending the "for" loop to also cover the port that failed during setup. But this could have potentially unforeseen consequences unrelated to devlink_port or even other types of ports than user ports, which I can't really test for. For example, if for some reason devlink_port_register itself would fail, then unconditionally unregistering it in dsa_port_teardown would not be a smart idea. The list might go on. So just make dsa_port_setup undo the setup it had done upon failure, and let the for loop undo the work of setting up the previous ports, which are guaranteed to be brought up to a consistent state. Fixes: 955222ca5281 ("net: dsa: use a single switch statement for port setup") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller58-404/+588
r8152 conflicts are the NAPI fixes in 'net' overlapping with some tasklet stuff in net-next Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds57-402/+584
Pull networking fixes from David Miller: 1) Fix some length checks during OGM processing in batman-adv, from Sven Eckelmann. 2) Fix regression that caused netfilter conntrack sysctls to not be per-netns any more. From Florian Westphal. 3) Use after free in netpoll, from Feng Sun. 4) Guard destruction of pfifo_fast per-cpu qdisc stats with qdisc_is_percpu_stats(), from Davide Caratti. Similar bug is fixed in pfifo_fast_enqueue(). 5) Fix memory leak in mld_del_delrec(), from Eric Dumazet. 6) Handle neigh events on internal ports correctly in nfp, from John Hurley. 7) Clear SKB timestamp in NF flow table code so that it does not confuse fq scheduler. From Florian Westphal. 8) taprio destroy can crash if it is invoked in a failure path of taprio_init(), because the list head isn't setup properly yet and the list del is unconditional. Perform the list add earlier to address this. From Vladimir Oltean. 9) Make sure to reapply vlan filters on device up, in aquantia driver. From Dmitry Bogdanov. 10) sgiseeq driver releases DMA memory using free_page() instead of dma_free_attrs(). From Christophe JAILLET. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (58 commits) net: seeq: Fix the function used to release some memory in an error handling path enetc: Add missing call to 'pci_free_irq_vectors()' in probe and remove functions net: bcmgenet: use ethtool_op_get_ts_info() tc-testing: don't hardcode 'ip' in nsPlugin.py net: dsa: microchip: add KSZ8563 compatibility string dt-bindings: net: dsa: document additional Microchip KSZ8563 switch net: aquantia: fix out of memory condition on rx side net: aquantia: linkstate irq should be oneshot net: aquantia: reapply vlan filters on up net: aquantia: fix limit of vlan filters net: aquantia: fix removal of vlan 0 net/sched: cbs: Set default link speed to 10 Mbps in cbs_set_port_rate taprio: Set default link speed to 10 Mbps in taprio_set_picos_per_byte taprio: Fix kernel panic in taprio_destroy net: dsa: microchip: fill regmap_config name rxrpc: Fix lack of conn cleanup when local endpoint is cleaned up [ver #2] net: stmmac: dwmac-rk: Don't fail if phy regulator is absent amd-xgbe: Fix error path in xgbe_mod_init() netfilter: nft_meta_bridge: Fix get NFT_META_BRI_IIFVPROTO in network byteorder mac80211: Correctly set noencrypt for PAE frames ...
2019-09-01netlabel: remove redundant assignment to pointer iterColin Ian King1-1/+1
Pointer iter is being initialized with a value that is never read and is being re-assigned a little later on. The assignment is redundant and hence can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-31net/ncsi: add response handlers for PLDM over NC-SIBen Wei2-2/+14
This patch adds handlers for PLDM over NC-SI command response. This enables NC-SI driver recognizes the packet type so the responses don't get dropped as unknown packet type. PLDM over NC-SI are not handled in kernel driver for now, but can be passed back to user space via Netlink for further handling. Signed-off-by: Ben Wei <benwei@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-31devlink: Use switch-case instead of if-elseParav Pandit1-17/+22
Make core more readable with switch-case for various port flavours. Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-31devlink: Make port index data type as unsigned intParav Pandit1-2/+3
Devlink port index attribute is returned to users as u32 through netlink response. Change index data type from 'unsigned' to 'unsigned int' to avoid below checkpatch.pl warning. WARNING: Prefer 'unsigned int' to bare use of 'unsigned' 81: FILE: include/net/devlink.h:81: + unsigned index; Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-31net: tls: export protocol version, cipher, tx_conf/rx_conf to socket diagDavide Caratti1-0/+64
When an application configures kernel TLS on top of a TCP socket, it's now possible for inet_diag_handler() to collect information regarding the protocol version, the cipher type and TX / RX configuration, in case INET_DIAG_INFO is requested. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-31tcp: ulp: add functions to dump ulp-specific informationDavide Caratti1-1/+51
currently, only getsockopt(TCP_ULP) can be invoked to know if a ULP is on top of a TCP socket. Extend idiag_get_aux() and idiag_get_aux_size(), introduced by commit b37e88407c1d ("inet_diag: allow protocols to provide additional data"), to report the ULP name and other information that can be made available by the ULP through optional functions. Users having CAP_NET_ADMIN privileges will then be able to retrieve this information through inet_diag_handler, if they specify INET_DIAG_INFO in the request. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-31net/tls: use RCU protection on icsk->icsk_ulp_dataJakub Kicinski3-9/+21
We need to make sure context does not get freed while diag code is interrogating it. Free struct tls_context with kfree_rcu(). We add the __rcu annotation directly in icsk, and cast it away in the datapath accessor. Presumably all ULPs will do a similar thing. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-31net/sched: cbs: Set default link speed to 10 Mbps in cbs_set_port_rateVladimir Oltean1-8/+11
The discussion to be made is absolutely the same as in the case of previous patch ("taprio: Set default link speed to 10 Mbps in taprio_set_picos_per_byte"). Nothing is lost when setting a default. Cc: Leandro Dorileo <leandro.maciel.dorileo@intel.com> Fixes: e0a7683d30e9 ("net/sched: cbs: fix port_rate miscalculation") Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-31taprio: Set default link speed to 10 Mbps in taprio_set_picos_per_byteVladimir Oltean1-10/+13
The taprio budget needs to be adapted at runtime according to interface link speed. But that handling is problematic. For one thing, installing a qdisc on an interface that doesn't have carrier is not illegal. But taprio prints the following stack trace: [ 31.851373] ------------[ cut here ]------------ [ 31.856024] WARNING: CPU: 1 PID: 207 at net/sched/sch_taprio.c:481 taprio_dequeue+0x1a8/0x2d4 [ 31.864566] taprio: dequeue() called with unknown picos per byte. [ 31.864570] Modules linked in: [ 31.873701] CPU: 1 PID: 207 Comm: tc Not tainted 5.3.0-rc5-01199-g8838fe023cd6 #1689 [ 31.881398] Hardware name: Freescale LS1021A [ 31.885661] [<c03133a4>] (unwind_backtrace) from [<c030d8cc>] (show_stack+0x10/0x14) [ 31.893368] [<c030d8cc>] (show_stack) from [<c10ac958>] (dump_stack+0xb4/0xc8) [ 31.900555] [<c10ac958>] (dump_stack) from [<c0349d04>] (__warn+0xe0/0xf8) [ 31.907395] [<c0349d04>] (__warn) from [<c0349d64>] (warn_slowpath_fmt+0x48/0x6c) [ 31.914841] [<c0349d64>] (warn_slowpath_fmt) from [<c0f38db4>] (taprio_dequeue+0x1a8/0x2d4) [ 31.923150] [<c0f38db4>] (taprio_dequeue) from [<c0f227b0>] (__qdisc_run+0x90/0x61c) [ 31.930856] [<c0f227b0>] (__qdisc_run) from [<c0ec82ac>] (net_tx_action+0x12c/0x2bc) [ 31.938560] [<c0ec82ac>] (net_tx_action) from [<c0302298>] (__do_softirq+0x130/0x3c8) [ 31.946350] [<c0302298>] (__do_softirq) from [<c03502a0>] (irq_exit+0xbc/0xd8) [ 31.953536] [<c03502a0>] (irq_exit) from [<c03a4808>] (__handle_domain_irq+0x60/0xb4) [ 31.961328] [<c03a4808>] (__handle_domain_irq) from [<c0754478>] (gic_handle_irq+0x58/0x9c) [ 31.969638] [<c0754478>] (gic_handle_irq) from [<c0301a8c>] (__irq_svc+0x6c/0x90) [ 31.977076] Exception stack(0xe8167b20 to 0xe8167b68) [ 31.982100] 7b20: e9d4bd80 00000cc0 000000cf 00000000 e9d4bd80 c1f38958 00000cc0 c1f38960 [ 31.990234] 7b40: 00000001 000000cf 00000004 e9dc0800 00000000 e8167b70 c0f478ec c0f46d94 [ 31.998363] 7b60: 60070013 ffffffff [ 32.001833] [<c0301a8c>] (__irq_svc) from [<c0f46d94>] (netlink_trim+0x18/0xd8) [ 32.009104] [<c0f46d94>] (netlink_trim) from [<c0f478ec>] (netlink_broadcast_filtered+0x34/0x414) [ 32.017930] [<c0f478ec>] (netlink_broadcast_filtered) from [<c0f47cec>] (netlink_broadcast+0x20/0x28) [ 32.027102] [<c0f47cec>] (netlink_broadcast) from [<c0eea378>] (rtnetlink_send+0x34/0x88) [ 32.035238] [<c0eea378>] (rtnetlink_send) from [<c0f25890>] (notify_and_destroy+0x2c/0x44) [ 32.043461] [<c0f25890>] (notify_and_destroy) from [<c0f25e08>] (qdisc_graft+0x398/0x470) [ 32.051595] [<c0f25e08>] (qdisc_graft) from [<c0f27a00>] (tc_modify_qdisc+0x3a4/0x724) [ 32.059470] [<c0f27a00>] (tc_modify_qdisc) from [<c0ee4c84>] (rtnetlink_rcv_msg+0x260/0x2ec) [ 32.067864] [<c0ee4c84>] (rtnetlink_rcv_msg) from [<c0f4a988>] (netlink_rcv_skb+0xb8/0x110) [ 32.076172] [<c0f4a988>] (netlink_rcv_skb) from [<c0f4a170>] (netlink_unicast+0x1b4/0x22c) [ 32.084392] [<c0f4a170>] (netlink_unicast) from [<c0f4a5e4>] (netlink_sendmsg+0x33c/0x380) [ 32.092614] [<c0f4a5e4>] (netlink_sendmsg) from [<c0ea9f40>] (sock_sendmsg+0x14/0x24) [ 32.100403] [<c0ea9f40>] (sock_sendmsg) from [<c0eaa780>] (___sys_sendmsg+0x214/0x228) [ 32.108279] [<c0eaa780>] (___sys_sendmsg) from [<c0eabad0>] (__sys_sendmsg+0x50/0x8c) [ 32.116068] [<c0eabad0>] (__sys_sendmsg) from [<c0301000>] (ret_fast_syscall+0x0/0x54) [ 32.123938] Exception stack(0xe8167fa8 to 0xe8167ff0) [ 32.128960] 7fa0: b6fa68c8 000000f8 00000003 bea142d0 00000000 00000000 [ 32.137093] 7fc0: b6fa68c8 000000f8 0052154c 00000128 5d6468a2 00000000 00000028 00558c9c [ 32.145224] 7fe0: 00000070 bea14278 00530d64 b6e17e64 [ 32.150659] ---[ end trace 2139c9827c3e5177 ]--- This happens because the qdisc ->dequeue callback gets called. Which again is not illegal, the qdisc will dequeue even when the interface is up but doesn't have carrier (and hence SPEED_UNKNOWN), and the frames will be dropped further down the stack in dev_direct_xmit(). And, at the end of the day, for what? For calculating the initial budget of an interface which is non-operational at the moment and where frames will get dropped anyway. So if we can't figure out the link speed, default to SPEED_10 and move along. We can also remove the runtime check now. Cc: Leandro Dorileo <leandro.maciel.dorileo@intel.com> Fixes: 7b9eba7ba0c1 ("net/sched: taprio: fix picos_per_byte miscalculation") Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-31taprio: Fix kernel panic in taprio_destroyVladimir Oltean1-4/+4
taprio_init may fail earlier than this line: list_add(&q->taprio_list, &taprio_list); i.e. due to the net device not being multi queue. Attempting to remove q from the global taprio_list when it is not part of it will result in a kernel panic. Fix it by matching list_add and list_del better to one another in the order of operations. This way we can keep the deletion unconditional and with lower complexity - O(1). Cc: Leandro Dorileo <leandro.maciel.dorileo@intel.com> Fixes: 7b9eba7ba0c1 ("net/sched: taprio: fix picos_per_byte miscalculation") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-31net: dsa: tag_8021q: Restore bridge VLANs when enabling vlan_filteringVladimir Oltean1-20/+82
The bridge core assumes that enabling/disabling vlan_filtering will translate into the simple toggling of a flag for switchdev drivers. That is clearly not the case for sja1105, which alters the VLAN table and the pvids in order to obtain port separation in standalone mode. There are 2 parts to the issue. First, tag_8021q changes the pvid to a unique per-port rx_vid for frame identification. But we need to disable tag_8021q when vlan_filtering kicks in, and at that point, the VLAN configured as pvid will have to be removed from the filtering table of the ports. With an invalid pvid, the ports will drop all traffic. Since the bridge will not call any vlan operation through switchdev after enabling vlan_filtering, we need to ensure we're in a functional state ourselves. Hence read the pvid that the bridge is aware of, and program that into our ports. Secondly, tag_8021q uses the 1024-3071 range privately in vlan_filtering=0 mode. Had the user installed one of these VLANs during a previous vlan_filtering=1 session, then upon the next tag_8021q cleanup for vlan_filtering to kick in again, VLANs in that range will get deleted unconditionally, hence breaking user expectation. So when deleting the VLANs, check if the bridge had knowledge about them, and if it did, re-apply the settings. Wrap this logic inside a dsa_8021q_vid_apply helper function to reduce code duplication. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-31net: bridge: Populate the pvid flag in br_vlan_get_infoVladimir Oltean1-0/+2
Currently this simplified code snippet fails: br_vlan_get_pvid(netdev, &pvid); br_vlan_get_info(netdev, pvid, &vinfo); ASSERT(!(vinfo.flags & BRIDGE_VLAN_INFO_PVID)); It is intuitive that the pvid of a netdevice should have the BRIDGE_VLAN_INFO_PVID flag set. However I can't seem to pinpoint a commit where this behavior was introduced. It seems like it's been like that since forever. At a first glance it would make more sense to just handle the BRIDGE_VLAN_INFO_PVID flag in __vlan_add_flags. However, as Nikolay explains: There are a few reasons why we don't do it, most importantly because we need to have only one visible pvid at any single time, even if it's stale - it must be just one. Right now that rule will not be violated by this change, but people will try using this flag and could see two pvids simultaneously. You can see that the pvid code is even using memory barriers to propagate the new value faster and everywhere the pvid is read only once. That is the reason the flag is set dynamically when dumping entries, too. A second (weaker) argument against would be given the above we don't want another way to do the same thing, specifically if it can provide us with two pvids (e.g. if walking the vlan list) or if it can provide us with a pvid different from the one set in the vg. [Obviously, I'm talking about RCU pvid/vlan use cases similar to the dumps. The locked cases are fine. I would like to avoid explaining why this shouldn't be relied upon without locking] So instead of introducing the above change and making sure of the pvid uniqueness under RCU, simply dynamically populate the pvid flag in br_vlan_get_info(). Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-31Merge tag 'batadv-net-for-davem-20190830' of git://git.open-mesh.org/linux-mergeDavid S. Miller2-13/+25
Simon Wunderlich says: ==================== Here are two batman-adv bugfixes: - Fix OGM and OGMv2 header read boundary check, by Sven Eckelmann (2 patches) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller5-7/+11
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Spurious warning when loading rules using the physdev match, from Todd Seidelmann. 2) Fix FTP conntrack helper debugging output, from Thomas Jarosch. 3) Restore per-netns nf_conntrack_{acct,helper,timeout} sysctl knobs, from Florian Westphal. 4) Clear skbuff timestamp from the flowtable datapath, also from Florian. 5) Fix incorrect byteorder of NFT_META_BRI_IIFVPROTO, from wenxu. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-30net: sched: cls_matchall: cleanup flow_action before deallocatingVlad Buslov1-0/+2
Recent rtnl lock removal patch changed flow_action infra to require proper cleanup besides simple memory deallocation. However, matchall classifier was not updated to call tc_cleanup_flow_action(). Add proper cleanup to mall_replace_hw_filter() and mall_reoffload(). Fixes: 5a6ff4b13d59 ("net: sched: take reference to action dev before calling offloads") Reported-by: Ido Schimmel <idosch@mellanox.com> Tested-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-30tcp_bbr: clarify that bbr_bdp() rounds up in commentsLuke Hsiao1-2/+4
This explicitly clarifies that bbr_bdp() returns the rounded-up value of the bandwidth-delay product and why in the comments. Signed-off-by: Luke Hsiao <lukehsiao@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-30sched: act_vlan: implement stats_update callbackJiri Pirko1-0/+14
Implement this callback in order to get the offloaded stats added to the kernel stats. Reported-by: Pengfei Liu <pengfeil@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-30rxrpc: Fix lack of conn cleanup when local endpoint is cleaned up [ver #2]David Howells5-5/+50
When a local endpoint is ceases to be in use, such as when the kafs module is unloaded, the kernel will emit an assertion failure if there are any outstanding client connections: rxrpc: Assertion failed ------------[ cut here ]------------ kernel BUG at net/rxrpc/local_object.c:433! and even beyond that, will evince other oopses if there are service connections still present. Fix this by: (1) Removing the triggering of connection reaping when an rxrpc socket is released. These don't actually clean up the connections anyway - and further, the local endpoint may still be in use through another socket. (2) Mark the local endpoint as dead when we start the process of tearing it down. (3) When destroying a local endpoint, strip all of its client connections from the idle list and discard the ref on each that the list was holding. (4) When destroying a local endpoint, call the service connection reaper directly (rather than through a workqueue) to immediately kill off all outstanding service connections. (5) Make the service connection reaper reap connections for which the local endpoint is marked dead. Only after destroying the connections can we close the socket lest we get an oops in a workqueue that's looking at a connection or a peer. Fixes: 3d18cbb7fd0c ("rxrpc: Fix conn expiry timers") Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-30Merge tag 'rxrpc-fixes-20190827' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fsDavid S. Miller13-237/+291
David Howells says: ==================== rxrpc: Fix use of skb_cow_data() Here's a series of patches that replaces the use of skb_cow_data() in rxrpc with skb_unshare() early on in the input process. The problem that is being seen is that skb_cow_data() indirectly requires that the maximum usage count on an sk_buff be 1, and it may generate an assertion failure in pskb_expand_head() if not. This can occur because rxrpc_input_data() may be still holding a ref when it has just attached the sk_buff to the rx ring and given that attachment its own ref. If recvmsg happens fast enough, skb_cow_data() can see the ref still held by the softirq handler. Further, a packet may contain multiple subpackets, each of which gets its own attachment to the ring and its own ref - also making skb_cow_data() go bang. Fix this by: (1) The DATA packet is currently parsed for subpackets twice by the input routines. Parse it just once instead and make notes in the sk_buff private data. (2) Use the notes from (1) when attaching the packet to the ring multiple times. Once the packet is attached to the ring, recvmsg can see it and start modifying it, so the softirq handler is not permitted to look inside it from that point. (3) Pass the ref from the input code to the ring rather than getting an extra ref. rxrpc_input_data() uses a ref on the second refcount to prevent the packet from evaporating under it. (4) Call skb_unshare() on secured DATA packets in rxrpc_input_packet() before we take call->input_lock. Other sorts of packets don't get modified and so can be left. A trace is emitted if skb_unshare() eats the skb. Note that skb_share() for our accounting in this regard as we can't see the parameters in the packet to log in a trace line if it releases it. (5) Remove the calls to skb_cow_data(). These are then no longer necessary. There are also patches to improve the rxrpc_skb tracepoint to make sure that Tx-derived buffers are identified separately from Rx-derived buffers in the trace. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-30Merge tag 'ceph-for-5.3-rc7' of git://github.com/ceph/ceph-clientLinus Torvalds1-2/+4
Pull two ceph fixes from Ilya Dryomov: "A fix for a -rc1 regression in rbd and a trivial static checker fix" * tag 'ceph-for-5.3-rc7' of git://github.com/ceph/ceph-client: rbd: restore zeroing past the overlap when reading from parent libceph: don't call crypto_free_sync_skcipher() on a NULL tfm
2019-08-30netfilter: nft_meta_bridge: Fix get NFT_META_BRI_IIFVPROTO in network byteorderwenxu1-1/+1
Get the vlan_proto of ingress bridge in network byteorder as userspace expects. Otherwise this is inconsistent with NFT_META_PROTOCOL. Fixes: 2a3a93ef0ba5 ("netfilter: nft_meta_bridge: Add NFT_META_BRI_IIFVPROTO support") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-29mac80211: Correctly set noencrypt for PAE framesDenis Kenzior1-1/+1
The noencrypt flag was intended to be set if the "frame was received unencrypted" according to include/uapi/linux/nl80211.h. However, the current behavior is opposite of this. Cc: stable@vger.kernel.org Fixes: 018f6fbf540d ("mac80211: Send control port frames over nl80211") Signed-off-by: Denis Kenzior <denkenz@gmail.com> Link: https://lore.kernel.org/r/20190827224120.14545-3-denkenz@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-29mac80211: Don't memset RXCB prior to PAE interceptDenis Kenzior1-2/+2
In ieee80211_deliver_skb_to_local_stack intercepts EAPoL frames if mac80211 is configured to do so and forwards the contents over nl80211. During this process some additional data is also forwarded, including whether the frame was received encrypted or not. Unfortunately just prior to the call to ieee80211_deliver_skb_to_local_stack, skb->cb is cleared, resulting in incorrect data being exposed over nl80211. Fixes: 018f6fbf540d ("mac80211: Send control port frames over nl80211") Cc: stable@vger.kernel.org Signed-off-by: Denis Kenzior <denkenz@gmail.com> Link: https://lore.kernel.org/r/20190827224120.14545-2-denkenz@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-29netfilter: nf_flow_table: clear skb tstamp before xmitFlorian Westphal1-1/+2
If 'fq' qdisc is used and a program has requested timestamps, skb->tstamp needs to be cleared, else fq will treat these as 'transmit time'. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-28net/sched: pfifo_fast: fix wrong dereference in pfifo_fast_enqueueDavide Caratti1-2/+6
Now that 'TCQ_F_CPUSTATS' bit can be cleared, depending on the value of 'TCQ_F_NOLOCK' bit in the parent qdisc, we can't assume anymore that per-cpu counters are there in the error path of skb_array_produce(). Otherwise, the following splat can be seen: Unable to handle kernel paging request at virtual address 0000600dea430008 Mem abort info: ESR = 0x96000005 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000005 CM = 0, WnR = 0 user pgtable: 64k pages, 48-bit VAs, pgdp = 000000007b97530e [0000600dea430008] pgd=0000000000000000, pud=0000000000000000 Internal error: Oops: 96000005 [#1] SMP [...] pstate: 10000005 (nzcV daif -PAN -UAO) pc : pfifo_fast_enqueue+0x524/0x6e8 lr : pfifo_fast_enqueue+0x46c/0x6e8 sp : ffff800d39376fe0 x29: ffff800d39376fe0 x28: 1ffff001a07d1e40 x27: ffff800d03e8f188 x26: ffff800d03e8f200 x25: 0000000000000062 x24: ffff800d393772f0 x23: 0000000000000000 x22: 0000000000000403 x21: ffff800cca569a00 x20: ffff800d03e8ee00 x19: ffff800cca569a10 x18: 00000000000000bf x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: ffff1001a726edd0 x13: 1fffe4000276a9a4 x12: 0000000000000000 x11: dfff200000000000 x10: ffff800d03e8f1a0 x9 : 0000000000000003 x8 : 0000000000000000 x7 : 00000000f1f1f1f1 x6 : ffff1001a726edea x5 : ffff800cca56a53c x4 : 1ffff001bf9a8003 x3 : 1ffff001bf9a8003 x2 : 1ffff001a07d1dcb x1 : 0000600dea430000 x0 : 0000600dea430008 Process ping (pid: 6067, stack limit = 0x00000000dc0aa557) Call trace: pfifo_fast_enqueue+0x524/0x6e8 htb_enqueue+0x660/0x10e0 [sch_htb] __dev_queue_xmit+0x123c/0x2de0 dev_queue_xmit+0x24/0x30 ip_finish_output2+0xc48/0x1720 ip_finish_output+0x548/0x9d8 ip_output+0x334/0x788 ip_local_out+0x90/0x138 ip_send_skb+0x44/0x1d0 ip_push_pending_frames+0x5c/0x78 raw_sendmsg+0xed8/0x28d0 inet_sendmsg+0xc4/0x5c0 sock_sendmsg+0xac/0x108 __sys_sendto+0x1ac/0x2a0 __arm64_sys_sendto+0xc4/0x138 el0_svc_handler+0x13c/0x298 el0_svc+0x8/0xc Code: f9402e80 d538d081 91002000 8b010000 (885f7c03) Fix this by testing the value of 'TCQ_F_CPUSTATS' bit in 'qdisc->flags', before dereferencing 'qdisc->cpu_qstats'. Fixes: 8a53e616de29 ("net: sched: when clearing NOLOCK, clear TCQ_F_CPUSTATS, too") CC: Paolo Abeni <pabeni@redhat.com> CC: Stefano Brivio <sbrivio@redhat.com> Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28tcp: inherit timestamp on mtu probeWillem de Bruijn1-1/+2
TCP associates tx timestamp requests with a byte in the bytestream. If merging skbs in tcp_mtu_probe, migrate the tstamp request. Similar to MSG_EOR, do not allow moving a timestamp from any segment in the probe but the last. This to avoid merging multiple timestamps. Tested with the packetdrill script at https://github.com/wdebruij/packetdrill/commits/mtu_probe-1 Link: http://patchwork.ozlabs.org/patch/1143278/#2232897 Fixes: 4ed2d765dfac ("net-timestamp: TCP timestamping") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28net: sched: act_sample: fix psample group handling on overwriteVlad Buslov2-2/+6
Action sample doesn't properly handle psample_group pointer in overwrite case. Following issues need to be fixed: - In tcf_sample_init() function RCU_INIT_POINTER() is used to set s->psample_group, even though we neither setting the pointer to NULL, nor preventing concurrent readers from accessing the pointer in some way. Use rcu_swap_protected() instead to safely reset the pointer. - Old value of s->psample_group is not released or deallocated in any way, which results resource leak. Use psample_group_put() on non-NULL value obtained with rcu_swap_protected(). - The function psample_group_put() that released reference to struct psample_group pointed by rcu-pointer s->psample_group doesn't respect rcu grace period when deallocating it. Extend struct psample_group with rcu head and use kfree_rcu when freeing it. Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28openvswitch: Clear the L4 portion of the key for "later" fragments.Justin Pettit1-1/+4
Only the first fragment in a datagram contains the L4 headers. When the Open vSwitch module parses a packet, it always sets the IP protocol field in the key, but can only set the L4 fields on the first fragment. The original behavior would not clear the L4 portion of the key, so garbage values would be sent in the key for "later" fragments. This patch clears the L4 fields in that circumstance to prevent sending those garbage values as part of the upcall. Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28openvswitch: Properly set L4 keys on "later" IP fragmentsGreg Rose3-66/+95
When IP fragments are reassembled before being sent to conntrack, the key from the last fragment is used. Unless there are reordering issues, the last fragment received will not contain the L4 ports, so the key for the reassembled datagram won't contain them. This patch updates the key once we have a reassembled datagram. The handle_fragments() function works on L3 headers so we pull the L3/L4 flow key update code from key_extract into a new function 'key_extract_l3l4'. Then we add a another new function ovs_flow_key_update_l3l4() and export it so that it is accessible by handle_fragments() for conntrack packet reassembly. Co-authored-by: Justin Pettit <jpettit@ovn.org> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28mld: fix memory leak in mld_del_delrec()Eric Dumazet1-2/+3
Similar to the fix done for IPv4 in commit e5b1c6c6277d ("igmp: fix memory leak in igmpv3_del_delrec()"), we need to make sure mca_tomb and mca_sources are not blindly overwritten. Using swap() then a call to ip6_mc_clear_src() will take care of the missing free. BUG: memory leak unreferenced object 0xffff888117d9db00 (size 64): comm "syz-executor247", pid 6918, jiffies 4294943989 (age 25.350s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 fe 88 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000005b463030>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline] [<000000005b463030>] slab_post_alloc_hook mm/slab.h:522 [inline] [<000000005b463030>] slab_alloc mm/slab.c:3319 [inline] [<000000005b463030>] kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3548 [<00000000939cbf94>] kmalloc include/linux/slab.h:552 [inline] [<00000000939cbf94>] kzalloc include/linux/slab.h:748 [inline] [<00000000939cbf94>] ip6_mc_add1_src net/ipv6/mcast.c:2236 [inline] [<00000000939cbf94>] ip6_mc_add_src+0x31f/0x420 net/ipv6/mcast.c:2356 [<00000000d8972221>] ip6_mc_source+0x4a8/0x600 net/ipv6/mcast.c:449 [<000000002b203d0d>] do_ipv6_setsockopt.isra.0+0x1b92/0x1dd0 net/ipv6/ipv6_sockglue.c:748 [<000000001f1e2d54>] ipv6_setsockopt+0x89/0xd0 net/ipv6/ipv6_sockglue.c:944 [<00000000c8f7bdf9>] udpv6_setsockopt+0x4e/0x90 net/ipv6/udp.c:1558 [<000000005a9a0c5e>] sock_common_setsockopt+0x38/0x50 net/core/sock.c:3139 [<00000000910b37b2>] __sys_setsockopt+0x10f/0x220 net/socket.c:2084 [<00000000e9108023>] __do_sys_setsockopt net/socket.c:2100 [inline] [<00000000e9108023>] __se_sys_setsockopt net/socket.c:2097 [inline] [<00000000e9108023>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2097 [<00000000f4818160>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:296 [<000000008d367e8f>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 1666d49e1d41 ("mld: do not remove mld souce list info when set link down") Fixes: 9c8bb163ae78 ("igmp, mld: Fix memory leak in igmpv3/mld_del_delrec()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28net/sched: pfifo_fast: fix wrong dereference when qdisc is resetDavide Caratti1-4/+7
Now that 'TCQ_F_CPUSTATS' bit can be cleared, depending on the value of 'TCQ_F_NOLOCK' bit in the parent qdisc, we need to be sure that per-cpu counters are present when 'reset()' is called for pfifo_fast qdiscs. Otherwise, the following script: # tc q a dev lo handle 1: root htb default 100 # tc c a dev lo parent 1: classid 1:100 htb \ > rate 95Mbit ceil 100Mbit burst 64k [...] # tc f a dev lo parent 1: protocol arp basic classid 1:100 [...] # tc q a dev lo parent 1:100 handle 100: pfifo_fast [...] # tc q d dev lo root can generate the following splat: Unable to handle kernel paging request at virtual address dfff2c01bd148000 Mem abort info: ESR = 0x96000004 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 [dfff2c01bd148000] address between user and kernel address ranges Internal error: Oops: 96000004 [#1] SMP [...] pstate: 80000005 (Nzcv daif -PAN -UAO) pc : pfifo_fast_reset+0x280/0x4d8 lr : pfifo_fast_reset+0x21c/0x4d8 sp : ffff800d09676fa0 x29: ffff800d09676fa0 x28: ffff200012ee22e4 x27: dfff200000000000 x26: 0000000000000000 x25: ffff800ca0799958 x24: ffff1001940f332b x23: 0000000000000007 x22: ffff200012ee1ab8 x21: 0000600de8a40000 x20: 0000000000000000 x19: ffff800ca0799900 x18: 0000000000000000 x17: 0000000000000002 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: ffff1001b922e6e2 x11: 1ffff001b922e6e1 x10: 0000000000000000 x9 : 1ffff001b922e6e1 x8 : dfff200000000000 x7 : 0000000000000000 x6 : 0000000000000000 x5 : 1fffe400025dc45c x4 : 1fffe400025dc357 x3 : 00000c01bd148000 x2 : 0000600de8a40000 x1 : 0000000000000007 x0 : 0000600de8a40004 Call trace: pfifo_fast_reset+0x280/0x4d8 qdisc_reset+0x6c/0x370 htb_reset+0x150/0x3b8 [sch_htb] qdisc_reset+0x6c/0x370 dev_deactivate_queue.constprop.5+0xe0/0x1a8 dev_deactivate_many+0xd8/0x908 dev_deactivate+0xe4/0x190 qdisc_graft+0x88c/0xbd0 tc_get_qdisc+0x418/0x8a8 rtnetlink_rcv_msg+0x3a8/0xa78 netlink_rcv_skb+0x18c/0x328 rtnetlink_rcv+0x28/0x38 netlink_unicast+0x3c4/0x538 netlink_sendmsg+0x538/0x9a0 sock_sendmsg+0xac/0xf8 ___sys_sendmsg+0x53c/0x658 __sys_sendmsg+0xc8/0x140 __arm64_sys_sendmsg+0x74/0xa8 el0_svc_handler+0x164/0x468 el0_svc+0x10/0x14 Code: 910012a0 92400801 d343fc03 11000c21 (38fb6863) Fix this by testing the value of 'TCQ_F_CPUSTATS' bit in 'qdisc->flags', before dereferencing 'qdisc->cpu_qstats'. Changes since v1: - coding style improvements, thanks to Stefano Brivio Fixes: 8a53e616de29 ("net: sched: when clearing NOLOCK, clear TCQ_F_CPUSTATS, too") CC: Paolo Abeni <pabeni@redhat.com> Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28libceph: don't call crypto_free_sync_skcipher() on a NULL tfmJia-Ju Bai1-2/+4
In set_secret(), key->tfm is assigned to NULL on line 55, and then ceph_crypto_key_destroy(key) is executed. ceph_crypto_key_destroy(key) crypto_free_sync_skcipher(key->tfm) crypto_free_skcipher(&tfm->base); This happens to work because crypto_sync_skcipher is a trivial wrapper around crypto_skcipher: &tfm->base is still 0 and crypto_free_skcipher() handles that. Let's not rely on the layout of crypto_sync_skcipher. This bug is found by a static analysis tool STCheck written by us. Fixes: 69d6302b65a8 ("libceph: Remove VLA usage of skcipher"). Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-08-27tcp: remove empty skb from write queue in error casesEric Dumazet1-10/+20
Vladimir Rutsky reported stuck TCP sessions after memory pressure events. Edge Trigger epoll() user would never receive an EPOLLOUT notification allowing them to retry a sendmsg(). Jason tested the case of sk_stream_alloc_skb() returning NULL, but there are other paths that could lead both sendmsg() and sendpage() to return -1 (EAGAIN), with an empty skb queued on the write queue. This patch makes sure we remove this empty skb so that Jason code can detect that the queue is empty, and call sk->sk_write_space(sk) accordingly. Fixes: ce5ec440994b ("tcp: ensure epoll edge trigger wakeup when write queue is empty") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jason Baron <jbaron@akamai.com> Reported-by: Vladimir Rutsky <rutsky@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net/rds: Fix info leak in rds6_inc_info_copy()Ka-Cheong Poon1-1/+4
The rds6_inc_info_copy() function has a couple struct members which are leaking stack information. The ->tos field should hold actual information and the ->flags field needs to be zeroed out. Fixes: 3eb450367d08 ("rds: add type of service(tos) infrastructure") Fixes: b7ff8b1036f0 ("rds: Extend RDS API for IPv6 support") Reported-by: 黄ID蝴蝶 <butterflyhuangxx@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27sctp: allow users to set ep ecn flag by sockoptXin Long1-0/+73
SCTP_ECN_SUPPORTED sockopt will be added to allow users to change ep ecn flag, and it's similar with other feature flags. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27sctp: allow users to set netns ecn flag with sysctlXin Long1-0/+7
sysctl net.sctp.ecn_enable is added in this patch. It will allow users to change the default sctp ecn flag, net.sctp.ecn_enable. This feature was also required on this thread: http://lkml.iu.edu/hypermail/linux/kernel/0812.1/01858.html Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27sctp: make ecn flag per netns and endpointXin Long3-4/+16
This patch is to add ecn flag for both netns_sctp and sctp_endpoint, net->sctp.ecn_enable is set 1 by default, and ep->ecn_enable will be initialized with net->sctp.ecn_enable. asoc->peer.ecn_capable will be set during negotiation only when ep->ecn_enable is set on both sides. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net: fix skb use after free in netpollFeng Sun1-3/+3
After commit baeababb5b85d5c4e6c917efe2a1504179438d3b ("tun: return NET_XMIT_DROP for dropped packets"), when tun_net_xmit drop packets, it will free skb and return NET_XMIT_DROP, netpoll_send_skb_on_dev will run into following use after free cases: 1. retry netpoll_start_xmit with freed skb; 2. queue freed skb in npinfo->txq. queue_process will also run into use after free case. hit netpoll_send_skb_on_dev first case with following kernel log: [ 117.864773] kernel BUG at mm/slub.c:306! [ 117.864773] invalid opcode: 0000 [#1] SMP PTI [ 117.864774] CPU: 3 PID: 2627 Comm: loop_printmsg Kdump: loaded Tainted: P OE 5.3.0-050300rc5-generic #201908182231 [ 117.864775] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 117.864775] RIP: 0010:kmem_cache_free+0x28d/0x2b0 [ 117.864781] Call Trace: [ 117.864781] ? tun_net_xmit+0x21c/0x460 [ 117.864781] kfree_skbmem+0x4e/0x60 [ 117.864782] kfree_skb+0x3a/0xa0 [ 117.864782] tun_net_xmit+0x21c/0x460 [ 117.864782] netpoll_start_xmit+0x11d/0x1b0 [ 117.864788] netpoll_send_skb_on_dev+0x1b8/0x200 [ 117.864789] __br_forward+0x1b9/0x1e0 [bridge] [ 117.864789] ? skb_clone+0x53/0xd0 [ 117.864790] ? __skb_clone+0x2e/0x120 [ 117.864790] deliver_clone+0x37/0x50 [bridge] [ 117.864790] maybe_deliver+0x89/0xc0 [bridge] [ 117.864791] br_flood+0x6c/0x130 [bridge] [ 117.864791] br_dev_xmit+0x315/0x3c0 [bridge] [ 117.864792] netpoll_start_xmit+0x11d/0x1b0 [ 117.864792] netpoll_send_skb_on_dev+0x1b8/0x200 [ 117.864792] netpoll_send_udp+0x2c6/0x3e8 [ 117.864793] write_msg+0xd9/0xf0 [netconsole] [ 117.864793] console_unlock+0x386/0x4e0 [ 117.864793] vprintk_emit+0x17e/0x280 [ 117.864794] vprintk_default+0x29/0x50 [ 117.864794] vprintk_func+0x4c/0xbc [ 117.864794] printk+0x58/0x6f [ 117.864795] loop_fun+0x24/0x41 [printmsg_loop] [ 117.864795] kthread+0x104/0x140 [ 117.864795] ? 0xffffffffc05b1000 [ 117.864796] ? kthread_park+0x80/0x80 [ 117.864796] ret_from_fork+0x35/0x40 Signed-off-by: Feng Sun <loyou85@gmail.com> Signed-off-by: Xiaojun Zhao <xiaojunzhao141@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net: dsa: Advertise the VLAN offload netdev ability only if switch supports itVladimir Oltean1-9/+6
When adding a VLAN sub-interface on a DSA slave port, the 8021q core checks NETIF_F_HW_VLAN_CTAG_FILTER and, if the netdev is capable of filtering, calls .ndo_vlan_rx_add_vid or .ndo_vlan_rx_kill_vid to configure the VLAN offloading. DSA sets this up counter-intuitively: it always advertises this netdev feature, but the underlying driver may not actually support VLAN table manipulation. In that case, the DSA core is forced to ignore the error, because not being able to offload the VLAN is still fine - and should result in the creation of a non-accelerated VLAN sub-interface. Change this so that the netdev feature is only advertised for switch drivers that support VLAN manipulation, instead of checking for -EOPNOTSUPP at runtime. Suggested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net: dsa: tag_8021q: Future-proof the reserved fields in the custom VIDVladimir Oltean1-0/+2
After witnessing the discussion in https://lkml.org/lkml/2019/8/14/151 w.r.t. ioctl extensibility, it became clear that such an issue might prevent that the 3 RSV bits inside the DSA 802.1Q tag might also suffer the same fate and be useless for further extension. So clearly specify that the reserved bits should currently be transmitted as zero and ignored on receive. The DSA tagger already does this (and has always did), and is the only known user so far (no Wireshark dissection plugin, etc). So there should be no incompatibility to speak of. Fixes: 0471dd429cea ("net: dsa: tag_8021q: Create a stable binary format") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net: dsa: clear VLAN PVID flag for CPU portVivien Didelot1-0/+6
When the bridge offloads a VLAN on a slave port, we also need to program its dedicated CPU port as a member of the VLAN. Drivers may handle the CPU port's membership as they want. For example, Marvell as a special "Unmodified" mode to pass frames as is through such ports. Even though DSA expects the drivers to handle the CPU port membership, it does not make sense to program user VLANs as PVID on the CPU port. This patch clears this flag before programming the CPU port. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net: dsa: program VLAN on CPU port from slaveVivien Didelot2-1/+18
DSA currently programs a VLAN on the CPU port implicitly after the related notifier is received by a switch. While we still need to do this transparent programmation of the DSA links in the fabric, programming the CPU port this way may cause problems in some corners such as the tag_8021q driver. Because the dedicated CPU port is specific to a slave, make their programmation explicit a few layers up, in the slave code. Note that technically, DSA links have a dedicated CPU port as well, but since they are only used as conduit between interconnected switches of a fabric, programming them transparently this way is what we want. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net: dsa: check bridge VLAN in slave operationsVivien Didelot2-8/+14
The bridge VLANs are not offloaded by dsa_port_vlan_* if the port is not bridged or if its bridge is not VLAN aware. This is a good thing but other corners of DSA, such as the tag_8021q driver, may need to program VLANs regardless the bridge state. And also because bridge_dev is specific to user ports anyway, move these checks were it belongs, one layer up in the slave code. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net: dsa: add slave VLAN helpersVivien Didelot1-7/+33
Add dsa_slave_vlan_add and dsa_slave_vlan_del helpers to handle SWITCHDEV_OBJ_ID_PORT_VLAN switchdev objects. Also copy the switchdev_obj_port_vlan structure on add since we will modify it in future patches. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net: dsa: do not skip -EOPNOTSUPP in dsa_port_vid_addVivien Didelot2-4/+7
Currently dsa_port_vid_add returns 0 if the switch returns -EOPNOTSUPP. This function is used in the tag_8021q.c code to offload the PVID of ports, which would simply not work if .port_vlan_add is not supported by the underlying switch. Do not skip -EOPNOTSUPP in dsa_port_vid_add but only when necessary, that is to say in dsa_slave_vlan_rx_add_vid. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net: dsa: remove bitmap operationsVivien Didelot2-87/+59
The bitmap operations were introduced to simplify the switch drivers in the future, since most of them could implement the common VLAN and MDB operations (add, del, dump) with simple functions taking all target ports at once, and thus limiting the number of hardware accesses. Programming an MDB or VLAN this way in a single operation would clearly simplify the drivers a lot but would require a new get-set interface in DSA. The usage of such bitmap from the stack also raised concerned in the past, leading to the dynamic allocation of a new ds->_bitmap member in the dsa_switch structure. So let's get rid of them for now. This commit nicely wraps the ds->ops->port_{mdb,vlan}_{prepare,add} switch operations into new dsa_switch_{mdb,vlan}_{prepare,add} variants not using any bitmap argument anymore. New dsa_switch_{mdb,vlan}_match helpers have been introduced to make clear which local port of a switch must be programmed with the target object. While the targeted user port is an obvious candidate, the DSA links must also be programmed, as well as the CPU port for VLANs. While at it, also remove local variables that are only used once. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net_sched: fix a NULL pointer deref in ipt actionCong Wang19-23/+24
The net pointer in struct xt_tgdtor_param is not explicitly initialized therefore is still NULL when dereferencing it. So we have to find a way to pass the correct net pointer to ipt_destroy_target(). The best way I find is just saving the net pointer inside the per netns struct tcf_idrinfo, which could make this patch smaller. Fixes: 0c66dc1ea3f0 ("netfilter: conntrack: register hooks in netns when needed by ruleset") Reported-and-tested-by: itugrok@yahoo.com Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>