linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2018-02-21	tcp: switch to GSO being always on	Eric Dumazet	3	-1/+3
	Oleksandr Natalenko reported performance issues with BBR without FQ packet scheduler that were root caused to lack of SG and GSO/TSO on his configuration. In this mode, TCP internal pacing has to setup a high resolution timer for each MSS sent. We could implement in TCP a strategy similar to the one adopted in commit fefa569a9d4b ("net_sched: sch_fq: account for schedule/timers drifts") or decide to finally switch TCP stack to a GSO only mode. This has many benefits : 1) Most TCP developments are done with TSO in mind. 2) Less high-resolution timers needs to be armed for TCP-pacing 3) GSO can benefit of xmit_more hint 4) Receiver GRO is more effective (as if TSO was used for real on sender) -> Lower ACK traffic 5) Write queues have less overhead (one skb holds about 64KB of payload) 6) SACK coalescing just works. 7) rtx rb-tree contains less packets, SACK is cheaper. This patch implements the minimum patch, but we can remove some legacy code as follow ups. Tested: On 40Gbit link, one netperf -t TCP_STREAM BBR+fq: sg on: 26 Gbits/sec sg off: 15.7 Gbits/sec (was 2.3 Gbit before patch) BBR+pfifo_fast: sg on: 24.2 Gbits/sec sg off: 14.9 Gbits/sec (was 0.66 Gbit before patch !!! ) BBR+fq_codel: sg on: 24.4 Gbits/sec sg off: 15 Gbits/sec (was 0.66 Gbit before patch !!! ) Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-21	Merge branch 'ibmvnic-Make-driver-resources-dynamic'	David S. Miller	2	-67/+110
	Nathan Fontenot says: ==================== ibmvnic: Make driver resources dynamic The ibmvnic driver needs to be able to handle the number of tx/rx sub-crqs changing during a reset of the driver. To do this several changes need to be made. First the num_active_[tx\|rx]_pools counters need to be re-named to num_active_[tc\|rx]_scrqs, and updated after resource initialization. With this change we can now release and init the sub crqs and napi (for rx sub crqs) when the number of sub crqs change. Lastly, the stats buffer allocation is updated to always allocate the maximum number of sub-crqs count of stats buffers. -Nathan --- Updates for V3: Patch 3/5 - Make do_h_free parameter a bool Updates for V2: Patch 3/5 - Use correct queue count when driver is in probed state for releasing sub crqs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-21	ibmvnic: Allocate max queues stats buffers	Nathan Fontenot	1	-2/+2
	To avoid losing any stats when the number of sub-crqs change, allocate the max number of stats buffers so a stats buffer exists all possible sub-crqs. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-21	ibmvnic: Make napi usage dynamic	Nathan Fontenot	1	-25/+45
	In order to handle the number of rx sub crqs changing during a driver reset, the ibmvnic driver also needs to update the number of napi. To do this the code to init and free napi's is moved to their own routines so they can be called during the reset process. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-21	ibmvnic: Free and re-allocate scrqs when tx/rx scrqs change	Nathan Fontenot	1	-26/+52
	When the driver resets it is possible that the number of tx/rx sub-crqs can change. This patch handles this so that the driver does not try to access non-existent sub-crqs. The count for releasing sub crqs depends on the adapter state. The active queue count is not set in probe, so if we are relasing in probe state we use the request queue count. Additionally, a parameter is added to release_sub_crqs() so that we know if the h_call to free the sub-crq needs to be made. In the reset path we have to do a reset of the main crq, which is a free followed by a register of the main crq. The free of main crq results in all of the sub crq's being free'ed. When updating sub-crq count in the reset path we do not want to h_free the sub-crqs, they are already free'ed. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-21	ibmvnic: Move active sub-crq count settings	Nathan Fontenot	1	-10/+7
	Inpreparation for using the active scrq count to track more active resources, move the setting of the active count to after initialization occurs in initial driver init and during driver reset. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-21	ibmvnic: Rename active queue count variables	Nathan Fontenot	2	-10/+10
	Rename the tx/rx active pool variables to be tx/rx active scrq counts. The tx/rx pools are per sub-crq so this is a more appropriate name. This also is a preparatory step for using thiese variables for handling updates to sub-crqs and napi based on the active count. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-21	rds: send: mark expected switch fall-through in rds_rm_size	Gustavo A. R. Silva	1	-0/+2
	In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Addresses-Coverity-ID: 1465362 ("Missing break in switch") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-21	Merge branch '8390-cleanups'	David S. Miller	14	-141/+85
	Finn Thain says: ==================== Fixes, cleanup and modernization for 8390 ethernet drivers Changes since v4 of combined patch series: - Removed redundant and non-portable MACH_IS_MAC tests. - Added acked-by tags from Geert Uytterhoeven. - Omitted patches unrelated to 8390 drivers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-21	net/mac8390: Fix log messages	Finn Thain	1	-19/+17
	Use dev_foo() to log the slot number instead of the unexpanded "eth%d" format string. Disambiguate the two identical "Card type %s is unsupported" messages. Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-21	net/mac8390: Convert to nubus_driver	Finn Thain	3	-76/+67
	This resolves an old bug that constrained this driver to no more than one card. Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-21	net/8390: Fix msg_enable patch snafu	Finn Thain	11	-49/+4
	The lib8390 module parameter 'msg_enable' doesn't do anything useful: it causes an ancient version string to be logged. Remove redundant code that logs the same string. In ne.c and wd.c, the value of ei_local->msg_enable is used before being assigned. Use ne_msg_enable and wd_msg_enable, respectively. Most of the other 8390 drivers never assign ei_local->msg_enable. Use the 'msg_enable' module parameter from lib8390 as the default value. Eliminate the pointless static and local variables. Clean up an indentation mistake. All of these issues originated from the same patch. Cc: Russell King <linux@armlinux.org.uk> Fixes: c45f812f0280 ("8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature") Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-21	net/8390: Remove redundant make dependencies	Finn Thain	1	-3/+3
	The hydra, zorro8390 and mcf8390 drivers all #include "lib8390.c" and have no need for 8390.o. modinfo confirms no dependency on 8390.ko. Drop the redundant dependency from the Makefile. objdump confirms that this patch has no effect on the module binaries. The superfluous additions of 8390.o were introduced in commit 644570b83026 ("8390: Move the 8390 related drivers"). Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Greg Ungerer <gerg@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-21	r8169: remove not needed PHY soft reset in rtl8168e_2_hw_phy_config	Heiner Kallweit	1	-2/+0
	rtl8169_init_phy() resets the PHY anyway after applying the chip-specific PHY configuration. So we don't need to soft-reset the PHY as part of the chip-specific configuration. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-21	net: sched: add em_ipt ematch for calling xtables matches	Eyal Birger	5	-1/+292
	The commit a new tc ematch for using netfilter xtable matches. This allows early classification as well as mirroning/redirecting traffic based on logic implemented in netfilter extensions. Current supported use case is classification based on the incoming IPSec state used during decpsulation using the 'policy' iptables extension (xt_policy). The module dynamically fetches the netfilter match module and calls it using a fake xt_action_param structure based on validated userspace provided parameters. As the xt_policy match does not access skb->data, no skb modifications are needed on match. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-20	r8169: remove some WOL-related dead code	Heiner Kallweit	1	-37/+2
	Commit bde135a672bf "r8169: only enable PCI wakeups when WOL is active" removed the only user of flag RTL_FEATURE_WOL. So let's remove some now dead code. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-20	Merge branch 'stmmac-multi-queue-fixes-and-cleanups'	David S. Miller	4	-23/+42
	Niklas Cassel says: ==================== stmmac multi-queue fixes and cleanups ==================== Reviewed-by: Jose Abreu <joabreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-20	net: stmmac: honor error code from stmmac_dt_phy()	Niklas Cassel	1	-2/+3
	Honor error code from stmmac_dt_phy() instead of always returning -ENODEV. No functional change intended. Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-20	net: stmmac: add error handling in stmmac_mtl_setup()	Niklas Cassel	1	-5/+24
	The device tree binding for stmmac says: - Multiple TX Queues parameters: below the list of all the parameters to configure the multiple TX queues: - snps,tx-queues-to-use: number of TX queues to be used in the driver [...] - For each TX queue [...] However, if one specifies snps,tx-queues-to-use = 2, but omits the queue subnodes, or defines just one queue subnode, since the driver appears to initialize queues with sane default values, we will get tx queue timeouts. This is because the initialization code only initializes as many queues as it finds subnodes. Potentially leaving some queues uninitialized. To avoid hard to debug issues, return an error if the number of subnodes differ from snps,tx-queues-to-use/snps,rx-queues-to-use. Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-20	net: stmmac: call correct function in stmmac_mac_config_rx_queues_routing()	Niklas Cassel	1	-1/+1
	stmmac_mac_config_rx_queues_routing() incorrectly calls rx_queue_prio() instead of rx_queue_routing(). This looks like a copy paste issue, since stmmac_mac_config_rx_queues_prio() already calls rx_queue_prio(), and both stmmac_mac_config_rx_queues_routing() and stmmac_mac_config_rx_queues_prio() are very similar in structure. Fixes: abe80fdc6ee6 ("net: stmmac: RX queue routing configuration") Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-20	net: stmmac: rename dwmac4_tx_queue_routing() to match reality	Niklas Cassel	1	-3/+3
	Looking at dwmac4_tx_queue_routing(), it is obvious that it sets up rx queue routing. Rename dwmac4_tx_queue_routing() to dwmac4_rx_queue_routing() to better match reality. Fixes: abe80fdc6ee6 ("net: stmmac: RX queue routing configuration") Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-20	net: stmmac: WARN if tx_skbuff entries are reused before cleared	Niklas Cassel	1	-0/+5
	The current code assumes that a tx_skbuff entry has been cleared by stmmac_tx_clean() before stmmac_xmit()/stmmac_tso_xmit() assigns a new skb to that entry. However, since we never check the current value before overwriting it, it is theoretically possible that a non-NULL value is overwritten. Add WARN_ONs to verify that each entry in tx_skbuff is NULL before it is assigned a new value. Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-20	net: stmmac: do not clear tx_skbuff entries in stmmac_xmit()/stmmac_tso_xmit()	Niklas Cassel	1	-3/+0
	tx_skbuff is initialized to NULL in init_dma_tx_desc_rings(), which is called from ndo_open(). stmmac_tx_clean() frees any non-NULL skb, and sets the tx_skbuff entry to NULL. Hence, there is no need to set skbuff entries to NULL in stmmac_xmit()/stmmac_tso_xmit(), and doing so falsely gives the reader the impression that it is needed. Do not clear tx_skbuff entries in stmmac_xmit()/stmmac_tso_xmit(). Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-20	net: stmmac: set MSS for each tx DMA channel	Niklas Cassel	2	-9/+6
	The DMA engine in dwmac4 can segment a large TSO packet to several smaller packets of (max) size Maximum Segment Size (MSS). The DMA engine fetches and saves the MSS via a context descriptor. This context decriptor has to be provided to each tx DMA channel. To ensure that this is done, move struct member mss from stmmac_priv to stmmac_tx_queue. stmmac_reset_queues_param() now also resets mss, together with other queue parameters, so reset of mss value can be removed from stmmac_resume(). init_dma_tx_desc_rings() now also resets mss, together with other queue parameters, so reset of mss value can be removed from stmmac_open(). This fixes tx queue timeouts for dwmac4, with DT property snps,tx-queues-to-use > 1, when running iperf3 with multiple threads. Fixes: ce736788e8a9 ("net: stmmac: adding multiple buffers for TX") Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-20	x25: use %*ph to print small buffer	Antonio Cardace	1	-2/+1
	Use %*ph format to print small buffer as hex string. Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Antonio Cardace <anto.cardace@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-20	Merge branch 'net-Expose-KVD-linear-parts-as-resources'	David S. Miller	5	-97/+220
	Jiri Pirko says: ==================== net: Expose KVD linear parts as resources Arkadi says: Expose the KVD linear partitions via the devlink resource interface. This will give the user the ability to control the linear memory division. --- v1->v2: - patch1: - fixed u64 division error reported by kbuildbot ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-20	mlxsw: spectrum_kvdl: Add support for per part occupancy	Arkadi Sharshevsky	1	-3/+55
	Add support for calculating occupancy for separate kvdl parts. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-20	mlxsw: spectrum_kvdl: Add support for dynamic partition set	Arkadi Sharshevsky	1	-5/+53
	Add support for dynamic partition set via the resource interface. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-20	mlxsw: spectrum_kvdl: Add support for linear division resources	Arkadi Sharshevsky	3	-3/+85
	The linear part of the KVD memory is sub-divided into multiple parts. This patch exposes this internal partitions via the resource interface. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-20	devlink: Perform cleanup of resource_set cb	Arkadi Sharshevsky	2	-84/+3
	After adding size validation logic into core cleanup is required. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-20	devlink: Move size validation to core	Arkadi Sharshevsky	1	-5/+27
	Currently the size validation is done via a cb, which is unneeded. The size validation can be moved to core. The next patch will perform cleanup. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-20	Merge branch 'net-Get-rid-of-net_mutex-and-simplify-cleanup_list-queueing'	David S. Miller	3	-41/+47
	Kirill Tkhai says: ==================== net: Get rid of net_mutex and simplify cleanup_list queueing [1/3] kills net_mutex and makes net_sem be taken for write instead. This is made to take less locks (1 instead of 2) for the time before all pernet_operations are converted. [2-3/3] simplifies dead net cleanup queueing, and makes llist api be used for that. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-20	net: Queue net_cleanup_work only if there is first net added	Kirill Tkhai	1	-2/+2
	When llist_add() returns false, cleanup_net() hasn't made its llist_del_all(), while the work has already been scheduled by the first queuer. So, we may skip queue_work() in this case. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-20	net: Make cleanup_list and net::cleanup_list of llist type	Kirill Tkhai	2	-15/+8
	This simplifies cleanup queueing and makes cleanup lists to use llist primitives. Since llist has its own cmpxchg() ordering, cleanup_list_lock is not more need. Also, struct llist_node is smaller, than struct list_head, so we save some bytes in struct net with this patch. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-20	net: Kill net_mutex	Kirill Tkhai	3	-26/+39
	We take net_mutex, when there are !async pernet_operations registered, and read locking of net_sem is not enough. But we may get rid of taking the mutex, and just change the logic to write lock net_sem in such cases. This obviously reduces the number of lock operations, we do. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-20	ibmvnic: Keep track of supplementary TX descriptors	Thomas Falcon	2	-2/+7
	Supplementary TX descriptors were not being accounted for, which was resulting in an overflow of the hardware device's transmit queue. Keep track of those descriptors now when determining how many entries remain on the TX queue. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-19	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller	333	-2793/+4395

2018-02-19	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	Linus Torvalds	52	-395/+504
	Pull networking fixes from David Miller: 1) Prevent index integer overflow in ptr_ring, from Jason Wang. 2) Program mvpp2 multicast filter properly, from Mikulas Patocka. 3) The bridge brport attribute file is write only and doesn't have a ->show() method, don't blindly invoke it. From Xin Long. 4) Inverted mask used in genphy_setup_forced(), from Ingo van Lil. 5) Fix multiple definition issue with if_ether.h UAPI header, from Hauke Mehrtens. 6) Fix GFP_KERNEL usage in atomic in RDS protocol code, from Sowmini Varadhan. 7) Revert XDP redirect support from thunderx driver, it is not implemented properly. From Jesper Dangaard Brouer. 8) Fix missing RTNL protection across some tipc operations, from Ying Xue. 9) Return the correct IV bytes in the TLS getsockopt code, from Boris Pismenny. 10) Take tclassid into consideration properly when doing FIB rule matching. From Stefano Brivio. 11) cxgb4 device needs more PCI VPD quirks, from Casey Leedom. 12) TUN driver doesn't align frags properly, and we can end up doing unaligned atomics on misaligned metadata. From Eric Dumazet. 13) Fix various crashes found using DEBUG_PREEMPT in rmnet driver, from Subash Abhinov Kasiviswanathan. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (56 commits) tg3: APE heartbeat changes mlxsw: spectrum_router: Do not unconditionally clear route offload indication net: qualcomm: rmnet: Fix possible null dereference in command processing net: qualcomm: rmnet: Fix warning seen with 64 bit stats net: qualcomm: rmnet: Fix crash on real dev unregistration sctp: remove the left unnecessary check for chunk in sctp_renege_events rxrpc: Work around usercopy check tun: fix tun_napi_alloc_frags() frag allocator udplite: fix partial checksum initialization skbuff: Fix comment mis-spelling. dn_getsockoptdecnet: move nf_{get/set}sockopt outside sock lock PCI/cxgb4: Extend T3 PCI quirk to T4+ devices cxgb4: fix trailing zero in CIM LA dump cxgb4: free up resources of pf 0-3 fib_semantics: Don't match route with mismatching tclassid NFC: llcp: Limit size of SDP URI tls: getsockopt return record sequence number tls: reset the crypto info if copy_from_user fails tls: retrun the correct IV in getsockopt docs: segmentation-offloads.txt: add SCTP info ...
2018-02-19	tipc: don't call sock_release() in atomic context	Paolo Abeni	1	-1/+1
	syzbot reported a scheduling while atomic issue at netns destruction time: BUG: sleeping function called from invalid context at net/core/sock.c:2769 in_atomic(): 1, irqs_disabled(): 0, pid: 85, name: kworker/u4:3 5 locks held by kworker/u4:3/85: #0: ((wq_completion)"%s""netns"){+.+.}, at: [<00000000c9792deb>] process_one_work+0xaaf/0x1af0 kernel/workqueue.c:2084 #1: (net_cleanup_work){+.+.}, at: [<00000000adc12e2a>] process_one_work+0xb01/0x1af0 kernel/workqueue.c:2088 #2: (net_sem){++++}, at: [<000000009ccb5669>] cleanup_net+0x23f/0xd20 net/core/net_namespace.c:494 #3: (net_mutex){+.+.}, at: [<00000000a92767d9>] cleanup_net+0xa7d/0xd20 net/core/net_namespace.c:496 #4: (&(&srv->idr_lock)->rlock){+...}, at: [<000000001343e568>] spin_lock_bh include/linux/spinlock.h:315 [inline] #4: (&(&srv->idr_lock)->rlock){+...}, at: [<000000001343e568>] tipc_topsrv_stop+0x231/0x610 net/tipc/topsrv.c:685 CPU: 0 PID: 85 Comm: kworker/u4:3 Not tainted 4.16.0-rc1+ #230 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6128 __might_sleep+0x95/0x190 kernel/sched/core.c:6081 lock_sock_nested+0x37/0x110 net/core/sock.c:2769 lock_sock include/net/sock.h:1463 [inline] tipc_release+0x103/0xff0 net/tipc/socket.c:572 sock_release+0x8d/0x1e0 net/socket.c:594 tipc_topsrv_stop+0x3c0/0x610 net/tipc/topsrv.c:696 tipc_exit_net+0x15/0x40 net/tipc/core.c:96 ops_exit_list.isra.6+0xae/0x150 net/core/net_namespace.c:148 cleanup_net+0x6ba/0xd20 net/core/net_namespace.c:529 process_one_work+0xbbf/0x1af0 kernel/workqueue.c:2113 worker_thread+0x223/0x1990 kernel/workqueue.c:2247 kthread+0x33c/0x400 kernel/kthread.c:238 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:429 This is caused by tipc_topsrv_stop() releasing the listener socket with the idr lock held. This changeset addresses the issue moving the release operation outside such lock. Reported-and-tested-by: syzbot+749d9d87c294c00ca856@syzkaller.appspotmail.com Fixes: 0ef897be12b8 ("tipc: separate topology server listener socket from subcsriber sockets") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: ///jon Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-19	tipc: fix bug on error path in tipc_topsrv_kern_subscr()	Jon Maloy	1	-3/+4
	In commit cc1ea9ffadf7 ("tipc: eliminate struct tipc_subscriber") we re-introduced an old bug on the error path in the function tipc_topsrv_kern_subscr(). We now re-introduce the correction too. Reported-by: syzbot+f62e0f2a0ef578703946@syzkaller.appspotmail.com Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-19	Merge branch 'pernet_ops-conversions-part-2'	David S. Miller	24	-0/+26
	Kirill Tkhai says: ==================== Converting pernet_operations (part #2) This patchset continues to review and to convert pernet_operations to async. There are mostly ipv6, also some regular used netfilter pernet_operations are involved. One more converted is cfg80211_pernet_ops. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-19	net: Convert iptable_filter_net_ops	Kirill Tkhai	1	-0/+1
	These pernet_operations register and unregister net::ipv4.iptable_filter table. Since there are no packets in-flight at the time of exit method is working, iptables rules should not be touched. Also, pernet_operations should not send ipv4 packets each other. So, it's safe to mark them async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-19	net: Convert ip_tables_net_ops, udplite6_net_ops and xt_net_ops	Kirill Tkhai	3	-0/+3
	ip_tables_net_ops and udplite6_net_ops create and destroy /proc entries. xt_net_ops does nothing. So, we are able to mark them async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-19	net: Convert ip6_frags_ops	Kirill Tkhai	1	-0/+1
	Exit methods calls inet_frags_exit_net() with global ip6_frags as argument. So, after we make the pernet_operations async, a pair of exit methods may be called to iterate this hash table. Since there is inet_frag_worker(), which already may work in parallel with inet_frags_exit_net(), and it can make the same cleanup, that inet_frags_exit_net() does, it's safe. So we may mark these pernet_operations as async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-19	net: Convert fib6_net_ops, ipv6_addr_label_ops and ip6_segments_ops	Kirill Tkhai	3	-0/+3
	These pernet_operations register and unregister tables and lists for packets forwarding. All of the entities are per-net. Init methods makes simple initializations, and since net is not visible for foreigners at the time it is working, it can't race with anything. Exit method is executed when there are only local devices, and there mustn't be packets in-flight. Also, it looks like no one pernet_operations want to send ipv6 packets to foreign net. The same reasons are for ipv6_addr_label_ops and ip6_segments_ops. So, we are able to mark all them as async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-19	net: Convert xfrm6_net_ops	Kirill Tkhai	1	-0/+1
	These pernet_operations create sysctl tables and initialize net::xfrm.xfrm6_dst_ops used for routing. It doesn't look like another pernet_operations send ipv6 packets to foreign net namespaces, so it should be safe to mark the pernet_operations as async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-19	net: Convert ip6_flowlabel_net_ops	Kirill Tkhai	1	-0/+1
	These pernet_operations create and destroy /proc entries. ip6_fl_purge() makes almost the same actions as timer ip6_fl_gc_timer does, and as it can be executed in parallel with ip6_fl_purge(), two parallel ip6_fl_purge() may be executed. So, we can mark it async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-19	net: Convert ping_v6_net_ops	Kirill Tkhai	1	-0/+1
	These pernet_operations only register and unregister /proc entries, so it's possible to mark them async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-19	net: Convert ipv6_sysctl_net_ops	Kirill Tkhai	1	-0/+1
	These pernet_operations create and destroy sysctl tables. They are not touched by another net pernet_operations. So, it's possible to execute them in parallel with others. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-19	net: Convert tcpv6_net_ops	Kirill Tkhai	1	-0/+1
	These pernet_operations create and destroy net::ipv6.tcp_sk socket, which is used in tcp_v6_send_response() only. It looks like foreign pernet_operations don't want to set ipv6 connection inside destroyed net, so this socket may be created in destroyed in parallel with anything else. inet_twsk_purge() is also safe for that, as described in patch for tcp_sk_ops. So, it's possible to mark them as async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>