linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2017-10-11	xfs: reinit btree pointer on attr tree inactivation walk	Brian Foster	1	-0/+2
	xfs_attr3_root_inactive() walks the attr fork tree to invalidate the associated blocks. xfs_attr3_node_inactive() recursively descends from internal blocks to leaf blocks, caching block address values along the way to revisit parent blocks, locate the next entry and descend down that branch of the tree. The code that attempts to reread the parent block is unsafe because it assumes that the local xfs_da_node_entry pointer remains valid after an xfs_trans_brelse() and re-read of the parent buffer. Under heavy memory pressure, it is possible that the buffer has been reclaimed and reallocated by the time the parent block is reread. This means that 'btree' can point to an invalid memory address, lead to a random/garbage value for child_fsb and cause the subsequent read of the attr fork to go off the rails and return a NULL buffer for an attr fork offset that is most likely not allocated. Note that this problem can be manufactured by setting XFS_ATTR_BTREE_REF to 0 to prevent LRU caching of attr buffers, creating a file with a multi-level attr fork and removing it to trigger inactivation. To address this problem, reinit the node/btree pointers to the parent buffer after it has been re-read. This ensures btree points to a valid record and allows the walk to proceed. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-10-11	xfs: Fix bool initialization/comparison	Thomas Meyer	5	-8/+8
	Bool initializations should use true and false. Bool tests don't need comparisons. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-10-11	xfs: don't change inode mode if ACL update fails	Dave Chinner	1	-6/+16
	If we get ENOSPC half way through setting the ACL, the inode mode can still be changed even though the ACL does not exist. Reorder the operation to only change the mode of the inode if the ACL is set correctly. Whilst this does not fix the problem with crash consistency (that requires attribute addition to be a deferred op) it does prevent ENOSPC and other non-fatal errors setting an xattr to be handled sanely. This fixes xfstests generic/449. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-10-11	xfs: move more RT specific code under CONFIG_XFS_RT	Dave Chinner	3	-0/+27
	Various utility functions and interfaces that iterate internal devices try to reference the realtime device even when RT support is not compiled into the kernel. Make sure this code is excluded from the CONFIG_XFS_RT=n build, and where appropriate stub functions to return fatal errors if they ever get called when RT support is not present. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-10-11	xfs: Don't log uninitialised fields in inode structures	Dave Chinner	3	-58/+50
	Prevent kmemcheck from throwing warnings about reading uninitialised memory when formatting inodes into the incore log buffer. There are several issues here - we don't always log all the fields in the inode log format item, and we never log the inode the di_next_unlinked field. In the case of the inode log format item, this is exacerbated by the old xfs_inode_log_format structure padding issue. Hence make the padded, 64 bit aligned version of the structure the one we always use for formatting the log and get rid of the 64 bit variant. This means we'll always log the 64-bit version and so recovery only needs to convert from the unpadded 32 bit version from older 32 bit kernels. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-10-03	xfs: handle racy AIO in xfs_reflink_end_cow	Christoph Hellwig	1	-1/+8
	If we got two AIO writes into a COW area the second one might not have any COW extents left to convert. Handle that case gracefully instead of triggering an assert or accessing beyond the bounds of the extent list. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-10-03	xfs: always swap the cow forks when swapping extents	Darrick J. Wong	1	-2/+22
	Since the CoW fork exists as a secondary data structure to the data fork, we must always swap cow forks during swapext. We also need to swap the extent counts and reset the cowblocks tags. Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-26	xfs: revert "xfs: factor rmap btree size into the indlen calculations"	Darrick J. Wong	1	-15/+2
	In commit fd26a88093ba we added a worst case estimate for rmapbt blocks needed to satisfy the block mapping request. Since then, we added the ability to reserve enough space in each AG such that we should never run out of blocks to grow the rmapbt, which makes this calculation unnecessary. Revert the commit because it makes the extra delalloc indlen accounting unnecessary and incorrect. Reported-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-26	xfs: Capture state of the right inode in xfs_iflush_done	Carlos Maiolino	1	-1/+1
	My previous patch: d3a304b6292168b83b45d624784f973fdc1ca674 check for XFS_LI_FAILED flag xfs_iflush done, so the failed item can be properly resubmitted. In the loop scanning other inodes being completed, it should check the current item for the XFS_LI_FAILED, and not the initial one. The state of the initial inode is checked after the loop ends Kudos to Eric for catching this. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-26	xfs: perag initialization should only touch m_ag_max_usable for AG 0	Darrick J. Wong	1	-2/+10
	We call __xfs_ag_resv_init to make a per-AG reservation for each AG. This makes the reservation per-AG, not per-filesystem. Therefore, it is incorrect to adjust m_ag_max_usable for each AG. Adjust it only when we're reserving AG 0's blocks so that we only do it once per fs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-09-26	xfs: update i_size after unwritten conversion in dio completion	Eryu Guan	5	-19/+28
	Since commit d531d91d6990 ("xfs: always use unwritten extents for direct I/O writes"), we start allocating unwritten extents for all direct writes to allow appending aio in XFS. But for dio writes that could extend file size we update the in-core inode size first, then convert the unwritten extents to real allocations at dio completion time in xfs_dio_write_end_io(). Thus a racing direct read could see the new i_size and find the unwritten extents first and read zeros instead of actual data, if the direct writer also takes a shared iolock. Fix it by updating the in-core inode size after the unwritten extent conversion. To do this, introduce a new boolean argument to xfs_iomap_write_unwritten() to tell if we want to update in-core i_size or not. Suggested-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-26	iomap_dio_rw: Allocate AIO completion queue before submitting dio	Chandan Rajendra	1	-7/+7
	Executing xfs/104 test in a loop on Linux-v4.13 kernel on a ppc64 machine can cause the following NULL pointer dereference, .queue_work_on+0x4c/0x80 .iomap_dio_bio_end_io+0xbc/0x1f0 .bio_endio+0x118/0x1f0 .blk_update_request+0xd0/0x470 .blk_mq_end_request+0x24/0xc0 .lo_complete_rq+0x40/0xe0 .__blk_mq_complete_request_remote+0x28/0x40 .flush_smp_call_function_queue+0xc4/0x1e0 .smp_ipi_demux_relaxed+0x8c/0x100 .icp_hv_ipi_action+0x54/0xa0 .__handle_irq_event_percpu+0x84/0x2c0 .handle_irq_event_percpu+0x28/0x80 .handle_percpu_irq+0x78/0xc0 .generic_handle_irq+0x40/0x70 .__do_irq+0x88/0x200 .call_do_irq+0x14/0x24 .do_IRQ+0x84/0x130 This occurs due to the following sequence of events, 1. Allocate dio for Direct I/O write. 2. Invoke iomap_apply() until iov_iter_count() bytes have been submitted. - Assume that we have submitted atleast one bio. Hence iomap_dio->ref value will be >= 2. - If during the second iteration, iomap_apply() ends up returning -ENOSPC, we would break out of the loop and since the 'ret' value is a negative number we end up not allocating memory for super_block->s_dio_done_wq. 3. Meanwhile, iomap_dio_bio_end_io() is invoked for bios that have been submitted and here the code ends up dereferencing the NULL pointer stored at super_block->s_dio_done_wq. This commit fixes the bug by allocating memory for super_block->s_dio_done_wq before iomap_apply() is invoked. Reported-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-26	xfs: validate bdev support for DAX inode flag	Ross Zwisler	1	-1/+2
	Currently only the blocksize is checked, but we should really be calling bdev_dax_supported() which also tests to make sure we can get a struct dax_device and that the dax_direct_access() path is working. This is the same check that we do for the "-o dax" mount option in xfs_fs_fill_super(). This does not fix the race issues that caused the XFS DAX inode option to be disabled, so that option will still be disabled. If/when we re-enable it, though, I think we will want this issue to have been fixed. I also do think that we want to fix this in stable kernels. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> CC: stable@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-25	xfs: remove redundant re-initialization of total_nr_pages	Colin Ian King	1	-2/+0
	Variable total_nr_pages is being initialized and then updated with the same value, this latter assignment is redundant and can be removed. Cleans up clang build warning: Value stored to 'total_nr_pages' during its initialization is never read Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-25	xfs: Output warning message when discard option was enabled even though the device does not support discard	Kenjiro Nakayama	1	-0/+10
	In order to using discard function, it is necessary that not only xfs is mounted with discard option, but also the discard function is supported by the device. Current code doesn't output any message when users mount with discard option on unsupported device, so it is difficult to notice that it was not enabled actually. This patch adds the warning message to notice that discard option is not enabled due to unsupported device when the filesystem is mounted. Changes in v2 (Suggested by Brian Foster): - Move the unsupported device check into xfs_fs_fill_super(). - Clear the discard flag when device is unsupported. Signed-off-by: Kenjiro Nakayama <nakayamakenjiro@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-25	xfs: report zeroed or not correctly in xfs_zero_range()	Eryu Guan	1	-1/+1
	The 'did_zero' param of xfs_zero_range() was not passed to iomap_zero_range() correctly. This was introduced by commit 7bb41db3ea16 ("xfs: handle 64-bit length in xfs_iozero"), and found by code inspection. Signed-off-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-25	xfs: kill meaningless variable 'zero'	Eryu Guan	1	-3/+1
	In xfs_file_aio_write_checks(), variable 'zero' is there only to satisfy xfs_zero_eof(), the result of it is ignored. Now, with iomap_zero_range() based xfs_zero_eof(), we can safely pass NULL as the last param of it and kill 'zero'. Signed-off-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-25	fs/xfs: Use %pS printk format for direct addresses	Helge Deller	1	-1/+1
	Use the %pS instead of the %pF printk format specifier for printing symbols from direct addresses. This is needed for the ia64, ppc64 and parisc64 architectures. Signed-off-by: Helge Deller <deller@gmx.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-25	xfs: evict CoW fork extents when performing finsert/fcollapse	Darrick J. Wong	1	-1/+13
	When we perform an finsert/fcollapse operation, cancel all the CoW extents for the affected file offset range so that they don't end up pointing to the wrong blocks. Reported-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-25	xfs: don't unconditionally clear the reflink flag on zero-block files	Darrick J. Wong	1	-3/+5
	If we have speculative cow preallocations hanging around in the cow fork, don't let a truncate operation clear the reflink flag because if we do then there's a chance we'll forget to free those extents when we destroy the incore inode. Reported-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-16	Linux 4.14-rc1	Linus Torvalds	1	-2/+2

2017-09-16	firmware: Restore support for built-in firmware	Markus Trippelsdorf	4	-5/+71
	Commit 5620a0d1aac ("firmware: delete in-kernel firmware") removed the entire firmware directory. Unfortunately it thereby also removed the support for built-in firmware. This restores the ability to build firmware directly into the kernel by pruning the original Makefile to the necessary minimum. The default for EXTRA_FIRMWARE_DIR is now the standard directory /lib/firmware/. Fixes: 5620a0d1aac ("firmware: delete in-kernel firmware") Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de> Acked-by: Greg K-H <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-16	mlxsw: spectrum_router: Only handle IPv4 and IPv6 events	Ido Schimmel	1	-1/+2
	The driver doesn't support events from address families other than IPv4 and IPv6, so ignore them. Otherwise, we risk queueing a work item before it's initialized. This can happen in case a VRF is configured when MROUTE_MULTIPLE_TABLES is enabled, as the VRF driver will try to add an l3mdev rule for the IPMR family. Fixes: 65e65ec137f4 ("mlxsw: spectrum_router: Don't ignore IPv6 notifications") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Andreas Rammhold <andreas@rammhold.de> Reported-by: Florian Klink <flokli@flokli.de> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-16	Documentation: link in networking docs	Pavel Machek	1	-1/+1
	Fix link in filter.txt. Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-16	tcp: fix data delivery rate	Eric Dumazet	1	-4/+3
	Now skb->mstamp_skb is updated later, we also need to call tcp_rate_skb_sent() after the update is done. Fixes: 8c72c65b426b ("tcp: update skb->skb_mstamp more carefully") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-15	bpf/verifier: reject BPF_ALU64\|BPF_END	Edward Cree	2	-1/+18
	Neither ___bpf_prog_run nor the JITs accept it. Also adds a new test case. Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)") Signed-off-by: Edward Cree <ecree@solarflare.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-15	sctp: do not mark sk dumped when inet_sctp_diag_fill returns err	Xin Long	1	-1/+0
	sctp_diag would not actually dump out sk/asoc if inet_sctp_diag_fill returns err, in which case it shouldn't mark sk dumped by setting cb->args[3] as 1 in sctp_sock_dump(). Otherwise, it could cause some asocs to have no parent's sk dumped in 'ss --sctp'. So this patch is to not set cb->args[3] when inet_sctp_diag_fill() returns err in sctp_sock_dump(). Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-15	sctp: fix an use-after-free issue in sctp_sock_dump	Xin Long	3	-39/+36
	Commit 86fdb3448cc1 ("sctp: ensure ep is not destroyed before doing the dump") tried to fix an use-after-free issue by checking !sctp_sk(sk)->ep with holding sock and sock lock. But Paolo noticed that endpoint could be destroyed in sctp_rcv without sock lock protection. It means the use-after-free issue still could be triggered when sctp_rcv put and destroy ep after sctp_sock_dump checks !ep, although it's pretty hard to reproduce. I could reproduce it by mdelay in sctp_rcv while msleep in sctp_close and sctp_sock_dump long time. This patch is to add another param cb_done to sctp_for_each_transport and dump ep->assocs with holding tsp after jumping out of transport's traversal in it to avoid this issue. It can also improve sctp diag dump to make it run faster, as no need to save sk into cb->args[5] and keep calling sctp_for_each_transport any more. This patch is also to use int * instead of int for the pos argument in sctp_for_each_transport, which could make postion increment only in sctp_for_each_transport and no need to keep changing cb->args[2] in sctp_sock_filter and sctp_sock_dump any more. Fixes: 86fdb3448cc1 ("sctp: ensure ep is not destroyed before doing the dump") Reported-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-15	netvsc: increase default receive buffer size	Stephen Hemminger	1	-1/+1
	The default receive buffer size was reduced by recent change to a value which was appropriate for 10G and Windows Server 2016. But the value is too small for full performance with 40G on Azure. Increase the default back to maximum supported by host. Fixes: 8b5327975ae1 ("netvsc: allow controlling send/recv buffer size") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-15	tcp: update skb->skb_mstamp more carefully	Eric Dumazet	1	-7/+12
	liujian reported a problem in TCP_USER_TIMEOUT processing with a patch in tcp_probe_timer() : https://www.spinics.net/lists/netdev/msg454496.html After investigations, the root cause of the problem is that we update skb->skb_mstamp of skbs in write queue, even if the attempt to send a clone or copy of it failed. One reason being a routing problem. This patch prevents this, solving liujian issue. It also removes a potential RTT miscalculation, since __tcp_retransmit_skb() is not OR-ing TCP_SKB_CB(skb)->sacked with TCPCB_EVER_RETRANS if a failure happens, but skb->skb_mstamp has been changed. A future ACK would then lead to a very small RTT sample and min_rtt would then be lowered to this too small value. Tested: # cat user_timeout.pkt --local_ip=192.168.102.64 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +0 `ifconfig tun0 192.168.102.64/16; ip ro add 192.0.2.1 dev tun0` +0 < S 0:0(0) win 0 <mss 1460> +0 > S. 0:0(0) ack 1 <mss 1460> +.1 < . 1:1(0) ack 1 win 65530 +0 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0 +0 write(4, ..., 24) = 24 +0 > P. 1:25(24) ack 1 win 29200 +.1 < . 1:1(0) ack 25 win 65530 //change the ipaddress +1 `ifconfig tun0 192.168.0.10/16` +1 write(4, ..., 24) = 24 +1 write(4, ..., 24) = 24 +1 write(4, ..., 24) = 24 +1 write(4, ..., 24) = 24 +0 `ifconfig tun0 192.168.102.64/16` +0 < . 1:2(1) ack 25 win 65530 +0 `ifconfig tun0 192.168.0.10/16` +3 write(4, ..., 24) = -1 # ./packetdrill user_timeout.pkt Signed-off-by: Eric Dumazet <edumazet@googl.com> Reported-by: liujian <liujian56@huawei.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-15	net: ipv4: fix l3slave check for index returned in IP_PKTINFO	David Ahern	1	-2/+6
	rt_iif is only set to the actual egress device for the output path. The recent change to consider the l3slave flag when returning IP_PKTINFO works for local traffic (the correct device index is returned), but it broke the more typical use case of packets received from a remote host always returning the VRF index rather than the original ingress device. Update the fixup to consider l3slave and rt_iif actually getting set. Fixes: 1dfa76390bf05 ("net: ipv4: add check for l3slave for index returned in IP_PKTINFO") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-15	net: smsc911x: Quieten netif during suspend	Geert Uytterhoeven	1	-1/+14
	If the network interface is kept running during suspend, the net core may call net_device_ops.ndo_start_xmit() while the Ethernet device is still suspended, which may lead to a system crash. E.g. on sh73a0/kzm9g and r8a73a4/ape6evm, the external Ethernet chip is driven by a PM controlled clock. If the Ethernet registers are accessed while the clock is not running, the system will crash with an imprecise external abort. As this is a race condition with a small time window, it is not so easy to trigger at will. Using pm_test may increase your chances: # echo 0 > /sys/module/printk/parameters/console_suspend # echo platform > /sys/power/pm_test # echo mem > /sys/power/state To fix this, make sure the network interface is quietened during suspend. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-15	net: systemport: Fix 64-bit stats deadlock	Florian Fainelli	1	-3/+0
	We can enter a deadlock situation because there is no sufficient protection when ndo_get_stats64() runs in process context to guard against RX or TX NAPI contexts running in softirq, this can lead to the following lockdep splat and actual deadlock was experienced as well with an iperf session in the background and a while loop doing ifconfig + ethtool. [ 5.780350] ================================ [ 5.784679] WARNING: inconsistent lock state [ 5.789011] 4.13.0-rc7-02179-g32fae27c725d #70 Not tainted [ 5.794561] -------------------------------- [ 5.798890] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 5.804971] swapper/0/0 [HC0[0]:SC1[1]:HE0:SE0] takes: [ 5.810175] (&syncp->seq#2){+.?...}, at: [<c0768a28>] bcm_sysport_tx_reclaim+0x30/0x54 [ 5.818327] {SOFTIRQ-ON-W} state was registered at: [ 5.823278] bcm_sysport_get_stats64+0x17c/0x258 [ 5.828053] dev_get_stats+0x38/0xac [ 5.831776] rtnl_fill_stats+0x30/0x118 [ 5.835761] rtnl_fill_ifinfo+0x538/0xe24 [ 5.839921] rtmsg_ifinfo_build_skb+0x6c/0xd8 [ 5.844430] rtmsg_ifinfo_event.part.5+0x14/0x44 [ 5.849201] rtmsg_ifinfo+0x20/0x28 [ 5.852837] register_netdevice+0x628/0x6b8 [ 5.857171] register_netdev+0x14/0x24 [ 5.861051] bcm_sysport_probe+0x30c/0x438 [ 5.865280] platform_drv_probe+0x50/0xb0 [ 5.869418] driver_probe_device+0x2e8/0x450 [ 5.873817] __driver_attach+0x104/0x120 [ 5.877871] bus_for_each_dev+0x7c/0xc0 [ 5.881834] bus_add_driver+0x1b0/0x270 [ 5.885797] driver_register+0x78/0xf4 [ 5.889675] do_one_initcall+0x54/0x190 [ 5.893646] kernel_init_freeable+0x144/0x1d0 [ 5.898135] kernel_init+0x8/0x110 [ 5.901665] ret_from_fork+0x14/0x2c [ 5.905363] irq event stamp: 24263 [ 5.908804] hardirqs last enabled at (24262): [<c08eecf0>] net_rx_action+0xc4/0x4e4 [ 5.916624] hardirqs last disabled at (24263): [<c0a7da00>] _raw_spin_lock_irqsave+0x1c/0x98 [ 5.925143] softirqs last enabled at (24258): [<c022a7fc>] irq_enter+0x84/0x98 [ 5.932524] softirqs last disabled at (24259): [<c022a918>] irq_exit+0x108/0x16c [ 5.939985] [ 5.939985] other info that might help us debug this: [ 5.946576] Possible unsafe locking scenario: [ 5.946576] [ 5.952556] CPU0 [ 5.955031] ---- [ 5.957506] lock(&syncp->seq#2); [ 5.960955] <Interrupt> [ 5.963604] lock(&syncp->seq#2); [ 5.967227] [ 5.967227] * DEADLOCK * [ 5.967227] [ 5.973222] 1 lock held by swapper/0/0: [ 5.977092] #0: (&(&ring->lock)->rlock){..-...}, at: [<c0768a18>] bcm_sysport_tx_reclaim+0x20/0x54 So just remove the u64_stats_update_begin()/end() pair in ndo_get_stats64() since it does not appear to be useful for anything. No inconsistency was observed with either ifconfig or ethtool, global TX counts equal the sum of per-queue TX counts on a 32-bit architecture. Fixes: 10377ba7673d ("net: systemport: Support 64bit statistics") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-15	net: vrf: avoid gcc-4.6 warning	Arnd Bergmann	1	-3/+3
	When building an allmodconfig kernel with gcc-4.6, we get a rather odd warning: drivers/net/vrf.c: In function ‘vrf_ip6_input_dst’: drivers/net/vrf.c:964:3: error: initialized field with side-effects overwritten [-Werror] drivers/net/vrf.c:964:3: error: (near initialization for ‘fl6’) [-Werror] I have no idea what this warning is even trying to say, but it does seem like a false positive. Reordering the initialization in to match the structure definition gets rid of the warning, and might also avoid whatever gcc thinks is wrong here. Fixes: 9ff74384600a ("net: vrf: Handle ipv6 multicast and link-local addresses") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-15	qed: remove unnecessary call to memset	Himanshu Jha	1	-1/+0
	call to memset to assign 0 value immediately after allocating memory with kzalloc is unnecesaary as kzalloc allocates the memory filled with 0 value. Semantic patch used to resolve this issue: @@ expression e,e2; constant c; statement S; @@ e = kzalloc(e2, c); if(e == NULL) S - memset(e, 0, e2); Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com> Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com> Acked-by: Sudarsana Kalluru <sudarsana.kalluru@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-15	Input: i8042 - add Gigabyte P57 to the keyboard reset table	Kai-Heng Feng	1	-0/+7
	Similar to other Gigabyte laptops, the touchpad on P57 requires a keyboard reset to detect Elantech touchpad correctly. BugLink: https://bugs.launchpad.net/bugs/1594214 Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2017-09-15	kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly	Jim Mattson	1	-59/+75
	When emulating a nested VM-entry from L1 to L2, several control field validation checks are deferred to the hardware. Should one of these validation checks fail, vcpu_vmx_run will set the vmx->fail flag. When this happens, the L2 guest state is not loaded (even in part), and execution should continue in L1 with the next instruction after the VMLAUNCH/VMRESUME. The VMCS12 is not modified (except for the VM-instruction error field), the VMCS12 MSR save/load lists are not processed, and the CPU state is not loaded from the VMCS12 host area. Moreover, the vmcs02 exit reason is stale, so it should not be consulted for any reason. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15	kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly	Jim Mattson	1	-6/+8
	On an early VMLAUNCH/VMRESUME failure (i.e. one which sets the VM-instruction error field of the current VMCS), the launch state of the current VMCS is not set to "launched," and the VM-exit information fields of the current VMCS (including IDT-vectoring information and exit reason) are stale. On a late VMLAUNCH/VMRESUME failure (i.e. one which sets the high bit of the exit reason field), the launch state of the current VMCS is not set to "launched," and only two of the VM-exit information fields of the current VMCS are modified (exit reason and exit qualification). The remaining VM-exit information fields of the current VMCS (including IDT-vectoring information, in particular) are stale. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15	kvm: nVMX: Remove nested_vmx_succeed after successful VM-entry	Jim Mattson	1	-7/+9
	After a successful VM-entry, RFLAGS is cleared, with the exception of bit 1, which is always set. This is handled by load_vmcs12_host_state. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15	kvm,mips: Fix potential swait_active() races	Davidlohr Bueso	1	-2/+2
	For example, the following could occur, making us miss a wakeup: CPU0 CPU1 kvm_vcpu_block kvm_mips_comparecount_func [L] swait_active(&vcpu->wq) [S] prepare_to_swait(&vcpu->wq) [L] if (!kvm_vcpu_has_pending_timer(vcpu)) schedule() [S] queue_timer_int(vcpu) Ensure that the swait_active() check is not hoisted over the interrupt. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15	kvm,powerpc: Serialize wq active checks in ops->vcpu_kick	Davidlohr Bueso	1	-1/+1
	Particularly because kvmppc_fast_vcpu_kick_hv() is a callback, ensure that we properly serialize wq active checks in order to avoid potentially missing a wakeup due to racing with the waiter side. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15	kvm: Serialize wq active checks in kvm_vcpu_wake_up()	Davidlohr Bueso	1	-1/+1
	This is a generic call and can be suceptible to races in reading the wq task_list while another task is adding itself to the list. Add a full barrier by using the swq_has_sleeper() helper. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15	kvm,x86: Fix apf_task_wake_one() wq serialization	Davidlohr Bueso	1	-1/+1
	During code inspection, the following potential race was seen: CPU0 CPU1 kvm_async_pf_task_wait apf_task_wake_one [L] swait_active(&n->wq) [S] prepare_to_swait(&n.wq) [L] if (!hlist_unhahed(&n.link)) schedule() [S] hlist_del_init(&n->link); Properly serialize swait_active() checks such that a wakeup is not missed. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15	kvm,lapic: Justify use of swait_active()	Davidlohr Bueso	1	-0/+4
	A comment might serve future readers. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15	kvm,async_pf: Use swq_has_sleeper()	Davidlohr Bueso	1	-5/+1
	... as we've got the new helper now. This caller already does the right thing, hence no changes in semantics. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15	sched/wait: Add swq_has_sleeper()	Davidlohr Bueso	1	-2/+56
	Which is the equivalent of what we have in regular waitqueues. I'm not crazy about the name, but this also helps us get both apis closer -- which iirc comes originally from the -net folks. We also duplicate the comments for the lockless swait_active(), from wait.h. Future users will make use of this interface. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15	KVM: VMX: Do not BUG() on out-of-bounds guest IRQ	Jan H. Schönherr	1	-2/+7
	The value of the guest_irq argument to vmx_update_pi_irte() is ultimately coming from a KVM_IRQFD API call. Do not BUG() in vmx_update_pi_irte() if the value is out-of bounds. (Especially, since KVM as a whole seems to hang after that.) Instead, print a message only once if we find that we don't have a route for a certain IRQ (which can be out-of-bounds or within the array). This fixes CVE-2017-1000252. Fixes: efc644048ecde54 ("KVM: x86: Update IRTE for posted-interrupts") Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15	KVM: Don't accept obviously wrong gsi values via KVM_IRQFD	Jan H. Schönherr	1	-0/+2
	We cannot add routes for gsi values >= KVM_MAX_IRQ_ROUTES -- see kvm_set_irq_routing(). Hence, there is no sense in accepting them via KVM_IRQFD. Prevent them from entering the system in the first place. Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15	nios2: time: Read timer in get_cycles only if initialized	Guenter Roeck	1	-1/+4
	Mainline crashes as follows when running nios2 images. On node 0 totalpages: 65536 free_area_init_node: node 0, pgdat c8408fa0, node_mem_map c8726000 Normal zone: 512 pages used for memmap Normal zone: 0 pages reserved Normal zone: 65536 pages, LIFO batch:15 Unable to handle kernel NULL pointer dereference at virtual address 00000000 ea = c8003cb0, ra = c81cbf40, cause = 15 Kernel panic - not syncing: Oops Problem is seen because get_cycles() is called before the timer it depends on is initialized. Returning 0 in that situation fixes the problem. Fixes: 33d72f3822d7 ("init/main.c: extract early boot entropy from the ..") Cc: Laura Abbott <labbott@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Daniel Micay <danielmicay@gmail.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2017-09-15	nios2: add earlycon support to 3c120 devboard DTS	Tobias Klauser	1	-1/+2
	Allow earlycon to be used on the JTAG UART present in the 3c120 GHRD. Signed-off-by: Tobias Klauser <tklauser@distanz.ch>