linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2014-09-20	percpu-refcount: make percpu_ref based on longs instead of ints	Tejun Heo	2	-30/+31
	percpu_ref is currently based on ints and the number of refs it can cover is (1 << 31). This makes it impossible to use a percpu_ref to count memory objects or pages on 64bit machines as it may overflow. This forces those users to somehow aggregate the references before contributing to the percpu_ref which is often cumbersome and sometimes challenging to get the same level of performance as using the percpu_ref directly. While using ints for the percpu counters makes them pack tighter on 64bit machines, the possible gain from using ints instead of longs is extremely small compared to the overall gain from per-cpu operation. This patch makes percpu_ref based on longs so that it can be used to directly count memory objects or pages. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Johannes Weiner <hannes@cmpxchg.org>
2014-09-20	percpu-refcount: improve WARN messages	Tejun Heo	1	-3/+5
	percpu_ref's WARN messages can be a lot more helpful by indicating who's the culprit. Make them report the release function that the offending percpu-refcount is associated with. This should make it a lot easier to track down the reported invalid refcnting operations. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <kmo@daterainc.com>
2014-09-09	percpu: fix locking regression in the failure path of pcpu_alloc()	Tejun Heo	1	-0/+1
	While updating locking, b38d08f3181c ("percpu: restructure locking") broke pcpu_create_chunk() creation path in pcpu_alloc(). It returns without releasing pcpu_alloc_mutex. Fix it. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Julia Lawall <julia.lawall@lip6.fr>
2014-09-08	percpu-refcount: add @gfp to percpu_ref_init()	Tejun Heo	6	-10/+15
	Percpu allocator now supports allocation mask. Add @gfp to percpu_ref_init() so that !GFP_KERNEL allocation masks can be used with percpu_refs too. This patch doesn't make any functional difference. v2: blk-mq conversion was missing. Updated. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <koverstreet@google.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Li Zefan <lizefan@huawei.com> Cc: Nicholas A. Bellinger <nab@linux-iscsi.org> Cc: Jens Axboe <axboe@kernel.dk>
2014-09-08	proportions: add @gfp to init functions	Tejun Heo	6	-15/+17
	Percpu allocator now supports allocation mask. Add @gfp to [flex_]proportions init functions so that !GFP_KERNEL allocation masks can be used with them too. This patch doesn't make any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org>
2014-09-08	percpu_counter: add @gfp to percpu_counter_init()	Tejun Heo	23	-42/+49
	Percpu allocator now supports allocation mask. Add @gfp to percpu_counter_init() so that !GFP_KERNEL allocation masks can be used with percpu_counters too. We could have left percpu_counter_init() alone and added percpu_counter_init_gfp(); however, the number of users isn't that high and introducing _gfp variants to all percpu data structures would be quite ugly, so let's just do the conversion. This is the one with the most users. Other percpu data structures are a lot easier to convert. This patch doesn't make any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Jan Kara <jack@suse.cz> Acked-by: "David S. Miller" <davem@davemloft.net> Cc: x86@kernel.org Cc: Jens Axboe <axboe@kernel.dk> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org>
2014-09-08	percpu_counter: make percpu_counters_lock irq-safe	Tejun Heo	1	-6/+10
	percpu_counter is scheduled to grow @gfp support to allow atomic initialization. This patch makes percpu_counters_lock irq-safe so that it can be safely used from atomic contexts. Signed-off-by: Tejun Heo <tj@kernel.org>
2014-09-02	percpu: implement asynchronous chunk population	Tejun Heo	2	-6/+115
	The percpu allocator now supports atomic allocations by only allocating from already populated areas but the mechanism to ensure that there's adequate amount of populated areas was missing. This patch expands pcpu_balance_work so that in addition to freeing excess free chunks it also populates chunks to maintain an adequate level of populated areas. pcpu_alloc() schedules pcpu_balance_work if the amount of free populated areas is too low or after an atomic allocation failure. * PERPCU_DYNAMIC_RESERVE is increased by two pages to account for PCPU_EMPTY_POP_PAGES_LOW. * pcpu_async_enabled is added to gate both async jobs - chunk->map_extend_work and pcpu_balance_work - so that we don't end up scheduling them while the needed subsystems aren't up yet. Signed-off-by: Tejun Heo <tj@kernel.org>
2014-09-02	percpu: rename pcpu_reclaim_work to pcpu_balance_work	Tejun Heo	1	-15/+12
	pcpu_reclaim_work will also be used to populate chunks asynchronously. Rename it to pcpu_balance_work in preparation. pcpu_reclaim() is renamed to pcpu_balance_workfn() and some of its local variables are renamed too. This is pure rename. Signed-off-by: Tejun Heo <tj@kernel.org>
2014-09-02	percpu: implmeent pcpu_nr_empty_pop_pages and chunk->nr_populated	Tejun Heo	2	-10/+114
	pcpu_nr_empty_pop_pages counts the number of empty populated pages across all chunks and chunk->nr_populated counts the number of populated pages in a chunk. Both will be used to implement pre/async population for atomic allocations. pcpu_chunk_[de]populated() are added to update chunk->populated, chunk->nr_populated and pcpu_nr_empty_pop_pages together. All successful chunk [de]populations should be followed by the corresponding pcpu_chunk_[de]populated() calls. Signed-off-by: Tejun Heo <tj@kernel.org>
2014-09-02	percpu: make sure chunk->map array has available space	Tejun Heo	1	-8/+45
	An allocation attempt may require extending chunk->map array which requires GFP_KERNEL context which isn't available for atomic allocations. This patch ensures that chunk->map array usually keeps some amount of available space by directly allocating buffer space during GFP_KERNEL allocations and scheduling async extension during atomic ones. This should make atomic allocation failures from map space exhaustion rare. Signed-off-by: Tejun Heo <tj@kernel.org>
2014-09-02	percpu: implement [__]alloc_percpu_gfp()	Tejun Heo	2	-27/+46
	Now that pcpu_alloc_area() can allocate only from populated areas, it's easy to add atomic allocation support to [__]alloc_percpu(). Update pcpu_alloc() so that it accepts @gfp and skips all the blocking operations and allocates only from the populated areas if @gfp doesn't contain GFP_KERNEL. New interface functions [__]alloc_percpu_gfp() are added. While this means that atomic allocations are possible, this isn't complete yet as there's no mechanism to ensure that certain amount of populated areas is kept available and atomic allocations may keep failing under certain conditions. Signed-off-by: Tejun Heo <tj@kernel.org>
2014-09-02	percpu: indent the population block in pcpu_alloc()	Tejun Heo	1	-17/+21
	The next patch will conditionalize the population block in pcpu_alloc() which will end up making a rather large indentation change obfuscating the actual logic change. This patch puts the block under "if (true)" so that the next patch can avoid indentation changes. The defintions of the local variables which are used only in the block are moved into the block. This patch is purely cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org>
2014-09-02	percpu: make pcpu_alloc_area() capable of allocating only from populated areas	Tejun Heo	1	-7/+58
	Update pcpu_alloc_area() so that it can skip unpopulated areas if the new parameter @pop_only is true. This is implemented by a new function, pcpu_fit_in_area(), which determines the amount of head padding considering the alignment and populated state. @pop_only is currently always false but this will be used to implement atomic allocation. Signed-off-by: Tejun Heo <tj@kernel.org>
2014-09-02	percpu: restructure locking	Tejun Heo	2	-40/+37
	At first, the percpu allocator required a sleepable context for both alloc and free paths and used pcpu_alloc_mutex to protect everything. Later, pcpu_lock was introduced to protect the index data structure so that the free path can be invoked from atomic contexts. The conversion only updated what's necessary and left most of the allocation path under pcpu_alloc_mutex. The percpu allocator is planned to add support for atomic allocation and this patch restructures locking so that the coverage of pcpu_alloc_mutex is further reduced. * pcpu_alloc() now grab pcpu_alloc_mutex only while creating a new chunk and populating the allocated area. Everything else is now protected soley by pcpu_lock. After this change, multiple instances of pcpu_extend_area_map() may race but the function already implements sufficient synchronization using pcpu_lock. This also allows multiple allocators to arrive at new chunk creation. To avoid creating multiple empty chunks back-to-back, a new chunk is created iff there is no other empty chunk after grabbing pcpu_alloc_mutex. * pcpu_lock is now held while modifying chunk->populated bitmap. After this, all data structures are protected by pcpu_lock. Signed-off-by: Tejun Heo <tj@kernel.org>
2014-09-02	percpu: make percpu-km set chunk->populated bitmap properly	Tejun Heo	1	-0/+3
	percpu-km instantiates the whole chunk on creation and doesn't make use of chunk->populated bitmap and leaves it as zero. While this currently doesn't cause any problem, the inconsistency makes it difficult to build further logic on top of chunk->populated. This patch makes percpu-km fill chunk->populated on creation so that the bitmap is always consistent. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Christoph Lameter <cl@linux.com>
2014-09-02	percpu: move region iterations out of pcpu_[de]populate_chunk()	Tejun Heo	3	-54/+28
	Previously, pcpu_[de]populate_chunk() were called with the range which may contain multiple target regions in it and pcpu_[de]populate_chunk() iterated over the regions. This has the benefit of batching up cache flushes for all the regions; however, we're planning to add more bookkeeping logic around [de]population to support atomic allocations and this delegation of iterations gets in the way. This patch moves the region iterations out of pcpu_[de]populate_chunk() into its callers - pcpu_alloc() and pcpu_reclaim() - so that we can later add logic to track more states around them. This change may make cache and tlb flushes more frequent but multi-region [de]populations are rare anyway and if this actually becomes a problem, it's not difficult to factor out cache flushes as separate callbacks which are directly invoked from percpu.c. Signed-off-by: Tejun Heo <tj@kernel.org>
2014-09-02	percpu: move common parts out of pcpu_[de]populate_chunk()	Tejun Heo	3	-40/+31
	percpu-vm and percpu-km implement separate versions of pcpu_[de]populate_chunk() and some part which is or should be common are currently in the specific implementations. Make the following changes. * Allocate area clearing is moved from the pcpu_populate_chunk() implementations to pcpu_alloc(). This makes percpu-km's version noop. * Quick exit tests in pcpu_[de]populate_chunk() of percpu-vm are moved to their respective callers so that they are applied to percpu-km too. This doesn't make any meaningful difference as both functions are noop for percpu-km; however, this is more consistent and will help implementing atomic allocation support. Signed-off-by: Tejun Heo <tj@kernel.org>
2014-09-02	percpu: remove @may_alloc from pcpu_get_pages()	Tejun Heo	1	-7/+8
	pcpu_get_pages() creates the temp pages array if not already allocated and returns the pointer to it. As the function is called from both [de]population paths and depopulation can only happen after at least one successful population, the param doesn't make any difference - the allocation will always happen on the population path anyway. Remove @may_alloc from pcpu_get_pages(). Also, add an lockdep assertion pcpu_alloc_mutex instead of vaguely stating that the exclusion is the caller's responsibility. Signed-off-by: Tejun Heo <tj@kernel.org>
2014-09-02	percpu: remove the usage of separate populated bitmap in percpu-vm	Tejun Heo	1	-68/+25
	percpu-vm uses pcpu_get_pages_and_bitmap() to acquire temp pages array and populated bitmap and uses the two during [de]population. The temp bitmap is used only to build the new bitmap that is copied to chunk->populated after the operation succeeds; however, the new bitmap can be trivially set after success without using the temp bitmap. This patch removes the temp populated bitmap usage from percpu-vm.c. * pcpu_get_pages_and_bitmap() is renamed to pcpu_get_pages() and no longer hands out the temp bitmap. * @populated arugment is dropped from all the related functions. @populated updates in pcpu_[un]map_pages() are dropped. * Two loops in pcpu_map_pages() are merged. * pcpu_[de]populated_chunk() modify chunk->populated bitmap directly from @page_start and @page_end after success. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Christoph Lameter <cl@linux.com>
2014-08-16	percpu: free percpu allocation info for uniprocessor system	Honggang Li	1	-0/+2
	Currently, only SMP system free the percpu allocation info. Uniprocessor system should free it too. For example, one x86 UML virtual machine with 256MB memory, UML kernel wastes one page memory. Signed-off-by: Honggang Li <enjoymindful@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org
2014-08-15	percpu: perform tlb flush after pcpu_map_pages() failure	Tejun Heo	1	-0/+1
	If pcpu_map_pages() fails midway, it unmaps the already mapped pages. Currently, it doesn't flush tlb after the partial unmapping. This may be okay in most cases as the established mapping hasn't been used at that point but it can go wrong and when it goes wrong it'd be extremely difficult to track down. Flush tlb after the partial unmapping. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org
2014-08-15	percpu: fix pcpu_alloc_pages() failure path	Tejun Heo	1	-6/+15
	When pcpu_alloc_pages() fails midway, pcpu_free_pages() is invoked to free what has already been allocated. The invocation is across the whole requested range and pcpu_free_pages() will try to free all non-NULL pages; unfortunately, this is incorrect as pcpu_get_pages_and_bitmap(), unlike what its comment suggests, doesn't clear the pages array and thus the array may have entries from the previous invocations making the partial failure path free incorrect pages. Fix it by open-coding the partial freeing of the already allocated pages. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org
2014-08-14	netlink: Annotate RCU locking for seq_file walker	Thomas Graf	1	-0/+2
	Silences the following sparse warnings: net/netlink/af_netlink.c:2926:21: warning: context imbalance in 'netlink_seq_start' - wrong count at exit net/netlink/af_netlink.c:2972:13: warning: context imbalance in 'netlink_seq_stop' - unexpected unlock Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-14	rhashtable: fix annotations for rht_for_each_entry_rcu()	Thomas Graf	1	-8/+4
	Call rcu_deference_raw() directly from within rht_for_each_entry_rcu() as list_for_each_entry_rcu() does. Fixes the following sparse warnings: net/netlink/af_netlink.c:2906:25: expected struct rhash_head const __mptr net/netlink/af_netlink.c:2906:25: got struct rhash_head [noderef] <asn:4><noident> Fixes: e341694e3eb57fc ("netlink: Convert netlink_lookup() to use RCU protected hash table") Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-14	rhashtable: unexport and make rht_obj() static	Thomas Graf	2	-8/+1
	No need to export rht_obj(), all inner to outer object translations occur internally. It was intended to be used with rht_for_each() which now primarily serves as the iterator for rhashtable_remove_pprev() to effectively flush and free the full table. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-14	rhashtable: RCU annotations for next pointers	Thomas Graf	2	-3/+3
	Properly annotate next pointers as access is RCU protected in the lookup path. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-14	tcp: fix ssthresh and undo for consecutive short FRTO episodes	Neal Cardwell	1	-5/+3
	Fix TCP FRTO logic so that it always notices when snd_una advances, indicating that any RTO after that point will be a new and distinct loss episode. Previously there was a very specific sequence that could cause FRTO to fail to notice a new loss episode had started: (1) RTO timer fires, enter FRTO and retransmit packet 1 in write queue (2) receiver ACKs packet 1 (3) FRTO sends 2 more packets (4) RTO timer fires again (should start a new loss episode) The problem was in step (3) above, where tcp_process_loss() returned early (in the spot marked "Step 2.b"), so that it never got to the logic to clear icsk_retransmits. Thus icsk_retransmits stayed non-zero. Thus in step (4) tcp_enter_loss() would see the non-zero icsk_retransmits, decide that this RTO is not a new episode, and decide not to cut ssthresh and remember the current cwnd and ssthresh for undo. There were two main consequences to the bug that we have observed. First, ssthresh was not decreased in step (4). Second, when there was a series of such FRTO (1-4) sequences that happened to be followed by an FRTO undo, we would restore the cwnd and ssthresh from before the entire series started (instead of the cwnd and ssthresh from before the most recent RTO). This could result in cwnd and ssthresh being restored to values much bigger than the proper values. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Fixes: e33099f96d99c ("tcp: implement RFC5682 F-RTO") Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-14	tcp: don't allow syn packets without timestamps to pass tcp_tw_recycle logic	Hannes Frederic Sowa	3	-6/+11
	tcp_tw_recycle heavily relies on tcp timestamps to build a per-host ordering of incoming connections and teardowns without the need to hold state on a specific quadruple for TCP_TIMEWAIT_LEN, but only for the last measured RTO. To do so, we keep the last seen timestamp in a per-host indexed data structure and verify if the incoming timestamp in a connection request is strictly greater than the saved one during last connection teardown. Thus we can verify later on that no old data packets will be accepted by the new connection. During moving a socket to time-wait state we already verify if timestamps where seen on a connection. Only if that was the case we let the time-wait socket expire after the RTO, otherwise normal TCP_TIMEWAIT_LEN will be used. But we don't verify this on incoming SYN packets. If a connection teardown was less than TCP_PAWS_MSL seconds in the past we cannot guarantee to not accept data packets from an old connection if no timestamps are present. We should drop this SYN packet. This patch closes this loophole. Please note, this patch does not make tcp_tw_recycle in any way more usable but only adds another safety check: Sporadic drops of SYN packets because of reordering in the network or in the socket backlog queues can happen. Users behing NAT trying to connect to a tcp_tw_recycle enabled server can get caught in blackholes and their connection requests may regullary get dropped because hosts behind an address translator don't have synchronized tcp timestamp clocks. tcp_tw_recycle cannot work if peers don't have tcp timestamps enabled. In general, use of tcp_tw_recycle is disadvised. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-14	tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced()	Neal Cardwell	6	-5/+8
	Make sure we use the correct address-family-specific function for handling MTU reductions from within tcp_release_cb(). Previously AF_INET6 sockets were incorrectly always using the IPv6 code path when sometimes they were handling IPv4 traffic and thus had an IPv4 dst. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Diagnosed-by: Willem de Bruijn <willemb@google.com> Fixes: 563d34d057862 ("tcp: dont drop MTU reduction indications") Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-14	sit: Fix ipip6_tunnel_lookup device matching criteria	Shmulik Ladkani	1	-3/+3
	As of 4fddbf5d78 ("sit: strictly restrict incoming traffic to tunnel link device"), when looking up a tunnel, tunnel's underlying interface (t->parms.link) is verified to match incoming traffic's ingress device. However the comparison was incorrectly based on skb->dev->iflink. Instead, dev->ifindex should be used, which correctly represents the interface from which the IP stack hands the ipip6 packets. This allows setting up sit tunnels bound to vlan interfaces (otherwise incoming ipip6 traffic on the vlan interface was dropped due to ipip6_tunnel_lookup match failure). Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-14	net: ethernet: ibm: ehea: Remove duplicate object from Makefile	Andreas Ruprecht	1	-1/+1
	In the Makefile, ehea_phyp.o is included twice in the list of object files compile into ehea.o. This change removes one instance. Signed-off-by: Andreas Ruprecht <rupran@einserver.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-14	net: xgene: Check negative return value of xgene_enet_get_ring_size()	Tobias Klauser	1	-2/+5
	xgene_enet_get_ring_size() returns a negative value in case of an error, but its only caller in xgene_enet_create_desc_ring() currently uses the return value directly as u32. Instead, check for a negative value first and error out in case. Also move the call to xgene_enet_get_ring_size() before devm_kzalloc() so we don't need to free anything in the error path. This fixes the following issue reported by the Coverity Scanner: ** CID 1231336: Improper use of negative value (NEGATIVE_RETURNS) /drivers/net/ethernet/apm/xgene/xgene_enet_main.c: 596 in xgene_enet_create_desc_ring() Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-14	tcp: don't use timestamp from repaired skb-s to calculate RTT (v2)	Andrey Vagin	3	-9/+14
	We don't know right timestamp for repaired skb-s. Wrong RTT estimations isn't good, because some congestion modules heavily depends on it. This patch adds the TCPCB_REPAIRED flag, which is included in TCPCB_RETRANS. Thanks to Eric for the advice how to fix this issue. This patch fixes the warning: [ 879.562947] WARNING: CPU: 0 PID: 2825 at net/ipv4/tcp_input.c:3078 tcp_ack+0x11f5/0x1380() [ 879.567253] CPU: 0 PID: 2825 Comm: socket-tcpbuf-l Not tainted 3.16.0-next-20140811 #1 [ 879.567829] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 879.568177] 0000000000000000 00000000c532680c ffff880039643d00 ffffffff817aa2d2 [ 879.568776] 0000000000000000 ffff880039643d38 ffffffff8109afbd ffff880039d6ba80 [ 879.569386] ffff88003a449800 000000002983d6bd 0000000000000000 000000002983d6bc [ 879.569982] Call Trace: [ 879.570264] [<ffffffff817aa2d2>] dump_stack+0x4d/0x66 [ 879.570599] [<ffffffff8109afbd>] warn_slowpath_common+0x7d/0xa0 [ 879.570935] [<ffffffff8109b0ea>] warn_slowpath_null+0x1a/0x20 [ 879.571292] [<ffffffff816d0a05>] tcp_ack+0x11f5/0x1380 [ 879.571614] [<ffffffff816d10bd>] tcp_rcv_established+0x1ed/0x710 [ 879.571958] [<ffffffff816dc9da>] tcp_v4_do_rcv+0x10a/0x370 [ 879.572315] [<ffffffff81657459>] release_sock+0x89/0x1d0 [ 879.572642] [<ffffffff816c81a0>] do_tcp_setsockopt.isra.36+0x120/0x860 [ 879.573000] [<ffffffff8110a52e>] ? rcu_read_lock_held+0x6e/0x80 [ 879.573352] [<ffffffff816c8912>] tcp_setsockopt+0x32/0x40 [ 879.573678] [<ffffffff81654ac4>] sock_common_setsockopt+0x14/0x20 [ 879.574031] [<ffffffff816537b0>] SyS_setsockopt+0x80/0xf0 [ 879.574393] [<ffffffff817b40a9>] system_call_fastpath+0x16/0x1b [ 879.574730] ---[ end trace a17cbc38eb8c5c00 ]--- v2: moving setting of skb->when for repaired skb-s in tcp_write_xmit, where it's set for other skb-s. Fixes: 431a91242d8d ("tcp: timestamp SYN+DATA messages") Fixes: 740b0f1841f6 ("tcp: switch rtt estimations to usec resolution") Cc: Eric Dumazet <edumazet@google.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-14	net: xilinx: Remove .owner field for driver	Michal Simek	3	-3/+0
	There is no need to init .owner field. Based on the patch from Peter Griffin <peter.griffin@linaro.org> "mmc: remove .owner field for drivers using module_platform_driver" This patch removes the superflous .owner field for drivers which use the module_platform_driver API, as this is overriden in platform_driver_register anyway." Signed-off-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-14	Revert "macvlan: simplify the structure port"	David S. Miller	1	-5/+7
	This reverts commit a188a54d11629bef2169052297e61f3767ca8ce5. It causes crashes ==================== [ 80.643286] BUG: unable to handle kernel NULL pointer dereference at 0000000000000878 [ 80.670103] IP: [<ffffffff810832e4>] try_to_grab_pending+0x64/0x1f0 [ 80.691289] PGD 22c102067 PUD 235bf0067 PMD 0 [ 80.706611] Oops: 0002 [#1] SMP [ 80.717836] Modules linked in: macvlan nfsd lockd nfs_acl exportfs auth_rpcgss sunrpc oid_registry ioatdma ixgbe(-) mdio igb dca [ 80.757935] CPU: 37 PID: 6724 Comm: rmmod Not tainted 3.16.0-net-next-08-12-2014-FCoE+ #1 [ 80.785688] Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014 [ 80.820310] task: ffff880235a9eae0 ti: ffff88022e844000 task.ti: ffff88022e844000 [ 80.845770] RIP: 0010:[<ffffffff810832e4>] [<ffffffff810832e4>] try_to_grab_pending+0x64/0x1f0 [ 80.875326] RSP: 0018:ffff88022e847b28 EFLAGS: 00010046 [ 80.893251] RAX: 0000000000037a6a RBX: 0000000000000878 RCX: 0000000000000000 [ 80.917187] RDX: ffff880235a9eae0 RSI: 0000000000000001 RDI: ffffffff810832db [ 80.941125] RBP: ffff88022e847b58 R08: 0000000000000000 R09: 0000000000000000 [ 80.965056] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022e847b70 [ 80.988994] R13: 0000000000000000 R14: ffff88022e847be8 R15: ffffffff81ebe440 [ 81.012929] FS: 00007fab90b07700(0000) GS:ffff88043f7a0000(0000) knlGS:0000000000000000 [ 81.040400] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 81.059757] CR2: 0000000000000878 CR3: 0000000235a42000 CR4: 00000000001407e0 [ 81.083689] Stack: [ 81.090739] ffff880235a9eae0 0000000000000878 ffff88022e847b70 0000000000000000 [ 81.116253] ffff88022e847be8 ffffffff81ebe440 ffff88022e847b98 ffffffff810847f1 [ 81.141766] ffff88022e847b78 0000000000000286 ffff880234200000 0000000000000000 [ 81.167282] Call Trace: [ 81.175768] [<ffffffff810847f1>] __cancel_work_timer+0x31/0x170 [ 81.195985] [<ffffffff8108494b>] cancel_work_sync+0xb/0x10 [ 81.214769] [<ffffffffa015ae68>] macvlan_port_destroy+0x28/0x60 [macvlan] [ 81.237844] [<ffffffffa015b930>] macvlan_uninit+0x40/0x50 [macvlan] [ 81.259209] [<ffffffff816bf6e2>] rollback_registered_many+0x1a2/0x2c0 [ 81.281140] [<ffffffff816bf81a>] unregister_netdevice_many+0x1a/0xb0 [ 81.302786] [<ffffffffa015a4ff>] macvlan_device_event+0x1ef/0x240 [macvlan] [ 81.326439] [<ffffffff8108a13d>] notifier_call_chain+0x4d/0x70 [ 81.346366] [<ffffffff8108a201>] raw_notifier_call_chain+0x11/0x20 [ 81.367439] [<ffffffff816bf25b>] call_netdevice_notifiers_info+0x3b/0x70 [ 81.390228] [<ffffffff816bf2a1>] call_netdevice_notifiers+0x11/0x20 [ 81.411587] [<ffffffff816bf6bd>] rollback_registered_many+0x17d/0x2c0 [ 81.433518] [<ffffffff816bf925>] unregister_netdevice_queue+0x75/0x110 [ 81.455735] [<ffffffff816bfb2b>] unregister_netdev+0x1b/0x30 [ 81.475094] [<ffffffffa0039b50>] ixgbe_remove+0x170/0x1d0 [ixgbe] [ 81.495886] [<ffffffff813512a2>] pci_device_remove+0x32/0x60 [ 81.515246] [<ffffffff814c75c4>] __device_release_driver+0x64/0xd0 [ 81.536321] [<ffffffff814c76f8>] driver_detach+0xc8/0xd0 [ 81.554530] [<ffffffff814c656e>] bus_remove_driver+0x4e/0xa0 [ 81.573888] [<ffffffff814c828b>] driver_unregister+0x2b/0x60 [ 81.593246] [<ffffffff8135143e>] pci_unregister_driver+0x1e/0xa0 [ 81.613749] [<ffffffffa005db18>] ixgbe_exit_module+0x1c/0x2e [ixgbe] [ 81.635401] [<ffffffff810e738b>] SyS_delete_module+0x15b/0x1e0 [ 81.655334] [<ffffffff8187a395>] ? sysret_check+0x22/0x5d [ 81.673833] [<ffffffff810abd2d>] ? trace_hardirqs_on_caller+0x11d/0x1e0 [ 81.696339] [<ffffffff8132bfde>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 81.717985] [<ffffffff8187a369>] system_call_fastpath+0x16/0x1b [ 81.738199] Code: 00 48 83 3d 6e bb da 00 00 48 89 c2 0f 84 67 01 00 00 fa 66 0f 1f 44 00 00 49 89 14 24 e8 b5 4b 02 00 45 84 ed 0f 85 ac 00 00 00 <f0> 0f ba 2b 00 72 1d 31 c0 48 8b 5d d8 4c 8b 65 e0 4c 8b 6d e8 [ 81.807026] RIP [<ffffffff810832e4>] try_to_grab_pending+0x64/0x1f0 [ 81.828468] RSP <ffff88022e847b28> [ 81.840384] CR2: 0000000000000878 [ 81.851731] ---[ end trace 9f6c7232e3464e11 ]--- ==================== This bug could be triggered by these steps: modprobe ixgbe ; modprobe macvlan ip link add link p96p1 address 00:1B:21:6E:06:00 macvlan0 type macvlan ip link add link p96p1 address 00:1B:21:6E:06:01 macvlan1 type macvlan ip link add link p96p1 address 00:1B:21:6E:06:02 macvlan2 type macvlan ip link add link p96p1 address 00:1B:21:6E:06:03 macvlan3 type macvlan rmmod ixgbe Reported-by: "Keller, Jacob E" <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-14	timekeeping: Another fix to the VSYSCALL_OLD update_vsyscall	John Stultz	1	-2/+3
	Benjamin Herrenschmidt pointed out that I further missed modifying update_vsyscall after the wall_to_mono value was changed to a timespec64. This causes issues on powerpc32, which expects a 32bit timespec. This patch fixes the problem by properly converting from a timespec64 to a timespec before passing the value on to the arch-specific vsyscall logic. [ Thomas is currently on vacation, but reviewed it and wanted me to send this fix on to you directly. ] Cc: LKML <linux-kernel@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-14	mm: fix CROSS_MEMORY_ATTACH help text grammar	Geert Uytterhoeven	1	-1/+1
	Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-14	drivers/mfd/rtsx_usb.c: export device table	Jeff Mahoney	1	-0/+1
	The rtsx_usb driver contains the table for the devices it supports but doesn't export it. As a result, no alias is generated and it doesn't get loaded automatically. Via https://bugzilla.novell.com/show_bug.cgi?id=890096 Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reported-by: Marcel Witte <wittemar@googlemail.com> Cc: Roger Tseng <rogerable@realtek.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-14	mm, hugetlb_cgroup: align hugetlb cgroup limit to hugepage size	David Rientjes	1	-0/+1
	Memcg aligns memory.limit_in_bytes to PAGE_SIZE as part of the resource counter since it makes no sense to allow a partial page to be charged. As a result of the hugetlb cgroup using the resource counter, it is also aligned to PAGE_SIZE but makes no sense unless aligned to the size of the hugepage being limited. Align hugetlb cgroup limit to hugepage size. Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-14	iwlwifi: mvm: disable scheduled scan to prevent firmware crash	Emmanuel Grumbach	1	-1/+2
	There are firmwares which don't support scheduled scan. Disable it for now. Linus's system encoutered this issue. Thanks to David Spinadel for his help. Tested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2014-08-14	IB/srp: Fix return value check in srp_init_module()	Wei Yongjun	1	-2/+2
	In case of error, the function create_workqueue() returns NULL pointer not ERR_PTR(). The IS_ERR() test in the return value check should be replaced with NULL test. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-13	sparc: Hook up memfd_create system call.	David S. Miller	3	-4/+5
	Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-13	sparc64: Properly claim resources as each PCI bus is probed.	David S. Miller	1	-0/+32
	Perform a pci_claim_resource() on all valid resources discovered during the OF device tree scan. Based almost entirely upon the PCI OF bus probing code which does the same thing there. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-13	sparc64: Skip bogus PCI bridge ranges.	David S. Miller	1	-0/+11
	It seems that when a PCI Express bridge is not in use and has no devices behind it, the ranges property is bogus. Specifically the size property is of the form [0xffffffff:...], and if you add this size to the resource start address the 64-bit calculation will overflow. Just check specifically for this size value signature and skip them. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-13	sparc64: Expand PCI bridge probing debug logging.	David S. Miller	1	-1/+23
	Dump the various aspects of the PCI bridge probed at boot time, most importantly the bridge number ranges, and the ranges property. This helps diagnose PCI resource issues and other problems by giving ofpci_debug=1 on the boot command line. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-13	xen-netback: remove loop waiting function	Wei Liu	1	-29/+0
	The original implementation relies on a loop to check if all inflight packets are freed. Now we have proper reference counting, there's no need to use loop anymore. Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-13	xen-netback: don't stop dealloc kthread too early	Wei Liu	3	-7/+42
	Reference count the number of packets in host stack, so that we don't stop the deallocation thread too early. If not, we can end up with xenvif_free permanently waiting for deallocation thread to unmap grefs. Reported-by: Thomas Leonard <talex5@gmail.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-13	xen-netback: move NAPI add/remove calls	Wei Liu	1	-4/+5
	Originally netif_napi_add was in xenvif_init_queue and netif_napi_del was in xenvif_deinit_queue, while kthreads were handled in xenvif_connect and xenvif_disconnect. Move netif_napi_add and netif_napi_del to xenvif_connect and xenvif_disconnect so that they reside together with kthread operations. Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-13	xen-netback: fix debugfs entry creation	Wei Liu	1	-4/+4
	The original code is bogus. The function gets called in a loop which leaks entries created in previous rounds. Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: Zoltan Kiss <zoltan.kiss@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>