aboutsummaryrefslogtreecommitdiffstats
path: root/include/linux/seqlock.h (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2013-07-18macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGSJason Wang1-25/+37
We try to linearize part of the skb when the number of iov is greater than MAX_SKB_FRAGS. This is not enough since each single vector may occupy more than one pages, so zerocopy_sg_fromiovec() may still fail and may break the guest network. Solve this problem by calculate the pages needed for iov before trying to do zerocopy and switch to use copy instead of zerocopy if it needs more than MAX_SKB_FRAGS. This is done through introducing a new helper to count the pages for iov, and call uarg->callback() manually when switching from zerocopy to copy to notify vhost. We can do further optimization on top. This bug were introduced from b92946e2919134ebe2a4083e4302236295ea2a73 (macvtap: zerocopy: validate vectors before building skb). Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-18tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGSJason Wang1-24/+38
We try to linearize part of the skb when the number of iov is greater than MAX_SKB_FRAGS. This is not enough since each single vector may occupy more than one pages, so zerocopy_sg_fromiovec() may still fail and may break the guest network. Solve this problem by calculate the pages needed for iov before trying to do zerocopy and switch to use copy instead of zerocopy if it needs more than MAX_SKB_FRAGS. This is done through introducing a new helper to count the pages for iov, and call uarg->callback() manually when switching from zerocopy to copy to notify vhost. We can do further optimization on top. The bug were introduced from commit 0690899b4d4501b3505be069b9a687e68ccbe15b (tun: experimental zero copy tx support) Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-18pkt_sched: sch_qfq: remove a source of high packet delay/jitterPaolo Valente1-29/+56
QFQ+ inherits from QFQ a design choice that may cause a high packet delay/jitter and a severe short-term unfairness. As QFQ, QFQ+ uses a special quantity, the system virtual time, to track the service provided by the ideal system it approximates. When a packet is dequeued, this quantity must be incremented by the size of the packet, divided by the sum of the weights of the aggregates waiting to be served. Tracking this sum correctly is a non-trivial task, because, to preserve tight service guarantees, the decrement of this sum must be delayed in a special way [1]: this sum can be decremented only after that its value would decrease also in the ideal system approximated by QFQ+. For efficiency, QFQ+ keeps track only of the 'instantaneous' weight sum, increased and decreased immediately as the weight of an aggregate changes, and as an aggregate is created or destroyed (which, in its turn, happens as a consequence of some class being created/destroyed/changed). However, to avoid the problems caused to service guarantees by these immediate decreases, QFQ+ increments the system virtual time using the maximum value allowed for the weight sum, 2^10, in place of the dynamic, instantaneous value. The instantaneous value of the weight sum is used only to check whether a request of weight increase or a class creation can be satisfied. Unfortunately, the problems caused by this choice are worse than the temporary degradation of the service guarantees that may occur, when a class is changed or destroyed, if the instantaneous value of the weight sum was used to update the system virtual time. In fact, the fraction of the link bandwidth guaranteed by QFQ+ to each aggregate is equal to the ratio between the weight of the aggregate and the sum of the weights of the competing aggregates. The packet delay guaranteed to the aggregate is instead inversely proportional to the guaranteed bandwidth. By using the maximum possible value, and not the actual value of the weight sum, QFQ+ provides each aggregate with the worst possible service guarantees, and not with service guarantees related to the actual set of competing aggregates. To see the consequences of this fact, consider the following simple example. Suppose that only the following aggregates are backlogged, i.e., that only the classes in the following aggregates have packets to transmit: one aggregate with weight 10, say A, and ten aggregates with weight 1, say B1, B2, ..., B10. In particular, suppose that these aggregates are always backlogged. Given the weight distribution, the smoothest and fairest service order would be: A B1 A B2 A B3 A B4 A B5 A B6 A B7 A B8 A B9 A B10 A B1 A B2 ... QFQ+ would provide exactly this optimal service if it used the actual value for the weight sum instead of the maximum possible value, i.e., 11 instead of 2^10. In contrast, since QFQ+ uses the latter value, it serves aggregates as follows (easy to prove and to reproduce experimentally): A B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 A A A A A A A A A A B1 B2 ... B10 A A ... By replacing 10 with N in the above example, and by increasing N, one can increase at will the maximum packet delay and the jitter experienced by the classes in aggregate A. This patch addresses this issue by just using the above 'instantaneous' value of the weight sum, instead of the maximum possible value, when updating the system virtual time. After the instantaneous weight sum is decreased, QFQ+ may deviate from the ideal service for a time interval in the order of the time to serve one maximum-size packet for each backlogged class. The worst-case extent of the deviation exhibited by QFQ+ during this time interval [1] is basically the same as of the deviation described above (but, without this patch, QFQ+ suffers from such a deviation all the time). Finally, this patch modifies the comment to the function qfq_slot_insert, to make it coherent with the fact that the weight sum used by QFQ+ can now be lower than the maximum possible value. [1] P. Valente, "Extending WF2Q+ to support a dynamic traffic mix", Proceedings of AAA-IDEA'05, June 2005. Signed-off-by: Paolo Valente <paolo.valente@unimore.it> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-17xen-netfront: pull on receive skb may need to happen earlierJan Beulich1-18/+13
Due to commit 3683243b ("xen-netfront: use __pskb_pull_tail to ensure linear area is big enough on RX") xennet_fill_frags() may end up filling MAX_SKB_FRAGS + 1 fragments in a receive skb, and only reduce the fragment count subsequently via __pskb_pull_tail(). That's a result of xennet_get_responses() allowing a maximum of one more slot to be consumed (and intermediately transformed into a fragment) if the head slot has a size less than or equal to RX_COPY_THRESHOLD. Hence we need to adjust xennet_fill_frags() to pull earlier if we reached the maximum fragment count - due to the described behavior of xennet_get_responses() this guarantees that at least the first fragment will get completely consumed, and hence the fragment count reduced. In order to not needlessly call __pskb_pull_tail() twice, make the original call conditional upon the pull target not having been reached yet, and defer the newly added one as much as possible (an alternative would have been to always call the function right before the call to xennet_fill_frags(), but that would imply more frequent cases of needing to call it twice). Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: stable@vger.kernel.org (3.6 onwards) Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-17vxlan: add necessary locking on device removalstephen hemminger1-0/+6
The socket management is now done in workqueue (outside of RTNL) and protected by vn->sock_lock. There were two possible bugs, first the vxlan device was removed from the VNI hash table per socket without holding lock. And there was a race when device is created and the workqueue could run after deletion. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-16hyperv: Fix the NETIF_F_SG flag setting in netvscHaiyang Zhang1-2/+2
SG mode is not currently supported by netvsc, so remove this flag for now. Otherwise, it will be unconditionally enabled by commit ec5f0615642 "Kill link between CSUM and SG features" Previously, the SG feature is disabled because CSUM is not set here. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-16net: Fix sysfs_format_mac() code duplication.David S. Miller1-20/+1
It's just a duplicate implementation of "%*phC". Thanks to Joe Perches for showing that we had exactly this support in the lib/vsprintf.c code already. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-16be2net: Fix to avoid hardware workaround when not neededSarveshwar Bandi1-4/+10
Hardware workaround requesting hardware to skip vlan insertion is necessary only when umc or qnq is enabled. Enabling this workaround in other scenarios could cause controller to stall. Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-16macvtap: do not assume 802.1Q when send vlan packetsJason Wang1-1/+1
The hard-coded 8021.q proto will break 802.1ad traffic. So switch to use vlan->proto. Cc: Basil Gor <basil.gor@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-16macvtap: fix the missing ret value of TUNSETQUEUEJason Wang1-0/+1
Commit 441ac0fcaadc76ad09771812382345001dd2b813 (macvtap: Convert to using rtnl lock) forget to return what macvtap_ioctl_set_queue() returns to its caller. This may break multiqueue API by always falling through to TUNGETFEATURES. Cc: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-16ipv4: set transport header earlierEric Dumazet1-4/+3
commit 45f00f99d6e ("ipv4: tcp: clean up tcp_v4_early_demux()") added a performance regression for non GRO traffic, basically disabling IP early demux. IPv6 stack resets transport header in ip6_rcv() before calling IP early demux in ip6_rcv_finish(), while IPv4 does this only in ip_local_deliver_finish(), _after_ IP early demux. GRO traffic happened to enable IP early demux because transport header is also set in inet_gro_receive() Instead of reverting the faulty commit, we can make IPv4/IPv6 behave the same : transport_header should be set in ip_rcv() instead of ip_local_deliver_finish() ip_local_deliver_finish() can also use skb_network_header_len() which is faster than ip_hdrlen() Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-16mlx5 core: Fix __udivdi3 when compiling for 32 bit archesTim Gardner1-1/+1
Cc: Eli Cohen <eli@mellanox.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-16bgmac: add dependency to phylibHauke Mehrtens1-0/+1
bgmac is using functions from phylib, add the dependency. Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-16net/irda: fixed style issues in irlan_ethDragos Foianu1-17/+14
Applied fixes suggested by checkpatch.pl Signed-off-by: Dragos Foianu <dragos.foianu@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-16ethtool: fixed trailing statements in ethtoolDragos Foianu1-10/+20
Applied fixes suggested by checkpatch.pl. Signed-off-by: Dragos Foianu <dragos.foianu@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-16ndisc: bool initializations should use true and falseDaniel Baluta1-3/+3
Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-16atl1e: unmap partially mapped skb on dma error and free skbNeil Horman1-1/+23
Ben Hutchings pointed out that my recent update to atl1e in commit 352900b583b2852152a1e05ea0e8b579292e731e ("atl1e: fix dma mapping warnings") was missing a bit of code. Specifically it reset the hardware tx ring to its origional state when we hit a dma error, but didn't unmap any exiting mappings from the operation. This patch fixes that up. It also remembers to free the skb in the event that an error occurs, so we don't leak. Untested, as I don't have hardware. I think its pretty straightforward, but please review closely. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Ben Hutchings <bhutchings@solarflare.com> CC: Jay Cliburn <jcliburn@gmail.com> CC: Chris Snook <chris.snook@gmail.com> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-14Linux 3.11-rc1Linus Torvalds2-1604/+883
2013-07-14slub: Check for page NULL before doing the node_match checkSteven Rostedt1-1/+1
In the -rt kernel (mrg), we hit the following dump: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180 PGD a2d39067 PUD b1641067 PMD 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: sunrpc cpufreq_ondemand ipv6 tg3 joydev sg serio_raw pcspkr k8temp amd64_edac_mod edac_core i2c_piix4 e100 mii shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom sata_svw ata_generic pata_acpi pata_serverworks radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod CPU 3 Pid: 20878, comm: hackbench Not tainted 3.6.11-rt25.14.el6rt.x86_64 #1 empty empty/Tyan Transport GT24-B3992 RIP: 0010:[<ffffffff811573f1>] [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180 RSP: 0018:ffff8800a9b17d70 EFLAGS: 00010213 RAX: 0000000000000000 RBX: 0000000001200011 RCX: ffff8800a06d8000 RDX: 0000000004d92a03 RSI: 00000000000000d0 RDI: ffff88013b805500 RBP: ffff8800a9b17dc0 R08: ffff88023fd14d10 R09: ffffffff81041cbd R10: 00007f4e3f06e9d0 R11: 0000000000000246 R12: ffff88013b805500 R13: ffff8801ff46af40 R14: 0000000000000001 R15: 0000000000000000 FS: 00007f4e3f06e700(0000) GS:ffff88023fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 00000000a2d3a000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process hackbench (pid: 20878, threadinfo ffff8800a9b16000, task ffff8800a06d8000) Stack: ffff8800a9b17da0 ffffffff81202e08 ffff8800a9b17de0 000000d001200011 0000000001200011 0000000001200011 0000000000000000 0000000000000000 00007f4e3f06e9d0 0000000000000000 ffff8800a9b17e60 ffffffff81041cbd Call Trace: [<ffffffff81202e08>] ? current_has_perm+0x68/0x80 [<ffffffff81041cbd>] copy_process+0xdd/0x15b0 [<ffffffff810a2125>] ? rt_up_read+0x25/0x30 [<ffffffff8104369a>] do_fork+0x5a/0x360 [<ffffffff8107c66b>] ? migrate_enable+0xeb/0x220 [<ffffffff8100b068>] sys_clone+0x28/0x30 [<ffffffff81527423>] stub_clone+0x13/0x20 [<ffffffff81527152>] ? system_call_fastpath+0x16/0x1b Code: 89 fc 89 75 cc 41 89 d6 4d 8b 04 24 65 4c 03 04 25 48 ae 00 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 74 12 41 83 fe ff 74 27 <48> 8b 00 48 c1 e8 3a 41 39 c6 74 1b 8b 75 cc 4c 89 c9 44 89 f2 RIP [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180 RSP <ffff8800a9b17d70> CR2: 0000000000000000 ---[ end trace 0000000000000002 ]--- Now, this uses SLUB pretty much unmodified, but as it is the -rt kernel with CONFIG_PREEMPT_RT set, spinlocks are mutexes, although they do disable migration. But the SLUB code is relatively lockless, and the spin_locks there are raw_spin_locks (not converted to mutexes), thus I believe this bug can happen in mainline without -rt features. The -rt patch is just good at triggering mainline bugs ;-) Anyway, looking at where this crashed, it seems that the page variable can be NULL when passed to the node_match() function (which does not check if it is NULL). When this happens we get the above panic. As page is only used in slab_alloc() to check if the node matches, if it's NULL I'm assuming that we can say it doesn't and call the __slab_alloc() code. Is this a correct assumption? Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-14sunrpc: now we can just set ->s_d_opAl Viro1-3/+2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-14cgroup: we can use simple_lookup() nowAl Viro1-10/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-14efivarfs: we can use simple_lookup() nowAl Viro1-13/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-14make simple_lookup() usable for filesystems that set ->s_d_opAl Viro1-1/+2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-14configfs: don't open-code d_alloc_name()Al Viro1-11/+2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-14__rpc_lookup_create_exclusive: pass string instead of qstrAl Viro1-25/+9
... and use d_hash_and_lookup() instead of open-coding it, for fsck sake... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-14rpc_create_*_dir: don't bother with qstrAl Viro4-33/+23
just pass the name Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-13llist: llist_add() can use llist_add_batch()Oleg Nesterov1-10/+4
llist_add(new, head) can simply use llist_add_batch(new, new, head), no need to duplicate the code. This obviously uninlines llist_add() and to me this is a win. But we can make llist_add_batch() inline if this is desirable, in this case gcc can notice that new_first == new_last if the caller is llist_add(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrey Vagin <avagin@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: David Howells <dhowells@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-13llist: fix/simplify llist_add() and llist_add_batch()Oleg Nesterov2-22/+12
1. This is mostly theoretical, but llist_add*() need ACCESS_ONCE(). Otherwise it is not guaranteed that the first cmpxchg() uses the same value for old_entry and new_last->next. 2. These helpers cache the result of cmpxchg() and read the initial value of head->first before the main loop. I do not think this makes sense. In the likely case cmpxchg() succeeds, otherwise it doesn't hurt to reload head->first. I think it would be better to simplify the code and simply read ->first before cmpxchg(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrey Vagin <avagin@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: David Howells <dhowells@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-13fput: turn "list_head delayed_fput_list" into llist_headOleg Nesterov2-15/+12
fput() and delayed_fput() can use llist and avoid the locking. This is unlikely path, it is not that this change can improve the performance, but this way the code looks simpler. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrey Vagin <avagin@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: David Howells <dhowells@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-13fs/file_table.c:fput(): add commentAndrew Morton1-0/+6
A missed update to "fput: task_work_add() can fail if the caller has passed exit_task_work()". Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrey Vagin <avagin@openvz.org> Cc: David Howells <dhowells@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-13Safer ABI for O_TMPFILEAl Viro6-8/+12
[suggested by Rasmus Villemoes] make O_DIRECTORY | O_RDWR part of O_TMPFILE; that will fail on old kernels in a lot more cases than what I came up with. And make sure O_CREAT doesn't get there... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-12arm: multi_v7_defconfig: Tweaks for omap and sunxiOlof Johansson1-3/+7
OMAP recently changed how the platforms are configured, so OMAP2/3/4 SoC support is no longer enabled by default. Add them back. Enable new ethernet driver for sunxi. The i.MX console options moved due to resorting, no functional change. Signed-off-by: Olof Johansson <olof@lixom.net>
2013-07-13ext4: don't allow ext4_free_blocks() to fail due to ENOMEMTheodore Ts'o1-3/+8
The filesystem should not be marked inconsistent if ext4_free_blocks() is not able to allocate memory. Unfortunately some callers (most notably ext4_truncate) don't have a way to reflect an error back up to the VFS. And even if we did, most userspace applications won't deal with most system calls returning ENOMEM anyway. Reported-by: Nagachandra P <nagachandra@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2013-07-13ext4: fix spelling errors and a comment in extent_status treeTheodore Ts'o2-16/+14
Replace "assertation" with "assertion" in lots and lots of debugging messages. Correct the comment stating when ext4_es_insert_extent() is used. It was no doubt tree at one point, but it is no longer true... Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Zheng Liu <gnehzuil.liu@gmail.com>
2013-07-12ipv6: only static routes qualify for equal cost multipathingHannes Frederic Sowa1-4/+11
Static routes in this case are non-expiring routes which did not get configured by autoconf or by icmpv6 redirects. To make sure we actually get an ecmp route while searching for the first one in this fib6_node's leafs, also make sure it matches the ecmp route assumptions. v2: a) Removed RTF_EXPIRE check in dst.from chain. The check of RTF_ADDRCONF already ensures that this route, even if added again without RTF_EXPIRES (in case of a RA announcement with infinite timeout), does not cause the rt6i_nsiblings logic to go wrong if a later RA updates the expiration time later. v3: a) Allow RTF_EXPIRES routes to enter the ecmp route set. We have to do so, because an pmtu event could update the RTF_EXPIRES flag and we would not count this route, if another route joins this set. We now filter only for RTF_GATEWAY|RTF_ADDRCONF|RTF_DYNAMIC, which are flags that don't get changed after rt6_info construction. Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-12via-rhine: fix dma mapping errorsNeil Horman1-1/+16
this bug: https://bugzilla.redhat.com/show_bug.cgi?id=951695 Reported a dma debug backtrace: WARNING: at lib/dma-debug.c:937 check_unmap+0x47d/0x930() Hardware name: To Be Filled By O.E.M. via-rhine 0000:00:12.0: DMA-API: device driver failed to check map error[device address=0x0000000075a837b2] [size=90 bytes] [mapped as single] Modules linked in: ip6_tables gspca_spca561 gspca_main videodev media snd_hda_codec_realtek snd_hda_intel i2c_viapro snd_hda_codec snd_hwdep snd_seq ppdev mperf via_rhine coretemp snd_pcm mii microcode snd_page_alloc snd_timer snd_mpu401 snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore parport_pc parport shpchp ata_generic pata_acpi radeon i2c_algo_bit drm_kms_helper ttm drm pata_via sata_via i2c_core uinput Pid: 295, comm: systemd-journal Not tainted 3.9.0-0.rc6.git2.1.fc20.x86_64 #1 Call Trace: <IRQ> [<ffffffff81068dd0>] warn_slowpath_common+0x70/0xa0 [<ffffffff81068e4c>] warn_slowpath_fmt+0x4c/0x50 [<ffffffff8137ec6d>] check_unmap+0x47d/0x930 [<ffffffff810ace9f>] ? local_clock+0x5f/0x70 [<ffffffff8137f17f>] debug_dma_unmap_page+0x5f/0x70 [<ffffffffa0225edc>] ? rhine_ack_events.isra.14+0x3c/0x50 [via_rhine] [<ffffffffa02275f8>] rhine_napipoll+0x1d8/0xd80 [via_rhine] [<ffffffff815d3d51>] ? net_rx_action+0xa1/0x380 [<ffffffff815d3e22>] net_rx_action+0x172/0x380 [<ffffffff8107345f>] __do_softirq+0xff/0x400 [<ffffffff81073925>] irq_exit+0xb5/0xc0 [<ffffffff81724cd6>] do_IRQ+0x56/0xc0 [<ffffffff81719ff2>] common_interrupt+0x72/0x72 <EOI> [<ffffffff8170ff57>] ? __slab_alloc+0x4c2/0x526 [<ffffffff811992e0>] ? mmap_region+0x2b0/0x5a0 [<ffffffff810d5807>] ? __lock_is_held+0x57/0x80 [<ffffffff811992e0>] ? mmap_region+0x2b0/0x5a0 [<ffffffff811bf1bf>] kmem_cache_alloc+0x2df/0x360 [<ffffffff811992e0>] mmap_region+0x2b0/0x5a0 [<ffffffff811998e6>] do_mmap_pgoff+0x316/0x3d0 [<ffffffff81183ca0>] vm_mmap_pgoff+0x90/0xc0 [<ffffffff81197d6c>] sys_mmap_pgoff+0x4c/0x190 [<ffffffff81367d7e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff8101eb42>] sys_mmap+0x22/0x30 [<ffffffff81722fd9>] system_call_fastpath+0x16/0x1b Usual problem with the usual fix, add the appropriate calls to dma_mapping_error where appropriate Untested, as I don't have hardware, but its pretty straightforward Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: David S. Miller <davem@davemloft.net> CC: Roger Luethi <rl@hellgate.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-12atl1e: fix dma mapping warningsNeil Horman1-3/+25
Recently had this backtrace reported: WARNING: at lib/dma-debug.c:937 check_unmap+0x47d/0x930() Hardware name: System Product Name ATL1E 0000:02:00.0: DMA-API: device driver failed to check map error[device address=0x00000000cbfd1000] [size=90 bytes] [mapped as single] Modules linked in: xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek iTCO_wdt iTCO_vendor_support snd_hda_intel acpi_cpufreq mperf coretemp btrfs zlib_deflate snd_hda_codec snd_hwdep microcode raid6_pq libcrc32c snd_seq usblp serio_raw xor snd_seq_device joydev snd_pcm snd_page_alloc snd_timer snd lpc_ich i2c_i801 soundcore mfd_core atl1e asus_atk0110 ata_generic pata_acpi radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core pata_marvell uinput Pid: 314, comm: systemd-journal Not tainted 3.9.0-0.rc6.git2.3.fc19.x86_64 #1 Call Trace: <IRQ> [<ffffffff81069106>] warn_slowpath_common+0x66/0x80 [<ffffffff8106916c>] warn_slowpath_fmt+0x4c/0x50 [<ffffffff8138151d>] check_unmap+0x47d/0x930 [<ffffffff810ad048>] ? sched_clock_cpu+0xa8/0x100 [<ffffffff81381a2f>] debug_dma_unmap_page+0x5f/0x70 [<ffffffff8137ce30>] ? unmap_single+0x20/0x30 [<ffffffffa01569a1>] atl1e_intr+0x3a1/0x5b0 [atl1e] [<ffffffff810d53fd>] ? trace_hardirqs_off+0xd/0x10 [<ffffffff81119636>] handle_irq_event_percpu+0x56/0x390 [<ffffffff811199ad>] handle_irq_event+0x3d/0x60 [<ffffffff8111cb6a>] handle_fasteoi_irq+0x5a/0x100 [<ffffffff8101c36f>] handle_irq+0xbf/0x150 [<ffffffff811dcb2f>] ? file_sb_list_del+0x3f/0x50 [<ffffffff81073b10>] ? irq_enter+0x50/0xa0 [<ffffffff8172738d>] do_IRQ+0x4d/0xc0 [<ffffffff811dcb2f>] ? file_sb_list_del+0x3f/0x50 [<ffffffff8171c6b2>] common_interrupt+0x72/0x72 <EOI> [<ffffffff810db5b2>] ? lock_release+0xc2/0x310 [<ffffffff8109ea04>] lg_local_unlock_cpu+0x24/0x50 [<ffffffff811dcb2f>] file_sb_list_del+0x3f/0x50 [<ffffffff811dcb6d>] fput+0x2d/0xc0 [<ffffffff811d8ea1>] filp_close+0x61/0x90 [<ffffffff811fae4d>] __close_fd+0x8d/0x150 [<ffffffff811d8ef0>] sys_close+0x20/0x50 [<ffffffff81725699>] system_call_fastpath+0x16/0x1b The usual straighforward failure to check for dma_mapping_error after a map operation is completed. This patch should fix it, the reporter wandered off after filing this bz: https://bugzilla.redhat.com/show_bug.cgi?id=954170 and I don't have hardware to test, but the fix is pretty straightforward, so I figured I'd post it for review. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Jay Cliburn <jcliburn@gmail.com> CC: Chris Snook <chris.snook@gmail.com> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-12tcp: account all retransmit failuresYuchung Cheng1-3/+4
Change snmp RETRANSFAILS stat to include timeout retransmit failures in addition to other loss recoveries. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-12usb/net/r815x: fix cast to restricted __le32hayeswang1-9/+12
>> drivers/net/usb/r815x.c:38:16: sparse: cast to restricted __le32 >> drivers/net/usb/r815x.c:67:15: sparse: cast to restricted __le32 >> drivers/net/usb/r815x.c:69:13: sparse: incorrect type in assignment (different base types) drivers/net/usb/r815x.c:69:13: expected unsigned int [unsigned] [addressable] [assigned] [usertype] tmp drivers/net/usb/r815x.c:69:13: got restricted __le32 [usertype] <noident> Signed-off-by: Hayes Wang <hayeswang@realtek.com> Spotted-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-12usb/net/r8152: fix integer overflow in expressionhayeswang1-1/+2
config: make ARCH=avr32 allyesconfig drivers/net/usb/r8152.c: In function 'rtl8152_start_xmit': drivers/net/usb/r8152.c:956: warning: integer overflow in expression 955 memset(tx_desc, 0, sizeof(*tx_desc)); > 956 tx_desc->opts1 = cpu_to_le32((len & TX_LEN_MASK) | TX_FS | TX_LS); 957 tp->tx_skb = skb; Signed-off-by: Hayes Wang <hayeswang@realtek.com> Spotted-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-12net: access page->private by using page_privateSunghan Suh1-3/+3
Signed-off-by: Sunghan Suh <sunghan.suh@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-12net: strict_strtoul is obsolete, use kstrtoul instead“Cosmin1-1/+1
patch found using checkpatch.pl Signed-off-by: Cosmin Stanescu <cosmin90stanescu@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-12arm: multi_v7_defconfig: add i.MX options and NFS rootVincent Stehlé1-0/+10
- Add i.MX serial console support. This gives us the boot messages on UART on e.g. the i.MX6Q sabre sd platform. - Add the necessary config options, to allow booting with NFS root on an i.MX6 sabre sd. - Add Freescale LPUART serial console support. This gives us the boot messages on UART on e.g. the Vybrid VF610 Tower board. Signed-off-by: Vincent Stehlé <vincent.stehle@freescale.com> Cc: Russell King <linux@arm.linux.org.uk> [olof: squashed three commits down to one] Signed-off-by: Olof Johansson <olof@lixom.net>
2013-07-12perf/x86: Fix incorrect use of do_div() in NMI warningDave Hansen1-3/+4
I completely botched understanding the calling conventions of do_div(). I assumed that do_div() returned the result instead of realizing that it modifies its argument and returns a remainder. The side-effect from this would be bogus numbers for the "msecs" value in the warning messages: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 0.114 msecs Note, there was a second fix posted by Stephane Eranian for a separate patch which I also botched: http://lkml.kernel.org/r/20130704223010.GA30625@quad Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20130708214404.B0B6EA66@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-12sched: Fix HRTICKPeter Zijlstra1-9/+11
David reported that the HRTICK sched feature was borken; which was enough motivation for me to finally fix it ;-) We should not allow hrtimer code to do softirq wakeups while holding scheduler locks. The hrtimer code only needs this when we accidentally try to program an expired time. We don't much care about those anyway since we have the regular tick to fall back to. Reported-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20130628091853.GE29209@dyad.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-12tick: broadcast: Check broadcast mode on CPU hotplugStephen Boyd1-1/+4
On ARM systems the dummy clockevent is registered with the cpu hotplug notifier chain before any other per-cpu clockevent. This has the side-effect of causing the dummy clockevent to be registered first in every hotplug sequence. Because the dummy is first, we'll try to turn the broadcast source on but the code in tick_device_uses_broadcast() assumes the broadcast source is in periodic mode and calls tick_broadcast_start_periodic() unconditionally. On boot this isn't a problem because we typically haven't switched into oneshot mode yet (if at all). During hotplug, if the broadcast source isn't in periodic mode we'll replace the broadcast oneshot handler with the broadcast periodic handler and start emulating oneshot mode when we shouldn't. Due to the way the broadcast oneshot handler programs the next_event it's possible for it to contain KTIME_MAX and cause us to hang the system when the periodic handler tries to program the next tick. Fix this by using the appropriate function to start the broadcast source. Reported-by: Stephen Warren <swarren@nvidia.com> Tested-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Cc: Mark Rutland <Mark.Rutland@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: ARM kernel mailing list <linux-arm-kernel@lists.infradead.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Joseph Lo <josephl@nvidia.com> Link: http://lkml.kernel.org/r/20130711140059.GA27430@codeaurora.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-07-12mutex: Move ww_mutex definitions to ww_mutex.hMaarten Lankhorst5-359/+381
Move the definitions for wound/wait mutexes out to a separate header, ww_mutex.h. This reduces clutter in mutex.h, and increases readability. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Cc: Dave Airlie <airlied@gmail.com> Link: http://lkml.kernel.org/r/51D675DC.3000907@canonical.com [ Tidied up the code a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-12perf: Fix perf_lock_task_context() vs RCUPeter Zijlstra1-1/+14
Jiri managed to trigger this warning: [] ====================================================== [] [ INFO: possible circular locking dependency detected ] [] 3.10.0+ #228 Tainted: G W [] ------------------------------------------------------- [] p/6613 is trying to acquire lock: [] (rcu_node_0){..-...}, at: [<ffffffff810ca797>] rcu_read_unlock_special+0xa7/0x250 [] [] but task is already holding lock: [] (&ctx->lock){-.-...}, at: [<ffffffff810f2879>] perf_lock_task_context+0xd9/0x2c0 [] [] which lock already depends on the new lock. [] [] the existing dependency chain (in reverse order) is: [] [] -> #4 (&ctx->lock){-.-...}: [] -> #3 (&rq->lock){-.-.-.}: [] -> #2 (&p->pi_lock){-.-.-.}: [] -> #1 (&rnp->nocb_gp_wq[1]){......}: [] -> #0 (rcu_node_0){..-...}: Paul was quick to explain that due to preemptible RCU we cannot call rcu_read_unlock() while holding scheduler (or nested) locks when part of the read side critical section was preemptible. Therefore solve it by making the entire RCU read side non-preemptible. Also pull out the retry from under the non-preempt to play nice with RT. Reported-by: Jiri Olsa <jolsa@redhat.com> Helped-out-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-12perf: Remove WARN_ON_ONCE() check in __perf_event_enable() for valid scenarioJiri Olsa1-1/+10
The '!ctx->is_active' check has a valid scenario, so there's no need for the warning. The reason is that there's a time window between the 'ctx->is_active' check in the perf_event_enable() function and the __perf_event_enable() function having: - IRQs on - ctx->lock unlocked where the task could be killed and 'ctx' deactivated by perf_event_exit_task(), ending up with the warning below. So remove the WARN_ON_ONCE() check and add comments to explain it all. This addresses the following warning reported by Vince Weaver: [ 324.983534] ------------[ cut here ]------------ [ 324.984420] WARNING: at kernel/events/core.c:1953 __perf_event_enable+0x187/0x190() [ 324.984420] Modules linked in: [ 324.984420] CPU: 19 PID: 2715 Comm: nmi_bug_snb Not tainted 3.10.0+ #246 [ 324.984420] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010 [ 324.984420] 0000000000000009 ffff88043fce3ec8 ffffffff8160ea0b ffff88043fce3f00 [ 324.984420] ffffffff81080ff0 ffff8802314fdc00 ffff880231a8f800 ffff88043fcf7860 [ 324.984420] 0000000000000286 ffff880231a8f800 ffff88043fce3f10 ffffffff8108103a [ 324.984420] Call Trace: [ 324.984420] <IRQ> [<ffffffff8160ea0b>] dump_stack+0x19/0x1b [ 324.984420] [<ffffffff81080ff0>] warn_slowpath_common+0x70/0xa0 [ 324.984420] [<ffffffff8108103a>] warn_slowpath_null+0x1a/0x20 [ 324.984420] [<ffffffff81134437>] __perf_event_enable+0x187/0x190 [ 324.984420] [<ffffffff81130030>] remote_function+0x40/0x50 [ 324.984420] [<ffffffff810e51de>] generic_smp_call_function_single_interrupt+0xbe/0x130 [ 324.984420] [<ffffffff81066a47>] smp_call_function_single_interrupt+0x27/0x40 [ 324.984420] [<ffffffff8161fd2f>] call_function_single_interrupt+0x6f/0x80 [ 324.984420] <EOI> [<ffffffff816161a1>] ? _raw_spin_unlock_irqrestore+0x41/0x70 [ 324.984420] [<ffffffff8113799d>] perf_event_exit_task+0x14d/0x210 [ 324.984420] [<ffffffff810acd04>] ? switch_task_namespaces+0x24/0x60 [ 324.984420] [<ffffffff81086946>] do_exit+0x2b6/0xa40 [ 324.984420] [<ffffffff8161615c>] ? _raw_spin_unlock_irq+0x2c/0x30 [ 324.984420] [<ffffffff81087279>] do_group_exit+0x49/0xc0 [ 324.984420] [<ffffffff81096854>] get_signal_to_deliver+0x254/0x620 [ 324.984420] [<ffffffff81043057>] do_signal+0x57/0x5a0 [ 324.984420] [<ffffffff8161a164>] ? __do_page_fault+0x2a4/0x4e0 [ 324.984420] [<ffffffff8161665c>] ? retint_restore_args+0xe/0xe [ 324.984420] [<ffffffff816166cd>] ? retint_signal+0x11/0x84 [ 324.984420] [<ffffffff81043605>] do_notify_resume+0x65/0x80 [ 324.984420] [<ffffffff81616702>] retint_signal+0x46/0x84 [ 324.984420] ---[ end trace 442ec2f04db3771a ]--- Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1373384651-6109-2-git-send-email-jolsa@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-12perf: Clone child context from parent context pmuJiri Olsa1-1/+1
Currently when the child context for inherited events is created, it's based on the pmu object of the first event of the parent context. This is wrong for the following scenario: - HW context having HW and SW event - HW event got removed (closed) - SW event stays in HW context as the only event and its pmu is used to clone the child context The issue starts when the cpu context object is touched based on the pmu context object (__get_cpu_context). In this case the HW context will work with SW cpu context ending up with following WARN below. Fixing this by using parent context pmu object to clone from child context. Addresses the following warning reported by Vince Weaver: [ 2716.472065] ------------[ cut here ]------------ [ 2716.476035] WARNING: at kernel/events/core.c:2122 task_ctx_sched_out+0x3c/0x) [ 2716.476035] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs locn [ 2716.476035] CPU: 0 PID: 3164 Comm: perf_fuzzer Not tainted 3.10.0-rc4 #2 [ 2716.476035] Hardware name: AOpen DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BI2 [ 2716.476035] 0000000000000000 ffffffff8102e215 0000000000000000 ffff88011fc18 [ 2716.476035] ffff8801175557f0 0000000000000000 ffff880119fda88c ffffffff810ad [ 2716.476035] ffff880119fda880 ffffffff810af02a 0000000000000009 ffff880117550 [ 2716.476035] Call Trace: [ 2716.476035] [<ffffffff8102e215>] ? warn_slowpath_common+0x5b/0x70 [ 2716.476035] [<ffffffff810ab2bd>] ? task_ctx_sched_out+0x3c/0x5f [ 2716.476035] [<ffffffff810af02a>] ? perf_event_exit_task+0xbf/0x194 [ 2716.476035] [<ffffffff81032a37>] ? do_exit+0x3e7/0x90c [ 2716.476035] [<ffffffff810cd5ab>] ? __do_fault+0x359/0x394 [ 2716.476035] [<ffffffff81032fe6>] ? do_group_exit+0x66/0x98 [ 2716.476035] [<ffffffff8103dbcd>] ? get_signal_to_deliver+0x479/0x4ad [ 2716.476035] [<ffffffff810ac05c>] ? __perf_event_task_sched_out+0x230/0x2d1 [ 2716.476035] [<ffffffff8100205d>] ? do_signal+0x3c/0x432 [ 2716.476035] [<ffffffff810abbf9>] ? ctx_sched_in+0x43/0x141 [ 2716.476035] [<ffffffff810ac2ca>] ? perf_event_context_sched_in+0x7a/0x90 [ 2716.476035] [<ffffffff810ac311>] ? __perf_event_task_sched_in+0x31/0x118 [ 2716.476035] [<ffffffff81050dd9>] ? mmdrop+0xd/0x1c [ 2716.476035] [<ffffffff81051a39>] ? finish_task_switch+0x7d/0xa6 [ 2716.476035] [<ffffffff81002473>] ? do_notify_resume+0x20/0x5d [ 2716.476035] [<ffffffff813654f5>] ? retint_signal+0x3d/0x78 [ 2716.476035] ---[ end trace 827178d8a5966c3d ]--- Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1373384651-6109-1-git-send-email-jolsa@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>