aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2015-06-19sched/deadline: Make init_sched_dl_class() __initWanpeng Li1-1/+1
It's a bootstrap function, make init_sched_dl_class() __init. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1431496867-4194-2-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-19sched/deadline: Optimize pull_dl_task()Wanpeng Li1-1/+27
pull_dl_task() uses pick_next_earliest_dl_task() to select a migration candidate; this is sub-optimal since the next earliest task -- as per the regular runqueue -- might not be migratable at all. This could result in iterating the entire runqueue looking for a task. Instead iterate the pushable queue -- this queue only contains tasks that have at least 2 cpus set in their cpus_allowed mask. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> [ Improved the changelog. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1431496867-4194-1-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-19sched/preempt: Add static_key() to preempt_notifiersPeter Zijlstra1-5/+23
Avoid touching the curr->preempt_notifier cacheline when not needed. Provides a small improvement on pipe-bench: taskset 01 perf stat --repeat 10 -- perf bench sched pipe before: Performance counter stats for 'perf bench sched pipe' (10 runs): 12385.016204 task-clock (msec) # 1.001 CPUs utilized ( +- 0.34% ) 2,000,023 context-switches # 0.161 M/sec ( +- 0.00% ) 0 cpu-migrations # 0.000 K/sec 175 page-faults # 0.014 K/sec ( +- 0.26% ) 41,376,162,250 cycles # 3.341 GHz ( +- 0.11% ) 17,389,139,321 stalled-cycles-frontend # 42.03% frontend cycles idle ( +- 0.25% ) <not supported> stalled-cycles-backend 68,788,588,003 instructions # 1.66 insns per cycle # 0.25 stalled cycles per insn ( +- 0.02% ) 13,449,387,620 branches # 1085.940 M/sec ( +- 0.02% ) 20,880,690 branch-misses # 0.16% of all branches ( +- 0.98% ) 12.372646094 seconds time elapsed ( +- 0.34% ) after: Performance counter stats for 'perf bench sched pipe' (10 runs): 12180.936528 task-clock (msec) # 1.001 CPUs utilized ( +- 0.33% ) 2,000,077 context-switches # 0.164 M/sec ( +- 0.00% ) 0 cpu-migrations # 0.000 K/sec 174 page-faults # 0.014 K/sec ( +- 0.27% ) 40,691,545,577 cycles # 3.341 GHz ( +- 0.06% ) 16,446,333,371 stalled-cycles-frontend # 40.42% frontend cycles idle ( +- 0.18% ) <not supported> stalled-cycles-backend 68,570,100,387 instructions # 1.69 insns per cycle # 0.24 stalled cycles per insn ( +- 0.01% ) 13,389,740,014 branches # 1099.237 M/sec ( +- 0.01% ) 20,175,440 branch-misses # 0.15% of all branches ( +- 0.52% ) 12.169253010 seconds time elapsed ( +- 0.33% ) Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-19sched/preempt: Fix preempt notifiers documentation about hlist_del() within unsafe iterationMathieu Desnoyers1-1/+1
preempt_notifier_unregister() documents: "This is safe to call from within a preemption notifier." However, both fire_sched_in_preempt_notifiers() and fire_sched_out_preempt_notifiers() are using hlist_for_each_entry(), which is not safe against entry removal during iteration. Inspection of the KVM code does not reveal any use of preempt_notifier_unregister() within the preempt notifiers. Therefore, fix the comment. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1431881590-1456-1-git-send-email-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-19sched/stop_machine: Fix deadlock between multiple stop_two_cpus()Peter Zijlstra3-37/+32
Jiri reported a machine stuck in multi_cpu_stop() with migrate_swap_stop() as function and with the following src,dst cpu pairs: {11, 4} {13, 11} { 4, 13} 4 11 13 cpuM: queue(4 ,13) *Ma cpuN: queue(13,11) *N Na *M Mb cpuO: queue(11, 4) *O Oa *Nb *Ob Where *X denotes the cpu running the queueing of cpu-X and X[ab] denotes the first/second queued work. You'll observe the top of the workqueue for each cpu: 4,11,13 to be work from cpus: M, O, N resp. IOW. deadlock. Do away with the queueing trickery and introduce lg_double_lock() to lock both CPUs and fully serialize the stop_two_cpus() callers instead of the partial (and buggy) serialization we have now. Reported-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150605153023.GH19282@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-19sched/debug: Add sum_sleep_runtime to /proc/<pid>/schedSrikar Dronamraju1-0/+1
When CONFIG_SCHEDSTATS is enabled, /proc/<pid>/sched prints almost all sched statistics except sum_sleep_runtime. Since sum_sleep_runtime is a good info to collect, add this it to /proc/<pid>/sched. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1433751041-11724-4-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-19sched/debug: Replace vruntime with wait_sum in /proc/sched_debugSrikar Dronamraju1-2/+2
Within runnable tasks in /proc/sched_debug, vruntime is printed twice, once as tree-key and again as exec-runtime. Since exec-runtime isnt populated in !CONFIG_SCHEDSTATS, use this field to print wait_sum. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1433751041-11724-3-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-19sched/debug: Properly format runnable tasks in /proc/sched_debugSrikar Dronamraju1-2/+4
With !CONFIG_SCHEDSTATS, runnable tasks in /proc/sched_debug has too many columns than required. Fix this by printing appropriate columns. While at this, print sum_exec_runtime, since this information is available even in !CONFIG_SCHEDSTATS case. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1433751041-11724-2-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07sched/numa: Only consider less busy nodes as numa balancing destinationsRik van Riel1-2/+28
Changeset a43455a1d572 ("sched/numa: Ensure task_numa_migrate() checks the preferred node") fixes an issue where workloads would never converge on a fully loaded (or overloaded) system. However, it introduces a regression on less than fully loaded systems, where workloads converge on a few NUMA nodes, instead of properly staying spread out across the whole system. This leads to a reduction in available memory bandwidth, and usable CPU cache, with predictable performance problems. The root cause appears to be an interaction between the load balancer and NUMA balancing, where the short term load represented by the load balancer differs from the long term load the NUMA balancing code would like to base its decisions on. Simply reverting a43455a1d572 would re-introduce the non-convergence of workloads on fully loaded systems, so that is not a good option. As an aside, the check done before a43455a1d572 only applied to a task's preferred node, not to other candidate nodes in the system, so the converge-on-too-few-nodes problem still happens, just to a lesser degree. Instead, try to compensate for the impedance mismatch between the load balancer and NUMA balancing by only ever considering a lesser loaded node as a destination for NUMA balancing, regardless of whether the task is trying to move to the preferred node, or to another node. This patch also addresses the issue that a system with a single runnable thread would never migrate that thread to near its memory, introduced by 095bebf61a46 ("sched/numa: Do not move past the balance point if unbalanced"). A test where the main thread creates a large memory area, and spawns a worker thread to iterate over the memory (placed on another node by select_task_rq_fair), after which the main thread goes to sleep and waits for the worker thread to loop over all the memory now sees the worker thread migrated to where the memory is, instead of having all the memory migrated over like before. Jirka has run a number of performance tests on several systems: single instance SpecJBB 2005 performance is 7-15% higher on a 4 node system, with higher gains on systems with more cores per socket. Multi-instance SpecJBB 2005 (one per node), linpack, and stream see little or no changes with the revert of 095bebf61a46 and this patch. Reported-by: Artem Bityutski <dedekind1@gmail.com> Reported-by: Jirka Hladky <jhladky@redhat.com> Tested-by: Jirka Hladky <jhladky@redhat.com> Tested-by: Artem Bityutskiy <dedekind1@gmail.com> Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150528095249.3083ade0@annuminas.surriel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07Revert 095bebf61a46 ("sched/numa: Do not move past the balance point if unbalanced")Rik van Riel1-26/+15
Commit 095bebf61a46 ("sched/numa: Do not move past the balance point if unbalanced") broke convergence of workloads with just one runnable thread, by making it impossible for the one runnable thread on the system to move from one NUMA node to another. Instead, the thread would remain where it was, and pull all the memory across to its location, which is much slower than just migrating the thread to where the memory is. The next patch has a better fix for the issue that 095bebf61a46 tried to address. Reported-by: Jirka Hladky <jhladky@redhat.com> Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dedekind1@gmail.com Cc: mgorman@suse.de Link: http://lkml.kernel.org/r/1432753468-7785-2-git-send-email-riel@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07sched/fair: Prevent throttling in early pick_next_task_fair()Ben Segall1-11/+14
The optimized task selection logic optimistically selects a new task to run without first doing a full put_prev_task(). This is so that we can avoid a put/set on the common ancestors of the old and new task. Similarly, we should only call check_cfs_rq_runtime() to throttle eligible groups if they're part of the common ancestry, otherwise it is possible to end up with no eligible task in the simple task selection. Imagine: /root /prev /next /A /B If our optimistic selection ends up throttling /next, we goto simple and our put_prev_task() ends up throttling /prev, after which we're going to bug out in set_next_entity() because there aren't any tasks left. Avoid this scenario by only throttling common ancestors. Reported-by: Mohammed Naser <mnaser@vexxhost.com> Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Ben Segall <bsegall@google.com> [ munged Changelog ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <klamm@yandex-team.ru> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: pjt@google.com Fixes: 678d5718d8d0 ("sched/fair: Optimize cgroup pick_next_task_fair()") Link: http://lkml.kernel.org/r/xm26wq1oswoq.fsf@sword-of-the-dawn.mtv.corp.google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07preempt: Reorganize the notrace definitions a bitFrederic Weisbecker1-17/+15
preempt.h has two seperate "#ifdef CONFIG_PREEMPT" sections: one to define preempt_enable() and another to define preempt_enable_notrace(). Lets gather both. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1433432349-1021-4-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07preempt: Use preempt_schedule_context() as the official tracing preemption pointFrederic Weisbecker8-32/+13
preempt_schedule_context() is a tracing safe preemption point but it's only used when CONFIG_CONTEXT_TRACKING=y. Other configs have tracing recursion issues since commit: b30f0e3ffedf ("sched/preempt: Optimize preemption operations on __schedule() callers") introduced function based preemp_count_*() ops. Lets make it available on all configs and give it a more appropriate name for its new position. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1433432349-1021-3-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07sched: Make preempt_schedule_context() function-tracing safeFrederic Weisbecker1-2/+9
Since function tracing disables preemption, it needs a safe preemption point to use when preemption is re-enabled without worrying about tracing recursion. Ie: to avoid tracing recursion, that preemption point can't be traced (use of notrace qualifier) and it can't call any traceable function before that preemption point disables preemption itself, which disarms the recursion. preempt_schedule() was fine until commit: b30f0e3ffedf ("sched/preempt: Optimize preemption operations on __schedule() callers") because PREEMPT_ACTIVE (which has the property to disable preemption and this disarm tracing preemption recursion) was set before calling any further function. But that commit introduced the use of preempt_count_add/sub() functions to set PREEMPT_ACTIVE and because these functions are called before preemption gets a chance to be disabled, we have a tracing recursion. preempt_schedule_context() is one of the possible preemption functions used by tracing. Its special purpose is to avoid tracing recursion against context tracking. Lets enhance this function to become more generally tracing safe by disabling preemption with raw accessors, such that no function is called before preemption gets disabled and disarm the tracing recursion. This function is going to become the specific tracing-safe preemption point in further commit. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1433432349-1021-2-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-01vti6: Add pmtu handling to vti6_xmit.Steffen Klassert1-0/+14
We currently rely on the PMTU discovery of xfrm. However if a packet is localy sent, the PMTU mechanism of xfrm tries to to local socket notification what might not work for applications like ping that don't check for this. So add pmtu handling to vti6_xmit to report MTU changes immediately. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01Revert "net: core: 'ethtool' issue with querying phy settings"David S. Miller1-9/+1
This reverts commit f96dee13b8e10f00840124255bed1d8b4c6afd6f. It isn't right, ethtool is meant to manage one PHY instance per netdevice at a time, and this is selected by the SET command. Therefore by definition the GET command must only return the settings for the configured and selected PHY. Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01bnx2x: Move statistics implementation into semaphoresYuval Mintz3-11/+20
Commit dff173de84958 ("bnx2x: Fix statistics locking scheme") changed the bnx2x locking around statistics state into using a mutex - but the lock is being accessed via a timer which is forbidden. [If compiled with CONFIG_DEBUG_MUTEXES, logs show a warning about accessing the mutex in interrupt context] This moves the implementation into using a semaphore [with size '1'] instead. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01xen: netback: read hotplug script once at start of day.Ian Campbell1-14/+19
When we come to tear things down in netback_remove() and generate the uevent it is possible that the xenstore directory has already been removed (details below). In such cases netback_uevent() won't be able to read the hotplug script and will write a xenstore error node. A recent change to the hypervisor exposed this race such that we now sometimes lose it (where apparently we didn't ever before). Instead read the hotplug script configuration during setup and use it for the lifetime of the backend device. The apparently more obvious fix of moving the transition to state=Closed in netback_remove() to after the uevent does not work because it is possible that we are already in state=Closed (in reaction to the guest having disconnected as it shutdown). Being already in Closed means the toolstack is at liberty to start tearing down the xenstore directories. In principal it might be possible to arrange to unregister the device sooner (e.g on transition to Closing) such that xenstore would still be there but this state machine is fragile and prone to anger... A modern Xen system only relies on the hotplug uevent for driver domains, when the backend is in the same domain as the toolstack it will run the necessary setup/teardown directly in the correct sequence wrt xenstore changes. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01xen: netback: fix printf format string warningIan Campbell1-1/+1
drivers/net/xen-netback/netback.c: In function ‘xenvif_tx_build_gops’: drivers/net/xen-netback/netback.c:1253:8: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘int’ [-Wformat=] (txreq.offset&~PAGE_MASK) + txreq.size); ^ PAGE_MASK's type can vary by arch, so a cast is needed. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> ---- v2: Cast to unsigned long, since PAGE_MASK can vary by arch. Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01Revert "netfilter: ensure number of counters is >0 in do_replace()"Bernhard Thaler1-4/+0
This partially reverts commit 1086bbe97a07 ("netfilter: ensure number of counters is >0 in do_replace()") in net/bridge/netfilter/ebtables.c. Setting rules with ebtables does not work any more with 1086bbe97a07 place. There is an error message and no rules set in the end. e.g. ~# ebtables -t nat -A POSTROUTING --src 12:34:56:78:9a:bc -j DROP Unable to update the kernel. Two possible causes: 1. Multiple ebtables programs were executing simultaneously. The ebtables userspace tool doesn't by default support multiple ebtables programs running Reverting the ebtables part of 1086bbe97a07 makes this work again. Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-06-01include/uapi/linux/virtio_balloon.h: include linux/virtio_types.hMikko Rapeli1-0/+1
Fixes userspace compilation error: error: unknown type name ‘__virtio16’ __virtio16 tag; Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-05-31sparc: Resolve conflict between sparc v9 and M7 on usage of bit 9 of TTEKhalid Aziz6-22/+104
sparc: Resolve conflict between sparc v9 and M7 on usage of bit 9 of TTE Bit 9 of TTE is CV (Cacheable in V-cache) on sparc v9 processor while the same bit 9 is MCDE (Memory Corruption Detection Enable) on M7 processor. This creates a conflicting usage of the same bit. Kernel sets TTE.cv bit on all pages for sun4v architecture which works well for sparc v9 but enables memory corruption detection on M7 processor which is not the intent. This patch adds code to determine if kernel is running on M7 processor and takes steps to not enable memory corruption detection in TTE erroneously. Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31sparc64: pci slots information is not populated in sysfsEric Snowberg1-8/+51
Add PCI slot numbers within sysfs for PCIe hardware. Larger PCIe systems with nested PCI bridges and slots further down on these bridges were not being populated within sysfs. This will add ACPI style PCI slot numbers for these systems since the OF 'slot-names' information is not available on all PCIe platforms. Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31sparc: kernel: GRPCI2: Remove a useless memsetChristophe Jaillet1-1/+0
grpci2priv is allocated using kzalloc, so there is no need to memset it. Signed-off-by: Christophe Jaillet <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31net: dsa: Properly propagate errors from dsa_switch_setup_oneFlorian Fainelli1-2/+2
While shuffling some code around, dsa_switch_setup_one() was introduced, and it was modified to return either an error code using ERR_PTR() or a NULL pointer when running out of memory or failing to setup a switch. This is a problem for its caler: dsa_switch_setup() which uses IS_ERR() and expects to find an error code, not a NULL pointer, so we still try to proceed with dsa_switch_setup() and operate on invalid memory addresses. This can be easily reproduced by having e.g: the bcm_sf2 driver built-in, but having no such switch, such that drv->setup will fail. Fix this by using PTR_ERR() consistently which is both more informative and avoids for the caller to use IS_ERR_OR_NULL(). Fixes: df197195a5248 ("net: dsa: split dsa_switch_setup into two functions") Reported-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31tcp: fix child sockets to use system default congestion control if not setNeal Cardwell3-3/+10
Linux 3.17 and earlier are explicitly engineered so that if the app doesn't specifically request a CC module on a listener before the SYN arrives, then the child gets the system default CC when the connection is established. See tcp_init_congestion_control() in 3.17 or earlier, which says "if no choice made yet assign the current value set as default". The change ("net: tcp: assign tcp cong_ops when tcp sk is created") altered these semantics, so that children got their parent listener's congestion control even if the system default had changed after the listener was created. This commit returns to those original semantics from 3.17 and earlier, since they are the original semantics from 2007 in 4d4d3d1e8 ("[TCP]: Congestion control initialization."), and some Linux congestion control workflows depend on that. In summary, if a listener socket specifically sets TCP_CONGESTION to "x", or the route locks the CC module to "x", then the child gets "x". Otherwise the child gets current system default from net.ipv4.tcp_congestion_control. That's the behavior in 3.17 and earlier, and this commit restores that. Fixes: 55d8694fa82c ("net: tcp: assign tcp cong_ops when tcp sk is created") Cc: Florian Westphal <fw@strlen.de> Cc: Daniel Borkmann <dborkman@redhat.com> Cc: Glenn Judd <glenn.judd@morganstanley.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31udp: fix behavior of wrong checksumsEric Dumazet2-8/+4
We have two problems in UDP stack related to bogus checksums : 1) We return -EAGAIN to application even if receive queue is not empty. This breaks applications using edge trigger epoll() 2) Under UDP flood, we can loop forever without yielding to other processes, potentially hanging the host, especially on non SMP. This patch is an attempt to make things better. We might in the future add extra support for rt applications wanting to better control time spent doing a recv() in a hostile environment. For example we could validate checksums before queuing packets in socket receive queue. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31Linux 4.1-rc6Linus Torvalds1-1/+1
2015-05-31sfc: free multiple Rx buffers when requiredDaniel Pieczko1-17/+25
When Rx packet data must be dropped, all the buffers associated with that Rx packet must be freed. Extend and rename efx_free_rx_buffer() to efx_free_rx_buffers() and loop through all the fragments. By doing so this patch fixes a possible memory leak. Signed-off-by: Shradha Shah <sshah@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30bna: fix soft lock-up during firmware initialization failureIvan Vecera1-2/+2
Bug in the driver initialization causes soft-lockup if firmware initialization timeout is reached. Polling function bfa_ioc_poll_fwinit() incorrectly calls bfa_nw_iocpf_timeout() when the timeout is reached. The problem is that bfa_nw_iocpf_timeout() calls again bfa_ioc_poll_fwinit()... etc. The bfa_ioc_poll_fwinit() should directly send timeout event for iocpf and the same should be done if firmware download into HW fails. Cc: Rasesh Mody <rasesh.mody@qlogic.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30bna: remove unreasonable iocpf timer startIvan Vecera1-4/+0
Driver starts iocpf timer prior bnad_ioceth_enable() call and this is unreasonable. This piece of code probably originates from Brocade/Qlogic out-of-box driver during initial import into upstream. This driver uses only one timer and queue to implement multiple timers and this timer is started at this place. The upstream driver uses multiple timers instead of this. Cc: Rasesh Mody <rasesh.mody@qlogic.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30bna: fix firmware loading on big-endian machinesIvan Vecera1-0/+7
Firmware required by bna is stored in appropriate files as sequence of LE32 integers. After loading by request_firmware() they need to be byte-swapped on big-endian arches. Without this conversion the NIC is unusable on big-endian machines. Cc: Rasesh Mody <rasesh.mody@qlogic.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30bridge: fix br_multicast_query_expired() bugEric Dumazet1-1/+1
br_multicast_query_expired() querier argument is a pointer to a struct bridge_mcast_querier : struct bridge_mcast_querier { struct br_ip addr; struct net_bridge_port __rcu *port; }; Intent of the code was to clear port field, not the pointer to querier. Fixes: 2cd4143192e8 ("bridge: memorize and export selected IGMP/MLD querier port") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Acked-by: Linus Lüssing <linus.luessing@c0d3.blue> Cc: Linus Lüssing <linus.luessing@web.de> Cc: Steinar H. Gunderson <sesse@samfundet.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30iser-target: Fix error path in isert_create_pi_ctx()Roland Dreier1-3/+3
We don't assign pi_ctx to desc->pi_ctx until we're certain to succeed in the function. That means the cleanup path should use the local pi_ctx variable, not desc->pi_ctx. This was detected by Coverity (CID 1260062). Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-05-30target: Use a PASSTHROUGH flag instead of transport_typesAndy Grover11-22/+17
It seems like we only care if a transport is passthrough or not. Convert transport_type to a flags field and replace TRANSPORT_PLUGIN_* with a flag, TRANSPORT_FLAG_PASSTHROUGH. Signed-off-by: Andy Grover <agrover@redhat.com> Reviewed-by: Ilias Tsitsimpis <iliastsi@arrikto.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-05-30target: Move passthrough CDB parsing into a common functionAndy Grover4-94/+78
Aside from whether they handle BIDI ops or not, parsing of the CDB by kernel and user SCSI passthrough modules should be identical. Move this into a new passthrough_parse_cdb() and call it from tcm-pscsi and tcm-user. Reported-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ilias Tsitsimpis <iliastsi@arrikto.com> Signed-off-by: Andy Grover <agrover@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-05-30target/user: Only support full command pass-throughAndy Grover2-70/+75
After much discussion, give up on only passing a subset of SCSI commands to userspace and pass them all. Based on what pscsi is doing, make sure to set SCF_SCSI_DATA_CDB for I/O ops, and define attributes identical to pscsi. Make hw_block_size configurable via dev param. Remove mention of command filtering from tcmu-design.txt. Signed-off-by: Andy Grover <agrover@redhat.com> Reviewed-by: Ilias Tsitsimpis <iliastsi@arrikto.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-05-30target/user: Update example code for new ABI requirementsAndy Grover1-2/+6
We now require that the userspace handler set a bit if the command is not handled. Update calls to tcmu_hdr_get_op for v2. Signed-off-by: Andy Grover <agrover@redhat.com> Reviewed-by: Ilias Tsitsimpis <iliastsi@arrikto.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-05-30target/pscsi: Don't leak scsi_host if hba is VIRTUAL_HOSTAndy Grover2-0/+4
See https://bugzilla.redhat.com/show_bug.cgi?id=1025672 We need to put() the reference to the scsi host that we got in pscsi_configure_device(). In VIRTUAL_HOST mode it is associated with the dev_virt, not the hba_virt. Signed-off-by: Andy Grover <agrover@redhat.com> Cc: stable@vger.kernel.org # 2.6.38+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-05-30target: Fix se_tpg_tfo->tf_subsys regression + remove tf_subsystemChristoph Hellwig8-64/+34
There is just one configfs subsystem in the target code, so we might as well add two helpers to reference / unreference it from the core code instead of passing pointers to it around. This fixes a regression introduced for v4.1-rc1 with commit 9ac8928e6, where configfs_depend_item() callers using se_tpg_tfo->tf_subsys would fail, because the assignment from the original target_core_subsystem[] is no longer happening at target_register_template() time. (Fix target_core_exit_configfs pointer dereference - Sagi) Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Himanshu Madhani <himanshu.madhani@qlogic.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-05-29hwmon: (nct6683) Add missing sysfs attribute initializationGuenter Roeck1-0/+2
The following error message is seen when loading the nct6683 driver with DEBUG_LOCK_ALLOC enabled. BUG: key ffff88040b2f0030 not in .data! ------------[ cut here ]------------ WARNING: CPU: 0 PID: 186 at kernel/locking/lockdep.c:2988 lockdep_init_map+0x469/0x630() DEBUG_LOCKS_WARN_ON(1) Caused by a missing call to sysfs_attr_init() when initializing sysfs attributes. Reported-by: Alexey Orishko <alexey.orishko@gmail.com> Reviewed-by: Jean Delvare <jdelvare@suse.de> Cc: stable@vger.kernel.org # v3.18+ Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2015-05-29hwmon: (nct6775) Add missing sysfs attribute initializationGuenter Roeck1-0/+2
The following error message is seen when loading the nct6775 driver with DEBUG_LOCK_ALLOC enabled. BUG: key ffff88040b2f0030 not in .data! ------------[ cut here ]------------ WARNING: CPU: 0 PID: 186 at kernel/locking/lockdep.c:2988 lockdep_init_map+0x469/0x630() DEBUG_LOCKS_WARN_ON(1) Caused by a missing call to sysfs_attr_init() when initializing sysfs attributes. Reported-by: Alexey Orishko <alexey.orishko@gmail.com> Reviewed-by: Jean Delvare <jdelvare@suse.de> Cc: stable@vger.kernel.org # v3.12+ Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2015-05-29hwmon: (tmp401) Do not auto-detect chip on I2C address 0x37Guenter Roeck2-2/+2
I2C address 0x37 may be used by EEPROMs, which can result in false positives. Do not attempt to detect a chip at this address. Reviewed-by: Jean Delvare <jdelvare@suse.de> Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2015-05-29MIPS: strnlen_user.S: Fix a CPU_DADDI_WORKAROUNDS regressionMaciej W. Rozycki1-2/+13
Correct a regression introduced with 8453eebd [MIPS: Fix strnlen_user() return value in case of overlong strings.] causing assembler warnings and broken code generated in __strnlen_kernel_nocheck_asm: arch/mips/lib/strnlen_user.S: Assembler messages: arch/mips/lib/strnlen_user.S:64: Warning: Macro instruction expanded into multiple instructions in a branch delay slot with the CPU_DADDI_WORKAROUNDS option set, resulting in the function looping indefinitely upon mounting NFS root. Use conditional assembly to avoid a microMIPS code size regression. Using $at unconditionally would cause such a regression as there are no 16-bit instruction encodings available for ALU operations using this register. Using $v1 unconditionally would produce short microMIPS encodings, but would prevent this register from being used across calls to this function. The extra LI operation introduced is free, replacing a NOP originally scheduled into the delay slot of the branch that follows. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/10205/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2015-05-29MIPS: BMIPS: Fix bmips_wr_vec()Petri Gynther1-1/+1
bmips_wr_vec() copies exception vector code from start to dst. The call to dma_cache_wback() needs to flush (end-start) bytes, starting at dst, from write-back cache to memory. Signed-off-by: Petri Gynther <pgynther@google.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Kevin Cernekee <cernekee@gmail.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/10193/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2015-05-29MIPS: ath79: fix build problem if CONFIG_BLK_DEV_INITRD is not setLaurent Fasnacht1-0/+3
initrd_start is defined in init/do_mounts_initrd.c, which is only included in kernel if CONFIG_BLK_DEV_INITRD=y. Signed-off-by: Laurent Fasnacht <l@libres.ch> Cc: linux-mips@linux-mips.org Cc: trivial@kernel.org Patchwork: https://patchwork.linux-mips.org/patch/10198/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2015-05-29dm: fix casting bug in dm_merge_bvec()Joe Thornber1-5/+12
dm_merge_bvec() was originally added in f6fccb ("dm: introduce merge_bvec_fn"). In that commit a value in sectors is converted to bytes using << 9, and then assigned to an int. This code made assumptions about the value of BIO_MAX_SECTORS. A later commit 148e51 ("dm: improve documentation and code clarity in dm_merge_bvec") was meant to have no functional change but it removed the use of BIO_MAX_SECTORS in favor of using queue_max_sectors(). At this point the cast from sector_t to int resulted in a zero value. The fallout being dm_merge_bvec() would only allow a single page to be added to a bio. This interim fix is minimal for the benefit of stable@ because the more comprehensive cleanup of passing a sector_t to all DM targets' merge function will impact quite a few DM targets. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.19+
2015-05-29dm: fix reload failure of 0 path multipath mapping on blk-mq devicesJunichi Nomura1-7/+9
dm-multipath accepts 0 path mapping. # echo '0 2097152 multipath 0 0 0 0' | dmsetup create newdev Such a mapping can be used to release underlying devices while still holding requests in its queue until working paths come back. However, once the multipath device is created over blk-mq devices, it rejects reloading of 0 path mapping: # echo '0 2097152 multipath 0 0 1 1 queue-length 0 1 1 /dev/sda 1' \ | dmsetup create mpath1 # echo '0 2097152 multipath 0 0 0 0' | dmsetup load mpath1 device-mapper: reload ioctl on mpath1 failed: Invalid argument Command failed With following kernel message: device-mapper: ioctl: can't change device type after initial table load. DM tries to inherit the current table type using dm_table_set_type() but it doesn't work as expected because of unnecessary check about whether the target type is hybrid or not. Hybrid type is for targets that work as either request-based or bio-based and not required for blk-mq or non blk-mq checking. Fixes: 65803c205983 ("dm table: train hybrid target type detection to select blk-mq if appropriate") Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29dm: fix false warning in free_rq_clone() for unmapped requestsMike Snitzer1-5/+3
When stacking request-based dm device on non blk-mq device and device-mapper target could not map the request (error target is used, multipath target with all paths down, etc), the WARN_ON_ONCE() in free_rq_clone() will trigger when it shouldn't. The warning was added by commit aa6df8d ("dm: fix free_rq_clone() NULL pointer when requeueing unmapped request"). But free_rq_clone() with clone->q == NULL is valid usage for the case where dm_kill_unmapped_request() initiates request cleanup. Fix this false warning by just removing the WARN_ON -- it only generated false positives and was never useful in catching the intended case (completing clone request not being mapped e.g. clone->q being NULL). Fixes: aa6df8d ("dm: fix free_rq_clone() NULL pointer when requeueing unmapped request") Reported-by: Bart Van Assche <bart.vanassche@sandisk.com> Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29ARM: multi_v7_defconfig: Replace CONFIG_USB_ISP1760_HCD by CONFIG_USB_ISP1760Geert Uytterhoeven1-1/+1
Since commit 100832abf065bc18 ("usb: isp1760: Make HCD support optional"), CONFIG_USB_ISP1760_HCD is automatically selected when needed. Enabling that option in the defconfig is now a no-op, and no longer enables ISP1760 HCD support. Re-enable the ISP1760 driver in the defconfig by enabling USB_ISP1760_HOST_ROLE instead. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Arnd Bergmann <arnd@arndb.de>