Age | Commit message (Collapse) | Author | Files | Lines |
|
The problem with the existing lock pinning is that each pin is of
value 1; this mean you can simply unpin if you know its pinned,
without having any extra information.
This scheme generates a random (16 bit) cookie for each pin and
requires this same cookie to unpin. This means you have to keep the
cookie in context.
No objsize difference for !LOCKDEP kernels.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
In order to be able to pass around more than just the IRQ flags in the
future, add a rq_flags structure.
No difference in code generation for the x86_64-defconfig build I
tested.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Its a rather large function, inline doesn't seems to make much sense:
$ size defconfig-build/kernel/sched/core.o{.orig,}
text data bss dec hex filename
56533 21037 2320 79890 13812 defconfig-build/kernel/sched/core.o.orig
55733 21037 2320 79090 134f2 defconfig-build/kernel/sched/core.o
The 'perf bench sched messaging' micro-benchmark shows a visible improvement
of 4-5%:
$ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i ; done
$ perf stat --null --repeat 25 -- perf bench sched messaging -g 40 -l 5000
pre:
4.582798193 seconds time elapsed ( +- 1.41% )
4.733374877 seconds time elapsed ( +- 2.10% )
4.560955136 seconds time elapsed ( +- 1.43% )
4.631062303 seconds time elapsed ( +- 1.40% )
post:
4.364765213 seconds time elapsed ( +- 0.91% )
4.454442734 seconds time elapsed ( +- 1.18% )
4.448893817 seconds time elapsed ( +- 1.41% )
4.424346872 seconds time elapsed ( +- 0.97% )
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Potential races between switch_mm() and TLB-flush or LDT-flush IPIs
could be very messy. AFAICT the code is currently okay, whether by
accident or by careful design, but enabling PCID will make it
considerably more complicated and will no longer be obviously safe.
Fix it with a big hammer: run switch_mm() with IRQs off.
To avoid a performance hit in the scheduler, we take advantage of
our knowledge that the scheduler already has IRQs disabled when it
calls switch_mm().
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f19baf759693c9dcae64bbff76189db77cb13398.1461688545.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
It's fairly large and it has quite a few callers. This may also
help untangle some headers down the road.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/54f3367803e7f80b2be62c8a21879aa74b1a5f57.1461688545.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Currently all of the functions that live in tlb.c are inlined on
!SMP builds. One can debate whether this is a good idea (in many
respects the code in tlb.c is better than the inlined UP code).
Regardless, I want to add code that needs to be built on UP and SMP
kernels and relates to tlb flushing, so arrange for tlb.c to be
compiled unconditionally.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f0d778f0d828fc46e5d1946bca80f0aaf9abf032.1461688545.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
By default, this is the same thing as switch_mm().
x86 will override it as an optimization.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/df401df47bdd6be3e389c6f1e3f5310d70e81b2c.1461688545.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Some architectures (such as Alpha) rely on include/linux/sched.h definitions
in their mmu_context.h files.
So include sched.h before mmu_context.h.
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
arm's mmu_context.h uses preempt_enable_no_resched and but doesn't
include anything that would pull in the declaration.
If I start including <asm/mmu_context.h> from <linux/mmu_context.h>
without this, the build breaks.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/5b95730a70f2dafe12d4fbf38d20eb7330d67ba3.1461688545.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Sometimes delta_exec is 0 due to update_curr() is called multiple times,
this is captured by:
u64 delta_exec = rq_clock_task(rq) - curr->se.exec_start;
This patch optimizes the cpufreq update kicker by bailing out when nothing
changed, it will benefit the upcoming schedutil, since otherwise it will
(over)react to the special util/max combination.
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1461316044-9520-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Chris Metcalf reported a that sched_can_stop_tick() sometimes fails to
re-enable the tick.
His observed problem is that rq->cfs.nr_running can be 1 even though
there are multiple runnable CFS tasks. This happens in the cgroup
case, in which case cfs.nr_running is the number of runnable entities
for that level.
If there is a single runnable cgroup (which can have an arbitrary
number of runnable child entries itself) rq->cfs.nr_running will be 1.
However, looking at that function I think there's more problems with it.
It seems to assume that if there's FIFO tasks, those will run. This is
incorrect. The FIFO task can have a lower prio than an RR task, in which
case the RR task will run.
So the whole fifo_nr_running test seems misplaced, it should go after
the rr_nr_running tests. That is, only if !rr_nr_running, can we use
fifo_nr_running like this.
Reported-by: Chris Metcalf <cmetcalf@mellanox.com>
Tested-by: Chris Metcalf <cmetcalf@mellanox.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Wanpeng Li <kernellwp@gmail.com>
Fixes: 76d92ac305f2 ("sched: Migrate sched to use new tick dependency mask model")
Link: http://lkml.kernel.org/r/20160421160315.GK24771@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
|
|
These new syscalls are implemented as generic code, so enable them for
architectures like arm64 which use the generic syscall table.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
I got a minus(very big) dl_b->total_bw during my deadline tests.
# grep dl /proc/sched_debug
dl_rq[0]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : -222297900
Something unusual must have happened.
After some digging, I finally noticed that when changing a deadline
task to normal(cfs), and changing it back to deadline immediately,
after it died, we will got the wrong dl_bw->total_bw.
The root cause is in dl_overflow(), it has:
if (new_bw == p->dl.dl_bw)
return 0;
1) When a deadline task is changed to !deadline task, it will start
dl timer in switched_from_dl(), and retain previous deadline parameter
till the timer expires.
2) If we change it back to deadline with the same bandwidth parameter
before the timer expires, as it keeps the old bandwidth although it
is not a deadline task. dl_overflow() simply returns success without
updating the right data, and got the wrong dl_bw->total_bw.
The solution is simple, if @p is not deadline, don't return.
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@arm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1460636368-1993-1-git-send-email-xlpang@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Some code in CPU load update only concern NO_HZ configs but it is
built on all configurations. When NO_HZ isn't built, that code is harmless
but just happens to take some useless ressources in CPU and memory:
1) one useless field in struct rq
2) jiffies record on every tick that is never used (cpu_load_update_periodic)
3) decay_load_missed is called two times on every tick to eventually
return immediately with no action taken. And that function is dead
code.
For pure optimization purposes, lets conditionally build the NO_HZ
related code.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1461080211-16271-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Ticks can happen while the CPU is in dynticks-idle or dynticks-singletask
mode. In fact "nohz" or "dynticks" only mean that we exit the periodic
mode and we try to minimize the ticks as much as possible. The nohz
subsystem uses a confusing terminology with the internal state
"ts->tick_stopped" which is also available through its public interface
with tick_nohz_tick_stopped(). This is a misnomer as the tick is instead
reduced with the best effort rather than stopped. In the best case the
tick can indeed be actually stopped but there is no guarantee about that.
If a timer needs to fire one second later, a tick will fire while the
CPU is in nohz mode and this is a very common scenario.
Now this confusion happens to be a problem with CPU load updates:
cpu_load_update_active() doesn't handle nohz ticks correctly because it
assumes that ticks are completely stopped in nohz mode and that
cpu_load_update_active() can't be called in dynticks mode. When that
happens, the whole previous tickless load is ignored and the function
just records the load for the current tick, ignoring potentially long
idle periods behind.
In order to solve this, we could account the current load for the
previous nohz time but there is a risk that we account the load of a
task that got freshly enqueued for the whole nohz period.
So instead, lets record the dynticks load on nohz frame entry so we know
what to record in case of nohz ticks, then use this record to account
the tickless load on nohz ticks and nohz frame end.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1460555812-25375-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The CPU load update related functions have a weak naming convention
currently, starting with update_cpu_load_*() which isn't ideal as
"update" is a very generic concept.
Since two of these functions are public already (and a third is to come)
that's enough to introduce a more conventional naming scheme. So let's
do the following rename instead:
update_cpu_load_*() -> cpu_load_update_*()
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1460555812-25375-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The cpufreq hook should be called any time the root CFS rq utilization
changes. This can occur when a task is switched to or from the fair
class, or a task moves between groups or CPUs, but these paths
currently do not call the cpufreq hook.
Fix this by adding the hook to attach_entity_load_avg() and
detach_entity_load_avg().
Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
[ Added the .update_freq argument to update_cfs_rq_load_avg() to avoid a double cpufreq call. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <Juri.Lelli@arm.com>
Cc: Michael Turquette <mturquette@baylibre.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1458858367-2831-1-git-send-email-smuckle@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
There's no reason to call the cpufreq hook if the root cfs_rq
utilization has not been modified.
Signed-off-by: Steve Muckle <smuckle@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <Juri.Lelli@arm.com>
Cc: Michael Turquette <mturquette@baylibre.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: http://lkml.kernel.org/r/1458606068-7476-2-git-send-email-smuckle@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The cpufreq hook should be called whenever the root cfs_rq
utilization changes so update_cfs_rq_load_avg() is a better
place for it. The current location is not invoked in the
enqueue_entity() or update_blocked_averages() paths.
Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <Juri.Lelli@arm.com>
Cc: Michael Turquette <mturquette@baylibre.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1458606068-7476-1-git-send-email-smuckle@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
When asymmetric packing is set in the sched_domain and target CPU is
busy, update_sd_pick_busiest() may not select the busiest runqueue.
When target CPU is busy, find_busiest_group() will ignore checks for
asym packing and may continue to load balance using the currently
selected not-the-busiest runqueue as source runqueue.
Selecting the busiest runqueue as source when the target CPU is busy,
should result in achieving much better load balance.
Also when target CPU is not busy and asymmetric packing is set in sd,
select higher CPU as source CPU for load balancing.
While doing this change, move the check to see if target CPU is busy
into check_asym_packing().
The extent of performance benefit from this change decreases with the
increasing load. However there is benefit in undercommit as well as
overcommit conditions.
1. Record per second ebizzy (32 threads) on a 64 CPU power 7 box. (5 iterations)
4.6.0-rc2
Testcase: Min Max Avg StdDev
ebizzy: 5223767.00 10368236.00 7946971.00 1753094.76
4.6.0-rc2+asym-changes
Testcase: Min Max Avg StdDev %Change
ebizzy: 8617191.00 13872356.00 11383980.00 1783400.89 +24.78%
2. Record per second ebizzy (64 threads) on a 64 CPU power 7 box. (5 iterations)
4.6.0-rc2
Testcase: Min Max Avg StdDev
ebizzy: 6497666.00 18399783.00 10818093.20 4051452.08
4.6.0-rc2+asym-changes
Testcase: Min Max Avg StdDev %Change
ebizzy: 7567365.00 19456937.00 11674063.60 4295407.48 +4.40%
3. Record per second ebizzy (128 threads) on a 64 CPU power 7 box. (5 iterations)
4.6.0-rc2
Testcase: Min Max Avg StdDev
ebizzy: 37073983.00 40341911.00 38776241.80 1259766.82
4.6.0-rc2+asym-changes
Testcase: Min Max Avg StdDev %Change
ebizzy: 38030399.00 41333378.00 39827404.40 1255001.86 +2.54%
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1459948660-16073-1-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Haswell and Broadwell can be configured to hash the channel
interleave function using bits [27:12] of the physical address.
On those processor models we must check to see if hashing is
enabled (bit21 of the HASWELL_HASYSDEFEATURE2 register) and
act accordingly.
Based on a patch by patrickg <patrickg@supermicro.com>
Tested-by: Patrick Geary <patrickg@supermicro.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
In commit:
eb1af3b71f9d ("Fix computation of channel address")
I switched the "sck_way" variable from holding the log2 value read
from the h/w to instead be the actual number. Unfortunately it
is needed in log2 form when used to shift the address.
Tested-by: Patrick Geary <patrickg@supermicro.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: eb1af3b71f9d ("Fix computation of channel address")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Huge pages are not normally available to PV guests. Not suppressing
hugetlbfs use results in an endless loop of page faults when user mode
code tries to access a hugetlbfs mapped area (since the hypervisor
denies such PTEs to be created, but error indications can't be
propagated out of xen_set_pte_at(), just like for various of its
siblings), and - once killed in an oops like this:
kernel BUG at .../fs/hugetlbfs/inode.c:428!
invalid opcode: 0000 [#1] SMP
...
RIP: e030:[<ffffffff811c333b>] [<ffffffff811c333b>] remove_inode_hugepages+0x25b/0x320
...
Call Trace:
[<ffffffff811c3415>] hugetlbfs_evict_inode+0x15/0x40
[<ffffffff81167b3d>] evict+0xbd/0x1b0
[<ffffffff8116514a>] __dentry_kill+0x19a/0x1f0
[<ffffffff81165b0e>] dput+0x1fe/0x220
[<ffffffff81150535>] __fput+0x155/0x200
[<ffffffff81079fc0>] task_work_run+0x60/0xa0
[<ffffffff81063510>] do_exit+0x160/0x400
[<ffffffff810637eb>] do_group_exit+0x3b/0xa0
[<ffffffff8106e8bd>] get_signal+0x1ed/0x470
[<ffffffff8100f854>] do_signal+0x14/0x110
[<ffffffff810030e9>] prepare_exit_to_usermode+0xe9/0xf0
[<ffffffff814178a5>] retint_user+0x8/0x13
This is CVE-2016-3961 / XSA-174.
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <JGross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: stable@vger.kernel.org
Cc: xen-devel <xen-devel@lists.xenproject.org>
Link: http://lkml.kernel.org/r/57188ED802000078000E431C@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Correct the size of the module mapping space and the maximum available
physical memory size of current processors.
Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: corbet@lwn.net
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1461310504-15977-1-git-send-email-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The recent introduction of the hotplug thread which invokes the callbacks on
the plugged cpu, cased the following regression:
If takedown_cpu() fails, then we run into several issues:
1) The rollback of the target cpu states is not invoked. That leaves the smp
threads and the hotplug thread in disabled state.
2) notify_online() is executed due to a missing skip_onerr flag. That causes
that both CPU_DOWN_FAILED and CPU_ONLINE notifications are invoked which
confuses quite some notifiers.
3) The CPU_DOWN_FAILED notification is not invoked on the target CPU. That's
not an issue per se, but it is inconsistent and in consequence blocks the
patches which rely on these states being invoked on the target CPU and not
on the controlling cpu. It also does not preserve the strict call order on
rollback which is problematic for the ongoing state machine conversion as
well.
To fix this we add a rollback flag to the remote callback machinery and invoke
the rollback including the CPU_DOWN_FAILED notification on the remote
cpu. Further mark the notify online state with 'skip_onerr' so we don't get a
double invokation.
This workaround will go away once we moved the unplug invocation to the target
cpu itself.
[ tglx: Massaged changelog and moved the CPU_DOWN_FAILED notifiaction to the
target cpu ]
Fixes: 4cb28ced23c4 ("cpu/hotplug: Create hotplug threads")
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-s390@vger.kernel.org
Cc: rt@linutronix.de
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Link: http://lkml.kernel.org/r/20160408124015.GA21960@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Commit 0881841f7e78 introduced a regression by inverting a test check
after calling clocksource_mmio_init(). That results on the system to
hang at boot time.
Fix it by inverting the test again.
Fixes: 0881841f7e78 ("Replace code by clocksource_mmio_init")
Reported-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
When doing a make allmodconfig, I hit the following compile error:
In file included from builtin-check.c:32:0:
elf.h:22:18: fatal error: gelf.h: No such file or directory
compilation terminated.
...
Digging into it, it appears that the $(shell ..) command in the Makefile does
not give the proper result when it fails to find -lelf, and continues to
compile objtool.
Instead, use the "try-run" makefile macro to perform the test. This gives a
proper result for both cases.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris J Arges <chris.j.arges@canonical.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Fixes: 442f04c34a1a4 ("objtool: Add tool to perform compile-time stack metadata validation")
Link: http://lkml.kernel.org/r/20160420153234.GA24032@home.goodmis.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Cc: stable@vger.kernel.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Cc: stable@vger.kernel.org
|
|
Was previously always hardcoded to 0.
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
Allowing userptr bo which are basicly a list of page from some vma
(so either anonymous page or file backed page) would lead to serious
corruption of kernel structures and counters (because we overwrite
the page->mapping field when mapping buffer).
This will already block if the buffer was populated before anyone does
try to mmap it because then TTM_PAGE_FLAG_SG would be set in in the
ttm_tt flags. But that flag is check before ttm_tt_populate in the ttm
vm fault handler.
So to be safe just add a check to verify_access() callback.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Allowing userptr bo which are basicly a list of page from some vma
(so either anonymous page or file backed page) would lead to serious
corruption of kernel structures and counters (because we overwrite
the page->mapping field when mapping buffer).
This will already block if the buffer was populated before anyone does
try to mmap it because then TTM_PAGE_FLAG_SG would be set in in the
ttm_tt flags. But that flag is check before ttm_tt_populate in the ttm
vm fault handler.
So to be safe just add a check to verify_access() callback.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fixes array overflow on these chips.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
Prerequiste for the next patch which ups the limits.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
With the joys of things running concurrently, there's always a chance
that the port we get passed in drm_dp_payload_send_msg() isn't actually
valid anymore. Because of this, we need to make sure we validate the
reference to the port before we use it otherwise we risk running into
various race conditions. For instance, on the Dell MST monitor I have
here for testing, hotplugging it enough times causes us to kernel panic:
[drm:intel_mst_enable_dp] 1
[drm:drm_dp_update_payload_part2] payload 0 1
[drm:intel_get_hpd_pins] hotplug event received, stat 0x00200000, dig 0x10101011, pins 0x00000020
[drm:intel_hpd_irq_handler] digital hpd port B - short
[drm:intel_dp_hpd_pulse] got hpd irq on port B - short
[drm:intel_dp_check_mst_status] got esi 00 10 00
[drm:drm_dp_update_payload_part2] payload 1 1
general protection fault: 0000 [#1] SMP
…
Call Trace:
[<ffffffffa012b632>] drm_dp_update_payload_part2+0xc2/0x130 [drm_kms_helper]
[<ffffffffa032ef08>] intel_mst_enable_dp+0xf8/0x180 [i915]
[<ffffffffa0310dbd>] haswell_crtc_enable+0x3ed/0x8c0 [i915]
[<ffffffffa030c84d>] intel_atomic_commit+0x5ad/0x1590 [i915]
[<ffffffffa01db877>] ? drm_atomic_set_crtc_for_connector+0x57/0xe0 [drm]
[<ffffffffa01dc4e7>] drm_atomic_commit+0x37/0x60 [drm]
[<ffffffffa0130a3a>] drm_atomic_helper_set_config+0x7a/0xb0 [drm_kms_helper]
[<ffffffffa01cc482>] drm_mode_set_config_internal+0x62/0x100 [drm]
[<ffffffffa01d02ad>] drm_mode_setcrtc+0x3cd/0x4e0 [drm]
[<ffffffffa01c18e3>] drm_ioctl+0x143/0x510 [drm]
[<ffffffffa01cfee0>] ? drm_mode_setplane+0x1b0/0x1b0 [drm]
[<ffffffff810f79a7>] ? hrtimer_start_range_ns+0x1b7/0x3a0
[<ffffffff81212962>] do_vfs_ioctl+0x92/0x570
[<ffffffff81590852>] ? __sys_recvmsg+0x42/0x80
[<ffffffff81212eb9>] SyS_ioctl+0x79/0x90
[<ffffffff816b4e32>] entry_SYSCALL_64_fastpath+0x1a/0xa4
RIP [<ffffffffa012b026>] drm_dp_payload_send_msg+0x146/0x1f0 [drm_kms_helper]
Which occurs because of the hotplug event shown in the log, which ends
up causing DRM's dp helpers to drop the port we're updating the payload
on and panic.
CC: stable@vger.kernel.org
Signed-off-by: Lyude <cpaul@redhat.com>
Reviewed-by: David Airlie <airlied@linux.ie>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
|
|
With commit 8bc2a40730ec ("rtc: ds1307: add support for the
DT property 'wakeup-source'") we lost the ability for rtc irq
functionality for devices that are actually hooked on a real IRQ
line and have capability to wakeup as well. This is not an expected
behavior. So, instead of just not requesting IRQ, skip the IRQ
requirement only if interrupts are not defined for the device.
Fixes: 8bc2a40730ec ("rtc: ds1307: add support for the DT property 'wakeup-source'")
Reported-by: Tony Lindgren <tony@atomide.com>
Cc: Michael Lange <linuxstuff@milaw.biz>
Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
|
|
while retrieving temperature from ds3231, the result may be overflow
since s16 is too small for a multiplication with 250.
ie. if temp_buf[0] == 0x2d, the result (s16 temp) will be negative.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Tested-by: Michael Tatarinov <kukabu@gmail.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
|
|
When using masked actions the ipv6_proto field of an action
to set IPv6 fields may be zero rather than the prevailing protocol
which will result in skipping checksum recalculation.
This patch resolves the problem by relying on the protocol
in the flow key rather than that in the set field action.
Fixes: 83d2b9ba1abc ("net: openvswitch: Support masked set actions.")
Cc: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For IPv6, if the device indicates that the checksum is correct, set
CHECKSUM_UNNECESSARY.
Reported-by: Subbarao Narahari <snarahari@vmware.com>
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: Jin Heo <heoj@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
atl2 includes NETIF_F_SG in hw_features even though it has no support
for non-linear skbs. This bug was originally harmless since the
driver does not claim to implement checksum offload and that used to
be a requirement for SG.
Now that SG and checksum offload are independent features, if you
explicitly enable SG *and* use one of the rare protocols that can use
SG without checkusm offload, this potentially leaks sensitive
information (before you notice that it just isn't working). Therefore
this obscure bug has been designated CVE-2016-2117.
Reported-by: Justin Yackoski <jyackoski@crypto-nite.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: ec5f06156423 ("net: Kill link between CSUM and SG features.")
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 2a0787051182 ("irqchip/mips-gic: Use gic_vpes instead of
NR_CPUS") & commit 78930f09b940 ("irqchip/mips-gic: Clear percpu_masks
correctly when mapping") both introduce code which accesses gic_vpes
entries in the pcpu_masks array. However, this array has length NR_CPUS.
If NR_CPUS is less than gic_vpes (ie. the kernel supports use of less
CPUs than are present in the system) then we overrun the array, clobber
some other data & generally die pretty promptly.
Most notably this affects uniprocessor kernels running on any multicore
or multithreaded Malta with a GIC (ie. the vast majority of real Malta
boards).
Fix this by only accessing up to min(gic_vpes, NR_CPUS) entries in the
pcpu_masks array, preventing the array overrun.
Fixes: 2a0787051182 ("irqchip/mips-gic: Use gic_vpes instead of NR_CPUS")
Fixes: 78930f09b940 ("irqchip/mips-gic: Clear percpu_masks correctly when mapping")
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Link: http://lkml.kernel.org/r/1461234714-9975-1-git-send-email-paul.burton@imgtec.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Count SW packet drops per RX ring instead of a global counter. This
will allow monitoring the number of rx drops per ring.
In addition, SW rx_dropped counter was overwritten by HW rx_dropped
counter, sum both of them instead to show the accurate value.
Fixes: a3333b35da16 ('net/mlx4_en: Moderate ethtool callback to [...] ')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reported-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently changing global pause settings is done via SET_PORT
command with input modifier GENERAL. This command is allowed
for each VF since MTU setting is done via the same command.
Change the above to the following scheme: before passing the
request to the FW, the PF will check whether it was issued
by a slave. If yes, don't change global pause and warn,
otherwise change to the requested value and store for
further reference.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Maintain the PCI status and provide wrappers for enabling and disabling
the PCI device. Performing the actions more than once without doing
its opposite results in warning logs.
This occurred when EEH hotplugged the device causing a warning for
disabling an already disabled device.
Fixes: 2ba5fbd62b25 ('net/mlx4_core: Handle AER flow properly')
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move resume related activities to a new pci_resume function instead of
performing them in mlx4_pci_slot_reset. This change is needed to avoid
a hotplug during EEH recovery due to commit f2da4ccf8bd4 ("powerpc/eeh:
More relaxed hotplug criterion").
Fixes: 2ba5fbd62b25 ('net/mlx4_core: Handle AER flow properly')
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The ks8895 driver is using spi_dev_get() apparently just to take a copy
of the SPI device used to instantiate it but never calls spi_dev_put()
to free it. Since the device is guaranteed to exist between probe() and
remove() there should be no need for the driver to take an extra
reference to it so fix the leak by just using a straight assignment.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When the DaVinci emac driver is removed and re-probed, the actual
pdev->dev.platform_data is populated with an unwanted valid pointer saved by
the previous davinci_emac_of_get_pdata() call, causing a kernel crash when
calling priv->int_disable() in emac_int_disable().
Unable to handle kernel paging request at virtual address c8622a80
...
[<c0426fb4>] (emac_int_disable) from [<c0427700>] (emac_dev_open+0x290/0x5f8)
[<c0427700>] (emac_dev_open) from [<c04c00ec>] (__dev_open+0xb8/0x120)
[<c04c00ec>] (__dev_open) from [<c04c0370>] (__dev_change_flags+0x88/0x14c)
[<c04c0370>] (__dev_change_flags) from [<c04c044c>] (dev_change_flags+0x18/0x48)
[<c04c044c>] (dev_change_flags) from [<c052bafc>] (devinet_ioctl+0x6b4/0x7ac)
[<c052bafc>] (devinet_ioctl) from [<c04a1428>] (sock_ioctl+0x1d8/0x2c0)
[<c04a1428>] (sock_ioctl) from [<c014f054>] (do_vfs_ioctl+0x41c/0x600)
[<c014f054>] (do_vfs_ioctl) from [<c014f2a4>] (SyS_ioctl+0x6c/0x7c)
[<c014f2a4>] (SyS_ioctl) from [<c000ff60>] (ret_fast_syscall+0x0/0x1c)
Fixes: 42f59967a091 ("net: ethernet: davinci_emac: add OF support")
Cc: Brian Hutchinson <b.hutchman@gmail.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to avoid an Unbalanced pm_runtime_enable in the DaVinci
emac driver when the device is removed and re-probed, and a
pm_runtime_disable() call in davinci_emac_remove().
Actually, using unbind/bind on a TI DM8168 SoC gives :
$ echo 4a120000.ethernet > /sys/bus/platform/drivers/davinci_emac/unbind
net eth1: DaVinci EMAC: davinci_emac_remove()
$ echo 4a120000.ethernet > /sys/bus/platform/drivers/davinci_emac/bind
davinci_emac 4a120000.ethernet: Unbalanced pm_runtime_enable
Cc: Brian Hutchinson <b.hutchman@gmail.com>
Fixes: 3ba97381343b ("net: ethernet: davinci_emac: add pm_runtime support")
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|