|
Seeing this, it occurs to me that we should probably add a taint here:
BUG: sleeping function called from invalid context at mm/slab.h:388
in_atomic(): 0, irqs_disabled(): 0, pid: 32211, name: trinity-c3
Preemption disabled at:[<ffffffff811aaa37>] console_unlock+0x2f7/0x930
CPU: 3 PID: 32211 Comm: trinity-c3 Not tainted 4.7.0-rc7+ #19
^^^^^^^^^^^
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
0000000000000000 ffff8800b8a17160 ffffffff81971441 ffff88011a3c4c80
ffff88011a3c4c80 ffff8800b8a17198 ffffffff81158067 0000000000000de6
ffff88011a3c4c80 ffffffff8390e07c 0000000000000184 0000000000000000
Call Trace:
[...]
BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1309
in_atomic(): 0, irqs_disabled(): 0, pid: 32211, name: trinity-c3
Preemption disabled at:[<ffffffff8119db33>] down_trylock+0x13/0x80
CPU: 3 PID: 32211 Comm: trinity-c3 Not tainted 4.7.0-rc7+ #19
^^^^^^^^^^^
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
0000000000000000 ffff8800b8a17e08 ffffffff81971441 ffff88011a3c4c80
ffff88011a3c4c80 ffff8800b8a17e40 ffffffff81158067 0000000000000000
ffff88011a3c4c80 ffffffff83437b20 000000000000051d 0000000000000000
Call Trace:
[...]
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Link: http://lkml.kernel.org/r/1469216762-19626-1-git-send-email-vegard.nossum@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This message is currently really useless since it always prints a value
that comes from the printk() we just did, e.g.:
BUG: sleeping function called from invalid context at mm/slab.h:388
in_atomic(): 0, irqs_disabled(): 0, pid: 31996, name: trinity-c1
Preemption disabled at:[<ffffffff8119db33>] down_trylock+0x13/0x80
BUG: sleeping function called from invalid context at include/linux/freezer.h:56
in_atomic(): 0, irqs_disabled(): 0, pid: 31996, name: trinity-c1
Preemption disabled at:[<ffffffff811aaa37>] console_unlock+0x2f7/0x930
Here, both down_trylock() and console_unlock() are somewhere in the
printk() path.
We should save the value before calling printk() and use the saved value
instead. That immediately reveals the offending callsite:
BUG: sleeping function called from invalid context at mm/slab.h:388
in_atomic(): 0, irqs_disabled(): 0, pid: 14971, name: trinity-c2
Preemption disabled at:[<ffffffff819bcd46>] rhashtable_walk_start+0x46/0x150
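Something along these lines (a sketch of the idea rather than the exact
diff; it assumes CONFIG_DEBUG_PREEMPT, which is what records
preempt_disable_ip in the task_struct):

	/*
	 * Read the field into a local *before* the first printk(): the
	 * printk() path itself disables preemption and would overwrite
	 * the value we want to report.
	 */
	unsigned long preempt_disable_ip = current->preempt_disable_ip;

	printk(KERN_ERR
	       "BUG: sleeping function called from invalid context at %s:%d\n",
	       file, line);
	pr_err("Preemption disabled at:");
	print_ip_sym(preempt_disable_ip);	/* saved value, not the clobbered one */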
Bug report:
http://marc.info/?l=linux-netdev&m=146925979821849&w=2
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
setup_new_dl_entity() takes two parameters, but it only actually uses
one of them, under a different name, to set up a new dl_entity, after:
2f9f3fdc928 "sched/deadline: Remove dl_new from struct sched_dl_entity"
as we currently do:
setup_new_dl_entity(&p->dl, &p->dl)
However, before Luca's change we were doing:
setup_new_dl_entity(dl_se, pi_se)
in update_dl_entity() for a dl_se->new entity: we were using pi_se's
parameters (the potential PI donor) for setting up a new entity.
This change removes the useless second parameter of setup_new_dl_entity().
While we are at it, we also optimize things further by calling
setup_new_dl_entity() only for already queued tasks, since (as pointed
out by Xunlei) we already do the very same update at task wakeup time
anyway. By doing so, we no longer need to worry about a potential PI
donor, as rt_mutex_setprio() already takes care of that for us.
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luca Abeni <luca.abeni@unitn.it>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Xunlei Pang <xpang@redhat.com>
Link: http://lkml.kernel.org/r/1470409675-20935-1-git-send-email-juri.lelli@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Add documentation for the cookie argument in try_to_wake_up_local().
This caused the following warning when building documentation:
kernel/sched/core.c:2088: warning: No description found for parameter 'cookie'
Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Fixes: e7904a28f533 ("locking/lockdep, sched/core: Implement a better lock pinning scheme")
Link: http://lkml.kernel.org/r/1468159226-17674-1-git-send-email-luisbg@osg.samsung.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
In the current find_idlest_group()/find_idlest_cpu() search we can end
up calling find_idlest_cpu() on a sched_group containing only a single
CPU. Checking idle-states becomes pointless when there is no
alternative, so bail out instead.
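One plausible shape of such a bail-out at the top of find_idlest_cpu()
(a sketch, not the exact diff):

	/* If the group spans a single CPU there is nothing to choose from: */
	if (group->group_weight == 1)
		return cpumask_first(sched_group_cpus(group));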
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: linux-kernel@vger.kernel.org
Cc: mgalbraith@suse.de
Cc: vincent.guittot@linaro.org
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1466615004-3503-4-git-send-email-morten.rasmussen@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
In commit:
ac66f5477239 ("sched/numa: Introduce migrate_swap()")
select_task_rq() got a 'cpu' argument to enable overriding of prev_cpu
in special cases (NUMA task swapping).
However, the select_task_rq_fair() helper functions: wake_affine() and
select_idle_sibling(), still use task_cpu(p) directly to work out
prev_cpu, which leads to inconsistencies.
This patch passes prev_cpu (potentially overridden by NUMA code) into
the helper functions to ensure prev_cpu is indeed the same CPU
everywhere in the wakeup path.
cc: Ingo Molnar <mingo@redhat.com>
cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: linux-kernel@vger.kernel.org
Cc: mgalbraith@suse.de
Cc: vincent.guittot@linaro.org
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1466615004-3503-3-git-send-email-morten.rasmussen@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
It seems that this one escaped Nico's renaming of cpu_power to
cpu_capacity a while back.
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: linux-kernel@vger.kernel.org
Cc: mgalbraith@suse.de
Cc: vincent.guittot@linaro.org
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1466615004-3503-2-git-send-email-morten.rasmussen@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Vincent noted that the update_tg_load_avg() usage in commit:
3d30544f0212 ("sched/fair: Apply more PELT fixes")
isn't entirely sufficient. We need to call this function every time
cfs_rq->avg.load changes; this includes when update_cfs_rq_load_avg()
returns true, but {attach,detach}_entity_load_avg() themselves also
change it. This means we need to unconditionally call
update_tg_load_avg().
Also, add more comments.
Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Fix one minor typo in the comment: s/targer/target/.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1470378758-15066-1-git-send-email-leo.yan@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The update_next_balance() function is only used by idle balancing, so its
'cpu_busy' parameter is always 0.
Open code it instead of passing it around.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1470378689-14892-1-git-send-email-leo.yan@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The following warning can be triggered by hot-unplugging the CPU
on which an active SCHED_DEADLINE task is running:
WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:3531 lock_release+0x690/0x6a0
releasing a pinned lock
Call Trace:
dump_stack+0x99/0xd0
__warn+0xd1/0xf0
? dl_task_timer+0x1a1/0x2b0
warn_slowpath_fmt+0x4f/0x60
? sched_clock+0x13/0x20
lock_release+0x690/0x6a0
? enqueue_pushable_dl_task+0x9b/0xa0
? enqueue_task_dl+0x1ca/0x480
_raw_spin_unlock+0x1f/0x40
dl_task_timer+0x1a1/0x2b0
? push_dl_task.part.31+0x190/0x190
WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:3649 lock_unpin_lock+0x181/0x1a0
unpinning an unpinned lock
Call Trace:
dump_stack+0x99/0xd0
__warn+0xd1/0xf0
warn_slowpath_fmt+0x4f/0x60
lock_unpin_lock+0x181/0x1a0
dl_task_timer+0x127/0x2b0
? push_dl_task.part.31+0x190/0x190
As per the comment before this code, it's safe to drop the RQ lock
here, and since we (potentially) change rq, unpin and repin to avoid
the splat.
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
[ Rewrote changelog. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luca Abeni <luca.abeni@unitn.it>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1470274940-17976-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Commit:
6e998916dfe3 ("sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency")
fixed a problem whereby clock_nanosleep() followed by clock_gettime() could
allow a task to wake early. It addressed the problem by calling the scheduling
classes update_curr() when the cputimer starts.
Said change induced a considerable performance regression on the syscalls
times() and clock_gettime(CLOCK_PROCESS_CPUTIME_ID). There are some
debuggers and applications that monitor their own performance that
accidentally depend on the performance of these specific calls.
This patch mitigates the performance loss by prefetching data in the CPU
cache, as stalls due to cache misses appear to be where most time is spent
in our benchmarks.
Here are the performance gains of this patch over v4.7-rc7 on a Sandy Bridge
box with 32 logical cores and 2 NUMA nodes. The test is repeated with a
variable number of threads, from 2 to 4*num_cpus; the results are in
seconds and correspond to the average of 10 runs; the percentage gain is
computed with (before-after)/before so a positive value is an improvement
(it's faster). The improvement varies between a few percent for 5-20
threads and more than 10% for 2 or >20 threads.
pound_clock_gettime:
  threads    4.7-rc7    patched 4.7-rc7
  [num]      [secs]     [secs (percent)]
    2         3.48        3.06 ( 11.83%)
    5         3.33        3.25 (  2.40%)
    8         3.37        3.26 (  3.30%)
   12         3.32        3.37 ( -1.60%)
   21         4.01        3.90 (  2.74%)
   30         3.63        3.36 (  7.41%)
   48         3.71        3.11 ( 16.27%)
   79         3.75        3.16 ( 15.74%)
  110         3.81        3.25 ( 14.80%)
  128         3.88        3.31 ( 14.76%)
pound_times:
  threads    4.7-rc7    patched 4.7-rc7
  [num]      [secs]     [secs (percent)]
    2         3.65        3.25 ( 11.03%)
    5         3.45        3.17 (  7.92%)
    8         3.52        3.22 (  8.69%)
   12         3.29        3.36 ( -2.04%)
   21         4.07        3.92 (  3.78%)
   30         3.87        3.40 ( 12.17%)
   48         3.79        3.16 ( 16.61%)
   79         3.88        3.28 ( 15.42%)
  110         3.90        3.38 ( 13.35%)
  128         4.00        3.38 ( 15.45%)
pound_clock_gettime and pound_times are two benchmarks included in
the MMTests framework. They launch a given number of threads which
repeatedly call times() or clock_gettime(). The results above can be
reproduced with cloning MMTests from github.com and running the "poundtime"
workload:
$ git clone https://github.com/gormanm/mmtests.git
$ cd mmtests
$ cp configs/config-global-dhp__workload_poundtime config
$ ./run-mmtests.sh --run-monitor $(uname -r)
The above will run "poundtime" measuring the kernel currently running on
the machine; once a new kernel is installed and the machine rebooted,
running again
$ cd mmtests
$ ./run-mmtests.sh --run-monitor $(uname -r)
will produce results to compare with. A comparison table will be output
with:
$ cd mmtests/work/log
$ ../../compare-kernels.sh
The table will contain a lot of entries; grepping for "Amean" (as in
"arithmetic mean") will give the tables presented above. The source code
for the two benchmarks is reported at the end of this changelog for
clarity.
The cache misses addressed by this patch were found using a combination of
`perf top`, `perf record` and `perf annotate`. The incriminated lines were
found to be
struct sched_entity *curr = cfs_rq->curr;
and
delta_exec = now - curr->exec_start;
in the function update_curr() from kernel/sched/fair.c. This patch
prefetches the data from memory just before update_curr() is called in the
execution paths of interest.
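For illustration only, the prefetch technique itself can be tried from
user space with GCC's __builtin_prefetch(); the stand-alone toy below
(not kernel code) hints the CPU about the next list node one step before
it is dereferenced:

#include <stdio.h>
#include <stdlib.h>

struct node {
	struct node *next;
	long payload;
};

int main(void)
{
	struct node *head = NULL, *n;
	long long sum = 0;
	int i;

	/* Build a small linked list; the nodes end up scattered on the heap. */
	for (i = 0; i < 100000; i++) {
		n = malloc(sizeof(*n));
		if (!n)
			return 1;
		n->payload = i;
		n->next = head;
		head = n;
	}

	/* Walk the list, prefetching the next node before it is needed. */
	for (n = head; n; n = n->next) {
		if (n->next)
			__builtin_prefetch(n->next, 0, 3); /* 0 = read, 3 = keep in cache */
		sum += n->payload;
	}

	printf("sum = %lld\n", sum);
	return 0;
}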
A comparison of the total number of cycles before and after the patch
follows; the data is obtained using `perf stat -r 10 -ddd <program>`
running over the same sequence of number of threads used above (a positive
gain is an improvement):
threads cycles before cycles after gain
2 19,699,563,964 +-1.19% 17,358,917,517 +-1.85% 11.88%
5 47,401,089,566 +-2.96% 45,103,730,829 +-0.97% 4.85%
8 80,923,501,004 +-3.01% 71,419,385,977 +-0.77% 11.74%
12 112,326,485,473 +-0.47% 110,371,524,403 +-0.47% 1.74%
21 193,455,574,299 +-0.72% 180,120,667,904 +-0.36% 6.89%
30 315,073,519,013 +-1.64% 271,222,225,950 +-1.29% 13.92%
48 321,969,515,332 +-1.48% 273,353,977,321 +-1.16% 15.10%
79 337,866,003,422 +-0.97% 289,462,481,538 +-1.05% 14.33%
110 338,712,691,920 +-0.78% 290,574,233,170 +-0.77% 14.21%
128 348,384,794,006 +-0.50% 292,691,648,206 +-0.66% 15.99%
A comparison of cache miss vs total cache loads ratios, before and after
the patch (again from the `perf stat -r 10 -ddd <program>` tables):
  threads    L1 misses/total*100    L1 misses/total*100    gain
                  (before)                (after)
2 7.43 +-4.90% 7.36 +-4.70% 0.94%
5 13.09 +-4.74% 13.52 +-3.73% -3.28%
8 13.79 +-5.61% 12.90 +-3.27% 6.45%
12 11.57 +-2.44% 8.71 +-1.40% 24.72%
21 12.39 +-3.92% 9.97 +-1.84% 19.53%
30 13.91 +-2.53% 11.73 +-2.28% 15.67%
48 13.71 +-1.59% 12.32 +-1.97% 10.14%
79 14.44 +-0.66% 13.40 +-1.06% 7.20%
110 15.86 +-0.50% 14.46 +-0.59% 8.83%
128 16.51 +-0.32% 15.06 +-0.78% 8.78%
As a final note, the following shows the evolution of performance figures
in the "poundtime" benchmark and pinpoints commit 6e998916dfe3
("sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency") as a
major source of degradation, mostly unaddressed to this day (figures
expressed in seconds).
pound_clock_gettime:
  threads    parent of 6e998916dfe3    6e998916dfe3 itself    4.7-rc7
2 2.23 3.68 ( -64.56%) 3.48 (-55.48%)
5 2.83 3.78 ( -33.42%) 3.33 (-17.43%)
8 2.84 4.31 ( -52.12%) 3.37 (-18.76%)
12 3.09 3.61 ( -16.74%) 3.32 ( -7.17%)
21 3.14 4.63 ( -47.36%) 4.01 (-27.71%)
30 3.28 5.75 ( -75.37%) 3.63 (-10.80%)
48 3.02 6.05 (-100.56%) 3.71 (-22.99%)
79 2.88 6.30 (-118.90%) 3.75 (-30.26%)
110 2.95 6.46 (-119.00%) 3.81 (-29.24%)
128 3.05 6.42 (-110.08%) 3.88 (-27.04%)
pound_times:
  threads    parent of 6e998916dfe3    6e998916dfe3 itself    4.7-rc7
2 2.27 3.73 ( -64.71%) 3.65 (-61.14%)
5 2.78 3.77 ( -35.56%) 3.45 (-23.98%)
8 2.79 4.41 ( -57.71%) 3.52 (-26.05%)
12 3.02 3.56 ( -17.94%) 3.29 ( -9.08%)
21 3.10 4.61 ( -48.74%) 4.07 (-31.34%)
30 3.33 5.75 ( -72.53%) 3.87 (-16.01%)
48 2.96 6.06 (-105.04%) 3.79 (-28.10%)
79 2.88 6.24 (-116.83%) 3.88 (-34.81%)
110 2.98 6.37 (-114.08%) 3.90 (-31.12%)
128 3.10 6.35 (-104.61%) 4.00 (-28.87%)
The source code of the two benchmarks follows. To compile the two:
NR_THREADS=42
for FILE in pound_times pound_clock_gettime; do
gcc -lrt -O2 -lpthread -DNUM_THREADS=$NR_THREADS $FILE.c -o $FILE
done
==== BEGIN pound_times.c ====
#include <stdio.h>
#include <pthread.h>
#include <sys/times.h>

struct tms start;
void *pound (void *threadid)
{
struct tms end;
int oldutime = 0;
int utime;
int i;
for (i = 0; i < 5000000 / NUM_THREADS; i++) {
times(&end);
utime = ((int)end.tms_utime - (int)start.tms_utime);
if (oldutime > utime) {
printf("utime decreased, was %d, now %d!\n", oldutime, utime);
}
oldutime = utime;
}
pthread_exit(NULL);
}
int main()
{
pthread_t th[NUM_THREADS];
long i;
times(&start);
for (i = 0; i < NUM_THREADS; i++) {
pthread_create (&th[i], NULL, pound, (void *)i);
}
pthread_exit(NULL);
return 0;
}
==== END pound_times.c ====
==== BEGIN pound_clock_gettime.c ====
#include <stdio.h>
#include <pthread.h>
#include <time.h>
#include <sys/types.h>

void *pound (void *threadid)
{
struct timespec ts;
int rc, i;
unsigned long prev = 0, this = 0;
for (i = 0; i < 5000000 / NUM_THREADS; i++) {
rc = clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
if (rc < 0)
perror("clock_gettime");
this = (ts.tv_sec * 1000000000) + ts.tv_nsec;
if (0 && this < prev)
printf("%lu ns timewarp at iteration %d\n", prev - this, i);
prev = this;
}
pthread_exit(NULL);
}
int main()
{
pthread_t th[NUM_THREADS];
long rc, i;
pid_t pgid;
for (i = 0; i < NUM_THREADS; i++) {
rc = pthread_create(&th[i], NULL, pound, (void *)i);
if (rc < 0)
perror("pthread_create");
}
pthread_exit(NULL);
return 0;
}
==== END pound_clock_gettime.c ====
Suggested-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1470385316-15027-2-git-send-email-ggherdovich@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
We should update cfs_rq->throttled_clock_task, not
pcfs_rq->throttle_clock_task.
The effect of this bug was probably occasionally erratic
group scheduling, particularly in cgroup-intensive workloads.
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
[ Added changelog. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 55e16d30bd99 ("sched/fair: Rework throttle_count sync")
Link: http://lkml.kernel.org/r/1468050862-18864-1-git-send-email-xlpang@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Current code in cpudeadline.c has a bug in re-heapifying when adding a
new element at the end of the heap, because a deadline value of 0 is
temporarily set in the new elem, then cpudl_change_key() is called
with the actual elem deadline as param.
However, the function compares the new deadline to set with the one
previously in the elem, which is 0. So, if current absolute deadlines
have grown so large as to become negative when interpreted as s64, the
comparison in cpudl_change_key() makes the wrong decision. Instead, as
in dl_time_before(), the kernel should correctly handle absolute
deadline wrap-arounds.
This patch fixes the problem with a minimally invasive change that
forces cpudl_change_key() to heapify up in this case.
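The wrap-around-safe ordering referred to above boils down to comparing
via a signed difference. The stand-alone user-space illustration below
reimplements that expression (the same one used by the in-kernel
dl_time_before() helper) to show why a plain comparison goes wrong once
a deadline exceeds INT64_MAX:

#include <stdio.h>
#include <stdint.h>

/* User-space reimplementation of the dl_time_before() expression:
 * order two absolute times via a signed difference (two's-complement
 * wrap-around assumed, as the kernel does). */
static int dl_time_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

int main(void)
{
	/* A deadline that has grown past INT64_MAX ("negative" as s64)
	 * and a much older, small deadline. */
	uint64_t wrapped = (uint64_t)INT64_MAX + 100;
	uint64_t old_dl  = 1000;

	/* Comparing the raw s64 views gets the order backwards... */
	printf("(s64) view says wrapped < old_dl: %d\n",
	       (int64_t)wrapped < (int64_t)old_dl);	/* 1 -- wrong */

	/* ...while the signed-difference test stays correct. */
	printf("dl_time_before(wrapped, old_dl) : %d\n",
	       dl_time_before(wrapped, old_dl));	/* 0 -- right */
	printf("dl_time_before(old_dl, wrapped) : %d\n",
	       dl_time_before(old_dl, wrapped));	/* 1 -- right */
	return 0;
}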
Signed-off-by: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Luca Abeni <luca.abeni@unitn.it>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1468921493-10054-2-git-send-email-tommaso.cucinotta@sssup.it
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This reverts commit 874f9c7da9a4acbc1b9e12ca722579fb50e4d142.
Geert Uytterhoeven reports:
"This change seems to have an (unintendent?) side-effect.
Before, pr_*() calls without a trailing newline character would be
printed with a newline character appended, both on the console and in
the output of the dmesg command.
After this commit, no new line character is appended, and the output
of the next pr_*() call of the same type may be appended, like in:
- Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000
- Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)
+ Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)"
Joe Perches says:
"No, that is not intentional.
The newline handling code inside vprintk_emit is a bit involved and
for now I suggest a revert until this has all the same behavior as
earlier"
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Requested-by: Joe Perches <joe@perches.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit b195d5e2bffd ("ipr: Wait to do async scan until scsi host is
initialized") fixed async scan for ipr, but broke sync scan for ipr.
This restores sync scan.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reported-and-tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
To distinguish non-slab pages charged to kmemcg we mark them PageKmemcg,
which sets page->_mapcount to -512. Currently, we set/clear PageKmemcg
in __alloc_pages_nodemask()/free_pages_prepare() for any page allocated
with __GFP_ACCOUNT, including those that aren't actually charged to any
cgroup, i.e. allocated from the root cgroup context. To avoid overhead
in case cgroups are not used, we only do that if memcg_kmem_enabled() is
true. The latter is set iff there are kmem-enabled memory cgroups
(online or offline). The root cgroup is not considered kmem-enabled.
As a result, if a page is allocated with __GFP_ACCOUNT for the root
cgroup when there are kmem-enabled memory cgroups and is freed after all
kmem-enabled memory cgroups were removed, e.g.
# no memory cgroups has been created yet, create one
mkdir /sys/fs/cgroup/memory/test
# run something allocating pages with __GFP_ACCOUNT, e.g.
# a program using pipe
dmesg | tail
# remove the memory cgroup
rmdir /sys/fs/cgroup/memory/test
we'll get a bad page state bug complaining about page->_mapcount != -1:
BUG: Bad page state in process swapper/0 pfn:1fd945c
page:ffffea007f651700 count:0 mapcount:-511 mapping: (null) index:0x0
flags: 0x1000000000000000()
To avoid that, let's mark with PageKmemcg only those pages that are
actually charged to and hence pin a non-root memory cgroup.
Fixes: 4949148ad433 ("mm: charge/uncharge kmemcg from generic page allocator paths")
Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The symbols used in the tick_stop tracepoint were not being converted
properly into integers in the tick_stop format file. Instead we had this:
print fmt: "success=%d dependency=%s", REC->success,
__print_symbolic(REC->dependency, { 0, "NONE" },
{ (1 << TICK_DEP_BIT_POSIX_TIMER), "POSIX_TIMER" },
{ (1 << TICK_DEP_BIT_PERF_EVENTS), "PERF_EVENTS" },
{ (1 << TICK_DEP_BIT_SCHED), "SCHED" },
{ (1 << TICK_DEP_BIT_CLOCK_UNSTABLE), "CLOCK_UNSTABLE" })
User space tools have no idea how to parse "TICK_DEP_BIT_SCHED" or the other
symbols used to do the bit shifting. The reason is that the conversion was
done using the TICK_DEP_MASK_* symbols, which are just macros that expand
to the bit shift itself (with the exception of NONE, which was converted
properly because it doesn't use bits and is defined as zero).
The TICK_DEP_BIT_* symbols need to be declared with TRACE_DEFINE_ENUM() in
order to have them properly converted for user space tools to parse this
event.
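Roughly, the trace header needs declarations of the following form so
that the TICK_DEP_BIT_* names used by __print_symbolic() get exported
together with their numeric values (a sketch, not the complete diff):

	/* In the tracepoint header, before the TRACE_EVENT() definition: */
	TRACE_DEFINE_ENUM(TICK_DEP_BIT_POSIX_TIMER);
	TRACE_DEFINE_ENUM(TICK_DEP_BIT_PERF_EVENTS);
	TRACE_DEFINE_ENUM(TICK_DEP_BIT_SCHED);
	TRACE_DEFINE_ENUM(TICK_DEP_BIT_CLOCK_UNSTABLE);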
Cc: stable@vger.kernel.org
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Fixes: e6e6cc22e067 ("nohz: Use enum code for tick stop failure tracing message")
Reported-by: Luiz Capitulino <lcapitulino@redhat.com>
Tested-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
cirrus_modeset_init() is initializing/registering the emulated fbdev
and, since commit c61b93fe51b1 ("drm/atomic: Fix remaining places where
!funcs->best_encoder is valid"), DRM internals can access/test some of
the fields in mode_config->funcs as part of the fbdev registration
process.
Make sure dev->mode_config.funcs is properly set to avoid dereferencing
a NULL pointer.
Reported-by: Mike Marshall <hubcap@omnibond.com>
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Fixes: c61b93fe51b1 ("drm/atomic: Fix remaining places where !funcs->best_encoder is valid")
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
This adds support for building more complex gcc plugins that live in a
subdirectory instead of just in a single source file.
Reported-by: PaX Team <pageexec@freemail.hu>
Signed-off-by: Emese Revfy <re.emese@gmail.com>
[kees: clarified commit message]
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
There's no reason to repeat the same names in the Makefile when the .so
files have already been listed. The .o list can be generated from them.
Reported-by: PaX Team <pageexec@freemail.hu>
Signed-off-by: Emese Revfy <re.emese@gmail.com>
[kees: clarified commit message]
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
The latent_entropy plugin needs to pass arguments, so this adds the
support.
Signed-off-by: Emese Revfy <re.emese@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
When the compiler doesn't support gcc plugins (either due to missing
headers or too old a version), report the problem and abort the build
instead of emitting a warning and letting the build founder with arcane
compiler errors.
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
The gcc-plugins arguments should not be included when performing
cc-option tests.
Steps to reproduce:
1) make mrproper
2) make defconfig
3) enable GCC_PLUGINS, GCC_PLUGIN_CYC_COMPLEXITY
4) enable FUNCTION_TRACER (it will select other options as well)
5) make && make modules
Build errors:
MODPOST 18 modules
ERROR: "__fentry__" [net/netfilter/xt_nat.ko] undefined!
ERROR: "__fentry__" [net/netfilter/xt_mark.ko] undefined!
ERROR: "__fentry__" [net/netfilter/xt_addrtype.ko] undefined!
ERROR: "__fentry__" [net/netfilter/xt_LOG.ko] undefined!
ERROR: "__fentry__" [net/netfilter/nf_nat_sip.ko] undefined!
ERROR: "__fentry__" [net/netfilter/nf_nat_irc.ko] undefined!
ERROR: "__fentry__" [net/netfilter/nf_nat_ftp.ko] undefined!
ERROR: "__fentry__" [net/netfilter/nf_nat.ko] undefined!
Reported-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Emese Revfy <re.emese@gmail.com>
[kees: renamed variable, clarified commit message]
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
According to E-EDID spec 1.3, table 3.9, a digital video sink with the
"DFP 1.x compliant TMDS" bit set is "signal compatible with VESA DFP 1.x
TMDS CRGB, 1 pixel / clock, up to 8 bits / color MSB aligned".
For such displays, the DFP spec 1.0, section 3.10 "EDID support" says:
"If the DFP monitor only supports EDID 1.X (1.1, 1.2, etc.)
without extensions, the host will make the following assumptions:
1. 24-bit MSB-aligned RGB TFT
2. DE polarity is active high
3. H and V syncs are active high
4. Established CRT timings will be used
5. Dithering will not be enabled on the host"
So if we don't know the bit depth of the display from additional
colorimetry info we should assume 8 bpc / 24 bpp by default.
This patch adds an info->bpc = 8 assignment for that case.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
This reverts commit 013dd9e03872
("drm/i915/dp: fall back to 18 bpp when sink capability is unknown")
This commit introduced a regression into stable kernels,
as it reduces output color depth to 6 bpc for any video
sink connected to a DisplayPort connector if that sink
doesn't report a specific color depth via EDID, or if
our EDID parser doesn't actually recognize the proper
bpc from EDID.
Affected are active DisplayPort->VGA converters and
active DisplayPort->DVI converters. Both should be
able to handle 8 bpc, but are degraded to 6 bpc with
this patch.
The reverted commit was meant to fix
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=105331
A followup patch implements a fix for that specific bug, which is
caused by a faulty EDID of the affected DP panel, by adding a new
EDID quirk for that panel.
DP 18 bpp fallback handling and other improvements to
DP sink bpc detection will be handled for future
kernels in a separate series of patches.
Please backport to stable.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Cc: stable@vger.kernel.org
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=105331
reports that the "AEO model 0" display is driven with 8 bpc
without dithering by default, which looks bad because that
panel is apparently a 6 bpc DP panel with faulty EDID.
A fix for this was made by commit 013dd9e03872
("drm/i915/dp: fall back to 18 bpp when sink capability is unknown").
That commit triggers new regressions in precision for DP->DVI and
DP->VGA displays. A patch is out to revert that commit, but it will
revert video output for the AEO model 0 panel to 8 bpc without
dithering.
The EDID 1.3 of that panel, as decoded from the xrandr output
attached to that bugzilla bug report, is somewhat faulty, and beyond
other problems also sets the "DFP 1.x compliant TMDS" bit, which
according to DFP spec means to drive the panel with 8 bpc and
no dithering in absence of other colorimetry information.
Try to make the original bug reporter happy despite the
faulty EDID by adding a quirk to mark that panel as 6 bpc,
so 6 bpc output with dithering creates a nice picture.
Tested by injecting the EDID from the fdo bug into a DP connector
via drm_kms_helper.edid_firmware and verifying that 6 bpc + dithering
is selected.
This patch should be backported to stable.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: stable@vger.kernel.org
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
When I initially added the unsafe_[get|put]_user() helpers in commit
5b24a7a2aa20 ("Add 'unsafe' user access functions for batched
accesses"), I made the mistake of modeling the interface on our
traditional __[get|put]_user() functions, which return zero on success,
or -EFAULT on failure.
That interface is fairly easy to use, but it's actually fairly nasty for
good code generation, since it essentially forces the caller to check
the error value for each access.
In particular, since the error handling is already internally
implemented with an exception handler, and we already use "asm goto" for
various other things, we could fairly easily make the error cases just
jump directly to an error label instead, and avoid the need for explicit
checking after each operation.
So switch the interface to pass in an error label, rather than checking
the error value in the caller. Best do it now before we start growing
more users (the signal handling code in particular would be a good place
to use the new interface).
So rather than

	if (unsafe_get_user(x, ptr))
		... handle error ..

the interface is now

	unsafe_get_user(x, ptr, label);

where an error during the user mode fetch will now just cause a jump to
'label' in the caller.
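As a slightly fuller sketch, a hypothetical caller batching two fetches
inside the user_access_begin()/user_access_end() window from that same
series could look like this (read_pair() is a made-up example, not
actual kernel code):

	int read_pair(unsigned int __user *uaddr, unsigned int *a, unsigned int *b)
	{
		user_access_begin();
		unsafe_get_user(*a, &uaddr[0], efault);
		unsafe_get_user(*b, &uaddr[1], efault);
		user_access_end();
		return 0;

	efault:
		user_access_end();
		return -EFAULT;
	}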
Right now the actual _implementation_ of this all still ends up being an
"if (err) goto label", and does not take advantage of any exception
label tricks, but for "unsafe_put_user()" in particular it should be
fairly straightforward to convert to using the exception table model.
Note that "unsafe_get_user()" is much harder to convert to a clever
exception table model, because current versions of gcc do not allow the
use of "asm goto" (for the exception) with output values (for the actual
value to be fetched). But that is hopefully not a limitation in the
long term.
[ Also note that it might be a good idea to switch unsafe_get_user() to
actually _return_ the value it fetches from user space, but this
commit only changes the error handling semantics ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In commit 874f9c7da9a4 ("printk: create pr_<level> functions"), new
pr_level defines were added to printk.c.
These new defines are guarded by an #ifdef CONFIG_PRINTK - however,
there is already a surrounding #ifdef CONFIG_PRINTK starting a lot
earlier, at line 249, which means the newly introduced #ifdef is
unnecessary.
Let's remove it to avoid confusion.
Signed-off-by: Andreas Ziegler <andreas.ziegler@fau.de>
Cc: Joe Perches <joe@perches.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
WMI event 0xe00e is received when the battery is removed or inserted.
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
|
|
The caller expects %rdi to remain intact, so push+pop it to make that
happen. This fixes the following kind of explosion on my Core 2 Duo
machine when trying to reboot or shut down:
general protection fault: 0000 [#1] PREEMPT SMP
Modules linked in: i915 i2c_algo_bit drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea drm netconsole configfs binfmt_misc iTCO_wdt psmouse pcspkr snd_hda_codec_idt e100 coretemp hwmon snd_hda_codec_generic i2c_i801 mii i2c_smbus lpc_ich mfd_core snd_hda_intel uhci_hcd snd_hda_codec snd_hwdep snd_hda_core ehci_pci 8250 ehci_hcd snd_pcm 8250_base usbcore evdev serial_core usb_common parport_pc parport snd_timer snd soundcore
CPU: 0 PID: 3070 Comm: reboot Not tainted 4.8.0-rc1-perf-dirty #69
Hardware name: /D946GZIS, BIOS TS94610J.86A.0087.2007.1107.1049 11/07/2007
task: ffff88012a0b4080 task.stack: ffff880123850000
RIP: 0010:[<ffffffff81003c92>] [<ffffffff81003c92>] x86_perf_event_update+0x52/0xc0
RSP: 0018:ffff880123853b60 EFLAGS: 00010087
RAX: 0000000000000001 RBX: ffff88012fc0a3c0 RCX: 000000000000001e
RDX: 0000000000000000 RSI: 0000000040000000 RDI: ffff88012b014800
RBP: ffff880123853b88 R08: ffffffffffffffff R09: 0000000000000000
R10: ffffea0004a012c0 R11: ffffea0004acedc0 R12: ffffffff80000001
R13: ffff88012b0149c0 R14: ffff88012b014800 R15: 0000000000000018
FS: 00007f8b155cd700(0000) GS:ffff88012fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8b155f5000 CR3: 000000012a2d7000 CR4: 00000000000006f0
Stack:
ffff88012fc0a3c0 ffff88012b014800 0000000000000004 0000000000000001
ffff88012fc1b750 ffff880123853bb0 ffffffff81003d59 ffff88012b014800
ffff88012fc0a3c0 ffff88012b014800 ffff880123853bd8 ffffffff81003e13
Call Trace:
[<ffffffff81003d59>] x86_pmu_stop+0x59/0xd0
[<ffffffff81003e13>] x86_pmu_del+0x43/0x140
[<ffffffff8111705d>] event_sched_out.isra.105+0xbd/0x260
[<ffffffff8111738d>] __perf_remove_from_context+0x2d/0xb0
[<ffffffff8111745d>] __perf_event_exit_context+0x4d/0x70
[<ffffffff810c8826>] generic_exec_single+0xb6/0x140
[<ffffffff81117410>] ? __perf_remove_from_context+0xb0/0xb0
[<ffffffff81117410>] ? __perf_remove_from_context+0xb0/0xb0
[<ffffffff810c898f>] smp_call_function_single+0xdf/0x140
[<ffffffff81113d27>] perf_event_exit_cpu_context+0x87/0xc0
[<ffffffff81113d73>] perf_reboot+0x13/0x40
[<ffffffff8107578a>] notifier_call_chain+0x4a/0x70
[<ffffffff81075ad7>] __blocking_notifier_call_chain+0x47/0x60
[<ffffffff81075b06>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff81076a1d>] kernel_restart_prepare+0x1d/0x40
[<ffffffff81076ae2>] kernel_restart+0x12/0x60
[<ffffffff81076d56>] SYSC_reboot+0xf6/0x1b0
[<ffffffff811a823c>] ? mntput_no_expire+0x2c/0x1b0
[<ffffffff811a83e4>] ? mntput+0x24/0x40
[<ffffffff811894fc>] ? __fput+0x16c/0x1e0
[<ffffffff811895ae>] ? ____fput+0xe/0x10
[<ffffffff81072fc3>] ? task_work_run+0x83/0xa0
[<ffffffff81001623>] ? exit_to_usermode_loop+0x53/0xc0
[<ffffffff8100105a>] ? trace_hardirqs_on_thunk+0x1a/0x1c
[<ffffffff81076e6e>] SyS_reboot+0xe/0x10
[<ffffffff814c4ba5>] entry_SYSCALL_64_fastpath+0x18/0xa3
Code: 7c 4c 8d af c0 01 00 00 49 89 fe eb 10 48 09 c2 4c 89 e0 49 0f b1 55 00 4c 39 e0 74 35 4d 8b a6 c0 01 00 00 41 8b 8e 60 01 00 00 <0f> 33 8b 35 6e 02 8c 00 48 c1 e2 20 85 f6 7e d2 48 89 d3 89 cf
RIP [<ffffffff81003c92>] x86_perf_event_update+0x52/0xc0
RSP <ffff880123853b60>
---[ end trace 7ec95181faf211be ]---
note: reboot[3070] exited with preempt_count 2
Cc: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Fixes: f5967101e9de ("x86/hweight: Get rid of the special calling convention")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
drm_connector_register_all requires a few too many locks because our
connector_list locking is busted. Add another FIXME+hack to work
around this. This should address the below lockdep splat:
======================================================
[ INFO: possible circular locking dependency detected ]
4.7.0-rc5+ #524 Tainted: G O
-------------------------------------------------------
kworker/u8:0/6 is trying to acquire lock:
(&dev->mode_config.mutex){+.+.+.}, at: [<ffffffff815afde0>] drm_modeset_lock_all+0x40/0x120
but task is already holding lock:
((fb_notifier_list).rwsem){++++.+}, at: [<ffffffff810ac195>] __blocking_notifier_call_chain+0x35/0x70
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 ((fb_notifier_list).rwsem){++++.+}:
[<ffffffff810df611>] lock_acquire+0xb1/0x200
[<ffffffff819a55b4>] down_write+0x44/0x80
[<ffffffff810abf91>] blocking_notifier_chain_register+0x21/0xb0
[<ffffffff814c7448>] fb_register_client+0x18/0x20
[<ffffffff814c6c86>] backlight_device_register+0x136/0x260
[<ffffffffa0127eb2>] intel_backlight_device_register+0xa2/0x160 [i915]
[<ffffffffa00f46be>] intel_connector_register+0xe/0x10 [i915]
[<ffffffffa0112bfb>] intel_dp_connector_register+0x1b/0x80 [i915]
[<ffffffff8159dfea>] drm_connector_register+0x4a/0x80
[<ffffffff8159fe44>] drm_connector_register_all+0x64/0xf0
[<ffffffff815a2a64>] drm_modeset_register_all+0x174/0x1c0
[<ffffffff81599b72>] drm_dev_register+0xc2/0xd0
[<ffffffffa00621d7>] i915_driver_load+0x1547/0x2200 [i915]
[<ffffffffa006d80f>] i915_pci_probe+0x4f/0x70 [i915]
[<ffffffff814a2135>] local_pci_probe+0x45/0xa0
[<ffffffff814a349b>] pci_device_probe+0xdb/0x130
[<ffffffff815c07e3>] driver_probe_device+0x223/0x440
[<ffffffff815c0ad5>] __driver_attach+0xd5/0x100
[<ffffffff815be386>] bus_for_each_dev+0x66/0xa0
[<ffffffff815c002e>] driver_attach+0x1e/0x20
[<ffffffff815bf9be>] bus_add_driver+0x1ee/0x280
[<ffffffff815c1810>] driver_register+0x60/0xe0
[<ffffffff814a1a10>] __pci_register_driver+0x60/0x70
[<ffffffffa01a905b>] i915_init+0x5b/0x62 [i915]
[<ffffffff8100042d>] do_one_initcall+0x3d/0x150
[<ffffffff811a935b>] do_init_module+0x5f/0x1d9
[<ffffffff81124416>] load_module+0x20e6/0x27e0
[<ffffffff81124d63>] SYSC_finit_module+0xc3/0xf0
[<ffffffff81124dae>] SyS_finit_module+0xe/0x10
[<ffffffff819a83a9>] entry_SYSCALL_64_fastpath+0x1c/0xac
-> #0 (&dev->mode_config.mutex){+.+.+.}:
[<ffffffff810df0ac>] __lock_acquire+0x10fc/0x1260
[<ffffffff810df611>] lock_acquire+0xb1/0x200
[<ffffffff819a3097>] mutex_lock_nested+0x67/0x3c0
[<ffffffff815afde0>] drm_modeset_lock_all+0x40/0x120
[<ffffffff8158f79b>] drm_fb_helper_restore_fbdev_mode_unlocked+0x2b/0x80
[<ffffffff8158f81d>] drm_fb_helper_set_par+0x2d/0x50
[<ffffffffa0105f7a>] intel_fbdev_set_par+0x1a/0x60 [i915]
[<ffffffff814c13c6>] fbcon_init+0x586/0x610
[<ffffffff8154d16a>] visual_init+0xca/0x130
[<ffffffff8154e611>] do_bind_con_driver+0x1c1/0x3a0
[<ffffffff8154eaf6>] do_take_over_console+0x116/0x180
[<ffffffff814bd3a7>] do_fbcon_takeover+0x57/0xb0
[<ffffffff814c1e48>] fbcon_event_notify+0x658/0x750
[<ffffffff810abcae>] notifier_call_chain+0x3e/0xb0
[<ffffffff810ac1ad>] __blocking_notifier_call_chain+0x4d/0x70
[<ffffffff810ac1e6>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff814c748b>] fb_notifier_call_chain+0x1b/0x20
[<ffffffff814c86b1>] register_framebuffer+0x251/0x330
[<ffffffff8158fa9f>] drm_fb_helper_initial_config+0x25f/0x3f0
[<ffffffffa0106b48>] intel_fbdev_initial_config+0x18/0x30 [i915]
[<ffffffff810adfd8>] async_run_entry_fn+0x48/0x150
[<ffffffff810a3947>] process_one_work+0x1e7/0x750
[<ffffffff810a3efb>] worker_thread+0x4b/0x4f0
[<ffffffff810aad4f>] kthread+0xef/0x110
[<ffffffff819a85ef>] ret_from_fork+0x1f/0x40
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock((fb_notifier_list).rwsem);
lock(&dev->mode_config.mutex);
lock((fb_notifier_list).rwsem);
lock(&dev->mode_config.mutex);
*** DEADLOCK ***
6 locks held by kworker/u8:0/6:
#0: ("events_unbound"){.+.+.+}, at: [<ffffffff810a38c9>] process_one_work+0x169/0x750
#1: ((&entry->work)){+.+.+.}, at: [<ffffffff810a38c9>] process_one_work+0x169/0x750
#2: (registration_lock){+.+.+.}, at: [<ffffffff814c8487>] register_framebuffer+0x27/0x330
#3: (console_lock){+.+.+.}, at: [<ffffffff814c86ce>] register_framebuffer+0x26e/0x330
#4: (&fb_info->lock){+.+.+.}, at: [<ffffffff814c78dd>] lock_fb_info+0x1d/0x40
#5: ((fb_notifier_list).rwsem){++++.+}, at: [<ffffffff810ac195>] __blocking_notifier_call_chain+0x35/0x70
stack backtrace:
CPU: 2 PID: 6 Comm: kworker/u8:0 Tainted: G O 4.7.0-rc5+ #524
Hardware name: Intel Corp. Broxton P/NOTEBOOK, BIOS APLKRVPA.X64.0138.B33.1606250842 06/25/2016
Workqueue: events_unbound async_run_entry_fn
0000000000000000 ffff8800758577f0 ffffffff814507a5 ffffffff828b9900
ffffffff828b9900 ffff880075857830 ffffffff810dc6fa ffff880075857880
ffff88007584d688 0000000000000005 0000000000000006 ffff88007584d6b0
Call Trace:
[<ffffffff814507a5>] dump_stack+0x67/0x92
[<ffffffff810dc6fa>] print_circular_bug+0x1aa/0x200
[<ffffffff810df0ac>] __lock_acquire+0x10fc/0x1260
[<ffffffff810df611>] lock_acquire+0xb1/0x200
[<ffffffff815afde0>] ? drm_modeset_lock_all+0x40/0x120
[<ffffffff815afde0>] ? drm_modeset_lock_all+0x40/0x120
[<ffffffff819a3097>] mutex_lock_nested+0x67/0x3c0
[<ffffffff815afde0>] ? drm_modeset_lock_all+0x40/0x120
[<ffffffff810fa85f>] ? rcu_read_lock_sched_held+0x7f/0x90
[<ffffffff81208218>] ? kmem_cache_alloc_trace+0x248/0x2b0
[<ffffffff815afdc5>] ? drm_modeset_lock_all+0x25/0x120
[<ffffffff815afde0>] drm_modeset_lock_all+0x40/0x120
[<ffffffff8158f79b>] drm_fb_helper_restore_fbdev_mode_unlocked+0x2b/0x80
[<ffffffff8158f81d>] drm_fb_helper_set_par+0x2d/0x50
[<ffffffffa0105f7a>] intel_fbdev_set_par+0x1a/0x60 [i915]
[<ffffffff814c13c6>] fbcon_init+0x586/0x610
[<ffffffff8154d16a>] visual_init+0xca/0x130
[<ffffffff8154e611>] do_bind_con_driver+0x1c1/0x3a0
[<ffffffff8154eaf6>] do_take_over_console+0x116/0x180
[<ffffffff814bd3a7>] do_fbcon_takeover+0x57/0xb0
[<ffffffff814c1e48>] fbcon_event_notify+0x658/0x750
[<ffffffff810abcae>] notifier_call_chain+0x3e/0xb0
[<ffffffff810ac1ad>] __blocking_notifier_call_chain+0x4d/0x70
[<ffffffff810ac1e6>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff814c748b>] fb_notifier_call_chain+0x1b/0x20
[<ffffffff814c86b1>] register_framebuffer+0x251/0x330
[<ffffffff815b7e8d>] ? vga_switcheroo_client_fb_set+0x5d/0x70
[<ffffffff8158fa9f>] drm_fb_helper_initial_config+0x25f/0x3f0
[<ffffffffa0106b48>] intel_fbdev_initial_config+0x18/0x30 [i915]
[<ffffffff810adfd8>] async_run_entry_fn+0x48/0x150
[<ffffffff810a3947>] process_one_work+0x1e7/0x750
[<ffffffff810a38c9>] ? process_one_work+0x169/0x750
[<ffffffff810a3efb>] worker_thread+0x4b/0x4f0
[<ffffffff810a3eb0>] ? process_one_work+0x750/0x750
[<ffffffff810aad4f>] kthread+0xef/0x110
[<ffffffff819a85ef>] ret_from_fork+0x1f/0x40
[<ffffffff810aac60>] ? kthread_stop+0x2e0/0x2e0
v2: Rebase onto the right branch (hand-editing patches ftw) and add more
reporters.
Reported-by: Imre Deak <imre.deak@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Reported-by: Jiri Kosina <jikos@kernel.org>
Cc: Jiri Kosina <jikos@kernel.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
The conversion of the rcar-du driver from the I2C slave encoder to the
DRM bridge API left the HDMI encoder's bridge pointer NULL, preventing
the bridge from being handled automatically by the DRM core. Fix it.
Fixes: 1d926114d8f4 ("drm: rcar-du: Remove i2c slave encoder interface for hdmi encoder")
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
|
|
Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower
portion and the op code in the higher portion. This means that
old code that relies on manually setting bi_rw is most likely
going to be broken. Instead of letting that brokenness linger,
rename the member to force old and out-of-tree code to break
at compile time instead of at runtime.
No intended functional changes in this commit.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
The original commit missed this function; it needs to mark it as a
write flush.
Cc: Mike Christie <mchristi@redhat.com>
Fixes: e742fc32fcb4 ("target: use bio op accessors")
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Cleaner than manipulating bio->bi_rw flags directly.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Commit abf545484d31 changed it from an 'rw' flags type to the
newer ops based interface, but now we're effectively leaking
some bdev internals to the rest of the kernel. Since we only
care about whether it's a read or a write at that level, just
pass in a bool 'is_write' parameter instead.
Then we can also move op_is_write() and friends back under
CONFIG_BLOCK protection.
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
In most cases, EPERM is returned on an immutable inode, and there are
only a few places returning EACCES. I noticed this when running LTP on
overlayfs: setxattr03 failed due to an unexpected EACCES on an immutable
inode.
So convert all EACCES to EPERM on immutable inodes.
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
persistent_ram_zone (prz) structures are allocated by persistent_ram_new(),
which involves vmap() or ioremap(), but they are currently freed by
kfree(). Use persistent_ram_free() instead to correct this asymmetry.
Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.kw@hitachi.com>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Seiji Aguchi <seiji.aguchi.tr@hitachi.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Instead of a ramoops-specific node, use a child node of /reserved-memory.
This requires that of_platform_device_create() be explicitly called
for the node, though, since "/reserved-memory" does not have its own
"compatible" property.
Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Rob Herring <robh@kernel.org>
|
|
Fixes hangs under memory pressure, e.g. running the piglit test
tex3d-maxsize concurrently with other tests.
Fixes: 17d33bc9d6ef ("drm/ttm: drop waiting for idle in ttm_bo_evict.")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Clean up a duplicated expression by replacing it with the equivalent local
variable pdev.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
It will be useful to know the hardware-configured BAR size to diagnose
issues with NTB memory windows.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
This script automates testing doorbells, scratchpads and memory windows
for an NTB device. It can be run locally, with the NTB looped
back to the same host, or it can use SSH to remotely control the second
host.
In the single host case, the script just needs to be passed two
arguments: a PCI ID for each side of the link. In the two host case
the -r option must be used to specify the remote hostname (which must
be SSH accessible and should probably have ssh-keys exchanged).
A sample run looks like this:
$ sudo ./ntb_test.sh 0000:03:00.1 0000:83:00.1 -p 29
Starting ntb_tool tests...
Running link tests on: 0000:03:00.1 / 0000:83:00.1
Passed
Running link tests on: 0000:83:00.1 / 0000:03:00.1
Passed
Running db tests on: 0000:03:00.1 / 0000:83:00.1
Passed
Running db tests on: 0000:83:00.1 / 0000:03:00.1
Passed
Running spad tests on: 0000:03:00.1 / 0000:83:00.1
Passed
Running spad tests on: 0000:83:00.1 / 0000:03:00.1
Passed
Running mw0 tests on: 0000:03:00.1 / 0000:83:00.1
Passed
Running mw0 tests on: 0000:83:00.1 / 0000:03:00.1
Passed
Running mw1 tests on: 0000:03:00.1 / 0000:83:00.1
Passed
Running mw1 tests on: 0000:83:00.1 / 0000:03:00.1
Passed
Starting ntb_pingpong tests...
Running ping pong tests on: 0000:03:00.1 / 0000:83:00.1
Passed
Starting ntb_perf tests...
Running local perf test without DMA
0: copied 536870912 bytes in 164453 usecs, 3264 MBytes/s
Passed
Running remote perf test without DMA
0: copied 536870912 bytes in 164453 usecs, 3264 MBytes/s
Passed
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
Acked-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
When the link goes down, the link_is_up flag does not return to
false. This could cause some subtle corner-case bugs
when the link goes up and down quickly.
Once that was fixed, a race was found if the link was brought down
and then immediately back up. The link_cleanup work would
occasionally be scheduled after the next link up event. This would
cancel the link_work that was supposed to occur and leave ntb_perf
in an unusable state.
To fix this we get rid of the link_cleanup work and put the actions
directly in the link_down event.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
This commit adds a debugfs 'count' file to ntb_pingpong. This is so
testing with ntb_pingpong can be automated beyond just checking the
logs for pong messages.
The count file returns a number which increments every pong. The
counter can be cleared by writing a zero.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
In order to more successfully script with ntb_tool it's useful to
have a link file to check the link status so that the script
doesn't use the other files until the link is up.
This commit adds a 'link' file to the debugfs directory which reads
boolean (Y or N) depending on the link status. Writing to the file
changes the link state using ntb_link_enable() or ntb_link_disable().
A 'link_event' file is also provided so an application can block until
the link changes to the desired state. If the user writes a 1, it will
block until the link is up. If the user writes a 0, it will block until
the link is down.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
In order to make the interface closer to the raw NTB API, this commit
changes memory windows so they are not initialized on link up.
Instead, the 'peer_trans*' debugfs files are introduced. When read,
they return information provided by ntb_mw_get_range. When written,
they create a buffer and initialize the memory window. The
value written is taken as the requested size of the buffer (which
is then rounded for alignment). Writing a value of zero frees the buffer
and tears down the memory window translation. The 'peer_mw*' file is
only created once the memory window translation is setup by the user.
Additionally, it was noticed that the read and write functions for the
'peer_mw*' files should have checked for a NULL pointer.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Instead of returning immediately with an error when the link is
down, wait for the link to come up (or for the user to send a SIGINT).
This is to make scripting ntb_perf easier.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|