|
Seeing this, it occurs to me that we should probably add a taint here:
BUG: sleeping function called from invalid context at mm/slab.h:388
in_atomic(): 0, irqs_disabled(): 0, pid: 32211, name: trinity-c3
Preemption disabled at:[<ffffffff811aaa37>] console_unlock+0x2f7/0x930
CPU: 3 PID: 32211 Comm: trinity-c3 Not tainted 4.7.0-rc7+ #19
^^^^^^^^^^^
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
0000000000000000 ffff8800b8a17160 ffffffff81971441 ffff88011a3c4c80
ffff88011a3c4c80 ffff8800b8a17198 ffffffff81158067 0000000000000de6
ffff88011a3c4c80 ffffffff8390e07c 0000000000000184 0000000000000000
Call Trace:
[...]
BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1309
in_atomic(): 0, irqs_disabled(): 0, pid: 32211, name: trinity-c3
Preemption disabled at:[<ffffffff8119db33>] down_trylock+0x13/0x80
CPU: 3 PID: 32211 Comm: trinity-c3 Not tainted 4.7.0-rc7+ #19
^^^^^^^^^^^
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
0000000000000000 ffff8800b8a17e08 ffffffff81971441 ffff88011a3c4c80
ffff88011a3c4c80 ffff8800b8a17e40 ffffffff81158067 0000000000000000
ffff88011a3c4c80 ffffffff83437b20 000000000000051d 0000000000000000
Call Trace:
[...]
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Link: http://lkml.kernel.org/r/1469216762-19626-1-git-send-email-vegard.nossum@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This message is currently really useless since it always prints a value
that comes from the printk() we just did, e.g.:
BUG: sleeping function called from invalid context at mm/slab.h:388
in_atomic(): 0, irqs_disabled(): 0, pid: 31996, name: trinity-c1
Preemption disabled at:[<ffffffff8119db33>] down_trylock+0x13/0x80
BUG: sleeping function called from invalid context at include/linux/freezer.h:56
in_atomic(): 0, irqs_disabled(): 0, pid: 31996, name: trinity-c1
Preemption disabled at:[<ffffffff811aaa37>] console_unlock+0x2f7/0x930
Here, both down_trylock() and console_unlock() are somewhere in the
printk() path.
We should save the value before calling printk() and use the saved value
instead. That immediately reveals the offending callsite:
BUG: sleeping function called from invalid context at mm/slab.h:388
in_atomic(): 0, irqs_disabled(): 0, pid: 14971, name: trinity-c2
Preemption disabled at:[<ffffffff819bcd46>] rhashtable_walk_start+0x46/0x150
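Something along these lines (a sketch of the idea rather than the exact
diff; it assumes CONFIG_DEBUG_PREEMPT, which is what records
preempt_disable_ip in the task_struct):

	/*
	 * Read the field into a local *before* the first printk(): the
	 * printk() path itself disables preemption and would overwrite
	 * the value we want to report.
	 */
	unsigned long preempt_disable_ip = current->preempt_disable_ip;

	printk(KERN_ERR
	       "BUG: sleeping function called from invalid context at %s:%d\n",
	       file, line);
	pr_err("Preemption disabled at:");
	print_ip_sym(preempt_disable_ip);	/* saved value, not the clobbered one */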
Bug report:
http://marc.info/?l=linux-netdev&m=146925979821849&w=2
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
setup_new_dl_entity() takes two parameters, but it only actually uses
one of them, under a different name, to set up a new dl_entity, after:
2f9f3fdc928 "sched/deadline: Remove dl_new from struct sched_dl_entity"
as we currently do:
setup_new_dl_entity(&p->dl, &p->dl)
However, before Luca's change we were doing:
setup_new_dl_entity(dl_se, pi_se)
in update_dl_entity() for a dl_se->new entity: we were using pi_se's
parameters (the potential PI donor) for setting up a new entity.
This change removes the useless second parameter of setup_new_dl_entity().
While we are at it, we also optimize things further by calling
setup_new_dl_entity() only for already queued tasks, since (as pointed
out by Xunlei) we already do the very same update at task wakeup time
anyway. By doing so, we no longer need to worry about a potential PI
donor, as rt_mutex_setprio() already takes care of that for us.
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luca Abeni <luca.abeni@unitn.it>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Xunlei Pang <xpang@redhat.com>
Link: http://lkml.kernel.org/r/1470409675-20935-1-git-send-email-juri.lelli@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Add documentation for the cookie argument in try_to_wake_up_local().
This caused the following warning when building documentation:
kernel/sched/core.c:2088: warning: No description found for parameter 'cookie'
Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Fixes: e7904a28f533 ("locking/lockdep, sched/core: Implement a better lock pinning scheme")
Link: http://lkml.kernel.org/r/1468159226-17674-1-git-send-email-luisbg@osg.samsung.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
In the current find_idlest_group()/find_idlest_cpu() search we can end
up calling find_idlest_cpu() on a sched_group containing only a single
CPU. Checking idle-states becomes pointless when there is no
alternative, so bail out instead.
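One plausible shape of such a bail-out at the top of find_idlest_cpu()
(a sketch, not the exact diff):

	/* If the group spans a single CPU there is nothing to choose from: */
	if (group->group_weight == 1)
		return cpumask_first(sched_group_cpus(group));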
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: linux-kernel@vger.kernel.org
Cc: mgalbraith@suse.de
Cc: vincent.guittot@linaro.org
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1466615004-3503-4-git-send-email-morten.rasmussen@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
In commit:
ac66f5477239 ("sched/numa: Introduce migrate_swap()")
select_task_rq() got a 'cpu' argument to enable overriding of prev_cpu
in special cases (NUMA task swapping).
However, the select_task_rq_fair() helper functions: wake_affine() and
select_idle_sibling(), still use task_cpu(p) directly to work out
prev_cpu, which leads to inconsistencies.
This patch passes prev_cpu (potentially overridden by NUMA code) into
the helper functions to ensure prev_cpu is indeed the same CPU
everywhere in the wakeup path.
cc: Ingo Molnar <mingo@redhat.com>
cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: linux-kernel@vger.kernel.org
Cc: mgalbraith@suse.de
Cc: vincent.guittot@linaro.org
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1466615004-3503-3-git-send-email-morten.rasmussen@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
It seems that this one escaped Nico's renaming of cpu_power to
cpu_capacity a while back.
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: linux-kernel@vger.kernel.org
Cc: mgalbraith@suse.de
Cc: vincent.guittot@linaro.org
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1466615004-3503-2-git-send-email-morten.rasmussen@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Vincent noted that the update_tg_load_avg() usage in commit:
3d30544f0212 ("sched/fair: Apply more PELT fixes")
isn't entirely sufficient. We need to call this function every time
cfs_rq->avg.load changes; this includes when update_cfs_rq_load_avg()
returns true, but {attach,detach}_entity_load_avg() themselves also
change it. This means we need to unconditionally call
update_tg_load_avg().
Also, add more comments.
Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Fix one minor typo in the comment: s/targer/target/.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1470378758-15066-1-git-send-email-leo.yan@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The update_next_balance() function is only used by idle balancing, so its
'cpu_busy' parameter is always 0.
Open code it instead of passing it around.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1470378689-14892-1-git-send-email-leo.yan@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The following warning can be triggered by hot-unplugging the CPU
on which an active SCHED_DEADLINE task is running:
WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:3531 lock_release+0x690/0x6a0
releasing a pinned lock
Call Trace:
dump_stack+0x99/0xd0
__warn+0xd1/0xf0
? dl_task_timer+0x1a1/0x2b0
warn_slowpath_fmt+0x4f/0x60
? sched_clock+0x13/0x20
lock_release+0x690/0x6a0
? enqueue_pushable_dl_task+0x9b/0xa0
? enqueue_task_dl+0x1ca/0x480
_raw_spin_unlock+0x1f/0x40
dl_task_timer+0x1a1/0x2b0
? push_dl_task.part.31+0x190/0x190
WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:3649 lock_unpin_lock+0x181/0x1a0
unpinning an unpinned lock
Call Trace:
dump_stack+0x99/0xd0
__warn+0xd1/0xf0
warn_slowpath_fmt+0x4f/0x60
lock_unpin_lock+0x181/0x1a0
dl_task_timer+0x127/0x2b0
? push_dl_task.part.31+0x190/0x190
As per the comment before this code, it's safe to drop the RQ lock
here, and since we (potentially) change rq, unpin and repin to avoid
the splat.
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
[ Rewrote changelog. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luca Abeni <luca.abeni@unitn.it>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1470274940-17976-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Commit:
6e998916dfe3 ("sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency")
fixed a problem whereby clock_nanosleep() followed by clock_gettime() could
allow a task to wake early. It addressed the problem by calling the scheduling
classes update_curr() when the cputimer starts.
Said change induced a considerable performance regression on the syscalls
times() and clock_gettime(CLOCK_PROCESS_CPUTIME_ID). There are some
debuggers and applications that monitor their own performance that
accidentally depend on the performance of these specific calls.
This patch mitigates the performance loss by prefetching data in the CPU
cache, as stalls due to cache misses appear to be where most time is spent
in our benchmarks.
Here are the performance gains of this patch over v4.7-rc7 on a Sandy Bridge
box with 32 logical cores and 2 NUMA nodes. The test is repeated with a
variable number of threads, from 2 to 4*num_cpus; the results are in
seconds and correspond to the average of 10 runs; the percentage gain is
computed with (before-after)/before so a positive value is an improvement
(it's faster). The improvement varies between a few percent for 5-20
threads and more than 10% for 2 or >20 threads.
pound_clock_gettime:
  threads    4.7-rc7    patched 4.7-rc7
  [num]      [secs]     [secs (percent)]
    2         3.48        3.06 ( 11.83%)
    5         3.33        3.25 (  2.40%)
    8         3.37        3.26 (  3.30%)
   12         3.32        3.37 ( -1.60%)
   21         4.01        3.90 (  2.74%)
   30         3.63        3.36 (  7.41%)
   48         3.71        3.11 ( 16.27%)
   79         3.75        3.16 ( 15.74%)
  110         3.81        3.25 ( 14.80%)
  128         3.88        3.31 ( 14.76%)
pound_times:
  threads    4.7-rc7    patched 4.7-rc7
  [num]      [secs]     [secs (percent)]
    2         3.65        3.25 ( 11.03%)
    5         3.45        3.17 (  7.92%)
    8         3.52        3.22 (  8.69%)
   12         3.29        3.36 ( -2.04%)
   21         4.07        3.92 (  3.78%)
   30         3.87        3.40 ( 12.17%)
   48         3.79        3.16 ( 16.61%)
   79         3.88        3.28 ( 15.42%)
  110         3.90        3.38 ( 13.35%)
  128         4.00        3.38 ( 15.45%)
pound_clock_gettime and pound_times are two benchmarks included in
the MMTests framework. They launch a given number of threads which
repeatedly call times() or clock_gettime(). The results above can be
reproduced with cloning MMTests from github.com and running the "poundtime"
workload:
$ git clone https://github.com/gormanm/mmtests.git
$ cd mmtests
$ cp configs/config-global-dhp__workload_poundtime config
$ ./run-mmtests.sh --run-monitor $(uname -r)
The above will run "poundtime" measuring the kernel currently running on
the machine; once a new kernel is installed and the machine rebooted,
running again
$ cd mmtests
$ ./run-mmtests.sh --run-monitor $(uname -r)
will produce results to compare with. A comparison table will be output
with:
$ cd mmtests/work/log
$ ../../compare-kernels.sh
The table will contain a lot of entries; grepping for "Amean" (as in
"arithmetic mean") will give the tables presented above. The source code
for the two benchmarks is reported at the end of this changelog for
clarity.
The cache misses addressed by this patch were found using a combination of
`perf top`, `perf record` and `perf annotate`. The incriminated lines were
found to be
struct sched_entity *curr = cfs_rq->curr;
and
delta_exec = now - curr->exec_start;
in the function update_curr() from kernel/sched/fair.c. This patch
prefetches the data from memory just before update_curr() is called in the
execution paths of interest.
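For illustration only, the prefetch technique itself can be tried from
user space with GCC's __builtin_prefetch(); the stand-alone toy below
(not kernel code) hints the CPU about the next list node one step before
it is dereferenced:

#include <stdio.h>
#include <stdlib.h>

struct node {
	struct node *next;
	long payload;
};

int main(void)
{
	struct node *head = NULL, *n;
	long long sum = 0;
	int i;

	/* Build a small linked list; the nodes end up scattered on the heap. */
	for (i = 0; i < 100000; i++) {
		n = malloc(sizeof(*n));
		if (!n)
			return 1;
		n->payload = i;
		n->next = head;
		head = n;
	}

	/* Walk the list, prefetching the next node before it is needed. */
	for (n = head; n; n = n->next) {
		if (n->next)
			__builtin_prefetch(n->next, 0, 3); /* 0 = read, 3 = keep in cache */
		sum += n->payload;
	}

	printf("sum = %lld\n", sum);
	return 0;
}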
A comparison of the total number of cycles before and after the patch
follows; the data is obtained using `perf stat -r 10 -ddd <program>`
running over the same sequence of number of threads used above (a positive
gain is an improvement):
threads cycles before cycles after gain
2 19,699,563,964 +-1.19% 17,358,917,517 +-1.85% 11.88%
5 47,401,089,566 +-2.96% 45,103,730,829 +-0.97% 4.85%
8 80,923,501,004 +-3.01% 71,419,385,977 +-0.77% 11.74%
12 112,326,485,473 +-0.47% 110,371,524,403 +-0.47% 1.74%
21 193,455,574,299 +-0.72% 180,120,667,904 +-0.36% 6.89%
30 315,073,519,013 +-1.64% 271,222,225,950 +-1.29% 13.92%
48 321,969,515,332 +-1.48% 273,353,977,321 +-1.16% 15.10%
79 337,866,003,422 +-0.97% 289,462,481,538 +-1.05% 14.33%
110 338,712,691,920 +-0.78% 290,574,233,170 +-0.77% 14.21%
128 348,384,794,006 +-0.50% 292,691,648,206 +-0.66% 15.99%
A comparison of cache miss vs total cache loads ratios, before and after
the patch (again from the `perf stat -r 10 -ddd <program>` tables):
  threads    L1 misses/total*100    L1 misses/total*100    gain
                  (before)                (after)
2 7.43 +-4.90% 7.36 +-4.70% 0.94%
5 13.09 +-4.74% 13.52 +-3.73% -3.28%
8 13.79 +-5.61% 12.90 +-3.27% 6.45%
12 11.57 +-2.44% 8.71 +-1.40% 24.72%
21 12.39 +-3.92% 9.97 +-1.84% 19.53%
30 13.91 +-2.53% 11.73 +-2.28% 15.67%
48 13.71 +-1.59% 12.32 +-1.97% 10.14%
79 14.44 +-0.66% 13.40 +-1.06% 7.20%
110 15.86 +-0.50% 14.46 +-0.59% 8.83%
128 16.51 +-0.32% 15.06 +-0.78% 8.78%
As a final note, the following shows the evolution of performance figures
in the "poundtime" benchmark and pinpoints commit 6e998916dfe3
("sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency") as a
major source of degradation, mostly unaddressed to this day (figures
expressed in seconds).
pound_clock_gettime:
  threads    parent of 6e998916dfe3    6e998916dfe3 itself    4.7-rc7
2 2.23 3.68 ( -64.56%) 3.48 (-55.48%)
5 2.83 3.78 ( -33.42%) 3.33 (-17.43%)
8 2.84 4.31 ( -52.12%) 3.37 (-18.76%)
12 3.09 3.61 ( -16.74%) 3.32 ( -7.17%)
21 3.14 4.63 ( -47.36%) 4.01 (-27.71%)
30 3.28 5.75 ( -75.37%) 3.63 (-10.80%)
48 3.02 6.05 (-100.56%) 3.71 (-22.99%)
79 2.88 6.30 (-118.90%) 3.75 (-30.26%)
110 2.95 6.46 (-119.00%) 3.81 (-29.24%)
128 3.05 6.42 (-110.08%) 3.88 (-27.04%)
pound_times:
  threads    parent of 6e998916dfe3    6e998916dfe3 itself    4.7-rc7
2 2.27 3.73 ( -64.71%) 3.65 (-61.14%)
5 2.78 3.77 ( -35.56%) 3.45 (-23.98%)
8 2.79 4.41 ( -57.71%) 3.52 (-26.05%)
12 3.02 3.56 ( -17.94%) 3.29 ( -9.08%)
21 3.10 4.61 ( -48.74%) 4.07 (-31.34%)
30 3.33 5.75 ( -72.53%) 3.87 (-16.01%)
48 2.96 6.06 (-105.04%) 3.79 (-28.10%)
79 2.88 6.24 (-116.83%) 3.88 (-34.81%)
110 2.98 6.37 (-114.08%) 3.90 (-31.12%)
128 3.10 6.35 (-104.61%) 4.00 (-28.87%)
The source code of the two benchmarks follows. To compile the two:
NR_THREADS=42
for FILE in pound_times pound_clock_gettime; do
gcc -lrt -O2 -lpthread -DNUM_THREADS=$NR_THREADS $FILE.c -o $FILE
done
==== BEGIN pound_times.c ====
#include <stdio.h>
#include <pthread.h>
#include <sys/times.h>

struct tms start;
void *pound (void *threadid)
{
struct tms end;
int oldutime = 0;
int utime;
int i;
for (i = 0; i < 5000000 / NUM_THREADS; i++) {
times(&end);
utime = ((int)end.tms_utime - (int)start.tms_utime);
if (oldutime > utime) {
printf("utime decreased, was %d, now %d!\n", oldutime, utime);
}
oldutime = utime;
}
pthread_exit(NULL);
}
int main()
{
pthread_t th[NUM_THREADS];
long i;
times(&start);
for (i = 0; i < NUM_THREADS; i++) {
pthread_create (&th[i], NULL, pound, (void *)i);
}
pthread_exit(NULL);
return 0;
}
==== END pound_times.c ====
==== BEGIN pound_clock_gettime.c ====
#include <stdio.h>
#include <pthread.h>
#include <time.h>
#include <sys/types.h>

void *pound (void *threadid)
{
struct timespec ts;
int rc, i;
unsigned long prev = 0, this = 0;
for (i = 0; i < 5000000 / NUM_THREADS; i++) {
rc = clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
if (rc < 0)
perror("clock_gettime");
this = (ts.tv_sec * 1000000000) + ts.tv_nsec;
if (0 && this < prev)
printf("%lu ns timewarp at iteration %d\n", prev - this, i);
prev = this;
}
pthread_exit(NULL);
}
int main()
{
pthread_t th[NUM_THREADS];
long rc, i;
pid_t pgid;
for (i = 0; i < NUM_THREADS; i++) {
rc = pthread_create(&th[i], NULL, pound, (void *)i);
if (rc < 0)
perror("pthread_create");
}
pthread_exit(NULL);
return 0;
}
==== END pound_clock_gettime.c ====
Suggested-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1470385316-15027-2-git-send-email-ggherdovich@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
We should update cfs_rq->throttled_clock_task, not
pcfs_rq->throttle_clock_task.
The effect of this bug was probably occasionally erratic
group scheduling, particularly in cgroup-intensive workloads.
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
[ Added changelog. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 55e16d30bd99 ("sched/fair: Rework throttle_count sync")
Link: http://lkml.kernel.org/r/1468050862-18864-1-git-send-email-xlpang@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Current code in cpudeadline.c has a bug in re-heapifying when adding a
new element at the end of the heap, because a deadline value of 0 is
temporarily set in the new elem, then cpudl_change_key() is called
with the actual elem deadline as param.
However, the function compares the new deadline to set with the one
previously in the elem, which is 0. So, if current absolute deadlines
have grown so large as to become negative when interpreted as s64, the
comparison in cpudl_change_key() makes the wrong decision. Instead, as
in dl_time_before(), the kernel should correctly handle absolute
deadline wrap-arounds.
This patch fixes the problem with a minimally invasive change that
forces cpudl_change_key() to heapify up in this case.
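The wrap-around-safe ordering referred to above boils down to comparing
via a signed difference. The stand-alone user-space illustration below
reimplements that expression (the same one used by the in-kernel
dl_time_before() helper) to show why a plain comparison goes wrong once
a deadline exceeds INT64_MAX:

#include <stdio.h>
#include <stdint.h>

/* User-space reimplementation of the dl_time_before() expression:
 * order two absolute times via a signed difference (two's-complement
 * wrap-around assumed, as the kernel does). */
static int dl_time_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

int main(void)
{
	/* A deadline that has grown past INT64_MAX ("negative" as s64)
	 * and a much older, small deadline. */
	uint64_t wrapped = (uint64_t)INT64_MAX + 100;
	uint64_t old_dl  = 1000;

	/* Comparing the raw s64 views gets the order backwards... */
	printf("(s64) view says wrapped < old_dl: %d\n",
	       (int64_t)wrapped < (int64_t)old_dl);	/* 1 -- wrong */

	/* ...while the signed-difference test stays correct. */
	printf("dl_time_before(wrapped, old_dl) : %d\n",
	       dl_time_before(wrapped, old_dl));	/* 0 -- right */
	printf("dl_time_before(old_dl, wrapped) : %d\n",
	       dl_time_before(old_dl, wrapped));	/* 1 -- right */
	return 0;
}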
Signed-off-by: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Luca Abeni <luca.abeni@unitn.it>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1468921493-10054-2-git-send-email-tommaso.cucinotta@sssup.it
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This reverts commit 874f9c7da9a4acbc1b9e12ca722579fb50e4d142.
Geert Uytterhoeven reports:
"This change seems to have an (unintendent?) side-effect.
Before, pr_*() calls without a trailing newline character would be
printed with a newline character appended, both on the console and in
the output of the dmesg command.
After this commit, no new line character is appended, and the output
of the next pr_*() call of the same type may be appended, like in:
- Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000
- Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)
+ Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)"
Joe Perches says:
"No, that is not intentional.
The newline handling code inside vprintk_emit is a bit involved and
for now I suggest a revert until this has all the same behavior as
earlier"
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Requested-by: Joe Perches <joe@perches.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit b195d5e2bffd ("ipr: Wait to do async scan until scsi host is
initialized") fixed async scan for ipr, but broke sync scan for ipr.
This restores sync scan.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reported-and-tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
To distinguish non-slab pages charged to kmemcg we mark them PageKmemcg,
which sets page->_mapcount to -512. Currently, we set/clear PageKmemcg
in __alloc_pages_nodemask()/free_pages_prepare() for any page allocated
with __GFP_ACCOUNT, including those that aren't actually charged to any
cgroup, i.e. allocated from the root cgroup context. To avoid overhead
in case cgroups are not used, we only do that if memcg_kmem_enabled() is
true. The latter is set iff there are kmem-enabled memory cgroups
(online or offline). The root cgroup is not considered kmem-enabled.
As a result, if a page is allocated with __GFP_ACCOUNT for the root
cgroup when there are kmem-enabled memory cgroups and is freed after all
kmem-enabled memory cgroups were removed, e.g.
# no memory cgroups has been created yet, create one
mkdir /sys/fs/cgroup/memory/test
# run something allocating pages with __GFP_ACCOUNT, e.g.
# a program using pipe
dmesg | tail
# remove the memory cgroup
rmdir /sys/fs/cgroup/memory/test
we'll get a bad page state bug complaining about page->_mapcount != -1:
BUG: Bad page state in process swapper/0 pfn:1fd945c
page:ffffea007f651700 count:0 mapcount:-511 mapping: (null) index:0x0
flags: 0x1000000000000000()
To avoid that, let's mark with PageKmemcg only those pages that are
actually charged to and hence pin a non-root memory cgroup.
Fixes: 4949148ad433 ("mm: charge/uncharge kmemcg from generic page allocator paths")
Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The symbols used in the tick_stop tracepoint were not being converted
properly into integers in the tick_stop format file. Instead we had this:
print fmt: "success=%d dependency=%s", REC->success,
__print_symbolic(REC->dependency, { 0, "NONE" },
{ (1 << TICK_DEP_BIT_POSIX_TIMER), "POSIX_TIMER" },
{ (1 << TICK_DEP_BIT_PERF_EVENTS), "PERF_EVENTS" },
{ (1 << TICK_DEP_BIT_SCHED), "SCHED" },
{ (1 << TICK_DEP_BIT_CLOCK_UNSTABLE), "CLOCK_UNSTABLE" })
User space tools have no idea how to parse "TICK_DEP_BIT_SCHED" or the other
symbols used to do the bit shifting. The reason is that the conversion was
done using the TICK_DEP_MASK_* symbols, which are just macros that expand
to the bit shift itself (with the exception of NONE, which was converted
properly because it doesn't use bits and is defined as zero).
The TICK_DEP_BIT_* symbols need to be declared with TRACE_DEFINE_ENUM() in
order to have them properly converted for user space tools to parse this
event.
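Roughly, the trace header needs declarations of the following form so
that the TICK_DEP_BIT_* names used by __print_symbolic() get exported
together with their numeric values (a sketch, not the complete diff):

	/* In the tracepoint header, before the TRACE_EVENT() definition: */
	TRACE_DEFINE_ENUM(TICK_DEP_BIT_POSIX_TIMER);
	TRACE_DEFINE_ENUM(TICK_DEP_BIT_PERF_EVENTS);
	TRACE_DEFINE_ENUM(TICK_DEP_BIT_SCHED);
	TRACE_DEFINE_ENUM(TICK_DEP_BIT_CLOCK_UNSTABLE);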
Cc: stable@vger.kernel.org
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Fixes: e6e6cc22e067 ("nohz: Use enum code for tick stop failure tracing message")
Reported-by: Luiz Capitulino <lcapitulino@redhat.com>
Tested-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
cirrus_modeset_init() is initializing/registering the emulated fbdev
and, since commit c61b93fe51b1 ("drm/atomic: Fix remaining places where
!funcs->best_encoder is valid"), DRM internals can access/test some of
the fields in mode_config->funcs as part of the fbdev registration
process.
Make sure dev->mode_config.funcs is properly set to avoid dereferencing
a NULL pointer.
Reported-by: Mike Marshall <hubcap@omnibond.com>
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Fixes: c61b93fe51b1 ("drm/atomic: Fix remaining places where !funcs->best_encoder is valid")
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
This adds support for building more complex gcc plugins that live in a
subdirectory instead of just in a single source file.
Reported-by: PaX Team <pageexec@freemail.hu>
Signed-off-by: Emese Revfy <re.emese@gmail.com>
[kees: clarified commit message]
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
There's no reason to repeat the same names in the Makefile when the .so
files have already been listed. The .o list can be generated from them.
Reported-by: PaX Team <pageexec@freemail.hu>
Signed-off-by: Emese Revfy <re.emese@gmail.com>
[kees: clarified commit message]
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
The latent_entropy plugin needs to pass arguments, so this adds the
support.
Signed-off-by: Emese Revfy <re.emese@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
When the compiler doesn't support gcc plugins (either due to missing
headers or too old a version), report the problem and abort the build
instead of emitting a warning and letting the build founder with arcane
compiler errors.
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
The gcc-plugins arguments should not be included when performing
cc-option tests.
Steps to reproduce:
1) make mrproper
2) make defconfig
3) enable GCC_PLUGINS, GCC_PLUGIN_CYC_COMPLEXITY
4) enable FUNCTION_TRACER (it will select other options as well)
5) make && make modules
Build errors:
MODPOST 18 modules
ERROR: "__fentry__" [net/netfilter/xt_nat.ko] undefined!
ERROR: "__fentry__" [net/netfilter/xt_mark.ko] undefined!
ERROR: "__fentry__" [net/netfilter/xt_addrtype.ko] undefined!
ERROR: "__fentry__" [net/netfilter/xt_LOG.ko] undefined!
ERROR: "__fentry__" [net/netfilter/nf_nat_sip.ko] undefined!
ERROR: "__fentry__" [net/netfilter/nf_nat_irc.ko] undefined!
ERROR: "__fentry__" [net/netfilter/nf_nat_ftp.ko] undefined!
ERROR: "__fentry__" [net/netfilter/nf_nat.ko] undefined!
Reported-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Emese Revfy <re.emese@gmail.com>
[kees: renamed variable, clarified commit message]
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
According to E-EDID spec 1.3, table 3.9, a digital video sink with the
"DFP 1.x compliant TMDS" bit set is "signal compatible with VESA DFP 1.x
TMDS CRGB, 1 pixel / clock, up to 8 bits / color MSB aligned".
For such displays, the DFP spec 1.0, section 3.10 "EDID support" says:
"If the DFP monitor only supports EDID 1.X (1.1, 1.2, etc.)
without extensions, the host will make the following assumptions:
1. 24-bit MSB-aligned RGB TFT
2. DE polarity is active high
3. H and V syncs are active high
4. Established CRT timings will be used
5. Dithering will not be enabled on the host"
So if we don't know the bit depth of the display from additional
colorimetry info we should assume 8 bpc / 24 bpp by default.
This patch adds an info->bpc = 8 assignment for that case.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
This reverts commit 013dd9e03872
("drm/i915/dp: fall back to 18 bpp when sink capability is unknown")
This commit introduced a regression into stable kernels,
as it reduces output color depth to 6 bpc for any video
sink connected to a DisplayPort connector if that sink
doesn't report a specific color depth via EDID, or if
our EDID parser doesn't actually recognize the proper
bpc from EDID.
Affected are active DisplayPort->VGA converters and
active DisplayPort->DVI converters. Both should be
able to handle 8 bpc, but are degraded to 6 bpc with
this patch.
The reverted commit was meant to fix
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=105331
A followup patch implements a fix for that specific bug, which is
caused by a faulty EDID of the affected DP panel, by adding a new
EDID quirk for that panel.
DP 18 bpp fallback handling and other improvements to
DP sink bpc detection will be handled for future
kernels in a separate series of patches.
Please backport to stable.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Cc: stable@vger.kernel.org
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=105331
reports that the "AEO model 0" display is driven with 8 bpc
without dithering by default, which looks bad because that
panel is apparently a 6 bpc DP panel with faulty EDID.
A fix for this was made by commit 013dd9e03872
("drm/i915/dp: fall back to 18 bpp when sink capability is unknown").
That commit triggers new regressions in precision for DP->DVI and
DP->VGA displays. A patch is out to revert that commit, but it will
revert video output for the AEO model 0 panel to 8 bpc without
dithering.
The EDID 1.3 of that panel, as decoded from the xrandr output
attached to that bugzilla bug report, is somewhat faulty, and beyond
other problems also sets the "DFP 1.x compliant TMDS" bit, which
according to DFP spec means to drive the panel with 8 bpc and
no dithering in absence of other colorimetry information.
Try to make the original bug reporter happy despite the
faulty EDID by adding a quirk to mark that panel as 6 bpc,
so 6 bpc output with dithering creates a nice picture.
Tested by injecting the EDID from the fdo bug into a DP connector
via drm_kms_helper.edid_firmware and verifying that 6 bpc + dithering
is selected.
This patch should be backported to stable.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: stable@vger.kernel.org
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
When I initially added the unsafe_[get|put]_user() helpers in commit
5b24a7a2aa20 ("Add 'unsafe' user access functions for batched
accesses"), I made the mistake of modeling the interface on our
traditional __[get|put]_user() functions, which return zero on success,
or -EFAULT on failure.
That interface is fairly easy to use, but it's actually fairly nasty for
good code generation, since it essentially forces the caller to check
the error value for each access.
In particular, since the error handling is already internally
implemented with an exception handler, and we already use "asm goto" for
various other things, we could fairly easily make the error cases just
jump directly to an error label instead, and avoid the need for explicit
checking after each operation.
So switch the interface to pass in an error label, rather than checking
the error value in the caller. Best do it now before we start growing
more users (the signal handling code in particular would be a good place
to use the new interface).
So rather than

	if (unsafe_get_user(x, ptr))
		... handle error ..

the interface is now

	unsafe_get_user(x, ptr, label);

where an error during the user mode fetch will now just cause a jump to
'label' in the caller.
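As a slightly fuller sketch, a hypothetical caller batching two fetches
inside the user_access_begin()/user_access_end() window from that same
series could look like this (read_pair() is a made-up example, not
actual kernel code):

	int read_pair(unsigned int __user *uaddr, unsigned int *a, unsigned int *b)
	{
		user_access_begin();
		unsafe_get_user(*a, &uaddr[0], efault);
		unsafe_get_user(*b, &uaddr[1], efault);
		user_access_end();
		return 0;

	efault:
		user_access_end();
		return -EFAULT;
	}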
Right now the actual _implementation_ of this all still ends up being an
"if (err) goto label", and does not take advantage of any exception
label tricks, but for "unsafe_put_user()" in particular it should be
fairly straightforward to convert to using the exception table model.
Note that "unsafe_get_user()" is much harder to convert to a clever
exception table model, because current versions of gcc do not allow the
use of "asm goto" (for the exception) with output values (for the actual
value to be fetched). But that is hopefully not a limitation in the
long term.
[ Also note that it might be a good idea to switch unsafe_get_user() to
actually _return_ the value it fetches from user space, but this
commit only changes the error handling semantics ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In commit 874f9c7da9a4 ("printk: create pr_<level> functions"), new
pr_level defines were added to printk.c.
These new defines are guarded by an #ifdef CONFIG_PRINTK - however,
there is already a surrounding #ifdef CONFIG_PRINTK starting a lot
earlier, at line 249, which means the newly introduced #ifdef is
unnecessary.
Let's remove it to avoid confusion.
Signed-off-by: Andreas Ziegler <andreas.ziegler@fau.de>
Cc: Joe Perches <joe@perches.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
WMI event 0xe00e is received when the battery is removed or inserted.
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
|
|
The caller expects %rdi to remain intact, so push+pop it to make that
happen. This fixes the following kind of explosion on my Core 2 Duo
machine when trying to reboot or shut down:
general protection fault: 0000 [#1] PREEMPT SMP
Modules linked in: i915 i2c_algo_bit drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea drm netconsole configfs binfmt_misc iTCO_wdt psmouse pcspkr snd_hda_codec_idt e100 coretemp hwmon snd_hda_codec_generic i2c_i801 mii i2c_smbus lpc_ich mfd_core snd_hda_intel uhci_hcd snd_hda_codec snd_hwdep snd_hda_core ehci_pci 8250 ehci_hcd snd_pcm 8250_base usbcore evdev serial_core usb_common parport_pc parport snd_timer snd soundcore
CPU: 0 PID: 3070 Comm: reboot Not tainted 4.8.0-rc1-perf-dirty #69
Hardware name: /D946GZIS, BIOS TS94610J.86A.0087.2007.1107.1049 11/07/2007
task: ffff88012a0b4080 task.stack: ffff880123850000
RIP: 0010:[<ffffffff81003c92>] [<ffffffff81003c92>] x86_perf_event_update+0x52/0xc0
RSP: 0018:ffff880123853b60 EFLAGS: 00010087
RAX: 0000000000000001 RBX: ffff88012fc0a3c0 RCX: 000000000000001e
RDX: 0000000000000000 RSI: 0000000040000000 RDI: ffff88012b014800
RBP: ffff880123853b88 R08: ffffffffffffffff R09: 0000000000000000
R10: ffffea0004a012c0 R11: ffffea0004acedc0 R12: ffffffff80000001
R13: ffff88012b0149c0 R14: ffff88012b014800 R15: 0000000000000018
FS: 00007f8b155cd700(0000) GS:ffff88012fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8b155f5000 CR3: 000000012a2d7000 CR4: 00000000000006f0
Stack:
ffff88012fc0a3c0 ffff88012b014800 0000000000000004 0000000000000001
ffff88012fc1b750 ffff880123853bb0 ffffffff81003d59 ffff88012b014800
ffff88012fc0a3c0 ffff88012b014800 ffff880123853bd8 ffffffff81003e13
Call Trace:
[<ffffffff81003d59>] x86_pmu_stop+0x59/0xd0
[<ffffffff81003e13>] x86_pmu_del+0x43/0x140
[<ffffffff8111705d>] event_sched_out.isra.105+0xbd/0x260
[<ffffffff8111738d>] __perf_remove_from_context+0x2d/0xb0
[<ffffffff8111745d>] __perf_event_exit_context+0x4d/0x70
[<ffffffff810c8826>] generic_exec_single+0xb6/0x140
[<ffffffff81117410>] ? __perf_remove_from_context+0xb0/0xb0
[<ffffffff81117410>] ? __perf_remove_from_context+0xb0/0xb0
[<ffffffff810c898f>] smp_call_function_single+0xdf/0x140
[<ffffffff81113d27>] perf_event_exit_cpu_context+0x87/0xc0
[<ffffffff81113d73>] perf_reboot+0x13/0x40
[<ffffffff8107578a>] notifier_call_chain+0x4a/0x70
[<ffffffff81075ad7>] __blocking_notifier_call_chain+0x47/0x60
[<ffffffff81075b06>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff81076a1d>] kernel_restart_prepare+0x1d/0x40
[<ffffffff81076ae2>] kernel_restart+0x12/0x60
[<ffffffff81076d56>] SYSC_reboot+0xf6/0x1b0
[<ffffffff811a823c>] ? mntput_no_expire+0x2c/0x1b0
[<ffffffff811a83e4>] ? mntput+0x24/0x40
[<ffffffff811894fc>] ? __fput+0x16c/0x1e0
[<ffffffff811895ae>] ? ____fput+0xe/0x10
[<ffffffff81072fc3>] ? task_work_run+0x83/0xa0
[<ffffffff81001623>] ? exit_to_usermode_loop+0x53/0xc0
[<ffffffff8100105a>] ? trace_hardirqs_on_thunk+0x1a/0x1c
[<ffffffff81076e6e>] SyS_reboot+0xe/0x10
[<ffffffff814c4ba5>] entry_SYSCALL_64_fastpath+0x18/0xa3
Code: 7c 4c 8d af c0 01 00 00 49 89 fe eb 10 48 09 c2 4c 89 e0 49 0f b1 55 00 4c 39 e0 74 35 4d 8b a6 c0 01 00 00 41 8b 8e 60 01 00 00 <0f> 33 8b 35 6e 02 8c 00 48 c1 e2 20 85 f6 7e d2 48 89 d3 89 cf
RIP [<ffffffff81003c92>] x86_perf_event_update+0x52/0xc0
RSP <ffff880123853b60>
---[ end trace 7ec95181faf211be ]---
note: reboot[3070] exited with preempt_count 2
Cc: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Fixes: f5967101e9de ("x86/hweight: Get rid of the special calling convention")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
drm_connector_register_all requires a few too many locks because our
connector_list locking is busted. Add another FIXME+hack to work
around this. This should address the below lockdep splat:
======================================================
[ INFO: possible circular locking dependency detected ]
4.7.0-rc5+ #524 Tainted: G O
-------------------------------------------------------
kworker/u8:0/6 is trying to acquire lock:
(&dev->mode_config.mutex){+.+.+.}, at: [<ffffffff815afde0>] drm_modeset_lock_all+0x40/0x120
but task is already holding lock:
((fb_notifier_list).rwsem){++++.+}, at: [<ffffffff810ac195>] __blocking_notifier_call_chain+0x35/0x70
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 ((fb_notifier_list).rwsem){++++.+}:
[<ffffffff810df611>] lock_acquire+0xb1/0x200
[<ffffffff819a55b4>] down_write+0x44/0x80
[<ffffffff810abf91>] blocking_notifier_chain_register+0x21/0xb0
[<ffffffff814c7448>] fb_register_client+0x18/0x20
[<ffffffff814c6c86>] backlight_device_register+0x136/0x260
[<ffffffffa0127eb2>] intel_backlight_device_register+0xa2/0x160 [i915]
[<ffffffffa00f46be>] intel_connector_register+0xe/0x10 [i915]
[<ffffffffa0112bfb>] intel_dp_connector_register+0x1b/0x80 [i915]
[<ffffffff8159dfea>] drm_connector_register+0x4a/0x80
[<ffffffff8159fe44>] drm_connector_register_all+0x64/0xf0
[<ffffffff815a2a64>] drm_modeset_register_all+0x174/0x1c0
[<ffffffff81599b72>] drm_dev_register+0xc2/0xd0
[<ffffffffa00621d7>] i915_driver_load+0x1547/0x2200 [i915]
[<ffffffffa006d80f>] i915_pci_probe+0x4f/0x70 [i915]
[<ffffffff814a2135>] local_pci_probe+0x45/0xa0
[<ffffffff814a349b>] pci_device_probe+0xdb/0x130
[<ffffffff815c07e3>] driver_probe_device+0x223/0x440
[<ffffffff815c0ad5>] __driver_attach+0xd5/0x100
[<ffffffff815be386>] bus_for_each_dev+0x66/0xa0
[<ffffffff815c002e>] driver_attach+0x1e/0x20
[<ffffffff815bf9be>] bus_add_driver+0x1ee/0x280
[<ffffffff815c1810>] driver_register+0x60/0xe0
[<ffffffff814a1a10>] __pci_register_driver+0x60/0x70
[<ffffffffa01a905b>] i915_init+0x5b/0x62 [i915]
[<ffffffff8100042d>] do_one_initcall+0x3d/0x150
[<ffffffff811a935b>] do_init_module+0x5f/0x1d9
[<ffffffff81124416>] load_module+0x20e6/0x27e0
[<ffffffff81124d63>] SYSC_finit_module+0xc3/0xf0
[<ffffffff81124dae>] SyS_finit_module+0xe/0x10
[<ffffffff819a83a9>] entry_SYSCALL_64_fastpath+0x1c/0xac
-> #0 (&dev->mode_config.mutex){+.+.+.}:
[<ffffffff810df0ac>] __lock_acquire+0x10fc/0x1260
[<ffffffff810df611>] lock_acquire+0xb1/0x200
[<ffffffff819a3097>] mutex_lock_nested+0x67/0x3c0
[<ffffffff815afde0>] drm_modeset_lock_all+0x40/0x120
[<ffffffff8158f79b>] drm_fb_helper_restore_fbdev_mode_unlocked+0x2b/0x80
[<ffffffff8158f81d>] drm_fb_helper_set_par+0x2d/0x50
[<ffffffffa0105f7a>] intel_fbdev_set_par+0x1a/0x60 [i915]
[<ffffffff814c13c6>] fbcon_init+0x586/0x610
[<ffffffff8154d16a>] visual_init+0xca/0x130
[<ffffffff8154e611>] do_bind_con_driver+0x1c1/0x3a0
[<ffffffff8154eaf6>] do_take_over_console+0x116/0x180
[<ffffffff814bd3a7>] do_fbcon_takeover+0x57/0xb0
[<ffffffff814c1e48>] fbcon_event_notify+0x658/0x750
[<ffffffff810abcae>] notifier_call_chain+0x3e/0xb0
[<ffffffff810ac1ad>] __blocking_notifier_call_chain+0x4d/0x70
[<ffffffff810ac1e6>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff814c748b>] fb_notifier_call_chain+0x1b/0x20
[<ffffffff814c86b1>] register_framebuffer+0x251/0x330
[<ffffffff8158fa9f>] drm_fb_helper_initial_config+0x25f/0x3f0
[<ffffffffa0106b48>] intel_fbdev_initial_config+0x18/0x30 [i915]
[<ffffffff810adfd8>] async_run_entry_fn+0x48/0x150
[<ffffffff810a3947>] process_one_work+0x1e7/0x750
[<ffffffff810a3efb>] worker_thread+0x4b/0x4f0
[<ffffffff810aad4f>] kthread+0xef/0x110
[<ffffffff819a85ef>] ret_from_fork+0x1f/0x40
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock((fb_notifier_list).rwsem);
lock(&dev->mode_config.mutex);
lock((fb_notifier_list).rwsem);
lock(&dev->mode_config.mutex);
*** DEADLOCK ***
6 locks held by kworker/u8:0/6:
#0: ("events_unbound"){.+.+.+}, at: [<ffffffff810a38c9>] process_one_work+0x169/0x750
#1: ((&entry->work)){+.+.+.}, at: [<ffffffff810a38c9>] process_one_work+0x169/0x750
#2: (registration_lock){+.+.+.}, at: [<ffffffff814c8487>] register_framebuffer+0x27/0x330
#3: (console_lock){+.+.+.}, at: [<ffffffff814c86ce>] register_framebuffer+0x26e/0x330
#4: (&fb_info->lock){+.+.+.}, at: [<ffffffff814c78dd>] lock_fb_info+0x1d/0x40
#5: ((fb_notifier_list).rwsem){++++.+}, at: [<ffffffff810ac195>] __blocking_notifier_call_chain+0x35/0x70
stack backtrace:
CPU: 2 PID: 6 Comm: kworker/u8:0 Tainted: G O 4.7.0-rc5+ #524
Hardware name: Intel Corp. Broxton P/NOTEBOOK, BIOS APLKRVPA.X64.0138.B33.1606250842 06/25/2016
Workqueue: events_unbound async_run_entry_fn
0000000000000000 ffff8800758577f0 ffffffff814507a5 ffffffff828b9900
ffffffff828b9900 ffff880075857830 ffffffff810dc6fa ffff880075857880
ffff88007584d688 0000000000000005 0000000000000006 ffff88007584d6b0
Call Trace:
[<ffffffff814507a5>] dump_stack+0x67/0x92
[<ffffffff810dc6fa>] print_circular_bug+0x1aa/0x200
[<ffffffff810df0ac>] __lock_acquire+0x10fc/0x1260
[<ffffffff810df611>] lock_acquire+0xb1/0x200
[<ffffffff815afde0>] ? drm_modeset_lock_all+0x40/0x120
[<ffffffff815afde0>] ? drm_modeset_lock_all+0x40/0x120
[<ffffffff819a3097>] mutex_lock_nested+0x67/0x3c0
[<ffffffff815afde0>] ? drm_modeset_lock_all+0x40/0x120
[<ffffffff810fa85f>] ? rcu_read_lock_sched_held+0x7f/0x90
[<ffffffff81208218>] ? kmem_cache_alloc_trace+0x248/0x2b0
[<ffffffff815afdc5>] ? drm_modeset_lock_all+0x25/0x120
[<ffffffff815afde0>] drm_modeset_lock_all+0x40/0x120
[<ffffffff8158f79b>] drm_fb_helper_restore_fbdev_mode_unlocked+0x2b/0x80
[<ffffffff8158f81d>] drm_fb_helper_set_par+0x2d/0x50
[<ffffffffa0105f7a>] intel_fbdev_set_par+0x1a/0x60 [i915]
[<ffffffff814c13c6>] fbcon_init+0x586/0x610
[<ffffffff8154d16a>] visual_init+0xca/0x130
[<ffffffff8154e611>] do_bind_con_driver+0x1c1/0x3a0
[<ffffffff8154eaf6>] do_take_over_console+0x116/0x180
[<ffffffff814bd3a7>] do_fbcon_takeover+0x57/0xb0
[<ffffffff814c1e48>] fbcon_event_notify+0x658/0x750
[<ffffffff810abcae>] notifier_call_chain+0x3e/0xb0
[<ffffffff810ac1ad>] __blocking_notifier_call_chain+0x4d/0x70
[<ffffffff810ac1e6>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff814c748b>] fb_notifier_call_chain+0x1b/0x20
[<ffffffff814c86b1>] register_framebuffer+0x251/0x330
[<ffffffff815b7e8d>] ? vga_switcheroo_client_fb_set+0x5d/0x70
[<ffffffff8158fa9f>] drm_fb_helper_initial_config+0x25f/0x3f0
[<ffffffffa0106b48>] intel_fbdev_initial_config+0x18/0x30 [i915]
[<ffffffff810adfd8>] async_run_entry_fn+0x48/0x150
[<ffffffff810a3947>] process_one_work+0x1e7/0x750
[<ffffffff810a38c9>] ? process_one_work+0x169/0x750
[<ffffffff810a3efb>] worker_thread+0x4b/0x4f0
[<ffffffff810a3eb0>] ? process_one_work+0x750/0x750
[<ffffffff810aad4f>] kthread+0xef/0x110
[<ffffffff819a85ef>] ret_from_fork+0x1f/0x40
[<ffffffff810aac60>] ? kthread_stop+0x2e0/0x2e0
v2: Rebase onto the right branch (hand-editing patches ftw) and add more
reporters.
Reported-by: Imre Deak <imre.deak@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Reported-by: Jiri Kosina <jikos@kernel.org>
Cc: Jiri Kosina <jikos@kernel.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
The conversion of the rcar-du driver from the I2C slave encoder to the
DRM bridge API left the HDMI encoder's bridge pointer NULL, preventing
the bridge from being handled automatically by the DRM core. Fix it.
Fixes: 1d926114d8f4 ("drm: rcar-du: Remove i2c slave encoder interface for hdmi encoder")
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
|
|
Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower
portion and the op code in the higher portion. This means that
old code that relies on manually setting bi_rw is most likely
going to be broken. Instead of letting that brokenness linger,
rename the member to force old and out-of-tree code to break
at compile time instead of at runtime.
No intended functional changes in this commit.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
The original commit missed this function; it needs to mark it as a
write flush.
Cc: Mike Christie <mchristi@redhat.com>
Fixes: e742fc32fcb4 ("target: use bio op accessors")
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Cleaner than manipulating bio->bi_rw flags directly.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Commit abf545484d31 changed it from an 'rw' flags type to the
newer ops based interface, but now we're effectively leaking
some bdev internals to the rest of the kernel. Since we only
care about whether it's a read or a write at that level, just
pass in a bool 'is_write' parameter instead.
Then we can also move op_is_write() and friends back under
CONFIG_BLOCK protection.
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
In most cases, EPERM is returned on an immutable inode, and there are
only a few places returning EACCES. I noticed this when running LTP on
overlayfs: setxattr03 failed due to an unexpected EACCES on an immutable
inode.
So convert all EACCES to EPERM on immutable inodes.
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
persistent_ram_zone (prz) structures are allocated by persistent_ram_new(),
which involves vmap() or ioremap(), but they are currently freed by
kfree(). Use persistent_ram_free() instead to correct this asymmetry.
Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.kw@hitachi.com>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Seiji Aguchi <seiji.aguchi.tr@hitachi.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Instead of a ramoops-specific node, use a child node of /reserved-memory.
This requires that of_platform_device_create() be explicitly called
for the node, though, since "/reserved-memory" does not have its own
"compatible" property.
Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Rob Herring <robh@kernel.org>
|
|
Fixes hangs under memory pressure, e.g. running the piglit test
tex3d-maxsize concurrently with other tests.
Fixes: 17d33bc9d6ef ("drm/ttm: drop waiting for idle in ttm_bo_evict.")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Clean up a duplicated expression by replacing it with the equivalent local
variable pdev.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
It will be useful to know the hardware-configured BAR size to diagnose
issues with NTB memory windows.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
This script automates testing doorbells, scratchpads and memory windows
for an NTB device. It can be run locally, with the NTB looped
back to the same host, or it can use SSH to remotely control the second
host.
In the single host case, the script just needs to be passed two
arguments: a PCI ID for each side of the link. In the two host case
the -r option must be used to specify the remote hostname (which must
be SSH accessible and should probably have ssh-keys exchanged).
A sample run looks like this:
$ sudo ./ntb_test.sh 0000:03:00.1 0000:83:00.1 -p 29
Starting ntb_tool tests...
Running link tests on: 0000:03:00.1 / 0000:83:00.1
Passed
Running link tests on: 0000:83:00.1 / 0000:03:00.1
Passed
Running db tests on: 0000:03:00.1 / 0000:83:00.1
Passed
Running db tests on: 0000:83:00.1 / 0000:03:00.1
Passed
Running spad tests on: 0000:03:00.1 / 0000:83:00.1
Passed
Running spad tests on: 0000:83:00.1 / 0000:03:00.1
Passed
Running mw0 tests on: 0000:03:00.1 / 0000:83:00.1
Passed
Running mw0 tests on: 0000:83:00.1 / 0000:03:00.1
Passed
Running mw1 tests on: 0000:03:00.1 / 0000:83:00.1
Passed
Running mw1 tests on: 0000:83:00.1 / 0000:03:00.1
Passed
Starting ntb_pingpong tests...
Running ping pong tests on: 0000:03:00.1 / 0000:83:00.1
Passed
Starting ntb_perf tests...
Running local perf test without DMA
0: copied 536870912 bytes in 164453 usecs, 3264 MBytes/s
Passed
Running remote perf test without DMA
0: copied 536870912 bytes in 164453 usecs, 3264 MBytes/s
Passed
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
Acked-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
When the link goes down, the link_is_up flag does not return to
false. This could cause some subtle corner-case bugs
when the link goes up and down quickly.
Once that was fixed, a race was found if the link was brought down
and then immediately back up. The link_cleanup work would
occasionally be scheduled after the next link up event. This would
cancel the link_work that was supposed to occur and leave ntb_perf
in an unusable state.
To fix this we get rid of the link_cleanup work and put the actions
directly in the link_down event.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
This commit adds a debugfs 'count' file to ntb_pingpong. This is so
testing with ntb_pingpong can be automated beyond just checking the
logs for pong messages.
The count file returns a number which increments every pong. The
counter can be cleared by writing a zero.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
In order to more successfully script with ntb_tool it's useful to
have a link file to check the link status so that the script
doesn't use the other files until the link is up.
This commit adds a 'link' file to the debugfs directory which reads
boolean (Y or N) depending on the link status. Writing to the file
changes the link state using ntb_link_enable() or ntb_link_disable().
A 'link_event' file is also provided so an application can block until
the link changes to the desired state. If the user writes a 1, it will
block until the link is up. If the user writes a 0, it will block until
the link is down.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
In order to make the interface closer to the raw NTB API, this commit
changes memory windows so they are not initialized on link up.
Instead, the 'peer_trans*' debugfs files are introduced. When read,
they return information provided by ntb_mw_get_range. When written,
they create a buffer and initialize the memory window. The
value written is taken as the requested size of the buffer (which
is then rounded for alignment). Writing a value of zero frees the buffer
and tears down the memory window translation. The 'peer_mw*' file is
only created once the memory window translation is setup by the user.
Additionally, it was noticed that the read and write functions for the
'peer_mw*' files should have checked for a NULL pointer.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Instead of returning immediately with an error when the link is
down, wait for the link to come up (or for the user to send a SIGINT).
This is to make scripting ntb_perf easier.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|