path: root/kernel
Age         Commit message  (Author; files changed, -deleted/+added lines)
2016-01-08  Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds; 2 files, -1/+2)
Pull scheduler fixes from Ingo Molnar: "Misc scheduler fixes"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Reset task's lockless wake-queues on fork()
  sched/core: Fix unserialized r-m-w scribbling stuff
  sched/core: Check tgid in is_global_init()
  sched/fair: Fix multiplication overflow on 32-bit systems
2016-01-08Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-29/+6
Pull perf fixes from Ingo Molnar: "Two core subsystem fixes, plus a handful of tooling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix race in swevent hash perf: Fix race in perf_event_exec() perf list: Robustify event printing routine perf list: Add support for PERF_COUNT_SW_BPF_OUT perf hists browser: Fix segfault if use symbol filter in cmdline perf hists browser: Reset selection when refresh perf hists browser: Add NULL pointer check to prevent crash perf buildid-list: Fix return value of perf buildid-list -k perf buildid-list: Show running kernel build id fix
2016-01-08  Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds; 1 file, -3/+3)
Pull irq fix from Ingo Molnar: "Fixes a core IRQ subsystem deadlock"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Prevent chip buslock deadlock
2016-01-07  ftrace: Fix the race between ftrace and insmod  (Qiu Peiyang; 1 file, -9/+16)
We hit an ftrace_bug report when booting Android on a 64-bit ATOM SoC chip. Basically, there is a race between insmod and ftrace_run_update_code().

After load_module=>ftrace_module_init, another thread jumps in to call ftrace_run_update_code=>ftrace_arch_code_modify_prepare=>set_all_modules_text_rw, which marks all modules read-write. Since the new module is still at MODULE_STATE_UNFORMED, its text attribute is not changed. The second thread then goes ahead and modifies the code. However, load_module continues on to call complete_formation=>set_section_ro_nx, so the second thread fails when it probes the module's text.

The patch fixes this by using a module notifier to delay the enabling of ftrace records until the module reaches MODULE_STATE_COMING.

Link: http://lkml.kernel.org/r/567CE628.3000609@intel.com

Signed-off-by: Qiu Peiyang <peiyangx.qiu@intel.com>
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
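[Editor's sketch] As a rough illustration of the approach (not the literal patch; the callback body and registration details here are assumptions), deferring the enable step via the module notifier chain looks like:

    /* Rough sketch; the real patch wires this into ftrace's existing module
     * handling, with ftrace_module_enable() as the enabling step. */
    static int ftrace_module_notify(struct notifier_block *self,
                                    unsigned long val, void *data)
    {
            struct module *mod = data;

            /* Only flip the module's records on once it is fully formed. */
            if (val == MODULE_STATE_COMING)
                    ftrace_module_enable(mod);

            return NOTIFY_DONE;
    }

    static struct notifier_block ftrace_module_nb = {
            .notifier_call = ftrace_module_notify,
    };

    /* registered once at boot: register_module_notifier(&ftrace_module_nb); */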
2016-01-07  Merge tag 'trace-v4.4-rc4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace  (Linus Torvalds; 1 file, -0/+6)
Pull ftrace fix from Steven Rostedt: "PeiyangX Qiu reported that if a module fails to load between calling ftrace_module_init() and do_init_module(), the allocations made in ftrace_module_init() will not be freed, resulting in a memory leak.

The solution is to call ftrace_release_mod() on the failing module in the fail path before do_init_module() is called. This will remove any allocations made for that module, and do nothing if ftrace_module_init() wasn't called yet for that module.

Note, once do_init_module() is called, the MODULE_GOING notifiers are called for the failed module, which call into the ftrace code to do the proper clean up (basically calling ftrace_release_mod())"

* tag 'trace-v4.4-rc4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace/module: Call clean up function when module init fails early
2016-01-07  ftrace: Add infrastructure for delayed enabling of module functions  (Steven Rostedt (Red Hat); 1 file, -55/+106)
Qiu Peiyang pointed out that there's a race between enabling function tracing and loading a module. To convert the nops in the prologue of functions into callbacks, the text needs to be changed from read-only to read-write. When function tracing is enabled, the text permission is updated, the functions are modified, and then the permission is restored. When a module is loaded, the updates that convert function calls to mcount are done before the module text is set to read-only. But once that is done, the module text is visible to the function tracer. Thus we have the following race:

	CPU 0                             CPU 1
	-----                             -----
	start function tracing
	set text to read-write
	                                  load_module
	                                  add functions to ftrace
	                                  set module text read-only

	update all functions to callbacks
	modify module functions too
	< Can't, it's read-only >

When this happens, ftrace detects the issue and disables itself until the next reboot.

To fix this, a new DISABLED flag is added for ftrace records, which all module functions get when they are added. Later, after the module code is fully set up, the records have the DISABLED flag cleared, and they are enabled if any callback wants all functions to be traced.

Note, this patch doesn't yet add the delay. It simply changes ftrace_module_init() to both set the DISABLED records and then immediately call the enable code. This helps with testing the new code, as it has the same behavior as before. Another change will come after this to have ftrace_module_enable() called after the text is set to read-only.

Cc: Qiu Peiyang <peiyangx.qiu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
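[Editor's sketch] A loose sketch of the flag handling; for_each_mod_rec() is a made-up stand-in for ftrace's real walk over the module's dyn_ftrace records, and the enabling condition is simplified:

    void ftrace_module_init(struct module *mod)
    {
            struct dyn_ftrace *rec;

            for_each_mod_rec(mod, rec)
                    rec->flags |= FTRACE_FL_DISABLED;   /* not traceable yet */

            ftrace_module_enable(mod);  /* for now, enable immediately */
    }

    void ftrace_module_enable(struct module *mod)
    {
            struct dyn_ftrace *rec;

            for_each_mod_rec(mod, rec) {
                    rec->flags &= ~FTRACE_FL_DISABLED;
                    if (ftrace_start_up)        /* some callback traces everything */
                            __ftrace_replace_code(rec, 1);
            }
    }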
2016-01-07  ftrace/module: Call clean up function when module init fails early  (Steven Rostedt (Red Hat); 1 file, -0/+6)
If the module init code fails after calling ftrace_module_init() and before calling do_init_module(), we can suffer from a memory leak. This is because ftrace_module_init() allocates pages to store the locations that ftrace hooks are placed in the module text.

If do_init_module() fails, it still calls the MODULE_GOING notifiers which will tell ftrace to do a clean up of the pages it allocated for the module. But if load_module() fails before then, the pages allocated by ftrace_module_init() will never be freed.

Call ftrace_release_mod() on the module if load_module() fails before getting to do_init_module().

Link: http://lkml.kernel.org/r/567CEA31.1070507@intel.com

Reported-by: "Qiu, PeiyangX" <peiyangx.qiu@intel.com>
Fixes: a949ae560a511 "ftrace/module: Hardcode ftrace_module_init() call into load_module()"
Cc: stable@vger.kernel.org # v2.6.38+
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
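[Editor's sketch] The shape of the fix, sketched against load_module()'s error path; the label names and exact placement are assumptions:

    /* in load_module(), after ftrace_module_init(mod) has run: */
            err = complete_formation(mod, info);
            if (err)
                    goto bug_cleanup;

            return do_init_module(mod); /* MODULE_GOING notifier cleans up from here on */

     bug_cleanup:
            ftrace_release_mod(mod);    /* the fix: free ftrace's per-module pages */
            module_bug_cleanup(mod);
            /* ... remaining unwinding ... */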
2016-01-07  workqueue: simplify the apply_workqueue_attrs_locked()  (wanghaibin; 1 file, -7/+4)
If apply_wqattrs_prepare() returns NULL, it has already cleaned up the related resources, so we can return directly and avoid calling the cleanup function again. This doesn't introduce any functional changes.

Signed-off-by: wanghaibin <wanghaibin.wang@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
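[Editor's sketch] The simplified function plausibly reduces to the following:

    static int apply_workqueue_attrs_locked(struct workqueue_struct *wq,
                                            const struct workqueue_attrs *attrs)
    {
            struct apply_wqattrs_ctx *ctx;

            ctx = apply_wqattrs_prepare(wq, attrs);
            if (!ctx)
                    return -ENOMEM; /* prepare already cleaned up after itself */

            /* the ctx was prepared successfully, commit it */
            apply_wqattrs_commit(ctx);
            apply_wqattrs_cleanup(ctx);

            return 0;
    }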
2016-01-06  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (David S. Miller; 1 file, -0/+1)
2016-01-06  Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu  (Ingo Molnar; 9 files, -217/+342)
Pull RCU changes from Paul E. McKenney:

 - Adding transitivity uniformly to rcu_node structure ->lock acquisitions. (This is implemented by the first two commits on top of v4.4-rc2 due to the pervasive nature of this change.)
 - Documentation updates, including RCU requirements.
 - Expedited grace-period changes.
 - Miscellaneous fixes.
 - Linked-list fixes, courtesy of KTSAN.
 - Torture-test updates.
 - Late-breaking fix to sysrq-generated crash.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-06  perf/core: Collapse more IPI loops  (Peter Zijlstra; 1 file, -73/+33)
This patch collapses the two 'hard' cases, which are perf_event_{dis,en}able(). I cannot seem to convince myself the current code is correct.

Starting with perf_event_disable(): we don't strictly need to test for event->state == ACTIVE; ctx->is_active is enough. If the event is not scheduled while the ctx is, __perf_event_disable() still does the right thing. It's a little less efficient to IPI in that case, but over-all simpler.

For perf_event_enable() the same goes, but I think that one is actually broken in its current form. The current condition is: ctx->is_active && event->state == OFF, which means it doesn't do anything when !ctx->active && event->state == OFF. This is wrong: it should still mark the event INACTIVE in that case, otherwise we'll still not try to schedule the event once the context becomes active again.

This patch implements the two functions using the new event_function_call() and does away with the tricky event->state tests.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
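[Editor's sketch] A condensed sketch of what perf_event_disable() becomes; the exact event_function_call() signature here is an assumption:

    void perf_event_disable(struct perf_event *event)
    {
            struct perf_event_context *ctx = event->ctx;

            raw_spin_lock_irq(&ctx->lock);
            if (event->state <= PERF_EVENT_STATE_OFF) {
                    raw_spin_unlock_irq(&ctx->lock);
                    return;                 /* already off, nothing to do */
            }
            raw_spin_unlock_irq(&ctx->lock);

            /* Runs __perf_event_disable() on the event's CPU via IPI,
             * or locally under ctx->lock when the context is not active. */
            event_function_call(event, __perf_event_disable, NULL);
    }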
2016-01-06  Merge branch 'perf/urgent' into perf/core, to pick up fixes before applying new changes  (Ingo Molnar; 1 file, -29/+6)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-06  sched/fair: Fix new task's load avg removed from source CPU in wake_up_new_task()  (Yuyang Du; 1 file, -10/+28)
If a newly created task is selected to go to a different CPU in fork balance when it wakes up the first time, its load averages should not be removed from the source CPU, since they were never added to it. The same applies to a never-used group entity.

Fix it in remove_entity_load_avg(): when the entity's last_update_time is 0, simply return. This precisely identifies the case in question, because in other migrations last_update_time is set to 0 after remove_entity_load_avg().

Reported-by: Steve Muckle <steve.muckle@linaro.org>
Signed-off-by: Yuyang Du <yuyang.du@intel.com>
[peterz: cfs_rq_last_update_time]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <Juri.Lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: http://lkml.kernel.org/r/20151216233427.GJ28098@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
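[Editor's sketch] A hedged sketch of the early return; the surrounding removal logic is simplified from the 4.4-era code:

    static void remove_entity_load_avg(struct sched_entity *se)
    {
            struct cfs_rq *cfs_rq = cfs_rq_of(se);

            /*
             * An entity never attached to this CPU (fresh fork that picked
             * a different CPU, or a never-used group entity) has
             * last_update_time == 0: nothing was added, nothing to remove.
             */
            if (!se->avg.last_update_time)
                    return;

            atomic_long_add(se->avg.load_avg, &cfs_rq->removed_load_avg);
            atomic_long_add(se->avg.util_avg, &cfs_rq->removed_util_avg);
    }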
2016-01-06  sched/deadline: Fix the earliest_dl.next logic  (Wanpeng Li; 1 file, -52/+7)
earliest_dl.next should cache the deadline of the earliest ready task that is also enqueued in the pushable rbtree, as the pull algorithm uses this information to find candidates for migration: if the earliest_dl.next deadline of the source rq is earlier than the earliest_dl.curr deadline of the destination rq, a task from the source rq can be pulled.

However, the current implementation only guarantees that earliest_dl.next is the deadline of the next ready task instead of the next pushable task, which can result in holding both rqs' locks and finding nothing to migrate because of affinity constraints. In addition, the current logic doesn't update the next candidate for pushing in pick_next_task_dl(), even if the running task is never eligible.

This patch fixes both problems by updating earliest_dl.next when a pushable dl task is enqueued/dequeued, similar to what we already do for RT.

Tested-by: Luca Abeni <luca.abeni@unitn.it>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1449135730-27202-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-06  Merge branch 'sched/urgent' into sched/core, to pick up fixes before merging new patches  (Ingo Molnar; 29 files, -206/+323)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-06  sched/core: Reset task's lockless wake-queues on fork()  (Sebastian Andrzej Siewior; 1 file, -0/+1)
In the following commit: 7675104990ed ("sched: Implement lockless wake-queues") we gained lockless wake-queues. The -RT kernel managed to lock itself up with those. There can be multiple attempts to enqueue task X for a wakeup _even_ if task X is already running. The reason is that task X can be runnable but not yet on a CPU. If the task performing the wakeup does not leave the CPU, it can perform multiple wakeups. With the proper timing, task X can be running and still be enqueued for a wakeup. If this happens while X is performing a fork(), then its child will have a !NULL `wake_q` member copied. This is not a problem as long as the child task does not participate in lockless wakeups :)

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 7675104990ed ("sched: Implement lockless wake-queues")
Link: http://lkml.kernel.org/r/20151221171710.GA5499@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
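[Editor's sketch] Given the one-line diff, the fix presumably lands in the fork path, along these lines (exact location assumed):

    /* in copy_process(), while initializing the child task: */
            p->wake_q.next = NULL;  /* don't inherit a pending lockless wakeup link */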
2016-01-06  sched/fair: Fix multiplication overflow on 32-bit systems  (Andrey Ryabinin; 1 file, -1/+1)
Make 'r' a 64-bit type to avoid overflow in 'r * LOAD_AVG_MAX' on 32-bit systems:

	UBSAN: Undefined behaviour in kernel/sched/fair.c:2785:18
	signed integer overflow:
	87950 * 47742 cannot be represented in type 'int'

The most likely effect of this bug is bad load-average numbers resulting in weird scheduling. It's also likely that this can persist for a longer time, until the system goes idle for long enough that all load avg numbers get reset.

[ This is the CFS load average metric, not the procfs output, which is separate. ]

Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")
Link: http://lkml.kernel.org/r/1450097243-30137-1-git-send-email-aryabinin@virtuozzo.com
[ Improved the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
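[Editor's sketch] A standalone userspace demonstration of the arithmetic (not kernel code; the values come from the UBSAN report above):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            int32_t r = 87950;      /* removed load_avg, from the report */
            int32_t m = 47742;      /* LOAD_AVG_MAX */

            /* What a 32-bit multiply yields (wraps past INT32_MAX): */
            int32_t wrapped = (int32_t)((uint32_t)r * (uint32_t)m);
            /* What the fix computes by widening 'r' to 64 bits first: */
            int64_t widened = (int64_t)r * m;

            printf("32-bit: %d\n64-bit: %lld\n", wrapped, (long long)widened);
            return 0;
    }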
2016-01-06  perf: Fix race in swevent hash  (Peter Zijlstra; 1 file, -19/+1)
There's a race on CPU unplug where we free the swevent hash array while it can still have events on. This will result in a use-after-free which is BAD.

Simply do not free the hash array on unplug. This leaves the thing around and no use-after-free takes place. When the last swevent dies, we do a for_each_possible_cpu() iteration anyway to clean these up, at which time we'll free it, so no leakage will occur.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-06  perf: Fix race in perf_event_exec()  (Peter Zijlstra; 1 file, -10/+5)
I managed to tickle this warning:

	[ 2338.884942] ------------[ cut here ]------------
	[ 2338.890112] WARNING: CPU: 13 PID: 35162 at ../kernel/events/core.c:2702 task_ctx_sched_out+0x6b/0x80()
	[ 2338.900504] Modules linked in:
	[ 2338.903933] CPU: 13 PID: 35162 Comm: bash Not tainted 4.4.0-rc4-dirty #244
	[ 2338.911610] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
	[ 2338.923071] ffffffff81f1468e ffff8807c6457cb8 ffffffff815c680c 0000000000000000
	[ 2338.931382] ffff8807c6457cf0 ffffffff810c8a56 ffffe8ffff8c1bd0 ffff8808132ed400
	[ 2338.939678] 0000000000000286 ffff880813170380 ffff8808132ed400 ffff8807c6457d00
	[ 2338.947987] Call Trace:
	[ 2338.950726] [<ffffffff815c680c>] dump_stack+0x4e/0x82
	[ 2338.956474] [<ffffffff810c8a56>] warn_slowpath_common+0x86/0xc0
	[ 2338.963195] [<ffffffff810c8b4a>] warn_slowpath_null+0x1a/0x20
	[ 2338.969720] [<ffffffff811a49cb>] task_ctx_sched_out+0x6b/0x80
	[ 2338.976244] [<ffffffff811a62d2>] perf_event_exec+0xe2/0x180
	[ 2338.982575] [<ffffffff8121fb6f>] setup_new_exec+0x6f/0x1b0
	[ 2338.988810] [<ffffffff8126de83>] load_elf_binary+0x393/0x1660
	[ 2338.995339] [<ffffffff811dc772>] ? get_user_pages+0x52/0x60
	[ 2339.001669] [<ffffffff8121e297>] search_binary_handler+0x97/0x200
	[ 2339.008581] [<ffffffff8121f8b3>] do_execveat_common.isra.33+0x543/0x6e0
	[ 2339.016072] [<ffffffff8121fcea>] SyS_execve+0x3a/0x50
	[ 2339.021819] [<ffffffff819fc165>] stub_execve+0x5/0x5
	[ 2339.027469] [<ffffffff819fbeb2>] ? entry_SYSCALL_64_fastpath+0x12/0x71
	[ 2339.034860] ---[ end trace ee1337c59a0ddeac ]---

Which is a WARN_ON_ONCE() indicating that cpuctx->task_ctx is not what we expected it to be. This is because context switches can swap the task_struct::perf_event_ctxp[] pointer around. Therefore you have to either disable preemption when looking at current, or hold ctx->lock.

Fix perf_event_enable_on_exec(): it loads current->perf_event_ctxp[] before disabling interrupts, therefore a preemption in the right place can swap contexts around and we're using the wrong one.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: syzkaller <syzkaller@googlegroups.com>
Link: http://lkml.kernel.org/r/20151210195740.GG6357@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
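[Editor's sketch] The fixed ordering plausibly looks like this: the ctx pointer is loaded only after interrupts (and thus preemption) are off, so it cannot be swapped underneath us. The enabling loop is elided into a comment:

    static void perf_event_enable_on_exec(int ctxn)
    {
            struct perf_event_context *ctx;
            unsigned long flags;

            local_irq_save(flags);
            ctx = current->perf_event_ctxp[ctxn];   /* now stable */
            if (!ctx || !ctx->nr_events)
                    goto out;

            /* ... walk the events, enable them, reschedule the context ... */
    out:
            local_irq_restore(flags);
    }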
2016-01-05  Merge tag 'trace-v4.4-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace  (Linus Torvalds; 1 file, -0/+1)
Pull tracing fixes from Steven Rostedt: "Two more fixes:

 1. The recordmcount change had an output that used sprintf() (incorrectly) when it should have been a fprintf() to stderr.

 2. The printk_formats file could crash if someone added a trace_printk() in the core kernel, and also added one in a module. This does not affect production kernels. Only kernels where developers add trace_printk() for debugging can crash"

* tag 'trace-v4.4-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix setting of start_index in find_next()
  ftrace/scripts: Fix incorrect use of sprintf in recordmcount
2016-01-04  PM / sleep: Add support for read-only sysfs attributes  (Rafael J. Wysocki; 2 files, -15/+11)
Some sysfs attributes in /sys/power/ should really be read-only, so add support for that, convert those attributes to read-only and drop the stub .store() routines from them.

Original-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-01-04  tracing: Fix setting of start_index in find_next()  (Qiu Peiyang; 1 file, -0/+1)
When we do 'cat /sys/kernel/debug/tracing/printk_formats', we hit a kernel panic in t_show():

	general protection fault: 0000 [#1] PREEMPT SMP
	CPU: 0 PID: 2957 Comm: sh Tainted: G W O 3.14.55-x86_64-01062-gd4acdc7 #2
	RIP: 0010:[<ffffffff811375b2>] [<ffffffff811375b2>] t_show+0x22/0xe0
	RSP: 0000:ffff88002b4ebe80 EFLAGS: 00010246
	RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
	RDX: 0000000000000004 RSI: ffffffff81fd26a6 RDI: ffff880032f9f7b1
	RBP: ffff88002b4ebe98 R08: 0000000000001000 R09: 000000000000ffec
	R10: 0000000000000000 R11: 000000000000000f R12: ffff880004d9b6c0
	R13: 7365725f6d706400 R14: ffff880004d9b6c0 R15: ffffffff82020570
	FS: 0000000000000000(0000) GS:ffff88003aa00000(0063) knlGS:00000000f776bc40
	CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
	CR2: 00000000f6c02ff0 CR3: 000000002c2b3000 CR4: 00000000001007f0
	Call Trace:
	 [<ffffffff811dc076>] seq_read+0x2f6/0x3e0
	 [<ffffffff811b749b>] vfs_read+0x9b/0x160
	 [<ffffffff811b7f69>] SyS_read+0x49/0xb0
	 [<ffffffff81a3a4b9>] ia32_do_call+0x13/0x13
	---[ end trace 5bd9eb630614861e ]---
	Kernel panic - not syncing: Fatal exception

The first time find_next() calls find_next_mod_format(), it should iterate trace_bprintk_fmt_list to find the first printk format of the module. However, in the current code start_index is smaller than *pos at first, so the code does not iterate the list. The later container_of() then gets a wrong address via the stale v, which makes mod_fmt a meaningless object, and so is the returned mod_fmt->fmt.

This patch fixes it by correcting start_index. After the fix, when find_next_mod_format() is called for the first time, start_index is equal to *pos, and the code iterates trace_bprintk_fmt_list to get the right module printk format, and thus the right mod_fmt->fmt.

Link: http://lkml.kernel.org/r/5684B900.9000309@intel.com

Cc: stable@vger.kernel.org # 3.12+
Fixes: 102c9323c35a8 "tracing: Add __tracepoint_string() to export string pointers"
Signed-off-by: Qiu Peiyang <peiyangx.qiu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-01-04  kernel/*: switch to memdup_user_nul()  (Al Viro; 4 files, -95/+48)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-04  convert a bunch of open-coded instances of memdup_user_nul()  (Al Viro; 1 file, -9/+3)
A _lot_ of ->write() instances were open-coding it; some are converted to memdup_user_nul(), a lot more remain...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
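[Editor's sketch] For reference, the open-coded pattern and its replacement look roughly like this (generic sketch, not a specific converted call site):

    /* Before: open-coded copy of a NUL-terminated string from userspace. */
            char *buf = kmalloc(len + 1, GFP_KERNEL);
            if (!buf)
                    return -ENOMEM;
            if (copy_from_user(buf, ubuf, len)) {
                    kfree(buf);
                    return -EFAULT;
            }
            buf[len] = '\0';

    /* After: one helper does all of the above, returning ERR_PTR on failure. */
            char *buf = memdup_user_nul(ubuf, len);
            if (IS_ERR(buf))
                    return PTR_ERR(buf);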
2016-01-02  cgroup: demote subsystem init messages to KERN_DEBUG  (Tejun Heo; 1 file, -1/+1)
These are noisy during boot and not all that interesting.

Signed-off-by: Tejun Heo <tj@kernel.org>
2015-12-29  bpf: hash: use per-bucket spinlock  (tom.leiming@gmail.com; 1 file, -18/+32)
Both htab_map_update_elem() and htab_map_delete_elem() can be called from an eBPF program, and they may be in a kernel hot path, so it isn't efficient to use a per-hashtable lock in these two helpers. The per-hashtable spinlock is used for protecting each bucket's hlist, and a per-bucket lock is enough. This patch converts the per-hashtable lock into per-bucket spinlocks, so that contention can be decreased a lot.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
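[Editor's sketch] A hedged sketch of the per-bucket layout; the real bpf_htab carries more fields, and __select_bucket() is an assumed helper name:

    struct bucket {
            struct hlist_head head;
            raw_spinlock_t lock;
    };

    struct bpf_htab {
            struct bpf_map map;
            struct bucket *buckets; /* was: plain hlists plus one global lock */
            atomic_t count;
            u32 n_buckets;
    };

    /* update/delete paths now serialize on a single bucket only: */
    static void insert_elem(struct bpf_htab *htab, struct htab_elem *l_new, u32 hash)
    {
            struct bucket *b = __select_bucket(htab, hash);
            unsigned long flags;

            raw_spin_lock_irqsave(&b->lock, flags);
            hlist_add_head_rcu(&l_new->hash_node, &b->head);
            raw_spin_unlock_irqrestore(&b->lock, flags);
    }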
2015-12-29  bpf: hash: move select_bucket() out of htab's spinlock  (tom.leiming@gmail.com; 1 file, -4/+2)
The spinlock is just used for protecting the per-bucket hlist, so it isn't needed for selecting a bucket.

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-29  bpf: hash: use atomic count  (tom.leiming@gmail.com; 1 file, -6/+6)
Preparing for removing the global per-hashtable lock, so the counter needs to be defined as atomic_t first.

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-29  posix-clock: Fix return code on the poll method's error path  (Richard Cochran; 1 file, -2/+2)
The posix_clock_poll() function is supposed to return a bit mask of POLLxxx values. However, in case the hardware has disappeared (due to hot plugging for example) this code returns -ENODEV in a futile attempt to throw an error at the file descriptor level. The kernel's file_operations interface does not accept such error codes from the poll method. Instead, this function ought to return POLLERR.

The value -ENODEV does, in fact, contain the POLLERR bit (and almost all the other POLLxxx bits as well), but only by chance. This patch fixes the code to return a proper bit mask.

Credit goes to Markus Elfring for pointing out the suspicious signed/unsigned mismatch.

Reported-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Link: http://lkml.kernel.org/r/1450819198-17420-1-git-send-email-richardcochran@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
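[Editor's sketch] The corrected poll method plausibly looks like this:

    static unsigned int posix_clock_poll(struct file *fp, poll_table *wait)
    {
            struct posix_clock *clk = get_posix_clock(fp);
            unsigned int result = 0;

            if (!clk)
                    return POLLERR; /* was: return -ENODEV */

            if (clk->ops.poll)
                    result = clk->ops.poll(clk, fp, wait);

            put_posix_clock(clk);

            return result;
    }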
2015-12-29  Merge branch 'irq/gic-v2m-acpi' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core  (Thomas Gleixner; 1 file, -1/+1)
Pull another round of GIC changes from Marc: ACPI support for GICv2m
2015-12-24  security: Make inode argument of inode_getsecid non-const  (Andreas Gruenbacher; 3 files, -5/+5)
Make the inode argument of the inode_getsecid hook non-const so that we can use it to revalidate invalid security labels.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
2015-12-23  tracing: Fix comment to use tracing_on over tracing_enable  (Chuyu Hu; 1 file, -2/+2)
The file tracing_enable is obsolete and does not exist anymore. Replace the comment that references it with the proper tracing_on file.

Link: http://lkml.kernel.org/r/1450787141-45544-1-git-send-email-chuhu@redhat.com

Signed-off-by: Chuyu Hu <chuhu@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-12-23  ftrace: Clean up ftrace_module_init() code  (Steven Rostedt (Red Hat); 1 file, -6/+3)
The start and end variables were only used when ftrace_module_init() was split up into multiple functions. No need to keep them around after the merger.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-12-23  ftrace: Join functions ftrace_module_init() and ftrace_init_module()  (Abel Vesa; 1 file, -9/+6)
Simple cleanup. No need for two functions here. The whole work can simply be done inside 'ftrace_module_init'.

Link: http://lkml.kernel.org/r/1449067197-5718-1-git-send-email-abelvesa@linux.com

Signed-off-by: Abel Vesa <abelvesa@linux.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-12-23  bpf: Constify bpf_verifier_ops structure  (Julia Lawall; 1 file, -1/+1)
This bpf_verifier_ops structure is never modified, like the other bpf_verifier_ops structures, so declare it as const. Done with the help of Coccinelle.

Link: http://lkml.kernel.org/r/1449855359-13724-1-git-send-email-Julia.Lawall@lip6.fr

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-12-23  ftrace: Have ftrace_ops_get_func() handle RCU and PER_CPU flags too  (Steven Rostedt (Red Hat); 1 file, -12/+18)
Jiri Olsa noted that the change to replace the control_ops did not update the trampoline used when running perf on a single CPU with CONFIG_PREEMPT disabled (where dynamic ops, like perf, can use trampolines directly). The result was that the perf function could be called when RCU is not watching, and the ftrace_local_disable() check was not handled.

Modify ftrace_ops_get_func() to also check the RCU and PER_CPU ops flags and use the recursive function if they are set. The recursive function is modified to check those flags and execute the appropriate checks if they are set.

Link: http://lkml.kernel.org/r/20151201134213.GA14155@krava.brq.redhat.com

Reported-by: Jiri Olsa <jolsa@redhat.com>
Patch-fixed-up-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
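[Editor's sketch] The post-patch ftrace_ops_get_func() plausibly reads roughly like this:

    ftrace_func_t ftrace_ops_get_func(struct ftrace_ops *ops)
    {
            /*
             * Ops that are not recursion safe, need RCU protection, or do
             * per-cpu disabling must go through the assist function, which
             * performs those checks before calling ops->func.
             */
            if (!(ops->flags & FTRACE_OPS_FL_RECURSION_SAFE) ||
                ops->flags & (FTRACE_OPS_FL_RCU | FTRACE_OPS_FL_PER_CPU))
                    return ftrace_ops_assist_func;

            return ops->func;
    }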
2015-12-23  ftrace: Remove use of control list and ops  (Steven Rostedt (Red Hat); 3 files, -91/+39)
Currently perf has its own list function within the ftrace infrastructure that seems to be used only to allow for it to have per-cpu disabling as well as a check to make sure that it's not called while RCU is not watching. It uses something called the "control_ops" which is used to iterate over ops under it with the control_list_func().

The problem is that this control_ops and control_list_func unnecessarily complicate the code. By replacing FTRACE_OPS_FL_CONTROL with two new flags (FTRACE_OPS_FL_RCU and FTRACE_OPS_FL_PER_CPU) we can remove all the code that is special with the control ops and add the needed checks within the generic ftrace_list_func().

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-12-23  ftrace: Fix output of enabled_functions for showing tramp  (Steven Rostedt (Red Hat); 1 file, -3/+4)
When showing all trampolines registered to a ftrace record in the file enabled_functions, the loop exits with ops == NULL. It is then supposed to show the function on the ops->trampoline, and add_trampoline_func() is called with the given ops. But because ops is now NULL (to exit the loop), it always shows the static trampoline instead of the one that is really registered to the record.

The call to add_trampoline_func() that shows the trampoline for the given ops needs to be made at every iteration.

Fixes: 39daa7b9e895 "ftrace: Show all tramps registered to a record on ftrace_bug()"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-12-23  ftrace: Fix a typo in comment  (Li Bin; 1 file, -1/+1)
s/ARCH_SUPPORT_FTARCE_OPS/ARCH_SUPPORTS_FTRACE_OPS/

Link: http://lkml.kernel.org/r/1448879016-8659-1-git-send-email-huawei.libin@huawei.com

Signed-off-by: Li Bin <huawei.libin@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-12-23  Merge branch 'for-linus' into for-next  (Takashi Iwai; 20 files, -166/+229)
Conflicts:
	drivers/gpu/drm/i915/intel_pm.c
2015-12-21  missing include asm/paravirt.h in cputime.c  (Stefano Stabellini; 1 file, -0/+3)
Add an include of asm/paravirt.h to cputime.c, as steal_account_process_tick() calls paravirt_steal_clock(), which is defined in asm/paravirt.h. The ifdef CONFIG_PARAVIRT is necessary because not all archs have an asm/paravirt.h to include.

The reason cputime.c currently compiles even though the include is missing is that on x86, asm/paravirt.h is pulled in by one of the other headers included in kernel/sched/cputime.c. On arm and arm64, where I am about to introduce asm/paravirt.h and stolen time support, I would get an error without this include.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
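[Editor's sketch] Given the three-line diff, the added lines in kernel/sched/cputime.c are presumably:

    #ifdef CONFIG_PARAVIRT
    #include <asm/paravirt.h>   /* for paravirt_steal_clock() */
    #endif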
2015-12-21  irqdomain: Introduce is_fwnode_irqchip helper  (Suravee Suthikulpanit; 1 file, -1/+1)
Since there will be several places checking if fwnode.type is equal to FWNODE_IRQCHIP, this patch adds a convenient function for this purpose.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
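[Editor's sketch] The helper itself is likely a one-line predicate, roughly:

    static inline bool is_fwnode_irqchip(struct fwnode_handle *fwnode)
    {
            return fwnode && fwnode->type == FWNODE_IRQCHIP;
    }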
2015-12-20  futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op  (Darren Hart; 1 file, -1/+2)
While reviewing Michael Kerrisk's recent futex manpage update, I noticed that we allow the FUTEX_CLOCK_REALTIME flag for FUTEX_WAIT_BITSET but not for FUTEX_WAIT.

FUTEX_WAIT is treated as a simple version of FUTEX_WAIT_BITSET internally (with a bitmask of FUTEX_BITSET_MATCH_ANY). As such, I cannot come up with a reason for this exclusion for FUTEX_WAIT.

This change does modify the behavior of the futex syscall: a call with FUTEX_WAIT | FUTEX_CLOCK_REALTIME goes from returning -ENOSYS to being equivalent to FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME with a bitset of FUTEX_BITSET_MATCH_ANY.

Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Link: http://lkml.kernel.org/r/9f3bdc116d79d23f5ee72ceb9a2a857f5ff8fa29.1450474525.git.dvhart@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
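[Editor's sketch] The flag check in do_futex() then plausibly becomes (before the change, the FUTEX_WAIT case was missing from the allow list):

    if (op & FUTEX_CLOCK_REALTIME) {
            flags |= FLAGS_CLOCKRT;
            if (cmd != FUTEX_WAIT &&
                cmd != FUTEX_WAIT_BITSET &&
                cmd != FUTEX_WAIT_REQUEUE_PI)
                    return -ENOSYS;
    }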
2015-12-20  futex: Cleanup the goto confusion in requeue_pi()  (Thomas Gleixner; 1 file, -2/+7)
The out_unlock: label does not only drop the locks, it also drops the refcount on the pi_state. Really intuitive.

Move the label after the put_pi_state() call and use 'break' in the error handling path of the requeue loop.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Bhuvanesh_Surachari@mentor.com
Cc: Andy Lowe <Andy_Lowe@mentor.com>
Link: http://lkml.kernel.org/r/20151219200607.526665141@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-20  futex: Remove pointless put_pi_state calls in requeue()  (Thomas Gleixner; 1 file, -4/+2)
In the error handling cases we neither have pi_state nor a reference to it. Remove the pointless code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Bhuvanesh_Surachari@mentor.com
Cc: Andy Lowe <Andy_Lowe@mentor.com>
Link: http://lkml.kernel.org/r/20151219200607.432780944@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-20  futex: Document pi_state refcounting in requeue code  (Thomas Gleixner; 1 file, -12/+39)
Documentation of the pi_state refcounting in the requeue code is nonexistent. Add it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Bhuvanesh_Surachari@mentor.com
Cc: Andy Lowe <Andy_Lowe@mentor.com>
Link: http://lkml.kernel.org/r/20151219200607.335938312@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-20  futex: Rename free_pi_state() to put_pi_state()  (Thomas Gleixner; 1 file, -7/+10)
free_pi_state() is confusing as it is in fact only freeing/caching the pi state when the last reference is gone. Rename it to put_pi_state() which reflects better what it is doing.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Bhuvanesh_Surachari@mentor.com
Cc: Andy Lowe <Andy_Lowe@mentor.com>
Link: http://lkml.kernel.org/r/20151219200607.259636467@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-20  futex: Drop refcount if requeue_pi() acquired the rtmutex  (Thomas Gleixner; 1 file, -0/+5)
If the proxy lock in the requeue loop acquires the rtmutex for a waiter, then it also acquires a refcount on the pi_state related to the futex, but the waiter side does not drop the reference count.

Add the missing free_pi_state() call.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Bhuvanesh_Surachari@mentor.com
Cc: Andy Lowe <Andy_Lowe@mentor.com>
Link: http://lkml.kernel.org/r/20151219200607.178132067@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
2015-12-20  genirq/msi: Export functions to allow MSI domains in modules  (Jake Oshins; 2 files, -0/+5)
The Linux kernel already has the concept of an IRQ domain, wherein a component can expose a set of IRQs which are managed by a particular interrupt controller chip or other subsystem. The PCI driver exposes the notion of an IRQ domain for Message-Signaled Interrupts (MSI) from PCI Express devices. This patch exposes the functions which are necessary for creating a MSI IRQ domain within a module.

[ tglx: Split it into x86 and core irq parts ]

Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Cc: gregkh@linuxfoundation.org
Cc: kys@microsoft.com
Cc: devel@linuxdriverproject.org
Cc: olaf@aepfle.de
Cc: apw@canonical.com
Cc: vkuznets@redhat.com
Cc: haiyangz@microsoft.com
Cc: marc.zyngier@arm.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1449769983-12948-4-git-send-email-jakeo@microsoft.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-19  clocksource: Make clocksource validation work for all clocksources  (Yang Yingliang; 1 file, -1/+5)
The clocksource validation which makes sure that the newly read value is not smaller than the last value only works if the clocksource mask is 64bit, i.e. the counter is 64bit wide. But we want to use that mechanism also for clocksources which are less than 64bit wide.

So instead of checking whether bit 63 is set, we check whether the most significant bit of the clocksource mask is set in the delta result. If it is set, we return 0.

[ tglx: Simplified the implementation, added a comment and massaged the commit message ]

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Cc: <linux-arm-kernel@lists.infradead.org>
Link: http://lkml.kernel.org/r/56349607.6070708@huawei.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
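[Editor's sketch] The resulting helper is presumably close to the following (types shown as u64 for brevity; the kernel of that era used cycle_t):

    static inline u64 clocksource_delta(u64 now, u64 last, u64 mask)
    {
            u64 ret = (now - last) & mask;

            /*
             * If the MSB of the mask shows up in the masked delta, the new
             * reading was behind the old one: treat it as no elapsed time.
             */
            return ret & ~(mask >> 1) ? 0 : ret;
    }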