aboutsummaryrefslogtreecommitdiffstats
path: root/kernel (follow)
AgeCommit message (Collapse)AuthorFilesLines
2017-02-18printk: use rcuidle console tracepointSergey Senozhatsky1-1/+1
Use rcuidle console tracepoint because, apparently, it may be issued from an idle CPU: hw-breakpoint: Failed to enable monitor mode on CPU 0. hw-breakpoint: CPU 0 failed to disable vector catch =============================== [ ERR: suspicious RCU usage. ] 4.10.0-rc8-next-20170215+ #119 Not tainted ------------------------------- ./include/trace/events/printk.h:32 suspicious rcu_dereference_check() usage! other info that might help us debug this: RCU used illegally from idle CPU! rcu_scheduler_active = 2, debug_locks = 0 RCU used illegally from extended quiescent state! 2 locks held by swapper/0/0: #0: (cpu_pm_notifier_lock){......}, at: [<c0237e2c>] cpu_pm_exit+0x10/0x54 #1: (console_lock){+.+.+.}, at: [<c01ab350>] vprintk_emit+0x264/0x474 stack backtrace: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next-20170215+ #119 Hardware name: Generic OMAP4 (Flattened Device Tree) console_unlock vprintk_emit vprintk_default printk reset_ctrl_regs dbg_cpu_pm_notify notifier_call_chain cpu_pm_exit omap_enter_idle_coupled cpuidle_enter_state cpuidle_enter_state_coupled do_idle cpu_startup_entry start_kernel This RCU warning, however, is suppressed by lockdep_off() in printk(). lockdep_off() increments the ->lockdep_recursion counter and thus disables RCU_LOCKDEP_WARN() and debug_lockdep_rcu_enabled(), which want lockdep to be enabled "current->lockdep_recursion == 0". Link: http://lkml.kernel.org/r/20170217015932.11898-1-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reported-by: Tony Lindgren <tony@atomide.com> Tested-by: Tony Lindgren <tony@atomide.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Lindgren <tony@atomide.com> Cc: Russell King <rmk@armlinux.org.uk> Cc: <stable@vger.kernel.org> [3.4+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-18hrtimer: Catch invalid clockids againMarc Zyngier1-5/+15
commit 82e88ff1ea94 ("hrtimer: Revert CLOCK_MONOTONIC_RAW support") removed unfortunately a sanity check in the hrtimer code which was part of that MONOTONIC_RAW patch series. It would have caught the bogus usage of CLOCK_MONOTONIC_RAW in the wireless code. So bring it back. It is way too easy to take any random clockid and feed it to the hrtimer subsystem. At best, it gets mapped to a monotonic base, but it would be better to just catch illegal values as early as possible. Detect invalid clockids, map them to CLOCK_MONOTONIC and emit a warning. [ tglx: Replaced the BUG by a WARN and gracefully map to CLOCK_MONOTONIC ] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Tomasz Nowicki <tn@semihalf.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Link: http://lkml.kernel.org/r/1452879670-16133-3-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-18PM / sleep: Fix test_suspend after sleep state reworkGeert Uytterhoeven1-1/+1
When passing "test_suspend=mem" to the kernel: PM: can't test 'mem' suspend state and the suspend test is not run. Commit 406e79385f3223d8 ("PM / sleep: System sleep state selection interface rework") changed pm_labels[] from a contiguous NULL-terminated array to a sparse array (with the first element unpopulated), breaking the assumptions of the iterator in setup_test_suspend(). Iterate from PM_SUSPEND_MIN to PM_SUSPEND_MAX - 1 to fix this. Fixes: 406e79385f3223d8 (PM / sleep: System sleep state selection interface rework) Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-02-17Merge branch 'for-4.11/next' into for-4.11/linus-mergeJens Axboe1-57/+21
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-17bpf: make jited programs visible in tracesDaniel Borkmann4-13/+282
Long standing issue with JITed programs is that stack traces from function tracing check whether a given address is kernel code through {__,}kernel_text_address(), which checks for code in core kernel, modules and dynamically allocated ftrace trampolines. But what is still missing is BPF JITed programs (interpreted programs are not an issue as __bpf_prog_run() will be attributed to them), thus when a stack trace is triggered, the code walking the stack won't see any of the JITed ones. The same for address correlation done from user space via reading /proc/kallsyms. This is read by tools like perf, but the latter is also useful for permanent live tracing with eBPF itself in combination with stack maps when other eBPF types are part of the callchain. See offwaketime example on dumping stack from a map. This work tries to tackle that issue by making the addresses and symbols known to the kernel. The lookup from *kernel_text_address() is implemented through a latched RB tree that can be read under RCU in fast-path that is also shared for symbol/size/offset lookup for a specific given address in kallsyms. The slow-path iteration through all symbols in the seq file done via RCU list, which holds a tiny fraction of all exported ksyms, usually below 0.1 percent. Function symbols are exported as bpf_prog_<tag>, in order to aide debugging and attribution. This facility is currently enabled for root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening is active in any mode. The rationale behind this is that still a lot of systems ship with world read permissions on kallsyms thus addresses should not get suddenly exposed for them. If that situation gets much better in future, we always have the option to change the default on this. Likewise, unprivileged programs are not allowed to add entries there either, but that is less of a concern as most such programs types relevant in this context are for root-only anyway. If enabled, call graphs and stack traces will then show a correct attribution; one example is illustrated below, where the trace is now visible in tooling such as perf script --kallsyms=/proc/kallsyms and friends. Before: 7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux) f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so) After: 7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux) [...] 7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux) f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so) Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17bpf: remove stubs for cBPF from arch codeDaniel Borkmann1-1/+11
Remove the dummy bpf_jit_compile() stubs for eBPF JITs and make that a single __weak function in the core that can be overridden similarly to the eBPF one. Also remove stale pr_err() mentions of bpf_jit_compile. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17bpf: mark all registered map/prog types as __ro_after_initDaniel Borkmann5-14/+14
All map types and prog types are registered to the BPF core through bpf_register_map_type() and bpf_register_prog_type() during init and remain unchanged thereafter. As by design we don't (and never will) have any pluggable code that can register to that at any later point in time, lets mark all the existing bpf_{map,prog}_type_list objects in the tree as __ro_after_init, so they can be moved to read-only section from then onwards. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller6-40/+90
2017-02-16Revert "nohz: Fix collision between tick and other hrtimers"Linus Torvalds2-14/+2
This reverts commit 24b91e360ef521a2808771633d76ebc68bd5604b and commit 7bdb59f1ad47 ("tick/nohz: Fix possible missing clock reprog after tick soft restart") that depends on it, Pavel reports that it causes occasional boot hangs for him that seem to depend on just how the machine was booted. In particular, his machine hangs at around the PCI fixups of the EHCI USB host controller, but only hangs from cold boot, not from a warm boot. Thomas Gleixner suspecs it's a CPU hotplug interaction, particularly since Pavel also saw suspend/resume issues that seem to be related. We're reverting for now while trying to figure out the root cause. Reported-bisected-and-tested-by: Pavel Machek <pavel@ucw.cz> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org # reverted commits were marked for stable Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds3-22/+66
Pull networking fixes from David Miller: 1) In order to avoid problems in the future, make cgroup bpf overriding explicit using BPF_F_ALLOW_OVERRIDE. From Alexei Staovoitov. 2) LLC sets skb->sk without proper skb->destructor and this explodes, fix from Eric Dumazet. 3) Make sure when we have an ipv4 mapped source address, the destination is either also an ipv4 mapped address or ipv6_addr_any(). Fix from Jonathan T. Leighton. 4) Avoid packet loss in fec driver by programming the multicast filter more intelligently. From Rui Sousa. 5) Handle multiple threads invoking fanout_add(), fix from Eric Dumazet. 6) Since we can invoke the TCP input path in process context, without BH being disabled, we have to accomodate that in the locking of the TCP probe. Also from Eric Dumazet. 7) Fix erroneous emission of NETEVENT_DELAY_PROBE_TIME_UPDATE when we aren't even updating that sysctl value. From Marcus Huewe. 8) Fix endian bugs in ibmvnic driver, from Thomas Falcon. [ This is the second version of the pull that reverts the nested rhashtable changes that looked a bit too scary for this late in the release - Linus ] * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits) rhashtable: Revert nested table changes. ibmvnic: Fix endian errors in error reporting output ibmvnic: Fix endian error when requesting device capabilities net: neigh: Fix netevent NETEVENT_DELAY_PROBE_TIME_UPDATE notification net: xilinx_emaclite: fix freezes due to unordered I/O net: xilinx_emaclite: fix receive buffer overflow bpf: kernel header files need to be copied into the tools directory tcp: tcp_probe: use spin_lock_bh() uapi: fix linux/if_pppol2tp.h userspace compilation errors packet: fix races in fanout_add() ibmvnic: Fix initial MTU settings net: ethernet: ti: cpsw: fix cpsw assignment in resume kcm: fix a null pointer dereference in kcm_sendmsg() net: fec: fix multicast filtering hardware setup ipv6: Handle IPv4-mapped src to in6addr_any dst. ipv6: Inhibit IPv4-mapped src address on the wire. net/mlx5e: Disable preemption when doing TC statistics upcall rhashtable: Add nested tables tipc: Fix tipc_sk_reinit race conditions gfs2: Use rhashtable walk interface in glock_hash_walk ...
2017-02-16genirq: Clarify logic calculating bogus irqreturn_t valuesJeremy Kerr1-1/+3
Although irqreturn_t is an enum, we treat it (and its enumeration constants) as a bitmask. However, bad_action_ret() uses a less-than operator to determine whether an irqreturn_t falls within allowable bit values, which means we need to know the signededness of an enum type to read the logic, which is implementation-dependent. This change explicitly uses an unsigned type for the comparison. We do this instead of changing to a bitwise test, as the latter compiles to increased instructions in this hot path. It looks like we get the correct behaviour currently (bad_action_ret(-1) returns 1), so this is purely a readability fix. Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Link: http://lkml.kernel.org/r/1487219049-4061-1-git-send-email-jk@ozlabs.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-15timekeeping: Use deferred printk() in debug codeSergey Senozhatsky1-2/+2
We cannot do printk() from tk_debug_account_sleep_time(), because tk_debug_account_sleep_time() is called under tk_core seq lock. The reason why printk() is unsafe there is that console_sem may invoke scheduler (up()->wake_up_process()->activate_task()), which, in turn, can return back to timekeeping code, for instance, via get_time()->ktime_get(), deadlocking the system on tk_core seq lock. [ 48.950592] ====================================================== [ 48.950622] [ INFO: possible circular locking dependency detected ] [ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted [ 48.950622] ------------------------------------------------------- [ 48.950622] kworker/0:0/3 is trying to acquire lock: [ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90 [ 48.950683] but task is already holding lock: [ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90 [ 48.950714] which lock already depends on the new lock. [ 48.950714] the existing dependency chain (in reverse order) is: [ 48.950714] -> #5 (hrtimer_bases.lock){-.-...}: [ 48.950744] _raw_spin_lock_irqsave+0x50/0x64 [ 48.950775] lock_hrtimer_base+0x28/0x58 [ 48.950775] hrtimer_start_range_ns+0x20/0x5c8 [ 48.950775] __enqueue_rt_entity+0x320/0x360 [ 48.950805] enqueue_rt_entity+0x2c/0x44 [ 48.950805] enqueue_task_rt+0x24/0x94 [ 48.950836] ttwu_do_activate+0x54/0xc0 [ 48.950836] try_to_wake_up+0x248/0x5c8 [ 48.950836] __setup_irq+0x420/0x5f0 [ 48.950836] request_threaded_irq+0xdc/0x184 [ 48.950866] devm_request_threaded_irq+0x58/0xa4 [ 48.950866] omap_i2c_probe+0x530/0x6a0 [ 48.950897] platform_drv_probe+0x50/0xb0 [ 48.950897] driver_probe_device+0x1f8/0x2cc [ 48.950897] __driver_attach+0xc0/0xc4 [ 48.950927] bus_for_each_dev+0x6c/0xa0 [ 48.950927] bus_add_driver+0x100/0x210 [ 48.950927] driver_register+0x78/0xf4 [ 48.950958] do_one_initcall+0x3c/0x16c [ 48.950958] kernel_init_freeable+0x20c/0x2d8 [ 48.950958] kernel_init+0x8/0x110 [ 48.950988] ret_from_fork+0x14/0x24 [ 48.950988] -> #4 (&rt_b->rt_runtime_lock){-.-...}: [ 48.951019] _raw_spin_lock+0x40/0x50 [ 48.951019] rq_offline_rt+0x9c/0x2bc [ 48.951019] set_rq_offline.part.2+0x2c/0x58 [ 48.951049] rq_attach_root+0x134/0x144 [ 48.951049] cpu_attach_domain+0x18c/0x6f4 [ 48.951049] build_sched_domains+0xba4/0xd80 [ 48.951080] sched_init_smp+0x68/0x10c [ 48.951080] kernel_init_freeable+0x160/0x2d8 [ 48.951080] kernel_init+0x8/0x110 [ 48.951080] ret_from_fork+0x14/0x24 [ 48.951110] -> #3 (&rq->lock){-.-.-.}: [ 48.951110] _raw_spin_lock+0x40/0x50 [ 48.951141] task_fork_fair+0x30/0x124 [ 48.951141] sched_fork+0x194/0x2e0 [ 48.951141] copy_process.part.5+0x448/0x1a20 [ 48.951171] _do_fork+0x98/0x7e8 [ 48.951171] kernel_thread+0x2c/0x34 [ 48.951171] rest_init+0x1c/0x18c [ 48.951202] start_kernel+0x35c/0x3d4 [ 48.951202] 0x8000807c [ 48.951202] -> #2 (&p->pi_lock){-.-.-.}: [ 48.951232] _raw_spin_lock_irqsave+0x50/0x64 [ 48.951232] try_to_wake_up+0x30/0x5c8 [ 48.951232] up+0x4c/0x60 [ 48.951263] __up_console_sem+0x2c/0x58 [ 48.951263] console_unlock+0x3b4/0x650 [ 48.951263] vprintk_emit+0x270/0x474 [ 48.951293] vprintk_default+0x20/0x28 [ 48.951293] printk+0x20/0x30 [ 48.951324] kauditd_hold_skb+0x94/0xb8 [ 48.951324] kauditd_thread+0x1a4/0x56c [ 48.951324] kthread+0x104/0x148 [ 48.951354] ret_from_fork+0x14/0x24 [ 48.951354] -> #1 ((console_sem).lock){-.....}: [ 48.951385] _raw_spin_lock_irqsave+0x50/0x64 [ 48.951385] down_trylock+0xc/0x2c [ 48.951385] __down_trylock_console_sem+0x24/0x80 [ 48.951385] console_trylock+0x10/0x8c [ 48.951416] vprintk_emit+0x264/0x474 [ 48.951416] vprintk_default+0x20/0x28 [ 48.951416] printk+0x20/0x30 [ 48.951446] tk_debug_account_sleep_time+0x5c/0x70 [ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0 [ 48.951446] timekeeping_resume+0x218/0x23c [ 48.951477] syscore_resume+0x94/0x42c [ 48.951477] suspend_enter+0x554/0x9b4 [ 48.951477] suspend_devices_and_enter+0xd8/0x4b4 [ 48.951507] enter_state+0x934/0xbd4 [ 48.951507] pm_suspend+0x14/0x70 [ 48.951507] state_store+0x68/0xc8 [ 48.951538] kernfs_fop_write+0xf4/0x1f8 [ 48.951538] __vfs_write+0x1c/0x114 [ 48.951538] vfs_write+0xa0/0x168 [ 48.951568] SyS_write+0x3c/0x90 [ 48.951568] __sys_trace_return+0x0/0x10 [ 48.951568] -> #0 (tk_core){----..}: [ 48.951599] lock_acquire+0xe0/0x294 [ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4 [ 48.951629] retrigger_next_event+0x4c/0x90 [ 48.951629] on_each_cpu+0x40/0x7c [ 48.951629] clock_was_set_work+0x14/0x20 [ 48.951660] process_one_work+0x2b4/0x808 [ 48.951660] worker_thread+0x3c/0x550 [ 48.951660] kthread+0x104/0x148 [ 48.951690] ret_from_fork+0x14/0x24 [ 48.951690] other info that might help us debug this: [ 48.951690] Chain exists of: tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock [ 48.951721] Possible unsafe locking scenario: [ 48.951721] CPU0 CPU1 [ 48.951721] ---- ---- [ 48.951721] lock(hrtimer_bases.lock); [ 48.951751] lock(&rt_b->rt_runtime_lock); [ 48.951751] lock(hrtimer_bases.lock); [ 48.951751] lock(tk_core); [ 48.951782] *** DEADLOCK *** [ 48.951782] 3 locks held by kworker/0:0/3: [ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808 [ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808 [ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90 [ 48.951843] stack backtrace: [ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+ [ 48.951904] Workqueue: events clock_was_set_work [ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14) [ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0) [ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308) [ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324) [ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8) [ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294) [ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4) [ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90) [ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c) [ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20) [ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808) [ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550) [ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148) [ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24) Replace printk() with printk_deferred(), which does not call into the scheduler. Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend") Reported-and-tested-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: John Stultz <john.stultz@linaro.org> Cc: "[4.9+]" <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-14bpf: reduce compiler warnings by adding fallthrough commentsAlexander Alemayhu1-0/+5
Fixes the following warnings: kernel/bpf/verifier.c: In function ‘may_access_direct_pkt_data’: kernel/bpf/verifier.c:702:6: warning: this statement may fall through [-Wimplicit-fallthrough=] if (t == BPF_WRITE) ^ kernel/bpf/verifier.c:704:2: note: here case BPF_PROG_TYPE_SCHED_CLS: ^~~~ kernel/bpf/verifier.c: In function ‘reg_set_min_max_inv’: kernel/bpf/verifier.c:2057:23: warning: this statement may fall through [-Wimplicit-fallthrough=] true_reg->min_value = 0; ~~~~~~~~~~~~~~~~~~~~^~~ kernel/bpf/verifier.c:2058:2: note: here case BPF_JSGT: ^~~~ kernel/bpf/verifier.c:2068:23: warning: this statement may fall through [-Wimplicit-fallthrough=] true_reg->min_value = 0; ~~~~~~~~~~~~~~~~~~~~^~~ kernel/bpf/verifier.c:2069:2: note: here case BPF_JSGE: ^~~~ kernel/bpf/verifier.c: In function ‘reg_set_min_max’: kernel/bpf/verifier.c:2009:24: warning: this statement may fall through [-Wimplicit-fallthrough=] false_reg->min_value = 0; ~~~~~~~~~~~~~~~~~~~~~^~~ kernel/bpf/verifier.c:2010:2: note: here case BPF_JSGT: ^~~~ kernel/bpf/verifier.c:2019:24: warning: this statement may fall through [-Wimplicit-fallthrough=] false_reg->min_value = 0; ~~~~~~~~~~~~~~~~~~~~~^~~ kernel/bpf/verifier.c:2020:2: note: here case BPF_JSGE: ^~~~ Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-14audit: remove unnecessary curly braces from switch/case statementsPaul Moore1-12/+12
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-02-14Merge tag 'v4.10-rc8' into perf/core, to pick up fixesIngo Molnar7-63/+81
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-13audit: log module name on init_moduleRichard Guy Briggs3-1/+21
This adds a new auxiliary record MODULE_INIT to the SYSCALL event. We get finit_module for free since it made most sense to hook this in to load_module(). https://github.com/linux-audit/audit-kernel/issues/7 https://github.com/linux-audit/audit-kernel/wiki/RFE-Module-Load-Record-Format Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Acked-by: Jessica Yu <jeyu@redhat.com> [PM: corrected links in the commit description] Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-02-13futex: Move futex_init() to core_initcallYang Yang1-1/+1
The UEVENT user mode helper is enabled before the initcalls are executed and is available when the root filesystem has been mounted. The user mode helper is triggered by device init calls and the executable might use the futex syscall. futex_init() is marked __initcall which maps to device_initcall, but there is no guarantee that futex_init() is invoked _before_ the first device init call which triggers the UEVENT user mode helper. If the user mode helper uses the futex syscall before futex_init() then the syscall crashes with a NULL pointer dereference because the futex subsystem has not been initialized yet. Move futex_init() to core_initcall so futexes are initialized before the root filesystem is mounted and the usermode helper becomes available. [ tglx: Rewrote changelog ] Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Cc: jiang.biao2@zte.com.cn Cc: jiang.zhengxiong@zte.com.cn Cc: zhong.weidong@zte.com.cn Cc: deng.huali@zte.com.cn Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1483085875-6130-1-git-send-email-yang.yang29@zte.com.cn Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-13tick/broadcast: Prevent deadlock on tick_broadcast_lockMike Galbraith1-8/+7
tick_broadcast_lock is taken from interrupt context, but the following call chain takes the lock without disabling interrupts: [ 12.703736] _raw_spin_lock+0x3b/0x50 [ 12.703738] tick_broadcast_control+0x5a/0x1a0 [ 12.703742] intel_idle_cpu_online+0x22/0x100 [ 12.703744] cpuhp_invoke_callback+0x245/0x9d0 [ 12.703752] cpuhp_thread_fun+0x52/0x110 [ 12.703754] smpboot_thread_fn+0x276/0x320 So the following deadlock can happen: lock(tick_broadcast_lock); <Interrupt> lock(tick_broadcast_lock); intel_idle_cpu_online() is the only place which violates the calling convention of tick_broadcast_control(). This was caused by the removal of the smp function call in course of the cpu hotplug rework. Instead of slapping local_irq_disable/enable() at the call site, we can relax the calling convention and handle it in the core code, which makes the whole machinery more robust. Fixes: 29d7bbada98e ("intel_idle: Remove superfluous SMP fuction call") Reported-by: Gabriel C <nix.or.die@gmail.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Ruslan Ruslichenko <rruslich@cisco.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: lwn@lwn.net Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: stable <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1486953115.5912.4.camel@gmx.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-12bpf: introduce BPF_F_ALLOW_OVERRIDE flagAlexei Starovoitov3-22/+66
If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command to the given cgroup the descendent cgroup will be able to override effective bpf program that was inherited from this cgroup. By default it's not passed, therefore override is disallowed. Examples: 1. prog X attached to /A with default prog Y fails to attach to /A/B and /A/B/C Everything under /A runs prog X 2. prog X attached to /A with allow_override. prog Y fails to attach to /A/B with default (non-override) prog M attached to /A/B with allow_override. Everything under /A/B runs prog M only. 3. prog X attached to /A with allow_override. prog Y fails to attach to /A with default. The user has to detach first to switch the mode. In the future this behavior may be extended with a chain of non-overridable programs. Also fix the bug where detach from cgroup where nothing is attached was not throwing error. Return ENOENT in such case. Add several testcases and adjust libbpf. Fixes: 3007098494be ("cgroup: add support for eBPF programs") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Daniel Mack <daniel@zonque.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-12genirq/devres: Use dev_name(dev) as default for devnameHeiner Kallweit1-2/+8
Allow the devname parameter to be NULL and use dev_name(dev) in this case. This should be an appropriate default for most use cases. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: http://lkml.kernel.org/r/05c63d67-30b4-7026-02d5-ce7fb7bc185f@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-11Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-0/+5
Pull timer fix from Ingo Molnar: "Fix a sporadic missed timer hw reprogramming bug that can result in random delays" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick/nohz: Fix possible missing clock reprog after tick soft restart
2017-02-11Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-10/+15
Pull perf fixes from Ingo Molnar: "A kernel crash fix plus three tooling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix crash in perf_event_read() perf callchain: Reference count maps perf diff: Fix -o/--order option behavior (again) perf diff: Fix segfault on 'perf diff -o N' option
2017-02-11Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-8/+4
Pull lockdep fix from Ingo Molnar: "This fixes an ugly lockdep stack trace output regression. (But also affects other stacktrace users such as kmemleak, KASAN, etc)" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: stacktrace, lockdep: Fix address, newline ugliness
2017-02-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-2/+1
2017-02-10module: Optimize search_module_extables()Peter Zijlstra1-13/+14
While looking through the __ex_table stuff I found that we do a linear lookup of the module. Also fix up a comment. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Jessica Yu <jeyu@redhat.com>
2017-02-10genirq: Fix /proc/interrupts output alignmentH Hartley Sweeten1-0/+2
If the irq_desc being output does not have a domain associated the information following the 'name' is not aligned correctly. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Link: http://lkml.kernel.org/r/20170210165416.5629-1-hsweeten@visionengravers.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-10Merge branches 'iommu/fixes', 'arm/exynos', 'arm/renesas', 'arm/smmu', 'arm/mediatek', 'arm/core', 'x86/vt-d' and 'core' into nextJoerg Roedel35-238/+538
2017-02-10irqdesc: Add a resource managed version of irq_alloc_descs()Bartosz Golaszewski1-0/+55
Add a devres flavor of __devm_irq_alloc_descs() and corresponding helper macros. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: linux-doc@vger.kernel.org Cc: Jonathan Corbet <corbet@lwn.net> Link: http://lkml.kernel.org/r/1486729403-21132-1-git-send-email-bgolaszewski@baylibre.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-10timer_list: Remove useless cast when printingMars Cheng1-1/+1
hrtimer_resolution is already unsigned int, not necessary to cast it when printing. Signed-off-by: Mars Cheng <mars.cheng@mediatek.com> Cc: CC Hwang <cc.hwang@mediatek.com> Cc: wsd_upstream@mediatek.com Cc: Loda Chou <loda.chou@mediatek.com> Cc: Jades Shih <jades.shih@mediatek.com> Cc: Miles Chen <miles.chen@mediatek.com> Cc: John Stultz <john.stultz@linaro.org> Cc: My Chuang <my.chuang@mediatek.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Yingjoe Chen <yingjoe.chen@mediatek.com> Link: http://lkml.kernel.org/r/1486626615-5879-1-git-send-email-mars.cheng@mediatek.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-10time: Remove CONFIG_TIMER_STATSKees Cook7-523/+2
Currently CONFIG_TIMER_STATS exposes process information across namespaces: kernel/time/timer_list.c print_timer(): SEQ_printf(m, ", %s/%d", tmp, timer->start_pid); /proc/timer_list: #11: <0000000000000000>, hrtimer_wakeup, S:01, do_nanosleep, cron/2570 Given that the tracer can give the same information, this patch entirely removes CONFIG_TIMER_STATS. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: linux-doc@vger.kernel.org Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Xing Gao <xgao01@email.wm.edu> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Jessica Frazelle <me@jessfraz.com> Cc: kernel-hardening@lists.openwall.com Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Michal Marek <mmarek@suse.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Olof Johansson <olof@lixom.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-api@vger.kernel.org Cc: Arjan van de Ven <arjan@linux.intel.com> Link: http://lkml.kernel.org/r/20170208192659.GA32582@beast Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-10tick/nohz: Fix possible missing clock reprog after tick soft restartFrederic Weisbecker1-0/+5
ts->next_tick keeps track of the next tick deadline in order to optimize clock programmation on irq exit and avoid redundant clock device writes. Now if ts->next_tick missed an update, we may spuriously miss a clock reprog later as the nohz code is fooled by an obsolete next_tick value. This is what happens here on a specific path: when we observe an expired timer from the nohz update code on irq exit, we perform a soft tick restart which simply fires the closest possible tick without actually exiting the nohz mode and restoring a periodic state. But we forget to update ts->next_tick accordingly. As a result, after the next tick resulting from such soft tick restart, the nohz code sees a stale value on ts->next_tick which doesn't match the clock deadline that just expired. If that obsolete ts->next_tick value happens to collide with the actual next tick deadline to be scheduled, we may spuriously bypass the clock reprogramming. In the worst case, the tick may never fire again. Fix this with a ts->next_tick reset on soft tick restart. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed: Wanpeng Li <wanpeng.li@hotmail.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1486485894-29173-1-git-send-email-fweisbec@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-10locking/spinlock/debug: Remove spinlock lockup detection codeWaiman Long1-81/+5
The current spinlock lockup detection code can sometimes produce false positives because of the unfairness of the locking algorithm itself. So the lockup detection code is now removed. Instead, we are relying on the NMI watchdog to detect potential lockup. We won't have lockup detection if the watchdog isn't running. The commented-out read-write lock lockup detection code are also removed. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1486583208-11038-1-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-10lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKSByungchul Park1-1/+1
Bug messages and stack dump for MAX_LOCKDEP_CHAIN_HLOCKS should only be printed once. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1484275324-28192-1-git-send-email-byungchul.park@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-10perf/core: Allow kernel filters on CPU eventsAlexander Shishkin1-14/+28
While supporting file-based address filters for CPU events requires some extra context switch handling, kernel address filters are easy, since the kernel mapping is preserved across address spaces. It is also useful as it permits tracing scheduling paths of the kernel. This patch allows setting up kernel filters for CPU events. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Will Deacon <will.deacon@arm.com> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20170126094057.13805-4-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-10perf/core: Do error out on a kernel filter on an exclude_filter eventAlexander Shishkin1-0/+1
It is currently possible to configure a kernel address filter for a event that excludes kernel from its traces (attr.exclude_kernel==1). While in reality this doesn't make sense, the SET_FILTER ioctl() should return a error in such case, currently it does not. Furthermore, it will still silently discard the filter and any potentially valid filters that came with it. This patch makes the SET_FILTER ioctl() error out in such cases. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Will Deacon <will.deacon@arm.com> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20170126094057.13805-3-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-10sched/core: Remove unlikely() annotation from sched_move_task()Steven Rostedt (VMware)1-2/+2
The check for 'running' in sched_move_task() has an unlikely() around it. That is, it is unlikely that the task being moved is running. That use to be true. But with a couple of recent updates, it is now likely that the task will be running. The first change came from ea86cb4b7621 ("sched/cgroup: Fix cpu_cgroup_fork() handling") that moved around the use case of sched_move_task() in do_fork() where the call is now done after the task is woken (hence it is running). The second change came from 8e5bfa8c1f84 ("sched/autogroup: Do not use autogroup->tg in zombie threads") where sched_move_task() is called by the exit path, by the task that is exiting. Hence it too is running. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Guittot <vincent.guittot@linaro.org> Link: http://lkml.kernel.org/r/20170206110426.27ca6426@gandalf.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-10perf/core: Fix crash in perf_event_read()Peter Zijlstra1-10/+15
Alexei had his box explode because doing read() on a package (rapl/uncore) event that isn't currently scheduled in ends up doing an out-of-bounds load. Rework the code to more explicitly deal with event->oncpu being -1. Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Tested-by: Alexei Starovoitov <ast@kernel.org> Tested-by: David Carrillo-Cisneros <davidcc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: eranian@google.com Fixes: d6a2f9035bfc ("perf/core: Introduce PMU_EV_CAP_READ_ACTIVE_PKG") Link: http://lkml.kernel.org/r/20170131102710.GL6515@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-10kprobes: Introduce weak variant of kprobe_exceptions_notify()Naveen N. Rao1-0/+6
kprobe_exceptions_notify() is not used on some of the architectures such as arm[64] and powerpc anymore. Introduce a weak variant for such architectures. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-10Merge branch 'stable-4.11' of git://git.infradead.org/users/pcmoore/selinux into nextJames Morris1-17/+2
2017-02-09core: migrate exception table users off module.h and onto extable.hPaul Gortmaker2-0/+2
These files were including module.h for exception table related functions. We've now separated that content out into its own file "extable.h" so now move over to that and where possible, avoid all the extra header content in module.h that we don't really need to compile these non-modular files. Note: init/main.c still needs module.h for __init_or_module kernel/extable.c still needs module.h for is_module_text_address ...and so we don't get the benefit of removing module.h from the cpp feed for these two files, unlike the almost universal 1:1 exchange of module.h for extable.h we were able to do in the arch dirs. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Jessica Yu <jeyu@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2017-02-08kernel/ucount.c: mark user_header with kmemleak_ignore()Luis R. Rodriguez1-2/+1
The user_header gets caught by kmemleak with the following splat as missing a free: unreferenced object 0xffff99667a733d80 (size 96): comm "swapper/0", pid 1, jiffies 4294892317 (age 62191.468s) hex dump (first 32 bytes): a0 b6 92 b4 ff ff ff ff 00 00 00 00 01 00 00 00 ................ 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: kmemleak_alloc+0x4a/0xa0 __kmalloc+0x144/0x260 __register_sysctl_table+0x54/0x5e0 register_sysctl+0x1b/0x20 user_namespace_sysctl_init+0x17/0x34 do_one_initcall+0x52/0x1a0 kernel_init_freeable+0x173/0x200 kernel_init+0xe/0x100 ret_from_fork+0x2c/0x40 The BUG_ON()s are intended to crash so no need to clean up after ourselves on error there. This is also a kernel/ subsys_init() we don't need a respective exit call here as this is never modular, so just white list it. Link: http://lkml.kernel.org/r/20170203211404.31458-1-mcgrof@kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: Nikolay Borisov <n.borisov.lkml@gmail.com> Cc: Serge Hallyn <serge@hallyn.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-08bpf, lpm: fix overflows in trie_alloc checksDaniel Borkmann1-9/+27
Cap the maximum (total) value size and bail out if larger than KMALLOC_MAX_SIZE as otherwise it doesn't make any sense to proceed further, since we're guaranteed to fail to allocate elements anyway in lpm_trie_node_alloc(); likleyhood of failure is still high for large values, though, similarly as with htab case in non-prealloc. Next, make sure that cost vars are really u64 instead of size_t, so that we don't overflow on 32 bit and charge only tiny map.pages against memlock while allowing huge max_entries; cap also the max cost like we do with other map types. Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-08printk: drop call_console_drivers() unused paramSergey Senozhatsky1-8/+4
We do suppress_message_printing() check before we call call_console_drivers() now, so `level' param is not needed anymore. Link: http://lkml.kernel.org/r/20161224140902.1962-2-sergey.senozhatsky@gmail.com Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
2017-02-08printk: convert the rest to printk-safeSergey Senozhatsky1-38/+65
This patch converts the rest of logbuf users (which are out of printk recursion case, but can deadlock in printk). To make printk-safe usage easier the patch introduces 4 helper macros: - logbuf_lock_irq()/logbuf_unlock_irq() lock/unlock the logbuf lock and disable/enable local IRQ - logbuf_lock_irqsave(flags)/logbuf_unlock_irqrestore(flags) lock/unlock the logbuf lock and saves/restores local IRQ state Link: http://lkml.kernel.org/r/20161227141611.940-9-sergey.senozhatsky@gmail.com Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jan Kara <jack@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Calvin Owens <calvinowens@fb.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
2017-02-08printk: remove zap_locks() functionSergey Senozhatsky1-61/+0
We use printk-safe now which makes printk-recursion detection code in vprintk_emit() unreachable. The tricky thing here is that, apart from detecting and reporting printk recursions, that code also used to zap_locks() in case of panic() from the same CPU. However, zap_locks() does not look to be needed anymore: 1) Since commit 08d78658f393 ("panic: release stale console lock to always get the logbuf printed out") panic flushing of `logbuf' to console ignores the state of `console_sem' by doing panic() console_trylock(); console_unlock(); 2) Since commit cf9b1106c81c ("printk/nmi: flush NMI messages on the system panic") panic attempts to zap the `logbuf_lock' spin_lock to successfully flush nmi messages to `logbuf'. Basically, it seems that we either already do what zap_locks() used to do but in other places or we ignore the state of the lock. The only reaming difference is that we don't re-init the console semaphore in printk_safe_flush_on_panic(), but this is not necessary because we don't call console drivers from printk_safe_flush_on_panic() due to the fact that we are using a deferred printk() version (as was suggested by Petr Mladek). Link: http://lkml.kernel.org/r/20161227141611.940-8-sergey.senozhatsky@gmail.com Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jan Kara <jack@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Calvin Owens <calvinowens@fb.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
2017-02-08printk: use printk_safe buffers in printkSergey Senozhatsky1-15/+32
Use printk_safe per-CPU buffers in printk recursion-prone blocks: -- around logbuf_lock protected sections in vprintk_emit() and console_unlock() -- around down_trylock_console_sem() and up_console_sem() Note that this solution addresses deadlocks caused by printk() recursive calls only. That is vprintk_emit() and console_unlock(). The rest will be converted in a followup patch. Another thing to note is that we now keep lockdep enabled in printk, because we are protected against the printk recursion caused by lockdep in vprintk_emit() by the printk-safe mechanism - we first switch to per-CPU buffers and only then access the deadlock-prone locks. Examples: 1) printk() from logbuf_lock spin_lock section Assume the following code: printk() raw_spin_lock(&logbuf_lock); WARN_ON(1); raw_spin_unlock(&logbuf_lock); which now produces: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 366 at kernel/printk/printk.c:1811 vprintk_emit CPU: 0 PID: 366 Comm: bash Call Trace: warn_slowpath_null+0x1d/0x1f vprintk_emit+0x1cd/0x438 vprintk_default+0x1d/0x1f printk+0x48/0x50 [..] 2) printk() from semaphore sem->lock spin_lock section Assume the following code printk() console_trylock() down_trylock() raw_spin_lock_irqsave(&sem->lock, flags); WARN_ON(1); raw_spin_unlock_irqrestore(&sem->lock, flags); which now produces: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 363 at kernel/locking/semaphore.c:141 down_trylock CPU: 1 PID: 363 Comm: bash Call Trace: warn_slowpath_null+0x1d/0x1f down_trylock+0x3d/0x62 ? vprintk_emit+0x3f9/0x414 console_trylock+0x31/0xeb vprintk_emit+0x3f9/0x414 vprintk_default+0x1d/0x1f printk+0x48/0x50 [..] 3) printk() from console_unlock() Assume the following code: printk() console_unlock() raw_spin_lock(&logbuf_lock); WARN_ON(1); raw_spin_unlock(&logbuf_lock); which now produces: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 329 at kernel/printk/printk.c:2384 console_unlock CPU: 1 PID: 329 Comm: bash Call Trace: warn_slowpath_null+0x18/0x1a console_unlock+0x12d/0x559 ? trace_hardirqs_on_caller+0x16d/0x189 ? trace_hardirqs_on+0xd/0xf vprintk_emit+0x363/0x374 vprintk_default+0x18/0x1a printk+0x43/0x4b [..] 4) printk() from try_to_wake_up() Assume the following code: printk() console_unlock() up() try_to_wake_up() raw_spin_lock_irqsave(&p->pi_lock, flags); WARN_ON(1); raw_spin_unlock_irqrestore(&p->pi_lock, flags); which now produces: ------------[ cut here ]------------ WARNING: CPU: 3 PID: 363 at kernel/sched/core.c:2028 try_to_wake_up CPU: 3 PID: 363 Comm: bash Call Trace: warn_slowpath_null+0x1d/0x1f try_to_wake_up+0x7f/0x4f7 wake_up_process+0x15/0x17 __up.isra.0+0x56/0x63 up+0x32/0x42 __up_console_sem+0x37/0x55 console_unlock+0x21e/0x4c2 vprintk_emit+0x41c/0x462 vprintk_default+0x1d/0x1f printk+0x48/0x50 [..] 5) printk() from call_console_drivers() Assume the following code: printk() console_unlock() call_console_drivers() ... WARN_ON(1); which now produces: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 305 at kernel/printk/printk.c:1604 call_console_drivers CPU: 2 PID: 305 Comm: bash Call Trace: warn_slowpath_null+0x18/0x1a call_console_drivers.isra.6.constprop.16+0x3a/0xb0 console_unlock+0x471/0x48e vprintk_emit+0x1f4/0x206 vprintk_default+0x18/0x1a vprintk_func+0x6e/0x70 printk+0x3e/0x46 [..] 6) unsupported placeholder in printk() format now prints an actual warning from vscnprintf(), instead of 'BUG: recent printk recursion!'. ------------[ cut here ]------------ WARNING: CPU: 5 PID: 337 at lib/vsprintf.c:1900 format_decode Please remove unsupported % in format string CPU: 5 PID: 337 Comm: bash Call Trace: dump_stack+0x4f/0x65 __warn+0xc2/0xdd warn_slowpath_fmt+0x4b/0x53 format_decode+0x22c/0x308 vsnprintf+0x89/0x3b7 vscnprintf+0xd/0x26 vprintk_emit+0xb4/0x238 vprintk_default+0x1d/0x1f vprintk_func+0x6c/0x73 printk+0x43/0x4b [..] Link: http://lkml.kernel.org/r/20161227141611.940-7-sergey.senozhatsky@gmail.com Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Calvin Owens <calvinowens@fb.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-02-08printk: report lost messages in printk safe/nmi contextsSergey Senozhatsky3-37/+28
Account lost messages in pritk-safe and printk-safe-nmi contexts and report those numbers during printk_safe_flush(). The patch also moves lost message counter to struct `printk_safe_seq_buf' instead of having dedicated static counters - this simplifies the code. Link: http://lkml.kernel.org/r/20161227141611.940-6-sergey.senozhatsky@gmail.com Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jan Kara <jack@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Calvin Owens <calvinowens@fb.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
2017-02-08printk: always use deferred printk when flush printk_safe linesSergey Senozhatsky1-8/+6
Always use printk_deferred() in printk_safe_flush_line(). Flushing can be done from NMI or printk_safe contexts (when we are in panic), so we can't call console drivers, yet still want to store the messages in the logbuf buffer. Therefore we use a deferred printk version. Link: http://lkml.kernel.org/r/20170206164253.GA463@tigerII.localdomain Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Calvin Owens <calvinowens@fb.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Suggested-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-02-08printk: introduce per-cpu safe_print seq bufferSergey Senozhatsky4-52/+157
This patch extends the idea of NMI per-cpu buffers to regions that may cause recursive printk() calls and possible deadlocks. Namely, printk() can't handle printk calls from schedule code or printk() calls from lock debugging code (spin_dump() for instance); because those may be called with `sem->lock' already taken or any other `critical' locks (p->pi_lock, etc.). An example of deadlock can be vprintk_emit() console_unlock() up() << raw_spin_lock_irqsave(&sem->lock, flags); wake_up_process() try_to_wake_up() ttwu_queue() ttwu_activate() activate_task() enqueue_task() enqueue_task_fair() cfs_rq_of() task_of() WARN_ON_ONCE(!entity_is_task(se)) vprintk_emit() console_trylock() down_trylock() raw_spin_lock_irqsave(&sem->lock, flags) ^^^^ deadlock and some other cases. Just like in NMI implementation, the solution uses a per-cpu `printk_func' pointer to 'redirect' printk() calls to a 'safe' callback, that store messages in a per-cpu buffer and flushes them back to logbuf buffer later. Usage example: printk() printk_safe_enter_irqsave(flags) // // any printk() call from here will endup in vprintk_safe(), // that stores messages in a special per-CPU buffer. // printk_safe_exit_irqrestore(flags) The 'redirection' mechanism, though, has been reworked, as suggested by Petr Mladek. Instead of using a per-cpu @print_func callback we now keep a per-cpu printk-context variable and call either default or nmi vprintk function depending on its value. printk_nmi_entrer/exit and printk_safe_enter/exit, thus, just set/celar corresponding bits in printk-context functions. The patch only adds printk_safe support, we don't use it yet. Link: http://lkml.kernel.org/r/20161227141611.940-4-sergey.senozhatsky@gmail.com Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Calvin Owens <calvinowens@fb.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-02-08printk: rename nmi.c and exported apiSergey Senozhatsky4-36/+36
A preparation patch for printk_safe work. No functional change. - rename nmi.c to print_safe.c - add `printk_safe' prefix to some (which used both by printk-safe and printk-nmi) of the exported functions. Link: http://lkml.kernel.org/r/20161227141611.940-3-sergey.senozhatsky@gmail.com Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Calvin Owens <calvinowens@fb.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com>