aboutsummaryrefslogtreecommitdiffstats
path: root/kernel (follow)
AgeCommit message (Collapse)AuthorFilesLines
2016-12-14kexec: export the value of phys_base instead of symbol addressBaoquan He1-3/+0
Currently in x86_64, the symbol address of phys_base is exported to vmcoreinfo. Dave Anderson complained this is really useless for his Crash implementation. Because in user-space utility Crash and Makedumpfile which exported vmcore information is mainly used for, value of phys_base is needed to covert virtual address of exported kernel symbol to physical address. Especially init_level4_pgt, if we want to access and go over the page table to look up a PA corresponding to VA, firstly we need calculate page_dir = SYMBOL(init_level4_pgt) - __START_KERNEL_map + phys_base; Now in Crash and Makedumpfile, we have to analyze the vmcore elf program header to get value of phys_base. As Dave said, it would be preferable if it were readily availabl in vmcoreinfo rather than depending upon the PT_LOAD semantics. Hence in this patch change to export the value of phys_base instead of its virtual address. And people also complained that KERNEL_IMAGE_SIZE exporting is x86_64 only, should be moved into arch dependent function arch_crash_save_vmcoreinfo. Do the moving in this patch. Link: http://lkml.kernel.org/r/1478568596-30060-2-git-send-email-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Thomas Garnier <thgarnie@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Xunlei Pang <xlpang@redhat.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Kees Cook <keescook@chromium.org> Cc: Eugene Surovegin <surovegin@google.com> Cc: Dave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com> Cc: Dave Anderson <anderson@redhat.com> Cc: Pratyush Anand <panand@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14coredump: clarify "unsafe core_pattern" warningAlexey Dobriyan1-3/+5
I was amused to find "unsafe core_pattern" warning having these lines in /etc/sysctl.conf: fs.suid_dumpable=2 kernel.core_pattern=/core/core-%e-%p-%E kernel.core_uses_pid=0 Turns out kernel is formally right. Default core_pattern is just "core", which doesn't qualify for secure path while setting suid.dumpable. Hint admins about solution, clarify sysctl names, delete unnecessary '\' characters (string literals are concatenated regardless) and reformat for easier grepping. Link: http://lkml.kernel.org/r/20161029152124.GA1258@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14signals: avoid unnecessary taking of sighand->siglockWaiman Long1-0/+7
When running certain database workload on a high-end system with many CPUs, it was found that spinlock contention in the sigprocmask syscalls became a significant portion of the overall CPU cycles as shown below. 9.30% 9.30% 905387 dataserver /proc/kcore 0x7fff8163f4d2 [k] _raw_spin_lock_irq | ---_raw_spin_lock_irq | |--99.34%-- __set_current_blocked | sigprocmask | sys_rt_sigprocmask | system_call_fastpath | | | |--50.63%-- __swapcontext | | | | | |--99.91%-- upsleepgeneric | | | |--49.36%-- __setcontext | | ktskRun Looking further into the swapcontext function in glibc, it was found that the function always call sigprocmask() without checking if there are changes in the signal mask. A check was added to the __set_current_blocked() function to avoid taking the sighand->siglock spinlock if there is no change in the signal mask. This will prevent unneeded spinlock contention when many threads are trying to call sigprocmask(). With this patch applied, the spinlock contention in sigprocmask() was gone. Link: http://lkml.kernel.org/r/1474979209-11867-1-git-send-email-Waiman.Long@hpe.com Signed-off-by: Waiman Long <Waiman.Long@hpe.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Stas Sergeev <stsp@list.ru> Cc: Scott J Norton <scott.norton@hpe.com> Cc: Douglas Hatch <doug.hatch@hpe.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14kernel/watchdog: use nmi registers snapshot in hardlockup handlerKonstantin Khlebnikov1-1/+0
NMI handler doesn't call set_irq_regs(), it's set only by normal IRQ. Thus get_irq_regs() returns NULL or stale registers snapshot with IP/SP pointing to the code interrupted by IRQ which was interrupted by NMI. NULL isn't a problem: in this case watchdog calls dump_stack() and prints full stack trace including NMI. But if we're stuck in IRQ handler then NMI watchlog will print stack trace without IRQ part at all. This patch uses registers snapshot passed into NMI handler as arguments: these registers point exactly to the instruction interrupted by NMI. Fixes: 55537871ef66 ("kernel/watchdog.c: perform all-CPU backtrace in case of hard lockup") Link: http://lkml.kernel.org/r/146771764784.86724.6006627197118544150.stgit@buzz Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: <stable@vger.kernel.org> [4.4+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespaceLinus Torvalds3-26/+89
Pull namespace updates from Eric Biederman: "After a lot of discussion and work we have finally reachanged a basic understanding of what is necessary to make unprivileged mounts safe in the presence of EVM and IMA xattrs which the last commit in this series reflects. While technically it is a revert the comments it adds are important for people not getting confused in the future. Clearing up that confusion allows us to seriously work on unprivileged mounts of fuse in the next development cycle. The rest of the fixes in this set are in the intersection of user namespaces, ptrace, and exec. I started with the first fix which started a feedback cycle of finding additional issues during review and fixing them. Culiminating in a fix for a bug that has been present since at least Linux v1.0. Potentially these fixes were candidates for being merged during the rc cycle, and are certainly backport candidates but enough little things turned up during review and testing that I decided they should be handled as part of the normal development process just to be certain there were not any great surprises when it came time to backport some of these fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: Revert "evm: Translate user/group ids relative to s_user_ns when computing HMAC" exec: Ensure mm->user_ns contains the execed files ptrace: Don't allow accessing an undumpable mm ptrace: Capture the ptracer's creds not PT_PTRACE_CAP mm: Add a user_ns owner to mm_struct and fix ptrace permission checks
2016-12-14Merge branch 'stable-4.10' of git://git.infradead.org/users/pcmoore/auditLinus Torvalds6-221/+341
Pull audit updates from Paul Moore: "After the small number of patches for v4.9, we've got a much bigger pile for v4.10. The bulk of these patches involve a rework of the audit backlog queue to enable us to move the netlink multicasting out of the task/thread that generates the audit record and into the kernel thread that emits the record (just like we do for the audit unicast to auditd). While we were playing with the backlog queue(s) we fixed a number of other little problems with the code, and from all the testing so far things look to be in much better shape now. Doing this also allowed us to re-enable disabling IRQs for some netns operations ("netns: avoid disabling irq for netns id"). The remaining patches fix some small problems that are well documented in the commit descriptions, as well as adding session ID filtering support" * 'stable-4.10' of git://git.infradead.org/users/pcmoore/audit: audit: use proper refcount locking on audit_sock netns: avoid disabling irq for netns id audit: don't ever sleep on a command record/message audit: handle a clean auditd shutdown with grace audit: wake up kauditd_thread after auditd registers audit: rework audit_log_start() audit: rework the audit queue handling audit: rename the queues and kauditd related functions audit: queue netlink multicast sends just like we do for unicast sends audit: fixup audit_init() audit: move kaudit thread start from auditd registration to kaudit init (#2) audit: add support for session ID user filter audit: fix formatting of AUDIT_CONFIG_CHANGE events audit: skip sessionid sentinel value when auto-incrementing audit: tame initialization warning len_abuf in audit_log_execve_info audit: less stack usage for /proc/*/loginuid
2016-12-14Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-securityLinus Torvalds1-4/+3
Pull security subsystem updates from James Morris: "Generally pretty quiet for this release. Highlights: Yama: - allow ptrace access for original parent after re-parenting TPM: - add documentation - many bugfixes & cleanups - define a generic open() method for ascii & bios measurements Integrity: - Harden against malformed xattrs SELinux: - bugfixes & cleanups Smack: - Remove unnecessary smack_known_invalid label - Do not apply star label in smack_setprocattr hook - parse mnt opts after privileges check (fixes unpriv DoS vuln)" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (56 commits) Yama: allow access for the current ptrace parent tpm: adjust return value of tpm_read_log tpm: vtpm_proxy: conditionally call tpm_chip_unregister tpm: Fix handling of missing event log tpm: Check the bios_dir entry for NULL before accessing it tpm: return -ENODEV if np is not set tpm: cleanup of printk error messages tpm: replace of_find_node_by_name() with dev of_node property tpm: redefine read_log() to handle ACPI/OF at runtime tpm: fix the missing .owner in tpm_bios_measurements_ops tpm: have event log use the tpm_chip tpm: drop tpm1_chip_register(/unregister) tpm: replace dynamically allocated bios_dir with a static array tpm: replace symbolic permission with octal for securityfs files char: tpm: fix kerneldoc tpm2_unseal_trusted name typo tpm_tis: Allow tpm_tis to be bound using DT tpm, tpm_vtpm_proxy: add kdoc comments for VTPM_PROXY_IOC_NEW_DEV tpm: Only call pm_runtime_get_sync if device has a parent tpm: define a generic open() method for ascii & bios measurements Documentation: tpm: add the Physical TPM device tree binding documentation ...
2016-12-14Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds1-4/+0
Pull crypto updates from Herbert Xu: "Here is the crypto update for 4.10: API: - add skcipher walk interface - add asynchronous compression (acomp) interface - fix algif_aed AIO handling of zero buffer Algorithms: - fix unaligned access in poly1305 - fix DRBG output to large buffers Drivers: - add support for iMX6UL to caam - fix givenc descriptors (used by IPsec) in caam - accelerated SHA256/SHA512 for ARM64 from OpenSSL - add SSE CRCT10DIF and CRC32 to ARM/ARM64 - add AEAD support to Chelsio chcr - add Armada 8K support to omap-rng" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (148 commits) crypto: testmgr - fix overlap in chunked tests again crypto: arm/crc32 - accelerated support based on x86 SSE implementation crypto: arm64/crc32 - accelerated support based on x86 SSE implementation crypto: arm/crct10dif - port x86 SSE implementation to ARM crypto: arm64/crct10dif - port x86 SSE implementation to arm64 crypto: testmgr - add/enhance test cases for CRC-T10DIF crypto: testmgr - avoid overlap in chunked tests crypto: chcr - checking for IS_ERR() instead of NULL crypto: caam - check caam_emi_slow instead of re-lookup platform crypto: algif_aead - fix AIO handling of zero buffer crypto: aes-ce - Make aes_simd_algs static crypto: algif_skcipher - set error code when kcalloc fails crypto: caam - make aamalg_desc a proper module crypto: caam - pass key buffers with typesafe pointers crypto: arm64/aes-ce-ccm - Fix AEAD decryption length MAINTAINERS: add crypto headers to crypto entry crypt: doc - remove misleading mention of async API crypto: doc - fix header file name crypto: api - fix comment typo crypto: skcipher - Add separate walker for AEAD decryption ..
2016-12-14Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivialLinus Torvalds1-2/+0
Pull trivial updates from Jiri Kosina. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: NTB: correct ntb_spad_count comment typo misc: ibmasm: fix typo in error message Remove references to dead make variable LINUX_INCLUDE Remove last traces of ikconfig.h treewide: Fix printk() message errors Documentation/device-mapper: s/getsize/getsz/
2016-12-14audit: use proper refcount locking on audit_sockRichard Guy Briggs1-6/+24
Resetting audit_sock appears to be racy. audit_sock was being copied and dereferenced without using a refcount on the source sock. Bump the refcount on the underlying sock when we store a refrence in audit_sock and release it when we reset audit_sock. audit_sock modification needs the audit_cmd_mutex. See: https://lkml.org/lkml/2016/11/26/232 Thanks to Eric Dumazet <edumazet@google.com> and Cong Wang <xiyou.wangcong@gmail.com> on ideas how to fix it. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> [PM: fixed the comment block text formatting for auditd_reset()] Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-12-14audit: don't ever sleep on a command record/messagePaul Moore1-5/+13
Sleeping on a command record/message in audit_log_start() could slow something, e.g. auditd, from doing something important, e.g. clean shutdown, which could present problems on a heavily loaded system. This patch allows tasks to bypass any queue restrictions if they are logging a command record/message. Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-12-14audit: handle a clean auditd shutdown with gracePaul Moore1-0/+6
When auditd stops cleanly it sets 'auditd_pid' to 0 with an AUDIT_SET message, in this case we should reset our backlog queues via the auditd_reset() function. This patch also adds a 'auditd_pid' check to the top of kauditd_send_unicast_skb() so we can fail quicker. Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-12-14audit: wake up kauditd_thread after auditd registersPaul Moore1-0/+1
This patch was suggested by Richard Briggs back in 2015, see the link to the mail archive below. Unfortunately, that patch is no longer even remotely valid due to other changes to the code. * https://www.redhat.com/archives/linux-audit/2015-October/msg00075.html Suggested-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-12-14audit: rework audit_log_start()Paul Moore1-56/+36
The backlog queue handling in audit_log_start() is a little odd with some questionable design decisions, this patch attempts to rectify this with the following changes: * Never make auditd wait, ignore any backlog limits as we need auditd awake so it can drain the backlog queue. * When we hit a backlog limit and start dropping records, don't wake all the tasks sleeping on the backlog, that's silly. Instead, let kauditd_thread() take care of waking everyone once it has had a chance to drain the backlog queue. * Don't keep a global backlog timeout countdown, make it per-task. A per-task timer means we won't have all the sleeping tasks waking at the same time and hammering on an already stressed backlog queue. Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-12-14audit: rework the audit queue handlingPaul Moore1-121/+226
The audit record backlog queue has always been a bit of a mess, and the moving the multicast send into kauditd_thread() from audit_log_end() only makes things worse. This patch attempts to fix the backlog queue with a better design that should hold up better under load and have less of a performance impact at syscall invocation time. While it looks like there is a log going on in this patch, the main change is the move from a single backlog queue to three queues: * A queue for holding records generated from audit_log_end() that haven't been consumed by kauditd_thread() (audit_queue). * A queue for holding records that have been sent via multicast but had a temporary failure when sending via unicast and need a resend (audit_retry_queue). * A queue for holding records that haven't been sent via unicast because no one is listening (audit_hold_queue). Special care is taken in this patch to ensure that the proper record ordering is preserved, e.g. we send everything in the hold queue first, then the retry queue, and finally the main queue. Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-12-14audit: rename the queues and kauditd related functionsPaul Moore1-20/+20
The audit queue names can be shortened and the record sending helpers associated with the kauditd task could be named better, do these small cleanups now to make life easier once we start reworking the queues and kauditd code. Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-12-14audit: queue netlink multicast sends just like we do for unicast sendsPaul Moore1-35/+35
Sending audit netlink multicast messages is bad for all the same reasons that sending audit netlink unicast messages is bad, so this patch reworks things so that we don't do the multicast send in audit_log_end(), we do it from the dedicated kauditd_thread thread just as we do for unicast messages. See the GitHub issues below for more information/history: * https://github.com/linux-audit/audit-kernel/issues/23 * https://github.com/linux-audit/audit-kernel/issues/22 Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-12-14audit: fixup audit_init()Paul Moore1-6/+8
Make sure everything is initialized before we start the kauditd_thread and don't emit the "initialized" record until everything is finished. We also panic with a descriptive message if we can't start the kauditd_thread. Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-12-14audit: move kaudit thread start from auditd registration to kaudit init (#2)Richard Guy Briggs1-10/+4
Richard made this change some time ago but Eric backed it out because the rest of the supporting code wasn't ready. In order to move the netlink multicast send to kauditd_thread we need to ensure the kauditd_thread is always running, so restore commit 6ff5e459 ("audit: move kaudit thread start from auditd registration to kaudit init"). Signed-off-by: Richard Guy Briggs <rbriggs@redhat.com> [PM: brought forward and merged based on Richard's old patch] Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-12-14Remove last traces of ikconfig.hPaul Bolle1-2/+0
The build system stopped generating ikconfig.h in v2.6.8. Remove an entry for it in dontdiff. There's also a reference to it in a small comment. Remove that comment too, as it is of little help in any case. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-12-13Merge branch 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds2-28/+86
Pull workqueue updates from Tejun Heo: "Mostly patches to initialize workqueue subsystem earlier and get rid of keventd_up(). The patches were headed for the last merge cycle but got delayed due to a bug found late minute, which is fixed now. Also, to help debugging, destroy_workqueue() is more chatty now on a sanity check failure." * 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: move wq_numa_init() to workqueue_init() workqueue: remove keventd_up() debugobj, workqueue: remove keventd_up() usage slab, workqueue: remove keventd_up() usage power, workqueue: remove keventd_up() usage tty, workqueue: remove keventd_up() usage mce, workqueue: remove keventd_up() usage workqueue: make workqueue available early during boot workqueue: dump workqueue state on sanity check failures in destroy_workqueue()
2016-12-13Merge tag 'pm-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds7-118/+342
Pull power management updates from Rafael Wysocki: "Again, cpufreq gets more changes than the other parts this time (one new driver, one old driver less, a bunch of enhancements of the existing code, new CPU IDs, fixes, cleanups) There also are some changes in cpuidle (idle injection rework, a couple of new CPU IDs, online/offline rework in intel_idle, fixes and cleanups), in the generic power domains framework (mostly related to supporting power domains containing CPUs), and in the Operating Performance Points (OPP) library (mostly related to supporting devices with multiple voltage regulators) In addition to that, the system sleep state selection interface is modified to make it easier for distributions with unchanged user space to support suspend-to-idle as the default system suspend method, some issues are fixed in the PM core, the latency tolerance PM QoS framework is improved a bit, the Intel RAPL power capping driver is cleaned up and there are some fixes and cleanups in the devfreq subsystem Specifics: - New cpufreq driver for Broadcom STB SoCs and a Device Tree binding for it (Markus Mayer) - Support for ARM Integrator/AP and Integrator/CP in the generic DT cpufreq driver and elimination of the old Integrator cpufreq driver (Linus Walleij) - Support for the zx296718, r8a7743 and r8a7745, Socionext UniPhier, and PXA SoCs in the the generic DT cpufreq driver (Baoyou Xie, Geert Uytterhoeven, Masahiro Yamada, Robert Jarzmik) - cpufreq core fix to eliminate races that may lead to using inactive policy objects and related cleanups (Rafael Wysocki) - cpufreq schedutil governor update to make it use SCHED_FIFO kernel threads (instead of regular workqueues) for doing delayed work (to reduce the response latency in some cases) and related cleanups (Viresh Kumar) - New cpufreq sysfs attribute for resetting statistics (Markus Mayer) - cpufreq governors fixes and cleanups (Chen Yu, Stratos Karafotis, Viresh Kumar) - Support for using generic cpufreq governors in the intel_pstate driver (Rafael Wysocki) - Support for per-logical-CPU P-state limits and the EPP/EPB (Energy Performance Preference/Energy Performance Bias) knobs in the intel_pstate driver (Srinivas Pandruvada) - New CPU ID for Knights Mill in intel_pstate (Piotr Luc) - intel_pstate driver modification to use the P-state selection algorithm based on CPU load on platforms with the system profile in the ACPI tables set to "mobile" (Srinivas Pandruvada) - intel_pstate driver cleanups (Arnd Bergmann, Rafael Wysocki, Srinivas Pandruvada) - cpufreq powernv driver updates including fast switching support (for the schedutil governor), fixes and cleanus (Akshay Adiga, Andrew Donnellan, Denis Kirjanov) - acpi-cpufreq driver rework to switch it over to the new CPU offline/online state machine (Sebastian Andrzej Siewior) - Assorted cleanups in cpufreq drivers (Wei Yongjun, Prashanth Prakash) - Idle injection rework (to make it use the regular idle path instead of a home-grown custom one) and related powerclamp thermal driver updates (Peter Zijlstra, Jacob Pan, Petr Mladek, Sebastian Andrzej Siewior) - New CPU IDs for Atom Z34xx and Knights Mill in intel_idle (Andy Shevchenko, Piotr Luc) - intel_idle driver cleanups and switch over to using the new CPU offline/online state machine (Anna-Maria Gleixner, Sebastian Andrzej Siewior) - cpuidle DT driver update to support suspend-to-idle properly (Sudeep Holla) - cpuidle core cleanups and misc updates (Daniel Lezcano, Pan Bian, Rafael Wysocki) - Preliminary support for power domains including CPUs in the generic power domains (genpd) framework and related DT bindings (Lina Iyer) - Assorted fixes and cleanups in the generic power domains (genpd) framework (Colin Ian King, Dan Carpenter, Geert Uytterhoeven) - Preliminary support for devices with multiple voltage regulators and related fixes and cleanups in the Operating Performance Points (OPP) library (Viresh Kumar, Masahiro Yamada, Stephen Boyd) - System sleep state selection interface rework to make it easier to support suspend-to-idle as the default system suspend method (Rafael Wysocki) - PM core fixes and cleanups, mostly related to the interactions between the system suspend and runtime PM frameworks (Ulf Hansson, Sahitya Tummala, Tony Lindgren) - Latency tolerance PM QoS framework imorovements (Andrew Lutomirski) - New Knights Mill CPU ID for the Intel RAPL power capping driver (Piotr Luc) - Intel RAPL power capping driver fixes, cleanups and switch over to using the new CPU offline/online state machine (Jacob Pan, Thomas Gleixner, Sebastian Andrzej Siewior) - Fixes and cleanups in the exynos-ppmu, exynos-nocp, rk3399_dmc, rockchip-dfi devfreq drivers and the devfreq core (Axel Lin, Chanwoo Choi, Javier Martinez Canillas, MyungJoo Ham, Viresh Kumar) - Fix for false-positive KASAN warnings during resume from ACPI S3 (suspend-to-RAM) on x86 (Josh Poimboeuf) - Memory map verification during resume from hibernation on x86 to ensure a consistent address space layout (Chen Yu) - Wakeup sources debugging enhancement (Xing Wei) - rockchip-io AVS driver cleanup (Shawn Lin)" * tag 'pm-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (127 commits) devfreq: rk3399_dmc: Don't use OPP structures outside of RCU locks devfreq: rk3399_dmc: Remove dangling rcu_read_unlock() devfreq: exynos: Don't use OPP structures outside of RCU locks Documentation: intel_pstate: Document HWP energy/performance hints cpufreq: intel_pstate: Support for energy performance hints with HWP cpufreq: intel_pstate: Add locking around HWP requests PM / sleep: Print active wakeup sources when blocking on wakeup_count reads PM / core: Fix bug in the error handling of async suspend PM / wakeirq: Fix dedicated wakeirq for drivers not using autosuspend PM / Domains: Fix compatible for domain idle state PM / OPP: Don't WARN on multiple calls to dev_pm_opp_set_regulators() PM / OPP: Allow platform specific custom set_opp() callbacks PM / OPP: Separate out _generic_set_opp() PM / OPP: Add infrastructure to manage multiple regulators PM / OPP: Pass struct dev_pm_opp_supply to _set_opp_voltage() PM / OPP: Manage supply's voltage/current in a separate structure PM / OPP: Don't use OPP structure outside of rcu protected section PM / OPP: Reword binding supporting multiple regulators per device PM / OPP: Fix incorrect cpu-supply property in binding cpuidle: Add a kerneldoc comment to cpuidle_use_deepest_state() ..
2016-12-13Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-blockLinus Torvalds2-17/+16
Pull block layer updates from Jens Axboe: "This is the main block pull request this series. Contrary to previous release, I've kept the core and driver changes in the same branch. We always ended up having dependencies between the two for obvious reasons, so makes more sense to keep them together. That said, I'll probably try and keep more topical branches going forward, especially for cycles that end up being as busy as this one. The major parts of this pull request is: - Improved support for O_DIRECT on block devices, with a small private implementation instead of using the pig that is fs/direct-io.c. From Christoph. - Request completion tracking in a scalable fashion. This is utilized by two components in this pull, the new hybrid polling and the writeback queue throttling code. - Improved support for polling with O_DIRECT, adding a hybrid mode that combines pure polling with an initial sleep. From me. - Support for automatic throttling of writeback queues on the block side. This uses feedback from the device completion latencies to scale the queue on the block side up or down. From me. - Support from SMR drives in the block layer and for SD. From Hannes and Shaun. - Multi-connection support for nbd. From Josef. - Cleanup of request and bio flags, so we have a clear split between which are bio (or rq) private, and which ones are shared. From Christoph. - A set of patches from Bart, that improve how we handle queue stopping and starting in blk-mq. - Support for WRITE_ZEROES from Chaitanya. - Lightnvm updates from Javier/Matias. - Supoort for FC for the nvme-over-fabrics code. From James Smart. - A bunch of fixes from a whole slew of people, too many to name here" * 'for-4.10/block' of git://git.kernel.dk/linux-block: (182 commits) blk-stat: fix a few cases of missing batch flushing blk-flush: run the queue when inserting blk-mq flush elevator: make the rqhash helpers exported blk-mq: abstract out blk_mq_dispatch_rq_list() helper blk-mq: add blk_mq_start_stopped_hw_queue() block: improve handling of the magic discard payload blk-wbt: don't throttle discard or write zeroes nbd: use dev_err_ratelimited in io path nbd: reset the setup task for NBD_CLEAR_SOCK nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME nvme-fabrics: Add target support for FC transport nvme-fabrics: Add host support for FC transport nvme-fabrics: Add FC transport LLDD api definitions nvme-fabrics: Add FC transport FC-NVME definitions nvme-fabrics: Add FC transport error codes to nvme.h Add type 0x28 NVME type code to scsi fc headers nvme-fabrics: patch target code in prep for FC transport support nvme-fabrics: set sqe.command_id in core not transports parser: add u64 number parser nvme-rdma: align to generic ib_event logging helper ...
2016-12-13Merge tag 'pstore-v4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linuxLinus Torvalds1-0/+17
Pull pstore updates from Kees Cook: "Improvements and fixes to pstore subsystem: - add additional checks for bad platform data - remove bounce buffer in console writer - protect read/unlink race with a mutex - correctly give up during dump locking failures - increase ftrace bandwidth by splitting ftrace buffers per CPU" * tag 'pstore-v4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: ramoops: add pdata NULL check to ramoops_probe pstore: Convert console write to use ->write_buf pstore: Protect unlink with read_mutex pstore: Use global ftrace filters for function trace filtering ftrace: Provide API to use global filtering for ftrace ops pstore: Clarify context field przs as dprzs pstore: improve error report for failed setup pstore: Merge per-CPU ftrace records into one pstore: Add ftrace timestamp counter ramoops: Split ftrace buffer space into per-CPU zones pstore: Make ramoops_init_przs generic for other prz arrays pstore: Allow prz to control need for locking pstore: Warn on PSTORE_TYPE_PMSG using deprecated function pstore: Make spinlock per zone instead of global pstore: Actually give up during locking failure
2016-12-12Merge branch 'akpm' (patches from Andrew)Linus Torvalds6-45/+60
Merge updates from Andrew Morton: - various misc bits - most of MM (quite a lot of MM material is awaiting the merge of linux-next dependencies) - kasan - printk updates - procfs updates - MAINTAINERS - /lib updates - checkpatch updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (123 commits) init: reduce rootwait polling interval time to 5ms binfmt_elf: use vmalloc() for allocation of vma_filesz checkpatch: don't emit unified-diff error for rename-only patches checkpatch: don't check c99 types like uint8_t under tools checkpatch: avoid multiple line dereferences checkpatch: don't check .pl files, improve absolute path commit log test scripts/checkpatch.pl: fix spelling checkpatch: don't try to get maintained status when --no-tree is given lib/ida: document locking requirements a bit better lib/rbtree.c: fix typo in comment of ____rb_erase_color lib/Kconfig.debug: make CONFIG_STRICT_DEVMEM depend on CONFIG_DEVMEM MAINTAINERS: add drm and drm/i915 irc channels MAINTAINERS: add "C:" for URI for chat where developers hang out MAINTAINERS: add drm and drm/i915 bug filing info MAINTAINERS: add "B:" for URI where to file bugs get_maintainer: look for arbitrary letter prefixes in sections printk: add Kconfig option to set default console loglevel printk/sound: handle more message headers printk/btrfs: handle more message headers printk/kdb: handle more message headers ...
2016-12-12Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-38/+38
Pull irq updates from Thomas Gleixner: "The irq department provides: - a major update to the auto affinity management code, which is used by multi-queue devices - move of the microblaze irq chip driver into the common driver code so it can be shared between microblaze, powerpc and MIPS - a series of updates to the ARM GICV3 interrupt controller - the usual pile of fixes and small improvements all over the place" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) powerpc/virtex: Use generic xilinx irqchip driver irqchip/xilinx: Try to fall back if xlnx,kind-of-intr not provided irqchip/xilinx: Add support for parent intc irqchip/xilinx: Rename get_irq to xintc_get_irq irqchip/xilinx: Restructure and use jump label api irqchip/xilinx: Clean up print messages microblaze/irqchip: Move intc driver to irqchip ARM: virt: Select ARM_GIC_V3_ITS ARM: gic-v3-its: Add 32bit support to GICv3 ITS irqchip/gic-v3-its: Specialise readq and writeq accesses irqchip/gic-v3-its: Specialise flush_dcache operation irqchip/gic-v3-its: Narrow down Entry Size when used as a divider irqchip/gic-v3-its: Change unsigned types for AArch32 compatibility irqchip/gic-v3: Use nops macro for Cavium ThunderX erratum 23154 irqchip/gic-v3: Convert arm64 GIC accessors to {read,write}_sysreg_s genirq/msi: Drop artificial PCI dependency irqchip/bcm7038-l1: Implement irq_cpu_offline() callback genirq/affinity: Use default affinity mask for reserved vectors genirq/affinity: Take reserved vectors into account when spreading irqs PCI: Remove the irq_affinity mask from struct pci_dev ...
2016-12-12Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds14-97/+307
Pull timer updates from Thomas Gleixner: "The time/timekeeping/timer folks deliver with this update: - Fix a reintroduced signed/unsigned issue and cleanup the whole signed/unsigned mess in the timekeeping core so this wont happen accidentaly again. - Add a new trace clock based on boot time - Prevent injection of random sleep times when PM tracing abuses the RTC for storage - Make posix timers configurable for real tiny systems - Add tracepoints for the alarm timer subsystem so timer based suspend wakeups can be instrumented - The usual pile of fixes and updates to core and drivers" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) timekeeping: Use mul_u64_u32_shr() instead of open coding it timekeeping: Get rid of pointless typecasts timekeeping: Make the conversion call chain consistently unsigned timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion alarmtimer: Add tracepoints for alarm timers trace: Update documentation for mono, mono_raw and boot clock trace: Add an option for boot clock as trace clock timekeeping: Add a fast and NMI safe boot clock timekeeping/clocksource_cyc2ns: Document intended range limitation timekeeping: Ignore the bogus sleep time if pm_trace is enabled selftests/timers: Fix spelling mistake "Asyncrhonous" -> "Asynchronous" clocksource/drivers/bcm2835_timer: Unmap region obtained by of_iomap clocksource/drivers/arm_arch_timer: Map frame with of_io_request_and_map() arm64: dts: rockchip: Arch counter doesn't tick in system suspend clocksource/drivers/arm_arch_timer: Don't assume clock runs in suspend posix-timers: Make them configurable posix_cpu_timers: Move the add_device_randomness() call to a proper place timer: Move sys_alarm from timer.c to itimer.c ptp_clock: Allow for it to be optional Kconfig: Regenerate *.c_shipped files after previous changes ...
2016-12-12Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds4-124/+86
Pull smp hotplug updates from Thomas Gleixner: "This is the final round of converting the notifier mess to the state machine. The removal of the notifiers and the related infrastructure will happen around rc1, as there are conversions outstanding in other trees. The whole exercise removed about 2000 lines of code in total and in course of the conversion several dozen bugs got fixed. The new mechanism allows to test almost every hotplug step standalone, so usage sites can exercise all transitions extensively. There is more room for improvement, like integrating all the pointlessly different architecture mechanisms of synchronizing, setting cpus online etc into the core code" * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits) tracing/rb: Init the CPU mask on allocation soc/fsl/qbman: Convert to hotplug state machine soc/fsl/qbman: Convert to hotplug state machine zram: Convert to hotplug state machine KVM/PPC/Book3S HV: Convert to hotplug state machine arm64/cpuinfo: Convert to hotplug state machine arm64/cpuinfo: Make hotplug notifier symmetric mm/compaction: Convert to hotplug state machine iommu/vt-d: Convert to hotplug state machine mm/zswap: Convert pool to hotplug state machine mm/zswap: Convert dst-mem to hotplug state machine mm/zsmalloc: Convert to hotplug state machine mm/vmstat: Convert to hotplug state machine mm/vmstat: Avoid on each online CPU loops mm/vmstat: Drop get_online_cpus() from init_cpu_node_state/vmstat_cpu_dead() tracing/rb: Convert to hotplug state machine oprofile/nmi timer: Convert to hotplug state machine net/iucv: Use explicit clean up labels in iucv_init() x86/pci/amd-bus: Convert to hotplug state machine x86/oprofile/nmi: Convert to hotplug state machine ...
2016-12-12printk/kdb: handle more message headersPetr Mladek1-1/+1
Commit 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing continuation lines") allows to define more message headers for a single message. The motivation is that continuous lines might get mixed. Therefore it make sense to define the right log level for every piece of a cont line. This patch introduces printk_skip_headers() that will skip all headers and uses it in the kdb code instead of printk_skip_level(). This approach helps to fix other printk_skip_level() users independently. Link: http://lkml.kernel.org/r/1478695291-12169-3-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: Joe Perches <joe@perches.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Cc: David Sterba <dsterba@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12printk/NMI: handle continuous lines and missing newlinePetr Mladek1-28/+50
Commit 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing continuation lines") added back KERN_CONT message header. As a result it might appear in the middle of the line when the parts are squashed via the temporary NMI buffer. A reasonable solution seems to be to split the text in the NNI temporary not only by newlines but also by the message headers. Another solution would be to filter out KERN_CONT when writing to the temporary buffer. But this would complicate the lockless handling. Also it would not solve problems with a missing newline that was there even before the KERN_CONT stuff. This patch moves the temporary buffer handling into separate function. I played with it and it seems that using the char pointers make the code easier to read. Also it prints the final newline as a continuous line. Finally, it moves handling of the s->len overflow into the paranoid check. And allows to recover from the disaster. Link: http://lkml.kernel.org/r/1478695291-12169-2-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Joe Perches <joe@perches.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Cc: David Sterba <dsterba@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12printk/NMI: fix up handling of the full nmi log bufferPetr Mladek1-2/+3
vsnprintf() adds the trailing '\0' but it does not count it into the number of printed characters. The result is that there is one byte less space for the real characters in the buffer. The broken check for the free space might cause that we will repeatedly try to print 1 character into the buffer, never reach the full buffer, and do not count the messages as missed. Also vsnprintf() returns the number of characters that would be printed if the buffer was big enough. As a result, s->len might be bigger than the size of the buffer[*]. And the printk() function might return bigger len than it really printed. Both problems are fixed by using vscnprintf() instead. Note that I though about increasing the number of missed messages even when the message was shrunken. But it made the code even more complicated. I think that it is not worth it. Shrunken messages are usually easy to recognize. And it should be a corner case. [*] The overflown s->len value is crazy and unexpected. I "made a mistake" and reported this situation as an internal error when fixed handling of PR_CONT headers in some other patch. Link: http://lkml.kernel.org/r/20161208174912.GA17042@linux.suse Signed-off-by: Petr Mladek <pmladek@suse.com> CcL Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Chris Mason <clm@fb.com> Cc: David Sterba <dsterba@suse.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Josef Bacik <jbacik@fb.com> Cc: Joe Perches <joe@perches.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Takashi Iwai <tiwai@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12hung_task: decrement sysctl_hung_task_warnings only if it is positiveTetsuo Handa1-1/+2
Since sysctl_hung_task_warnings == -1 is allowed (infinite warnings), commit 48a6d64edadb ("hung_task: allow hung_task_panic when hung_task_warnings is 0") should decrement it only when it is not -1. This prevents the kernel from ceasing warnings after the first 4294967295 ;) Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: John Siddle <jsiddle@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12kernel/fork: use vfree_atomic() to free thread stackAndrey Ryabinin1-1/+1
vfree() is going to use sleeping lock. Thread stack freed in atomic context, therefore we must use vfree_atomic() here. Link: http://lkml.kernel.org/r/1479474236-4139-6-git-send-email-hch@lst.de Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Joel Fernandes <joelaf@google.com> Cc: Jisheng Zhang <jszhang@marvell.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: John Dias <joaodias@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12prctl: remove one-shot limitation for changing exe linkStanislav Kinsburskiy1-10/+0
This limitation came with the reason to remove "another way for malicious code to obscure a compromised program and masquerade as a benign process" by allowing "security-concious program can use this prctl once during its early initialization to ensure the prctl cannot later be abused for this purpose": http://marc.info/?l=linux-kernel&m=133160684517468&w=2 This explanation doesn't look sufficient. The only thing "exe" link is indicating is the file, used to execve, which is basically nothing and not reliable immediately after process has returned from execve system call. Moreover, to use this feture, all the mappings to previous exe file have to be unmapped and all the new exe file permissions must be satisfied. Which means, that changing exe link is very similar to calling execve on the binary. The need to remove this limitations comes from migration of NFS mount point, which is not accessible during restore and replaced by other file system. Because of this exe link has to be changed twice. [akpm@linux-foundation.org: fix up comment] Link: http://lkml.kernel.org/r/20160927153755.9337.69650.stgit@localhost.localdomain Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: John Stultz <john.stultz@linaro.org> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Pavel Emelyanov <xemul@virtuozzo.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12kthread: add __printf attributesNicolas Iooss1-2/+3
When commit fbae2d44aa1d ("kthread: add kthread_create_worker*()") introduced some kthread_create_...() functions which were taking printf-like parametter, it introduced __printf attributes to some functions (e.g. kthread_create_worker()). Nevertheless some new functions were forgotten (they have been detected thanks to -Wmissing-format-attribute warning flag). Add the missing __printf attributes to the newly-introduced functions in order to detect formatting issues at build-time with -Wformat flag. Link: http://lkml.kernel.org/r/20161126193543.22672-1-nicolas.iooss_linux@m4x.org Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12kprobes/trace: Fix kprobe selftest for newer gccMarcin Nowakowski1-5/+23
Commit 265a5b7ee3eb ("kprobes/trace: Fix kprobe selftest for gcc 4.6") has added __used attribute to kprobe_trace_selftest_target to ensure that the method is listed in kallsyms table. However, even though the method remains in the kernel image, the actual call is optimized away as there are no side effects and the return value is never checked. Add a return value check and a 'noinline' attribute to ensure that an inlined copy of the method is not used by the caller. Also add checks that verify that the kprobe was really hit, as at the moment the tests show positive results despite the test method being optimized away. Finally, add __init annotations to find_trace_probe_file() and kprobe_trace_selftest_target() as they are only called from within an __init method. Link: http://lkml.kernel.org/r/1481293178-3128-2-git-send-email-marcin.nowakowski@imgtec.com Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-12tracing/kprobes: Add a helper method to return number of probe hitsMarcin Nowakowski1-6/+13
The number of probe hits is stored in a percpu variable and therefore can't be read directly. Add a helper method trace_kprobe_nhit() that performs the required calculation. It will be used in a follow-up commit that changes kprobe selftests to verify the number of probe hits. Link: http://lkml.kernel.org/r/1481293178-3128-1-git-send-email-marcin.nowakowski@imgtec.com Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-12tracing/rb: Init the CPU mask on allocationSebastian Andrzej Siewior1-1/+1
Before commit b32614c03413 ("tracing/rb: Convert to hotplug state machine") the allocated cpumask was initialized to the mask of ONLINE or POSSIBLE CPUs. After the CPU hotplug changes the buffer initialisation moved to trace_rb_cpu_prepare() but I forgot to initially set the cpumask to zero. This is done now. Link: http://lkml.kernel.org/r/20161207133133.hzkcqfllxcdi3joz@linutronix.de Fixes: b32614c03413 ("tracing/rb: Convert to hotplug state machine") Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-12Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-7/+0
Pull x86 asm updates from Ingo Molnar: "The main changes in this development cycle were: - a large number of call stack dumping/printing improvements: higher robustness, better cross-context dumping, improved output, etc. (Josh Poimboeuf) - vDSO getcpu() performance improvement for future Intel CPUs with the RDPID instruction (Andy Lutomirski) - add two new Intel AVX512 features and the CPUID support infrastructure for it: AVX512IFMA and AVX512VBMI. (Gayatri Kammela, He Chen) - more copy-user unification (Borislav Petkov) - entry code assembly macro simplifications (Alexander Kuleshov) - vDSO C/R support improvements (Dmitry Safonov) - misc fixes and cleanups (Borislav Petkov, Paul Bolle)" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) scripts/decode_stacktrace.sh: Fix address line detection on x86 x86/boot/64: Use defines for page size x86/dumpstack: Make stack name tags more comprehensible selftests/x86: Add test_vdso to test getcpu() x86/vdso: Use RDPID in preference to LSL when available x86/dumpstack: Handle NULL stack pointer in show_trace_log_lvl() x86/cpufeatures: Enable new AVX512 cpu features x86/cpuid: Provide get_scattered_cpuid_leaf() x86/cpuid: Cleanup cpuid_regs definitions x86/copy_user: Unify the code by removing the 64-bit asm _copy_*_user() variants x86/unwind: Ensure stack grows down x86/vdso: Set vDSO pointer only after success x86/prctl/uapi: Remove #ifdef for CHECKPOINT_RESTORE x86/unwind: Detect bad stack return address x86/dumpstack: Warn on stack recursion x86/unwind: Warn on bad frame pointer x86/decoder: Use stderr if insn sanity test fails x86/decoder: Use stdout if insn decoder test is successful mm/page_alloc: Remove kernel address exposure in free_reserved_area() x86/dumpstack: Remove raw stack dump ...
2016-12-12Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+1
Pull hotplug API fix from Ingo Molnar: "Late breaking fix from the v4.9 cycle: fix a hotplug register/ unregister notifier API asymmetry bug that can cause kernel warnings (and worse) with certain Kconfig combinations" * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hotplug: Make register and unregister notifier API symmetric
2016-12-12Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds10-330/+669
Pull scheduler updates from Ingo Molnar: "The main scheduler changes in this cycle were: - support Intel Turbo Boost Max Technology 3.0 (TBM3) by introducig a notion of 'better cores', which the scheduler will prefer to schedule single threaded workloads on. (Tim Chen, Srinivas Pandruvada) - enhance the handling of asymmetric capacity CPUs further (Morten Rasmussen) - improve/fix load handling when moving tasks between task groups (Vincent Guittot) - simplify and clean up the cputime code (Stanislaw Gruszka) - improve mass fork()ed task spread a.k.a. hackbench speedup (Vincent Guittot) - make struct kthread kmalloc()ed and related fixes (Oleg Nesterov) - add uaccess atomicity debugging (when using access_ok() in the wrong context), under CONFIG_DEBUG_ATOMIC_SLEEP=y (Peter Zijlstra) - implement various fixes, cleanups and other enhancements (Daniel Bristot de Oliveira, Martin Schwidefsky, Rafael J. Wysocki)" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits) sched/core: Use load_avg for selecting idlest group sched/core: Fix find_idlest_group() for fork kthread: Don't abuse kthread_create_on_cpu() in __kthread_create_worker() kthread: Don't use to_live_kthread() in kthread_[un]park() kthread: Don't use to_live_kthread() in kthread_stop() Revert "kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function" kthread: Make struct kthread kmalloc'ed x86/uaccess, sched/preempt: Verify access_ok() context sched/x86: Make CONFIG_SCHED_MC_PRIO=y easier to enable sched/x86: Change CONFIG_SCHED_ITMT to CONFIG_SCHED_MC_PRIO x86/sched: Use #include <linux/mutex.h> instead of #include <asm/mutex.h> cpufreq/intel_pstate: Use CPPC to get max performance acpi/bus: Set _OSC for diverse core support acpi/bus: Enable HWP CPPC objects x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU x86/sysctl: Add sysctl for ITMT scheduling feature x86: Enable Intel Turbo Boost Max Technology 3.0 x86/topology: Define x86's arch_update_cpu_topology sched: Extend scheduler's asym packing sched/fair: Clean up the tunable parameter definitions ...
2016-12-12Merge branches 'pm-sleep' and 'powercap'Rafael J. Wysocki3-31/+132
* pm-sleep: PM / sleep: Print active wakeup sources when blocking on wakeup_count reads x86/suspend: fix false positive KASAN warning on suspend/resume PM / sleep / ACPI: Use the ACPI_FADT_LOW_POWER_S0 flag PM / sleep: System sleep state selection interface rework PM / hibernate: Verify the consistent of e820 memory map by md5 digest * powercap: powercap / RAPL: Add Knights Mill CPUID powercap/intel_rapl: fix and tidy up error handling powercap/intel_rapl: Track active CPUs internally powercap/intel_rapl: Cleanup duplicated init code powercap/intel rapl: Convert to hotplug state machine powercap/intel_rapl: Propagate error code when registration fails powercap/intel_rapl: Add missing domain data update on hotplug
2016-12-12Merge branch 'pm-cpuidle'Rafael J. Wysocki3-67/+111
* pm-cpuidle: cpuidle: Add a kerneldoc comment to cpuidle_use_deepest_state() cpuidle: fix improper return value on error intel_idle: Convert to hotplug state machine intel_idle: Remove superfluous SMP fuction call MAINTAINERS: Add Jacob Pan as a new intel_idle maintainer MAINTAINERS: Add bug tracking system location entries for cpuidle x86/intel_idle: Add Knights Mill CPUID x86/intel_idle: Add CPU model 0x4a (Atom Z34xx series) thermal/intel_powerclamp: stop sched tick in forced idle thermal/intel_powerclamp: Convert to CPU hotplug state thermal/intel_powerclamp: Convert the kthread to kthread worker API thermal/intel_powerclamp: Remove duplicated code that starts the kthread sched/idle: Add support for tasks that inject idle cpuidle: Allow enforcing deepest idle state selection cpuidle/powernv: staticise powernv_idle_driver cpuidle: dt: assign ->enter_freeze to same as ->enter callback function cpuidle: governors: Remove remaining old module code
2016-12-12Merge branch 'pm-cpufreq'Rafael J. Wysocki1-20/+99
* pm-cpufreq: (51 commits) Documentation: intel_pstate: Document HWP energy/performance hints cpufreq: intel_pstate: Support for energy performance hints with HWP cpufreq: intel_pstate: Add locking around HWP requests cpufreq: ondemand: Set MIN_FREQUENCY_UP_THRESHOLD to 1 cpufreq: intel_pstate: Add Knights Mill CPUID MAINTAINERS: Add bug tracking system location entry for cpufreq cpufreq: dt: Add support for zx296718 cpufreq: acpi-cpufreq: drop rdmsr_on_cpus() usage cpufreq: acpi-cpufreq: Convert to hotplug state machine cpufreq: intel_pstate: fix intel_pstate_exit_perf_limits() prototype cpufreq: intel_pstate: Set EPP/EPB to 0 in performance mode cpufreq: schedutil: Rectify comment in sugov_irq_work() function cpufreq: intel_pstate: increase precision of performance limits cpufreq: intel_pstate: round up min_perf limits cpufreq: Make cpufreq_update_policy() void ACPI / processor: Make acpi_processor_ppc_has_changed() void cpufreq: Avoid using inactive policies cpufreq: intel_pstate: Generic governors support cpufreq: intel_pstate: Request P-states control from SMM if needed cpufreq: dt: Add support for r8a7743 and r8a7745 ...
2016-12-12tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate resultsPavankumar Kondeti1-1/+1
The 's' flag is supposed to indicate that a softirq is running. This can be detected by testing the preempt_count with SOFTIRQ_OFFSET. The current code tests the preempt_count with SOFTIRQ_MASK, which would be true even when softirqs are disabled but not serving a softirq. Link: http://lkml.kernel.org/r/1481300417-3564-1-git-send-email-pkondeti@codeaurora.org Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-12Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds15-335/+400
Pull locking updates from Ingo Molnar: "The tree got pretty big in this development cycle, but the net effect is pretty good: 115 files changed, 673 insertions(+), 1522 deletions(-) The main changes were: - Rework and generalize the mutex code to remove per arch mutex primitives. (Peter Zijlstra) - Add vCPU preemption support: add an interface to query the preemption status of vCPUs and use it in locking primitives - this optimizes paravirt performance. (Pan Xinhui, Juergen Gross, Christian Borntraeger) - Introduce cpu_relax_yield() and remov cpu_relax_lowlatency() to clean up and improve the s390 lock yielding machinery and its core kernel impact. (Christian Borntraeger) - Micro-optimize mutexes some more. (Waiman Long) - Reluctantly add the to-be-deprecated mutex_trylock_recursive() interface on a temporary basis, to give the DRM code more time to get rid of its locking hacks. Any other users will be NAK-ed on sight. (We turned off the deprecation warning for the time being to not pollute the build log.) (Peter Zijlstra) - Improve the rtmutex code a bit, in light of recent long lived bugs/races. (Thomas Gleixner) - Misc fixes, cleanups" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) x86/paravirt: Fix bool return type for PVOP_CALL() x86/paravirt: Fix native_patch() locking/ww_mutex: Use relaxed atomics locking/rtmutex: Explain locking rules for rt_mutex_proxy_unlock()/init_proxy_locked() locking/rtmutex: Get rid of RT_MUTEX_OWNER_MASKALL x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted() locking/mutex: Break out of expensive busy-loop on {mutex,rwsem}_spin_on_owner() when owner vCPU is preempted locking/osq: Break out of spin-wait busy waiting loop for a preempted vCPU in osq_lock() Documentation/virtual/kvm: Support the vCPU preemption check x86/xen: Support the vCPU preemption check x86/kvm: Support the vCPU preemption check x86/kvm: Support the vCPU preemption check kvm: Introduce kvm_write_guest_offset_cached() locking/core, x86/paravirt: Implement vcpu_is_preempted(cpu) for KVM and Xen guests locking/spinlocks, s390: Implement vcpu_is_preempted(cpu) locking/core, powerpc: Implement vcpu_is_preempted(cpu) sched/core: Introduce the vcpu_is_preempted(cpu) interface sched/wake_q: Rename WAKE_Q to DEFINE_WAKE_Q locking/core: Provide common cpu_relax_yield() definition locking/mutex: Don't mark mutex_trylock_recursive() as deprecated, temporarily ...
2016-12-12Merge branch 'core-smp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-6/+12
Pull SMP bootup updates from Ingo Molnar: "Three changes to unify/standardize some of the bootup message printing in kernel/smp.c between architectures" * 'core-smp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kernel/smp: Tell the user we're bringing up secondary CPUs kernel/smp: Make the SMP boot message common on all arches kernel/smp: Define pr_fmt() for smp.c
2016-12-12Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds4-13/+28
Pull RCU updates from Ingo Molnar: "The main RCU changes in this development cycle were: - Miscellaneous fixes, including a change to call_rcu()'s rcu_head alignment check. - Security-motivated list consistency checks, which are disabled by default behind DEBUG_LIST. - Torture-test updates. - Documentation updates, yet again just simple changes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: torture: Prevent jitter from delaying build-only runs torture: Remove obsolete files from rcutorture .gitignore rcu: Don't kick unless grace period or request rcu: Make expedited grace periods recheck dyntick idle state torture: Trace long read-side delays rcu: RCU_TRACE enables event tracing as well as debugfs rcu: Remove obsolete comment from __call_rcu() rcu: Remove obsolete rcu_check_callbacks() header comment rcu: Tighten up __call_rcu() rcu_head alignment check Documentation/RCU: Fix minor typo documentation: Present updated RCU guarantee bug: Avoid Kconfig warning for BUG_ON_DATA_CORRUPTION lib/Kconfig.debug: Fix typo in select statement lkdtm: Add tests for struct list corruption bug: Provide toggle for BUG on data corruption list: Split list_del() debug checking into separate function rculist: Consolidate DEBUG_LIST for list_add_rcu() list: Split list_add() debug checking into separate function
2016-12-11sched/core: Use load_avg for selecting idlest groupVincent Guittot1-11/+44
find_idlest_group() only compares the runnable_load_avg when looking for the least loaded group. But on fork intensive use case like hackbench where tasks blocked quickly after the fork, this can lead to selecting the same CPU instead of other CPUs, which have similar runnable load but a lower load_avg. When the runnable_load_avg of 2 CPUs are close, we now take into account the amount of blocked load as a 2nd selection factor. There is now 3 zones for the runnable_load of the rq: - [0 .. (runnable_load - imbalance)]: Select the new rq which has significantly less runnable_load - [(runnable_load - imbalance) .. (runnable_load + imbalance)]: The runnable loads are close so we use load_avg to chose between the 2 rq - [(runnable_load + imbalance) .. ULONG_MAX]: Keep the current rq which has significantly less runnable_load The scale factor that is currently used for comparing runnable_load, doesn't work well with small value. As an example, the use of a scaling factor fails as soon as this_runnable_load == 0 because we always select local rq even if min_runnable_load is only 1, which doesn't really make sense because they are just the same. So instead of scaling factor, we use an absolute margin for runnable_load to detect CPUs with similar runnable_load and we keep using scaling factor for blocked load. For use case like hackbench, this enable the scheduler to select different CPUs during the fork sequence and to spread tasks across the system. Tests have been done on a Hikey board (ARM based octo cores) for several kernel. The result below gives min, max, avg and stdev values of 18 runs with each configuration. The patches depend on the "no missing update_rq_clock()" work. hackbench -P -g 1 ea86cb4b7621 7dc603c9028e v4.8 v4.8+patches min 0.049 0.050 0.051 0,048 avg 0.057 0.057(0%) 0.057(0%) 0,055(+5%) max 0.066 0.068 0.070 0,063 stdev +/-9% +/-9% +/-8% +/-9% More performance numbers here: https://lkml.kernel.org/r/20161203214707.GI20785@codeblueprint.co.uk Tested-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: kernellwp@gmail.com Cc: umgwanakikbuti@gmail.com Cc: yuyang.du@intel.comc Link: http://lkml.kernel.org/r/1481216215-24651-3-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-11sched/core: Fix find_idlest_group() for forkVincent Guittot1-0/+8
During fork, the utilization of a task is init once the rq has been selected because the current utilization level of the rq is used to set the utilization of the fork task. As the task's utilization is still 0 at this step of the fork sequence, it doesn't make sense to look for some spare capacity that can fit the task's utilization. Furthermore, I can see perf regressions for the test: hackbench -P -g 1 because the least loaded policy is always bypassed and tasks are not spread during fork. With this patch and the fix below, we are back to same performances as for v4.8. The fix below is only a temporary one used for the test until a smarter solution is found because we can't simply remove the test which is useful for others benchmarks | @@ -5708,13 +5708,6 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t | | avg_cost = this_sd->avg_scan_cost; | | - /* | - * Due to large variance we need a large fuzz factor; hackbench in | - * particularly is sensitive here. | - */ | - if ((avg_idle / 512) < avg_cost) | - return -1; | - | time = local_clock(); | | for_each_cpu_wrap(cpu, sched_domain_span(sd), target, wrap) { Tested-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Acked-by: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: kernellwp@gmail.com Cc: umgwanakikbuti@gmail.com Cc: yuyang.du@intel.comc Link: http://lkml.kernel.org/r/1481216215-24651-2-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>