From 00cd29b799e3449f0c68b1cc77cd4a5f95b42d17 Mon Sep 17 00:00:00 2001
From: James Bottomley
Date: Wed, 13 Jan 2016 08:10:31 -0800
Subject: klist: fix starting point removed bug in klist iterators

The starting node for a klist iteration is often passed in from
somewhere way above the klist infrastructure, meaning there's no
guarantee the node is still on the list. We've seen this in SCSI where
we use bus_find_device() to iterate through a list of devices. In the
face of heavy hotplug activity, the last device returned by
bus_find_device() can be removed before the next call. This leads to

Dec 3 13:22:02 localhost kernel: WARNING: CPU: 2 PID: 28073 at include/linux/kref.h:47 klist_iter_init_node+0x3d/0x50()
Dec 3 13:22:02 localhost kernel: Modules linked in: scsi_debug x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel joydev iTCO_wdt dcdbas ipmi_devintf acpi_power_meter iTCO_vendor_support ipmi_si imsghandler pcspkr wmi acpi_cpufreq tpm_tis tpm shpchp lpc_ich mfd_core nfsd nfs_acl lockd grace sunrpc tg3 ptp pps_core
Dec 3 13:22:02 localhost kernel: CPU: 2 PID: 28073 Comm: cat Not tainted 4.4.0-rc1+ #2
Dec 3 13:22:02 localhost kernel: Hardware name: Dell Inc. PowerEdge R320/08VT7V, BIOS 2.0.22 11/19/2013
Dec 3 13:22:02 localhost kernel: ffffffff81a20e77 ffff880613acfd18 ffffffff81321eef 0000000000000000
Dec 3 13:22:02 localhost kernel: ffff880613acfd50 ffffffff8107ca52 ffff88061176b198 0000000000000000
Dec 3 13:22:02 localhost kernel: ffffffff814542b0 ffff880610cfb100 ffff88061176b198 ffff880613acfd60
Dec 3 13:22:02 localhost kernel: Call Trace:
Dec 3 13:22:02 localhost kernel: [] dump_stack+0x44/0x55
Dec 3 13:22:02 localhost kernel: [] warn_slowpath_common+0x82/0xc0
Dec 3 13:22:02 localhost kernel: [] ? proc_scsi_show+0x20/0x20
Dec 3 13:22:02 localhost kernel: [] warn_slowpath_null+0x1a/0x20
Dec 3 13:22:02 localhost kernel: [] klist_iter_init_node+0x3d/0x50
Dec 3 13:22:02 localhost kernel: [] bus_find_device+0x51/0xb0
Dec 3 13:22:02 localhost kernel: [] scsi_seq_next+0x2d/0x40
[...]

And an eventual crash. It can actually occur in any hotplug system
which has a device finder and a starting device.

We can fix this globally by making sure the starting node for
klist_iter_init_node() is actually a member of the list before using it
(and by starting from the beginning if it isn't).

Reported-by: Ewan D. Milne
Tested-by: Ewan D. Milne
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley
Signed-off-by: Greg Kroah-Hartman
---
 lib/klist.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

(limited to 'lib')

diff --git a/lib/klist.c b/lib/klist.c
index d74cf7a29afd..0507fa5d84c5 100644
--- a/lib/klist.c
+++ b/lib/klist.c
@@ -282,9 +282,9 @@ void klist_iter_init_node(struct klist *k, struct klist_iter *i,
 			  struct klist_node *n)
 {
 	i->i_klist = k;
-	i->i_cur = n;
-	if (n)
-		kref_get(&n->n_ref);
+	i->i_cur = NULL;
+	if (n && kref_get_unless_zero(&n->n_ref))
+		i->i_cur = n;
 }
 EXPORT_SYMBOL_GPL(klist_iter_init_node);
 
--
cgit v1.2.3-59-g8ed1b

From 4ba6a2b28f111e4c9621487612056d10f3f4a6ca Mon Sep 17 00:00:00 2001
From: Masahiro Yamada
Date: Mon, 8 Feb 2016 16:09:08 +0900
Subject: scatterlist: fix a typo in comment block of sg_miter_stop()

Fix the doubled "started" and tidy up the following sentences.

Signed-off-by: Masahiro Yamada
Signed-off-by: Linus Torvalds
---
 lib/scatterlist.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

(limited to 'lib')

diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index bafa9933fa76..004fc70fc56a 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -598,9 +598,9 @@ EXPORT_SYMBOL(sg_miter_next);
  *
  * Description:
  *   Stops mapping iterator @miter. @miter should have been started
- *   started using sg_miter_start(). A stopped iteration can be
- *   resumed by calling sg_miter_next() on it. This is useful when
- *   resources (kmap) need to be released during iteration.
+ *   using sg_miter_start(). A stopped iteration can be resumed by
+ *   calling sg_miter_next() on it. This is useful when resources (kmap)
+ *   need to be released during iteration.
  *
  * Context:
  *   Preemption disabled if the SG_MITER_ATOMIC is set. Don't care
--
cgit v1.2.3-59-g8ed1b

From f303fccb82928790ec58eea82722bd5c54d300b3 Mon Sep 17 00:00:00 2001
From: Tejun Heo
Date: Tue, 9 Feb 2016 17:59:38 -0500
Subject: workqueue: implement "workqueue.debug_force_rr_cpu" debug feature

Workqueue used to guarantee local execution for work items queued
without explicit target CPU. The guarantee is gone now which can
break some usages in subtle ways. To flush out those cases, this
patch implements a debug feature which forces round-robin CPU
selection for all such work items.

The debug feature defaults to off and can be enabled with a kernel
parameter. The default can be flipped with a debug config option.

If you hit this commit during bisection, please refer to 041bd12e272c
("Revert "workqueue: make sure delayed work run in local cpu"") for
more information and ping me.

Signed-off-by: Tejun Heo
---
 Documentation/kernel-parameters.txt | 11 +++++++++++
 kernel/workqueue.c                  | 23 +++++++++++++++++++++--
 lib/Kconfig.debug                   | 15 +++++++++++++++
 3 files changed, 47 insertions(+), 2 deletions(-)

(limited to 'lib')

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 87d40a72f6a1..cda2ead39093 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -4230,6 +4230,17 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			The default value of this parameter is determined by
 			the config option CONFIG_WQ_POWER_EFFICIENT_DEFAULT.
 
+	workqueue.debug_force_rr_cpu
+			Workqueue used to implicitly guarantee that work
+			items queued without explicit CPU specified are put
+			on the local CPU. This guarantee is no longer true
+			and while local CPU is still preferred work items
+			may be put on foreign CPUs. This debug option
+			forces round-robin CPU selection to flush out
+			usages which depend on the now broken guarantee.
+			When enabled, memory and cache locality will be
+			impacted.
+
 	x2apic_phys	[X86-64,APIC] Use x2apic physical mode instead of
 			default x2apic cluster mode on platforms
 			supporting x2apic.
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 054774605d2f..51d77e7c0989 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -307,6 +307,18 @@ static cpumask_var_t wq_unbound_cpumask;
 /* CPU where unbound work was last round robin scheduled from this CPU */
 static DEFINE_PER_CPU(int, wq_rr_cpu_last);
 
+/*
+ * Local execution of unbound work items is no longer guaranteed. The
+ * following always forces round-robin CPU selection on unbound work items
+ * to uncover usages which depend on it.
+ */
+#ifdef CONFIG_DEBUG_WQ_FORCE_RR_CPU
+static bool wq_debug_force_rr_cpu = true;
+#else
+static bool wq_debug_force_rr_cpu = false;
+#endif
+module_param_named(debug_force_rr_cpu, wq_debug_force_rr_cpu, bool, 0644);
+
 /* the per-cpu worker pools */
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct worker_pool [NR_STD_WORKER_POOLS],
 				     cpu_worker_pools);
@@ -1309,10 +1321,17 @@ static bool is_chained_work(struct workqueue_struct *wq)
  */
 static int wq_select_unbound_cpu(int cpu)
 {
+	static bool printed_dbg_warning;
 	int new_cpu;
 
-	if (cpumask_test_cpu(cpu, wq_unbound_cpumask))
-		return cpu;
+	if (likely(!wq_debug_force_rr_cpu)) {
+		if (cpumask_test_cpu(cpu, wq_unbound_cpumask))
+			return cpu;
+	} else if (!printed_dbg_warning) {
+		pr_warn("workqueue: round-robin CPU selection forced, expect performance impact\n");
+		printed_dbg_warning = true;
+	}
+
 	if (cpumask_empty(wq_unbound_cpumask))
 		return cpu;
 
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index ecb9e75614bf..8bfd1aca7a3d 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1400,6 +1400,21 @@ config RCU_EQS_DEBUG
 
 endmenu # "RCU Debugging"
 
+config DEBUG_WQ_FORCE_RR_CPU
+	bool "Force round-robin CPU selection for unbound work items"
+	depends on DEBUG_KERNEL
+	default n
+	help
+	  Workqueue used to implicitly guarantee that work items queued
+	  without explicit CPU specified are put on the local CPU. This
+	  guarantee is no longer true and while local CPU is still
+	  preferred work items may be put on foreign CPUs. Kernel
+	  parameter "workqueue.debug_force_rr_cpu" is added to force
+	  round-robin CPU selection to flush out usages which depend on the
+	  now broken guarantee. This config option enables the debug
+	  feature by default. When enabled, memory and cache locality will
+	  be impacted.
+
 config DEBUG_BLOCK_EXT_DEVT
 	bool "Force extended block device numbers and spread them"
 	depends on DEBUG_KERNEL
--
cgit v1.2.3-59-g8ed1b

From 7707535ab95e2231b6d7f2bfb4f27558e83c4dc2 Mon Sep 17 00:00:00 2001
From: Yang Shi
Date: Thu, 11 Feb 2016 16:12:55 -0800
Subject: ubsan: cosmetic fix to Kconfig text

When enabling UBSAN_SANITIZE_ALL, the kernel image size gets increased
significantly (~3x). So, it sounds better to have some note in Kconfig.

And, fixed a typo.

Signed-off-by: Yang Shi
Acked-by: Andrey Ryabinin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 lib/Kconfig.ubsan | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

(limited to 'lib')

diff --git a/lib/Kconfig.ubsan b/lib/Kconfig.ubsan
index 49518fb48cab..e07c1ba9ba13 100644
--- a/lib/Kconfig.ubsan
+++ b/lib/Kconfig.ubsan
@@ -18,6 +18,8 @@ config UBSAN_SANITIZE_ALL
 	  This option activates instrumentation for the entire kernel.
 	  If you don't enable this option, you have to explicitly specify
 	  UBSAN_SANITIZE := y for the files/directories you want to check for UB.
+	  Enabling this option will get kernel image size increased
+	  significantly.
 
 config UBSAN_ALIGNMENT
 	bool "Enable checking of pointers alignment"
@@ -25,5 +27,5 @@ config UBSAN_ALIGNMENT
 	default y if !HAVE_EFFICIENT_UNALIGNED_ACCESS
 	help
 	  This option enables detection of unaligned memory accesses.
-	  Enabling this option on architectures that support unalligned
+	  Enabling this option on architectures that support unaligned
 	  accesses may produce a lot of false positives.
--
cgit v1.2.3-59-g8ed1b

From 7eb391299419a03cbe0fa5ab0e6b0932e42c7a36 Mon Sep 17 00:00:00 2001
From: "Jason A. Donenfeld"
Date: Thu, 11 Feb 2016 16:13:00 -0800
Subject: vsprintf: kptr_restrict is okay in IRQ when 2

The kptr_restrict flag, when set to 1, only prints the kernel address
when the user has CAP_SYSLOG. When it is set to 2, the kernel address
is always printed as zero. When set to 1, this needs to check whether
or not we're in IRQ.
However, when set to 2, this check is unnecessary, and produces
confusing results in dmesg. Thus, only make sure we're not in IRQ when
mode 1 is used, but not mode 2.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jason A. Donenfeld
Cc: Rasmus Villemoes
Cc: Kees Cook
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 lib/vsprintf.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

(limited to 'lib')

diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 48ff9c36644d..f44e178e6ede 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -1590,22 +1590,23 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
 			return buf;
 		}
 	case 'K':
-		/*
-		 * %pK cannot be used in IRQ context because its test
-		 * for CAP_SYSLOG would be meaningless.
-		 */
-		if (kptr_restrict && (in_irq() || in_serving_softirq() ||
-				      in_nmi())) {
-			if (spec.field_width == -1)
-				spec.field_width = default_width;
-			return string(buf, end, "pK-error", spec);
-		}
-
 		switch (kptr_restrict) {
 		case 0:
 			/* Always print %pK values */
 			break;
 		case 1: {
+			const struct cred *cred;
+
+			/*
+			 * kptr_restrict==1 cannot be used in IRQ context
+			 * because its test for CAP_SYSLOG would be meaningless.
+			 */
+			if (in_irq() || in_serving_softirq() || in_nmi()) {
+				if (spec.field_width == -1)
+					spec.field_width = default_width;
+				return string(buf, end, "pK-error", spec);
+			}
+
 			/*
 			 * Only print the real pointer value if the current
 			 * process has CAP_SYSLOG and is running with the
@@ -1615,8 +1616,7 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
 			 * leak pointer values if a binary opens a file using
 			 * %pK and then elevates privileges before reading it.
 			 */
-			const struct cred *cred = current_cred();
-
+			cred = current_cred();
 			if (!has_capability_noaudit(current, CAP_SYSLOG) ||
 			    !uid_eq(cred->euid, cred->uid) ||
 			    !gid_eq(cred->egid, cred->gid))
--
cgit v1.2.3-59-g8ed1b
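
Note on the vsprintf change above: the effect of moving the IRQ check
into the kptr_restrict==1 case can be modelled in plain userspace C.
The following is only a sketch under stated assumptions, not kernel
code. The names pk_result, pk_decide_old(), pk_decide() and
pk_check_examples() are hypothetical, "in_interrupt" stands in for
in_irq() || in_serving_softirq() || in_nmi(), and "privileged" stands
in for the CAP_SYSLOG test plus the euid/uid and egid/gid comparisons.
Treating values other than 0 and 1 like mode 2 is also an assumption
of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome labels for this sketch; not kernel identifiers. */
enum pk_result {
	PK_PRINT_REAL,	/* the real pointer value is printed */
	PK_PRINT_ZERO,	/* the pointer is printed as zero */
	PK_PRINT_ERROR,	/* the "pK-error" marker is printed */
};

/* Model of the old ordering: the IRQ check ran for every non-zero mode. */
static enum pk_result pk_decide_old(int kptr_restrict, bool in_interrupt,
				    bool privileged)
{
	if (kptr_restrict && in_interrupt)
		return PK_PRINT_ERROR;

	switch (kptr_restrict) {
	case 0:
		return PK_PRINT_REAL;
	case 1:
		return privileged ? PK_PRINT_REAL : PK_PRINT_ZERO;
	default:	/* assumption: anything else behaves like mode 2 */
		return PK_PRINT_ZERO;
	}
}

/* Model of the new ordering: the IRQ check only applies to mode 1. */
static enum pk_result pk_decide(int kptr_restrict, bool in_interrupt,
				bool privileged)
{
	switch (kptr_restrict) {
	case 0:
		return PK_PRINT_REAL;
	case 1:
		/* The capability test is meaningless in interrupt context. */
		if (in_interrupt)
			return PK_PRINT_ERROR;
		return privileged ? PK_PRINT_REAL : PK_PRINT_ZERO;
	default:	/* assumption: anything else behaves like mode 2 */
		return PK_PRINT_ZERO;
	}
}

/* Usage example: the only input where the two models disagree. */
void pk_check_examples(void)
{
	/* Mode 2 in interrupt context: error marker before, zero after. */
	assert(pk_decide_old(2, true, false) == PK_PRINT_ERROR);
	assert(pk_decide(2, true, false) == PK_PRINT_ZERO);

	/* Mode 1 in interrupt context is unchanged. */
	assert(pk_decide_old(1, true, true) == PK_PRINT_ERROR);
	assert(pk_decide(1, true, true) == PK_PRINT_ERROR);
}
```

In this model the two functions differ only for mode 2 (and the
assumed default) in interrupt context, which matches the intent stated
in the commit message.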