linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2015-02-20	blk-throttle: check stats_cpu before reading it from sysfs	Thadeu Lima de Souza Cascardo	1	-0/+3
	When reading blkio.throttle.io_serviced in a recently created blkio cgroup, it's possible to race against the creation of a throttle policy, which delays the allocation of stats_cpu. Like other functions in the throttle code, just checking for a NULL stats_cpu prevents the following oops caused by that race. [ 1117.285199] Unable to handle kernel paging request for data at address 0x7fb4d0020 [ 1117.285252] Faulting instruction address: 0xc0000000003efa2c [ 1137.733921] Oops: Kernel access of bad area, sig: 11 [#1] [ 1137.733945] SMP NR_CPUS=2048 NUMA PowerNV [ 1137.734025] Modules linked in: bridge stp llc kvm_hv kvm binfmt_misc autofs4 [ 1137.734102] CPU: 3 PID: 5302 Comm: blkcgroup Not tainted 3.19.0 #5 [ 1137.734132] task: c000000f1d188b00 ti: c000000f1d210000 task.ti: c000000f1d210000 [ 1137.734167] NIP: c0000000003efa2c LR: c0000000003ef9f0 CTR: c0000000003ef980 [ 1137.734202] REGS: c000000f1d213500 TRAP: 0300 Not tainted (3.19.0) [ 1137.734230] MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 42008884 XER: 20000000 [ 1137.734325] CFAR: 0000000000008458 DAR: 00000007fb4d0020 DSISR: 40000000 SOFTE: 0 GPR00: c0000000003ed3a0 c000000f1d213780 c000000000c59538 0000000000000000 GPR04: 0000000000000800 0000000000000000 0000000000000000 0000000000000000 GPR08: ffffffffffffffff 00000007fb4d0020 00000007fb4d0000 c000000000780808 GPR12: 0000000022000888 c00000000fdc0d80 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 000001003e120200 c000000f1d5b0cc0 0000000000000200 0000000000000000 GPR24: 0000000000000001 c000000000c269e0 0000000000000020 c000000f1d5b0c80 GPR28: c000000000ca3a08 c000000000ca3dec c000000f1c667e00 c000000f1d213850 [ 1137.734886] NIP [c0000000003efa2c] .tg_prfill_cpu_rwstat+0xac/0x180 [ 1137.734915] LR [c0000000003ef9f0] .tg_prfill_cpu_rwstat+0x70/0x180 [ 1137.734943] Call Trace: [ 1137.734952] [c000000f1d213780] [d000000005560520] 0xd000000005560520 (unreliable) [ 1137.734996] [c000000f1d2138a0] [c0000000003ed3a0] .blkcg_print_blkgs+0xe0/0x1a0 [ 1137.735039] [c000000f1d213960] [c0000000003efb50] .tg_print_cpu_rwstat+0x50/0x70 [ 1137.735082] [c000000f1d2139e0] [c000000000104b48] .cgroup_seqfile_show+0x58/0x150 [ 1137.735125] [c000000f1d213a70] [c0000000002749dc] .kernfs_seq_show+0x3c/0x50 [ 1137.735161] [c000000f1d213ae0] [c000000000218630] .seq_read+0xe0/0x510 [ 1137.735197] [c000000f1d213bd0] [c000000000275b04] .kernfs_fop_read+0x164/0x200 [ 1137.735240] [c000000f1d213c80] [c0000000001eb8e0] .__vfs_read+0x30/0x80 [ 1137.735276] [c000000f1d213cf0] [c0000000001eb9c4] .vfs_read+0x94/0x1b0 [ 1137.735312] [c000000f1d213d90] [c0000000001ebb38] .SyS_read+0x58/0x100 [ 1137.735349] [c000000f1d213e30] [c000000000009218] syscall_exit+0x0/0x98 [ 1137.735383] Instruction dump: [ 1137.735405] 7c6307b4 7f891800 409d00b8 60000000 60420000 3d420004 392a63b0 786a1f24 [ 1137.735471] 7d49502a e93e01c8 7d495214 7d2ad214 <7cead02a> e9090008 e9490010 e9290018 And here is one code that allows to easily reproduce this, although this has first been found by running docker. void run(pid_t pid) { int n; int status; int fd; char buffer; buffer = memalign(BUFFER_ALIGN, BUFFER_SIZE); n = snprintf(buffer, BUFFER_SIZE, "%d\n", pid); fd = open(CGPATH "/test/tasks", O_WRONLY); write(fd, buffer, n); close(fd); if (fork() > 0) { fd = open("/dev/sda", O_RDONLY \| O_DIRECT); read(fd, buffer, 512); close(fd); wait(&status); } else { fd = open(CGPATH "/test/blkio.throttle.io_serviced", O_RDONLY); n = read(fd, buffer, BUFFER_SIZE); close(fd); } free(buffer); exit(0); } void test(void) { int status; mkdir(CGPATH "/test", 0666); if (fork() > 0) wait(&status); else run(getpid()); rmdir(CGPATH "/test"); } int main(int argc, char *argv) { int i; for (i = 0; i < NR_TESTS; i++) test(); return 0; } Reported-by: Ricardo Marin Matinata <rmm@br.ibm.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
2015-02-20	IB/qib: Add blank line after declaration	Mike Marciniszyn	17	-6/+67
	Upstream checkpatch now requires this. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-02-20	IB/qib: Fix checkpatch warnings	Mike Marciniszyn	15	-44/+42
	Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-02-20	i2c: ocores: rework clk code to handle NULL cookie	Wolfram Sang	1	-14/+21
	For, !HAVE_CLK the clk API returns a NULL cookie. Rework the initialization code to handle that. If clk_get_rate() delivers 0, we use the fallback mechanisms. The patch is pretty easy when ignoring white space issues (git diff -b). Suggested-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Tested-by: Max Filippov <jcmvbkbc@gmail.com>
2015-02-20	thermal: exynos: fix: Check if data->tmu_read callback is present before read	Lukasz Majewski	1	-1/+1
	The exynos_tmu_data() function should on entrance test not only for valid data pointer, but also for data->tmu_read one. It is important, since afterwards it is dereferenced to get temperature code. Signed-off-by: Lukasz Majewski <l.majewski@samsung.com> Tested-by: Abhilash Kesavan <a.kesavan@samsung.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2015-02-20	x86/mm/ASLR: Avoid PAGE_SIZE redefinition for UML subarch	Jiri Kosina	2	-2/+0
	Commit f47233c2d34 ("x86/mm/ASLR: Propagate base load address calculation") causes PAGE_SIZE redefinition warnings for UML subarch builds. This is caused by added includes that were leftovers from previous patch versions are are not actually needed (especially page_types.h inlcude in module.c). Drop those stray includes. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Kees Cook <keescook@chromium.org> Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1502201017240.28769@pobox.suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-19	clk: Only recalculate the rate if needed	Tomeu Vizoso	1	-1/+4
	We don't really need to recalculate the effective rate of a clock when a per-user clock is removed, if the constraints of the later aren't limiting the requested rate. This was causing problems with clocks that never had a rate set before, as rate_req would be zero. Though this could be considered a bug in the implementation of those clocks, this should be checked somewhere else. Fixes: 1c8e600440c7 ("clk: Add rate constraints to clocks") Cc: Thierry Reding <thierry.reding@gmail.com> Cc: Peter De Schrijver <pdeschrijver@nvidia.com> Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Signed-off-by: Michael Turquette <mturquette@linaro.org>
2015-02-19	ipmi: Fix a memory ordering issue	Corey Minyard	1	-4/+10
	From a locking point of view it is safe to check waiting_msg without a lock, but there is a memory ordering issue that causes it to possibly not be set right when viewed from another processor. We are already claiming a lock right after that, move the check to inside the lock to enforce the memory ordering. Signed-off-by: Corey Minyard <cminyard@mvista.com>
2015-02-19	ipmi: Remove uses of return value of seq_printf	Joe Perches	3	-16/+26
	The seq_printf like functions will soon be changed to return void. Convert these uses to check seq_has_overflowed instead. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2015-02-19	ipmi: Use is_visible callback for conditional sysfs entries	Takashi Iwai	1	-43/+17
	Instead of manual calls of device_create_file() and device_remove_file(), implement the condition in is_visible callback for the attribute group and put these entries to the group, too. This simplifies the code and avoids the possible races. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2015-02-19	ipmi: Free ipmi_recv_msg messages from the linked list on close	Nicholas Krause	1	-1/+5
	This adds a loop through the elements in the linked list, recv_msgs using list_for_entry_safe in order to free messages in this list. In addition we are using the safe version of this marco in order to prevent use after bugs related to deleting the element we are on currently by holding a pointer to the next element after the current one we are on and freeing with the function, ipmi_free_recv_msg internally in this loop. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2015-02-19	ipmi: avoid gcc warning	Arnd Bergmann	1	-8/+21
	A new harmless warning has come up on ARM builds with gcc-4.9: drivers/char/ipmi/ipmi_msghandler.c: In function 'smi_send.isra.11': include/linux/spinlock.h:372:95: warning: 'flags' may be used uninitialized in this function [-Wmaybe-uninitialized] raw_spin_unlock_irqrestore(&lock->rlock, flags); ^ drivers/char/ipmi/ipmi_msghandler.c:1490:16: note: 'flags' was declared here unsigned long flags; ^ This could be worked around by initializing the 'flags' variable, but it seems better to rework the code to avoid this. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 7ea0ed2b5be81 ("ipmi: Make the message handler easier to use for SMI interfaces") Signed-off-by: Corey Minyard <cminyard@mvista.com>
2015-02-19	ipmi: Update timespec usage to timespec64	John Stultz	1	-12/+13
	As part of the internal y2038 cleanup, this patch removes timespec usage in the ipmi driver, replacing it timespec64 Cc: openipmi-developer@lists.sourceforge.net Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Corey Minyard <minyard@mvista.com>
2015-02-19	ipmi: Cleanup DEBUG_TIMING ifdef usage	John Stultz	1	-40/+21
	The driver uses #ifdef DEBUG_TIMING in order to conditionally print out timestamped debug messages. Unfortunately it adds the ifdefs all over the usage sites. This patch cleans it up by adding a debug_timestamp() function which is compiled out if DEBUG_TIMING isn't present. This cleans up all the ugly ifdefs in the function logic. Cc: openipmi-developer@lists.sourceforge.net Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Corey Minyard <minyard@mvista.com>
2015-02-19	drivers:char:ipmi: Remove unneeded FIXME comment in the file,ipmi_si_intf.c	Nicholas Krause	1	-1/+0
	Removes a no longer needed FIXME comment in the function,acpi_gpe_irq_setup for the file,ipmi_si_intf.c. This comment is no longer needed as clearly we are passing the correct level of ACPI_GPE_LEVEL_TRIGGERED to the installer function,acpi_install_gpe_handler due to no breakage after years of using this ACPI level in the function,acpi_install_gpe_handler. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2015-02-19	char: ipmi: Remove obsolete cleanup for clientdata	Wolfram Sang	1	-2/+0
	A few new i2c-drivers came into the kernel which clear the clientdata-pointer on exit or error. This is obsolete meanwhile, the core will do it. Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2015-02-19	ipmi: Remove a FIXME for slab conversion	Corey Minyard	1	-1/+0
	There can't be more than a few IPMI messages allocated at any one time, so converting the messages to slabs would be a waste. So just remove the FIXME. Suggested-by: Nicholas Krause <xerofoify@gmail.com> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2015-02-19	NVMe: Fix potential corruption on sync commands	Keith Busch	1	-29/+3
	This makes all sync commands uninterruptible and schedules without timeout so the controller either has to post a completion or the timeout recovery fails the command. This fixes potential memory or data corruption from a command timing out too early or woken by a signal. Previously any DMA buffers mapped for that command would have been released even though we don't know what the controller is planning to do with those addresses. Signed-off-by: Keith Busch <keith.busch@intel.com>
2015-02-19	NVMe: Remove unused variables	Keith Busch	1	-8/+0
	We don't track queues in a llist, subscribe to hot-cpu notifications, or internally retry commands. Delete the unused artifacts. Signed-off-by: Keith Busch <keith.busch@intel.com>
2015-02-19	NVMe: Fix scsi mode select llbaa setting	Keith Busch	1	-1/+1
	It should be a logical bitwise AND, not conditional. Signed-off-by: Keith Busch <keith.busch@intel.com>
2015-02-19	NVMe: Fix potential corruption during shutdown	Keith Busch	2	-31/+19
	The driver has to end unreturned commands at some point even if the controller has not provided a completion. The driver tried to be safe by deleting IO queues prior to ending all unreturned commands. That should cause the controller to internally abort inflight commands, but IO queue deletion request does not have to be successful, so all bets are off. We still have to make progress, so to be extra safe, this patch doesn't clear a queue to release the dma mapping for a command until after the pci device has been disabled. This patch removes the special handling during device initialization so controller recovery can be done all the time. This is possible since initialization is not inlined with pci probe anymore. Reported-by: Nilish Choudhury <nilesh.choudhury@oracle.com> Signed-off-by: Keith Busch <keith.busch@intel.com>
2015-02-19	NVMe: Asynchronous controller probe	Keith Busch	2	-17/+32
	This performs the longest parts of nvme device probe in scheduled work. This speeds up probe significantly when multiple devices are in use. Signed-off-by: Keith Busch <keith.busch@intel.com>
2015-02-19	NVMe: Register management handle under nvme class	Keith Busch	2	-25/+57
	This creates a new class type for nvme devices to register their management character devices with. This is so we do not rely on miscdev to provide enough minors for as many nvme devices some people plan to use. The previous limit was approximately 60 NVMe controllers, depending on the platform and kernel. Now the limit is 1M, which ought to be enough for anybody. Since we have a new device class, it makes sense to attach the block devices under this as well, so part of this patch moves the management handle initialization prior to the namespaces discovery. Signed-off-by: Keith Busch <keith.busch@intel.com>
2015-02-19	NVMe: Update SCSI Inquiry VPD 83h translation	Keith Busch	3	-44/+62
	The original translation created collisions on Inquiry VPD 83 for many existing devices. Newer specifications provide other ways to translate based on the device's version can be used to create unique identifiers. Version 1.1 provides an EUI64 field that uniquely identifies each namespace, and 1.2 added the longer NGUID field for the same reason. Both follow the IEEE EUI format and readily translate to the SCSI device identification EUI designator type 2h. For devices implementing either, the translation will use this type, defaulting to the EUI64 8-byte type if implemented then NGUID's 16 byte version if not. If neither are provided, the 1.0 translation is used, and is updated to use the SCSI String format to guarantee a unique identifier. Knowing when to use the new fields depends on the nvme controller's revision. The NVME_VS macro was not decoding this correctly, so that is fixed in this patch and moved to a more appropriate place. Since the Identify Namespace structure required an update for the NGUID field, this patch adds the remaining new 1.2 fields to the structure. Signed-off-by: Keith Busch <keith.busch@intel.com>
2015-02-19	NVMe: Metadata format support	Keith Busch	3	-67/+249
	Adds support for NVMe metadata formats and exposes block devices for all namespaces regardless of their format. Namespace formats that are unusable will have disk capacity set to 0, but a handle to the block device is created to simplify device management. A namespace is not usable when the format requires host interleave block and metadata in single buffer, has no provisioned storage, or has better data but failed to register with blk integrity. The namespace has to be scanned in two phases to support separate metadata formats. The first establishes the sector size and capacity prior to invoking add_disk. If metadata is required, the capacity will be temporarilly set to 0 until it can be revalidated and registered with the integrity extenstions after add_disk completes. The driver relies on the integrity extensions to provide the metadata buffer. NVMe requires this be a single physically contiguous region, so only one integrity segment is allowed per command. If the metadata is used for T10 PI, the driver provides mappings to save and restore the reftag physical block translation. The driver provides no-op functions for generate and verify if metadata is not used for protection information. This way the setup is always provided by the block layer. If a request does not supply a required metadata buffer, the command is failed with bad address. This could only happen if a user manually disables verify/generate on such a disk. The only exception to where this is okay is if the controller is capable of stripping/generating the metadata, which is possible on some types of formats. The metadata scatter gather list now occupies the spot in the nvme_iod that used to be used to link retryable IOD's, but we don't do that anymore, so the field was unused. Signed-off-by: Keith Busch <keith.busch@intel.com>
2015-02-19	x86: pte_protnone() and pmd_protnone() must check entry is not present	David Vrabel	1	-2/+4
	Since _PAGE_PROTNONE aliases _PAGE_GLOBAL it is only valid if _PAGE_PRESENT is clear. Make pte_protnone() and pmd_protnone() check for this. This fixes a 64-bit Xen PV guest regression introduced by 8a0516ed8b90 ("mm: convert p[te\|md]_numa users to p[te\|md]_protnone_numa"). Any userspace process would endlessly fault. In a 64-bit PV guest, userspace page table entries have _PAGE_GLOBAL set by the hypervisor. This meant that any fault on a present userspace entry (e.g., a write to a read-only mapping) would be misinterpreted as a NUMA hinting fault and the fault would not be correctly handled, resulting in the access endlessly faulting. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-19	cpuidle: powernv: Avoid endianness conversions while parsing DT	Preeti U Murthy	1	-13/+16
	We currently read the information about idle states from the DT so as to populate the cpuidle table. Use those APIs to read from the DT that can avoid endianness conversions of the property values in the cpuidle driver. Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-19	kgdb, docs: Fix <para> pdfdocs build errors	Rajaneesh Acharya	1	-3/+3
	kgdb.pdf failed to build from 'make pdfdocs' giving errors such as: jade:... Documentation/DocBook/kgdb.xml:200:8:E: document type does not allow element "para" here; missing one of "footnote", "caution", "important", "note", "tip", "warning", "blockquote", "informalexample" start-tag Fixing minor <para> and <sect> issues allows kgdb.pdf to be generated under Fedora20. Originally submitted by rajaneesh.acharya@yahoo.com in 2011, discussed here: http://permalink.gmane.org/gmane.linux.documentation/3954 as patch: The following are the enhancements that removed the errors while issuing "make pdfdocs" [graham.whaley@intel.com: Improved commit message and ported to 3.18.1] Signed-off-by: Graham Whaley <graham.whaley@intel.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2015-02-19	debug: prevent entering debug mode on panic/exception.	Colin Cross	1	-0/+17
	On non-developer devices, kgdb prevents the device from rebooting after a panic. Incase of panics and exceptions, to allow the device to reboot, prevent entering debug mode to avoid getting stuck waiting for the user to interact with debugger. To avoid entering the debugger on panic/exception without any extra configuration, panic_timeout is being used which can be set via /proc/sys/kernel/panic at run time and CONFIG_PANIC_TIMEOUT sets the default value. Setting panic_timeout indicates that the user requested machine to perform unattended reboot after panic. We dont want to get stuck waiting for the user input incase of panic. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: kgdb-bugreport@lists.sourceforge.net Cc: linux-kernel@vger.kernel.org Cc: Android Kernel Team <kernel-team@android.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Colin Cross <ccross@android.com> [Kiran: Added context to commit message. panic_timeout is used instead of break_on_panic and break_on_exception to honor CONFIG_PANIC_TIMEOUT Modified the commit as per community feedback] Signed-off-by: Kiran Raparthy <kiran.kumar@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2015-02-19	kdb: Const qualifier for kdb_getstr's prompt argument	Daniel Thompson	2	-2/+2
	All current callers of kdb_getstr() can pass constant pointers via the prompt argument. This patch adds a const qualification to make explicit the fact that this is safe. Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2015-02-19	kdb: Provide forward search at more prompt	Daniel Thompson	3	-5/+26
	Currently kdb allows the output of comamnds to be filtered using the \| grep feature. This is useful but does not permit the output emitted shortly after a string match to be examined without wading through the entire unfiltered output of the command. Such a feature is particularly useful to navigate function traces because these traces often have a useful trigger string before the point of interest. This patch reuses the existing filtering logic to introduce a simple forward search to kdb that can be triggered from the more prompt. Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2015-02-19	kdb: Fix a prompt management bug when using \| grep	Daniel Thompson	1	-2/+2
	Currently when the "\| grep" feature is used to filter the output of a command then the prompt is not displayed for the subsequent command. Likewise any characters typed by the user are also not echoed to the display. This rather disconcerting problem eventually corrects itself when the user presses Enter and the kdb_grepping_flag is cleared as kdb_parse() tries to make sense of whatever they typed. This patch resolves the problem by moving the clearing of this flag from the middle of command processing to the beginning. Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2015-02-19	kdb: Remove stack dump when entering kgdb due to NMI	Daniel Thompson	1	-1/+0
	Issuing a stack dump feels ergonomically wrong when entering due to NMI. Entering due to NMI is normally a reaction to a user request, either the NMI button on a server or a "magic knock" on a UART. Therefore the backtrace behaviour on entry due to NMI should be like SysRq-g (no stack dump) rather than like oops. Note also that the stack dump does not offer any information that cannot be trivial retrieved using the 'bt' command. Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2015-02-19	kdb: Avoid printing KERN_ levels to consoles	Daniel Thompson	3	-11/+21
	Currently when kdb traps printk messages then the raw log level prefix (consisting of '\001' followed by a numeral) does not get stripped off before the message is issued to the various I/O handlers supported by kdb. This causes annoying visual noise as well as causing problems grepping for ^. It is also a change of behaviour compared to normal usage of printk() usage. For example <SysRq>-h ends up with different output to that of kdb's "sr h". This patch addresses the problem by stripping log levels from messages before they are issued to the I/O handlers. printk() which can also act as an i/o handler in some cases is special cased; if the caller provided a log level then the prefix will be preserved when sent to printk(). The addition of non-printable characters to the output of kdb commands is a regression, albeit and extremely elderly one, introduced by commit 04d2c8c83d0e ("printk: convert the format for KERN_<LEVEL> to a 2 byte pattern"). Note also that this patch does not restore the original behaviour from v3.5. Instead it makes printk() from within a kdb command display the message without any prefix (i.e. like printk() normally does). Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Joe Perches <joe@perches.com> Cc: stable@vger.kernel.org Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2015-02-19	kdb: Fix off by one error in kdb_cpu()	Jason Wessel	2	-2/+2
	There was a follow on replacement patch against the prior "kgdb: Timeout if secondary CPUs ignore the roundup". See: https://lkml.org/lkml/2015/1/7/442 This patch is the delta vs the patch that was committed upstream: * Fix an off-by-one error in kdb_cpu(). * Replace NR_CPUS with CONFIG_NR_CPUS to tell checkpatch that we really want a static limit. * Removed the "KGDB: " prefix from the pr_crit() in debug_core.c (kgdb-next contains a patch which introduced pr_fmt() to this file to the tag will now be applied automatically). Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: <stable@vger.kernel.org> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2015-02-19	kdb: fix incorrect counts in KDB summary command output	Jay Lan	1	-1/+1
	The output of KDB 'summary' command should report MemTotal, MemFree and Buffers output in kB. Current codes report in unit of pages. A define of K(x) as is defined in the code, but not used. This patch would apply the define to convert the values to kB. Please include me on Cc on replies. I do not subscribe to linux-kernel. Signed-off-by: Jay Lan <jlan@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2015-02-19	MAINTAINERS: update Ceph and RBD maintainers	Sage Weil	1	-3/+4
	- add Ilya, drop Yehuda as an RBD maintainer - add Zheng as a Ceph maintainer - update Yehuda and Sage's emails Signed-off-by: Sage Weil <sage@redhat.com>
2015-02-19	s390/spinlock: disabled compare-and-delay by default	Martin Schwidefsky	1	-5/+7
	Until we have hard performance data about the effects of CAD in the spinlock loop disable the instruction by default. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-02-19	i2c: designware-baytrail: another fixup for proper Kconfig dependencies	Wolfram Sang	1	-1/+1
	IOSF_MBI is tristate. Baytrail driver isn't. Reported-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2015-02-19	i2c: fix reference to functionality constants definition	Baruch Siach	1	-1/+1
	Since commit 607ca46e97 ('UAPI: (Scripted) Disintegrate include/linux') the list of functionality constants moved to include/uapi/linux/i2c.h. Update the reference accordingly. Fixes: 607ca46e97 ('UAPI: (Scripted) Disintegrate include/linux') Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2015-02-19	x86/microcode/intel: Handle truncated microcode images more robustly	Quentin Casasnovas	2	-0/+9
	We do not check the input data bounds containing the microcode before copying a struct microcode_intel_header from it. A specially crafted microcode could cause the kernel to read invalid memory and lead to a denial-of-service. Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1422964824-22056-3-git-send-email-quentin.casasnovas@oracle.com [ Made error message differ from the next one and flipped comparison. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-19	x86/microcode/intel: Guard against stack overflow in the loader	Quentin Casasnovas	1	-1/+1
	mc_saved_tmp is a static array allocated on the stack, we need to make sure mc_saved_count stays within its bounds, otherwise we're overflowing the stack in _save_mc(). A specially crafted microcode header could lead to a kernel crash or potentially kernel execution. Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1422964824-22056-1-git-send-email-quentin.casasnovas@oracle.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-19	libceph: kfree() in put_osd() shouldn't depend on authorizer	Ilya Dryomov	1	-2/+3
	a255651d4cad ("ceph: ensure auth ops are defined before use") made kfree() in put_osd() conditional on the authorizer. A mechanical mistake most likely - fix it. Cc: Alex Elder <elder@linaro.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>
2015-02-19	libceph: fix double __remove_osd() problem	Ilya Dryomov	1	-8/+18
	It turns out it's possible to get __remove_osd() called twice on the same OSD. That doesn't sit well with rb_erase() - depending on the shape of the tree we can get a NULL dereference, a soft lockup or a random crash at some point in the future as we end up touching freed memory. One scenario that I was able to reproduce is as follows: <osd3 is idle, on the osd lru list> <con reset - osd3> con_fault_finish() osd_reset() <osdmap - osd3 down> ceph_osdc_handle_map() <takes map_sem> kick_requests() <takes request_mutex> reset_changed_osds() __reset_osd() __remove_osd() <releases request_mutex> <releases map_sem> <takes map_sem> <takes request_mutex> __kick_osd_requests() __reset_osd() __remove_osd() <-- !!! A case can be made that osd refcounting is imperfect and reworking it would be a proper resolution, but for now Sage and I decided to fix this by adding a safe guard around __remove_osd(). Fixes: http://tracker.ceph.com/issues/8087 Cc: Sage Weil <sage@redhat.com> Cc: stable@vger.kernel.org # 3.9+: 7c6e6fc53e73: libceph: assert both regular and lingering lists in __remove_osd() Cc: stable@vger.kernel.org # 3.9+: cc9f1f518cec: libceph: change from BUG to WARN for __remove_osd() asserts Cc: stable@vger.kernel.org # 3.9+ Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>
2015-02-19	rbd: convert to blk-mq	Christoph Hellwig	1	-54/+68
	This converts the rbd driver to use the blk-mq infrastructure. Except for switching to a per-request work item this is almost mechanical. This was tested by Alexandre DERUMIER in November, and found to give him 120000 iops, although the only comparism available was an old 3.10 kernel which gave 80000iops. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <elder@linaro.org> [idryomov@gmail.com: context, blk_mq_init_queue() EH] Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-02-19	x86, mm/ASLR: Fix stack randomization on 64-bit systems	Hector Marco-Gisbert	2	-5/+6
	The issue is that the stack for processes is not properly randomized on 64 bit architectures due to an integer overflow. The affected function is randomize_stack_top() in file "fs/binfmt_elf.c": static unsigned long randomize_stack_top(unsigned long stack_top) { unsigned int random_variable = 0; if ((current->flags & PF_RANDOMIZE) && !(current->personality & ADDR_NO_RANDOMIZE)) { random_variable = get_random_int() & STACK_RND_MASK; random_variable <<= PAGE_SHIFT; } return PAGE_ALIGN(stack_top) + random_variable; return PAGE_ALIGN(stack_top) - random_variable; } Note that, it declares the "random_variable" variable as "unsigned int". Since the result of the shifting operation between STACK_RND_MASK (which is 0x3fffff on x86_64, 22 bits) and PAGE_SHIFT (which is 12 on x86_64): random_variable <<= PAGE_SHIFT; then the two leftmost bits are dropped when storing the result in the "random_variable". This variable shall be at least 34 bits long to hold the (22+12) result. These two dropped bits have an impact on the entropy of process stack. Concretely, the total stack entropy is reduced by four: from 2^28 to 2^30 (One fourth of expected entropy). This patch restores back the entropy by correcting the types involved in the operations in the functions randomize_stack_top() and stack_maxrandom_size(). The successful fix can be tested with: $ for i in `seq 1 10`; do cat /proc/self/maps \| grep stack; done 7ffeda566000-7ffeda587000 rw-p 00000000 00:00 0 [stack] 7fff5a332000-7fff5a353000 rw-p 00000000 00:00 0 [stack] 7ffcdb7a1000-7ffcdb7c2000 rw-p 00000000 00:00 0 [stack] 7ffd5e2c4000-7ffd5e2e5000 rw-p 00000000 00:00 0 [stack] ... Once corrected, the leading bytes should be between 7ffc and 7fff, rather than always being 7fff. Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es> Signed-off-by: Ismael Ripoll <iripoll@upv.es> [ Rebased, fixed 80 char bugs, cleaned up commit message, added test example and CVE ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: <stable@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Fixes: CVE-2015-1593 Link: http://lkml.kernel.org/r/20150214173350.GA18393@www.outflux.net Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-19	x86/mm/init: Fix incorrect page size in init_memory_mapping() printks	Dave Hansen	1	-2/+26
	With 32-bit non-PAE kernels, we have 2 page sizes available (at most): 4k and 4M. Enabling PAE replaces that 4M size with a 2M one (which 64-bit systems use too). But, when booting a 32-bit non-PAE kernel, in one of our early-boot printouts, we say: init_memory_mapping: [mem 0x00000000-0x000fffff] [mem 0x00000000-0x000fffff] page 4k init_memory_mapping: [mem 0x37000000-0x373fffff] [mem 0x37000000-0x373fffff] page 2M init_memory_mapping: [mem 0x00100000-0x36ffffff] [mem 0x00100000-0x003fffff] page 4k [mem 0x00400000-0x36ffffff] page 2M init_memory_mapping: [mem 0x37400000-0x377fdfff] [mem 0x37400000-0x377fdfff] page 4k Which is obviously wrong. There is no 2M page available. This is probably because of a badly-named variable: in the map_range code: PG_LEVEL_2M. Instead of renaming all the PG_LEVEL_2M's. This patch just fixes the printout: init_memory_mapping: [mem 0x00000000-0x000fffff] [mem 0x00000000-0x000fffff] page 4k init_memory_mapping: [mem 0x37000000-0x373fffff] [mem 0x37000000-0x373fffff] page 4M init_memory_mapping: [mem 0x00100000-0x36ffffff] [mem 0x00100000-0x003fffff] page 4k [mem 0x00400000-0x36ffffff] page 4M init_memory_mapping: [mem 0x37400000-0x377fdfff] [mem 0x37400000-0x377fdfff] page 4k BRK [0x03206000, 0x03206fff] PGTABLE Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20150210212030.665EC267@viggo.jf.intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-19	x86/mm/ASLR: Propagate base load address calculation	Jiri Kosina	7	-17/+63
	Commit: e2b32e678513 ("x86, kaslr: randomize module base load address") makes the base address for module to be unconditionally randomized in case when CONFIG_RANDOMIZE_BASE is defined and "nokaslr" option isn't present on the commandline. This is not consistent with how choose_kernel_location() decides whether it will randomize kernel load base. Namely, CONFIG_HIBERNATION disables kASLR (unless "kaslr" option is explicitly specified on kernel commandline), which makes the state space larger than what module loader is looking at. IOW CONFIG_HIBERNATION && CONFIG_RANDOMIZE_BASE is a valid config option, kASLR wouldn't be applied by default in that case, but module loader is not aware of that. Instead of fixing the logic in module.c, this patch takes more generic aproach. It introduces a new bootparam setup data_type SETUP_KASLR and uses that to pass the information whether kaslr has been applied during kernel decompression, and sets a global 'kaslr_enabled' variable accordingly, so that any kernel code (module loading, livepatching, ...) can make decisions based on its value. x86 module loader is converted to make use of this flag. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Kees Cook <keescook@chromium.org> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Link: https://lkml.kernel.org/r/alpine.LNX.2.00.1502101411280.10719@pobox.suse.cz [ Always dump correct kaslr status when panicking ] Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-19	ceph: return error for traceless reply race	Yan, Zheng	1	-6/+9
	When we receives traceless reply for request that created new inode, we re-send a lookup request to MDS get information of the newly created inode. (VFS expects FS' callback return an inode in create case) This breaks one request into two requests. Other client may modify or move to the new inode in the middle. When the race happens, ceph_handle_notrace_create() unconditionally links the dentry for 'create' operation to the inode returned by lookup. This may confuse VFS when the inode is a directory (VFS does not allow multiple linkages for directory inode). This patch makes ceph_handle_notrace_create() when it detect a race. This event should be rare and it happens only when we talk to old MDS. Recent MDS does not send traceless reply for request that creates new inode. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19	ceph: fix dentry leaks	Yan, Zheng	2	-3/+6
	Signed-off-by: Yan, Zheng <zyan@redhat.com>