linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2016-05-16	ipmi: Fix the I2C address extraction from SPMI tables	Corey Minyard	1	-1/+1
	Unlike everywhere else in the IPMI specification, the I2C address specified in the SPMI table is not shifted to the left one bit with the LSB zero. Instead it is not shifted with the MSB zero. Reported-by: Sanjeev <singhsan@codeaurora.org> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2016-05-16	IPMI: reserve memio regions separately	Corey Minyard	1	-13/+27
	Commit d61a3ead2680 ("[PATCH] IPMI: reserve I/O ports separately") changed the way I/O ports were reserved and includes this comment in log: Some BIOSes reserve disjoint I/O regions in their ACPI tables for the IPMI controller. This causes problems when trying to register the entire I/O region. Therefore we must register each I/O port separately. There is a similar problem with memio regions on an arm64 platform (AMD Seattle). Where I see: ipmi message handler version 39.2 ipmi_si AMDI0300:00: probing via device tree ipmi_si AMDI0300:00: ipmi_si: probing via ACPI ipmi_si AMDI0300:00: [mem 0xe0010000] regsize 1 spacing 4 irq 23 ipmi_si: Adding ACPI-specified kcs state machine IPMI System Interface driver. ipmi_si: Trying ACPI-specified kcs state machine at mem \ address 0xe0010000, slave address 0x0, irq 23 ipmi_si: Could not set up I/O space The problem is that the ACPI core registers disjoint regions for the platform device: e0010000-e0010000 : AMDI0300:00 e0010004-e0010004 : AMDI0300:00 and the ipmi_si driver tries to register one region e0010000-e0010004. Based on a patch from Mark Salter <msalter@redhat.com>, who also wrote all the above text. Signed-off-by: Corey Minyard <cminyard@mvista.com> Tested-by: Mark Salter <msalter@redhat.com>
2016-05-16	ipmi: Fix some minor coding style issues	Corey Minyard	1	-13/+12
	Signed-off-by: Corey Minyard <cminyard@mvista.com>
2016-05-16	namei: Improve hash mixing if CONFIG_DCACHE_WORD_ACCESS	George Spelvin	1	-7/+26
	The hash mixing between adding the next 64 bits of name was just a bit weak. Replaced with a still very fast but slightly more effective mixing function. Signed-off-by: George Spelvin <linux@horizon.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-16	blk-mq: fix undefined behaviour in order_to_size()	Bartlomiej Zolnierkiewicz	1	-1/+1
	When this_order variable in blk_mq_init_rq_map() becomes zero the code incorrectly decrements the variable and passes the result to order_to_size() helper causing undefined behaviour: UBSAN: Undefined behaviour in block/blk-mq.c:1459:27 shift exponent 4294967295 is too large for 32-bit type 'unsigned int' CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.6.0-rc6-00072-g33656a1 #22 Fix the code by checking this_order variable for not having the zero value first. Reported-by: Meelis Roos <mroos@linux.ee> Fixes: 320ae51feed5 ("blk-mq: new multi-queue block IO queueing mechanism") Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-16	locking/rwsem: Fix comment on register clobbering	Borislav Petkov	1	-2/+4
	Document explicitly that %edx can get clobbered on the slow path, on 32-bit kernels. Something I learned the hard way. :-\ Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-next@vger.kernel.org Link: http://lkml.kernel.org/r/20160516093428.GA26108@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-16	mmc: mmc: Fix partition switch timeout for some eMMCs	Adrian Hunter	1	-0/+7
	Some eMMCs set the partition switch timeout too low. Now typically eMMCs are considered a critical component (e.g. because they store the root file system) and consequently are expected to be reliable. Thus we can neglect the use case where eMMCs can't switch reliably and we might want a lower timeout to facilitate speedy recovery. Although we could employ a quirk for the cards that are affected (if we could identify them all), as described above, there is little benefit to having a low timeout, so instead simply set a minimum timeout. The minimum is set to 300ms somewhat arbitrarily - the examples that have been seen had a timeout of 10ms but were sometimes taking 60-70ms. Cc: stable@vger.kernel.org Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-16	mmc: sh_mobile_sdhi: enable SDIO IRQs for RCar Gen3	Wolfram Sang	1	-1/+1
	Tested on a Salvator-X board with a Spectec SDW-823 WLAN card. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-16	mmc: sdio: fall back to SDIO 1.0 for broken 1.1 cards	Wolfram Sang	1	-1/+6
	I have two SDIO WLAN cards which specify being SDIO Rev. 1.1 cards but their FUNCE tuple reports the smaller size of a Rev 1.0 card. So, enforce 1.0 on these cards to avoid reading the not present registers. They are not really used anyhow. My cards initialize properly after this patch. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-16	mmc: sdhci-st: correct name of sd-uhs-sdr50 property	Simon Horman	1	-2/+2
	Correct what appears to be a typo in the name of the sd-uhs-sdr50. Also fix mixed tab/space indentation. Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-16	MAINTAINERS: update entry for TMIO MMC driver	Wolfram Sang	1	-5/+4
	I have some more additions planned for this driver, so I'd like to get notified of other changes and coordinate them. Drop Ian as maintainer because he hasn't been involved in development for a while. Thanks for all the initial work, of course! Also, reflect the recent changes to the include file layout. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Cc: Ian Molton <ian@mnementh.co.uk> Acked-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-16	mmc: block: improve logging of handling emmc timeouts	Ken Sumrall	1	-2/+8
	Add some logging to make it clear just how the emmc timeout was handled. Signed-off-by: Ken Sumrall <ksumrall@android.com> [AmitP: cherry-picked this Android patch from aosp common kernel android-4.4] Signed-off-by: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-16	mmc: sdhci: removed unneeded function wrappers	Dong Aisheng	1	-45/+15
	After commit d6463f170cf0 ("mmc: sdhci: Remove redundant runtime PM calls"), some of original sdhci_do_xx() function wrappers becomes meaningless, so remove them. Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-15	Linux 4.6	Linus Torvalds	1	-1/+1

2016-05-15	locking/rwsem: Fix down_write_killable()	Peter Zijlstra	1	-6/+15
	The new signal_pending exit path in __rwsem_down_write_failed_common() was fingered as breaking his kernel by Tetsuo Handa. Upon inspection it was found that there are two things wrong with it; - it forgets to remove WAITING_BIAS if it leaves the list empty, or - it forgets to wake further waiters that were blocked on the now removed waiter. Especially the first issue causes new lock attempts to block and stall indefinitely, as the code assumes that pending waiters mean there is an owner that will wake when it releases the lock. Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: Michal Hocko <mhocko@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Chris Zankel <chris@zankel.net> Cc: David S. Miller <davem@davemloft.net> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Waiman Long <Waiman.Long@hpe.com> Link: http://lkml.kernel.org/r/20160512115745.GP3192@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-14	arm64: bpf: jit JMP_JSET_{X,K}	Zi Shen Lim	1	-0/+1
	Original implementation commit e54bcde3d69d ("arm64: eBPF JIT compiler") had the relevant code paths, but due to an oversight always fail jiting. As a result, we had been falling back to BPF interpreter whenever a BPF program has JMP_JSET_{X,K} instructions. With this fix, we confirm that the corresponding tests in lib/test_bpf continue to pass, and also jited. ... [ 2.784553] test_bpf: #30 JSET jited:1 188 192 197 PASS [ 2.791373] test_bpf: #31 tcpdump port 22 jited:1 325 677 625 PASS [ 2.808800] test_bpf: #32 tcpdump complex jited:1 323 731 991 PASS ... [ 3.190759] test_bpf: #237 JMP_JSET_K: if (0x3 & 0x2) return 1 jited:1 110 PASS [ 3.192524] test_bpf: #238 JMP_JSET_K: if (0x3 & 0xffffffff) return 1 jited:1 98 PASS [ 3.211014] test_bpf: #249 JMP_JSET_X: if (0x3 & 0x2) return 1 jited:1 120 PASS [ 3.212973] test_bpf: #250 JMP_JSET_X: if (0x3 & 0xffffffff) return 1 jited:1 89 PASS ... Fixes: e54bcde3d69d ("arm64: eBPF JIT compiler") Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com> Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Yang Shi <yang.shi@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-14	net/route: enforce hoplimit max value	Paolo Abeni	2	-0/+4
	Currently, when creating or updating a route, no check is performed in both ipv4 and ipv6 code to the hoplimit value. The caller can i.e. set hoplimit to 256, and when such route will be used, packets will be sent with hoplimit/ttl equal to 0. This commit adds checks for the RTAX_HOPLIMIT value, in both ipv4 ipv6 route code, substituting any value greater than 255 with 255. This is consistent with what is currently done for ADVMSS and MTU in the ipv4 code. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-14	nf_conntrack: avoid kernel pointer value leak in slab name	Linus Torvalds	1	-1/+3
	The slab name ends up being visible in the directory structure under /sys, and even if you don't have access rights to the file you can see the filenames. Just use a 64-bit counter instead of the pointer to the 'net' structure to generate a unique name. This code will go away in 4.7 when the conntrack code moves to a single kmemcache, but this is the backportable simple solution to avoiding leaking kernel pointers to user space. Fixes: 5b3501faa874 ("netfilter: nf_conntrack: per netns nf_conntrack_cachep") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-13	drivers: net: xgene: fix register offset	Iyappan Subramanian	2	-4/+4
	This patch fixes SG_RX_DV_GATE_REG_0_ADDR register offset and ring state field lengths. Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Tested-by: Toan Le <toanle@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-13	drivers: net: xgene: fix statistics counters race condition	Iyappan Subramanian	4	-19/+53
	This patch fixes the race condition on updating the statistics counters by moving the counters to the ring structure. Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Tested-by: Toan Le <toanle@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-13	drivers: net: xgene: fix ununiform latency across queues	Iyappan Subramanian	2	-11/+27
	This patch addresses ununiform latency across queues by adding more queues to match with, upto number of CPU cores. Also, number of interrupts are increased and the channel numbers are reordered. Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Tested-by: Toan Le <toanle@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-13	drivers: net: xgene: fix sharing of irqs	Iyappan Subramanian	1	-2/+2
	Since hardware doesn't allow sharing of interrupts, this patch fixes the same by removing IRQF_SHARED flag. Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Tested-by: Toan Le <toanle@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-13	drivers: net: xgene: fix IPv4 forward crash	Iyappan Subramanian	2	-5/+8
	This patch fixes the crash observed during IPv4 forward test by setting the drop field in the dbptr. Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Tested-by: Toan Le <toanle@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-13	ACPI / video: mark acpi_video_get_levels() inline	Arnd Bergmann	1	-1/+1
	A recent patch added a stub function for acpi_video_get_levels when CONFIG_ACPI_VIDEO is disabled. However, this is marked as 'static' and causes a warning about an unused function whereever the header gets included: In file included from ../drivers/gpu/drm/radeon/radeon_acpi.c:28:0: include/acpi/video.h:74:12: error: 'acpi_video_get_levels' defined but not used [-Werror=unused-function] This makes the declaration 'static inline', which gets rid of the warning. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 059500940def (ACPI/video: export acpi_video_get_levels) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-13	ring-buffer: Prevent overflow of size in ring_buffer_resize()	Steven Rostedt (Red Hat)	1	-5/+4
	If the size passed to ring_buffer_resize() is greater than MAX_LONG - BUF_PAGE_SIZE then the DIV_ROUND_UP() will return zero. Here's the details: # echo 18014398509481980 > /sys/kernel/debug/tracing/buffer_size_kb tracing_entries_write() processes this and converts kb to bytes. 18014398509481980 << 10 = 18446744073709547520 and this is passed to ring_buffer_resize() as unsigned long size. size = DIV_ROUND_UP(size, BUF_PAGE_SIZE); Where DIV_ROUND_UP(a, b) is (a + b - 1)/b BUF_PAGE_SIZE is 4080 and here 18446744073709547520 + 4080 - 1 = 18446744073709551599 where 18446744073709551599 is still smaller than 2^64 2^64 - 18446744073709551599 = 17 But now 18446744073709551599 / 4080 = 4521260802379792 and size = size * 4080 = 18446744073709551360 This is checked to make sure its still greater than 2 * 4080, which it is. Then we convert to the number of buffer pages needed. nr_page = DIV_ROUND_UP(size, BUF_PAGE_SIZE) but this time size is 18446744073709551360 and 2^64 - (18446744073709551360 + 4080 - 1) = -3823 Thus it overflows and the resulting number is less than 4080, which makes 3823 / 4080 = 0 an nr_pages is set to this. As we already checked against the minimum that nr_pages may be, this causes the logic to fail as well, and we crash the kernel. There's no reason to have the two DIV_ROUND_UP() (that's just result of historical code changes), clean up the code and fix this bug. Cc: stable@vger.kernel.org # 3.5+ Fixes: 83f40318dab00 ("ring-buffer: Make removal of ring buffer pages atomic") Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-05-13	ring-buffer: Use long for nr_pages to avoid overflow failures	Steven Rostedt (Red Hat)	1	-12/+14
	The size variable to change the ring buffer in ftrace is a long. The nr_pages used to update the ring buffer based on the size is int. On 64 bit machines this can cause an overflow problem. For example, the following will cause the ring buffer to crash: # cd /sys/kernel/debug/tracing # echo 10 > buffer_size_kb # echo 8556384240 > buffer_size_kb Then you get the warning of: WARNING: CPU: 1 PID: 318 at kernel/trace/ring_buffer.c:1527 rb_update_pages+0x22f/0x260 Which is: RB_WARN_ON(cpu_buffer, nr_removed); Note each ring buffer page holds 4080 bytes. This is because: 1) 10 causes the ring buffer to have 3 pages. (10kb requires 3 * 4080 pages to hold) 2) (2^31 / 2^10 + 1) * 4080 = 8556384240 The value written into buffer_size_kb is shifted by 10 and then passed to ring_buffer_resize(). 8556384240 * 2^10 = 8761737461760 3) The size passed to ring_buffer_resize() is then divided by BUF_PAGE_SIZE which is 4080. 8761737461760 / 4080 = 2147484672 4) nr_pages is subtracted from the current nr_pages (3) and we get: 2147484669. This value is saved in a signed integer nr_pages_to_update 5) 2147484669 is greater than 2^31 but smaller than 2^32, a signed int turns into the value of -2147482627 6) As the value is a negative number, in update_pages_handler() it is negated and passed to rb_remove_pages() and 2147482627 pages will be removed, which is much larger than 3 and it causes the warning because not all the pages asked to be removed were removed. Link: https://bugzilla.kernel.org/show_bug.cgi?id=118001 Cc: stable@vger.kernel.org # 2.6.28+ Fixes: 7a8e76a3829f1 ("tracing: unified trace buffer") Reported-by: Hao Qin <QEver.cn@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-05-13	ARM: Hide finish_arch_post_lock_switch() from modules	Steven Rostedt	1	-0/+2
	The introduction of switch_mm_irqs_off() brought back an old bug regarding the use of preempt_enable_no_resched: As part of: 62b94a08da1b ("sched/preempt: Take away preempt_enable_no_resched() from modules") the definition of preempt_enable_no_resched() is only available in built-in code, not in loadable modules, so we can't generally use it from header files. However, the ARM version of finish_arch_post_lock_switch() calls preempt_enable_no_resched() and is defined as a static inline function in asm/mmu_context.h. This in turn means we cannot include asm/mmu_context.h from modules. With today's tip tree, asm/mmu_context.h gets included from linux/mmu_context.h, which is normally the exact pattern one would expect, but unfortunately, linux/mmu_context.h can be included from the vhost driver that is a loadable module, now causing this compile time error with modular configs: In file included from ../include/linux/mmu_context.h:4:0, from ../drivers/vhost/vhost.c:18: ../arch/arm/include/asm/mmu_context.h: In function 'finish_arch_post_lock_switch': ../arch/arm/include/asm/mmu_context.h:88:3: error: implicit declaration of function 'preempt_enable_no_resched' [-Werror=implicit-function-declaration] preempt_enable_no_resched(); Andy already tried to fix the bug by including linux/preempt.h from asm/mmu_context.h, but that didn't help. Arnd suggested reordering the header files, which wasn't popular, so let's use this workaround instead: The finish_arch_post_lock_switch() definition is now also hidden inside of #ifdef MODULE, so we don't see anything referencing preempt_enable_no_resched() from a header file. I've built a few hundred randconfig kernels with this, and did not see any new problems. Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King - ARM Linux <linux@armlinux.org.uk> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-arm-kernel@lists.infradead.org Fixes: f98db6013c55 ("sched/core: Add switch_mm_irqs_off() and use it in the scheduler") Link: http://lkml.kernel.org/r/1463146234-161304-1-git-send-email-arnd@arndb.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-13	dm thin: unroll issue_discard() to create longer discard bio chains	Joe Thornber	1	-37/+71
	There is little benefit to doing this but it does structure DM thinp's code to more cleanly use the __blkdev_issue_discard() interface -- particularly in passdown_double_checking_shared_status(). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-05-13	dm thin: use __blkdev_issue_discard for async discard support	Mike Snitzer	1	-54/+16
	With commit 38f25255330 ("block: add __blkdev_issue_discard") DM thinp no longer needs to carry its own async discard method. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-05-13	dm thin: remove __bio_inc_remaining() and switch to using bio_inc_remaining()	Mike Snitzer	1	-12/+1
	DM thinp's use of bio_inc_remaining() is critical to ensure the original parent discard bio isn't completed before sub-discards have. DM thinp needs this due to the extra quiescing that occurs, via multiple DM thinp mappings, while processing large discards. As such DM thinp must build the async discard bio chain after some delay -- so bio_inc_remaining() is used to enable DM thinp to take a reference on the original parent discard bio for each mapping. This allows the immediate use of bio_endio() on that discard bio; but with the understanding that the actual completion won't occur until each of the sub-discards' per-mapping references are dropped. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>
2016-05-13	dm raid: make sure no feature flags are set in metadata	Heinz Mauelshagen	1	-1/+6
	Given we don't yet support any feature flags in the dm-raid ondisk metadata (see: 'features' member of 'struct dm_raid_superblock'), add a check to ensure no flags are actually set, if any features are set reject the activation of the RAID mapping. This is to prevent possible data corruption in case of a kernel downgrade when there'll potentially be feature flags set by a future dm-raid target. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-05-13	x86/arch_prctl/64: Restore accidentally removed put_cpu() in ARCH_SET_GS	Mateusz Guzik	1	-0/+1
	This fixes an oversight in: 731e33e39a5b95 ("Remove FSBASE/GSBASE < 4G optimization") Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1462913803-29634-1-git-send-email-mguzik@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-13	regulator: rk808: Migrate to regulator core's simplified DT parsing code	Wadim Egorov	1	-171/+79
	A common simplified DT parsing code for regulators was introduced in commit a0c7b164ad11 ("regulator: of: Provide simplified DT parsing method") While at it also added RK8XX_DESC and RK8XX_DESC_SWITCH macros for the regulator_desc struct initialization. This just makes the driver more compact. Signed-off-by: Wadim Egorov <w.egorov@phytec.de> Acked-by: Mark Brown <broonie@kernel.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-05-13	crypto: qat - change the adf_ctl_stop_devices to void	Tadeusz Struk	1	-6/+3
	Change the adf_ctl_stop_devices to a void function. Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-05-13	xen-netback: fix extra_info handling in xenvif_tx_err()	Paul Durrant	1	-0/+1
	Patch 562abd39 "xen-netback: support multiple extra info fragments passed from frontend" contained a mistake which can result in an in- correct number of responses being generated when handling errors encountered when processing packets containing extra info fragments. This patch fixes the problem. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reported-by: Jan Beulich <JBeulich@suse.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-12	ext4: switch to ->iterate_shared()	Al Viro	2	-2/+9
	Note that we need relax_dir() equivalent for directories locked shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-12	hfs: switch to ->iterate_shared()	Al Viro	4	-3/+15
	exact parallel of hfsplus analogue Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-12	hfsplus: switch to ->iterate_shared()	Al Viro	5	-3/+15
	We need to protect the list of hfsplus_readdir_data against parallel insertions (in readdir) and removals (in release). Add a spinlock for that. Note that it has nothing to do with protection of hfsplus_readdir_data->key - we have an exclusion between hfsplus_readdir() and hfsplus_delete_cat() on directory lock and between several hfsplus_readdir() for the same struct file on ->f_pos_lock. The spinlock is strictly for list changes. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-12	hostfs: switch to ->iterate_shared()	Al Viro	1	-1/+1
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-12	hpfs: switch to ->iterate_shared()	Al Viro	1	-1/+1
	NOTE: the only reason we can do that without ->i_rdir_offs races is that hpfs_lock() serializes everything in there anyway. It's not that hard to get rid of, but not as part of this series... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-12	hpfs: handle allocation failures in hpfs_add_pos()	Al Viro	3	-6/+14
	pr_err() is nice, but we'd better propagate the error to caller and not proceed to violate the invariants (namely, "every file with f_pos tied to directory block should have its address visible in per-inode array"). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-12	mm: thp: calculate the mapcount correctly for THP pages during WP faults	Andrea Arcangeli	5	-26/+95
	This will provide fully accuracy to the mapcount calculation in the write protect faults, so page pinning will not get broken by false positive copy-on-writes. total_mapcount() isn't the right calculation needed in reuse_swap_page(), so this introduces a page_trans_huge_mapcount() that is effectively the full accurate return value for page_mapcount() if dealing with Transparent Hugepages, however we only use the page_trans_huge_mapcount() during COW faults where it strictly needed, due to its higher runtime cost. This also provide at practical zero cost the total_mapcount information which is needed to know if we can still relocate the page anon_vma to the local vma. If page_trans_huge_mapcount() returns 1 we can reuse the page no matter if it's a pte or a pmd_trans_huge triggering the fault, but we can only relocate the page anon_vma to the local vma->anon_vma if we're sure it's only this "vma" mapping the whole THP physical range. Kirill A. Shutemov discovered the problem with moving the page anon_vma to the local vma->anon_vma in a previous version of this patch and another problem in the way page_move_anon_rmap() was called. Andrew Morton discovered that CONFIG_SWAP=n wouldn't build in a previous version, because reuse_swap_page must be a macro to call page_trans_huge_mapcount from swap.h, so this uses a macro again instead of an inline function. With this change at least it's a less dangerous usage than it was before, because "page" is used only once now, while with the previous code reuse_swap_page(page++) would have called page_mapcount on page+1 and it would have increased page twice instead of just once. Dean Luick noticed an uninitialized variable that could result in a rmap inefficiency for the non-THP case in a previous version. Mike Marciniszyn said: : Our RDMA tests are seeing an issue with memory locking that bisects to : commit 61f5d698cc97 ("mm: re-enable THP") : : The test program registers two rather large MRs (512M) and RDMA : writes data to a passive peer using the first and RDMA reads it back : into the second MR and compares that data. The sizes are chosen randomly : between 0 and 1024 bytes. : : The test will get through a few (<= 4 iterations) and then gets a : compare error. : : Tracing indicates the kernel logical addresses associated with the individual : pages at registration ARE correct , the data in the "RDMA read response only" : packets ARE correct. : : The "corruption" occurs when the packet crosse two pages that are not physically : contiguous. The second page reads back as zero in the program. : : It looks like the user VA at the point of the compare error no longer points to : the same physical address as was registered. : : This patch totally resolves the issue! Link: http://lkml.kernel.org/r/1462547040-1737-2-git-send-email-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: "Kirill A. Shutemov" <kirill@shutemov.name> Reviewed-by: Dean Luick <dean.luick@intel.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Tested-by: Josh Collier <josh.d.collier@intel.com> Cc: Marc Haber <mh+linux-kernel@zugschlus.de> Cc: <stable@vger.kernel.org> [4.5] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-12	ksm: fix conflict between mmput and scan_get_next_rmap_item	Zhou Chengming	1	-5/+10
	A concurrency issue about KSM in the function scan_get_next_rmap_item. task A (ksmd): \|task B (the mm's task): \| mm = slot->mm; \| down_read(&mm->mmap_sem); \| \| ... \| \| spin_lock(&ksm_mmlist_lock); \| \| ksm_scan.mm_slot go to the next slot; \| \| spin_unlock(&ksm_mmlist_lock); \| \|mmput() -> \| ksm_exit(): \| \|spin_lock(&ksm_mmlist_lock); \|if (mm_slot && ksm_scan.mm_slot != mm_slot) { \| if (!mm_slot->rmap_list) { \| easy_to_free = 1; \| ... \| \|if (easy_to_free) { \| mmdrop(mm); \| ... \| \|So this mm_struct may be freed in the mmput(). \| up_read(&mm->mmap_sem); \| As we can see above, the ksmd thread may access a mm_struct that already been freed to the kmem_cache. Suppose a fork will get this mm_struct from the kmem_cache, the ksmd thread then call up_read(&mm->mmap_sem), will cause mmap_sem.count to become -1. As suggested by Andrea Arcangeli, unmerge_and_remove_all_rmap_items has the same SMP race condition, so fix it too. My prev fix in function scan_get_next_rmap_item will introduce a different SMP race condition, so just invert the up_read/spin_unlock order as Andrea Arcangeli said. Link: http://lkml.kernel.org/r/1462708815-31301-1-git-send-email-zhouchengming1@huawei.com Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com> Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Geliang Tang <geliangtang@163.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Ding Tianhong <dingtianhong@huawei.com> Cc: Li Bin <huawei.libin@huawei.com> Cc: Zhen Lei <thunder.leizhen@huawei.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-12	ocfs2: fix posix_acl_create deadlock	Junxiao Bi	6	-48/+77
	Commit 702e5bc68ad2 ("ocfs2: use generic posix ACL infrastructure") refactored code to use posix_acl_create. The problem with this function is that it is not mindful of the cluster wide inode lock making it unsuitable for use with ocfs2 inode creation with ACLs. For example, when used in ocfs2_mknod, this function can cause deadlock as follows. The parent dir inode lock is taken when calling posix_acl_create -> get_acl -> ocfs2_iop_get_acl which takes the inode lock again. This can cause deadlock if there is a blocked remote lock request waiting for the lock to be downconverted. And same deadlock happened in ocfs2_reflink. This fix is to revert back using ocfs2_init_acl. Fixes: 702e5bc68ad2 ("ocfs2: use generic posix ACL infrastructure") Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-12	ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang	Junxiao Bi	3	-2/+27
	Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()") introduced this issue. ocfs2_setattr called by chmod command holds cluster wide inode lock when calling posix_acl_chmod. This latter function in turn calls ocfs2_iop_get_acl and ocfs2_iop_set_acl. These two are also called directly from vfs layer for getfacl/setfacl commands and therefore acquire the cluster wide inode lock. If a remote conversion request comes after the first inode lock in ocfs2_setattr, OCFS2_LOCK_BLOCKED will be set. And this will cause the second call to inode lock from the ocfs2_iop_get_acl() to block indefinetly. The deleted version of ocfs2_acl_chmod() calls __posix_acl_chmod() which does not call back into the filesystem. Therefore, we restore ocfs2_acl_chmod(), modify it slightly for locking as needed, and use that instead. Fixes: 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()") Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-12	gfs2: switch to ->iterate_shared()	Al Viro	1	-2/+2
	protected by glock and already used without locking the directory by gfs2_get_name() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-12	net: mvneta: bm: fix dependencies again	Arnd Bergmann	1	-1/+1
	I tried to fix this before, but my previous fix was incomplete and we can still get the same link error in randconfig builds because of the way that Kconfig treats the default y if MVNETA=y && MVNETA_BM_ENABLE line that does not actually trigger when MVNETA_BM_ENABLE=m, unlike I intended. Changing the line to use MVNETA_BM_ENABLE!=n however has the desired effect and hopefully makes all configurations work as expected. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 019ded3aa7c9 ("net: mvneta: bm: clarify dependencies") Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-12	perf stat: Fallback to user only counters when perf_event_paranoid > 1	Arnaldo Carvalho de Melo	1	-1/+6
	After 0161028b7c8a ("perf/core: Change the default paranoia level to 2") 'perf stat' fails for users without CAP_SYS_ADMIN, so just use 'perf_evsel__fallback()' to have the same behaviour as 'perf record', i.e. set perf_event_attr.exclude_kernel to 1. Now: [acme@jouet linux]$ perf stat usleep 1 Performance counter stats for 'usleep 1': 0.352536 task-clock:u (msec) # 0.423 CPUs utilized 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 49 page-faults:u # 0.139 M/sec 309,407 cycles:u # 0.878 GHz 243,791 instructions:u # 0.79 insn per cycle 49,622 branches:u # 140.757 M/sec 3,884 branch-misses:u # 7.83% of all branches 0.000834174 seconds time elapsed [acme@jouet linux]$ Reported-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-b20jmx4dxt5hpaa9t2rroi0o@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12	perf evsel: Handle EACCESS + perf_event_paranoid=2 in fallback()	Arnaldo Carvalho de Melo	1	-0/+18
	Now with the default for the kernel.perf_event_paranoid sysctl being 2 [1] we need to fall back to :u, i.e. to set perf_event_attr.exclude_kernel to 1. Before: [acme@jouet linux]$ perf record usleep 1 Error: You may not have permission to collect stats. Consider tweaking /proc/sys/kernel/perf_event_paranoid, which controls use of the performance events system by unprivileged users (without CAP_SYS_ADMIN). The current value is 2: -1: Allow use of (almost) all events by all users >= 0: Disallow raw tracepoint access by users without CAP_IOC_LOCK >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN [acme@jouet linux]$ After: [acme@jouet linux]$ perf record usleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.016 MB perf.data (7 samples) ] [acme@jouet linux]$ perf evlist cycles:u [acme@jouet linux]$ perf evlist -v cycles:u: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP\|TID\|TIME\|PERIOD, disabled: 1, inherit: 1, exclude_kernel: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1 [acme@jouet linux]$ And if the user turns on verbose mode, an explanation will appear: [acme@jouet linux]$ perf record -v usleep 1 Warning: kernel.perf_event_paranoid=2, trying to fall back to excluding kernel samples mmap size 528384B [ perf record: Woken up 1 times to write data ] Looking at the vmlinux_path (8 entries long) Using /lib/modules/4.6.0-rc7+/build/vmlinux for symbols [ perf record: Captured and wrote 0.016 MB perf.data (7 samples) ] [acme@jouet linux]$ [1] 0161028b7c8a ("perf/core: Change the default paranoia level to 2") Reported-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-b20jmx4dxt5hpaa9t2rroi0o@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12	drm/amdgpu: fix DP mode validation	Alex Deucher	1	-2/+2
	Switch the order of the loops to walk the rates on the top so we exhaust all DP 1.1 rate/lane combinations before trying DP 1.2 rate/lane combos. This avoids selecting rates that are supported by the monitor, but not the connector leading to valid modes getting rejected. bug: https://bugs.freedesktop.org/show_bug.cgi?id=95206 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org