linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2016-05-20	rtc: ds3232: fix call trace when rtc->ops_lock is used as NULL	Qianyu Gong	1	-3/+6
	The rtc->ops_lock would be accessed in ds3232_irq() without being initialized as rtc_device_register() is called too late. So move devm_rtc_device_register() just before registering irq handler to initialize rtc->ops_lock earlier. Signed-off-by: Gong Qianyu <Qianyu.Gong@nxp.com> Reviewed-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-20	rtc: snvs: return error in case enable_irq_wake fails	Stefan Agner	1	-1/+1
	If enable_irq_wake fails, we should return that error code so that entering suspend fails. Otherwise we will get a WARNING along with the hint of a unbalanced wake disable: Unbalanced IRQ 37 wake disable Signed-off-by: Stefan Agner <stefan@agner.ch> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-20	rtc: zynqmp: Update seconds time programming logic	Anurag Kumar Vulisha	1	-6/+39
	We program RTC time using SET_TIME_WRITE register and read the RTC current time using CURRENT_TIME register. When we set the time by writing into SET_TIME_WRITE Register and immediately try to read the rtc time from CURRENT_TIME register, the previous old value is returned instead of the new loaded time. This is because RTC takes nearly 1 sec to update the new loaded value into the CURRENT_TIME register. This behaviour is expected in our RTC IP. This patch updates the driver to read the current time from SET_TIME_WRITE register instead of CURRENT_TIME when rtc time is requested within an 1sec period after setting the RTC time. Doing so will ensure the correct time is given to the user. Since there is a delay of 1sec in updating the CURRENT_TIME we are loading set time +1sec while programming the SET_TIME_WRITE register, doing this will give correct time without any delay when read from CURRENT_TIME. Signed-off-by: Anurag Kumar Vulisha <anuragku@xilinx.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-20	rtc: sa1100: DT spelling s/interrupt-name/interrupt-names/	Geert Uytterhoeven	1	-1/+1
	Fix typo in interrupt-names Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-20	rtc: mc13xxx: remove UIE signaling	Wolfram Sang	1	-19/+0
	The RTC core handles it since 6610e08 (RTC: Rework RTC code to use timerqueue for events). So far, only the callbacks to the RTC core have been removed, but not the handlers. Do this now. Signed-off-by: Wolfram Sang <w.sang@pengutronix.de> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-20	rtc: mxc: remove UIE signaling	Wolfram Sang	1	-3/+0
	The RTC core handles it since 6610e08 (RTC: Rework RTC code to use timerqueue for events). Signed-off-by: Wolfram Sang <w.sang@pengutronix.de> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-20	rtc: ds1307: Remove CLK_IS_ROOT	Stephen Boyd	1	-2/+0
	This flag is a no-op now (see commit 47b0eeb3dc8a "clk: Deprecate CLK_IS_ROOT", 2016-02-02) so remove it. Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: Michael Tatarinov <kukabu@gmail.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-20	rtc: hym8563: Remove CLK_IS_ROOT	Stephen Boyd	1	-1/+1
	This flag is a no-op now (see commit 47b0eeb3dc8a "clk: Deprecate CLK_IS_ROOT", 2016-02-02) so remove it. Cc: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-20	rtc: pcf8563: Remove CLK_IS_ROOT	Stephen Boyd	1	-1/+1
	This flag is a no-op now (see commit 47b0eeb3dc8a "clk: Deprecate CLK_IS_ROOT", 2016-02-02) so remove it. Cc: Heiko Schocher <hs@denx.de> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Acked-by: Heiko Schocher <hs@denx.de> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-20	rtc: zynqmp: Write Calibration value before setting time	Anurag Kumar Vulisha	1	-7/+14
	It is suggested to program CALIB_WRITE register with the calibration value before updating the SET_TIME_WRITE register, doing so will clear the Tick Counter and force the next second to be signaled exactly in 1 second. Signed-off-by: Anurag Kumar Vulisha <anuragku@xilinx.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-20	rtc: zynqmp: Enable RTC switching to battery power when VCC_PSAUX is N/A	Anurag Kumar Vulisha	1	-0/+8
	In order to conserve battery energy, during the PS operation, it is expected that the supply for the battery-powered domain to be switched from the battery (VCC_PSBATT) to (VCC_PSAUX) and automatically be switched back to battery when VCC_PSAUX voltage drops below a limit, doing so prevents the logic within the battery-powered domain from functioning incorrectly. Signed-off-by: Anurag Kumar Vulisha <anuragku@xilinx.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-20	rtc: ds1685: actually spin forever in poweroff path	Josh Poimboeuf	1	-0/+1
	objtool reports the following warning: drivers/rtc/rtc-ds1685.o: warning: objtool: ds1685_rtc_poweroff() falls through to next function ds1685_rtc_work_queue() Similar to commit 361c6ed6b153 ("rtc: ds1685: actually spin forever in poweroff error path"), there's another unreachable() annotation which is actually reachable, which we missed the first time. Actually spin forever to be consistent with the comment and to make the unreachable() annotation guaranteed to be unreachable. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-20	rtc: da9053: fix access ordering error during RTC interrupt at system power on	Steve Twiss	1	-5/+8
	This fix alters the ordering of the IRQ and device registrations in the RTC driver probe function. This change will apply to the RTC driver that supports both DA9052 and DA9053 PMICs. A problem could occur with the existing RTC driver if: A system is started from a cold boot using the PMIC RTC IRQ to initiate a power on operation. For instance, if an RTC alarm is used to start a platform from power off. The existing driver IRQ is requested before the device has been properly registered. i.e. ret = da9052_request_irq() comes before rtc->rtc = devm_rtc_device_register(); In this case, an interrupt exists before the device has been registered and the IRQ handler can be called immediately: this can happen be before the memory for rtc->rtc has been allocated. The IRQ handler da9052_rtc_irq() contains the function call: rtc_update_irq(rtc->rtc, 1, RTC_IRQF \| RTC_AF); which in turn tries to access the unavailable rtc->rtc. The fix is to reorder the functions inside the RTC probe. The IRQ is requested after the RTC device resource has been registered so that da9052_request_irq() is the last thing to happen. Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-20	rtc: ds1307: ensure that any pending alarm is cleared before a new alarm is enabled	Nicolas Boullis	1	-5/+8
	If a previously-set alarm was disabled and then triggered, it may still be pending when a new alarm is configured. Then, if the alarm is enabled before the pending alarm is cleared, then an interrupt is immediately raised. Unfortunately, when the alarm is cleared and enabled during the same I²C block write, the chip (at least the DS1339 I have) considers that the alarm is enabled before it is cleared, and raises an interrupt. This patch ensures that the pending alarm is cleared before the alarm is enabled. Signed-off-by: Nicolas Boullis <nboullis@debian.org> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-20	rtc: ds1307: fix ds1307_native_smbus_read_block_data function	Nicolas Boullis	1	-2/+6
	The i2c_smbus_read_i2c_block_data function returns 0 on success, not the number of bytes written. Hence, when there are 32 bytes or less to send, the ds1307_native_smbus_write_block_data function returns 0 on success, while it returns the number of bytes when there are more than 32. The ds1307_write_block_data always returns the number of bytes on success. Signed-off-by: Nicolas Boullis <nboullis@debian.org> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-20	rtc: ds1302: fix write value for day of week register	Akinobu Mita	1	-1/+1
	The valid range of day of week register for DS1302 is 1 to 7. But the set_time callback for rtc-ds1302 attempts to write the value of tm->tm_wday which is in the range 0 to 6. While the get_time callback correctly decodes the register. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Sergey Yanovich <ynvich@gmail.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-20	rtc: ds1302: fix error check in set_time	Akinobu Mita	1	-1/+1
	The set_time callback for rtc-ds1302 doesn't write clock registers because the error check for the return value from spi_write_then_read() is not correct. spi_write_then_read() which returns zero on success. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Sergey Yanovich <ynvich@gmail.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-20	rtc: m41t80: handle oscillator failure bit	Mylène Josserand	1	-2/+23
	Handle the Oscillator Failure (OF) bit on each read of date-time. If the OF is set, an error is returned (-EINVAL) instead of the date-time. The OF bit is cleared each time the date is set. Signed-off-by: Mylène Josserand <mylene.josserand@free-electrons.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-20	rtc: m41t80: add wakealarm functionality	Mylène Josserand	1	-0/+2
	To enable the wakealarm, the device must be able to wakeup. This is done by setting the device wakeup capability to true with 'device_init_wakeup' function. Signed-off-by: Mylène Josserand <mylene.josserand@free-electrons.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-20	rtc: m41t80: add alarm functionality	Mylène Josserand	1	-13/+167
	Previous 'commit c3b79770e51a ("Expire alarms after the time is set")' and 'commit 48e9766726eb ("remove disabled alarm functionality")' removed the alarm support because the alarm irq was not functional. Add the alarm IRQ functionality with newer functions than previous code. Tested with 'rtctest' and the alarm is functional. Signed-off-by: Mylène Josserand <mylene.josserand@free-electrons.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-20	rtc: m41t80: remove warnings and replace obsolete function	Mylène Josserand	1	-11/+15
	Replace the obsolete "simple_strtoul" function to "kstrtoul". Remove some checkpatch's errors, warnings and checks : - alignment with open parenthesis - spaces around '<' and '<<' - blank line after structure - quoted string split across lines Signed-off-by: Mylène Josserand <mylene.josserand@free-electrons.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-20	rtc: m41t80: add the use of 'BIT' macro	Mylène Josserand	1	-15/+15
	Replace bit shifts by BIT macro. Signed-off-by: Mylène Josserand <mylene.josserand@free-electrons.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-20	rtc: m41t80: replace i2c functions for smbus ones	Mylène Josserand	1	-90/+44
	The driver used i2c_transfer methods to read and set date/time. The smbus methods should be used. This commit replaces i2c_transfer functions by i2c_smbus_XX_i2c_block_data for reading and setting the datetime. Signed-off-by: Mylène Josserand <mylene.josserand@free-electrons.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-20	rtc: m41t80: remove proc macro	Mylène Josserand	1	-4/+0
	Remove the CONFIG_RTC_INTF_PROC and CONFIG_RTC_INTF_PROC_MODULE macro which is not necessary anymore. Signed-off-by: Mylène Josserand <mylene.josserand@free-electrons.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-20	rtc: m41t80: update sysfs entries export	Mylène Josserand	1	-24/+32
	The driver used an old sysfs entry export. Update it to use the DEVICE_ATTR_XX macro and remove the unnecessary CONFIG_RTC_INTF_SYSFS macro. Signed-off-by: Mylène Josserand <mylene.josserand@free-electrons.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-20	rtc: remove useless DRV_VERSION	Alexandre Belloni	25	-89/+1
	Many drivers are defining a DRV_VERSION. This is often only used for MODULE_VERSION and sometimes to print an info message at probe time. This is kind of pointless as they are all versionned with the kernel anyway. Also the core will print a message when a new rtc is found. Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-20	rtc: at91sam9: remove duplicate assignment of variable mr	Colin Ian King	1	-1/+1
	mr is written twice with the same value, remove one of the redundant assignments to mr. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-20	rtc: ds1302: rewrite using SPI	Sergey Yanovich	3	-197/+212
	DS1302 is an half-duplex SPI device. The driver respects this fact now. Pin configurations should be implemented using SPI subsystem. Signed-off-by: Sergei Ianovich <ynvich@gmail.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-05-15	Linux 4.6	Linus Torvalds	1	-1/+1

2016-05-14	arm64: bpf: jit JMP_JSET_{X,K}	Zi Shen Lim	1	-0/+1
	Original implementation commit e54bcde3d69d ("arm64: eBPF JIT compiler") had the relevant code paths, but due to an oversight always fail jiting. As a result, we had been falling back to BPF interpreter whenever a BPF program has JMP_JSET_{X,K} instructions. With this fix, we confirm that the corresponding tests in lib/test_bpf continue to pass, and also jited. ... [ 2.784553] test_bpf: #30 JSET jited:1 188 192 197 PASS [ 2.791373] test_bpf: #31 tcpdump port 22 jited:1 325 677 625 PASS [ 2.808800] test_bpf: #32 tcpdump complex jited:1 323 731 991 PASS ... [ 3.190759] test_bpf: #237 JMP_JSET_K: if (0x3 & 0x2) return 1 jited:1 110 PASS [ 3.192524] test_bpf: #238 JMP_JSET_K: if (0x3 & 0xffffffff) return 1 jited:1 98 PASS [ 3.211014] test_bpf: #249 JMP_JSET_X: if (0x3 & 0x2) return 1 jited:1 120 PASS [ 3.212973] test_bpf: #250 JMP_JSET_X: if (0x3 & 0xffffffff) return 1 jited:1 89 PASS ... Fixes: e54bcde3d69d ("arm64: eBPF JIT compiler") Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com> Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Yang Shi <yang.shi@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-14	net/route: enforce hoplimit max value	Paolo Abeni	2	-0/+4
	Currently, when creating or updating a route, no check is performed in both ipv4 and ipv6 code to the hoplimit value. The caller can i.e. set hoplimit to 256, and when such route will be used, packets will be sent with hoplimit/ttl equal to 0. This commit adds checks for the RTAX_HOPLIMIT value, in both ipv4 ipv6 route code, substituting any value greater than 255 with 255. This is consistent with what is currently done for ADVMSS and MTU in the ipv4 code. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-14	nf_conntrack: avoid kernel pointer value leak in slab name	Linus Torvalds	1	-1/+3
	The slab name ends up being visible in the directory structure under /sys, and even if you don't have access rights to the file you can see the filenames. Just use a 64-bit counter instead of the pointer to the 'net' structure to generate a unique name. This code will go away in 4.7 when the conntrack code moves to a single kmemcache, but this is the backportable simple solution to avoiding leaking kernel pointers to user space. Fixes: 5b3501faa874 ("netfilter: nf_conntrack: per netns nf_conntrack_cachep") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-13	drivers: net: xgene: fix register offset	Iyappan Subramanian	2	-4/+4
	This patch fixes SG_RX_DV_GATE_REG_0_ADDR register offset and ring state field lengths. Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Tested-by: Toan Le <toanle@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-13	drivers: net: xgene: fix statistics counters race condition	Iyappan Subramanian	4	-19/+53
	This patch fixes the race condition on updating the statistics counters by moving the counters to the ring structure. Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Tested-by: Toan Le <toanle@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-13	drivers: net: xgene: fix ununiform latency across queues	Iyappan Subramanian	2	-11/+27
	This patch addresses ununiform latency across queues by adding more queues to match with, upto number of CPU cores. Also, number of interrupts are increased and the channel numbers are reordered. Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Tested-by: Toan Le <toanle@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-13	drivers: net: xgene: fix sharing of irqs	Iyappan Subramanian	1	-2/+2
	Since hardware doesn't allow sharing of interrupts, this patch fixes the same by removing IRQF_SHARED flag. Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Tested-by: Toan Le <toanle@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-13	drivers: net: xgene: fix IPv4 forward crash	Iyappan Subramanian	2	-5/+8
	This patch fixes the crash observed during IPv4 forward test by setting the drop field in the dbptr. Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Tested-by: Toan Le <toanle@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-13	xen-netback: fix extra_info handling in xenvif_tx_err()	Paul Durrant	1	-0/+1
	Patch 562abd39 "xen-netback: support multiple extra info fragments passed from frontend" contained a mistake which can result in an in- correct number of responses being generated when handling errors encountered when processing packets containing extra info fragments. This patch fixes the problem. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reported-by: Jan Beulich <JBeulich@suse.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-12	mm: thp: calculate the mapcount correctly for THP pages during WP faults	Andrea Arcangeli	5	-26/+95
	This will provide fully accuracy to the mapcount calculation in the write protect faults, so page pinning will not get broken by false positive copy-on-writes. total_mapcount() isn't the right calculation needed in reuse_swap_page(), so this introduces a page_trans_huge_mapcount() that is effectively the full accurate return value for page_mapcount() if dealing with Transparent Hugepages, however we only use the page_trans_huge_mapcount() during COW faults where it strictly needed, due to its higher runtime cost. This also provide at practical zero cost the total_mapcount information which is needed to know if we can still relocate the page anon_vma to the local vma. If page_trans_huge_mapcount() returns 1 we can reuse the page no matter if it's a pte or a pmd_trans_huge triggering the fault, but we can only relocate the page anon_vma to the local vma->anon_vma if we're sure it's only this "vma" mapping the whole THP physical range. Kirill A. Shutemov discovered the problem with moving the page anon_vma to the local vma->anon_vma in a previous version of this patch and another problem in the way page_move_anon_rmap() was called. Andrew Morton discovered that CONFIG_SWAP=n wouldn't build in a previous version, because reuse_swap_page must be a macro to call page_trans_huge_mapcount from swap.h, so this uses a macro again instead of an inline function. With this change at least it's a less dangerous usage than it was before, because "page" is used only once now, while with the previous code reuse_swap_page(page++) would have called page_mapcount on page+1 and it would have increased page twice instead of just once. Dean Luick noticed an uninitialized variable that could result in a rmap inefficiency for the non-THP case in a previous version. Mike Marciniszyn said: : Our RDMA tests are seeing an issue with memory locking that bisects to : commit 61f5d698cc97 ("mm: re-enable THP") : : The test program registers two rather large MRs (512M) and RDMA : writes data to a passive peer using the first and RDMA reads it back : into the second MR and compares that data. The sizes are chosen randomly : between 0 and 1024 bytes. : : The test will get through a few (<= 4 iterations) and then gets a : compare error. : : Tracing indicates the kernel logical addresses associated with the individual : pages at registration ARE correct , the data in the "RDMA read response only" : packets ARE correct. : : The "corruption" occurs when the packet crosse two pages that are not physically : contiguous. The second page reads back as zero in the program. : : It looks like the user VA at the point of the compare error no longer points to : the same physical address as was registered. : : This patch totally resolves the issue! Link: http://lkml.kernel.org/r/1462547040-1737-2-git-send-email-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: "Kirill A. Shutemov" <kirill@shutemov.name> Reviewed-by: Dean Luick <dean.luick@intel.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Tested-by: Josh Collier <josh.d.collier@intel.com> Cc: Marc Haber <mh+linux-kernel@zugschlus.de> Cc: <stable@vger.kernel.org> [4.5] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-12	ksm: fix conflict between mmput and scan_get_next_rmap_item	Zhou Chengming	1	-5/+10
	A concurrency issue about KSM in the function scan_get_next_rmap_item. task A (ksmd): \|task B (the mm's task): \| mm = slot->mm; \| down_read(&mm->mmap_sem); \| \| ... \| \| spin_lock(&ksm_mmlist_lock); \| \| ksm_scan.mm_slot go to the next slot; \| \| spin_unlock(&ksm_mmlist_lock); \| \|mmput() -> \| ksm_exit(): \| \|spin_lock(&ksm_mmlist_lock); \|if (mm_slot && ksm_scan.mm_slot != mm_slot) { \| if (!mm_slot->rmap_list) { \| easy_to_free = 1; \| ... \| \|if (easy_to_free) { \| mmdrop(mm); \| ... \| \|So this mm_struct may be freed in the mmput(). \| up_read(&mm->mmap_sem); \| As we can see above, the ksmd thread may access a mm_struct that already been freed to the kmem_cache. Suppose a fork will get this mm_struct from the kmem_cache, the ksmd thread then call up_read(&mm->mmap_sem), will cause mmap_sem.count to become -1. As suggested by Andrea Arcangeli, unmerge_and_remove_all_rmap_items has the same SMP race condition, so fix it too. My prev fix in function scan_get_next_rmap_item will introduce a different SMP race condition, so just invert the up_read/spin_unlock order as Andrea Arcangeli said. Link: http://lkml.kernel.org/r/1462708815-31301-1-git-send-email-zhouchengming1@huawei.com Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com> Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Geliang Tang <geliangtang@163.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Ding Tianhong <dingtianhong@huawei.com> Cc: Li Bin <huawei.libin@huawei.com> Cc: Zhen Lei <thunder.leizhen@huawei.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-12	ocfs2: fix posix_acl_create deadlock	Junxiao Bi	6	-48/+77
	Commit 702e5bc68ad2 ("ocfs2: use generic posix ACL infrastructure") refactored code to use posix_acl_create. The problem with this function is that it is not mindful of the cluster wide inode lock making it unsuitable for use with ocfs2 inode creation with ACLs. For example, when used in ocfs2_mknod, this function can cause deadlock as follows. The parent dir inode lock is taken when calling posix_acl_create -> get_acl -> ocfs2_iop_get_acl which takes the inode lock again. This can cause deadlock if there is a blocked remote lock request waiting for the lock to be downconverted. And same deadlock happened in ocfs2_reflink. This fix is to revert back using ocfs2_init_acl. Fixes: 702e5bc68ad2 ("ocfs2: use generic posix ACL infrastructure") Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-12	ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang	Junxiao Bi	3	-2/+27
	Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()") introduced this issue. ocfs2_setattr called by chmod command holds cluster wide inode lock when calling posix_acl_chmod. This latter function in turn calls ocfs2_iop_get_acl and ocfs2_iop_set_acl. These two are also called directly from vfs layer for getfacl/setfacl commands and therefore acquire the cluster wide inode lock. If a remote conversion request comes after the first inode lock in ocfs2_setattr, OCFS2_LOCK_BLOCKED will be set. And this will cause the second call to inode lock from the ocfs2_iop_get_acl() to block indefinetly. The deleted version of ocfs2_acl_chmod() calls __posix_acl_chmod() which does not call back into the filesystem. Therefore, we restore ocfs2_acl_chmod(), modify it slightly for locking as needed, and use that instead. Fixes: 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()") Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-12	net: mvneta: bm: fix dependencies again	Arnd Bergmann	1	-1/+1
	I tried to fix this before, but my previous fix was incomplete and we can still get the same link error in randconfig builds because of the way that Kconfig treats the default y if MVNETA=y && MVNETA_BM_ENABLE line that does not actually trigger when MVNETA_BM_ENABLE=m, unlike I intended. Changing the line to use MVNETA_BM_ENABLE!=n however has the desired effect and hopefully makes all configurations work as expected. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 019ded3aa7c9 ("net: mvneta: bm: clarify dependencies") Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-12	perf stat: Fallback to user only counters when perf_event_paranoid > 1	Arnaldo Carvalho de Melo	1	-1/+6
	After 0161028b7c8a ("perf/core: Change the default paranoia level to 2") 'perf stat' fails for users without CAP_SYS_ADMIN, so just use 'perf_evsel__fallback()' to have the same behaviour as 'perf record', i.e. set perf_event_attr.exclude_kernel to 1. Now: [acme@jouet linux]$ perf stat usleep 1 Performance counter stats for 'usleep 1': 0.352536 task-clock:u (msec) # 0.423 CPUs utilized 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 49 page-faults:u # 0.139 M/sec 309,407 cycles:u # 0.878 GHz 243,791 instructions:u # 0.79 insn per cycle 49,622 branches:u # 140.757 M/sec 3,884 branch-misses:u # 7.83% of all branches 0.000834174 seconds time elapsed [acme@jouet linux]$ Reported-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-b20jmx4dxt5hpaa9t2rroi0o@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12	perf evsel: Handle EACCESS + perf_event_paranoid=2 in fallback()	Arnaldo Carvalho de Melo	1	-0/+18
	Now with the default for the kernel.perf_event_paranoid sysctl being 2 [1] we need to fall back to :u, i.e. to set perf_event_attr.exclude_kernel to 1. Before: [acme@jouet linux]$ perf record usleep 1 Error: You may not have permission to collect stats. Consider tweaking /proc/sys/kernel/perf_event_paranoid, which controls use of the performance events system by unprivileged users (without CAP_SYS_ADMIN). The current value is 2: -1: Allow use of (almost) all events by all users >= 0: Disallow raw tracepoint access by users without CAP_IOC_LOCK >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN [acme@jouet linux]$ After: [acme@jouet linux]$ perf record usleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.016 MB perf.data (7 samples) ] [acme@jouet linux]$ perf evlist cycles:u [acme@jouet linux]$ perf evlist -v cycles:u: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP\|TID\|TIME\|PERIOD, disabled: 1, inherit: 1, exclude_kernel: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1 [acme@jouet linux]$ And if the user turns on verbose mode, an explanation will appear: [acme@jouet linux]$ perf record -v usleep 1 Warning: kernel.perf_event_paranoid=2, trying to fall back to excluding kernel samples mmap size 528384B [ perf record: Woken up 1 times to write data ] Looking at the vmlinux_path (8 entries long) Using /lib/modules/4.6.0-rc7+/build/vmlinux for symbols [ perf record: Captured and wrote 0.016 MB perf.data (7 samples) ] [acme@jouet linux]$ [1] 0161028b7c8a ("perf/core: Change the default paranoia level to 2") Reported-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-b20jmx4dxt5hpaa9t2rroi0o@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12	drm/amdgpu: fix DP mode validation	Alex Deucher	1	-2/+2
	Switch the order of the loops to walk the rates on the top so we exhaust all DP 1.1 rate/lane combinations before trying DP 1.2 rate/lane combos. This avoids selecting rates that are supported by the monitor, but not the connector leading to valid modes getting rejected. bug: https://bugs.freedesktop.org/show_bug.cgi?id=95206 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2016-05-12	drm/radeon: fix DP mode validation	Alex Deucher	1	-2/+2
	Switch the order of the loops to walk the rates on the top so we exhaust all DP 1.1 rate/lane combinations before trying DP 1.2 rate/lane combos. This avoids selecting rates that are supported by the monitor, but not the connector leading to valid modes getting rejected. bug: https://bugs.freedesktop.org/show_bug.cgi?id=95206 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2016-05-12	perf evsel: Improve EPERM error handling in open_strerror()	Arnaldo Carvalho de Melo	1	-2/+3
	We were showing a hardcoded default value for the kernel.perf_event_paranoid sysctl, now that it became more paranoid (1 -> 2 [1]), this would need to be updated, instead show the current value: [acme@jouet linux]$ perf record ls Error: You may not have permission to collect stats. Consider tweaking /proc/sys/kernel/perf_event_paranoid, which controls use of the performance events system by unprivileged users (without CAP_SYS_ADMIN). The current value is 2: -1: Allow use of (almost) all events by all users >= 0: Disallow raw tracepoint access by users without CAP_IOC_LOCK >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN [acme@jouet linux]$ [1] 0161028b7c8a ("perf/core: Change the default paranoia level to 2") Reported-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-0gc4rdpg8d025r5not8s8028@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12	workqueue: fix rebind bound workers warning	Wanpeng Li	1	-0/+11
	------------[ cut here ]------------ WARNING: CPU: 0 PID: 16 at kernel/workqueue.c:4559 rebind_workers+0x1c0/0x1d0 Modules linked in: CPU: 0 PID: 16 Comm: cpuhp/0 Not tainted 4.6.0-rc4+ #31 Hardware name: IBM IBM System x3550 M4 Server -[7914IUW]-/00Y8603, BIOS -[D7E128FUS-1.40]- 07/23/2013 0000000000000000 ffff881037babb58 ffffffff8139d885 0000000000000010 0000000000000000 0000000000000000 0000000000000000 ffff881037babba8 ffffffff8108505d ffff881037ba0000 000011cf3e7d6e60 0000000000000046 Call Trace: dump_stack+0x89/0xd4 __warn+0xfd/0x120 warn_slowpath_null+0x1d/0x20 rebind_workers+0x1c0/0x1d0 workqueue_cpu_up_callback+0xf5/0x1d0 notifier_call_chain+0x64/0x90 ? trace_hardirqs_on_caller+0xf2/0x220 ? notify_prepare+0x80/0x80 __raw_notifier_call_chain+0xe/0x10 __cpu_notify+0x35/0x50 notify_down_prepare+0x5e/0x80 ? notify_prepare+0x80/0x80 cpuhp_invoke_callback+0x73/0x330 ? __schedule+0x33e/0x8a0 cpuhp_down_callbacks+0x51/0xc0 cpuhp_thread_fun+0xc1/0xf0 smpboot_thread_fn+0x159/0x2a0 ? smpboot_create_threads+0x80/0x80 kthread+0xef/0x110 ? wait_for_completion+0xf0/0x120 ? schedule_tail+0x35/0xf0 ret_from_fork+0x22/0x50 ? __init_kthread_worker+0x70/0x70 ---[ end trace eb12ae47d2382d8f ]--- notify_down_prepare: attempt to take down CPU 0 failed This bug can be reproduced by below config w/ nohz_full= all cpus: CONFIG_BOOTPARAM_HOTPLUG_CPU0=y CONFIG_DEBUG_HOTPLUG_CPU0=y CONFIG_NO_HZ_FULL=y As Thomas pointed out: \| If a down prepare callback fails, then DOWN_FAILED is invoked for all \| callbacks which have successfully executed DOWN_PREPARE. \| \| But, workqueue has actually two notifiers. One which handles \| UP/DOWN_FAILED/ONLINE and one which handles DOWN_PREPARE. \| \| Now look at the priorities of those callbacks: \| \| CPU_PRI_WORKQUEUE_UP = 5 \| CPU_PRI_WORKQUEUE_DOWN = -5 \| \| So the call order on DOWN_PREPARE is: \| \| CB 1 \| CB ... \| CB workqueue_up() -> Ignores DOWN_PREPARE \| CB ... \| CB X ---> Fails \| \| So we call up to CB X with DOWN_FAILED \| \| CB 1 \| CB ... \| CB workqueue_up() -> Handles DOWN_FAILED \| CB ... \| CB X-1 \| \| So the problem is that the workqueue stuff handles DOWN_FAILED in the up \| callback, while it should do it in the down callback. Which is not a good idea \| either because it wants to be called early on rollback... \| \| Brilliant stuff, isn't it? The hotplug rework will solve this problem because \| the callbacks become symetric, but for the existing mess, we need some \| workaround in the workqueue code. The boot CPU handles housekeeping duty(unbound timers, workqueues, timekeeping, ...) on behalf of full dynticks CPUs. It must remain online when nohz full is enabled. There is a priority set to every notifier_blocks: workqueue_cpu_up > tick_nohz_cpu_down > workqueue_cpu_down So tick_nohz_cpu_down callback failed when down prepare cpu 0, and notifier_blocks behind tick_nohz_cpu_down will not be called any more, which leads to workers are actually not unbound. Then hotplug state machine will fallback to undo and online cpu 0 again. Workers will be rebound unconditionally even if they are not unbound and trigger the warning in this progress. This patch fix it by catching !DISASSOCIATED to avoid rebind bound workers. Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: stable@vger.kernel.org Suggested-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
2016-05-12	cgroup: fix compile warning	Felipe Balbi	1	-1/+1
	commit 4f41fc59620f ("cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces") added the following compile warning: kernel/cgroup.c: In function ‘cgroup_show_path’: kernel/cgroup.c:1634:15: warning: unused variable ‘ret’ [-Wunused-variable] int len = 0, ret = 0; ^ fix it. Fixes: 4f41fc59620f ("cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces") Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Tejun Heo <tj@kernel.org>