linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2018-05-15	MIPS: perf: More robustly probe for the presence of per-tc counters	Matt Redfearn	3	-5/+7
	The presence of per TC performance counters is now detected by cpu-probe.c and indicated by MIPS_CPU_MT_PER_TC_PERF_COUNTERS in cpu_data. Switch detection of the feature to use this new flag rather than blindly testing the implementation specific config7 register with a magic number. Signed-off-by: Matt Redfearn <matt.redfearn@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Maciej W. Rozycki <macro@mips.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Robert Richter <rric@kernel.org> Cc: linux-mips@linux-mips.org Cc: oprofile-list@lists.sf.net Patchwork: https://patchwork.linux-mips.org/patch/19142/ Signed-off-by: James Hogan <jhogan@kernel.org>
2018-05-15	MIPS: Probe for MIPS MT perf counters per TC	Matt Redfearn	3	-0/+19
	Processors implementing the MIPS MT ASE may have performance counters implemented per core or per TC. Processors implemented by MIPS Technologies signify presence per TC through a bit in the implementation specific Config7 register. Currently the code which probes for their presence blindly reads a magic number corresponding to this bit, despite it potentially having a different meaning in the CPU implementation. Since CPU features are generally detected by cpu-probe.c, perform the detection here instead. Introduce cpu_set_mt_per_tc_perf which checks the bit in config7 and call it from MIPS CPUs known to implement this bit and the MT ASE, specifically, the 34K, 1004K and interAptiv. Once the presence of the per-tc counter is indicated in cpu_data, tests for it can be updated to use this flag. Suggested-by: James Hogan <jhogan@kernel.org> Signed-off-by: Matt Redfearn <matt.redfearn@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Matt Redfearn <matt.redfearn@mips.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Maciej W. Rozycki <macro@mips.com> Cc: linux-mips@linux-mips.org> Patchwork: https://patchwork.linux-mips.org/patch/19136/ Signed-off-by: James Hogan <jhogan@kernel.org>
2018-05-14	MIPS: mscc: Connect phys to ports on ocelot_pcb123	Alexandre Belloni	1	-0/+20
	Add phy to switch port connections for PCB123 for internal PHYs. Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: linux-mips@linux-mips.org Cc: netdev@vger.kernel.org Signed-off-by: James Hogan <jhogan@kernel.org>
2018-05-14	MIPS: mscc: Add switch to ocelot	Alexandre Belloni	1	-0/+88
	Ocelot has an integrated switch, add support for it. Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: linux-mips@linux-mips.org Cc: netdev@vger.kernel.org Signed-off-by: James Hogan <jhogan@kernel.org>
2018-05-14	MIPS: JZ4740: Drop old platform reset code	Paul Cercueil	1	-31/+0
	This work is now performed by the watchdog driver directly. Signed-off-by: Paul Cercueil <paul@crapouillou.net> Acked-by: James Hogan <jhogan@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Wim Van Sebroeck <wim@linux-watchdog.org> Cc: Mathieu Malaterre <malat@debian.org> Cc: linux-mips@linux-mips.org Cc: linux-watchdog@vger.kernel.org Signed-off-by: James Hogan <jhogan@kernel.org>
2018-05-14	MIPS: qi_lb60: Enable the jz4740-wdt driver	Paul Cercueil	1	-0/+2
	The watchdog is an useful piece of hardware, so there's no reason not to enable it. Besides, this is important for restart to work after the change in the next commit. This commit enables the Kconfig option in the qi_lb60 defconfig. Signed-off-by: Paul Cercueil <paul@crapouillou.net> Acked-by: James Hogan <jhogan@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Wim Van Sebroeck <wim@linux-watchdog.org> Cc: Mathieu Malaterre <malat@debian.org> Cc: linux-mips@linux-mips.org Cc: linux-watchdog@vger.kernel.org Signed-off-by: James Hogan <jhogan@kernel.org>
2018-05-14	MIPS: JZ4780: dts: Fix watchdog node	Paul Cercueil	2	-2/+10
	- The previous node requested a memory area of 0x100 bytes, while the driver only manipulates four registers present in the first 0x10 bytes. - The driver requests for the "rtc" clock, but the previous node did not provide any. Signed-off-by: Paul Cercueil <paul@crapouillou.net> Reviewed-by: Mathieu Malaterre <malat@debian.org> Acked-by: James Hogan <jhogan@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Wim Van Sebroeck <wim@linux-watchdog.org> Cc: Mathieu Malaterre <malat@debian.org> Cc: linux-mips@linux-mips.org Cc: devicetree@vger.kernel.org Cc: linux-watchdog@vger.kernel.org Signed-off-by: James Hogan <jhogan@kernel.org>
2018-05-14	MIPS: JZ4740: dts: Add bindings for the jz4740-wdt driver	Paul Cercueil	3	-17/+8
	Also remove the watchdog platform_device from platform.c, since it wasn't used anywhere anyway. Signed-off-by: Paul Cercueil <paul@crapouillou.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Wim Van Sebroeck <wim@linux-watchdog.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Malaterre <malat@debian.org> Cc: linux-mips@linux-mips.org Cc: linux-watchdog@vger.kernel.org Cc: devicetree@vger.kernel.org [jhogan@kernel.org: Drop jz4740_wdt_device declaration from header] Signed-off-by: James Hogan <jhogan@kernel.org>
2018-05-14	watchdog: JZ4740: Drop module remove function	Paul Cercueil	1	-8/+0
	When the watchdog was configured for nowayout, and after the userspace watchdog daemon closed the dev node without sending the magic character, unloading this module stopped the watchdog hardware, which was clearly a problem. Besides, unloading the module is not possible when the userspace watchdog daemon is running, so it's safe to assume that we don't need to stop the watchdog hardware in the jz4740_wdt_remove() function. For this reason, the jz4740_wdt_remove() function can then be dropped alltogether. Signed-off-by: Paul Cercueil <paul@crapouillou.net> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Cc: Wim Van Sebroeck <wim@linux-watchdog.org> Cc: Mathieu Malaterre <malat@debian.org> Cc: linux-watchdog@vger.kernel.org Cc: linux-mips@linux-mips.org Signed-off-by: James Hogan <jhogan@kernel.org>
2018-05-14	watchdog: JZ4740: Register a restart handler	Paul Cercueil	1	-0/+9
	The watchdog driver can restart the system by simply configuring the hardware for a timeout of 0 seconds. Signed-off-by: Paul Cercueil <paul@crapouillou.net> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Cc: Wim Van Sebroeck <wim@linux-watchdog.org> Cc: Mathieu Malaterre <malat@debian.org> Cc: linux-watchdog@vger.kernel.org Cc: linux-mips@linux-mips.org Signed-off-by: James Hogan <jhogan@kernel.org>
2018-05-14	watchdog: JZ4740: Use devm_* functions	Paul Cercueil	1	-19/+8
	- Use devm_clk_get instead of clk_get - Use devm_watchdog_register_device instead of watchdog_register_device Signed-off-by: Paul Cercueil <paul@crapouillou.net> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Cc: Wim Van Sebroeck <wim@linux-watchdog.org> Cc: Mathieu Malaterre <malat@debian.org> Cc: linux-watchdog@vger.kernel.org Cc: linux-mips@linux-mips.org Signed-off-by: James Hogan <jhogan@kernel.org>
2018-05-14	watchdog: JZ4740: Disable clock after stopping counter	Paul Cercueil	1	-1/+1
	Previously, the clock was disabled first, which makes the watchdog component insensitive to register writes. Signed-off-by: Paul Cercueil <paul@crapouillou.net> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Cc: Wim Van Sebroeck <wim@linux-watchdog.org> Cc: Mathieu Malaterre <malat@debian.org> Cc: linux-watchdog@vger.kernel.org Cc: linux-mips@linux-mips.org Signed-off-by: James Hogan <jhogan@kernel.org>
2018-05-14	MIPS: VPE: Fix spelling mistake: "uneeded" -> "unneeded"	Colin Ian King	1	-1/+1
	Trivial fix to spelling mistake in pr_warn message text. Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kernel-janitors@vger.kernel.org Signed-off-by: James Hogan <jhogan@kernel.org>
2018-05-14	MIPS: Re-use kstrtobool_from_user()	Andy Shevchenko	1	-8/+1
	Re-use kstrtobool_from_user() instead of open coded variant. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Signed-off-by: James Hogan <jhogan@kernel.org>
2018-05-14	MIPS: Convert update_persistent_clock() to update_persistent_clock64()	Baolin Wang	8	-39/+30
	Since struct timespec is not y2038 safe on 32bit machines, this patch converts update_persistent_clock() to update_persistent_clock64() using struct timespec64. The rtc_mips_set_time() and rtc_mips_set_mmss() interfaces were using 'unsigned long' type that is not y2038 safe on 32bit machines, moreover there is only one platform implementing rtc_mips_set_time() and two platforms implementing rtc_mips_set_mmss(), so we can just make them each implement update_persistent_clock64() directly, to get that helper out of the common mips code by removing rtc_mips_set_time() and rtc_mips_set_mmss() interfaces. Signed-off-by: Baolin Wang <baolin.wang@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Huacai Chen <chenhc@lemote.com> Cc: Paul Burton <paul.burton@mips.com> Cc: linux-mips@linux-mips.org Signed-off-by: James Hogan <jhogan@kernel.org>
2018-05-14	MIPS: Convert read_persistent_clock() to read_persistent_clock64()	Baolin Wang	8	-16/+16
	Since struct timespec is not y2038 safe on 32bit machines, this patch converts read_persistent_clock() to read_persistent_clock64() using struct timespec64, as well as converting mktime() to mktime64(). Signed-off-by: Baolin Wang <baolin.wang@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Huacai Chen <chenhc@lemote.com> Cc: Paul Burton <paul.burton@mips.com> Cc: linux-mips@linux-mips.org Signed-off-by: James Hogan <jhogan@kernel.org>
2018-05-14	MIPS: sni: Remove the read_persistent_clock()	Baolin Wang	1	-6/+0
	The dummy read_persistent_clock() uses a timespec, which is not year 2038 safe on 32bit systems. Thus remove this obsolete interface. Signed-off-by: Baolin Wang <baolin.wang@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/19114/ Signed-off-by: James Hogan <jhogan@kernel.org>
2018-04-23	MIPS: BCM47XX: Use __initdata for the bcm47xx_leds_pdata	Rafał Miłecki	1	-1/+1
	This struct variable is used during init only. It gets passed to the gpio_led_register_device() which creates its own data copy. That allows using __initdata and saving some minimal amount of memory. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Reviewed-by: Aaro Koskinen <aaro.koskinen@iki.fi> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Hauke Mehrtens <hauke@hauke-m.de> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/18928/ Signed-off-by: James Hogan <jhogan@kernel.org>
2018-04-23	MIPS: Use generic GCC library routines from lib/	Antony Pavlov	7	-144/+6
	The commit b35cd9884fa5 ("lib: Add shared copies of some GCC library routines") makes it possible to share generic GCC library routines by several architectures. This commit removes several generic GCC library routines from arch/mips/lib/ in favour of similar routines from lib/. Signed-off-by: Antony Pavlov <antonynpavlov@gmail.com> [Matt Redfearn] Use GENERIC_LIB_* named Kconfig entries Signed-off-by: Matt Redfearn <matt.redfearn@mips.com> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/19051/ Signed-off-by: James Hogan <jhogan@kernel.org>
2018-04-23	MIPS: vmlinuz: Use generic ashldi3	Matt Redfearn	1	-4/+7
	In preparation for removing some of the MIPS compiler intrinsics from arch/mips/lib, first update the build of vmlinuz to use the generic ashldi3 from lib. Both ashldi3 and bswapsi objects need to be built with different CFLAGS for inclusion to vmlinuz rather than simply including the object built for the main kernel image. The objects cannot be built directly from source, since CONFIG_MODVERSIONS changes cmd_cc_o_c to prevent this. Split the rule to ship ashldi3 and bswapsi from the relevant source locations. These files make no reference to other files in their directory, so the additional CFLAGS are apparently unnecessary - remove them as well. Signed-off-by: Matt Redfearn <matt.redfearn@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Antony Pavlov <antonynpavlov@gmail.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/19050/ [jhogan@kernel.org: Add if_changed and FORCE to fix build failure when arch/mips/boot/compressed/ashldi3.c is already generated but there is no .ashldi3.c.cmd file yet] Signed-off-by: James Hogan <jhogan@kernel.org>
2018-04-23	lib: Rename compiler intrinsic selects to GENERIC_LIB_*	Matt Redfearn	3	-15/+15
	When these are included into arch Kconfig files, maintaining alphabetical ordering of the selects means these get split up. To allow for keeping things tidier and alphabetical, rename the selects to GENERIC_LIB_* Signed-off-by: Matt Redfearn <matt.redfearn@mips.com> Acked-by: Palmer Dabbelt <palmer@sifive.com> Cc: Antony Pavlov <antonynpavlov@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-riscv@lists.infradead.org Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/19049/ Signed-off-by: James Hogan <jhogan@kernel.org>
2018-04-23	Add notrace to lib/ucmpdi2.c	Palmer Dabbelt	1	-1/+1
	As part of the MIPS conversion to use the generic GCC library routines, Matt Redfearn discovered that I'd missed a notrace on __ucmpdi2(). This patch rectifies the problem. Signed-off-by: Palmer Dabbelt <palmer@sifive.com> Reviewed-by: Matt Redfearn <matt.redfearn@mips.com> Signed-off-by: Matt Redfearn <matt.redfearn@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Matt Redfearn <matt.redfearn@mips.com> Cc: Antony Pavlov <antonynpavlov@gmail.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/19048/ Signed-off-by: James Hogan <jhogan@kernel.org>
2018-04-23	firmware: bcm47xx_nvram: Support small (0x6000 B) NVRAM partitions	Rafał Miłecki	1	-1/+1
	Some old devices with 4 MiB flashes were using 0x1000 block size and could use smaller (0x6000 bytes) flash partition for storing NVRAM content. This adds support for reading NVRAM on Netgear WNR1000 V3. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Hauke Mehrtens <hauke@hauke-m.de> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/19005/ Signed-off-by: James Hogan <jhogan@kernel.org>
2018-04-23	MIPS: BCM47XX: Add support for Netgear WNR1000 V3	Rafał Miłecki	4	-0/+21
	This adds support for detecting this model board and registers some LEDs and buttons. There are two uncommon things regarding this device: 1) It can use two different "board_id" ID values. Unit I have uses "U12H139T00_NETGEAR" value. This magic is also used in firmware file header. There are two reports (one from an OpenWrt user) of a different "U12H139T50_NETGEAR" magic though. 2) Power LEDs share GPIOs with buttons. Amber one seems to share GPIO 2 with WPS button and green one seems to share GPIO 3 with reset button. It remains unknown how to support them and handle buttons at the same time. For that reason they aren't added to the list of supported LEDs. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Hauke Mehrtens <hauke@hauke-m.de> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/19004/ Signed-off-by: James Hogan <jhogan@kernel.org>
2018-04-23	MIPS: dts: Avoid unneeded built-in.a in DTS dirs	Masahiro Yamada	10	-10/+10
	arch/mips/boot/dts/Makefile collects objects from sub-directories into built-in.a only when CONFIG_BUILTIN_DTB is enabled. Reflect it also to the sub-directory Makefiles. This suppresses unneeded built-in.a creation in arch/mips/boot/dts/*/ directories. While I am here, I replaced $(patsubst %.dtb, %.dtb.o, $(dtb-y)) with $(addsuffix .o, $(dtb-y)) to simplify the code a little bit. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Cercueil <paul@crapouillou.net> Cc: Mathieu Malaterre <malat@debian.org> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Rob Herring <robh+dt@kernel.org> Cc: linux-mips@linux-mips.org Cc: devicetree@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/19099/ Signed-off-by: James Hogan <jhogan@kernel.org>
2018-04-22	Linux 4.17-rc2	Linus Torvalds	1	-1/+1

2018-04-20	mm/filemap.c: fix NULL pointer in page_cache_tree_insert()	Matthew Wilcox	1	-5/+4
	f2fs specifies the __GFP_ZERO flag for allocating some of its pages. Unfortunately, the page cache also uses the mapping's GFP flags for allocating radix tree nodes. It always masked off the __GFP_HIGHMEM flag, and masks off __GFP_ZERO in some paths, but not all. That causes radix tree nodes to be allocated with a NULL list_head, which causes backtraces like: __list_del_entry+0x30/0xd0 list_lru_del+0xac/0x1ac page_cache_tree_insert+0xd8/0x110 The __GFP_DMA and __GFP_DMA32 flags would also be able to sneak through if they are ever used. Fix them all by using GFP_RECLAIM_MASK at the innermost location, and remove it from earlier in the callchain. Link: http://lkml.kernel.org/r/20180411060320.14458-2-willy@infradead.org Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check") Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Reported-by: Chris Fries <cfries@google.com> Debugged-by: Minchan Kim <minchan@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-20	mm: memcg: add __GFP_NOWARN in __memcg_schedule_kmem_cache_create()	Minchan Kim	1	-1/+1
	If there is heavy memory pressure, page allocation with __GFP_NOWAIT fails easily although it's order-0 request. I got below warning 9 times for normal boot. <snip >: page allocation failure: order:0, mode:0x2200000(GFP_NOWAIT\|__GFP_NOTRACK) .. snip .. Call trace: dump_backtrace+0x0/0x4 dump_stack+0xa4/0xc0 warn_alloc+0xd4/0x15c __alloc_pages_nodemask+0xf88/0x10fc alloc_slab_page+0x40/0x18c new_slab+0x2b8/0x2e0 ___slab_alloc+0x25c/0x464 __kmalloc+0x394/0x498 memcg_kmem_get_cache+0x114/0x2b8 kmem_cache_alloc+0x98/0x3e8 mmap_region+0x3bc/0x8c0 do_mmap+0x40c/0x43c vm_mmap_pgoff+0x15c/0x1e4 sys_mmap+0xb0/0xc8 el0_svc_naked+0x24/0x28 Mem-Info: active_anon:17124 inactive_anon:193 isolated_anon:0 active_file:7898 inactive_file:712955 isolated_file:55 unevictable:0 dirty:27 writeback:18 unstable:0 slab_reclaimable:12250 slab_unreclaimable:23334 mapped:19310 shmem:212 pagetables:816 bounce:0 free:36561 free_pcp:1205 free_cma:35615 Node 0 active_anon:68496kB inactive_anon:772kB active_file:31592kB inactive_file:2851820kB unevictable:0kB isolated(anon):0kB isolated(file):220kB mapped:77240kB dirty:108kB writeback:72kB shmem:848kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no DMA free:142188kB min:3056kB low:3820kB high:4584kB active_anon:10052kB inactive_anon:12kB active_file:312kB inactive_file:1412620kB unevictable:0kB writepending:0kB present:1781412kB managed:1604728kB mlocked:0kB slab_reclaimable:3592kB slab_unreclaimable:876kB kernel_stack:400kB pagetables:52kB bounce:0kB free_pcp:1436kB local_pcp:124kB free_cma:142492kB lowmem_reserve[]: 0 1842 1842 Normal free:4056kB min:4172kB low:5212kB high:6252kB active_anon:58376kB inactive_anon:760kB active_file:31348kB inactive_file:1439040kB unevictable:0kB writepending:180kB present:2000636kB managed:1923688kB mlocked:0kB slab_reclaimable:45408kB slab_unreclaimable:92460kB kernel_stack:9680kB pagetables:3212kB bounce:0kB free_pcp:3392kB local_pcp:688kB free_cma:0kB lowmem_reserve[]: 0 0 0 DMA: 04kB 08kB 116kB (C) 032kB 064kB 0128kB 1256kB (C) 1512kB (C) 01024kB 12048kB (C) 344096kB (C) = 142096kB Normal: 2284kB (UMEH) 1728kB (UMH) 2316kB (UH) 2432kB (H) 564kB (H) 1128kB (H) 0256kB 0512kB 01024kB 02048kB 04096kB = 3872kB 721350 total pagecache pages 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 0kB Total swap = 0kB 945512 pages RAM 0 pages HighMem/MovableOnly 63408 pages reserved 51200 pages cma reserved __memcg_schedule_kmem_cache_create() tries to create a shadow slab cache and the worker allocation failure is not really critical because we will retry on the next kmem charge. We might miss some charges but that shouldn't be critical. The excessive allocation failure report is not very helpful. [mhocko@kernel.org: changelog update] Link: http://lkml.kernel.org/r/20180418022912.248417-1-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-20	fs, elf: don't complain MAP_FIXED_NOREPLACE unless -EEXIST error	Tetsuo Handa	1	-4/+4
	Commit 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map") is printing spurious messages under memory pressure due to map_addr == -ENOMEM. 9794 (a.out): Uhuuh, elf segment at 00007f2e34738000(fffffffffffffff4) requested but the memory is mapped already 14104 (a.out): Uhuuh, elf segment at 00007f34fd76c000(fffffffffffffff4) requested but the memory is mapped already 16843 (a.out): Uhuuh, elf segment at 00007f930ecc7000(fffffffffffffff4) requested but the memory is mapped already Complain only if -EEXIST, and use %px for printing the address. Link: http://lkml.kernel.org/r/201804182307.FAC17665.SFMOFJVFtHOLOQ@I-love.SAKURA.ne.jp Fixes: 4ed28639519c7bad ("fs, elf: drop MAP_FIXED usage from elf_map") is Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Andrei Vagin <avagin@openvz.org> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Kees Cook <keescook@chromium.org> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Joel Stanley <joel@jms.id.au> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-20	kexec_file: do not add extra alignment to efi memmap	Dave Young	1	-3/+2
	Chun-Yi reported a kernel warning message below: WARNING: CPU: 0 PID: 0 at ../mm/early_ioremap.c:182 early_iounmap+0x4f/0x12c() early_iounmap(ffffffffff200180, 00000118) [0] size not consistent 00000120 The problem is x86 kexec_file_load adds extra alignment to the efi memmap: in bzImage64_load(): efi_map_sz = efi_get_runtime_map_size(); efi_map_sz = ALIGN(efi_map_sz, 16); And __efi_memmap_init maps with the size including the alignment bytes but efi_memmap_unmap use nr_maps * desc_size which does not include the extra bytes. The alignment in kexec code is only needed for the kexec buffer internal use Actually kexec should pass exact size of the efi memmap to 2nd kernel. Link: http://lkml.kernel.org/r/20180417083600.GA1972@dhcp-128-65.nay.redhat.com Signed-off-by: Dave Young <dyoung@redhat.com> Reported-by: joeyli <jlee@suse.com> Tested-by: Randy Wright <rwright@hpe.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-20	proc: fix /proc/loadavg regression	Alexey Dobriyan	2	-2/+2
	Commit 95846ecf9dac ("pid: replace pid bitmap implementation with IDR API") changed last field of /proc/loadavg (last pid allocated) to be off by one: # unshare -p -f --mount-proc cat /proc/loadavg 0.00 0.00 0.00 1/60 2 <=== It should be 1 after first fork into pid namespace. This is formally a regression but given how useless this field is I don't think anyone is affected. Bug was found by /proc testsuite! Link: http://lkml.kernel.org/r/20180413175408.GA27246@avx2 Fixes: 95846ecf9dac508 ("pid: replace pid bitmap implementation with IDR API") Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Gargi Sharma <gs051095@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-20	proc: revalidate kernel thread inodes to root:root	Alexey Dobriyan	1	-0/+6
	task_dump_owner() has the following code: mm = task->mm; if (mm) { if (get_dumpable(mm) != SUID_DUMP_USER) { uid = ... } } Check for ->mm is buggy -- kernel thread might be borrowing mm and inode will go to some random uid:gid pair. Link: http://lkml.kernel.org/r/20180412220109.GA20978@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-20	autofs: mount point create should honour passed in mode	Ian Kent	1	-1/+1
	The autofs file system mkdir inode operation blindly sets the created directory mode to S_IFDIR \| 0555, ingoring the passed in mode, which can cause selinux dac_override denials. But the function also checks if the caller is the daemon (as no-one else should be able to do anything here) so there's no point in not honouring the passed in mode, allowing the daemon to set appropriate mode when required. Link: http://lkml.kernel.org/r/152361593601.8051.14014139124905996173.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-20	MAINTAINERS: add personal addresses for Sascha and Uwe	Uwe Kleine-König	1	-5/+10
	The idea behind using kernel@pengutronix.de (i.e. the mail alias for the kernel people at Pengutronix) as email address was to have a backup when a given developer is on vacation or run over by a bus. Make this more explicit by adding the alias as reviewer and use the personal address for Sascha and me. Link: http://lkml.kernel.org/r/20180413083312.11213-1-u.kleine-koenig@pengutronix.de Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-20	kasan: add no_sanitize attribute for clang builds	Andrey Konovalov	1	-0/+3
	KASAN uses the __no_sanitize_address macro to disable instrumentation of particular functions. Right now it's defined only for GCC build, which causes false positives when clang is used. This patch adds a definition for clang. Note, that clang's revision 329612 or higher is required. [andreyknvl@google.com: remove redundant #ifdef CONFIG_KASAN check] Link: http://lkml.kernel.org/r/c79aa31a2a2790f6131ed607c58b0dd45dd62a6c.1523967959.git.andreyknvl@google.com Link: http://lkml.kernel.org/r/4ad725cc903f8534f8c8a60f0daade5e3d674f8d.1523554166.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Paul Lawrence <paullawrence@google.com> Cc: Sandipan Das <sandipan@linux.vnet.ibm.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-20	rapidio: fix rio_dma_transfer error handling	Ioan Nicu	1	-10/+9
	Some of the mport_dma_req structure members were initialized late inside the do_dma_request() function, just before submitting the request to the dma engine. But we have some error branches before that. In case of such an error, the code would return on the error path and trigger the calling of dma_req_free() with a req structure which is not completely initialized. This causes a NULL pointer dereference in dma_req_free(). This patch fixes these error branches by making sure that all necessary mport_dma_req structure members are initialized in rio_dma_transfer() immediately after the request structure gets allocated. Link: http://lkml.kernel.org/r/20180412150605.GA31409@nokia.com Fixes: bbd876adb8c72 ("rapidio: use a reference count for struct mport_dma_req") Signed-off-by: Ioan Nicu <ioan.nicu.ext@nokia.com> Tested-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Acked-by: Alexandre Bounine <alex.bou9@gmail.com> Cc: Barry Wood <barry.wood@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Frank Kunz <frank.kunz@nokia.com> Cc: <stable@vger.kernel.org> [4.6+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-20	mm: enable thp migration for shmem thp	Naoya Horiguchi	3	-7/+20
	My testing for the latest kernel supporting thp migration showed an infinite loop in offlining the memory block that is filled with shmem thps. We can get out of the loop with a signal, but kernel should return with failure in this case. What happens in the loop is that scan_movable_pages() repeats returning the same pfn without any progress. That's because page migration always fails for shmem thps. In memory offline code, memory blocks containing unmovable pages should be prevented from being offline targets by has_unmovable_pages() inside start_isolate_page_range(). So it's possible to change migratability for non-anonymous thps to avoid the issue, but it introduces more complex and thp-specific handling in migration code, so it might not good. So this patch is suggesting to fix the issue by enabling thp migration for shmem thp. Both of anon/shmem thp are migratable so we don't need precheck about the type of thps. Link: http://lkml.kernel.org/r/20180406030706.GA2434@hori1.linux.bs1.fc.nec.co.jp Fixes: commit 72b39cfc4d75 ("mm, memory_hotplug: do not fail offlining too early") Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Zi Yan <zi.yan@sent.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-20	writeback: safer lock nesting	Greg Thelen	4	-26/+34
	lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if the page's memcg is undergoing move accounting, which occurs when a process leaves its memcg for a new one that has memory.move_charge_at_immigrate set. unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if the given inode is switching writeback domains. Switches occur when enough writes are issued from a new domain. This existing pattern is thus suspicious: lock_page_memcg(page); unlocked_inode_to_wb_begin(inode, &locked); ... unlocked_inode_to_wb_end(inode, locked); unlock_page_memcg(page); If both inode switch and process memcg migration are both in-flight then unlocked_inode_to_wb_end() will unconditionally enable interrupts while still holding the lock_page_memcg() irq spinlock. This suggests the possibility of deadlock if an interrupt occurs before unlock_page_memcg(). truncate __cancel_dirty_page lock_page_memcg unlocked_inode_to_wb_begin unlocked_inode_to_wb_end <interrupts mistakenly enabled> <interrupt> end_page_writeback test_clear_page_writeback lock_page_memcg <deadlock> unlock_page_memcg Due to configuration limitations this deadlock is not currently possible because we don't mix cgroup writeback (a cgroupv2 feature) and memory.move_charge_at_immigrate (a cgroupv1 feature). If the kernel is hacked to always claim inode switching and memcg moving_account, then this script triggers lockup in less than a minute: cd /mnt/cgroup/memory mkdir a b echo 1 > a/memory.move_charge_at_immigrate echo 1 > b/memory.move_charge_at_immigrate ( echo $BASHPID > a/cgroup.procs while true; do dd if=/dev/zero of=/mnt/big bs=1M count=256 done ) & while true; do sync done & sleep 1h & SLEEP=$! while true; do echo $SLEEP > a/cgroup.procs echo $SLEEP > b/cgroup.procs done The deadlock does not seem possible, so it's debatable if there's any reason to modify the kernel. I suggest we should to prevent future surprises. And Wang Long said "this deadlock occurs three times in our environment", so there's more reason to apply this, even to stable. Stable 4.4 has minor conflicts applying this patch. For a clean 4.4 patch see "[PATCH for-4.4] writeback: safer lock nesting" https://lkml.org/lkml/2018/4/11/146 Wang Long said "this deadlock occurs three times in our environment" [gthelen@google.com: v4] Link: http://lkml.kernel.org/r/20180411084653.254724-1-gthelen@google.com [akpm@linux-foundation.org: comment tweaks, struct initialization simplification] Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613 Link: http://lkml.kernel.org/r/20180410005908.167976-1-gthelen@google.com Fixes: 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates") Signed-off-by: Greg Thelen <gthelen@google.com> Reported-by: Wang Long <wanglong19@meituan.com> Acked-by: Wang Long <wanglong19@meituan.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: <stable@vger.kernel.org> [v4.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-20	mm, pagemap: fix swap offset value for PMD migration entry	Huang Ying	1	-1/+5
	The swap offset reported by /proc/<pid>/pagemap may be not correct for PMD migration entries. If addr passed into pagemap_pmd_range() isn't aligned with PMD start address, the swap offset reported doesn't reflect this. And in the loop to report information of each sub-page, the swap offset isn't increased accordingly as that for PFN. This may happen after opening /proc/<pid>/pagemap and seeking to a page whose address doesn't align with a PMD start address. I have verified this with a simple test program. BTW: migration swap entries have PFN information, do we need to restrict whether to show them? [akpm@linux-foundation.org: fix typo, per Huang, Ying] Link: http://lkml.kernel.org/r/20180408033737.10897-1-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrei Vagin <avagin@openvz.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "Jerome Glisse" <jglisse@redhat.com> Cc: Daniel Colascione <dancol@google.com> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-20	mm: fix do_pages_move status handling	Michal Hocko	1	-0/+3
	Li Wang has reported that LTP move_pages04 test fails with the current tree: LTP move_pages04: TFAIL : move_pages04.c:143: status[1] is EPERM, expected EFAULT The test allocates an array of two pages, one is present while the other is not (resp. backed by zero page) and it expects EFAULT for the second page as the man page suggests. We are reporting EPERM which doesn't make any sense and this is a result of a bug from cf5f16b23ec9 ("mm: unclutter THP migration"). do_pages_move tries to handle as many pages in one batch as possible so we queue all pages with the same node target together and that corresponds to [start, i] range which is then used to update status array. add_page_for_migration will correctly notice the zero (resp. !present) page and returns with EFAULT which gets written to the status. But if this is the last page in the array we do not update start and so the last store_status after the loop will overwrite the range of the last batch with NUMA_NO_NODE (which corresponds to EPERM). Fix this by simply bailing out from the last flush if the pagelist is empty as there is clearly nothing more to do. Link: http://lkml.kernel.org/r/20180418121255.334-1-mhocko@kernel.org Fixes: cf5f16b23ec9 ("mm: unclutter THP migration") Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Li Wang <liwang@redhat.com> Tested-by: Li Wang <liwang@redhat.com> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-20	fork: unconditionally clear stack on fork	Kees Cook	2	-7/+2
	One of the classes of kernel stack content leaks[1] is exposing the contents of prior heap or stack contents when a new process stack is allocated. Normally, those stacks are not zeroed, and the old contents remain in place. In the face of stack content exposure flaws, those contents can leak to userspace. Fixing this will make the kernel no longer vulnerable to these flaws, as the stack will be wiped each time a stack is assigned to a new process. There's not a meaningful change in runtime performance; it almost looks like it provides a benefit. Performing back-to-back kernel builds before: Run times: 157.86 157.09 158.90 160.94 160.80 Mean: 159.12 Std Dev: 1.54 and after: Run times: 159.31 157.34 156.71 158.15 160.81 Mean: 158.46 Std Dev: 1.46 Instead of making this a build or runtime config, Andy Lutomirski recommended this just be enabled by default. [1] A noisy search for many kinds of stack content leaks can be seen here: https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=linux+kernel+stack+leak I did some more with perf and cycle counts on running 100,000 execs of /bin/true. before: Cycles: 218858861551 218853036130 214727610969 227656844122 224980542841 Mean: 221015379122.60 Std Dev: 4662486552.47 after: Cycles: 213868945060 213119275204 211820169456 224426673259 225489986348 Mean: 217745009865.40 Std Dev: 5935559279.99 It continues to look like it's faster, though the deviation is rather wide, but I'm not sure what I could do that would be less noisy. I'm open to ideas! Link: http://lkml.kernel.org/r/20180221021659.GA37073@beast Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Laura Abbott <labbott@redhat.com> Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-20	CIFS: fix typo in cifs_dbg	Aurelien Aptel	1	-1/+1
	Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <smfrench@gmail.com> Reported-by: Long Li <longli@microsoft.com>
2018-04-20	cifs: do not allow creating sockets except with SMB1 posix exensions	Steve French	1	-4/+5
	RHBZ: 1453123 Since at least the 3.10 kernel and likely a lot earlier we have not been able to create unix domain sockets in a cifs share when mounted using the SFU mount option (except when mounted with the cifs unix extensions to Samba e.g.) Trying to create a socket, for example using the af_unix command from xfstests will cause : BUG: unable to handle kernel NULL pointer dereference at 00000000 00000040 Since no one uses or depends on being able to create unix domains sockets on a cifs share the easiest fix to stop this vulnerability is to simply not allow creation of any other special files than char or block devices when sfu is used. Added update to Ronnie's patch to handle a tcon link leak, and to address a buf leak noticed by Gustavo and Colin. Acked-by: Gustavo A. R. Silva <gustavo@embeddedor.com> CC: Colin Ian King <colin.king@canonical.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Cc: stable@vger.kernel.org
2018-04-20	cifs: smbd: Dump SMB packet when configured	Long Li	1	-1/+5
	When sending through SMB Direct, also dump the packet in SMB send path. Also fixed a typo in debug message. Signed-off-by: Long Li <longli@microsoft.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-04-20	btrfs: print-tree: debugging output enhancement	Qu Wenruo	2	-11/+16
	This patch enhances the following things: - tree block header * add generation and owner output for node and leaf - node pointer generation output - allow btrfs_print_tree() to not follow nodes * just like btrfs-progs Please note that, although function btrfs_print_tree() is not called by anyone right now, it's still a pretty useful function to debug kernel. So that function is still kept for later use. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-04-20	btrfs: Fix race condition between delayed refs and blockgroup removal	Nikolay Borisov	3	-10/+26
	When the delayed refs for a head are all run, eventually cleanup_ref_head is called which (in case of deletion) obtains a reference for the relevant btrfs_space_info struct by querying the bg for the range. This is problematic because when the last extent of a bg is deleted a race window emerges between removal of that bg and the subsequent invocation of cleanup_ref_head. This can result in cache being null and either a null pointer dereference or assertion failure. task: ffff8d04d31ed080 task.stack: ffff9e5dc10cc000 RIP: 0010:assfail.constprop.78+0x18/0x1a [btrfs] RSP: 0018:ffff9e5dc10cfbe8 EFLAGS: 00010292 RAX: 0000000000000044 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff8d04ffc1f868 RSI: ffff8d04ffc178c8 RDI: ffff8d04ffc178c8 RBP: ffff8d04d29e5ea0 R08: 00000000000001f0 R09: 0000000000000001 R10: ffff9e5dc0507d58 R11: 0000000000000001 R12: ffff8d04d29e5ea0 R13: ffff8d04d29e5f08 R14: ffff8d04efe29b40 R15: ffff8d04efe203e0 FS: 00007fbf58ead500(0000) GS:ffff8d04ffc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe6c6975648 CR3: 0000000013b2a000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __btrfs_run_delayed_refs+0x10e7/0x12c0 [btrfs] btrfs_run_delayed_refs+0x68/0x250 [btrfs] btrfs_should_end_transaction+0x42/0x60 [btrfs] btrfs_truncate_inode_items+0xaac/0xfc0 [btrfs] btrfs_evict_inode+0x4c6/0x5c0 [btrfs] evict+0xc6/0x190 do_unlinkat+0x19c/0x300 do_syscall_64+0x74/0x140 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x7fbf589c57a7 To fix this, introduce a new flag "is_system" to head_ref structs, which is populated at insertion time. This allows to decouple the querying for the spaceinfo from querying the possibly deleted bg. Fixes: d7eae3403f46 ("Btrfs: rework delayed ref total_bytes_pinned accounting") CC: stable@vger.kernel.org # 4.14+ Suggested-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-04-20	vfs: Undo an overly zealous MS_RDONLY -> SB_RDONLY conversion	David Howells	1	-1/+1
	In do_mount() when the MS_* flags are being converted to MNT_* flags, MS_RDONLY got accidentally convered to SB_RDONLY. Undo this change. Fixes: e462ec50cb5f ("VFS: Differentiate mount flags (MS_*) from internal superblock flags") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-20	afs: Fix server record deletion	David Howells	1	-1/+8
	AFS server records get removed from the net->fs_servers tree when they're deleted, but not from the net->fs_addresses{4,6} lists, which can lead to an oops in afs_find_server() when a server record has been removed, for instance during rmmod. Fix this by deleting the record from the by-address lists before posting it for RCU destruction. The reason this hasn't been noticed before is that the fileserver keeps probing the local cache manager, thereby keeping the service record alive, so the oops would only happen when a fileserver eventually gets bored and stops pinging or if the module gets rmmod'd and a call comes in from the fileserver during the window between the server records being destroyed and the socket being closed. The oops looks something like: BUG: unable to handle kernel NULL pointer dereference at 000000000000001c ... Workqueue: kafsd afs_process_async_call [kafs] RIP: 0010:afs_find_server+0x271/0x36f [kafs] ... Call Trace: afs_deliver_cb_init_call_back_state3+0x1f2/0x21f [kafs] afs_deliver_to_call+0x1ee/0x5e8 [kafs] afs_process_async_call+0x5b/0xd0 [kafs] process_one_work+0x2c2/0x504 worker_thread+0x1d4/0x2ac kthread+0x11f/0x127 ret_from_fork+0x24/0x30 Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-20	perf/x86/intel/uncore: Fix SBOX support for Broadwell CPUs	Oskar Senft	1	-1/+17
	SBOX on some Broadwell CPUs is broken because it's enabled unconditionally despite the fact that there are no SBOXes available. Check the Power Control Unit CAPID4 register to determine the number of available SBOXes on the particular CPU before trying to enable them. If there are none, nullify the SBOX descriptor so it isn't tried to be initialized. Signed-off-by: Oskar Senft <osk@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Mark van Dijk <mark@voidzero.net> Reviewed-by: Kan Liang <kan.liang@intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: ak@linux.intel.com Cc: peterz@infradead.org Cc: eranian@google.com Link: https://lkml.kernel.org/r/1521810690-2576-2-git-send-email-kan.liang@linux.intel.com
2018-04-20	perf/x86/intel/uncore: Revert "Remove SBOX support for Broadwell server"	Stephane Eranian	1	-0/+21
	This reverts commit 3b94a891667c ("perf/x86/intel/uncore: Remove SBOX support for Broadwell server") Revert because there exists a proper workaround for Broadwell-EP servers without SBOX now. Note that BDX-DE does not have a SBOX. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kan Liang <kan.liang@intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: ak@linux.intel.com Cc: osk@google.com Cc: mark@voidzero.net Link: https://lkml.kernel.org/r/1521810690-2576-1-git-send-email-kan.liang@linux.intel.com