linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2017-05-25	serial: imx: ensure UCR3 and UFCR are setup correctly	Uwe Kleine-König	1	-2/+12
	Commit e61c38d85b73 ("serial: imx: setup DCEDTE early and ensure DCD and RI irqs to be off") has a flaw: While UCR3 and UFCR were modified using read-modify-write before it switched to write register values independent of the previous state. That's a good idea in principle (and that's why I did it) but needs more care. This patch reinstates read-modify-write for UFCR and for UCR3 ensures that RXDMUXSEL and ADNIMP are set for post imx1. Fixes: e61c38d85b73 ("serial: imx: setup DCEDTE early and ensure DCD and RI irqs to be off") Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Mika Penttilä <mika.penttila@nextfour.com> Tested-by: Mika Penttilä <mika.penttila@nextfour.com> Acked-by: Steve Twiss <stwiss.opensource@diasemi.com> Tested-by: Steve Twiss <stwiss.opensource@diasemi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-24	MAINTAINERS/serial: Change maintainer of jsm driver	Guilherme G. Piccoli	1	-1/+1
	Gabriel will no longer maintain this driver, so I'm adding myself as maintainer. Thanks for all your work on jsm driver Gabriel. Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Acked-by: Gabriel Krisman Bertazi <gabriel@krisman.be> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-18	serial: enable serdev support	Johan Hovold	1	-2/+2
	Enable serdev support by using the new device-registration helpers. Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-18	tty/serdev: add serdev registration interface	Johan Hovold	4	-4/+93
	Add a new interface for registering a serdev controller and clients, and a helper function to deregister serdev devices (or a tty device) that were previously registered using the new interface. Once every driver currently using the tty_port_register_device() helpers have been vetted and converted to use the new serdev registration interface (at least for deregistration), we can move serdev registration to the current helpers and get rid of the serdev-specific functions. Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-18	serdev: Restore serdev_device_write_buf for atomic context	Stefan Wahren	2	-7/+19
	Starting with commit 6fe729c4bdae ("serdev: Add serdev_device_write subroutine") the function serdev_device_write_buf cannot be used in atomic context anymore (mutex_lock is sleeping). So restore the old behavior. Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Fixes: 6fe729c4bdae ("serdev: Add serdev_device_write subroutine") Acked-by: Rob Herring <robh@kernel.org> Reviewed-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-18	serial: core: fix crash in uart_suspend_port	Lucas Stach	1	-1/+1
	With serdev we might end up with serial ports that have no cdev exported to userspace, as they are used as the bus interface to other devices. In that case serial_match_port() won't be able to find a matching tty_dev. Skip the irq wakeup enabling in that case, as serdev will make sure to keep the port active, as long as there are devices depending on it. Fixes: 8ee3fde04758 (tty_port: register tty ports with serdev bus) Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-18	tty: fix port buffer locking	Vegard Nossum	1	-0/+2
	tty_insert_flip_string_fixed_flag() is racy against itself when called from the ioctl(TCXONC, TCION/TCIOFF) path [1] and the flush_to_ldisc() workqueue path [2]. The problem is that port->buf.tail->used is modified without consistent locking; the ioctl path takes tty->atomic_write_lock, whereas the workqueue path takes ldata->output_lock. We cannot simply take ldata->output_lock, since that is specific to the N_TTY line discipline. It might seem natural to try to take port->buf.lock inside tty_insert_flip_string_fixed_flag() and friends (where port->buf is actually used/modified), but this creates problems for flush_to_ldisc() which takes it before grabbing tty->ldisc_sem, o_tty->termios_rwsem, and ldata->output_lock. Therefore, the simplest solution for now seems to be to take tty->atomic_write_lock inside tty_port_default_receive_buf(). This lock is also used in the write path [3] with a consistent ordering. [1]: Call Trace: tty_insert_flip_string_fixed_flag pty_write tty_send_xchar // down_read(&o_tty->termios_rwsem) // mutex_lock(&tty->atomic_write_lock) n_tty_ioctl_helper n_tty_ioctl tty_ioctl // down_read(&tty->ldisc_sem) do_vfs_ioctl SyS_ioctl [2]: Workqueue: events_unbound flush_to_ldisc Call Trace: tty_insert_flip_string_fixed_flag pty_write tty_put_char __process_echoes commit_echoes // mutex_lock(&ldata->output_lock) n_tty_receive_buf_common n_tty_receive_buf2 tty_ldisc_receive_buf // down_read(&o_tty->termios_rwsem) tty_port_default_receive_buf // down_read(&tty->ldisc_sem) flush_to_ldisc // mutex_lock(&port->buf.lock) process_one_work [3]: Call Trace: tty_insert_flip_string_fixed_flag pty_write n_tty_write // mutex_lock(&ldata->output_lock) // down_read(&tty->termios_rwsem) do_tty_write (inline) // mutex_lock(&tty->atomic_write_lock) tty_write // down_read(&tty->ldisc_sem) __vfs_write vfs_write SyS_write The bug can result in about a dozen different crashes depending on what exactly gets corrupted when port->buf.tail->used points outside the buffer. The patch passes my LOCKDEP/PROVE_LOCKING testing but more testing is always welcome. Found using syzkaller. Cc: <stable@vger.kernel.org> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-18	tty: ehv_bytechan: clean up init error handling	Johan Hovold	1	-9/+8
	Straighten out the initcall error handling to avoid deregistering a never-registered tty driver (something which would lead to a NULL-pointer dereference) in the most unlikely event that driver registration fails (e.g. we've run out of major numbers). Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-18	serial: ifx6x60: fix use-after-free on module unload	Johan Hovold	1	-1/+1
	Make sure to deregister the SPI driver before releasing the tty driver to avoid use-after-free in the SPI remove callback where the tty devices are deregistered. Fixes: 72d4724ea54c ("serial: ifx6x60: Add modem power off function in the platform reboot process") Cc: stable <stable@vger.kernel.org> # 3.8 Cc: Jun Chen <jun.d.chen@intel.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-18	serial: altera_jtaguart: adding iounmap()	Alexey Khoroshilov	1	-0/+1
	The driver does ioremap(port->mapbase, ALTERA_JTAGUART_SIZE), but there is no any iounmap(). Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Acked-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-18	serial: exar: Fix stuck MSIs	Jan Kiszka	1	-9/+10
	After migrating 8250_exar to MSI in 172c33cb61da, we can get stuck without further interrupts because of the special wake-up event these chips send. They are only cleared by reading INT0. As we fail to do so during startup and shutdown, we can leave the interrupt line asserted, which is fatal with edge-triggered MSIs. Add the required reading of INT0 to startup and shutdown. Also account for the fact that a pending wake-up interrupt means we have to return 1 from exar_handle_irq. Drop the unneeded reading of INT1..3 along with this - those never reset anything. An alternative approach would have been disabling the wake-up interrupt. Unfortunately, this feature (REGB[17] = 1) is not available on the XR17D15X. Fixes: 172c33cb61da ("serial: exar: Enable MSI support") Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-18	serial: efm32: Fix parity management in 'efm32_uart_console_get_options()'	Christophe JAILLET	1	-3/+8
	UARTn_FRAME_PARITY_ODD is 0x0300 UARTn_FRAME_PARITY_EVEN is 0x0200 So if the UART is configured for EVEN parity, it would be reported as ODD. Fix it by correctly testing if the 2 bits are set. Fixes: 3afbd89c9639 ("serial/efm32: add new driver") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-18	serdev: fix tty-port client deregistration	Johan Hovold	1	-5/+10
	The port client data must be set when registering the serdev controller or client deregistration will fail (and the serdev devices are left registered and allocated) if the port was never opened in between. Make sure to clear the port client data on any probe errors to avoid a use-after-free when the client is later deregistered unconditionally (e.g. in a tty-port deregistration helper). Also move port client operation initialisation to registration. Note that the client ops must be restored on failed probe. Fixes: bed35c6dfa6a ("serdev: add a tty port controller driver") Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: Rob Herring <robh@kernel.org> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-18	Revert "tty_port: register tty ports with serdev bus"	Johan Hovold	1	-12/+0
	This reverts commit 8ee3fde047589dc9c201251f07d0ca1dc776feca. The new serdev bus hooked into the tty layer in tty_port_register_device() by registering a serdev controller instead of a tty device whenever a serdev client is present, and by deregistering the controller in the tty-port destructor. This is broken in several ways: Firstly, it leads to a NULL-pointer dereference whenever a tty driver later deregisters its devices as no corresponding character device will exist. Secondly, far from every tty driver uses tty-port refcounting (e.g. serial core) so the serdev devices might never be deregistered or deallocated. Thirdly, deregistering at tty-port destruction is too late as the underlying device and structures may be long gone by then. A port is not released before an open tty device is closed, something which a registered serdev client can prevent from ever happening. A driver callback while the device is gone typically also leads to crashes. Many tty drivers even keep their ports around until the driver is unloaded (e.g. serial core), something which even if a late callback never happens, leads to leaks if a device is unbound from its driver and is later rebound. The right solution here is to add a new tty_port_unregister_device() helper and to never call tty_device_unregister() whenever the port has been claimed by serdev, but since this requires modifying just about every tty driver (and multiple subsystems) it will need to be done incrementally. Reverting the offending patch is the first step in fixing the broken lifetime assumptions. A follow-up patch will add a new pair of tty-device registration helpers, which a vetted tty driver can use to support serdev (initially serial core). When every tty driver uses the serdev helpers (at least for deregistration), we can add serdev registration to tty_port_register_device() again. Note that this also fixes another issue with serdev, which currently allocates and registers a serdev controller for every tty device registered using tty_port_device_register() only to immediately deregister and deallocate it when the corresponding OF node or serdev child node is missing. This should be addressed before enabling serdev for hot-pluggable buses. Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: Rob Herring <robh@kernel.org> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-18	drivers/tty: 8250: only call fintek_8250_probe when doing port I/O	Ard Biesheuvel	1	-1/+1
	Commit fa01e2ca9f53 ("serial: 8250: Integrate Fintek into 8250_base") modified the probing logic for PNP0501 devices, to remove a collision between the generic 16550A driver and the Fintek driver, which reused the same ACPI _HID. The Fintek device probe is now incorporated into the common 8250 probe path, and gets called for all discovered 16550A compatible devices, including ones that are MMIO mapped rather than IO mapped. However, the Fintek driver assumes the port base is a I/O address, and proceeds to probe some arbitrary offsets above it. This is generally a wrong thing to do, but on ARM systems (having no native port I/O), this may result in faulting accesses of completely unrelated MMIO regions in the PCI I/O space. Given that this is at serial probe time, this results in hard to diagnose crashes at boot. So let's restrict the Fintek probe to devices that we know are using port I/O in the first place. Fixes: fa01e2ca9f53 ("serial: 8250: Integrate Fintek into 8250_base") Suggested-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ricardo Ribalda <ricardo.ribalda@gmail.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-13	Linux 4.12-rc1	Linus Torvalds	1	-2/+2

2017-05-12	mm, docs: update memory.stat description with workingset* entries	Roman Gushchin	1	-0/+12
	Commit 4b4cea91691d ("mm: vmscan: fix IO/refault regression in cache workingset transition") introduced three new entries in memory stat file: - workingset_refault - workingset_activate - workingset_nodereclaim This commit adds a corresponding description to the cgroup v2 docs. Link: http://lkml.kernel.org/r/1494530293-31236-1-git-send-email-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-12	mm: vmscan: scan until it finds eligible pages	Minchan Kim	1	-6/+15
	Although there are a ton of free swap and anonymous LRU page in elgible zones, OOM happened. balloon invoked oom-killer: gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT\|__GFP_ZERO\|__GFP_NOTRACK), nodemask=(null), order=0, oom_score_adj=0 CPU: 7 PID: 1138 Comm: balloon Not tainted 4.11.0-rc6-mm1-zram-00289-ge228d67e9677-dirty #17 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 Call Trace: oom_kill_process+0x21d/0x3f0 out_of_memory+0xd8/0x390 __alloc_pages_slowpath+0xbc1/0xc50 __alloc_pages_nodemask+0x1a5/0x1c0 pte_alloc_one+0x20/0x50 __pte_alloc+0x1e/0x110 __handle_mm_fault+0x919/0x960 handle_mm_fault+0x77/0x120 __do_page_fault+0x27a/0x550 trace_do_page_fault+0x43/0x150 do_async_page_fault+0x2c/0x90 async_page_fault+0x28/0x30 Mem-Info: active_anon:424716 inactive_anon:65314 isolated_anon:0 active_file:52 inactive_file:46 isolated_file:0 unevictable:0 dirty:27 writeback:0 unstable:0 slab_reclaimable:3967 slab_unreclaimable:4125 mapped:133 shmem:43 pagetables:1674 bounce:0 free:4637 free_pcp:225 free_cma:0 Node 0 active_anon:1698864kB inactive_anon:261256kB active_file:208kB inactive_file:184kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:532kB dirty:108kB writeback:0kB shmem:172kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no DMA free:7316kB min:32kB low:44kB high:56kB active_anon:8064kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:464kB slab_unreclaimable:40kB kernel_stack:0kB pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB lowmem_reserve[]: 0 992 992 1952 DMA32 free:9088kB min:2048kB low:3064kB high:4080kB active_anon:952176kB inactive_anon:0kB active_file:36kB inactive_file:0kB unevictable:0kB writepending:88kB present:1032192kB managed:1019388kB mlocked:0kB slab_reclaimable:13532kB slab_unreclaimable:16460kB kernel_stack:3552kB pagetables:6672kB bounce:0kB free_pcp:56kB local_pcp:24kB free_cma:0kB lowmem_reserve[]: 0 0 0 959 Movable free:3644kB min:1980kB low:2960kB high:3940kB active_anon:738560kB inactive_anon:261340kB active_file:188kB inactive_file:640kB unevictable:0kB writepending:20kB present:1048444kB managed:1010816kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:832kB local_pcp:60kB free_cma:0kB lowmem_reserve[]: 0 0 0 0 DMA: 14kB (E) 08kB 1816kB (E) 1032kB (E) 1064kB (E) 9128kB (ME) 8256kB (E) 2512kB (E) 21024kB (E) 02048kB 04096kB = 7524kB DMA32: 4174kB (UMEH) 1818kB (UMEH) 6816kB (UMEH) 4832kB (UMEH) 1464kB (MH) 3128kB (M) 1256kB (H) 1512kB (M) 21024kB (M) 02048kB 04096kB = 9836kB Movable: 14kB (M) 18kB (M) 116kB (M) 132kB (M) 064kB 1128kB (M) 2256kB (M) 4512kB (M) 11024kB (M) 02048kB 0*4096kB = 3772kB 378 total pagecache pages 17 pages in swap cache Swap cache stats: add 17325, delete 17302, find 0/27 Free swap = 978940kB Total swap = 1048572kB 524157 pages RAM 0 pages HighMem/MovableOnly 12629 pages reserved 0 pages cma reserved 0 pages hwpoisoned [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name [ 433] 0 433 4904 5 14 3 82 0 upstart-udev-br [ 438] 0 438 12371 5 27 3 191 -1000 systemd-udevd With investigation, skipping page of isolate_lru_pages makes reclaim void because it returns zero nr_taken easily so LRU shrinking is effectively nothing and just increases priority aggressively. Finally, OOM happens. The problem is that get_scan_count determines nr_to_scan with eligible zones so although priority drops to zero, it couldn't reclaim any pages if the LRU contains mostly ineligible pages. get_scan_count: size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx); size = size >> sc->priority; Assumes sc->priority is 0 and LRU list is as follows. N-N-N-N-H-H-H-H-H-H-H-H-H-H-H-H-H-H-H-H (Ie, small eligible pages are in the head of LRU but others are almost ineligible pages) In that case, size becomes 4 so VM want to scan 4 pages but 4 pages from tail of the LRU are not eligible pages. If get_scan_count counts skipped pages, it doesn't reclaim any pages remained after scanning 4 pages so it ends up OOM happening. This patch makes isolate_lru_pages try to scan pages until it encounters eligible zones's pages. [akpm@linux-foundation.org: clean up mind-bending `for' statement. Tweak comment text] Fixes: 3db65812d688 ("Revert "mm, vmscan: account for skipped pages as a partial scan"") Link: http://lkml.kernel.org/r/1494457232-27401-1-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-12	mm, thp: copying user pages must schedule on collapse	David Rientjes	1	-4/+3
	We have encountered need_resched warnings in __collapse_huge_page_copy() while doing {clear,copy}_user_highpage() over HPAGE_PMD_NR source pages. mm->mmap_sem is held for write, but the iteration is well bounded. Reschedule as needed. Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1705101426380.109808@chino.kir.corp.google.com Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-12	dax: fix PMD data corruption when fault races with write	Ross Zwisler	1	-14/+14
	This is based on a patch from Jan Kara that fixed the equivalent race in the DAX PTE fault path. Currently DAX PMD read fault can race with write(2) in the following way: CPU1 - write(2) CPU2 - read fault dax_iomap_pmd_fault() ->iomap_begin() - sees hole dax_iomap_rw() iomap_apply() ->iomap_begin - allocates blocks dax_iomap_actor() invalidate_inode_pages2_range() - there's nothing to invalidate grab_mapping_entry() - we add huge zero page to the radix tree and map it to page tables The result is that hole page is mapped into page tables (and thus zeros are seen in mmap) while file has data written in that place. Fix the problem by locking exception entry before mapping blocks for the fault. That way we are sure invalidate_inode_pages2_range() call for racing write will either block on entry lock waiting for the fault to finish (and unmap stale page tables after that) or read fault will see already allocated blocks by write(2). Fixes: 9f141d6ef6258 ("dax: Call ->iomap_begin without entry lock during dax fault") Link: http://lkml.kernel.org/r/20170510172700.18991-1-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-12	dax: fix data corruption when fault races with write	Jan Kara	1	-16/+16
	Currently DAX read fault can race with write(2) in the following way: CPU1 - write(2) CPU2 - read fault dax_iomap_pte_fault() ->iomap_begin() - sees hole dax_iomap_rw() iomap_apply() ->iomap_begin - allocates blocks dax_iomap_actor() invalidate_inode_pages2_range() - there's nothing to invalidate grab_mapping_entry() - we add zero page in the radix tree and map it to page tables The result is that hole page is mapped into page tables (and thus zeros are seen in mmap) while file has data written in that place. Fix the problem by locking exception entry before mapping blocks for the fault. That way we are sure invalidate_inode_pages2_range() call for racing write will either block on entry lock waiting for the fault to finish (and unmap stale page tables after that) or read fault will see already allocated blocks by write(2). Fixes: 9f141d6ef6258a3a37a045842d9ba7e68f368956 Link: http://lkml.kernel.org/r/20170510085419.27601-5-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-12	ext4: return to starting transaction in ext4_dax_huge_fault()	Jan Kara	1	-4/+17
	DAX will return to locking exceptional entry before mapping blocks for a page fault to fix possible races with concurrent writes. To avoid lock inversion between exceptional entry lock and transaction start, start the transaction already in ext4_dax_huge_fault(). Fixes: 9f141d6ef6258a3a37a045842d9ba7e68f368956 Link: http://lkml.kernel.org/r/20170510085419.27601-4-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-12	mm: fix data corruption due to stale mmap reads	Jan Kara	2	-2/+12
	Currently, we didn't invalidate page tables during invalidate_inode_pages2() for DAX. That could result in e.g. 2MiB zero page being mapped into page tables while there were already underlying blocks allocated and thus data seen through mmap were different from data seen by read(2). The following sequence reproduces the problem: - open an mmap over a 2MiB hole - read from a 2MiB hole, faulting in a 2MiB zero page - write to the hole with write(3p). The write succeeds but we incorrectly leave the 2MiB zero page mapping intact. - via the mmap, read the data that was just written. Since the zero page mapping is still intact we read back zeroes instead of the new data. Fix the problem by unconditionally calling invalidate_inode_pages2_range() in dax_iomap_actor() for new block allocations and by properly invalidating page tables in invalidate_inode_pages2_range() for DAX mappings. Fixes: c6dcf52c23d2d3fb5235cec42d7dd3f786b87d55 Link: http://lkml.kernel.org/r/20170510085419.27601-3-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-12	dax: prevent invalidation of mapped DAX entries	Ross Zwisler	3	-36/+3
	Patch series "mm,dax: Fix data corruption due to mmap inconsistency", v4. This series fixes data corruption that can happen for DAX mounts when page faults race with write(2) and as a result page tables get out of sync with block mappings in the filesystem and thus data seen through mmap is different from data seen through read(2). The series passes testing with t_mmap_stale test program from Ross and also other mmap related tests on DAX filesystem. This patch (of 4): dax_invalidate_mapping_entry() currently removes DAX exceptional entries only if they are clean and unlocked. This is done via: invalidate_mapping_pages() invalidate_exceptional_entry() dax_invalidate_mapping_entry() However, for page cache pages removed in invalidate_mapping_pages() there is an additional criteria which is that the page must not be mapped. This is noted in the comments above invalidate_mapping_pages() and is checked in invalidate_inode_page(). For DAX entries this means that we can can end up in a situation where a DAX exceptional entry, either a huge zero page or a regular DAX entry, could end up mapped but without an associated radix tree entry. This is inconsistent with the rest of the DAX code and with what happens in the page cache case. We aren't able to unmap the DAX exceptional entry because according to its comments invalidate_mapping_pages() isn't allowed to block, and unmap_mapping_range() takes a write lock on the mapping->i_mmap_rwsem. Since we essentially never have unmapped DAX entries to evict from the radix tree, just remove dax_invalidate_mapping_entry(). Fixes: c6dcf52c23d2 ("mm: Invalidate DAX radix tree entries only if appropriate") Link: http://lkml.kernel.org/r/20170510085419.27601-2-jack@suse.cz Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> [4.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-12	Tigran has moved	Andrew Morton	6	-7/+7
	Cc: Tigran Aivazian <aivazian.tigran@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-12	mm, vmalloc: fix vmalloc users tracking properly	Michal Hocko	3	-17/+26
	Commit 1f5307b1e094 ("mm, vmalloc: properly track vmalloc users") has pulled asm/pgtable.h include dependency to linux/vmalloc.h and that turned out to be a bad idea for some architectures. E.g. m68k fails with In file included from arch/m68k/include/asm/pgtable_mm.h:145:0, from arch/m68k/include/asm/pgtable.h:4, from include/linux/vmalloc.h:9, from arch/m68k/kernel/module.c:9: arch/m68k/include/asm/mcf_pgtable.h: In function 'nocache_page': >> arch/m68k/include/asm/mcf_pgtable.h:339:43: error: 'init_mm' undeclared (first use in this function) #define pgd_offset_k(address) pgd_offset(&init_mm, address) as spotted by kernel build bot. nios2 fails for other reason In file included from include/asm-generic/io.h:767:0, from arch/nios2/include/asm/io.h:61, from include/linux/io.h:25, from arch/nios2/include/asm/pgtable.h:18, from include/linux/mm.h:70, from include/linux/pid_namespace.h:6, from include/linux/ptrace.h:9, from arch/nios2/include/uapi/asm/elf.h:23, from arch/nios2/include/asm/elf.h:22, from include/linux/elf.h:4, from include/linux/module.h:15, from init/main.c:16: include/linux/vmalloc.h: In function '__vmalloc_node_flags': include/linux/vmalloc.h:99:40: error: 'PAGE_KERNEL' undeclared (first use in this function); did you mean 'GFP_KERNEL'? which is due to the newly added #include <asm/pgtable.h>, which on nios2 includes <linux/io.h> and thus <asm/io.h> and <asm-generic/io.h> which again includes <linux/vmalloc.h>. Tweaking that around just turns out a bigger headache than necessary. This patch reverts 1f5307b1e094 and reimplements the original fix in a different way. __vmalloc_node_flags can stay static inline which will cover vmalloc* functions. We only have one external user (kvmalloc_node) and we can export __vmalloc_node_flags_caller and provide the caller directly. This is much simpler and it doesn't really need any games with header files. [akpm@linux-foundation.org: coding-style fixes] [mhocko@kernel.org: revert old comment] Link: http://lkml.kernel.org/r/20170509211054.GB16325@dhcp22.suse.cz Fixes: 1f5307b1e094 ("mm, vmalloc: properly track vmalloc users") Link: http://lkml.kernel.org/r/20170509153702.GR6481@dhcp22.suse.cz Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Tobias Klauser <tklauser@distanz.ch> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-12	mm/khugepaged: add missed tracepoint for collapse_huge_page_swapin	SeongJae Park	1	-1/+3
	One return case of `__collapse_huge_page_swapin()` does not invoke tracepoint while every other return case does. This commit adds a tracepoint invocation for the case. Link: http://lkml.kernel.org/r/20170507101813.30187-1-sj38.park@gmail.com Signed-off-by: SeongJae Park <sj38.park@gmail.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-12	gcov: support GCC 7.1	Martin Liska	2	-1/+9
	Starting from GCC 7.1, __gcov_exit is a new symbol expected to be implemented in a profiling runtime. [akpm@linux-foundation.org: coding-style fixes] [mliska@suse.cz: v2] Link: http://lkml.kernel.org/r/e63a3c59-0149-c97e-4084-20ca8f146b26@suse.cz Link: http://lkml.kernel.org/r/8c4084fa-3885-29fe-5fc4-0d4ca199c785@suse.cz Signed-off-by: Martin Liska <mliska@suse.cz> Acked-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-12	mm, vmstat: Remove spurious WARN() during zoneinfo print	Reza Arbab	1	-2/+0
	After commit e2ecc8a79ed4 ("mm, vmstat: print non-populated zones in zoneinfo"), /proc/zoneinfo will show unpopulated zones. A memoryless node, having no populated zones at all, was previously ignored, but will now trigger the WARN() in is_zone_first_populated(). Remove this warning, as its only purpose was to warn of a situation that has since been enabled. Aside: The "per-node stats" are still printed under the first populated zone, but that's not necessarily the first stanza any more. I'm not sure which criteria is more important with regard to not breaking parsers, but it looks a little weird to the eye. Fixes: e2ecc8a79ed4 ("mm, vmstat: print node-based stats in zoneinfo file") Link: http://lkml.kernel.org/r/1493854905-10918-1-git-send-email-arbab@linux.vnet.ibm.com Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-12	time: delete current_fs_time()	Deepa Dinamani	2	-15/+0
	All uses of the current_fs_time() function have been replaced by other time interfaces. And, its use cases can be fulfilled by current_time() or ktime_get_* variants. Link: http://lkml.kernel.org/r/1491613030-11599-13-git-send-email-deepa.kernel@gmail.com Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-12	hwpoison, memcg: forcibly uncharge LRU pages	Michal Hocko	2	-1/+8
	Laurent Dufour has noticed that hwpoinsoned pages are kept charged. In his particular case he has hit a bad_page("page still charged to cgroup") when onlining a hwpoison page. While this looks like something that shouldn't happen in the first place because onlining hwpages and returning them to the page allocator makes only little sense it shows a real problem. hwpoison pages do not get freed usually so we do not uncharge them (at least not since commit 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API")). Each charge pins memcg (since e8ea14cc6ead ("mm: memcontrol: take a css reference for each charged page")) as well and so the mem_cgroup and the associated state will never go away. Fix this leak by forcibly uncharging a LRU hwpoisoned page in delete_from_lru_cache(). We also have to tweak uncharge_list because it cannot rely on zero ref count for these pages. [akpm@linux-foundation.org: coding-style fixes] Fixes: 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API") Link: http://lkml.kernel.org/r/20170502185507.GB19165@dhcp22.suse.cz Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-12	sound: Disable the build of OSS drivers	Takashi Iwai	1	-0/+1
	OSS drivers are left as badly unmaintained, and now we're facing a problem to clean up the hackish set_fs() usage in their codes. Since most of drivers have been covered by ALSA, and the others are dead old and inactive, let's leave them RIP. This patch is the first step: disable the build of OSS drivers. We'll eventually drop the whole codes and clean up later. Note that sound/oss/dmasound is still kept, since it's a completely different implementation of OSS, and it doesn't suffer from set_fs() hack. Moreover, the build of ALSA is disabled on M68K by some reason, thus disabling it shall result in a regression. This one will be disabled / removed once when we add the support in ALSA side. Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-05-12	drm/i915: Make vblank evade warnings optional	Ville Syrjälä	2	-2/+18
	Add a new Kconfig option to enable/disable the extra warnings from the vblank evade code. For now we'll keep the warning about an actually missed vblank always enabled as that can have an actual user visible impact. But if we miss the deadline othrwise there's no real need to bother the user with that. We'll want these warnings enabled during development however so that we can catch regressions. Based on the reports it looks like this is still very easy to hit on SKL, so we have more work ahead of us to optimize the crtiical section further. Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reported-by: Jens Axboe <axboe@kernel.dk> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Fixes: e1edbd44e23b ("drm/i915: Complain if we take too long under vblank evasion.") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-11	Input: cros_ec_keyb - remove extraneous 'const'	Arnd Bergmann	1	-1/+1
	gcc-7 warns about 'const SIMPLE_DEV_PM_OPS', as that macro already contains a 'const' keyword: drivers/input/keyboard/cros_ec_keyb.c:663:14: error: duplicate 'const' declaration specifier [-Werror=duplicate-decl-specifier] static const SIMPLE_DEV_PM_OPS(cros_ec_keyb_pm_ops, NULL, cros_ec_keyb_resume); This removes the extra one. Fixes: 6af6dc2d2aa6 ("input: Add ChromeOS EC keyboard driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2017-05-12	drm/nouveau/therm: remove ineffective workarounds for alarm bugs	Ben Skeggs	4	-4/+4
	These were ineffective due to touching the list without the alarm lock, but should no longer be required. Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Cc: stable@vger.kernel.org
2017-05-12	drm/nouveau/tmr: avoid processing completed alarms when adding a new one	Ben Skeggs	1	-3/+13
	The idea here was to avoid having to "manually" program the HW if there's a new earliest alarm. This was lazy and bad, as it leads to loads of fun races between inter-related callers (ie. therm). Turns out, it's not so difficult after all. Go figure ;) Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Cc: stable@vger.kernel.org
2017-05-12	drm/nouveau/tmr: fix corruption of the pending list when rescheduling an alarm	Ben Skeggs	1	-7/+10
	At least therm/fantog "attempts" to work around this issue, which could lead to corruption of the pending alarm list. Fix it properly by not updating the timestamp without the lock held, or trying to add an already pending alarm to the pending alarm list.... Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Cc: stable@vger.kernel.org
2017-05-12	drm/nouveau/tmr: handle races with hw when updating the next alarm time	Ben Skeggs	1	-10/+16
	If the time to the next alarm is short enough, we could race with HW and end up with an ~4 second delay until it triggers. Fix this by checking again after we update HW. Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Cc: stable@vger.kernel.org
2017-05-12	drm/nouveau/tmr: ack interrupt before processing alarms	Ben Skeggs	1	-1/+1
	Fixes a race where we can miss an alarm that triggers while we're already processing previous alarms. Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Cc: stable@vger.kernel.org
2017-05-12	drm/nouveau/core: fix static checker warning	Ben Skeggs	1	-1/+1
	object->engine cannot be NULL, it's either valid, or an error pointer. This particular condition shouldn't actually be possible, but just in case, we'll keep it. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2017-05-12	drm/nouveau/fb/ram/gf100-: remove 0x10f200 read	Ben Skeggs	1	-1/+0
	This reg has moved on Pascal, and causes a bus fault. We never use the value anyway, so just remove the read. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2017-05-12	drm/nouveau/kms/nv50: skip core channel cursor update on position-only changes	Ben Skeggs	1	-3/+7
	The DRM core used to only call prepare_fb/cleanup_fb() when a plane's framebuffer changed, which achieved the desired effect. It's apparently now up to the driver to decide on its own. Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Cc: stable@vger.kernel.org [4.11+]
2017-05-12	drm/nouveau/kms/nv50: fix source-rect-only plane updates	Ben Skeggs	1	-5/+3
	This "optimisation" (which was originally meant to skip updating cursor settings in the core channel on position-only updates) turned out to be pointless in the final design of the code before it was merged. Remove it completely, as it breaks other cases. Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Cc: stable@vger.kernel.org [4.10+]
2017-05-12	drm/nouveau/kms/nv50: remove pointless argument to window atomic_check_acquire()	Ben Skeggs	1	-7/+6
	Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2017-05-11	doc: replace FTP URL to kernel.org with HTTPS one	Michael Heimpold	1	-1/+1
	FTP services were shutdown some weeks ago, so the FTP URL does not work anymore. Fix this by replacing it with corresponding HTTPS URL. Signed-off-by: Michael Heimpold <michael.heimpold@i2se.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2017-05-11	block: handle partial completions for special payload requests	Christoph Hellwig	1	-12/+12
	SCSI devices can return short writes on Write Same just like for normal writes, so we need to handle this case for our special payload requests as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Tested-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-05-11	xen: adjust early dom0 p2m handling to xen hypervisor behavior	Juergen Gross	1	-3/+4
	When booted as pv-guest the p2m list presented by the Xen is already mapped to virtual addresses. In dom0 case the hypervisor might make use of 2M- or 1G-pages for this mapping. Unfortunately while being properly aligned in virtual and machine address space, those pages might not be aligned properly in guest physical address space. So when trying to obtain the guest physical address of such a page pud_pfn() and pmd_pfn() must be avoided as those will mask away guest physical address bits not being zero in this special case. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-11	x86/amd: don't set X86_BUG_SYSRET_SS_ATTRS when running under Xen	Juergen Gross	2	-3/+3
	When running as Xen pv guest X86_BUG_SYSRET_SS_ATTRS must not be set on AMD cpus. This bug/feature bit is kind of special as it will be used very early when switching threads. Setting the bit and clearing it a little bit later leaves a critical window where things can go wrong. This time window has enlarged a little bit by using setup_clear_cpu_cap() instead of the hypervisor's set_cpu_features callback. It seems this larger window now makes it rather easy to hit the problem. The proper solution is to never set the bit in case of Xen. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-11	arm64: Silence first allocation with CONFIG_ARM64_MODULE_PLTS=y	Florian Fainelli	1	-1/+6
	When CONFIG_ARM64_MODULE_PLTS is enabled, the first allocation using the module space fails, because the module is too big, and then the module allocation is attempted from vmalloc space. Silence the first allocation failure in that case by setting __GFP_NOWARN. Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-05-11	ARM: Silence first allocation with CONFIG_ARM_MODULE_PLTS=y	Florian Fainelli	1	-2/+9
	When CONFIG_ARM_MODULE_PLTS is enabled, the first allocation using the module space fails, because the module is too big, and then the module allocation is attempted from vmalloc space. Silence the first allocation failure in that case by setting __GFP_NOWARN. Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>