aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2020-12-16use __netdev_notify_peers in ibmvnicLijun Pan1-6/+3
Start to use the lockless version of netdev_notify_peers. Signed-off-by: Lijun Pan <ljp@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-16net: core: introduce __netdev_notify_peersLijun Pan2-2/+21
There are some use cases for netdev_notify_peers in the context when rtnl lock is already held. Introduce lockless version of netdev_notify_peers call to save the extra code to call call_netdevice_notifiers(NETDEV_NOTIFY_PEERS, dev); call_netdevice_notifiers(NETDEV_RESEND_IGMP, dev); After that, convert netdev_notify_peers to call the new helper. Suggested-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Lijun Pan <ljp@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-16net: allwinner: Fix some resources leak in the error handling path of the probe and in the remove functionChristophe JAILLET1-2/+5
'irq_of_parse_and_map()' should be balanced by a corresponding 'irq_dispose_mapping()' call. Otherwise, there is some resources leaks. Add such a call in the error handling path of the probe function and in the remove function. Fixes: 492205050d77 ("net: Add EMAC ethernet driver found on Allwinner A10 SoC's") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/20201214202117.146293-1-christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-16ethtool: fix string set id checkMichal Kubecek1-1/+1
Syzbot reported a shift of a u32 by more than 31 in strset_parse_request() which is undefined behavior. This is caused by range check of string set id using variable ret (which is always 0 at this point) instead of id (string set id from request). Fixes: 71921690f974 ("ethtool: provide string sets with STRSET_GET request") Reported-by: syzbot+96523fb438937cd01220@syzkaller.appspotmail.com Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Link: https://lore.kernel.org/r/b54ed5c5fd972a59afea3e1badfb36d86df68799.1607952208.git.mkubecek@suse.cz Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-16net: mscc: ocelot: Fix a resource leak in the error handling path of the probe functionChristophe JAILLET1-1/+7
In case of error after calling 'ocelot_init()', it must be undone by a corresponding 'ocelot_deinit()' call, as already done in the remove function. Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201213114838.126922-1-christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-16net/connector: Add const qualifier to cb_idGeoff Levand4-12/+12
The connector driver never modifies any cb_id passed to it, so add a const qualifier to those arguments so callers can declare their struct cb_id as a constant object. Fixes build warnings like these when passing a constant struct cb_id: warning: passing argument 1 of ‘cn_add_callback’ discards ‘const’ qualifier from pointer target Signed-off-by: Geoff Levand <geoff@infradead.org> Link: https://lore.kernel.org/r/a9e49c9e-67fa-16e7-0a6b-72f6bd30c58a@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-16net: bcmgenet: Fix a resource leak in an error handling path in the probe functinChristophe JAILLET1-1/+3
If the 'register_netdev()' call fails, we must undo a previous 'bcmgenet_mii_init()' call. Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20201212182005.120437-1-christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-16dpaa2-eth: fix the size of the mapped SGT bufferIoana Ciornei1-1/+1
This patch fixes an error condition triggered when the code path which transmits a S/G frame descriptor when the skb's headroom is not enough for DPAA2's needs. We are greated with a splat like the one below when a SGT structure is recycled and that is because even though a dma_unmap is performed on the Tx confirmation path, the unmap is not done with the proper size. [ 714.464927] WARNING: CPU: 13 PID: 0 at drivers/iommu/io-pgtable-arm.c:281 __arm_lpae_map+0x2d4/0x30c (...) [ 714.465343] Call trace: [ 714.465348] __arm_lpae_map+0x2d4/0x30c [ 714.465353] __arm_lpae_map+0x114/0x30c [ 714.465357] __arm_lpae_map+0x114/0x30c [ 714.465362] __arm_lpae_map+0x114/0x30c [ 714.465366] arm_lpae_map+0xf4/0x180 [ 714.465373] arm_smmu_map+0x4c/0xc0 [ 714.465379] __iommu_map+0x100/0x2bc [ 714.465385] iommu_map_atomic+0x20/0x30 [ 714.465391] __iommu_dma_map+0xb0/0x110 [ 714.465397] iommu_dma_map_page+0xb8/0x120 [ 714.465404] dma_map_page_attrs+0x1a8/0x210 [ 714.465413] __dpaa2_eth_tx+0x384/0xbd0 [fsl_dpaa2_eth] [ 714.465421] dpaa2_eth_tx+0x84/0x134 [fsl_dpaa2_eth] [ 714.465427] dev_hard_start_xmit+0x10c/0x2b0 [ 714.465433] sch_direct_xmit+0x1a0/0x550 (...) The dpaa2-eth driver uses an area of software annotations to transmit necessary information from the Tx path to the Tx confirmation one. This SWA structure has a different layout for each kind of frame that we are dealing with: linear, S/G or XDP. The commit referenced was incorrectly setting up the 'sgt_size' field for the S/G type of SWA even though we are dealing with a linear skb here. Fixes: d70446ee1f40 ("dpaa2-eth: send a scatter-gather FD instead of realloc-ing") Reported-by: Daniel Thompson <daniel.thompson@linaro.org> Tested-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Link: https://lore.kernel.org/r/20201211171607.108034-1-ciorneiioana@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-16net: dsa: qca: ar9331: fix sleeping function called from invalid context bugOleksij Rempel1-9/+24
With lockdep enabled, we will get following warning: ar9331_switch ethernet.1:10 lan0 (uninitialized): PHY [!ahb!ethernet@1a000000!mdio!switch@10:00] driver [Qualcomm Atheros AR9331 built-in PHY] (irq=13) BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 18, name: kworker/0:1 INFO: lockdep is turned off. irq event stamp: 602 hardirqs last enabled at (601): [<8073fde0>] _raw_spin_unlock_irq+0x3c/0x80 hardirqs last disabled at (602): [<8073a4f4>] __schedule+0x184/0x800 softirqs last enabled at (0): [<80080f60>] copy_process+0x578/0x14c8 softirqs last disabled at (0): [<00000000>] 0x0 CPU: 0 PID: 18 Comm: kworker/0:1 Not tainted 5.10.0-rc3-ar9331-00734-g7d644991df0c #31 Workqueue: events deferred_probe_work_func Stack : 80980000 80980000 8089ef70 80890000 804b5414 80980000 00000002 80b53728 00000000 800d1268 804b5414 ffffffde 00000017 800afe08 81943860 0f5bfc32 00000000 00000000 8089ef70 819436c0 ffffffea 00000000 00000000 00000000 8194390c 808e353c 0000000f 66657272 80980000 00000000 00000000 80890000 804b5414 80980000 00000002 80b53728 00000000 00000000 00000000 80d40000 ... Call Trace: [<80069ce0>] show_stack+0x9c/0x140 [<800afe08>] ___might_sleep+0x220/0x244 [<8073bfb0>] __mutex_lock+0x70/0x374 [<8073c2e0>] mutex_lock_nested+0x2c/0x38 [<804b5414>] regmap_update_bits_base+0x38/0x8c [<804ee584>] regmap_update_bits+0x1c/0x28 [<804ee714>] ar9331_sw_unmask_irq+0x34/0x60 [<800d91f0>] unmask_irq+0x48/0x70 [<800d93d4>] irq_startup+0x114/0x11c [<800d65b4>] __setup_irq+0x4f4/0x6d0 [<800d68a0>] request_threaded_irq+0x110/0x190 [<804e3ef0>] phy_request_interrupt+0x4c/0xe4 [<804df508>] phylink_bringup_phy+0x2c0/0x37c [<804df7bc>] phylink_of_phy_connect+0x118/0x130 [<806c1a64>] dsa_slave_create+0x3d0/0x578 [<806bc4ec>] dsa_register_switch+0x934/0xa20 [<804eef98>] ar9331_sw_probe+0x34c/0x364 [<804eb48c>] mdio_probe+0x44/0x70 [<8049e3b4>] really_probe+0x30c/0x4f4 [<8049ea10>] driver_probe_device+0x264/0x26c [<8049bc10>] bus_for_each_drv+0xb4/0xd8 [<8049e684>] __device_attach+0xe8/0x18c [<8049ce58>] bus_probe_device+0x48/0xc4 [<8049db70>] deferred_probe_work_func+0xdc/0xf8 [<8009ff64>] process_one_work+0x2e4/0x4a0 [<800a0770>] worker_thread+0x2a8/0x354 [<800a774c>] kthread+0x16c/0x174 [<8006306c>] ret_from_kernel_thread+0x14/0x1c ar9331_switch ethernet.1:10 lan1 (uninitialized): PHY [!ahb!ethernet@1a000000!mdio!switch@10:02] driver [Qualcomm Atheros AR9331 built-in PHY] (irq=13) DSA: tree 0 setup To fix it, it is better to move access to MDIO register to the .irq_bus_sync_unlock call back. Fixes: ec6698c272de ("net: dsa: add support for Atheros AR9331 built-in switch") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20201211110317.17061-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-16i40e, xsk: clear the status bits for the next_to_use descriptorBjörn Töpel1-1/+4
On the Rx side, the next_to_use index points to the next item in the HW ring to be refilled/allocated, and next_to_clean points to the next item to potentially be processed. When the HW Rx ring is fully refilled, i.e. no packets has been processed, the next_to_use will be next_to_clean - 1. When the ring is fully processed next_to_clean will be equal to next_to_use. The latter case is where a bug is triggered. If the next_to_use bits are not cleared, and the "fully processed" state is entered, a stale descriptor can be processed. The skb-path correctly clear the status bit for the next_to_use descriptor, but the AF_XDP zero-copy path did not do that. This change adds the status bits clearing of the next_to_use descriptor. Fixes: 3b4f0b66c2b3 ("i40e, xsk: Migrate to new MEM_TYPE_XSK_BUFF_POOL") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-16ice, xsk: clear the status bits for the next_to_use descriptorBjörn Töpel1-1/+4
On the Rx side, the next_to_use index points to the next item in the HW ring to be refilled/allocated, and next_to_clean points to the next item to potentially be processed. When the HW Rx ring is fully refilled, i.e. no packets has been processed, the next_to_use will be next_to_clean - 1. When the ring is fully processed next_to_clean will be equal to next_to_use. The latter case is where a bug is triggered. If the next_to_use bits are not cleared, and the "fully processed" state is entered, a stale descriptor can be processed. The skb-path correctly clear the status bit for the next_to_use descriptor, but the AF_XDP zero-copy path did not do that. This change adds the status bits clearing of the next_to_use descriptor. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-16lan743x: fix rx_napi_poll/interrupt ping-pongSven Van Asbroeck1-20/+23
Even if there is more rx data waiting on the chip, the rx napi poll fn will never run more than once - it will always read a few buffers, then bail out and re-arm interrupts. Which results in ping-pong between napi and interrupt. This defeats the purpose of napi, and is bad for performance. Fix by making the rx napi poll behave identically to other ethernet drivers: 1. initialize rx napi polling with an arbitrary budget (64). 2. in the polling fn, return full weight if rx queue is not depleted, this tells the napi core to "keep polling". 3. update the rx tail ("ring the doorbell") once for every 8 processed rx ring buffers. Thanks to Jakub Kicinski, Eric Dumazet and Andrew Lunn for their expert opinions and suggestions. Tested with 20 seconds of full bandwidth receive (iperf3): rx irqs softirqs(NET_RX) ----------------------------- before 23827 33620 after 129 4081 Tested-by: Sven Van Asbroeck <thesven73@gmail.com> # lan7430 Fixes: 23f0703c125be ("lan743x: Add main source files for new lan743x driver") Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com> Link: https://lore.kernel.org/r/20201215161954.5950-1-TheSven73@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-15mm: cleanup kstrto*() usageAlexey Dobriyan2-18/+18
Range checks can folded into proper conversion function. kstrto*() exist for all arithmetic types. Link: https://lkml.kernel.org/r/20201122123759.GC92364@localhost.localdomain Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: fix fall-through warnings for ClangGustavo A. R. Silva2-0/+2
In preparation to enable -Wimplicit-fallthrough for Clang, fix a couple of warnings by explicitly adding a break statement instead of just letting the code fall through to the next, and by adding a fallthrough pseudo-keyword in places where the code is intended to fall through. Link: https://github.com/KSPP/linux/issues/115 Link: https://lkml.kernel.org/r/f5756988b8842a3f10008fbc5b0a654f828920a9.1605896059.git.gustavoars@kernel.org Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_atJoe Perches1-73/+77
Convert the unbounded uses of sprintf to sysfs_emit. A few conversions may now not end in a newline if the output buffer is overflowed. Link: https://lkml.kernel.org/r/0c90a90f466167f8c37de4b737553cf49c4a277f.1605376435.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: shmem: convert shmem_enabled_show to use sysfs_emit_atJoe Perches1-9/+12
Update the function to use sysfs_emit_at while neatening the uses of sprintf and overwriting the last space char with a newline to avoid possible output buffer overflow. Miscellanea: - in shmem_enabled_show, the removal of the indirected use of fmt allows __printf verification Link: https://lkml.kernel.org/r/b612a93825e5ea330cb68d2e8b516e9687a06cc6.1605376435.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm:backing-dev: use sysfs_emit in macro defining functionsJoe Perches1-4/+4
The cocci script used in commit bdacbb8d04f ("mm: Use sysfs_emit for struct kobject * uses") does not convert the name##_show macro because the macro uses concatenation via ##. Convert it by hand. Link: https://lkml.kernel.org/r/45ec6cfc177d743f9c0ebaf35e43969dce43af42.1605376435.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neateningJoe Perches1-21/+31
Convert the only use of sprintf with struct kobject * that the cocci script could not convert. Miscellanea: - Neaten the uses of a constant string with sysfs_emit to use a const char * to reduce overall object size Link: https://lkml.kernel.org/r/7df6be66bbd68e1a0bca9d35aca1341dbf94d2a7.1605376435.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: use sysfs_emit for struct kobject * usesJoe Perches5-46/+52
Patch series "mm: Convert sysfs sprintf family to sysfs_emit", v2. Use the new sysfs_emit family and not the sprintf family. This patch (of 5): Use the sysfs_emit function instead of the sprintf family. Done with cocci script as in commit 3c6bff3cf988 ("RDMA: Convert sysfs kobject * show functions to use sysfs_emit()") Link: https://lkml.kernel.org/r/cover.1605376435.git.joe@perches.com Link: https://lkml.kernel.org/r/9c249215bad6df616ba0410ad980042694970c1b.1605376435.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: fix kernel-doc markupsMauro Carvalho Chehab3-22/+24
Kernel-doc markups should use this format: identifier - description Fix some issues on mm files: 1) The definition for get_user_pages_locked() doesn't follow it. Also, it expects a short descrpition at the header, followed by a long one, after the parameters. Fix it. 2) Kernel-doc requires that a kernel-doc markup to be immediately below the function prototype, as otherwise it will rename it. So, move get_pfnblock_flags_mask() description to the right place. 3) Make invalidate_mapping_pagevec() to also follow the expected kernel-doc format. While here, fix a few minor English syntax issues, as suggested by Matthew: will used -> will be used similar with -> similar to Link: https://lkml.kernel.org/r/80e85dddc92d333bc2159ee8a2294921612e8745.1605521731.git.mchehab+huawei@kernel.org Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Suggested-by: Mattew Wilcox <willy@infradead.org> [English fixes] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15zram: break the strict dependency from lzoRui Salvaterra3-2/+44
From the beginning, the zram block device always enabled CRYPTO_LZO, since lzo-rle is hardcoded as the fallback compression algorithm. As a consequence, on systems where another compression algorithm is chosen (e.g. CRYPTO_ZSTD), the lzo kernel module becomes unused, while still having to be built/loaded. This patch removes the hardcoded lzo-rle dependency and allows the user to select the default compression algorithm for zram at build time. The previous behaviour is kept, as the default algorithm is still lzo-rle. Link: https://lkml.kernel.org/r/20201207121245.50529-1-rsalvaterra@gmail.com Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com> Suggested-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Suggested-by: Minchan Kim <minchan@kernel.org> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15zram: add stat to gather incompressible pages since zram set upMinchan Kim3-2/+6
Currently, zram supports the stat via /sys/block/zram/mm_stat to represent how many of incompressible pages are stored at the moment but it couldn't show how many times incompressible pages were wrote down since zram set up. It's also good indication to see how zram is effective in the system. Link: https://lkml.kernel.org/r/20201130201907.1284910-1-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15zram: support page writebackMinchan Kim2-4/+22
There is demand to writeback specific process pages to backing store instead of all idles pages in the system due to storage wear out concerns and to launching latency of apps which are most of the time idle but are critical for resume latency. This patch extends the writeback knob to support a specific page writeback. Link: https://lkml.kernel.org/r/20201020190506.3758660-1-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm/process_vm_access: remove redundant initialization of iov_rColin Ian King1-1/+1
The pointer iov_r is being initialized with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Link: https://lkml.kernel.org/r/20201102120614.694917-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm/zsmalloc.c: rework the list_add code in insert_zspage()Miaohe Lin1-7/+4
Rework the list_add code to make it more readable and simple. Link: https://lkml.kernel.org/r/20201015130107.65195-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm/zswap: move to use crypto_acomp API for hardware accelerationBarry Song1-46/+137
Right now, all new ZIP drivers are adapted to crypto_acomp APIs rather than legacy crypto_comp APIs. Tradiontal ZIP drivers like lz4,lzo etc have been also wrapped into acomp via scomp backend. But zswap.c is still using the old APIs. That means zswap won't be able to work on any new ZIP drivers in kernel. This patch moves to use cryto_acomp APIs to fix the disconnected bridge between new ZIP drivers and zswap. It is probably the first real user to use acomp but perhaps not a good example to demonstrate how multiple acomp requests can be executed in parallel in one acomp instance. frontswap is doing page load and store page by page synchronously. swap_writepage() depends on the completion of frontswap_store() to decide if it should call __swap_writepage() to swap to disk. However this patch creates multiple acomp instances, so multiple threads running on multiple different cpus can actually do (de)compression parallelly, leveraging the power of multiple ZIP hardware queues. This is also consistent with frontswap's page management model. The old zswap code uses atomic context and avoids the race conditions while shared resources like zswap_dstmem are accessed. Here since acomp can sleep, per-cpu mutex is used to replace preemption-disable. While it is possible to make mm/page_io.c and mm/frontswap.c support async (de)compression in some way, the entire design requires careful thinking and performance evaluation. For the first step, the base with fixed connection between ZIP drivers and zswap should be built. Link: https://lkml.kernel.org/r/20201107065332.26992-1-song.bao.hua@hisilicon.com Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Acked-by: Vitaly Wool <vitalywool@gmail.com> Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: David S. Miller <davem@davemloft.net> Cc: Mahipal Challa <mahipalreddy2006@gmail.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Zhou Wang <wangzhou1@hisilicon.com> Cc: Colin Ian King <colin.king@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm/zswap: fix passing zero to 'PTR_ERR' warningYueHaibing1-1/+1
Fix smatch warning: mm/zswap.c:425 zswap_cpu_comp_prepare() warn: passing zero to 'PTR_ERR' crypto_alloc_comp() never return NULL, use IS_ERR instead of IS_ERR_OR_NULL to fix this. Link: https://lkml.kernel.org/r/20201031055615.28080-1-yuehaibing@huawei.com Fixes: f1c54846ee45 ("zswap: dynamic pool creation") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm/zswap: make struct kernel_param_ops definitions constJoe Perches1-3/+3
These should be const, so make it so. Link: https://lkml.kernel.org/r/1791535ee0b00f4a5c68cc4a8adada06593ad8f1.1601770305.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15userfaultfd/selftests: hint the test runner on required privilegePeter Xu1-1/+2
Now userfaultfd test program requires either root or ptrace privilege due to the signal/event tests. When UFFDIO_API failed, hint the test runner about this fact verbosely. Link: https://lkml.kernel.org/r/20201208024709.7701-4-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15userfaultfd/selftests: fix retval check for userfaultfd_open()Peter Xu1-4/+4
userfaultfd_open() returns 1 for errors rather than negatives. Fix it on all the callers so when UFFDIO_API failed the test will bail out. Link: https://lkml.kernel.org/r/20201208024709.7701-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15userfaultfd/selftests: always dump something in modesPeter Xu1-0/+2
Patch series "userfaultfd: selftests: Small fixes". Some very trivial fixes that I kept locally to userfaultfd selftest program. This patch (of 3): BOUNCE_POLL is a special bit that if cleared it means "READ" instead. Dump that too otherwise we'll see tests with empty modes. Link: https://lkml.kernel.org/r/20201208024709.7701-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20201208024709.7701-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15userfaultfd: selftests: make __{s,u}64 format specifiers portableAxel Rasmussen1-46/+35
On certain platforms (powerpcle is the one on which I ran into this), "%Ld" and "%Lu" are unsuitable for printing __s64 and __u64, respectively, resulting in build warnings. Cast to {u,}int64_t, and use the PRI{d,u}64 macros defined in inttypes.h to print them. This ought to be portable to all platforms. Splitting this off into a separate macro lets us remove some lines, and get rid of some (I would argue) stylistically odd cases where we joined printf() and exit() into a single statement with a ,. Finally, this also fixes a "missing braces around initializer" warning when we initialize prms in wp_range(). [axelrasmussen@google.com: v2] Link: https://lkml.kernel.org/r/20201203180244.1811601-1-axelrasmussen@google.com Link: https://lkml.kernel.org/r/20201202211542.1121189-1-axelrasmussen@google.com Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Acked-by: Peter Xu <peterx@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Joe Perches <joe@perches.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Alan Gilbert <dgilbert@redhat.com> Cc: Greg Thelen <gthelen@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15userfaultfd: add user-mode only option to unprivileged_userfaultfd sysctl knobLokesh Gidra2-7/+18
With this change, when the knob is set to 0, it allows unprivileged users to call userfaultfd, like when it is set to 1, but with the restriction that page faults from only user-mode can be handled. In this mode, an unprivileged user (without SYS_CAP_PTRACE capability) must pass UFFD_USER_MODE_ONLY to userfaultd or the API will fail with EPERM. This enables administrators to reduce the likelihood that an attacker with access to userfaultfd can delay faulting kernel code to widen timing windows for other exploits. The default value of this knob is changed to 0. This is required for correct functioning of pipe mutex. However, this will fail postcopy live migration, which will be unnoticeable to the VM guests. To avoid this, set 'vm.userfault = 1' in /sys/sysctl.conf. The main reason this change is desirable as in the short term is that the Android userland will behave as with the sysctl set to zero. So without this commit, any Linux binary using userfaultfd to manage its memory would behave differently if run within the Android userland. For more details, refer to Andrea's reply [1]. [1] https://lore.kernel.org/lkml/20200904033438.GI9411@redhat.com/ Link: https://lkml.kernel.org/r/20201120030411.2690816-3-lokeshgidra@google.com Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Peter Xu <peterx@redhat.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Stephen Smalley <stephen.smalley.work@gmail.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Daniel Colascione <dancol@dancol.org> Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Jeff Vander Stoep <jeffv@google.com> Cc: <calin@google.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Shaohua Li <shli@fb.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Nitin Gupta <nigupta@nvidia.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Daniel Colascione <dancol@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15userfaultfd: add UFFD_USER_MODE_ONLYLokesh Gidra2-1/+18
Patch series "Control over userfaultfd kernel-fault handling", v6. This patch series is split from [1]. The other series enables SELinux support for userfaultfd file descriptors so that its creation and movement can be controlled. It has been demonstrated on various occasions that suspending kernel code execution for an arbitrary amount of time at any access to userspace memory (copy_from_user()/copy_to_user()/...) can be exploited to change the intended behavior of the kernel. For instance, handling page faults in kernel-mode using userfaultfd has been exploited in [2, 3]. Likewise, FUSE, which is similar to userfaultfd in this respect, has been exploited in [4, 5] for similar outcome. This small patch series adds a new flag to userfaultfd(2) that allows callers to give up the ability to handle kernel-mode faults with the resulting UFFD file object. It then adds a 'user-mode only' option to the unprivileged_userfaultfd sysctl knob to require unprivileged callers to use this new flag. The purpose of this new interface is to decrease the chance of an unprivileged userfaultfd user taking advantage of userfaultfd to enhance security vulnerabilities by lengthening the race window in kernel code. [1] https://lore.kernel.org/lkml/20200211225547.235083-1-dancol@google.com/ [2] https://duasynt.com/blog/linux-kernel-heap-spray [3] https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit [4] https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html [5] https://bugs.chromium.org/p/project-zero/issues/detail?id=808 This patch (of 2): userfaultfd handles page faults from both user and kernel code. Add a new UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes the resulting userfaultfd object refuse to handle faults from kernel mode, treating these faults as if SIGBUS were always raised, causing the kernel code to fail with EFAULT. A future patch adds a knob allowing administrators to give some processes the ability to create userfaultfd file objects only if they pass UFFD_USER_MODE_ONLY, reducing the likelihood that these processes will exploit userfaultfd's ability to delay kernel page faults to open timing windows for future exploits. Link: https://lkml.kernel.org/r/20201120030411.2690816-1-lokeshgidra@google.com Link: https://lkml.kernel.org/r/20201120030411.2690816-2-lokeshgidra@google.com Signed-off-by: Daniel Colascione <dancol@google.com> Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <calin@google.com> Cc: Daniel Colascione <dancol@dancol.org> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Jeff Vander Stoep <jeffv@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Nitin Gupta <nigupta@nvidia.com> Cc: Peter Xu <peterx@redhat.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Shaohua Li <shli@fb.com> Cc: Stephen Smalley <stephen.smalley.work@gmail.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm, page_poison: remove CONFIG_PAGE_POISONING_ZEROVlastimil Babka4-28/+2
CONFIG_PAGE_POISONING_ZERO uses the zero pattern instead of 0xAA. It was introduced by commit 1414c7f4f7d7 ("mm/page_poisoning.c: allow for zero poisoning"), noting that using zeroes retains the benefit of sanitizing content of freed pages, with the benefit of not having to zero them again on alloc, and the downside of making some forms of corruption (stray writes of NULLs) harder to detect than with the 0xAA pattern. Together with CONFIG_PAGE_POISONING_NO_SANITY it made possible to sanitize the contents on free without checking it back on alloc. These days we have the init_on_free() option to achieve sanitization with zeroes and to save clearing on alloc (and without checking on alloc). Arguably if someone does choose to check the poison for corruption on alloc, the savings of not clearing the page are secondary, and it makes sense to always use the 0xAA poison pattern. Thus, remove the CONFIG_PAGE_POISONING_ZERO option for being redundant. Link: https://lkml.kernel.org/r/20201113104033.22907-6-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand <david@redhat.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Laura Abbott <labbott@kernel.org> Cc: Mateusz Nosek <mateusznosek0@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm, page_poison: remove CONFIG_PAGE_POISONING_NO_SANITYVlastimil Babka3-17/+5
CONFIG_PAGE_POISONING_NO_SANITY skips the check on page alloc whether the poison pattern was corrupted, suggesting a use-after-free. The motivation to introduce it in commit 8823b1dbc05f ("mm/page_poison.c: enable PAGE_POISONING as a separate option") was to simply sanitize freed pages, optimally together with CONFIG_PAGE_POISONING_ZERO. These days we have an init_on_free=1 boot option, which makes this use case of page poisoning redundant. For sanitizing, writing zeroes is sufficient, there is pretty much no benefit from writing the 0xAA poison pattern to freed pages, without checking it back on alloc. Thus, remove this option and suggest init_on_free instead in the main config's help. Link: https://lkml.kernel.org/r/20201113104033.22907-5-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand <david@redhat.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Laura Abbott <labbott@kernel.org> Cc: Mateusz Nosek <mateusznosek0@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15kernel/power: allow hibernation with page_poison sanity checkingVlastimil Babka5-6/+14
Page poisoning used to be incompatible with hibernation, as the state of poisoned pages was lost after resume, thus enabling CONFIG_HIBERNATION forces CONFIG_PAGE_POISONING_NO_SANITY. For the same reason, the poisoning with zeroes variant CONFIG_PAGE_POISONING_ZERO used to disable hibernation. The latter restriction was removed by commit 1ad1410f632d ("PM / Hibernate: allow hibernation with PAGE_POISONING_ZERO") and similarly for init_on_free by commit 18451f9f9e58 ("PM: hibernate: fix crashes with init_on_free=1") by making sure free pages are cleared after resume. We can use the same mechanism to instead poison free pages with PAGE_POISON after resume. This covers both zero and 0xAA patterns. Thus we can remove the Kconfig restriction that disables page poison sanity checking when hibernation is enabled. Link: https://lkml.kernel.org/r/20201113104033.22907-4-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [hibernation] Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Laura Abbott <labbott@kernel.org> Cc: Mateusz Nosek <mateusznosek0@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm, page_poison: use static key more efficientlyVlastimil Babka4-54/+52
Commit 11c9c7edae06 ("mm/page_poison.c: replace bool variable with static key") changed page_poisoning_enabled() to a static key check. However, the function is not inlined, so each check still involves a function call with overhead not eliminated when page poisoning is disabled. Analogically to how debug_pagealloc is handled, this patch converts page_poisoning_enabled() back to boolean check, and introduces page_poisoning_enabled_static() for fast paths. Both functions are inlined. The function kernel_poison_pages() is also called unconditionally and does the static key check inside. Remove it from there and put it to callers. Also split it to two functions kernel_poison_pages() and kernel_unpoison_pages() instead of the confusing bool parameter. Also optimize the check that enables page poisoning instead of debug_pagealloc for architectures without proper debug_pagealloc support. Move the check to init_mem_debugging_and_hardening() to enable a single static key instead of having two static branches in page_poisoning_enabled_static(). Link: https://lkml.kernel.org/r/20201113104033.22907-3-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Laura Abbott <labbott@kernel.org> Cc: Mateusz Nosek <mateusznosek0@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm, page_alloc: do not rely on the order of page_poison and init_on_alloc/free parametersVlastimil Babka3-64/+46
Patch series "cleanup page poisoning", v3. I have identified a number of issues and opportunities for cleanup with CONFIG_PAGE_POISON and friends: - interaction with init_on_alloc and init_on_free parameters depends on the order of parameters (Patch 1) - the boot time enabling uses static key, but inefficienty (Patch 2) - sanity checking is incompatible with hibernation (Patch 3) - CONFIG_PAGE_POISONING_NO_SANITY can be removed now that we have init_on_free (Patch 4) - CONFIG_PAGE_POISONING_ZERO can be most likely removed now that we have init_on_free (Patch 5) This patch (of 5): Enabling page_poison=1 together with init_on_alloc=1 or init_on_free=1 produces a warning in dmesg that page_poison takes precedence. However, as these warnings are printed in early_param handlers for init_on_alloc/free, they are not printed if page_poison is enabled later on the command line (handlers are called in the order of their parameters), or when init_on_alloc/free is always enabled by the respective config option - before the page_poison early param handler is called, it is not considered to be enabled. This is inconsistent. We can remove the dependency on order by making the init_on_* parameters only set a boolean variable, and postponing the evaluation after all early params have been processed. Introduce a new init_mem_debugging_and_hardening() function for that, and move the related debug_pagealloc processing there as well. As a result init_mem_debugging_and_hardening() knows always accurately if init_on_* and/or page_poison options were enabled. Thus we can also optimize want_init_on_alloc() and want_init_on_free(). We don't need to check page_poisoning_enabled() there, we can instead not enable the init_on_* static keys at all, if page poisoning is enabled. This results in a simpler and more effective code. Link: https://lkml.kernel.org/r/20201113104033.22907-1-vbabka@suse.cz Link: https://lkml.kernel.org/r/20201113104033.22907-2-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mateusz Nosek <mateusznosek0@gmail.com> Cc: Laura Abbott <labbott@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: cma: improve pr_debug log in cma_release()Charan Teja Reddy1-1/+1
It is required to print 'count' of pages, along with the pages, passed to cma_release to debug the cases of mismatched count value passed between cma_alloc() and cma_release() from a code path. As an example, consider the below scenario: 1) CMA pool size is 4MB and 2) User doing the erroneous step of allocating 2 pages but freeing 1 page in a loop from this CMA pool. The step 2 causes cma_alloc() to return NULL at one point of time because of -ENOMEM condition. And the current pr_debug logs is not giving the info about these types of allocation patterns because of count value not being printed in cma_release(). We are printing the count value in the trace logs, just extend the same to pr_debug logs too. [akpm@linux-foundation.org: fix printk warning] Link: https://lkml.kernel.org/r/1606318341-29521-1-git-send-email-charante@codeaurora.org Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> Reviewed-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Vinayak Menon <vinmenon@codeaurora.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm/cma.c: remove redundant cma_mutex lockLecopzer Chen1-3/+1
The cma_mutex which protects alloc_contig_range() was first appeared in commit 7ee793a62fa8c ("cma: Remove potential deadlock situation"), at that time, there is no guarantee the behavior of concurrency inside alloc_contig_range(). After commit 2c7452a075d4db2dc ("mm/page_isolation.c: make start_isolate_page_range() fail if already isolated") > However, two subsystems (CMA and gigantic > huge pages for example) could attempt operations on the same range. If > this happens, one thread may 'undo' the work another thread is doing. > This can result in pageblocks being incorrectly left marked as > MIGRATE_ISOLATE and therefore not available for page allocation. The concurrency inside alloc_contig_range() was clarified. Now we can find that hugepage and virtio call alloc_contig_range() without any lock, thus cma_mutex is "redundant" in cma_alloc() now. Link: https://lkml.kernel.org/r/20201020102241.3729-1-lecopzer.chen@mediatek.com Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: YJ Chiang <yj.chiang@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: migrate: remove unused parameter in migrate_vma_insert_page()Stephen Zhang1-4/+2
"dst" parameter to migrate_vma_insert_page() is not used anymore. Link: https://lkml.kernel.org/r/CANubcdUwCAMuUyamG2dkWP=cqSR9MAS=tHLDc95kQkqU-rEnAg@mail.gmail.com Signed-off-by: Stephen Zhang <starzhangzsd@gmail.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: migrate: return -ENOSYS if THP migration is unsupportedYang Shi1-16/+46
In the current implementation unmap_and_move() would return -ENOMEM if THP migration is unsupported, then the THP will be split. If split is failed just exit without trying to migrate other pages. It doesn't make too much sense since there may be enough free memory to migrate other pages and there may be a lot base pages on the list. Return -ENOSYS to make consistent with hugetlb. And if THP split is failed just skip and try other pages on the list. Just skip the whole list and exit when free memory is really low. Link: https://lkml.kernel.org/r/20201113205359.556831-6-shy828301@gmail.com Signed-off-by: Yang Shi <shy828301@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Song Liu <songliubraving@fb.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: migrate: clean up migrate_prep{_local}Yang Shi3-14/+6
The migrate_prep{_local} never fails, so it is pointless to have return value and check the return value. Link: https://lkml.kernel.org/r/20201113205359.556831-5-shy828301@gmail.com Signed-off-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Song Liu <songliubraving@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: migrate: skip shared exec THP for NUMA balancingYang Shi1-2/+16
The NUMA balancing skip shared exec base page. Since CONFIG_READ_ONLY_THP_FOR_FS was introduced, there are probably shared exec THP, so skip such THPs for NUMA balancing as well. And Willy's regular filesystem THP support patches could create shared exec THP wven without that config. In addition, the page_is_file_lru() is used to tell if the page is file cache or not, but it filters out shmem page. It sounds like a typical usecase by putting executables in shmem to achieve performance gain via using shmem-THP, so it sounds worth skipping migration for such case too. Link: https://lkml.kernel.org/r/20201113205359.556831-4-shy828301@gmail.com Signed-off-by: Yang Shi <shy828301@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Jan Kara <jack@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Song Liu <songliubraving@fb.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: migrate: simplify the logic for handling permanent failureYang Shi1-30/+38
When unmap_and_move{_huge_page}() returns !-EAGAIN and !MIGRATEPAGE_SUCCESS, the page would be put back to LRU or proper list if it is non-LRU movable page. But, the callers always call putback_movable_pages() to put the failed pages back later on, so it seems not very efficient to put every single page back immediately, and the code looks convoluted. Put the failed page on a separate list, then splice the list to migrate list when all pages are tried. It is the caller's responsibility to call putback_movable_pages() to handle failures. This also makes the code simpler and more readable. After the change the rules are: * Success: non hugetlb page will be freed, hugetlb page will be put back * -EAGAIN: stay on the from list * -ENOMEM: stay on the from list * Other errno: put on ret_pages list then splice to from list The from list would be empty iff all pages are migrated successfully, it was not so before. This has no impact to current existing callsites. Link: https://lkml.kernel.org/r/20201113205359.556831-3-shy828301@gmail.com Signed-off-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Song Liu <songliubraving@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: truncate_complete_page() does not exist any moreYang Shi2-2/+2
Patch series "mm: misc migrate cleanup and improvement", v3. This patch (of 5): The commit 9f4e41f4717832e ("mm: refactor truncate_complete_page()") refactored truncate_complete_page(), and it is not existed anymore, correct the comment in vmscan and migrate to avoid confusion. Link: https://lkml.kernel.org/r/20201113205359.556831-1-shy828301@gmail.com Link: https://lkml.kernel.org/r/20201113205359.556831-2-shy828301@gmail.com Signed-off-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Song Liu <songliubraving@fb.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: support THPs in zero_user_segmentsMatthew Wilcox (Oracle)2-4/+67
We can only kmap() one subpage of a THP at a time, so loop over all relevant subpages, skipping ones which don't need to be zeroed. This is too large to inline when THPs are enabled and we actually need highmem, so put it in highmem.c. [willy@infradead.org: start1 was allowed to be less than start2] Link: https://lkml.kernel.org/r/20201124041507.28996-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Song Liu <songliubraving@fb.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm/migrate.c: optimize migrate_vma_pages() mmu notifierRalph Campbell1-5/+4
When migrating a zero page or pte_none() anonymous page to device private memory, migrate_vma_setup() will initialize the src[] array with a NULL PFN. This lets the device driver allocate device private memory and clear it instead of DMAing a page of zeros over the device bus. Since the source page didn't exist at the time, no struct page was locked nor a migration PTE inserted into the CPU page tables. The actual PTE insertion happens in migrate_vma_pages() when it tries to insert the device private struct page PTE into the CPU page tables. migrate_vma_pages() has to call the mmu notifiers again since another device could fault on the same page before the page table locks are acquired. Allow device drivers to optimize the invalidation similar to migrate_vma_setup() by calling mmu_notifier_range_init() which sets struct mmu_notifier_range event type to MMU_NOTIFY_MIGRATE and the migrate_pgmap_owner field. Link: https://lkml.kernel.org/r/20201021191335.10916-1-rcampbell@nvidia.com Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm/migrate.c: fix comment spellingLong Li1-1/+1
The word in the comment is misspelled, it should be "include". Link: https://lkml.kernel.org/r/20201024114144.GA20552@lilong Signed-off-by: Long Li <lonuxli.64@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>