path: root/drivers/block/zram
Age | Commit message | Author | Files | Lines
2018-01-06 | block: convert to bio_first_bvec_all & bio_first_page_all | Ming Lei | 1 | -1/+1
This patch converts to bio_first_bvec_all() & bio_first_page_all() for retrieving the 1st bvec/page, and prepares for supporting multipage bvec. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-15 | drivers/block/zram/zram_drv.c: make zram_page_end_io() static | Colin Ian King | 1 | -1/+1
zram_page_end_io() is local to the source and does not need to be in global scope, so make it static. Cleans up sparse warning: symbol 'zram_page_end_io' was not declared. Should it be static? Link: http://lkml.kernel.org/r/20171016173336.20320-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 | zram: remove zlib from the list of recommended algorithms | Sergey Senozhatsky | 1 | -3/+0
ZSTD tends to outperform deflate/inflate, thus we remove zlib from the list of recommended algorithms and recommend zstd instead. Link: http://lkml.kernel.org/r/20170912050005.3247-2-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Suggested-by: Minchan Kim <minchan@kernel.org> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 | zram: add zstd to the supported algorithms list | Sergey Senozhatsky | 1 | -0/+3
Add ZSTD to the list of supported compression algorithms.

ZRAM fio perf test:

          LZO          DEFLATE      ZSTD

#jobs1
WRITE:    (2180MB/s)   (77.2MB/s)   (1429MB/s)
WRITE:    (1617MB/s)   (77.7MB/s)   (1202MB/s)
READ:     (426MB/s)    (595MB/s)    (1181MB/s)
READ:     (422MB/s)    (572MB/s)    (1020MB/s)
READ:     (318MB/s)    (67.8MB/s)   (563MB/s)
WRITE:    (318MB/s)    (67.9MB/s)   (564MB/s)
READ:     (336MB/s)    (68.3MB/s)   (583MB/s)
WRITE:    (335MB/s)    (68.2MB/s)   (582MB/s)
#jobs2
WRITE:    (3441MB/s)   (152MB/s)    (2141MB/s)
WRITE:    (2507MB/s)   (147MB/s)    (1888MB/s)
READ:     (801MB/s)    (1146MB/s)   (1890MB/s)
READ:     (767MB/s)    (1096MB/s)   (2073MB/s)
READ:     (621MB/s)    (126MB/s)    (1009MB/s)
WRITE:    (621MB/s)    (126MB/s)    (1009MB/s)
READ:     (656MB/s)    (125MB/s)    (1075MB/s)
WRITE:    (657MB/s)    (126MB/s)    (1077MB/s)
#jobs3
WRITE:    (4772MB/s)   (225MB/s)    (3394MB/s)
WRITE:    (3905MB/s)   (211MB/s)    (2939MB/s)
READ:     (1216MB/s)   (1608MB/s)   (3218MB/s)
READ:     (1159MB/s)   (1431MB/s)   (2981MB/s)
READ:     (906MB/s)    (156MB/s)    (1457MB/s)
WRITE:    (907MB/s)    (156MB/s)    (1458MB/s)
READ:     (953MB/s)    (158MB/s)    (1595MB/s)
WRITE:    (952MB/s)    (157MB/s)    (1593MB/s)
#jobs4
WRITE:    (6036MB/s)   (265MB/s)    (4469MB/s)
WRITE:    (5059MB/s)   (263MB/s)    (3951MB/s)
READ:     (1618MB/s)   (2066MB/s)   (4276MB/s)
READ:     (1573MB/s)   (1942MB/s)   (3830MB/s)
READ:     (1202MB/s)   (227MB/s)    (1971MB/s)
WRITE:    (1200MB/s)   (227MB/s)    (1968MB/s)
READ:     (1265MB/s)   (226MB/s)    (2116MB/s)
WRITE:    (1264MB/s)   (226MB/s)    (2114MB/s)
#jobs5
WRITE:    (5339MB/s)   (233MB/s)    (3781MB/s)
WRITE:    (4298MB/s)   (234MB/s)    (3276MB/s)
READ:     (1626MB/s)   (2048MB/s)   (4081MB/s)
READ:     (1567MB/s)   (1929MB/s)   (3758MB/s)
READ:     (1174MB/s)   (205MB/s)    (1747MB/s)
WRITE:    (1173MB/s)   (204MB/s)    (1746MB/s)
READ:     (1214MB/s)   (208MB/s)    (1890MB/s)
WRITE:    (1215MB/s)   (208MB/s)    (1892MB/s)
#jobs6
WRITE:    (5666MB/s)   (270MB/s)    (4338MB/s)
WRITE:    (4828MB/s)   (267MB/s)    (3772MB/s)
READ:     (1803MB/s)   (2058MB/s)   (4946MB/s)
READ:     (1805MB/s)   (2156MB/s)   (4711MB/s)
READ:     (1334MB/s)   (235MB/s)    (2135MB/s)
WRITE:    (1335MB/s)   (235MB/s)    (2137MB/s)
READ:     (1364MB/s)   (236MB/s)    (2268MB/s)
WRITE:    (1365MB/s)   (237MB/s)    (2270MB/s)
#jobs7
WRITE:    (5474MB/s)   (270MB/s)    (4300MB/s)
WRITE:    (4666MB/s)   (266MB/s)    (3817MB/s)
READ:     (2022MB/s)   (2319MB/s)   (5472MB/s)
READ:     (1924MB/s)   (2260MB/s)   (5031MB/s)
READ:     (1369MB/s)   (242MB/s)    (2153MB/s)
WRITE:    (1370MB/s)   (242MB/s)    (2155MB/s)
READ:     (1499MB/s)   (246MB/s)    (2310MB/s)
WRITE:    (1497MB/s)   (246MB/s)    (2307MB/s)
#jobs8
WRITE:    (5558MB/s)   (273MB/s)    (4439MB/s)
WRITE:    (4763MB/s)   (271MB/s)    (3918MB/s)
READ:     (2201MB/s)   (2599MB/s)   (6062MB/s)
READ:     (2105MB/s)   (2463MB/s)   (5413MB/s)
READ:     (1490MB/s)   (252MB/s)    (2238MB/s)
WRITE:    (1488MB/s)   (252MB/s)    (2236MB/s)
READ:     (1566MB/s)   (254MB/s)    (2434MB/s)
WRITE:    (1568MB/s)   (254MB/s)    (2437MB/s)
#jobs9
WRITE:    (5120MB/s)   (264MB/s)    (4035MB/s)
WRITE:    (4531MB/s)   (267MB/s)    (3740MB/s)
READ:     (1940MB/s)   (2258MB/s)   (4986MB/s)
READ:     (2024MB/s)   (2387MB/s)   (4871MB/s)
READ:     (1343MB/s)   (246MB/s)    (2038MB/s)
WRITE:    (1342MB/s)   (246MB/s)    (2037MB/s)
READ:     (1553MB/s)   (238MB/s)    (2243MB/s)
WRITE:    (1552MB/s)   (238MB/s)    (2242MB/s)
#jobs10
WRITE:    (5345MB/s)   (271MB/s)    (3988MB/s)
WRITE:    (4750MB/s)   (254MB/s)    (3668MB/s)
READ:     (1876MB/s)   (2363MB/s)   (5150MB/s)
READ:     (1990MB/s)   (2256MB/s)   (5080MB/s)
READ:     (1355MB/s)   (250MB/s)    (2019MB/s)
WRITE:    (1356MB/s)   (251MB/s)    (2020MB/s)
READ:     (1490MB/s)   (252MB/s)    (2202MB/s)
WRITE:    (1488MB/s)   (252MB/s)    (2199MB/s)

jobs1 perfstat
instructions     52,065,555,710 ( 0.79)       855,731,114,587 ( 2.64)       54,280,709,944 ( 1.40)
branches         14,020,427,116 ( 725.847)    101,733,449,582 (1074.521)    11,170,591,067 ( 992.869)
branch-misses    22,626,174 ( 0.16%)          274,197,885 ( 0.27%)          25,915,805 ( 0.23%)
jobs2 perfstat
instructions     103,633,110,402 ( 0.75)      1,710,822,100,914 ( 2.59)     107,879,874,104 ( 1.28)
branches         27,931,237,282 ( 679.203)    203,298,267,479 (1037.326)    22,185,350,842 ( 884.427)
branch-misses    46,103,811 ( 0.17%)          533,747,204 ( 0.26%)          49,682,483 ( 0.22%)
jobs3 perfstat
instructions     154,857,283,657 ( 0.76)      2,565,748,974,197 ( 2.57)     161,515,435,813 ( 1.31)
branches         41,759,490,355 ( 670.529)    304,905,605,277 ( 978.765)    33,215,805,907 ( 888.003)
branch-misses    74,263,293 ( 0.18%)          759,746,240 ( 0.25%)          76,841,196 ( 0.23%)
jobs4 perfstat
instructions     206,215,849,076 ( 0.75)      3,420,169,460,897 ( 2.60)     215,003,061,664 ( 1.31)
branches         55,632,141,739 ( 666.501)    406,394,977,433 ( 927.241)    44,214,322,251 ( 883.532)
branch-misses    102,287,788 ( 0.18%)         1,098,617,314 ( 0.27%)        103,891,040 ( 0.23%)
jobs5 perfstat
instructions     258,711,315,588 ( 0.67)      4,275,657,533,244 ( 2.23)     269,332,235,685 ( 1.08)
branches         69,802,821,166 ( 588.823)    507,996,211,252 ( 797.036)    55,450,846,129 ( 735.095)
branch-misses    129,217,214 ( 0.19%)         1,243,284,991 ( 0.24%)        173,512,278 ( 0.31%)
jobs6 perfstat
instructions     312,796,166,008 ( 0.61)      5,133,896,344,660 ( 2.02)     323,658,769,588 ( 1.04)
branches         84,372,488,583 ( 520.541)    610,310,494,402 ( 697.642)    66,683,292,992 ( 693.939)
branch-misses    159,438,978 ( 0.19%)         1,396,368,563 ( 0.23%)        174,406,934 ( 0.26%)
jobs7 perfstat
instructions     363,211,372,930 ( 0.56)      5,988,205,600,879 ( 1.75)     377,824,674,156 ( 0.93)
branches         98,057,013,765 ( 463.117)    711,841,255,974 ( 598.762)    77,879,009,954 ( 600.443)
branch-misses    199,513,153 ( 0.20%)         1,507,651,077 ( 0.21%)        248,203,369 ( 0.32%)
jobs8 perfstat
instructions     413,960,354,615 ( 0.52)      6,842,918,558,378 ( 1.45)     431,938,486,581 ( 0.83)
branches         111,812,574,884 ( 414.224)   813,299,084,518 ( 491.173)    89,062,699,827 ( 517.795)
branch-misses    233,584,845 ( 0.21%)         1,531,593,921 ( 0.19%)        286,818,489 ( 0.32%)
jobs9 perfstat
instructions     465,976,220,300 ( 0.53)      7,698,467,237,372 ( 1.47)     486,352,600,321 ( 0.84)
branches         125,931,456,162 ( 424.063)   915,207,005,715 ( 498.192)    100,370,404,090 ( 517.439)
branch-misses    256,992,445 ( 0.20%)         1,782,809,816 ( 0.19%)        345,239,380 ( 0.34%)
jobs10 perfstat
instructions     517,406,372,715 ( 0.53)      8,553,527,312,900 ( 1.48)     540,732,653,094 ( 0.84)
branches         139,839,780,676 ( 427.732)   1,016,737,699,389 ( 503.172)  111,696,557,638 ( 516.750)
branch-misses    259,595,561 ( 0.19%)         1,952,570,279 ( 0.19%)        357,818,661 ( 0.32%)

seconds elapsed    20.630411534    96.084546565     12.743373571
seconds elapsed    22.292627625    100.984155001    14.407413560
seconds elapsed    22.396016966    110.344880848    14.032201392
seconds elapsed    22.517330949    113.351459170    14.243074935
seconds elapsed    28.548305104    156.515193765    19.159286861
seconds elapsed    30.453538116    164.559937678    19.362492717
seconds elapsed    33.467108086    188.486827481    21.492612173
seconds elapsed    35.617727591    209.602677783    23.256422492
seconds elapsed    42.584239509    243.959902566    28.458540338
seconds elapsed    47.683632526    269.635248851    31.542404137

Overall, ZSTD has slower WRITE, but much faster READ (perhaps a static compression buffer used during the test helped ZSTD a lot), which results in faster test results.

Memory consumption (zram mm_stat file):

zram LZO mm_stat
mm_stat (jobs1):  2147483648 23068672 33558528 0 33558528 0 0
mm_stat (jobs2):  2147483648 23068672 33558528 0 33558528 0 0
mm_stat (jobs3):  2147483648 23068672 33558528 0 33562624 0 0
mm_stat (jobs4):  2147483648 23068672 33558528 0 33558528 0 0
mm_stat (jobs5):  2147483648 23068672 33558528 0 33558528 0 0
mm_stat (jobs6):  2147483648 23068672 33558528 0 33562624 0 0
mm_stat (jobs7):  2147483648 23068672 33558528 0 33566720 0 0
mm_stat (jobs8):  2147483648 23068672 33558528 0 33558528 0 0
mm_stat (jobs9):  2147483648 23068672 33558528 0 33558528 0 0
mm_stat (jobs10): 2147483648 23068672 33558528 0 33562624 0 0

zram DEFLATE mm_stat
mm_stat (jobs1):  2147483648 16252928 25178112 0 25178112 0 0
mm_stat (jobs2):  2147483648 16252928 25178112 0 25178112 0 0
mm_stat (jobs3):  2147483648 16252928 25178112 0 25178112 0 0
mm_stat (jobs4):  2147483648 16252928 25178112 0 25178112 0 0
mm_stat (jobs5):  2147483648 16252928 25178112 0 25178112 0 0
mm_stat (jobs6):  2147483648 16252928 25178112 0 25178112 0 0
mm_stat (jobs7):  2147483648 16252928 25178112 0 25190400 0 0
mm_stat (jobs8):  2147483648 16252928 25178112 0 25190400 0 0
mm_stat (jobs9):  2147483648 16252928 25178112 0 25178112 0 0
mm_stat (jobs10): 2147483648 16252928 25178112 0 25178112 0 0

zram ZSTD mm_stat
mm_stat (jobs1):  2147483648 11010048 16781312 0 16781312 0 0
mm_stat (jobs2):  2147483648 11010048 16781312 0 16781312 0 0
mm_stat (jobs3):  2147483648 11010048 16781312 0 16785408 0 0
mm_stat (jobs4):  2147483648 11010048 16781312 0 16781312 0 0
mm_stat (jobs5):  2147483648 11010048 16781312 0 16781312 0 0
mm_stat (jobs6):  2147483648 11010048 16781312 0 16781312 0 0
mm_stat (jobs7):  2147483648 11010048 16781312 0 16781312 0 0
mm_stat (jobs8):  2147483648 11010048 16781312 0 16781312 0 0
mm_stat (jobs9):  2147483648 11010048 16781312 0 16785408 0 0
mm_stat (jobs10): 2147483648 11010048 16781312 0 16781312 0 0

==================================================================================

Official benchmarks [1]:

Compressor name     Ratio   Compression   Decompress.
zstd 1.1.3 -1       2.877   430 MB/s      1110 MB/s
zlib 1.2.8 -1       2.743   110 MB/s      400 MB/s
brotli 0.5.2 -0     2.708   400 MB/s      430 MB/s
quicklz 1.5.0 -1    2.238   550 MB/s      710 MB/s
lzo1x 2.09 -1       2.108   650 MB/s      830 MB/s
lz4 1.7.5           2.101   720 MB/s      3600 MB/s
snappy 1.1.3        2.091   500 MB/s      1650 MB/s
lzf 3.6 -1          2.077   400 MB/s      860 MB/s

Minchan said:

: I did test with my sample data and compared zstd with deflate. zstd's
: compress ratio is lower a little bit but compression speed is much faster
: 3 times more and decompress speed is too 2 times more. With different
: data, it is different but overall, zstd would be better for speed at the
: cost of a little lower compress ratio(about 5%) so I believe it's worth to
: replace deflate.

[1] https://github.com/facebook/zstd

Link: http://lkml.kernel.org/r/20170912050005.3247-1-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Tested-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
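The mm_stat rows in the zstd commit above can be turned into a compression ratio. The following is a minimal user-space sketch, assuming the first three columns are orig_data_size, compr_data_size and mem_used_total in bytes (the column order given in the kernel's zram documentation). The struct and function names are illustrative only and are not part of zram.

```c
#include <stdio.h>

/* First three mm_stat columns, in bytes (names follow the zram docs). */
struct mm_stat_sample {
    unsigned long long orig_data_size;   /* uncompressed size of stored data */
    unsigned long long compr_data_size;  /* size of the data after compression */
    unsigned long long mem_used_total;   /* memory consumed, metadata included */
};

/* Parse the leading columns of one mm_stat line. Returns 0 on success. */
int parse_mm_stat(const char *line, struct mm_stat_sample *out)
{
    int n = sscanf(line, "%llu %llu %llu",
                   &out->orig_data_size,
                   &out->compr_data_size,
                   &out->mem_used_total);
    return n == 3 ? 0 : -1;
}

/* Ratio of original to compressed size; 0.0 if nothing is stored. */
double compression_ratio(const struct mm_stat_sample *s)
{
    if (s->compr_data_size == 0)
        return 0.0;
    return (double)s->orig_data_size / (double)s->compr_data_size;
}
```

Applied to the jobs1 rows, this gives roughly 93:1 for LZO, 132:1 for DEFLATE and 195:1 for ZSTD, which says more about how compressible the fio test data was than about typical workloads.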
2017-11-15 | bdi: introduce BDI_CAP_SYNCHRONOUS_IO | Minchan Kim | 1 | -1/+1
As discussed at https://lkml.kernel.org/r/<20170728165604.10455-1-ross.zwisler@linux.intel.com> someday we will remove rw_page(). If so, we need something to detect such super-fast storage on which synchronous IO operations like the current rw_page are always a win. Introduces BDI_CAP_SYNCHRONOUS_IO to indicate such devices. With it, we could use various optimization techniques. Link: http://lkml.kernel.org/r/1505886205-9671-3-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 | zram: set BDI_CAP_STABLE_WRITES once | Minchan Kim | 1 | -10/+6
With fast swap storage, the platform wants to use swap more aggressively and swap-in is crucial to application latency. The rw_page() based synchronous devices like zram, pmem and btt are such fast storage. When I profile swap-in performance with a zram lz4 decompress test, S/W overhead is more than 70%. It would probably be bigger on nvdimm. This patchset reduces swap-in latency by skipping the swapcache if the swap device is a synchronous device like a rw_page() based device. It improves my swap-in test (5G sequential swap-in, no readahead) by 45%, from 2.41 sec to 1.64 sec. This patch (of 4): Commit 19b7ccf8651d ("block: get rid of blk_integrity_revalidate()") fixed a weird thing (i.e., resetting the BDI_CAP_STABLE_WRITES flag unconditionally whenever revalidate_disk is called), so zram no longer needs to reset the flag when revalidating the bdev. Instead, set the flag just once when the zram device is created. It shouldn't change any behavior. Link: http://lkml.kernel.org/r/1505886205-9671-2-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Hugh Dickins <hughd@google.com> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-02 | License cleanup: add SPDX GPL-2.0 license identifier to files with no license | Greg Kroah-Hartman | 1 | -0/+1
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases:
 - file had no licensing information in it,
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information.

Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5 lines of source
 - File already had some variant of a license header in it (even if <5 lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license identifiers to apply.

 - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became the concluded license(s).

 - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time.

In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there were new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related.

Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files.

In the initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with:
 - a full scancode scan run, collecting the matched texts, detected license ids and scores
 - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct

This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified.

These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-03 | zram: fix null dereference of handle | Minchan Kim | 1 | -24/+12
In testing I found that the handle passed to zs_map_object in __zram_bvec_read is NULL, so the kernel oopses in pin_object(). The reason is that there is no routine to check whether the slot has been freed after getting the slot's lock. This patch fixes it. [minchan@kernel.org: v2] Link: http://lkml.kernel.org/r/1505887347-10881-1-git-send-email-minchan@kernel.org Link: http://lkml.kernel.org/r/1505788488-26723-1-git-send-email-minchan@kernel.org Fixes: 1f7319c74275 ("zram: partial IO refactoring") Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 | drivers/block/zram/zram_drv.c: convert to using memset_l | Matthew Wilcox | 1 | -11/+2
zram was the motivation for creating memset_l(). Minchan Kim sees a 7% performance improvement on x86 with 100MB of non-zero deduplicatable data: perf stat -r 10 dd if=/dev/zram0 of=/dev/null vanilla: 0.232050465 seconds time elapsed ( +- 0.51% ) memset_l: 0.217219387 seconds time elapsed ( +- 0.07% ) Link: http://lkml.kernel.org/r/20170720184539.31609-7-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: David Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
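To illustrate the kind of fill that memset_l() performs, here is a minimal user-space sketch. It assumes a 4096-byte page and uses a plain loop as a stand-in for the kernel helper; fill_ulong() and fill_same_page() are hypothetical names for this example, not the functions in zram_drv.c.

```c
#include <stddef.h>

#define PAGE_BYTES 4096  /* assumed page size for this sketch */

/*
 * Portable stand-in for the kernel's memset_l(): store `value` into
 * `count` consecutive unsigned long slots starting at `dst`.
 */
void *fill_ulong(unsigned long *dst, unsigned long value, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = value;
    return dst;
}

/* Rebuild a page whose every word is the same value from that one word. */
void fill_same_page(unsigned long *page, unsigned long value)
{
    fill_ulong(page, value, PAGE_BYTES / sizeof(unsigned long));
}
```

The benefit reported in the commit comes from replacing an open-coded loop like the one above with an architecture-optimized word fill; the sketch only shows the semantics, not the optimization.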
2017-09-07 | Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block | Linus Torvalds | 1 | -4/+5
Pull block layer updates from Jens Axboe:
"This is the first pull request for 4.14, containing most of the code changes. It's a quiet series this round, which I think we needed after the churn of the last few series. This contains:
 - Fix for a registration race in loop, from Anton Volkov.
 - Overflow complaint fix from Arnd for DAC960.
 - Series of drbd changes from the usual suspects.
 - Conversion of the stec/skd driver to blk-mq. From Bart.
 - A few BFQ improvements/fixes from Paolo.
 - CFQ improvement from Ritesh, allowing idling for group idle.
 - A few fixes found by Dan's smatch, courtesy of Dan.
 - A warning fixup for a race between changing the IO scheduler and device removal. From David Jeffery.
 - A few nbd fixes from Josef.
 - Support for cgroup info in blktrace, from Shaohua.
 - Also from Shaohua, new features in the null_blk driver to allow it to actually hold data, among other things.
 - Various corner cases and error handling fixes from Weiping Zhang.
 - Improvements to the IO stats tracking for blk-mq from me. Can drastically improve performance for fast devices and/or big machines.
 - Series from Christoph removing bi_bdev as being needed for IO submission, in preparation for nvme multipathing code.
 - Series from Bart, including various cleanups and fixes for switch fall through case complaints"
* 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
  kernfs: checking for IS_ERR() instead of NULL
  drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
  drbd: Fix allyesconfig build, fix recent commit
  drbd: switch from kmalloc() to kmalloc_array()
  drbd: abort drbd_start_resync if there is no connection
  drbd: move global variables to drbd namespace and make some static
  drbd: rename "usermode_helper" to "drbd_usermode_helper"
  drbd: fix race between handshake and admin disconnect/down
  drbd: fix potential deadlock when trying to detach during handshake
  drbd: A single dot should be put into a sequence.
  drbd: fix rmmod cleanup, remove _all_ debugfs entries
  drbd: Use setup_timer() instead of init_timer() to simplify the code.
  drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
  drbd: new disk-option disable-write-same
  drbd: Fix resource role for newly created resources in events2
  drbd: mark symbols static where possible
  drbd: Send P_NEG_ACK upon write error in protocol != C
  drbd: add explicit plugging when submitting batches
  drbd: change list_for_each_safe to while(list_first_entry_or_null)
  drbd: introduce drbd_recv_header_maybe_unplug
  ...
2017-09-06 | block, THP: make block_device_operations.rw_page support THP | Huang Ying | 1 | -0/+2
The .rw_page in struct block_device_operations is used by the swap subsystem to read/write the page contents from/into the corresponding swap slot in the swap device. To support the THP (Transparent Huge Page) swap optimization, .rw_page is enhanced to read/write a THP where possible. Link: http://lkml.kernel.org/r/20170724051840.2309-6-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Ross Zwisler <ross.zwisler@intel.com> [for brd.c, zram_drv.c, pmem.c] Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Vishal L Verma <vishal.l.verma@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Shaohua Li <shli@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 | zram: add config and doc file for writeback feature | Minchan Kim | 1 | -0/+12
This patch adds documentation and a Kconfig option for the writeback feature. Link: http://lkml.kernel.org/r/1498459987-24562-10-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Juneho Choi <juno.choi@lge.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 | zram: read page from backing device | Minchan Kim | 1 | -5/+118
This patch enables read IO from the backing device. For the feature, it implements two IO read functions to transfer data from backing storage: one asynchronous and the other synchronous. The reason a synchronous IO function is needed is partial writes, which must complete the read IO before overwriting the partial data. We could make the partial IO case asynchronous too, but at the moment I don't want to add more complexity to support such rare use cases, so I'm keeping it simple. [xieyisheng1@huawei.com: read_from_bdev_async(): return 1 to avoid call page_endio() in zram_rw_page()] Link: http://lkml.kernel.org/r/1502707447-6944-1-git-send-email-xieyisheng1@huawei.com Link: http://lkml.kernel.org/r/1498459987-24562-9-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com> Cc: Juneho Choi <juno.choi@lge.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 | zram: write incompressible pages to backing device | Minchan Kim | 2 | -13/+103
This patch enables write IO to transfer data to the backing device. For that, it implements the write_to_bdev function, which creates a new bio and chains it with the parent bio to make the parent bio asynchronous. For rw_page, which doesn't have a parent bio, it submits its own bio and handles IO completion in zram_page_end_io. Also, this patch defines a new flag, ZRAM_WB, to mark written pages for later read IO. [xieyisheng1@huawei.com: fix typo in comment] Link: http://lkml.kernel.org/r/1502707447-6944-2-git-send-email-xieyisheng1@huawei.com Link: http://lkml.kernel.org/r/1498459987-24562-8-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com> Cc: Juneho Choi <juno.choi@lge.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 | zram: identify asynchronous IO's return value | Minchan Kim | 1 | -8/+24
For upcoming asynchronous IO like writeback, zram_rw_page should be aware of whether the requested IO was completed, submitted successfully, or failed. To that end, zram_bvec_rw has three return values:
  -errno: returns error number
  0: IO request is done synchronously
  1: IO request is issued successfully
Link: http://lkml.kernel.org/r/1498459987-24562-7-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Juneho Choi <juno.choi@lge.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
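The three-way return convention described above (-errno, 0, 1) can be sketched in user-space C as follows. The enum and function names are hypothetical and only illustrate how a caller might branch on the value; they are not the zram_drv.c code. The rule that the caller completes the page only for the synchronous case is an assumption drawn from the note in the "read page from backing device" commit about returning 1 to avoid calling page_endio().

```c
#include <errno.h>

/* Outcome categories mirroring the three return values. */
enum io_outcome {
    IO_FAILED,          /* negative errno */
    IO_DONE_SYNC,       /* 0: finished before returning */
    IO_SUBMITTED_ASYNC  /* 1: issued, completion arrives later */
};

/* Classify a return code that follows the -errno / 0 / 1 convention. */
enum io_outcome classify_io_result(int ret)
{
    if (ret < 0)
        return IO_FAILED;
    if (ret == 0)
        return IO_DONE_SYNC;
    return IO_SUBMITTED_ASYNC;
}

/*
 * Assumed caller-side rule: only a synchronously completed request needs
 * the caller to signal completion itself. For an asynchronous request the
 * completion handler does it; for a failure the error is propagated.
 */
int caller_must_complete(int ret)
{
    return classify_io_result(ret) == IO_DONE_SYNC;
}
```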
2017-09-06 | zram: add free space management in backing device | Minchan Kim | 2 | -1/+50
With a backing device, zram needs to manage the free space of the backing device. This patch adds bitmap logic to manage free space, which is very naive. However, it should be simple enough considering the frequency of incompressible pages in zram. Link: http://lkml.kernel.org/r/1498459987-24562-6-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Juneho Choi <juno.choi@lge.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
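A naive bitmap allocator of the kind this commit describes might look like the sketch below. This is a user-space illustration under stated assumptions: the device size NR_BLOCKS, the struct, and both function names are hypothetical, and the linear first-fit scan is a guess at what "very naive" means rather than a copy of the kernel code.

```c
#include <limits.h>
#include <stddef.h>

#define NR_BLOCKS 64  /* hypothetical backing-device size in page-sized blocks */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NR_WORDS ((NR_BLOCKS + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct block_bitmap {
    unsigned long words[NR_WORDS];  /* a set bit means the block is in use */
};

/* Find the first free block, mark it used, and return its index; -1 if full. */
long alloc_block(struct block_bitmap *bm)
{
    for (size_t i = 0; i < NR_BLOCKS; i++) {
        unsigned long mask = 1UL << (i % BITS_PER_WORD);
        unsigned long *word = &bm->words[i / BITS_PER_WORD];

        if (!(*word & mask)) {
            *word |= mask;
            return (long)i;
        }
    }
    return -1;
}

/* Mark a block free again so a later allocation can reuse it. */
void free_block(struct block_bitmap *bm, size_t idx)
{
    bm->words[idx / BITS_PER_WORD] &= ~(1UL << (idx % BITS_PER_WORD));
}
```

A linear scan costs O(n) per allocation, which is acceptable only if writeback of incompressible pages is infrequent, as the commit message argues.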
2017-09-06 | zram: add interface to specify backing device | Minchan Kim | 2 | -0/+147
For the writeback feature, the user should set up the backing device before the zram device starts working. This patch enables the interface via /sys/block/zramX/backing_dev. Currently, it supports block devices only, but it could be enhanced to support files as well. Link: http://lkml.kernel.org/r/1498459987-24562-5-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Juneho Choi <juno.choi@lge.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
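As a rough illustration of using the /sys/block/zramX/backing_dev interface from a program, the sketch below writes a block device path into the attribute. The helper names write_attr() and set_backing_dev() are hypothetical, and any device path passed in is only an example. It assumes a kernel built with the writeback option, root privileges, and that the write happens before the zram device is put to use.

```c
#include <stdio.h>

/* Write `value` to a sysfs-style attribute file. Returns 0 on success. */
int write_attr(const char *attr_path, const char *value)
{
    FILE *f = fopen(attr_path, "w");
    if (!f)
        return -1;

    int ok = fputs(value, f) >= 0;
    /* The kernel may reject the value at flush time, so check fclose(). */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Point zram device number `zram_id` at a backing block device. */
int set_backing_dev(int zram_id, const char *bdev_path)
{
    char attr[64];

    snprintf(attr, sizeof(attr), "/sys/block/zram%d/backing_dev", zram_id);
    return write_attr(attr, bdev_path);
}
```

A call such as set_backing_dev(0, "/dev/sdb1") would correspond to writing that path into /sys/block/zram0/backing_dev; the function returns -1 if the attribute does not exist or the write is refused.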
2017-09-06 | zram: rename zram_decompress_page to __zram_bvec_read | Minchan Kim | 1 | -3/+3
The name zram_decompress_page is not appropriate because the function doesn't decompress if the page was a dedup hit or stored without compression. Use a more abstract term that is consistent with the write path function __zram_bvec_write. Link: http://lkml.kernel.org/r/1498459987-24562-4-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Juneho Choi <juno.choi@lge.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06zram: inline zram_compressMinchan Kim1-42/+22
zram_compress does several things: compression, entry allocation and limit checking. I did it just for readability, but it hurts modularization. :( So this patch removes the zram_compress function and inlines it in __zram_bvec_write for upcoming patches. Link: http://lkml.kernel.org/r/1498459987-24562-3-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Juneho Choi <juno.choi@lge.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06zram: clean up duplicated codes in __zram_bvec_writeMinchan Kim1-33/+23
Patch series "writeback incompressible pages to storage", v1. zRam is useful for saving memory with compressible pages, but sometimes the workload changes and the system ends up with lots of incompressible pages, which is very harmful for zram. This patch series adds a writeback feature to zram: the admin can set up a block device and, with it, zram can save memory by writing out incompressible pages (1/4 comp ratio) once it finds them, instead of keeping them in memory. [1-3] is just clean up and [4-8] is step by step feature enablement. [4-8] is logically not bisectable (i.e., logical unit separation), although I tried to make each patch compile without breaking anything; I think the split makes it easier to review. This patch (of 9): __zram_bvec_write has some duplicated logic for zram metadata handling of same_page|compressed_page. This patch aims to clean it up without behavior change. [xieyisheng1@huawei.com: fix compr_data_size stat] Link: http://lkml.kernel.org/r/1502707447-6944-1-git-send-email-xieyisheng1@huawei.com Link: http://lkml.kernel.org/r/1496019048-27016-1-git-send-email-minchan@kernel.org Link: http://lkml.kernel.org/r/1498459987-24562-2-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Juneho Choi <juno.choi@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10zram: rework copy of compressor name in comp_algorithm_store()Matthias Kaehlcke1-2/+2
comp_algorithm_store() passes the size of the source buffer to strlcpy() instead of the destination buffer size. Make it explicit that the two buffers have the same size and use strcpy() instead of strlcpy(). The latter can be done safely since the function ensures that the string in the source buffer is terminated. Link: http://lkml.kernel.org/r/20170803163350.45245-1-mka@chromium.org Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-09block: pass in queue to inflight accountingJens Axboe1-2/+3
No functional change in this patch, just in preparation for basing the inflight mechanism on the queue in question. Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-10zram: constify attribute_group structures.Arvind Yadav1-1/+1
attribute_groups are not supposed to change at runtime. All functions working with attribute_groups provided by <linux/sysfs.h> work with const attribute_group. So mark the non-const structs as const.

File size before:
   text    data     bss     dec     hex filename
   8293     841       4    9138    23b2 drivers/block/zram/zram_drv.o

File size after adding 'const':
   text    data     bss     dec     hex filename
   8357     777       4    9138    23b2 drivers/block/zram/zram_drv.o

Link: http://lkml.kernel.org/r/65680c1c4d85818f7094cbfa31c91bf28185ba1b.1499061182.git.arvind.yadav.cs@gmail.com Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10zram: use __sysfs_match_string() helperAndy Shevchenko1-6/+4
Use __sysfs_match_string() helper instead of open coded variant. Link: http://lkml.kernel.org/r/20170609120835.22156-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06zram: count same page write as page_storedMinchan Kim1-0/+2
Regardless of whether it is a same page or not, it is certainly written and stored to zram, so we should increase the pages_stored stat. Otherwise, the user can see a zero value via mm_stat although he writes a lot of pages to zram. Link: http://lkml.kernel.org/r/1494834068-27004-1-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-13zram: convert remaining CLASS_ATTR() to CLASS_ATTR_RO()Greg Kroah-Hartman1-1/+1
I missed converting the last zram attribute to CLASS_ATTR_RO() after removing CLASS_ATTR() from the kernel, causing a build breakage. This patch fixes that problem. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-09zram: use class_groups instead of class_attrsGreg Kroah-Hartman1-11/+15
The class_attrs pointer is long deprecated, and is about to be finally removed, so move to use the class_groups pointer instead. Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-03zram: reduce load operation in page_same_filledSangwoo Park1-3/+5
In the page_same_filled function, every element in the page is compared with the value at the next index. The current comparison routine compares the (i)th and (i+1)th values of the page, so two load operations occur for each comparison. But if we store the first value of the page in the 'val' variable and compare the other elements against it, the number of load operations is reduced. This reduces the load operations per page by up to 64 times. Link: http://lkml.kernel.org/r/1488428104-7257-1-git-send-email-sangwoo2.park@lge.com Signed-off-by: Sangwoo Park <sangwoo2.park@lge.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
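The comparison scheme described here, where the first word is cached in 'val' and every other word is compared against it, can be sketched in userspace as follows. SKETCH_PAGE_SIZE and the function name page_same_filled_sketch() are assumptions for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096 /* assumed page size for this sketch */

/*
 * Return true if every word in the page holds the same value, and
 * store that value in *element. Only one load per comparison is
 * needed because the first word is kept in 'val'.
 */
bool page_same_filled_sketch(const unsigned long *page, unsigned long *element)
{
	size_t nwords = SKETCH_PAGE_SIZE / sizeof(*page);
	unsigned long val;

	assert(page != NULL && element != NULL);

	val = page[0];
	for (size_t pos = 1; pos < nwords; pos++) {
		if (page[pos] != val)
			return false;
	}

	*element = val;
	return true;
}
```

A zero-filled page is simply the special case where the stored element is 0.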
2017-05-03zram: use zram_free_page instead of open-codedMinchan Kim1-14/+3
zram_free_page already handles the NULL handle case and same pages, so use it to reduce the chance of errors. (Actually, I made a mistake when I handled the same page feature.) Link: http://lkml.kernel.org/r/1492052365-16169-7-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03zram: introduce zram data accessorMinchan Kim1-11/+22
With element, I sometimes got confused between handle and element access. It might be my fault, but I think it's time to introduce accessors to protect future idiots like me. This patch is just a clean-up patch, so it shouldn't change any behavior. Link: http://lkml.kernel.org/r/1492052365-16169-6-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03zram: remove zram_meta structureMinchan Kim2-117/+78
It's redundant now. Instead, remove it and use zram structure directly. Link: http://lkml.kernel.org/r/1492052365-16169-5-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03zram: use zram_slot_lock instead of raw bit_spin_lock opMinchan Kim1-14/+27
With this clean-up phase, I want to use zram's wrapper function to lock table access which is more consistent with other zram's functions. Link: http://lkml.kernel.org/r/1492052365-16169-4-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03zram: partial IO refactoringMinchan Kim1-153/+184
For architectures with PAGE_SIZE > 4K, zram has supported partial IO. However, the code that mixes normal and partial IO handling is too messy, which makes it error-prone to modify the IO handler functions for upcoming features, so this patch aims to clean up zram's IO handling functions. Link: http://lkml.kernel.org/r/1492052365-16169-3-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03zram: handle multiple pages attached bio's bvecMinchan Kim1-29/+11
Patch series "zram clean up", v2. This patchset aims to clean up zram.
[1] clean up handling of a bvec with multiple pages
[2] clean up partial IO handling
[3-6] clean up zram by using accessors and removing a pointless structure
With [2-6] applied, we save a few hundred bytes and get a huge readability improvement.

x86: 708 byte save
add/remove: 1/1 grow/shrink: 0/11 up/down: 478/-1186 (-708)
function                     old     new   delta
zram_special_page_read         -     478    +478
zram_reset_device            317     314      -3
mem_used_max_store           131     128      -3
compact_store                 96      93      -3
mm_stat_show                 203     197      -6
zram_add                     719     712      -7
zram_slot_free_notify        229     214     -15
zram_make_request            819     803     -16
zram_meta_free               128     111     -17
zram_free_page               180     151     -29
disksize_store               432     361     -71
zram_decompress_page.isra    504       -    -504
zram_bvec_rw                2592    2080    -512
Total: Before=25350773, After=25350065, chg -0.00%

ppc64: 231 byte save
add/remove: 2/0 grow/shrink: 1/9 up/down: 681/-912 (-231)
function                     old     new   delta
zram_special_page_read         -     480    +480
zram_slot_lock                 -     200    +200
vermagic                      39      40      +1
mm_stat_show                 256     248      -8
zram_meta_free               200     184     -16
zram_add                     944     912     -32
zram_free_page               348     308     -40
disksize_store               572     492     -80
zram_decompress_page         664     564    -100
zram_slot_free_notify        292     160    -132
zram_make_request           1132    1000    -132
zram_bvec_rw                2768    2396    -372
Total: Before=17565825, After=17565594, chg -0.00%

This patch (of 6): Johannes Thumshirn reported that the system panics when using the NVMe over Fabrics loopback target with zram. The reason is that zram expects each bvec in a bio to contain a single page, but nvme can attach a huge bulk of pages to the bio's bvec, so zram's index arithmetic can go wrong and the resulting out-of-bounds access panics the system. [1] in mainline solved the problem by limiting max_sectors to SECTORS_PER_PAGE, but it makes zram slow because the bio has to be split for each page. This patch makes zram aware of multiple pages in a bvec so the problem is solved without any regression (i.e., bio split).
[1] 0bc315381fe9, zram: set physical queue limits to avoid array out of bounds accesses Link: http://lkml.kernel.org/r/20170413134057.GA27499@bbox Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: Johannes Thumshirn <jthumshirn@suse.de> Tested-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
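The idea of walking a multi-page bvec in page-sized pieces, rather than assuming one page per bvec, can be sketched as below. The function split_segment() and the SKETCH_PAGE_SIZE constant are hypothetical and only model the chunking arithmetic; they do not reproduce the kernel's bio iteration.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096 /* assumed page size for this sketch */

/*
 * Split a segment of 'len' bytes that starts 'offset' bytes into a
 * page so that no chunk crosses a page boundary. Chunk lengths are
 * written to 'chunks'; the number of chunks produced is returned.
 */
size_t split_segment(size_t offset, size_t len, size_t *chunks, size_t max_chunks)
{
	size_t n = 0;

	assert(offset < SKETCH_PAGE_SIZE);

	while (len > 0 && n < max_chunks) {
		size_t room = SKETCH_PAGE_SIZE - offset; /* bytes left in this page */
		size_t chunk = len < room ? len : room;

		chunks[n++] = chunk;
		len -= chunk;
		offset = (offset + chunk) % SKETCH_PAGE_SIZE;
	}
	return n;
}
```

For example, an 8192-byte segment starting 512 bytes into a page yields chunks of 3584, 4096 and 512 bytes, each of which maps to exactly one page-sized slot.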
2017-05-01Merge branch 'for-4.12/block' of git://git.kernel.dk/linux-blockLinus Torvalds1-5/+8
Pull block layer updates from Jens Axboe:
- Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ was initially a fork of CFQ, but subsequently changed to implement fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant to be used on desktop type single drives, providing good fairness. From Paolo.
- Add Kyber IO scheduler. This is a full multiqueue aware scheduler, using a scalable token based algorithm that throttles IO based on live completion IO stats, similarly to blk-wbt. From Omar.
- A series from Jan, moving users to separately allocated backing devices. This continues the work of separating backing device life times, solving various problems with hot removal.
- A series of updates for lightnvm, mostly from Javier. Includes a 'pblk' target that exposes an open channel SSD as a physical block device.
- A series of fixes and improvements for nbd from Josef.
- A series from Omar, removing queue sharing between devices on mostly legacy drivers. This helps us clean up other bits, if we know that a queue only has a single device backing. This has been overdue for more than a decade.
- Fixes for the blk-stats, and improvements to unify the stats and user windows. This both improves blk-wbt, and enables other users to register a need to receive IO stats for a device. From Omar.
- blk-throttle improvements from Shaohua. This provides a scalable framework for implementing scalable prioritization - particularly for blk-mq, but applicable to any type of block device. The interface is marked experimental for now.
- Bucketized IO stats for IO polling from Stephen Bates. This improves efficiency of polled workloads in the presence of mixed block size IO.
- A few fixes for opal, from Scott.
- A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics. From a variety of folks, mostly Sagi and James Smart.
- A series from Bart, improving our exposed info and capabilities from the blk-mq debugfs support.
- A series from Christoph, cleaning up how we handle WRITE_ZEROES.
- A series from Christoph, cleaning up the block layer handling of how we track errors in a request. On top of being a nice cleanup, it also shrinks the size of struct request a bit.
- Removal of mg_disk and hd (sorry Linus) by Christoph. The former was never used by platforms, and the latter has outlived its usefulness.
- Various little bug fixes and cleanups from a wide variety of folks.

* 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits)
  block: hide badblocks attribute by default
  blk-mq: unify hctx delay_work and run_work
  block: add kblock_mod_delayed_work_on()
  blk-mq: unify hctx delayed_run_work and run_work
  nbd: fix use after free on module unload
  MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler
  blk-mq-sched: alloate reserved tags out of normal pool
  mtip32xx: use runtime tag to initialize command header
  scsi: Implement blk_mq_ops.show_rq()
  blk-mq: Add blk_mq_ops.show_rq()
  blk-mq: Show operation, cmd_flags and rq_flags names
  blk-mq: Make blk_flags_show() callers append a newline character
  blk-mq: Move the "state" debugfs attribute one level down
  blk-mq: Unregister debugfs attributes earlier
  blk-mq: Only unregister hctxs for which registration succeeded
  blk-mq-debugfs: Rename functions for registering and unregistering the mq directory
  blk-mq: Let blk_mq_debugfs_register() look up the queue name
  blk-mq: Register <dev>/queue/mq after having registered <dev>/queue
  ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset
  ide-pm: always pass 0 error to __blk_end_request_all
  ..
2017-04-13zram: do not use copy_page with non-page aligned addressMinchan Kim1-2/+2
copy_page is an optimized memcpy for page-aligned addresses. If it is used with a non-page-aligned address, it can corrupt memory, which means system corruption. With zram, it can happen with 1. 64K architecture 2. partial IO 3. slub debug. Partial IO needs to allocate a page, and zram allocates it via kmalloc. With slub debug, kmalloc(PAGE_SIZE) doesn't return a page-size-aligned address. And finally, copy_page(mem, cmem) corrupts memory. So, this patch changes it to memcpy. Actually, we don't need to change the zram_bvec_write part because zsmalloc returns a page-aligned address in the case of the PAGE_SIZE class, but it's not good to rely on the internals of zsmalloc. Note: When this patch is merged to stable, clear_page should be fixed, too. Unfortunately, recent zram removed it with the "same page merge" feature, so it's hard to backport this patch to the -stable tree. I will handle it when I receive the mail from the stable tree maintainer asking to backport this patch. Fixes: 42e99bd ("zram: optimize memory operations with clear_page()/copy_page()") Link: http://lkml.kernel.org/r/1492042622-12074-2-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-13zram: fix operator precedence to get offsetMinchan Kim1-1/+1
In zram_rw_page, the logic to get the offset is wrong because of operator precedence (i.e., "<<" is higher than "&"). With the wrong offset, zram can corrupt the user's data. This patch fixes it. Fixes: 8c7f01025 ("zram: implement rw_page operation of zram") Link: http://lkml.kernel.org/r/1492042622-12074-1-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
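The precedence problem can be reproduced in a few lines. The sketch below assumes 512-byte sectors and 4K pages (SECTOR_SHIFT of 9, SECTORS_PER_PAGE of 8); the function names are made up for illustration, and the buggy variant is written with explicit parentheses that show how an unparenthesized `a & b << c` is parsed.

```c
#include <assert.h>

#define SECTOR_SHIFT     9 /* 512-byte sectors */
#define SECTORS_PER_PAGE 8 /* assumes 4K pages */

/*
 * How `sector & (SECTORS_PER_PAGE - 1) << SECTOR_SHIFT` is parsed:
 * the shift binds tighter than the bitwise AND, so the mask is
 * shifted before it is applied.
 */
unsigned long offset_buggy(unsigned long sector)
{
	return sector & ((unsigned long)(SECTORS_PER_PAGE - 1) << SECTOR_SHIFT);
}

/* Intended meaning: sector index within the page, converted to bytes. */
unsigned long offset_fixed(unsigned long sector)
{
	unsigned long off = (sector & (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;

	/* A byte offset within a page must be smaller than the page size. */
	assert(off < ((unsigned long)SECTORS_PER_PAGE << SECTOR_SHIFT));
	return off;
}
```

With these assumed constants, sector 13 is the sixth sector of its page, so the byte offset should be 5 * 512 = 2560; the mis-parsed expression yields 0 instead.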
2017-04-08zram: implement REQ_OP_WRITE_ZEROESChristoph Hellwig1-5/+8
Just the same as discard if the block size equals the system page size. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-08zram: set physical queue limits to avoid array out of bounds accessesJohannes Thumshirn1-0/+2
zram can handle at most SECTORS_PER_PAGE sectors in a bio's bvec. When using the NVMe over Fabrics loopback target which potentially sends a huge bulk of pages attached to the bio's bvec this results in a kernel panic because of array out of bounds accesses in zram_decompress_page(). Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-24zram: extend zero pages to same element pageszhouxianrong2-30/+66
The idea is that, without doing more calculations, we extend zero pages to same element pages for zram. A zero page is a special case of a same element page, with a zero element.
1. the test is done under android 7.0
2. start up too many applications circularly
3. sample the zero pages, same pages (non-zero element) and total pages in function page_zero_filled
The result is listed below:

ZERO    SAME    TOTAL
36214   17842   598196

         ZERO/TOTAL    SAME/TOTAL    (ZERO+SAME)/TOTAL   ZERO/SAME
AVERAGE  0.060631909   0.024990816   0.085622726         2.663825038
STDEV    0.00674612    0.005887625   0.009707034         2.115881328
MAX      0.069698422   0.030046087   0.094975336         7.56043956
MIN      0.03959586    0.007332205   0.056055193         1.928985507

From the above data, the benefit is about 2.5% and up to 3% of total swapout pages. The defect of the patch is that when we recover a page from a non-zero element, the operation is inefficient for partial reads. This patch extends zero_page to same_page, so any user who has monitored zero_pages will be surprised if the number increases, but it's not harmful, I believe. [minchan@kernel.org: do not free same element pages in zram_meta_free] Link: http://lkml.kernel.org/r/20170207065741.GA2567@bbox Link: http://lkml.kernel.org/r/1483692145-75357-1-git-send-email-zhouxianrong@huawei.com Link: http://lkml.kernel.org/r/1486307804-27903-1-git-send-email-minchan@kernel.org Signed-off-by: zhouxianrong <zhouxianrong@huawei.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24zram: remove waitqueue for IO doneMinchan Kim2-40/+3
zram_reset_device() waits for ongoing writepage pages to be completed by zram->refcount logic. However, it's pointless because before the reset, we prevent further opening of zram by zram->claim and flush all of pending IO by fsync_bdev so there should be no pending IO at the zram_reset_device(). So let's remove that code which is even broken due to the lack of wake_up elsewhere. Link: http://lkml.kernel.org/r/1485145031-11661-1-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22zram: remove obsolete sysfs attrsSergey Senozhatsky1-99/+2
We had a deprecated_attr_warn() warning for 2 years and now the time has come and we finally can do the cleanup. The plan was as follows:
: per-stat sysfs attributes are considered to be deprecated.
: The basic strategy is:
: -- the existing RW nodes will be downgraded to WO nodes (in linux 4.11)
: -- deprecated RO sysfs nodes will eventually be removed (in linux 4.11)
:
: The list of deprecated attributes can be found here:
: Documentation/ABI/obsolete/sysfs-block-zram
:
: Basically, every attribute that has its own read accessible sysfs
: node (e.g. num_reads) *AND* is accessible via one of the stat files
: (zram<id>/stat or zram<id>/io_stat or zram<id>/mm_stat) is considered
: to be deprecated.
The patch also removes `obsolete/sysfs-block-zram', cleans up `testing/sysfs-block-zram' and tweaks zram.txt files. Link: http://lkml.kernel.org/r/20170118035838.11090-1-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-02zram_drv: update for backing dev info changesJens Axboe1-1/+1
A previous commit made the bdi embedded in the request queue a pointer, but neglected to fixup zram. Fix it up. Fixes: dc3b17cc8bf ("block: Use pointer to backing_dev_info from request_queue") Reported-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-10zram: support BDI_CAP_STABLE_WRITESMinchan Kim1-2/+11
zram has used the per-cpu stream feature since v4.7. It aims to increase the cache hit ratio of the scratch buffer used for compression. The downside of that approach is that zram has to ask for memory for the compressed page in per-cpu context, which requires a stricter gfp flag, so the allocation can fail. If so, it retries the allocation outside of per-cpu context, where it can get memory, then compresses the data again and copies it to that memory. In this scenario, zram assumes the data never changes, but that is not true without stable page support. So, if the data is changed under us, zram can overrun the buffer, which breaks the zsmalloc free object chain, and the system crashes as in the report below. https://bugzilla.suse.com/show_bug.cgi?id=997574 This patch adds BDI_CAP_STABLE_WRITES to zram for declaring "I am block device needing *stable write*". Fixes: da9556a2367c ("zram: user per-cpu compression streams") Link: http://lkml.kernel.org/r/1482366980-3782-4-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Hyeoncheol Lee <cheol.lee@lge.com> Cc: <yjay.kim@lge.com> Cc: Sangseok Lee <sangseok.lee@lge.com> Cc: Hugh Dickins <hughd@google.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: <stable@vger.kernel.org> [4.7+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10zram: revalidate disk under init_lockMinchan Kim1-7/+1
Commit b4c5c60920e3 ("zram: avoid lockdep splat by revalidate_disk") moved revalidate_disk call out of init_lock to avoid lockdep false-positive splat. However, commit 08eee69fcf6b ("zram: remove init_lock in zram_make_request") removed init_lock in IO path so there is no worry about lockdep splat. So, let's restore it. This patch is needed to set BDI_CAP_STABLE_WRITES atomically in next patch. Fixes: da9556a2367c ("zram: user per-cpu compression streams") Link: http://lkml.kernel.org/r/1482366980-3782-3-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Hyeoncheol Lee <cheol.lee@lge.com> Cc: <yjay.kim@lge.com> Cc: Sangseok Lee <sangseok.lee@lge.com> Cc: Hugh Dickins <hughd@google.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: <stable@vger.kernel.org> [4.7+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds3-53/+37
Pull smp hotplug updates from Thomas Gleixner:
"This is the final round of converting the notifier mess to the state machine. The removal of the notifiers and the related infrastructure will happen around rc1, as there are conversions outstanding in other trees.
The whole exercise removed about 2000 lines of code in total and in course of the conversion several dozen bugs got fixed. The new mechanism allows to test almost every hotplug step standalone, so usage sites can exercise all transitions extensively.
There is more room for improvement, like integrating all the pointlessly different architecture mechanisms of synchronizing, setting cpus online etc into the core code"

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  tracing/rb: Init the CPU mask on allocation
  soc/fsl/qbman: Convert to hotplug state machine
  soc/fsl/qbman: Convert to hotplug state machine
  zram: Convert to hotplug state machine
  KVM/PPC/Book3S HV: Convert to hotplug state machine
  arm64/cpuinfo: Convert to hotplug state machine
  arm64/cpuinfo: Make hotplug notifier symmetric
  mm/compaction: Convert to hotplug state machine
  iommu/vt-d: Convert to hotplug state machine
  mm/zswap: Convert pool to hotplug state machine
  mm/zswap: Convert dst-mem to hotplug state machine
  mm/zsmalloc: Convert to hotplug state machine
  mm/vmstat: Convert to hotplug state machine
  mm/vmstat: Avoid on each online CPU loops
  mm/vmstat: Drop get_online_cpus() from init_cpu_node_state/vmstat_cpu_dead()
  tracing/rb: Convert to hotplug state machine
  oprofile/nmi timer: Convert to hotplug state machine
  net/iucv: Use explicit clean up labels in iucv_init()
  x86/pci/amd-bus: Convert to hotplug state machine
  x86/oprofile/nmi: Convert to hotplug state machine
  ...
2016-12-07zram: restrict add/remove attributes to root onlySergey Senozhatsky1-1/+7
zram hot_add sysfs attribute is a very 'special' attribute - reading from it creates a new uninitialized zram device. This file, by a mistake, can be read by a 'normal' user at the moment, while only root must be able to create a new zram device, therefore hot_add attribute must have S_IRUSR mode, not S_IRUGO. [akpm@linux-foundation.org: s/sence/sense/, reflow comment to use 80 cols] Fixes: 6566d1a32bf72 ("zram: add dynamic device add/remove functionality") Link: http://lkml.kernel.org/r/20161205155845.20129-1-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reported-by: Steven Allen <steven@stebalien.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> [4.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-02zram: Convert to hotplug state machineAnna-Maria Gleixner3-53/+37
Install the callbacks via the state machine with multi instance support and let the core invoke the callbacks on the already online CPUs. [bigeasy: wire up the multi instance stuff] Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: rt@linutronix.de Cc: Nitin Gupta <ngupta@vflare.org> Link: http://lkml.kernel.org/r/20161126231350.10321-19-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-30zram: fix unbalanced idr management at hot removalTakashi Iwai1-1/+2
The zram hot removal code calls idr_remove() even when zram_remove() returns an error (typically -EBUSY). This results in a leftover at the device release, eventually leading to a crash when the module is reloaded. As described in the bug report below, the following procedure would cause an Oops with zram:

- provision three zram devices via modprobe zram num_devices=3
- configure a size for each device
  + echo "1G" > /sys/block/$zram_name/disksize
- mkfs and mount zram0 only
- attempt to hot remove all three devices
  + echo 2 > /sys/class/zram-control/hot_remove
  + echo 1 > /sys/class/zram-control/hot_remove
  + echo 0 > /sys/class/zram-control/hot_remove
    - zram0 removal fails with EBUSY, as expected
- unmount zram0
- try zram0 hot remove again
  + echo 0 > /sys/class/zram-control/hot_remove
    - fails with ENODEV (unexpected)
- unload zram kernel module
  + completes successfully
- zram0 device node still exists
- attempt to mount /dev/zram0
  + mount command is killed
  + following BUG is encountered

BUG: unable to handle kernel paging request at ffffffffa0002ba0
IP: get_disk+0x16/0x50
Oops: 0000 [#1] SMP
CPU: 0 PID: 252 Comm: mount Not tainted 4.9.0-rc6 #176
Call Trace:
 exact_lock+0xc/0x20
 kobj_lookup+0xdc/0x160
 get_gendisk+0x2f/0x110
 __blkdev_get+0x10c/0x3c0
 blkdev_get+0x19d/0x2e0
 blkdev_open+0x56/0x70
 do_dentry_open.isra.19+0x1ff/0x310
 vfs_open+0x43/0x60
 path_openat+0x2c9/0xf30
 do_filp_open+0x79/0xd0
 do_sys_open+0x114/0x1e0
 SyS_open+0x19/0x20
 entry_SYSCALL_64_fastpath+0x13/0x94

This patch adds the proper error check in hot_remove_store() not to call idr_remove() unconditionally.
Fixes: 17ec4cd98578 ("zram: don't call idr_remove() from zram_remove()") Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1010970 Link: http://lkml.kernel.org/r/20161121132140.12683-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: David Disseldorp <ddiss@suse.de> Reported-by: David Disseldorp <ddiss@suse.de> Tested-by: David Disseldorp <ddiss@suse.de> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: <stable@vger.kernel.org> [4.4+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
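The fix amounts to dropping the ID from the lookup table only when the device removal itself succeeded. A minimal userspace model of that rule is sketched below; struct fake_dev, struct dev_table, and both functions are hypothetical stand-ins for the zram device, the idr, zram_remove() and hot_remove_store(), not the kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_DEVS 4

/* Hypothetical stand-ins for a zram device and the idr table. */
struct fake_dev { int busy; };
struct dev_table { struct fake_dev *slots[MAX_DEVS]; };

/* Models zram_remove(): refuses to remove a device that is still in use. */
int fake_dev_remove(struct fake_dev *dev)
{
	return dev->busy ? -EBUSY : 0;
}

/* Models the fixed hot-remove path. */
int hot_remove(struct dev_table *t, int id)
{
	struct fake_dev *dev;
	int ret;

	assert(t != NULL);

	if (id < 0 || id >= MAX_DEVS || t->slots[id] == NULL)
		return -ENODEV;

	dev = t->slots[id];
	ret = fake_dev_remove(dev);
	if (ret == 0)
		t->slots[id] = NULL; /* forget the id only after a successful removal */
	return ret;
}
```

If the slot were cleared unconditionally, a failed removal with -EBUSY would leave a live device that can no longer be looked up, which corresponds to the unexpected ENODEV on the second attempt in the reproduction steps.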
2016-08-07block/mm: make bdev_ops->rw_page() take a bool for read/writeJens Axboe1-11/+12
Commit abf545484d31 changed it from an 'rw' flags type to the newer ops based interface, but now we're effectively leaking some bdev internals to the rest of the kernel. Since we only care about whether it's a read or a write at that level, just pass in a bool 'is_write' parameter instead. Then we can also move op_is_write() and friends back under CONFIG_BLOCK protection. Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>