2018-12-18  aio: don't zero entire aio_kiocb in aio_get_req()  (Jens Axboe, 1 file, -2/+7)
It's 192 bytes, fairly substantial. Most items don't need to be cleared, especially not upfront. Clear the ones we do need to clear, and leave the other ones for setup when the iocb is prepared and submitted. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-18  aio: separate out ring reservation from req allocation  (Christoph Hellwig, 1 file, -13/+17)
This is in preparation for certain types of IO not needing a ring reservation. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-18  aio: use assigned completion handler  (Jens Axboe, 1 file, -1/+1)
We know this is a read/write request, but in preparation for having different kinds of those, ensure that we call the assigned handler instead of assuming it's aio_complete_rq(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-17  blk-mq: enable IO poll if .nr_queues of type poll > 0  (Ming Lei, 2 files, -2/+4)
The queue mapping of type poll only exists when set->map[HCTX_TYPE_POLL].nr_queues is bigger than zero, so enhance the constraint by checking .nr_queues of type poll before enabling IO poll. Otherwise, IO races and timeouts can be observed when running block/007. Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
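For reference, the fix reduces to gating the poll flag on the poll map actually having queues; a minimal sketch of the check, with set and q as the tag set and queue (surrounding function omitted):

    /* Only enable polling if the poll map actually has queues. */
    if (set->nr_maps > HCTX_TYPE_POLL &&
        set->map[HCTX_TYPE_POLL].nr_queues)
        blk_queue_flag_set(QUEUE_FLAG_POLL, q);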
2018-12-17  blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight()  (Jens Axboe, 3 files, -10/+10)
There's a single user of this function, dm, and dm just wants to check if IO is inflight, not that it's just allocated. This fixes a hang with srp/002 in blktests with dm, where it tries to suspend but waits for inflight IO to finish first. As it checks for just allocated requests, this fails. Tested-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-17  ima: cleanup the match_token policy code  (Mimi Zohar, 1 file, -5/+5)
Start the policy_tokens and the associated enumeration from zero, simplifying the pt macro. Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-17  security: don't use a negative Opt_err token index  (Linus Torvalds, 2 files, -2/+2)
The code uses a bitmap to check for duplicate tokens during parsing, and that doesn't work at all for the negative Opt_err token case. There is absolutely no reason to make Opt_err be negative, and in fact it only confuses things, since some of the affected functions actually return a positive Opt_xyz enum _or_ a regular negative error code (eg -EINVAL), and using -1 for Opt_err makes no sense. There are similar problems in ima_policy.c and key encryption, but they don't have the immediate bug wrt bitmap handling, and ima_policy.c in particular needs a different patch to make the enum values match the token array index. Mimi is sending that separately. Reported-by: syzbot+a22e0dc07567662c50bc@syzkaller.appspotmail.com Reported-by: Eric Biggers <ebiggers@kernel.org> Fixes: 5208cc83423d ("keys, trusted: fix: *do not* allow duplicate key options") Fixes: 00d60fd3b932 ("KEYS: Provide keyctls to drive the new key type ops for asymmetric keys [ver #2]") Cc: James Morris <jmorris@namei.org> Cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Cc: Peter Huewe <peterhuewe@gmx.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
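A minimal sketch of why a negative token value breaks the duplicate check (simplified enum and parse loop; the real token tables live in the trusted-keys and asymmetric-keys code):

    /* Before: Opt_err == -1 */
    enum { Opt_err = -1, Opt_new, Opt_load };

    unsigned long token_mask = 0;
    int token = match_token(p, key_tokens, args);
    /* test_and_set_bit(-1, ...) indexes outside the bitmap */
    if (test_and_set_bit(token, &token_mask))
        return -EINVAL;

    /* After: all tokens are non-negative and safe as bit indices */
    enum { Opt_err, Opt_new, Opt_load };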
2018-12-17  blk-mq: skip zero-queue maps in blk_mq_map_swqueue  (Ming Lei, 1 file, -0/+3)
Since commit 7e849dd9cf37 ("nvme-pci: don't share queue maps"), the mapping table won't actually be initialized if map->nr_queues is zero, so we can't use blk_mq_map_queue_type() to retrieve the hctx any more. This may still cause a broken mapping; fix it by skipping zero-queue maps in blk_mq_map_swqueue(). Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-17  block: fix blk-iolatency accounting underflow  (Dennis Zhou, 3 files, -1/+7)
The blk-iolatency controller measures the time from rq_qos_throttle() to rq_qos_done_bio() and attributes this time to the first bio that needs to create the request. This means if a bio is plug-mergeable or bio-mergeable, it gets to bypass the blk-iolatency controller. The recent series [1], which tags all bios with blkgs, undermined how iolatency was determining which bios it was charging and should process in rq_qos_done_bio(). Because all bios are being tagged, this caused the atomic_t for the struct rq_wait inflight count to underflow, resulting in a stall. This patch adds a new flag BIO_TRACKED to let controllers know that a bio is going through the rq_qos path. blk-iolatency now checks if this flag is set to see if it should process the bio in rq_qos_done_bio(). Overloading BLK_QUEUE_ENTERED works, but makes the flag rules confusing. BIO_THROTTLED was another candidate, but the flag is set for all bios that have gone through blk-throttle code. Overloading a flag comes with the burden of making sure that when either implementation changes, a change in setting rules for one doesn't cause a bug in the other. So here, we unfortunately opt for adding a new flag. [1] https://lore.kernel.org/lkml/20181205171039.73066-1-dennis@kernel.org/ Fixes: 5cdf2e3fea5e ("blkcg: associate blkg when associating a device") Signed-off-by: Dennis Zhou <dennis@kernel.org> Cc: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
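A hedged sketch of how the new flag is used (names taken from the commit message; the exact call sites are assumptions):

    /* rq_qos_throttle(): mark bios that enter the rq_qos path */
    bio_set_flag(bio, BIO_TRACKED);

    /* blk-iolatency completion: only charge bios we actually throttled,
     * so the rq_wait inflight count cannot underflow */
    if (!bio_flagged(bio, BIO_TRACKED))
        return;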
2018-12-17  blk-mq: fix dispatch from sw queue  (Ming Lei, 4 files, -45/+68)
When a request is added to the rq list of a sw queue (ctx), the rq may be from a different type of hctx, especially after multiple queue mapping is introduced. So when dispatching a request from a sw queue via blk_mq_flush_busy_ctxs() or blk_mq_dequeue_from_ctx(), a request belonging to a different queue type's hctx could be dispatched to the current hctx when the read or poll queue is enabled. This patch fixes the issue by introducing per-queue-type lists. Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Changed by me to not use separately cacheline-aligned lists; just place them all in the same cacheline where we had just the one list and lock before. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-17  block: mq-deadline: Fix write completion handling  (Damien Le Moal, 3 files, -2/+14)
For a zoned block device using mq-deadline, if a write request for a zone is received while another write was already dispatched for the same zone, dd_dispatch_request() will return NULL and the newly inserted write request is kept in the scheduler queue waiting for the ongoing zone write to complete. With this behavior, when no other request has been dispatched, rq_list in blk_mq_sched_dispatch_requests() is empty and blk_mq_sched_mark_restart_hctx() is not called. This in turn leads to the blk_mq_sched_restart() call from __blk_mq_free_request() not running the queue when the already dispatched write request completes. The newly inserted write request stays stuck in the scheduler queue until eventually another request is submitted. This problem does not affect SCSI disks, as the SCSI stack handles queue restart on request completion. However, the problem can be triggered with the null_blk driver with zoned mode enabled. Fix this by always requesting a queue restart in dd_dispatch_request() if no request was dispatched while WRITE requests are queued. Fixes: 5700f69178e9 ("mq-deadline: Introduce zone locking support") Cc: <stable@vger.kernel.org> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Add missing export of blk_mq_sched_restart() Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-17  nvme-pci: don't share queue maps  (Christoph Hellwig, 1 file, -5/+1)
Now that the block layer checks if a queue map has any queues inside it, there is no more reason to duplicate the maps for the non-default types. Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-17  blk-mq: only dispatch to non-default queue maps if they have queues  (Christoph Hellwig, 1 file, -5/+8)
We should check if a given queue map actually has queues enabled before dispatching to it. This allows drivers to not initialize optional but unused map types, which will subsequently allow fixing problems with queue map rebuilds for that case. Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-17  blk-mq: export hctx->type in debugfs instead of sysfs  (Ming Lei, 2 files, -17/+16)
Now we only export hctx->type via sysfs, and there isn't such info in the hctx entry under debugfs. We often use debugfs to diagnose queue mapping issues, so add the support in debugfs. Queue mapping becomes a bit more complicated after multiple queue mapping is supported; we may write blktests cases to verify that a queue mapping is valid based on blk-mq-debugfs. Given it isn't necessary to export hctx->type twice, remove the export from sysfs. Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-17  blk-mq: fix allocation for queue mapping table  (Ming Lei, 1 file, -1/+1)
The type of each element in the queue mapping table is 'unsigned int', instead of 'struct blk_mq_queue_map', so fix it. Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-16  blk-wbt: export internal state via debugfs  (Ming Lei, 1 file, -0/+91)
This information is helpful either to investigate issues or to understand wbt's internal behaviour. Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-16  blk-mq-debugfs: support rq_qos  (Ming Lei, 5 files, -0/+98)
blk-mq-debugfs has proved very helpful for debugging some tough issues, such as IO hangs. We have seen blk-wbt-related IO hangs several times; even inside Red Hat Bugzilla there are such reports not solved yet, so this patch adds debugfs support for rq_qos. Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-16  block: update sysfs documentation  (Damien Le Moal, 2 files, -3/+38)
Add descriptions of the zoned, nr_zones and chunk_sectors sysfs queue attributes to Documentation/block/queue-sysfs.txt. The descriptions of the zoned and chunk_sectors attributes are mostly copied from ABI/testing/sysfs-block (with a typo fixed). While at it, also fix a typo in the description of the io_poll_delay attribute. The nr_zones description is also added to ABI/testing/sysfs-block, and the contact email address is updated for the zoned attribute. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-16  Linux 4.20-rc7  (Linus Torvalds, 1 file, -1/+1)
2018-12-16  block: loop: check error using IS_ERR instead of IS_ERR_OR_NULL in loop_add()  (Chengguang Xu, 1 file, -1/+1)
blk_mq_init_queue() will not return a NULL pointer to its caller, so it's better to replace IS_ERR_OR_NULL with IS_ERR in loop_add(). If things change in the future such that a NULL pointer is checked inside loop_add(), we should return -ENOMEM as the error code instead of PTR_ERR(NULL). Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
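The resulting check in loop_add() looks roughly like this (error-label name is illustrative):

    lo->lo_queue = blk_mq_init_queue(&lo->tag_set);
    if (IS_ERR(lo->lo_queue)) {
        err = PTR_ERR(lo->lo_queue);
        goto out_cleanup_tags;
    }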
2018-12-16  aoe: add __exit annotation  (Chengguang Xu, 1 file, -1/+1)
Add an __exit annotation to the cleanup helper, which is only called once in the module. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
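In sketch form (the teardown helper named here is hypothetical):

    /* __exit lets the kernel discard this code when the driver is
     * built in, since module unload is then impossible. */
    static void __exit aoe_exit(void)
    {
        aoe_cleanup();    /* hypothetical teardown helper */
    }
    module_exit(aoe_exit);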
2018-12-16  block: clear REQ_HIPRI if polling is not supported  (Christoph Hellwig, 1 file, -0/+3)
This prevents a HIPRI bio from being submitted through a stacking driver that does not support polling and thus won't poll for I/O completion. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
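A minimal sketch of the check in the submission path (placement in the generic request checks is an assumption based on the description):

    /* Clear REQ_HIPRI if the queue will never poll for completions,
     * e.g. a stacking driver that does not support polling. */
    if ((bio->bi_opf & REQ_HIPRI) &&
        !test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
        bio->bi_opf &= ~REQ_HIPRI;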
2018-12-16  blk-mq: replace and kill blk_mq_request_issue_directly  (Jianchao Wang, 3 files, -11/+8)
Replace blk_mq_request_issue_directly with blk_mq_try_issue_directly in blk_insert_cloned_request and kill it as nobody uses it any more. Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-16  blk-mq: issue directly with bypass 'false' in blk_mq_sched_insert_requests  (Jianchao Wang, 2 files, -16/+12)
It is not necessary to issue requests directly with bypass 'true' in blk_mq_sched_insert_requests and handle the non-issued requests ourselves. Just set bypass to 'false' and let blk_mq_try_issue_directly handle them fully. Remove the blk_rq_can_direct_dispatch check, because blk_mq_try_issue_directly can handle it well. If a request is direct-issued unsuccessfully, insert the rest. Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-16  blk-mq: refactor the code for issuing requests directly  (Jianchao Wang, 1 file, -49/+54)
Merge blk_mq_try_issue_directly and __blk_mq_try_issue_directly into one interface to unify the interfaces for issuing requests directly. The merged interface takes over the requests totally: it can insert, end, or do nothing, based on the return value of .queue_rq and the 'bypass' parameter. The caller then needs no other handling, and the code can be cleaned up. Also, commit c616cbee ("blk-mq: punt failed direct issue to dispatch list") always inserts requests into the hctx dispatch list whenever it gets a BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE; this is overkill and will harm merging. We only need to do that for requests that have been through .queue_rq. This patch also fixes that. Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-16  block: remove the bio_integrity_advance export  (Christoph Hellwig, 1 file, -1/+0)
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-16  block: remove the bioset_integrity_free export  (Christoph Hellwig, 1 file, -1/+0)
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-14  scripts/spdxcheck.py: always open files in binary mode  (Thierry Reding, 1 file, -2/+4)
The spdxcheck script currently falls over when confronted with a binary file (such as Documentation/logo.gif). To avoid that, always open files in binary mode and decode line-by-line, ignoring encoding errors. One tricky case is when piping data into the script and reading it from standard input. By default, standard input will be opened in text mode, so we need to reopen it in binary mode. The breakage only happens with python3 and results in a UnicodeDecodeError (according to Uwe). Link: http://lkml.kernel.org/r/20181212131210.28024-1-thierry.reding@gmail.com Fixes: 6f4d29df66ac ("scripts/spdxcheck.py: make python3 compliant") Signed-off-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Jeremy Cline <jcline@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joe Perches <joe@perches.com> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-14  checkstack.pl: fix for aarch64  (Qian Cai, 1 file, -2/+2)
There is actually a space after "sp,", like this:

    ffff2000080813c8: a9bb7bfd stp x29, x30, [sp, #-80]!

Right now, checkstack.pl isn't able to print anything on aarch64, because its regex is missing this space and so can't match the starting objdump line of a function. Hence, it displays every stack as zero-size. After this patch, checkstack.pl is able to match the start of a function's objdump output, and is then able to calculate each function's stack correctly. Link: http://lkml.kernel.org/r/20181207195843.38528-1-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-14  userfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered  (Andrea Arcangeli, 1 file, -1/+2)
Calling UFFDIO_UNREGISTER on virtual ranges not yet registered in uffd could trigger a harmless false positive WARN_ON. Check that the vma is already registered before checking VM_MAYWRITE to shut off the false positive warning. Link: http://lkml.kernel.org/r/20181206212028.18726-2-aarcange@redhat.com Cc: <stable@vger.kernel.org> Fixes: 29ec90660d68 ("userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: syzbot+06c7092e7d71218a2c16@syzkaller.appspotmail.com Acked-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-14  fs/iomap.c: get/put the page in iomap_page_create/release()  (Piotr Jaroszynski, 1 file, -0/+7)
migrate_page_move_mapping() expects pages with private data set to have a page_count elevated by 1. This is what used to happen for xfs through the buffer_heads code before the switch to iomap in commit 82cb14175e7d ("xfs: add support for sub-pagesize writeback without buffer_heads"). Not having the count elevated causes move_pages() to fail on memory mapped files coming from xfs. Make iomap compatible with the migrate_page_move_mapping() assumption by elevating the page count as part of iomap_page_create() and lowering it in iomap_page_release(). Without this, the move_pages() syscall misbehaves on memory mapped files from xfs: it does not move any pages, which I suppose is "just" a perf issue, but it also ends up returning a positive number, which is out of spec for the syscall. Talking to Michal Hocko, it sounds like returning positive numbers might be a necessary update to move_pages() anyway, though (https://lkml.kernel.org/r/20181116114955.GJ14706@dhcp22.suse.cz). I only hit this in tests that verify that move_pages() actually moved the pages. The test also got confused by the positive return from move_pages() (it got treated as a success, as positive numbers were not expected and not handled), making it a bit harder to track down what's going on. Link: http://lkml.kernel.org/r/20181115184140.1388751-1-pjaroszynski@nvidia.com Fixes: 82cb14175e7d ("xfs: add support for sub-pagesize writeback without buffer_heads") Signed-off-by: Piotr Jaroszynski <pjaroszynski@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Brian Foster <bfoster@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
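The fix reduces to pairing the private-data setup with a page reference; a hedged sketch of the two sides (iop is the per-page iomap structure, allocation details elided):

    /* iomap_page_create(): take the extra reference that
     * migrate_page_move_mapping() expects on pages with private data */
    get_page(page);
    set_page_private(page, (unsigned long)iop);
    SetPagePrivate(page);

    /* iomap_page_release(): tear down in reverse */
    ClearPagePrivate(page);
    set_page_private(page, 0);
    put_page(page);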
2018-12-14  hugetlbfs: call VM_BUG_ON_PAGE earlier in free_huge_page()  (Yongkai Wu, 1 file, -2/+3)
A stack trace was triggered by VM_BUG_ON_PAGE(page_mapcount(page), page) in free_huge_page(). Unfortunately, the page->mapping field was set to NULL before this test. This made it more difficult to determine the root cause of the problem. Move the VM_BUG_ON_PAGE tests earlier in the function so that if they do trigger, more information is present in the page struct. Link: http://lkml.kernel.org/r/1543491843-23438-1-git-send-email-nic_w@163.com Signed-off-by: Yongkai Wu <nic_w@163.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
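In sketch form, the reordering inside free_huge_page() (the exact set of cleared fields is an assumption):

    /* Check while page->mapping etc. are still populated... */
    VM_BUG_ON_PAGE(page_count(page), page);
    VM_BUG_ON_PAGE(page_mapcount(page), page);

    /* ...and only then tear the fields down. */
    set_page_private(page, 0);
    page->mapping = NULL;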
2018-12-14  memblock: annotate memblock_is_reserved() with __init_memblock  (Yueyi Li, 1 file, -1/+1)
Found warning:

    WARNING: EXPORT symbol "gsi_write_channel_scratch" [vmlinux] version generation failed, symbol will not be versioned.
    WARNING: vmlinux.o(.text+0x1e0a0): Section mismatch in reference from the function valid_phys_addr_range() to the function .init.text:memblock_is_reserved()

The function valid_phys_addr_range() references the function __init memblock_is_reserved(). This is often because valid_phys_addr_range lacks a __init annotation or the annotation of memblock_is_reserved is wrong. Use __init_memblock instead of __init. Link: http://lkml.kernel.org/r/BLUPR13MB02893411BF12EACB61888E80DFAE0@BLUPR13MB0289.namprd13.prod.outlook.com Signed-off-by: Yueyi Li <liyueyi@live.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
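The annotated function is a one-liner; a sketch of the result (the body shown follows the usual memblock_search() pattern and is assumed here):

    bool __init_memblock memblock_is_reserved(phys_addr_t addr)
    {
        return memblock_search(&memblock.reserved, addr) != -1;
    }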
2018-12-14  psi: fix reference to kernel commandline enable  (Baruch Siach, 1 file, -2/+2)
The kernel commandline parameter named in CONFIG_PSI_DEFAULT_DISABLED help text contradicts the documentation in kernel-parameters.txt, and the code. Fix that. Link: http://lkml.kernel.org/r/20181203213416.GA12627@cmpxchg.org Fixes: e0c274472d ("psi: make disabling/enabling easier for vendor kernels") Signed-off-by: Baruch Siach <baruch@tkos.co.il> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-14  arch/sh/include/asm/io.h: provide prototypes for PCI I/O mapping in asm/io.h  (Mark Brown, 1 file, -0/+1)
Most architectures provide prototypes for the PCI I/O mapping operations when asm/io.h is included but SH doesn't currently do that, leading to for example warnings in sound/pci/hda/patch_ca0132.c when pci_iomap() is used on current -next. Make SH more consistent with other architectures by including asm-generic/pci_iomap.h in asm/io.h. Link: http://lkml.kernel.org/r/20181106175142.27988-1-broonie@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Reported-by: kbuild test robot <lkp@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-14  mm/sparse: add common helper to mark all memblocks present  (Logan Gunthorpe, 2 files, -0/+22)
Presently the arches arm64, arm and sh have a function which loops through each memblock and calls memory_present(). riscv will require a similar function. Introduce a common memblocks_present() function that can be used by all the arches. Subsequent patches will clean up the arches that make use of this. Link: http://lkml.kernel.org/r/20181107205433.3875-3-logang@deltatee.com Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
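A plausible shape for the common helper, built from the standard memblock iteration macros (a sketch, not a quote of the patch):

    void __init memblocks_present(void)
    {
        struct memblock_region *reg;

        for_each_memblock(memory, reg)
            memory_present(memblock_get_region_node(reg),
                           memblock_region_memory_base_pfn(reg),
                           memblock_region_memory_end_pfn(reg));
    }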
2018-12-14  mm: introduce common STRUCT_PAGE_MAX_SHIFT define  (Logan Gunthorpe, 4 files, -17/+6)
This define is used by arm64 to calculate the size of the vmemmap region. It is defined as the log2 of the upper bound on the size of a struct page. We move it into mm_types.h so it can be defined properly instead of set and checked with a build bug. This also allows us to use the same define for riscv. Link: http://lkml.kernel.org/r/20181107205433.3875-2-logang@deltatee.com Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-14  alpha: fix hang caused by the bootmem removal  (Mike Rapoport, 2 files, -3/+4)
The conversion of alpha to memblock as the early memory manager caused boot to hang as described at [1]. The issue is caused because, for the CONFIG_DISCONTIGMEM=y case, memblock_add() is called using a memory start PFN that had been rounded down to the nearest 8MB, and this caused memblock to see more memory than is actually present in the system. Besides, memblock allocates memory from high addresses while bootmem was using low memory, which broke the assumption that early allocations are always accessible by the hardware. This patch ensures that memblock_add() uses the correct PFN for the memory start and forces memblock to use bottom-up allocations. [1] https://lkml.org/lkml/2018/11/22/1032 Link: http://lkml.kernel.org/r/1543233216-25833-1-git-send-email-rppt@linux.ibm.com Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Tested-by: Meelis Roos <mroos@linux.ee> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-14  block: remove the unused bio_set_pages_dirty and bio_check_pages_dirty exports  (Christoph Hellwig, 1 file, -2/+0)
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-14  block: remove the unused bio_iov_iter_get_pages export  (Christoph Hellwig, 1 file, -1/+0)
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-14  block: remove the blk_recount_segments export  (Christoph Hellwig, 1 file, -1/+0)
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-14  block: remove the bio_phys_segments export  (Christoph Hellwig, 1 file, -2/+1)
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-13  nvme: fix kernel paging oops  (Sagi Grimberg, 1 file, -1/+1)
Free the controller discard_page correctly. Fixes: cb5b7262b011 ("nvme: provide fallback for discard alloc failure") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-13  XArray: Fix xa_alloc when id exceeds max  (Matthew Wilcox, 2 files, -5/+36)
Specifying a starting ID greater than the maximum ID isn't something attempted very often, but it should fail. It was succeeding due to xas_find_marked() returning the wrong error state, so add tests for both xa_alloc() and xas_find_marked(). Fixes: b803b42823d0 ("xarray: Add XArray iterators") Signed-off-by: Matthew Wilcox <willy@infradead.org>
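A hedged sketch of the failing case, using the xa_alloc() signature of this era (starting ID passed via *id; the exact errno returned is an assumption, so only success/failure is checked):

    DEFINE_XARRAY_ALLOC(xa);
    u32 id = 10;
    int err;

    /* start (10) already exceeds max (5): this must fail, not succeed */
    err = xa_alloc(&xa, &id, 5, xa_mk_value(1), GFP_KERNEL);
    WARN_ON(err == 0);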
2018-12-13  bcache: print number of keys in trace_bcache_journal_write  (Guoju Fang, 2 files, -4/+25)
Sometimes journal flushes may be very frequent, so it's useful to dump the number of keys every time the journal is written. Signed-off-by: Guoju Fang <fangguoju@gmail.com> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-13  bcache: set writeback_percent in a flexible range  (Coly Li, 1 file, -1/+2)
CUTOFF_WRITEBACK is defined as 40, so before the changes for dynamic cutoff writeback values, writeback_percent was limited to [0, CUTOFF_WRITEBACK]. Any value larger than CUTOFF_WRITEBACK would be fixed up to 40. Now the cutoff writeback limit is a dynamic value, bch_cutoff_writeback, so the range of writeback_percent can be the more flexible range [0, bch_cutoff_writeback]. The flexibility is that it can be extended to a larger or smaller range than [0, 40], depending on how the value bch_cutoff_writeback is specified. The default value is still strongly recommended for most users and most workloads. But people who want to do research on bcache writeback performance tuning may have the chance to specify a more flexible writeback_percent in the range [0, 70]. Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-13  bcache: make cutoff_writeback and cutoff_writeback_sync tunable  (Coly Li, 3 files, -2/+55)
Currently the cutoff writeback and cutoff writeback sync thresholds are defined by CUTOFF_WRITEBACK (40) and CUTOFF_WRITEBACK_SYNC (70) as static values. Most of the time they work fine, but when people want to do research on bcache writeback mode performance tuning, there is no way to modify the soft and hard cutoff writeback values. This patch introduces two module parameters, bch_cutoff_writeback_sync and bch_cutoff_writeback, which permit people to tune the values when loading bcache.ko. If they are not specified at module load time, the current values CUTOFF_WRITEBACK_SYNC and CUTOFF_WRITEBACK will be used as defaults and nothing changes. When people want to tune these two values:
- cutoff_writeback can be set in range [1, 70]
- cutoff_writeback_sync can be set in range [1, 90]
- cutoff_writeback is always <= cutoff_writeback_sync
The default values are strongly recommended for most users and most workloads. Anyway, if people want to take their own risk to do research on new writeback cutoff tuning for their own workload, now they can. Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
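The tunables are plain module parameters; a sketch of the shape (variable names from the commit message; permissions and descriptions assumed):

    static unsigned int bch_cutoff_writeback = CUTOFF_WRITEBACK;
    static unsigned int bch_cutoff_writeback_sync = CUTOFF_WRITEBACK_SYNC;

    module_param_named(cutoff_writeback, bch_cutoff_writeback, uint, 0444);
    MODULE_PARM_DESC(cutoff_writeback, "soft threshold for writeback cutoff");
    module_param_named(cutoff_writeback_sync, bch_cutoff_writeback_sync, uint, 0444);
    MODULE_PARM_DESC(cutoff_writeback_sync, "hard threshold for writeback cutoff");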
2018-12-13  bcache: add MODULE_DESCRIPTION information  (Coly Li, 1 file, -3/+4)
This patch moves MODULE_AUTHOR and MODULE_LICENSE to the end of super.c, and adds MODULE_DESCRIPTION("Bcache: a Linux block layer cache"). This is preparation for adding module parameters. Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-13  bcache: option to automatically run gc thread after writeback  (Coly Li, 4 files, -0/+52)
The option gc_after_writeback is disabled by default, because garbage collection will discard SSD data, which drops cached data. Echo 1 into /sys/fs/bcache/<UUID>/internal/gc_after_writeback to enable this option, which wakes up the gc thread when writeback is accomplished and all cached data is clean. This option is helpful for people who care more about write performance. In a heavy write workload, all cached data being clean only happens when the writeback thread cleans all cached data during I/O idle time. In such a situation a following gc run may help to shrink the bcache B+ tree and discard more clean data, which may be helpful for future write requests. If you are not sure whether this is helpful for your own workload, please leave it disabled by default. Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-13  bcache: introduce force_wake_up_gc()  (Coly Li, 2 files, -15/+20)
The garbage collection thread starts to work when c->sectors_to_gc is a negative value; otherwise nothing will happen even if the gc thread is woken up by wake_up_gc(). force_wake_up_gc() sets c->sectors_to_gc to -1 before calling wake_up_gc(), so the gc thread has a chance to run if no one else sets c->sectors_to_gc to a positive value before gc_should_run(). This routine can be called wherever the gc thread needs to be woken up and forced to run. Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
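In sketch form, this mirrors the description directly (sectors_to_gc is an atomic_t in struct cache_set):

    static inline void force_wake_up_gc(struct cache_set *c)
    {
        /* gc only runs while sectors_to_gc < 0, so force it negative
         * before waking the thread */
        atomic_set(&c->sectors_to_gc, -1);
        wake_up_gc(c);
    }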