linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2018-05-25	nvme: fix KASAN warning when parsing host nqn	Hannes Reinecke	1	-1/+1
	The host nqn actually is smaller than the space reserved for it, so we should be using strlcpy to keep KASAN happy. Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25	nvmet-loop: use nr_phys_segments when map rq to sgl	Chaitanya Kulkarni	1	-1/+1
	Use blk_rq_nr_phys_segments() instead of blk_rq_payload_bytes() to check if a command contains data to me mapped. This fixes the case where a struct requests contains LBAs, but no data will actually be send, e.g. the pending Write Zeroes support. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25	nvmet-fc: increase LS buffer count per fc port	James Smart	1	-1/+1
	Todays limit on concurrent LS's is very small - 4 buffers. With large subsystem counts or large numbers of initiators connecting, the limit may be exceeded. Raise the LS buffer count to 256. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25	nvmet: add simple file backed ns support	Chaitanya Kulkarni	6	-52/+413
	This patch adds simple file backed namespace support for NVMeOF target. The new file io-cmd-file.c is responsible for handling the code for I/O commands when ns is file backed. Also, we introduce mempools based slow path using sync I/Os for file backed ns to ensure forward progress under reclaim. The old block device based implementation is moved to io-cmd-bdev.c and use a "nvmet_bdev_" symbol prefix. The enable/disable calls are also move into the respective files. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> [hch: updated changelog, fixed double req->ns lookup in bdev case] Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25	nvmet: remove duplicate NULL initialization for req->ns	Chaitanya Kulkarni	5	-12/+1
	Remove the duplicate NULL initialization for req->ns. req->ns is always initialized to NULL in nvmet_req_init(), so there is no need to reset it later on failures unless we have previously assigned a value to it. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25	nvmet: make a few error messages more generic	Chaitanya Kulkarni	1	-2/+2
	"nvmet_check_ctrl_status()" is called from admin-cmd.c along with io-cmd.c, make the error message more generic. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25	nvme-fabrics: allow duplicate connections to the discovery controller	Hannes Reinecke	2	-1/+3
	The whole point of the discovery controller is that it can accept multiple connections. Additionally the cmic field is not even defined for the discovery controller identify page. Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25	nvme-fabrics: centralize discovery controller defaults	Hannes Reinecke	1	-4/+4
	When connecting to the discovery controller we have certain defaults to observe, so centralize them to avoid inconsistencies due to argument ordering. Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25	nvme-fabrics: remove unnecessary controller subnqn validation	James Smart	1	-10/+0
	After creating the nvme controller, nvmf_create_ctrl() validates the newly created subsysnqn vs the one specified by the options. In general, this is an unnecessary check as the Connect message should implicitly ensure this value matches. With the change to the FC transport to do an asynchronous connect for the first association create, the transport will return to nvmf_create_ctrl() before that first association has been established, thus the subnqn will not yet be set. Remove the unnecessary validation. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25	nvme-fc: remove setting DNR on exception conditions	James Smart	1	-10/+0
	Current code will set DNR if the controller is deleting or there is an error during controller init. None of this is necessary. Remove the code that sets DNR Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25	nvme-rdma: stop admin queue before freeing it	Jianchao Wang	1	-4/+6
	For any failure after nvme_rdma_start_queue in nvme_rdma_configure_admin_queue, the admin queue will be freed with the NVME_RDMA_Q_LIVE flag still set. Once nvme_rdma_stop_queue is invoked, that will cause a use-after-free. BUG: KASAN: use-after-free in rdma_disconnect+0x1f/0xe0 [rdma_cm] To fix it, call nvme_rdma_stop_queue for all the failed cases after nvme_rdma_start_queue. Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Suggested-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25	nvme-pci: Fix AER reset handling	Keith Busch	1	-9/+5
	The nvme timeout handling doesn't do anything if the pci channel is offline, which is the case when recovering from PCI error event, so it was a bad idea to sync the controller reset in this state. This patch flushes the reset work in the error_resume callback instead when the channel is back to online. This keeps AER handling serialized and can recover from timeouts. Link: https://bugzilla.kernel.org/show_bug.cgi?id=199757 Fixes: cc1d5e749a2e ("nvme/pci: Sync controller reset for AER slot_reset") Reported-by: Alex Gagniuc <mr.nuke.me@gmail.com> Tested-by: Alex Gagniuc <mr.nuke.me@gmail.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25	nvme-pci: set nvmeq->cq_vector after alloc cq/sq	Jianchao Wang	1	-9/+16
	Set cq_vector after alloc cq/sq, otherwise nvme_suspend_queue will invoke free_irq for it and cause a 'Trying to free already-free IRQ xxx' warning if the create CQ/SQ command times out. Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Reviewed-by: Keith Busch <keith.busch@intel.com> [hch: fixed to pass a s16 and clean up the comment] Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-23	nvme: host: core: fix precedence of ternary operator	Ivan Bornyakov	1	-2/+2
	Ternary operator have lower precedence then bitwise or, so 'cdw10' was calculated wrong. Signed-off-by: Ivan Bornyakov <brnkv.i1@gmail.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Keith Busch <keith.busch@intel.com>
2018-05-23	nvme: fix lockdep warning in nvme_mpath_clear_current_path	Johannes Thumshirn	1	-1/+2
	When running blktest's nvme/005 with a lockdep enabled kernel the test case fails due to the following lockdep splat in dmesg: ============================= WARNING: suspicious RCU usage 4.17.0-rc5 #881 Not tainted ----------------------------- drivers/nvme/host/nvme.h:457 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 3 locks held by kworker/u32:5/1102: #0: (ptrval) ((wq_completion)"nvme-wq"){+.+.}, at: process_one_work+0x152/0x5c0 #1: (ptrval) ((work_completion)(&ctrl->scan_work)){+.+.}, at: process_one_work+0x152/0x5c0 #2: (ptrval) (&subsys->lock#2){+.+.}, at: nvme_ns_remove+0x43/0x1c0 [nvme_core] The only caller of nvme_mpath_clear_current_path() is nvme_ns_remove() which holds the subsys lock so it's likely a false positive, but when using rcu_access_pointer(), we're telling rcu and lockdep that we're only after the pointer falue. Fixes: 32acab3181c7 ("nvme: implement multipath access to nvme subsystems") Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com>
2018-05-22	blkdev_report_zones_ioctl(): Use vmalloc() to allocate large buffers	Bart Van Assche	1	-2/+6
	Avoid that complaints similar to the following appear in the kernel log if the number of zones is sufficiently large: fio: page allocation failure: order:9, mode:0x140c0c0(GFP_KERNEL\|__GFP_COMP\|__GFP_ZERO), nodemask=(null) Call Trace: dump_stack+0x63/0x88 warn_alloc+0xf5/0x190 __alloc_pages_slowpath+0x8f0/0xb0d __alloc_pages_nodemask+0x242/0x260 alloc_pages_current+0x6a/0xb0 kmalloc_order+0x18/0x50 kmalloc_order_trace+0x26/0xb0 __kmalloc+0x20e/0x220 blkdev_report_zones_ioctl+0xa5/0x1a0 blkdev_ioctl+0x1ba/0x930 block_ioctl+0x41/0x50 do_vfs_ioctl+0xaa/0x610 SyS_ioctl+0x79/0x90 do_syscall_64+0x79/0x1b0 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Fixes: 3ed05a987e0f ("blk-zoned: implement ioctls") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Shaun Tancheff <shaun.tancheff@seagate.com> Cc: Damien Le Moal <damien.lemoal@hgst.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Hannes Reinecke <hare@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-22	block/ndb: add WQ_UNBOUND to the knbd-recv workqueue	Dan Melnic	1	-1/+2
	Add WQ_UNBOUND to the knbd-recv workqueue so we're not bound to a single CPU that is selected at device creation time. Signed-off-by: Dan Melnic <dmm@fb.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-22	blk-mq: remove wrong 'unlikely' check	huhai	1	-1/+1
	When dispatch_rq_from_ctx is called, in the vast majority of cases the ctx->rq_list is not empty. Signed-off-by: huhai <huhai@kylinos.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-21	nvme-pci: fix race between poll and IRQ completions	Jens Axboe	1	-4/+11
	If polling completions are racing with the IRQ triggered by a completion, the IRQ handler will find no work and return IRQ_NONE. This can trigger complaints about spurious interrupts: [ 560.169153] irq 630: nobody cared (try booting with the "irqpoll" option) [ 560.175988] CPU: 40 PID: 0 Comm: swapper/40 Not tainted 4.17.0-rc2+ #65 [ 560.175990] Hardware name: Intel Corporation S2600STB/S2600STB, BIOS SE5C620.86B.00.01.0010.010920180151 01/09/2018 [ 560.175991] Call Trace: [ 560.175994] <IRQ> [ 560.176005] dump_stack+0x5c/0x7b [ 560.176010] __report_bad_irq+0x30/0xc0 [ 560.176013] note_interrupt+0x235/0x280 [ 560.176020] handle_irq_event_percpu+0x51/0x70 [ 560.176023] handle_irq_event+0x27/0x50 [ 560.176026] handle_edge_irq+0x6d/0x180 [ 560.176031] handle_irq+0xa5/0x110 [ 560.176036] do_IRQ+0x41/0xc0 [ 560.176042] common_interrupt+0xf/0xf [ 560.176043] </IRQ> [ 560.176050] RIP: 0010:cpuidle_enter_state+0x9b/0x2b0 [ 560.176052] RSP: 0018:ffffa0ed4659fe98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd [ 560.176055] RAX: ffff9527beb20a80 RBX: 000000826caee491 RCX: 000000000000001f [ 560.176056] RDX: 000000826caee491 RSI: 00000000335206ee RDI: 0000000000000000 [ 560.176057] RBP: 0000000000000001 R08: 00000000ffffffff R09: 0000000000000008 [ 560.176059] R10: ffffa0ed4659fe78 R11: 0000000000000001 R12: ffff9527beb29358 [ 560.176060] R13: ffffffffa235d4b8 R14: 0000000000000000 R15: 000000826caed593 [ 560.176065] ? cpuidle_enter_state+0x8b/0x2b0 [ 560.176071] do_idle+0x1f4/0x260 [ 560.176075] cpu_startup_entry+0x6f/0x80 [ 560.176080] start_secondary+0x184/0x1d0 [ 560.176085] secondary_startup_64+0xa5/0xb0 [ 560.176088] handlers: [ 560.178387] [<00000000efb612be>] nvme_irq [nvme] [ 560.183019] Disabling IRQ #630 A previous commit removed ->cqe_seen that was handling this case, but we need to handle this a bit differently due to completions now running outside the queue lock. Return IRQ_HANDLED from the IRQ handler, if the completion ring head was moved since we last saw it. Fixes: 5cb525c8315f ("nvme-pci: handle completions outside of the queue lock") Reported-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Tested-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-18	nvme-pci: drop IRQ disabling on submission queue lock	Jens Axboe	1	-4/+4
	Since we aren't sharing the lock for completions now, we don't have to make it IRQ safe. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-18	nvme-pci: split the nvme queue lock into submission and completion locks	Jens Axboe	1	-21/+23
	This is now feasible. We protect the submission queue ring with ->sq_lock, and the completion side with ->cq_lock. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-18	nvme-pci: handle completions outside of the queue lock	Jens Axboe	1	-42/+45
	Split the completion of events into a two part process: 1) Reap the events inside the queue lock 2) Complete the events outside the queue lock Since we never wrap the queue, we can access it locklessly after we've updated the completion queue head. This patch started off with batching events on the stack, but with this trick we don't have to. Keith Busch <keith.busch@intel.com> came up with that idea. Note that this kills the ->cqe_seen as well. I haven't been able to trigger any ill effects of this. If we do race with polling every so often, it should be rare enough NOT to trigger any issues. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Keith Busch <keith.busch@intel.com> [hch: refactored, restored poll early exit optimization] Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-18	nvme-pci: move ->cq_vector == -1 check outside of ->q_lock	Jens Axboe	1	-5/+13
	We only clear it dynamically in nvme_suspend_queue(). When we do, ensure to do a full flush so that any nvme_queue_rq() invocation will see it. Ideally we'd kill this check completely, but we're using it to flush requests on a dying queue. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-18	nvme-pci: remove cq check after submission	Jens Axboe	1	-2/+0
	We always check the completion queue after submitting, but in my testing this isn't a win even on DRAM/xpoint devices. In some cases it's actually worse. Kill it. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-18	nvme-pci: simplify nvme_cqe_valid	Christoph Hellwig	1	-6/+6
	We always look at the current CQ head and phase, so don't pass these as separate arguments, and rename the function to nvme_cqe_pending. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-18	nvme: mark the result argument to nvme_complete_async_event volatile	Christoph Hellwig	2	-2/+2
	We'll need that in the PCIe driver soon as we'll read it straight off the CQ. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-18	blk-mq: clear hctx->dispatch_from when mappings change	huhai	1	-0/+1
	When the number of hardware queues is changed, the drivers will call blk_mq_update_nr_hw_queues() to remap hardware queues. This changes the ctx mappings, but the current code doesn't clear the ->dispatch_from hint. This can result in dispatch_from pointing to a ctx that isn't mapped to the hctx anymore. Fixes: b347689ffbca ("blk-mq-sched: improve dispatching from sw queue") Signed-off-by: huhai <huhai@kylinos.cn> Reviewed-by: Ming Lei <ming.lei@redhat.com> Moved the placement of the clearing to where we clear other items pertaining to the existing mapping, added Fixes line, and reworded the commit message. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-16	nbd: call nbd_bdev_reset instead of bd_set_size on disconnect	Josef Bacik	1	-1/+1
	We need to make sure we don't just set the size of the bdev to 0 while it's being used by a file system. We have the appropriate check in nbd_bdev_reset, simply use that helper instead. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-16	nbd: fix how we set bd_invalidated	Josef Bacik	1	-4/+3
	bd_invalidated is kind of a pain wrt partitions as it really only triggers the partition rescan if it is set after bd_ops->open() runs, so setting it when we reset the device isn't useful. We also sporadically would still have partitions left over in some disconnect cases, so fix this by always setting bd_invalidated on open if there's no configuration or if we've had a disconnect action happen, that way the partition table gets invalidated and rescanned properly. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-16	nbd: clear_sock on netlink disconnect	Josef Bacik	1	-0/+1
	This is what the ioctl based nbd disconnect does as well. Without this the device will just sit there and wait for the connection to go away (or IO to occur) before the device gets torn down. Instead clear everything up on our end so the configuration goes away as quickly as possible. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-16	nbd: use bd_set_size when updating disk size	Josef Bacik	1	-1/+9
	When we stopped relying on the bdev everywhere I broke updating the block device size on the fly, which ceph relies on. We can't just do set_capacity, we also have to do bd_set_size so things like parted will notice the device size change. Fixes: 29eaadc ("nbd: stop using the bdev everywhere") cc: stable@vger.kernel.org Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-16	nbd: update size when connected	Josef Bacik	1	-0/+2
	I messed up changing the size of an NBD device while it was connected by not actually updating the device or doing the uevent. Fix this by updating everything if we're connected and we change the size. cc: stable@vger.kernel.org Fixes: 639812a ("nbd: don't set the device size until we're connected") Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-16	nbd: fix nbd device deletion	Josef Bacik	1	-1/+4
	This fixes a use after free bug, we shouldn't be doing disk->queue right after we do del_gendisk(disk). Save the queue and do the cleanup after the del_gendisk. Fixes: c6a4759ea0c9 ("nbd: add device refcounting") cc: stable@vger.kernel.org Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-16	block: fix MAINTAINERS email for nbd	Josef Bacik	1	-1/+1
	I've been missing stuff because it's been going into my work email which is a black hole. Update to the email I actually use so I stop missing patches and bug reports. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-16	blk-mq: remove redundant insert case in blk_mq_make_request()	huhai	1	-15/+1
	We can use blk_mq_sched_insert_request() even if we don't have an IO scheduler attached, since that case will end up being exactly the same as what blk_mq_queue_io() was doing now. Signed-off-by: huhai <huhai@kylinos.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-15	Remove jsflash driver	Jens Axboe	4	-708/+0
	Nobody is using it anymore, and it's been abandoned. Since David is fine with removing it, kill it. Suggested-by: Christoph Hellwig <hch@lst.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	block: Add sysfs entry for fua support	Kent Overstreet	1	-0/+11
	Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	block: Export bio check/set pages_dirty	Kent Overstreet	1	-0/+2
	Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	block: Add warning for bi_next not NULL in bio_endio()	Kent Overstreet	2	-1/+10
	Recently found a bug where a driver left bi_next not NULL and then called bio_endio(), and then the submitter of the bio used bio_copy_data() which was treating src and dst as lists of bios. Fixed that bug by splitting out bio_list_copy_data(), but in case other things are depending on bi_next in weird ways, add a warning to help avoid more bugs like that in the future. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	block: Add missing flush_dcache_page() call	Kent Overstreet	1	-0/+2
	Since a bio can point to userspace pages (e.g. direct IO), this is generally necessary. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	block: Split out bio_list_copy_data()	Kent Overstreet	3	-35/+55
	Found a bug (with ASAN) where we were passing a bio to bio_copy_data() with bi_next not NULL, when it should have been - a driver had left bi_next set to something after calling bio_endio(). Since the normal case is only copying single bios, split out bio_list_copy_data() to avoid more bugs like this in the future. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	block: Add bio_copy_data_iter(), zero_fill_bio_iter()	Kent Overstreet	2	-23/+39
	Add versions that take bvec_iter args instead of using bio->bi_iter - to be used by bcachefs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	block: Use bioset_init() for fs_bio_set	Kent Overstreet	4	-8/+7
	Minor optimization - remove a pointer indirection when using fs_bio_set. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	block: Add bioset_init()/bioset_exit()	Kent Overstreet	2	-28/+67
	Similarly to mempool_init()/mempool_exit(), take a pointer indirection out of allocation/freeing by allowing biosets to be embedded in other structs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	block: Convert bio_set to mempool_init()	Kent Overstreet	3	-39/+36
	Minor performance improvement by getting rid of pointer indirections from allocation/freeing fastpaths. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	mempool: Add mempool_init()/mempool_exit()	Kent Overstreet	2	-27/+115
	Allows mempools to be embedded in other structs, getting rid of a pointer indirection from allocation fastpaths. mempool_exit() is safe to call on an uninitialized but zeroed mempool. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	sbitmap: fix race in wait batch accounting	Jens Axboe	1	-10/+25
	If we have multiple callers of sbq_wake_up(), we can end up in a situation where the wait_cnt will continually go more and more negative. Consider the case where our wake batch is 1, hence wait_cnt will start out as 1. wait_cnt == 1 CPU0 CPU1 atomic_dec_return(), cnt == 0 atomic_dec_return(), cnt == -1 cmpxchg(-1, 0) (succeeds) [wait_cnt now 0] cmpxchg(0, 1) (fails) This ends up with wait_cnt being 0, we'll wakeup immediately next time. Going through the same loop as above again, and we'll have wait_cnt -1. For the case where we have a larger wake batch, the only difference is that the starting point will be higher. We'll still end up with continually smaller batch wakeups, which defeats the purpose of the rolling wakeups. Always reset the wait_cnt to the batch value. Then it doesn't matter who wins the race. But ensure that whomever does win the race is the one that increments the ws index and wakes up our batch count, loser gets to call __sbq_wake_up() again to account his wakeups towards the next active wait state index. Fixes: 6c0ca7ae292a ("sbitmap: fix wakeup hang after sbq resize") Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	block: consistently use GFP_NOIO instead of __GFP_NORECLAIM	Christoph Hellwig	8	-15/+16
	Same numerical value (for now at least), but a much better documentation of intent. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	block: use GFP_NOIO instead of __GFP_DIRECT_RECLAIM	Christoph Hellwig	1	-3/+2
	We just can't do I/O when doing block layer requests allocations, so use GFP_NOIO instead of the even more limited __GFP_DIRECT_RECLAIM. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	block: pass an explicit gfp_t to get_request	Christoph Hellwig	2	-11/+7
	blk_old_get_request already has it at hand, and in blk_queue_bio, which is the fast path, it is constant. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>