aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/block/nbd.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-02-27nbd: fix return value in error handling pathGustavo A. R. Silva1-1/+1
It seems that the proper value to return in this particular case is the one contained into variable new_index instead of ret. Addresses-Coverity-ID: 1465148 ("Copy-paste error") Fixes: e46c7287b1c2 ("nbd: add a basic netlink interface") Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-14Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-blockLinus Torvalds1-16/+10
Pull core block layer updates from Jens Axboe: "This is the main pull request for block storage for 4.15-rc1. Nothing out of the ordinary in here, and no API changes or anything like that. Just various new features for drivers, core changes, etc. In particular, this pull request contains: - A patch series from Bart, closing the whole on blk/scsi-mq queue quescing. - A series from Christoph, building towards hidden gendisks (for multipath) and ability to move bio chains around. - NVMe - Support for native multipath for NVMe (Christoph). - Userspace notifications for AENs (Keith). - Command side-effects support (Keith). - SGL support (Chaitanya Kulkarni) - FC fixes and improvements (James Smart) - Lots of fixes and tweaks (Various) - bcache - New maintainer (Michael Lyle) - Writeback control improvements (Michael) - Various fixes (Coly, Elena, Eric, Liang, et al) - lightnvm updates, mostly centered around the pblk interface (Javier, Hans, and Rakesh). - Removal of unused bio/bvec kmap atomic interfaces (me, Christoph) - Writeback series that fix the much discussed hundreds of millions of sync-all units. This goes all the way, as discussed previously (me). - Fix for missing wakeup on writeback timer adjustments (Yafang Shao). - Fix laptop mode on blk-mq (me). - {mq,name} tupple lookup for IO schedulers, allowing us to have alias names. This means you can use 'deadline' on both !mq and on mq (where it's called mq-deadline). (me). - blktrace race fix, oopsing on sg load (me). - blk-mq optimizations (me). - Obscure waitqueue race fix for kyber (Omar). - NBD fixes (Josef). - Disable writeback throttling by default on bfq, like we do on cfq (Luca Miccio). - Series from Ming that enable us to treat flush requests on blk-mq like any other request. This is a really nice cleanup. - Series from Ming that improves merging on blk-mq with schedulers, getting us closer to flipping the switch on scsi-mq again. - BFQ updates (Paolo). - blk-mq atomic flags memory ordering fixes (Peter Z). - Loop cgroup support (Shaohua). - Lots of minor fixes from lots of different folks, both for core and driver code" * 'for-4.15/block' of git://git.kernel.dk/linux-block: (294 commits) nvme: fix visibility of "uuid" ns attribute blk-mq: fixup some comment typos and lengths ide: ide-atapi: fix compile error with defining macro DEBUG blk-mq: improve tag waiting setup for non-shared tags brd: remove unused brd_mutex blk-mq: only run the hardware queue if IO is pending block: avoid null pointer dereference on null disk fs: guard_bio_eod() needs to consider partitions xtensa/simdisk: fix compile error nvme: expose subsys attribute to sysfs nvme: create 'slaves' and 'holders' entries for hidden controllers block: create 'slaves' and 'holders' entries for hidden gendisks nvme: also expose the namespace identification sysfs files for mpath nodes nvme: implement multipath access to nvme subsystems nvme: track shared namespaces nvme: introduce a nvme_ns_ids structure nvme: track subsystems block, nvme: Introduce blk_mq_req_flags_t block, scsi: Make SCSI quiesce and resume work reliably block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag ...
2017-11-06nbd: don't start req until after the dead connection logicJosef Bacik1-13/+7
We can end up sleeping for a while waiting for the dead timeout, which means we could get the per request timer to fire. We did handle this case, but if the dead timeout happened right after we submitted we'd either tear down the connection or possibly requeue as we're handling an error and race with the endio which can lead to panics and other hilarity. Fixes: 560bc4b39952 ("nbd: handle dead connections") Cc: stable@vger.kernel.org Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-06nbd: wait uninterruptible for the dead timeoutJosef Bacik1-3/+3
If we have a pending signal or the user kills their application then it'll bring down the whole device, which is less than awesome. Instead wait uninterruptible for the dead timeout so we're sure we gave it our best shot. Fixes: 560bc4b39952 ("nbd: handle dead connections") Cc: stable@vger.kernel.org Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-24nbd: handle interrupted sendmsg with a sndtimeo setJosef Bacik1-2/+11
If you do not set sk_sndtimeo you will get -ERESTARTSYS if there is a pending signal when you enter sendmsg, which we handle properly. However if you set a timeout for your commands we'll set sk_sndtimeo to that timeout, which means that sendmsg will start returning -EINTR instead of -ERESTARTSYS. Fix this by checking either cases and doing the correct thing. Cc: stable@vger.kernel.org Fixes: dc88e34d69d8 ("nbd: set sk->sk_sndtimeo for our sockets") Reported-and-tested-by: Daniel Xu <dlxu@fb.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-09nbd: don't set the device size until we're connectedJosef Bacik1-1/+1
A user reported a regression with using the normal ioctl interface on newer kernels. This happens because I was setting the device size before the device was actually connected, which caused us to error out and close everything down. This didn't happen on netlink because we hold the device lock the whole time we're setting things up, but we don't do that for the ioctl path. This fixes the problem. Cc: stable@vger.kernel.org Fixes: 29eaadc ("nbd: stop using the bdev everywhere") Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-02nbd: fix -ERESTARTSYS handlingJosef Bacik1-1/+5
Christoph made it so that if we return'ed BLK_STS_RESOURCE whenever we got ERESTARTSYS from sending our packets we'd return BLK_STS_OK, which means we'd never requeue and just hang. We really need to return the right value from the upper layer. Fixes: fc17b6534eb8 ("blk-mq: switch ->queue_rq return value to blk_status_t") Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25nbd: ignore non-nbd ioctl'sJosef Bacik1-0/+6
In testing we noticed that nbd would spew if you ran a fio job against the raw device itself. This is because fio calls a block device specific ioctl, however the block layer will first pass this back to the driver ioctl handler in case the driver wants to do something special. Since the device was setup using netlink this caused us to spew every time fio called this ioctl. Since we don't have special handling, just error out for any non-nbd specific ioctl's that come in. This fixes the spew. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-28nbd: make device_attribute constBhumika Goyal1-1/+1
Make this const as is is only passed as an argument to the function device_create_file and device_remove_file and the corresponding arguments are of type const. Done using Coccinelle Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-17nbd: change the default nbd partitionsJosef Bacik1-2/+2
There's no reason to have partitions disabled for nbd by default, it costs us nothing to have it enabled and is just confusing/obnoxious to users who try to use partitions with nbd. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-17nbd: allow device creation at a specific indexJosef Bacik1-0/+9
If users really want to use a particular index for their nbd device and it doesn't already exist there's no reason we can't just create it for them. Do this instead of erroring out. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-25nbd: clear disconnected on reconnectJosef Bacik1-0/+2
If our device loses its connection for longer than the dead timeout we will set NBD_DISCONNECTED in order to quickly fail any pending IO's that flood in after the IO's that were waiting during the dead timer. However if we re-connect at some point in the future we'll still see this DISCONNECTED flag set if we then lose our connection again after that, which means we won't get notifications for our newly lost connections. Fix this by just clearing the DISCONNECTED flag on reconnect in order to make sure everything works as it's supposed to. Reported-by: Dan Melnic <dmm@fb.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-22nbd: only set sndtimeo if we have a timeout setJosef Bacik1-2/+5
A user reported that he was getting immediate disconnects with my sndtimeo patch applied. This is because by default the OSS nbd client doesn't set a timeout, so we end up setting the sndtimeo to 0, which of course means we have send errors a lot. Instead only set our sndtimeo if the user specified a timeout, otherwise we'll just wait forever like we did previously. Fixes: dc88e34d69d8 ("nbd: set sk->sk_sndtimeo for our sockets") Reported-by: Adam Borowski <kilobyte@angband.pl> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-22nbd: take tx_lock before disconnectingJosef Bacik1-0/+4
We need to take the tx_lock so we don't interleave our disconnect request between real data going down the wire. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-22nbd: allow multiple disconnects to be sentJosef Bacik1-3/+2
There's no reason to limit ourselves to one disconnect message per socket. Sometimes networks do strange things, might as well let sysadmins hit the panic button as much as they want. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-13nbd: kill unused ret in recv_workKefeng Wang1-2/+0
No need to return value in queue work, kill ret variable. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-06nbd: quiesce request queues to make sure no submissions are inflightSagi Grimberg1-2/+2
Unlike blk_mq_stop_hw_queues, blk_mq_quiesce_queue respects the submission path rcu grace. quiesce the queue before iterating on live tags. Reviewed-by: Ming Lei <ming.lei@redhat.com> Acked-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2017-06-12Merge tag 'v4.12-rc5' into for-4.13/blockJens Axboe1-10/+5
We've already got a few conflicts and upcoming work depends on some of the changes that have gone into mainline as regression fixes for this series. Pull in 4.12-rc5 to resolve these conflicts and make it easier on down stream trees to continue working on 4.13 changes. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-09blk-mq: switch ->queue_rq return value to blk_status_tChristoph Hellwig1-8/+4
Use the same values for use for request completion errors as the return value from ->queue_rq. BLK_STS_RESOURCE is special cased to cause a requeue, and all the others are completed as-is. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09block: introduce new block status code typeChristoph Hellwig1-7/+7
Currently we use nornal Linux errno values in the block layer, and while we accept any error a few have overloaded magic meanings. This patch instead introduces a new blk_status_t value that holds block layer specific status codes and explicitly explains their meaning. Helpers to convert from and to the previous special meanings are provided for now, but I suspect we want to get rid of them in the long run - those drivers that have a errno input (e.g. networking) usually get errnos that don't know about the special block layer overloads, and similarly returning them to userspace will usually return somethings that strictly speaking isn't correct for file system operations, but that's left as an exercise for later. For now the set of errors is a very limited set that closely corresponds to the previous overloaded errno values, but there is some low hanging fruite to improve it. blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse typechecking, so that we can easily catch places passing the wrong values. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09nbd: set sk->sk_sndtimeo for our socketsJosef Bacik1-0/+2
If the nbd server stops receiving packets altogether we will get stuck waiting for them to receive indefinitely as the tcp buffer will never empty, which looks like a deadlock. Fix this by setting the sk send timeout to our configured timeout, that way if the server really misbehaves we'll disconnect cleanly instead of waiting forever. Reported-by: Dan Melnic <dmm@fb.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-05-30nbd: add FUA op supportShaun McDowell1-3/+13
NBD userland client and server have FUA (forced unit access) support and flags defined. Make NBD kernel module recognize NBD_FLAG_SEND_FUA, enable FUA on the queue, and forward FUA requests to the server. Signed-off-by: Shaun McDowell <shaunjmcdowell@gmail.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-05-30nbd: don't leak nbd_configIlya Dryomov1-0/+1
nbd_config is allocated in nbd_alloc_config(), but never freed. Fixes: 5ea8d10802ec ("nbd: separate out the config information") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-05-30nbd: nbd_reset() call in nbd_dev_add() is redundantIlya Dryomov1-10/+4
There is nothing to clear -- nbd_device has just been allocated. Fold nbd_reset() into its other caller, nbd_config_put(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-05-08treewide: convert PF_MEMALLOC manipulations to new helpersVlastimil Babka1-3/+4
We now have memalloc_noreclaim_{save,restore} helpers for robust setting and clearing of PF_MEMALLOC. Let's convert the code which was using the generic tsk_restore_flags(). No functional change. [vbabka@suse.cz: in net/core/sock.c the hunk is missing] Link: http://lkml.kernel.org/r/20170405074700.29871-4-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Josef Bacik <jbacik@fb.com> Cc: Lee Duncan <lduncan@suse.com> Cc: Chris Leech <cleech@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Boris Brezillon <boris.brezillon@free-electrons.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Wouter Verhelst <w@uter.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-06Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds1-4/+3
Pull block fixes and updates from Jens Axboe: "Some fixes and followup features/changes that should go in, in this merge window. This contains: - Two fixes for lightnvm from Javier, fixing problems in the new code merge previously in this merge window. - A fix from Jan for the backing device changes, fixing an issue in NFS that causes a failure to mount on certain setups. - A change from Christoph, cleaning up the blk-mq init and exit request paths. - Remove elevator_change(), which is now unused. From Bart. - A fix for queue operation invocation on a dead queue, from Bart. - A series fixing up mtip32xx for blk-mq scheduling, removing a bandaid we previously had in place for this. From me. - A regression fix for this series, fixing a case where we wait on workqueue flushing from an invalid (non-blocking) context. From me. - A fix/optimization from Ming, ensuring that we don't both quiesce and freeze a queue at the same time. - A fix from Peter on lock ordering for CPU hotplug. Not a real problem right now, but will be once the CPU hotplug rework goes in. - A series from Omar, cleaning up out blk-mq debugfs support, and adding support for exporting info from schedulers in debugfs as well. This is really useful in debugging stalls or livelocks. From Omar" * 'for-linus' of git://git.kernel.dk/linux-block: (28 commits) mq-deadline: add debugfs attributes kyber: add debugfs attributes blk-mq-debugfs: allow schedulers to register debugfs attributes blk-mq: untangle debugfs and sysfs blk-mq: move debugfs declarations to a separate header file blk-mq: Do not invoke queue operations on a dead queue blk-mq-debugfs: get rid of a bunch of boilerplate blk-mq-debugfs: rename hw queue directories from <n> to hctx<n> blk-mq-debugfs: don't open code strstrip() blk-mq-debugfs: error on long write to queue "state" file blk-mq-debugfs: clean up flag definitions blk-mq-debugfs: separate flags with | nfs: Fix bdi handling for cloned superblocks block/mq: Cure cpu hotplug lock inversion lightnvm: fix bad back free on error path lightnvm: create cmd before allocating request blk-mq: don't use sync workqueue flushing from drivers mtip32xx: convert internal commands to regular block infrastructure mtip32xx: cleanup internal tag assumptions block: don't call blk_mq_quiesce_queue() after queue is frozen ...
2017-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1-2/+2
Pull networking updates from David Millar: "Here are some highlights from the 2065 networking commits that happened this development cycle: 1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri) 2) Add a generic XDP driver, so that anyone can test XDP even if they lack a networking device whose driver has explicit XDP support (me). 3) Sparc64 now has an eBPF JIT too (me) 4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei Starovoitov) 5) Make netfitler network namespace teardown less expensive (Florian Westphal) 6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana) 7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger) 8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky) 9) Multiqueue support in stmmac driver (Joao Pinto) 10) Remove TCP timewait recycling, it never really could possibly work well in the real world and timestamp randomization really zaps any hint of usability this feature had (Soheil Hassas Yeganeh) 11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay Aleksandrov) 12) Add socket busy poll support to epoll (Sridhar Samudrala) 13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso, and several others) 14) IPSEC hw offload infrastructure (Steffen Klassert)" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits) tipc: refactor function tipc_sk_recv_stream() tipc: refactor function tipc_sk_recvmsg() net: thunderx: Optimize page recycling for XDP net: thunderx: Support for XDP header adjustment net: thunderx: Add support for XDP_TX net: thunderx: Add support for XDP_DROP net: thunderx: Add basic XDP support net: thunderx: Cleanup receive buffer allocation net: thunderx: Optimize CQE_TX handling net: thunderx: Optimize RBDR descriptor handling net: thunderx: Support for page recycling ipx: call ipxitf_put() in ioctl error path net: sched: add helpers to handle extended actions qed*: Fix issues in the ptp filter config implementation. qede: Fix concurrency issue in PTP Tx path processing. stmmac: Add support for SIMATIC IOT2000 platform net: hns: fix ethtool_get_strings overflow in hns driver tcp: fix wraparound issue in tcp_lp bpf, arm64: fix jit branch offset related to ldimm64 bpf, arm64: implement jiting of BPF_XADD ...
2017-05-02blk-mq: update ->init_request and ->exit_request prototypesChristoph Hellwig1-4/+3
Remove the request_idx parameter, which can't be used safely now that we support I/O schedulers with blk-mq. Except for a superflous check in mtip32xx it was unused anyway. Also pass the tag_set instead of just the driver data - this allows drivers to avoid some code duplication in a follow on cleanup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-05-01Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+1
Pull scheduler updates from Ingo Molnar: "The main changes in this cycle were: - another round of rq-clock handling debugging, robustization and fixes - PELT accounting improvements - CPU hotplug related ->cpus_allowed affinity handling fixes all around the tree - ... plus misc fixes, cleanups and updates" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits) sched/x86: Update reschedule warning text crypto: N2 - Replace racy task affinity logic cpufreq/sparc-us2e: Replace racy task affinity logic cpufreq/sparc-us3: Replace racy task affinity logic cpufreq/sh: Replace racy task affinity logic cpufreq/ia64: Replace racy task affinity logic ACPI/processor: Replace racy task affinity logic ACPI/processor: Fix error handling in __acpi_processor_start() sparc/sysfs: Replace racy task affinity logic powerpc/smp: Replace open coded task affinity logic ia64/sn/hwperf: Replace racy task affinity logic ia64/salinfo: Replace racy task affinity logic workqueue: Provide work_on_cpu_safe() ia64/topology: Remove cpus_allowed manipulation sched/fair: Move the PELT constants into a generated header sched/fair: Increase PELT accuracy for small tasks sched/fair: Fix comments sched/Documentation: Add 'sched-pelt' tool sched/fair: Fix corner case in __accumulate_sum() sched/core: Remove 'task' parameter and rename tsk_restore_flags() to current_restore_flags() ...
2017-04-28nbd: fix use after free on module unloadJosef Bacik1-4/+4
list_for_each_entry() isn't super safe if we're freeing the objects while we traverse the list. Also don't bother taking the extra reference, the module refcounting stuff will save us from having anybody messing with the device while we're trying to unload. Reported-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20nbd: set the max segments to USHRT_MAXJosef Bacik1-0/+1
I lack the basic understanding of what segments mean, so we were being limited to 512kib requests even with higher max_sectors sizes set. Setting the maximum number of segments to unlimited allows us to actually have arbitrarily large IO's go through NBD. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20blk-mq: remove the error argument to blk_mq_complete_requestChristoph Hellwig1-2/+2
Now that all drivers that call blk_mq_complete_requests have a ->complete callback we can remove the direct call to blk_mq_end_request, as well as the error argument to blk_mq_complete_request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20nbd: don't use req->errorsChristoph Hellwig1-15/+15
Add a nbd-specific field instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-19nbd: set the max segment size to UINT_MAXJosef Bacik1-0/+1
NBD doesn't care about limiting the segment size, let the user push the largest bio's they want. This allows us to control the request size solely through max_sectors_kb. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: add a flag to destroy an nbd device on disconnectJosef Bacik1-0/+30
For ease of management it would be nice for users to specify that the device node for a nbd device is destroyed once it is disconnected and there are no more users. Add a client flag and enable this operation to happen. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: add device refcountingJosef Bacik1-18/+86
In order to support deleting the device on disconnect we need to refcount the actual nbd_device struct. So add the refcounting framework and change how we free the normal devices at rmmod time so we can catch reference leaks. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: add a status netlink commandJosef Bacik1-0/+108
Allow users to query the status of existing nbd devices. Right now this only returns whether or not the device is connected, but could be extended in the future to include more information. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: handle dead connectionsJosef Bacik1-4/+59
Sometimes we like to upgrade our server without making all of our clients freak out and reconnect. This patch provides a way to specify a dead connection timeout to allow us to pause all requests and wait for new connections to be opened. With this in place I can take down the nbd server for less than the dead connection timeout time and bring it back up and everything resumes gracefully. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: only clear the queue on device teardownJosef Bacik1-1/+3
When running a disconnect torture test I noticed that sometimes we would crash with a negative ref count on our queue. This was because we were ending the same request twice. Turns out we were racing with NBD_CLEAR_SOCK clearing the requests as well as the teardown of the device clearing the requests. So instead make the ioctl only shutdown the sockets and make it so that we only ever run nbd_clear_que from the device teardown. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: multicast dead link notificationsJosef Bacik1-12/+77
Provide a mechanism to notify userspace that there's been a link problem on a NBD device. This will allow userspace to re-establish a connection and provide the new socket to the device without disrupting the device. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: add a reconfigure netlink commandJosef Bacik1-0/+141
We want to be able to reconnect dead connections to existing block devices, so add a reconfigure netlink command. We will also allow users to change their timeout on the fly, but everything else will require a disconnect and reconnect. You won't be able to add more connections either, simply replace dead connections with new more lively connections. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: add a basic netlink interfaceJosef Bacik1-26/+290
The existing ioctl interface for configuring NBD devices is a bit cumbersome and hard to extend. The other problem is we leave a userspace app sitting in it's syscall until the device disconnects, which is less than ideal. This patch introduces a netlink interface for adding and disconnecting nbd devices. This has the benefits of being easily extendable without breaking older userspace applications, and allows us to configure a nbd device without leaving a userspace app sitting waiting for the device to disconnect. With this interface we also gain the ability to configure more devices than are preallocated at insmod time. We also have gained the ability to not specify a particular device and be provided one for us so that userspace doesn't need to find a free device to configure. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: stop using the bdev everywhereJosef Bacik1-55/+37
In preparation for the upcoming netlink interface we need to not rely on already having the bdev for the NBD device we are doing operations on. Instead of passing the bdev around, just use it in places where we know we already have the bdev. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: separate out the config informationJosef Bacik1-174/+263
In order to properly refcount the various aspects of a NBD device we need to separate out the configuration elements of the nbd device. The configuration of a NBD device has a different lifetime from the actual device, so it doesn't make sense to bundle these two concepts. Add a config_refs to keep track of the configuration structure, that way we can be sure that we never access it when we've torn down the device. Add a new nbd_config structure to hold all of the transient configuration information. Finally create this when we open the device so that it is in place when we start to configure the device. This has a nice side-effect of fixing a long standing problem where you could end up with a half-configured nbd device that needed to be "disconnected" in order to be usable again. Now once we close our device the configuration will be discarded. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: handle single path failures gracefullyJosef Bacik1-26/+125
Currently if we have multiple connections and one of them goes down we will tear down the whole device. However there's no reason we need to do this as we could have other connections that are working fine. Deal with this by keeping track of the state of the different connections, and if we lose one we mark it as dead and send all IO destined for that socket to one of the other healthy sockets. Any outstanding requests that were on the dead socket will timeout and be re-submitted properly. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: put socket in error casesJosef Bacik1-2/+7
When adding a new socket we look it up and then try to add it to our configuration. If any of those steps fail we need to make sure we put the socket so we don't leak them. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-11sched/core: Remove 'task' parameter and rename tsk_restore_flags() to current_restore_flags()NeilBrown1-1/+1
It is not safe for one thread to modify the ->flags of another thread as there is no locking that can protect the update. So tsk_restore_flags(), which takes a task pointer and modifies the flags, is an invitation to do the wrong thing. All current users pass "current" as the task, so no developers have accepted that invitation. It would be best to ensure it remains that way. So rename tsk_restore_flags() to current_restore_flags() and don't pass in a task_struct pointer. Always operate on current->flags. Signed-off-by: NeilBrown <neilb@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-08block: remove the discard_zeroes_data flagChristoph Hellwig1-1/+0
Now that we use the proper REQ_OP_WRITE_ZEROES operation everywhere we can kill this hack. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-07Merge branch 'for-linus' into for-4.12/blockJens Axboe1-33/+103
We've added a considerable amount of fixes for stalls and issues with the blk-mq scheduling in the 4.11 series since forking off the for-4.12/block branch. We need to do improvements on top of that for 4.12, so pull in the previous fixes to make our lives easier going forward. Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-31blk-mq: constify struct blk_mq_opsEric Biggers1-1/+1
Constify all instances of blk_mq_ops, as they are never modified. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jens Axboe <axboe@fb.com>