aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/drivers/block (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-06-05Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds16-0/+4392
Pull rdma updates from Jason Gunthorpe: "A more active cycle than most of the recent past, with a few large, long discussed works this time. The RNBD block driver has been posted for nearly two years now, and flowing through RDMA due to it also introducing a new ULP. The removal of FMR has been a recurring discussion theme for a long time. And the usual smattering of features and bug fixes. Summary: - Various small driver bugs fixes in rxe, mlx5, hfi1, and efa - Continuing driver cleanups in bnxt_re, hns - Big cleanup of mlx5 QP creation flows - More consistent use of src port and flow label when LAG is used and a mlx5 implementation - Additional set of cleanups for IB CM - 'RNBD' network block driver and target. This is a network block RDMA device specific to ionos's cloud environment. It brings strong multipath and resiliency capabilities. - Accelerated IPoIB for HFI1 - QP/WQ/SRQ ioctl migration for uverbs, and support for multiple async fds - Support for exchanging the new IBTA defiend ECE data during RDMA CM exchanges - Removal of the very old and insecure FMR interface from all ULPs and drivers. FRWR should be preferred for at least a decade now" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (247 commits) RDMA/cm: Spurious WARNING triggered in cm_destroy_id() RDMA/mlx5: Return ECE DC support RDMA/mlx5: Don't rely on FW to set zeros in ECE response RDMA/mlx5: Return an error if copy_to_user fails IB/hfi1: Use free_netdev() in hfi1_netdev_free() RDMA/hns: Uninitialized variable in modify_qp_init_to_rtr() RDMA/core: Move and rename trace_cm_id_create() IB/hfi1: Fix hfi1_netdev_rx_init() error handling RDMA: Remove 'max_map_per_fmr' RDMA: Remove 'max_fmr' RDMA/core: Remove FMR device ops RDMA/rdmavt: Remove FMR memory registration RDMA/mthca: Remove FMR support for memory registration RDMA/mlx4: Remove FMR support for memory registration RDMA/i40iw: Remove FMR leftovers RDMA/bnxt_re: Remove FMR leftovers RDMA/mlx5: Remove FMR leftovers RDMA/core: Remove FMR pool API RDMA/rds: Remove FMR support for memory registration RDMA/srp: Remove support for FMR memory registration ...
2020-06-05Merge tag 'powerpc-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds1-1/+0
Pull powerpc updates from Michael Ellerman: - Support for userspace to send requests directly to the on-chip GZIP accelerator on Power9. - Rework of our lockless page table walking (__find_linux_pte()) to make it safe against parallel page table manipulations without relying on an IPI for serialisation. - A series of fixes & enhancements to make our machine check handling more robust. - Lots of plumbing to add support for "prefixed" (64-bit) instructions on Power10. - Support for using huge pages for the linear mapping on 8xx (32-bit). - Remove obsolete Xilinx PPC405/PPC440 support, and an associated sound driver. - Removal of some obsolete 40x platforms and associated cruft. - Initial support for booting on Power10. - Lots of other small features, cleanups & fixes. Thanks to: Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Andrey Abramov, Aneesh Kumar K.V, Balamuruhan S, Bharata B Rao, Bulent Abali, Cédric Le Goater, Chen Zhou, Christian Zigotzky, Christophe JAILLET, Christophe Leroy, Dmitry Torokhov, Emmanuel Nicolet, Erhard F., Gautham R. Shenoy, Geoff Levand, George Spelvin, Greg Kurz, Gustavo A. R. Silva, Gustavo Walbon, Haren Myneni, Hari Bathini, Joel Stanley, Jordan Niethe, Kajol Jain, Kees Cook, Leonardo Bras, Madhavan Srinivasan., Mahesh Salgaonkar, Markus Elfring, Michael Neuling, Michal Simek, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pingfan Liu, Qian Cai, Ram Pai, Raphael Moreira Zinsly, Ravi Bangoria, Sam Bobroff, Sandipan Das, Segher Boessenkool, Stephen Rothwell, Sukadev Bhattiprolu, Tyrel Datwyler, Wolfram Sang, Xiongfeng Wang. * tag 'powerpc-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (299 commits) powerpc/pseries: Make vio and ibmebus initcalls pseries specific cxl: Remove dead Kconfig options powerpc: Add POWER10 architected mode powerpc/dt_cpu_ftrs: Add MMA feature powerpc/dt_cpu_ftrs: Enable Prefixed Instructions powerpc/dt_cpu_ftrs: Advertise support for ISA v3.1 if selected powerpc: Add support for ISA v3.1 powerpc: Add new HWCAP bits powerpc/64s: Don't set FSCR bits in INIT_THREAD powerpc/64s: Save FSCR to init_task.thread.fscr after feature init powerpc/64s: Don't let DT CPU features set FSCR_DSCR powerpc/64s: Don't init FSCR_DSCR in __init_FSCR() powerpc/32s: Fix another build failure with CONFIG_PPC_KUAP_DEBUG powerpc/module_64: Use special stub for _mcount() with -mprofile-kernel powerpc/module_64: Simplify check for -mprofile-kernel ftrace relocations powerpc/module_64: Consolidate ftrace code powerpc/32: Disable KASAN with pages bigger than 16k powerpc/uaccess: Don't set KUEP by default on book3s/32 powerpc/uaccess: Don't set KUAP by default on book3s/32 powerpc/8xx: Reduce time spent in allow_user_access() and friends ...
2020-06-04zcomp: Use ARRAY_SIZE() for backends listAndy Shevchenko1-4/+3
Instead of keeping NULL terminated array switch to use ARRAY_SIZE() which helps to further clean up. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: http://lkml.kernel.org/r/20200508100758.51644-1-andriy.shevchenko@linux.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds4-39/+10
Pull networking updates from David Miller: 1) Allow setting bluetooth L2CAP modes via socket option, from Luiz Augusto von Dentz. 2) Add GSO partial support to igc, from Sasha Neftin. 3) Several cleanups and improvements to r8169 from Heiner Kallweit. 4) Add IF_OPER_TESTING link state and use it when ethtool triggers a device self-test. From Andrew Lunn. 5) Start moving away from custom driver versions, use the globally defined kernel version instead, from Leon Romanovsky. 6) Support GRO vis gro_cells in DSA layer, from Alexander Lobakin. 7) Allow hard IRQ deferral during NAPI, from Eric Dumazet. 8) Add sriov and vf support to hinic, from Luo bin. 9) Support Media Redundancy Protocol (MRP) in the bridging code, from Horatiu Vultur. 10) Support netmap in the nft_nat code, from Pablo Neira Ayuso. 11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina Dubroca. Also add ipv6 support for espintcp. 12) Lots of ReST conversions of the networking documentation, from Mauro Carvalho Chehab. 13) Support configuration of ethtool rxnfc flows in bcmgenet driver, from Doug Berger. 14) Allow to dump cgroup id and filter by it in inet_diag code, from Dmitry Yakunin. 15) Add infrastructure to export netlink attribute policies to userspace, from Johannes Berg. 16) Several optimizations to sch_fq scheduler, from Eric Dumazet. 17) Fallback to the default qdisc if qdisc init fails because otherwise a packet scheduler init failure will make a device inoperative. From Jesper Dangaard Brouer. 18) Several RISCV bpf jit optimizations, from Luke Nelson. 19) Correct the return type of the ->ndo_start_xmit() method in several drivers, it's netdev_tx_t but many drivers were using 'int'. From Yunjian Wang. 20) Add an ethtool interface for PHY master/slave config, from Oleksij Rempel. 21) Add BPF iterators, from Yonghang Song. 22) Add cable test infrastructure, including ethool interfaces, from Andrew Lunn. Marvell PHY driver is the first to support this facility. 23) Remove zero-length arrays all over, from Gustavo A. R. Silva. 24) Calculate and maintain an explicit frame size in XDP, from Jesper Dangaard Brouer. 25) Add CAP_BPF, from Alexei Starovoitov. 26) Support terse dumps in the packet scheduler, from Vlad Buslov. 27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei. 28) Add devm_register_netdev(), from Bartosz Golaszewski. 29) Minimize qdisc resets, from Cong Wang. 30) Get rid of kernel_getsockopt and kernel_setsockopt in order to eliminate set_fs/get_fs calls. From Christoph Hellwig. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits) selftests: net: ip_defrag: ignore EPERM net_failover: fixed rollback in net_failover_open() Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv" Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv" vmxnet3: allow rx flow hash ops only when rss is enabled hinic: add set_channels ethtool_ops support selftests/bpf: Add a default $(CXX) value tools/bpf: Don't use $(COMPILE.c) bpf, selftests: Use bpf_probe_read_kernel s390/bpf: Use bcr 0,%0 as tail call nop filler s390/bpf: Maintain 8-byte stack alignment selftests/bpf: Fix verifier test selftests/bpf: Fix sample_cnt shared between two threads bpf, selftests: Adapt cls_redirect to call csum_level helper bpf: Add csum_level helper for fixing up csum levels bpf: Fix up bpf_skb_adjust_room helper's skb csum setting sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf() crypto/chtls: IPv6 support for inline TLS Crypto/chcr: Fixes a coccinile check error Crypto/chcr: Fixes compilations warnings ...
2020-06-02Merge tag 'vfs-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-3/+3
Pull DAX updates part one from Darrick Wong: "After many years of LKML-wrangling about how to enable programs to query and influence the file data access mode (DAX) when a filesystem resides on storage devices such as persistent memory, Ira Weiny has emerged with a proposed set of standard behaviors that has not been shot down by anyone! We're more or less standardizing on the current XFS behavior and adapting ext4 to do the same. This is the first of a handful pull requests that will make ext4 and XFS present a consistent interface for user programs that care about DAX. We add a statx attribute that programs can check to see if DAX is enabled on a particular file. Then, we update the DAX documentation to spell out the user-visible behaviors that filesystems will guarantee (until the next storage industry shakeup). The on-disk inode flag has been in XFS for a few years now. Summary: - Clean up io_is_direct. - Add a new statx flag to indicate when file data access is being done via DAX (as opposed to the page cache). - Update the documentation for how system administrators and application programmers can take advantage of the (still experimental DAX) feature" Link: https://lore.kernel.org/lkml/20200505002016.1085071-1-ira.weiny@intel.com/ * tag 'vfs-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: Documentation/dax: Update Usage section fs/stat: Define DAX statx attribute fs: Remove unneeded IS_DAX() check in io_is_direct()
2020-06-02Merge tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-blockLinus Torvalds3-387/+468
Pull block driver updates from Jens Axboe: "On top of the core changes, here are the block driver changes for this merge window: - NVMe changes: - NVMe over Fibre Channel protocol updates, which also reach over to drivers/scsi/lpfc (James Smart) - namespace revalidation support on the target (Anthony Iliopoulos) - gcc zero length array fix (Arnd Bergmann) - nvmet cleanups (Chaitanya Kulkarni) - misc cleanups and fixes (me, Keith Busch, Sagi Grimberg) - use a SRQ per completion vector (Max Gurtovoy) - fix handling of runtime changes to the queue count (Weiping Zhang) - t10 protection information support for nvme-rdma and nvmet-rdma (Israel Rukshin and Max Gurtovoy) - target side AEN improvements (Chaitanya Kulkarni) - various fixes and minor improvements all over, icluding the nvme part of the lpfc driver" - Floppy code cleanup series (Willy, Denis) - Floppy contention fix (Jiri) - Loop CONFIGURE support (Martijn) - bcache fixes/improvements (Coly, Joe, Colin) - q->queuedata cleanups (Christoph) - Get rid of ioctl_by_bdev (Christoph, Stefan) - md/raid5 allocation fixes (Coly) - zero length array fixes (Gustavo) - swim3 task state fix (Xu)" * tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-block: (166 commits) bcache: configure the asynchronous registertion to be experimental bcache: asynchronous devices registration bcache: fix refcount underflow in bcache_device_free() bcache: Convert pr_<level> uses to a more typical style bcache: remove redundant variables i and n lpfc: Fix return value in __lpfc_nvme_ls_abort lpfc: fix axchg pointer reference after free and double frees lpfc: Fix pointer checks and comments in LS receive refactoring nvme: set dma alignment to qword nvmet: cleanups the loop in nvmet_async_events_process nvmet: fix memory leak when removing namespaces and controllers concurrently nvmet-rdma: add metadata/T10-PI support nvmet: add metadata support for block devices nvmet: add metadata/T10-PI support nvme: add Metadata Capabilities enumerations nvmet: rename nvmet_check_data_len to nvmet_check_transfer_len nvmet: rename nvmet_rw_len to nvmet_rw_data_len nvmet: add metadata characteristics for a namespace nvme-rdma: add metadata/T10-PI support nvme-rdma: introduce nvme_rdma_sgl structure ...
2020-06-02Merge tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-blockLinus Torvalds8-66/+74
Pull block updates from Jens Axboe: "Core block changes that have been queued up for this release: - Remove dead blk-throttle and blk-wbt code (Guoqing) - Include pid in blktrace note traces (Jan) - Don't spew I/O errors on wouldblock termination (me) - Zone append addition (Johannes, Keith, Damien) - IO accounting improvements (Konstantin, Christoph) - blk-mq hardware map update improvements (Ming) - Scheduler dispatch improvement (Salman) - Inline block encryption support (Satya) - Request map fixes and improvements (Weiping) - blk-iocost tweaks (Tejun) - Fix for timeout failing with error injection (Keith) - Queue re-run fixes (Douglas) - CPU hotplug improvements (Christoph) - Queue entry/exit improvements (Christoph) - Move DMA drain handling to the few drivers that use it (Christoph) - Partition handling cleanups (Christoph)" * tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block: (127 commits) block: mark bio_wouldblock_error() bio with BIO_QUIET blk-wbt: rename __wbt_update_limits to wbt_update_limits blk-wbt: remove wbt_update_limits blk-throttle: remove tg_drain_bios blk-throttle: remove blk_throtl_drain null_blk: force complete for timeout request blk-mq: drain I/O when all CPUs in a hctx are offline blk-mq: add blk_mq_all_tag_iter blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx blk-mq: use BLK_MQ_NO_TAG in more places blk-mq: rename BLK_MQ_TAG_FAIL to BLK_MQ_NO_TAG blk-mq: move more request initialization to blk_mq_rq_ctx_init blk-mq: simplify the blk_mq_get_request calling convention blk-mq: remove the bio argument to ->prepare_request nvme: force complete cancelled requests blk-mq: blk-mq: provide forced completion method block: fix a warning when blkdev.h is included for !CONFIG_BLOCK builds block: blk-crypto-fallback: remove redundant initialization of variable err block: reduce part_stat_lock() scope block: use __this_cpu_add() instead of access by smp_processor_id() ...
2020-06-02Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-4/+2
Merge updates from Andrew Morton: "A few little subsystems and a start of a lot of MM patches. Subsystems affected by this patch series: squashfs, ocfs2, parisc, vfs. With mm subsystems: slab-generic, slub, debug, pagecache, gup, swap, memcg, pagemap, memory-failure, vmalloc, kasan" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (128 commits) kasan: move kasan_report() into report.c mm/mm_init.c: report kasan-tag information stored in page->flags ubsan: entirely disable alignment checks under UBSAN_TRAP kasan: fix clang compilation warning due to stack protector x86/mm: remove vmalloc faulting mm: remove vmalloc_sync_(un)mappings() x86/mm/32: implement arch_sync_kernel_mappings() x86/mm/64: implement arch_sync_kernel_mappings() mm/ioremap: track which page-table levels were modified mm/vmalloc: track which page-table levels were modified mm: add functions to track page directory modifications s390: use __vmalloc_node in stack_alloc powerpc: use __vmalloc_node in alloc_vm_stack arm64: use __vmalloc_node in arch_alloc_vmap_stack mm: remove vmalloc_user_node_flags mm: switch the test_vmalloc module to use __vmalloc_node mm: remove __vmalloc_node_flags_caller mm: remove both instances of __vmalloc_node_flags mm: remove the prot argument to __vmalloc_node mm: remove the pgprot argument to __vmalloc ...
2020-06-02mm: remove the pgprot argument to __vmallocChristoph Hellwig1-3/+1
The pgprot argument to __vmalloc is always PAGE_KERNEL now, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> [hyperv] Acked-by: Gao Xiang <xiang@kernel.org> [erofs] Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Wei Liu <wei.liu@kernel.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-22-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm/writeback: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLENeilBrown1-1/+1
PF_LESS_THROTTLE exists for loop-back nfsd (and a similar need in the loop block driver and callers of prctl(PR_SET_IO_FLUSHER)), where a daemon needs to write to one bdi (the final bdi) in order to free up writes queued to another bdi (the client bdi). The daemon sets PF_LESS_THROTTLE and gets a larger allowance of dirty pages, so that it can still dirty pages after other processses have been throttled. The purpose of this is to avoid deadlock that happen when the PF_LESS_THROTTLE process must write for any dirty pages to be freed, but it is being thottled and cannot write. This approach was designed when all threads were blocked equally, independently on which device they were writing to, or how fast it was. Since that time the writeback algorithm has changed substantially with different threads getting different allowances based on non-trivial heuristics. This means the simple "add 25%" heuristic is no longer reliable. The important issue is not that the daemon needs a *larger* dirty page allowance, but that it needs a *private* dirty page allowance, so that dirty pages for the "client" bdi that it is helping to clear (the bdi for an NFS filesystem or loop block device etc) do not affect the throttling of the daemon writing to the "final" bdi. This patch changes the heuristic so that the task is not throttled when the bdi it is writing to has a dirty page count below below (or equal to) the free-run threshold for that bdi. This ensures it will always be able to have some pages in flight, and so will not deadlock. In a steady-state, it is expected that PF_LOCAL_THROTTLE tasks might still be throttled by global threshold, but that is acceptable as it is only the deadlock state that is interesting for this flag. This approach of "only throttle when target bdi is busy" is consistent with the other use of PF_LESS_THROTTLE in current_may_throttle(), were it causes attention to be focussed only on the target bdi. So this patch - renames PF_LESS_THROTTLE to PF_LOCAL_THROTTLE, - removes the 25% bonus that that flag gives, and - If PF_LOCAL_THROTTLE is set, don't delay at all unless the global and the local free-run thresholds are exceeded. Note that previously realtime threads were treated the same as PF_LESS_THROTTLE threads. This patch does *not* change the behvaiour for real-time threads, so it is now different from the behaviour of nfsd and loop tasks. I don't know what is wanted for realtime. [akpm@linux-foundation.org: coding style fixes] Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Chuck Lever <chuck.lever@oracle.com> [nfsd] Cc: Christoph Hellwig <hch@lst.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Link: http://lkml.kernel.org/r/87ftbf7gs3.fsf@notabene.neil.brown.name Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-29null_blk: force complete for timeout requestDongli Zhang1-1/+1
The commit 7b11eab041da ("blk-mq: blk-mq: provide forced completion method") exports new API to force a request to complete without error injection. There should be no error injection when completing a request by timeout handler. Otherwise, the below would hang because timeout handler is failed. echo 100 > /sys/kernel/debug/fail_io_timeout/probability echo 1000 > /sys/kernel/debug/fail_io_timeout/times echo 1 > /sys/block/nullb0/io-timeout-fail dd if=/dev/zero of=/dev/nullb0 bs=512 count=1 oflag=direct With this patch, the timeout handler is able to complete the IO. Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-29blk-mq: drain I/O when all CPUs in a hctx are offlineMing Lei1-1/+1
Most of blk-mq drivers depend on managed IRQ's auto-affinity to setup up queue mapping. Thomas mentioned the following point[1]: "That was the constraint of managed interrupts from the very beginning: The driver/subsystem has to quiesce the interrupt line and the associated queue _before_ it gets shutdown in CPU unplug and not fiddle with it until it's restarted by the core when the CPU is plugged in again." However, current blk-mq implementation doesn't quiesce hw queue before the last CPU in the hctx is shutdown. Even worse, CPUHP_BLK_MQ_DEAD is a cpuhp state handled after the CPU is down, so there isn't any chance to quiesce the hctx before shutting down the CPU. Add new CPUHP_AP_BLK_MQ_ONLINE state to stop allocating from blk-mq hctxs where the last CPU goes away, and wait for completion of in-flight requests. This guarantees that there is no inflight I/O before shutting down the managed IRQ. Add a BLK_MQ_F_STACKING and set it for dm-rq and loop, so we don't need to wait for completion of in-flight requests from these drivers to avoid a potential dead-lock. It is safe to do this for stacking drivers as those do not use interrupts at all and their I/O completions are triggered by underlying devices I/O completion. [1] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/ [hch: different retry mechanism, merged two patches, minor cleanups] Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-28tcp: add tcp_sock_set_quickackChristoph Hellwig2-10/+2
Add a helper to directly set the TCP_QUICKACK sockopt from kernel space without going through a fake uaccess. Cleanup the callers to avoid pointless wrappers now that this is a simple function call. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28tcp: add tcp_sock_set_nodelayChristoph Hellwig3-10/+3
Add a helper to directly set the TCP_NODELAY sockopt from kernel space without going through a fake uaccess. Cleanup the callers to avoid pointless wrappers now that this is a simple function call. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28tcp: add tcp_sock_set_corkChristoph Hellwig3-19/+5
Add a helper to directly set the TCP_CORK sockopt from kernel space without going through a fake uaccess. Cleanup the callers to avoid pointless wrappers now that this is a simple function call. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28zram: Use local lock to protect per-CPU dataMike Galbraith2-2/+8
The zcomp driver uses per-CPU compression. The per-CPU data pointer is acquired with get_cpu_ptr() which implicitly disables preemption. It allocates memory inside the preempt disabled region which conflicts with the PREEMPT_RT semantics. Replace the implicit preemption control with an explicit local lock. This allows RT kernels to substitute it with a real per CPU lock, which serializes the access but keeps the code section preemptible. On non RT kernels this maps to preempt_disable() as before, i.e. no functional change. [bigeasy: Use local_lock(), description, drop reordering] Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20200527201119.1692513-8-bigeasy@linutronix.de
2020-05-28zram: Allocate struct zcomp_strm as per-CPU memorySebastian Andrzej Siewior2-26/+17
zcomp::stream is a per-CPU pointer, pointing to struct zcomp_strm which contains two pointers. Having struct zcomp_strm allocated directly as per-CPU memory would avoid one additional memory allocation and a pointer dereference. This also simplifies the addition of a local_lock to struct zcomp_strm. Allocate zcomp::stream directly as per-CPU memory. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20200527201119.1692513-7-bigeasy@linutronix.de
2020-05-27zram: nvdimm: use bio_{start,end}_io_acct and disk_{start,end}_io_acctChristoph Hellwig1-14/+10
Switch zram to use the nicer bio accounting helpers, and as part of that ensure each bio is counted as a single I/O request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-27rsxx: use bio_{start,end}_io_acctChristoph Hellwig1-17/+2
Switch rsxx to use the nicer bio accounting helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-27drbd: use bio_{start,end}_io_acctChristoph Hellwig1-23/+4
Switch drbd to use the nicer bio accounting helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-26block/floppy: fix contended case in floppy_queue_rq()Jiri Kosina1-5/+5
Since the switch of floppy driver to blk-mq, the contended (fdc_busy) case in floppy_queue_rq() is not handled correctly. In case we reach floppy_queue_rq() with fdc_busy set (i.e. with the floppy locked due to another request still being in-flight), we put the request on the list of requests and return BLK_STS_OK to the block core, without actually scheduling delayed work / doing further processing of the request. This means that processing of this request is postponed until another request comes and passess uncontended. Which in some cases might actually never happen and we keep waiting indefinitely. The simple testcase is for i in `seq 1 2000`; do echo -en $i '\r'; blkid --info /dev/fd0 2> /dev/null; done run in quemu. That reliably causes blkid eventually indefinitely hanging in __floppy_read_block_0() waiting for completion, as the BIO callback never happens, and no further IO is ever submitted on the (non-existent) floppy device. This was observed reliably on qemu-emulated device. Fix that by not queuing the request in the contended case, and return BLK_STS_RESOURCE instead, so that blk core handles the request rescheduling and let it pass properly non-contended later. Fixes: a9f38e1dec107a ("floppy: convert to blk-mq") Cc: stable@vger.kernel.org Tested-by: Libor Pechacek <lpechacek@suse.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-24loop: remove redundant assignment to variable errorColin Ian King1-2/+0
The variable error is being assigned a value that is never read so the assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-22RDMA/rnbd: Fix compilation error when CONFIG_MODULES is disabledDanil Kipnis1-4/+7
module_is_live function is only defined when CONFIG_MODULES is enabled. Use try_module_get instead to check whether the module is being removed. When module unload and manuall unmapping is happening in parallel, we can try removing the symlink twice: rnbd_client_exit vs. rnbd_clt_unmap_dev_store. This is probably not the best way to deal with this race in general, but for now this fixes the compilation issue when CONFIG_MODULES is disabled and has no functional impact. Regression tests passed. Fixes: 1eb54f8f5dd8 ("block/rnbd: client: sysfs interface functions") Link: https://lore.kernel.org/r/20200521185909.457245-1-danil.kipnis@cloud.ionos.com Reported-by: Randy Dunlap <rdunlap@infradead.org> Suggested-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-22Merge tag 'block-5.7-2020-05-22' of git://git.kernel.dk/linux-blockLinus Torvalds2-0/+11
Pull block fixes from Jens Axboe: "Two fixes for null_blk zone mode" * tag 'block-5.7-2020-05-22' of git://git.kernel.dk/linux-block: null_blk: don't allow discard for zoned mode null_blk: return error for invalid zone size
2020-05-22block/rnbd: Fix an IS_ERR() vs NULL check in find_or_create_sess()Dan Carpenter1-5/+4
The alloc_sess() function returns error pointers, it never returns NULL. Fixes: f7a7a5c228d4 ("block/rnbd: client: main functionality") Link: https://lore.kernel.org/r/20200519120347.GD42765@mwanda Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21Merge tag 'v5.7-rc6' into rdma.git for-nextJason Gunthorpe5-75/+181
Linux 5.7-rc6 Conflict in drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c resolved by deleting dr_cq_event, matching how netdev resolved it. Required for dependencies in the following patches. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21null_blk: don't allow discard for zoned modeChaitanya Kulkarni1-0/+7
Zoned block device specification do not define the behavior of discard/trim command as this command is generally replaced by the reset write pointer (zone reset) command. Emulate this in null_blk by making zoned and discard options mutually exclusive. Suggested-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-21null_blk: return error for invalid zone sizeChaitanya Kulkarni1-0/+4
In null_init_zone_dev() check if the zone size is larger than device capacity, return error if needed. This also fixes the following oops :- null_blk: changed the number of conventional zones to 4294967295 BUG: kernel NULL pointer dereference, address: 0000000000000010 PGD 7d76c5067 P4D 7d76c5067 PUD 7d240c067 PMD 0 Oops: 0002 [#1] SMP NOPTI CPU: 4 PID: 5508 Comm: nullbtests.sh Tainted: G OE 5.7.0-rc4lblk-fnext0 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e4 RIP: 0010:null_init_zoned_dev+0x17a/0x27f [null_blk] RSP: 0018:ffffc90007007e00 EFLAGS: 00010246 RAX: 0000000000000020 RBX: ffff8887fb3f3c00 RCX: 0000000000000007 RDX: 0000000000000000 RSI: ffff8887ca09d688 RDI: ffff888810fea510 RBP: 0000000000000010 R08: ffff8887ca09d688 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8887c26e8000 R13: ffffffffa05e9390 R14: 0000000000000000 R15: 0000000000000001 FS: 00007fcb5256f740(0000) GS:ffff888810e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 000000081e8fe000 CR4: 00000000003406e0 Call Trace: null_add_dev+0x534/0x71b [null_blk] nullb_device_power_store.cold.41+0x8/0x2e [null_blk] configfs_write_file+0xe6/0x150 vfs_write+0xba/0x1e0 ksys_write+0x5f/0xe0 do_syscall_64+0x60/0x250 entry_SYSCALL_64_after_hwframe+0x49/0xb3 RIP: 0033:0x7fcb51c71840 Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-21loop: Add LOOP_CONFIGURE ioctlMartijn Coenen1-28/+76
This allows userspace to completely setup a loop device with a single ioctl, removing the in-between state where the device can be partially configured - eg the loop device has a backing file associated with it, but is reading from the wrong offset. Besides removing the intermediate state, another big benefit of this ioctl is that LOOP_SET_STATUS can be slow; the main reason for this slowness is that LOOP_SET_STATUS(64) calls blk_mq_freeze_queue() to freeze the associated queue; this requires waiting for RCU synchronization, which I've measured can take about 15-20ms on this device on average. In addition to doing what LOOP_SET_STATUS can do, LOOP_CONFIGURE can also be used to: - Set the correct block size immediately by setting loop_config.block_size (avoids LOOP_SET_BLOCK_SIZE) - Explicitly request direct I/O mode by setting LO_FLAGS_DIRECT_IO in loop_config.info.lo_flags (avoids LOOP_SET_DIRECT_IO) - Explicitly request read-only mode by setting LO_FLAGS_READ_ONLY in loop_config.info.lo_flags Here's setting up ~70 regular loop devices with an offset on an x86 Android device, using LOOP_SET_FD and LOOP_SET_STATUS: vsoc_x86:/system/apex # time for i in `seq 30 100`; do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done 0m03.40s real 0m00.02s user 0m00.03s system Here's configuring ~70 devices in the same way, but using a modified losetup that uses the new LOOP_CONFIGURE ioctl: vsoc_x86:/system/apex # time for i in `seq 30 100`; do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done 0m01.94s real 0m00.01s user 0m00.01s system Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-21loop: Clean up LOOP_SET_STATUS lo_flags handlingMartijn Coenen1-6/+13
LOOP_SET_STATUS(64) will actually allow some lo_flags to be modified; in particular, LO_FLAGS_AUTOCLEAR can be set and cleared, whereas LO_FLAGS_PARTSCAN can be set to request a partition scan. Make this explicit by updating the UAPI to include the flags that can be set/cleared using this ioctl. The implementation can then blindly take over the passed in flags, and use the previous flags for those flags that can't be set / cleared using LOOP_SET_STATUS. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-21loop: Rework lo_ioctl() __user argument castingMartijn Coenen1-6/+5
In preparation for a new ioctl that needs to copy_from_user(); makes the code easier to read as well. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-21loop: Move loop_set_status_from_info() and friends upMartijn Coenen1-103/+103
So we can use it without forward declaration. This is a separate commit to make it easier to verify that this is just a move, without functional modifications. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-21loop: Factor out configuring loop from statusMartijn Coenen1-50/+67
Factor out this code into a separate function, so it can be reused by other code more easily. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-21loop: Remove figure_loop_size()Martijn Coenen1-9/+4
This function was now only used by loop_set_capacity(). Just open code the remaining code in the caller instead. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-21loop: Refactor loop_set_status() size calculationMartijn Coenen1-18/+19
figure_loop_size() calculates the loop size based on the passed in parameters, but at the same time it updates the offset and sizelimit parameters in the loop device configuration. That is a somewhat unexpected side effect of a function with this name, and it is only only needed by one of the two callers of this function - loop_set_status(). Move the lo_offset and lo_sizelimit assignment back into loop_set_status(), and use the newly factored out functions to validate and apply the newly calculated size. This allows us to get rid of figure_loop_size() in a follow-up commit. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-21loop: Switch to set_capacity_revalidate_and_notify()Martijn Coenen1-3/+2
This was recently added to block/genhd.c, and takes care of both updating the capacity and notifying userspace of the new size. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-21loop: Factor out setting loop device sizeMartijn Coenen1-9/+21
This code is used repeatedly. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-21loop: Remove sector_t truncation checksMartijn Coenen1-14/+7
sector_t is now always u64, so we don't need to check for truncation. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-21loop: Call loop_config_discard() only after new config is appliedMartijn Coenen1-2/+2
loop_set_status() calls loop_config_discard() to configure discard for the loop device; however, the discard configuration depends on whether the loop device uses encryption, and when we call it the encryption configuration has not been updated yet. Move the call down so we apply the correct discard configuration based on the new configuration. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-19rnbd/rtrs: Pass max segment size from blk user to the rdma libraryDanil Kipnis1-0/+1
When Block Device Layer is disabled, BLK_MAX_SEGMENT_SIZE is undefined. The rtrs is a transport library and should compile independently of the block layer. The desired max segment size should be passed down by the user. Introduce max_segment_size parameter for the rtrs_clt_open() call. Fixes: f7a7a5c228d4 ("block/rnbd: client: main functionality") Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Fixes: cb80329c9434 ("RDMA/rtrs: client: private header with client structs and functions") Fixes: b5c27cdb094e ("RDMA/rtrs: public interface header to establish RDMA connections") Link: https://lore.kernel.org/r/20200519111419.924170-1-danil.kipnis@cloud.ionos.com Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-19null_blk: Zero-initialize read buffers in non-memory-backed modeBart Van Assche1-0/+26
This patch suppresses an uninteresting KMSAN complaint without affecting performance of the null_blk driver if CONFIG_KMSAN is disabled. Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Alexander Potapenko <glider@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-19ps3disk: use the default segment boundaryEmmanuel Nicolet1-1/+0
Since commit dcebd755926b ("block: use bio_for_each_bvec() to compute multi-page bvec count"), the kernel will bug_on on the PS3 because bio_split() is called with sectors == 0: kernel BUG at block/bio.c:1853! Oops: Exception in kernel mode, sig: 5 [#1] BE PAGE_SIZE=4K MMU=Hash PREEMPT SMP NR_CPUS=8 NUMA PS3 Modules linked in: firewire_sbp2 rtc_ps3(+) soundcore ps3_gelic(+) \ ps3rom(+) firewire_core ps3vram(+) usb_common crc_itu_t CPU: 0 PID: 97 Comm: blkid Not tainted 5.3.0-rc4 #1 NIP: c00000000027d0d0 LR: c00000000027d0b0 CTR: 0000000000000000 REGS: c00000000135ae90 TRAP: 0700 Not tainted (5.3.0-rc4) MSR: 8000000000028032 <SF,EE,IR,DR,RI> CR: 44008240 XER: 20000000 IRQMASK: 0 GPR00: c000000000289368 c00000000135b120 c00000000084a500 c000000004ff8300 GPR04: 0000000000000c00 c000000004c905e0 c000000004c905e0 000000000000ffff GPR08: 0000000000000000 0000000000000001 0000000000000000 000000000000ffff GPR12: 0000000000000000 c0000000008ef000 000000000000003e 0000000000080001 GPR16: 0000000000000100 000000000000ffff 0000000000000000 0000000000000004 GPR20: c00000000062fd7e 0000000000000001 000000000000ffff 0000000000000080 GPR24: c000000000781788 c00000000135b350 0000000000000080 c000000004c905e0 GPR28: c00000000135b348 c000000004ff8300 0000000000000000 c000000004c90000 NIP [c00000000027d0d0] .bio_split+0x28/0xac LR [c00000000027d0b0] .bio_split+0x8/0xac Call Trace: [c00000000135b120] [c00000000027d130] .bio_split+0x88/0xac (unreliable) [c00000000135b1b0] [c000000000289368] .__blk_queue_split+0x11c/0x53c [c00000000135b2d0] [c00000000028f614] .blk_mq_make_request+0x80/0x7d4 [c00000000135b3d0] [c000000000283a8c] .generic_make_request+0x118/0x294 [c00000000135b4b0] [c000000000283d34] .submit_bio+0x12c/0x174 [c00000000135b580] [c000000000205a44] .mpage_bio_submit+0x3c/0x4c [c00000000135b600] [c000000000206184] .mpage_readpages+0xa4/0x184 [c00000000135b750] [c0000000001ff8fc] .blkdev_readpages+0x24/0x38 [c00000000135b7c0] [c0000000001589f0] .read_pages+0x6c/0x1a8 [c00000000135b8b0] [c000000000158c74] .__do_page_cache_readahead+0x118/0x184 [c00000000135b9b0] [c0000000001591a8] .force_page_cache_readahead+0xe4/0xe8 [c00000000135ba50] [c00000000014fc24] .generic_file_read_iter+0x1d8/0x830 [c00000000135bb50] [c0000000001ffadc] .blkdev_read_iter+0x40/0x5c [c00000000135bbc0] [c0000000001b9e00] .new_sync_read+0x144/0x1a0 [c00000000135bcd0] [c0000000001bc454] .vfs_read+0xa0/0x124 [c00000000135bd70] [c0000000001bc7a4] .ksys_read+0x70/0xd8 [c00000000135be20] [c00000000000a524] system_call+0x5c/0x70 Instruction dump: 7fe3fb78 482e30dc 7c0802a6 482e3085 7c9e2378 f821ff71 7ca42b78 7d3e00d0 7c7d1b78 79290fe0 7cc53378 69290001 <0b090000> 81230028 7bca0020 7929ba62 [ end trace 313fec760f30aa1f ]--- The problem originates from setting the segment boundary of the request queue to -1UL. This makes get_max_segment_size() return zero when offset is zero, whatever the max segment size. The test with BLK_SEG_BOUNDARY_MASK fails and 'mask - (mask & offset) + 1' overflows to zero in the return statement. Not setting the segment boundary and using the default value (BLK_SEG_BOUNDARY_MASK) fixes the problem. Signed-off-by: Emmanuel Nicolet <emmanuel.nicolet@gmail.com> Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/060a416c43138f45105c0540eff1a45539f7e2fc.1589049250.git.geoff@infradead.org
2020-05-17block/rnbd: a bit of documentationJack Wang1-0/+92
README with description of major sysfs entries, sysfs documentation are moved to ABI dir as Bart suggested. Link: https://lore.kernel.org/r/20200511135131.27580-25-danil.kipnis@cloud.ionos.com Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-17block/rnbd: include client and server modules into kernel compilationJack Wang4-0/+46
Add rnbd Makefile, Kconfig and also corresponding lines into upper block layer files. Link: https://lore.kernel.org/r/20200511135131.27580-24-danil.kipnis@cloud.ionos.com Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-17block/rnbd: server: sysfs interface functionsJack Wang1-0/+215
This is the sysfs interface to rnbd mapped devices on server side: /sys/class/rnbd-server/ctl/devices/<device_name>/ |- block_dev | *** link pointing to the corresponding block device sysfs entry | |- sessions/<session-name>/ | *** sessions directory | |- read_only | *** is devices mapped as read only | |- mapping_path *** relative device path provided by the client during mapping Link: https://lore.kernel.org/r/20200511135131.27580-23-danil.kipnis@cloud.ionos.com Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-17block/rnbd: server: functionality for IO submitting to block devJack Wang2-0/+226
This provides helper functions for IO submitting to block dev. Link: https://lore.kernel.org/r/20200511135131.27580-22-danil.kipnis@cloud.ionos.com Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-17block/rnbd: server: main functionalityJack Wang1-0/+844
This is main functionality of rnbd-server module, which handles RTRS events and rnbd protocol requests, like map (open) or unmap (close) device. Also server side is responsible for processing incoming IBTRS IO requests and forward them to local mapped devices. Link: https://lore.kernel.org/r/20200511135131.27580-21-danil.kipnis@cloud.ionos.com Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-17block/rnbd: server: private header with server structs and functionsJack Wang1-0/+78
This header describes main structs and functions used by rnbd-server module, namely structs for managing sessions from different clients and mapped (opened) devices. Link: https://lore.kernel.org/r/20200511135131.27580-20-danil.kipnis@cloud.ionos.com Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-17block/rnbd: client: sysfs interface functionsJack Wang1-0/+636
This is the sysfs interface to rnbd block devices on client side: /sys/class/rnbd-client/ctl/ |- map_device | *** maps remote device | |- devices/ *** all mapped devices /sys/block/rnbd<N>/rnbd/ |- unmap_device | *** unmaps device | |- state | *** device state | |- session | *** session name | |- mapping_path *** path of the dev that was mapped on server Link: https://lore.kernel.org/r/20200511135131.27580-19-danil.kipnis@cloud.ionos.com Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-17block/rnbd: client: main functionalityJack Wang1-0/+1729
This is main functionality of rnbd-client module, which provides interface to map remote device as local block device /dev/rnbd<N> and feeds RTRS with IO requests. Link: https://lore.kernel.org/r/20200511135131.27580-18-danil.kipnis@cloud.ionos.com Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>