aboutsummaryrefslogtreecommitdiffstats
path: root/fs (follow)
AgeCommit message (Collapse)AuthorFilesLines
2016-03-24Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds1-13/+24
Pull block fixes from Jens Axboe: "Final round of fixes for this merge window - some of this has come up after the initial pull request, and some of it was put in a post-merge branch before the merge window. This contains: - Fix for a bad check for an error on dma mapping in the mtip32xx driver, from Alexey Khoroshilov. - A set of fixes for lightnvm, from Javier, Matias, and Wenwei. - An NVMe completion record corruption fix from Marta, ensuring that we read things in the right order. - Two writeback fixes from Tejun, marked for stable@ as well. - A blk-mq sw queue iterator fix from Thomas, fixing an oops for sparse CPU maps. They hit this in the hot plug/unplug rework" * 'for-linus' of git://git.kernel.dk/linux-block: nvme: avoid cqe corruption when update at the same time as read writeback, cgroup: fix use of the wrong bdi_writeback which mismatches the inode writeback, cgroup: fix premature wb_put() in locked_inode_to_wb_and_lock_list() blk-mq: Use proper cpumask iterator mtip32xx: fix checks for dma mapping errors lightnvm: do not load L2P table if not supported lightnvm: do not reserve lun on l2p loading nvme: lightnvm: return ppa completion status lightnvm: add a bitmap of luns lightnvm: specify target's logical address area null_blk: add lightnvm null_blk device to the nullb_list
2016-03-24Merge tag 'for-linus-20160324' of git://git.infradead.org/linux-mtdLinus Torvalds4-29/+47
Pull MTD updates from Brian Norris: "NAND: - Add sunxi_nand randomizer support - begin refactoring NAND ecclayout structs - fix pxa3xx_nand dmaengine usage - brcmnand: fix support for v7.1 controller - add Qualcomm NAND controller driver SPI NOR: - add new ls1021a, ls2080a support to Freescale QuadSPI - add new flash ID entries - support bottom-block protection for Winbond flash - support Status Register Write Protect - remove broken QPI support for Micron SPI flash JFFS2: - improve post-mount CRC scan efficiency General: - refactor bcm63xxpart parser, to later extend for NAND - add writebuf size parameter to mtdram Other minor code quality improvements" * tag 'for-linus-20160324' of git://git.infradead.org/linux-mtd: (72 commits) mtd: nand: remove kerneldoc for removed function parameter mtd: nand: Qualcomm NAND controller driver dt/bindings: qcom_nandc: Add DT bindings mtd: nand: don't select chip in nand_chip's block_bad op mtd: spi-nor: support lock/unlock for a few Winbond chips mtd: spi-nor: add TB (Top/Bottom) protect support mtd: spi-nor: add SPI_NOR_HAS_LOCK flag mtd: spi-nor: use BIT() for flash_info flags mtd: spi-nor: disallow further writes to SR if WP# is low mtd: spi-nor: make lock/unlock bounds checks more obvious and robust mtd: spi-nor: silently drop lock/unlock for already locked/unlocked region mtd: spi-nor: wait for SR_WIP to clear on initial unlock mtd: nand: simplify nand_bch_init() usage mtd: mtdswap: remove useless if (!mtd->ecclayout) test mtd: create an mtd_oobavail() helper and make use of it mtd: kill the ecclayout->oobavail field mtd: nand: check status before reporting timeout mtd: bcm63xxpart: give width specifier an 'int', not 'size_t' mtd: mtdram: Add parameter for setting writebuf size mtd: nand: pxa3xx_nand: kill unused field 'drcmr_cmd' ...
2016-03-24Merge tag 'upstream-4.6-rc1' of git://git.infradead.org/linux-ubifsLinus Torvalds4-25/+75
Pull UBI/UBIFS updates from Richard Weinberger: "This contains cleanups and a maintainer update for UBI and UBIFS" * tag 'upstream-4.6-rc1' of git://git.infradead.org/linux-ubifs: ubifs: Remove unused header MAINTAINERS: Update UBIFS entry mtd: ubi: Add logging functions ubi_msg, ubi_warn and ubi_err ubifs: Add logging functions for ubifs_msg, ubifs_err and ubifs_warn
2016-03-24Merge tag 'nfsd-4.6-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds19-92/+681
Pull more nfsd updates from Bruce Fields: "Apologies for the previous request, which omitted the top 8 commits from my for-next branch (including the SCSI layout commits). Thanks to Trond for spotting my error!" This actually includes the new layout types, so here's that part of the pull message repeated: "Support for a new pnfs layout type from Christoph Hellwig. The new layout type is a variant of the block layout which uses SCSI features to offer improved fencing and device identification. Note this pull request also includes the client side of SCSI layout, with Trond's permission" * tag 'nfsd-4.6-1' of git://linux-nfs.org/~bfields/linux: nfsd: use short read as well as i_size to set eof nfsd: better layoutupdate bounds-checking nfsd: block and scsi layout drivers need to depend on CONFIG_BLOCK nfsd: add SCSI layout support nfsd: move some blocklayout code nfsd: add a new config option for the block layout driver nfs/blocklayout: add SCSI layout support nfs4.h: add SCSI layout definitions
2016-03-24Merge tag 'nfsd-4.6' of git://linux-nfs.org/~bfields/linuxLinus Torvalds4-20/+28
Pull nfsd updates from Bruce Fields: "Various bugfixes, a RDMA update from Chuck Lever, and support for a new pnfs layout type from Christoph Hellwig. The new layout type is a variant of the block layout which uses SCSI features to offer improved fencing and device identification. (Also: note this pull request also includes the client side of SCSI layout, with Trond's permission.)" * tag 'nfsd-4.6' of git://linux-nfs.org/~bfields/linux: sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a race nfsd: recover: fix memory leak nfsd: fix deadlock secinfo+readdir compound nfsd4: resfh unused in nfsd4_secinfo svcrdma: Use new CQ API for RPC-over-RDMA server send CQs svcrdma: Use new CQ API for RPC-over-RDMA server receive CQs svcrdma: Remove close_out exit path svcrdma: Hook up the logic to return ERR_CHUNK svcrdma: Use correct XID in error replies svcrdma: Make RDMA_ERROR messages work rpcrdma: Add RPCRDMA_HDRLEN_ERR svcrdma: svc_rdma_post_recv() should close connection on error svcrdma: Close connection when a send error occurs nfsd: Lower NFSv4.1 callback message size limit svcrdma: Do not send Write chunk XDR pad with inline content svcrdma: Do not write xdr_buf::tail in a Write chunk svcrdma: Find client-provided write and reply chunks once per reply nfsd: Update NFS server comments related to RDMA support nfsd: Fix a memory leak when meeting unsupported state_protect_how4 nfsd4: fix bad bounds checking
2016-03-23nfsd: use short read as well as i_size to set eofBenjamin Coddington3-7/+30
Use the result of a local read to determine when to set the eof flag. This allows us to return the location of the end of the file atomically at the time of the read. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> [bfields: add some documentation] Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-23Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds1-4/+2
Pull crypto fixes from Herbert Xu: "This fixes the following issues: API: - Fix kzalloc error path crash in ecryptfs added by skcipher conversion. Note the subject of the commit is screwed up and the correct subject is actually in the body. Drivers: - A number of fixes to the marvell cesa hashing code. - Remove bogus nested irqsave that clobbers the saved flags in ccp" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: marvell/cesa - forward devm_ioremap_resource() error code crypto: marvell/cesa - initialize hash states crypto: marvell/cesa - fix memory leak crypto: ccp - fix lock acquisition code eCryptfs: Use skcipher and shash
2016-03-22Merge branch 'akpm' (patches from Andrew)Linus Torvalds16-25/+977
Merge third patch-bomb from Andrew Morton: - more ocfs2 changes - a few hotfixes - Andy's compat cleanups - misc fixes to fatfs, ptrace, coredump, cpumask, creds, eventfd, panic, ipmi, kgdb, profile, kfifo, ubsan, etc. - many rapidio updates: fixes, new drivers. - kcov: kernel code coverage feature. Like gcov, but not "prohibitively expensive". - extable code consolidation for various archs * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (81 commits) ia64/extable: use generic search and sort routines x86/extable: use generic search and sort routines s390/extable: use generic search and sort routines alpha/extable: use generic search and sort routines kernel/...: convert pr_warning to pr_warn drivers: dma-coherent: use memset_io for DMA_MEMORY_IO mappings drivers: dma-coherent: use MEMREMAP_WC for DMA_MEMORY_MAP memremap: add MEMREMAP_WC flag memremap: don't modify flags kernel/signal.c: add compile-time check for __ARCH_SI_PREAMBLE_SIZE mm/mprotect.c: don't imply PROT_EXEC on non-exec fs ipc/sem: make semctl setting sempid consistent ubsan: fix tree-wide -Wmaybe-uninitialized false positives kfifo: fix sparse complaints scripts/gdb: account for changes in module data structure scripts/gdb: add cmdline reader command scripts/gdb: add version command kernel: add kcov code coverage profile: hide unused functions when !CONFIG_PROC_FS hpwdt: use nmi_panic() when kernel panics in NMI handler ...
2016-03-22eventfd: document lockless access in eventfd_pollPaolo Bonzini1-2/+40
Since commit e22553e2a25e ("eventfd: don't take the spinlock in eventfd_poll", 2015-02-17), eventfd is reading ctx->count outside ctx->wqh.lock. However, things aren't as simple as the read barrier in eventfd_poll would suggest. In fact, the read barrier, besides lacking a comment, is not paired in any obvious manner with another read barrier, and it is pointless because it is sitting between a write (deep in poll_wait) and the read of ctx->count. The read barrier is acting just as a compiler barrier, for which we can use READ_ONCE instead. This is what the code change in this patch does. The documentation change is just as important, however. The question, posed by Andrea Arcangeli, is then why the thing is safe on architectures where spin_unlock does not imply a store-load memory barrier. The answer is that it's safe because writes of ctx->count use the same lock as poll_wait, and hence an acquire barrier implicit in poll_wait provides the necessary synchronization between eventfd_poll and callers of wake_up_locked_poll. This is sort of mentioned in the commit message with respect to eventfd_ctx_read ("eventfd_read is similar, it will do a single decrement with the lock held") but it applies to all other callers too. It's tricky enough that it should be documented in the code. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Chris Mason <clm@fb.com> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22fs/coredump: prevent fsuid=0 dumps into user-controlled directoriesJann Horn3-9/+29
This commit fixes the following security hole affecting systems where all of the following conditions are fulfilled: - The fs.suid_dumpable sysctl is set to 2. - The kernel.core_pattern sysctl's value starts with "/". (Systems where kernel.core_pattern starts with "|/" are not affected.) - Unprivileged user namespace creation is permitted. (This is true on Linux >=3.8, but some distributions disallow it by default using a distro patch.) Under these conditions, if a program executes under secure exec rules, causing it to run with the SUID_DUMP_ROOT flag, then unshares its user namespace, changes its root directory and crashes, the coredump will be written using fsuid=0 and a path derived from kernel.core_pattern - but this path is interpreted relative to the root directory of the process, allowing the attacker to control where a coredump will be written with root privileges. To fix the security issue, always interpret core_pattern for dumps that are written under SUID_DUMP_ROOT relative to the root directory of init. Signed-off-by: Jann Horn <jann@thejh.net> Acked-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22fat: add config option to set UTF-8 mount option by defaultMaciej S. Szmigiero2-2/+20
FAT has long supported its own default file name encoding config setting, separate from CONFIG_NLS_DEFAULT. However, if UTF-8 encoded file names are desired FAT character set should not be set to utf8 since this would make file names case sensitive even if case insensitive matching is requested. Instead, "utf8" mount options should be provided to enable UTF-8 file names in FAT file system. Unfortunately, there was no possibility to set the default value of this option so on UTF-8 system "utf8" mount option had to be added manually to most FAT mounts. This patch adds config option to set such default value. Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22ext4: in ext4_dir_llseek, check syscall bitness directlyAndy Lutomirski1-1/+1
ext4 treats directory offsets differently for 32-bit and 64-bit callers. Check the caller type using in_compat_syscall, not is_compat_task. This changes behavior on SPARC slightly. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22ocfs2: check/fix inode block for online file checkGang He2-9/+218
Implement online check or fix inode block during reading a inode block to memory. Signed-off-by: Gang He <ghe@suse.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22ocfs2: create/remove sysfile for online file checkGang He1-0/+5
Create online file check sysfile when ocfs2 mount, remove the related sysfile when ocfs2 umount. Signed-off-by: Gang He <ghe@suse.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22ocfs2: sysfile interfaces for online file checkGang He4-1/+660
Implement online file check sysfile interfaces, e.g. how to create the related sysfile according to device name, how to display/handle file check request from the sysfile. Signed-off-by: Gang He <ghe@suse.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22ocfs2: export ocfs2_kset for online file checkGang He2-1/+4
When there are errors in the ocfs2 filesystem, they are usually accompanied by the inode number which caused the error. This inode number would be the input to fixing the file. One of these options could be considered: A file in the sys filesytem which would accept inode numbers. This could be used to communication back what has to be fixed or is fixed. You could write: $# echo "<inode>" > /sys/fs/ocfs2/devname/filecheck/check or $# echo "<inode>" > /sys/fs/ocfs2/devname/filecheck/fix Compare with second version, I re-design filecheck sysfs interfaces, there are three sysfs files (check, fix and set) under filecheck directory (see above), sysfs will accept only one argument <inode>. Second, I adjust some code in ocfs2_filecheck_repair_inode_block() function according to upstream feedback, we cannot just add VALID_FL flag back as a inode block fix, then we will not fix this field corruption currently until having a complete solution. Compare with first version, I use strncasecmp instead of double strncmp functions. Second, update the source file contribution vendor. This patch (of 4): Export ocfs2_kset object from ocfs2_stackglue kernel module, then online file check code will create the related sysfiles under ocfs2_kset object. We're exporting this because it's built in ocfs2_stackglue.ko. Signed-off-by: Gang He <ghe@suse.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22Merge tag 'nfs-for-4.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds13-106/+187
Pull NFS client updates from Trond Myklebust: "Highlights include: Features: - Add support for multiple NFSv4.1 callbacks in flight - Initial patchset for RPC multipath support - Adapt RPC/RDMA to use the new completion queue API Bugfixes and cleanups: - nfs4: nfs4_ff_layout_prepare_ds should return NULL if connection failed - Cleanups to remove nfs_inode_dio_wait and nfs4_file_fsync - Fix RPC/RDMA credit accounting - Properly handle RDMA_ERROR replies - xprtrdma: Do not wait if ib_post_send() fails - xprtrdma: Segment head and tail XDR buffers on page boundaries - xprtrdma cleanups for dprintk, physical_op_map and unused macros" * tag 'nfs-for-4.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (35 commits) nfs/blocklayout: make sure making a aligned read request nfs4: nfs4_ff_layout_prepare_ds should return NULL if connection failed nfs: remove nfs_inode_dio_wait nfs: remove nfs4_file_fsync xprtrdma: Use new CQ API for RPC-over-RDMA client send CQs xprtrdma: Use an anonymous union in struct rpcrdma_mw xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQs xprtrdma: Serialize credit accounting again xprtrdma: Properly handle RDMA_ERROR replies rpcrdma: Add RPCRDMA_HDRLEN_ERR xprtrdma: Do not wait if ib_post_send() fails xprtrdma: Segment head and tail XDR buffers on page boundaries xprtrdma: Clean up dprintk format string containing a newline xprtrdma: Clean up physical_op_map() xprtrdma: Clean up unused RPCRDMA_INLINE_PAD_THRESH macro NFS add callback_ops to nfs4_proc_bind_conn_to_session_callback pnfs/NFSv4.1: Add multipath capabilities to pNFS flexfiles servers over NFSv3 SUNRPC: Allow addition of new transports to a struct rpc_clnt NFSv4.1: nfs4_proc_bind_conn_to_session must iterate over all connections SUNRPC: Make NFS swap work with multipath ...
2016-03-22Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfsLinus Torvalds5-32/+136
Pull overlayfs updates from Miklos Szeredi: "Various fixes and tweaks" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: cleanup unused var in rename2 ovl: rename is_merge to is_lowest ovl: fixed coding style warning ovl: Ensure upper filesystem supports d_type ovl: Warn on copy up if a process has a R/O fd open to the lower file ovl: honor flag MS_SILENT at mount ovl: verify upper dentry before unlink and rename
2016-03-22Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuseLinus Torvalds3-23/+46
Pull fuse update from Miklos Szeredi: "This contains direct I/O fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: return patrial success from fuse_direct_io() fuse: Add reference counting for fuse_io_priv fuse: do not use iocb after it may have been freed
2016-03-22nfsd: better layoutupdate bounds-checkingJ. Bruce Fields1-4/+8
You could add any multiple of 2^32/PNFS_SCSI_RANGE_SIZE to nr_iomaps and still pass this check. You'd probably still fail the following kcalloc, but best to be paranoid since this is from-the-wire data. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-21Merge branch 'for-linus-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfsLinus Torvalds35-681/+1091
Pull btrfs updates from Chris Mason: "We have a good sized cleanup of our internal read ahead code, and the first series of commits from Chandan to enable PAGE_SIZE > sectorsize Otherwise, it's a normal series of cleanups and fixes, with many thanks to Dave Sterba for doing most of the patch wrangling this time" * 'for-linus-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (82 commits) btrfs: make sure we stay inside the bvec during __btrfs_lookup_bio_sums btrfs: Fix misspellings in comments. btrfs: Print Warning only if ENOSPC_DEBUG is enabled btrfs: scrub: silence an uninitialized variable warning btrfs: move btrfs_compression_type to compression.h btrfs: rename btrfs_print_info to btrfs_print_mod_info Btrfs: Show a warning message if one of objectid reaches its highest value Documentation: btrfs: remove usage specific information btrfs: use kbasename in btrfsic_mount Btrfs: do not collect ordered extents when logging that inode exists Btrfs: fix race when checking if we can skip fsync'ing an inode Btrfs: fix listxattrs not listing all xattrs packed in the same item Btrfs: fix deadlock between direct IO reads and buffered writes Btrfs: fix extent_same allowing destination offset beyond i_size Btrfs: fix file loss on log replay after renaming a file and fsync Btrfs: fix unreplayable log after snapshot delete + parent dir fsync Btrfs: fix lockdep deadlock warning due to dev_replace btrfs: drop unused argument in btrfs_ioctl_get_supported_features btrfs: add GET_SUPPORTED_FEATURES to the control device ioctls btrfs: change max_inline default to 2048 ...
2016-03-21Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fsLinus Torvalds13-437/+463
Pull UDF and quota updates from Jan Kara: "This contains a rewrite of UDF handling of filename encoding to fix remaining overflow issues from Andrew Gabbasov and quota changes to support new Q_[X]GETNEXTQUOTA quotactl for VFS quota formats" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: Fix possible GPF due to uninitialised pointers ext4: Make Q_GETNEXTQUOTA work for quota in hidden inodes quota: Forbid Q_GETQUOTA and Q_GETNEXTQUOTA for frozen filesystem quota: Fix possible races during quota loading ocfs2: Implement get_next_id() quota_v2: Implement get_next_id() for V2 quota format quota: Add support for ->get_nextdqblk() for VFS quota udf: Merge linux specific translation into CS0 conversion function udf: Remove struct ustr as non-needed intermediate storage udf: Use separate buffer for copying split names udf: Adjust UDF_NAME_LEN to better reflect actual restrictions udf: Join functions for UTF8 and NLS conversions udf: Parameterize output length in udf_put_filename quota: Allow Q_GETQUOTA for frozen filesystem quota: Fixup comments about return value of Q_[X]GETNEXTQUOTA
2016-03-21Merge tag 'xfs-for-linus-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfsLinus Torvalds66-1590/+2178
Pull xfs updates from Dave Chinner: "There's quite a lot in this request, and there's some cross-over with ext4, dax and quota code due to the nature of the changes being made. As for the rest of the XFS changes, there are lots of little things all over the place, which add up to a lot of changes in the end. The major changes are that we've reduced the size of the struct xfs_inode by ~100 bytes (gives an inode cache footprint reduction of >10%), the writepage code now only does a single set of mapping tree lockups so uses less CPU, delayed allocation reservations won't overrun under random write loads anymore, and we added compile time verification for on-disk structure sizes so we find out when a commit or platform/compiler change breaks the on disk structure as early as possible. Change summary: - error propagation for direct IO failures fixes for both XFS and ext4 - new quota interfaces and XFS implementation for iterating all the quota IDs in the filesystem - locking fixes for real-time device extent allocation - reduction of duplicate information in the xfs and vfs inode, saving roughly 100 bytes of memory per cached inode. - buffer flag cleanup - rework of the writepage code to use the generic write clustering mechanisms - several fixes for inode flag based DAX enablement - rework of remount option parsing - compile time verification of on-disk format structure sizes - delayed allocation reservation overrun fixes - lots of little error handling fixes - small memory leak fixes - enable xfsaild freezing again" * tag 'xfs-for-linus-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (66 commits) xfs: always set rvalp in xfs_dir2_node_trim_free xfs: ensure committed is initialized in xfs_trans_roll xfs: borrow indirect blocks from freed extent when available xfs: refactor delalloc indlen reservation split into helper xfs: update freeblocks counter after extent deletion xfs: debug mode forced buffered write failure xfs: remove impossible condition xfs: check sizes of XFS on-disk structures at compile time xfs: ioends require logically contiguous file offsets xfs: use named array initializers for log item dumping xfs: fix computation of inode btree maxlevels xfs: reinitialise per-AG structures if geometry changes during recovery xfs: remove xfs_trans_get_block_res xfs: fix up inode32/64 (re)mount handling xfs: fix format specifier , should be %llx and not %llu xfs: sanitize remount options xfs: convert mount option parsing to tokens xfs: fix two memory leaks in xfs_attr_list.c error paths xfs: XFS_DIFLAG2_DAX limited by PAGE_SIZE xfs: dynamically switch modes when XFS_DIFLAG2_DAX is set/cleared ...
2016-03-21Merge tag 'for-f2fs-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fsLinus Torvalds33-2360/+2627
Pull f2fs updates from Jaegeuk Kim: "New Features: - uplift filesystem encryption into fs/crypto/ - give sysfs entries to control memroy consumption Enhancements: - aio performance by preallocating blocks in ->write_iter - use writepages lock for only WB_SYNC_ALL - avoid redundant inline_data conversion - enhance forground GC - use wait_for_stable_page as possible - speed up SEEK_DATA and fiiemap Bug Fixes: - corner case in terms of -ENOSPC for inline_data - hung task caused by long latency in shrinker - corruption between atomic write and f2fs_trace_pid - avoid garbage lengths in dentries - revoke atomicly written pages if an error occurs In addition, there are various minor bug fixes and clean-ups" * tag 'for-f2fs-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (81 commits) f2fs: submit node page write bios when really required f2fs: add missing argument to f2fs_setxattr stub f2fs: fix to avoid unneeded unlock_new_inode f2fs: clean up opened code with f2fs_update_dentry f2fs: declare static functions f2fs: use cryptoapi crc32 functions f2fs: modify the readahead method in ra_node_page() f2fs crypto: sync ext4_lookup and ext4_file_open fs crypto: move per-file encryption from f2fs tree to fs/crypto f2fs: mutex can't be used by down_write_nest_lock() f2fs: recovery missing dot dentries in root directory f2fs: fix to avoid deadlock when merging inline data f2fs: introduce f2fs_flush_merged_bios for cleanup f2fs: introduce f2fs_update_data_blkaddr for cleanup f2fs crypto: fix incorrect positioning for GCing encrypted data page f2fs: fix incorrect upper bound when iterating inode mapping tree f2fs: avoid hungtask problem caused by losing wake_up f2fs: trace old block address for CoWed page f2fs: try to flush inode after merging inline data f2fs: show more info about superblock recovery ...
2016-03-21Merge branch 'for-4.6-ns' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroupLinus Torvalds3-31/+232
Pull cgroup namespace support from Tejun Heo: "These are changes to implement namespace support for cgroup which has been pending for quite some time now. It is very straight-forward and only affects what part of cgroup hierarchies are visible. After unsharing, mounting a cgroup fs will be scoped to the cgroups the task belonged to at the time of unsharing and the cgroup paths exposed to userland would be adjusted accordingly" * 'for-4.6-ns' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: fix and restructure error handling in copy_cgroup_ns() cgroup: fix alloc_cgroup_ns() error handling in copy_cgroup_ns() Add FS_USERNS_FLAG to cgroup fs cgroup: Add documentation for cgroup namespaces cgroup: mount cgroupns-root when inside non-init cgroupns kernfs: define kernfs_node_dentry cgroup: cgroup namespace setns support cgroup: introduce cgroup namespaces sched: new clone flag CLONE_NEWCGROUP for cgroup namespace kernfs: Add API to generate relative kernfs path
2016-03-21nfs/blocklayout: make sure making a aligned read requestKinglong Mee1-6/+7
Only treat write goes up to the inode size as aligned request, because it always write PAGE_CACHE_SIZE, but read a dynamic size. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-03-21ovl: cleanup unused var in rename2Miklos Szeredi1-2/+0
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-03-21ovl: rename is_merge to is_lowestMiklos Szeredi1-8/+8
The 'is_merge' is an historical naming from when only a single lower layer could exist. With the introduction of multiple lower layers the meaning of this flag was changed to mean only the "lowest layer" (while all lower layers were being merged). So now 'is_merge' is inaccurate and hence renaming to 'is_lowest' Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-03-21ovl: fixed coding style warningSohom Bhattacharjee1-0/+1
This patch fixes a newline warning found by the checkpatch.pl tool Signed-off-by: Sohom-Bhattacharjee <soham.bhattacharjee15@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-03-21ovl: Ensure upper filesystem supports d_typeVivek Goyal3-0/+53
In some instances xfs has been created with ftype=0 and there if a file on lower fs is removed, overlay leaves a whiteout in upper fs but that whiteout does not get filtered out and is visible to overlayfs users. And reason it does not get filtered out because upper filesystem does not report file type of whiteout as DT_CHR during iterate_dir(). So it seems to be a requirement that upper filesystem support d_type for overlayfs to work properly. Do this check during mount and fail if d_type is not supported. Suggested-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-03-21ovl: Warn on copy up if a process has a R/O fd open to the lower fileDavid Howells1-0/+34
Print a warning when overlayfs copies up a file if the process that triggered the copy up has a R/O fd open to the lower file being copied up. This can help catch applications that do things like the following: fd1 = open("foo", O_RDONLY); fd2 = open("foo", O_RDWR); where they expect fd1 and fd2 to refer to the same file - which will no longer be the case post-copy up. With this patch, the following commands: bash 5</mnt/a/foo128 6<>/mnt/a/foo128 assuming /mnt/a/foo128 to be an un-copied up file on an overlay will produce the following warning in the kernel log: overlayfs: Copying up foo129, but open R/O on fd 5 which will cease to be coherent [pid=3818 bash] This is enabled by setting: /sys/module/overlay/parameters/check_copy_up to 1. The warnings are ratelimited and are also limited to one warning per file - assuming the copy up completes in each case. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-03-21ovl: honor flag MS_SILENT at mountKonstantin Khlebnikov1-1/+2
This patch hides error about missing lowerdir if MS_SILENT is set. We use mount(NULL, "/", "overlay", MS_SILENT, NULL) for testing support of overlayfs: syscall returns -ENODEV if it's not supported. Otherwise kernel automatically loads module and returns -EINVAL because lowerdir is missing. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-03-21ovl: verify upper dentry before unlink and renameMiklos Szeredi1-21/+38
Unlink and rename in overlayfs checked the upper dentry for staleness by verifying upper->d_parent against upperdir. However the dentry can go stale also by being unhashed, for example. Expand the verification to actually look up the name again (under parent lock) and check if it matches the upper dentry. This matches what the VFS does before passing the dentry to filesytem's unlink/rename methods, which excludes any inconsistency caused by overlayfs. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-03-21btrfs: make sure we stay inside the bvec during __btrfs_lookup_bio_sumsChris Mason1-0/+10
Commit c40a3d38aff4e1c (Btrfs: Compute and look up csums based on sectorsized blocks) changes around how we walk the bios while looking up crcs. There's an inner loop that is jumping to the next bvec based on sectors and before it derefs the next bvec, it needs to make sure we're still in the bio. In this case, the outer loop would have decided to stop moving forward too, and the bvec deref is never actually used for anything. But CONFIG_DEBUG_PAGEALLOC catches it because we're outside our bio. Signed-off-by: Chris Mason <clm@fb.com> Reviewed-by: David Sterba <dsterba@suse.com>
2016-03-20Merge branch 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-2/+20
Pull x86 protection key support from Ingo Molnar: "This tree adds support for a new memory protection hardware feature that is available in upcoming Intel CPUs: 'protection keys' (pkeys). There's a background article at LWN.net: https://lwn.net/Articles/643797/ The gist is that protection keys allow the encoding of user-controllable permission masks in the pte. So instead of having a fixed protection mask in the pte (which needs a system call to change and works on a per page basis), the user can map a (handful of) protection mask variants and can change the masks runtime relatively cheaply, without having to change every single page in the affected virtual memory range. This allows the dynamic switching of the protection bits of large amounts of virtual memory, via user-space instructions. It also allows more precise control of MMU permission bits: for example the executable bit is separate from the read bit (see more about that below). This tree adds the MM infrastructure and low level x86 glue needed for that, plus it adds a high level API to make use of protection keys - if a user-space application calls: mmap(..., PROT_EXEC); or mprotect(ptr, sz, PROT_EXEC); (note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice this special case, and will set a special protection key on this memory range. It also sets the appropriate bits in the Protection Keys User Rights (PKRU) register so that the memory becomes unreadable and unwritable. So using protection keys the kernel is able to implement 'true' PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies PROT_READ as well. Unreadable executable mappings have security advantages: they cannot be read via information leaks to figure out ASLR details, nor can they be scanned for ROP gadgets - and they cannot be used by exploits for data purposes either. We know about no user-space code that relies on pure PROT_EXEC mappings today, but binary loaders could start making use of this new feature to map binaries and libraries in a more secure fashion. There is other pending pkeys work that offers more high level system call APIs to manage protection keys - but those are not part of this pull request. Right now there's a Kconfig that controls this feature (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled (like most x86 CPU feature enablement code that has no runtime overhead), but it's not user-configurable at the moment. If there's any serious problem with this then we can make it configurable and/or flip the default" * 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits) x86/mm/pkeys: Fix mismerge of protection keys CPUID bits mm/pkeys: Fix siginfo ABI breakage caused by new u64 field x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA mm/core, x86/mm/pkeys: Add execute-only protection keys support x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags x86/mm/pkeys: Allow kernel to modify user pkey rights register x86/fpu: Allow setting of XSAVE state x86/mm: Factor out LDT init from context init mm/core, x86/mm/pkeys: Add arch_validate_pkey() mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits() x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU x86/mm/pkeys: Add Kconfig prompt to existing config option x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps x86/mm/pkeys: Dump PKRU with other kernel registers mm/core, x86/mm/pkeys: Differentiate instruction fetches x86/mm/pkeys: Optimize fault handling in access_error() mm/core: Do not enforce PKEY permissions on remote mm access um, pkeys: Add UML arch_*_access_permitted() methods mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys x86/mm/gup: Simplify get_user_pages() PTE bit handling ...
2016-03-20ubifs: Remove unused headerAndreas Gruenbacher1-1/+0
UBIFS does not support POSIX ACLs, so there is no need for including any POSIX ACL hesders. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-03-20ubifs: Add logging functions for ubifs_msg, ubifs_err and ubifs_warnJoe Perches3-24/+75
The existing logging macros are fairly large and converting the macros to functions make the object code smaller. Use %pV and __builtin_return_address(0) as appropriate. $ size fs/ubifs/built-in.o* text data bss dec hex filename 575831 309688 161312 1046831 ff92f fs/ubifs/built-in.o.allyesconfig.new 622457 312872 161120 1096449 10bb01 fs/ubifs/built-in.o.allyesconfig.old 223785 640 644 225069 36f2d fs/ubifs/built-in.o.defconfig.new 251873 640 644 253157 3dce5 fs/ubifs/built-in.o.defconfig.old Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-03-20writeback, cgroup: fix use of the wrong bdi_writeback which mismatches the inodeTejun Heo1-10/+19
When cgroup writeback is in use, there can be multiple wb's (bdi_writeback's) per bdi and an inode may switch among them dynamically. In a couple places, the wrong wb was used leading to performing operations on the wrong list under the wrong lock corrupting the io lists. * writeback_single_inode() was taking @wb parameter and used it to remove the inode from io lists if it becomes clean after writeback. The callers of this function were always passing in the root wb regardless of the actual wb that the inode was associated with, which could also change while writeback is in progress. Fix it by dropping the @wb parameter and using inode_to_wb_and_lock_list() to determine and lock the associated wb. * After writeback_sb_inodes() writes out an inode, it re-locks @wb and inode to remove it from or move it to the right io list. It assumes that the inode is still associated with @wb; however, the inode may have switched to another wb while writeback was in progress. Fix it by using inode_to_wb_and_lock_list() to determine and lock the associated wb after writeback is complete. As the function requires the original @wb->list_lock locked for the next iteration, in the unlikely case where the inode has changed association, switch the locks. Kudos to Tahsin for pinpointing these subtle breakages. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: d10c80955265 ("writeback: implement foreign cgroup inode bdi_writeback switching") Link: http://lkml.kernel.org/g/CAAeU0aMYeM_39Y2+PaRvyB1nqAPYZSNngJ1eBRmrxn7gKAt2Mg@mail.gmail.com Reported-and-diagnosed-by: Tahsin Erdogan <tahsin@google.com> Tested-by: Tahsin Erdogan <tahsin@google.com> Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-20writeback, cgroup: fix premature wb_put() in locked_inode_to_wb_and_lock_list()Tejun Heo1-3/+5
locked_inode_to_wb_and_lock_list() wb_get()'s the wb associated with the target inode, unlocks inode, locks the wb's list_lock and verifies that the inode is still associated with the wb. To prevent the wb going away between dropping inode lock and acquiring list_lock, the wb is pinned while inode lock is held. The wb reference is put right after acquiring list_lock citing that the wb won't be dereferenced anymore. This isn't true. If the inode is still associated with the wb, the inode has reference and it's safe to return the wb; however, if inode has been switched, the wb still needs to be unlocked which is a dereference and can lead to use-after-free if it it races with wb destruction. Fix it by putting the reference after releasing list_lock. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: 87e1d789bf55 ("writeback: implement [locked_]inode_to_wb_and_lock_list()") Cc: stable@vger.kernel.org # v4.2+ Tested-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-19Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds22-410/+459
Pull vfs updates from Al Viro: - Preparations of parallel lookups (the remaining main obstacle is the need to move security_d_instantiate(); once that becomes safe, the rest will be a matter of rather short series local to fs/*.c - preadv2/pwritev2 series from Christoph - assorted fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (32 commits) splice: handle zero nr_pages in splice_to_pipe() vfs: show_vfsstat: do not ignore errors from show_devname method dcache.c: new helper: __d_add() don't bother with __d_instantiate(dentry, NULL) untangle fsnotify_d_instantiate() a bit uninline d_add() replace d_add_unique() with saner primitive quota: use lookup_one_len_unlocked() cifs_get_root(): use lookup_one_len_unlocked() nfs_lookup: don't bother with d_instantiate(dentry, NULL) kill dentry_unhash() ceph_fill_trace(): don't bother with d_instantiate(dn, NULL) autofs4: don't bother with d_instantiate(dentry, NULL) in ->lookup() configfs: move d_rehash() into configfs_create() for regular files ceph: don't bother with d_rehash() in splice_dentry() namei: teach lookup_slow() to skip revalidate namei: massage lookup_slow() to be usable by lookup_one_len_unlocked() lookup_one_len_unlocked(): use lookup_dcache() namei: simplify invalidation logics in lookup_dcache() namei: change calling conventions for lookup_{fast,slow} and follow_managed() ...
2016-03-18Merge branch 'akpm' (patches from Andrew)Linus Torvalds12-91/+109
Merge second patch-bomb from Andrew Morton: - a couple of hotfixes - the rest of MM - a new timer slack control in procfs - a couple of procfs fixes - a few misc things - some printk tweaks - lib/ updates, notably to radix-tree. - add my and Nick Piggin's old userspace radix-tree test harness to tools/testing/radix-tree/. Matthew said it was a godsend during the radix-tree work he did. - a few code-size improvements, switching to __always_inline where gcc screwed up. - partially implement character sets in sscanf * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits) sscanf: implement basic character sets lib/bug.c: use common WARN helper param: convert some "on"/"off" users to strtobool lib: add "on"/"off" support to kstrtobool lib: update single-char callers of strtobool() lib: move strtobool() to kstrtobool() include/linux/unaligned: force inlining of byteswap operations include/uapi/linux/byteorder, swab: force inlining of some byteswap operations include/asm-generic/atomic-long.h: force inlining of some atomic_long operations usb: common: convert to use match_string() helper ide: hpt366: convert to use match_string() helper ata: hpt366: convert to use match_string() helper power: ab8500: convert to use match_string() helper power: charger_manager: convert to use match_string() helper drm/edid: convert to use match_string() helper pinctrl: convert to use match_string() helper device property: convert to use match_string() helper lib/string: introduce match_string() helper radix-tree tests: add test for radix_tree_iter_next radix-tree tests: add regression3 test ...
2016-03-18Merge branch 'for-4.6/core' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+5
Pull core block updates from Jens Axboe: "Here are the core block changes for this merge window. Not a lot of exciting stuff going on in this round, most of the changes have been on the driver side of things. That pull request is coming next. This pull request contains: - A set of fixes for chained bio handling from Christoph. - A tag bounds check for blk-mq from Hannes, ensuring that we don't do something stupid if a device reports an invalid tag value. - A set of fixes/updates for the CFQ IO scheduler from Jan Kara. - A set of blk-mq fixes from Keith, adding support for dynamic hardware queues, and fixing init of max_dev_sectors for stacking devices. - A fix for the dynamic hw context from Ming. - Enabling of cgroup writeback support on a block device, from Shaohua" * 'for-4.6/core' of git://git.kernel.dk/linux-block: blk-mq: add bounds check on tag-to-rq conversion block: bio_remaining_done() isn't unlikely block: cleanup bio_endio block: factor out chained bio completion block: don't unecessarily clobber bi_error for chained bios block-dev: enable writeback cgroup support blk-mq: Fix NULL pointer updating nr_requests blk-mq: mark request queue as mq asap block: Initialize max_dev_sectors to 0 blk-mq: dynamic h/w context count cfq-iosched: Allow parent cgroup to preempt its child cfq-iosched: Allow sync noidle workloads to preempt each other cfq-iosched: Reorder checks in cfq_should_preempt() cfq-iosched: Don't group_idle if cfqq has big thinktime
2016-03-18Merge branches 'work.lookups', 'work.misc' and 'work.preadv2' into for-nextAl Viro22-410/+459
2016-03-18splice: handle zero nr_pages in splice_to_pipe()Rabin Vincent1-0/+3
Running the following command: busybox cat /sys/kernel/debug/tracing/trace_pipe > /dev/null with any tracing enabled pretty very quickly leads to various NULL pointer dereferences and VM BUG_ON()s, such as these: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 IP: [<ffffffff8119df6c>] generic_pipe_buf_release+0xc/0x40 Call Trace: [<ffffffff811c48a3>] splice_direct_to_actor+0x143/0x1e0 [<ffffffff811c42e0>] ? generic_pipe_buf_nosteal+0x10/0x10 [<ffffffff811c49cf>] do_splice_direct+0x8f/0xb0 [<ffffffff81196869>] do_sendfile+0x199/0x380 [<ffffffff81197600>] SyS_sendfile64+0x90/0xa0 [<ffffffff8192cbee>] entry_SYSCALL_64_fastpath+0x12/0x6d page dumped because: VM_BUG_ON_PAGE(atomic_read(&page->_count) == 0) kernel BUG at include/linux/mm.h:367! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC RIP: [<ffffffff8119df9c>] generic_pipe_buf_release+0x3c/0x40 Call Trace: [<ffffffff811c48a3>] splice_direct_to_actor+0x143/0x1e0 [<ffffffff811c42e0>] ? generic_pipe_buf_nosteal+0x10/0x10 [<ffffffff811c49cf>] do_splice_direct+0x8f/0xb0 [<ffffffff81196869>] do_sendfile+0x199/0x380 [<ffffffff81197600>] SyS_sendfile64+0x90/0xa0 [<ffffffff8192cd1e>] tracesys_phase2+0x84/0x89 (busybox's cat uses sendfile(2), unlike the coreutils version) This is because tracing_splice_read_pipe() can call splice_to_pipe() with spd->nr_pages == 0. spd_pages underflows in splice_to_pipe() and we fill the page pointers and the other fields of the pipe_buffers with garbage. All other callers of splice_to_pipe() avoid calling it when nr_pages == 0, and we could make tracing_splice_read_pipe() do that too, but it seems reasonable to have splice_to_page() handle this condition gracefully. Cc: stable@vger.kernel.org Signed-off-by: Rabin Vincent <rabin@rab.in> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-18nfsd: block and scsi layout drivers need to depend on CONFIG_BLOCKChristoph Hellwig1-2/+2
Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-18nfsd: add SCSI layout supportChristoph Hellwig10-8/+335
This is a simple extension to the block layout driver to use SCSI persistent reservations for access control and fencing, as well as SCSI VPD pages for device identification. For this we need to pass the nfs4_client to the proc_getdeviceinfo method to generate the reservation key, and add a new fence_client method to allow for fence actions in the layout driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-18nfsd: move some blocklayout codeChristoph Hellwig1-40/+50
Trivial reorganization, no change in behavior. Move some code around, pull some code out of block layoutcommit that will be useful for the scsi layout. [bfields@redhat.com: split off from "nfsd: add SCSI layout support"] Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-18nfsd: add a new config option for the block layout driverChristoph Hellwig7-8/+20
Split the config symbols into a generic pNFS one, which is invisible and gets selected by the layout drivers, and one for the block layout driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-18nfs/blocklayout: add SCSI layout supportChristoph Hellwig5-25/+238
This is a trivial extension to the block layout driver to support the new SCSI layouts draft. There are three changes: - device identifcation through the SCSI VPD page. This allows us to directly use the udev generated persistent device names instead of requiring an expensive lookup by crawling every block device node in /dev and reading a signature for it. - use of SCSI persistent reservations to protect device access and allow for robust fencing. On the client sides this just means registering and unregistering a server supplied key. - an optimized LAYOUTCOMMIT payload that doesn't send unessecary fields to the server. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-17f2fs: submit node page write bios when really requiredJaegeuk Kim1-11/+1
If many threads calls fsync with data writes, we don't need to flush every bios having node page writes. The f2fs_wait_on_page_writeback will flush its bios when the page is really needed. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>