aboutsummaryrefslogtreecommitdiffstats
path: root/fs (follow)
AgeCommit message (Collapse)AuthorFilesLines
2017-07-11befs: Implement show_optionsDavid Howells1-3/+21
Implement the show_options superblock op for befs as part of a bid to get rid of s_options and generic_show_options() to make it easier to implement a context-based mount where the mount options can be passed individually over a file descriptor. Signed-off-by: David Howells <dhowells@redhat.com> cc: Luis de Bethencourt <luisbg@osg.samsung.com> cc: Salah Triki <salah.triki@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-07-06ramfs: Implement show_optionsDavid Howells1-11/+21
Implement the show_options superblock op for ramfs as part of a bid to get rid of s_options and generic_show_options() to make it easier to implement a context-based mount where the mount options can be passed individually over a file descriptor. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-07-06pstore: Implement show_optionsDavid Howells3-4/+15
Implement the show_options superblock op for pstore as part of a bid to get rid of s_options and generic_show_options() to make it easier to implement a context-based mount where the mount options can be passed individually over a file descriptor. Signed-off-by: David Howells <dhowells@redhat.com> cc: Kees Cook <keescook@chromium.org> cc: Anton Vorontsov <anton@enomsg.org> cc: Colin Cross <ccross@android.com> cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-07-06omfs: Implement show_optionsDavid Howells1-3/+30
Implement the show_options superblock op for omfs as part of a bid to get rid of s_options and generic_show_options() to make it easier to implement a context-based mount where the mount options can be passed individually over a file descriptor. Note that the uid and gid should possibly be displayed relative to the viewer's user namespace. Signed-off-by: David Howells <dhowells@redhat.com> cc: Bob Copeland <me@bobcopeland.com> cc: linux-karma-devel@lists.sourceforge.net Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-07-06hugetlbfs: Implement show_optionsDavid Howells1-14/+56
Implement the show_options superblock op for hugetlbfs as part of a bid to get rid of s_options and generic_show_options() to make it easier to implement a context-based mount where the mount options can be passed individually over a file descriptor. Note that the uid and gid should possibly be displayed relative to the viewer's user namespace. Signed-off-by: David Howells <dhowells@redhat.com> cc: Nadia Yvette Chambers <nyc@holomorphy.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-07-06VFS: Don't use save/replace_mount_options if not using generic_show_optionsDavid Howells4-9/+0
btrfs, debugfs, reiserfs and tracefs call save_mount_options() and reiserfs calls replace_mount_options(), but they then implement their own ->show_options() methods and don't touch s_options, rendering the saved options unnecessary. I'm trying to eliminate s_options to make it easier to implement a context-based mount where the mount options can be passed individually over a file descriptor. Remove the calls to save/replace_mount_options() call in these cases. Signed-off-by: David Howells <dhowells@redhat.com> cc: Chris Mason <clm@fb.com> cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> cc: Steven Rostedt <rostedt@goodmis.org> cc: linux-btrfs@vger.kernel.org cc: reiserfs-devel@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-07-06VFS: Provide empty name qstrDavid Howells5-10/+10
Provide an empty name (ie. "") qstr for general use. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-07-06VFS: Make get_filesystem() return the affected filesystemDavid Howells1-1/+2
Make get_filesystem() return a pointer to the filesystem on which it just got a ref. Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-07-06VFS: Clean up whitespace in fs/namespace.c and fs/super.cDavid Howells2-4/+4
Clean up line terminal whitespace in fs/namespace.c and fs/super.c. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-23Merge branch 'akpm' (patches from Andrew)Linus Torvalds5-15/+43
Merge misc fixes from Andrew Morton: "8 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: fs/exec.c: account for argv/envp pointers ocfs2: fix deadlock caused by recursive locking in xattr slub: make sysfs file removal asynchronous lib/cmdline.c: fix get_options() overflow while parsing ranges fs/dax.c: fix inefficiency in dax_writeback_mapping_range() autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL mm/vmalloc.c: huge-vmap: fail gracefully on unexpected huge vmap mappings mm, thp: remove cond_resched from __collapse_huge_page_copy
2017-06-23fs/exec.c: account for argv/envp pointersKees Cook1-4/+24
When limiting the argv/envp strings during exec to 1/4 of the stack limit, the storage of the pointers to the strings was not included. This means that an exec with huge numbers of tiny strings could eat 1/4 of the stack limit in strings and then additional space would be later used by the pointers to the strings. For example, on 32-bit with a 8MB stack rlimit, an exec with 1677721 single-byte strings would consume less than 2MB of stack, the max (8MB / 4) amount allowed, but the pointers to the strings would consume the remaining additional stack space (1677721 * 4 == 6710884). The result (1677721 + 6710884 == 8388605) would exhaust stack space entirely. Controlling this stack exhaustion could result in pathological behavior in setuid binaries (CVE-2017-1000365). [akpm@linux-foundation.org: additional commenting from Kees] Fixes: b6a2fea39318 ("mm: variable length argument support") Link: http://lkml.kernel.org/r/20170622001720.GA32173@beast Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Qualys Security Advisory <qsa@qualys.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-23ocfs2: fix deadlock caused by recursive locking in xattrEric Ren2-10/+17
Another deadlock path caused by recursive locking is reported. This kind of issue was introduced since commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()"). Two deadlock paths have been fixed by commit b891fa5024a9 ("ocfs2: fix deadlock issue when taking inode lock at vfs entry points"). Yes, we intend to fix this kind of case in incremental way, because it's hard to find out all possible paths at once. This one can be reproduced like this. On node1, cp a large file from home directory to ocfs2 mountpoint. While on node2, run setfacl/getfacl. Both nodes will hang up there. The backtraces: On node1: __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2] ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2] ocfs2_write_begin+0x43/0x1a0 [ocfs2] generic_perform_write+0xa9/0x180 __generic_file_write_iter+0x1aa/0x1d0 ocfs2_file_write_iter+0x4f4/0xb40 [ocfs2] __vfs_write+0xc3/0x130 vfs_write+0xb1/0x1a0 SyS_write+0x46/0xa0 On node2: __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2] ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2] ocfs2_xattr_set+0x12e/0xe80 [ocfs2] ocfs2_set_acl+0x22d/0x260 [ocfs2] ocfs2_iop_set_acl+0x65/0xb0 [ocfs2] set_posix_acl+0x75/0xb0 posix_acl_xattr_set+0x49/0xa0 __vfs_setxattr+0x69/0x80 __vfs_setxattr_noperm+0x72/0x1a0 vfs_setxattr+0xa7/0xb0 setxattr+0x12d/0x190 path_setxattr+0x9f/0xb0 SyS_setxattr+0x14/0x20 Fix this one by using ocfs2_inode_{lock|unlock}_tracker, which is exported by commit 439a36b8ef38 ("ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock"). Link: http://lkml.kernel.org/r/20170622014746.5815-1-zren@suse.com Fixes: 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()") Signed-off-by: Eric Ren <zren@suse.com> Reported-by: Thomas Voegtle <tv@lio96.de> Tested-by: Thomas Voegtle <tv@lio96.de> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-23fs/dax.c: fix inefficiency in dax_writeback_mapping_range()Jan Kara1-0/+1
dax_writeback_mapping_range() fails to update iteration index when searching radix tree for entries needing cache flushing. Thus each pagevec worth of entries is searched starting from the start which is inefficient and prone to livelocks. Update index properly. Link: http://lkml.kernel.org/r/20170619124531.21491-1-jack@suse.cz Fixes: 9973c98ecfda3 ("dax: add support for fsync/sync") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-23autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAILNeilBrown1-1/+1
If a positive status is passed with the AUTOFS_DEV_IOCTL_FAIL ioctl, autofs4_d_automount() will return ERR_PTR(status) with that status to follow_automount(), which will then dereference an invalid pointer. So treat a positive status the same as zero, and map to ENOENT. See comment in systemd src/core/automount.c::automount_send_ready(). Link: http://lkml.kernel.org/r/871sqwczx5.fsf@notabene.neil.brown.name Signed-off-by: NeilBrown <neilb@suse.com> Cc: Ian Kent <raven@themaw.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-23Merge tag 'xfs-4.12-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-2/+5
Pull xfs fixes from Darrick Wong: "I have one more bugfix for you for 4.12-rc7 to fix a disk corruption problem: - don't allow swapon on files on the realtime device, because the swap code will swap pages out to blocks on the data device, thereby corrupting the filesystem" * tag 'xfs-4.12-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: don't allow bmap on rt files
2017-06-22Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds5-9/+14
Pull cifs fixes from Steve French: "Various small fixes for stable" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: CIFS: Fix some return values in case of error in 'crypt_message' cifs: remove redundant return in cifs_creation_time_get CIFS: Improve readdir verbosity CIFS: check if pages is null rather than bv for a failed allocation CIFS: Set ->should_dirty in cifs_user_readv()
2017-06-21xfs: don't allow bmap on rt filesDarrick J. Wong1-2/+5
bmap returns a dumb LBA address but not the block device that goes with that LBA. Swapfiles don't care about this and will blindly assume that the data volume is the correct blockdev, which is totally bogus for files on the rt subvolume. This results in the swap code doing IOs to arbitrary locations on the data device(!) if the passed in mapping is a realtime file, so just turn off bmap for rt files. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-06-21Merge branch 'ufs-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds4-32/+28
Pull more ufs fixes from Al Viro: "More UFS fixes, unfortunately including build regression fix for the 64-bit s_dsize commit. Fixed in this pile: - trivial bug in signedness of 32bit timestamps on ufs1 - ESTALE instead of ufs_error() when doing open-by-fhandle on something deleted - build regression on 32bit in ufs_new_fragments() - calculating that many percents of u64 pulls libgcc stuff on some of those. Mea culpa. - fix hysteresis loop broken by typo in 2.4.14.7 (right next to the location of previous bug). - fix the insane limits of said hysteresis loop on filesystems with very low percentage of reserved blocks. If it's 5% or less, just use the OPTSPACE policy. - calculate those limits once and mount time. This tree does pass xfstests clean (both ufs1 and ufs2) and it _does_ survive cross-builds. Again, my apologies for missing that, especially since I have noticed a related percentage-of-64bit issue in earlier patches (when dealing with amount of reserved blocks). Self-LART applied..." * 'ufs-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ufs: fix the logics for tail relocation ufs_iget(): fail with -ESTALE on deleted inode fix signedness of timestamps on ufs1
2017-06-21CIFS: Fix some return values in case of error in 'crypt_message'Christophe Jaillet1-1/+3
'rc' is known to be 0 at this point. So if 'init_sg' or 'kzalloc' fails, we should return -ENOMEM instead. Also remove a useless 'rc' in a debug message as it is meaningless here. Fixes: 026e93dc0a3ee ("CIFS: Encrypt SMB3 requests before sending") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <smfrench@gmail.com> CC: Stable <stable@vger.kernel.org>
2017-06-20cifs: remove redundant return in cifs_creation_time_getColin Ian King1-2/+0
There is a redundant return in function cifs_creation_time_get that appears to be old vestigial code than can be removed. So remove it. Detected by CoverityScan, CID#1361924 ("Structurally dead code") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steve French <smfrench@gmail.com>
2017-06-20CIFS: Improve readdir verbosityPavel Shilovsky2-4/+9
Downgrade the loglevel for SMB2 to prevent filling the log with messages if e.g. readdir was interrupted. Also make SMB2 and SMB1 codepaths do the same logging during readdir. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com> CC: Stable <stable@vger.kernel.org>
2017-06-20CIFS: check if pages is null rather than bv for a failed allocationColin Ian King1-1/+1
pages is being allocated however a null check on bv is being used to see if the allocation failed. Fix this by checking if pages is null. Detected by CoverityScan, CID#1432974 ("Logically dead code") Fixes: ccf7f4088af2dd ("CIFS: Add asynchronous context to support kernel AIO") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com>
2017-06-20CIFS: Set ->should_dirty in cifs_user_readv()Dan Carpenter1-1/+1
The current code causes a static checker warning because ITER_IOVEC is zero so the condition is never true. Fixes: 6685c5e2d1ac ("CIFS: Add asynchronous read support through kernel AIO") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve French <smfrench@gmail.com>
2017-06-19mm: larger stack guard gap, between vmasHugh Dickins2-5/+1
Stack guard page is a useful feature to reduce a risk of stack smashing into a different mapping. We have been using a single page gap which is sufficient to prevent having stack adjacent to a different mapping. But this seems to be insufficient in the light of the stack usage in userspace. E.g. glibc uses as large as 64kB alloca() in many commonly used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX] which is 256kB or stack strings with MAX_ARG_STRLEN. This will become especially dangerous for suid binaries and the default no limit for the stack size limit because those applications can be tricked to consume a large portion of the stack and a single glibc call could jump over the guard page. These attacks are not theoretical, unfortunatelly. Make those attacks less probable by increasing the stack guard gap to 1MB (on systems with 4k pages; but make it depend on the page size because systems with larger base pages might cap stack allocations in the PAGE_SIZE units) which should cover larger alloca() and VLA stack allocations. It is obviously not a full fix because the problem is somehow inherent, but it should reduce attack space a lot. One could argue that the gap size should be configurable from userspace, but that can be done later when somebody finds that the new 1MB is wrong for some special case applications. For now, add a kernel command line option (stack_guard_gap) to specify the stack gap size (in page units). Implementation wise, first delete all the old code for stack guard page: because although we could get away with accounting one extra page in a stack vma, accounting a larger gap can break userspace - case in point, a program run with "ulimit -S -v 20000" failed when the 1MB gap was counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK and strict non-overcommit mode. Instead of keeping gap inside the stack vma, maintain the stack guard gap as a gap between vmas: using vm_start_gap() in place of vm_start (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few places which need to respect the gap - mainly arch_get_unmapped_area(), and and the vma tree's subtree_gap support for that. Original-patch-by: Oleg Nesterov <oleg@redhat.com> Original-patch-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Tested-by: Helge Deller <deller@gmx.de> # parisc Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-18Merge tag 'ceph-for-4.12-rc6' of git://github.com/ceph/ceph-clientLinus Torvalds4-6/+8
Pull ceph fixes from Ilya Dryomov: "A fix for an old ceph ->fh_to_* bug from Luis and two timestamp fixups from Zheng, prompted by the ongoing y2038 work" * tag 'ceph-for-4.12-rc6' of git://github.com/ceph/ceph-client: ceph: unify inode i_ctime update ceph: use current_kernel_time() to get request time stamp ceph: check i_nlink while converting a file handle to dentry
2017-06-17ufs: fix the logics for tail relocationAl Viro3-16/+17
* original hysteresis loop got broken by typo back in 2002; now it never switches out of OPTTIME state. Fixed. * critical levels for switching from OPTTIME to OPTSPACE and back ought to be calculated once, at mount time. * we should use mul_u64_u32_div() for those calculations, now that ->s_dsize is 64bit. * to quote Kirk McKusick (in 1995 FreeBSD commit message): The threshold for switching from time-space and space-time is too small when minfree is 5%...so make it stay at space in this case. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-17ufs_iget(): fail with -ESTALE on deleted inodeAl Viro1-13/+8
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-17fix signedness of timestamps on ufs1Al Viro1-3/+3
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-17Merge tag 'xfs-4.12-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds2-4/+3
Pull xfs fix from Darrick Wong: "One more bugfix for you for 4.12-rc6 to fix something that came up in an earlier rc: - Fix some bogus ASSERT failures on CONFIG_SMP=n and CONFIG_XFS_DEBUG=y" * tag 'xfs-4.12-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix spurious spin_is_locked() assert failures on non-smp kernels
2017-06-17Merge branch 'ufs-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds6-68/+98
Pull ufs fixes from Al Viro: "Fix assorted ufs bugs: a couple of deadlocks, fs corruption in truncate(), oopsen on tail unpacking and truncate when racing with vmscan, mild fs corruption (free blocks stats summary buggered, *BSD fsck would complain and fix), several instances of broken logics around reserved blocks (starting with "check almost never triggers when it should" and then there are issues with sufficiently large UFS2)" [ Note: ufs hasn't gotten any loving in a long time, because nobody really seems to use it. These ufs fixes are triggered by people actually caring now, not some sudden influx of new bugs. - Linus ] * 'ufs-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ufs_truncate_blocks(): fix the case when size is in the last direct block ufs: more deadlock prevention on tail unpacking ufs: avoid grabbing ->truncate_mutex if possible ufs_get_locked_page(): make sure we have buffer_heads ufs: fix s_size/s_dsize users ufs: fix reserved blocks check ufs: make ufs_freespace() return signed ufs: fix logics in "ufs: make fsck -f happy"
2017-06-17Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds2-6/+6
Pull vfs fixes from Al Viro: "A couple of fixes; a leak in mntns_install() caught by Andrei (this cycle regression) + d_invalidate() softlockup fix - that had been reported by a bunch of people lately, but the problem is pretty old" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: don't forget to put old mntns in mntns_install Hang/soft lockup in d_invalidate with simultaneous calls
2017-06-17userfaultfd: shmem: handle coredumping in handle_userfault()Andrea Arcangeli1-8/+21
Anon and hugetlbfs handle FOLL_DUMP set by get_dump_page() internally to __get_user_pages(). shmem as opposed has no special FOLL_DUMP handling there so handle_mm_fault() is invoked without mmap_sem and ends up calling handle_userfault() that isn't expecting to be invoked without mmap_sem held. This makes handle_userfault() fail immediately if invoked through shmem_vm_ops->fault during coredumping and solves the problem. The side effect is a BUG_ON with no lock held triggered by the coredumping process which exits. Only 4.11 is affected, pre-4.11 anon memory holes are skipped in __get_user_pages by checking FOLL_DUMP explicitly against empty pagetables (mm/gup.c:no_page_table()). It's zero cost as we already had a check for current->flags to prevent futex to trigger userfaults during exit (PF_EXITING). Link: http://lkml.kernel.org/r/20170615214838.27429-1-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: <stable@vger.kernel.org> [4.11+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-16Merge tag 'configfs-for-4.12' of git://git.infradead.org/users/hch/configfsLinus Torvalds2-2/+9
Pull configfs updates from Christoph Hellwig: "A fix from Nic for a race seen in production (including a stable tag). And while I'm sending you this I'm also sneaking in a trivial new helper from Bart so that we don't need inter-tree dependencies for the next merge window" * tag 'configfs-for-4.12' of git://git.infradead.org/users/hch/configfs: configfs: Introduce config_item_get_unless_zero() configfs: Fix race between create_link and configfs_rmdir
2017-06-16fs: pass on flags in compat_writevChristoph Hellwig1-1/+1
Fixes: 793b80ef14af ("vfs: pass a flags argument to vfs_readv/vfs_writev") Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-15fs: don't forget to put old mntns in mntns_installAndrei Vagin1-0/+2
Fixes: 4f757f3cbf54 ("make sure that mntns_install() doesn't end up with referral for root") Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-15Hang/soft lockup in d_invalidate with simultaneous callsAl Viro1-6/+4
It's not hard to trigger a bunch of d_invalidate() on the same dentry in parallel. They end up fighting each other - any dentry picked for removal by one will be skipped by the rest and we'll go for the next iteration through the entire subtree, even if everything is being skipped. Morevoer, we immediately go back to scanning the subtree. The only thing we really need is to dissolve all mounts in the subtree and as soon as we've nothing left to do, we can just unhash the dentry and bugger off. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-15Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds2-2/+8
Pull crypto fix from Herbert Xu: "This fixes a bug on sparc where we may dereference freed stack memory" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: Work around deallocated stack frame reference gcc bug on sparc.
2017-06-15ufs_truncate_blocks(): fix the case when size is in the last direct blockAl Viro1-9/+12
The logics when deciding whether we need to do anything with direct blocks is broken when new size is within the last direct block. It's better to find the path to the last byte _not_ to be removed and use that instead of the path to the beginning of the first block to be freed... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-15ufs: more deadlock prevention on tail unpackingAl Viro1-1/+1
->s_lock is not needed for ufs_change_blocknr() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-15ufs: avoid grabbing ->truncate_mutex if possibleAl Viro2-10/+26
tail unpacking is done in a wrong place; the deadlocks galore is best dealt with by doing that in ->write_iter() (and switching to iomap, while we are at it), but that's rather painful to backport. The trouble comes from grabbing pages that cover the beginning of tail from inside of ufs_new_fragments(); ongoing pageout of any of those is going to deadlock on ->truncate_mutex with process that got around to extending the tail holding that and waiting for page to get unlocked, while ->writepage() on that page is waiting on ->truncate_mutex. The thing is, we don't need ->truncate_mutex when the fragment we are trying to map is within the tail - the damn thing is allocated (tail can't contain holes). Let's do a plain lookup and if the fragment is present, we can just pretend that we'd won the race in almost all cases. The only exception is a fragment between the end of tail and the end of block containing tail. Protect ->i_lastfrag with ->meta_lock - read_seqlock_excl() is sufficient. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-14ufs_get_locked_page(): make sure we have buffer_headsAl Viro1-9/+8
callers rely upon that, but find_lock_page() racing with attempt of page eviction by memory pressure might have left us with * try_to_free_buffers() successfully done * __remove_mapping() failed, leaving the page in our mapping * find_lock_page() returning an uptodate page with no buffer_heads attached. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-14ufs: fix s_size/s_dsize usersAl Viro4-24/+19
For UFS2 we need 64bit variants; we even store them in uspi, but use 32bit ones instead. One wrinkle is in handling of reserved space - recalculating it every time had been stupid all along, but now it would become really ugly. Just calculate it once... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-14ufs: fix reserved blocks checkAl Viro1-4/+6
a) honour ->s_minfree; don't just go with default (5) b) don't bother with capability checks until we know we'll need them Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-14ufs: make ufs_freespace() return signedAl Viro1-2/+2
as it is, checking that its return value is <= 0 is useless and that's how it's being used. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-14ufs: fix logics in "ufs: make fsck -f happy"Al Viro1-13/+28
Storing stats _only_ at new locations is wrong for UFS1; old locations should always be kept updated. The check for "has been converted to use of new locations" is also wrong - it should be "->fs_maxbsize is equal to ->fs_bsize". Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-14ceph: unify inode i_ctime updateYan, Zheng2-3/+3
Current __ceph_setattr() can set inode's i_ctime to current_time(), req->r_stamp or attr->ia_ctime. These time stamps may have minor differences. It may cause potential problem. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-06-14ceph: use current_kernel_time() to get request time stampYan, Zheng1-3/+1
ceph uses ktime_get_real_ts() to get request time stamp. In most other cases, current_kernel_time() is used to get time stamp for filesystem operations (called by current_time()). There is granularity difference between ktime_get_real_ts() and current_kernel_time(). The later one can be up to one jiffy behind the former one. This can causes inode's ctime to go back. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-06-14ceph: check i_nlink while converting a file handle to dentryLuis Henriques1-0/+4
Converting a file handle to a dentry can be done call after the inode unlink. This means that __fh_to_dentry() requires an extra check to verify the number of links is not 0. The issue can be easily reproduced using xfstest generic/426, which does something like: name_to_handle_at(&fh) echo 3 > /proc/sys/vm/drop_caches unlink() open_by_handle_at(&fh) The call to open_by_handle_at() should fail, as the file doesn't exist anymore. Link: http://tracker.ceph.com/issues/19958 Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-06-12configfs: Introduce config_item_get_unless_zero()Bart Van Assche1-0/+8
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> [hch: minor style tweak] Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-06-12configfs: Fix race between create_link and configfs_rmdirNicholas Bellinger1-2/+1
This patch closes a long standing race in configfs between the creation of a new symlink in create_link(), while the symlink target's config_item is being concurrently removed via configfs_rmdir(). This can happen because the symlink target's reference is obtained by config_item_get() in create_link() before the CONFIGFS_USET_DROPPING bit set by configfs_detach_prep() during configfs_rmdir() shutdown is actually checked.. This originally manifested itself on ppc64 on v4.8.y under heavy load using ibmvscsi target ports with Novalink API: [ 7877.289863] rpadlpar_io: slot U8247.22L.212A91A-V1-C8 added [ 7879.893760] ------------[ cut here ]------------ [ 7879.893768] WARNING: CPU: 15 PID: 17585 at ./include/linux/kref.h:46 config_item_get+0x7c/0x90 [configfs] [ 7879.893811] CPU: 15 PID: 17585 Comm: targetcli Tainted: G O 4.8.17-customv2.22 #12 [ 7879.893812] task: c00000018a0d3400 task.stack: c0000001f3b40000 [ 7879.893813] NIP: d000000002c664ec LR: d000000002c60980 CTR: c000000000b70870 [ 7879.893814] REGS: c0000001f3b43810 TRAP: 0700 Tainted: G O (4.8.17-customv2.22) [ 7879.893815] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 28222242 XER: 00000000 [ 7879.893820] CFAR: d000000002c664bc SOFTE: 1 GPR00: d000000002c60980 c0000001f3b43a90 d000000002c70908 c0000000fbc06820 GPR04: c0000001ef1bd900 0000000000000004 0000000000000001 0000000000000000 GPR08: 0000000000000000 0000000000000001 d000000002c69560 d000000002c66d80 GPR12: c000000000b70870 c00000000e798700 c0000001f3b43ca0 c0000001d4949d40 GPR16: c00000014637e1c0 0000000000000000 0000000000000000 c0000000f2392940 GPR20: c0000001f3b43b98 0000000000000041 0000000000600000 0000000000000000 GPR24: fffffffffffff000 0000000000000000 d000000002c60be0 c0000001f1dac490 GPR28: 0000000000000004 0000000000000000 c0000001ef1bd900 c0000000f2392940 [ 7879.893839] NIP [d000000002c664ec] config_item_get+0x7c/0x90 [configfs] [ 7879.893841] LR [d000000002c60980] check_perm+0x80/0x2e0 [configfs] [ 7879.893842] Call Trace: [ 7879.893844] [c0000001f3b43ac0] [d000000002c60980] check_perm+0x80/0x2e0 [configfs] [ 7879.893847] [c0000001f3b43b10] [c000000000329770] do_dentry_open+0x2c0/0x460 [ 7879.893849] [c0000001f3b43b70] [c000000000344480] path_openat+0x210/0x1490 [ 7879.893851] [c0000001f3b43c80] [c00000000034708c] do_filp_open+0xfc/0x170 [ 7879.893853] [c0000001f3b43db0] [c00000000032b5bc] do_sys_open+0x1cc/0x390 [ 7879.893856] [c0000001f3b43e30] [c000000000009584] system_call+0x38/0xec [ 7879.893856] Instruction dump: [ 7879.893858] 409d0014 38210030 e8010010 7c0803a6 4e800020 3d220000 e94981e0 892a0000 [ 7879.893861] 2f890000 409effe0 39200001 992a0000 <0fe00000> 4bffffd0 60000000 60000000 [ 7879.893866] ---[ end trace 14078f0b3b5ad0aa ]--- To close this race, go ahead and obtain the symlink's target config_item reference only after the existing CONFIGFS_USET_DROPPING check succeeds. This way, if configfs_rmdir() wins create_link() will return -ENONET, and if create_link() wins configfs_rmdir() will return -EBUSY. Reported-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Tested-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org