aboutsummaryrefslogtreecommitdiffstats
path: root/fs (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-12-15Merge branch 'remove-ksys-mount-dup' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linuxLinus Torvalds2-14/+3
Pull ksys_mount() and ksys_dup() removal from Dominik Brodowski: "This small series replaces all in-kernel calls to the userspace-focused ksys_mount() and ksys_dup() with calls to kernel-centric functions: For each replacement of ksys_mount() with do_mount(), one needs to verify that the first and third parameter (char *dev_name, char *type) are strings allocated in kernelspace and that the fifth parameter (void *data) is either NULL or refers to a full page (only occurence in init/do_mounts.c::do_mount_root()). The second and fourth parameters (char *dir_name, unsigned long flags) are passed by ksys_mount() to do_mount() unchanged, and therefore do not require particular care. Moreover, instead of pretending to be userspace, the opening of /dev/console as stdin/stdout/stderr can be implemented using in-kernel functions as well. Thereby, ksys_dup() can be removed for good" [ This doesn't get rid of the special "kernel init runs with KERNEL_DS" case, but it at least removes _some_ of the users of "treat kernel pointers as user pointers for our magical init sequence". One day we'll hopefully be rid of it all, and can initialize our init_thread addr_limit to USER_DS. - Linus ] * 'remove-ksys-mount-dup' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: fs: remove ksys_dup() init: unify opening /dev/console as stdin/stdout/stderr init: use do_mount() instead of ksys_mount() initrd: use do_mount() instead of ksys_mount() devtmpfs: use do_mount() instead of ksys_mount()
2019-12-14Merge tag '5.5-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds6-3/+26
Pull cfis fixes from Steve French: "Three small smb3 fixes: this addresses two recent issues reported in additional testing during rc1, a refcount underflow and a problem with an intermittent crash in SMB2_open_init" * tag '5.5-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: CIFS: Close cached root handle only if it has a lease SMB3: Fix crash in SMB2_open_init due to uninitialized field in compounding path smb3: fix refcount underflow warning on unmount when no directory leases
2019-12-14Merge tag 'ovl-fixes-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfsLinus Torvalds8-96/+159
Pull overlayfs fixes from Miklos Szeredi: "Fix some bugs and documentation" * tag 'ovl-fixes-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: docs: filesystems: overlayfs: Fix restview warnings docs: filesystems: overlayfs: Rename overlayfs.txt to .rst ovl: relax WARN_ON() on rename to self ovl: fix corner case of non-unique st_dev;st_ino ovl: don't use a temp buf for encoding real fh ovl: make sure that real fid is 32bit aligned in memory ovl: fix lookup failure on multi lower squashfs
2019-12-13Merge tag 'io_uring-5.5-20191212' of git://git.kernel.dk/linux-blockLinus Torvalds3-88/+121
Pull io_uring fixes from Jens Axboe: - A tweak to IOSQE_IO_LINK (also marked for stable) to allow links that don't sever if the result is < 0. This is mostly for linked timeouts, where if we ask for a pure timeout we always get -ETIME. This makes links useless for that case, hence allow a case where it works. - Five minor optimizations to fix and improve cases that regressed since v5.4. - An SQTHREAD locking fix. - A sendmsg/recvmsg iov assignment fix. - Net fix where read_iter/write_iter don't honor IOCB_NOWAIT, and subsequently ensuring that works for io_uring. - Fix a case where for an invalid opcode we might return -EBADF instead of -EINVAL, if the ->fd of that sqe was set to an invalid fd value. * tag 'io_uring-5.5-20191212' of git://git.kernel.dk/linux-block: io_uring: ensure we return -EINVAL on unknown opcode io_uring: add sockets to list of files that support non-blocking issue net: make socket read/write_iter() honor IOCB_NOWAIT io_uring: only hash regular files for async work execution io_uring: run next sqe inline if possible io_uring: don't dynamically allocate poll data io_uring: deferred send/recvmsg should assign iov io_uring: sqthread should grab ctx->uring_lock for submissions io-wq: briefly spin for new work after finishing work io-wq: remove worker->wait waitqueue io_uring: allow unbreakable links
2019-12-13Merge tag 'sizeof_field-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linuxLinus Torvalds2-2/+2
Pull FIELD_SIZEOF conversion from Kees Cook: "A mostly mechanical treewide conversion from FIELD_SIZEOF() to sizeof_field(). This avoids the redundancy of having 2 macros (actually 3) doing the same thing, and consolidates on sizeof_field(). While "field" is not an accurate name, it is the common name used in the kernel, and doesn't result in any unintended innuendo. As there are still users of FIELD_SIZEOF() in -next, I will clean up those during this coming development cycle and send the final old macro removal patch at that time" * tag 'sizeof_field-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: treewide: Use sizeof_field() macro MIPS: OCTEON: Replace SIZEOF_FIELD() macro
2019-12-13CIFS: Close cached root handle only if it has a leasePavel Shilovsky5-4/+25
SMB2_tdis() checks if a root handle is valid in order to decide whether it needs to close the handle or not. However if another thread has reference for the handle, it may end up with putting the reference twice. The extra reference that we want to put during the tree disconnect is the reference that has a directory lease. So, track the fact that we have a directory lease and close the handle only in that case. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-12-13SMB3: Fix crash in SMB2_open_init due to uninitialized field in compounding pathSteve French1-0/+1
Ran into an intermittent crash in SMB2_open_init+0x2f6/0x970 due to oparms.cifs_sb not being initialized when called from: smb2_compound_op+0x45d/0x1690 Zero the whole oparms struct in the compounding path before setting up the oparms so we don't risk any uninitialized fields. Fixes: fdef665ba44a ("smb3: fix mode passed in on create for modetosid mount option") Signed-off-by: Steve French <stfrench@microsoft.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2019-12-12Merge tag 'ceph-for-5.5-rc2' of git://github.com/ceph/ceph-clientLinus Torvalds7-42/+85
Pull ceph fixes from Ilya Dryomov: "A fix to avoid a corner case when scheduling cap reclaim in batches from Xiubo, a patch to add some observability into cap waiters from Jeff and a couple of cleanups" * tag 'ceph-for-5.5-rc2' of git://github.com/ceph/ceph-client: ceph: add more debug info when decoding mdsmap ceph: switch to global cap helper ceph: trigger the reclaim work once there has enough pending caps ceph: show tasks waiting on caps in debugfs caps file ceph: convert int fields in ceph_mount_options to unsigned int
2019-12-12fs: remove ksys_dup()Dominik Brodowski1-6/+1
ksys_dup() is used only at one place in the kernel, namely to duplicate fd 0 of /dev/console to stdout and stderr. The same functionality can be achieved by using functions already available within the kernel namespace. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2019-12-12init: use do_mount() instead of ksys_mount()Dominik Brodowski1-8/+2
In prepare_namespace(), do_mount() can be used instead of ksys_mount() as the first and third argument are const strings in the kernel, the second and fourth argument are passed through anyway, and the fifth argument is NULL. In do_mount_root(), ksys_mount() is called with the first and third argument being already kernelspace strings, which do not need to be copied over from userspace to kernelspace (again). The second and fourth arguments are passed through to do_mount() anyway. The fifth argument, while already residing in kernelspace, needs to be put into a page of its own. Then, do_mount() can be used instead of ksys_mount(). Once this is done, there are no in-kernel users to ksys_mount() left, which can therefore be removed. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2019-12-11Merge tag 'afs-fixes-20191211' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fsLinus Torvalds5-19/+20
Pull AFS fixes from David Howells: "Fixes for AFS plus one patch to make debugging easier: - Fix how addresses are matched to server records. This is currently incorrect which means cache invalidation callbacks from the server don't necessarily get delivered correctly. This causes stale data and metadata to be seen under some circumstances. - Make the dynamic root superblock R/W so that rpm/dnf can reapply the SELinux label to it when upgrading the Fedora filesystem-afs package. If the filesystem is R/O, this fails and the upgrade fails. It might be better in future to allow setxattr from an LSM to bypass the R/O protections, if only for pseudo-filesystems. - Fix the parsing of mountpoint strings. The mountpoint object has to have a terminal dot, whereas the source/device string passed to mount should not. This confuses type-forcing suffix detection leading to the wrong volume variant being mounted. - Make lookups in the dynamic root superblock for creation events (such as mkdir) fail with EOPNOTSUPP rather than something like EEXIST. The dynamic root only allows implicit creation by the ->lookup() method - and only if the target cell exists. - Fix the looking up of an AFS superblock to include the cell in the matching key - otherwise all volumes with the same ID number are treated as the same thing, irrespective of which cell they're in. - Show the volume name of each volume in the volume records displayed in /proc/net/afs/<cell>/volumes. This proved useful in debugging as it provides a way to map the volume IDs to names, where the names are what appear in /proc/mounts" * tag 'afs-fixes-20191211' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Show volume name in /proc/net/afs/<cell>/volumes afs: Fix missing cell comparison in afs_test_super() afs: Fix creation calls in the dynamic root to fail with EOPNOTSUPP afs: Fix mountpoint parsing afs: Fix SELinux setting security label on /afs afs: Fix afs_find_server lookups for ipv4 peers
2019-12-11io_uring: ensure we return -EINVAL on unknown opcodeJens Axboe1-7/+14
If we submit an unknown opcode and have fd == -1, io_op_needs_file() will return true as we default to needing a file. Then when we go and assign the file, we find the 'fd' invalid and return -EBADF. We really should be returning -EINVAL for that case, as we normally do for unsupported opcodes. Change io_op_needs_file() to have the following return values: 0 - does not need a file 1 - does need a file < 0 - error value and use this to pass back the right value for this invalid case. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-11Merge tag 'erofs-for-5.5-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofsLinus Torvalds1-0/+2
Pull erofs fixes from Gao Xiang: "Mainly address a regression reported by David recently observed together with overlayfs due to the improper return value of listxattr() without xattr. Update outdated expressions in document as well. Summary: - Fix improper return value of listxattr() with no xattr - Keep up documentation with latest code" * tag 'erofs-for-5.5-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: update documentation erofs: zero out when listxattr is called with no xattr
2019-12-11pipe: simplify signal handling in pipe_read() and add commentsLinus Torvalds1-7/+29
There's no need to separately check for signals while inside the locked region, since we're going to do "wait_event_interruptible()" right afterwards anyway, and the error handling is much simpler there. The check for whether we had already read anything was also redundant, since we no longer do the odd merging of reads when there are pending writers. But perhaps more importantly, this adds commentary about why we still need to wake up possible writers even though we didn't read any data, and why we can skip all the finishing touches now if we get a signal (or had a signal pending) while waiting for more data. [ This is a split-out cleanup from my "make pipe IO use exclusive wait queues" thing, which I can't apply because it triggers a nasty bug in the GNU make jobserver - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-11afs: Show volume name in /proc/net/afs/<cell>/volumesDavid Howells1-3/+4
Show the name of each volume in /proc/net/afs/<cell>/volumes to make it easier to work out the name corresponding to a volume ID. This makes it easier to work out which mounts in /proc/mounts correspond to which volume ID. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
2019-12-11afs: Fix missing cell comparison in afs_test_super()David Howells1-0/+1
Fix missing cell comparison in afs_test_super(). Without this, any pair volumes that have the same volume ID will share a superblock, no matter the cell, unless they're in different network namespaces. Normally, most users will only deal with a single cell and so they won't see this. Even if they do look into a second cell, they won't see a problem unless they happen to hit a volume with the same ID as one they've already got mounted. Before the patch: # ls /afs/grand.central.org/archive linuxdev/ mailman/ moin/ mysql/ pipermail/ stage/ twiki/ # ls /afs/kth.se/ linuxdev/ mailman/ moin/ mysql/ pipermail/ stage/ twiki/ # cat /proc/mounts | grep afs none /afs afs rw,relatime,dyn,autocell 0 0 #grand.central.org:root.cell /afs/grand.central.org afs ro,relatime 0 0 #grand.central.org:root.archive /afs/grand.central.org/archive afs ro,relatime 0 0 #grand.central.org:root.archive /afs/kth.se afs ro,relatime 0 0 After the patch: # ls /afs/grand.central.org/archive linuxdev/ mailman/ moin/ mysql/ pipermail/ stage/ twiki/ # ls /afs/kth.se/ admin/ common/ install/ OldFiles/ service/ system/ bakrestores/ home/ misc/ pkg/ src/ wsadmin/ # cat /proc/mounts | grep afs none /afs afs rw,relatime,dyn,autocell 0 0 #grand.central.org:root.cell /afs/grand.central.org afs ro,relatime 0 0 #grand.central.org:root.archive /afs/grand.central.org/archive afs ro,relatime 0 0 #kth.se:root.cell /afs/kth.se afs ro,relatime 0 0 Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Carsten Jacobi <jacobi@de.ibm.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Jonathan Billings <jsbillings@jsbillings.org> cc: Todd DeSantis <atd@us.ibm.com>
2019-12-11afs: Fix creation calls in the dynamic root to fail with EOPNOTSUPPDavid Howells1-0/+3
Fix the lookup method on the dynamic root directory such that creation calls, such as mkdir, open(O_CREAT), symlink, etc. fail with EOPNOTSUPP rather than failing with some odd error (such as EEXIST). lookup() itself tries to create automount directories when it is invoked. These are cached locally in RAM and not committed to storage. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Jonathan Billings <jsbillings@jsbillings.org>
2019-12-11afs: Fix mountpoint parsingDavid Howells1-2/+4
Each AFS mountpoint has strings that define the target to be mounted. This is required to end in a dot that is supposed to be stripped off. The string can include suffixes of ".readonly" or ".backup" - which are supposed to come before the terminal dot. To add to the confusion, the "fs lsmount" afs utility does not show the terminal dot when displaying the string. The kernel mount source string parser, however, assumes that the terminal dot marks the suffix and that the suffix is always "" and is thus ignored. In most cases, there is no suffix and this is not a problem - but if there is a suffix, it is lost and this affects the ability to mount the correct volume. The command line mount command, on the other hand, is expected not to include a terminal dot - so the problem doesn't arise there. Fix this by making sure that the dot exists and then stripping it when passing the string to the mount configuration. Fixes: bec5eb614130 ("AFS: Implement an autocell mount capability [ver #2]") Reported-by: Jonathan Billings <jsbillings@jsbillings.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Jonathan Billings <jsbillings@jsbillings.org>
2019-12-10io_uring: add sockets to list of files that support non-blocking issueJens Axboe1-2/+4
In chasing a performance issue between using IORING_OP_RECVMSG and IORING_OP_READV on sockets, tracing showed that we always punt the socket reads to async offload. This is due to io_file_supports_async() not checking for S_ISSOCK on the inode. Since sockets supports the O_NONBLOCK (or MSG_DONTWAIT) flag just fine, add sockets to the list of file types that we can do a non-blocking issue to. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10io_uring: only hash regular files for async work executionJens Axboe1-1/+3
We hash regular files to avoid having multiple threads hammer on the inode mutex, but it should not be needed on other types of files (like sockets). Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10io_uring: run next sqe inline if possibleJens Axboe1-4/+11
One major use case of linked commands is the ability to run the next link inline, if at all possible. This is done correctly for async offload, but somewhere along the line we lost the ability to do so when we were able to complete a request without having to punt it. Ensure that we do so correctly. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10io_uring: don't dynamically allocate poll dataJens Axboe1-16/+11
This essentially reverts commit e944475e6984. For high poll ops workloads, like TAO, the dynamic allocation of the wait_queue entry for IORING_OP_POLL_ADD adds considerable extra overhead. Go back to embedding the wait_queue_entry, but keep the usage of wait->private for the pointer stashing. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10io_uring: deferred send/recvmsg should assign iovJens Axboe1-2/+2
Don't just assign it from the main call path, that can miss the case when we're called from issue deferral. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10io_uring: sqthread should grab ctx->uring_lock for submissionsJens Axboe1-5/+2
We use the mutex to guard against registered file updates, for instance. Ensure we're safe in accessing that state against concurrent updates. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10io-wq: briefly spin for new work after finishing workJens Axboe2-5/+26
To avoid going to sleep only to get woken shortly thereafter, spin briefly for new work upon completion of work. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10io-wq: remove worker->wait waitqueueJens Axboe1-8/+2
We only have one cases of using the waitqueue to wake the worker, the rest are using wake_up_process(). Since we can save some cycles not fiddling with the waitqueue io_wqe_worker(), switch the work activation to task wakeup and get rid of the now unused wait_queue_head_t in struct io_worker. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10io_uring: allow unbreakable linksJens Axboe1-38/+46
Some commands will invariably end in a failure in the sense that the completion result will be less than zero. One such example is timeouts that don't have a completion count set, they will always complete with -ETIME unless cancelled. For linked commands, we sever links and fail the rest of the chain if the result is less than zero. Since we have commands where we know that will happen, add IOSQE_IO_HARDLINK as a stronger link that doesn't sever regardless of the completion result. Note that the link will still sever if we fail submitting the parent request, hard links are only resilient in the presence of completion results for requests that did submit correctly. Cc: stable@vger.kernel.org # v5.4 Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Reported-by: 李通洲 <carter.li@eoitek.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10ovl: relax WARN_ON() on rename to selfAmir Goldstein1-1/+1
In ovl_rename(), if new upper is hardlinked to old upper underneath overlayfs before upper dirs are locked, user will get an ESTALE error and a WARN_ON will be printed. Changes to underlying layers while overlayfs is mounted may result in unexpected behavior, but it shouldn't crash the kernel and it shouldn't trigger WARN_ON() either, so relax this WARN_ON(). Reported-by: syzbot+bb1836a212e69f8e201a@syzkaller.appspotmail.com Fixes: 804032fabb3b ("ovl: don't check rename to self") Cc: <stable@vger.kernel.org> # v4.9+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-12-10ovl: fix corner case of non-unique st_dev;st_inoAmir Goldstein1-1/+7
On non-samefs overlay without xino, non pure upper inodes should use a pseudo_dev assigned to each unique lower fs and pure upper inodes use the real upper st_dev. It is fine for an overlay pure upper inode to use the same st_dev;st_ino values as the real upper inode, because the content of those two different filesystem objects is always the same. In this case, however: - two filesystems, A and B - upper layer is on A - lower layer 1 is also on A - lower layer 2 is on B Non pure upper overlay inode, whose origin is in layer 1 will have the same st_dev;st_ino values as the real lower inode. This may result with a false positive results of 'diff' between the real lower and copied up overlay inode. Fix this by using the upper st_dev;st_ino values in this case. This breaks the property of constant st_dev;st_ino across copy up of this case. This breakage will be fixed by a later patch. Fixes: 5148626b806a ("ovl: allocate anon bdev per unique lower fs") Cc: stable@vger.kernel.org # v4.17+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-12-10ovl: don't use a temp buf for encoding real fhAmir Goldstein1-21/+16
We can allocate maximum fh size and encode into it directly. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-12-10ovl: make sure that real fid is 32bit aligned in memoryAmir Goldstein4-73/+115
Seprate on-disk encoding from in-memory and on-wire resresentation of overlay file handle. In-memory and on-wire we only ever pass around pointers to struct ovl_fh, which encapsulates at offset 3 the on-disk format struct ovl_fb. struct ovl_fb encapsulates at offset 21 the real file handle. That makes sure that the real file handle is always 32bit aligned in-memory when passed down to the underlying filesystem. On-disk format remains the same and store/load are done into correctly aligned buffer. New nfs exported file handles are exported with aligned real fid. Old nfs file handles are copied to an aligned buffer before being decoded. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-12-10ovl: fix lookup failure on multi lower squashfsAmir Goldstein3-7/+27
In the past, overlayfs required that lower fs have non null uuid in order to support nfs export and decode copy up origin file handles. Commit 9df085f3c9a2 ("ovl: relax requirement for non null uuid of lower fs") relaxed this requirement for nfs export support, as long as uuid (even if null) is unique among all lower fs. However, said commit unintentionally also relaxed the non null uuid requirement for decoding copy up origin file handles, regardless of the unique uuid requirement. Amend this mistake by disabling decoding of copy up origin file handle from lower fs with a conflicting uuid. We still encode copy up origin file handles from those fs, because file handles like those already exist in the wild and because they might provide useful information in the future. There is an unhandled corner case described by Miklos this way: - two filesystems, A and B, both have null uuid - upper layer is on A - lower layer 1 is also on A - lower layer 2 is on B In this case bad_uuid won't be set for B, because the check only involves the list of lower fs. Hence we'll try to decode a layer 2 origin on layer 1 and fail. We will deal with this corner case later. Reported-by: Colin Ian King <colin.king@canonical.com> Tested-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/lkml/20191106234301.283006-1-colin.king@canonical.com/ Fixes: 9df085f3c9a2 ("ovl: relax requirement for non null uuid ...") Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-12-09smb3: fix refcount underflow warning on unmount when no directory leasesSteve French1-1/+2
Fix refcount underflow warning when unmounting to servers which didn't grant directory leases. [ 301.680095] refcount_t: underflow; use-after-free. [ 301.680192] WARNING: CPU: 1 PID: 3569 at lib/refcount.c:28 refcount_warn_saturate+0xb4/0xf3 ... [ 301.682139] Call Trace: [ 301.682240] close_shroot+0x97/0xda [cifs] [ 301.682351] SMB2_tdis+0x7c/0x176 [cifs] [ 301.682456] ? _get_xid+0x58/0x91 [cifs] [ 301.682563] cifs_put_tcon.part.0+0x99/0x202 [cifs] [ 301.682637] ? ida_free+0x99/0x10a [ 301.682727] ? cifs_umount+0x3d/0x9d [cifs] [ 301.682829] cifs_put_tlink+0x3a/0x50 [cifs] [ 301.682929] cifs_umount+0x44/0x9d [cifs] Fixes: 72e73c78c446 ("cifs: close the shared root handle on tree disconnect") Signed-off-by: Steve French <stfrench@microsoft.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Reported-and-tested-by: Arthur Marsh <arthur.marsh@internode.on.net>
2019-12-09ceph: add more debug info when decoding mdsmapXiubo Li1-4/+8
Show the laggy state. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-12-09ceph: switch to global cap helperXiubo Li1-14/+10
__ceph_is_any_caps is a duplicate helper. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-12-09ceph: trigger the reclaim work once there has enough pending capsXiubo Li1-1/+1
The nr in ceph_reclaim_caps_nr() is very possibly larger than 1, so we may miss it and the reclaim work couldn't triggered as expected. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-12-09ceph: show tasks waiting on caps in debugfs caps fileJeff Layton4-0/+40
Add some visibility of tasks that are waiting for caps to the "caps" debugfs file. Display the tgid of the waiting task, inode number, and the caps the task needs and wants. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-12-09ceph: convert int fields in ceph_mount_options to unsigned intJeff Layton3-23/+26
Most of these values should never be negative, so convert them to unsigned values. Add some sanity checking to the parsed values, and clean up some unneeded casts. Note that while caps_max should never be negative, this patch leaves it signed, since this value ends up later being compared to a signed counter. Just ensure that userland never passes in a negative value for caps_max. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-12-09treewide: Use sizeof_field() macroPankaj Bharadiya2-2/+2
Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except at places where these are defined. Later patches will remove the unused definition of FIELD_SIZEOF(). This patch is generated using following script: EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h" git grep -l -e "\bFIELD_SIZEOF\b" | while read file; do if [[ "$file" =~ $EXCLUDE_FILES ]]; then continue fi sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file; done Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: David Miller <davem@davemloft.net> # for net
2019-12-09btrfs: add Kconfig dependency for BLAKE2BDavid Sterba1-0/+1
Because the BLAKE2B code went through a different tree, it was not available at the time the btrfs part was merged. Now that the Kconfig symbol exists, add it to the list. Signed-off-by: David Sterba <dsterba@suse.com>
2019-12-09afs: Fix SELinux setting security label on /afsDavid Howells1-1/+0
Make the AFS dynamic root superblock R/W so that SELinux can set the security label on it. Without this, upgrades to, say, the Fedora filesystem-afs RPM fail if afs is mounted on it because the SELinux label can't be (re-)applied. It might be better to make it possible to bypass the R/O check for LSM label application through setxattr. Fixes: 4d673da14533 ("afs: Support the AFS dynamic root") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: selinux@vger.kernel.org cc: linux-security-module@vger.kernel.org
2019-12-09afs: Fix afs_find_server lookups for ipv4 peersMarc Dionne1-13/+8
afs_find_server tries to find a server that has an address that matches the transport address of an rxrpc peer. The code assumes that the transport address is always ipv6, with ipv4 represented as ipv4 mapped addresses, but that's not the case. If the transport family is AF_INET, srx->transport.sin6.sin6_addr.s6_addr32[] will be beyond the actual ipv4 address and will always be 0, and all ipv4 addresses will be seen as matching. As a result, the first ipv4 address seen on any server will be considered a match, and the server returned may be the wrong one. One of the consequences is that callbacks received over ipv4 will only be correctly applied for the server that happens to have the first ipv4 address on the fs_addresses4 list. Callbacks over ipv4 from all other servers are dropped, causing the client to serve stale data. This is fixed by looking at the transport family, and comparing ipv4 addresses based on a sockaddr_in structure rather than a sockaddr_in6. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2019-12-08Merge tag '5.5-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds13-72/+265
Pull cifs fixes from Steve French: "Nine cifs/smb3 fixes: - one fix for stable (oops during oplock break) - two timestamp fixes including important one for updating mtime at close to avoid stale metadata caching issue on dirty files (also improves perf by using SMB2_CLOSE_FLAG_POSTQUERY_ATTRIB over the wire) - two fixes for "modefromsid" mount option for file create (now allows mode bits to be set more atomically and accurately on create by adding "sd_context" on create when modefromsid specified on mount) - two fixes for multichannel found in testing this week against different servers - two small cleanup patches" * tag '5.5-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: smb3: improve check for when we send the security descriptor context on create smb3: fix mode passed in on create for modetosid mount option cifs: fix possible uninitialized access and race on iface_list cifs: Fix lookup of SMB connections on multichannel smb3: query attributes on file close smb3: remove unused flag passed into close functions cifs: remove redundant assignment to pointer pneg_ctxt fs: cifs: Fix atime update check vs mtime CIFS: Fix NULL-pointer dereference in smb2_push_mandatory_locks
2019-12-08Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds4-6/+5
Pull misc vfs cleanups from Al Viro: "No common topic, just three cleanups". * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: make __d_alloc() static fs/namespace: add __user to open_tree and move_mount syscalls fs/fnctl: fix missing __user in fcntl_rw_hint()
2019-12-07Merge tag 'iomap-5.5-merge-14' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-12/+28
Pull iomap fixes from Darrick Wong: "Fix a race condition and a use-after-free error: - Fix a UAF when reporting writeback errors - Fix a race condition when handling page uptodate on fragmented file with blocksize < pagesize" * tag 'iomap-5.5-merge-14' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: stop using ioend after it's been freed in iomap_finish_ioend() iomap: fix sub-page uptodate handling
2019-12-07Merge tag 'xfs-5.5-merge-17' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds2-12/+17
Pull xfs fixes from Darrick Wong: "Fix a couple of resource management errors and a hang: - fix a crash in the log setup code when log mounting fails - fix a hang when allocating space on the realtime device - fix a block leak when freeing space on the realtime device" * tag 'xfs-5.5-merge-17' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix mount failure crash on invalid iclog memory access xfs: don't check for AG deadlock for realtime files in bunmapi xfs: fix realtime file data space leak
2019-12-07Merge tag 'for-linus-5.5-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linuxLinus Torvalds3-7/+43
Pull orangefs update from Mike Marshall: "orangefs: posix open permission checking... Orangefs has no open, and orangefs checks file permissions on each file access. Posix requires that file permissions be checked on open and nowhere else. Orangefs-through-the-kernel needs to seem posix compliant. The VFS opens files, even if the filesystem provides no method. We can see if a file was successfully opened for read and or for write by looking at file->f_mode. When writes are flowing from the page cache, file is no longer available. We can trust the VFS to have checked file->f_mode before writing to the page cache. The mode of a file might change between when it is opened and IO commences, or it might be created with an arbitrary mode. We'll make sure we don't hit EACCES during the IO stage by using UID 0" [ This is "posixish", but not a great solution in the long run, since a proper secure network server shouldn't really trust the client like this. But proper and secure POSIX behavior requires an open method and a resulting cookie for IO of some kind, or similar. - Linus ] * tag 'for-linus-5.5-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: orangefs: posix open permission checking...
2019-12-07Merge tag 'nfsd-5.5' of git://linux-nfs.org/~bfields/linuxLinus Torvalds12-52/+136
Pull nfsd updates from Bruce Fields: "This is a relatively quiet cycle for nfsd, mainly various bugfixes. Possibly most interesting is Trond's fixes for some callback races that were due to my incomplete understanding of rpc client shutdown. Unfortunately at the last minute I've started noticing a new intermittent failure to send callbacks. As the logic seems basically correct, I'm leaving Trond's patches in for now, and hope to find a fix in the next week so I don't have to revert those patches" * tag 'nfsd-5.5' of git://linux-nfs.org/~bfields/linux: (24 commits) nfsd: depend on CRYPTO_MD5 for legacy client tracking NFSD fixing possible null pointer derefering in copy offload nfsd: check for EBUSY from vfs_rmdir/vfs_unink. nfsd: Ensure CLONE persists data and metadata changes to the target file SUNRPC: Fix backchannel latency metrics nfsd: restore NFSv3 ACL support nfsd: v4 support requires CRYPTO_SHA256 nfsd: Fix cld_net->cn_tfm initialization lockd: remove __KERNEL__ ifdefs sunrpc: remove __KERNEL__ ifdefs race in exportfs_decode_fh() nfsd: Drop LIST_HEAD where the variable it declares is never used. nfsd: document callback_wq serialization of callback code nfsd: mark cb path down on unknown errors nfsd: Fix races between nfsd4_cb_release() and nfsd4_shutdown_callback() nfsd: minor 4.1 callback cleanup SUNRPC: Fix svcauth_gss_proxy_init() SUNRPC: Trace gssproxy upcall results sunrpc: fix crash when cache_head become valid before update nfsd: remove private bin2hex implementation ...
2019-12-07Merge tag 'nfs-for-5.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds30-209/+1034
Pull NFS client updates from Trond Myklebust: "Highlights include: Features: - NFSv4.2 now supports cross device offloaded copy (i.e. offloaded copy of a file from one source server to a different target server). - New RDMA tracepoints for debugging congestion control and Local Invalidate WRs. Bugfixes and cleanups - Drop the NFSv4.1 session slot if nfs4_delegreturn_prepare waits for layoutreturn - Handle bad/dead sessions correctly in nfs41_sequence_process() - Various bugfixes to the delegation return operation. - Various bugfixes pertaining to delegations that have been revoked. - Cleanups to the NFS timespec code to avoid unnecessary conversions between timespec and timespec64. - Fix unstable RDMA connections after a reconnect - Close race between waking an RDMA sender and posting a receive - Wake pending RDMA tasks if connection fails - Fix MR list corruption, and clean up MR usage - Fix another RPCSEC_GSS issue with MIC buffer space" * tag 'nfs-for-5.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits) SUNRPC: Capture completion of all RPC tasks SUNRPC: Fix another issue with MIC buffer space NFS4: Trace lock reclaims NFS4: Trace state recovery operation NFSv4.2 fix memory leak in nfs42_ssc_open NFSv4.2 fix kfree in __nfs42_copy_file_range NFS: remove duplicated include from nfs4file.c NFSv4: Make _nfs42_proc_copy_notify() static NFS: Fallocate should use the nfs4_fattr_bitmap NFS: Return -ETXTBSY when attempting to write to a swapfile fs: nfs: sysfs: Remove NULL check before kfree NFS: remove unneeded semicolon NFSv4: add declaration of current_stateid NFSv4.x: Drop the slot if nfs4_delegreturn_prepare waits for layoutreturn NFSv4.x: Handle bad/dead sessions correctly in nfs41_sequence_process() nfsv4: Move NFSPROC4_CLNT_COPY_NOTIFY to end of list SUNRPC: Avoid RPC delays when exiting suspend NFS: Add a tracepoint in nfs_fh_to_dentry() NFSv4: Don't retry the GETATTR on old stateid in nfs4_delegreturn_done() NFSv4: Handle NFS4ERR_OLD_STATEID in delegreturn ...
2019-12-07smb3: improve check for when we send the security descriptor context on createSteve French1-0/+2
We had cases in the previous patch where we were sending the security descriptor context on SMB3 open (file create) in cases when we hadn't mounted with with "modefromsid" mount option. Add check for that mount flag before calling ad_sd_context in open init. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>