| Age | Commit message (Collapse) | Author | Files | Lines |
|
The backing_file_write_iter() function, which is called
immediately after this code, already contains identical
logic to clear the IOCB_DIO_CALLER_COMP flag along with
the same explanatory comment. There is no need to duplicate
this operation in the overlayfs code.
Signed-off-by: Seong-Gwang Heo <heo@mykernel.net>
Fixes: a6293b3e285c ("fs: factor out backing_file_{read,write}_iter() helpers")
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This reverts commit 474b155adf3927d2c944423045757b54aa1ca4de.
This patch caused regression in ioctl_setflags(). Underlying filesystems
use EOPNOTSUPP to indicate that flag is not supported. This error is
also gets converted in ioctl_setflags(). Therefore, for unsupported
flags error changed from EOPNOSUPP to ENOIOCTLCMD.
Link: https://lore.kernel.org/linux-xfs/a622643f-1585-40b0-9441-cf7ece176e83@kernel.org/
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Pull file->f_path constification from Al Viro:
"Only one thing was modifying ->f_path of an opened file - acct(2).
Massaging that away and constifying a bunch of struct path * arguments
in functions that might be given &file->f_path ends up with the
situation where we can turn ->f_path into an anon union of const
struct path f_path and struct path __f_path, the latter modified only
in a few places in fs/{file_table,open,namei}.c, all for struct file
instances that are yet to be opened"
* tag 'pull-f_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (23 commits)
Have cc(1) catch attempts to modify ->f_path
kernel/acct.c: saner struct file treatment
configfs:get_target() - release path as soon as we grab configfs_item reference
apparmor/af_unix: constify struct path * arguments
ovl_is_real_file: constify realpath argument
ovl_sync_file(): constify path argument
ovl_lower_dir(): constify path argument
ovl_get_verity_digest(): constify path argument
ovl_validate_verity(): constify {meta,data}path arguments
ovl_ensure_verity_loaded(): constify datapath argument
ksmbd_vfs_set_init_posix_acl(): constify path argument
ksmbd_vfs_inherit_posix_acl(): constify path argument
ksmbd_vfs_kern_path_unlock(): constify path argument
ksmbd_vfs_path_lookup_locked(): root_share_path can be const struct path *
check_export(): constify path argument
export_operations->open(): constify path argument
rqst_exp_get_by_name(): constify path argument
nfs: constify path argument of __vfs_getattr()
bpf...d_path(): constify path argument
done_path_create(): constify path argument
...
|
|
Pull overlayfs updates from Amir Goldstein:
- Work by André Almeida to support case-insensitive overlayfs
Underlying case-insensitive filesystems casefolding is per directory,
but for overlayfs it is all-or-nothing. It supports layers where all
directories are casefolded (with same encoding) or layers where no
directories are casefolded.
- A fix for a "bug" in Neil's ovl directory lock changes, which only
manifested itself with casefold enabled layers which may return an
unhashed negative dentry from lookup.
* tag 'ovl-update-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: make sure that ovl_create_real() returns a hashed dentry
ovl: Support mounting case-insensitive enabled layers
ovl: Check for casefold consistency when creating new dentries
ovl: Add S_CASEFOLD as part of the inode flag to be copied
ovl: Set case-insensitive dentry operations for ovl sb
ovl: Ensure that all layers have the same encoding
ovl: Create ovl_casefold() to support casefolded strncmp()
ovl: Prepare for mounting case-insensitive enabled layers
fs: Create sb_same_encoding() helper
fs: Create sb_encoding() helper
|
|
Pull vfs async directory updates from Christian Brauner:
"This contains further preparatory changes for the asynchronous directory
locking scheme:
- Add lookup_one_positive_killable() which allows overlayfs to
perform lookup that won't block on a fatal signal
- Unify the mount idmap handling in struct renamedata as a rename can
only happen within a single mount
- Introduce kern_path_parent() for audit which sets the path to the
parent and returns a dentry for the target without holding any
locks on return
- Rename kern_path_locked() as it is only used to prepare for the
removal of an object from the filesystem:
kern_path_locked() => start_removing_path()
kern_path_create() => start_creating_path()
user_path_create() => start_creating_user_path()
user_path_locked_at() => start_removing_user_path_at()
done_path_create() => end_creating_path()
NA => end_removing_path()"
* tag 'vfs-6.18-rc1.async' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
debugfs: rename start_creating() to debugfs_start_creating()
VFS: rename kern_path_locked() and related functions.
VFS/audit: introduce kern_path_parent() for audit
VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata
VFS: discard err2 in filename_create()
VFS/ovl: add lookup_one_positive_killable()
|
|
Pull misc vfs updates from Christian Brauner:
"This contains the usual selections of misc updates for this cycle.
Features:
- Add "initramfs_options" parameter to set initramfs mount options.
This allows to add specific mount options to the rootfs to e.g.,
limit the memory size
- Add RWF_NOSIGNAL flag for pwritev2()
Add RWF_NOSIGNAL flag for pwritev2. This flag prevents the SIGPIPE
signal from being raised when writing on disconnected pipes or
sockets. The flag is handled directly by the pipe filesystem and
converted to the existing MSG_NOSIGNAL flag for sockets
- Allow to pass pid namespace as procfs mount option
Ever since the introduction of pid namespaces, procfs has had very
implicit behaviour surrounding them (the pidns used by a procfs
mount is auto-selected based on the mounting process's active
pidns, and the pidns itself is basically hidden once the mount has
been constructed)
This implicit behaviour has historically meant that userspace was
required to do some special dances in order to configure the pidns
of a procfs mount as desired. Examples include:
* In order to bypass the mnt_too_revealing() check, Kubernetes
creates a procfs mount from an empty pidns so that user
namespaced containers can be nested (without this, the nested
containers would fail to mount procfs)
But this requires forking off a helper process because you cannot
just one-shot this using mount(2)
* Container runtimes in general need to fork into a container
before configuring its mounts, which can lead to security issues
in the case of shared-pidns containers (a privileged process in
the pidns can interact with your container runtime process)
While SUID_DUMP_DISABLE and user namespaces make this less of an
issue, the strict need for this due to a minor uAPI wart is kind
of unfortunate
Things would be much easier if there was a way for userspace to
just specify the pidns they want. So this pull request contains
changes to implement a new "pidns" argument which can be set
using fsconfig(2):
fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0);
or classic mount(2) / mount(8):
// mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc
mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid");
Cleanups:
- Remove the last references to EXPORT_OP_ASYNC_LOCK
- Make file_remove_privs_flags() static
- Remove redundant __GFP_NOWARN when GFP_NOWAIT is used
- Use try_cmpxchg() in start_dir_add()
- Use try_cmpxchg() in sb_init_done_wq()
- Replace offsetof() with struct_size() in ioctl_file_dedupe_range()
- Remove vfs_ioctl() export
- Replace rwlock() with spinlock in epoll code as rwlock causes
priority inversion on preempt rt kernels
- Make ns_entries in fs/proc/namespaces const
- Use a switch() statement() in init_special_inode() just like we do
in may_open()
- Use struct_size() in dir_add() in the initramfs code
- Use str_plural() in rd_load_image()
- Replace strcpy() with strscpy() in find_link()
- Rename generic_delete_inode() to inode_just_drop() and
generic_drop_inode() to inode_generic_drop()
- Remove unused arguments from fcntl_{g,s}et_rw_hint()
Fixes:
- Document @name parameter for name_contains_dotdot() helper
- Fix spelling mistake
- Always return zero from replace_fd() instead of the file descriptor
number
- Limit the size for copy_file_range() in compat mode to prevent a
signed overflow
- Fix debugfs mount options not being applied
- Verify the inode mode when loading it from disk in minixfs
- Verify the inode mode when loading it from disk in cramfs
- Don't trigger automounts with RESOLVE_NO_XDEV
If openat2() was called with RESOLVE_NO_XDEV it didn't traverse
through automounts, but could still trigger them
- Add FL_RECLAIM flag to show_fl_flags() macro so it appears in
tracepoints
- Fix unused variable warning in rd_load_image() on s390
- Make INITRAMFS_PRESERVE_MTIME depend on BLK_DEV_INITRD
- Use ns_capable_noaudit() when determining net sysctl permissions
- Don't call path_put() under namespace semaphore in listmount() and
statmount()"
* tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (38 commits)
fcntl: trim arguments
listmount: don't call path_put() under namespace semaphore
statmount: don't call path_put() under namespace semaphore
pid: use ns_capable_noaudit() when determining net sysctl permissions
fs: rename generic_delete_inode() and generic_drop_inode()
init: INITRAMFS_PRESERVE_MTIME should depend on BLK_DEV_INITRD
initramfs: Replace strcpy() with strscpy() in find_link()
initrd: Use str_plural() in rd_load_image()
initramfs: Use struct_size() helper to improve dir_add()
initrd: Fix unused variable warning in rd_load_image() on s390
fs: use the switch statement in init_special_inode()
fs/proc/namespaces: make ns_entries const
filelock: add FL_RECLAIM to show_fl_flags() macro
eventpoll: Replace rwlock with spinlock
selftests/proc: add tests for new pidns APIs
procfs: add "pidns" mount option
pidns: move is-ancestor logic to helper
openat2: don't trigger automounts with RESOLVE_NO_XDEV
namei: move cross-device check to __traverse_mounts
namei: remove LOOKUP_NO_XDEV check from handle_mounts
...
|
|
A rename operation can only rename within a single mount. Callers of
vfs_rename() must and do ensure this is the case.
So there is no point in having two mnt_idmaps in renamedata as they are
always the same. Only one of them is passed to ->rename in any case.
This patch replaces both with a single "mnt_idmap" and changes all
callers.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
ovl wants a lookup which won't block on a fatal signal. It currently
uses down_write_killable() and then repeatedly calls to lookup_one()
The lock may not be needed if the name is already in the dcache and it
aids proposed future changes if the locking is kept internal to namei.c
So this patch adds lookup_one_positive_killable() which is like
lookup_one_positive() but will abort in the face of a fatal signal.
overlayfs is changed to use this.
Note that instead of always getting an exclusive lock, ovl now only gets
a shared lock, and only sometimes. The exclusive lock was never needed.
However down_read_killable() was only added in v4.15 but overlayfs started
using down_write_killable() here in v4.7.
Note that the linked list ->first_maybe_whiteout ->next_maybe_white is
local to the thread so there is no concurrency in that list which could
be threatened by removing the locking.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
e8bd877fb76bb9f3 ("ovl: fix possible double unlink") added a sanity
check of !d_unhashed(child) to try to verify that child dentry was not
unlinked while parent dir was unlocked.
This "was not unlink" check has a false positive result in the case of
casefolded parent dir, because in that case, ovl_create_temp() returns
an unhashed dentry after ovl_create_real() gets an unhashed dentry from
ovl_lookup_upper() and makes it positive.
To avoid returning unhashed dentry from ovl_create_temp(), let
ovl_create_real() lookup again after making the newdentry positive,
so it always returns a hashed positive dentry (or an error).
This fixes the error in ovl_parent_lock() in ovl_check_rename_whiteout()
after ovl_create_temp() and allows mount of overlayfs with casefolding
enabled layers.
Reported-by: André Almeida <andrealmeid@igalia.com>
Closes: https://lore.kernel.org/r/18704e8c-c734-43f3-bc7c-b8be345e1bf5@igalia.com/
Suggested-by: Neil Brown <neil@brown.name>
Reviewed-by: Neil Brown <neil@brown.name>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
Drop the restriction for casefold dentries lookup to enable support for
case-insensitive layers in overlayfs.
Support case-insensitive layers with the condition that they should be
uniformly enabled across the stack and (i.e. if the root mount dir has
casefold enabled, so should all the dirs bellow for every layer).
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
In a overlayfs with casefold enabled, all new dentries should have
casefold enabled as well. Check this at ovl_create_real().
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
To keep ovl's inodes consistent with their real inodes, create a new
mask for inode file attributes that needs to be copied. Add the
S_CASEFOLD flag as part of the flags that need to be copied along with
the other file attributes.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
For filesystems with encoding (i.e. with case-insensitive support), set
the dentry operations for the super block as ovl_dentry_ci_operations.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
When merging layers from different filesystems with casefold enabled,
all layers should use the same encoding version and have the same flags
to avoid any kind of incompatibility issues.
Also, set the encoding and the encoding flags for the ovl super block as
the same as used by the first valid layer.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
To add overlayfs support casefold layers, create a new function
ovl_casefold(), to be able to do case-insensitive strncmp().
ovl_casefold() allocates a new buffer and stores the casefolded version
of the string on it. If the allocation or the casefold operation fails,
fallback to use the original string.
The case-insentive name is then used in the rb-tree search/insertion
operation. If the name is found in the rb-tree, the name can be
discarded and the buffer is freed. If the name isn't found, it's then
stored at struct ovl_cache_entry to be used later.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
Prepare for mounting layers with case-insensitive dentries in order to
supporting such layers in overlayfs, while enforcing uniform casefold
layers.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
generic_delete_inode() is rather misleading for what the routine is
doing. inode_just_drop() should be much clearer.
The new naming is inconsistent with generic_drop_inode(), so rename that
one as well with inode_ as the suffix.
No functional changes.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
commit 9d23967b18c6 ("ovl: simplify an error path in
ovl_copy_up_workdir()") introduced the helper ovl_cleanup_unlocked(),
which is later used in several following patches to re-acquire the parent
inode lock and unlink a dentry that was earlier found using lookup.
This helper was eventually renamed to ovl_cleanup().
The helper ovl_parent_lock() is used to re-acquire the parent inode lock.
After acquiring the parent inode lock, the helper verifies that the
dentry has not since been moved to another parent, but it failed to
verify that the dentry wasn't unlinked from the parent.
This means that now every call to ovl_cleanup() could potentially
race with another thread, unlinking the dentry to be cleaned up
underneath overlayfs and trigger a vfs assertion.
Reported-by: syzbot+ec9fab8b7f0386b98a17@syzkaller.appspotmail.com
Tested-by: syzbot+ec9fab8b7f0386b98a17@syzkaller.appspotmail.com
Fixes: 9d23967b18c6 ("ovl: simplify an error path in ovl_copy_up_workdir()")
Suggested-by: NeilBrown <neil@brown.name>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
ovl_create_temp() treats "workdir" as a parent in which it creates an
object so it should use I_MUTEX_PARENT.
Prior to the commit identified below the lock was taken by the caller
which sometimes used I_MUTEX_PARENT and sometimes used I_MUTEX_NORMAL.
The use of I_MUTEX_NORMAL was incorrect but unfortunately copied into
ovl_create_temp().
Note to backporters: This patch only applies after the last Fixes given
below (post v6.16). To fix the bug in v6.7 and later the
inode_lock() call in ovl_copy_up_workdir() needs to nest using
I_MUTEX_PARENT.
Link: https://lore.kernel.org/all/67a72070.050a0220.3d72c.0022.GAE@google.com/
Cc: stable@vger.kernel.org
Reported-by: syzbot+7836a68852a10ec3d790@syzkaller.appspotmail.com
Tested-by: syzbot+7836a68852a10ec3d790@syzkaller.appspotmail.com
Fixes: c63e56a4a652 ("ovl: do not open/llseek lower file with upper sb_writers held")
Fixes: d2c995581c7c ("ovl: Call ovl_create_temp() without lock held.")
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
Pull fileattr updates from Christian Brauner:
"This introduces the new file_getattr() and file_setattr() system calls
after lengthy discussions.
Both system calls serve as successors and extensible companions to
the FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR system calls which have
started to show their age in addition to being named in a way that
makes it easy to conflate them with extended attribute related
operations.
These syscalls allow userspace to set filesystem inode attributes on
special files. One of the usage examples is the XFS quota projects.
XFS has project quotas which could be attached to a directory. All new
inodes in these directories inherit project ID set on parent
directory.
The project is created from userspace by opening and calling
FS_IOC_FSSETXATTR on each inode. This is not possible for special
files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left
with empty project ID. Those inodes then are not shown in the quota
accounting but still exist in the directory. This is not critical but
in the case when special files are created in the directory with
already existing project quota, these new inodes inherit extended
attributes. This creates a mix of special files with and without
attributes. Moreover, special files with attributes don't have a
possibility to become clear or change the attributes. This, in turn,
prevents userspace from re-creating quota project on these existing
files.
In addition, these new system calls allow the implementation of
additional attributes that we couldn't or didn't want to fit into the
legacy ioctls anymore"
* tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: tighten a sanity check in file_attr_to_fileattr()
tree-wide: s/struct fileattr/struct file_kattr/g
fs: introduce file_getattr and file_setattr syscalls
fs: prepare for extending file_get/setattr()
fs: make vfs_fileattr_[get|set] return -EOPNOTSUPP
selinux: implement inode_file_[g|s]etattr hooks
lsm: introduce new hooks for setting/getting inode fsxattr
fs: split fileattr related helpers into separate file
|
|
Pull overlayfs updates from Christian Brauner:
"This contains overlayfs updates for this cycle.
The changes for overlayfs in here are primarily focussed on preparing
for some proposed changes to directory locking.
Overlayfs currently will sometimes lock a directory on the upper
filesystem and do a few different things while holding the lock. This
is incompatible with the new potential scheme.
This series narrows the region of code protected by the directory
lock, taking it multiple times when necessary. This theoretically
opens up the possibilty of other changes happening on the upper
filesytem between the unlock and the lock. To some extent the patches
guard against that by checking the dentries still have the expect
parent after retaking the lock. In general, concurrent changes to the
upper and lower filesystems aren't supported properly anyway"
* tag 'vfs-6.17-rc1.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits)
ovl: properly print correct variable
ovl: rename ovl_cleanup_unlocked() to ovl_cleanup()
ovl: change ovl_create_real() to receive dentry parent
ovl: narrow locking in ovl_check_rename_whiteout()
ovl: narrow locking in ovl_whiteout()
ovl: change ovl_cleanup_and_whiteout() to take rename lock as needed
ovl: narrow locking on ovl_remove_and_whiteout()
ovl: change ovl_workdir_cleanup() to take dir lock as needed.
ovl: narrow locking in ovl_workdir_cleanup_recurse()
ovl: narrow locking in ovl_indexdir_cleanup()
ovl: narrow locking in ovl_workdir_create()
ovl: narrow locking in ovl_cleanup_index()
ovl: narrow locking in ovl_cleanup_whiteouts()
ovl: narrow locking in ovl_rename()
ovl: simplify gotos in ovl_rename()
ovl: narrow locking in ovl_create_over_whiteout()
ovl: narrow locking in ovl_clear_empty()
ovl: narrow locking in ovl_create_upper()
ovl: narrow the locked region in ovl_copy_up_workdir()
ovl: Call ovl_create_temp() without lock held.
...
|
|
Pull misc VFS updates from Christian Brauner:
"This contains the usual selections of misc updates for this cycle.
Features:
- Add ext4 IOCB_DONTCACHE support
This refactors the address_space_operations write_begin() and
write_end() callbacks to take const struct kiocb * as their first
argument, allowing IOCB flags such as IOCB_DONTCACHE to propagate
to the filesystem's buffered I/O path.
Ext4 is updated to implement handling of the IOCB_DONTCACHE flag
and advertises support via the FOP_DONTCACHE file operation flag.
Additionally, the i915 driver's shmem write paths are updated to
bypass the legacy write_begin/write_end interface in favor of
directly calling write_iter() with a constructed synchronous kiocb.
Another i915 change replaces a manual write loop with
kernel_write() during GEM shmem object creation.
Cleanups:
- don't duplicate vfs_open() in kernel_file_open()
- proc_fd_getattr(): don't bother with S_ISDIR() check
- fs/ecryptfs: replace snprintf with sysfs_emit in show function
- vfs: Remove unnecessary list_for_each_entry_safe() from
evict_inodes()
- filelock: add new locks_wake_up_waiter() helper
- fs: Remove three arguments from block_write_end()
- VFS: change old_dir and new_dir in struct renamedata to dentrys
- netfs: Remove unused declaration netfs_queue_write_request()
Fixes:
- eventpoll: Fix semi-unbounded recursion
- eventpoll: fix sphinx documentation build warning
- fs/read_write: Fix spelling typo
- fs: annotate data race between poll_schedule_timeout() and
pollwake()
- fs/pipe: set FMODE_NOWAIT in create_pipe_files()
- docs/vfs: update references to i_mutex to i_rwsem
- fs/buffer: remove comment about hard sectorsize
- fs/buffer: remove the min and max limit checks in __getblk_slow()
- fs/libfs: don't assume blocksize <= PAGE_SIZE in
generic_check_addressable
- fs_context: fix parameter name in infofc() macro
- fs: Prevent file descriptor table allocations exceeding INT_MAX"
* tag 'vfs-6.17-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (24 commits)
netfs: Remove unused declaration netfs_queue_write_request()
eventpoll: fix sphinx documentation build warning
ext4: support uncached buffered I/O
mm/pagemap: add write_begin_get_folio() helper function
fs: change write_begin/write_end interface to take struct kiocb *
drm/i915: Refactor shmem_pwrite() to use kiocb and write_iter
drm/i915: Use kernel_write() in shmem object create
eventpoll: Fix semi-unbounded recursion
vfs: Remove unnecessary list_for_each_entry_safe() from evict_inodes()
fs/libfs: don't assume blocksize <= PAGE_SIZE in generic_check_addressable
fs/buffer: remove the min and max limit checks in __getblk_slow()
fs: Prevent file descriptor table allocations exceeding INT_MAX
fs: Remove three arguments from block_write_end()
fs/ecryptfs: replace snprintf with sysfs_emit in show function
fs: annotate suspected data race between poll_schedule_timeout() and pollwake()
docs/vfs: update references to i_mutex to i_rwsem
fs/buffer: remove comment about hard sectorsize
fs_context: fix parameter name in infofc() macro
VFS: change old_dir and new_dir in struct renamedata to dentrys
proc_fd_getattr(): don't bother with S_ISDIR() check
...
|
|
Pull dentry d_flags updates from Al Viro:
"The current exclusion rules for dentry->d_flags stores are rather
unpleasant. The basic rules are simple:
- stores to dentry->d_flags are OK under dentry->d_lock
- stores to dentry->d_flags are OK in the dentry constructor, before
becomes potentially visible to other threads
Unfortunately, there's a couple of exceptions to that, and that's
where the headache comes from.
The main PITA comes from d_set_d_op(); that primitive sets ->d_op of
dentry and adjusts the flags that correspond to presence of individual
methods. It's very easy to misuse; existing uses _are_ safe, but proof
of correctness is brittle.
Use in __d_alloc() is safe (we are within a constructor), but we might
as well precalculate the initial value of 'd_flags' when we set the
default ->d_op for given superblock and set 'd_flags' directly instead
of messing with that helper.
The reasons why other uses are safe are bloody convoluted; I'm not
going to reproduce it here. See [1] for gory details, if you care. The
critical part is using d_set_d_op() only just prior to
d_splice_alias(), which makes a combination of d_splice_alias() with
setting ->d_op, etc a natural replacement primitive.
Better yet, if we go that way, it's easy to take setting ->d_op and
modifying 'd_flags' under ->d_lock, which eliminates the headache as
far as 'd_flags' exclusion rules are concerned. Other exceptions are
minor and easy to deal with.
What this series does:
- d_set_d_op() is no longer available; instead a new primitive
(d_splice_alias_ops()) is provided, equivalent to combination of
d_set_d_op() and d_splice_alias().
- new field of struct super_block - 's_d_flags'. This sets the
default value of 'd_flags' to be used when allocating dentries on
this filesystem.
- new primitive for setting 's_d_op': set_default_d_op(). This
replaces stores to 's_d_op' at mount time.
All in-tree filesystems converted; out-of-tree ones will get caught
by the compiler ('s_d_op' is renamed, so stores to it will be
caught). 's_d_flags' is set by the same primitive to match the
's_d_op'.
- a lot of filesystems had sb->s_d_op->d_delete equal to
always_delete_dentry; that is equivalent to setting
DCACHE_DONTCACHE in 'd_flags', so such filesystems can bloody well
set that bit in 's_d_flags' and drop 'd_delete()' from
dentry_operations.
In quite a few cases that results in empty dentry_operations, which
means that we can get rid of those.
- kill simple_dentry_operations - not needed anymore
- massage d_alloc_parallel() to get rid of the other exception wrt
'd_flags' stores - we can set DCACHE_PAR_LOOKUP as soon as we
allocate the new dentry; no need to delay that until we commit to
using the sucker.
As the result, 'd_flags' stores are all either under ->d_lock or done
before the dentry becomes visible in any shared data structures"
Link: https://lore.kernel.org/all/20250224010624.GT1977892@ZenIV/ [1]
* tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (21 commits)
configfs: use DCACHE_DONTCACHE
debugfs: use DCACHE_DONTCACHE
efivarfs: use DCACHE_DONTCACHE instead of always_delete_dentry()
9p: don't bother with always_delete_dentry
ramfs, hugetlbfs, mqueue: set DCACHE_DONTCACHE
kill simple_dentry_operations
devpts, sunrpc, hostfs: don't bother with ->d_op
shmem: no dentry retention past the refcount reaching zero
d_alloc_parallel(): set DCACHE_PAR_LOOKUP earlier
make d_set_d_op() static
simple_lookup(): just set DCACHE_DONTCACHE
tracefs: Add d_delete to remove negative dentries
set_default_d_op(): calculate the matching value for ->d_flags
correct the set of flags forbidden at d_set_d_op() time
split d_flags calculation out of d_set_d_op()
new helper: set_default_d_op()
fuse: no need for special dentry_operations for root dentry
switch procfs from d_set_d_op() to d_splice_alias_ops()
new helper: d_splice_alias_ops()
procfs: kill ->proc_dops
...
|
|
In case of ovl_lookup_temp() failure, we currently print `err`
which is actually not initialized at all.
Instead, properly print PTR_ERR(whiteout) which is where the
actual error really is.
Address-Coverity-ID: 1647983 ("Uninitialized variables (UNINIT)")
Fixes: 8afa0a7367138 ("ovl: narrow locking in ovl_whiteout()")
Signed-off-by: Antonio Quartulli <antonio@mandelbit.com>
Link: https://lore.kernel.org/20250721203821.7812-1-antonio@mandelbit.com
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The only remaining user of ovl_cleanup() is ovl_cleanup_locked(), so we
no longer need both.
This patch renames ovl_cleanup() to ovl_cleanup_locked() and makes it
static.
ovl_cleanup_unlocked() is renamed to ovl_cleanup().
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-22-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Instead of passing an inode *dir, pass a dentry *parent. This makes the
calling slightly cleaner.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-21-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
ovl_check_rename_whiteout() now only holds the directory lock when
needed, and takes it again if necessary.
This makes way for future changes where locks are taken on individual
dentries rather than the whole directory.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-20-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
ovl_whiteout() relies on the workdir i_rwsem to provide exclusive access
to ofs->whiteout which it manipulates. Rather than depending on this,
add a new mutex, "whiteout_lock" to explicitly provide the required
locking. Use guard(mutex) for this so that we can return without
needing to explicitly unlock.
Then take the lock on workdir only when needed - to lookup the temp name
and to do the whiteout or link.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-19-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Rather than locking the directory(s) before calling
ovl_cleanup_and_whiteout(), change it (and ovl_whiteout()) to do the
locking, so the locking can be fine grained as will be needed for
proposed locking changes.
Sometimes this is called to whiteout something in the index dir, in
which case only that dir must be locked. In one case it is called on
something in an upperdir, so two directories must be locked. We use
ovl_lock_rename_workdir() for this and remove the restriction that
upperdir cannot be indexdir - because now sometimes it is.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-18-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This code:
performs a lookup_upper
creates a whiteout object
renames the whiteout over the result of the lookup
The create and the rename must be locked separately for proposed
directory locking changes. This patch takes a first step of moving the
lookup out of the locked region. A subsequent patch will separate the
create from the rename.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-17-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Rather than calling ovl_workdir_cleanup() with the dir already locked,
change it to take the dir lock only when needed.
Also change ovl_workdir_cleanup() to take a dentry for the parent rather
than an inode.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-16-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Only take the dir lock when needed, rather than for the whole loop.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-15-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Instead of taking the directory lock for the whole cleanup, only take it
when needed.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-14-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
In ovl_workdir_create() don't hold the dir lock for the whole time, but
only take it when needed.
It now gets taken separately for ovl_workdir_cleanup(). A subsequent
patch will move the locking into that function.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-13-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
ovl_cleanup_index() takes a lock on the directory and then does a lookup
and possibly one of two different cleanups.
This patch narrows the locking to use the _unlocked() versions of the
lookup and one cleanup, and just takes the lock for the other cleanup.
A subsequent patch will take the lock into the cleanup.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-12-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Rather than lock the directory for the whole operation, use
ovl_lookup_upper_unlocked() and ovl_cleanup_unlocked() to take the lock
only when needed.
This makes way for future changes where locks are taken on individual
dentries rather than the whole directory.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-11-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Drop the rename lock immediately after the rename, and use
ovl_cleanup_unlocked() for cleanup.
This makes way for future changes where locks are taken on individual
dentries rather than the whole directory.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-10-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Rather than having three separate goto label: out_unlock, out_dput_old,
and out_dput, make use of that fact that dput() happily accepts a NULL
pointer to reduce this to just one goto label: out_unlock.
olddentry and newdentry are initialised to NULL and only set once a
value dentry is found. They are then dput() late in the function.
Suggested-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-9-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Unlock the parents immediately after the rename, and use
ovl_cleanup_unlocked() for cleanup, which takes a separate lock.
This makes way for future changes where locks are taken on individual
dentries rather than the whole directory.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-8-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Drop the locks immediately after rename, and use a separate lock for
cleanup.
This makes way for future changes where locks are taken on individual
dentries rather than the whole directory.
Note that ovl_cleanup_whiteouts() operates on "upper", a child of
"upperdir" and does not require upperdir or workdir to be locked.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-7-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Drop the directory lock immediately after the ovl_create_real() call and
take a separate lock later for cleanup in ovl_cleanup_unlocked() - if
needed.
This makes way for future changes where locks are taken on individual
dentries rather than the whole directory.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-6-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
In ovl_copy_up_workdir() unlock immediately after the rename. There is
nothing else in the function that needs the lock.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-5-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
ovl currently locks a directory or two and then performs multiple actions
in one or both directories. This is incompatible with proposed changes
which will lock just the dentry of objects being acted on.
This patch moves calls to ovl_create_temp() out of the locked regions and
has it take and release the relevant lock itself.
The lock that was taken before this function was called is now taken
after. This means that any code between where the lock was taken and
ovl_create_temp() is now unlocked. This necessitates the use of
ovl_cleanup_unlocked() and the creation of ovl_lookup_upper_unlocked().
These will be used more widely in future patches.
Now that the file is created before the lock is taken for rename, we
need to ensure the parent wasn't changed before the lock was gained.
ovl_lock_rename_workdir() is changed to optionally receive the dentries
that will be involved in the rename. If either is present but has the
wrong parent, an error is returned.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-4-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
ovl_copy_up_workdir() currently take a rename lock on two directories,
then use the lock to both create a file in one directory, perform a
rename, and possibly unlink the file for cleanup. This is incompatible
with proposed changes which will lock just the dentry of objects being
acted on.
This patch moves the call to ovl_create_index() earlier in
ovl_copy_up_workdir() to before the lock is taken.
ovl_create_index() then takes the required lock only when needed.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-3-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|