aboutsummaryrefslogtreecommitdiffstats
path: root/fs/overlayfs (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-10-10ovl: remove redundant IOCB_DIO_CALLER_COMP clearingSeong-Gwang Heo1-5/+0
The backing_file_write_iter() function, which is called immediately after this code, already contains identical logic to clear the IOCB_DIO_CALLER_COMP flag along with the same explanatory comment. There is no need to duplicate this operation in the overlayfs code. Signed-off-by: Seong-Gwang Heo <heo@mykernel.net> Fixes: a6293b3e285c ("fs: factor out backing_file_{read,write}_iter() helpers") Acked-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-10Revert "fs: make vfs_fileattr_[get|set] return -EOPNOTSUPP"Andrey Albershteyn2-2/+5
This reverts commit 474b155adf3927d2c944423045757b54aa1ca4de. This patch caused regression in ioctl_setflags(). Underlying filesystems use EOPNOTSUPP to indicate that flag is not supported. This error is also gets converted in ioctl_setflags(). Therefore, for unsupported flags error changed from EOPNOSUPP to ENOIOCTLCMD. Link: https://lore.kernel.org/linux-xfs/a622643f-1585-40b0-9441-cf7ece176e83@kernel.org/ Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-03Merge tag 'pull-f_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds5-11/+11
Pull file->f_path constification from Al Viro: "Only one thing was modifying ->f_path of an opened file - acct(2). Massaging that away and constifying a bunch of struct path * arguments in functions that might be given &file->f_path ends up with the situation where we can turn ->f_path into an anon union of const struct path f_path and struct path __f_path, the latter modified only in a few places in fs/{file_table,open,namei}.c, all for struct file instances that are yet to be opened" * tag 'pull-f_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (23 commits) Have cc(1) catch attempts to modify ->f_path kernel/acct.c: saner struct file treatment configfs:get_target() - release path as soon as we grab configfs_item reference apparmor/af_unix: constify struct path * arguments ovl_is_real_file: constify realpath argument ovl_sync_file(): constify path argument ovl_lower_dir(): constify path argument ovl_get_verity_digest(): constify path argument ovl_validate_verity(): constify {meta,data}path arguments ovl_ensure_verity_loaded(): constify datapath argument ksmbd_vfs_set_init_posix_acl(): constify path argument ksmbd_vfs_inherit_posix_acl(): constify path argument ksmbd_vfs_kern_path_unlock(): constify path argument ksmbd_vfs_path_lookup_locked(): root_share_path can be const struct path * check_export(): constify path argument export_operations->open(): constify path argument rqst_exp_get_by_name(): constify path argument nfs: constify path argument of __vfs_getattr() bpf...d_path(): constify path argument done_path_create(): constify path argument ...
2025-10-03Merge tag 'ovl-update-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfsLinus Torvalds11-41/+229
Pull overlayfs updates from Amir Goldstein: - Work by André Almeida to support case-insensitive overlayfs Underlying case-insensitive filesystems casefolding is per directory, but for overlayfs it is all-or-nothing. It supports layers where all directories are casefolded (with same encoding) or layers where no directories are casefolded. - A fix for a "bug" in Neil's ovl directory lock changes, which only manifested itself with casefold enabled layers which may return an unhashed negative dentry from lookup. * tag 'ovl-update-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: ovl: make sure that ovl_create_real() returns a hashed dentry ovl: Support mounting case-insensitive enabled layers ovl: Check for casefold consistency when creating new dentries ovl: Add S_CASEFOLD as part of the inode flag to be copied ovl: Set case-insensitive dentry operations for ovl sb ovl: Ensure that all layers have the same encoding ovl: Create ovl_casefold() to support casefolded strncmp() ovl: Prepare for mounting case-insensitive enabled layers fs: Create sb_same_encoding() helper fs: Create sb_encoding() helper
2025-09-29Merge tag 'vfs-6.18-rc1.async' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds2-16/+15
Pull vfs async directory updates from Christian Brauner: "This contains further preparatory changes for the asynchronous directory locking scheme: - Add lookup_one_positive_killable() which allows overlayfs to perform lookup that won't block on a fatal signal - Unify the mount idmap handling in struct renamedata as a rename can only happen within a single mount - Introduce kern_path_parent() for audit which sets the path to the parent and returns a dentry for the target without holding any locks on return - Rename kern_path_locked() as it is only used to prepare for the removal of an object from the filesystem: kern_path_locked() => start_removing_path() kern_path_create() => start_creating_path() user_path_create() => start_creating_user_path() user_path_locked_at() => start_removing_user_path_at() done_path_create() => end_creating_path() NA => end_removing_path()" * tag 'vfs-6.18-rc1.async' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: debugfs: rename start_creating() to debugfs_start_creating() VFS: rename kern_path_locked() and related functions. VFS/audit: introduce kern_path_parent() for audit VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata VFS: discard err2 in filename_create() VFS/ovl: add lookup_one_positive_killable()
2025-09-29Merge tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds1-1/+1
Pull misc vfs updates from Christian Brauner: "This contains the usual selections of misc updates for this cycle. Features: - Add "initramfs_options" parameter to set initramfs mount options. This allows to add specific mount options to the rootfs to e.g., limit the memory size - Add RWF_NOSIGNAL flag for pwritev2() Add RWF_NOSIGNAL flag for pwritev2. This flag prevents the SIGPIPE signal from being raised when writing on disconnected pipes or sockets. The flag is handled directly by the pipe filesystem and converted to the existing MSG_NOSIGNAL flag for sockets - Allow to pass pid namespace as procfs mount option Ever since the introduction of pid namespaces, procfs has had very implicit behaviour surrounding them (the pidns used by a procfs mount is auto-selected based on the mounting process's active pidns, and the pidns itself is basically hidden once the mount has been constructed) This implicit behaviour has historically meant that userspace was required to do some special dances in order to configure the pidns of a procfs mount as desired. Examples include: * In order to bypass the mnt_too_revealing() check, Kubernetes creates a procfs mount from an empty pidns so that user namespaced containers can be nested (without this, the nested containers would fail to mount procfs) But this requires forking off a helper process because you cannot just one-shot this using mount(2) * Container runtimes in general need to fork into a container before configuring its mounts, which can lead to security issues in the case of shared-pidns containers (a privileged process in the pidns can interact with your container runtime process) While SUID_DUMP_DISABLE and user namespaces make this less of an issue, the strict need for this due to a minor uAPI wart is kind of unfortunate Things would be much easier if there was a way for userspace to just specify the pidns they want. So this pull request contains changes to implement a new "pidns" argument which can be set using fsconfig(2): fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd); fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0); or classic mount(2) / mount(8): // mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid"); Cleanups: - Remove the last references to EXPORT_OP_ASYNC_LOCK - Make file_remove_privs_flags() static - Remove redundant __GFP_NOWARN when GFP_NOWAIT is used - Use try_cmpxchg() in start_dir_add() - Use try_cmpxchg() in sb_init_done_wq() - Replace offsetof() with struct_size() in ioctl_file_dedupe_range() - Remove vfs_ioctl() export - Replace rwlock() with spinlock in epoll code as rwlock causes priority inversion on preempt rt kernels - Make ns_entries in fs/proc/namespaces const - Use a switch() statement() in init_special_inode() just like we do in may_open() - Use struct_size() in dir_add() in the initramfs code - Use str_plural() in rd_load_image() - Replace strcpy() with strscpy() in find_link() - Rename generic_delete_inode() to inode_just_drop() and generic_drop_inode() to inode_generic_drop() - Remove unused arguments from fcntl_{g,s}et_rw_hint() Fixes: - Document @name parameter for name_contains_dotdot() helper - Fix spelling mistake - Always return zero from replace_fd() instead of the file descriptor number - Limit the size for copy_file_range() in compat mode to prevent a signed overflow - Fix debugfs mount options not being applied - Verify the inode mode when loading it from disk in minixfs - Verify the inode mode when loading it from disk in cramfs - Don't trigger automounts with RESOLVE_NO_XDEV If openat2() was called with RESOLVE_NO_XDEV it didn't traverse through automounts, but could still trigger them - Add FL_RECLAIM flag to show_fl_flags() macro so it appears in tracepoints - Fix unused variable warning in rd_load_image() on s390 - Make INITRAMFS_PRESERVE_MTIME depend on BLK_DEV_INITRD - Use ns_capable_noaudit() when determining net sysctl permissions - Don't call path_put() under namespace semaphore in listmount() and statmount()" * tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (38 commits) fcntl: trim arguments listmount: don't call path_put() under namespace semaphore statmount: don't call path_put() under namespace semaphore pid: use ns_capable_noaudit() when determining net sysctl permissions fs: rename generic_delete_inode() and generic_drop_inode() init: INITRAMFS_PRESERVE_MTIME should depend on BLK_DEV_INITRD initramfs: Replace strcpy() with strscpy() in find_link() initrd: Use str_plural() in rd_load_image() initramfs: Use struct_size() helper to improve dir_add() initrd: Fix unused variable warning in rd_load_image() on s390 fs: use the switch statement in init_special_inode() fs/proc/namespaces: make ns_entries const filelock: add FL_RECLAIM to show_fl_flags() macro eventpoll: Replace rwlock with spinlock selftests/proc: add tests for new pidns APIs procfs: add "pidns" mount option pidns: move is-ancestor logic to helper openat2: don't trigger automounts with RESOLVE_NO_XDEV namei: move cross-device check to __traverse_mounts namei: remove LOOKUP_NO_XDEV check from handle_mounts ...
2025-09-23VFS: unify old_mnt_idmap and new_mnt_idmap in renamedataNeilBrown1-2/+1
A rename operation can only rename within a single mount. Callers of vfs_rename() must and do ensure this is the case. So there is no point in having two mnt_idmaps in renamedata as they are always the same. Only one of them is passed to ->rename in any case. This patch replaces both with a single "mnt_idmap" and changes all callers. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-23VFS/ovl: add lookup_one_positive_killable()NeilBrown1-14/+14
ovl wants a lookup which won't block on a fatal signal. It currently uses down_write_killable() and then repeatedly calls to lookup_one() The lock may not be needed if the name is already in the dcache and it aids proposed future changes if the locking is kept internal to namei.c So this patch adds lookup_one_positive_killable() which is like lookup_one_positive() but will abort in the face of a fatal signal. overlayfs is changed to use this. Note that instead of always getting an exclusive lock, ovl now only gets a shared lock, and only sometimes. The exclusive lock was never needed. However down_read_killable() was only added in v4.15 but overlayfs started using down_write_killable() here in v4.7. Note that the linked list ->first_maybe_whiteout ->next_maybe_white is local to the thread so there is no concurrency in that list which could be threatened by removing the locking. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-23ovl: make sure that ovl_create_real() returns a hashed dentryAmir Goldstein1-1/+21
e8bd877fb76bb9f3 ("ovl: fix possible double unlink") added a sanity check of !d_unhashed(child) to try to verify that child dentry was not unlinked while parent dir was unlocked. This "was not unlink" check has a false positive result in the case of casefolded parent dir, because in that case, ovl_create_temp() returns an unhashed dentry after ovl_create_real() gets an unhashed dentry from ovl_lookup_upper() and makes it positive. To avoid returning unhashed dentry from ovl_create_temp(), let ovl_create_real() lookup again after making the newdentry positive, so it always returns a hashed positive dentry (or an error). This fixes the error in ovl_parent_lock() in ovl_check_rename_whiteout() after ovl_create_temp() and allows mount of overlayfs with casefolding enabled layers. Reported-by: André Almeida <andrealmeid@igalia.com> Closes: https://lore.kernel.org/r/18704e8c-c734-43f3-bc7c-b8be345e1bf5@igalia.com/ Suggested-by: Neil Brown <neil@brown.name> Reviewed-by: Neil Brown <neil@brown.name> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2025-09-23ovl: Support mounting case-insensitive enabled layersAndré Almeida2-11/+12
Drop the restriction for casefold dentries lookup to enable support for case-insensitive layers in overlayfs. Support case-insensitive layers with the condition that they should be uniformly enabled across the stack and (i.e. if the root mount dir has casefold enabled, so should all the dirs bellow for every layer). Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2025-09-23ovl: Check for casefold consistency when creating new dentriesAndré Almeida1-0/+7
In a overlayfs with casefold enabled, all new dentries should have casefold enabled as well. Check this at ovl_create_real(). Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2025-09-23ovl: Add S_CASEFOLD as part of the inode flag to be copiedAndré Almeida4-4/+8
To keep ovl's inodes consistent with their real inodes, create a new mask for inode file attributes that needs to be copied. Add the S_CASEFOLD flag as part of the flags that need to be copied along with the other file attributes. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2025-09-23ovl: Set case-insensitive dentry operations for ovl sbAndré Almeida1-1/+24
For filesystems with encoding (i.e. with case-insensitive support), set the dentry operations for the super block as ovl_dentry_ci_operations. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2025-09-23ovl: Ensure that all layers have the same encodingAndré Almeida1-0/+38
When merging layers from different filesystems with casefold enabled, all layers should use the same encoding version and have the same flags to avoid any kind of incompatibility issues. Also, set the encoding and the encoding flags for the ovl super block as the same as used by the first valid layer. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2025-09-23ovl: Create ovl_casefold() to support casefolded strncmp()André Almeida1-21/+105
To add overlayfs support casefold layers, create a new function ovl_casefold(), to be able to do case-insensitive strncmp(). ovl_casefold() allocates a new buffer and stores the casefolded version of the string on it. If the allocation or the casefold operation fails, fallback to use the original string. The case-insentive name is then used in the rb-tree search/insertion operation. If the name is found in the rb-tree, the name can be discarded and the buffer is freed. If the name isn't found, it's then stored at struct ovl_cache_entry to be used later. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2025-09-23ovl: Prepare for mounting case-insensitive enabled layersAndré Almeida3-3/+14
Prepare for mounting layers with case-insensitive dentries in order to supporting such layers in overlayfs, while enforcing uniform casefold layers. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2025-09-15ovl_is_real_file: constify realpath argumentAl Viro1-1/+1
Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-09-15ovl_sync_file(): constify path argumentAl Viro1-1/+1
Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-09-15ovl_lower_dir(): constify path argumentAl Viro1-1/+1
Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-09-15ovl_get_verity_digest(): constify path argumentAl Viro2-2/+2
Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-09-15ovl_validate_verity(): constify {meta,data}path argumentsAl Viro2-4/+4
Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-09-15ovl_ensure_verity_loaded(): constify datapath argumentAl Viro2-2/+2
Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-09-15fs: rename generic_delete_inode() and generic_drop_inode()Mateusz Guzik1-1/+1
generic_delete_inode() is rather misleading for what the routine is doing. inode_just_drop() should be much clearer. The new naming is inconsistent with generic_drop_inode(), so rename that one as well with inode_ as the suffix. No functional changes. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-08-18ovl: fix possible double unlinkAmir Goldstein1-1/+2
commit 9d23967b18c6 ("ovl: simplify an error path in ovl_copy_up_workdir()") introduced the helper ovl_cleanup_unlocked(), which is later used in several following patches to re-acquire the parent inode lock and unlink a dentry that was earlier found using lookup. This helper was eventually renamed to ovl_cleanup(). The helper ovl_parent_lock() is used to re-acquire the parent inode lock. After acquiring the parent inode lock, the helper verifies that the dentry has not since been moved to another parent, but it failed to verify that the dentry wasn't unlinked from the parent. This means that now every call to ovl_cleanup() could potentially race with another thread, unlinking the dentry to be cleaned up underneath overlayfs and trigger a vfs assertion. Reported-by: syzbot+ec9fab8b7f0386b98a17@syzkaller.appspotmail.com Tested-by: syzbot+ec9fab8b7f0386b98a17@syzkaller.appspotmail.com Fixes: 9d23967b18c6 ("ovl: simplify an error path in ovl_copy_up_workdir()") Suggested-by: NeilBrown <neil@brown.name> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2025-08-18ovl: use I_MUTEX_PARENT when locking parent in ovl_create_temp()NeilBrown1-1/+1
ovl_create_temp() treats "workdir" as a parent in which it creates an object so it should use I_MUTEX_PARENT. Prior to the commit identified below the lock was taken by the caller which sometimes used I_MUTEX_PARENT and sometimes used I_MUTEX_NORMAL. The use of I_MUTEX_NORMAL was incorrect but unfortunately copied into ovl_create_temp(). Note to backporters: This patch only applies after the last Fixes given below (post v6.16). To fix the bug in v6.7 and later the inode_lock() call in ovl_copy_up_workdir() needs to nest using I_MUTEX_PARENT. Link: https://lore.kernel.org/all/67a72070.050a0220.3d72c.0022.GAE@google.com/ Cc: stable@vger.kernel.org Reported-by: syzbot+7836a68852a10ec3d790@syzkaller.appspotmail.com Tested-by: syzbot+7836a68852a10ec3d790@syzkaller.appspotmail.com Fixes: c63e56a4a652 ("ovl: do not open/llseek lower file with upper sb_writers held") Fixes: d2c995581c7c ("ovl: Call ovl_create_temp() without lock held.") Signed-off-by: NeilBrown <neil@brown.name> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2025-07-28Merge tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds4-19/+16
Pull fileattr updates from Christian Brauner: "This introduces the new file_getattr() and file_setattr() system calls after lengthy discussions. Both system calls serve as successors and extensible companions to the FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR system calls which have started to show their age in addition to being named in a way that makes it easy to conflate them with extended attribute related operations. These syscalls allow userspace to set filesystem inode attributes on special files. One of the usage examples is the XFS quota projects. XFS has project quotas which could be attached to a directory. All new inodes in these directories inherit project ID set on parent directory. The project is created from userspace by opening and calling FS_IOC_FSSETXATTR on each inode. This is not possible for special files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left with empty project ID. Those inodes then are not shown in the quota accounting but still exist in the directory. This is not critical but in the case when special files are created in the directory with already existing project quota, these new inodes inherit extended attributes. This creates a mix of special files with and without attributes. Moreover, special files with attributes don't have a possibility to become clear or change the attributes. This, in turn, prevents userspace from re-creating quota project on these existing files. In addition, these new system calls allow the implementation of additional attributes that we couldn't or didn't want to fit into the legacy ioctls anymore" * tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: tighten a sanity check in file_attr_to_fileattr() tree-wide: s/struct fileattr/struct file_kattr/g fs: introduce file_getattr and file_setattr syscalls fs: prepare for extending file_get/setattr() fs: make vfs_fileattr_[get|set] return -EOPNOTSUPP selinux: implement inode_file_[g|s]etattr hooks lsm: introduce new hooks for setting/getting inode fsxattr fs: split fileattr related helpers into separate file
2025-07-28Merge tag 'vfs-6.17-rc1.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds10-215/+288
Pull overlayfs updates from Christian Brauner: "This contains overlayfs updates for this cycle. The changes for overlayfs in here are primarily focussed on preparing for some proposed changes to directory locking. Overlayfs currently will sometimes lock a directory on the upper filesystem and do a few different things while holding the lock. This is incompatible with the new potential scheme. This series narrows the region of code protected by the directory lock, taking it multiple times when necessary. This theoretically opens up the possibilty of other changes happening on the upper filesytem between the unlock and the lock. To some extent the patches guard against that by checking the dentries still have the expect parent after retaking the lock. In general, concurrent changes to the upper and lower filesystems aren't supported properly anyway" * tag 'vfs-6.17-rc1.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits) ovl: properly print correct variable ovl: rename ovl_cleanup_unlocked() to ovl_cleanup() ovl: change ovl_create_real() to receive dentry parent ovl: narrow locking in ovl_check_rename_whiteout() ovl: narrow locking in ovl_whiteout() ovl: change ovl_cleanup_and_whiteout() to take rename lock as needed ovl: narrow locking on ovl_remove_and_whiteout() ovl: change ovl_workdir_cleanup() to take dir lock as needed. ovl: narrow locking in ovl_workdir_cleanup_recurse() ovl: narrow locking in ovl_indexdir_cleanup() ovl: narrow locking in ovl_workdir_create() ovl: narrow locking in ovl_cleanup_index() ovl: narrow locking in ovl_cleanup_whiteouts() ovl: narrow locking in ovl_rename() ovl: simplify gotos in ovl_rename() ovl: narrow locking in ovl_create_over_whiteout() ovl: narrow locking in ovl_clear_empty() ovl: narrow locking in ovl_create_upper() ovl: narrow the locked region in ovl_copy_up_workdir() ovl: Call ovl_create_temp() without lock held. ...
2025-07-28Merge tag 'vfs-6.17-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds6-22/+22
Pull misc VFS updates from Christian Brauner: "This contains the usual selections of misc updates for this cycle. Features: - Add ext4 IOCB_DONTCACHE support This refactors the address_space_operations write_begin() and write_end() callbacks to take const struct kiocb * as their first argument, allowing IOCB flags such as IOCB_DONTCACHE to propagate to the filesystem's buffered I/O path. Ext4 is updated to implement handling of the IOCB_DONTCACHE flag and advertises support via the FOP_DONTCACHE file operation flag. Additionally, the i915 driver's shmem write paths are updated to bypass the legacy write_begin/write_end interface in favor of directly calling write_iter() with a constructed synchronous kiocb. Another i915 change replaces a manual write loop with kernel_write() during GEM shmem object creation. Cleanups: - don't duplicate vfs_open() in kernel_file_open() - proc_fd_getattr(): don't bother with S_ISDIR() check - fs/ecryptfs: replace snprintf with sysfs_emit in show function - vfs: Remove unnecessary list_for_each_entry_safe() from evict_inodes() - filelock: add new locks_wake_up_waiter() helper - fs: Remove three arguments from block_write_end() - VFS: change old_dir and new_dir in struct renamedata to dentrys - netfs: Remove unused declaration netfs_queue_write_request() Fixes: - eventpoll: Fix semi-unbounded recursion - eventpoll: fix sphinx documentation build warning - fs/read_write: Fix spelling typo - fs: annotate data race between poll_schedule_timeout() and pollwake() - fs/pipe: set FMODE_NOWAIT in create_pipe_files() - docs/vfs: update references to i_mutex to i_rwsem - fs/buffer: remove comment about hard sectorsize - fs/buffer: remove the min and max limit checks in __getblk_slow() - fs/libfs: don't assume blocksize <= PAGE_SIZE in generic_check_addressable - fs_context: fix parameter name in infofc() macro - fs: Prevent file descriptor table allocations exceeding INT_MAX" * tag 'vfs-6.17-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (24 commits) netfs: Remove unused declaration netfs_queue_write_request() eventpoll: fix sphinx documentation build warning ext4: support uncached buffered I/O mm/pagemap: add write_begin_get_folio() helper function fs: change write_begin/write_end interface to take struct kiocb * drm/i915: Refactor shmem_pwrite() to use kiocb and write_iter drm/i915: Use kernel_write() in shmem object create eventpoll: Fix semi-unbounded recursion vfs: Remove unnecessary list_for_each_entry_safe() from evict_inodes() fs/libfs: don't assume blocksize <= PAGE_SIZE in generic_check_addressable fs/buffer: remove the min and max limit checks in __getblk_slow() fs: Prevent file descriptor table allocations exceeding INT_MAX fs: Remove three arguments from block_write_end() fs/ecryptfs: replace snprintf with sysfs_emit in show function fs: annotate suspected data race between poll_schedule_timeout() and pollwake() docs/vfs: update references to i_mutex to i_rwsem fs/buffer: remove comment about hard sectorsize fs_context: fix parameter name in infofc() macro VFS: change old_dir and new_dir in struct renamedata to dentrys proc_fd_getattr(): don't bother with S_ISDIR() check ...
2025-07-28Merge tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-1/+1
Pull dentry d_flags updates from Al Viro: "The current exclusion rules for dentry->d_flags stores are rather unpleasant. The basic rules are simple: - stores to dentry->d_flags are OK under dentry->d_lock - stores to dentry->d_flags are OK in the dentry constructor, before becomes potentially visible to other threads Unfortunately, there's a couple of exceptions to that, and that's where the headache comes from. The main PITA comes from d_set_d_op(); that primitive sets ->d_op of dentry and adjusts the flags that correspond to presence of individual methods. It's very easy to misuse; existing uses _are_ safe, but proof of correctness is brittle. Use in __d_alloc() is safe (we are within a constructor), but we might as well precalculate the initial value of 'd_flags' when we set the default ->d_op for given superblock and set 'd_flags' directly instead of messing with that helper. The reasons why other uses are safe are bloody convoluted; I'm not going to reproduce it here. See [1] for gory details, if you care. The critical part is using d_set_d_op() only just prior to d_splice_alias(), which makes a combination of d_splice_alias() with setting ->d_op, etc a natural replacement primitive. Better yet, if we go that way, it's easy to take setting ->d_op and modifying 'd_flags' under ->d_lock, which eliminates the headache as far as 'd_flags' exclusion rules are concerned. Other exceptions are minor and easy to deal with. What this series does: - d_set_d_op() is no longer available; instead a new primitive (d_splice_alias_ops()) is provided, equivalent to combination of d_set_d_op() and d_splice_alias(). - new field of struct super_block - 's_d_flags'. This sets the default value of 'd_flags' to be used when allocating dentries on this filesystem. - new primitive for setting 's_d_op': set_default_d_op(). This replaces stores to 's_d_op' at mount time. All in-tree filesystems converted; out-of-tree ones will get caught by the compiler ('s_d_op' is renamed, so stores to it will be caught). 's_d_flags' is set by the same primitive to match the 's_d_op'. - a lot of filesystems had sb->s_d_op->d_delete equal to always_delete_dentry; that is equivalent to setting DCACHE_DONTCACHE in 'd_flags', so such filesystems can bloody well set that bit in 's_d_flags' and drop 'd_delete()' from dentry_operations. In quite a few cases that results in empty dentry_operations, which means that we can get rid of those. - kill simple_dentry_operations - not needed anymore - massage d_alloc_parallel() to get rid of the other exception wrt 'd_flags' stores - we can set DCACHE_PAR_LOOKUP as soon as we allocate the new dentry; no need to delay that until we commit to using the sucker. As the result, 'd_flags' stores are all either under ->d_lock or done before the dentry becomes visible in any shared data structures" Link: https://lore.kernel.org/all/20250224010624.GT1977892@ZenIV/ [1] * tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (21 commits) configfs: use DCACHE_DONTCACHE debugfs: use DCACHE_DONTCACHE efivarfs: use DCACHE_DONTCACHE instead of always_delete_dentry() 9p: don't bother with always_delete_dentry ramfs, hugetlbfs, mqueue: set DCACHE_DONTCACHE kill simple_dentry_operations devpts, sunrpc, hostfs: don't bother with ->d_op shmem: no dentry retention past the refcount reaching zero d_alloc_parallel(): set DCACHE_PAR_LOOKUP earlier make d_set_d_op() static simple_lookup(): just set DCACHE_DONTCACHE tracefs: Add d_delete to remove negative dentries set_default_d_op(): calculate the matching value for ->d_flags correct the set of flags forbidden at d_set_d_op() time split d_flags calculation out of d_set_d_op() new helper: set_default_d_op() fuse: no need for special dentry_operations for root dentry switch procfs from d_set_d_op() to d_splice_alias_ops() new helper: d_splice_alias_ops() procfs: kill ->proc_dops ...
2025-07-25ovl: properly print correct variableAntonio Quartulli1-2/+3
In case of ovl_lookup_temp() failure, we currently print `err` which is actually not initialized at all. Instead, properly print PTR_ERR(whiteout) which is where the actual error really is. Address-Coverity-ID: 1647983 ("Uninitialized variables (UNINIT)") Fixes: 8afa0a7367138 ("ovl: narrow locking in ovl_whiteout()") Signed-off-by: Antonio Quartulli <antonio@mandelbit.com> Link: https://lore.kernel.org/20250721203821.7812-1-antonio@mandelbit.com Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: rename ovl_cleanup_unlocked() to ovl_cleanup()NeilBrown6-25/+25
The only remaining user of ovl_cleanup() is ovl_cleanup_locked(), so we no longer need both. This patch renames ovl_cleanup() to ovl_cleanup_locked() and makes it static. ovl_cleanup_unlocked() is renamed to ovl_cleanup(). Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-22-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: change ovl_create_real() to receive dentry parentNeilBrown3-6/+6
Instead of passing an inode *dir, pass a dentry *parent. This makes the calling slightly cleaner. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-21-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow locking in ovl_check_rename_whiteout()NeilBrown1-8/+7
ovl_check_rename_whiteout() now only holds the directory lock when needed, and takes it again if necessary. This makes way for future changes where locks are taken on individual dentries rather than the whole directory. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-20-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow locking in ovl_whiteout()NeilBrown3-20/+27
ovl_whiteout() relies on the workdir i_rwsem to provide exclusive access to ofs->whiteout which it manipulates. Rather than depending on this, add a new mutex, "whiteout_lock" to explicitly provide the required locking. Use guard(mutex) for this so that we can return without needing to explicitly unlock. Then take the lock on workdir only when needed - to lookup the temp name and to do the whiteout or link. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-19-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: change ovl_cleanup_and_whiteout() to take rename lock as neededNeilBrown3-26/+12
Rather than locking the directory(s) before calling ovl_cleanup_and_whiteout(), change it (and ovl_whiteout()) to do the locking, so the locking can be fine grained as will be needed for proposed locking changes. Sometimes this is called to whiteout something in the index dir, in which case only that dir must be locked. In one case it is called on something in an upperdir, so two directories must be locked. We use ovl_lock_rename_workdir() for this and remove the restriction that upperdir cannot be indexdir - because now sometimes it is. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-18-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow locking on ovl_remove_and_whiteout()NeilBrown1-13/+10
This code: performs a lookup_upper creates a whiteout object renames the whiteout over the result of the lookup The create and the rename must be locked separately for proposed directory locking changes. This patch takes a first step of moving the lookup out of the locked region. A subsequent patch will separate the create from the rename. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-17-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: change ovl_workdir_cleanup() to take dir lock as needed.NeilBrown3-29/+15
Rather than calling ovl_workdir_cleanup() with the dir already locked, change it to take the dir lock only when needed. Also change ovl_workdir_cleanup() to take a dentry for the parent rather than an inode. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-16-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow locking in ovl_workdir_cleanup_recurse()NeilBrown1-5/+9
Only take the dir lock when needed, rather than for the whole loop. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-15-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow locking in ovl_indexdir_cleanup()NeilBrown1-7/+13
Instead of taking the directory lock for the whole cleanup, only take it when needed. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-14-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow locking in ovl_workdir_create()NeilBrown1-11/+14
In ovl_workdir_create() don't hold the dir lock for the whole time, but only take it when needed. It now gets taken separately for ovl_workdir_cleanup(). A subsequent patch will move the locking into that function. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-13-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow locking in ovl_cleanup_index()NeilBrown1-8/+8
ovl_cleanup_index() takes a lock on the directory and then does a lookup and possibly one of two different cleanups. This patch narrows the locking to use the _unlocked() versions of the lookup and one cleanup, and just takes the lock for the other cleanup. A subsequent patch will take the lock into the cleanup. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-12-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow locking in ovl_cleanup_whiteouts()NeilBrown1-4/+2
Rather than lock the directory for the whole operation, use ovl_lookup_upper_unlocked() and ovl_cleanup_unlocked() to take the lock only when needed. This makes way for future changes where locks are taken on individual dentries rather than the whole directory. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-11-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow locking in ovl_rename()NeilBrown1-4/+7
Drop the rename lock immediately after the rename, and use ovl_cleanup_unlocked() for cleanup. This makes way for future changes where locks are taken on individual dentries rather than the whole directory. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-10-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: simplify gotos in ovl_rename()NeilBrown1-27/+27
Rather than having three separate goto label: out_unlock, out_dput_old, and out_dput, make use of that fact that dput() happily accepts a NULL pointer to reduce this to just one goto label: out_unlock. olddentry and newdentry are initialised to NULL and only set once a value dentry is found. They are then dput() late in the function. Suggested-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-9-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow locking in ovl_create_over_whiteout()NeilBrown1-7/+6
Unlock the parents immediately after the rename, and use ovl_cleanup_unlocked() for cleanup, which takes a separate lock. This makes way for future changes where locks are taken on individual dentries rather than the whole directory. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-8-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow locking in ovl_clear_empty()NeilBrown1-4/+3
Drop the locks immediately after rename, and use a separate lock for cleanup. This makes way for future changes where locks are taken on individual dentries rather than the whole directory. Note that ovl_cleanup_whiteouts() operates on "upper", a child of "upperdir" and does not require upperdir or workdir to be locked. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-7-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow locking in ovl_create_upper()NeilBrown1-7/+5
Drop the directory lock immediately after the ovl_create_real() call and take a separate lock later for cleanup in ovl_cleanup_unlocked() - if needed. This makes way for future changes where locks are taken on individual dentries rather than the whole directory. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-6-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow the locked region in ovl_copy_up_workdir()NeilBrown1-2/+2
In ovl_copy_up_workdir() unlock immediately after the rename. There is nothing else in the function that needs the lock. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-5-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: Call ovl_create_temp() without lock held.NeilBrown5-38/+52
ovl currently locks a directory or two and then performs multiple actions in one or both directories. This is incompatible with proposed changes which will lock just the dentry of objects being acted on. This patch moves calls to ovl_create_temp() out of the locked regions and has it take and release the relevant lock itself. The lock that was taken before this function was called is now taken after. This means that any code between where the lock was taken and ovl_create_temp() is now unlocked. This necessitates the use of ovl_cleanup_unlocked() and the creation of ovl_lookup_upper_unlocked(). These will be used more widely in future patches. Now that the file is created before the lock is taken for rename, we need to ensure the parent wasn't changed before the lock was gained. ovl_lock_rename_workdir() is changed to optionally receive the dentries that will be involved in the rename. If either is present but has the wrong parent, an error is returned. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-4-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: change ovl_create_index() to take dir locksNeilBrown1-9/+13
ovl_copy_up_workdir() currently take a rename lock on two directories, then use the lock to both create a file in one directory, perform a rename, and possibly unlink the file for cleanup. This is incompatible with proposed changes which will lock just the dentry of objects being acted on. This patch moves the call to ovl_create_index() earlier in ovl_copy_up_workdir() to before the lock is taken. ovl_create_index() then takes the required lock only when needed. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-3-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>