path: root/sys/kern/vfs_subr.c
Columns: commit message, author, date, files changed, lines (-removed/+added)
* Use NULL instead of 0 to clear the v_socket pointer (which actually clears all of the v_un pointers). OK jsg@ mvs@
  (claudio, 2021-01-29, 1 file, -2/+2)
* Remove unused debug_syncprt, improve debug sysctl handling. "syncprt" has been unused since kern/vfs_syscalls.c r1.147 from 2008. Adding new debug sysctls is a bit opaque: in kern/kern_sysctl.c, the only visible difference between used and stub ctldebug structs in the debugvars[] array is their extern keyword, which indicates that the struct is defined elsewhere. sys/sysctl.h declares all debugN members as extern upfront, but these declarations are not needed. Remove the unused debug sysctl, rename the only remaining one to something meaningful, and remove the forward declarations from sys/sysctl.h. This way, adding a new debug sysctl is a matter of adding extern and coming up with a name, which is nicer to read on its own and easier to grep for. OK mpi
  (kn, 2020-08-23, 1 file, -2/+2)
* Move sysctl(2) CTL_DEBUG from DEBUG to the new DEBUG_SYSCTL. Adding "debug.my-knob" sysctls is very helpful for selecting different code paths and/or logging on demand at runtime without recompiling. Because this code is under DEBUG, though, a lot of other noise comes with it, which is often unwanted, at least when looking at specific subsystems only. Adding globals to the kernel and breaking into DDB to change them helps, but that does not work over SSH, hence the need for debug sysctls. Introduce DEBUG_SYSCTL to make use of the "debug" MIB without the rest of DEBUG. It is DEBUG_SYSCTL and not SYSCTL_DEBUG because it is not a general option for all of sysctl(2). OK gnezdo
  (kn, 2020-08-22, 1 file, -3/+3)
* Relax the lockcount assertion in vputonfreelist(). When I fixed several problems with the vnode exclusive lock implementation, I overlooked the fact that a vnode can be in a state where the usecount is zero while the holdcount is still positive. As long as the holdcount is positive, threads may still be waiting on the vnode lock in uvn_io(). "go ahead" mpi@ Reported-by: syzbot+767d6deb1a647850a0ca@syzkaller.appspotmail.com
  (anton, 2020-03-27, 1 file, -2/+6)
* Move the LK_DRAIN logic from VOP_LOCK() to vclean(), the only caller of VOP_LOCK() with LK_DRAIN. This simplifies VOP_LOCK() a fair bit. OK visa@
  (claudio, 2020-02-13, 1 file, -1/+12)
* struct vops is not modified at runtime, so declare each instance const, which moves it into the read-only data segment. OK deraadt@ tedu@
  (claudio, 2020-01-20, 1 file, -2/+2)
* Convert the vnode list at the mount point into a tailq. During unmount, this list is traversed and the dirty vnodes are flushed to disk. Forced unmount expects the list to be empty after flushing; otherwise the kernel panics with "dangling vnode". Because the write to disk can sleep, new vnodes may be inserted in the meantime. If softdep is enabled, resolving the dependencies creates new dirty vnodes and inserts them into the list. To fix the panic, let insmntque() insert new vnodes at the tail of the list. vflush() then still catches them while traversing the list in the forward direction. OK tedu@ millert@ visa@
  (bluhm, 2020-01-10, 1 file, -8/+8)
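The tail-insertion fix can be illustrated with a minimal userland sketch. It assumes a hand-rolled singly linked tail queue in place of the kernel's TAILQ macros. `struct node`, `tq_insert_tail()` and `flush_all()` are hypothetical names; they are not vflush() or insmntque(). Flushing one element appends a new dirty element, as softdep can, and the forward walk still reaches it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode on a mount point's vnode list. */
struct node {
	int id;
	int dirty;
	struct node *next;
};

/* Tail queue: pointer to the first and to the last element. */
struct tailq {
	struct node *first;
	struct node *last;
};

static void tq_insert_tail(struct tailq *q, struct node *n)
{
	n->next = NULL;
	if (q->last != NULL)
		q->last->next = n;
	else
		q->first = n;
	q->last = n;
}

/*
 * Flush every dirty node.  Flushing the node whose id is `spawn_from`
 * creates one new dirty node (`extra`), like softdep resolving a
 * dependency.  Because it is appended at the tail, the forward walk
 * still visits it.  Returns the number of nodes flushed.
 */
static int flush_all(struct tailq *q, struct node *extra, int spawn_from)
{
	int flushed = 0;

	for (struct node *n = q->first; n != NULL; n = n->next) {
		if (!n->dirty)
			continue;
		n->dirty = 0;
		flushed++;
		if (n->id == spawn_from && extra != NULL) {
			extra->dirty = 1;
			tq_insert_tail(q, extra);
			extra = NULL;
		}
	}
	return flushed;
}
```

Suppose the new element were inserted at the head instead. The walk would already be past it, so the example would flush only 2 nodes and leave one dirty. That leftover node is the "dangling vnode" situation the commit describes.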
* In vcount(), a safe loop over vnodes was committed to 4.4BSD in 1994. It is not necessary, because the loop is restarted after vgone(). Switch to SLIST_FOREACH without _SAFE. OK visa@
  (bluhm, 2019-12-30, 1 file, -3/+3)
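The restart argument can be shown with a simplified userland sketch. The names `struct alias`, `unlink_alias()` and `count_and_prune()` are hypothetical, and the sketch omits vcount()'s checks for the caller's own vnode and for other devices. The scan jumps back to the head after every removal, so it never uses a cached "next" pointer, and a plain iterator is enough.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode aliasing one device. */
struct alias {
	int usecount;
	struct alias *next;
};

/* Unlink `victim` from the singly linked list headed by *headp. */
static void unlink_alias(struct alias **headp, struct alias *victim)
{
	for (struct alias **pp = headp; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			victim->next = NULL;
			return;
		}
	}
}

/*
 * Sum the use counts, removing unused aliases along the way.  After a
 * removal the scan restarts from the head, so no saved "next" pointer
 * is needed and a non-_SAFE loop suffices.
 */
static int count_and_prune(struct alias **headp)
{
	int count;

loop:
	count = 0;
	for (struct alias *a = *headp; a != NULL; a = a->next) {
		if (a->usecount == 0) {
			unlink_alias(headp, a);
			goto loop;
		}
		count += a->usecount;
	}
	return count;
}
```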
* Convert the speclisth hash buckets to SLIST macros. This makes the vnode alias code more readable. OK visa@
  (bluhm, 2019-12-27, 1 file, -24/+12)
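Each bucket of a device-alias hash table is just a singly linked list, which is the structure SLIST models. The following userland sketch uses a hypothetical modulo hash and bucket count, because the kernel's actual hash function and table size are not shown in this log. Inserting at the head keeps each insert O(1), and an alias lookup only walks a single bucket.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 64	/* hypothetical table size */

/* Hypothetical stand-in for a special-device vnode. */
struct specnode {
	unsigned int rdev;
	struct specnode *next;
};

static struct specnode *buckets[NBUCKETS];

/* Hypothetical hash: device number modulo the table size. */
static unsigned int bucket_of(unsigned int rdev)
{
	return rdev % NBUCKETS;
}

/* SLIST_INSERT_HEAD equivalent. */
static void spec_insert(struct specnode *sn)
{
	struct specnode **head = &buckets[bucket_of(sn->rdev)];

	sn->next = *head;
	*head = sn;
}

/* Count nodes for `rdev`; other devices may share the bucket. */
static int spec_count_aliases(unsigned int rdev)
{
	int n = 0;

	for (struct specnode *sn = buckets[bucket_of(rdev)]; sn != NULL;
	    sn = sn->next)
		if (sn->rdev == rdev)
			n++;
	return n;
}
```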
* Fix whitespace.
  (bluhm, 2019-12-26, 1 file, -13/+15)
* Convert infinite sleeps to tsleep_nsec(9). ok visa@, jca@
  (mpi, 2019-12-08, 1 file, -3/+3)
* When a thread tries to exclusively lock a vnode, it must ensure that any other thread currently trying to acquire the underlying vnode lock has observed that the vnode is about to be exclusively locked. Such threads must then sleep until the exclusive lock has been released and then try to acquire the lock again. Otherwise, exclusive access to the vnode cannot be guaranteed. Thanks to naddy@ and visa@ for testing; ok visa@ Reported-by: syzbot+374d0e7e2400004957f7@syzkaller.appspotmail.com
  (anton, 2019-08-26, 1 file, -1/+4)
* vinvalbuf(9): tsleep -> tsleep_nsec(9); ok millert@
  (cheloha, 2019-07-25, 1 file, -4/+4)
* vwaitforio(9): tsleep(9) -> tsleep_nsec(9); ok visa@
  (cheloha, 2019-07-19, 1 file, -5/+5)
* Skip the VFS barrier lock during normal operation to reduce overhead. This removes a system-wide serialization point, which might help in finding timing-related bugs. OK deraadt@ anton@
  (visa, 2019-06-28, 1 file, -5/+12)
* Add a temporary workaround to make removal of giant files better. mlarkin@ noticed that the system would freeze while removing enormous files because of the amount of work done to invalidate buffers on unlink. This workaround ensures that we give up the lock and yield while doing this. The longer-term answer will be to move these buffers to another list and not do the work here. ok deraadt@
  (beck, 2019-06-09, 1 file, -1/+18)
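The "give up the lock and yield" pattern can be shown with a minimal userland sketch. A pthread mutex stands in for the kernel lock, and sched_yield() stands in for the kernel's yield. `invalidate_buffers()` and the batch size `YIELD_INTERVAL` are hypothetical. Real code must also revalidate whatever list it is walking after reacquiring the lock, because other threads may have changed it.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

#define YIELD_INTERVAL 1000	/* hypothetical batch size */

/*
 * Invalidate `nbufs` buffers while holding `lock`.  Every
 * YIELD_INTERVAL buffers, drop the lock and yield the CPU so that a
 * huge unlink cannot monopolise it.  Returns the number of yields.
 */
static size_t invalidate_buffers(pthread_mutex_t *lock, size_t nbufs)
{
	size_t yields = 0;

	for (size_t i = 0; i < nbufs; i++) {
		/* ... invalidate buffer i here ... */
		if ((i + 1) % YIELD_INTERVAL == 0) {
			pthread_mutex_unlock(lock);
			sched_yield();
			pthread_mutex_lock(lock);
			yields++;
		}
	}
	return yields;
}
```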
* Add a subsystem lock for vfs_lockf.c. This makes it possible to call lf_advlock() and lf_purgelocks() without the kernel lock. OK anton@ mpi@
  (visa, 2019-04-19, 1 file, -2/+2)
* Restrict which filesystems are available for swap. This rules out obvious misconfigurations that cannot work. OK mpi@ tedu@
  (visa, 2019-04-02, 1 file, -2/+2)
* If a write fails, we mark the buffer invalid and throw it away. This can lead to lost errors, where a later fsync will return success. To fix this, set a flag on the vnode indicating that a past error has occurred, and return an error for future fsync calls. ok bluhm deraadt visa
  (tedu, 2019-02-17, 1 file, -1/+2)
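The "sticky error" fix can be illustrated with a minimal sketch. The `write_error` field and both functions are hypothetical stand-ins for the vnode flag and the fsync path. The log does not name the real flag. Once a write has been lost, every later fsync reports EIO instead of false success.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-file state standing in for the vnode. */
struct file_state {
	bool write_error;	/* a past write failed and was discarded */
};

/* Called when a buffer write fails and the buffer is thrown away. */
static void note_write_failure(struct file_state *fs)
{
	fs->write_error = true;
}

/* fsync: once any write has been lost, keep reporting EIO. */
static int do_fsync(struct file_state *fs)
{
	/* ... flush the remaining dirty buffers here ... */
	return fs->write_error ? EIO : 0;
}
```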
* Introduce a dedicated entry point data structure for file locks. This new data structure allows better tracking of pending lock operations, which is essential to prevent a use-after-free once the underlying vnode is gone. Inspired by the lockf implementation in FreeBSD. ok visa@ Reported-by: syzbot+d5540a236382f50f1dac@syzkaller.appspotmail.com
  (anton, 2019-01-21, 1 file, -1/+3)
* Rectify some issues with the noperm mount flag: the root vnode was not protected properly, and files without any x bit set were accidentally considered executable when checked with access(2). Issues found and reported by deraadt, halex, reyk, tb. ok deraadt
  (natano, 2018-12-23, 1 file, -1/+10)
* free(9) sizes for netcred. ok visa@
  (mpi, 2018-12-07, 1 file, -4/+6)
* Use atomic operations to update vfc_refcount. Change the field's type to unsigned int. OK deraadt@
  (visa, 2018-09-29, 1 file, -4/+5)
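A userland sketch of an atomically updated, unsigned reference count, using C11 `<stdatomic.h>` in place of the kernel's own atomic primitives. `struct fsconf`, `fsconf_ref()` and `fsconf_rele()` are hypothetical names. A read-modify-write such as atomic_fetch_add() cannot lose an update when two CPUs change the count at the same time, which a plain `++` on an int can.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a filesystem type's configuration entry. */
struct fsconf {
	const char *name;
	atomic_uint refcount;	/* unsigned, updated only atomically */
};

static void fsconf_ref(struct fsconf *c)
{
	atomic_fetch_add(&c->refcount, 1);
}

/* Drop one reference; returns the count after the decrement. */
static unsigned int fsconf_rele(struct fsconf *c)
{
	unsigned int old = atomic_fetch_sub(&c->refcount, 1);

	assert(old > 0);	/* underflow would wrap the unsigned count */
	return old - 1;
}
```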
* Move the allocation and freeing of mount points into dedicated functions. OK deraadt@ mpi@
  (visa, 2018-09-26, 1 file, -15/+39)
* Harmonize spacing after ellipses in displayed messages. The installer used spacing after ellipses inconsistently. Standardize on "... " everywhere and take the cursor position into account while waiting for the task to complete: the cursor is now always positioned after the last dot, and the space is added when displaying the completion confirmation. While there, also take the cursor position into account in vfs_shutdown(), and remove the extra leading space before ticks in dhclient. OK deraadt@
  (fcambus, 2018-09-22, 1 file, -4/+4)
* Simplify VFS initialization. Because loadable kernel modules no longer exist, there is no need to register or unregister filesystem implementations at runtime. Remove vfs_register() and vfs_unregister(), and make vfsinit() call the vfs_init routines directly. Replace the linked list of vfsconf structs with the vfsconflist[] array. OK mpi@ bluhm@
  (visa, 2018-09-17, 1 file, -68/+1)
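With registration gone, the set of filesystems is fixed at compile time, so a static array and a linear lookup are enough. A minimal sketch follows. The entry names, type numbers, and `fsconf_byname()` are hypothetical illustrations; they are not the contents of vfsconflist[] or the kernel's lookup functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct fsconf_entry {
	const char *name;
	int typenum;	/* hypothetical values below */
};

/* Fixed table replacing a runtime-registered linked list. */
static const struct fsconf_entry fsconf_table[] = {
	{ "ffs", 1 },
	{ "nfs", 2 },
	{ "mfs", 3 },
};

#define NFSCONF (sizeof(fsconf_table) / sizeof(fsconf_table[0]))

/* Return the entry named `name`, or NULL if there is none. */
static const struct fsconf_entry *fsconf_byname(const char *name)
{
	for (size_t i = 0; i < NFSCONF; i++)
		if (strcmp(fsconf_table[i].name, name) == 0)
			return &fsconf_table[i];
	return NULL;
}
```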
* Move vfsconf lookup code into dedicated functions. OK bluhm@
  (visa, 2018-09-16, 1 file, -12/+4)
* Unveiling unveil(2). This brings unveil into the tree, disabled by default. For now, every attempt to use it returns EPERM until we are fully certain it is ready for people to start using, but this allows others to do more tweaking and experimentation. It still needs to pass unveils across forks and execs before it can be fully enabled. Many thanks to robert@ and deraadt@ for extensive testing. ok deraadt@
  (beck, 2018-07-13, 1 file, -1/+4)
* Use more list macros for v_dirtyblkhd. OK mpi@
  (bluhm, 2018-07-02, 1 file, -3/+3)
* The function dounmount() traverses the mnt_list in the forward direction to call vfs_busy() for all nested mount points, whereas vfs_stall() called vfs_busy() in reverse order for all mount points. Change the direction of the latter to resolve the lock order conflict. OK visa@
  (bluhm, 2018-06-06, 1 file, -3/+4)
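The lock order rule behind this fix can be illustrated with a minimal sketch. One pthread mutex per mount point stands in for vfs_busy(), and all names are hypothetical. Every path that busies several mount points acquires them in the same forward list order. Two paths therefore can never each hold a lock the other is waiting for.

```c
#include <assert.h>
#include <pthread.h>

#define NMOUNTS 3	/* hypothetical number of mount points */

/* One lock per mount point, in mount-list order. */
static pthread_mutex_t mount_lock[NMOUNTS] = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
	PTHREAD_MUTEX_INITIALIZER,
};

/* Busy every mount point in forward order; returns the count locked. */
static int busy_all_forward(void)
{
	for (int i = 0; i < NMOUNTS; i++)
		pthread_mutex_lock(&mount_lock[i]);
	return NMOUNTS;
}

/* Release in reverse order (release order cannot cause deadlock). */
static void unbusy_all(void)
{
	for (int i = NMOUNTS - 1; i >= 0; i--)
		pthread_mutex_unlock(&mount_lock[i]);
}
```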
* Add VB_DUPOK to suppress witness(4) warnings about concurrent mount locks. Use it in three places: vfs_stall(), sys_mount(), and the MNT_FORCE-does-recursive-unmounts case in dounmount(). ok deraadt@ visa@
  (guenther, 2018-06-04, 1 file, -2/+7)
* Drop unnecessary `p' parameter from vget(9). OK mpi@
  (visa, 2018-05-27, 1 file, -3/+3)
* When looping over mount points, the FOREACH_SAFE macro is not safe. The loop variable mp is protected by vfs_busy(), so it cannot be unmounted, but the next mount point nmp could be unmounted while VFS_SYNC() sleeps. Because the loop in vfs_stall() does not destroy the mount point, TAILQ_FOREACH_REVERSE without _SAFE is the correct macro to use. OK deraadt@ visa@
  (bluhm, 2018-05-08, 1 file, -3/+7)
* Move the vfs stall "barrier" logic to a function. FREF() will soon change, and this has nothing to do with it. ok visa@, bluhm@
  (mpi, 2018-05-08, 1 file, -1/+8)
* Print the vp pointer in the vinvalbuf() panic strings. OK mpi@
  (bluhm, 2018-05-07, 1 file, -4/+4)
* Remove proc from the parameters of vn_lock(). The parameter is unnecessary because curproc always does the locking. OK mpi@
  (visa, 2018-05-02, 1 file, -3/+3)
* Clean up the parameters of VOP_LOCK() and VOP_UNLOCK(). It is always curproc that does the locking or unlocking, so the proc parameter is pointless and can be dropped. OK mpi@, deraadt@
  (visa, 2018-04-28, 1 file, -5/+5)
* Remounting file systems read-only does not work reliably: there are corner cases where ffs may leak blocks. So it is better to revert and unmount all file systems at reboot. The "init died" panic will be fixed in a different way. OK deraadt@
  (bluhm, 2018-03-07, 1 file, -40/+27)
* Synchronize filesystems to disk when suspending. Each mountpoint's vnodes are pushed to disk. Dangling vnodes (unlinked files still in use) and vnodes being changed by long-running syscalls are identified, and such filesystems are marked dirty on disk while we are suspended (if power is lost, an fsck will be required). Filesystems without dangling or busy vnodes are marked clean, resulting in faster boots after "battery died" circumstances. Tested by numerous developers, thanks for the feedback.
  (deraadt, 2018-02-10, 1 file, -8/+49)
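After the sync, the clean/dirty decision reduces to a scan for two kinds of vnode, which a minimal sketch can show. The `struct vnode_info` fields and `can_mark_clean()` are hypothetical simplifications. A vnode counts as dangling when it has no links but is still in use. A vnode counts as busy when a long-running syscall is modifying it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-vnode summary. */
struct vnode_info {
	int linkcount;	/* 0 means the file has been unlinked */
	int usecount;	/* > 0 means it is still in use */
	bool busy;	/* being changed by a long-running syscall */
};

/*
 * A filesystem may be marked clean on disk only if none of its vnodes
 * is dangling (unlinked but in use) or busy.
 */
static bool can_mark_clean(const struct vnode_info *v, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (v[i].linkcount == 0 && v[i].usecount > 0)
			return false;	/* dangling */
		if (v[i].busy)
			return false;	/* busy */
	}
	return true;
}
```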
* Don't bother using DETACH_FORCE for the softraid luns at reboot time; the aggressive mountpoint destruction seems to hit insane use-after-frees when we are already far on the way down.
  (deraadt, 2017-12-14, 1 file, -3/+3)
* Give vflush_vnode() a hint about vnodes we don't need to account as "busy". Change the mountpoint to RDONLY a little later. This seems to improve the rw->ro transition a bit.
  (deraadt, 2017-12-14, 1 file, -5/+9)
* Format the vnode lists of ddb "show mount" properly in columns. OK krw@
  (bluhm, 2017-12-11, 1 file, -14/+20)
* In uvm, Chuck decided that backing store would not be allocated proactively for blocks re-fetchable from the filesystem. However, at reboot time filesystems are unmounted, and because processes lack backing store they are killed. Since the scheduler is still running, in some cases init is killed... which drops us to ddb [noted by bluhm]. The solution is to convert filesystems to read-only [proposed by kettenis]. The tale follows: sys_reboot() should pass proc * to MD boot() to vfs_shutdown(), which completes current IO with vfs_busy VB_WRITE|VB_WAIT, then calls VFS_MOUNT() with MNT_UPDATE | MNT_RDONLY, soon teaching us that *fs_mount() calls a copyin() late... so store the sizes in vfsconflist[] and move the copyin() to sys_mount()... and notice that the nfs_mount copyin() is size-variant, so kill the legacy struct nfs_args3. Next we learn that ffs_mount()'s MNT_UPDATE code is sharp and rusty, especially wrt softdep, so fix some bugs and add ~MNT_SOFTDEP to the downgrade. Some vnodes need a little more help, so tie them to &dead_vnops. ffs_mount calling DIOCCACHESYNC is still causing a bit of grief, but this issue is separate and will be dealt with in time. A couple hundred reboots by bluhm and myself, advice from guenther and others at the hut.
  (deraadt, 2017-12-11, 1 file, -38/+52)
* Use _kernel_lock_held() instead of __mp_lock_held(&kernel_lock). ok visa@
  (mpi, 2017-12-04, 1 file, -2/+2)
* Give back some space to the ramdisk by compiling net/radix.c only if we compile pf, ipsec, pipex or nfsserver. Suggested by mpi some time ago. Tweak & OK bluhm; deraadt assumes it's fair.
  (florian, 2017-07-31, 1 file, -2/+14)
* Tweak lock inits to make the system runnable with witness(4) on amd64 and i386.
  (visa, 2017-04-20, 1 file, -2/+2)
* struct vfsconf is tightly packed, but let's M_ZERO it in case that ever changes, to avoid exposing uninitialized memory to userland.
  (deraadt, 2017-04-04, 1 file, -2/+2)
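The concern is compiler padding. A struct copied out to userland may contain padding bytes that the code never writes, and those bytes would otherwise hold stale memory. Zeroing at allocation closes that leak. A minimal userland sketch follows, with calloc() standing in for malloc(9) with M_ZERO. `struct conf_out` and `make_conf_out()` are hypothetical, and the padding layout assumes a typical ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record; a typical ABI pads 3 bytes after `name`. */
struct conf_out {
	char name[5];
	int typenum;
};

/* Build a record destined for userland with no uninitialised bytes. */
static struct conf_out *make_conf_out(const char *name, int typenum)
{
	/* calloc zeroes everything, padding included (like M_ZERO). */
	struct conf_out *c = calloc(1, sizeof(*c));

	if (c == NULL)
		return NULL;
	strncpy(c->name, name, sizeof(c->name) - 1);
	c->typenum = typenum;
	return c;
}
```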
* When traversing the mount list, the current mount point is locked with vfs_busy(). If the FOREACH_SAFE macro is used, the next pointer is not locked and could be freed by another process. Unless it is necessary, do not use _SAFE, as it is unsafe. In vfs_unmountall() the current pointer is actually freed; add a comment noting that this race has to be fixed later. OK krw@
  (bluhm, 2017-01-15, 1 file, -4/+5)
* Replace manual for() loops with FOREACH() macros. OK millert@
  (bluhm, 2017-01-10, 1 file, -7/+4)
* Remove the unused olddp parameter from function dounmount(). OK mpi@ millert@
  (bluhm, 2017-01-10, 1 file, -2/+2)