| Commit message | Author | Age | Files | Lines |
... | |
| |
timespecadd(3) is fast. There is no need to call getnanouptime(9)
repeatedly when searching for the next expiration point. Given that
it_interval is at least 1/hz, we expect to run through the loop maybe
hz times at most. Even at HZ=10000 that's pretty brief.
While we're here, pull *all* of the other logic out of the loop.
The only thing we need to do in the loop is timespecadd(3).
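A minimal userland sketch of the resulting loop shape (the function and its
arguments are invented for illustration; timespeccmp(3) and timespecadd(3)
are the <sys/time.h> macros):

#include <sys/time.h>

/*
 * Advance an expired it_value by it_interval until it lies in the
 * future.  The caller reads the clock once to obtain "now"; the loop
 * itself only performs cheap timespec additions.  Assumes a non-zero
 * it_interval, as guaranteed above (at least 1/hz).
 */
void
advance_expiration(struct timespec *it_value,
    const struct timespec *it_interval, const struct timespec *now)
{
	while (timespeccmp(it_value, now, <=))
		timespecadd(it_value, it_interval, it_value);
}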
| |
The struct keeps track of the end point of an event queue scan by
persisting the end marker. This will be needed when kqueue_scan() is
called repeatedly to complete a scan in a piecewise fashion.
Extracted from a previous diff from visa@.
ok visa@, anton@
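A rough sketch of what such a persistent scan state might carry (all names
below are invented for illustration, not the kernel's definitions):

struct kqueue;			/* opaque for the sketch */
struct knote;

struct scan_state {
	struct kqueue	*ss_kq;		/* queue being scanned */
	struct knote	*ss_end;	/* end marker, persisted across calls */
	int		 ss_started;	/* has the end marker been placed? */
};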
| |
- Consolidate variable declarations.
- Remove superfluous parentheses from return statements.
- Prefer sizeof(variable) to sizeof(type) for copyin(9)/copyout(9).
- Remove some intermediate pointers from sys_setitimer(). Using SCARG()
directly here makes it more obvious to the reader what you're copying.
| |
Now that the critical sections are merged we should call
getnanouptime(9) once. This makes an ITIMER_REAL timer swap atomic
with respect to the clock: the time remaining on the old timer is
computed with the same timestamp used to schedule the new timer.
| |
Merge the common code from sys_getitimer() and sys_setitimer() into a
new kernel subroutine, setitimer(). setitimer() performs all of the
error-free work for both system calls within a single critical
section. We need a single critical section to make the setitimer(2)
timer swap operation atomic relative to realitexpire() and hardclock(9).
The downside of the new atomicity is that the behavior of setitimer(2)
must change. With a single critical section we can no longer copyout(9)
the old timer before installing the new timer. So if SCARG(uap, oitv)
points to invalid memory, setitimer(2) now fails with EFAULT, but the
new timer will be left running. You can see this in action with code
like the following:
struct itimerval itv, olditv;
itv.it_value.tv_sec = 1;
itv.it_value.tv_usec = 0;
itv.it_interval = itv.it_value;
/* This should EFAULT. 0x1 is probably an invalid address. */
if (setitimer(ITIMER_REAL, &itv, (void *)0x1) == -1)
warn("setitimer");
/* The timer will be running anyway. */
getitimer(ITIMER_REAL, &olditv);
printf("time left: %lld.%06ld\n",
olditv.it_value.tv_sec, olditv.it_value.tv_usec);
There is no easy way to work around this. Both FreeBSD's and Linux's
setitimer(2) implementations have a single critical section and they
too fail with EFAULT in this case and leave the new timer running.
I imagine their developers decided that fixing this error case was
a waste of effort. Without permitting copyout(9) from within a mutex
I'm not sure it is even possible to avoid it on OpenBSD without
sacrificing atomicity during a setitimer(2) timer swap.
Given the rarity of this error case I would rather have an atomic swap.
Behavior change discussed with deraadt@.
| |
One exception to this rule is VOP_CLOSE(), where NULL is used instead
of curproc when the garbage collector of unix sockets, which runs in
a kernel thread, drops the last reference of a file.
This will allow for future simplifications of the VFS interfaces.
Previous version ok visa@, anton@.
ok kn@
| |
ok beck@
| |
if they are out of range, making it easier to isolate the reason for EINVAL
ok cheloha
| |
so_state and splice checks were done without the proper lock, which is
incorrect. This is similar to sobind() and soconnect(), which also require
the callee to hold the socket lock.
Found by, with and OK mvs@, OK mpi@
| |
enough that this pool uses the single page allocator for which PR_WAITOK
is a no-op. However, its presence suggests that pool_put(9) may sleep.
The single page allocator will never actually do that.
This makes it obvious that refreshcreds() will not sleep.
ok deraadt@, visa@
| |
The variable "found" in sys_setpriority() is used as a boolean.
We should set it to 1 to indicate that we found the object we
were looking for instead of incrementing it.
deraadt@ notes that the current code is not buggy, because OpenBSD
cannot support anywhere near 2^32 processes, but agrees that
incrementing the variable signals the wrong thing to the reader.
ok millert@ deraadt@
| |
to_process is assigned during timeout_add(9) within timeout_mutex. In
timeout_run() we need to read to_process before leaving timeout_mutex
to ensure that the process pointer given to kcov_remote_enter(9) is
the same as the one we set from timeout_add(9) when the candidate
timeout was originally scheduled to run.
| |
ok claudio@
| |
Fixes previous. Problem spotted by kettenis@
| |
in pledged programs that is unfortunate. My snark levels are a bit drained,
but I must say I'm always disappointed when programs operating on virtual
resources enquire about total physical resource availability; the only
reason to ask is so they can act unfairly relative to others in the shared
environment. SIGH.
| |
vmstat(8) uses kvm_read(3) to extract the naptime from the kernel.
Problem is, I deleted `naptime' from the global namespace when I moved
it into the timehands. This patch restores it. It gets updated from
tc_windup(). Only userspace should use it, and only when the kernel
is dead.
We need to tweak a variable in tc_setclock() to avoid shadowing the
(once again) global naptime.
| |
while here, swap two lines in bufcache_release() to put a KASSERT() first
following the pattern in bufcache_take()
ok beck@ mpi@
| |
struct sigacts since that is the only thing that is modified by siginit.
| |
little step towards moving signal delivery outside of KERNEL_LOCK.
OK mpi@
| |
OK mpi@
| |
Noticed by, and based on a diff from Mike Small <smallm@sdf.org>.
| |
ok claudio@, pirofti@
| |
Extend the scope of SCHED_LOCK() to better synchronize
single_thread_set(), single_thread_clear() and single_thread_check().
This prevents threads from suspending before single_thread_set() has
finished. If a thread suspended early, ps_singlecount might get
decremented too much, which in turn could make single_thread_wait()
get stuck.
The race could be triggered for example by trying to stop
a multithreaded process with a debugger. When triggered, the race
prevents the debugger from finishing a wait4(2) call on the debuggee.
This kind of gdb hang was reported by Julian Smith on misc@.
Unfortunately, single-thread mode switching still has issues and hangs
are still possible.
OK mpi@
| |
"syncprt" is unused since kern/vfs_syscalls.c r1.147 from 2008.
Adding new debug sysctls is a bit opaque: looking at kern/kern_sysctl.c,
the only visible difference between used and stub ctldebug structs in the
debugvars[] array is the extern keyword, indicating that they are defined
elsewhere.
sys/sysctl.h declares all debugN members as extern upfront, but these
declarations are not needed.
Remove the unused debug sysctl, rename the only remaining one to something
meaningful and remove forward declarations from /sys/sysctl.h; this way,
adding new debug sysctls is a matter of adding extern and coming up with a
name, which is nicer to read on its own and better to grep for.
OK mpi
| |
Adding "debug.my-knob" sysctls is really helpful to select different
code paths and/or log on demand during runtime without recompile,
but as this code is under DEBUG, lots of other noise comes with it
which is often undesired, at least when looking at specific subsystems
only.
Adding globals to the kernel and breaking into DDB to change them helps,
but that does not work over SSH, hence the need for debug sysctls.
Introduces DEBUG_SYSCTL to make use of the "debug" MIB without the rest of
DEBUG; it's DEBUG_SYSCTL and not SYSCTL_DEBUG because it's not a general
option for all of sysctl(2).
OK gnezdo
| |
ok kettenis@, visa@
| |
Thanks kettenis@ for pointing out.
ok kettenis@
| |
Take into account the circular nature of the message buffer when
computing the number of available bytes. Move the computation into
a separate function and use it with the kevent(2) and ioctl(2)
interfaces.
OK mpi@
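The idea behind the computation, as a simplified standalone function
(illustrative only, not the kernel's msgbuf code):

/*
 * Unread bytes in a circular buffer: the write offset may have wrapped
 * around behind the read offset, in which case the raw difference is
 * negative and must be corrected by the buffer size.
 */
long
circbuf_bytes_avail(long readoff, long writeoff, long bufsize)
{
	long avail = writeoff - readoff;

	if (avail < 0)
		avail += bufsize;
	return avail;
}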
| |
OK mvs@
| |
Design by deraadt@
ok deraadt@
| |
setitimer(2) works with timespecs in its critical section. It will be
easier to merge the two critical sections if getitimer(2) also works
with timespecs.
In particular, we currently read the uptime clock *twice* during a
setitimer(2) swap: we call getmicrouptime(9) in sys_getitimer() and
then call getnanouptime(9) in sys_setitimer(). This means that
swapping one timer in for another is not atomic with respect to the
uptime clock. It also means the two operations are working with
different time structures and resolutions, which is potentially
confusing.
If both critical sections work with timespecs we can combine the two
getnanouptime(9) calls into a single call at the start of the combined
critical section in a future patch, making the swap atomic with
respect to the clock.
So, in preparation, move the TIMESPEC_TO_TIMEVAL conversions in
getitimer(2) after the ITIMER_REAL conversion from absolute to
relative time, just before copyout(9). The ITIMER_REAL conversion
must then be done with timespec macros and getnanouptime(9), just like
in setitimer(2).
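A userland sketch of the new ordering (the function is invented for
illustration; the macros are the <sys/time.h> ones): do the
absolute-to-relative math in timespecs and convert to a timeval only at
the very end:

#include <sys/time.h>

void
remaining_timeval(const struct timespec *abs_expiry,
    const struct timespec *now, struct timeval *out)
{
	struct timespec left;

	if (timespeccmp(abs_expiry, now, <=))
		timespecclear(&left);		/* already expired */
	else
		timespecsub(abs_expiry, now, &left);
	TIMESPEC_TO_TIMEVAL(out, &left);	/* convert only at the end */
}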
| |
If we're replacing the current ITIMER_REAL timer with a new one we
don't need to call timeout_del(9) before calling timeout_add(9).
timeout_add(9) does the work of timeout_del(9) implicitly if the
timeout in question is already pending.
This saves us an extra trip through the timeout_mutex.
| |
Reuse the kev[] array of sys_kevent() in kqueue_scan() to lower
stack usage.
The code has reset kevp, but not nkev, whenever the retry branch is
taken. However, the resetting is unnecessary because retry should be
taken only if no events have been collected. Make this clearer by
adding KASSERTs.
OK mpi@
| |
Rearrange the critical section in setitimer(2) to match that of
getitimer(2). This will make it easier to merge the two critical
sections in a subsequent diff.
In particular, we want to write the new timer value in *one* place in
the code, regardless of which timer we're setting.
ok millert@
| |
For what are probably historical reasons, setitimer(2) does not
validate its input (itv) immediately after copyin(9). Instead, it
waits until after (possibly) performing a getitimer(2) to copy out the
state of the timer.
Consolidating copyin(9), input validation, and input conversion into a
single block before the getitimer(2) operation makes setitimer(2)
itself easier to read. It will also simplify merging the critical
sections of setitimer(2) and getitimer(2) in a subsequent patch.
This changes setitimer(2)'s behavior in the EINVAL case. Currently,
if your input (itv) is invalid, we return EINVAL *after* modifying the
output (olditv). With the patch we will now return EINVAL *before*
modifying the output. However, any code dependent upon this behavior
is broken: the contents of olditv are undefined in all setitimer(2)
error cases.
ok millert@
| |
The ITIMER_REAL per-process interval timer is protected by the kernel
lock. The ITIMER_REAL timeout (ps_realit_to), setitimer(2), and
getitimer(2) all run under the kernel lock. Entering itimer_mtx
during getitimer(2) when reading the ITIMER_REAL ps_timer state is
superfluous and misleading.
| |
The ITIMER_VIRTUAL and ITIMER_PROF per-process interval timers are
updated from hardclock(9). If a timer for the parent process is
enabled the hardclock(9) thread calls itimerdecr() to update and
reload it as needed.
However, in itimerdecr(), after entering itimer_mtx, the thread needs
to double-check that the timer in question is still enabled. While
the hardclock(9) thread is entering itimer_mtx a thread in
setitimer(2) can take the mutex and disable the timer.
If the timer is disabled, itimerdecr() should return 1 to indicate
that the timer has not expired and that no action needs to be taken.
ok kettenis@
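The shape of the fix, shown standalone with a pthread mutex standing in
for itimer_mtx (types and names are invented for illustration):

#include <pthread.h>

struct ptimer {
	pthread_mutex_t	 lock;		/* stands in for itimer_mtx */
	int		 enabled;	/* timer armed? */
	long		 value;		/* time remaining */
};

/* Return 1 if the timer has not expired (or was disabled), 0 if it fired. */
int
ptimer_decr(struct ptimer *t, long decrement)
{
	int fired = 0;

	pthread_mutex_lock(&t->lock);
	if (!t->enabled) {
		/* Disabled while we waited for the lock: nothing to do. */
		pthread_mutex_unlock(&t->lock);
		return 1;
	}
	if (t->value > decrement)
		t->value -= decrement;
	else
		fired = 1;	/* expiry handling (reload, signal) elided */
	pthread_mutex_unlock(&t->lock);
	return !fired;
}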
| |
The current input validation for overflow is more complex than
it needs to be. We can flatten the conditional hierarchy into
a string of checks just one level deep. The result is easier
to read.
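Roughly the shape being described, on an invented checker (the specific
conditions here are illustrative, not the kernel's actual overflow checks):
one flat run of early-return tests instead of nested conditionals:

#include <sys/time.h>
#include <errno.h>

int
check_timer_input(const struct itimerval *itv)
{
	if (itv->it_value.tv_sec < 0)
		return EINVAL;
	if (itv->it_value.tv_usec < 0 || itv->it_value.tv_usec >= 1000000)
		return EINVAL;
	if (itv->it_interval.tv_sec < 0)
		return EINVAL;
	if (itv->it_interval.tv_usec < 0 || itv->it_interval.tv_usec >= 1000000)
		return EINVAL;
	return 0;
}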
| |
The socket splice idle timeout is a timeval, so we need to check that
tv_usec is both non-negative and less than one million. Otherwise it
isn't in canonical form.
We can check for this with timerisvalid(3).
benno@ says this shouldn't break anything in base.
ok benno@, bluhm@
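Reduced to a standalone check (the function is invented; the socket option
plumbing is omitted), the validation amounts to:

#include <sys/time.h>
#include <errno.h>

int
check_idle_timeout(const struct timeval *tv)
{
	/* timerisvalid(3): tv_usec must be in [0, 1000000). */
	if (!timerisvalid(tv))
		return EINVAL;
	return 0;
}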
| |
These two interfaces have been entirely unused since introduction.
Remove them and thin the "timeout" namespace a bit.
Discussed with mpi@ and ratchov@ almost a year ago, though I blocked
the change at that time. Also discussed with visa@.
ok visa@, mpi@
| |
Commit v1.77 introduced remote kcov support for timeouts. We need to
tweak a few things to make our support more correct:
- Set to_process for barrier timeouts to the calling thread's parent
process. Currently it is uninitialized, so during timeout_run() we
are passing stack garbage to kcov_remote_enter(9).
- Set to_process to NULL during timeout_set_flags(9). If in the
future we forget to properly initialize to_process before reaching
timeout_run(), we'll pass NULL to kcov_remote_enter(9).
anton@ says this is harmless. I assume it is also preferable to
passing stack garbage.
- Save a copy of to_process on the stack in timeout_run() before
calling to_func to ensure that we pass the same process pointer
to kcov_remote_leave(9) upon return. The timeout may be freely
modified from to_func, so to_process may have changed when we
return.
Tested by anton@.
ok anton@
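The third point, as a generic standalone sketch (types are invented;
remote_enter()/remote_leave() stand in for kcov_remote_enter(9) and
kcov_remote_leave(9)):

struct process;				/* opaque for the sketch */

void remote_enter(struct process *);
void remote_leave(struct process *);

struct timeout_like {
	void	(*to_func)(void *);
	void	 *to_arg;
	struct process *to_process;	/* who scheduled this timeout */
};

void
run_timeout(struct timeout_like *to)
{
	/* Snapshot before the callback: to_func may modify or re-arm *to. */
	struct process *p = to->to_process;

	remote_enter(p);
	to->to_func(to->to_arg);
	remote_leave(p);		/* same pointer as the matching enter */
}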
| |
Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.
OK kn@
| |
from threads other than the one currently having kcov enabled. A thread
with kcov enabled occasionally delegates work to another thread;
collecting coverage from such threads improves the ability of syzkaller
to correlate side effects in the kernel caused by issuing a syscall.
Remote coverage is divided into subsystems. The only supported subsystem
right now collects coverage from scheduled tasks and timeouts on behalf
of a kcov enabled thread. In order to make this work `struct task' and
`struct timeout' must be extended with a new field keeping track of the
process that scheduled the task/timeout. Both aforementioned structures
have therefore increased by the size of a pointer on all
architectures.
The kernel API is documented in a new kcov_remote_register(9) manual.
Remote coverage is also supported by kcov on NetBSD and Linux.
ok mpi@
| |
Reminder that unveil does not kill from brynet and gsoares.
Wording tweaks from jmc; feedback from deraadt.
ok jmc@, millert@, solene@, "fine with me" deraadt@
| |
The TIMEOUT_SCHEDULED flag was added a few months ago to differentiate
between wheel timeouts and new timeouts during softclock(). The
distinction is useful when incrementing the "rescheduled" stat and the
"late" stat.
Now that we have an intermediate queue for new timeouts, timeout_new,
we don't need the flag. The distinction between wheel timeouts and
new timeouts can be made computationally.
Suggested by procter@ several months ago.
| |
New timeouts are appended to the timeout_todo circq via
timeout_add(9). If this is done during softclock(), i.e. a timeout
function calls timeout_add(9) to reschedule itself, the newly added
timeout will be processed later during the same softclock().
This works, but it is not optimal:
1. If a timeout reschedules itself to run in zero ticks, i.e.
timeout_add(..., 0);
it will be run again during the current softclock(). This can
cause an infinite loop, softlocking the primary CPU.
2. Many timeouts are cancelled before they execute. Processing a
timeout during the current softclock() is "eager": if we waited, the
timeout might be cancelled and we could spare ourselves the effort.
If the timeout is not cancelled before the next softclock() we can
bucket it as we normally would with no change in behavior.
3. Many timeouts are scheduled to run after 1 tick, i.e.
timeout_add(..., 1);
Processing these timeouts during the same softclock means bucketing
them for no reason: they will be dumped into the timeout_todo queue
during the next hardclock(9) anyway. Processing them is pointless.
We can avoid these issues by using an intermediate queue, timeout_new.
New timeouts are put onto this queue during timeout_add(9). The queue
is concatenated to the end of the timeout_todo queue at the start of
each softclock() and then softclock() proceeds. This means the amount
of work done during a given softclock() is finite and we avoid doing
extra work with eager processing.
Any timeouts that *depend* upon being rerun during the current
softclock() will need to be updated, though I doubt any such timeouts
exist.
Discussed with visa@ last year.
No complaints after a month.
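A small userland model of the queueing change, using <sys/queue.h> tailqs
(everything here is invented for illustration and stands in for the
kernel's CIRCQ code):

#include <sys/queue.h>

struct job {
	TAILQ_ENTRY(job) entry;
	void (*fn)(struct job *);
};
TAILQ_HEAD(jobq, job);

struct jobq job_new  = TAILQ_HEAD_INITIALIZER(job_new);
struct jobq job_todo = TAILQ_HEAD_INITIALIZER(job_todo);

/* Adding (or rescheduling) a job only ever appends to job_new. */
void
job_add(struct job *j)
{
	TAILQ_INSERT_TAIL(&job_new, j, entry);
}

/*
 * One processing pass: pick up everything added since the last pass,
 * then run it.  A job that re-adds itself lands on job_new and is not
 * seen again until the next pass, so every pass does a finite amount
 * of work.
 */
void
job_run_pass(void)
{
	struct job *j;

	/*
	 * Move everything added since the last pass onto the todo queue.
	 * (The kernel splices the queues in constant time; a move loop
	 * is enough for this illustration.)
	 */
	while ((j = TAILQ_FIRST(&job_new)) != NULL) {
		TAILQ_REMOVE(&job_new, j, entry);
		TAILQ_INSERT_TAIL(&job_todo, j, entry);
	}

	while ((j = TAILQ_FIRST(&job_todo)) != NULL) {
		TAILQ_REMOVE(&job_todo, j, entry);
		j->fn(j);
	}
}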
| |
console. When the kernel panics, print console output is enabled such
that we see those messages. Use this option for the powerpc64 boot
kernel.
ok visa@, deraadt@
| |
All other timeout_add_*() functions do so before calling timeout_add(9), as
described in the manual; this one did not.
OK cheloha
| |
newline doesn't occur to rewind to column 0. If OPOST is inactive,
simply return 0.
ok millert