path: root/sys/kern/kern_time.c
Commit message | Author | Age | Files | Lines
* nanosleep(2): shorten wmesg from "nanosleep" to "nanoslp" | cheloha | 2020-12-23 | 1 | -2/+2
| We only see 8 characters of wmesg in e.g. top(1), so shorten the string to fit.
| Indirectly prompted by kettenis@.
* setitimer(2): ITIMER_REAL: protect state with per-process mutex ps_mtx | cheloha | 2020-11-10 | 1 | -9/+32
| To unlock getitimer(2) and setitimer(2) we need to protect the per-process ITIMER_REAL state with something other than the kernel lock. As the ITIMER_REAL timeout callback realitexpire() runs at IPL_SOFTCLOCK the per-process mutex ps_mtx is appropriate.
| In setitimer() we need to use ps_mtx instead of the global itimer_mtx if the given timer is ITIMER_REAL. Easy.
| The ITIMER_REAL timeout callback routine realitexpire() is trickier. When we enter ps_mtx during the callback we need to check if the timer was cancelled or rescheduled. A thread from the process can call setitimer(2) at the exact moment the callback is about to run from timeout_run() (see kern_timeout.c).
| Update the locking annotation in sys/proc.h accordingly.
| ok anton@
* setitimer(2): ITIMER_REAL: use kclock timeouts | cheloha | 2020-10-25 | 1 | -14/+6
| Reimplement the ITIMER_REAL interval timer with a kclock timeout.
| Couple things of note:
| - We need to use the high-res nanouptime(9) call, not the low-res getnanouptime(9).
| - The code is simpler now that we aren't working with ticks.
| Misc. thoughts:
| - Still unsure if "kclock" is the right name for these things.
| - MP-safely cancelling a periodic timeout is very difficult.
* _exit(2), execve(2): tweak per-process interval timer cancellation | cheloha | 2020-10-15 | 1 | -3/+6
| If we fold the for-loop iterating over each interval timer into the helper function the result is slightly tidier than what we have now. Rename the helper function "cancel_all_itimers".
| Based on input from millert@ and kettenis@.
* _exit(2), execve(2): cancel per-process interval timers safely | cheloha | 2020-10-15 | 1 | -1/+11
| During _exit(2) and sometimes during execve(2) we need to cancel any active per-process interval timers. We don't currently do this in an MP-safe way. Both syscalls ignore the locking assumptions documented in proc.h.
| The easiest way to make them MP-safe is to use setitimer(), just like the getitimer(2) and setitimer(2) syscalls do. To make things a bit cleaner I have added a helper function, cancelitimer(), so the callers don't need to fuss with an itimerval struct.
| While we're here we can remove the splclock/splx dance from execve(2). It is no longer necessary.
| ok deraadt@
* setitimer(2): zero itv.it_interval if itv.it_value is zero | cheloha | 2020-10-13 | 1 | -1/+3
| If itv.it_value is zero we cancel the timer. When we cancel the timer we don't care about itv.it_interval because the timer is not running: we don't use it, we don't look at it, etc.
| To be on the paranoid side, I think we should zero itv.it_interval when itv.it_value is zero. No need to write arbitrary values into the process struct if we aren't required to.
| The standard is ambiguous about what should happen in this case, i.e. the value of olditv after the following code executes is unspecified:
|
|     struct itimerval newitv, olditv;
|
|     newitv.it_value.tv_sec = newitv.it_value.tv_usec = 0;
|     newitv.it_interval.tv_sec = newitv.it_interval.tv_usec = 1;
|     setitimer(ITIMER_REAL, &newitv, NULL);
|     getitimer(ITIMER_REAL, &olditv);
|
| This change should not break any real code.
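The rule described in this commit can be sketched in userland C. This is only an illustration under assumptions: `normalize_itimerval` is a hypothetical helper name, not a function from kern_time.c, and the kernel performs the equivalent step inside its own setitimer() path.

```c
#include <sys/time.h>

/*
 * Hypothetical helper: if the new timer value is zero (a cancellation),
 * also zero the reload interval so no arbitrary value is stored.
 */
static void
normalize_itimerval(struct itimerval *itv)
{
	if (itv->it_value.tv_sec == 0 && itv->it_value.tv_usec == 0) {
		itv->it_interval.tv_sec = 0;
		itv->it_interval.tv_usec = 0;
	}
}
```

A non-zero it_value leaves it_interval untouched, so periodic timers are unaffected.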
* setitimer(2): realitexpire(): call getnanouptime(9) once | cheloha | 2020-10-13 | 1 | -16/+15
| timespecadd(3) is fast. There is no need to call getnanouptime(9) repeatedly when searching for the next expiration point. Given that it_interval is at least 1/hz, we expect to run through the loop maybe hz times at most. Even at HZ=10000 that's pretty brief.
| While we're here, pull *all* of the other logic out of the loop. The only thing we need to do in the loop is timespecadd(3).
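A minimal sketch of the "sample the clock once, then only add in the loop" idea follows. The names `ts_add`, `ts_le` and `next_expiry` are hypothetical stand-ins (the kernel uses the timespecadd(3)/timespeccmp(3) macros), and the exact comparison used by realitexpire() is an assumption here.

```c
#include <time.h>

/* Hypothetical stand-in for timespecadd(3): returns a + b, normalized. */
static struct timespec
ts_add(struct timespec a, struct timespec b)
{
	struct timespec r;

	r.tv_sec = a.tv_sec + b.tv_sec;
	r.tv_nsec = a.tv_nsec + b.tv_nsec;
	if (r.tv_nsec >= 1000000000L) {
		r.tv_sec++;
		r.tv_nsec -= 1000000000L;
	}
	return r;
}

/* Returns nonzero if a <= b. */
static int
ts_le(struct timespec a, struct timespec b)
{
	if (a.tv_sec != b.tv_sec)
		return a.tv_sec < b.tv_sec;
	return a.tv_nsec <= b.tv_nsec;
}

/*
 * Advance "expiry" by "interval" until it lies after "now".
 * "now" is read once by the caller; the loop body is a single addition.
 * Assumes interval > 0, otherwise the loop would not terminate.
 */
static struct timespec
next_expiry(struct timespec expiry, struct timespec interval,
    struct timespec now)
{
	while (ts_le(expiry, now))
		expiry = ts_add(expiry, interval);
	return expiry;
}
```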
* sys_getitimer(), sys_setitimer(): style(9), misc. cleanup | cheloha | 2020-10-07 | 1 | -21/+15
| - Consolidate variable declarations.
| - Remove superfluous parentheses from return statements.
| - Prefer sizeof(variable) to sizeof(type) for copyin(9)/copyout(9).
| - Remove some intermediate pointers from sys_setitimer(). Using SCARG() directly here makes it more obvious to the reader what you're copying.
* getitimer(2), setitimer(2): ITIMER_REAL: call getnanouptime(9) once | cheloha | 2020-10-07 | 1 | -6/+5
| Now that the critical sections are merged we should call getnanouptime(9) once. This makes an ITIMER_REAL timer swap atomic with respect to the clock: the time remaining on the old timer is computed with the same timestamp used to schedule the new timer.
* getitimer(2), setitimer(2): merge critical sections | cheloha | 2020-10-07 | 1 | -59/+69
| Merge the common code from sys_getitimer() and sys_setitimer() into a new kernel subroutine, setitimer(). setitimer() performs all of the error-free work for both system calls within a single critical section. We need a single critical section to make the setitimer(2) timer swap operation atomic relative to realitexpire() and hardclock(9).
| The downside of the new atomicity is that the behavior of setitimer(2) must change. With a single critical section we can no longer copyout(9) the old timer before installing the new timer. So if SCARG(uap, oitv) points to invalid memory, setitimer(2) now fails with EFAULT but the new timer will be left running. You can see this in action with code like the following:
|
|     struct itimerval itv, olditv;
|
|     itv.it_value.tv_sec = 1;
|     itv.it_value.tv_usec = 0;
|     itv.it_interval = itv.it_value;
|
|     /* This should EFAULT. 0x1 is probably an invalid address. */
|     if (setitimer(ITIMER_REAL, &itv, (void *)0x1) == -1)
|             warn("setitimer");
|
|     /* The timer will be running anyway. */
|     getitimer(ITIMER_REAL, &olditv);
|     printf("time left: %lld.%06ld\n", olditv.it_value.tv_sec, olditv.it_value.tv_usec);
|
| There is no easy way to work around this. Both FreeBSD's and Linux's setitimer(2) implementations have a single critical section and they too fail with EFAULT in this case and leave the new timer running. I imagine their developers decided that fixing this error case was a waste of effort. Without permitting copyout(9) from within a mutex I'm not sure it is even possible to avoid it on OpenBSD without sacrificing atomicity during a setitimer(2) timer swap. Given the rarity of this error case I would rather have an atomic swap.
| Behavior change discussed with deraadt@.
* expose timeval/timespec from system calls into ktrace, before determining | deraadt | 2020-10-02 | 1 | -1/+9
| if they are out of range, making it easier to isolate reason for EINVAL
| ok cheloha
* getitimer(2): delay TIMESPEC_TO_TIMEVAL(9) conversion until copyout(9) | cheloha | 2020-08-12 | 1 | -10/+12
| setitimer(2) works with timespecs in its critical section. It will be easier to merge the two critical sections if getitimer(2) also works with timespecs.
| In particular, we currently read the uptime clock *twice* during a setitimer(2) swap: we call getmicrouptime(9) in sys_getitimer() and then call getnanouptime(9) in sys_setitimer(). This means that swapping one timer in for another is not atomic with respect to the uptime clock. It also means the two operations are working with different time structures and resolutions, which is potentially confusing.
| If both critical sections work with timespecs we can combine the two getnanouptime(9) calls into a single call at the start of the combined critical section in a future patch, making the swap atomic with respect to the clock.
| So, in preparation, move the TIMESPEC_TO_TIMEVAL conversions in getitimer(2) after the ITIMER_REAL conversion from absolute to relative time, just before copyout(9). The ITIMER_REAL conversion must then be done with timespec macros and getnanouptime(9), just like in setitimer(2).
* setitimer(2): ITIMER_REAL: don't call timeout_del(9) before timeout_add(9) | cheloha | 2020-08-12 | 1 | -3/+3
| If we're replacing the current ITIMER_REAL timer with a new one we don't need to call timeout_del(9) before calling timeout_add(9). timeout_add(9) does the work of timeout_del(9) implicitly if the timeout in question is already pending.
| This saves us an extra trip through the timeout_mutex.
* setitimer(2): write new timer value in one place | cheloha | 2020-08-11 | 1 | -6/+9
| Rearrange the critical section in setitimer(2) to match that of getitimer(2). This will make it easier to merge the two critical sections in a subsequent diff.
| In particular, we want to write the new timer value in *one* place in the code, regardless of which timer we're setting.
| ok millert@
* setitimer(2): consolidate copyin(9), input validation, input conversion | cheloha | 2020-08-11 | 1 | -8/+10
| For what are probably historical reasons, setitimer(2) does not validate its input (itv) immediately after copyin(9). Instead, it waits until after (possibly) performing a getitimer(2) to copy out the state of the timer.
| Consolidating copyin(9), input validation, and input conversion into a single block before the getitimer(2) operation makes setitimer(2) itself easier to read. It will also simplify merging the critical sections of setitimer(2) and getitimer(2) in a subsequent patch.
| This changes setitimer(2)'s behavior in the EINVAL case. Currently, if your input (itv) is invalid, we return EINVAL *after* modifying the output (olditv). With the patch we will now return EINVAL *before* modifying the output. However, any code dependent upon this behavior is broken: the contents of olditv are undefined in all setitimer(2) error cases.
| ok millert@
* getitimer(2): don't enter itimer_mtx to read ITIMER_REAL itimerspec | cheloha | 2020-08-11 | 1 | -3/+6
| The ITIMER_REAL per-process interval timer is protected by the kernel lock. The ITIMER_REAL timeout (ps_realit_to), setitimer(2), and getitimer(2) all run under the kernel lock.
| Entering itimer_mtx during getitimer(2) when reading the ITIMER_REAL ps_timer state is superfluous and misleading.
* hardclock(9): fix race with setitimer(2) for ITIMER_VIRTUAL, ITIMER_PROF | cheloha | 2020-08-09 | 1 | -1/+15
| The ITIMER_VIRTUAL and ITIMER_PROF per-process interval timers are updated from hardclock(9). If a timer for the parent process is enabled the hardclock(9) thread calls itimerdecr() to update and reload it as needed.
| However, in itimerdecr(), after entering itimer_mtx, the thread needs to double-check that the timer in question is still enabled. While the hardclock(9) thread is entering itimer_mtx a thread in setitimer(2) can take the mutex and disable the timer.
| If the timer is disabled, itimerdecr() should return 1 to indicate that the timer has not expired and that no action needs to be taken.
| ok kettenis@
* adjtime(2): simplify input validation for new adjustment | cheloha | 2020-08-08 | 1 | -13/+9
| The current input validation for overflow is more complex than it needs to be. We can flatten the conditional hierarchy into a string of checks just one level deep. The result is easier to read.
* settimeofday(2): securelevel 2: prevent root from freezing the UTC clock | cheloha | 2020-07-15 | 1 | -2/+2
| At securelevel 2 we prevent root from rewinding the kernel UTC clock. The rationale given in the comment is that this prevents a compromised root from setting arbitrary timestamps on files. I can't really speak to the efficacy of this mitigation, or to the efficacy of the securelevel concept in general, but the implementation of this mitigation is wrong.
| We need to check:
|
|     timespeccmp(ts, &now, <=)
|
| instead of
|
|     timespeccmp(ts, &now, <)
|
| like we do now.
| Time is a continuous value that is always advancing. We must prevent root from setting the kernel UTC clock to its current value in addition to prior values. Setting the UTC clock to its current value amounts to rewinding it even if we cannot actually measure the difference with a timespec.
| With this change, at securelevel 2, root can no longer completely freeze the UTC clock.
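The difference between the two comparisons can be illustrated with a small userland sketch. `TS_CMP` is a local stand-in written for this example so it builds without the BSD timespeccmp(3) macro, and `rewind_refused` is a hypothetical name, not the kernel's function.

```c
#include <time.h>

/* Local stand-in for timespeccmp(3): compare seconds, then nanoseconds. */
#define TS_CMP(a, b, op)					\
	(((a)->tv_sec == (b)->tv_sec) ?				\
	    ((a)->tv_nsec op (b)->tv_nsec) :			\
	    ((a)->tv_sec op (b)->tv_sec))

/*
 * Returns nonzero if a request to set the clock to *ts should be refused
 * because it does not move the clock strictly forward relative to *now.
 * Using "<=" also rejects the "set it to the current value" case.
 */
static int
rewind_refused(const struct timespec *ts, const struct timespec *now)
{
	return TS_CMP(ts, now, <=);
}
```

With `<` in place of `<=`, a request equal to the current time would be accepted, which is the freeze the commit closes off.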
* adjfreq(2): limit adjustment to [-500000, +500000] ppm | cheloha | 2020-07-09 | 1 | -1/+6
| When we recompute the scaling factor during tc_windup() there is an opportunity for arithmetic overflow if the active timecounter's adjfreq(2) adjustment is too large. If we limit the adjustment to [-500000, +500000] ppm the statement in question cannot overflow.
| In particular, we are concerned with the following bit of code:
|
|     scale = (u_int64_t)1 << 63;
|     scale +=
|         ((th->th_adjustment + th->th_counter->tc_freq_adj) / 1024) * 2199;
|     scale /= th->th_counter->tc_frequency;
|     th->th_scale = scale * 2;
|
| where scale is an int64_t. Overflow can occur when we do:
|
|     scale += (...) / 1024 * 2199;
|
| as th->th_counter->tc_freq_adj is currently unbounded. th->th_adjustment is limited to [-5000ppm, 5000ppm].
| To see that overflow is prevented with the new bounds, consider the new edge case where th->th_counter->tc_freq_adj is 500000ppm and th->th_adjustment is 5000ppm. Both are of type int64_t. We have:
|
|     int64_t th_adjustment = (5000LL * 1000) << 32;  /* 21474836480000000 */
|     int64_t tc_freq_adj = 500000000LL << 32;        /* 2147483648000000000 */
|     scale = (u_int64_t)1 << 63;                     /* 9223372036854775808 */
|     scale += (th_adjustment + tc_freq_adj) / 1024 * 2199;
|     /* scale += 2168958484480000000 / 1024 * 2199; */
|     /* scale += 4657753620480000000; */
|
| 9223372036854775808 + 4657753620480000000 = 13881125657334775808, which is less than 18446744073709551616, so we don't have overflow.
| On the opposite end, if th->th_counter->tc_freq_adj is -500000ppm and th->th_adjustment is -5000ppm we would have -4657753620480000000.
| 9223372036854775808 - 4657753620480000000 = 4565618416374775808. Again, no overflow.
| 500000ppm and -500000ppm are extreme adjustments. otto@ says ntpd(8) would never arrive at them naturally, so we are not at risk of breaking a working setup by imposing these restrictions.
| Documentation input from kettenis@. No complaints from otto@.
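The edge-case arithmetic in this message can be checked mechanically. The sketch below reproduces only the "scale before division" step for a pair of ppm inputs; `scale_before_divide` is a hypothetical name, and treating the sum as a 64-bit unsigned accumulator is an assumption made so the comparison against 2^64 is meaningful.

```c
#include <stdint.h>

/*
 * Recompute the pre-division scale term for the given adjustments,
 * each expressed in ppm and converted the same way as in the commit
 * message: (ppm * 1000) scaled by 2^32.
 */
static uint64_t
scale_before_divide(int64_t adj_ppm, int64_t freq_adj_ppm)
{
	/* Multiply instead of shifting so negative inputs stay well-defined. */
	int64_t th_adjustment = adj_ppm * 1000 * ((int64_t)1 << 32);
	int64_t tc_freq_adj = freq_adj_ppm * 1000 * ((int64_t)1 << 32);
	uint64_t scale = (uint64_t)1 << 63;

	/* Conversion to uint64_t is modular, so a negative term subtracts. */
	scale += (uint64_t)((th_adjustment + tc_freq_adj) / 1024 * 2199);
	return scale;
}
```

Calling it with (5000, 500000) and (-5000, -500000) yields the two totals quoted in the message, both inside the 64-bit range.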
* inittodr(9): introduce dedicated flag to enable writes from resettodr(9) | cheloha | 2020-06-22 | 1 | -2/+9
| We don't want resettodr(9) to write the RTC until inittodr(9) has actually run. Until inittodr(9) calls tc_setclock() the system UTC clock will contain a meaningless value and there's no sense in overwriting a good value with a value we know is nonsense.
| This is not an uncommon problem if you're debugging a problem in early boot, e.g. a panic that occurs prior to inittodr(9).
| Currently we use the following logic in resettodr(9) to inhibit writes:
|
|     if (time_second == 1)
|             return;
|
| ... this is too magical.
| A better way to accomplish the same thing is to introduce a dedicated flag set from inittodr(9). Hence, "inittodr_done".
| Suggested by visa@. ok kettenis@
* clock_gettime(2): use nanoruntime(9) to get value for CLOCK_UPTIME | cheloha | 2020-05-20 | 1 | -5/+2
|
* Add function for attaching RTC drivers, to reduce direct use | visa | 2020-05-17 | 1 | -1/+7
| of todr_handle.
| OK kettenis@
* Make inittodr() and resettodr() MI. | kettenis | 2020-05-16 | 1 | -1/+91
| ok deraadt@, mpi@, visa@
| ok cheloha@ as well (would have preferred in new file for this code)
* nanosleep(2): tsleep(9) -> tsleep_nsec(9) | cheloha | 2020-03-20 | 1 | -4/+5
| While here, rename the wait channel so the tsleep_nsec(9) call will fit onto a single line. It isn't a global channel so the name is arbitrary anyway.
| With input from visa@.
| ok visa@
* adjfreq(2): fix atomic swap | cheloha | 2019-11-07 | 1 | -4/+4
| I broke adjfreq(2)'s atomic swap in kern_time.c,v1.112. By using the "f" variable to store both the new and old frequency adjustments, the new adjustment gets clobbered by the old adjustment if the caller asked for a swap.
| ok visa@ mpi@
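The clobbering bug described here comes from reusing one variable for both directions of a swap. A minimal sketch of the corrected pattern follows; `cur_adj` and `adjfreq_swap` are hypothetical names for illustration, and the real code additionally takes locks and copies to and from userland.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the active timecounter's frequency adjustment. */
static int64_t cur_adj;

/*
 * Optionally install *newf and optionally report the previous value
 * through *oldf. The previous value is saved in its own variable so it
 * can never overwrite the caller's new value before it is installed.
 */
static void
adjfreq_swap(const int64_t *newf, int64_t *oldf)
{
	int64_t prev = cur_adj;

	if (newf != NULL)
		cur_adj = *newf;
	if (oldf != NULL)
		*oldf = prev;
}
```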
* clock_getres(2): actually return the resolution of the given clock | cheloha | 2019-10-26 | 1 | -9/+18
| Currently we return (1000000000 / hz) from clock_getres(2) as the resolution for every clock. This is often untrue.
| For CPUTIME clocks, if we have a separate statclock interrupt the resolution is (1000000000 / stathz). Otherwise it is as we currently claim: (1000000000 / hz).
| For the REALTIME/MONOTONIC/UPTIME/BOOTTIME clocks the resolution is that of the active timecounter. During tc_init() we can compute the precision of a timecounter by examining its tc_counter_mask and store it for lookup later in a new member, tc_precision. The resolution of a clock backed by a timecounter "tc" is then
|
|     tc.tc_precision * (2^64 / tc.tc_frequency)
|
| fractional seconds.
| While here we can clean up sys_clock_getres() a bit.
| Standards input from guenther@. Lots of input, feedback from kettenis@.
| ok kettenis@
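As a rough illustration of the formula above, the same quantity can be expressed in nanoseconds as precision / frequency seconds. This sketch is an assumption-laden simplification: `tc_resolution_ns` is a hypothetical helper, it works in nanoseconds rather than 2^-64 fractional seconds, and it assumes the product precision * 10^9 fits in 64 bits.

```c
#include <stdint.h>

/*
 * Resolution, in whole nanoseconds (truncated), of a clock backed by a
 * counter that ticks "frequency" times per second and advances in steps
 * of "precision" counts. Assumes frequency > 0.
 */
static uint64_t
tc_resolution_ns(uint64_t precision, uint64_t frequency)
{
	return precision * UINT64_C(1000000000) / frequency;
}
```

For example, a 1 MHz counter with a precision of one count resolves 1000 ns, which is finer than 1000000000 / hz for a typical hz of 100.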
* gettimeofday, settimeofday(2): limit timezone support | cheloha | 2019-09-04 | 1 | -4/+4
| For gettimeofday(2), always copy out an empty timezone struct. For settimeofday(2), still copyin(9) the struct but ignore the contents.
| In gettimeofday(2)'s case we have not changed the original BSD semantics: the kernel only tracks UTC time without an offset for DST, so a zeroed timezone struct is the correct thing to return to the caller.
| Future work could move these out into libc as stubs for clock_gettime and clock_settime(2). But, definitely a "later" thing, given that we are in beta.
| Update the manpage to de-emphasize the timezone parameters for these syscalls.
| Discussed with tedu@, deraadt@, millert@, kettenis@, yasuoka@, jca@, and guenther@. Tested by job@. Ports input from jca@ and sthen@. Manpage input from jca@.
| ok jca@ deraadt@
* R.I.P. itimerround(); ok mpi@ | cheloha | 2019-08-03 | 1 | -12/+1
|
* per-process itimers: itimerval -> itimerspec | cheloha | 2019-08-02 | 1 | -34/+35
| Loongson runs at 128hz. 128 doesn't divide evenly into a million, but it does divide evenly into a billion. So if we do the per-process itimer bookkeeping with itimerspec structs we can have error-free virtual itimers on loongson just as we do on most other platforms.
| This change doesn't fix the virtual itimer error on alpha, as 1024 does not divide evenly into a billion. But this doesn't make the situation any worse, either.
| ok deraadt@
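The divisibility claims are easy to verify. In this sketch `tick_remainder` is a hypothetical helper that reports how many time units per second cannot be distributed evenly across hz ticks; a nonzero result means per-tick bookkeeping accumulates rounding error.

```c
/*
 * Units left over when one second, expressed as units_per_sec
 * (1000000 for microseconds, 1000000000 for nanoseconds), is split
 * into hz equal ticks. Zero means the tick length is exact.
 */
static long
tick_remainder(long units_per_sec, long hz)
{
	return units_per_sec % hz;
}
```

At 128 Hz the remainder is 64 microseconds but 0 nanoseconds, while at 1024 Hz a remainder of 512 nanoseconds persists, which matches the loongson and alpha cases described.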
* itimerdecr(): simplify logic with timer*(9) macros; ok millert@ | cheloha | 2019-07-25 | 1 | -32/+19
|
* R.I.P. timespecfix(); ok visa@ mpi@ | cheloha | 2019-07-02 | 1 | -15/+1
|
* Switch from bintime_add() et al. to bintimeadd(9). | cheloha | 2019-06-03 | 1 | -3/+3
| Basically just make all the bintime routines look and behave more like the timeradd(3) macros.
| Switch to three-argument forms for structure math, introduce and use bintimecmp(9), and rename the structure conversion routines to resemble e.g. TIMEVAL_TO_TIMESPEC(3).
| Document all of this in a new bintimeadd.9 page.
| Code input from mpi@, manpage input from schwarze@.
| code ok mpi@, docs ok schwarze@, docs probably still ok jmc@
* Revert to using the SCHED_LOCK() to protect time accounting. | mpi | 2019-06-01 | 1 | -11/+3
| It currently creates a lock ordering problem because SCHED_LOCK() is taken by hardclock(). That means the "priorities" of a thread should be moved out of the SCHED_LOCK() first in order to make progress.
| Reported-by: syzbot+8e4863b3dde88eb706dc@syzkaller.appspotmail.com via anton@ as well as by kettenis@
* Use a per-process mutex to protect time accounting instead of SCHED_LOCK(). | mpi | 2019-05-31 | 1 | -3/+11
| Note that hardclock(9) still increments p_{u,s,i}ticks without holding a lock.
| ok visa@, cheloha@
* Fix uninitialized return code in adjfreq(2); CID 1480285 | stsp | 2019-05-21 | 1 | -2/+2
| ok mlarkin, otto (who both had the same diff)
* Unlock adjfreq(2), adjtime(2), clock_settime(2), and settimeofday(2). | cheloha | 2019-05-09 | 1 | -1/+3
| clock_settime(2)/settimeofday(2) still need KERNEL_LOCK for a moment when resetting the RTC, as that's done periodically from a task under KERNEL_LOCK. Not quite sure how to approach that one yet.
| ok visa@ mpi@, "good stuff" tedu@, "please wait until after [tree] unlock" deraadt@
* Tweak previous: include <sys/stdint.h> for INT64_MAX/INT64_MIN. | cheloha | 2019-03-26 | 1 | -1/+2
|
* adjtime(2): set EINVAL if delta overflows 64 bits of microseconds. | cheloha | 2019-03-26 | 1 | -3/+13
| No other (known) BSD-derived adjtime(2) implementation checks for overflow when converting delta into its final denomination of fractional seconds. This is peculiar, as the call originates in 4.3BSD.
| However, glibc, uclibc, and (to an extent) musl /do/ check the input and set EINVAL if it exceeds a certain bound, so we'll just use the errno that they use to be consistent with extant practice.
| Prompted by the comment kettenis@ left when we switched to storing the adjustment in an int64_t like ~5 years ago (kern_time.c,v 1.87).
| Positive feedback from deraadt@, manpage bits ok jmc@, no code complaints from otto@ or tedu@.
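An overflow check of this kind can be sketched as follows. This is not the kernel's code: `delta_to_usec` is a hypothetical helper, the bounds are deliberately conservative, and rejecting an out-of-range tv_usec here is an assumption added to keep the example self-contained.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/time.h>

/*
 * Convert *delta to a count of microseconds stored in *out.
 * Returns 0 on success, or EINVAL if the input is malformed or the
 * result would not fit in an int64_t.
 */
static int
delta_to_usec(const struct timeval *delta, int64_t *out)
{
	int64_t sec = delta->tv_sec;
	int64_t usec = delta->tv_usec;

	if (usec < 0 || usec >= 1000000)
		return EINVAL;
	/* Leave room for the up-to-999999 microseconds added below. */
	if (sec > INT64_MAX / 1000000 - 1 || sec < INT64_MIN / 1000000)
		return EINVAL;
	*out = sec * 1000000 + usec;
	return 0;
}
```

The checks form a flat sequence with an early return each, so no multiplication is attempted until both bounds have passed.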
* MP-safe timecounting: new rwlock: tc_lock | cheloha | 2019-03-25 | 1 | -26/+35
| tc_lock allows adjfreq(2) and the kern.timecounter.hardware sysctl(2) to read/write the active timecounter pointer and the .tc_adj_freq member of the active timecounter safely. This eliminates any possibility of a torn read/write for the .tc_adj_freq member when we drop the KERNEL_LOCK from the timecounting layer. It also ensures the active timecounter does not change in the midst of an adjfreq(2) call.
| Because these are not high-traffic paths, we can get away with using tc_lock in write-mode to ensure combination read/write adjtime(2) calls are relatively atomic (a) to other writer adjtime(2) calls, and (b) to settimeofday(2)/clock_settime(2) calls, which cancel ongoing adjtime(2) adjustment.
| When the KERNEL_LOCK is dropped, an unprivileged user will be able to create some tc_lock contention via adjfreq(2); it is very unlikely to ever be a problem. If it ever is actually a problem a lockless read could be added to address it.
| While here, reorganize sys_adjfreq()/sys_adjtime() to minimize code under the lock. Also while here, make tc_adjfreq() void, as it cannot fail under any circumstance. Also also while here, annotate various globals/struct members with lock ordering details.
| With lots of input from mpi@ and visa@.
| ok visa@
* Move adjtimedelta from kern_time.c to kern_tc.c. | cheloha | 2019-03-10 | 1 | -13/+8
| This will simplify upcoming MP-safety diffs for the timecounting layer.
| adjtimedelta is now accessed nowhere outside of kern_tc.c, so we can remove its extern declaration from kernel.h. Zeroing adjtimedelta within timecounter_mtx before we jump the real-time clock is also a bit safer than what we do now, as we are not racing a simultaneous tc_windup() call from hardclock(), which itself can modify adjtimedelta via ntp_update_second().
| Discussed with visa@ and mpi@.
| ok visa@
* matthew noticed that some clocks use tfind() which is not mpsafe. | tedu | 2019-01-31 | 1 | -10/+20
| add locking in clock_gettime where needed.
| ok cheloha matthew
* Sprinkle a pinch of timerisvalid/timespecisvalid over the rest of sys/kern | cheloha | 2019-01-23 | 1 | -10/+8
|
* no need to KERNEL_LOCK before calling ktrstruct() anymore; ok mpi@ visa@ | cheloha | 2019-01-18 | 1 | -21/+6
|
* adjtime(2), settimeofday(2), clock_settime(2): validate input | cheloha | 2019-01-18 | 1 | -1/+8
| Add documentation for the new EINVAL cases for adjtime(2) and settimeofday(2).
| adjtime.2 docs ok schwarze@, settimeofday(2)/clock_settime(2) stuff ok tedu@, "stop waiting" deraadt@
* settime: Don't cancel ongoing adjtime(2) until after full permission checks | cheloha | 2019-01-10 | 1 | -7/+6
| ok jca@ visa@ guenther@ deraadt@
* nanosleep: loop tsleep(9) to ensure coverage of the full timeout range. | cheloha | 2018-12-31 | 1 | -10/+13
| tsleep(9)'s maximum timeout shrinks as HZ grows, so this ensures we do not return early from longer timeouts on alpha or on custom kernels.
| POSIX says you cannot return early unless a signal is delivered, so this makes us more compliant with the standard.
| While here, remove the 100 million second upper bound. It is an artifact from itimerfix() and it serves no discernible purpose.
| ok tedu@ visa@
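The looping approach can be sketched in portable C. Everything named here is hypothetical: `MAX_CHUNK_NS` stands in for the HZ-dependent ceiling of a single tsleep(9), `fake_sleep` is a recording stub rather than a real sleep, and the kernel recomputes the remaining time from a clock instead of subtracting the requested chunk.

```c
#include <stdint.h>

/* Hypothetical ceiling on one primitive sleep, in nanoseconds. */
#define MAX_CHUNK_NS	UINT64_C(2000000000)

/* Stub primitive used for illustration: records time "slept". */
static uint64_t slept_ns;

static void
fake_sleep(uint64_t ns)
{
	slept_ns += ns;
}

/*
 * Cover a timeout of total_ns by repeating bounded sleeps, so a long
 * request never returns early just because one sleep is capped.
 * Returns the number of primitive sleeps issued.
 */
static unsigned int
sleep_in_chunks(uint64_t total_ns, void (*sleep_ns)(uint64_t))
{
	unsigned int calls = 0;

	while (total_ns > 0) {
		uint64_t chunk = total_ns < MAX_CHUNK_NS ?
		    total_ns : MAX_CHUNK_NS;

		sleep_ns(chunk);
		total_ns -= chunk;
		calls++;
	}
	return calls;
}
```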
* sys_nanosleep: switch to descriptive, idiomatic variable names; ok tedu@ | cheloha | 2018-12-29 | 1 | -20/+19
|
* Constipate a bunch of time functions | guenther | 2018-05-28 | 1 | -2/+2
| ok tb@ kettenis@
* nanosleep: ensure tv_nsec input is on [0, 1000000000) | cheloha | 2018-05-22 | 1 | -5/+3
| Instead of converting timespec -> timeval and truncating the input, check with timespecfix and use tstohz(9) for the tsleep.
| All other contemporary systems check this correctly.
| Also add a regression test for this case.
| ok tb@
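The range check named in the subject line might look like the sketch below. `ts_is_valid` is a hypothetical helper; rejecting a negative tv_sec as well is an assumption added here, since the commit itself only mentions tv_nsec.

```c
#include <time.h>

/*
 * Returns nonzero if *ts is acceptable as a sleep duration:
 * tv_nsec must lie in [0, 1000000000), and (by assumption in this
 * sketch) tv_sec must not be negative.
 */
static int
ts_is_valid(const struct timespec *ts)
{
	return ts->tv_sec >= 0 &&
	    ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000L;
}
```

A caller would return EINVAL when this check fails rather than silently truncating the request.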