Do not include <sys/kthread.h> where it is not needed and stop including
<sys/proc.h> in it.
ok visa@, anton@
This prevents the soft interrupt from running in between timeouts executed
in a thread context.
ok kettenis@, visa@
The process-context timeout(s) in question might be cancelled before we
leave the loop, leading to a spurious wakeup(9).
ok mpi@
These allow the caller to initialize timeouts with arbitrary flags. We
only have one flag at the moment, TIMEOUT_PROC, but experimenting with
other flags is easier if these interfaces are available in-tree.
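
A minimal usage sketch, assuming the interfaces in question are the
timeout_set_flags() and TIMEOUT_INITIALIZER_FLAGS() named later in this
log, in a four-argument form; the driver names (mydrv_softc, mydrv_tick)
are hypothetical:

    /* Hypothetical driver using the flag-taking initializers with the
     * only current flag, TIMEOUT_PROC. */
    #include <sys/timeout.h>

    struct mydrv_softc {
            struct timeout sc_tick;
    };

    void mydrv_tick(void *);

    /* statically initialized timeout whose callback runs in process context */
    struct timeout mydrv_tmo =
        TIMEOUT_INITIALIZER_FLAGS(mydrv_tick, NULL, TIMEOUT_PROC);

    void
    mydrv_attach(struct mydrv_softc *sc)
    {
            timeout_set_flags(&sc->sc_tick, mydrv_tick, sc, TIMEOUT_PROC);
            timeout_add_sec(&sc->sc_tick, 1);
    }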
With input from bluhm@, guenther@, and visa@.
"makes sense to me" bluhm@, ok visa@
This makes it more likely to fit into 80 columns if used alongside
the forthcoming timeout_set_flags() and TIMEOUT_INITIALIZER_FLAGS()
interfaces.
"makes sense to me" bluhm@, ok visa@
This flag is set whenever a timeout is put on the wheel and cleared upon
(a) running, (b) deletion, and (c) readdition. It serves two purposes:
1. Facilitate distinguishing scheduled and rescheduled timeouts. When a
timeout is put on the wheel it is "scheduled" for a later softclock().
If this happens two or more times it is also said to be "rescheduled".
The tos_rescheduled value thus indicates how many distant timeouts
have been cascaded into a lower wheel level.
2. Eliminate false late timeouts. A timeout is not late if it is due
before softclock() has had a chance to schedule it. To track this we
need additional state, hence a new flag.
rprocter@ raises some interesting questions. Some answers:
- This interface is not stable and name changes are possible at a
later date.
- Although rescheduling timeouts is a side effect of the underlying
implementation, I don't foresee us using anything but a timeout wheel
in the future. Other data structures are too slow in practice, so
I doubt that the concept of a rescheduled timeout will be irrelevant
any time soon.
- I think the development utility of gathering these sorts of statistics
is high. Watching the distribution of timeouts under a given workflow
is informative.
ok visa@
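
A conceptual sketch of the bookkeeping described above; TO_SCHEDULED and
the tostat struct are placeholder names (only tos_rescheduled is named in
this message), so this is not the in-tree code:

    struct timeoutstat {
            uint64_t tos_scheduled;
            uint64_t tos_rescheduled;
    } tostat;

    void
    timeout_sched_sketch(struct timeout *to)
    {
            if (ISSET(to->to_flags, TO_SCHEDULED))
                    tostat.tos_rescheduled++;  /* cascaded down a wheel level */
            else {
                    SET(to->to_flags, TO_SCHEDULED);
                    tostat.tos_scheduled++;    /* first trip onto the wheel */
            }
            /* ...then insert the timeout into its bucket... */
    }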
Backed out during revert of "timeout(9): switch to tickless backend".
Original commit message:
- CIRCQ_APPEND -> CIRCQ_CONCAT
- Flip argument order of CIRCQ_INSERT to match e.g. TAILQ_INSERT_TAIL
- CIRCQ_INSERT -> CIRCQ_INSERT_TAIL
- Add CIRCQ_FOREACH, use it in ddb(4) when printing buckets
- While here, use tabs for indentation like we do with other macros
ok visa@ mpi@
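
A small illustration of the renamed iteration macro; "bucket" and the
timeout_from_circq() helper are placeholders for whatever the ddb(4)
printer actually uses:

    struct circq *p;

    CIRCQ_FOREACH(p, &bucket)
            db_printf("%p\n", timeout_from_circq(p));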
It appears to have caused major performance regressions all over the
network stack.
Reported by bluhm@
ok deraadt@
- CIRCQ_APPEND -> CIRCQ_CONCAT
- Flip argument order of CIRCQ_INSERT to match e.g. TAILQ_INSERT_TAIL
- CIRCQ_INSERT -> CIRCQ_INSERT_TAIL
- Add CIRCQ_FOREACH, use it in ddb(4) when printing buckets
- While here, use tabs for indentation like we do with other macros
ok visa@
Rebase the timeout wheel on the system uptime clock. Timeouts are now
set to run at or after an absolute time as returned by nanouptime(9).
Timeouts are thus "tickless": they expire at a real time on that clock
instead of at a particular value of the global "ticks" variable.
To facilitate this change the timeout struct's .to_time member becomes a
timespec. Hashing timeouts into a bucket on the wheel changes slightly:
we build a 32-bit hash with 25 bits of seconds (.tv_sec) and 7 bits of
subseconds (.tv_nsec). 7 bits of subseconds means the width of the
lowest wheel level is now 2 seconds on all platforms and each bucket in
that lowest level corresponds to 1/128 seconds on the uptime clock.
These values were chosen to closely align with the current 100hz
hardclock(9) typical on almost all of our platforms. At 100hz a bucket
is currently ~1/100 seconds wide on the lowest level and the lowest
level itself is ~2.56 seconds wide. Not a huge change, but a change
nonetheless.
Because a bucket no longer corresponds to a single tick, more than one
bucket may be dumped during an average timeout_hardclock_update() call.
On 100hz platforms you now dump ~2 buckets. On 64hz machines (sh) you
dump ~4 buckets. On 1024hz machines (alpha) you dump only 1 bucket,
but you are doing extra work in softclock() to reschedule timeouts
that aren't due yet.
To avoid changing current behavior all timeout_add*(9) interfaces
convert their timeout interval into ticks, compute an equivalent
timespec interval, and then add that interval to the timestamp of
the most recent timeout_hardclock_update() call to determine an
absolute deadline. So all current timeouts still "use" ticks,
but the ticks are faked in the timeout layer.
A new interface, timeout_at_ts(9), is introduced here to bypass this
backwardly compatible behavior. It will be used in subsequent diffs
to add absolute timeout support for userland and to clean up some of
the messier parts of kernel timekeeping, especially at the syscall
layer.
Because timeouts are based against the uptime clock they are subject to
NTP adjustment via adjtime(2) and adjfreq(2). Unless you have a crazy
adjfreq(2) adjustment set this will not change the expiration behavior
of your timeouts.
Tons of design feedback from mpi@, visa@, guenther@, and kettenis@.
Additional amd64 testing from anton@ and visa@. Octeon testing from visa@.
macppc testing from me.
Positive feedback from deraadt@, ok visa@
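
A worked sketch of the hash described above (illustrative, not the literal
in-tree code): the low 25 bits of the second count are concatenated with
the 1/128-second fraction to form the 32-bit value that indexes the wheel.

    uint32_t
    timeout_hash_sketch(const struct timespec *ts)
    {
            uint32_t sec  = ts->tv_sec & ((1U << 25) - 1);     /* 25 bits */
            uint32_t frac = ts->tv_nsec / (1000000000 / 128);  /* 7 bits */

            return (sec << 7) | frac;
    }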
ok deraadt@
- Move mutex to top of file, annotate locking for module
- Group module-local prototypes below globals but above function defs
- __inline -> inline
- No static without inline
- Drop extra parentheses around return values
Compiler input from visa@.
ok visa@
softclock() is scheduled from hardclock(9) because long ago callouts were
processed from hardclock(9) directly. The introduction of timeout(9) circa
2000 moved all callout processing into a dedicated module, but the softclock
scheduling stayed behind in hardclock(9).
We can move all the softclock() "stuff" into the timeout module to make
kern_clock.c a bit cleaner. Neither initclocks() nor hardclock(9) need
to "know" about softclock(). The initial softclock() softintr registration
can be done from timeout_proc_init() and softclock() can be scheduled
from timeout_hardclock_update().
ok visa@
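
A sketch of where the pieces land after the move; the softclock_si handle
and the exact call sites are assumptions, not the actual diff:

    void *softclock_si;

    void
    timeout_proc_init(void)
    {
            softclock_si = softintr_establish(IPL_SOFTCLOCK, softclock, NULL);
            /* ...start the timeout thread as before... */
    }

    void
    timeout_hardclock_update(void)
    {
            /* ...advance the wheel, then kick softclock() if work is pending... */
            softintr_schedule(softclock_si);
    }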
Move the check for lateness earlier in the softclock() loop so every
timeout is checked before being run.
While here, remove an unsafe DEBUG printf(9). You can't safely printf(9)
within a mutex, and the print itself isn't even particularly useful.
ok bluhm@
While here in timeout_add(9), use KASSERT for brevity.
CLR/ISSET/SET bits ok krw@
- display timeouts in the thread work queue, if any
- identify timeouts in the thread/softint work queue as such
- if not in work queue, print <bucket>/<level>; easier to right-align
- print arg pointer by hand to ensure consistent length for all pointers
on both 32 and 64-bit platforms
- generally make sure columns are correctly aligned and spaced
ok mpi@ visa@
With these totals one can track the throughput of the timeout(9) layer
from userspace.
With input from mpi@.
ok mpi@
Reduce code clutter by removing the file name and line number output
from witness(4). Typically it is easy enough to locate offending locks
using the stack traces that are shown in lock order conflict reports.
Tricky cases can be tracked using sysctl kern.witness.locktrace=1 .
This patch additionally removes the witness(4) wrapper for mutexes.
Now each mutex implementation has to invoke the WITNESS_*() macros
in order to utilize the checker.
Discussed with and OK dlg@, OK mpi@
The caller of timeout_barrier() must not hold locks that could prevent
timeout handlers from making progress. The system could deadlock
otherwise.
This patch makes witness(4) able to detect barrier locking errors.
This is done by introducing a pseudo-lock that couples the lock chains
of barrier callers to the lock chains of timeout handlers.
In order to find these errors faster, this diff adds a synchronous
version of cancelling timeouts, timeout_del_barrier(9). As the
synchronous intent is explicit, this interface can check lock order
immediately instead of waiting for the potentially rare occurrence of
timeout_barrier(9).
OK dlg@ mpi@
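
A sketch of the synchronous cancellation this enables; the softc names are
hypothetical, and the caller must not hold any lock the handler needs:

    void
    mydrv_detach(struct mydrv_softc *sc)
    {
            /* returns only once the handler can no longer be running */
            timeout_del_barrier(&sc->sc_tmo);
            free(sc, M_DEVBUF, sizeof(*sc));
    }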
if you're trying to free something that a timeout is using, you
have to wait for that timeout to finish running before doing the
free. timeout_del can stop a timeout from running in the future,
but it doesn't know if a timeout has finished being scheduled and
is now running.
previously you could know that timeouts are not running by simply
masking softclock interrupts on the cpu running the kernel. however,
code is now running outside the kernel lock, and timeouts can run
in a thread instead of softclock.
timeout_barrier solves the first problem by taking the kernel lock
and then masking softclock interrupts. that is enough to ensure
that any further timeout processing is waiting for those resources
to run again.
the second problem is solved by having timeout_barrier insert work
into the thread. when that work runs, that means all previous work
running in that thread has completed.
fixes and ok visa@, who thinks this will be useful for his work
too.
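
The resulting pattern, sketched with made-up names:

    timeout_del(&sc->sc_tmo);          /* stop future scheduling */
    timeout_barrier(&sc->sc_tmo);      /* wait out a run already in progress */
    free(sc, M_DEVBUF, sizeof(*sc));   /* now safe: the handler is done */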
as noted by haesbaert, this is necessary to avoid deadlocks because
the scheduler can call back into the timeout subsystem while it's
holding its own locks.
this happened in two places. firstly, in softclock() it would take
timeout_mutex to find pending work. if that pending work needs a
process context, it would queue the work for the thread and call
wakeup, which enters the scheduler locks. if another cpu is trying
to tsleep (or msleep) with a timeout specified, the sleep code would
be holding the sched lock and call timeout_add, which takes
timeout_mutex.
this is solved by deferring the wakeup to after timeout_mutex is
left. this also has the benefit of mitigating the number of wakeups
done per softclock tick.
secondly, the timeout worker thread takes timeout_mutex and calls
msleep when there's no work to do (ie, the queue is empty). msleep
will take the sched locks. again, if another cpu does a tsleep
with a timeout, you get a deadlock.
to solve this i'm using sleep_setup and sleep_finish to sleep on an
empty queue, which is safe to do outside the lock as it is comparisons
of the queue head pointers, not derefs of the contents of the queue.
as long as the sleeps and wakeups are ordered correctly with the
enqueue and dequeue operations under the mutex, this all works.
you can think of the queue as a single descriptor ring, and the
wakeup as an interrupt.
the second deadlock was identified by guenther@
ok tedu@ mpi@
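
The shape of the first fix, sketched with placeholder details rather than
the actual diff: note the pending thread work under timeout_mutex, but
only wake the worker after the mutex has been released.

    int needsproc = 0;

    mtx_enter(&timeout_mutex);
    /* ...move process-context work onto the thread's queue; set needsproc... */
    mtx_leave(&timeout_mutex);

    if (needsproc)
            wakeup(&timeout_proc);   /* placeholder wait channel */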
callbacks needing a process context.
The function timeout_set_proc(9) has to be used instead of timeout_set(9)
when a timeout callback needs a process context.
Note that if such a timeout is waiting, that is, sleeping, for a
non-negligible amount of time, it might delay other timeouts needing a
process context.
dlg@ agrees with this as a temporary solution.
Manpage tweaks from jmc@
ok kettenis@, bluhm@, mikeb@
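
A minimal usage sketch with hypothetical driver names; the point is only
that the callback may sleep because it runs in a process context:

    void
    mydrv_task(void *arg)
    {
            struct mydrv_softc *sc = arg;

            rw_enter_write(&sc->sc_lock);   /* sleeping is allowed here */
            /* ...slow work... */
            rw_exit_write(&sc->sc_lock);
    }

    void
    mydrv_init(struct mydrv_softc *sc)
    {
            timeout_set_proc(&sc->sc_tmo, mydrv_task, sc);
            timeout_add_sec(&sc->sc_tmo, 5);
    }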
it's not enough to assign to an unsigned type because if the arithmetic
overflows the compiler may decide to do anything. so change all the
long long casts to uint64_t so that we start with the right type.
reported by Tim Newsham of NCC.
ok deraadt
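
A worked illustration of the difference (assumed variables, not the exact
diff):

    extern int hz;
    int msecs;          /* caller-supplied, possibly large */
    uint64_t to_ticks;

    to_ticks = hz * msecs / 1000;            /* int arithmetic: may overflow */
    to_ticks = (uint64_t)hz * msecs / 1000;  /* widen first: safe */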
ok mikeb@ tedu@
an immediate timeout if a positive value is specified is unexpected
behavior. Defer calling the handler for at least one tick. Do not
change that timeout_add(0) gives you an immediate timeout.
OK millert@ uebayasi@ tedu@
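
A sketch of the rounding rule, assuming a millisecond interface and
hz = 100, so anything under 10 ms would otherwise convert to 0 ticks;
passing 0 milliseconds still yields timeout_add(to, 0) and stays
immediate:

    int
    timeout_add_msec_sketch(struct timeout *to, int msecs)
    {
            uint64_t to_ticks;

            to_ticks = (uint64_t)hz * msecs / 1000;
            if (to_ticks == 0 && msecs > 0)   /* e.g. 5 ms at hz = 100 */
                    to_ticks = 1;             /* defer at least one tick */

            return timeout_add(to, (int)to_ticks);
    }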
OK mikeb@
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.
ok tedu@ deraadt@
in this call by returning 1, or a previous call by returning 0. this makes
it easy to refcount the stuff we're scheduling a timeout for, and brings
the api in line with what task_add(9) provides.
ok mpi@ matthew@ mikeb@ guenther@
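
A sketch of the refcounting pattern this return value enables; the softc
and refcnt fields are made up:

    /* take a reference only when this call actually scheduled the timeout */
    if (timeout_add(&sc->sc_tmo, 1))
            refcnt_take(&sc->sc_refs);
    /* a return of 0 means an earlier, still-pending call holds it */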
ok deraadt@
to_arg is void *
pointed out by Artturi Alm (artturi.alm (at) gmail.com)
It was useful for tracking down the last devices which weren't deleting
their timeouts on suspend and recreating them on resume, but it's too
verbose to keep around.
noted by deraadt@
just the realtime clock, triggering and adjusting timeouts to reflect that.
ok matthew@ deraadt@
timeout or not.
without this it is impossible to tell if the timeout was removed
or if it is just about to run. if the caller of timeout_del is about
to free some state the timeout itself might use, this could lead
to a use after free.
now if timeout_del returns 1, you know the timeout won't fire and
you can proceed with cleanup. how you cope with the timeout being
about to fire is up to the caller of timeout_del.
discussed with drinking art and art, and most of k2k11
ok miod@
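
The pattern this enables, sketched with made-up names:

    if (timeout_del(&sc->sc_tmo)) {
            /* the timeout was dequeued and will not fire: safe to free */
            free(sc, M_DEVBUF, sizeof(*sc));
    } else {
            /* it already ran or is about to run; let the handler clean up */
    }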
ok jsing@, miod@
nitems() in two places instead of coding the array size and fix a
spot of whitespace.
ok miod@ blambert@
wheel). This was safe, except for osiop bugs.
Idea and original patch mk@
ok mk@, krw@
All cpus are stopped and this cpu blocks all interrupts. It doesn't make
sense to grab locks that ddb can then jump past with longjmp.
Noticed by Pierre Riteau. I just forgot about the bug until reminded
today.
in something other than clock ticks. From art@'s punchlist and (for
the time being) not yet used.
"you're doing it wrong" art@,ray@,otto@,tedu@
ok art@
subtly different from CIRCLEQ, it is possible, when emptying the whole
timeout chain, to end up with CIRCQ_EMPTY being false, and bad things
happen. Back to the drawing board...
art pointed out that timeout_set is the initializer of timeout structs.
this means that the ONQUEUE flag could be set when timeout_set is given
freshly allocated memory. my commit suddenly introduced the requirement
that you bzero a timeout before initialising it. without the bzero we
could generate false positives about the timeout being already queued.
art did produce a diff that would walk the queues when the flag was set
to see if it really was in the lists, but deraadt considers this too much
of a hit.
screw up the queues that tie all the timeouts together. this makes us
panic if we detect that happening. it's a lot easier to debug than the
weird side effects of broken timeout queues.
ok mickey@ kettenis@ deraadt@ pedro@