| Commit message | Author | Age | Files | Lines |
|
This function (or one of a similar nature) is required to safely use a refcnt
and smr_entry together. Such functions exist on other platforms as
kref_get_unless_zero (on Linux) and refcount_acquire_if_gt (on FreeBSD).
The following diagram details the situation below with and without
refcnt_take_if_gt in 3 cases, the first showing the "invalid" use
of refcnt_take.
Situation:
Thread #1 is removing the global reference (o).
Thread #2 wants to reference an object (r), using a thread-local pointer (t).
Cases:
1) refcnt_take after Thread #1 has released "o"
2) refcnt_take_if_gt before Thread #1 has released "o"
3) refcnt_take_if_gt after Thread #1 has released "o"
Data:
    struct obj {
        struct smr_entry smr;
        struct refcnt refcnt;
    } *o, *r, *t1, *t2;
Thread #1                        | Thread #2
---------------------------------+------------------------------------
                                 | r = NULL;
rw_enter_write(&lock);           | smr_read_enter();
                                 |
t1 = SMR_PTR_GET_LOCKED(&o);     | t2 = SMR_PTR_GET(&o);
SMR_PTR_SET_LOCKED(&o, NULL);    |
                                 |
if (refcnt_rele(&t1->refcnt))    |
    smr_call(&t1->smr, free, t1);|
                                 | if (t2 != NULL) {
                                 |     refcnt_take(&t2->refcnt);
                                 |     r = t2;
                                 | }
rw_exit_write(&lock);            | smr_read_exit();
.....
// called by smr_thread          |
free(t1);                        |
.....
                                 | // use after free
                                 | *r
---------------------------------+------------------------------------
                                 | r = NULL;
rw_enter_write(&lock);           | smr_read_enter();
                                 |
t1 = SMR_PTR_GET_LOCKED(&o);     | t2 = SMR_PTR_GET(&o);
SMR_PTR_SET_LOCKED(&o, NULL);    |
                                 |
if (refcnt_rele(&t1->refcnt))    |
    smr_call(&t1->smr, free, t1);|
                                 | if (t2 != NULL &&
                                 |     refcnt_take_if_gt(&t2->refcnt, 0))
                                 |         r = t2;
rw_exit_write(&lock);            | smr_read_exit();
.....
// called by smr_thread          | // we don't have a valid reference
free(t1);                        | assert(r == NULL);
---------------------------------+------------------------------------
                                 | r = NULL;
rw_enter_write(&lock);           | smr_read_enter();
                                 |
t1 = SMR_PTR_GET_LOCKED(&o);     | t2 = SMR_PTR_GET(&o);
SMR_PTR_SET_LOCKED(&o, NULL);    |
                                 | if (t2 != NULL &&
                                 |     refcnt_take_if_gt(&t2->refcnt, 0))
                                 |         r = t2;
if (refcnt_rele(&t1->refcnt))    |
    smr_call(&t1->smr, free, t1);|
rw_exit_write(&lock);            | smr_read_exit();
.....
                                 | // we need to put our reference
                                 | if (refcnt_rele(&t2->refcnt))
                                 |     smr_call(&t2->smr, free, t2);
.....
// called by smr_thread          |
free(t1);                        |
---------------------------------+------------------------------------
Currently it uses atomic_add_int_nv to atomically read the refcnt,
but I'm open to suggestions for better ways.
The atomic_cas_uint is used to ensure that refcnt hasn't been modified
since reading `old`.
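For illustration, a minimal sketch of such a compare-and-swap loop
(assuming struct refcnt's counter member is named r_refs; this is only
an illustration of the idea above, not necessarily the committed code):

    int
    refcnt_take_if_gt(struct refcnt *r, unsigned int n)
    {
        unsigned int old, prev;

        /* atomically read the current count without changing it */
        old = atomic_add_int_nv(&r->r_refs, 0);
        for (;;) {
            if (old <= n)
                return (0);     /* count too low, take no reference */
            /* only bump the count if it is still `old` */
            prev = atomic_cas_uint(&r->r_refs, old, old + 1);
            if (prev == old)
                return (1);     /* reference taken */
            old = prev;         /* raced with another update, retry */
        }
    }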
|
- use C99-style initialization (grep works better with that)
- use const as execsw is not modified during runtime
ok mpi@
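A small, purely illustrative before/after of the style change (the
field names and the handler exec_mybinfmt_makecmds are placeholders,
not the real struct execsw members):

    /* before: positional initializers */
    struct execsw execsw[] = {
        { sizeof(struct exec), exec_mybinfmt_makecmds },
    };

    /* after: const + C99 designated initializers, easier to grep */
    const struct execsw execsw[] = {
        {
            .es_hdrsz = sizeof(struct exec),
            .es_check = exec_mybinfmt_makecmds,
        },
    };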
|
single_thread_set() is modified to explicitly indicate when it is required
to wait until sibling threads are parked. This is obviously not required if
a traced thread is switching away from a CPU after handling a STOP signal.
ok claudio@
|
ok gnezdo@ semarie@ mpi@
|
Some daemons are started before syslogd(8). This resulted in ugly
sendsyslog(2) "dropped" log entries, and the real messages were lost.
Create a temporary stash for log messages within the kernel. It
has a limited size of 100 messages, and each message is truncated
to 8192 bytes. When the stash is exhausted, the well-known dropped
message is generated with a counter. After syslogd(8) has set up
everything, it sends a debug line through libc to flush the kernel
stash. Then syslogd receives all messages from the kernel before
the usual logs.
OK deraadt@ visa@
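A purely illustrative shape of such a stash (the names below are made
up; only the limits come from the description above):

    #define LOGSTASH_SIZE   100     /* at most 100 stashed messages */
    #define LOGSTASH_MSGLEN 8192    /* each message truncated to 8192 bytes */

    struct logstash_entry {
        char    ls_buf[LOGSTASH_MSGLEN];
        size_t  ls_len;
    };

    struct logstash_entry logstash[LOGSTASH_SIZE];
    int logstash_used;      /* slots filled so far */
    int logstash_dropped;   /* counter for the "dropped" message */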
|
This exposes some redundant and racy checks.
ok semarie@
|
I'm not a fan of having to cast to caddr_t when we have modern
inventions like void * that we can take advantage of.
ok claudio@ mvs@ bluhm@
|
Extend kqueue's filterops interface with new callbacks so that it
becomes easier to use with fine-grained locking. The new interface
delegates the serialization of kn_event access to event sources. Now
kqueue uses filterops callbacks to read or write kn_event. This hides
event sources' locking patterns from kqueue, and allows clean
implementation of atomic read-and-clear for EV_CLEAR, for instance.
There are so many existing filterops instances that converting all of
them in one go is tricky. This patch adds a wrapper mechanism that
kqueue uses when the new callbacks are missing.
The new filterops interface has been influenced by XNU's kqueue.
OK mpi@ semarie@
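As a rough sketch of how an event source might use the extended
interface, with its own mutex serializing kn_event access (the driver
names are made up, and the callback only illustrates the
read-and-clear idea described above):

    int
    mydev_filt_process(struct knote *kn, struct kevent *kev)
    {
        struct mydev_softc *sc = kn->kn_hook;
        int active;

        mtx_enter(&sc->sc_mtx);
        active = (kn->kn_data > 0);
        if (active && kev != NULL) {
            /* snapshot the event for delivery under the source's lock */
            *kev = kn->kn_kevent;
            /* atomic read-and-clear for EV_CLEAR semantics */
            if (kn->kn_flags & EV_CLEAR) {
                kn->kn_data = 0;
                kn->kn_fflags = 0;
            }
        }
        mtx_leave(&sc->sc_mtx);
        return (active);
    }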
|
Passing a local copy of the socket to sbrelease() is too complicated just
to free the receive buffer. We no longer allocate a large object on the
stack, and we no longer pass an unlocked socket to soassertlocked() within
sbdrop(). The latter was not triggered because we lock the whole layer
with one lock.
Also, sorflush() is now private to kern/uipc_socket.c, so its definition
was adjusted accordingly.
ok claudio@ mpi@
|
Use the SCHED_LOCK() to ensure `ps_thread' isn't being modified by a sibling
when entering tsleep(9) w/o KERNEL_LOCK().
ok visa@
|
used as solock()'s backend to protect the whole layer.
With feedback from mpi@.
ok bluhm@ claudio@
|
We did not reach a consensus about using SMR to unlock single_thread_set()
so there's no point in keeping this change.
|
the SCHED_LOCK().
Putting a thread on a sleep queue is reduced to the following:
sleep_setup();
/* check condition or release lock */
sleep_finish();
Previous version ok cheloha@, jmatthew@, ok claudio@
|
OK deraadt@, bluhm@
|
OK bluhm@, claudio@, mpi@, semarie@
|
This includes ujoy_hid_is_collection() to work around limitations of
hid_is_collection() until this can be combined without fallout.
input, testing with 8bitdo controller, and ok brynet@
PS4 controller testing, fix for hid_is_collection, and ok mglocker@
|
ok mpi@
|
ok sashan@
|
This allows us to unlock getppid(2).
ok mpi@
|
Deliver file descriptor close notification for __EV_POLL knotes through
struct kevent that kqueue_scan() returns. This replaces the previous way
of returning EBADF from kqueue_scan(), making it easier to determine
what exactly has changed.
When a file descriptor is closed, its __EV_POLL knotes are turned into
one-shot events and queued for delivery. These knotes are "unregistered"
as they are reachable only through the queue of active events. This
reduces interference with the normal workings of kqueue. However, more
care is needed to avoid leaking knotes. In addition, the unregistering
removes a limit on the number of issued knotes. To prevent accumulation
of pending fd close notifications, kqpoll_init() flushes the active
queue at the start of a kqpoll scan.
OK mpi@
|
OK mpi@ as part of a larger diff
|
The global "tickadj" variable is a remnant of the old NTP adjustment
code we used in the kernel before the current timecounter subsystem
was imported from FreeBSD circa 2004 or 2005.
Fifteen years later it is completely vestigial and we can remove it.
We probably should have removed it long ago but I guess it slipped
through the cracks. FreeBSD removed it in 2002:
https://cgit.freebsd.org/src/commit/?id=e1d970f1811e5e1e9c912c032acdcec6521b2a6d
NetBSD and DragonflyBSD can probably remove it, too.
We export tickadj via the kern.clockrate sysctl(2), so update sysctl.2
and sysctl(8) accordingly. Hypothetically this change could break
someone's sysctl(8) parsing script. I don't think that's very likely.
ok mvs@
|
Original port from NetBSD by guenther@, required for upcoming amap & anon
locking.
ok kettenis@
|
The common code is moved to sleep_signal_check() and, instead of the
multiple state variables sls_sig and sls_unwind, only a single sls_sigerr
is set. This simplifies the checks in sleep_finish_signal() a great deal.
Idea from and OK mpi@
|
Change the pool(9) timeouts to use the system uptime instead of ticks.
- Change the timeouts from variables to macros so we can use
SEC_TO_NSEC(). This means these timeouts are no longer patchable
via ddb(4). dlg@ does not think this will be a problem, as the
timeout intervals have not changed in years.
- Use low-res time to keep things fast. Add a local copy of
getnsecuptime() to subr_pool.c to keep the diff small. We will need
to move getnsecuptime() into kern_tc.c and document it later if we
ever have other users elsewhere in the kernel.
- Rename ph_tick -> ph_timestamp and pr_cache_tick -> pr_cache_timestamp.
Prompted by tedu@ some time ago, but the effort stalled (may have been
my fault). Input from kettenis@ and dlg@.
Special thanks to mpi@ for help with struct shuffling. This change
does not increase the size of struct pool_page_header or struct pool.
ok dlg@ mpi@
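In rough form, the local helper is just a thin wrapper around the
low-res uptime clock (a sketch of the idea; the actual copy lives in
subr_pool.c):

    static uint64_t
    getnsecuptime(void)
    {
        struct timespec now;

        /* low-res time: cheap to read, precise enough for these timeouts */
        getnanouptime(&now);
        return TIMESPEC_TO_NSEC(&now);
    }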
|
These are essentially equivalent to the simple queue macros from
NetBSD but predate them and are more widely available on other systems.
OK mpi@ denis@
|
Like the existing kern.audio.record for audio(4) devices, introduce
kern.video.record for video(4) devices. By default
kern.video.record will be set to zero, blanking all data delivered
by device drivers which attach to video(4).
The idea was initially proposed by
Laurence Tratt <laurie AT tratt DOT net>.
ok mpi@
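For example, once the machine's owner decides to allow video capture,
recording can be enabled at runtime with:

    # sysctl kern.video.record=1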
|
This saves about 2.5 KiB off amd64's RAMDISK after gzip compression.
OK deraadt@, mpi@, cheloha@
|
Rename klist_{insert,remove}() to klist_{insert,remove}_locked().
These functions assume that the caller has locked the klist. The current
state of locking remains intact because the kernel lock is still used
with all klists.
Add new functions klist_insert() and klist_remove() that lock the klist
internally. This allows some code simplification.
OK mpi@
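A minimal sketch of the wrapper idea (assuming internal klist_lock()
and klist_unlock() helpers that fall back to the kernel lock when no
specific klist lock has been set up):

    void
    klist_insert(struct klist *klist, struct knote *kn)
    {
        int s;

        s = klist_lock(klist);
        klist_insert_locked(klist, kn);
        klist_unlock(klist, s);
    }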
|
Make the SMR thread maintain an explicit system-wide grace period and
make CPUs observe the current grace period when crossing a quiescent
state. This lets the SMR thread avoid a forced context switch for CPUs
that have already entered the latest grace period.
This change provides a small improvement in smr_grace_wait()'s
performance in terms of context switching.
OK mpi@, anton@
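Conceptually, the scheme looks roughly like this (all names below are
invented for the sketch and the real code in kern_smr.c sleeps instead
of spinning; only the idea of skipping up-to-date CPUs is the point):

    /* CPU side: record the current grace period at a quiescent state */
    ci->ci_smr_seen = READ_ONCE(smr_grace_period);

    /* SMR thread side: start a new grace period and wait it out */
    target = atomic_inc_long_nv(&smr_grace_period);
    CPU_INFO_FOREACH(cii, ci) {
        if (READ_ONCE(ci->ci_smr_seen) >= target)
            continue;               /* already observed, nothing to do */
        smr_force_quiescence(ci);   /* invented: force a context switch */
        while (READ_ONCE(ci->ci_smr_seen) < target)
            ;                       /* wait for the CPU to catch up */
    }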
|
It would be convenient if there were a channel a thread could sleep on
to indicate it does not want any wakeup(9) broadcasts. The easiest way
to do this is to add an "int nowake" to kern_synch.c and extern it in
sys/systm.h. You use it like this:
#include <sys/systm.h>
tsleep_nsec(&nowake, ...);
There is now no need to handroll a local dead channel, e.g.
int chan;
tsleep_nsec(&chan, ...);
which expands the stack. Local dead channels will be replaced with
&nowake in later patches.
One possible problem with this "one global channel" approach is sleep
queue congestion. If you have lots of threads sleeping on &nowake you
might slow down a wakeup(9) on a different channel that hashes into
the same queue. Unsure how much of a problem this actually is, if at all.
NetBSD and FreeBSD have a "pause" interface in the kernel that chooses
a suitable channel automatically. To keep things simple and avoid
adding a new interface we will start with this global channel.
Discussed with mpi@, claudio@, kettenis@, and deraadt@.
Basically designed by kettenis@, who vetoed my other proposals.
Bugs caught by deraadt@, tb@, and patrick@.
|
This patch extends struct klist with a callback descriptor and
an argument. The main purpose of this is to let the kqueue subsystem
assert when a klist should be locked, and operate the klist lock
in klist_invalidate().
Access to a knote list of a kqueue-monitored object has to be
serialized somehow. Because the object often has a lock for protecting
its state, and because the object often acquires this lock at the latest
in its f_event callback function, it makes sense to use this lock also
for the knote lists. The existing uses of NOTE_SUBMIT already show
a pattern that is likely to become more prevalent.
There could be an embedded lock in klist. However, such a lock would be
redundant in many cases. The code cannot rely on a single lock type
(mutex, rwlock, something else) because the needs of monitored objects
vary. In addition, an embedded lock would introduce new lock order
constraints. Note that the patch does not rule out use of dedicated
klist locks.
The patch introduces a way to associate lock operations with a klist.
The caller can provide a custom implementation, or use a ready-made
interface with a mutex or rwlock.
For compatibility with old code, the new code falls back to using the
kernel lock if no specific klist initialization has been done. The
existing code already relies on implicit initialization of klist.
Sadly, this change increases the size of struct klist. dlg@ thinks this
is not fatal, though.
OK mpi@
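For example, an event source that already guards its state with a
mutex could hand that same mutex to the klist via the ready-made
mutex-backed interface (a sketch: the driver names are made up and
klist_init_mutex() is assumed to be the helper this change provides):

    struct mydev_softc {
        struct mutex    sc_mtx;     /* protects device state and knotes */
        struct klist    sc_klist;   /* readers waiting via kqueue */
    };

    void
    mydev_attach_klist(struct mydev_softc *sc)
    {
        mtx_init(&sc->sc_mtx, IPL_MPFLOOR);
        /* let kqueue lock the knote list with sc_mtx */
        klist_init_mutex(&sc->sc_klist, &sc->sc_mtx);
    }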
|
When the file descriptor of an __EV_POLL-flagged knote is closed,
post EBADF through the kqueue instance to the caller of kqueue_scan().
This lets kqueue-based poll() and select() preserve their current
behaviour of returning EBADF when a polled file descriptor is closed
concurrently.
OK mpi@
|
OK mpi@
|
ok visa@
|
OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@
|
This will soon be used by select(2) and poll(2).
ok anton@, visa@
|
Currently all iterations are done under KERNEL_LOCK() and therefore use
the *_LOCKED() variant.
From and ok claudio@
|
Stop iterating in the function and instead copy the returned events to
userland after every call.
ok visa@
|
Similar to what NetBSD and FreeBSD have done.
OK guenther@
|
It is now possible to call the function multiple times to collect events.
For that, the end marker has to be preserved between calls because otherwise
the scan might collect an event more than once. If a collected event gets
reactivated during scanning, it will be added at the tail of the queue,
out of reach because of the end marker.
This is required to implement select(2) and poll(2) on top of kqueue_scan().
Done & originally committed by visa@ in r1.143, in snap for more than 2 weeks.
ok visa@, anton@
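In rough outline, this lets a caller drive the scan like below (the
setup/finish helpers, the argument list and the surrounding kevent(2)
variables are sketched from the description above rather than copied
from the tree):

    struct kqueue_scan_state scan;
    struct kevent kev[KQ_NEVENTS];
    int error = 0, nevents = 0, ready;

    kqueue_scan_setup(&scan, kq);
    while (nevents < maxevents) {
        /* collect one batch; the end marker persists between calls */
        ready = kqueue_scan(&scan, MIN(maxevents - nevents, KQ_NEVENTS),
            kev, tsp, p, &error);
        if (ready == 0 || error)
            break;
        /* hand this batch to userland before scanning again */
        error = copyout(kev, ulistp + nevents, ready * sizeof(kev[0]));
        if (error)
            break;
        nevents += ready;
    }
    kqueue_scan_finish(&scan);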
|
Adjust variable declaration in disklabel to match.
ok millert@ deraadt@
|
To unlock getitimer(2) and setitimer(2) we need to protect the
per-process ITIMER_REAL state with something other than the kernel
lock. As the ITIMER_REAL timeout callback realitexpire() runs at
IPL_SOFTCLOCK the per-process mutex ps_mtx is appropriate.
In setitimer() we need to use ps_mtx instead of the global itimer_mtx
if the given timer is ITIMER_REAL. Easy.
The ITIMER_REAL timeout callback routine realitexpire() is trickier.
When we enter ps_mtx during the callback we need to check if the timer
was cancelled or rescheduled. A thread from the process can call
setitimer(2) at the exact moment the callback is about to run from
timeout_run() (see kern_timeout.c).
Update the locking annotation in sys/proc.h accordingly.
ok anton@
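A rough sketch of that check (field names and the exact comparison are
illustrative; the point is that the callback re-validates the timer
under ps_mtx before acting on it):

    void
    realitexpire(void *arg)
    {
        struct process *pr = arg;
        struct itimerspec *tp = &pr->ps_timer[ITIMER_REAL];
        struct timespec now;

        mtx_enter(&pr->ps_mtx);
        getnanotime(&now);
        /*
         * setitimer(2) may have run between the timeout firing and us
         * taking ps_mtx: the timer may be disarmed or pushed further out.
         */
        if (!timespecisset(&tp->it_value) ||    /* cancelled */
            timespeccmp(&now, &tp->it_value, <)) {      /* rescheduled */
            mtx_leave(&pr->ps_mtx);
            return;
        }
        /* ... deliver SIGALRM and re-arm for the next interval ... */
        mtx_leave(&pr->ps_mtx);
    }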
|