| Commit message | Author | Age | Files | Lines |
... | |
|
|
|
| |
in dev/kcov.c; therefore move it to dev/kcov.c.
|
|
|
|
| |
ok claudio@
|
|
|
|
| |
ok claudio@
|
| |
|
|
|
|
| |
struct sigacts since that is the only thing that is modified by siginit.
|
|
|
|
| |
ok claudio@, pirofti@
|
| |
|
| |
|
|
|
|
| |
OK deraadt@, mpi@
|
|
|
|
| |
Prompted by mpi@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
"syncprt" is unused since kern/vfs_syscalls.c r1.147 from 2008.
Adding new debug sysctls is a bit opaque and looking at kern/kern_sysctl.c
the only visible difference between used and stub ctldebug structs in the
debugvars[] array is their extern keyword, indicating that it is defined
elsewhere.
sys/sysctl.h declares all debugN members as extern upfront, but these
declarations are not needed.
Remove the unused debug sysctl, rename the only remaining one to something
meaningful and remove forward declarations from sys/sysctl.h; this way,
adding new debug sysctls is a matter of adding extern and coming up with a
name, which is nicer to read on its own and better to grep for.
OK mpi
|
|
|
|
| |
ok mvs@, visa@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adding "debug.my-knob" sysctls is really helpful to select different
code paths and/or log on demand during runtime without recompile,
but as this code is under DEBUG, lots of other noise comes with it
which is often undesired, at least when looking at specific subsystems
only.
Adding globals to the kernel and breaking into DDB to change them helps,
but that does not work over SSH, hence the need for debug sysctls.
Introduces DEBUG_SYSCTL to make use of the "debug" MIB without the rest of
DEBUG; it's DEBUG_SYSCTL and not SYSCTL_DEBUG because it's not a general
option for all of sysctl(2).
OK gnezdo
|
|
|
|
| |
OK mvs@
|
|
|
|
|
|
| |
Design by deraadt@
ok deraadt@
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The ITIMER_REAL itimerspec (ps_timer[0]) and timeout (ps_realit_to)
are protected by the kernel lock. Annotate them with "K".
The ITIMER_VIRTUAL and ITIMER_PROF itimerspecs (ps_timer[1],
ps_timer[2]) are protected by itimer_mtx. Annotate them with "T",
for "timer".
With input from kettenis@ and anton@.
ok kettenis@, anton@
|
|
|
|
| |
Reminded by, input & OK jca
|
|
|
|
|
|
|
|
|
|
| |
These two interfaces have been entirely unused since introduction.
Remove them and thin the "timeout" namespace a bit.
Discussed with mpi@ and ratchov@ almost a year ago, though I blocked
the change at that time. Also discussed with visa@.
ok visa@, mpi@
|
|
|
|
|
|
|
| |
Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.
OK kn@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
from threads other than the one currently having kcov enabled. A thread
with kcov enabled occasionally delegates work to another thread;
collecting coverage from such threads improves the ability of syzkaller
to correlate side effects in the kernel caused by issuing a syscall.
Remote coverage is divided into subsystems. The only supported subsystem
right now collects coverage from scheduled tasks and timeouts on behalf
of a kcov enabled thread. In order to make this work `struct task' and
`struct timeout' must be extended with a new field keeping track of the
process that scheduled the task/timeout. Both aforementioned structures
have therefore grown by the size of a pointer on all
architectures.
The kernel API is documented in a new kcov_remote_register(9) manual.
Remote coverage is also supported by kcov on NetBSD and Linux.
ok mpi@
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The TIMEOUT_SCHEDULED flag was added a few months ago to differentiate
between wheel timeouts and new timeouts during softclock(). The
distinction is useful when incrementing the "rescheduled" stat and the
"late" stat.
Now that we have an intermediate queue for new timeouts, timeout_new,
we don't need the flag. The distinction between wheel timeouts and
new timeouts can be made computationally.
Suggested by procter@ several months ago.
|
|
|
|
|
|
|
| |
MD versions of these headers were unhooked. As nothing has hit those
checks we can drop them at this point.
ok visa@ and "makes sense" to millert@
|
|
|
|
|
|
|
|
| |
used by the processor chip. Although we have a SENSOR_WATTHOUR sensor
type its units are not really suitable for this sensor. So add a
SENSOR_ENERGY type that uses microjoules as its unit.
ok deraadt@
|
|
|
|
|
|
|
|
|
| |
VERASE would perform (sometimes irrelevant) compute in the kernel which
can be heavy (especially with our insufficient tty subsystem locking). Use
tsleep_nsec for 1 tick in such circumstances to yield cpu, and also bring
interruptibility to ptcwrite().
https://syzkaller.appspot.com/bug?extid=462539bc18fef8fc26cc
ok kettenis millert, discussions with greg and anton
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This diff exposes parts of clock_gettime(2) and gettimeofday(2) to
userland via libc, liberating processes from the need for a context
switch every time they want to count the passage of time.
If a timecounter clock can be exposed to userland then it needs to set
its tc_user member to a non-zero value. Tested with one or multiple
counters per architecture.
The timing data is shared through a pointer found in the new ELF
auxiliary vector AUX_openbsd_timekeep containing timehands information
that is frequently updated by the kernel.
Timing differences between the last kernel update and the current time
are adjusted in userland by the tc_get_timecount() function inside the
MD usertc.c file.
This permits a much more responsive environment, quite visible in
browsers, office programs and gaming (apparently one is able to fly
in Minecraft now).
Tested by robert@, sthen@, naddy@, kmos@, phessler@, and many others!
OK from at least kettenis@, cheloha@, naddy@, sthen@
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
a kstat is an arbitrary chunk of data that a part of the kernel
wants to expose to userland. data could mean just a chunk of raw
bytes, but generally a kernel subsystem will provide a series of
kstat key/value chunks.
this code is loosely modelled on kstat in solaris, but with a bunch
of simplifications (we don't want to provide write support for
example). the named or key/value structure is significantly richer
in this version too. eg, solaris kstat named data supports integer
types, but this version offers differentiation between counters
(like the number of packets transmitted on an interface) and gauges
(like how long the transmit queue is) and lets kernel providers say
what the units are (eg, packets vs bytes vs cycles).
the main motivation for this is to improve the visibility of what
the kernel is doing while it's running. i wrote this as part of the
recent work we've been doing on multiqueue and rss/toeplitz so i
could verify that network load is actually spread across multiple
rings on a single nic. without this we would be wasting memory and
interrupt vectors on multiple rings and still just using the 1st
one, and no one would know cos there's no way to see what rings are
being used.
another thing that can become visible is the different counters
that various network cards provide. i'm particularly interested in
seeing if packets get dropped because the rings aren't filled fully,
which is an effect we've never really observed directly.
a small part of wanting this is cos i spend an annoying amount of
time instrumenting the kernel when hacking code in it. if most of
the scaffolding for the instrumentation is already there, i can
avoid repeatedly writing that code and save time.
iterated a few times with claudio@ and deraadt@
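
As a rough illustration of the key/value idea described in this message,
the userland C sketch below models a named value that carries a type
(counter or gauge) and a unit. Every name in it (demo_kv, demo_kv_find,
the enums, the sample data) is hypothetical; it is not the kstat API
added by this commit, only a sketch of the distinction the message draws.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types for illustration only; not the kernel's kstat API. */
enum demo_kv_type { DEMO_KV_COUNTER, DEMO_KV_GAUGE };
enum demo_kv_unit { DEMO_KV_PACKETS, DEMO_KV_BYTES, DEMO_KV_CYCLES };

struct demo_kv {
	const char		*name;
	enum demo_kv_type	 type;	/* counter: only grows; gauge: a level */
	enum demo_kv_unit	 unit;	/* what the value measures */
	uint64_t		 value;
};

/* Return the entry called `name`, or NULL if there is none. */
const struct demo_kv *
demo_kv_find(const struct demo_kv *kvs, size_t n, const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < n; i++) {
		if (strcmp(kvs[i].name, name) == 0)
			return &kvs[i];
	}
	return NULL;
}

/* Sample data: one counter and one gauge for an imaginary tx ring. */
const struct demo_kv demo_ring_stats[] = {
	{ "tx-packets", DEMO_KV_COUNTER, DEMO_KV_PACKETS, 1500 },
	{ "txq-length", DEMO_KV_GAUGE,   DEMO_KV_PACKETS, 12 },
};
const size_t demo_ring_nstats =
    sizeof(demo_ring_stats) / sizeof(demo_ring_stats[0]);
```

A consumer can use the type to decide how to present a value: a counter
is usually shown as a rate between two samples, while a gauge is shown
as-is.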
|
|
|
|
|
|
|
|
|
|
| |
capital letters in locking annotations. Therefore harmonize the existing
annotations.
Also, if multiple locks are required they should be delimited using
commas.
ok mpi@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
serializing calls to pipe_buffer_free(). Repeating the previous commit
message:
Instead of performing three distinct allocations per created pipe,
reduce it to a single one. Not only should this be more performant, it
also solves a kqueue related issue found by visa@ who also requested
this change: if you attach an EVFILT_WRITE filter to a pipe fd, the
knote gets added to the peer's klist. This is a problem for kqueue
because if you close the peer's fd, the knote is left in the list whose
head is about to be freed. knote_fdclose() is not able to clear the
knote because it is not registered with the peer's fd.
FreeBSD also takes a similar approach to pipe allocations.
once again ok mpi@ visa@
|
|
|
|
|
| |
to keep the behavior when switching poll(2) to use kqueue filters.
From mpi@
|
|
|
|
|
|
| |
function but actually a 'true' value is needed; use seltrue instead.
Problem reported, kernel bisected and diff tested by Jens A. Griepentrog.
ok deraadt@ mpi@
|
|
|
|
|
|
| |
these days, so inventing our own numbers is fine.
From drahn@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
time_second(9) has been replaced in the kernel by gettime(9).
time_uptime(9) has been replaced in the kernel by getuptime(9).
New code should use the replacement interfaces. They do not suffer
from the split-read problem inherent to the time_* variables on 32-bit
platforms.
The variables remain in sys/kern/kern_tc.c for use via kvm(3) when
examining kernel core dumps.
This commit completes the deprecation process:
- Remove the extern'd definitions for time_second and time_uptime
from sys/time.h.
- Replace manpage cross-references to time_second(9)/time_uptime(9)
with references to microtime(9) or a related interface.
- Move the time_second.9 manpage to the attic.
With input from dlg@, kettenis@, visa@, and tedu@.
ok kettenis@
|
|
|
|
| |
discussed with cheloha@
|
|
|
|
|
|
|
| |
it means we can do quick hacks to existing drivers to test interrupts
on multiple cpus. emphasis on quick and hacks.
ok jmatthew@, who will also ok the removal of it at the right time.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
time_second and time_uptime are used widely in the tree. This is a
problem on 32-bit platforms because time_t is 64-bit, so there is a
potential split-read whenever they are used at or below IPL_CLOCK.
Here are two replacement interfaces: gettime(9) and getuptime(9).
The "get" prefix signifies that they do not read the hardware
timecounter, i.e. they are fast and low-res. The lack of a unit
(e.g. micro, nano) signifies that they yield a plain time_t.
As an optimization on LP64 platforms we can just return time_second or
time_uptime, as a single read is atomic. On 32-bit platforms we need
to do the lockless read loop and get the values from the timecounter.
In a subsequent diff these will be substituted for time_second and
time_uptime almost everywhere in the kernel.
With input from visa@ and dlg@.
ok kettenis@
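
The lockless read loop mentioned for 32-bit platforms can be sketched in
userland C11 as a generation-counter read. This is an illustration under
stated assumptions, not the kernel's code: demo_timehands, demo_settime()
and demo_gettime() are hypothetical names, the 64-bit value is stored as
two 32-bit halves to make the split-read hazard explicit, and a single
writer is assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a timehands-like record; not the kernel layout. */
struct demo_timehands {
	atomic_uint		gen;	/* 0 means "update in progress" */
	_Atomic uint32_t	sec_hi;	/* upper half of the 64-bit value */
	_Atomic uint32_t	sec_lo;	/* lower half of the 64-bit value */
};

static struct demo_timehands demo_th = { 1, 0, 0 };

/* Writer side: invalidate, update both halves, publish a new generation. */
void
demo_settime(int64_t now)
{
	unsigned int gen;

	gen = atomic_load_explicit(&demo_th.gen, memory_order_relaxed);
	assert(gen != 0);	/* assumption: exactly one writer */
	atomic_store_explicit(&demo_th.gen, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&demo_th.sec_hi,
	    (uint32_t)((uint64_t)now >> 32), memory_order_relaxed);
	atomic_store_explicit(&demo_th.sec_lo, (uint32_t)now,
	    memory_order_relaxed);
	if (++gen == 0)
		gen = 1;	/* never publish the "in progress" value */
	atomic_store_explicit(&demo_th.gen, gen, memory_order_release);
}

/* Reader side: retry until both halves were read under one generation. */
int64_t
demo_gettime(void)
{
	unsigned int gen;
	uint32_t hi, lo;

	do {
		gen = atomic_load_explicit(&demo_th.gen, memory_order_acquire);
		hi = atomic_load_explicit(&demo_th.sec_hi, memory_order_relaxed);
		lo = atomic_load_explicit(&demo_th.sec_lo, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&demo_th.gen, memory_order_relaxed));

	return (int64_t)(((uint64_t)hi << 32) | lo);
}
```

The reader never blocks the writer; it only repeats the read when an
update overlapped it. On an LP64 platform the loop would be unnecessary,
which matches the optimization described in the message.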
|
|
|
|
|
|
|
|
|
|
| |
This filter, already implemented in macOS and Dragonfly BSD, returns
exceptional conditions like the reception of out-of-band data.
The functionality is similar to poll(2)'s POLLPRI & POLLRDBAND and
it can be used by the kqfilter-based poll & select implementation.
ok millert@ on a previous version, ok visa@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
reading vpd stuff is useful when you're trying to get support
information about a pci device, eg, if you want a serial number,
or firmware versions, or specific part name or number, it's likely
available via vpd. also, i'm sick of having the diff in my tree.
the vpd info is not accessed as bytes read from a capability, but
is read via a register in the capability. the same register also
supports updating or writing vpd info, which sounds like a bad idea
to let userland have raw access to.
this adds an ioctl so that userland can ask the kernel to read via
the vpd register on its behalf. this ensures that the only access
is read access, and it's sanity checked.
tested by hrvoje popovski on many devices.
ok jmatthew@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
gif used its mbuf tag to store its interface index so it could
detect loops. gre also did this, and i cut most of the drivers
(including gif) over to using the gre tag. so the gif tag is unused.
wireguard uses the tag to store peer information between different
contexts the packet is processed in. it also needs a bit more space
to do that.
from Matt Dunwoodie and Jason A. Donenfeld
ok deraadt@
|
|
|
|
| |
from Matt Dunwoodie and Jason A. Donenfeld
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
reduce it to a single one. Not only should this be more performant, it
also solves a kqueue related issue found by visa@ who also requested
this change: if you attach an EVFILT_WRITE filter to a pipe fd, the
knote gets added to the peer's klist. This is a problem for kqueue
because if you close the peer's fd, the knote is left in the list whose
head is about to be freed. knote_fdclose() is not able to clear the
knote because it is not registered with the peer's fd.
FreeBSD also takes a similar approach to pipe allocations.
ok mpi@ visa@
|
|
|
|
|
|
|
| |
of SMR lists in userspace-visible parts of system headers. In addition,
the macros allow libkvm to examine SMR data structures.
Initial diff by and OK claudio@
|
|
|
|
|
| |
i've been wanting to do this for a while, and now that we've got
stoeplitz and it gives us 16 bits, it seems like the right time.
|
|
|
|
|
| |
requested by kettenis@
discussed with jmatthew@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
there have been discussions for years (and even some diffs!) about how we
should let drivers establish interrupts on multiple cpus.
the simple approach is to let every driver look at the number of
cpus in a box and just pin an interrupt on it, which is what pretty
much everyone else started with, but we have never seemed to get
past bikeshedding about. from what i can tell, the principal
objections to this are:
1. interrupts will tend to land on low numbered cpus.
ie, if drivers try to establish n interrupts on m cpus, they'll
start at cpu 0 and go to cpu n, which means cpu 0 will end up with more
interrupts than cpu m-1.
2. some cpus shouldn't be used for interrupts.
why a cpu should or shouldn't be used for interrupts can be pretty
arbitrary, but in practical terms i'm going to borrow from the
scheduler and say that we shouldn't run work on hyperthreads.
3. making all the drivers make the same decisions about the above is
a lot of maintenance overhead.
either we will have a bunch of inconsistencies, or we'll have a lot
of untested commits to keep everything the same.
my proposed solution to the above is this diff to provide the intrmap
api. drivers that want to establish multiple interrupts ask the api for
a set of cpus it can use, and the api considers the above issues when
generating a set of cpus for the driver to use. drivers then establish
interrupts on cpus with the info provided by the map.
it is based on the if_ringmap api in dragonflybsd, but generalised so it
could be used by something like nvme(4) in the future.
this version provides numeric ids for CPUs to drivers, but as
kettenis@ has been pointing out for a very long time, it makes more
sense to use cpu_info pointers. i'll be updating the code to address
that shortly.
discussed with deraadt@ and jmatthew@
ok claudio@ patrick@ kettenis@
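
To make objections 1 and 2 above concrete, here is a small userland C
sketch of one way a shared helper could choose CPUs for a driver: it
skips CPUs flagged as hyperthread siblings and starts from a caller
supplied offset so that successive drivers do not all begin at cpu 0.
This is not the intrmap implementation; demo_cpu, demo_intrmap_pick() and
the rotating-offset policy are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU description; the real API uses its own types. */
struct demo_cpu {
	unsigned int	id;
	int		is_smt_sibling;	/* nonzero: do not place interrupts here */
};

/*
 * Fill out[] with up to `want` CPU ids, scanning the CPU list once
 * starting at index `start` and skipping SMT siblings.
 * Returns the number of ids written, which may be less than `want`.
 */
size_t
demo_intrmap_pick(const struct demo_cpu *cpus, size_t ncpus, size_t start,
    unsigned int *out, size_t want)
{
	size_t i, n = 0;

	assert(out != NULL || want == 0);
	if (ncpus == 0)
		return 0;
	for (i = 0; i < ncpus && n < want; i++) {
		const struct demo_cpu *c = &cpus[(start + i) % ncpus];

		if (c->is_smt_sibling)
			continue;
		out[n++] = c->id;
	}
	return n;
}
```

Because the policy lives in one function, every driver that calls it
makes the same decision, which is the maintenance argument in objection
3. A driver asking for more interrupts than there are usable CPUs simply
gets fewer entries back and has to cope with that.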
|
|
|
|
| |
ok visa@, millert@
|
|
|
|
|
|
| |
This is only done in poll-compatibility mode, when __EV_POLL is set.
ok visa@, millert@
|
|
|
|
| |
Port breakages reported by naddy@
|
|
|
|
|
|
| |
While here prefix kernel-only EV flags with two underbars.
Suggested by kettenis@, ok visa@
|