| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
could crash due to missing inp_ppcb. This happened when fstat(1)
was called often and TCP was aborted with reset. Protect the sysctl
path with the net lock.
OK mpi@
|
|
|
|
|
|
|
|
|
| |
It currently creates a lock ordering problem because SCHED_LOCK() is taken
by hardclock(). That means the "priorities" of a thread should be moved
out of the SCHED_LOCK() first in order to make progress.
Reported-by: syzbot+8e4863b3dde88eb706dc@syzkaller.appspotmail.com
via anton@ as well as by kettenis@
|
|
|
|
|
|
|
| |
Note that hardclock(9) still increments p_{u,s,i}ticks without holding a
lock.
ok visa@, cheloha@
|
|
|
|
|
|
|
| |
do word loads and stores and so partial updates should no longer be observed.
With this, accessing global variables set by sysctl_int() should be mostly MP
safe.
OK dlg@ mpi@
|
|
|
|
|
|
| |
current status and statistics and can be exported without super-user
rights via sysctl to make it easier for tools like systat to access those.
OK deraadt@, sashan@
|
|
|
|
|
|
|
|
| |
The new node contains the subsystem's main control variable,
kern.witness.watch. It is aliased by the old name, kern.witnesswatch.
The alias will be removed in the future.
OK anton@ mpi@
|
| |
To protect the timehands we first need to protect the basis for all UTC
time in the kernel: the boottime.
Because the boottime can be changed at any time it needs to be versioned
along with the other members of the timehands to enable safe lockless reads
when using it for anything. So the global boottime timespec goes away and
the static boottimebin becomes a member of the timehands. Instead of reading
the global boottime you use one of two interfaces: binboottime(9) or
microboottime(9). nanoboottime(9) can trivially be added later, though there
are no consumers for it at the moment.
This introduces one small change in behavior. We used to advance the
reported boottime just before launching kernel threads from main().
This makes it look to userland like we "booted" moments before those
threads were launched. Because there is no longer a boottime global we
can no longer trivially do this from main(), so the boottime we report
to userspace via e.g. kern.boottime will now reflect whatever the time
was when we bootstrapped the timehands via inittodr(9). This is usually
no more than a minute before the kernel threads are launched from main().
The prior behavior can be restored by adding a new interface to the
timecounter layer in a future commit.
Based on FreeBSD r303387.
Discussed with mpi@ and visa@.
ok visa@
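
The "versioned ... to enable safe lockless reads" idea above can be sketched in userland C11 roughly as follows. This is a minimal illustration of a generation-counter read loop, not the kernel's timehands code: the struct and the vb_*() helper names are hypothetical, a single serialized writer is assumed, and the generation value 0 is assumed to mean "update in progress".

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a versioned boot time; not struct timehands. */
struct versioned_boottime {
	_Atomic uint32_t gen;	/* 0 means "update in progress" (assumption) */
	_Atomic int64_t sec;
	_Atomic int64_t frac;
};

static void
vb_init(struct versioned_boottime *vb, int64_t sec, int64_t frac)
{
	atomic_init(&vb->sec, sec);
	atomic_init(&vb->frac, frac);
	atomic_init(&vb->gen, 1);	/* first valid generation */
}

/* Writer: callers are assumed to serialize among themselves. */
static void
vb_write(struct versioned_boottime *vb, int64_t sec, int64_t frac)
{
	uint32_t next = atomic_load_explicit(&vb->gen, memory_order_relaxed) + 1;

	if (next == 0)
		next = 1;	/* skip the reserved "in progress" value */
	/* Mark the data as unstable before touching it. */
	atomic_store_explicit(&vb->gen, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&vb->sec, sec, memory_order_relaxed);
	atomic_store_explicit(&vb->frac, frac, memory_order_relaxed);
	/* Publish the new generation after the data stores. */
	atomic_store_explicit(&vb->gen, next, memory_order_release);
}

/* Reader: takes no lock, retries until it sees one stable generation. */
static void
vb_read(struct versioned_boottime *vb, int64_t *sec, int64_t *frac)
{
	uint32_t g0, g1;

	do {
		g0 = atomic_load_explicit(&vb->gen, memory_order_acquire);
		*sec = atomic_load_explicit(&vb->sec, memory_order_relaxed);
		*frac = atomic_load_explicit(&vb->frac, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		g1 = atomic_load_explicit(&vb->gen, memory_order_relaxed);
	} while (g0 == 0 || g0 != g1);
}
```

A reader that overlaps an update either sees generation 0 or sees the generation change between its two loads, and in both cases it retries instead of returning a torn value.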
|
| |
|
| |
Because of hw.smt we need a way to determine whether a given CPU is "online"
or "offline" from userspace. KERN_CPTIME2 is an array, and so cannot be
cleanly extended for this purpose, so add a new sysctl(2) KERN_CPUSTATS
with an extensible struct. At the moment it's just KERN_CPTIME2 with a
flags member, but it can grow as needed.
KERN_CPUSTATS appears to have been defined by BSDi long ago, but there are
few (if any) packages in the wild still using the symbol so breakage in ports
should be near zero. No other system inherited the symbol from BSDi, either.
Then, use the new sysctl(2) in systat(1) and top(1):
- systat(1) draws placeholder marks ('-') instead of percentages for
  offline CPUs in the cpu view.
- systat(1) omits offline CPU ticks when drawing the "big bar" in
  the vmstat view. The upshot is that the bar isn't half idle when
  half your logical CPUs are disabled.
- top(1) does not draw lines for offline CPUs; if CPUs toggle on or
  offline in interactive mode we redraw the display to expand/reduce
  space for the new/missing CPUs. This is consistent with what some
  top(1) implementations do on Linux.
- top(1) omits offline CPUs from the totals when CPU totals are
  combined into a single line (the '-1' flag).
Originally prompted by deraadt@. Discussed endlessly with deraadt@,
kettenis@, and sthen@. Tested by jmc@ and jca@. Earlier versions also
discussed with jca@. Earlier versions tested by jmc@, tb@, and many
others.
docs ok jmc@, kernel bits ok kettenis@, everything ok sthen@,
"Is your stuff in yet?" deraadt@
|
|
|
|
| |
ok kettenis deraadt
|
|
|
|
|
|
|
| |
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@
|
| |
This lets userspace distinguish between idle CPUs and those that are
not schedulable because hw.smt=0.
A subsequent commit probably needs to add documentation for this
to sysctl.2 (and perhaps elsewhere) after the dust settles.
Also included here are changes to systat(1) and top(1) that account
for the ENODEV case and adjust behavior accordingly:
- systat(1)'s cpu view prints placeholder marks ('-') instead of
  percentages for each state if the given CPU is offline.
- systat(1)'s vmstat view checks for offline CPUs when computing the
  machine state total and excludes them, so the CPU usage graph
  only represents the states for online CPUs.
- top(1) does not draw CPU rows for offline CPUs when the view is
  redrawn. If CPUs "go offline", percentages for each state are
  replaced by placeholder marks ('-'); the view will need to be
  redrawn to remove these rows. If CPUs "go online" the view will
  need to be redrawn to show these new CPUs. In "combined CPU" mode,
  the count and the state totals only represent online CPUs.
Ports using KERN_CPTIME2 will need to be updated. The changes
described above to make systat(1) and top(1) aware of the ENODEV
case *and* gracefully handle a changing HW_NCPUONLINE while the
application is running are not necessarily appropriate for each
and every port.
The changes described above are so extensive in part to demonstrate
one way a program *might* be made robust to changing CPU availability.
In particular, changing hw.smt after boot is an extremely rare event,
and this needs to be weighed when updating ports.
The logic needed to account for the KERN_CPTIME2 ENODEV case is
very roughly:
	if (sysctl(...) == -1) {
		if (errno != ENODEV) {
			/* Actual error occurred. */
		} else {
			/* CPU is offline. */
		}
	} else {
		/* CPU is online and CPU states were set by sysctl(2). */
	}
Prompted by deraadt@. Basic idea for ENODEV from kettenis@. Discussed at
length with kettenis@. Additional testing by tb@.
No complaints from hackers@ after a week.
ok kettenis@, "I think you should commit [now]" deraadt@
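
For ports that need the same three-way decision in several places, the rough logic above could be factored into a small helper. This is only a sketch: the enum and the cptime2_status() name are hypothetical and do not come from the commit, and the helper takes the sysctl(2) return value and errno as arguments so it stays independent of how the query is issued.

```c
#include <errno.h>

/* Hypothetical result type; these names are not part of any system header. */
enum cpu_status { CPU_ONLINE, CPU_OFFLINE, CPU_ERROR };

/*
 * Classify the outcome of a KERN_CPTIME2 query from the sysctl(2) return
 * value and the errno value observed immediately after the call.
 */
static enum cpu_status
cptime2_status(int rc, int err)
{
	if (rc != -1)
		return CPU_ONLINE;	/* CPU states were filled in */
	return (err == ENODEV) ? CPU_OFFLINE : CPU_ERROR;
}
```

A caller would store the return value of sysctl(2) in a local variable first and then pass it along with errno, so that errno is read only after the call has completed.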
|
|
|
|
|
|
|
|
|
|
|
|
| |
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@
|
| |
The introduction of hw.smt means that logical CPUs can be disabled
after boot and prior to suspend/resume. If hw.smt=0 (the default),
there needs to be a way to count the number of hardware threads
available on the system at any given time.
So, import HW_NCPUONLINE/hw.ncpuonline from NetBSD and document it.
hw.ncpu becomes equal to the number of CPUs given to sched_init_cpu()
during boot, while hw.ncpuonline is equal to the number of CPUs available
to the scheduler in the cpuset "sched_all_cpus". Set _SC_NPROCESSORS_ONLN
equal to this new sysctl and keep _SC_NPROCESSORS_CONF equal to hw.ncpu.
This is preferable to adding a new sysctl to count the number of
configured CPUs and keeping hw.ncpu equal to the number of online
CPUs because such a change would break software in the ecosystem
that relies on HW_NCPU/hw.ncpu to measure CPU usage and the like.
Such software in base includes top(1), systat(1), and snmpd(8),
and perhaps others.
We don't need additional locking to count the cardinality of a cpuset
in this case because the only interfaces that can modify said cardinality
are sysctl(2) and ioctl(2), both of which are under the KERNEL_LOCK.
Software using HW_NCPU/hw.ncpu to determine optimal parallelism will need
to be updated to use HW_NCPUONLINE/hw.ncpuonline. Until then, such software
may perform suboptimally. However, most changes will be similar to the
change included here for libcxx's std::thread::hardware_concurrency():
using HW_NCPUONLINE in lieu of HW_NCPU should be sufficient for determining
optimal parallelism for most software if the change to _SC_NPROCESSORS_ONLN
is insufficient.
Prompted by deraadt. Discussed at length with kettenis, deraadt, and sthen.
Lots of patch tweaks from kettenis.
ok kettenis, "proceed" deraadt
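
For software that sizes a thread pool, the portable route mentioned above is sysconf(3) with _SC_NPROCESSORS_ONLN, which this commit ties to hw.ncpuonline. A minimal sketch follows; worker_count() is a hypothetical helper name, and falling back to a single worker on failure is an assumption of this example rather than anything the commit prescribes.

```c
#include <unistd.h>

/*
 * Number of worker threads to start: the count of online CPUs as reported
 * by sysconf(3), and never less than one.
 */
static long
worker_count(void)
{
	long n = sysconf(_SC_NPROCESSORS_ONLN);

	/* sysconf(3) returns -1 on error; fall back to a single worker. */
	return (n < 1) ? 1 : n;
}
```

Because the online count can change after boot, a long-running program would have to call this again if it wants to track changes instead of caching the first result.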
|
|
|
|
|
|
|
|
|
| |
instead of using a mutex for update serialization. Use a per-fdp mutex
to manage updating of file instance pointers in the `fd_ofiles' array
to let fd_getfile() acquire file references safely with concurrent file
reference releases.
OK mpi@
|
|
|
|
|
|
|
|
| |
This prevents the array from being freed too early. In the function
unp_internalize(), the locking also ensures the per-fdp flags stay
coherent with the file instance.
OK mpi@
|
|
|
|
|
|
|
|
|
|
| |
These syscalls can now be executed w/o the KERNEL_LOCK() depending on
the kind of socket.
The current solution uses a single global mutex to serialize access to,
and reference count, 'struct file'.
ok visa@, kettenis@
|
| |
TLBs and L1 caches between threads. This can make cache timing
attacks a lot easier and we strongly suspect that this will make
several spectre-class bugs exploitable. Especially on Intel's SMT
implementation which is better known as Hyper-threading. We really
should not run different security domains on different processor
threads of the same core. Unfortunately changing our scheduler to
take this into account is far from trivial. Since many modern
machines no longer provide the ability to disable Hyper-threading in
the BIOS setup, provide a way to disable the use of additional
processor threads in our scheduler. And since we suspect there are
serious risks, we disable them by default. This can be controlled
through a new hw.smt sysctl. For now this only works on Intel CPUs
when running OpenBSD/amd64. But we're planning to extend this feature
to CPUs from other vendors and other hardware architectures.
Note that SMT doesn't necessarily have a positive effect on performance;
it highly depends on the workload. In all likelihood it will actually
slow down most workloads if you have a CPU with more than two cores.
ok deraadt@
|
|
|
|
|
| |
in_pcb.h header file.
OK mpi@ visa@
|
|
|
|
| |
suggested by jsg, ok sthen.
|
|
|
|
| |
kernel builds without audio (for example, ramdisks). ok florian@
|
|
|
|
|
|
|
|
| |
knob to take the new "sysctl" value, which is the default. In this
case, the device behavior is determined by the new "kern.audio.record"
sysctl(2), which defaults to zero.
ok florian
|
|
|
|
|
|
|
| |
lock order checking is disabled but it can be enabled at runtime.
Suggested by deraadt@ / mpi@
OK mpi@
|
|
|
|
|
|
| |
This gives us refcounting for free which is what we need for MP.
ok bluhm@, visa@
|
|
|
|
|
|
| |
later.
ok bluhm@, visa@
|
|
|
|
|
|
|
|
|
| |
the other fields.
Once we no longer have any [k] (kernel lock) protections, we'll be
able to unlock almost all network related syscalls.
Inputs from and ok bluhm@, visa@
|
|
|
|
|
|
|
| |
This turns `filehead' into a local variable, that will make it easier
to protect it.
ok visa@
|
|
|
|
|
|
|
| |
The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer messes with a per-process field.
No objection from millert@, ok tedu@, bluhm@
|
|
|
|
| |
ok millert@ sthen@
|
|
|
|
| |
Tested by Hrvoje Popovski, ok bluhm@
|
|
|
|
|
|
| |
They might not be fully constructed.
ok mpi@ deraadt@ bluhm@
|
|
|
|
|
| |
they're both wrappers around sysctl__string, which is where half the
fix is too.
|
|
|
|
|
|
|
|
|
| |
this tweaks the len argument to sysctl_rdstring, sysctl_struct, and
sysctl_rdstruct.
there's probably more to fix.
ok millert@
|
|
|
|
|
|
| |
also in the IPv6 case. This fixes "netstat -An -f inet6 -p tcp"
and shows 0x0.
report and OK dhill@
|
|
|
|
|
| |
to valid values. The so_qlimit is type short.
report Dillon Jay Pena; OK deraadt@
|
|
|
|
|
|
| |
copyout to avoid leaking kernel stack
ok deraadt@
|
|
|
|
|
| |
future disk info sysctl has pads in the structures, use M_ZERO when
allocating the storage to avoid leaking kernel memory.
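
The padding concern in this commit and the previous one can be illustrated in userland C. This is a sketch under stated assumptions: struct disk_record and export_record() are hypothetical, calloc(3) plays the role that M_ZERO plays for malloc(9), and memcpy(3) stands in for copyout(9).

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record; most ABIs insert padding between kind and bytes. */
struct disk_record {
	char kind;
	long bytes;
};

/*
 * Build a record in zeroed storage and copy the whole struct, padding
 * included, to `out`. Because the storage starts out zeroed, the padding
 * does not carry stale memory contents to the consumer.
 * Returns 0 on success, -1 on allocation failure.
 */
static int
export_record(void *out, char kind, long bytes)
{
	struct disk_record *r = calloc(1, sizeof(*r));

	if (r == NULL)
		return -1;
	r->kind = kind;
	r->bytes = bytes;
	memcpy(out, r, sizeof(*r));
	free(r);
	return 0;
}
```

Assigning every named member is not enough on its own, since the bytes between members are never written by those assignments. Zeroing the whole allocation first is what keeps old memory contents out of the copied buffer.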
|
|
|
|
|
|
|
|
| |
Get rid of the old splnet()/splx() dances. What's protecting them right
now is the KERNEL_LOCK(), but since pf(4) looks at these tables we want
to protect them in another way, hence the NET_LOCK(), at least as a hint.
ok bluhm@
|
|
|
|
|
|
| |
struct proc to struct process.
ok deraadt@ kettenis@
|
| |
|
|
|
|
|
|
| |
initial thread
ok jsing@ kettenis@
|
|
|
|
|
|
|
| |
each cpu's counters still have to be protected by splnet, but this
is better than a single set of counters protected by a global mutex.
ok bluhm@
|
|
|
|
| |
ok jsing@ kettenis@
|
| |
|
| |
|
|
|
|
|
|
|
| |
all dns socket connections will be redirected to localhost:port.
this could be a sockopt on the listening socket, but sysctl is
an easier interface to work with right now.
ok deraadt
|
|
|
|
| |
from Sebastien Marie
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add sysctl kern.allowkmem (default 0) which controls the ability to open
/dev/mem or /dev/kmem at securelevel > 0. Over 15 years we converted 99%
of utilities in the tree to operate on sysctl-nodes (either by themselves
or via code hiding in the guts of -lkvm).
pstat -d and -v & procmap are affected and continued use of them will
require kern.allowkmem=1 in /etc/sysctl.conf. acpidump (and its
buddy sendbug) are affected, but we'll work out a solution soon.
There will be some impact in ports.
ok kettenis guenther
|
|
|
|
|
|
| |
paths of libevent). This interface was the first generation of what
eventually became getentropy(2) and arc4random(3) -- June 1997!
Ports scan by sthen, general agreement guenther
|
| |
|