to the not interrupt allocator.
accommodating allocator. an interrupt safe pool may also be used in process
context, as indicated by waitok flags. thanks to the garbage collector, we
can always free pages in process context. the only complication is where
to put the pages. solve this by saving the allocation flags in the pool
page header so the free function can examine them.
not actually used in this diff. (coming soon.)
arm testing and compile fixes from phessler
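A minimal userland sketch of the idea above: the allocation flags are saved in a page header when the page is allocated, so the free path (always run in process context) can look at them and return the page to the right backend. All names here (`struct page_header`, `page_alloc`, `page_free`, the flag values and the two counters standing in for backends) are hypothetical and not the subr_pool.c code.

```c
#include <stdlib.h>

/* Hypothetical flag values; they only model the role of the pool(9) flags. */
#define PR_WAITOK	0x01	/* caller may sleep: process context */
#define PR_NOWAIT	0x02	/* caller may not sleep: possibly interrupt context */

/* Hypothetical page header that remembers how its page was allocated. */
struct page_header {
	int	 ph_flags;	/* allocation flags, saved for the free path */
	void	*ph_page;	/* the page itself */
};

/* Counters standing in for the two backend page allocators. */
static int intrsafe_pages;	/* pages owned by the interrupt-safe backend */
static int nointr_pages;	/* pages owned by the process-context backend */

struct page_header *
page_alloc(int flags)
{
	struct page_header *ph;

	if ((ph = malloc(sizeof(*ph))) == NULL)
		return NULL;
	if ((ph->ph_page = malloc(4096)) == NULL) {
		free(ph);
		return NULL;
	}
	ph->ph_flags = flags;		/* save the flags in the header */
	if (flags & PR_WAITOK)
		nointr_pages++;
	else
		intrsafe_pages++;
	return ph;
}

/* Always called from process context; the saved flags pick the backend. */
void
page_free(struct page_header *ph)
{
	if (ph->ph_flags & PR_WAITOK)
		nointr_pages--;
	else
		intrsafe_pages--;
	free(ph->ph_page);
	free(ph);
}
```

Because the decision is made from data stored with the page, the free function does not need to know which context the original caller was in.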
in struct ps_strings.
from NetBSD; OK deraadt@ guenther@ visa@
This lets witness(4) save a stack trace on each lock acquisition.
The saved traces can be viewed in ddb(4) when showing the currently
held locks, which may help when debugging incorrect locking.
Sample output:
ddb{0}> show all locks
Process 63836 (rm) thread 0xffff8000221e52c8 (435004)
exclusive rrwlock inode r = 0 (0xfffffd8119a092c0) locked @ /usr/src/sys/ufs/ufs/ufs_vnops.c:1547
#0 witness_lock+0x419
#1 _rw_enter+0x2bb
#2 _rrw_enter+0x42
#3 VOP_LOCK+0x3f
#4 vn_lock+0x36
#5 vfs_lookup+0xa1
#6 namei+0x2b3
#7 dounlinkat+0x85
#8 syscall+0x338
#9 Xsyscall+0x128
exclusive kernel_lock &kernel_lock r = 1 (0xffffffff81e6a5f0) locked @ /usr/src/sys/arch/amd64/amd64/intr.c:525
#0 witness_lock+0x419
#1 syscall+0x2b6
#2 Xsyscall+0x128
The saving adds overhead, so it is not enabled by default. It can be
taken into use by setting sysctl kern.witness.locktrace=1 at runtime
or by defining WITNESS_LOCKTRACE in the kernel configuration.
Feedback and OK anton@
ok cheloha@
usrreq functions move the mbuf m_freem() logic to the release block
instead of distributing it over the switch statement. Then the
goto release in the initial check, whether the pcb still exists,
will not free the mbuf for the PRU_RCVD, PRU_RCVOOB and PRU_SENSE
commands.
OK claudio@ mpi@ visa@
Reported-by: syzbot+8e7997d4036ae523c79c@syzkaller.appspotmail.com
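A rough sketch of the "free only in the release block" structure described above. The request codes, `struct mbuf`, `struct pcb`, `transmit()` and the counters are hypothetical stand-ins; the point is that the early `goto release` when the pcb is gone goes through the same single check that skips freeing for the receive/sense style requests, whose mbuf argument the function does not own.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical request codes modelled on PRU_SEND, PRU_RCVD, ... */
enum req { REQ_SEND, REQ_RCVD, REQ_RCVOOB, REQ_SENSE, REQ_OTHER };

struct mbuf { int len; };
struct pcb { int unused; };

static int nfreed;	/* mbufs freed by usrreq_sketch() */
static int nsent;	/* mbufs handed to the "transmit" path */

static void
mbuf_free(struct mbuf *m)
{
	nfreed++;
	free(m);
}

/* Stand-in for the output path, which takes ownership of m. */
static void
transmit(struct mbuf *m)
{
	nsent++;
	free(m);
}

int
usrreq_sketch(struct pcb *pcb, enum req req, struct mbuf *m)
{
	int error = 0;

	if (pcb == NULL) {
		error = EINVAL;
		goto release;		/* release decides about m */
	}

	switch (req) {
	case REQ_SEND:
		transmit(m);
		m = NULL;		/* ownership passed on */
		break;
	case REQ_RCVD:
	case REQ_RCVOOB:
	case REQ_SENSE:
		break;			/* m is not ours to free */
	default:
		error = EOPNOTSUPP;
		break;
	}

release:
	/* The single place where the mbuf argument may be freed. */
	if (req != REQ_RCVD && req != REQ_RCVOOB && req != REQ_SENSE &&
	    m != NULL)
		mbuf_free(m);
	return error;
}
```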
the system priority level to IPL_HIGH. This simplifies the code a bit
relative to calling from witness_lock() and witness_unlock().
OK mpi@
caller supplied pointer. Otherwise, the caller is left with a dangling
pointer that could lead to a use-after-free panic.
ok millert@ visa@
Reported-by: syzbot+ac1d7685deab53b95ace@syzkaller.appspotmail.com
Reported-by: syzbot+dbe8f002f8051f26f6fe@syzkaller.appspotmail.com
this fixes an issue found by a regress test on sparc64 by claudio,
and between us took about half a day of work to understand and fix
at a2k19.
ok claudio@
add locking in clock_gettime where needed.
ok cheloha matthew
When we come back from suspend/hibernate the BIOS/firmware/whatever can
hand us *any* TOD, so we need to check that the given TOD doesn't set our
boot offset backwards, breaking the monotonicity of e.g. CLOCK_MONOTONIC.
This is trivial to do from the BIOS on most PCs before unhibernating.
There might be other ways it can happen, accidentally or otherwise.
This is a bit messy but it can be made prettier later with a "bintimecmp"
macro or something like that.
Problem confirmed by jmatthew@.
"you are very likely right" deraadt@
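A sketch of the check described above, assuming UTC = boottime + uptime and assuming an offending TOD is simply ignored. `struct bintime` has the kernel's shape (seconds plus a 64-bit binary fraction); `BINTIME_CMP`, `bintime_sub` and `resume_set_tod` are hypothetical names, the macro being one possible form of the "bintimecmp" the message mentions, written in the style of timercmp(3).

```c
#include <stdint.h>
#include <time.h>

/* Seconds plus a 64-bit binary fraction of a second. */
struct bintime {
	time_t		sec;
	uint64_t	frac;
};

/* Hypothetical "bintimecmp"-style macro, in the spirit of timercmp(3). */
#define BINTIME_CMP(a, b, cmp)					\
	(((a)->sec == (b)->sec) ?				\
	    ((a)->frac cmp (b)->frac) : ((a)->sec cmp (b)->sec))

/* res = a - b; unsigned wrap-around of frac plus a borrow does the carry. */
static void
bintime_sub(const struct bintime *a, const struct bintime *b,
    struct bintime *res)
{
	res->sec = a->sec - b->sec;
	res->frac = a->frac - b->frac;
	if (a->frac < b->frac)
		res->sec--;
}

/*
 * Hypothetical resume-time check.  A new time of day implies a new
 * boottime of newtod - uptime; refuse it if that would move the boot
 * offset backwards and break monotonicity.
 */
int
resume_set_tod(struct bintime *boottime, const struct bintime *newtod,
    const struct bintime *uptime)
{
	struct bintime newboot;

	bintime_sub(newtod, uptime, &newboot);
	if (BINTIME_CMP(&newboot, boottime, <))
		return -1;	/* keep the old offset */
	*boottime = newboot;
	return 0;
}
```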
introduction of struct lockf_state.
ok bluhm@ visa@
The new node contains the subsystem's main control variable,
kern.witness.watch. It is aliased by the old name, kern.witnesswatch.
The alias will be removed in the future.
OK anton@ mpi@
Idle threads are never placed on the runqueue so their priority doesn't
matter.
This fixes an accounting bug where top(1) would report a high CPU usage
for Idle threads of secondary CPUs right after booting. That's because
schedcpu() would give 100% CPU time to the Idle thread until "real"
threads get scheduled on the corresponding CPU.
Issue reported by bluhm@, ok visa@, kettenis@
and provide them as nmea(4) distance & velocity sensors.
With my 'u-blox GNSS receiver' that gives:
hw.sensors.nmea0.distance0=335.600 m (Altitude), OK
hw.sensors.nmea0.velocity0=18.337 m/s (Ground speed), OK
ok deraadt@
This eases data extraction in syzkaller.
Prompted by and OK anton@
level up.
ok guenther mpi visa
Currently we validate time input for all four of these syscalls in the
workhorse function dovutimens(). This is bad because both futimes(2)
and utimes(2) have input as timevals that need to be converted to
timespecs. This multiplication can overflow to create a "valid"
input, e.g. if tv_usec is equal to 2^61 (invalid value) on a platform
with 64-bit longs, the resulting tv_nsec is equal to zero (valid value).
This is also a bit wasteful. We acquire a vnode and do other work
under KERNEL_LOCK only to release the vnode when the time input is
invalid.
So, duplicate a bit of code to validate the time inputs before we do
any conversions or real VFS work.
probably still ok tedu@ deraadt@
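A small sketch of why validation has to come before conversion. On a platform with 64-bit longs, 2^61 microseconds times 1000 is 125 * 2^64, which is 0 modulo 2^64, so the converted value looks valid. The helper names are hypothetical; unsigned arithmetic is used for the demonstration because signed overflow is undefined behaviour in C.

```c
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/* A timeval is valid only when 0 <= tv_usec < 1000000. */
static int
timeval_is_valid(const struct timeval *tv)
{
	return tv->tv_usec >= 0 && tv->tv_usec < 1000000;
}

/* Shows the wrap-around: 2^61 * 1000 == 125 * 2^64 == 0 (mod 2^64). */
static uint64_t
usec_to_nsec_wrapping(uint64_t usec)
{
	return usec * 1000;
}

/* Validate first, convert second; -1 stands in for EINVAL. */
int
checked_tv_to_ts(const struct timeval *tv, struct timespec *ts)
{
	if (!timeval_is_valid(tv))
		return -1;
	ts->tv_sec = tv->tv_sec;
	ts->tv_nsec = tv->tv_usec * 1000;	/* cannot overflow now */
	return 0;
}
```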
ok beck
Reported-by: syzbot+cc59412ed8429450a1ae@syzkaller.appspotmail.com
Remove the arbitrary and undocumented 24hr limits for timeouts from these
interfaces. To do so, loop tsleep(9) to chip away at timeouts larger than
what tsleep(9) can handle in one call.
Use timerisvalid(3)/timespecisvalid() for input validation instead of
itimerfix()/timespecfix() to avoid the 100 million second upper bounds
those functions introduce.
POSIX requires support for timeouts of at least 31 days for select(2) and
pselect(2), so these changes make our implementation more compliant.
Other improvements here include better variable names for the time stuff
and more consolidated timeout logic with less backwards goto jumping, all
of which made dopselect() and doppoll() a bear to read.
Naming improvements prompted by tedu@ in a prior patch for nanosleep(2).
With input from deraadt@. Validation bug spotted by matthew@ in an earlier
version.
ok visa@
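A simplified sketch of "loop tsleep(9) to chip away at timeouts": tsleep(9) takes its timeout as an int number of ticks, so a longer wait is split into pieces of at most INT_MAX ticks. `sleep_stub` and `long_sleep` are hypothetical; the stub only records calls and always reports an expired timeout (EWOULDBLOCK, as tsleep(9) does). A real implementation would also measure elapsed time rather than trust the requested chunk.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

static int	 nsleeps;	/* calls made to the stand-in sleep */
static uint64_t	 slept;		/* total ticks "slept" */

/*
 * Stand-in for tsleep(9): the timeout is an int, so one call covers at
 * most INT_MAX ticks.  This stub is never woken up or interrupted.
 */
static int
sleep_stub(int ticks)
{
	nsleeps++;
	slept += (uint64_t)ticks;
	return EWOULDBLOCK;
}

/* Sleep for total > 0 ticks by chipping away INT_MAX-sized pieces. */
int
long_sleep(uint64_t total)
{
	int error, chunk;

	while (total > 0) {
		chunk = total > INT_MAX ? INT_MAX : (int)total;
		error = sleep_stub(chunk);
		if (error != EWOULDBLOCK)
			return error;	/* woken (0) or interrupted: stop */
		total -= (uint64_t)chunk;
	}
	return EWOULDBLOCK;		/* the whole timeout elapsed */
}
```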
doesn't get freed. move the free calls into the same function as namei.
fixed bug report from Dariusz Sendkowski
ok beck
Allows a subset of ioctls on video(4) devices, subset selected from
video(1) and firefox webrtc implementation.
ok semarie@ deraadt@
structure allows for better tracking of pending lock operations which is
essential in order to prevent a use-after-free once the underlying vnode is
gone.
Inspired by the lockf implementation in FreeBSD.
ok visa@
Reported-by: syzbot+d5540a236382f50f1dac@syzkaller.appspotmail.com
If a user thread from e.g. clock_settime(2) is in the midst of changing
the boottime or calling tc_windup() when it is interrupted by hardclock(9),
the timehands could be left in a damaged state.
So protect tc_windup() calls with a mutex, timecounter_mtx. hardclock(9)
merely attempts to enter the mutex instead of spinning because it cannot
afford to wait around. In practice hardclock(9) will skip tc_windup() very
rarely, and when it does skip there aren't any negative effects because the
skip indicates that a user thread is already calling, or about to call,
tc_windup() anyway.
Based on FreeBSD r303387 and NetBSD sys/kern/kern_tc.c,v1.30
Discussed with mpi@ and visa@. Tons of nice technical detail about
lockless reads from visa@.
OK visa@
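A userland analog of the locking pattern above, using POSIX threads in place of the kernel mutex(9) API (where the try variant is mtx_enter_try()). `settime_path`, `hardclock_path` and `windup_locked` are hypothetical stand-ins for clock_settime(2), hardclock(9) and tc_windup(). The user path may block; the interrupt path only tries the lock and skips the windup when it is busy.

```c
#include <pthread.h>

static pthread_mutex_t timecounter_mtx = PTHREAD_MUTEX_INITIALIZER;
static int windups;

/* Stand-in for tc_windup(); must be called with timecounter_mtx held. */
static void
windup_locked(void)
{
	windups++;
}

/* User-thread path (e.g. clock_settime): may wait for the mutex. */
void
settime_path(void)
{
	pthread_mutex_lock(&timecounter_mtx);
	windup_locked();
	pthread_mutex_unlock(&timecounter_mtx);
}

/*
 * Clock-interrupt path: never waits.  A busy mutex means someone is
 * already calling (or about to call) the windup, so skipping is fine.
 * Returns 1 if the windup ran, 0 if it was skipped.
 */
int
hardclock_path(void)
{
	if (pthread_mutex_trylock(&timecounter_mtx) != 0)
		return 0;
	windup_locked();
	pthread_mutex_unlock(&timecounter_mtx);
	return 1;
}
```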
To protect the timehands we first need to protect the basis for all UTC
time in the kernel: the boottime.
Because the boottime can be changed at any time it needs to be versioned
along with the other members of the timehands to enable safe lockless reads
when using it for anything. So the global boottime timespec goes away and
the static boottimebin becomes a member of the timehands. Instead of reading
the global boottime you use one of two interfaces: binboottime(9) or
microboottime(9). nanoboottime(9) can trivially be added later, though there
are no consumers for it at the moment.
This introduces one small change in behavior. We used to advance the
reported boottime just before launching kernel threads from main().
This makes it look to userland like we "booted" moments before those
threads were launched. Because there is no longer a boottime global we
can no longer trivially do this from main(), so the boottime we report
to userspace via e.g. kern.boottime will now reflect whatever the time
was when we bootstrapped the timehands via inittodr(9). This is usually
no more than a minute before the kernel threads are launched from main().
The prior behavior can be restored by adding a new interface to the
timecounter layer in a future commit.
Based on FreeBSD r303387.
Discussed with mpi@ and visa@.
ok visa@
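A much reduced sketch of a versioned, lock-free read of the boottime. It assumes a single `struct timehands` with a generation number where 0 means "update in progress"; the real timecounter code keeps a ring of timehands and a pointer to the current one, which is omitted here. Field and function names are hypothetical. The data members are relaxed C11 atomics so the reader is race-free under the C11 memory model.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical reduced timehands: boottime plus a generation number. */
struct timehands {
	_Atomic int64_t		th_boot_sec;
	_Atomic uint64_t	th_boot_frac;
	_Atomic unsigned int	th_generation;	/* 0: update in progress */
};

struct timehands timehands0 = { 0, 0, 1 };

/* Writer side: assumed to be serialized by a lock (not shown). */
void
set_boottime(struct timehands *th, int64_t sec, uint64_t frac)
{
	unsigned int gen;

	gen = atomic_load_explicit(&th->th_generation, memory_order_relaxed);
	atomic_store_explicit(&th->th_generation, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&th->th_boot_sec, sec, memory_order_relaxed);
	atomic_store_explicit(&th->th_boot_frac, frac, memory_order_relaxed);
	if (++gen == 0)		/* skip the reserved value 0 */
		gen = 1;
	atomic_store_explicit(&th->th_generation, gen, memory_order_release);
}

/* Lockless reader: retry until a stable, non-zero generation is seen. */
void
read_boottime(struct timehands *th, int64_t *sec, uint64_t *frac)
{
	unsigned int gen;

	do {
		gen = atomic_load_explicit(&th->th_generation,
		    memory_order_acquire);
		*sec = atomic_load_explicit(&th->th_boot_sec,
		    memory_order_relaxed);
		*frac = atomic_load_explicit(&th->th_boot_frac,
		    memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 || gen != atomic_load_explicit(&th->th_generation,
	    memory_order_relaxed));
}
```

Keeping the boottime inside the versioned structure is what lets a reader get it consistently along with the other members without taking a lock.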
Linux does validation.
Document this new failure case as an EINVAL, like Linux.
"stop waiting" deraadt
Add documentation for the new EINVAL cases for adjtime(2) and
settimeofday(2).
adjtime.2 docs ok schwarze@,
settimeofday(2)/clock_settime(2) stuff ok tedu@,
"stop waiting" deraadt@
We will still be able to run i386 guests on amd64 vmm.
Reasons to delete i386 vmm:
- Been broken for a while, almost no one complained.
- Had been falling out of sync from amd64 while it worked.
- If your machine has vmx, you most probably can run amd64, so why not run that?
ok deraadt@ mlarkin@
1) Correctly notice covering unveil when using .. - fix crash noticed by visa@
2) Notice when v_mount is NULL to not crash when unveil vnodes are on a
forcibly unmounted filesystem, noticed by yasuoka@
3) Add a flag to ni_data so that failures from unveil flag mismatches in covering
unveils return the correct EACCES instead of ENOENT (noticed by brynet@)
ok deraadt@
unveil matches when .. is used correctly. Also adds regress based
upon his test program for the same issue.
about shared resources which no program should see. only a few pieces of
software use it, generally poorly thought out. they are being fixed, so
mincore() can be deleted.
ok guenther tedu jca sthen, others
ok jca@ visa@ guenther@ deraadt@
OK millert@ bluhm@
flag to the other references. Then the final m_free() will clear
the memory.
OK claudio@
return. Hopefully the other reference holder has the M_ZEROIZE flag set as
well. Triggered by syzkaller. OK deraadt@ visa@
Reported-by: syzbot+c578107d70008715d41f@syzkaller.appspotmail.com
tough (so that non-YP using developers don't break the tree for YP/LDAP
users). This check failed to handle the newish RPATH+UNVEIL_INSPECT namei
operation.
discovered by florian, ok beck
ok deraadt mestre
where ps_uvpcwd obviously contains a dangling pointer.
ok deraadt@, krw@
ptsignal() has to be called with the kernel lock held. As ensuring the
locking in mi_switch() is not easy, and deferring the signaling using
the task API is not possible because of lock order issues in
mi_switch(), move the CPU time checking into a periodic timer where
the kernel can be locked without issues.
With this change, each process has a dedicated resource check timer.
The timer gets activated only when a CPU time limit is set. Because the
checking is not done as frequently as before, some precision is lost.
Use of timers adapted from FreeBSD.
OK tedu@
Reported-by: syzbot+2f5d62256e3280634623@syzkaller.appspotmail.com
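A simplified sketch of the per-process check described above: the periodic check is armed only when a CPU time limit is set, and the callback compares consumed CPU time against the soft and hard limits. `struct proc_sketch`, `set_cpu_limit` and `cpu_limit_check` are hypothetical; the sketch only returns the signal that would be posted, and it does not model any adjustment of the soft limit after SIGXCPU that the real kernel may perform.

```c
#include <signal.h>
#include <sys/resource.h>

/* Hypothetical per-process state; field names are made up. */
struct proc_sketch {
	rlim_t	cpu_cur;	/* soft RLIMIT_CPU, seconds */
	rlim_t	cpu_max;	/* hard RLIMIT_CPU, seconds */
	rlim_t	runtime;	/* CPU seconds consumed so far */
	int	timer_armed;	/* is the periodic check running? */
};

/* The periodic check is only armed when a CPU time limit is set. */
void
set_cpu_limit(struct proc_sketch *p, rlim_t cur, rlim_t max)
{
	p->cpu_cur = cur;
	p->cpu_max = max;
	p->timer_armed = (cur != RLIM_INFINITY);
}

/*
 * Timer callback.  It runs from a timer rather than from mi_switch(),
 * so in the kernel it could take the kernel lock before signalling.
 * Returns the signal that would be posted, or 0 for none.
 */
int
cpu_limit_check(const struct proc_sketch *p)
{
	if (!p->timer_armed)
		return 0;
	if (p->runtime >= p->cpu_max)
		return SIGKILL;
	if (p->runtime >= p->cpu_cur)
		return SIGXCPU;
	return 0;
}
```

Since the check only runs when the timer fires, a process can overshoot its limit by up to one timer period, which is the precision loss the message mentions.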
level directories from working when you don't traverse into them starting
from /. Most found by brynet@ and a few others.
ok brynet@ deraadt@