path: root/sys/kern
Commit message (Author, Age, Files, Lines)
* Use PWAIT instead of PUSER in exit1().
  When the main thread of an MT process dies, it doesn't matter at which priority it gets awakened to do the last cleanups. Not using PUSER makes it easier to understand the existing scheduler logic. ok visa@
  (mpi, 2019-06-13, 1 file, -2/+2)
* When tcp_close() is running in parallel with fill_file(), the kernel
  could crash due to missing inp_ppcb. This happened when fstat(1) was called often and TCP was aborted with reset. Protect the sysctl path with the net lock. OK mpi@
  (bluhm, 2019-06-13, 1 file, -2/+17)
* add m_microtime for getting the wall clock time associated with a packet
  if the packet has the M_TIMESTAMP csum_flag, ph_timestamp is added to the boottime clock, otherwise it just uses microtime().
  (dlg, 2019-06-10, 1 file, -1/+14)
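The branch described in this message can be sketched in userland C as follows. This is only an illustration, not the committed code: the `sk_` names, the flag value, and the assumption that `ph_timestamp` holds nanoseconds since boot are all hypothetical, and `gettimeofday()` stands in for the kernel's microtime().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

#define SK_M_TIMESTAMP	0x0001	/* hypothetical flag value */

/* Hypothetical stand-in for the packet header fields mentioned above. */
struct sk_pkthdr {
	unsigned int	csum_flags;
	uint64_t	ph_timestamp;	/* assumed: nanoseconds since boot */
};

/*
 * Fill *tv with the wall clock time associated with a packet.
 * With a timestamp, add it to the boot time; otherwise use "now".
 */
void
sk_m_microtime(const struct sk_pkthdr *ph, const struct timeval *boottime,
    struct timeval *tv)
{
	assert(ph != NULL && boottime != NULL && tv != NULL);

	if (ph->csum_flags & SK_M_TIMESTAMP) {
		tv->tv_sec = boottime->tv_sec +
		    (time_t)(ph->ph_timestamp / 1000000000ULL);
		tv->tv_usec = boottime->tv_usec +
		    (long)((ph->ph_timestamp % 1000000000ULL) / 1000);
		/* Carry microseconds into seconds. */
		if (tv->tv_usec >= 1000000) {
			tv->tv_sec++;
			tv->tv_usec -= 1000000;
		}
	} else {
		gettimeofday(tv, NULL);
	}
}
```

Passing the boot time in as a parameter is a choice made for the sketch so that the timestamped path is deterministic and easy to check.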
* Avoid changing resource limits in rucheck() by introducing a new state
  variable that tracks when to send next SIGXCPU. This eases MP work and prevents accidental alteration of shared resource limit structs. OK mpi@ semarie@
  (visa, 2019-06-10, 1 file, -8/+9)
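A minimal sketch of the idea, assuming a per-process "next SIGXCPU" field and a 5-second resend interval (both are assumptions made for illustration; the `sk_` names are hypothetical). The limits are only read, never modified:

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <time.h>

#define SK_XCPU_INTERVAL	5	/* assumed resend interval, in seconds */

/*
 * Decide which signal, if any, a process should get for its CPU usage.
 * The soft and hard limits are read-only; the only state that changes is
 * *next_xcpu, which records when the next SIGXCPU is due.
 * Returns 0, SIGXCPU or SIGKILL.
 */
int
sk_rucheck(time_t runtime, time_t soft, time_t hard, time_t *next_xcpu)
{
	assert(next_xcpu != NULL);

	if (runtime < soft)
		return 0;
	if (runtime >= hard)
		return SIGKILL;
	if (runtime >= *next_xcpu) {
		*next_xcpu = runtime + SK_XCPU_INTERVAL;
		return SIGXCPU;
	}
	return 0;
}
```

Because the limit values themselves stay untouched, a limit structure shared between processes cannot be altered by accident on this path.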
* Add a temporary workaround to make removal of giant files better
  mlarkin@ noticed we would freeze while removing enormous files because of the amount of work done to invalidate buffers on unlink. This adds a temporary workaround to ensure we give up the lock and yield while doing this. The longer term answer will be to move these buffers to another list and not do the work here. ok deraadt@
  (beck, 2019-06-09, 1 file, -1/+18)
* Restore missing newline.
  (visa, 2019-06-06, 1 file, -1/+3)
* Let SP kernel work with WITNESS. The necessary instrumentation was
  missing from the SP variant of mtx_enter() and mtx_enter_try(). mtx_leave() was correct already. Prompted by and OK patrick@
  (visa, 2019-06-04, 1 file, -1/+5)
* sort struct declarations
  (anton, 2019-06-03, 1 file, -5/+5)
* Switch from bintime_add() et al. to bintimeadd(9).
  Basically just make all the bintime routines look and behave more like the timeradd(3) macros. Switch to three-argument forms for structure math, introduce and use bintimecmp(9), and rename the structure conversion routines to resemble e.g. TIMEVAL_TO_TIMESPEC(3). Document all of this in a new bintimeadd.9 page. Code input from mpi@, manpage input from schwarze@. code ok mpi@, docs ok schwarze@, docs probably still ok jmc@
  (cheloha, 2019-06-03, 2 files, -30/+28)
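To illustrate what a three-argument form in the style of timeradd(3) looks like, here is a small userland sketch. The `sk_` names and the struct layout are assumptions made for this example and are not copied from the kernel headers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for a binary time value. */
struct sk_bintime {
	time_t		sec;
	uint64_t	frac;	/* fraction of a second, in units of 2^-64 */
};

/*
 * Three-argument addition: *res = *a + *b.
 * The result is computed in locals first so that res may alias a or b.
 */
void
sk_bintimeadd(const struct sk_bintime *a, const struct sk_bintime *b,
    struct sk_bintime *res)
{
	uint64_t frac;
	time_t sec;

	assert(a != NULL && b != NULL && res != NULL);

	frac = a->frac + b->frac;
	sec = a->sec + b->sec;
	if (frac < a->frac)	/* the fraction wrapped, carry one second */
		sec++;
	res->sec = sec;
	res->frac = frac;
}

/* Comparison taking the operator as an argument, like timercmp(3). */
#define sk_bintimecmp(a, b, cmp)					\
	(((a)->sec == (b)->sec) ?					\
	    ((a)->frac cmp (b)->frac) : ((a)->sec cmp (b)->sec))
```

With this shape a caller writes `sk_bintimeadd(&a, &b, &sum)` and `sk_bintimecmp(&a, &b, <)`, mirroring how timeradd(3) and timercmp(3) are used.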
* Move initialization of limit0 into a dedicated function. This new
  function is also a proper place for setting up the plimit pool. While here, raise the IPL of the plimit pool to IPL_MPFLOOR, needed in upcoming MP work. OK claudio@
  (visa, 2019-06-02, 2 files, -23/+26)
* Revert to using the SCHED_LOCK() to protect time accounting.
  It currently creates a lock ordering problem because SCHED_LOCK() is taken by hardclock(). That means the "priorities" of a thread should be moved out of the SCHED_LOCK() first in order to make progress. Reported-by: syzbot+8e4863b3dde88eb706dc@syzkaller.appspotmail.com via anton@ as well as by kettenis@
  (mpi, 2019-06-01, 11 files, -71/+41)
* Use a per-process mutex to protect time accounting instead of SCHED_LOCK().
  Note that hardclock(9) still increments p_{u,s,i}ticks without holding a lock. ok visa@, cheloha@
  (mpi, 2019-05-31, 11 files, -41/+71)
* Rename struct plimit field p_refcnt to pl_refcnt to avoid confusion
  with the fields of struct proc. Make pl_refcnt unsigned for upcoming atomic updating. OK deraadt@ guenther@
  (visa, 2019-05-31, 3 files, -8/+8)
* Fix the initialization of bp before calling vfs_getcwd_common
  It is bad style to make a pointer point outside the object so correct this to simply point to the last byte up front. ok deraadt@
  (beck, 2019-05-30, 1 file, -3/+3)
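The style point can be shown with a small userland sketch that builds a path backwards from the end of a buffer. The function name and its interface are hypothetical and chosen only for this example; the relevant detail is that `bp` starts at the last byte inside the buffer rather than one past it:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Build "/a/b/c" backwards into buf from component names given leaf
 * first ("c", "b", "a"). Returns a pointer to the start of the path
 * inside buf, or NULL if the buffer is too small.
 */
char *
sk_build_path(char *buf, size_t len, const char *const *names, size_t n)
{
	char *bp;
	size_t i, nlen;

	assert(buf != NULL && len > 0 && names != NULL);

	bp = &buf[len - 1];	/* last byte of the object, never outside it */
	*bp = '\0';

	for (i = 0; i < n; i++) {
		nlen = strlen(names[i]);
		/* Need room for the name plus its leading slash. */
		if ((size_t)(bp - buf) < nlen + 1)
			return NULL;
		bp -= nlen;
		memcpy(bp, names[i], nlen);
		*--bp = '/';
	}
	return bp;
}
```

Starting at `&buf[len - 1]` means every value `bp` ever holds points at a valid byte of `buf`.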
* namei() generates KTR_NAMEI records for input filenames, but getcwd(2) and
  realpath(2) have output filenames. Generate additional KTR_NAMEI records upon success. ok millert beck
  (deraadt, 2019-05-30, 2 files, -2/+11)
* use copyoutstr, instead of fragile range math; ok beck
  (deraadt, 2019-05-30, 1 file, -7/+5)
* Correct call to vfs_getcwd_common from within __realpath
  I borrowed an example usage from __getcwd poorly to begin with and then there was some other strangeness in there. diagnosed with deraadt. ok deraadt@
  (beck, 2019-05-30, 1 file, -15/+15)
* The past is fuzzy, but it appears during development of __getcwd, *retval
  was used to return the length of the path, when the actual return value is 0. This would cause confusing results in ktrace. Diagnosed with beck since __realpath() picked up the same odd behaviour.
  (deraadt, 2019-05-29, 1 file, -2/+1)
* Do not account spinning time as running time when a thread crosses a
  tick boundary of schedlock(). This reduces the contention on the SCHED_LOCK() when the current thread is already spinning. Prompted by deraadt@, ok visa@
  (mpi, 2019-05-25, 1 file, -2/+2)
* rename struct for consistency
  (anton, 2019-05-24, 1 file, -5/+7)
* fix incorrect order of arguments
  (anton, 2019-05-24, 1 file, -3/+3)
* A source location in kubsan is an absolute path making reports quite
  long. Instead, use everything after the first /sys/ segment as the path.
  (anton, 2019-05-24, 1 file, -3/+31)
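The path shortening described here can be sketched with strstr(3). The function name is hypothetical and the committed code may differ in detail; this version returns the input unchanged when no /sys/ segment is present:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the part of path that follows the first "/sys/" segment,
 * or path itself if there is no such segment.
 */
const char *
sk_pathstrip(const char *path)
{
	static const char needle[] = "/sys/";
	const char *p;

	assert(path != NULL);

	p = strstr(path, needle);
	if (p == NULL)
		return path;
	return p + sizeof(needle) - 1;	/* skip past "/sys/" */
}
```

For example, `sk_pathstrip("/usr/src/sys/kern/kern_exit.c")` yields `kern/kern_exit.c`.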
* The latest inteldrm update brought along code making use of
  __attribute__((nonnull)); which the undefined behavior sanitizer in clang is aware of. A new handler is therefore needed in order to compile a kernel with kubsan enabled. ok visa@
  (anton, 2019-05-24, 1 file, -4/+48)
* Prevent a kernel hang if an empty message is sent over a SOCK_SEQPACKET
  socketpair. Do not wake up the receiver if there is no data available. OK claudio@ anton@
  (bluhm, 2019-05-24, 1 file, -2/+3)
* SLIST-ify the timecounter list.
  Call it "tc_list" instead of "timecounters", which is too similar to the variable "timecounter" for my taste. ok mpi@ visa@
  (cheloha, 2019-05-22, 1 file, -8/+9)
* Read and assign the integer value only once. With this sysctl_int() will
  do word loads and stores and so partial updates should no longer be observed. With this accessing global variables set by sysctl_int() should be mostly MP safe. OK dlg@ mpi@
  (claudio, 2019-05-22, 1 file, -3/+7)
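A userland sketch of the "read once, store once" pattern follows. It is not the kernel function: the name is hypothetical, memcpy(3) stands in for copyin/copyout, and whether a plain int access is a single word load or store depends on the platform and compiler.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Report the old value of *valp and optionally set a new one.
 * The shared variable is loaded exactly once into a local and, when a
 * new value is supplied, stored exactly once, so callers never observe
 * a value assembled from several partial accesses.
 */
int
sk_sysctl_int(void *oldp, size_t *oldlenp, const void *newp, size_t newlen,
    int *valp)
{
	int val;

	assert(valp != NULL);

	if (oldp != NULL && (oldlenp == NULL || *oldlenp < sizeof(int)))
		return ENOMEM;
	if (newp != NULL && newlen != sizeof(int))
		return EINVAL;
	if (oldlenp != NULL)
		*oldlenp = sizeof(int);

	val = *valp;			/* single load of the shared value */
	if (oldp != NULL)
		memcpy(oldp, &val, sizeof(val));
	if (newp != NULL) {
		memcpy(&val, newp, sizeof(val));
		*valp = val;		/* single store of the new value */
	}
	return 0;
}
```

Working on the local copy keeps all byte-wise copying away from the shared variable itself.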
* Fix uninitialized return code in adjfreq(2); CID 1480285
  ok mlarkin, otto (who both had the same diff)
  (stsp, 2019-05-21, 1 file, -2/+2)
* kern.timecounter.choices: Don't offer the dummy counter as an option.
  The dummy counter is a stopgap during boot. It is not useful after a real timecounter is attached and started and there is no reason to return to using it. So don't even offer it to the admin. This is easy: never add it to the timecounter list. It will effectively cease to exist after the first real timecounter is activated in tc_init(). In principle this means that we can have an empty timecounter list so we need to check for that case in sysctl_tc_choice(). "I don't mind" mpi@, ok visa@
  (cheloha, 2019-05-20, 1 file, -2/+5)
* include uvm.h -> uvm_extern.h; ok visa@
  (anton, 2019-05-19, 1 file, -2/+2)
* Add SMR_ASSERT_NONCRITICAL() in assertwaitok(). This eases debugging
  because now the error is detected before context switch. The sleep code path eventually calls assertwaitok() in mi_switch(), so the assertwaitok() in the SMR barrier function is somewhat redundant and can be removed. OK mpi@
  (visa, 2019-05-17, 2 files, -5/+4)
* Remove incorrect optimization. The current logic for skipping idle CPUs
  does not establish strong enough ordering between CPUs. Consequently, smr_grace_wait() might incorrectly skip a CPU and invoke an SMR callback too early. Prompted by haesbaert@
  (visa, 2019-05-16, 1 file, -21/+3)
* rework the zero warning slightly, and more completely disable until we're
  more ready to deal with the noise.
  (tedu, 2019-05-15, 1 file, -8/+9)
* Add lock order checking for smr_barrier(9). This is similar to the
  checking done in taskq_barrier(9) and timeout_barrier(9). OK mpi@
  (visa, 2019-05-14, 1 file, -1/+22)
* Add a kernel implementation of realpath() as __realpath().
  We want this so that we can stop allowing readlink() on traversed vnodes in unveil(). This includes all the kernel side and the system call. This is not yet used in libc for realpath, so nothing calls this yet. The libc wrapper will be committed later. Testing by many, and ports build by naddy@ ok deraadt@
  (beck, 2019-05-13, 7 files, -17/+160)
* When killing a process, the signal is handled by any thread that
  does not block the signal. If all threads block the signal, we delivered it to the main thread. This does not conform to POSIX. If any thread unblocks the signal, it should be delivered immediately to this thread. Mark such signals pending at the process instead of a single thread. Then any thread can handle it later. OK kettenis@ guenther@
  (bluhm, 2019-05-13, 4 files, -28/+40)
* dup2(n,n) would rlimit check before handling the n==n shortcut,
  and incorrectly return EBADF when n>curlim. ok millert guenther tedu
  (deraadt, 2019-05-13, 1 file, -6/+6)
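The ordering fix can be illustrated with a small decision function. Everything here is a simplified stand-in (the name, the `old_is_open` flag, and the exact limit comparison are assumptions made for the example); the relevant detail is that the n==n shortcut is taken before the limit check:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Decide the outcome of dup2(oldfd, newfd) for a process whose
 * descriptor limit is curlim. Returns 0 and sets *retval on success,
 * or an errno value on failure.
 */
int
sk_dup2_check(int oldfd, int newfd, int old_is_open, int curlim, int *retval)
{
	assert(retval != NULL);

	if (!old_is_open)
		return EBADF;

	/* Shortcut first: dup2(n, n) on an open n succeeds as a no-op. */
	if (oldfd == newfd) {
		*retval = newfd;
		return 0;
	}

	/* Only a genuinely new descriptor number is subject to the limit. */
	if (newfd < 0 || newfd >= curlim)
		return EBADF;

	*retval = newfd;
	return 0;
}
```

With the two checks in the opposite order, a descriptor that was opened before the limit was lowered would make dup2(n, n) fail even though nothing needs to be allocated.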
* no need to store the wmesg passed to rwsleep() as a static variable anymore
  (anton, 2019-05-12, 1 file, -3/+2)
* wxneeded binaries on wxallowed filesystems were refused execution. We have
  encountered a wxneeded binary that attempts correct operation when started on a nowxallowed filesystem (it tries mprotect with RWX, notices ENOTSUP and acts in a different way). So permit execution (but of course don't allow W^X violating mappings) ok sthen kettenis robert
  (deraadt, 2019-05-11, 1 file, -14/+1)
* make rw-lock adaptive
  OK visa@, OK mpi@
  (sashan, 2019-05-11, 1 file, -1/+29)
* Restore previous behavior of limiting deadlock detection to posix-style
  locks. ok jturner@ visa@ Reported-by: syzbot+f9f13034fd656af6c48f@syzkaller.appspotmail.com
  (anton, 2019-05-11, 1 file, -7/+8)
* socppc makes an extended visit to the bigbucket.
  ok kettenis
  (deraadt, 2019-05-11, 1 file, -2/+2)
* Reduce the number of timehands from ten to just two.
  Reduces the worst-case error for time values retrieved via the microtime(9) functions from 10 ticks to 2 ticks. Being interrupted for over a tick is unlikely but possible. While here use C99 initializers. From FreeBSD r303383. ok mpi@
  (cheloha, 2019-05-10, 1 file, -21/+10)
* If mallocing the array program header fails, give up on coredumping
  instead of panicking ok deraadt@, tedu@, mpi@
  (guenther, 2019-05-09, 1 file, -2/+4)
* Ensure that pagedaemon wakeups as a result of failed UVM_PLA_NOWAIT
  allocations will recover some memory from the dma_constraint range. The allocation still fails; the intent is to ensure that the pagedaemon will free some memory to possibly allow a subsequent allocation to succeed. This also adds a UVM_PLA_NOWAKE flag to allow special cases in the buffer cache to not wake up the pagedaemon until they want to. ok kettenis@
  (beck, 2019-05-09, 1 file, -3/+3)
* Unlock adjfreq(2), adjtime(2), clock_settime(2), and settimeofday(2).
  clock_settime(2)/settimeofday(2) still need KERNEL_LOCK for a moment when resetting the RTC, as that's done periodically from a task under KERNEL_LOCK. Not quite sure how to approach that one yet. ok visa@ mpi@, "good stuff" tedu@, "please wait until after [tree] unlock" deraadt@
  (cheloha, 2019-05-09, 4 files, -12/+14)
* Don't unconditionally throw away dma memory when we don't need to.
  Noticed by me and otto@ ok tedu@
  (beck, 2019-05-09, 1 file, -3/+5)
* Add a sysctl accessor to struct pf_status. The pf_status only holds the
  current status and statistics and can be exported without super-user rights via sysctl to make it easier for tools like systat to access those. OK deraadt@, sashan@
  (claudio, 2019-05-09, 1 file, -1/+6)
* disable stack printing for now since at least arm64 can't print them
  reported by kettenis
  (tedu, 2019-05-09, 1 file, -1/+3)
* print a few warnings when calling free with a zero size.
  let's see what falls out. ok beck deraadt kettenis mpi
  (tedu, 2019-05-08, 1 file, -3/+14)
* group function prototypes
  (anton, 2019-05-08, 1 file, -4/+3)