path: root/sys/sys
Commit message [author, date, files changed, lines removed/added]
* When pledged with "fattr", allow chown to supplementary groups. This came out of a discussion regarding "sort foo -o foo". ok semarie
  [deraadt, 2015-10-14, 1 file, -1/+2]
* Add some newer DT_* and DF_* constants
  ok kettenis@ miod@
  [guenther, 2015-10-13, 1 file, -1/+17]
* I forgot execve would go through the namei codepath, so a program marked "stdio rpath" would fail to execve. Pre-indicate exec actions to the namei checker to allow them through. ok semarie
  [deraadt, 2015-10-10, 1 file, -3/+4]
* Rename tame() to pledge(). This interface has evolved to be more strict than anticipated. It allows a programmer to pledge/promise/covenant that their program will operate within an easily defined subset of the Unix environment, or it pays the price.
  [deraadt, 2015-10-09, 1 file, -9/+9]
* sync
  [deraadt, 2015-10-09, 2 files, -8/+8]
* Rename tame() to pledge(). This interface has evolved to be more strict than anticipated. It allows a programmer to pledge/promise/covenant that their program will operate within an easily defined subset of the Unix environment, or it pays the price.
  [deraadt, 2015-10-09, 3 files, -100/+100]
* Expose a small set of multicast join operators under the request "mcast". This will be used by a few daemons. If they lack this feature, then they would need to operate without tame. Discussed with renato
  [deraadt, 2015-10-08, 1 file, -1/+2]
* steal some padding in mbuf pkthdrs to store a flow id.
  the flowid roughly identifies a flow or connection that the mbuf is a part of, and can be used instead of hashing contents of the packet (like src+dst mac and ip addresses) to decide which path a packet should take.
  ok mpi@ mikeb@ sthen@
  [dlg, 2015-10-08, 1 file, -2/+6]
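  A minimal sketch of the idea in the message above, assuming a packet header that carries a flow id plus a validity flag. The struct layout and the names `ex_pkthdr` and `ex_pick_path` are hypothetical and are not taken from the commit; the sketch only shows a flow id replacing a content hash when choosing among `npaths` paths.

```c
#include <stdint.h>

/* Hypothetical packet header: field names are illustrative only. */
struct ex_pkthdr {
	uint16_t flowid;       /* roughly identifies the flow or connection */
	int      flowid_valid; /* non-zero if flowid has been set */
};

/*
 * Choose one of npaths output paths (npaths must be > 0).
 * If the packet carries a flow id, use it directly; otherwise
 * fall back to a hash the caller computed from packet contents.
 */
static unsigned int
ex_pick_path(const struct ex_pkthdr *ph, uint32_t fallback_hash,
    unsigned int npaths)
{
	uint32_t key = ph->flowid_valid ? ph->flowid : fallback_hash;

	return key % npaths;
}
```

  Packets of the same flow carry the same id, so they map to the same path without the headers being parsed again.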
* Split out routing sysctl's from tame "inet", and put them into the new tame "route" request. Now routing daemons and tools (such as arp), can narrowly ask for either feature. One thing remains available in both cases -- support for getifaddr()'s, since libc and programs often use that in close association with socket creation.
  ok benno sthen beck, some discussion with renato
  [deraadt, 2015-10-07, 1 file, -1/+2]
* Initialize the routing table before domains.
  The routing table is not an optional component of the network stack and initializing it inside the "routing domain" requires some ugly introspection in the domain interface.
  This puts the rtable* layer at the same level as the if* layer. These two subsystems are organized around the two global data structures used in the network stack:
  - the global &ifnet list, to be used in process context only, and
  - the routing table which can be read in interrupt context.
  This change makes the rtable_* layer domain-aware and extends the "struct domain" such that INET, INET6 and MPLS can specify the length of the binary key used in lookups. This allows us to keep, or move towards, AF-free route and rtable layers.
  While here stop the madness and pass the size of the maximum key length in *bytes* to rn_inithead0().
  ok claudio@, mikeb@
  [mpi, 2015-10-07, 1 file, -3/+3]
* Add the tame "exec" request. This allows processes which request "exec" to call execve(2), potentially fork(2) beforehand if they asked for "proc". Calling execve is what "shells" (ksh, tmux, etc) have as their primary purpose. But meantime, if such a shell has a nasty bug, we want to prevent the process from opening a socket or calling 100+ other system calls. Unfortunately silver bullets are in short supply, so if our goal is to stay in a POSIX-y environment, we have to let shells call execve(). POSIX ate the world, so what choices do we all have?
  Warning for many: silver bullets are even more rare in other OS ecosystems, so please accept this as a narrow lowering of the bar in a very raised environment.
  Committed from a machine running tame "proc exec" ksh, make, etc.
  [deraadt, 2015-10-07, 1 file, -1/+3]
* oops, mistaken commit, spotted by naddy
  [deraadt, 2015-10-06, 1 file, -1/+3]
* Add new "tty" request, which allows TIOCGETA, TIOCGPGRP, TIOCGWINSZ, TIOCSBRK, TIOCCDTR, TIOCSETA, TIOCSETAW, and TIOCSETAF on tty vnodes. This helps programs which call tcsetattr(), tcgetattr(), or readpassphrase(). Especially the latter - tame's goal is to satisfy the libc requirements of security-sensitive programs.
  Remove TIOCSETAF from the basic "ioctl" request, because it is a "set" option. "ioctl" is slowly turning into a "request information, cannot set options" package.
  Split the "cmsg" request into "sendfd" and "recvfd". Non-SCM_RIGHTS messages are currently flowing through freely and we'll need to think about that. This split lets us more strictly describe what our many fd-passing programs will do.
  [deraadt, 2015-10-06, 2 files, -7/+8]
* Rework the tame cmsg handler to make it work both ways. On recv there is one mbuf blob with all the cmsgs inside, while on send the cmsgs are in an mbuf chain, one mbuf per message. Adjust the calls accordingly. Putting it in so deraadt@ can move forward.
  [claudio, 2015-10-06, 1 file, -3/+4]
* struct knote's kn_sdata needs to be the same type as struct kevent's data
  ok deraadt@
  [guenther, 2015-10-06, 1 file, -2/+2]
* regen
  [kettenis, 2015-10-02, 2 files, -4/+4]
* Add ktracing of argv and envp to execve(2), with envp not traced by default
  ok tedu@ deraadt@
  [guenther, 2015-10-02, 1 file, -1/+11]
* implement new "prot_exec" tame(2) request:
  - by default, a tamed program does not have the possibility to use PROT_EXEC for mmap(2) or mprotect(2)
  - for that, use the request "prot_exec" (that could be dropped later)
  initial idea from deraadt@ and kettenis@
  "make complete sense" beck@
  ok deraadt@
  [semarie, 2015-09-30, 1 file, -1/+2]
* Track size of an opaque allocation to pass to free() later
  ok guenther tedu
  [deraadt, 2015-09-28, 1 file, -1/+2]
* regen
  [tedu, 2015-09-26, 2 files, -4/+4]
* Move declaration of readdisksector() to disklabel.h. This makes it available to other areas of the kernel suffering from an overburden of buf tweaking to read a disk sector.
  ok mpi@
  [krw, 2015-09-24, 1 file, -1/+3]
* remove lockmgr_printinfo stubs. from Martin Natano
  [tedu, 2015-09-23, 1 file, -3/+1]
* implement SRPL_INSERT_AFTER_LOCKED.
  i thought i'd committed this at l2k15. sorry for the delay.
  [dlg, 2015-09-18, 1 file, -1/+18]
* sync
  [guenther, 2015-09-13, 2 files, -8/+8]
* Remove unused and incorrect defines GPT_PARTSPERSEC and GPT_SECOFFSET.
  [krw, 2015-09-13, 1 file, -4/+1]
* Move prototype for spoofgptlabel() from disklabel.h to subr_disk.c. It's a helper function for readdoslabel(). Not something called outside of subr_disk.c.
  [krw, 2015-09-13, 1 file, -3/+1]
* Rename readgptlabel() to spoofgptlabel() because that's what we really want it to do. Handle all the actual disklabel reading in readdoslabel(). Makes the code much simpler to understand.
  ok deraadt@
  [krw, 2015-09-13, 1 file, -3/+3]
* Introduce sched_barrier(9), an interface that acts as a scheduler barrier in the sense that it guarantees that the specified CPU went through the scheduler. This also guarantees that interrupt handlers running on that CPU will have finished when sched_barrier() returns.
  ok miod@, guenther@
  [kettenis, 2015-09-13, 1 file, -1/+4]
* tweak ordering slightly
  [dlg, 2015-09-13, 1 file, -3/+3]
* sys/syscall_mi is only included by MD trap.c files, which have reason to include param.h/systm.h/proc.h themselves (and already do).
  ok guenther
  [deraadt, 2015-09-12, 1 file, -4/+1]
* back out refcnt for dv_ref, there's too many hand crafted devices all over the tree.
  much encouragement from l2k15
  [dlg, 2015-09-11, 1 file, -3/+2]
* make srp use refcnts so it can use refcnt_finalize instead of sleep_setup/sleep_finish.
  [dlg, 2015-09-11, 1 file, -3/+5]
* use refcnts for the device reference counts as an example of how refcnt(9) can be used.
  [dlg, 2015-09-11, 1 file, -2/+3]
* introduce a wrapper around reference counts called refcnt.
  it's basically atomic inc/dec, but it includes magical sleep code in refcnt_finalize that is better written once than many times. refcnt_finalize sleeps until all references are released and does so with sleep_setup and sleep_finish, which is fairly subtle.
  putting this in now so we can get on with work in the stack, a proper discussion about visibility and how available intrinsics should be in the kernel can happen after next week.
  with help from guenther@
  ok guenther@ deraadt@ mpi@
  [dlg, 2015-09-11, 1 file, -0/+41]
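  The "atomic inc/dec" part of the message above can be sketched in user space with C11 atomics. This is an illustration under stated assumptions, not the kernel code: the `ex_` names are hypothetical, and the sleeping behaviour of refcnt_finalize (waiting until all references are released) is deliberately left out.

```c
#include <stdatomic.h>

/* Hypothetical user-space stand-in for a reference count. */
struct ex_refcnt {
	atomic_uint refs;
};

/* A new object starts with one reference, held by its creator. */
static void
ex_refcnt_init(struct ex_refcnt *r)
{
	atomic_init(&r->refs, 1);
}

/* Take an additional reference. */
static void
ex_refcnt_take(struct ex_refcnt *r)
{
	atomic_fetch_add(&r->refs, 1);
}

/*
 * Release one reference.
 * Returns 1 if this call dropped the last reference, 0 otherwise.
 */
static int
ex_refcnt_rele(struct ex_refcnt *r)
{
	/* atomic_fetch_sub returns the value held before the decrement */
	return atomic_fetch_sub(&r->refs, 1) == 1;
}
```

  A caller that sees `ex_refcnt_rele()` return 1 knows no other reference remains and may free the object.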
* Convert _TM_ flags to TAME_ flags, collapsing the entire mapping layer because the strings select the right options. Mechanical conversion.
  ok guenther
  [deraadt, 2015-09-11, 1 file, -37/+20]
* Make room for media types of the future. Extend the ifmedia word to 64 bits. This changes numbers of the SIOCSIFMEDIA and SIOCGIFMEDIA ioctls and grows struct ifmediareq.
  Old ifconfig and dhclient binaries can still assign addresses, however the 'media' subcommand stops working. Recompiling ifconfig and dhclient with new headers before a reboot should not be necessary unless in very special circumstances where non-default media settings must be used to get link and console access is not available.
  There may be some MD fallout but that will be cleared up later.
  ok deraadt miod
  with help and suggestions from several sharks attending l2k15
  [stsp, 2015-09-11, 1 file, -3/+4]
* Only include <sys/tame.h> in the .c files that need it
  ok deraadt@ miod@
  [guenther, 2015-09-11, 2 files, -3/+4]
* Change device locators type from int to long, for the sake of 64-bit ports without proper device trees.
  Be sure to build and install config(8) and rerun it before attempting to build a kernel.
  ok kettenis@ deraadt@ jasper@ visa@
  [miod, 2015-09-11, 1 file, -2/+2]
* kqueue(2) support for wsmouse(4), wskbd(4) and wsmux(4).
  Needed for libinput port.
  ok guenther@, miod@
  [mpi, 2015-09-10, 1 file, -3/+3]
* Now that the GPT code tries really hard not to get in the way and accidentally capture disks ...
  Eliminate kernel option GPT and associated #ifdef GPT/#endif. Let everybody get on the GPT bandwagon and we'll see what wheels fly off.
  Requested by & ok deraadt@
  [krw, 2015-09-10, 1 file, -3/+1]
* sync
  [deraadt, 2015-09-09, 2 files, -6/+6]
* Move to next tame() API. The flags are now passed as a very simple string, which results in tame() code placements being much more recognizable. tame() can be moved to unistd.h and does not need cpp symbols to turn the bits on and off. The resulting API is a bit unexpected, but simplifies the mapping to enabling bits in the kernel substantially.
  vague ok's from various including guenther doug semarie
  [deraadt, 2015-09-09, 1 file, -10/+6]
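  A rough user-space sketch of how a space-separated request string can map onto flag bits, as the message above describes. The request names "stdio", "rpath", "proc" and "exec" appear elsewhere in this log; the bit values, the table, and the function `ex_parse_requests` are assumptions made for illustration and do not reproduce the kernel's mapping.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits; the real kernel values are not given in this log. */
#define EX_REQ_STDIO 0x01u
#define EX_REQ_RPATH 0x02u
#define EX_REQ_PROC  0x04u
#define EX_REQ_EXEC  0x08u

struct ex_request {
	const char *name;
	uint32_t    flag;
};

static const struct ex_request ex_requests[] = {
	{ "stdio", EX_REQ_STDIO },
	{ "rpath", EX_REQ_RPATH },
	{ "proc",  EX_REQ_PROC  },
	{ "exec",  EX_REQ_EXEC  },
};

/*
 * Parse a space-separated request string into a bitmask.
 * Returns 0 and stores the mask in *out on success;
 * returns -1 (leaving *out untouched) if a word is unknown.
 */
static int
ex_parse_requests(const char *s, uint32_t *out)
{
	const size_t n = sizeof(ex_requests) / sizeof(ex_requests[0]);
	uint32_t mask = 0;

	while (*s != '\0') {
		size_t len, i;

		while (*s == ' ')	/* skip separators */
			s++;
		if (*s == '\0')
			break;
		len = strcspn(s, " ");	/* length of the current word */
		for (i = 0; i < n; i++) {
			if (strlen(ex_requests[i].name) == len &&
			    strncmp(s, ex_requests[i].name, len) == 0)
				break;
		}
		if (i == n)
			return -1;	/* unknown request word */
		mask |= ex_requests[i].flag;
		s += len;
	}
	*out = mask;
	return 0;
}
```

  Because the strings select the bits directly, callers need no preprocessor symbols: a call such as `ex_parse_requests("stdio rpath", &mask)` is the whole interface.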
* implement a singly linked list built with SRPs.
  this allows us to build lists of things that can be followed by multiple cpus.
  ok mpi@ claudio@
  [dlg, 2015-09-09, 1 file, -1/+106]
* Give the pool page allocator backends more sensible names. We now have:
  * pool_allocator_single: single page allocator, always interrupt safe
  * pool_allocator_multi: multi-page allocator, interrupt safe
  * pool_allocator_multi_ni: multi-page allocator, not interrupt-safe
  ok deraadt@, dlg@
  [kettenis, 2015-09-08, 1 file, -2/+2]
* Delete ktracing of context switches: it's unused, and not particularly useful, and doing VOP_WRITE() from inside tsleep/msleep makes the locking too complicated, making it harder to move forward on MP changes.
  ok deraadt@ kettenis@
  [guenther, 2015-09-07, 1 file, -12/+1]
* These days pcc defines __GNUC__ and we don't support gcc2. Also needed for upcoming CompCert port.
  Final version of the diff is from kettenis@ with input from jsg@ and tedu@.
  ok kettenis@, jsg@, "I agree" millert@
  [daniel, 2015-09-04, 1 file, -3/+4]
* Make every subsystem using a radix tree call rn_init() and pass the length of the key as argument.
  This way every consumer of the radix tree has a chance to explicitly initialize the shared data structures and no longer rely on another subsystem to do the initialization.
  As a bonus ``dom_maxrtkey'' is no longer used and dies.
  ART kernels should now be fully usable because pf(4) and IPSEC properly initialize the radix tree.
  ok chris@, reyk@
  [mpi, 2015-09-04, 1 file, -2/+1]
* mattieu baptiste reported a problem with bpf+srps where the per cpu hazard pointers were becoming corrupt and causing panics.
  the problem turned out to be that bridge_input calls if_input on behalf of a hardware interface which then calls bpf_mtap at splsoftnet, while the actual hardware nic calls if_input and bpf_mtap at splnet. the hardware interrupts ran in the middle of the bpf calls that bridge runs at softnet. this means the same srps are being entered and left on the same cpu at different ipls, which led to races because of the order of operations on the per cpu hazard pointers.
  after a lot of experimentation, jmatthew@ figured out how to deal with this problem without introducing per cpu critical sections (ie, splhigh calls) in srp_enter and srp_leave, and without introducing atomic operations.
  the solution is to iterate forward through the array of hazard pointers in srp_enter, and backward in srp_leave to clear. if you guarantee that you leave srps in the reverse order to entering them, then you can use the same set of SRPs at different IPLs on the same CPU.
  the ordering requirement is a problem if we want to build linked data structures out of srps because you need to hold a ref to the current element containing the next srp to use it, before giving up the current ref. we're adding srp_follow() to support taking the next ref and giving up the current one while preserving the structure of the hazard pointer list. srp_follow() does this by reusing the hazard pointer for the current reference for the next ref.
  both mattieu baptiste and jmatthew@ have been hitting this pretty hard with a tweaked version of srp+bpf that uses srp_follow instead of interleaved srp_enter/srp_leave sequences. neither can reproduce the panics anymore.
  thanks to mattieu for the report and tests
  ok jmatthew@
  [dlg, 2015-09-01, 1 file, -1/+3]
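  The slot discipline described above (scan forward to enter, scan backward to leave, reuse the current slot to follow) can be illustrated with a single-threaded toy model. This is only a sketch: the `ex_` names and the array size are hypothetical, a slot here records just one pointer, and none of the memory ordering or interrupt interaction that the real srp code has to handle is modelled.

```c
#include <stddef.h>

#define EX_NHAZARDS 4	/* hypothetical number of per-CPU hazard slots */

/* Toy stand-in for the per-CPU hazard pointer array. */
struct ex_cpu {
	const void *hazards[EX_NHAZARDS];
};

/* Enter: scan forward and claim the first free slot. Returns its index or -1. */
static int
ex_enter(struct ex_cpu *ci, const void *ref)
{
	for (int i = 0; i < EX_NHAZARDS; i++) {
		if (ci->hazards[i] == NULL) {
			ci->hazards[i] = ref;
			return i;
		}
	}
	return -1;
}

/* Leave: scan backward and clear the last slot holding ref. */
static int
ex_leave(struct ex_cpu *ci, const void *ref)
{
	for (int i = EX_NHAZARDS; i > 0; i--) {
		if (ci->hazards[i - 1] == ref) {
			ci->hazards[i - 1] = NULL;
			return i - 1;
		}
	}
	return -1;
}

/*
 * Follow: swap the current reference for the next one in the same slot,
 * so the order of occupied slots is preserved.
 */
static int
ex_follow(struct ex_cpu *ci, const void *cur, const void *next)
{
	for (int i = EX_NHAZARDS; i > 0; i--) {
		if (ci->hazards[i - 1] == cur) {
			ci->hazards[i - 1] = next;
			return i - 1;
		}
	}
	return -1;
}
```

  In this model, a list walk that calls `ex_follow()` at each step keeps occupying one slot, whereas interleaving `ex_enter()` on the next element with `ex_leave()` on the current one would release slots out of reverse order.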
* Use a global table for domains instead of building a list at run time.
  As a side effect there's no need to run if_attachdomain() after the list of domains has been built.
  ok claudio@, reyk@
  [mpi, 2015-08-30, 1 file, -3/+2]
* Rework the UNIX domain socket garbage collector, including ideas from {Free,Net}BSD
  - when a socket is closed with fds in its input, defer closing them to a task to avoid recursing. This eliminates the complicated extra reference taking which had a 37 line(!) comment explanation
  - move flags, counts, and links only needed for this from struct file to struct unpcb
  - document the flow of the mark/sweep collector
  much help from claudio@ who made me explain the GC to him until we trusted it
  ok claudio@ mpi@ deraadt@
  [guenther, 2015-08-28, 2 files, -9/+12]