path: root/sys/net
Commit message  Author  Age  Files  Lines
* Bump keepalive timers unconditionally on send  [HEAD, master]  Jason A. Donenfeld  2021-10-26  1  -6/+5
| The keepalive timers -- both persistent and mandatory -- are part of the internal state machine, which needs to be cranked whether or not the packet was actually sent. A packet might be dropped by the network. Or the packet might be dropped by the local network stack. The latter case gives a hint -- which is useful for the data_sent event -- but is harmful to consider for the keepalive state machine. So, crank those timers before even calling wg_send.
|
| Incidentally, doing it this way matches exactly what Linux's send.c's wg_packet_create_data_done and Go's send.go's RoutineSequentialSender do too.
|
| Suggested-by: Kyle Evans <kevans@freebsd.org>
| Reported-by: Ryan Roosa <ryanroosa@gmail.com>
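The ordering described in this commit can be sketched roughly as follows. This is not the driver's code: `peer_sketch`, `send_sketch` and `deliver_out_sketch` are hypothetical stand-ins, and the "send" is reduced to a boolean saying whether the local stack accepted the packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-peer state; not the driver's struct. */
struct peer_sketch {
	unsigned keepalive_cranks;	/* times the keepalive state machine advanced */
	unsigned data_sent_events;	/* times a send was accepted locally */
};

/* Pretend transmit: false means the local network stack dropped the packet. */
static bool
send_sketch(bool local_stack_accepts)
{
	return local_stack_accepts;
}

static void
deliver_out_sketch(struct peer_sketch *p, bool local_stack_accepts)
{
	assert(p != NULL);

	/* Crank the keepalive timers first, whatever happens to the send. */
	p->keepalive_cranks++;

	/* Only the data_sent event takes the local send result as a hint. */
	if (send_sketch(local_stack_accepts))
		p->data_sent_events++;
}
```

With this shape, a locally dropped packet still advances the keepalive state, while the data_sent bookkeeping is skipped.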
* Delete all peer allowed IPs at once  Matt Dunwoodie  2021-04-13  1  -43/+34
| This simplifies the deletion process, so we do not require a lookup of the node before deletion.
* Merge wg_timers and wg_peer  Matt Dunwoodie  2021-04-13  1  -180/+155
| The primary motivator here is to get rid of CONTAINER_OF, which is quite an ugly macro. However, any reader should also be aware of the change from d_DISabled to p_ENabled.
* Replace timer lock with SMR  Matt Dunwoodie  2021-04-13  1  -36/+31
| The lock was not used to protect any data structures; it was purely to ensure race-free setting of t_disabled, that is, that no other thread was halfway through any wg_timers_run_* function. With smr_* we can ensure this is still the case by calling smr_barrier() after setting t_disabled.
* Run all timeouts in process context  Matt Dunwoodie  2021-04-13  1  -32/+20
| Timeouts were running in interrupt context because it was quicker. Running in process context required a `task` to be added, which we ended up doing anyway. So we might as well rely on the timeout API to do it for us.
* Use malloc instead of pool_* for infrequent allocations  Matt Dunwoodie  2021-04-13  1  -13/+6
| We can get rid of the pool overhead by using the malloc family of functions. This does lose us the ability to see directly how much each allocation is using, but if we really want that, maybe we add new malloc types? Either way, not something we need at the moment.
* Use SMR for wg_noise  Matt Dunwoodie  2021-04-13  3  -1313/+1089
| While the largest change here is to use SMR for wg_noise, this was motivated by other deficiencies in the module. Primarily, the nonce operations should be performed in serial (wg_queue_out, wg_deliver_in) and not parallel (wg_encap, wg_decap). This also brings in a lock-free encrypt and decrypt path, which is nice. I suppose other improvements are that local, remote and keypair structs are opaque, so no more reaching in and fiddling with things.
|
| Unfortunately, these changes make abuse of the API easier (such as calling noise_keypair_encrypt on a keypair retrieved with noise_keypair_lookup (instead of noise_keypair_current) as they have different checks). Additionally, we have to trust that the nonce passed to noise_keypair_encrypt is non-repeating (retrieved with noise_keypair_nonce_next), and noise_keypair_nonce_check is valid on received nonces.
|
| One area that could use a little bit more adjustment is the *_free functions. They are used to call a function once it is safe to free a parent data structure (one holding struct noise_{local,remote} *). This is currently used for lifetimes in the system and allows a consumer of wg_noise to opaquely manage lifetimes based on the reference counting of noise, remote and keypair. It is fine for now, but maybe revisit later.
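The requirement that send nonces never repeat can be illustrated with a small lock-free counter. This is a sketch under stated assumptions, not the wg_noise implementation: `keypair_sketch`, `nonce_next_sketch` and the tiny `SKETCH_NONCE_LIMIT` are invented for illustration, and C11 atomics stand in for whatever the kernel code actually uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cap on messages per key; deliberately tiny for illustration. */
#define SKETCH_NONCE_LIMIT 4u

/* Hypothetical keypair holding only the send counter. */
struct keypair_sketch {
	_Atomic uint64_t next_nonce;
};

/*
 * Hand out the next send nonce. Each caller gets a distinct value because
 * the fetch-and-add is atomic. Returns false once the limit is reached,
 * at which point the caller must stop encrypting with this key.
 * (Wrap-around of the 64-bit counter itself is ignored in this sketch.)
 */
static bool
nonce_next_sketch(struct keypair_sketch *kp, uint64_t *nonce)
{
	uint64_t n = atomic_fetch_add(&kp->next_nonce, 1);

	if (n >= SKETCH_NONCE_LIMIT)
		return false;
	*nonce = n;
	return true;
}
```

The point of the sketch is the trust boundary the commit message mentions: an encrypt routine that accepts a caller-supplied nonce is only safe if every nonce comes from a generator like this one.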
* Check iter != NULL  Matt Dunwoodie  2021-04-13  1  -2/+2
| The problem with checking peer != NULL is that we already dereference iter to get i_value. This is what was caught in the index == 0 bug reported on bugs@. Instead, we should assert that iter != NULL. This is likely to be removed when adjusting wg_noise.c in the not too distant future.
* Allow setting keepalive while interface is down  Matt Dunwoodie  2021-04-13  1  -3/+4
|
* Rework encap/decap routines  Matt Dunwoodie  2021-04-13  1  -87/+84
| This will make further work on in-place decryption a lot easier. Additionally, it improves readability, as we can get rid of the difficult _len variables. The copy in and out of wg_pkt_data is also a cleaner solution than memcpy'ing nonces and whatnot.
* Replace wg_tag with wg_packet  Matt Dunwoodie  2021-04-04  1  -291/+292
| I'll be the first to admit (but not the first to complain) about the wg_tag situation. It made it very difficult to manage mbufs (that may be reallocated with functions such as m_pullup). It was also not clear where allocation was occurring.
|
| This also gets rid of the ring buffers in wg_softc, which added no performance in this situation. They also used memory unnecessarily and increased the complexity.
|
| I also used this opportunity to get rid of the confusing t_mbuf/t_done situation and revert to a more understandable UNCRYPTED/CRYPTED/DEAD packet state. I don't believe there were any issues with the old style, but improving readability is always welcome.
|
| With these changes we can start encrypting packets in place (rather than copying to a new mbuf), which should increase performance. This also simplifies length calculations by using m_* functions and reading the pkthdr length.
* Count all handshake packets  Matt Dunwoodie  2021-04-04  1  -2/+1
|
* Satisfy my ordering of struct elements and prototypes  Matt Dunwoodie  2021-04-04  1  -3/+3
|
* Expand on key clearing message  Matt Dunwoodie  2021-04-04  1  -1/+3
|
* Error out if peer provided without public key  Matt Dunwoodie  2021-04-04  1  -2/+4
|
* Ensure a peer has a consistent PSK (if set when creating)  Matt Dunwoodie  2021-04-04  3  -12/+13
|
* Add noise_local_deinit to zero private keys  Matt Dunwoodie  2021-04-04  3  -0/+10
|
* Push kernel lock within rtable_add(9) and rework it to return 0 in the  mvs  2021-03-26  2  -10/+14
| case when the requested table already exists.
|
| Except at initialization time, route_output() and if_createrdomain() are the only paths where we call rtable_add(9). We check for the requested table with rtable_exists(9), and it is not an error condition if the table exists. Otherwise we try to create the requested table with rtable_add(9). Those paths are kernel locked, so a concurrent thread can't create the requested table just after the rtable_exists(9) check. Also, rtable_add(9) has an internal rtable_exists(9) check, and in this case table existence is treated as an EEXIST error. This error path is never reached.
|
| We are going to unlock PF_ROUTE sockets. This means route_output() will not be serialized with if_createrdomain(), and a concurrent thread could create the requested table. The table existence check and creation should be serialized, and it makes sense to do this within rtable_add(9). For now the kernel lock is used for this, so it is pushed down to rtable_add(9). The internal rtable_exists(9) check was modified, and table existence is no longer an error. Since the external rtable_exists(9) check is useless, it was removed from if_createrdomain(). It still exists in the route_output() path because the logic is more complicated there.
|
| ok mpi@
* Push kernel lock down to rt_setsource() to make `ifa' dereference safe.  mvs  2021-03-26  1  -3/+10
| Netlock doesn't make sense here because ifa_ifwithaddr() holds the kernel lock while it performs the list walkthrough.
|
| This was made to decrease the future diff for PF_ROUTE sockets unlocking. For now the kernel lock is still held while we perform rt_setsource().
|
| ok mpi@
* Only install route with label, fix route leak on destroy  kn  2021-03-26  3  -3/+15
| "ifconfig mp* mplslabel N" validates the label both in ifconfig(8) and each driver's ioctl handler, but there is one case where all drivers install a route without looking at the label at all.
|
| SIOCSLIFPHYRTABLE in all three drivers just validates the rdomain and sets the label to itself (0) such that the route is (re)installed accordingly.
|
| None of the drivers' helper functions dealing with labels and routes validate labels themselves but instead expect the callers, e.g. the ioctl handler, to do so.
|
| That means we can install routes for the explicit NULL label in non-default routing tables but are never able to clean them up without reboot.
|
| Fix this by adding the inverse of mp*_clone_destroy()'s label check to the routines installing the MPLS route to avoid bogus ones in the first place.
|
| OK claudio
* wg(4): fix race between tx/rx handshakes, from Matt Dunwoodie, ok mpi@  sthen  2021-03-21  1  -5/+4
| "There is a race between sending/receiving handshake packets. This occurs if we consume an initiation, then send an initiation prior to replying to the consumed initiation.
|
| In particular, when consuming an initiation, we don't generate the index until creating the response (which is incorrect). If we attempt to create an initiation between these processes, we drop any outstanding handshake which in this case has index 0 as set when consuming the initiation.
|
| The fix attached is to generate the index when consuming the initiation so that any spurious initiation creation can drop a valid index. The patch also consolidates setting fields on the handshake."
* RFC 8981 allows the configuration of only temporary IPv6 addresses.  florian  2021-03-20  1  -4/+8
| Make the interface come up when the IFXF_AUTOCONF6TEMP is set.
| OK kn
* When changing the link local address send a RTM_IFINFO message out.  claudio  2021-03-18  1  -2/+4
| Also prefer if (error == 0) over if (!error).
| OK florian@ bluhm@
* Do not call rtm_ifchg() if IFF_UP changed. The code in if_up() and if_down()  claudio  2021-03-18  1  -3/+6
| already calls rtm_ifchg() and so this would just result in a duplicate message.
| Noticed by deraadt@. OK florian@ bluhm@
* Like in the sysctl case include the ifp_sadl as RTA_IFP address in RTM_IFINFO  claudio  2021-03-18  1  -3/+6
| messages. This way userland can detect if the lladdr of an interface was changed.
| OK florian@ bluhm@
* Fix SIOCDELLABEL/"ifconfig mpe0 -mplslabel" to unset label completely  kn  2021-03-18  1  -2/+2
| While the corresponding route gets removed properly, the driver's softc kept the old label, i.e. "ifconfig mpe0" would show "mpls: label 42" instead of "mpls: label (unset)" even though it was unset.
|
| OK claudio
* Make "ifconfig mpw0 -mplslabel" work  kn  2021-03-17  1  -1/+4
| Code is there, no one ever used it, I guess. This makes ifconfig(8) documentation actually hold true.
|
| OK claudio
* Use correct rdomain when adding/deleting routes  kn  2021-03-17  2  -7/+7
| mpip(4) always adds and deletes routes in rdomain 0 regardless of the `tunneldomain', i.e. the `sc_rdomain' value.
|
| mpw(4) adds routes with the specified rdomain but always deletes them in rdomain 0.
|
| mpe(4) consistently uses the softc's rdomain which is tracked consistently across the various ioctls -- no fix needed.
|
| Found while reading the code and testing ifconfig(8)'s "tunneldomain" in order to document MPLS ioctls.
|
| OK claudio
* Hide kernel internals from userland by wrapping more bits in _KERNEL blocks.  claudio  2021-03-17  1  -1/+6
| Especially the includes of net/rtable.h and sys/queue.h are problematic.
| OK florian@
* When RFC 8981 obsoleted RFC 4941 the terminology changed from  florian  2021-03-11  1  -2/+2
| "privacy extensions" to "temporary address extensions"
|
| Change ifconfig(8) to output temporary after temporary addresses and add "temporary" option which is an alias for autoconfprivacy for now.
|
| Also make AUTOCONF6TEMP a positive flag that is set by default. Previously the negative flag "INET6_NOPRIVACY" was set when privacy addresses were disabled. This makes the flags output less ugly and will allow us to disable autoconf addresses while having temporary addresses enabled in the future.
|
| More work is needed in slaacd.
|
| input benno, jmc, deraadt
| previous version OK benno
| OK jmc, kn
* There is no need to try to attach IPv6 to an interface when the  florian  2021-03-11  1  -2/+3
| AUTOCONF6 flag is already set. This is likely a leftover from when we sent router solicitations from the kernel. This was a way to trigger sending a solicitation from userland.
| OK kn
* If the AUTOCONF4 or AUTOCONF6 flags get enabled, force the interface up.  deraadt  2021-03-11  1  -23/+34
| ok florian claudio
* spelling  jsg  2021-03-10  25  -59/+59
| ok gnezdo@ semarie@ mpi@
* Issuing FIOSETOWN and TIOCSPGRP ioctl commands on a tun(4) device leaks  anton  2021-03-09  1  -2/+3
| device references causing a hang while trying to remove the same interface since the reference count will never reach zero. Instead of returning, break out of the switch in order to ensure that tun_put() gets called.
|
| ok deraadt@ mvs@
|
| Reported-by: syzbot+2ca11c73711a1d0b5c6c@syzkaller.appspotmail.com
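The class of bug fixed here, an early return that skips the matching release, can be sketched as below. Everything in the block is hypothetical (`dev_sketch`, `dev_get`, `dev_put`, the command constants); it only mirrors the get/switch/put shape the message describes, not the tun(4) code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical device with a reference count; not the tun(4) softc. */
struct dev_sketch {
	int refs;
	int owner;
};

/* Hypothetical command numbers for illustration only. */
enum { CMD_SETOWN_SKETCH = 1, CMD_OTHER_SKETCH = 2 };

static void dev_get(struct dev_sketch *d) { d->refs++; }
static void dev_put(struct dev_sketch *d) { d->refs--; }

static int
ioctl_sketch(struct dev_sketch *d, int cmd, int arg)
{
	int error = 0;

	assert(d != NULL);
	dev_get(d);		/* reference taken on entry */

	switch (cmd) {
	case CMD_SETOWN_SKETCH:
		d->owner = arg;
		break;		/* a `return 0;` here would skip dev_put() */
	default:
		error = ENOTTY;
		break;
	}

	dev_put(d);		/* single exit path releases the reference */
	return error;
}
```

Funnelling every case through one exit keeps the count balanced, so a later teardown that waits for the count to drain is not left waiting forever.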
* Shorten the if_cloners_lock name preventing it from being truncated in  anton  2021-03-09  1  -2/+2
| the top(1) wait column.
|
| ok mvs@
* use uint64_t ethernet addresses for compares in carp.  dlg  2021-03-07  1  -2/+2
| pass the uint64_t that ether_input has already converted from a real ethernet address into carp_input so it can use it without having to do its own conversion.
|
| tested by hrvoje popovski
| tested by me on amd64 and sparc64
| ok patrick@ jmatthew@
* ansi  jsg  2021-03-05  2  -88/+42
|
* pass the uint64_t dst ethernet address from ether_input to bridges.  dlg  2021-03-05  5  -27/+23
| tested on amd64 and sparc64.
* work with 64bit ethernet addresses in ether_input().  dlg  2021-03-05  1  -9/+10
| this applies the tricks with addresses from veb and etherbridge code to the normal ethernet input processing. it basically loads the destination address from the packet and the interface ethernet address into uint64_ts for comparison.
|
| tested by hrvoje popovski and chris cappuccio
| tested here on amd64, arm64, and sparc64
| ok claudio@ jmatthew@
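The idea of comparing Ethernet addresses as integers can be sketched as follows. The helper names are hypothetical and the byte order chosen here (first octet in the most significant position) is an assumption for illustration; the kernel's own conversion routine may differ.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack a 6-byte Ethernet address into the low 48 bits of a uint64_t,
 * first octet most significant. Hypothetical helper, not the kernel's.
 */
static uint64_t
mac_to_u64_sketch(const uint8_t mac[6])
{
	uint64_t v = 0;

	for (int i = 0; i < 6; i++)
		v = (v << 8) | mac[i];
	return v;
}

/*
 * The group (multicast) bit is the low bit of the first octet, which
 * lands at bit 40 with the packing above.
 */
static int
mac_is_multicast_sketch(uint64_t v)
{
	return (int)((v >> 40) & 1);
}
```

Once both the packet's destination and the interface address are in this form, an address match is a single integer comparison instead of a 6-byte memcmp, and the converted value can be passed along to later stages rather than recomputed.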
* clean up span ports as span ports, not bridge ports.  dlg  2021-03-03  1  -3/+2
| the visible result of this is that span ports aren't made promisc like bridge ports. when cleaning up a span port, trying to take promisc off it screwed up the refs, and it makes the underlying interface not able to be promisc when it should be promisc.
|
| found by dave voutila
* fix an assert in veb_p_ioctl() that failed when called by a span port.  dlg  2021-03-02  1  -3/+4
| veb_p_ioctl() is used by both veb bridge and veb span ports, but it had an assert to check that it was being called by a veb bridge port. this extends the check so using it on a span port doesn't cause a panic.
|
| found by dave voutila
* include of netinet/in.h here is incorrect, because net/route.h will pull  deraadt  2021-03-02  1  -2/+1
| excessive types into scope.
| ok claudio
* Refactor ip_fragment() and ip6_fragment(). Use a mbuf list to  bluhm  2021-03-01  3  -76/+51
| simplify the handling of the fragment list. Now the functions ip_fragment() and ip6_fragment() always consume the mbuf. They free the mbuf and mbuf list in case of an error and take care of the counter. Adjust the code a bit to make v4 and v6 look similar. Fixes a potential mbuf leak when pf_route6() called pf_refragment6() and it failed. Now the mbuf is always freed by ip6_fragment().
| OK dlg@ mvs@
* big numbers need suffixes on some platforms. fix LACP_ADDR_SLOW_E64.  dlg  2021-02-28  1  -2/+2
| deraadt@ says i broke hppa :(
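The commit does not show the diff, so the following is only a guess at the kind of change involved: giving a 48-bit constant an explicit `ULL` suffix so that it has a 64-bit type on every platform. One common reason, offered as an assumption, is that older compilers on 32-bit targets complain when an unsuffixed constant does not fit in `long`. The macro name below is hypothetical; the value is the IEEE slow protocols multicast address 01:80:c2:00:00:02.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical macro name. The ULL suffix forces an unsigned 64-bit
 * constant regardless of how wide `long` is on the target.
 */
#define SLOW_ADDR_E64_SKETCH	0x0180c2000002ULL

/* Compare a destination address, already packed into a uint64_t. */
static int
is_slow_addr_sketch(uint64_t dst)
{
	return dst == SLOW_ADDR_E64_SKETCH;
}
```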
* Rework route_input() and rtm_sendup(). While we perform foreach loop  mvs  2021-02-27  1  -32/+12
| in route_input() we drop solock() after we have checked the socket state. We pass the mbuf(9) to this socket in a later iteration, while it is referenced as `last'. The socket's state could be changed by a concurrent thread while it is not locked.
|
| Since we now perform the socket checks and output in the same iteration, the logic which avoided an mbuf(9) chain copy for the last socket in the list was removed.
|
| ok bluhm@ claudio@
* trim some code i accidentally left in the nvgre add address function  dlg  2021-02-27  1  -4/+1
|
* recover scope from v6 nvgre endpoint addresses for userland to look at.  dlg  2021-02-27  1  -2/+2
|
* put the mac addr into a uint64_t to compare it to the ethernet slow addr.  dlg  2021-02-27  1  -5/+9
| also do the ethertype comparison before the conversion above.
* only store the current time on address table entries if it changes.  dlg  2021-02-26  1  -3/+6
| this avoids unnecessary writes to memory. it helps a little bit with a single nettq, but we get a lot more of a boost in pps when running concurrently.
|
| thanks to hrvoje for testing.
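A minimal sketch of the "only write when the value changes" pattern is shown below. The struct and function names are hypothetical, and the `writes` counter exists only so the effect is observable; the real address table entry is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address table entry; only the timestamp matters here. */
struct entry_sketch {
	int64_t last_seen;	/* coarse timestamp, e.g. seconds */
	unsigned writes;	/* instrumentation for this sketch only */
};

/*
 * Refresh the entry's timestamp. Reading and comparing first means that
 * packets arriving within the same tick leave the entry's memory untouched.
 */
static void
touch_sketch(struct entry_sketch *e, int64_t now)
{
	assert(e != NULL);

	if (e->last_seen != now) {
		e->last_seen = now;
		e->writes++;
	}
}
```

A plausible reason this helps more under concurrency, stated as general background rather than something the commit claims, is that a cache line which is only read can stay shared between CPUs, whereas every store forces the other CPUs' copies to be invalidated.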
* tpmr can use the eth64 bits too.  dlg  2021-02-26  1  -9/+5
|