path: root/sys/netinet
Commit message | Author | Age | Files | Lines
* wrap a long line. no functional change. | dlg | 2020-06-21 | 1 | -2/+3
* if an inp_upcall is set, let it look at and maybe steal the udp packet. | dlg | 2020-06-21 | 1 | -3/+11
  i wrote the original version of this, but it was tweaked by Matt Dunwoodie and Jason A. Donenfeld for use with wireguard.
* knf: the inp_upcall line was too long. | dlg | 2020-06-21 | 1 | -2/+3
* add an inp_upcall function pointer and inp_upcall_arg to struct in_pcb. | dlg | 2020-06-21 | 1 | -1/+3
  this is so protocols (eg, udp) can let things (eg, kernel support for wireguard or vxlan or geneve) look at and possibly steal packets before they get added to a socket buffer. i wrote the original version of this, but it was tweaked by Matt Dunwoodie and Jason A. Donenfeld for use with wireguard.
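A minimal sketch of the hook pattern described above. The types `pkt` and `pcb` and the functions `pcb_deliver()` and `steal_large()` are hypothetical simplifications, not the kernel's struct mbuf, struct inpcb or the real inp_upcall signature. The upcall either hands the packet back, so it is queued as usual, or returns NULL to signal that it has taken ownership.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a packet (the kernel uses struct mbuf). */
struct pkt {
	int len;
};

/* Hypothetical stand-in for a protocol control block with an upcall. */
struct pcb {
	/* Return the packet to keep normal delivery, or NULL if stolen. */
	struct pkt *(*upcall)(void *arg, struct pkt *p);
	void *upcall_arg;
	int queued;	/* packets appended to the socket buffer */
};

/* Offer the packet to the upcall first; queue it only if not stolen. */
static void
pcb_deliver(struct pcb *pcb, struct pkt *p)
{
	if (pcb->upcall != NULL) {
		p = pcb->upcall(pcb->upcall_arg, p);
		if (p == NULL)
			return;	/* the upcall now owns the packet */
	}
	pcb->queued++;	/* stands in for appending to the socket buffer */
}

/* Example upcall: steal packets longer than a threshold and count them. */
struct steal_state {
	int threshold;
	int stolen;
};

static struct pkt *
steal_large(void *arg, struct pkt *p)
{
	struct steal_state *st = arg;

	if (p->len > st->threshold) {
		st->stolen++;
		return NULL;
	}
	return p;
}
```

A control block with no upcall set behaves exactly as before, which is what lets protocols opt in to the hook without changing the default delivery path.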
* Break a glass ceiling on cwnd due to integer division during congestion | procter | 2020-06-19 | 1 | -2/+2
  avoidance. The problem and fix are noted in RFC5681 section 3.1, page 7. Report, diff and testing from Brian Brombacher, thanks! Testing and a cosmetic tweak by myself. ok claudio
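The ceiling can be reproduced with plain integer arithmetic. Under the RFC 5681 equation (3) update, cwnd += SMSS*SMSS/cwnd, the quotient truncates to 0 once cwnd exceeds SMSS*SMSS (2,131,600 bytes for a 1460-byte SMSS), so cwnd stops growing. RFC 5681 says a 0 result SHOULD be rounded up to 1 byte. The helper `ca_increment()` is a hypothetical illustration of that rule, not the code from the diff.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Per-ACK congestion-avoidance increase, RFC 5681 equation (3):
 *     cwnd += SMSS * SMSS / cwnd
 * Integer division truncates the quotient to 0 once cwnd exceeds
 * SMSS * SMSS, which freezes cwnd; RFC 5681 says to round 0 up to 1.
 */
static uint32_t
ca_increment(uint32_t smss, uint32_t cwnd)
{
	uint32_t incr;

	assert(cwnd > 0);	/* cwnd is a divisor */
	/* Widen before multiplying so SMSS * SMSS cannot overflow. */
	incr = (uint32_t)((uint64_t)smss * smss / cwnd);
	return incr > 0 ? incr : 1;
}
```

For example, with a 1460-byte SMSS and a 3,000,000-byte cwnd the raw quotient is 0, and the helper returns 1 so cwnd keeps creeping upward.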
* Refuse to set 0 or a negative value for net.inet.tcp.synbucketlimit. | mpi | 2020-06-18 | 1 | -1/+14
  Prevent a panic in syn_cache_insert() found by syzbot. Reported-by: syzbot+aee24ad9b7bf5665912d@syzkaller.appspotmail.com ok sashan@, anton@, millert@
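A hedged sketch of the kind of bounds check the subject describes. `set_synbucketlimit()` is a hypothetical helper, and returning EINVAL is an assumption about how the rejection is reported; the point is only that a non-positive value is refused and the previous setting stays in place.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical setter for a tunable such as net.inet.tcp.synbucketlimit:
 * only strictly positive values are stored; anything else is rejected
 * and the current value is left as it was.
 */
static int
set_synbucketlimit(int *limit, int newval)
{
	if (newval <= 0)
		return EINVAL;	/* assumed error code */
	*limit = newval;
	return 0;
}
```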
* Connectionless sockets like UDP can be re-connected to a different | bluhm | 2020-05-27 | 1 | -1/+8
  address. In that case, the linking to the pf state must be dissolved as the latter still contains the old address. If it is a divert state, also remove the state as any divert state must be associated with a matching socket. Call pf_remove_divert_state() and pf_inp_unlink() from in_pcbconnect(). reported by Tim Kuijsten; OK sashan@ claudio@
* Document the various flavors of NET_LOCK() and rename the reader version. | mpi | 2020-05-27 | 2 | -8/+8
  Since our last concurrency mistake, only the ioctl(2) and sysctl(2) code paths take the reader lock. This is mostly for documentation purposes until the softnet thread is converted back to use a read lock. dlg@ said that comments should be good enough. ok sashan@
* don't count packets in the carp protocol handling against an interface. | dlg | 2020-05-21 | 1 | -7/+1
  these packets have generally already been counted on the interface because that's where they were sent or received from. the protocol handling side of things already counts things like packets, which you see with netstat -sp carp.
* implement a carp_transmit that bypasses the ifq on output. | dlg | 2020-05-21 | 1 | -41/+65
  this is modelled on vlan_transmit, and basically enqueues the packet directly on the parent interface. even though carp is generally not used to transmit packets, we run dhcp relays on it at work and hit a situation where we unnecessarily dropped packets because its ifq maxlen was 1. i've been running this for a month in production. ok jmatthew@
* remove some trailing whitespace. no functional change. | dlg | 2020-04-29 | 1 | -5/+5
* Add support for automatically moving traffic between rdomains on ipsec(4) | tobhe | 2020-04-23 | 4 | -47/+50
  encryption or decryption. This allows us to keep plaintext and encrypted network traffic separated and reduces the attack surface for network sidechannel attacks. The only way to reach the inner rdomain from outside is by successful decryption and integrity verification through the responsible Security Association (SA). The only way for internal traffic to get out is getting encrypted and moved through the outgoing SA. Multiple plaintext rdomains can share the same encrypted rdomain while the unencrypted packets are still kept separate. The encrypted and unencrypted rdomains can have different default routes. The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'. If this differs from 'tdb_rdomain' then the packet is moved to 'tdb_rdomain_post' after IPsec processing. Flows and outgoing IPsec SAs are installed in the plaintext rdomain; incoming IPsec SAs are installed in the encrypted rdomain. IPCOMP SAs are always installed in the plaintext rdomain. They can be viewed with 'route -T X exec ipsecctl -sa' where X is the rdomain ID. As the kernel does not create encX devices automatically when creating rdomains, they have to be added by hand with ifconfig for IPsec to work in non-default rdomains. discussed with chris@ and kn@ ok markus@, patrick@
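The rdomain switch can be pictured as one comparison per SA. In this sketch, `struct sa` and `rdomain_after_ipsec()` are hypothetical; only the two fields mirror the 'tdb_rdomain' and 'tdb_rdomain_post' attributes named above.

```c
#include <assert.h>

/* Hypothetical, cut-down SA; the fields mirror tdb_rdomain(_post). */
struct sa {
	unsigned int rdomain;		/* rdomain the SA is installed in */
	unsigned int rdomain_post;	/* rdomain after IPsec processing */
};

/* Rdomain a packet belongs to once this SA has processed it. */
static unsigned int
rdomain_after_ipsec(const struct sa *sa, unsigned int pkt_rdomain)
{
	if (sa->rdomain_post != sa->rdomain)
		return sa->rdomain_post;	/* move the packet */
	return pkt_rdomain;			/* no rdomain change */
}
```

Taking rdomain 0 as plaintext and rdomain 1 as encrypted purely as an example, an incoming SA installed in rdomain 1 with a post rdomain of 0 hands decrypted packets to rdomain 0, and an outgoing SA installed in rdomain 0 with a post rdomain of 1 moves encrypted packets out to rdomain 1.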
* Stop processing packets under non-exclusive (read) netlock. | mpi | 2020-04-12 | 1 | -3/+3
  Prevent concurrency in the socket layer which is not ready for that. Two recent data corruptions in pfsync(4) and the socket layer pointed out that, at least, tun(4) was incorrectly using NET_RUNLOCK(). Until we find a way in software to avoid future mistakes and to make sure that only the softnet thread and some ioctls are safe to use a read version of the lock, put everything back to the exclusive version. ok stsp@, visa@
* Guard SIOCDELMULTI if_ioctl calls with KERNEL_LOCK() where the call is | visa | 2020-03-15 | 2 | -2/+6
  made from the socket close path. Most device drivers are not MP-safe yet, and the closing of AF_INET and AF_INET6 sockets is no longer under the kernel lock. This fixes a panic seen by jcs@. OK mpi@
* Fix uninitialized use of variable 'len'. | tobhe | 2020-03-06 | 1 | -6/+4
  ok bluhm@
* add define for IPTOS_DSCP_LE; "low effort" DSCP codepoint standardised | djm | 2020-01-26 | 1 | -1/+2
  in RFC8622; ok job@
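For reference, DSCP values occupy the upper six bits of the old IPv4 TOS byte, with the two ECN bits below them. RFC 8622 assigns the Lower Effort codepoint 000001, which works out to 0x04 in TOS-byte form. The names `DSCP_LE_CODEPOINT` and `dscp_to_tos()` are hypothetical; the sketch only shows the shift.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name for the 6-bit LE codepoint (binary 000001, RFC 8622). */
#define DSCP_LE_CODEPOINT 0x01

/* Place a 6-bit DSCP above the 2 ECN bits to form the TOS byte. */
static uint8_t
dscp_to_tos(uint8_t dscp, uint8_t ecn)
{
	return (uint8_t)(((dscp & 0x3f) << 2) | (ecn & 0x03));
}
```

The same shift maps CS1 (DSCP 8) to 0x20, which is why TOS-byte constants for DSCP codepoints are always multiples of 4.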
* rdr-to with loopback destination should work even though | sashan | 2019-12-23 | 1 | -2/+3
  IP forwarding is disabled. Issue reported by Daniel Jakots (danj@) OK bluhm@
* Make bundled IPcomp/ESP policies work with IPSEC_LEVEL_REQUIRE. | tobhe | 2019-12-10 | 1 | -1/+19
  We only install flows for IPcomp. When processing an incoming ESP SA, look for a bundled IPcomp SA and use that in the policy check. ok bluhm@
* always pull in if_types.h, to unbreak ramdisks | deraadt | 2019-12-09 | 1 | -2/+2
* Make sure packet destination address matches interface address, | sashan | 2019-12-08 | 3 | -4/+44
  where such packet is bound to. This check is enforced if and only if IP forwarding is disabled. Change discussed with bluhm@, claudio@, deraadt@, markus@, tobhe@ OK bluhm@, claudio@, tobhe@
* Checking the IPsec policy is expensive. Check only when IPsec is used. | tobhe | 2019-12-06 | 2 | -30/+34
  ok bluhm@
* Don't require a valid sa_len for a bunch of IPv4 "get" ioctls | jca | 2019-12-01 | 1 | -3/+6
  Same fix as for the IPv6 case. Fixes a regression in ports/net/openvpn spotted by landry@, ok bluhm@
* Change the default security level for incoming IPsec flows from | tobhe | 2019-11-29 | 2 | -60/+63
  isakmpd and iked to REQUIRE. Filter policy violations earlier. ok sashan@ bluhm@
* Although ifconfig(8) checks it already, enforce contiguous inet | bluhm | 2019-11-28 | 1 | -4/+21
  netmask in the kernel. OK visa@
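One common way to test for a contiguous netmask, shown as a sketch: in host byte order a contiguous mask is a run of 1 bits followed only by 0 bits, so its complement has the form 2^k - 1, and ANDing the complement with itself plus one yields zero. `netmask_is_contiguous()` is a hypothetical name and not necessarily how the kernel diff performs the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * mask is in host byte order (apply ntohl() to a wire-format mask first).
 * Contiguous: ones then zeros, e.g. 0xffffff00. Then ~mask == 2^k - 1,
 * and (2^k - 1) & 2^k == 0. Any hole in the ones breaks this.
 */
static bool
netmask_is_contiguous(uint32_t mask)
{
	uint32_t inv = ~mask;

	return (inv & (inv + 1)) == 0;
}
```

For 0xff00ff00 the complement is 0x00ff00ff; adding one gives 0x00ff0100, and the AND leaves 0x00ff0000, so the mask is rejected. Note that /0 and /32 both pass this test; whether they are acceptable is a separate policy question.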
* Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly | deraadt | 2019-11-13 | 1 | -2/+2
  unfiltered in the future, so this prevents rresvport_af(3) from randomly exposing a service intended for local visibility only. ok florian
* Prevent underflows in tp->snd_wnd if the remote side ACKs more than | bluhm | 2019-11-11 | 1 | -3/+9
  tp->snd_wnd. This can happen, for example, when the remote side responds to a window probe by ACKing the one byte it contains. from FreeBSD; via markus@; OK sashan@ tobhe@
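The underflow is ordinary unsigned wraparound: with snd_wnd at 0 during a window probe, subtracting the one ACKed byte produces a huge window instead of 0. A saturating subtraction avoids this. `snd_wnd_after_ack()` is a hypothetical helper, and the fixed-width uint32_t type is a simplification of the kernel's types.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shrink the send window by the newly acknowledged bytes without
 * wrapping: a plain unsigned snd_wnd - acked with acked > snd_wnd
 * would yield a value near UINT32_MAX.
 */
static uint32_t
snd_wnd_after_ack(uint32_t snd_wnd, uint32_t acked)
{
	return acked > snd_wnd ? 0 : snd_wnd - acked;
}
```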
* avoid being too clever about setting/clearing ifpromisc on the parent. | dlg | 2019-11-08 | 1 | -8/+6
  ifpromisc() already refcounts, so carp doesn't have to do it implicitly with the carpdev list. there's no functional change, the code just gets a bit simpler.
* convert interface address change hooks to tasks and a task_list. | dlg | 2019-11-08 | 2 | -11/+11
  this follows what's been done for detach and link state hooks, and makes handling of hooks generally more robust. address hooks are a bit different to detach/link state hooks in that there are only a few things that register hooks (carp, pf, vxlan), but a lot of places to run the hooks (lots of ipv4 and ipv6 address configuration). an address hook cookie was in struct pfi_kif, which is part of the pf abi. rather than break pfctl -sI, this maintains the void * used for the cookie and uses it to store a task, which is then used as intended with the new api.
* Do proper kernel input validation for in_control() ioctl(2) | bluhm | 2019-11-07 | 1 | -40/+63
  SIOCGIFADDR, SIOCGIFNETMASK, SIOCGIFDSTADDR, SIOCGIFBRDADDR, SIOCSIFADDR, SIOCSIFNETMASK, SIOCSIFDSTADDR, and SIOCSIFBRDADDR. Name in_ioctl_set_ifaddr() consistently. Use in_sa2sin() to validate inet address. Combine if_addrlist loops and add comment. Although netmask is not an inet address, length must be valid. Reported-by: syzbot+5fc6da002fc4e8d994be@syzkaller.appspotmail.com OK visa@
* Avoid NULL dereference in arpinvalidate() and nd6_invalidate() by | krw | 2019-11-07 | 1 | -1/+3
  making the RTM_INVALIDATE code path perform the same check as RTM_DELETE does. ok mpi@
* turn the linkstate hooks into a task list, like the detach hooks. | dlg | 2019-11-07 | 1 | -47/+27
  this is largely mechanical, except for carp. this moves the addition of the carp link state hook after we're committed to using the new interface as a carpdev. because the add can't fail, we avoid a complicated unwind dance. also, this tweaks the carp linkstate hook so it only updates the relevant carp interface, not all of the carpdevs on the parent. hrvoje popovski has tested an early version of this diff and it's generally ok, but there are some splasserts that this diff fires that i'll fix in an upcoming diff. ok claudio@
* replace the hooks used with if_detachhooks with a task list. | dlg | 2019-11-06 | 1 | -14/+8
  the main semantic change is that things registering detach hooks have to allocate and set a task structure that then gets added to the list. this means if the task is allocated up front (eg, as part of carp's softc or bridge's port structure), it avoids the possibility that adding a hook can fail. a lot of drivers weren't checking for failure, and unwinding state in the event of failure in other parts was error prone. while doing this i discovered that the list operations have to be in a particular order, but drivers weren't doing that consistently either. this diff wraps the list ops up so you have to seriously go out of your way to screw them up. i've also sprinkled some NET_ASSERT_LOCKED around the list operations so we can make sure there's no potential for the list to be corrupted, especially while it's being run. hrvoje popovski has tested this a bit, and some issues he discovered have been fixed. ok sashan@
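A minimal user-space sketch of why caller-owned list entries make registration infallible. `hook_task`, `hook_list` and the `hook_*()` functions are hypothetical and deliberately simpler than the kernel's task(9) API; locking (the NET_ASSERT_LOCKED checks mentioned above) is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook entry; the caller owns its storage. */
struct hook_task {
	void (*fn)(void *);
	void *arg;
	struct hook_task *next;
};

struct hook_list {
	struct hook_task *head;
};

static void
hook_task_set(struct hook_task *t, void (*fn)(void *), void *arg)
{
	t->fn = fn;
	t->arg = arg;
	t->next = NULL;
}

/* No allocation happens here, so adding a hook cannot fail. */
static void
hook_add(struct hook_list *l, struct hook_task *t)
{
	t->next = l->head;
	l->head = t;
}

static void
hook_del(struct hook_list *l, struct hook_task *t)
{
	struct hook_task **pp;

	for (pp = &l->head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == t) {
			*pp = t->next;
			t->next = NULL;
			return;
		}
	}
}

/* Run every hook; next is fetched first so a hook may remove itself. */
static void
hook_run(struct hook_list *l)
{
	struct hook_task *t, *next;

	for (t = l->head; t != NULL; t = next) {
		next = t->next;
		t->fn(t->arg);
	}
}

/* Example owner with the hook embedded in its state, like a softc. */
struct drv_softc {
	int detached;
	struct hook_task dtask;
};

static void
drv_detach(void *arg)
{
	struct drv_softc *sc = arg;

	sc->detached++;
}
```

Because `dtask` lives inside the driver's own state, attaching only links existing memory into the list, and there is no failure path that the driver would have to unwind.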
* remove mobileip(4) | dlg | 2019-11-04 | 3 | -34/+4
  no one seems to use it, and we should not encourage people to use it by having it available. it's been disabled for most of the last release and no one has asked for it in 6.6, so i'm taking that as an ok for this removal.
* make whitespace in the IPPROTO defines consistent. no functional change. | dlg | 2019-10-25 | 1 | -13/+13
* +#define IPPROTO_UDPLITE 136, as per RFC 3828 and the IANA allocation | dlg | 2019-10-25 | 1 | -1/+2
  please don't interpret this as an intention on my part to implement UDP-Lite.
* Kernel is missing proper input validation when configuring addresses. | bluhm | 2019-10-23 | 2 | -34/+66
  Fix the SIOCAIFADDR and SIOCDIFADDR ioctl(2) by implementing in_sa2sin() to validate inet address family and address length. OK visa@
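A sketch of the validation described above: check the address family and the length byte before treating a generic sockaddr as an inet address. Because Linux's struct sockaddr has no sa_len field, the BSD-style layouts `sk_sockaddr` and `sk_sockaddr_in` are hypothetical stand-ins, and the error codes (EAFNOSUPPORT, EINVAL) are an assumption, not a quote of in_sa2sin().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical BSD-style layouts with an explicit length byte. */
#define SK_AF_INET 2

struct sk_sockaddr {
	uint8_t	sa_len;		/* total length of the address */
	uint8_t	sa_family;
	char	sa_data[14];
};

struct sk_sockaddr_in {
	uint8_t		sin_len;
	uint8_t		sin_family;
	uint16_t	sin_port;
	uint32_t	sin_addr;
	char		sin_zero[8];
};

/* Validate family and length before treating sa as an inet address. */
static int
sa2sin(struct sk_sockaddr *sa, struct sk_sockaddr_in **sin)
{
	if (sa->sa_family != SK_AF_INET)
		return EAFNOSUPPORT;	/* assumed error code */
	if (sa->sa_len != sizeof(struct sk_sockaddr_in))
		return EINVAL;		/* assumed error code */
	*sin = (struct sk_sockaddr_in *)sa;
	return 0;
}
```

Checking sa_len matters because the length comes from userland; without it, a short buffer labelled AF_INET could be read past its end once it is cast to the larger inet structure.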
* in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void. | dlg | 2019-10-17 | 1 | -3/+3
  this also brings them in line with the AF_INET equivalents. ok visa@ bluhm@
* tsleep(9) -> tsleep_nsec(9) | mpi | 2019-10-16 | 1 | -2/+3
  ok cheloha@, visa@
* ip_ether.c is empty, and now unlinked from the build. | dlg | 2019-10-07 | 1 | -28/+0
  ok jca@ deraadt@ claudio@ visa@
* gif shouldn't include netinet/ip_ether.h, cos gif doesn't do etherip. | dlg | 2019-10-04 | 2 | -4/+4
  ip_ether.h is where netinet/ip_ipip.h got the forward declaration for struct tdb from though, so fix that before cutting ip_ether.h out of gif.
* get rid of prototypes for mplsip_input and mplsip_output. they don't exist. | dlg | 2019-10-04 | 1 | -6/+1
* remove the "copy function" argument to bpf_mtap_hdr. | dlg | 2019-09-30 | 4 | -8/+8
  it was previously (ab)used by pflog, which has since been fixed. apart from that nothing else used it, so we can trim the cruft. ok kn@ claudio@ visa@ visa@ also made sure i fixed ipw(4) so i386 won't break.
* Fix a route use after free in multicast route. Move the rt_mcast_del() | bluhm | 2019-09-02 | 1 | -33/+36
  out of the rtable_walk(). This avoids recursion to prevent stack overflow. Also it allows freeing the route outside of the walk. Now mrt_mcast_del() frees the route only when it is deleted from the routing table. If that fails, it must not be freed. After the route is returned by mfc_find(), it is reference counted. Then we need a rtfree(), but not in the other case. Move rt_timer_remove_all() into rt_mcast_del(). OK mpi@
* When we needed the kernel lock for local IP packet delivery, mpi@ | bluhm | 2019-08-06 | 1 | -44/+3
  introduced a queue to grab the lock for multiple packets. Now we have only netlock for both IP and protocol input. So the queue is not necessary anymore. It just switches CPU and decreases performance. So remove the inet and inet6 ip queue for local packets. To get TCP running on loopback, we have to queue once between TCP input and output of the two sockets. So use the loopback queue in looutput() unconditionally. OK visa@
* Add IFXF_AUTOCONF4 to if_xflags to match IFXF_AUTOCONF6. Let | krw | 2019-07-25 | 1 | -1/+4
  ifconfig set/unset it. ok deraadt@ kmos@
* Introduce ETHER_IS_BROADCAST/ANYADDR/EQ() and use them where appropriate. | mpi | 2019-07-17 | 2 | -4/+11
  ok dlg@, sthen@, millert@
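A common way to write such predicates over a 6-byte MAC address is to fold the octets with AND (all 0xff means broadcast) or OR (all zero means the unspecified address), compare once, and use memcmp() for equality. The `MAC_*` macro names are hypothetical; this is a sketch of the technique, not the definitions from the commit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* All six octets are 0xff: AND them together and compare once. */
#define MAC_IS_BROADCAST(a) \
	(((a)[0] & (a)[1] & (a)[2] & (a)[3] & (a)[4] & (a)[5]) == 0xff)

/* All six octets are zero: OR them together and compare once. */
#define MAC_IS_ANYADDR(a) \
	(((a)[0] | (a)[1] | (a)[2] | (a)[3] | (a)[4] | (a)[5]) == 0x00)

/* Byte-wise equality of two addresses. */
#define MAC_IS_EQ(a, b) (memcmp((a), (b), MAC_LEN) == 0)
```

Named predicates like these replace open-coded memcmp() calls against static all-ones or all-zeros arrays, which is easier to read at each call site.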
* Initialize struct inpcb pool not on demand, but during initialization. | bluhm | 2019-07-15 | 3 | -9/+16
  Removes a global variable and avoids MP problems. OK mpi@ visa@
* Count the number of TCP SACK options that were dropped due to the | bluhm | 2019-07-12 | 3 | -10/+14
  sack hole list length or pool limit. OK claudio@
* Received SACK options are managed by a linked list at the TCP socket. | bluhm | 2019-07-10 | 2 | -3/+8
  There is a global tunable limit net.inet.tcp.sackholelimit, default is 32768. If an attacker manages to attach all these sack holes to a few TCP connections, the lists may grow long. Traversing them might cause higher CPU consumption on the victim machine. In practice such a situation is hard to create as the TCP retransmit and 2*msl timer flush the list periodically. For additional protection, enforce a per connection limit of 128 SACK holes in the list. reported by Reuven Plevinsky and Tal Vainshtein discussed with claudio@ and procter@; OK deraadt@
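A sketch of layering a per-connection cap on top of a global one. The limits 128 and 32768 come from the text above; `struct conn`, `sackhole_alloc()` and the counter are hypothetical simplifications of the real TCP control block and pool accounting. When either limit is reached, the hole is not recorded.

```c
#include <assert.h>
#include <stdbool.h>

#define SACKHOLE_PERCONN_LIMIT	128	/* per-connection cap from the commit */
#define SACKHOLE_GLOBAL_LIMIT	32768	/* default of net.inet.tcp.sackholelimit */

/* Hypothetical per-connection state. */
struct conn {
	int nsackholes;
};

/* Hypothetical global accounting, standing in for the sack hole pool. */
static int sackholes_in_use;

/* Try to record one more SACK hole; false means the option is dropped. */
static bool
sackhole_alloc(struct conn *c)
{
	if (c->nsackholes >= SACKHOLE_PERCONN_LIMIT)
		return false;
	if (sackholes_in_use >= SACKHOLE_GLOBAL_LIMIT)
		return false;
	c->nsackholes++;
	sackholes_in_use++;
	return true;
}
```

With the per-connection check in front, a single connection can hold at most 128 holes, so walking its list stays bounded even while the global pool still has room for other connections.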
* free(9) sizes for M_RTABLE. | mpi | 2019-07-08 | 1 | -2/+3
  ok kn@