path: root/sys/netinet/tcp_input.c
Commit message  (author, date, files changed, lines -deleted/+added)
* spelling  (jsg, 2021-03-10, 1 file, -5/+5)
    ok gnezdo@ semarie@ mpi@
* Turns off the direct ACK on every other segment  (jan, 2021-02-03, 1 file, -5/+4)
    The kernel uses a huge amount of processing time for sending ACKs to the
    sender on the receiving interface. After receiving a data segment, we send
    out two ACKs: the first one in tcp_input() directly after receiving, the
    second one after the userland or the sosplice task has read some data out
    of the socket buffer. Dropping the first of the two saves some processing
    time and improves network performance. Longer tested by sthen@ OK claudio@
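    A minimal, self-contained sketch of the idea (not the actual diff; the
    TF_DELACK/TF_ACKNOW names follow the classic BSD stack and the toy
    bookkeeping is hypothetical): the old code forced an immediate ACK out of
    tcp_input() for every second in-order segment, while the new behaviour
    always defers, leaving the ACK to the delayed-ACK timer or to the read
    path once data is taken out of the socket buffer.

        #include <stdio.h>

        #define TF_DELACK 0x01	/* hypothetical flag values for this demo */
        #define TF_ACKNOW 0x02

        static int acks_old, acks_new;

        /* Old behaviour: every second received segment is ACKed immediately. */
        static void
        receive_old(int *flags)
        {
            if (*flags & TF_DELACK) {
                *flags &= ~TF_DELACK;
                *flags |= TF_ACKNOW;
                acks_old++;		/* immediate ACK sent from tcp_input() */
            } else
                *flags |= TF_DELACK;	/* first segment: arm the delack timer */
        }

        /* New behaviour: always defer; the reader or the delack timer ACKs later. */
        static void
        receive_new(int *flags)
        {
            *flags |= TF_DELACK;
        }

        int
        main(void)
        {
            int f_old = 0, f_new = 0;

            for (int i = 0; i < 10; i++) {
                receive_old(&f_old);
                receive_new(&f_new);
            }
            printf("immediate ACKs from tcp_input(): old %d, new %d\n",
                acks_old, acks_new);
            return 0;
        }

    With ten in-order segments the old variant emits five immediate ACKs from
    tcp_input(); the new one emits none there and relies on the read side.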
* Break a glass ceiling on cwnd due to integer division during congestion avoidance  (procter, 2020-06-19, 1 file, -2/+2)
    The problem and fix are noted in RFC 5681 section 3.1, page 7. Report,
    diff and testing from Brian Brombacher, thanks! Testing and a cosmetic
    tweak by myself. ok claudio
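    A standalone demonstration of the arithmetic (not the kernel code): during
    congestion avoidance the window grows by about SMSS*SMSS/cwnd bytes per
    ACK, and with integer division that increment floors to 0 once cwnd
    exceeds SMSS*SMSS, so the window can never grow past that point. RFC 5681
    section 3.1 says the result should then be rounded up to 1 byte, which is
    what the fixed variant below does.

        #include <stdio.h>

        #define SMSS 1460u			/* sender maximum segment size */

        /* Increment computed with plain integer division. */
        static unsigned int
        ca_incr_old(unsigned int cwnd)
        {
            return SMSS * SMSS / cwnd;		/* 0 once cwnd > SMSS*SMSS */
        }

        /* Increment rounded up to at least 1 byte, per RFC 5681. */
        static unsigned int
        ca_incr_new(unsigned int cwnd)
        {
            unsigned int incr = SMSS * SMSS / cwnd;
            return incr > 0 ? incr : 1;
        }

        int
        main(void)
        {
            unsigned int cwnd = 3000000;	/* already above SMSS*SMSS */

            printf("old increment: %u bytes\n", ca_incr_old(cwnd));
            printf("new increment: %u bytes\n", ca_incr_new(cwnd));
            return 0;
        }

    For SMSS = 1460 the ceiling sits just above 2.1 MB of cwnd; the demo
    prints an increment of 0 bytes for the old formula and 1 byte for the
    fixed one, so the fixed window keeps creeping upward.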
* Checking the IPsec policy is expensive. Check only when IPsec is used.  (tobhe, 2019-12-06, 1 file, -14/+16)
    ok bluhm@
* Change the default security level for incoming IPsec flows from isakmpd and iked to REQUIRE  (tobhe, 2019-11-29, 1 file, -22/+23)
    Filter policy violations earlier. ok sashan@ bluhm@
* Prevent underflows in tp->snd_wnd if the remote side ACKs more than tp->snd_wnd  (bluhm, 2019-11-11, 1 file, -3/+9)
    This can happen, for example, when the remote side responds to a window
    probe by ACKing the one byte it contains. from FreeBSD; via markus@;
    OK sashan@ tobhe@
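    A minimal sketch of the failure mode, using a stand-in type rather than
    the kernel's: the send window is unsigned, so subtracting more bytes than
    it holds (for instance when the peer ACKs the single byte of a window
    probe sent beyond a zero window) wraps it to a huge value. Clamping at 0
    avoids the underflow.

        #include <stdio.h>

        typedef unsigned long tcp_win_t;	/* stand-in for the kernel's type */

        static tcp_win_t
        shrink_window(tcp_win_t snd_wnd, tcp_win_t acked)
        {
            if (acked > snd_wnd)
                return 0;		/* peer acked a window-probe byte */
            return snd_wnd - acked;
        }

        int
        main(void)
        {
            tcp_win_t wnd = 0;		/* zero window, one probe byte in flight */

            printf("naive subtraction: %lu\n", wnd - 1);	/* wraps around */
            printf("clamped:           %lu\n", shrink_window(wnd, 1));
            return 0;
        }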
* Count the number of TCP SACK options that were dropped due to the sack hole list length or pool limit  (bluhm, 2019-07-12, 1 file, -8/+9)
    OK claudio@
* Received SACK options are managed by a linked list at the TCP socket  (bluhm, 2019-07-10, 1 file, -1/+5)
    There is a global tunable limit net.inet.tcp.sackholelimit, default is
    32768. If an attacker manages to attach all these sack holes to a few TCP
    connections, the lists may grow long. Traversing them might cause higher
    CPU consumption on the victim machine. In practice such a situation is
    hard to create as the TCP retransmit and 2*msl timer flush the list
    periodically. For additional protection, enforce a per connection limit
    of 128 SACK holes in the list.
    reported by Reuven Plevinsky and Tal Vainshtein
    discussed with claudio@ and procter@; OK deraadt@
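    A sketch of the per-connection guard under assumed names (the struct and
    function names here are illustrative, not the OpenBSD ones): before
    another hole is linked into a connection's SACK list, the insert is
    refused once the count reaches the 128-hole cap, in addition to the
    global pool limit behind net.inet.tcp.sackholelimit.

        #include <stdlib.h>

        #define TCP_SACKHOLE_LIMIT 128	/* per-connection cap from the commit */

        struct sackhole {
            unsigned int start, end;	/* sequence range not yet SACKed */
            struct sackhole *next;
        };

        struct conn {
            struct sackhole *snd_holes;	/* head of this connection's hole list */
            int snd_numholes;		/* holes currently on the list */
        };

        /*
         * Allocate a new hole only if this connection is still under its cap;
         * the real code additionally honours the global pool limit.
         */
        static struct sackhole *
        sackhole_alloc(struct conn *tp)
        {
            struct sackhole *p;

            if (tp->snd_numholes >= TCP_SACKHOLE_LIMIT)
                return NULL;		/* list too long: ignore this SACK block */
            p = calloc(1, sizeof(*p));
            if (p != NULL)
                tp->snd_numholes++;
            return p;
        }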
* Do not acknowledge a received ack-only tcp packet that we would drop due to PAWS  (friehm, 2018-09-17, 1 file, -2/+4)
    Otherwise we could trigger a retransmit from the other party with another
    wrong timestamp and produce a loop. I have seen this with a buggy server
    which messed up tcp timestamps. Suggested by Prof. Jacobson for FreeBSD.
    ok krw, bluhm, henning, mpi
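    An illustration of the decision (simplified: real PAWS compares the
    timestamps modulo wrap-around and also checks how old ts_recent is): a
    segment that fails PAWS normally gets answered with an ACK so the peer
    can resynchronize, but if the offending segment is itself a bare ACK,
    replying would only provoke another bad-timestamp ACK and the two ends
    would answer each other forever, so it is dropped silently.

        #include <stdbool.h>

        struct seginfo {
            unsigned int ts_val;	/* TSval carried by the segment */
            int len;			/* payload length */
            bool syn, fin;
        };

        enum action { ACCEPT, DROP_AFTER_ACK, DROP_SILENTLY };

        static enum action
        paws_check(const struct seginfo *seg, unsigned int ts_recent)
        {
            if (seg->ts_val >= ts_recent)
                return ACCEPT;		/* timestamp is fresh enough */
            if (seg->len == 0 && !seg->syn && !seg->fin)
                return DROP_SILENTLY;	/* ack-only: do not feed the loop */
            return DROP_AFTER_ACK;	/* otherwise ask the peer to resync */
        }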
* Coverity CID 1470233 complains that the m != NULL check in syn_cache_get() is not necessary  (bluhm, 2018-07-23, 1 file, -4/+3)
    Also make the abort label consistent with resetandabort and free the mbuf
    there. OK mpi@
* Assert that the INP_IPV6 in in6_pcbconnect() is correct  (bluhm, 2018-06-14, 1 file, -22/+4)
    Just call in_pcbconnect() to avoid the address family maze in
    syn_cache_get(). input claudio@; OK mpi@
* The output from tcp debug sockets was incomplete  (bluhm, 2018-06-11, 1 file, -33/+11)
    After detach tp was NULL and nothing was traced. So save the old tcpcb
    and use that to retrieve some information. Note that otb may be freed
    and must not be dereferenced. Use a heuristic for cases where the address
    family is in the IP header but not provided in the PCB. OK visa@
* Historically there were slow and fast tcp timeouts  (bluhm, 2018-05-08, 1 file, -3/+3)
    That is why the delack timer had a different implementation. Use the same
    mechanism for all TCP timers. OK mpi@ visa@
* Make divert lookup similar for all socket types  (bluhm, 2017-12-04, 1 file, -7/+7)
    If PF_TAG_DIVERTED is set, pf_find_divert() cannot fail, so put an assert
    there. Explicitly check all possible divert types, panic in the default
    case. For raw sockets call pf_find_divert() before the socket loop.
    Divert reply should not match on TCP or UDP listen sockets.
    OK sashan@ visa@
* Simplify the reverse PCB lookup logic  (bluhm, 2017-12-01, 1 file, -10/+5)
    The PF_TAG_TRANSLATE_LOCALHOST security check prevents the user from
    accidentally configuring a redirect where a divert-to would be
    appropriate. Instead of spreading the logic into tcp and udp input,
    check the flag during PCB listen lookup. This also reduces the number
    of parameters of in_pcblookup_listen(). OK visa@
* Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running pr_input handlers without KERNEL_LOCK()  (mpi, 2017-11-20, 1 file, -1/+2)
    ok visa@
* The TF_BLOCKOUTPUT flag is set around all sorwakeup() and sowwakeup() calls in tcp_input()  (bluhm, 2017-11-08, 1 file, -1/+13)
    When I added this code for socket splicing, I missed that they may be
    called indirectly through functions. Although not strictly necessary
    since we have the sosplice thread, set that flag consistently whenever
    we want to prevent tcp_output() from being called in the middle of
    tcp_input(). As soisconnected(), soisdisconnected(), and socantrcvmore()
    call the wakeup functions from tcp_input(), set the TF_BLOCKOUTPUT flag
    around them. OK visa@
* Remove the TCP_FACK option and associated #if{,n}def code  (job, 2017-10-25, 1 file, -111/+2)
    TCP_FACK was disabled by provos@ in June 1999. TCP_FACK is an algorithm
    that decides that when something is lost, all packets not SACKed up to
    the most forward SACK are lost. That may be a correct estimate if the
    network does not reorder packets. OK visa@ mpi@ mikeb@
* Refactor handling of partial TCP acknowledgements  (mikeb, 2017-10-24, 1 file, -93/+81)
    With input from Klemens Nanni, OK visa, mpi, bluhm
* Unconditionally enable TCP selective acknowledgements (SACK)  (mikeb, 2017-10-22, 1 file, -72/+25)
    OK deraadt, mpi, visa, job
* Remove NET_LOCK()'s argument  (mpi, 2017-08-11, 1 file, -5/+4)
    Tested by Hrvoje Popovski, ok bluhm@
* Assert that the corresponding socket is locked when manipulating socket buffers  (mpi, 2017-06-26, 1 file, -14/+14)
    This is one step towards unlocking the TCP input path. Note that all the
    functions asserting for the socket lock are not necessarily MP-safe.
    All the fields of 'struct socket' aren't protected. Introduce a new
    kernel-only kqueue hint, NOTE_SUBMIT, to be able to tell when a filter
    needs to lock the underlying data structures. Logic and name taken from
    NetBSD. Tested by Hrvoje Popovski. ok claudio@, bluhm@, mikeb@
* Merge the content of <netinet/tcpip.h> and <netinet6/tcpipv6.h> in <netinet/tcp_debug.h>  (mpi, 2017-05-18, 1 file, -2/+1)
    The IPv6 variant was always included and the IPv4 version is not present
    on all systems. Most of the offending ports are already fixed, thanks
    to sthen@!
* Checking for IPv4 mapped addresses and dropping the packet is done in ip6_input()  (bluhm, 2017-05-06, 1 file, -8/+1)
    Do not check that again in the protocol input functions. OK mpi@
* If m is not a contiguous mbuf cluster, m_pullup() in pr_input may change the pointer  (bluhm, 2017-05-04, 1 file, -2/+2)
    Then *mp keeps the invalid pointer and it might be used. Fix the
    potential use after free and also reset *mp in other places to have
    fewer dangling pointers to freed mbufs. OK mpi@ mikeb@
* Back out rev 1.185 (which made the code match the comment) and adjust the comment to match reality (or at least RFC 7323) instead  (millert, 2017-05-03, 1 file, -7/+4)
    This brings us back in line with the behavior of Net and Free.
    From Lauri Tirkkonen. OK bluhm@
* Use the rt_rmx defines that hide the struct rt_kmetrics indirection  (bluhm, 2017-04-19, 1 file, -4/+4)
    No binary change. OK mpi@
* Use the address family passed down with pr_input to simplify tcp_input()  (bluhm, 2017-04-17, 1 file, -48/+4)
    OK florian@
* Pass down the address family through the pr_input calls  (bluhm, 2017-04-14, 1 file, -3/+2)
    This makes it possible to simplify code used for both IPv4 and IPv6.
    OK mikeb@ deraadt@
* percpu counters for TCP stats  (jca, 2017-02-09, 1 file, -85/+80)
    ok mpi@ bluhm@
* Change the IPv4 pr_input function to the way IPv6 is implemented  (bluhm, 2017-01-29, 1 file, -27/+18)
    This gets rid of struct ip6protosw and some wrapper functions. It is more
    consistent to have fewer different structures. The divert_input functions
    cannot be called anyway, so remove them. OK visa@ mpi@
* Since raw_input() and route_input() are gone from pr_input, we can make the variable parameters of the protocol input functions fixed  (bluhm, 2017-01-25, 1 file, -8/+2)
    Also add the proto to make it similar to IPv6. OK mpi@ guenther@ millert@
* Remove NULL checks before m_free(9), it deals with it  (mpi, 2017-01-10, 1 file, -7/+4)
    ok bluhm@, kettenis@
* Introduce the NET_LOCK()  (mpi, 2016-12-19, 1 file, -10/+10)
    A rwlock used to serialize accesses to the parts of the network stack
    that are not yet ready to be executed in parallel or where new sleeping
    points are not possible. This first pass replaces all the entry points
    leading to ip_output(). This is done to not introduce new sleeping points
    when trying to acquire ART's write lock, needed when a new L2 entry is
    created via RT_RESOLVE. Inputs from and ok bluhm@, ok dlg@
* Be consistent and do not use braces for single line statements  (mpi, 2016-11-16, 1 file, -10/+7)
    Prodded by and ok bluhm@
* Kill recursive splsoftnet()s  (mpi, 2016-11-16, 1 file, -28/+27)
    While here keep local definitions local. ok bluhm@
* Use __func__ in panic strings to reduce noise when grepping  (mpi, 2016-11-15, 1 file, -4/+4)
* Use goto consistently instead of splx() and return  (mpi, 2016-11-07, 1 file, -5/+4)
    This will allow a single lock/unlock dance per timer.
* One more timeout_set_proc(9) conversion  (mpi, 2016-10-04, 1 file, -2/+2)
    Found by Chris Jackman, thanks!
* For incoming connections keep the TF_NOPUSH flag if TCP_NOPUSH was set on the listen socket  (bluhm, 2016-09-19, 1 file, -2/+2)
    From David Hill; OK vgross@
* all pools have their ipl set via pool_setipl, so fold it into pool_init.  (dlg, 2016-09-15, 1 file, -4/+3)
    the ioff argument to pool_init() is unused and has been for many years,
    so this replaces it with an ipl argument. because the ipl will be set
    on init we no longer need pool_setipl.

    most of these changes have been done with coccinelle using the spatch
    below. cocci sucks at formatting code though, so i fixed that by hand.
    the manpage and subr_pool.c bits i did myself.

    ok tedu@ jmatthew@

    @ipl@
    expression pp;
    expression ipl;
    expression s, a, o, f, m, p;
    @@
    -pool_init(pp, s, a, o, f, m, p);
    -pool_setipl(pp, ipl);
    +pool_init(pp, s, a, ipl, f, m, p);
* Use 'sc_route{4,6}' directly instead of casting them to 'struct route *'  (mpi, 2016-08-31, 1 file, -6/+3)
    This is another little step towards deprecating 'struct route{,_in6}'.
    ok florian@
* Make the size of the syn cache hash array tunable  (bluhm, 2016-07-20, 1 file, -8/+31)
    As we are swapping between two syn caches for random reseeding anyway,
    this feature can be added easily. When the cache is empty, there is an
    opportunity to change the hash size. This allows an admin under SYN flood
    attack to defend his machine. Suggested by claudio@;
    OK jung@ claudio@ jmc@
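    A sketch with assumed names (not the actual OpenBSD structures or sysctl
    plumbing): because the stack already alternates between two syn caches to
    reseed the hash, a cache that has drained to zero entries can also adopt
    a new bucket count before it is put back into service.

        #include <stdlib.h>

        struct syn_cache_head {
            void *sch_first;			/* bucket list head (stub) */
        };

        struct syn_cache_set {
            struct syn_cache_head *scs_buckethead;	/* hash bucket array */
            int scs_size;			/* current bucket count */
            int scs_count;			/* entries cached in this set */
            unsigned int scs_random;		/* hash seed */
        };

        int tcp_syn_hash_size = 293;	/* hypothetical stand-in for the sysctl knob */

        /* Called when a cache is reactivated after the two sets are swapped. */
        static void
        syn_cache_reinit(struct syn_cache_set *set)
        {
            if (set->scs_count == 0 && set->scs_size != tcp_syn_hash_size) {
                /* Only an empty cache may replace its bucket array. */
                void *p = calloc(tcp_syn_hash_size, sizeof(struct syn_cache_head));
                if (p != NULL) {
                    free(set->scs_buckethead);
                    set->scs_buckethead = p;
                    set->scs_size = tcp_syn_hash_size;
                }
            }
            set->scs_random = arc4random();	/* fresh hash seed for every reuse */
        }

    An administrator under a SYN flood can then raise the bucket count at run
    time, and the change takes effect once the draining cache empties out.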
* Make accepted sockets inherit IP_TTL from the listening socket  (jca, 2016-07-01, 1 file, -2/+5)
    This is consistent with the IPV6_UNICAST_HOPS behavior, and is the only
    way to allow applications to completely control the TTL of outgoing
    packets (otherwise an application could temporarily send packets with
    the default TTL until it sets IP_TTL again; this is harmful e.g. for
    GTSM). ok bluhm@
* Missing "break;" in switch statement; repairs IP_MINTTL.jca2016-06-271-1/+2
|
* Implement IPV6_MINHOPCOUNT support  (jca, 2016-06-27, 1 file, -3/+13)
    Useful to implement GTSM support in daemons such as bgpd(8). Diff from
    2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@
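    An illustration of the GTSM idea behind the option (RFC 5082), with
    simplified, made-up names rather than the kernel's inpcb fields: the
    daemon sets a minimum hop count on its socket, and any segment that
    arrives with a smaller hop limit must have crossed more routers than
    allowed and is discarded.

        #include <stdbool.h>
        #include <stdint.h>

        struct conn {
            uint8_t minhopcount;	/* 0 means the option is not set */
        };

        static bool
        gtsm_accept(const struct conn *c, uint8_t hlim_received)
        {
            if (c->minhopcount == 0)
                return true;			/* check disabled */
            return hlim_received >= c->minhopcount;
        }

    A BGP speaker that only talks to directly connected peers would set the
    minimum to 255, so a packet that has been forwarded even once fails the
    check.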
* Copy inp_hops from the listening socket to the accepted one and use its value for the SYN+ACK packet  (bluhm, 2016-06-27, 1 file, -2/+3)
    This makes the IPV6_UNICAST_HOPS socket option usable for incoming TCP
    connections. tested by renato@; OK jca@
* The variable swapping between inp, newinp and oldinpcb in syn_cache_get() was overly complicated  (bluhm, 2016-06-27, 1 file, -20/+9)
    Simplify the code without functional change. OK jca@
* Fix typo in comment. From Kapetanakis Giannis  (bluhm, 2016-06-09, 1 file, -2/+2)
* If one of the TCP syn cache buckets overflows, it might be a collision attack against our hash function  (bluhm, 2016-03-31, 1 file, -1/+6)
    In this case, switch to the passive syn cache as soon as possible. It
    will start with a new random seed for the hash. input and OK mpi@
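    A sketch of the reaction, again under assumed names and with a
    hypothetical per-bucket limit: one bucket filling up while the cache as a
    whole is far from its limit suggests deliberate hash collisions, so the
    active cache is marked to be retired and the next insert switches to the
    other, freshly seeded cache.

        #include <stdbool.h>

        int tcp_syn_bucket_limit = 105;	/* hypothetical per-bucket entry limit */

        struct syn_cache_set {
            int scs_count;	/* entries in the whole cache */
            int scs_use;	/* inserts left before the caches are swapped */
        };

        /* Returns true if the new entry must be dropped because its bucket is full. */
        static bool
        syn_cache_bucket_overflow(struct syn_cache_set *set, int bucket_count)
        {
            if (bucket_count < tcp_syn_bucket_limit)
                return false;
            set->scs_use = 0;	/* force a swap to the passive cache ASAP */
            return true;
        }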