path: root/sys/netinet/tcp_input.c
Commit message, followed by author, date, files changed, and lines removed/added
* Do not acknowledge a received ack-only tcp packet that we would drop due to PAWS. Otherwise we could trigger a retransmit of the opposite party with another wrong timestamp and produce a loop. I have seen this with a buggy server which messed up tcp timestamps. Suggested by Prof. Jacobson for FreeBSD. ok krw, bluhm, henning, mpi
  (author: friehm, date: 2018-09-17, files: 1, lines: -2/+4)
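The decision described in this entry can be sketched in isolation. This is a simplified illustration, not the kernel code: the function and enum names are hypothetical, and it leaves out the idle-timeout and RST handling that a real stack also applies.

```c
#include <stdint.h>

/* Hypothetical outcome labels for this sketch; not kernel identifiers. */
enum paws_action { PAWS_ACCEPT, PAWS_DROP, PAWS_DROP_AFTER_ACK };

/* Serial-number comparison: true when timestamp a is older than b. */
static int
tstmp_lt(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/*
 * Decide what to do with a segment carrying timestamp ts_val when the
 * last accepted timestamp was ts_recent (0 means none recorded yet).
 * payload_len is the number of data bytes in the segment.
 */
static enum paws_action
paws_check(uint32_t ts_val, uint32_t ts_recent, uint32_t payload_len)
{
	if (ts_recent == 0 || !tstmp_lt(ts_val, ts_recent))
		return PAWS_ACCEPT;
	/* Stale timestamp: only segments that carry data are acknowledged. */
	return payload_len > 0 ? PAWS_DROP_AFTER_ACK : PAWS_DROP;
}
```

Dropping the ack-only case silently is what breaks the loop: the peer never receives an ACK that would prompt another retransmit with a bad timestamp.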
* Coverity CID 1470233 complains that the m != NULL check in syn_cache_get() is not necessary. Also make the abort label consistent to resetandabort and free the mbuf there. OK mpi@
  (author: bluhm, date: 2018-07-23, files: 1, lines: -4/+3)
* Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call in_pcbconnect() to avoid the address family maze in syn_cache_get(). input claudio@; OK mpi@
  (author: bluhm, date: 2018-06-14, files: 1, lines: -22/+4)
* The output from tcp debug sockets was incomplete. After detach tp was NULL and nothing was traced. So save the old tcpcb and use that to retrieve some information. Note that otb may be freed and must not be dereferenced. Use a heuristic for cases where the address family is in the IP header but not provided in the PCB. OK visa@
  (author: bluhm, date: 2018-06-11, files: 1, lines: -33/+11)
* Historically there were slow and fast tcp timeouts. That is why the delack timer had a different implementation. Use the same mechanism for all TCP timers. OK mpi@ visa@
  (author: bluhm, date: 2018-05-08, files: 1, lines: -3/+3)
* Make divert lookup similar for all socket types. If PF_TAG_DIVERTED is set, pf_find_divert() cannot fail so put an assert there. Explicitly check all possible divert types, panic in the default case. For raw sockets call pf_find_divert() before the socket loop. Divert reply should not match on TCP or UDP listen sockets. OK sashan@ visa@
  (author: bluhm, date: 2017-12-04, files: 1, lines: -7/+7)
* Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST security check prevents the user from accidentally configuring a redirect where a divert-to would be appropriate. Instead of spreading the logic into tcp and udp input, check the flag during PCB listen lookup. This also reduces the parameters of in_pcblookup_listen(). OK visa@
  (author: bluhm, date: 2017-12-01, files: 1, lines: -10/+5)
* Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running pr_input handlers without KERNEL_LOCK(). ok visa@
  (author: mpi, date: 2017-11-20, files: 1, lines: -1/+2)
* The TF_BLOCKOUTPUT flag is set around all sorwakeup() and sowwakeup() calls in tcp_input(). When I added this code for socket splicing, I have missed that they may be called indirectly through functions. Although not strictly necessary since we have the sosplice thread, put that flag consistently when we want to prevent that tcp_output() is called in the middle of tcp_input(). As soisconnected(), soisdisconnected(), and socantrcvmore() call the wakeup functions from tcp_input(), set the TF_BLOCKOUTPUT flag around them. OK visa@
  (author: bluhm, date: 2017-11-08, files: 1, lines: -1/+13)
* Remove the TCP_FACK option and associated #if{,n}def code.
  TCP_FACK was disabled by provos@ in June 1999. TCP_FACK is an algorithm that decides that when something is lost, all not SACKed packets until the most forward SACK are lost. It may be a correct estimate, if the network does not reorder packets. OK visa@ mpi@ mikeb@
  (author: job, date: 2017-10-25, files: 1, lines: -111/+2)
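The estimate that this entry attributes to TCP_FACK can be illustrated with a toy model. The sketch below works on segment indices instead of sequence numbers, and the function name is hypothetical; it is not the removed kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * sacked[i] is true when segment i of the outstanding window has been
 * selectively acknowledged. Returns how many segments a FACK-style
 * estimate would treat as lost: every un-SACKed segment below the
 * most forward SACKed one.
 */
static size_t
fack_lost_estimate(const bool *sacked, size_t nseg)
{
	size_t i, fwd = 0, lost = 0;
	bool any = false;

	/* Find the most forward SACKed segment. */
	for (i = 0; i < nseg; i++) {
		if (sacked[i]) {
			fwd = i;
			any = true;
		}
	}
	if (!any)
		return 0;

	/* Everything un-SACKed before it is assumed lost. */
	for (i = 0; i < fwd; i++) {
		if (!sacked[i])
			lost++;
	}
	return lost;
}
```

If the network reorders packets, some of the segments counted here are merely late, which is the caveat the commit message raises.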
* Refactor handling of partial TCP acknowledgements
  With input from Klemens Nanni, OK visa, mpi, bluhm
  (author: mikeb, date: 2017-10-24, files: 1, lines: -93/+81)
* Unconditionally enable TCP selective acknowledgements (SACK)
  OK deraadt, mpi, visa, job
  (author: mikeb, date: 2017-10-22, files: 1, lines: -72/+25)
* Remove NET_LOCK()'s argument.
  Tested by Hrvoje Popovski, ok bluhm@
  (author: mpi, date: 2017-08-11, files: 1, lines: -5/+4)
* Assert that the corresponding socket is locked when manipulating socket buffers.
  This is one step towards unlocking TCP input path. Note that all the functions asserting for the socket lock are not necessarily MP-safe. All the fields of 'struct socket' aren't protected.
  Introduce a new kernel-only kqueue hint, NOTE_SUBMIT, to be able to tell when a filter needs to lock the underlying data structures. Logic and name taken from NetBSD.
  Tested by Hrvoje Popovski. ok claudio@, bluhm@, mikeb@
  (author: mpi, date: 2017-06-26, files: 1, lines: -14/+14)
* Merge the content of <netinet/tcpip.h> and <netinet6/tcpipv6.h> in <netinet/tcp_debug.h>.
  The IPv6 variant was always included and the IPv4 version is not present on all systems. Most of the offending ports are already fixed, thanks to sthen@!
  (author: mpi, date: 2017-05-18, files: 1, lines: -2/+1)
* Checking for IPv4 mapped addresses and dropping the packet is done in ip6_input(). Do not check that again in the protocol input functions. OK mpi@
  (author: bluhm, date: 2017-05-06, files: 1, lines: -8/+1)
* If m is not a continuous mbuf cluster, m_pullup() in pr_input may change the pointer. Then *mp keeps the invalid pointer and it might be used. Fix the potential use after free and also reset *mp in other places to have less dangling pointers to freed mbufs. OK mpi@ mikeb@
  (author: bluhm, date: 2017-05-04, files: 1, lines: -2/+2)
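The hazard described in this entry can be reproduced with a toy buffer type. Everything in this sketch is an assumption made for illustration: `struct buf`, `buf_alloc()`, `buf_pullup()` and `toy_input()` are hypothetical stand-ins, and only the shape of the problem (a pullup that may hand back a different pointer, or NULL after freeing) mirrors m_pullup(9).

```c
#include <stdlib.h>

/* Toy stand-in for an mbuf; not the kernel structure. */
struct buf {
	size_t		 len;
	unsigned char	*data;
};

/* Allocate a zero-filled buffer of len bytes (len > 0). */
static struct buf *
buf_alloc(size_t len)
{
	struct buf *b = malloc(sizeof(*b));

	if (b == NULL)
		return NULL;
	b->data = calloc(len, 1);
	if (b->data == NULL) {
		free(b);
		return NULL;
	}
	b->len = len;
	return b;
}

/*
 * Toy analogue of a pullup: always moves the header to a fresh
 * allocation, so the caller's old pointer becomes invalid. On failure
 * the buffer is freed and NULL is returned.
 */
static struct buf *
buf_pullup(struct buf *b, size_t len)
{
	struct buf *n;

	if (b->len < len || (n = malloc(sizeof(*n))) == NULL) {
		free(b->data);
		free(b);
		return NULL;
	}
	n->len = b->len;
	n->data = b->data;	/* the payload moves to the new header */
	free(b);
	return n;
}

/* Input handler in the style of pr_input: *bp must track the live buffer. */
static int
toy_input(struct buf **bp, size_t hdrlen)
{
	struct buf *b = *bp;

	b = buf_pullup(b, hdrlen);
	*bp = b;		/* write back, even when b is NULL */
	if (b == NULL)
		return -1;
	return 0;
}
```

Without the write-back to `*bp`, the caller would keep a pointer to memory that the pullup has already freed.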
* Back out rev 1.185 (which made the code match the comment) and adjust the comment to match reality (or at least rfc7323) instead. This brings us back in line with the behavior of Net and Free. From Lauri Tirkkonen. OK bluhm@
  (author: millert, date: 2017-05-03, files: 1, lines: -7/+4)
* Use the rt_rmx defines that hide the struct rt_kmetrics indirection. No binary change. OK mpi@
  (author: bluhm, date: 2017-04-19, files: 1, lines: -4/+4)
* Use the address family passed down with pr_input to simplify tcp_input(). OK florian@
  (author: bluhm, date: 2017-04-17, files: 1, lines: -48/+4)
* Pass down the address family through the pr_input calls. This allows simplifying code used for both IPv4 and IPv6. OK mikeb@ deraadt@
  (author: bluhm, date: 2017-04-14, files: 1, lines: -3/+2)
* percpu counters for TCP stats
  ok mpi@ bluhm@
  (author: jca, date: 2017-02-09, files: 1, lines: -85/+80)
* Change the IPv4 pr_input function to the way IPv6 is implemented, to get rid of struct ip6protosw and some wrapper functions. It is more consistent to have less different structures. The divert_input functions cannot be called anyway, so remove them. OK visa@ mpi@
  (author: bluhm, date: 2017-01-29, files: 1, lines: -27/+18)
* Since raw_input() and route_input() are gone from pr_input, we can make the variable parameters of the protocol input functions fixed. Also add the proto to make it similar to IPv6. OK mpi@ guenther@ millert@
  (author: bluhm, date: 2017-01-25, files: 1, lines: -8/+2)
* Remove NULL checks before m_free(9), it deals with it.
  ok bluhm@, kettenis@
  (author: mpi, date: 2017-01-10, files: 1, lines: -7/+4)
* Introduce the NET_LOCK(), a rwlock used to serialize accesses to the parts of the network stack that are not yet ready to be executed in parallel or where new sleeping points are not possible.
  This first pass replaces all the entry points leading to ip_output(). This is done to not introduce new sleeping points when trying to acquire ART's write lock, needed when a new L2 entry is created via the RT_RESOLVE.
  Inputs from and ok bluhm@, ok dlg@
  (author: mpi, date: 2016-12-19, files: 1, lines: -10/+10)
* Be consistent and do not use braces for single line statements.
  Prodded by and ok bluhm@
  (author: mpi, date: 2016-11-16, files: 1, lines: -10/+7)
* Kill recursive splsoftnet()s.
  While here keep local definitions local. ok bluhm@
  (author: mpi, date: 2016-11-16, files: 1, lines: -28/+27)
* Use __func__ in panic strings to reduce noise when grepping.
  (author: mpi, date: 2016-11-15, files: 1, lines: -4/+4)
* Use goto consistently instead of splx() and return.
  This will allow to have a single lock/unlock dance per timer.
  (author: mpi, date: 2016-11-07, files: 1, lines: -5/+4)
* One more timeout_set_proc(9) conversion.
  Found by Chris Jackman, thanks!
  (author: mpi, date: 2016-10-04, files: 1, lines: -2/+2)
* For incoming connections keep the TF_NOPUSH flag if TCP_NOPUSH was set on the listen socket. From David Hill; OK vgross@
  (author: bluhm, date: 2016-09-19, files: 1, lines: -2/+2)
* all pools have their ipl set via pool_setipl, so fold it into pool_init.
  the ioff argument to pool_init() is unused and has been for many years, so this replaces it with an ipl argument. because the ipl will be set on init we no longer need pool_setipl.
  most of these changes have been done with coccinelle using the spatch below. cocci sucks at formatting code though, so i fixed that by hand.
  the manpage and subr_pool.c bits i did myself.
  ok tedu@ jmatthew@

      @ipl@
      expression pp;
      expression ipl;
      expression s, a, o, f, m, p;
      @@
      -pool_init(pp, s, a, o, f, m, p);
      -pool_setipl(pp, ipl);
      +pool_init(pp, s, a, ipl, f, m, p);

  (author: dlg, date: 2016-09-15, files: 1, lines: -4/+3)
* Use 'sc_route{4,6}' directly instead of casting them to 'struct route *'.
  This is another little step towards deprecating 'struct route{,_in6}'. ok florian@
  (author: mpi, date: 2016-08-31, files: 1, lines: -6/+3)
* Make the size for the syn cache hash array tunable. As we are swapping between two syn caches for random reseeding anyway, this feature can be added easily. When the cache is empty, there is an opportunity to change the hash size. This allows an admin under SYN flood attack to defend his machine. Suggested by claudio@; OK jung@ claudio@ jmc@
  (author: bluhm, date: 2016-07-20, files: 1, lines: -8/+31)
* Make accepted sockets inherit IP_TTL from the listening socket.
  This is consistent with the IPV6_UNICAST_HOPS behavior, and is the only way to allow applications to completely control the TTL of outgoing packets (else an application could temporarily send packets with the default TTL, until it sets IP_TTL again; this is harmful e.g. for GTSM). ok bluhm@
  (author: jca, date: 2016-07-01, files: 1, lines: -2/+5)
* Missing "break;" in switch statement; repairs IP_MINTTL.
  (author: jca, date: 2016-06-27, files: 1, lines: -1/+2)
* Implement IPV6_MINHOPCOUNT support.
  Useful to implement GTSM support in daemons such as bgpd(8). Diff from 2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@
  (author: jca, date: 2016-06-27, files: 1, lines: -3/+13)
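A minimum hop count option of this kind amounts to a single comparison on the receive path. The sketch below is a guess at its shape, not the committed code: the function name is hypothetical, and treating 0 as "check disabled" is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * min_hops is the per-socket threshold (0 is assumed to mean the check
 * is disabled); rcvd_hops is the hop limit found in the received
 * packet's IPv6 header. Returns true when the packet may be accepted.
 */
static bool
hopcount_ok(uint8_t min_hops, uint8_t rcvd_hops)
{
	return min_hops == 0 || rcvd_hops >= min_hops;
}
```

For GTSM the sender transmits with a hop limit of 255 and the receiver sets the minimum to 255, so any packet that has crossed a router fails the check.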
* Copy inp_hops from the listening socket to the accepted one and use its value for the SYN+ACK packet. This makes the IPV6_UNICAST_HOPS socket option usable for incoming TCP connections. tested by renato@; OK jca@
  (author: bluhm, date: 2016-06-27, files: 1, lines: -2/+3)
* The variable swapping between inp, newinp and oldinpcb in syn_cache_get() was overly complicated. Simplify the code without functional change. OK jca@
  (author: bluhm, date: 2016-06-27, files: 1, lines: -20/+9)
* Fix typo in comment. From Kapetanakis Giannis
  (author: bluhm, date: 2016-06-09, files: 1, lines: -2/+2)
* If one of the TCP syn cache buckets overflows, it might be a collision attack against our hash function. In this case, switch to the passive syn cache as soon as possible. It will start with a new random seed for the hash. input and OK mpi@
  (author: bluhm, date: 2016-03-31, files: 1, lines: -1/+6)
* Allow adjusting tcp_syn_use_limit with sysctl net.inet.tcp.synuselimit. This is convenient to test the feature and may be useful to defend against syn flooding in a denial of service condition. It is consistent with the existing syn cache sysctls. Move some declarations to tcp_var.h to access the syn cache sets from tcp_sysctl(). OK mpi@
  (author: bluhm, date: 2016-03-29, files: 1, lines: -9/+2)
* To prevent attacks on the hash buckets of the syn cache, our TCP stack reseeds the hash function every time the cache is empty. Unfortunately the attacker can prevent the reseeding by sending unanswered SYN packets periodically. Fix this by having an active syn cache that gets new entries and a passive one that is idling out. When the passive one is empty and the active one has been used 100000 times, they switch roles and the hash function is reseeded with new random. tedu@ agrees; OK mpi@
  (author: bluhm, date: 2016-03-27, files: 1, lines: -49/+78)
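The role switch described in this entry can be modelled with two counters per set. This is a simplified sketch built only from the commit text: the struct and function names are hypothetical, the hash table itself is omitted, and `next_seed` is a deterministic stand-in for a random source.

```c
#include <stdint.h>

#define USE_LIMIT 100000	/* uses before a switch, per the commit text */

struct syn_set {
	int	 count;		/* entries currently stored */
	int	 use;		/* remaining uses before a switch is wanted */
	uint32_t seed;		/* hash seed */
};

struct syn_caches {
	struct syn_set	set[2];
	int		active;		/* index of the set taking new entries */
	uint32_t	next_seed;	/* stand-in for a random source */
};

static void
syn_caches_init(struct syn_caches *c)
{
	c->set[0] = c->set[1] = (struct syn_set){ 0, USE_LIMIT, 0 };
	c->active = 0;
	c->next_seed = 1;
}

/* Record one new entry, switching roles and reseeding when allowed. */
static void
syn_insert(struct syn_caches *c)
{
	struct syn_set *passive = &c->set[!c->active];

	if (c->set[c->active].use <= 0 && passive->count == 0) {
		/* The drained passive set becomes active with a new seed. */
		c->active = !c->active;
		c->set[c->active].seed = c->next_seed++;
		c->set[c->active].use = USE_LIMIT;
	}
	c->set[c->active].count++;
	c->set[c->active].use--;
}

/* An entry in set idx completed or timed out. */
static void
syn_remove(struct syn_caches *c, int idx)
{
	if (c->set[idx].count > 0)
		c->set[idx].count--;
}
```

Because new entries only ever go into the active set, an attacker who keeps sending SYNs cannot stop the passive set from draining, so a reseed eventually becomes possible.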
* Add a tcps_sc_seedrandom counter in TCP SYN cache and netstat -s. This shows how often the hash function is reseeded and the random bucket distribution changes. OK mpi@ claudio@
  (author: bluhm, date: 2016-03-21, files: 1, lines: -2/+4)
* Sync no-argument function declaration and definition by adding (void).
  ok mpi@ millert@
  (author: naddy, date: 2016-03-07, files: 1, lines: -2/+2)
* fix a missing if_put() in the default af path of tcp_mss()
  ok mpi@
  (author: jsg, date: 2016-01-22, files: 1, lines: -3/+3)
* upgrade tcp/ip to use the latest in C89 technology: memcpy.
  ok henning
  (author: tedu, date: 2015-12-05, files: 1, lines: -17/+16)
* To avoid that the stack manipulates the pf statekeys directly, introduce pf_inp_...() lookup, link and unlink functions as an interface. Locking can be added to them later. Remove the first linking at the beginning of tcp_input() and udp_input() as it is not necessary. It will be done later anyway. That code was a relic from the time before I had added the second linking. Input from mikeb@ and sashan@; OK sashan@
  (author: bluhm, date: 2015-12-03, files: 1, lines: -23/+4)
* Fix a hypothetical NULL dereference which might become true once the TCP layer will be turned mpsafe. We're not there yet.
  Reported by David Hill, ok florian@
  (author: mpi, date: 2015-11-29, files: 1, lines: -9/+5)