path: root/sys/netinet/ip_output.c
Commit message | Author | Age | Files | Lines
* If pf changes the routing table when sending packets, the kernel could get stuck in an endless recursion during TCP path MTU discovery. Create a dynamic host route in ip_output() that can be used by tcp_mtudisc() to store the MTU. Reported by Peter Mueller and Sebastian Sturm OK claudio@
  bluhm | 2021-02-10 | 1 | -2/+15
* Simplex interface sends packet back without hardware checksum offloading. The checksum must be calculated in software. Use the same condition in ether_resolve() to send the broadcast packet back to the stack and in in_ifcap_cksum() to force software checksumming. This fixes regress/sys/kern/sosplice/loop. OK procter@
  bluhm | 2021-02-06 | 1 | -13/+28
* If IP_MULTICAST_IF or IP_ADD_MEMBERSHIP pass an interface index to the kernel, make sure that the rdomain of that interface is the same as the rdomain of the inpcb. Problem spotted and fix tested by semarie@ OK bluhm@ mvs@
  claudio | 2021-02-02 | 1 | -3/+6
* Fix path MTU discovery for ESP tunneled in IPv6. We always want short TCP segments or fragments encapsulated in ESP instead of fragmented ESP packets. Pass the don't fragment flag down along the stack so that dynamic routes with MTU are created eventually. with and OK markus@; OK tobhe@
  bluhm | 2021-02-01 | 1 | -1/+4
* Extend IP_MULTICAST_IF to take either an address (struct in_addr), a struct ip_mreq or a struct ip_mreqn. Using struct ip_mreqn allows passing an interface index instead of specifying the multicast interface via its IP address. This is also the API implemented by Linux and FreeBSD and should help porting software. OK bluhm@ phessler@ robert@
  claudio | 2021-01-16 | 1 | -3/+32
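  The index-based form described in this commit can be sketched from userland as follows. This is a minimal sketch, not code from the commit: the helper names set_mcast_if_by_index() and set_mcast_if_by_name() are hypothetical, and the _DEFAULT_SOURCE define is an assumption that only matters on glibc, where it exposes struct ip_mreqn under strict compiler modes.

```c
#define _DEFAULT_SOURCE	/* assumption: glibc only, exposes struct ip_mreqn */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <net/if.h>
#include <string.h>

/*
 * Hypothetical helper: select the outgoing multicast interface of
 * socket s by interface index instead of by IP address.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
set_mcast_if_by_index(int s, int ifindex)
{
	struct ip_mreqn mreqn;

	/* imr_multiaddr and imr_address stay zero; only the index is set. */
	memset(&mreqn, 0, sizeof(mreqn));
	mreqn.imr_ifindex = ifindex;

	return setsockopt(s, IPPROTO_IP, IP_MULTICAST_IF,
	    &mreqn, sizeof(mreqn));
}

/*
 * Hypothetical helper: same as above, but look the index up by
 * interface name first. Returns -1 if the name is unknown.
 */
int
set_mcast_if_by_name(int s, const char *ifname)
{
	unsigned int idx = if_nametoindex(ifname);

	if (idx == 0)
		return -1;
	return set_mcast_if_by_index(s, (int)idx);
}
```

  Because the interface is named by index, the same call should also work for an interface that has no IPv4 address configured, which the address-based form cannot express.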
* Create a path MTU host route for IPsec over IPv6. Basically the code is copied from IPv4 and adapted. Some things are changed in v4 to make it look similar.
  - ip6_forward increases the noroute error counter, do that in ip_forward, too.
  - Pass more specific sockaddr_in6 to icmp6_mtudisc_clone().
  - IPv6 may also use reject routes for IPsec PMTU clones.
  - To pass a route_in6 to ip6_output_ipsec_send() introduce one in ip6_forward(). That is the same as what IPv4 does. Note that dst and sin6 switch roles.
  - Copy comments from ip_output_ipsec_send() to ip6_output_ipsec_send() to make code similar.
  - Implement dynamic IPv6 IPsec PMTU routes.
  OK tobhe@
  bluhm | 2021-01-11 | 1 | -2/+2
* Extend IP_ADD_MEMBERSHIP to also support struct ip_mreqn. struct ip_mreqn allows using the interface index to select the interface for multicast packets, which makes it possible to use this with unnumbered interfaces. OK dlg@ robert@
  claudio | 2021-01-07 | 1 | -63/+80
* Accept reject and blackhole routes for IPsec PMTU discovery. Since revision 1.87 of ip_icmp.c icmp_mtudisc_clone() ignored reject routes. Otherwise TCP would clone these routes for PMTU discovery. They will not work, even after dynamic routing has found a better route than the reject route. With IPsec the use case is different. First you need a route, but then the flow handles the packet without routing. Usually this route should be a reject route to avoid sending unencrypted traffic if the flow is missing. But IPsec needs this route for PMTU discovery, so use it for that. OK claudio@ tobhe@
  bluhm | 2020-12-20 | 1 | -2/+2
* kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)
  time_second(9) and time_uptime(9) are widely used in the kernel to quickly get the system UTC or system uptime as a time_t. However, time_t is 64-bit everywhere, so it is not generally safe to use them on 32-bit platforms: you have a split-read problem if your hardware cannot perform atomic 64-bit reads.
  This patch replaces time_second(9) with gettime(9), a safer successor interface, throughout the kernel. Similarly, time_uptime(9) is replaced with getuptime(9).
  There is a performance cost on 32-bit platforms in exchange for eliminating the split-read problem: instead of two register reads you now have a lockless read loop to pull the values from the timehands. This is really not *too* bad in the grand scheme of things, but compared to what we were doing before it is several times slower. There is no performance cost on 64-bit (__LP64__) platforms.
  With input from visa@, dlg@, and tedu@. Several bugs squashed by visa@. ok kettenis@
  cheloha | 2020-06-24 | 1 | -2/+2
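  The split-read problem and the retry loop that avoids it can be illustrated in userland. This is a minimal sketch under stated assumptions: struct split_clock and both functions are hypothetical, it uses a generic generation-counter (seqlock-style) scheme with C11 atomics and a single writer, and it is not the kernel's timehands code.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a 64-bit clock stored as two 32-bit halves,
 * as on hardware without atomic 64-bit loads.
 */
struct split_clock {
	atomic_uint gen;		/* odd while an update is in progress */
	atomic_uint_least32_t hi;	/* upper 32 bits */
	atomic_uint_least32_t lo;	/* lower 32 bits */
};

/* Single writer: mark the update, store both halves, mark it done. */
void
split_clock_set(struct split_clock *c, int64_t t)
{
	uint64_t u = (uint64_t)t;

	atomic_fetch_add(&c->gen, 1);		/* generation becomes odd */
	atomic_store(&c->hi, (uint32_t)(u >> 32));
	atomic_store(&c->lo, (uint32_t)u);
	atomic_fetch_add(&c->gen, 1);		/* generation is even again */
}

/*
 * Lockless reader: retry until both halves were read under the same
 * even generation, so a half-updated value is never returned.
 */
int64_t
split_clock_get(struct split_clock *c)
{
	unsigned int g;
	uint32_t hi, lo;

	do {
		g = atomic_load(&c->gen);
		hi = atomic_load(&c->hi);
		lo = atomic_load(&c->lo);
	} while ((g & 1) != 0 || g != atomic_load(&c->gen));

	return (int64_t)(((uint64_t)hi << 32) | lo);
}
```

  The loop costs several loads and a comparison instead of two plain reads, which matches the trade-off the commit message describes for 32-bit platforms.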
* Fix uninitialized use of variable 'len'. ok bluhm@
  tobhe | 2020-03-06 | 1 | -6/+4
* Use mallocarray(9) & put some free(9) sizes for M_IPMOPTS allocations. ok semarie@, visa@
  mpi | 2019-06-10 | 1 | -8/+8
* Removes the KERNEL_LOCK() from bridge(4)'s output fast-path. This redefines the ifp <-> bridge relationship. No lock can be currently used across the multiple contexts where the bridge has tentacles to protect a pointer, use an interface index. Tested by various, ok dlg@, visa@
  mpi | 2019-04-28 | 1 | -6/+6
* Bring back the ip_pcbopts() refactor. Pad the option buffer and therefore the mbuf to the next word length as it is required by the standard. Also use the correct offset from the input mbuf. OK visa@, input & OK bluhm@
  claudio | 2019-01-18 | 1 | -39/+54
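  The padding rule mentioned here follows from the IPv4 header length field, which counts 32-bit words, so the options area must be a multiple of 4 bytes. A minimal sketch of the rounding arithmetic, using a hypothetical helper name that does not appear in ip_output.c:

```c
#include <stddef.h>

/*
 * Hypothetical helper: round an IP option length in bytes up to the
 * next multiple of 4, the word size the header length field counts in.
 */
size_t
ipopt_padded_len(size_t optlen)
{
	return (optlen + 3) & ~(size_t)3;
}
```

  For example, 5 bytes of options occupy 8 bytes in the header, while 40 bytes (the IPv4 maximum) need no padding.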
* Revert Rev 1.351, the change is not quite right yet.
  claudio | 2019-01-18 | 1 | -49/+36
* Rewrite ip_pcbopts() to fill a fresh mbuf with the ip options instead of fiddling with the user supplied mbuf and then copy it at the end. OK visa@
  claudio | 2019-01-06 | 1 | -36/+49
* Replace a funky 'else switch' construct with something that is equal but a lot easier to read. The if can simply return the error and so the else branch is no longer needed. Input and OK dhill@
  claudio | 2019-01-03 | 1 | -4/+5
* Replace a wrong poor man's m_trailingspace() with the real thing. The mbuf passed to ip_pcbopts could be a cluster and so the size check is all wrong. found by Greg Steuck; OK bluhm@ Reported-by: syzbot+c2543ae6b6692a5843e3@syzkaller.appspotmail.com
  claudio | 2018-12-20 | 1 | -2/+2
* Add per-TDB counters and a new SADB extension to export them to userland. Inputs from markus@, ok sthen@
  mpi | 2018-08-28 | 1 | -2/+4
* Introduce ipsec_output_cb() to merge duplicate code and account for dropped packets in the output path. While here fix a memory leak when compression is not needed w/ IPcomp. ok markus@
  mpi | 2018-07-12 | 1 | -2/+6
* In ip6_output() check that the interface of a route is valid. For IPv4 we do the same and there are races that trigger it. Increment the statistics counter for both. from markus@; OK mpi@
  bluhm | 2018-03-21 | 1 | -1/+8
* Remove almost unused `flags' argument of suser(). The account flag `ASU' will no longer be set but that makes suser() mpsafe since it no longer messes with a per-process field. No objection from millert@, ok tedu@, bluhm@
  mpi | 2018-02-19 | 1 | -6/+6
* It does not make sense to call pcb lookup from pf during packet forwarding. It should never match and would cause MP locking problems. While there remove a useless ifp parameter from ip_output_ipsec_send(). from markus@; OK visa@ sashan@
  bluhm | 2017-11-22 | 1 | -8/+8
* Stop grabbing the KERNEL_LOCK() in network tasks when `ipsec_in_use' is set. Accesses to IPsec global data structure are now serialized by the NET_LOCK(). Tested by many, ok visa@, bluhm@
  mpi | 2017-10-26 | 1 | -3/+1
* Use m_copym() instead of m_dup_pkt() to fix a kernel assert when setting IP options. Issue reported by Kapetanakis Giannis OK mpi@
  visa | 2017-09-20 | 1 | -2/+2
* Change sosetopt() to no longer free the mbuf it receives and change all the callers to call m_freem(9). Support from deraadt@ and tedu@, ok visa@, bluhm@
  mpi | 2017-09-01 | 1 | -16/+13
* Per-interface list of addresses, both multicast and unicast, are currently protected by the NET_LOCK(). They are not accessed in the hot path, so protecting them with a mutex could be an option. However since we're now going to run with a NET_LOCK() for some time, assert that it is held.
  IPsec is not yet ready to run without KERNEL_LOCK(), so assert it is held, even in the forwarding path.
  Tested by sthen@, ok visa@, claudio@, bluhm@
  mpi | 2017-05-29 | 1 | -9/+3
* Use the rt_rmx defines that hide the struct rt_kmetrics indirection. No binary change. OK mpi@
  bluhm | 2017-04-19 | 1 | -7/+7
* Partially revert previous mallocarray conversions that contain constants. The consensus is that if both operands are constant, we don't need mallocarray. Reminded by tedu@ ok deraadt@
  dhill | 2017-04-11 | 1 | -3/+3
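  The reasoning behind this partial revert is that an overflow check on the element count times the element size only matters when at least one operand is variable. A minimal userland sketch of such a check, with the hypothetical name checked_mallocarray(); it is an analogue for illustration, not the kernel's mallocarray(9):

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userland analogue of an overflow-checked array
 * allocation: refuse the request if nmemb * size would wrap around.
 */
void *
checked_mallocarray(size_t nmemb, size_t size)
{
	if (size != 0 && nmemb > SIZE_MAX / size) {
		errno = ENOMEM;
		return NULL;
	}
	return malloc(nmemb * size);
}
```

  When both operands are compile-time constants the product is known in advance, so the runtime check adds nothing and a plain allocation is enough.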
* Use mallocarray to allocate multicast group memberships. ok deraadt@
  dhill | 2017-04-09 | 1 | -5/+5
* percpu counters for TCP stats. ok mpi@ bluhm@
  jca | 2017-02-09 | 1 | -2/+2
* In sogetopt, preallocate an mbuf to avoid using sleeping mallocs with the netlock held. This also changes the prototypes of the *ctloutput functions to take an mbuf instead of an mbuf pointer. help, guidance from bluhm@ and mpi@ ok bluhm@
  dhill | 2017-02-01 | 1 | -20/+11
* Remove NULL checks before m_free(9), it deals with it. ok bluhm@, kettenis@
  mpi | 2017-01-10 | 1 | -7/+4
* Extend the multicast sockets and multicast hash table support to multiple domains. This is one step towards supporting running more than one multicast socket in different domains at the same time. ok mpi@
  rzalamena | 2016-12-19 | 1 | -2/+2
* Introduce the NET_LOCK(), a rwlock used to serialize accesses to the parts of the network stack that are not yet ready to be executed in parallel or where new sleeping points are not possible.
  This first pass replaces all the entry points leading to ip_output(). This is done to not introduce new sleeping points when trying to acquire ART's write lock, needed when a new L2 entry is created via the RT_RESOLVE.
  Inputs from and ok bluhm@, ok dlg@
  mpi | 2016-12-19 | 1 | -1/+3
* Kill a micro optimization that no longer makes sense since the two routing blocks have been merged in r1.292. ok claudio@
  mpi | 2016-11-28 | 1 | -6/+1
* turn ipstat into a set of percpu counters. each counter is identified by an enum value which corresponds to the original members of the udpstat struct. udpstat_inc(udps_foo) replaces udpstat.udps_foo++ for the actual updates. udpstat_inc is a thin wrapper around counters_inc. counters are still returned to userland via the udpstat struct for now. ok mpi@ mikeb@ deraadt@
  dlg | 2016-11-18 | 1 | -2/+2
* Automatically create a default lo(4) interface per rdomain.
  In order to stop abusing lo0 for all rdomains, a new loopback interface will be created every time a rdomain is created. The unit number will be the same as the rdomain, i.e. lo1 will be attached to rdomain 1. If this loopback interface is already in use it won't be possible to create the corresponding rdomain.
  In order to know which lo(4) interface is attached to a rdomain, its index is stored in the rtable/rdomain map.
  This is long overdue since the introduction of rtable/rdomain. It also fixes a recent regression due to resetting the rdomain of an incoming packet reported by semarie@, Andreas Bartelt and Nils Frohberg.
  ok claudio@
  mpi | 2016-11-14 | 1 | -2/+2
* turn ipstat into a set of percpu counters. each counter is identified by an enum value which corresponds to the original members of the ipstat struct. ipstat_inc(ips_foo) replaces ipstat.ips_foo++ for the actual updates. ipstat_inc is a thin wrapper around counters_inc. counters are still returned to userland via the ipstat struct for now. ok mpi@ mikeb@
  dlg | 2016-11-14 | 1 | -13/+13
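  The pattern described here, an enum naming each counter, a thin increment wrapper, and a summed view for readers, can be sketched in userland. Every name below (NCPU_DEMO, enum demo_counter, demo_stat_inc(), demo_stat_read()) is hypothetical, and the CPU number is passed explicitly as a simplifying assumption; this is not the kernel's counters_inc() implementation and it ignores the synchronization a real one needs.

```c
#include <stdint.h>

#define NCPU_DEMO 4	/* hypothetical fixed number of CPUs */

/* One enum value per statistic, replacing named struct members. */
enum demo_counter {
	demo_total,
	demo_fragmented,
	demo_ncounters	/* must stay last: number of counters */
};

/* One row of counters per CPU, so updates never share a slot. */
static uint64_t demo_counters[NCPU_DEMO][demo_ncounters];

/* Thin increment wrapper: bump the counter in the caller's own row. */
static inline void
demo_stat_inc(unsigned int cpu, enum demo_counter c)
{
	demo_counters[cpu % NCPU_DEMO][c]++;
}

/* Readers get the old single-value view by summing over all rows. */
uint64_t
demo_stat_read(enum demo_counter c)
{
	uint64_t sum = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < NCPU_DEMO; cpu++)
		sum += demo_counters[cpu][c];
	return sum;
}
```

  Summing on read is what lets the totals still be reported through a single struct, as the commit message notes for userland.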
* Prevent a NULL dereference in ip_output(). A race can happen if a task, like the watchdog, sleeps too long keeping an ifp reference while the interface is detached. In this case a TCP timer will try to send packets with a cached route. Since the ifp is being detached if_get(9) returns NULL. Found the hard way by awolk@. ok bluhm@
  mpi | 2016-09-04 | 1 | -1/+5
* replace the last uses of m_copym2 with m_dup_pkt. ok mpi@ visa@
  dlg | 2016-08-15 | 1 | -2/+2
* Allow resetting the IP_TTL and IP_MINTTL sockopts. IP_TTL can be reset by passing -1, IP_MINTTL can be reset by passing 0. This is consistent with what Linux does and IPV6_UNICAST_HOPS/IPV6_MINHOPCOUNT. ok bluhm@
  jca | 2016-07-01 | 1 | -2/+4
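  The reset behaviour for IP_TTL can be exercised from userland as sketched below. The helper names set_ttl() and get_ttl() are hypothetical; the sketch only assumes the standard setsockopt(2)/getsockopt(2) interface and the -1 convention stated in the commit message.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: set the TTL of socket s. Passing -1 asks the
 * kernel to go back to its default. Returns 0 or -1 with errno set.
 */
int
set_ttl(int s, int ttl)
{
	return setsockopt(s, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl));
}

/* Hypothetical helper: read back the effective TTL, or -1 on error. */
int
get_ttl(int s)
{
	int ttl = 0;
	socklen_t len = sizeof(ttl);

	if (getsockopt(s, IPPROTO_IP, IP_TTL, &ttl, &len) == -1)
		return -1;
	return ttl;
}
```

  A caller would typically lower the TTL for one exchange with set_ttl(s, 5) and then call set_ttl(s, -1) to return to the system default without having to know its value.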
* when pf_test returns anything but PF_PASS, set error to EACCES instead of EHOSTUNREACH. On the latter, ip_forward can generate undesired icmp errors - either pf generates those itself (block return), or there shouldn't be any.
  Bizarrely enough, ip_forward has EACCES handling with a comment specifically pointing to packets blocked by pf, but the code in ip_output used EHOSTUNREACH from day #1 on.
  found & analyzed by Kristof Provost <kp at FreeBSD>, discussed at BSDcan ok mpi millert
  henning | 2016-06-23 | 1 | -2/+2
* Invert two conditions to not grab the KERNEL_LOCK for every multicast packet. ok visa@, stsp@, sthen@
  mpi | 2016-05-31 | 1 | -7/+10
* Preserve DiffServ value when fragmenting an IPv4 packet. Ok phessler@, henning@
  vgross | 2016-05-04 | 1 | -2/+3
* Do not allow to change the routing table of a bound socket. This is not intended and will behave unexpectedly if the address is already used in another domain. It did not work anyway, as the PCB ended in the wrong hash bucket after changing the rtable. Fail with EBUSY if the socket is already bound and rehash the PCB if its rtable changes. input claudio@; OK mpi@
  bluhm | 2016-04-29 | 1 | -1/+6
* Unbreak RAMDISK, found by deraadt@
  mpi | 2016-04-18 | 1 | -2/+5
* Put a KERNEL_LOCK/UNLOCK dance around sections that still need some work in the forwarding path. Tested by Hrvoje Popovski, ok dlg@
  mpi | 2016-04-18 | 1 | -9/+22
* Return ENOBUFS when bumping into the multicast max group memberships limit. This removes the only use of ETOOMANYREFS in our code, making intro(2) match reality. No software out there explicitly checks for ETOOMANYREFS in multicast code. Discussed with millert@ and mpi@ (who suggested using ENOBUFS)
  jca | 2016-02-11 | 1 | -2/+2
* Introduce in{,6}_hasmulti(), two functions to check in the hot path if an interface joined a specific multicast group. ok phessler@, visa@, dlg@
  mpi | 2016-01-21 | 1 | -5/+3
* Prevent a double if_put(). ok mikeb@, bluhm@
  mpi | 2016-01-13 | 1 | -1/+2