aboutsummaryrefslogtreecommitdiffstats
path: root/net/dccp/output.c (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2020-10-30net: dccp: Fix most of the kerneldoc warningsAndrew Lunn1-0/+9
net/dccp/ccids/ccid2.c:190: warning: Function parameter or member 'hc' not described in 'ccid2_update_used_window' net/dccp/ccids/ccid2.c:190: warning: Function parameter or member 'new_wnd' not described in 'ccid2_update_used_window' net/dccp/ccids/ccid2.c:360: warning: Function parameter or member 'sk' not described in 'ccid2_rtt_estimator' net/dccp/ccids/ccid3.c:112: warning: Function parameter or member 'sk' not described in 'ccid3_hc_tx_update_x' net/dccp/ccids/ccid3.c:159: warning: Function parameter or member 'hc' not described in 'ccid3_hc_tx_update_s' net/dccp/ccids/ccid3.c:268: warning: Function parameter or member 'sk' not described in 'ccid3_hc_tx_send_packet' net/dccp/ccids/ccid3.c:667: warning: Function parameter or member 'sk' not described in 'ccid3_first_li' net/dccp/ccids/ccid3.c:85: warning: Function parameter or member 'hc' not described in 'ccid3_update_send_interval' net/dccp/ccids/lib/loss_interval.c:85: warning: Function parameter or member 'lh' not described in 'tfrc_lh_update_i_mean' net/dccp/ccids/lib/loss_interval.c:85: warning: Function parameter or member 'skb' not described in 'tfrc_lh_update_i_mean' net/dccp/ccids/lib/packet_history.c:392: warning: Function parameter or member 'h' not described in 'tfrc_rx_hist_sample_rtt' net/dccp/ccids/lib/packet_history.c:392: warning: Function parameter or member 'skb' not described in 'tfrc_rx_hist_sample_rtt' net/dccp/feat.c:1003: warning: Function parameter or member 'dreq' not described in 'dccp_feat_server_ccid_dependencies' net/dccp/feat.c:1040: warning: Function parameter or member 'array_len' not described in 'dccp_feat_prefer' net/dccp/feat.c:1040: warning: Function parameter or member 'array' not described in 'dccp_feat_prefer' net/dccp/feat.c:1040: warning: Function parameter or member 'preferred_value' not described in 'dccp_feat_prefer' net/dccp/output.c:151: warning: Function parameter or member 'dp' not described in 'dccp_determine_ccmps' net/dccp/output.c:242: warning: Function parameter or member 'sk' not described in 'dccp_xmit_packet' net/dccp/output.c:305: warning: Function parameter or member 'sk' not described in 'dccp_flush_write_queue' net/dccp/output.c:305: warning: Function parameter or member 'time_budget' not described in 'dccp_flush_write_queue' net/dccp/output.c:378: warning: Function parameter or member 'sk' not described in 'dccp_retransmit_skb' net/dccp/qpolicy.c:88: warning: Function parameter or member '' not described in 'dccp_qpolicy_operations' net/dccp/qpolicy.c:88: warning: Function parameter or member '{' not described in 'dccp_qpolicy_operations' net/dccp/qpolicy.c:88: warning: Function parameter or member 'params' not described in 'dccp_qpolicy_operations' Signed-off-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20201028011412.931250-1-andrew@lunn.ch Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-08-23treewide: Use fallthrough pseudo-keywordGustavo A. R. Silva1-4/+4
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-22net: dccp: Convert to use the preferred fallthrough macroMiaohe Lin1-4/+4
Convert the uses of fallthrough comments to fallthrough macro. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152Thomas Gleixner1-5/+1
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-02sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>Ingo Molnar1-0/+1
Fix up affected files that include this signal functionality via sched.h. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-30net: Generalise wq_has_sleeper helperHerbert Xu1-1/+1
The memory barrier in the helper wq_has_sleeper is needed by just about every user of waitqueue_active. This patch generalises it by making it take a wait_queue_head_t directly. The existing helper is renamed to skwq_has_sleeper. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25dccp: constify dccp_make_response() socket argumentEric Dumazet1-6/+11
Like tcp_make_synack() the only time we might change the socket is when calling sock_wmalloc(), which is using atomic operation to update sk->sk_wmem_alloc Also use MAX_DCCP_HEADER as both IPv4/IPv6 use this value for max_header. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15ipv4: add a sock pointer to ip_queue_xmit()Eric Dumazet1-1/+1
ip_queue_xmit() assumes the skb it has to transmit is attached to an inet socket. Commit 31c70d5956fc ("l2tp: keep original skb ownership") changed l2tp to not change skb ownership and thus broke this assumption. One fix is to add a new 'struct sock *sk' parameter to ip_queue_xmit(), so that we do not assume skb->sk points to the socket used by l2tp tunnel. Fixes: 31c70d5956fc ("l2tp: keep original skb ownership") Reported-by: Zhan Jianyu <nasa4836@gmail.com> Tested-by: Zhan Jianyu <nasa4836@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-10inet: rename ir_loc_port to ir_numEric Dumazet1-1/+1
In commit 634fb979e8f ("inet: includes a sock_common in request_sock") I forgot that the two ports in sock_common do not have same byte order : skc_dport is __be16 (network order), but skc_num is __u16 (host order) So sparse complains because ir_loc_port (mapped into skc_num) is considered as __u16 while it should be __be16 Let rename ir_loc_port to ireq->ir_num (analogy with inet->inet_num), and perform appropriate htons/ntohs conversions. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-10inet: includes a sock_common in request_sockEric Dumazet1-2/+2
TCP listener refactoring, part 5 : We want to be able to insert request sockets (SYN_RECV) into main ehash table instead of the per listener hash table to allow RCU lookups and remove listener lock contention. This patch includes the needed struct sock_common in front of struct request_sock This means there is no more inet6_request_sock IPv6 specific structure. Following inet_request_sock fields were renamed as they became macros to reference fields from struct sock_common. Prefix ir_ was chosen to avoid name collisions. loc_port -> ir_loc_port loc_addr -> ir_loc_addr rmt_addr -> ir_rmt_addr rmt_port -> ir_rmt_port iif -> ir_iif Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10net: Fix (nearly-)kernel-doc comments for various functionsBen Hutchings1-0/+1
Fix incorrect start markers, wrapped summary lines, missing section breaks, incorrect separators, and some name mismatches. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-03dccp: fix bug in sequence number validation during connection setupSamuel Jero1-5/+5
This fixes a bug in the sequence number validation during the initial handshake. The code did not treat the initial sequence numbers ISS and ISR as read-only and did not keep state for GSR and GSS as required by the specification. This causes problems with retransmissions during the initial handshake, causing the budding connection to be reset. This patch now treats ISS/ISR as read-only and tracks GSS/GSR as required. Signed-off-by: Samuel Jero <sj323707@ohio.edu> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2011-07-04dccp: combine the functionality of enqeueing and cloningGerrit Renker1-7/+7
Realising the following call pattern, * first dccp_entail() is called to enqueue a new skb and * then skb_clone() is called to transmit a clone of that skb, this patch integrates both into the same function. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2011-05-08inet: Pass flowi to ->queue_xmit().David S. Miller1-2/+2
This allows us to acquire the exact route keying information from the protocol, however that might be managed. It handles all of the possibilities, from the simplest case of storing the key in inet->cork.fl to the more complex setup SCTP has where individual transports determine the flow. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-31Fix common misspellingsLucas De Marchi1-1/+1
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2010-12-07dccp: Policy-based packet dequeueing infrastructureTomasz Grobelny1-4/+3
This patch adds a generic infrastructure for policy-based dequeueing of TX packets and provides two policies: * a simple FIFO policy (which is the default) and * a priority based policy (set via socket options). Both policies honour the tx_qlen sysctl for the maximum size of the write queue (can be overridden via socket options). The priority policy uses skb->priority internally to assign an u32 priority identifier, using the same ranking as SO_PRIORITY. The skb->priority field is set to 0 when the packet leaves DCCP. The priority is supplied as ancillary data using cmsg(3), the patch also provides the requisite parsing routines. Signed-off-by: Tomasz Grobelny <tomasz@grobelny.oswiecenia.net> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2010-11-15dccp ccid-2: Schedule Sync as out-of-band mechanismGerrit Renker1-0/+15
The problem with Ack Vectors is that i) their length is variable and can in principle grow quite large, ii) it is hard to predict exactly how large they will be. Due to the second point it seems not a good idea to reduce the MPS; in particular when on average there is enough room for the Ack Vector and an increase in length is momentarily due to some burst loss, after which the Ack Vector returns to its normal/average length. The solution taken by this patch is to subtract a minimum-expected Ack Vector length from the MPS, and to defer any larger Ack Vectors onto a separate Sync - but only if indeed there is no space left on the skb. This patch provides the infrastructure to schedule Sync-packets for transporting (urgent) out-of-band data. Its signalling is quicker than scheduling an Ack, since it does not need to wait for new application data. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2010-10-28dccp: Refine the wait-for-ccid mechanismGerrit Renker1-50/+65
This extends the existing wait-for-ccid routine so that it may be used with different types of CCID, addressing the following problems: 1) The queue-drain mechanism only works with rate-based CCIDs. If CCID-2 for example has a full TX queue and becomes network-limited just as the application wants to close, then waiting for CCID-2 to become unblocked could lead to an indefinite delay (i.e., application "hangs"). 2) Since each TX CCID in turn uses a feedback mechanism, there may be changes in its sending policy while the queue is being drained. This can lead to further delays during which the application will not be able to terminate. 3) The minimum wait time for CCID-3/4 can be expected to be the queue length times the current inter-packet delay. For example if tx_qlen=100 and a delay of 15 ms is used for each packet, then the application would have to wait for a minimum of 1.5 seconds before being allowed to exit. 4) There is no way for the user/application to control this behaviour. It would be good to use the timeout argument of dccp_close() as an upper bound. Then the maximum time that an application is willing to wait for its CCIDs to can be set via the SO_LINGER option. These problems are addressed by giving the CCID a grace period of up to the `timeout' value. The wait-for-ccid function is, as before, used when the application (a) has read all the data in its receive buffer and (b) if SO_LINGER was set with a non-zero linger time, or (c) the socket is either in the OPEN (active close) or in the PASSIVE_CLOSEREQ state (client application closes after receiving CloseReq). In addition, there is a catch-all case of __skb_queue_purge() after waiting for the CCID. This is necessary since the write queue may still have data when (a) the host has been passively-closed, (b) abnormal termination (unread data, zero linger time), (c) wait-for-ccid could not finish within the given time limit. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-28dccp: Extend CCID packet dequeueing interfaceGerrit Renker1-46/+72
This extends the packet dequeuing interface of dccp_write_xmit() to allow 1. CCIDs to take care of timing when the next packet may be sent; 2. delayed sending (as before, with an inter-packet gap up to 65.535 seconds). The main purpose is to take CCID-2 out of its polling mode (when it is network- limited, it tries every millisecond to send, without interruption). The mode of operation for (2) is as follows: * new packet is enqueued via dccp_sendmsg() => dccp_write_xmit(), * ccid_hc_tx_send_packet() detects that it may not send (e.g. window full), * it signals this condition via `CCID_PACKET_WILL_DEQUEUE_LATER', * dccp_write_xmit() returns without further action; * after some time the wait-condition for CCID becomes true, * that CCID schedules the tasklet, * tasklet function calls ccid_hc_tx_send_packet() via dccp_write_xmit(), * since the wait-condition is now true, ccid_hc_tx_packet() returns "send now", * packet is sent, and possibly more (since dccp_write_xmit() loops). Code reuse: the taskled function calls dccp_write_xmit(), the timer function reduces to a wrapper around the same code. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-12dccp: remove unused argument in CCID tx functionGerrit Renker1-1/+1
This removes the argument `more' from ccid_hc_tx_packet_sent, since it was nowhere used in the entire code. (Btw, this argument was not even used in the original KAME code where the function initially came from; compare the variable moreToSend in the freebsd61-dccp-kame-28.08.2006.patch kept by Emmanuel Lochin.) Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2010-10-12dccp: merge now-reduced connect_init() functionGerrit Renker1-13/+5
After moving the assignment of GAR/ISS from dccp_connect_init() to dccp_transmit_skb(), the former function becomes very small, so that a merger with dccp_connect() suggests itself. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2010-05-01net: sock_def_readable() and friends RCU conversionEric Dumazet1-4/+6
sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we need two atomic operations (and associated dirtying) per incoming packet. RCU conversion is pretty much needed : 1) Add a new structure, called "struct socket_wq" to hold all fields that will need rcu_read_lock() protection (currently: a wait_queue_head_t and a struct fasync_struct pointer). [Future patch will add a list anchor for wakeup coalescing] 2) Attach one of such structure to each "struct socket" created in sock_alloc_inode(). 3) Respect RCU grace period when freeing a "struct socket_wq" 4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct socket_wq" 5) Change sk_sleep() function to use new sk->sk_wq instead of sk->sk_sleep 6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside a rcu_read_lock() section. 7) Change all sk_has_sleeper() callers to : - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock) - Use wq_has_sleeper() to eventually wakeup tasks. - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock) 8) sock_wake_async() is modified to use rcu protection as well. 9) Exceptions : macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq" instead of dynamically allocated ones. They dont need rcu freeing. Some cleanups or followups are probably needed, (possible sk_callback_lock conversion to a spinlock for example...). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-20net: sk_sleep() helperEric Dumazet1-3/+3
Define a new function to return the waitqueue of a "struct sock". static inline wait_queue_head_t *sk_sleep(struct sock *sk) { return sk->sk_sleep; } Change all read occurrences of sk_sleep by a call to this function. Needed for a future RCU conversion. sk_sleep wont be a field directly available. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-15net: replace ipfragok with skb->local_dfShan Wei1-1/+1
As Herbert Xu said: we should be able to simply replace ipfragok with skb->local_df. commit f88037(sctp: Drop ipfargok in sctp_xmit function) has droped ipfragok and set local_df value properly. The patch kills the ipfragok parameter of .queue_xmit(). Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-11inet: Remove unused send_check length argumentHerbert Xu1-1/+1
inet: Remove unused send_check length argument This patch removes the unused length argument from the send_check function in struct inet_connection_sock_af_ops. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Yinghai <yinghai.lu@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.hTejun Heo1-0/+1
percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2009-10-18inet: rename some inet_sock fieldsEric Dumazet1-2/+2
In order to have better cache layouts of struct sock (separate zones for rx/tx paths), we need this preliminary patch. Goal is to transfert fields used at lookup time in the first read-mostly cache line (inside struct sock_common) and move sk_refcnt to a separate cache line (only written by rx path) This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr, sport and id fields. This allows a future patch to define these fields as macros, like sk_refcnt, without name clashes. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-09net: adding memory barrier to the poll and receive callbacksJiri Olsa1-1/+1
Adding memory barrier after the poll_wait function, paired with receive callbacks. Adding fuctions sock_poll_wait and sk_has_sleeper to wrap the memory barrier. Without the memory barrier, following race can happen. The race fires, when following code paths meet, and the tp->rcv_nxt and __add_wait_queue updates stay in CPU caches. CPU1 CPU2 sys_select receive packet ... ... __add_wait_queue update tp->rcv_nxt ... ... tp->rcv_nxt check sock_def_readable ... { schedule ... if (sk->sk_sleep && waitqueue_active(sk->sk_sleep)) wake_up_interruptible(sk->sk_sleep) ... } If there was no cache the code would work ok, since the wait_queue and rcv_nxt are opposit to each other. Meaning that once tp->rcv_nxt is updated by CPU2, the CPU1 either already passed the tp->rcv_nxt check and sleeps, or will get the new value for tp->rcv_nxt and will return with new data mask. In both cases the process (CPU1) is being added to the wait queue, so the waitqueue_active (CPU2) call cannot miss and will wake up CPU1. The bad case is when the __add_wait_queue changes done by CPU1 stay in its cache, and so does the tp->rcv_nxt update on CPU2 side. The CPU1 will then endup calling schedule and sleep forever if there are no more data on the socket. Calls to poll_wait in following modules were ommited: net/bluetooth/af_bluetooth.c net/irda/af_irda.c net/irda/irnet/irnet_ppp.c net/mac80211/rc80211_pid_debugfs.c net/phonet/socket.c net/rds/af_rds.c net/rfkill/core.c net/sunrpc/cache.c net/sunrpc/rpc_pipe.c net/tipc/socket.c Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-03net: skb->dst accessorsEric Dumazet1-1/+1
Define three accessors to get/set dst attached to a skb struct dst_entry *skb_dst(const struct sk_buff *skb) void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst) void skb_dst_drop(struct sk_buff *skb) This one should replace occurrences of : dst_release(skb->dst) skb->dst = NULL; Delete skb->dst field Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02dccp: Do not let initial option overhead shrink the MPSGerrit Renker1-1/+14
This fixes a problem caused by the overlap of the connection-setup and established-state phases of DCCP connections. During connection setup, the client retransmits Confirm Feature-Negotiation options until a response from the server signals that it can move from the half-established PARTOPEN into the OPEN state, whereupon the connection is fully established on both ends (RFC 4340, 8.1.5). However, since the client may already send data while it is in the PARTOPEN state, consequences arise for the Maximum Packet Size: the problem is that the initial option overhead is much higher than for the subsequent established phase, as it involves potentially many variable-length list-type options (server-priority options, RFC 4340, 6.4). Applying the standard MPS is insufficient here: especially with larger payloads this can lead to annoying, counter-intuitive EMSGSIZE errors. On the other hand, reducing the MPS available for the established phase by the added initial overhead is highly wasteful and inefficient. The solution chosen therefore is a two-phase strategy: If the payload length of the DataAck in PARTOPEN is too large, an Ack is sent to carry the options, and the feature-negotiation list is then flushed. This means that the server gets two Acks for one Response. If both Acks get lost, it is probably better to restart the connection anyway and devising yet another special-case does not seem worth the extra complexity. The result is a higher utilisation of the available packet space for the data transmission phase (established state) of a connection. The patch (over-)estimates the initial overhead to be 32*4 bytes -- commonly seen values were around 90 bytes for initial feature-negotiation options. It uses sizeof(u32) to mean "aligned units of 4 bytes". For consistency, another use of 4-byte alignment is adapted. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02dccp: Minimise header option overhead in setting the MPSGerrit Renker1-8/+14
This patch resolves a long-standing FIXME to dynamically update the Maximum Packet Size depending on actual options usage. It uses the flags set by the feature-negotiation infrastructure to compute the required header option size. Most options are fixed-size, a notable exception are Ack Vectors (required currently only by CCID-2). These can have any length between 3 and 1020 bytes. As a result of testing, 16 bytes (2 bytes for type/length plus 14 Ack Vector cells) have been found to be sufficient for loss-free situations. There are currently no CCID-specific header options which may appear on data packets, thus it is not necessary to define a corresponding CCID field as suggested in the old comment. Further changes: ---------------- Adjusted the type of 'cur_mps' to match the unsigned return type of the function. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-05dccp: use roundup instead of opencodingIlpo Järvinen1-1/+1
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-16dccp: Mechanism to resolve CCID dependenciesGerrit Renker1-4/+9
This adds a hook to resolve features whose value depends on the choice of CCID. It is done at the server since it can only be done after the CCID values have been negotiated; i.e. the client will add its CCID preference list on the Change options sent in the Request, which will be reconciled with the local preference list of the server. The concept is documented on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/feature_negotiation/\ implementation_notes.html#ccid_dependencies Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-12dccp: Resolve dependencies of features on choice of CCIDGerrit Renker1-0/+4
This provides a missing link in the code chain, as several features implicitly depend and/or rely on the choice of CCID. Most notably, this is the Send Ack Vector feature, but also Ack Ratio and Send Loss Event Rate (also taken care of). For Send Ack Vector, the situation is as follows: * since CCID2 mandates the use of Ack Vectors, there is no point in allowing endpoints which use CCID2 to disable Ack Vector features such a connection; * a peer with a TX CCID of CCID2 will always expect Ack Vectors, and a peer with a RX CCID of CCID2 must always send Ack Vectors (RFC 4341, sec. 4); * for all other CCIDs, the use of (Send) Ack Vector is optional and thus negotiable. However, this implies that the code negotiating the use of Ack Vectors also supports it (i.e. is able to supply and to either parse or ignore received Ack Vectors). Since this is not the case (CCID-3 has no Ack Vector support), the use of Ack Vectors is here disabled, with a comment in the source code. An analogous consideration arises for the Send Loss Event Rate feature, since the CCID-3 implementation does not support the loss interval options of RFC 4342. To make such use explicit, corresponding feature-negotiation options are inserted which signal the use of the loss event rate option, as it is used by the CCID3 code. Lastly, the values of the Ack Ratio feature are matched to the choice of CCID. The patch implements this as a function which is called after the user has made all other registrations for changing default values of features. The table is variable-length, the reserved (and hence for feature-negotiation invalid, confirmed by considering section 19.4 of RFC 4340) feature number `0' is used to mark the end of the table. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-19dccp: Port redirection support for DCCPGerrit Renker1-1/+1
Commit a3116ac5c216fc3c145906a46df9ce542ff7dcf2 from 1st October ("tcp: Port redirection support for TCP") broke DCCP skb lookup by changing inet_csk_clone, which is used by DCCP to generate the child socket after the handshake. This patch updates DCCP to use 'loc_port' instead of 'sport', which fixes the problem, and thus inheriting port redirection support via the new interface. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-09This reverts "Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/dccp_exp"Gerrit Renker1-177/+102
as it accentally contained the wrong set of patches. These will be submitted separately. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-09-04dccp: Policy-based packet dequeueing infrastructureTomasz Grobelny1-4/+3
This patch adds a generic infrastructure for policy-based dequeueing of TX packets and provides two policies: * a simple FIFO policy (which is the default) and * a priority based policy (set via socket options). Both policies honour the tx_qlen sysctl for the maximum size of the write queue (can be overridden via socket options). The priority policy uses skb->priority internally to assign an u32 priority identifier, using the same ranking as SO_PRIORITY. The skb->priority field is set to 0 when the packet leaves DCCP. The priority is supplied as ancillary data using cmsg(3), the patch also provides the requisite parsing routines. Signed-off-by: Tomasz Grobelny <tomasz@grobelny.oswiecenia.net> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-09-04dccp: Combine the functionality of enqeueing and cloningGerrit Renker1-7/+7
Realising the following call pattern, * first dccp_entail() is called to enqueue a new skb and * then skb_clone() is called to transmit a clone of that skb, this patch integrates both interrelated steps into dccp_entail(). Note: the return value of skb_clone is not checked. It may be an idea to add a warning if this occurs. In both instances, however, a timer is set for retransmission, so that cloning is re-tried via dccp_retransmit_skb(). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-09-04dccp: Refine the wait-for-ccid mechanismGerrit Renker1-50/+65
This extends the existing wait-for-ccid routine so that it may be used with different types of CCID. It further addresses the problems listed below. The code looks if the write queue is non-empty and grants the TX CCID up to `timeout' jiffies to drain the queue. It will instead purge that queue if * the delay suggested by the CCID exceeds the time budget; * a socket error occurred while waiting for the CCID; * there is a signal pending (eg. annoyed user pressed Control-C); * the CCID does not support delays (we don't know how long it will take). D e t a i l s [can be removed] ------------------------------- DCCP's sending mechanism functions a bit like non-blocking I/O: dccp_sendmsg() will enqueue up to net.dccp.default.tx_qlen packets (default=5), without waiting for them to be released to the network. Rate-based CCIDs, such as CCID3/4, can impose sending delays of up to maximally 64 seconds (t_mbi in RFC 3448). Hence the write queue may still contain packets when the application closes. Since the write queue is congestion-controlled by the CCID, draining the queue is also under control of the CCID. There are several problems that needed to be addressed: 1) The queue-drain mechanism only works with rate-based CCIDs. If CCID2 for example has a full TX queue and becomes network-limited just as the application wants to close, then waiting for CCID2 to become unblocked could lead to an indefinite delay (i.e., application "hangs"). 2) Since each TX CCID in turn uses a feedback mechanism, there may be changes in its sending policy while the queue is being drained. This can lead to further delays during which the application will not be able to terminate. 3) The minimum wait time for CCID3/4 can be expected to be the queue length times the current inter-packet delay. For example if tx_qlen=100 and a delay of 15 ms is used for each packet, then the application would have to wait for a minimum of 1.5 seconds before being allowed to exit. 4) There is no way for the user/application to control this behaviour. It would be good to use the timeout argument of dccp_close() as an upper bound. Then the maximum time that an application is willing to wait for its CCIDs to can be set via the SO_LINGER option. These problems are addressed by giving the CCID a grace period of up to the `timeout' value. The wait-for-ccid function is, as before, used when the application (a) has read all the data in its receive buffer and (b) if SO_LINGER was set with a non-zero linger time, or (c) the socket is either in the OPEN (active close) or in the PASSIVE_CLOSEREQ state (client application closes after receiving CloseReq). In addition, there is a catch-all case by calling __skb_queue_purge() after waiting for the CCID. This is necessary since the write queue may still have data when (a) the host has been passively-closed, (b) abnormal termination (unread data, zero linger time), (c) wait-for-ccid could not finish within the given time limit. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-09-04dccp: Extend CCID packet dequeueing interfaceGerrit Renker1-48/+81
This extends the packet dequeuing interface of dccp_write_xmit() to allow 1. CCIDs to take care of timing when the next packet may be sent; 2. delayed sending (as before, with an inter-packet gap up to 65.535 seconds). The main purpose is to take CCID2 out of its polling mode (when it is network- limited, it tries every millisecond to send, without interruption). The interface can also be used to support other CCIDs. The mode of operation for (2) is as follows: * new packet is enqueued via dccp_sendmsg() => dccp_write_xmit(), * ccid_hc_tx_send_packet() detects that it may not send (e.g. window full), * it signals this condition via `CCID_PACKET_WILL_DEQUEUE_LATER', * dccp_write_xmit() returns without further action; * after some time the wait-condition for CCID becomes true, * that CCID schedules the tasklet, * tasklet function calls ccid_hc_tx_send_packet() via dccp_write_xmit(), * since the wait-condition is now true, ccid_hc_tx_packet() returns "send now", * packet is sent, and possibly more (since dccp_write_xmit() loops). Code reuse: the taskled function calls dccp_write_xmit(), the timer function reduces to a wrapper around the same code. If the tasklet finds that the socket is locked, it re-schedules the tasklet function (not the tasklet) after one jiffy. Changed DCCP_BUG to dccp_pr_debug when transmit_skb returns an error (e.g. when a local qdisc is used, NET_XMIT_DROP=1 can be returned for many packets). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-09-04dccp ccid-2: Schedule Sync as out-of-band mechanismGerrit Renker1-0/+8
The problem with Ack Vectors is that i) their length is variable and can in principle grow quite large, ii) it is hard to predict exactly how large they will be. Due to the second point it seems not a good idea to reduce the MPS; in particular when on average there is enough room for the Ack Vector and an increase in length is momentarily due to some burst loss, after which the Ack Vector returns to its normal/average length. The solution taken by this patch is to subtract a minimum-expected Ack Vector length from the MPS (previous patch), and to defer any larger Ack Vectors onto a separate Sync - but only if indeed there is no space left on the skb. This patch provides the infrastructure to schedule Sync-packets for transporting (urgent) out-of-band data. Its signalling is quicker than scheduling an Ack, since it does not need to wait for new application data. It can thus serve other parts of the DCCP code as well. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-09-04dccp: Merge now-reduced connect_init() functionGerrit Renker1-13/+5
After moving the assignment of GAR/ISS from dccp_connect_init() to dccp_transmit_skb(), the former function becomes very small, so that a merger with dccp_connect() suggests itself. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-09-04dccp: Unused argument in CCID tx functionGerrit Renker1-1/+1
This removes the argument `more' from ccid_hc_tx_packet_sent, since it was nowhere used in the entire code. (Anecdotally, this argument was not even used in the original KAME code where the function originally came from; compare the variable moreToSend in the freebsd61-dccp-kame-28.08.2006.patch now maintained by Emmanuel Lochin.) Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-09-04dccp: Special case of the MPS for client-PARTOPEN with DataAcksGerrit Renker1-1/+14
To increase robustness, it is necessary to resend Confirm feature-negotiation options, even though the RFC does not mandate it. But feature negotiation options can take (much) more room than the options on common DataAck packets. Instead of reducing the MPS always for a case which only applies to the three messages send during initial handshake, this patch devises a special case: if the payload length of the DataAck in PARTOPEN is too large, an Ack is sent to carry the options, and the feature-negotiation list is then flushed. This means that the server gets two Acks for one Response. If both Acks get lost, it is probably better to restart the connection anyway and devising yet another special-case does not seem worth the extra complexity. The patch (over-)estimates the expected overhead to be 32*4 bytes -- commonly seen values were 20-90 bytes for initial feature-negotiation options. It uses sizeof(u32) to mean "aligned units of 4 bytes". For consistency, another use of sizeof is modified. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-09-04dccp: Leave headroom for options when calculating the MPSGerrit Renker1-8/+14
The Maximum Packet Size (MPS) is of interest for applications which want to transfer data, so it is only relevant to the data transfer phase of a connection (unless one wants to send data on the DCCP-Request, but that is not considered here). The strategy chosen to deal with this requirement is to leave room for only such options that may appear on data packets. A special consideration applies to Ack Vectors: this is purely guesswork, since these can have any length between 3 and 1020 bytes. The strategy chosen here is to subtract a configurable minimum, the value of 16 bytes (2 bytes for type/length plus 14 Ack Vector cells) has been found by experimentatation. If people experience this as too much or too little, this could later be turned into a Kconfig option. There are currently no CCID-specific header options which may appear on data packets, hence it is not necessary to define a corresponding CCID field. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-09-04dccp: Mechanism to resolve CCID dependenciesGerrit Renker1-4/+9
This adds a hook to resolve features whose value depends on the choice of CCID. It is done at the server since it can only be done after the CCID values have been negotiated; i.e. the client will add its CCID preference list on the Change options sent in the Request, which will be reconciled with the local preference list of the server. The concept is documented on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/feature_negotiation/\ implementation_notes.html#ccid_dependencies Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-09-04dccp: Resolve dependencies of features on choice of CCIDGerrit Renker1-0/+4
This provides a missing link in the code chain, as several features implicitly depend and/or rely on the choice of CCID. Most notably, this is the Send Ack Vector feature, but also Ack Ratio and Send Loss Event Rate (also taken care of). For Send Ack Vector, the situation is as follows: * since CCID2 mandates the use of Ack Vectors, there is no point in allowing endpoints which use CCID2 to disable Ack Vector features such a connection; * a peer with a TX CCID of CCID2 will always expect Ack Vectors, and a peer with a RX CCID of CCID2 must always send Ack Vectors (RFC 4341, sec. 4); * for all other CCIDs, the use of (Send) Ack Vector is optional and thus negotiable. However, this implies that the code negotiating the use of Ack Vectors also supports it (i.e. is able to supply and to either parse or ignore received Ack Vectors). Since this is not the case (CCID-3 has no Ack Vector support), the use of Ack Vectors is here disabled, with a comment in the source code. An analogous consideration arises for the Send Loss Event Rate feature, since the CCID-3 implementation does not support the loss interval options of RFC 4342. To make such use explicit, corresponding feature-negotiation options are inserted which signal the use of the loss event rate option, as it is used by the CCID3 code. Lastly, the values of the Ack Ratio feature are matched to the choice of CCID. The patch implements this as a function which is called after the user has made all other registrations for changing default values of features. The table is variable-length, the reserved (and hence for feature-negotiation invalid, confirmed by considering section 19.4 of RFC 4340) feature number `0' is used to mark the end of the table. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-07-26dccp: Bug-Fix - AWL was never updatedGerrit Renker1-18/+15
The AWL lower Ack validity window advances in proportion to GSS, the greatest sequence number sent. Updating AWL other than at connection setup (in the DCCP-Request sent by dccp_v{4,6}_connect()) was missing in the DCCP code. This bug lead to syslog messages such as "kernel: dccp_check_seqno: DCCP: Step 6 failed for DATAACK packet, [...] P.ackno exists or LAWL(82947089) <= P.ackno(82948208) <= S.AWH(82948728), sending SYNC..." The difference between AWL/AWH here is 1639 packets, while the expected value (the Sequence Window) would have been 100 (the default). A closer look showed that LAWL = AWL = 82947089 equalled the ISS on the Response. The patch now updates AWL with each increase of GSS. Further changes: ---------------- The patch also enforces more stringent checks on the ISS sequence number: * AWL is initialised to ISS at connection setup and remains at this value; * AWH is then always set to GSS (via dccp_update_gss()); * so on the first Request: AWL = AWH = ISS, and on the n-th Request: AWL = ISS, AWH = ISS + n. As a consequence, only Response packets that refer to Requests sent by this host will pass, all others are discarded. This is the intention and in effect implements the initial adjustments for AWL as specified in RFC 4340, 7.5.1. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2008-07-26dccp: Allow to distinguish original and retransmitted packetsGerrit Renker1-4/+16
This patch allows the sender to distinguish original and retransmitted packets, which is in particular needed for the retransmission of DCCP-Requests: * the first Request uses ISS (generated in net/dccp/ip*.c), and sets GSS = ISS; * all retransmitted Requests use GSS' = GSS + 1, so that the n-th retransmitted Request has sequence number ISS + n (mod 48). To add generic support, the patch reorganises existing code so that: * icsk_retransmits == 0 for the original packet and * icsk_retransmits = n > 0 for the n-th retransmitted packet at the time dccp_transmit_skb() is called, via dccp_retransmit_skb(). Thanks to Wei Yongjun for pointing this problem out. Further changes: ---------------- * removed the `skb' argument from dccp_retransmit_skb(), since sk_send_head is used for all retransmissions (the exception is client-Acks in PARTOPEN state, but these do not use sk_send_head); * since sk_send_head always contains the original skb (via dccp_entail()), skb_cloned() never evaluated to true and thus pskb_copy() was never used. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-06-11dccp: Fix sparse warningsGerrit Renker1-0/+2
This patch fixes the following sparse warnings: * nested min(max()) expression: net/dccp/ccids/ccid3.c:91:21: warning: symbol '__x' shadows an earlier one net/dccp/ccids/ccid3.c:91:21: warning: symbol '__y' shadows an earlier one * Declaration of function prototypes in .c instead of .h file, resulting in "should it be static?" warnings. * Declared "struct dccpw" static (local to dccp_probe). * Disabled dccp_delayed_ack() - not fully removed due to RFC 4340, 11.3 ("Receivers SHOULD implement delayed acknowledgement timers ..."). * Used a different local variable name to avoid net/dccp/ackvec.c:293:13: warning: symbol 'state' shadows an earlier one net/dccp/ackvec.c:238:33: originally declared here * Removed unused functions `dccp_ackvector_print' and `dccp_ackvec_print'. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>