aboutsummaryrefslogtreecommitdiffstats
path: root/net/sctp/socket.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2017-01-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+5
Two trivial overlapping changes conflicts in MPLS and mlx5. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25sctp: sctp_addr_id2transport should verify the addr before looking up assocXin Long1-1/+5
sctp_addr_id2transport is a function for sockopt to look up assoc by address. As the address is from userspace, it can be a v4-mapped v6 address. But in sctp protocol stack, it always handles a v4-mapped v6 address as a v4 address. So it's necessary to convert it to a v4 address before looking up assoc by address. This patch is to fix it by calling sctp_verify_addr in which it can do this conversion before calling sctp_endpoint_lookup_assoc, just like what sctp_sendmsg and __sctp_connect do for the address from users. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24Introduce a sysctl that modifies the value of PROT_SOCK.Krister Johansen1-4/+6
Add net.ipv4.ip_unprivileged_port_start, which is a per namespace sysctl that denotes the first unprivileged inet port in the namespace. To disable all privileged ports set this to zero. It also checks for overlap with the local port range. The privileged and local range may not overlap. The use case for this change is to allow containerized processes to bind to priviliged ports, but prevent them from ever being allowed to modify their container's network configuration. The latter is accomplished by ensuring that the network namespace is not a child of the user namespace. This modification was needed to allow the container manager to disable a namespace's priviliged port restrictions without exposing control of the network namespace to processes in the user namespace. Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-18sctp: implement sender-side procedures for SSN Reset Request ParameterXin Long1-0/+29
This patch is to implement sender-side procedures for the Outgoing and Incoming SSN Reset Request Parameter described in rfc6525 section 5.1.2 and 5.1.3. It is also add sockopt SCTP_RESET_STREAMS in rfc6525 section 6.3.2 for users. Note that the new asoc member strreset_outstanding is to make sure only one reconf request chunk on the fly as rfc6525 section 5.1.1 demands. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-18sctp: add sockopt SCTP_ENABLE_STREAM_RESETXin Long1-0/+84
This patch is to add sockopt SCTP_ENABLE_STREAM_RESET to get/set strreset_enable to indicate which reconf request type it supports, which is described in rfc6525 section 6.3.1. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-16sctp: remove useless code from sctp_apply_peer_addr_paramsMarcelo Ricardo Leitner1-1/+0
sctp_frag_point() doesn't store anything, and thus just calling it cannot do anything useful. sctp_apply_peer_addr_params is only called by sctp_setsockopt_peer_addr_params. When operating on an asoc, sctp_setsockopt_peer_addr_params will call sctp_apply_peer_addr_params once for the asoc, and then once for each transport this asoc has, meaning that the frag_point will be recomputed when updating the transports and calling it when updating the asoc is not necessary. IOW, no action is needed here and we can remove this call. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17sctp: sctp_transport_lookup_process should rcu_read_unlock when transport is nullXin Long1-4/+3
Prior to this patch, sctp_transport_lookup_process didn't rcu_read_unlock when it failed to find a transport by sctp_addrs_lookup_transport. This patch is to fix it by moving up rcu_read_unlock right before checking transport and also to remove the out path. Fixes: 1cceda784980 ("sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16sctp: use new rhlist interface on sctp transport rhashtableXin Long1-5/+2
Now sctp transport rhashtable uses hash(lport, dport, daddr) as the key to hash a node to one chain. If in one host thousands of assocs connect to one server with the same lport and different laddrs (although it's not a normal case), all the transports would be hashed into the same chain. It may cause to keep returning -EBUSY when inserting a new node, as the chain is too long and sctp inserts a transport node in a loop, which could even lead to system hangs there. The new rhlist interface works for this case that there are many nodes with the same key in one chain. It puts them into a list then makes this list be as a node of the chain. This patch is to replace rhashtable_ interface with rhltable_ interface. Since a chain would not be too long and it would not return -EBUSY with this fix when inserting a node, the reinsert loop is also removed here. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-14sctp: change sk state only when it has assocs in sctp_shutdownXin Long1-8/+7
Now when users shutdown a sock with SEND_SHUTDOWN in sctp, even if this sock has no connection (assoc), sk state would be changed to SCTP_SS_CLOSING, which is not as we expect. Besides, after that if users try to listen on this sock, kernel could even panic when it dereference sctp_sk(sk)->bind_hash in sctp_inet_listen, as bind_hash is null when sock has no assoc. This patch is to move sk state change after checking sk assocs is not empty, and also merge these two if() conditions and reduce indent level. Fixes: d46e416c11c8 ("sctp: sctp should change socket state when shutdown is received") Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-07sctp: assign assoc_id earlier in __sctp_connectMarcelo Ricardo Leitner1-2/+5
sctp_wait_for_connect() currently already holds the asoc to keep it alive during the sleep, in case another thread release it. But Andrey Konovalov and Dmitry Vyukov reported an use-after-free in such situation. Problem is that __sctp_connect() doesn't get a ref on the asoc and will do a read on the asoc after calling sctp_wait_for_connect(), but by then another thread may have closed it and the _put on sctp_wait_for_connect will actually release it, causing the use-after-free. Fix is, instead of doing the read after waiting for the connect, do it before so, and avoid this issue as the socket is still locked by then. There should be no issue on returning the asoc id in case of failure as the application shouldn't trust on that number in such situations anyway. This issue doesn't exist in sctp_sendmsg() path. Reported-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-31sctp: hold transport instead of assoc in sctp_diagXin Long1-4/+1
In sctp_transport_lookup_process(), Commit 1cceda784980 ("sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock") moved cb() out of rcu lock, but it put transport and hold assoc instead, and ignore that cb() still uses transport. It may cause a use-after-free issue. This patch is to hold transport instead of assoc there. Fixes: 1cceda784980 ("sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-23net: sctp, forbid negative lengthJiri Slaby1-1/+4
Most of getsockopt handlers in net/sctp/socket.c check len against sizeof some structure like: if (len < sizeof(int)) return -EINVAL; On the first look, the check seems to be correct. But since len is int and sizeof returns size_t, int gets promoted to unsigned size_t too. So the test returns false for negative lengths. Yes, (-1 < sizeof(long)) is false. Fix this in sctp by explicitly checking len < 0 before any getsockopt handler is called. Note that sctp_getsockopt_events already handled the negative case. Since we added the < 0 check elsewhere, this one can be removed. If not checked, this is the result: UBSAN: Undefined behaviour in ../mm/page_alloc.c:2722:19 shift exponent 52 is too large for 32-bit type 'int' CPU: 1 PID: 24535 Comm: syz-executor Not tainted 4.8.1-0-syzkaller #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 0000000000000000 ffff88006d99f2a8 ffffffffb2f7bdea 0000000041b58ab3 ffffffffb4363c14 ffffffffb2f7bcde ffff88006d99f2d0 ffff88006d99f270 0000000000000000 0000000000000000 0000000000000034 ffffffffb5096422 Call Trace: [<ffffffffb3051498>] ? __ubsan_handle_shift_out_of_bounds+0x29c/0x300 ... [<ffffffffb273f0e4>] ? kmalloc_order+0x24/0x90 [<ffffffffb27416a4>] ? kmalloc_order_trace+0x24/0x220 [<ffffffffb2819a30>] ? __kmalloc+0x330/0x540 [<ffffffffc18c25f4>] ? sctp_getsockopt_local_addrs+0x174/0xca0 [sctp] [<ffffffffc18d2bcd>] ? sctp_getsockopt+0x10d/0x1b0 [sctp] [<ffffffffb37c1219>] ? sock_common_getsockopt+0xb9/0x150 [<ffffffffb37be2f5>] ? SyS_getsockopt+0x1a5/0x270 Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-sctp@vger.kernel.org Cc: netdev@vger.kernel.org Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-3/+7
Three sets of overlapping changes. Nothing serious. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lockXin Long1-3/+7
When sctp dumps all the ep->assocs, it needs to lock_sock first, but now it locks sock in rcu_read_lock, and lock_sock may sleep, which would break rcu_read_lock. This patch is to get and hold one sock when traversing the list. After that and get out of rcu_read_lock, lock and dump it. Then it will traverse the list again to get the next one until all sctp socks are dumped. For sctp_diag_dump_one, it fixes this issue by holding asoc and moving cb() out of rcu_read_lock in sctp_transport_lookup_process. Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-18sctp: free msg->chunks when sctp_primitive_SEND return errXin Long1-2/+6
Last patch "sctp: do not return the transmit err back to sctp_sendmsg" made sctp_primitive_SEND return err only when asoc state is unavailable. In this case, chunks are not enqueued, they have no chance to be freed if we don't take care of them later. This Patch is actually to revert commit 1cd4d5c4326a ("sctp: remove the unused sctp_datamsg_free()"), commit 69b5777f2e57 ("sctp: hold the chunks only after the chunk is enqueued in outq") and commit 8b570dc9f7b6 ("sctp: only drop the reference on the datamsg after sending a msg"), to use sctp_datamsg_free to free the chunks of current msg. Fixes: 8b570dc9f7b6 ("sctp: only drop the reference on the datamsg after sending a msg") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-30sctp: allow receiving msg when TCP-style sk is in CLOSED stateXin Long1-1/+1
Commit 141ddefce7c8 ("sctp: change sk state to CLOSED instead of CLOSING in sctp_sock_migrate") changed sk state to CLOSED if the assoc is closed when sctp_accept clones a new sk. If there is still data in sk receive queue, users will not be able to read it any more, as sctp_recvmsg returns directly if sk state is CLOSED. This patch is to add CLOSED state check in sctp_recvmsg to allow reading data from TCP-style sk with CLOSED state as what TCP does. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25net/sctp: terminate rhashtable walk correctlyVegard Nossum1-0/+1
I was seeing a lot of these: BUG: sleeping function called from invalid context at mm/slab.h:388 in_atomic(): 0, irqs_disabled(): 0, pid: 14971, name: trinity-c2 Preemption disabled at:[<ffffffff819bcd46>] rhashtable_walk_start+0x46/0x150 [<ffffffff81149abb>] preempt_count_add+0x1fb/0x280 [<ffffffff83295722>] _raw_spin_lock+0x12/0x40 [<ffffffff811aac87>] console_unlock+0x2f7/0x930 [<ffffffff811ab5bb>] vprintk_emit+0x2fb/0x520 [<ffffffff811aba6a>] vprintk_default+0x1a/0x20 [<ffffffff812c171a>] printk+0x94/0xb0 [<ffffffff811d6ed0>] print_stack_trace+0xe0/0x170 [<ffffffff8115835e>] ___might_sleep+0x3be/0x460 [<ffffffff81158490>] __might_sleep+0x90/0x1a0 [<ffffffff8139b823>] kmem_cache_alloc+0x153/0x1e0 [<ffffffff819bca1e>] rhashtable_walk_init+0xfe/0x2d0 [<ffffffff82ec64de>] sctp_transport_walk_start+0x1e/0x60 [<ffffffff82edd8ad>] sctp_transport_seq_start+0x4d/0x150 [<ffffffff8143a82b>] seq_read+0x27b/0x1180 [<ffffffff814f97fc>] proc_reg_read+0xbc/0x180 [<ffffffff813d471b>] __vfs_read+0xdb/0x610 [<ffffffff813d4d3a>] vfs_read+0xea/0x2d0 [<ffffffff813d615b>] SyS_pread64+0x11b/0x150 [<ffffffff8100334c>] do_syscall_64+0x19c/0x410 [<ffffffff832960a5>] return_from_SYSCALL_64+0x0/0x6a [<ffffffffffffffff>] 0xffffffffffffffff Apparently we always need to call rhashtable_walk_stop(), even when rhashtable_walk_start() fails: * rhashtable_walk_start - Start a hash table walk * @iter: Hash table iterator * * Start a hash table walk. Note that we take the RCU lock in all * cases including when we return an error. So you must always call * rhashtable_walk_stop to clean up. otherwise we never call rcu_read_unlock() and we get the splat above. Fixes: 53fa1036 ("sctp: fix some rhashtable functions using in sctp proc/diag") See-also: 53fa1036 ("sctp: fix some rhashtable functions using in sctp proc/diag") See-also: f2dba9c6 ("rhashtable: Introduce rhashtable_walk_*") Cc: Xin Long <lucien.xin@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: stable@vger.kernel.org Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-16sctp: recvmsg should be able to run even if sock is in closing stateMarcelo Ricardo Leitner1-15/+17
Commit d46e416c11c8 missed to update some other places which checked for the socket being TCP-style AND Established state, as Closing state has some overlapping with the previous understanding of Established. Without this fix, one of the effects is that some already queued rx messages may not be readable anymore depending on how the association teared down, and sending may also not be possible if peer initiated the shutdown. Also merge two if() blocks into one condition on sctp_sendmsg(). Cc: Xin Long <lucien.xin@gmail.com> Fixes: d46e416c11c8 ("sctp: sctp should change socket state when shutdown is received") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13sctp: allow GSO frags to access the chunk tooMarcelo Ricardo Leitner1-3/+7
SCTP will try to access original IP headers on sctp_recvmsg in order to copy the addresses used. There are also other places that do similar access to IP or even SCTP headers. But after 90017accff61 ("sctp: Add GSO support") they aren't always there because they are only present in the header skb. SCTP handles the queueing of incoming data by cloning the incoming skb and limiting to only the relevant payload. This clone has its cb updated to something different and it's then queued on socket rx queue. Thus we need to fix this in two moments. For rx path, not related to socket queue yet, this patch uses a partially copied sctp_input_cb to such GSO frags. This restores the ability to access the headers for this part of the code. Regarding the socket rx queue, it removes iif member from sctp_event and also add a chunk pointer on it. With these changes we're always able to reach the headers again. The biggest change here is that now the sctp_chunk struct and the original skb are only freed after the application consumed the buffer. Note however that the original payload was already like this due to the skb cloning. For iif, SCTP's IPv4 code doesn't use it, so no change is necessary. IPv6 now can fetch it directly from original's IPv6 CB as the original skb is still accessible. In the future we probably can simplify sctp_v*_skb_iif() stuff, as sctp_v4_skb_iif() was called but it's return value not used, and now it's not even called, but such cleanup is out of scope for this change. Fixes: 90017accff61 ("sctp: Add GSO support") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11sctp: implement prsctp PRIO policyXin Long1-0/+3
prsctp PRIO policy is a policy to abandon lower priority chunks when asoc doesn't have enough snd buffer, so that the current chunk with higher priority can be queued successfully. Similar to TTL/RTX policy, we will set the priority of the chunk to prsctp_param with sinfo->sinfo_timetolive in sctp_set_prsctp_policy(). So if PRIO policy is enabled, msg->expire_at won't work. asoc->sent_cnt_removable will record how many chunks can be checked to remove. If priority policy is enabled, when the chunk is queued into the out_queue, we will increase sent_cnt_removable. When the chunk is moved to abandon_queue or dequeue and free, we will decrease sent_cnt_removable. In sctp_sendmsg, we will check if there is enough snd buffer for current msg and if sent_cnt_removable is not 0. Then try to abandon chunks in sctp_prune_prsctp when sendmsg from the retransmit/transmited queue, and free chunks from out_queue in right order until the abandon+free size > msg_len - sctp_wfree. For the abandon size, we have to wait until it sends FORWARD TSN, receives the sack and the chunks are really freed. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11sctp: implement prsctp TTL policyXin Long1-2/+2
prsctp TTL policy is a policy to abandon chunks when they expire at the specific time in local stack. It's similar with expires_at in struct sctp_datamsg. This patch uses sinfo->sinfo_timetolive to set the specific time for TTL policy. sinfo->sinfo_timetolive is also used for msg->expires_at. So if prsctp_enable or TTL policy is not enabled, msg->expires_at still works as before. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11sctp: add SCTP_PR_ASSOC_STATUS on sctp sockoptXin Long1-0/+62
This patch adds SCTP_PR_ASSOC_STATUS to sctp sockopt, which is used to dump the prsctp statistics info from the asoc. The prsctp statistics includes abandoned_sent/unsent from the asoc. abandoned_sent is the count of the packets we drop packets from retransmit/transmited queue, and abandoned_unsent is the count of the packets we drop from out_queue according to the policy. Note: another option for prsctp statistics dump described in rfc is SCTP_PR_STREAM_STATUS, which is used to dump the prsctp statistics info from each stream. But by now, linux doesn't yet have per stream statistics info, it needs rfc6525 to be implemented. As the prsctp statistics for each stream has to be based on per stream statistics, we will delay it until rfc6525 is done in linux. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11sctp: add SCTP_DEFAULT_PRINFO into sctp sockoptXin Long1-0/+91
This patch adds SCTP_DEFAULT_PRINFO to sctp sockopt. It is used to set/get sctp Partially Reliable Policies' default params, which includes 3 policies (ttl, rtx, prio) and their values. Still, if we set policy params in sndinfo, we will use the params of sndinfo against chunks, instead of the default params. In this patch, we will use 5-8bit of sp/asoc->default_flags to store prsctp policies, and reuse asoc->default_timetolive to store their values. It means if we enable and set prsctp policy, prior ttl timeout in sctp will not work any more. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11sctp: add SCTP_PR_SUPPORTED on sctp sockoptXin Long1-0/+80
According to section 4.5 of rfc7496, prsctp_enable should be per asoc. We will add prsctp_enable to both asoc and ep, and replace the places where it used net.sctp->prsctp_enable with asoc->prsctp_enable. ep->prsctp_enable will be initialized with net.sctp->prsctp_enable, and asoc->prsctp_enable will be initialized with ep->prsctp_enable. We can also modify it's value through sockopt SCTP_PR_SUPPORTED. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16sctp: change sk state to CLOSED instead of CLOSING in sctp_sock_migrateXin Long1-1/+1
Commit d46e416c11c8 ("sctp: sctp should change socket state when shutdown is received") may set sk_state CLOSING in sctp_sock_migrate, but inet_accept doesn't allow the sk_state other than ESTABLISHED/ CLOSED for sctp. So we will change sk_state to CLOSED, instead of CLOSING, as actually sk is closed already there. Fixes: d46e416c11c8 ("sctp: sctp should change socket state when shutdown is received") Reported-by: Ye Xiaolong <xiaolong.ye@intel.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10sctp: sctp should change socket state when shutdown is receivedXin Long1-2/+6
Now sctp doesn't change socket state upon shutdown reception. It changes just the assoc state, even though it's a TCP-style socket. For some cases, if we really need to check sk->sk_state, it's necessary to fix this issue, at least when we use ss or netstat to dump, we can get a more exact information. As an improvement, we will change sk->sk_state when we change asoc->state to SHUTDOWN_RECEIVED, and also do it in sctp_shutdown to keep consistent with sctp_close. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03sctp: Add GSO supportMarcelo Ricardo Leitner1-0/+2
SCTP has this pecualiarity that its packets cannot be just segmented to (P)MTU. Its chunks must be contained in IP segments, padding respected. So we can't just generate a big skb, set gso_size to the fragmentation point and deliver it to IP layer. This patch takes a different approach. SCTP will now build a skb as it would be if it was received using GRO. That is, there will be a cover skb with protocol headers and children ones containing the actual segments, already segmented to a way that respects SCTP RFCs. With that, we can tell skb_segment() to just split based on frag_list, trusting its sizes are already in accordance. This way SCTP can benefit from GSO and instead of passing several packets through the stack, it can pass a single large packet. v2: - Added support for receiving GSO frames, as requested by Dave Miller. - Clear skb->cb if packet is GSO (otherwise it's not used by SCTP) - Added heuristics similar to what we have in TCP for not generating single GSO packets that fills cwnd. v3: - consider sctphdr size in skb_gso_transport_seglen() - rebased due to 5c7cdf339af5 ("gso: Remove arbitrary checks for unsupported GSO") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-31sctp: sctp_diag should dump sctp socket typeXin Long1-0/+1
Now we cannot distinguish that one sk is a udp or sctp style when we use ss to dump sctp_info. it's necessary to dump it as well. For sctp_diag, ss support is not officially available, thus there are no official users of this yet, so we can add this field in the middle of sctp_info without breaking user API. v1->v2: - move 'sctpi_s_type' field to the end of struct sctp_info, so that it won't cause incompatibility with applications already built. - add __reserved3 in sctp_info to make sure sctp_info is 8-byte alignment. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-15sctp: fix some rhashtable functions using in sctp proc/diagXin Long1-5/+10
When rhashtable_walk_init return err, no release function should be called, and when rhashtable_walk_start return err, we should only invoke rhashtable_walk_exit to release the source. But now when sctp_transport_walk_start return err, we just call rhashtable_walk_stop/exit, and never care about if rhashtable_walk_init or start return err, which is so bad. We will fix it by calling rhashtable_walk_exit if rhashtable_walk_start return err in sctp_transport_walk_start, and if sctp_transport_walk_start return err, we do not need to call sctp_transport_walk_stop any more. For sctp proc, we will use 'iter->start_fail' to decide if we will call rhashtable_walk_stop/exit. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-15sctp: export some apis or variables for sctp_diag and reuse some for procXin Long1-0/+125
For some main variables in sctp.ko, we couldn't export it to other modules, so we have to define some api to access them. It will include sctp transport and endpoint's traversal. There are some transport traversal functions for sctp_diag, we can also use it for sctp_proc. cause they have the similar situation to traversal transport. v2->v3: - rhashtable_walk_init need the parameter gfp, because of recent upstrem update Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-15sctp: add sctp_info dump api for sctp_diagXin Long1-0/+86
sctp_diag will dump some important details of sctp's assoc or ep, we use sctp_info to describe them, sctp_get_sctp_info to get them, and export it to sctp_diag.ko. v2->v3: - we will not use list_for_each_safe in sctp_get_sctp_info, cause all the callers of it will use lock_sock. - fix the holes in struct sctp_info with __reserved* field. because sctp_diag is a new feature, and sctp_info is just for now, it may be changed in the future. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-15sctp: simplify sk_receive_queue lockingMarcelo Ricardo Leitner1-3/+1
SCTP already serializes access to rcvbuf through its sock lock: sctp_recvmsg takes it right in the start and release at the end, while rx path will also take the lock before doing any socket processing. On sctp_rcv() it will check if there is an user using the socket and, if there is, it will queue incoming packets to the backlog. The backlog processing will do the same. Even timers will do such check and re-schedule if an user is using the socket. Simplifying this will allow us to remove sctp_skb_list_tail and get ride of some expensive lockings. The lists that it is used on are also mangled with functions like __skb_queue_tail and __skb_unlink in the same context, like on sctp_ulpq_tail_event() and sctp_clear_pd(). sctp_close() will also purge those while using only the sock lock. Therefore the lockings performed by sctp_skb_list_tail() are not necessary. This patch removes this function and replaces its calls with just skb_queue_splice_tail_init() instead. The biggest gain is at sctp_ulpq_tail_event(), because the events always contain a list, even if it's queueing a single skb and this was triggering expensive calls to spin_lock_irqsave/_irqrestore for every data chunk received. As SCTP will deliver each data chunk on a corresponding recvmsg, the more effective the change will be. Before this patch, with chunks with 30 bytes: netperf -t SCTP_STREAM -H 192.168.1.2 -cC -l 60 -- -m 30 -S 400000 400000 -s 400000 400000 on a 10Gbit link with 1500 MTU: SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 425984 425984 30 60.00 137.45 7.34 7.36 52.504 52.608 With it: SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 425984 425984 30 60.00 179.10 7.97 6.70 43.740 36.788 Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14sctp: add support for RPS and RFSMarcelo Ricardo Leitner1-0/+3
This patch adds what's missing to properly support RPS and RFS on SCTP, as some of it is already implemented in common calls. Having support for RPS and RFS allows better scaling specially because not all NICs support hashing SCTP headers. Save the hash right when we dequeue a skb from inqueue so we do it only once per skb instead of per chunk. New sockets will then inherit the hash through sctp_copy_sock(). Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-1/+3
Pull networking bugfixes from David Miller: "Several bug fixes rolling in, some for changes introduced in this merge window, and some for problems that have existed for some time: 1) Fix prepare_to_wait() handling in AF_VSOCK, from Claudio Imbrenda. 2) The new DST_CACHE should be a silent config option, from Dave Jones. 3) inet_current_timestamp() unintentionally truncates timestamps to 16-bit, from Deepa Dinamani. 4) Missing reference to netns in ppp, from Guillaume Nault. 5) Free memory reference in hv_netvsc driver, from Haiyang Zhang. 6) Missing kernel doc documentation for function arguments in various spots around the networking, from Luis de Bethencourt. 7) UDP stopped receiving broadcast packets properly, due to overzealous multicast checks, fix from Paolo Abeni" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits) net: ping: make ping_v6_sendmsg static hv_netvsc: Fix the order of num_sc_offered decrement net: Fix typos and whitespace. hv_netvsc: Fix the array sizes to be max supported channels hv_netvsc: Fix accessing freed memory in netvsc_change_mtu() ppp: take reference on channels netns net: Reset encap_level to avoid resetting features on inner IP headers net: mediatek: fix checking for NULL instead of IS_ERR() in .probe net: phy: at803x: Request 'reset' GPIO only for AT8030 PHY at803x: fix reset handling AF_VSOCK: Shrink the area influenced by prepare_to_wait Revert "vsock: Fix blocking ops call in prepare_to_wait" macb: fix PHY reset ipv4: initialize flowi4_flags before calling fib_lookup() fsl/fman: Workaround for Errata A-007273 ipv4: fix broadcast packets reception net: hns: bug fix about the overflow of mss net: hns: adds limitation for debug port mtu net: hns: fix the bug about mtu setting net: hns: fixes a bug of RSS ...
2016-03-22net/sctp: use in_compat_syscall for sctp_getsockopt_connectx3Andy Lutomirski1-1/+1
SCTP unfortunately has a different ABI for SCTP_SOCKOPT_CONNECTX3 for 32-bit and 64-bit callers. Use in_compat_syscall to correctly distinguish them on all architectures. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linuxDavid S. Miller1-9/+8
2016-03-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1-5/+4
Pull networking updates from David Miller: "Highlights: 1) Support more Realtek wireless chips, from Jes Sorenson. 2) New BPF types for per-cpu hash and arrap maps, from Alexei Starovoitov. 3) Make several TCP sysctls per-namespace, from Nikolay Borisov. 4) Allow the use of SO_REUSEPORT in order to do per-thread processing of incoming TCP/UDP connections. The muxing can be done using a BPF program which hashes the incoming packet. From Craig Gallek. 5) Add a multiplexer for TCP streams, to provide a messaged based interface. BPF programs can be used to determine the message boundaries. From Tom Herbert. 6) Add 802.1AE MACSEC support, from Sabrina Dubroca. 7) Avoid factorial complexity when taking down an inetdev interface with lots of configured addresses. We were doing things like traversing the entire address less for each address removed, and flushing the entire netfilter conntrack table for every address as well. 8) Add and use SKB bulk free infrastructure, from Jesper Brouer. 9) Allow offloading u32 classifiers to hardware, and implement for ixgbe, from John Fastabend. 10) Allow configuring IRQ coalescing parameters on a per-queue basis, from Kan Liang. 11) Extend ethtool so that larger link mode masks can be supported. From David Decotigny. 12) Introduce devlink, which can be used to configure port link types (ethernet vs Infiniband, etc.), port splitting, and switch device level attributes as a whole. From Jiri Pirko. 13) Hardware offload support for flower classifiers, from Amir Vadai. 14) Add "Local Checksum Offload". Basically, for a tunneled packet the checksum of the outer header is 'constant' (because with the checksum field filled into the inner protocol header, the payload of the outer frame checksums to 'zero'), and we can take advantage of that in various ways. From Edward Cree" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits) bonding: fix bond_get_stats() net: bcmgenet: fix dma api length mismatch net/mlx4_core: Fix backward compatibility on VFs phy: mdio-thunder: Fix some Kconfig typos lan78xx: add ndo_get_stats64 lan78xx: handle statistics counter rollover RDS: TCP: Remove unused constant RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket net: smc911x: convert pxa dma to dmaengine team: remove duplicate set of flag IFF_MULTICAST bonding: remove duplicate set of flag IFF_MULTICAST net: fix a comment typo ethernet: micrel: fix some error codes ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it bpf, dst: add and use dst_tclassid helper bpf: make skb->tc_classid also readable net: mvneta: bm: clarify dependencies cls_bpf: reset class and reuse major in da ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c ldmvsw: Add ldmvsw.c driver code ...
2016-03-17Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds1-4/+4
Pull crypto update from Herbert Xu: "Here is the crypto update for 4.6: API: - Convert remaining crypto_hash users to shash or ahash, also convert blkcipher/ablkcipher users to skcipher. - Remove crypto_hash interface. - Remove crypto_pcomp interface. - Add crypto engine for async cipher drivers. - Add akcipher documentation. - Add skcipher documentation. Algorithms: - Rename crypto/crc32 to avoid name clash with lib/crc32. - Fix bug in keywrap where we zero the wrong pointer. Drivers: - Support T5/M5, T7/M7 SPARC CPUs in n2 hwrng driver. - Add PIC32 hwrng driver. - Support BCM6368 in bcm63xx hwrng driver. - Pack structs for 32-bit compat users in qat. - Use crypto engine in omap-aes. - Add support for sama5d2x SoCs in atmel-sha. - Make atmel-sha available again. - Make sahara hashing available again. - Make ccp hashing available again. - Make sha1-mb available again. - Add support for multiple devices in ccp. - Improve DMA performance in caam. - Add hashing support to rockchip" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits) crypto: qat - remove redundant arbiter configuration crypto: ux500 - fix checks of error code returned by devm_ioremap_resource() crypto: atmel - fix checks of error code returned by devm_ioremap_resource() crypto: qat - Change the definition of icp_qat_uof_regtype hwrng: exynos - use __maybe_unused to hide pm functions crypto: ccp - Add abstraction for device-specific calls crypto: ccp - CCP versioning support crypto: ccp - Support for multiple CCPs crypto: ccp - Remove check for x86 family and model crypto: ccp - memset request context to zero during import lib/mpi: use "static inline" instead of "extern inline" lib/mpi: avoid assembler warning hwrng: bcm63xx - fix non device tree compatibility crypto: testmgr - allow rfc3686 aes-ctr variants in fips mode. crypto: qat - The AE id should be less than the maximal AE number lib/mpi: Endianness fix crypto: rockchip - add hash support for crypto engine in rk3288 crypto: xts - fix compile errors crypto: doc - add skcipher API documentation crypto: doc - update AEAD AD handling ...
2016-03-16sctp: consolidate local_bh_disable/enable + spin_lock/unlock to _bh variantNicholas Mc Guire1-4/+2
local_bh_disable() + spin_lock() is equivalent to spin_lock_bh(), same for the unlock/enable case, so replace the calls by the appropriate wrappers. Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08sctp: fix copying more bytes than expected in sctp_add_bind_addrMarcelo Ricardo Leitner1-1/+3
Dmitry reported that sctp_add_bind_addr may read more bytes than expected in case the parameter is a IPv4 addr supplied by the user through calls such as sctp_bindx_add(), because it always copies sizeof(union sctp_addr) while the buffer may be just a struct sockaddr_in, which is smaller. This patch then fixes it by limiting the memcpy to the min between the union size and a (new parameter) provided addr size. Where possible this parameter still is the size of that union, except for reading from user-provided buffers, which then it accounts for protocol type. Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-2/+7
Conflicts: drivers/net/phy/bcm7xxx.c drivers/net/phy/marvell.c drivers/net/vxlan.c All three conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11sock: struct proto hash function may errorCraig Gallek1-1/+2
In order to support fast reuseport lookups in TCP, the hash function defined in struct proto must be capable of returning an error code. This patch changes the function signature of all related hash functions to return an integer and handles or propagates this return value at all call sites. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-09sctp: translate network order to host order when users get a hmacidXin Long1-2/+7
Commit ed5a377d87dc ("sctp: translate host order to network order when setting a hmacid") corrected the hmacid byte-order when setting a hmacid. but the same issue also exists on getting a hmacid. We fix it by changing hmacids to host order when users get them with getsockopt. Fixes: Commit ed5a377d87dc ("sctp: translate host order to network order when setting a hmacid") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-27sctp: Use shashHerbert Xu1-4/+4
This patch replaces uses of the long obsolete hash interface with shash. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: David S. Miller <davem@davemloft.net>
2016-01-24sctp: allow setting SCTP_SACK_IMMEDIATELY by the applicationMarcelo Ricardo Leitner1-0/+2
This patch extends commit b93d6471748d ("sctp: implement the sender side for SACK-IMMEDIATELY extension") as it didn't white list SCTP_SACK_IMMEDIATELY on sctp_msghdr_parse(), causing it to be understood as an invalid flag and returning -EINVAL to the application. Note that the actual handling of the flag is already there in sctp_datamsg_from_user(). https://tools.ietf.org/html/rfc7053#section-7 Fixes: b93d6471748d ("sctp: implement the sender side for SACK-IMMEDIATELY extension") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05sctp: drop the old assoc hashtable of sctpXin Long1-5/+1
transport hashtable will replace the association hashtable, so association hashtable is not used in sctp any more, so drop the codes about that. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-5/+9
2015-12-30sctp: sctp should release assoc when sctp_make_abort_user return NULL in sctp_closeXin Long1-2/+1
In sctp_close, sctp_make_abort_user may return NULL because of memory allocation failure. If this happens, it will bypass any state change and never free the assoc. The assoc has no chance to be freed and it will be kept in memory with the state it had even after the socket is closed by sctp_close(). So if sctp_make_abort_user fails to allocate memory, we should abort the asoc via sctp_primitive_ABORT as well. Just like the annotation in sctp_sf_cookie_wait_prm_abort and sctp_sf_do_9_1_prm_abort said, "Even if we can't send the ABORT due to low memory delete the TCB. This is a departure from our typical NOMEM handling". But then the chunk is NULL (low memory) and the SCTP_CMD_REPLY cmd would dereference the chunk pointer, and system crash. So we should add SCTP_CMD_REPLY cmd only when the chunk is not NULL, just like other places where it adds SCTP_CMD_REPLY cmd. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-28sctp: label accepted/peeled off socketsMarcelo Ricardo Leitner1-0/+2
Accepted or peeled off sockets were missing a security label (e.g. SELinux) which means that socket was in "unlabeled" state. This patch clones the sock's label from the parent sock and resolves the issue (similar to AF_BLUETOOTH protocol family). Cc: Paul Moore <pmoore@redhat.com> Cc: David Teigland <teigland@redhat.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-28sctp: use GFP_USER for user-controlled kmallocMarcelo Ricardo Leitner1-3/+6
Commit cacc06215271 ("sctp: use GFP_USER for user-controlled kmalloc") missed two other spots. For connectx, as it's more likely to be used by kernel users of the API, it detects if GFP_USER should be used or not. Fixes: cacc06215271 ("sctp: use GFP_USER for user-controlled kmalloc") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>