aboutsummaryrefslogtreecommitdiffstats
path: root/net/tls/tls_sw.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-03-03tls: Fix tls_device receiveBoris Pismenny1-11/+14
Currently, the receive function fails to handle records already decrypted by the device due to the commit mentioned below. This commit advances the TLS record sequence number and prepares the context to handle the next record. Fixes: fedf201e1296 ("net: tls: Refactor control message handling on recv") Signed-off-by: Boris Pismenny <borisp@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03tls: Fix mixing between async capable and asyncEran Ben Elisha1-6/+9
Today, tls_sw_recvmsg is capable of using asynchronous mode to handle application data TLS records. Moreover, it assumes that if the cipher can be handled asynchronously, then all packets will be processed asynchronously. However, this assumption is not always true. Specifically, for AES-GCM in TLS1.2, it causes data corruption, and breaks user applications. This patch fixes this problem by separating the async capability from the decryption operation result. Fixes: c0ab4732d4c6 ("net/tls: Do not use async crypto for non-data records") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03tls: Fix write space handlingBoris Pismenny1-0/+13
TLS device cannot use the sw context. This patch returns the original tls device write space handler and moves the sw/device specific portions to the relevant files. Also, we remove the write_space call for the tls_sw flow, because it handles partial records in its delayed tx work handler. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Signed-off-by: Boris Pismenny <borisp@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24tls: Return type of non-data records retrieved using MSG_PEEK in recvmsgVakul Garg1-11/+67
The patch enables returning 'type' in msghdr for records that are retrieved with MSG_PEEK in recvmsg. Further it prevents records peeked from socket from getting clubbed with any other record of different type when records are subsequently dequeued from strparser. For each record, we now retain its type in sk_buff's control buffer cb[]. Inside control buffer, record's full length and offset are already stored by strparser in 'struct strp_msg'. We store record type after 'struct strp_msg' inside 'struct tls_msg'. For tls1.2, the type is stored just after record dequeue. For tls1.3, the type is stored after record has been decrypted. Inside process_rx_list(), before processing a non-data record, we check that we must be able to return back the record type to the user application. If not, the decrypted records in tls context's rx_list is left there without consuming any data. Fixes: 692d7b5d1f912 ("tls: Fix recvmsg() to be able to peek across multiple records") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-19net/tls: Move protocol constants from cipher context to tls contextVakul Garg1-81/+91
Each tls context maintains two cipher contexts (one each for tx and rx directions). For each tls session, the constants such as protocol version, ciphersuite, iv size, associated data size etc are same for both the directions and need to be stored only once per tls context. Hence these are moved from 'struct cipher_context' to 'struct tls_prot_info' and stored only once in 'struct tls_context'. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net/tls: Do not use async crypto for non-data recordsVakul Garg1-6/+12
Addition of tls1.3 support broke tls1.2 handshake when async crypto accelerator is used. This is because the record type for non-data records is not propagated to user application. Also when async decryption happens, the decryption does not stop when two different types of records get dequeued and submitted for decryption. To address it, we decrypt tls1.2 non-data records in synchronous way. We check whether the record we just processed has same type as the previous one before checking for async condition and jumping to dequeue next record. Fixes: 130b392c6cd6b ("net: tls: Add tls 1.3 support") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-09net/tls: Disable async decrytion for tls1.3Vakul Garg1-2/+6
Function tls_sw_recvmsg() dequeues multiple records from stream parser and decrypts them. In case the decryption is done by async accelerator, the records may get submitted for decryption while the previous ones may not have been decryted yet. For tls1.3, the record type is known only after decryption. Therefore, for tls1.3, tls_sw_recvmsg() may submit records for decryption even if it gets 'handshake' records after 'data' records. These intermediate 'handshake' records may do a key updation. By the time new keys are given to ktls by userspace, it is possible that ktls has already submitted some records i(which are encrypted with new keys) for decryption using old keys. This would lead to decrypt failure. Therefore, async decryption of records should be disabled for tls1.3. Fixes: 130b392c6cd6b ("net: tls: Add tls 1.3 support") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01net: tls: Set async_capable for tls zerocopy only if we see EINPROGRESSDave Watson1-2/+2
Currently we don't zerocopy if the crypto framework async bit is set. However some crypto algorithms (such as x86 AESNI) support async, but in the context of sendmsg, will never run asynchronously. Instead, check for actual EINPROGRESS return code before assuming algorithm is async. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01net: tls: Add tls 1.3 supportDave Watson1-22/+94
TLS 1.3 has minor changes from TLS 1.2 at the record layer. * Header now hardcodes the same version and application content type in the header. * The real content type is appended after the data, before encryption (or after decryption). * The IV is xored with the sequence number, instead of concatinating four bytes of IV with the explicit IV. * Zero-padding: No exlicit length is given, we search backwards from the end of the decrypted data for the first non-zero byte, which is the content type. Currently recv supports reading zero-padding, but there is no way for send to add zero padding. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01net: tls: Refactor control message handling on recvDave Watson1-44/+44
For TLS 1.3, the control message is encrypted. Handle control message checks after decryption. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01net: tls: Refactor tls aad space size calculationDave Watson1-8/+9
TLS 1.3 has a different AAD size, use a variable in the code to make TLS 1.3 support easy. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01net: tls: Support 256 bit keysDave Watson1-4/+25
Wire up support for 256 bit keys from the setsockopt to the crypto framework Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+5
2019-01-28net: tls: Fix deadlock in free_resources txDave Watson1-0/+2
If there are outstanding async tx requests (when crypto returns EINPROGRESS), there is a potential deadlock: the tx work acquires the lock, while we cancel_delayed_work_sync() while holding the lock. Drop the lock while waiting for the work to complete. Fixes: a42055e8d2c30 ("Add support for async encryption of records...") Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-28net: tls: Save iv in tls_rec for async crypto requestsDave Watson1-1/+3
aead_request_set_crypt takes an iv pointer, and we change the iv soon after setting it. Some async crypto algorithms don't save the iv, so we need to save it in the tls_rec for async requests. Found by hardcoding x64 aesni to use async crypto manager (to test the async codepath), however I don't think this combination can happen in the wild. Presumably other hardware offloads will need this fix, but there have been no user reports. Fixes: a42055e8d2c30 ("Add support for async encryption of records...") Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17tls: Fix recvmsg() to be able to peek across multiple recordsVakul Garg1-70/+196
This fixes recvmsg() to be able to peek across multiple tls records. Without this patch, the tls's selftests test case 'recv_peek_large_buf_mult_recs' fails. Each tls receive context now maintains a 'rx_list' to retain incoming skb carrying tls records. If a tls record needs to be retained e.g. for peek case or for the case when the buffer passed to recvmsg() has a length smaller than decrypted record length, then it is added to 'rx_list'. Additionally, records are added in 'rx_list' if the crypto operation runs in async mode. The records are dequeued from 'rx_list' after the decrypted data is consumed by copying into the buffer passed to recvmsg(). In case, the MSG_PEEK flag is used in recvmsg(), then records are not consumed or removed from the 'rx_list'. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17net/tls: Make function tls_sw_do_sendpage staticYueHaibing1-2/+2
Fixes the following sparse warning: net/tls/tls_sw.c:1023:5: warning: symbol 'tls_sw_do_sendpage' was not declared. Should it be static? Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17net/tls: remove unused function tls_sw_sendpage_lockedYueHaibing1-10/+0
There are no in-tree callers. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-4/+6
2018-12-21tls: Do not call sk_memcopy_from_iter with zero lengthVakul Garg1-4/+6
In some conditions e.g. when tls_clone_plaintext_msg() returns -ENOSPC, the number of bytes to be copied using subsequent function sk_msg_memcopy_from_iter() becomes zero. This causes function sk_msg_memcopy_from_iter() to fail which in turn causes tls_sw_sendmsg() to return failure. To prevent it, do not call sk_msg_memcopy_from_iter() when number of bytes to copy (indicated by 'try_to_copy') is zero. Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20bpf: sk_msg, sock{map|hash} redirect through ULPJohn Fastabend1-13/+30
A sockmap program that redirects through a kTLS ULP enabled socket will not work correctly because the ULP layer is skipped. This fixes the behavior to call through the ULP layer on redirect to ensure any operations required on the data stream at the ULP layer continue to be applied. To do this we add an internal flag MSG_SENDPAGE_NOPOLICY to avoid calling the BPF layer on a redirected message. This is required to avoid calling the BPF layer multiple times (possibly recursively) which is not the current/expected behavior without ULPs. In the future we may add a redirect flag if users _do_ want the policy applied again but this would need to work for both ULP and non-ULP sockets and be opt-in to avoid breaking existing programs. Also to avoid polluting the flag space with an internal flag we reuse the flag space overlapping MSG_SENDPAGE_NOPOLICY with MSG_WAITFORONE. Here WAITFORONE is specific to recv path and SENDPAGE_NOPOLICY is only used for sendpage hooks. The last thing to verify is user space API is masked correctly to ensure the flag can not be set by user. (Note this needs to be true regardless because we have internal flags already in-use that user space should not be able to set). But for completeness we have two UAPI paths into sendpage, sendfile and splice. In the sendfile case the function do_sendfile() zero's flags, ./fs/read_write.c: static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos, size_t count, loff_t max) { ... fl = 0; #if 0 /* * We need to debate whether we can enable this or not. The * man page documents EAGAIN return for the output at least, * and the application is arguably buggy if it doesn't expect * EAGAIN on a non-blocking file descriptor. */ if (in.file->f_flags & O_NONBLOCK) fl = SPLICE_F_NONBLOCK; #endif file_start_write(out.file); retval = do_splice_direct(in.file, &pos, out.file, &out_pos, count, fl); } In the splice case the pipe_to_sendpage "actor" is used which masks flags with SPLICE_F_MORE. ./fs/splice.c: static int pipe_to_sendpage(struct pipe_inode_info *pipe, struct pipe_buffer *buf, struct splice_desc *sd) { ... more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0; ... } Confirming what we expect that internal flags are in fact internal to socket side. Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-28bpf: helper to pop data from messagesJohn Fastabend1-2/+9
This adds a BPF SK_MSG program helper so that we can pop data from a msg. We use this to pop metadata from a previous push data call. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-01Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-2/+2
Pull AFS updates from Al Viro: "AFS series, with some iov_iter bits included" * 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits) missing bits of "iov_iter: Separate type from direction and use accessor functions" afs: Probe multiple fileservers simultaneously afs: Fix callback handling afs: Eliminate the address pointer from the address list cursor afs: Allow dumping of server cursor on operation failure afs: Implement YFS support in the fs client afs: Expand data structure fields to support YFS afs: Get the target vnode in afs_rmdir() and get a callback on it afs: Calc callback expiry in op reply delivery afs: Fix FS.FetchStatus delivery from updating wrong vnode afs: Implement the YFS cache manager service afs: Remove callback details from afs_callback_break struct afs: Commit the status on a new file/dir/symlink afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS afs: Don't invoke the server to read data beyond EOF afs: Add a couple of tracepoints to log I/O errors afs: Handle EIO from delivery function afs: Fix TTL on VL server and address lists afs: Implement VL server rotation afs: Improve FS server rotation error handling ...
2018-10-24iov_iter: Use accessor functionDavid Howells1-2/+2
Use accessor functions to access an iterator's type and direction. This allows for the possibility of using some other method of determining the type of iterator than if-chains with bitwise-AND conditions. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-17bpf: sockmap, support for msg_peek in sk_msg with redirect ingressJohn Fastabend1-1/+2
This adds support for the MSG_PEEK flag when doing redirect to ingress and receiving on the sk_msg psock queue. Previously the flag was being ignored which could confuse applications if they expected the flag to work as normal. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-15tls: add bpf support to sk_msg handlingJohn Fastabend1-64/+375
This work adds BPF sk_msg verdict program support to kTLS allowing BPF and kTLS to be combined together. Previously kTLS and sk_msg verdict programs were mutually exclusive in the ULP layer which created challenges for the orchestrator when trying to apply TCP based policy, for example. To resolve this, leveraging the work from previous patches that consolidates the use of sk_msg, we can finally enable BPF sk_msg verdict programs so they continue to run after the kTLS socket is created. No change in behavior when kTLS is not used in combination with BPF, the kselftest suite for kTLS also runs successfully. Joint work with Daniel. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15tls: replace poll implementation with read hookJohn Fastabend1-13/+3
Instead of re-implementing poll routine use the poll callback to trigger read from kTLS, we reuse the stream_memory_read callback which is simpler and achieves the same. This helps to align sockmap and kTLS so we can more easily embed BPF in kTLS. Joint work with Daniel. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15tls: convert to generic sk_msg interfaceDaniel Borkmann1-327/+184
Convert kTLS over to make use of sk_msg interface for plaintext and encrypted scattergather data, so it reuses all the sk_msg helpers and data structure which later on in a second step enables to glue this to BPF. This also allows to remove quite a bit of open coded helpers which are covered by the sk_msg API. Recent changes in kTLs 80ece6a03aaf ("tls: Remove redundant vars from tls record structure") and 4e6d47206c32 ("tls: Add support for inplace records encryption") changed the data path handling a bit; while we've kept the latter optimization intact, we had to undo the former change to better fit the sk_msg model, hence the sg_aead_in and sg_aead_out have been brought back and are linked into the sk_msg sgs. Now the kTLS record contains a msg_plaintext and msg_encrypted sk_msg each. In the original code, the zerocopy_from_iter() has been used out of TX but also RX path. For the strparser skb-based RX path, we've left the zerocopy_from_iter() in decrypt_internal() mostly untouched, meaning it has been moved into tls_setup_from_iter() with charging logic removed (as not used from RX). Given RX path is not based on sk_msg objects, we haven't pursued setting up a dummy sk_msg to call into sk_msg_zerocopy_from_iter(), but it could be an option to prusue in a later step. Joint work with John. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-02tls: Add support for inplace records encryptionVakul Garg1-18/+73
Presently, for non-zero copy case, separate pages are allocated for storing plaintext and encrypted text of records. These pages are stored in sg_plaintext_data and sg_encrypted_data scatterlists inside record structure. Further, sg_plaintext_data & sg_encrypted_data are passed to cryptoapis for record encryption. Allocating separate pages for plaintext and encrypted text is inefficient from both required memory and performance point of view. This patch adds support of inplace encryption of records. For non-zero copy case, we reuse the pages from sg_encrypted_data scatterlist to copy the application's plaintext data. For the movement of pages from sg_encrypted_data to sg_plaintext_data scatterlists, we introduce a new function move_to_plaintext_sg(). This function add pages into sg_plaintext_data from sg_encrypted_data scatterlists. tls_do_encryption() is modified to pass the same scatterlist as both source and destination into aead_request_set_crypt() if inplace crypto has been enabled. A new ariable 'inplace_crypto' has been introduced in record structure to signify whether the same scatterlist can be used. By default, the inplace_crypto is enabled in get_rec(). If zero-copy is used (i.e. plaintext data is not copied), inplace_crypto is set to '0'. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Reviewed-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-29tls: Remove redundant vars from tls record structureVakul Garg1-49/+43
Structure 'tls_rec' contains sg_aead_in and sg_aead_out which point to a aad_space and then chain scatterlists sg_plaintext_data, sg_encrypted_data respectively. Rather than using chained scatterlists for plaintext and encrypted data in aead_req, it is efficient to store aad_space in sg_encrypted_data and sg_plaintext_data itself in the first index and get rid of sg_aead_in, sg_aead_in and further chaining. This requires increasing size of sg_encrypted_data & sg_plaintext_data arrarys by 1 to accommodate entry for aad_space. The code which uses sg_encrypted_data and sg_plaintext_data has been modified to skip first index as it points to aad_space. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-28net/tls: Make function get_rec() staticWei Yongjun1-1/+1
Fixes the following sparse warning: net/tls/tls_sw.c:655:16: warning: symbol 'get_rec' was not declared. Should it be static? Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25tls: Fixed a memory leak during socket closeVakul Garg1-2/+4
During socket close, if there is a open record with tx context, it needs to be be freed apart from freeing up plaintext and encrypted scatter lists. This patch frees up the open record if present in tx context. Also tls_free_both_sg() has been renamed to tls_free_open_rec() to indicate that the free record in tx context is being freed inside the function. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25tls: Fix socket mem accounting error under async encryptionVakul Garg1-5/+16
Current async encryption implementation sometimes showed up socket memory accounting error during socket close. This results in kernel warning calltrace. The root cause of the problem is that socket var sk_forward_alloc gets corrupted due to access in sk_mem_charge() and sk_mem_uncharge() being invoked from multiple concurrent contexts in multicore processor. The apis sk_mem_charge() and sk_mem_uncharge() are called from functions alloc_plaintext_sg(), free_sg() etc. It is required that memory accounting apis are called under a socket lock. The plaintext sg data sent for encryption is freed using free_sg() in tls_encryption_done(). It is wrong to call free_sg() from this function. This is because this function may run in irq context. We cannot acquire socket lock in this function. We remove calling of function free_sg() for plaintext data from tls_encryption_done() and defer freeing up of plaintext data to the time when the record is picked up from tx_list and transmitted/freed. When tls_tx_records() gets called, socket is already locked and thus there is no concurrent access problem. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-24tls: Fixed uninitialised vars warningVakul Garg1-2/+2
In tls_sw_sendmsg() and tls_sw_sendpage(), it is possible that the uninitialised variable 'ret' gets passed to sk_stream_error(). So initialise local variable 'ret' to '0. The warnings were detected by 'smatch' tool. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-24net/tls: Fixed race condition in async encryptionVakul Garg1-51/+30
On processors with multi-engine crypto accelerators, it is possible that multiple records get encrypted in parallel and their encryption completion is notified to different cpus in multicore processor. This leads to the situation where tls_encrypt_done() starts executing in parallel on different cores. In current implementation, encrypted records are queued to tx_ready_list in tls_encrypt_done(). This requires addition to linked list 'tx_ready_list' to be protected. As tls_decrypt_done() could be executing in irq content, it is not possible to protect linked list addition operation using a lock. To fix the problem, we remove linked list addition operation from the irq context. We do tx_ready_list addition/removal operation from application context only and get rid of possible multiple access to the linked list. Before starting encryption on the record, we add it to the tail of tx_ready_list. To prevent tls_tx_records() from transmitting it, we mark the record with a new flag 'tx_ready' in 'struct tls_rec'. When record encryption gets completed, tls_encrypt_done() has to only update the 'tx_ready' flag to true & linked list add operation is not required. The changed logic brings some other side benefits. Since the records are always submitted in tls sequence number order for encryption, the tx_ready_list always remains sorted and addition of new records to it does not have to traverse the linked list. Lastly, we renamed tx_ready_list in 'struct tls_sw_context_tx' to 'tx_list'. This is because now, the some of the records at the tail are not ready to transmit. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21net/tls: Add support for async encryption of records for performanceVakul Garg1-143/+443
In current implementation, tls records are encrypted & transmitted serially. Till the time the previously submitted user data is encrypted, the implementation waits and on finish starts transmitting the record. This approach of encrypt-one record at a time is inefficient when asynchronous crypto accelerators are used. For each record, there are overheads of interrupts, driver softIRQ scheduling etc. Also the crypto accelerator sits idle most of time while an encrypted record's pages are handed over to tcp stack for transmission. This patch enables encryption of multiple records in parallel when an async capable crypto accelerator is present in system. This is achieved by allowing the user space application to send more data using sendmsg() even while previously issued data is being processed by crypto accelerator. This requires returning the control back to user space application after submitting encryption request to accelerator. This also means that zero-copy mode of encryption cannot be used with async accelerator as we must be done with user space application buffer before returning from sendmsg(). There can be multiple records in flight to/from the accelerator. Each of the record is represented by 'struct tls_rec'. This is used to store the memory pages for the record. After the records are encrypted, they are added in a linked list called tx_ready_list which contains encrypted tls records sorted as per tls sequence number. The records from tx_ready_list are transmitted using a newly introduced function called tls_tx_records(). The tx_ready_list is polled for any record ready to be transmitted in sendmsg(), sendpage() after initiating encryption of new tls records. This achieves parallel encryption and transmission of records when async accelerator is present. There could be situation when crypto accelerator completes encryption later than polling of tx_ready_list by sendmsg()/sendpage(). Therefore we need a deferred work context to be able to transmit records from tx_ready_list. The deferred work context gets scheduled if applications are not sending much data through the socket. If the applications issue sendmsg()/sendpage() in quick succession, then the scheduling of tx_work_handler gets cancelled as the tx_ready_list would be polled from application's context itself. This saves scheduling overhead of deferred work. The patch also brings some side benefit. We are able to get rid of the concept of CLOSED record. This is because the records once closed are either encrypted and then placed into tx_ready_list or if encryption fails, the socket error is set. This simplifies the kernel tls sendpath. However since tls_device.c is still using macros, accessory functions for CLOSED records have been retained. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-8/+12
Two new tls tests added in parallel in both net and net-next. Used Stephen Rothwell's linux-next resolution. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17tls: fix currently broken MSG_PEEK behaviorDaniel Borkmann1-0/+8
In kTLS MSG_PEEK behavior is currently failing, strace example: [pid 2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3 [pid 2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4 [pid 2430] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0 [pid 2430] listen(4, 10) = 0 [pid 2430] getsockname(4, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0 [pid 2430] connect(3, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("0.0.0.0")}, 16) = 0 [pid 2430] setsockopt(3, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0 [pid 2430] setsockopt(3, 0x11a /* SOL_?? */, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0 [pid 2430] accept(4, {sa_family=AF_INET, sin_port=htons(49636), sin_addr=inet_addr("127.0.0.1")}, [16]) = 5 [pid 2430] setsockopt(5, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0 [pid 2430] setsockopt(5, 0x11a /* SOL_?? */, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0 [pid 2430] close(4) = 0 [pid 2430] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14 [pid 2430] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11 [pid 2430] recvfrom(5, "test_read_peektest_read_peektest"..., 64, MSG_PEEK, NULL, NULL) = 64 As can be seen from strace, there are two TLS records sent, i) 'test_read_peek' and ii) '_mult_recs\0' where we end up peeking 'test_read_peektest_read_peektest'. This is clearly wrong, and what happens is that given peek cannot call into tls_sw_advance_skb() to unpause strparser and proceed with the next skb, we end up looping over the current one, copying the 'test_read_peek' over and over into the user provided buffer. Here, we can only peek into the currently held skb (current, full TLS record) as otherwise we would end up having to hold all the original skb(s) (depending on the peek depth) in a separate queue when unpausing strparser to process next records, minimally intrusive is to return only up to the current record's size (which likely was what c46234ebb4d1 ("tls: RX path for ktls") originally intended as well). Thus, after patch we properly peek the first record: [pid 2046] wait4(2075, <unfinished ...> [pid 2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3 [pid 2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4 [pid 2075] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0 [pid 2075] listen(4, 10) = 0 [pid 2075] getsockname(4, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0 [pid 2075] connect(3, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("0.0.0.0")}, 16) = 0 [pid 2075] setsockopt(3, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0 [pid 2075] setsockopt(3, 0x11a /* SOL_?? */, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0 [pid 2075] accept(4, {sa_family=AF_INET, sin_port=htons(45732), sin_addr=inet_addr("127.0.0.1")}, [16]) = 5 [pid 2075] setsockopt(5, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0 [pid 2075] setsockopt(5, 0x11a /* SOL_?? */, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0 [pid 2075] close(4) = 0 [pid 2075] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14 [pid 2075] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11 [pid 2075] recvfrom(5, "test_read_peek", 64, MSG_PEEK, NULL, NULL) = 14 Fixes: c46234ebb4d1 ("tls: RX path for ktls") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17tls: async support causes out-of-bounds access in crypto APIsJohn Fastabend1-16/+23
When async support was added it needed to access the sk from the async callback to report errors up the stack. The patch tried to use space after the aead request struct by directly setting the reqsize field in aead_request. This is an internal field that should not be used outside the crypto APIs. It is used by the crypto code to define extra space for private structures used in the crypto context. Users of the API then use crypto_aead_reqsize() and add the returned amount of bytes to the end of the request memory allocation before posting the request to encrypt/decrypt APIs. So this breaks (with general protection fault and KASAN error, if enabled) because the request sent to decrypt is shorter than required causing the crypto API out-of-bounds errors. Also it seems unlikely the sk is even valid by the time it gets to the callback because of memset in crypto layer. Anyways, fix this by holding the sk in the skb->sk field when the callback is set up and because the skb is already passed through to the callback handler via void* we can access it in the handler. Then in the handler we need to be careful to NULL the pointer again before kfree_skb. I added comments on both the setup (in tls_do_decryption) and when we clear it from the crypto callback handler tls_decrypt_done(). After this selftests pass again and fixes KASAN errors/warnings. Fixes: 94524d8fc965 ("net/tls: Add support for async decryption of tls records") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Reviewed-by: Vakul Garg <Vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13tls: zero the crypto information from tls_context before freeingSabrina Dubroca1-4/+4
This contains key material in crypto_send_aes_gcm_128 and crypto_recv_aes_gcm_128. Introduce union tls_crypto_context, and replace the two identical unions directly embedded in struct tls_context with it. We can then use this union to clean up the memory in the new tls_ctx_free() function. Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13tls: don't copy the key out of tls12_crypto_info_aes_gcm_128Sabrina Dubroca1-4/+1
There's no need to copy the key to an on-stack buffer before calling crypto_aead_setkey(). Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-0/+6
2018-09-12net/tls: Fixed return value when tls_complete_pending_work() failsVakul Garg1-4/+6
In tls_sw_sendmsg() and tls_sw_sendpage(), the variable 'ret' has been set to return value of tls_complete_pending_work(). This allows return of proper error code if tls_complete_pending_work() fails. Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09net/tls: Set count of SG entries if sk_alloc_sg returns -ENOSPCVakul Garg1-0/+6
tls_sw_sendmsg() allocates plaintext and encrypted SG entries using function sk_alloc_sg(). In case the number of SG entries hit MAX_SKB_FRAGS, sk_alloc_sg() returns -ENOSPC and sets the variable for current SG index to '0'. This leads to calling of function tls_push_record() with 'sg_encrypted_num_elem = 0' and later causes kernel crash. To fix this, set the number of SG elements to the number of elements in plaintext/encrypted SG arrays in case sk_alloc_sg() returns -ENOSPC. Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-01net/tls: Add support for async decryption of tls recordsVakul Garg1-13/+121
When tls records are decrypted using asynchronous acclerators such as NXP CAAM engine, the crypto apis return -EINPROGRESS. Presently, on getting -EINPROGRESS, the tls record processing stops till the time the crypto accelerator finishes off and returns the result. This incurs a context switch and is not an efficient way of accessing the crypto accelerators. Crypto accelerators work efficient when they are queued with multiple crypto jobs without having to wait for the previous ones to complete. The patch submits multiple crypto requests without having to wait for for previous ones to complete. This has been implemented for records which are decrypted in zero-copy mode. At the end of recvmsg(), we wait for all the asynchronous decryption requests to complete. The references to records which have been sent for async decryption are dropped. For cases where record decryption is not possible in zero-copy mode, asynchronous decryption is not used and we wait for decryption crypto api to complete. For crypto requests executing in async fashion, the memory for aead_request, sglists and skb etc is freed from the decryption completion handler. The decryption completion handler wakesup the sleeping user context when recvmsg() flags that it has done sending all the decryption requests and there are no more decryption requests pending to be completed. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Reviewed-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29net/tls: Calculate nsg for zerocopy path without skb_cow_data.Doron Roberts-Kedes1-1/+79
decrypt_skb fails if the number of sg elements required to map it is greater than MAX_SKB_FRAGS. nsg must always be calculated, but skb_cow_data adds unnecessary memcpy's for the zerocopy case. The new function skb_nsg calculates the number of scatterlist elements required to map the skb without the extra overhead of skb_cow_data. This patch reduces memcpy by 50% on my encrypted NBD benchmarks. Reported-by: Vakul Garg <Vakul.garg@nxp.com> Reviewed-by: Vakul Garg <Vakul.garg@nxp.com> Tested-by: Vakul Garg <Vakul.garg@nxp.com> Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-13net/tls: Combined memory allocation for decryption requestVakul Garg1-96/+142
For preparing decryption request, several memory chunks are required (aead_req, sgin, sgout, iv, aad). For submitting the decrypt request to an accelerator, it is required that the buffers which are read by the accelerator must be dma-able and not come from stack. The buffers for aad and iv can be separately kmalloced each, but it is inefficient. This patch does a combined allocation for preparing decryption request and then segments into aead_req || sgin || sgout || iv || aad. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-05net/tls: Mark the end in scatterlist tableVakul Garg1-0/+3
Function zerocopy_from_iter() unmarks the 'end' in input sgtable while adding new entries in it. The last entry in sgtable remained unmarked. This results in KASAN error report on using apis like sg_nents(). Before returning, the function needs to mark the 'end' in the last entry it adds. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net/tls: Use kmemdup to simplify the codezhong jiang1-2/+1
Kmemdup is better than kmalloc+memcpy. So replace them. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30net/tls: Use socket data_ready callback on record availabilityVakul Garg1-1/+1
On receipt of a complete tls record, use socket's saved data_ready callback instead of state_change callback. In function tls_queue(), the TLS record is queued in encrypted state. But the decryption happen inline when tls_sw_recvmsg() or tls_sw_splice_read() get invoked. So it should be ok to notify the waiting context about the availability of data as soon as we could collect a full TLS record. For new data availability notification, sk_data_ready callback is more appropriate. It points to sock_def_readable() which wakes up specifically for EPOLLIN event. This is in contrast to the socket callback sk_state_change which points to sock_def_wakeup() which issues a wakeup unconditionally (without event mask). Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>