aboutsummaryrefslogtreecommitdiffstats
path: root/net/tls/tls_main.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-03-20net/tls: Add support of AES128-CCM based ciphersVakul Garg1-13/+18
Added support for AES128-CCM based record encryption. AES128-CCM is similar to AES128-GCM. Both of them have same salt/iv/mac size. The notable difference between the two is that while invoking AES128-CCM operation, the salt||nonce (which is passed as IV) has to be prefixed with a hardcoded value '2'. Further, CCM implementation in kernel requires IV passed in crypto_aead_request() to be full '16' bytes. Therefore, the record structure 'struct tls_rec' has been modified to reserve '16' bytes for IV. This works for both GCM and CCM based cipher. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-13net/tls: Inform user space about send buffer availabilityVakul Garg1-1/+2
A previous fix ("tls: Fix write space handling") assumed that user space application gets informed about the socket send buffer availability when tls_push_sg() gets called. Inside tls_push_sg(), in case do_tcp_sendpages() returns 0, the function returns without calling ctx->sk_write_space. Further, the new function tls_sw_write_space() did not invoke ctx->sk_write_space. This leads to situation that user space application encounters a lockup always waiting for socket send buffer to become available. Rather than call ctx->sk_write_space from tls_push_sg(), it should be called from tls_write_space. So whenever tcp stack invokes sk->sk_write_space after freeing socket send buffer, we always declare the same to user space by the way of invoking ctx->sk_write_space. Fixes: 7463d3a2db0ef ("tls: Fix write space handling") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03tls: Fix write space handlingBoris Pismenny1-9/+6
TLS device cannot use the sw context. This patch returns the original tls device write space handler and moves the sw/device specific portions to the relevant files. Also, we remove the write_space call for the tls_sw flow, because it handles partial records in its delayed tx work handler. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Signed-off-by: Boris Pismenny <borisp@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03tls: Fix tls_device handling of partial recordsBoris Pismenny1-13/+0
Cleanup the handling of partial records while fixing a bug where the tls_push_pending_closed_record function is using the software tls context instead of the hardware context. The bug resulted in the following crash: [ 88.791229] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 [ 88.793271] #PF error: [normal kernel read fault] [ 88.794449] PGD 800000022a426067 P4D 800000022a426067 PUD 22a156067 PMD 0 [ 88.795958] Oops: 0000 [#1] SMP PTI [ 88.796884] CPU: 2 PID: 4973 Comm: openssl Not tainted 5.0.0-rc4+ #3 [ 88.798314] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [ 88.800067] RIP: 0010:tls_tx_records+0xef/0x1d0 [tls] [ 88.801256] Code: 00 02 48 89 43 08 e8 a0 0b 96 d9 48 89 df e8 48 dd 4d d9 4c 89 f8 4d 8b bf 98 00 00 00 48 05 98 00 00 00 48 89 04 24 49 39 c7 <49> 8b 1f 4d 89 fd 0f 84 af 00 00 00 41 8b 47 10 85 c0 0f 85 8d 00 [ 88.805179] RSP: 0018:ffffbd888186fca8 EFLAGS: 00010213 [ 88.806458] RAX: ffff9af1ed657c98 RBX: ffff9af1e88a1980 RCX: 0000000000000000 [ 88.808050] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9af1e88a1980 [ 88.809724] RBP: ffff9af1e88a1980 R08: 0000000000000017 R09: ffff9af1ebeeb700 [ 88.811294] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 88.812917] R13: ffff9af1e88a1980 R14: ffff9af1ec13f800 R15: 0000000000000000 [ 88.814506] FS: 00007fcad2240740(0000) GS:ffff9af1f7880000(0000) knlGS:0000000000000000 [ 88.816337] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 88.817717] CR2: 0000000000000000 CR3: 0000000228b3e000 CR4: 00000000001406e0 [ 88.819328] Call Trace: [ 88.820123] tls_push_data+0x628/0x6a0 [tls] [ 88.821283] ? remove_wait_queue+0x20/0x60 [ 88.822383] ? n_tty_read+0x683/0x910 [ 88.823363] tls_device_sendmsg+0x53/0xa0 [tls] [ 88.824505] sock_sendmsg+0x36/0x50 [ 88.825492] sock_write_iter+0x87/0x100 [ 88.826521] __vfs_write+0x127/0x1b0 [ 88.827499] vfs_write+0xad/0x1b0 [ 88.828454] ksys_write+0x52/0xc0 [ 88.829378] do_syscall_64+0x5b/0x180 [ 88.830369] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 88.831603] RIP: 0033:0x7fcad1451680 [ 1248.470626] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 [ 1248.472564] #PF error: [normal kernel read fault] [ 1248.473790] PGD 0 P4D 0 [ 1248.474642] Oops: 0000 [#1] SMP PTI [ 1248.475651] CPU: 3 PID: 7197 Comm: openssl Tainted: G OE 5.0.0-rc4+ #3 [ 1248.477426] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [ 1248.479310] RIP: 0010:tls_tx_records+0x110/0x1f0 [tls] [ 1248.480644] Code: 00 02 48 89 43 08 e8 4f cb 63 d7 48 89 df e8 f7 9c 1b d7 4c 89 f8 4d 8b bf 98 00 00 00 48 05 98 00 00 00 48 89 04 24 49 39 c7 <49> 8b 1f 4d 89 fd 0f 84 af 00 00 00 41 8b 47 10 85 c0 0f 85 8d 00 [ 1248.484825] RSP: 0018:ffffaa0a41543c08 EFLAGS: 00010213 [ 1248.486154] RAX: ffff955a2755dc98 RBX: ffff955a36031980 RCX: 0000000000000006 [ 1248.487855] RDX: 0000000000000000 RSI: 000000000000002b RDI: 0000000000000286 [ 1248.489524] RBP: ffff955a36031980 R08: 0000000000000000 R09: 00000000000002b1 [ 1248.491394] R10: 0000000000000003 R11: 00000000ad55ad55 R12: 0000000000000000 [ 1248.493162] R13: 0000000000000000 R14: ffff955a2abe6c00 R15: 0000000000000000 [ 1248.494923] FS: 0000000000000000(0000) GS:ffff955a378c0000(0000) knlGS:0000000000000000 [ 1248.496847] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1248.498357] CR2: 0000000000000000 CR3: 000000020c40e000 CR4: 00000000001406e0 [ 1248.500136] Call Trace: [ 1248.500998] ? tcp_check_oom+0xd0/0xd0 [ 1248.502106] tls_sk_proto_close+0x127/0x1e0 [tls] [ 1248.503411] inet_release+0x3c/0x60 [ 1248.504530] __sock_release+0x3d/0xb0 [ 1248.505611] sock_close+0x11/0x20 [ 1248.506612] __fput+0xb4/0x220 [ 1248.507559] task_work_run+0x88/0xa0 [ 1248.508617] do_exit+0x2cb/0xbc0 [ 1248.509597] ? core_sys_select+0x17a/0x280 [ 1248.510740] do_group_exit+0x39/0xb0 [ 1248.511789] get_signal+0x1d0/0x630 [ 1248.512823] do_signal+0x36/0x620 [ 1248.513822] exit_to_usermode_loop+0x5c/0xc6 [ 1248.515003] do_syscall_64+0x157/0x180 [ 1248.516094] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1248.517456] RIP: 0033:0x7fb398bd3f53 [ 1248.518537] Code: Bad RIP value. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-19net/tls: Move protocol constants from cipher context to tls contextVakul Garg1-2/+15
Each tls context maintains two cipher contexts (one each for tx and rx directions). For each tls session, the constants such as protocol version, ciphersuite, iv size, associated data size etc are same for both the directions and need to be stored only once per tls context. Hence these are moved from 'struct cipher_context' to 'struct tls_prot_info' and stored only once in 'struct tls_context'. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01net: tls: Add tls 1.3 supportDave Watson1-1/+2
TLS 1.3 has minor changes from TLS 1.2 at the record layer. * Header now hardcodes the same version and application content type in the header. * The real content type is appended after the data, before encryption (or after decryption). * The IV is xored with the sequence number, instead of concatinating four bytes of IV with the explicit IV. * Zero-padding: No exlicit length is given, we search backwards from the end of the decrypted data for the first non-zero byte, which is the content type. Currently recv supports reading zero-padding, but there is no way for send to add zero padding. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01net: tls: Support 256 bit keysDave Watson1-2/+31
Wire up support for 256 bit keys from the setsockopt to the crypto framework Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22net/tls: free ctx in sock destructAtul Gupta1-2/+17
free tls context in sock destruct. close may not be the last call to free sock but force releasing the ctx in close will result in GPF when ctx referred again in tcp_done [ 515.330477] general protection fault: 0000 [#1] SMP PTI [ 515.330539] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.20.0-rc7+ #10 [ 515.330657] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0b 11/07/2013 [ 515.330844] RIP: 0010:tls_hw_unhash+0xbf/0xd0 [ [ 515.332220] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 515.332340] CR2: 00007fab32c55000 CR3: 000000009261e000 CR4: 00000000000006e0 [ 515.332519] Call Trace: [ 515.332632] <IRQ> [ 515.332793] tcp_set_state+0x5a/0x190 [ 515.332907] ? tcp_update_metrics+0xe3/0x350 [ 515.333023] tcp_done+0x31/0xd0 [ 515.333130] tcp_rcv_state_process+0xc27/0x111a [ 515.333242] ? __lock_is_held+0x4f/0x90 [ 515.333350] ? tcp_v4_do_rcv+0xaf/0x1e0 [ 515.333456] tcp_v4_do_rcv+0xaf/0x1e0 Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22net/tls: build_protos moved to common routineAtul Gupta1-22/+32
build protos is required for tls_hw_prot also hence moved to 'tls_build_proto' and called as required from tls_init and tls_hw_proto. This is required since build_protos for v4 is moved from tls_register to tls_init in commit <28cb6f1eaffdc5a6a9707cac55f4a43aa3fd7895> Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller1-2/+12
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-12-21 The following pull-request contains BPF updates for your *net-next* tree. There is a merge conflict in test_verifier.c. Result looks as follows: [...] }, { "calls: cross frame pruning", .insns = { [...] .prog_type = BPF_PROG_TYPE_SOCKET_FILTER, .errstr_unpriv = "function calls to other bpf functions are allowed for root only", .result_unpriv = REJECT, .errstr = "!read_ok", .result = REJECT, }, { "jset: functional", .insns = { [...] { "jset: unknown const compare not taken", .insns = { BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), BPF_JMP_IMM(BPF_JSET, BPF_REG_0, 1, 1), BPF_LDX_MEM(BPF_B, BPF_REG_8, BPF_REG_9, 0), BPF_EXIT_INSN(), }, .prog_type = BPF_PROG_TYPE_SOCKET_FILTER, .errstr_unpriv = "!read_ok", .result_unpriv = REJECT, .errstr = "!read_ok", .result = REJECT, }, [...] { "jset: range", .insns = { [...] }, .prog_type = BPF_PROG_TYPE_SOCKET_FILTER, .result_unpriv = ACCEPT, .result = ACCEPT, }, The main changes are: 1) Various BTF related improvements in order to get line info working. Meaning, verifier will now annotate the corresponding BPF C code to the error log, from Martin and Yonghong. 2) Implement support for raw BPF tracepoints in modules, from Matt. 3) Add several improvements to verifier state logic, namely speeding up stacksafe check, optimizations for stack state equivalence test and safety checks for liveness analysis, from Alexei. 4) Teach verifier to make use of BPF_JSET instruction, add several test cases to kselftests and remove nfp specific JSET optimization now that verifier has awareness, from Jakub. 5) Improve BPF verifier's slot_type marking logic in order to allow more stack slot sharing, from Jiong. 6) Add sk_msg->size member for context access and add set of fixes and improvements to make sock_map with kTLS usable with openssl based applications, from John. 7) Several cleanups and documentation updates in bpftool as well as auto-mount of tracefs for "bpftool prog tracelog" command, from Quentin. 8) Include sub-program tags from now on in bpf_prog_info in order to have a reliable way for user space to get all tags of the program e.g. needed for kallsyms correlation, from Song. 9) Add BTF annotations for cgroup_local_storage BPF maps and implement bpf fs pretty print support, from Roman. 10) Fix bpftool in order to allow for cross-compilation, from Ivan. 11) Update of bpftool license to GPLv2-only + BSD-2-Clause in order to be compatible with libbfd and allow for Debian packaging, from Jakub. 12) Remove an obsolete prog->aux sanitation in dump and get rid of version check for prog load, from Daniel. 13) Fix a memory leak in libbpf's line info handling, from Prashant. 14) Fix cpumap's frame alignment for build_skb() so that skb_shared_info does not get unaligned, from Jesper. 15) Fix test_progs kselftest to work with older compilers which are less smart in optimizing (and thus throwing build error), from Stanislav. 16) Cleanup and simplify AF_XDP socket teardown, from Björn. 17) Fix sk lookup in BPF kselftest's test_sock_addr with regards to netns_id argument, from Andrey. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20bpf: tls_sw, init TLS ULP removes BPF proto hooksJohn Fastabend1-2/+12
The existing code did not expect users would initialize the TLS ULP without subsequently calling the TLS TX enabling socket option. If the application tries to send data after the TLS ULP enable op but before the TLS TX enable op the BPF sk_msg verdict program is skipped. This patch resolves this by converting the ipv4 sock ops to be calculated at init time the same way ipv6 ops are done. This pulls in any changes to the sock ops structure that have been made after the socket was created including the changes from adding the socket to a sock{map|hash}. This was discovered by running OpenSSL master branch which calls the TLS ULP setsockopt early in TLS handshake but only enables the TLS TX path once the handshake has completed. As a result the datapath missed the initial handshake messages. Fixes: 02c558b2d5d6 ("bpf: sockmap, support for msg_peek in sk_msg with redirect ingress") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-19net/tls: allocate tls context using GFP_ATOMICGanesh Goudar1-1/+1
create_ctx can be called from atomic context, hence use GFP_ATOMIC instead of GFP_KERNEL. [ 395.962599] BUG: sleeping function called from invalid context at mm/slab.h:421 [ 395.979896] in_atomic(): 1, irqs_disabled(): 0, pid: 16254, name: openssl [ 395.996564] 2 locks held by openssl/16254: [ 396.010492] #0: 00000000347acb52 (sk_lock-AF_INET){+.+.}, at: do_tcp_setsockopt.isra.44+0x13b/0x9a0 [ 396.029838] #1: 000000006c9552b5 (device_spinlock){+...}, at: tls_init+0x1d/0x280 [ 396.047675] CPU: 5 PID: 16254 Comm: openssl Tainted: G O 4.20.0-rc6+ #25 [ 396.066019] Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0c 09/25/2017 [ 396.083537] Call Trace: [ 396.096265] dump_stack+0x5e/0x8b [ 396.109876] ___might_sleep+0x216/0x250 [ 396.123940] kmem_cache_alloc_trace+0x1b0/0x240 [ 396.138800] create_ctx+0x1f/0x60 [ 396.152504] tls_init+0xbd/0x280 [ 396.166135] tcp_set_ulp+0x191/0x2d0 [ 396.180035] ? tcp_set_ulp+0x2c/0x2d0 [ 396.193960] do_tcp_setsockopt.isra.44+0x148/0x9a0 [ 396.209013] __sys_setsockopt+0x7c/0xe0 [ 396.223054] __x64_sys_setsockopt+0x20/0x30 [ 396.237378] do_syscall_64+0x4a/0x180 [ 396.251200] entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: df9d4a178022 ("net/tls: sleeping function from invalid context") Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14net/tls: sleeping function from invalid contextAtul Gupta1-13/+23
HW unhash within mutex for registered tls devices cause sleep when called from tcp_set_state for TCP_CLOSE. Release lock and re-acquire after function call with ref count incr/dec. defined kref and fp release for tls_device to ensure device is not released outside lock. BUG: sleeping function called from invalid context at kernel/locking/mutex.c:748 in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/7 INFO: lockdep is turned off. CPU: 7 PID: 0 Comm: swapper/7 Tainted: G W O Call Trace: <IRQ> dump_stack+0x5e/0x8b ___might_sleep+0x222/0x260 __mutex_lock+0x5c/0xa50 ? vprintk_emit+0x1f3/0x440 ? kmem_cache_free+0x22d/0x2a0 ? tls_hw_unhash+0x2f/0x80 ? printk+0x52/0x6e ? tls_hw_unhash+0x2f/0x80 tls_hw_unhash+0x2f/0x80 tcp_set_state+0x5f/0x180 tcp_done+0x2e/0xe0 tcp_rcv_state_process+0x92c/0xdd3 ? lock_acquire+0xf5/0x1f0 ? tcp_v4_rcv+0xa7c/0xbe0 ? tcp_v4_do_rcv+0x70/0x1e0 Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14net/tls: Init routines in create_ctxAtul Gupta1-3/+3
create_ctx is called from tls_init and tls_hw_prot hence initialize function pointers in common routine. Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-20ulp: remove uid and user_visible membersDaniel Borkmann1-2/+0
They are not used anymore and therefore should be removed. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15tls: replace poll implementation with read hookJohn Fastabend1-5/+6
Instead of re-implementing poll routine use the poll callback to trigger read from kTLS, we reuse the stream_memory_read callback which is simpler and achieves the same. This helps to align sockmap and kTLS so we can more easily embed BPF in kTLS. Joint work with Daniel. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-09-24net/tls: Fixed race condition in async encryptionVakul Garg1-2/+2
On processors with multi-engine crypto accelerators, it is possible that multiple records get encrypted in parallel and their encryption completion is notified to different cpus in multicore processor. This leads to the situation where tls_encrypt_done() starts executing in parallel on different cores. In current implementation, encrypted records are queued to tx_ready_list in tls_encrypt_done(). This requires addition to linked list 'tx_ready_list' to be protected. As tls_decrypt_done() could be executing in irq content, it is not possible to protect linked list addition operation using a lock. To fix the problem, we remove linked list addition operation from the irq context. We do tx_ready_list addition/removal operation from application context only and get rid of possible multiple access to the linked list. Before starting encryption on the record, we add it to the tail of tx_ready_list. To prevent tls_tx_records() from transmitting it, we mark the record with a new flag 'tx_ready' in 'struct tls_rec'. When record encryption gets completed, tls_encrypt_done() has to only update the 'tx_ready' flag to true & linked list add operation is not required. The changed logic brings some other side benefits. Since the records are always submitted in tls sequence number order for encryption, the tx_ready_list always remains sorted and addition of new records to it does not have to traverse the linked list. Lastly, we renamed tx_ready_list in 'struct tls_sw_context_tx' to 'tx_list'. This is because now, the some of the records at the tail are not ready to transmit. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21net/tls: Add support for async encryption of records for performanceVakul Garg1-33/+21
In current implementation, tls records are encrypted & transmitted serially. Till the time the previously submitted user data is encrypted, the implementation waits and on finish starts transmitting the record. This approach of encrypt-one record at a time is inefficient when asynchronous crypto accelerators are used. For each record, there are overheads of interrupts, driver softIRQ scheduling etc. Also the crypto accelerator sits idle most of time while an encrypted record's pages are handed over to tcp stack for transmission. This patch enables encryption of multiple records in parallel when an async capable crypto accelerator is present in system. This is achieved by allowing the user space application to send more data using sendmsg() even while previously issued data is being processed by crypto accelerator. This requires returning the control back to user space application after submitting encryption request to accelerator. This also means that zero-copy mode of encryption cannot be used with async accelerator as we must be done with user space application buffer before returning from sendmsg(). There can be multiple records in flight to/from the accelerator. Each of the record is represented by 'struct tls_rec'. This is used to store the memory pages for the record. After the records are encrypted, they are added in a linked list called tx_ready_list which contains encrypted tls records sorted as per tls sequence number. The records from tx_ready_list are transmitted using a newly introduced function called tls_tx_records(). The tx_ready_list is polled for any record ready to be transmitted in sendmsg(), sendpage() after initiating encryption of new tls records. This achieves parallel encryption and transmission of records when async accelerator is present. There could be situation when crypto accelerator completes encryption later than polling of tx_ready_list by sendmsg()/sendpage(). Therefore we need a deferred work context to be able to transmit records from tx_ready_list. The deferred work context gets scheduled if applications are not sending much data through the socket. If the applications issue sendmsg()/sendpage() in quick succession, then the scheduling of tx_work_handler gets cancelled as the tx_ready_list would be polled from application's context itself. This saves scheduling overhead of deferred work. The patch also brings some side benefit. We are able to get rid of the concept of CLOSED record. This is because the records once closed are either encrypted and then placed into tx_ready_list or if encryption fails, the socket error is set. This simplifies the kernel tls sendpath. However since tls_device.c is still using macros, accessory functions for CLOSED records have been retained. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13tls: clear key material from kernel memory when do_tls_setsockopt_conf failsSabrina Dubroca1-1/+1
Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13tls: zero the crypto information from tls_context before freeingSabrina Dubroca1-5/+15
This contains key material in crypto_send_aes_gcm_128 and crypto_recv_aes_gcm_128. Introduce union tls_crypto_context, and replace the two identical unions directly embedded in struct tls_context with it. We can then use this union to clean up the memory in the new tls_ctx_free() function. Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-22tls: possible hang when do_tcp_sendpages hits sndbuf is full caseJohn Fastabend1-2/+7
Currently, the lower protocols sk_write_space handler is not called if TLS is sending a scatterlist via tls_push_sg. However, normally tls_push_sg calls do_tcp_sendpage, which may be under memory pressure, that in turn may trigger a wait via sk_wait_event. Typically, this happens when the in-flight bytes exceed the sdnbuf size. In the normal case when enough ACKs are received sk_write_space() will be called and the sk_wait_event will be woken up allowing it to send more data and/or return to the user. But, in the TLS case because the sk_write_space() handler does not wake up the events the above send will wait until the sndtimeo is exceeded. By default this is MAX_SCHEDULE_TIMEOUT so it look like a hang to the user (especially this impatient user). To fix this pass the sk_write_space event to the lower layers sk_write_space event which in the TCP case will wake any pending events. I observed the above while integrating sockmap and ktls. It initially appeared as test_sockmap (modified to use ktls) occasionally hanging. To reliably reproduce this reduce the sndbuf size and stress the tls layer by sending many 1B sends. This results in every byte needing a header and each byte individually being sent to the crypto layer. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-16tcp, ulp: add alias for all ulp modulesDaniel Borkmann1-0/+1
Lets not turn the TCP ULP lookup into an arbitrary module loader as we only intend to load ULP modules through this mechanism, not other unrelated kernel modules: [root@bar]# cat foo.c #include <sys/types.h> #include <sys/socket.h> #include <linux/tcp.h> #include <linux/in.h> int main(void) { int sock = socket(PF_INET, SOCK_STREAM, 0); setsockopt(sock, IPPROTO_TCP, TCP_ULP, "sctp", sizeof("sctp")); return 0; } [root@bar]# gcc foo.c -O2 -Wall [root@bar]# lsmod | grep sctp [root@bar]# ./a.out [root@bar]# lsmod | grep sctp sctp 1077248 4 libcrc32c 16384 3 nf_conntrack,nf_nat,sctp [root@bar]# Fix it by adding module alias to TCP ULP modules, so probing module via request_module() will be limited to tcp-ulp-[name]. The existing modules like kTLS will load fine given tcp-ulp-tls alias, but others will fail to load: [root@bar]# lsmod | grep sctp [root@bar]# ./a.out [root@bar]# lsmod | grep sctp [root@bar]# Sockmap is not affected from this since it's either built-in or not. Fixes: 734942cc4ea6 ("tcp: ULP infrastructure") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-07-16tls: Add rx inline crypto offloadBoris Pismenny1-12/+20
This patch completes the generic infrastructure to offload TLS crypto to a network device. It enables the kernel to skip decryption and authentication of some skbs marked as decrypted by the NIC. In the fast path, all packets received are decrypted by the NIC and the performance is comparable to plain TCP. This infrastructure doesn't require a TCP offload engine. Instead, the NIC only decrypts packets that contain the expected TCP sequence number. Out-Of-Order TCP packets are provided unmodified. As a result, at the worst case a received TLS record consists of both plaintext and ciphertext packets. These partially decrypted records must be reencrypted, only to be decrypted. The notable differences between SW KTLS Rx and this offload are as follows: 1. Partial decryption - Software must handle the case of a TLS record that was only partially decrypted by HW. This can happen due to packet reordering. 2. Resynchronization - tls_read_size calls the device driver to resynchronize HW after HW lost track of TLS record framing in the TCP stream. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-28Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLLLinus Torvalds1-1/+1
The poll() changes were not well thought out, and completely unexplained. They also caused a huge performance regression, because "->poll()" was no longer a trivial file operation that just called down to the underlying file operations, but instead did at least two indirect calls. Indirect calls are sadly slow now with the Spectre mitigation, but the performance problem could at least be largely mitigated by changing the "->get_poll_head()" operation to just have a per-file-descriptor pointer to the poll head instead. That gets rid of one of the new indirections. But that doesn't fix the new complexity that is completely unwarranted for the regular case. The (undocumented) reason for the poll() changes was some alleged AIO poll race fixing, but we don't make the common case slower and more complex for some uncommon special case, so this all really needs way more explanations and most likely a fundamental redesign. [ This revert is a revert of about 30 different commits, not reverted individually because that would just be unnecessarily messy - Linus ] Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-11tls: fix NULL pointer dereference on pollDaniel Borkmann1-1/+1
While hacking on kTLS, I ran into the following panic from an unprivileged netserver / netperf TCP session: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 PGD 800000037f378067 P4D 800000037f378067 PUD 3c0e61067 PMD 0 Oops: 0010 [#1] SMP KASAN PTI CPU: 1 PID: 2289 Comm: netserver Not tainted 4.17.0+ #139 Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016 RIP: 0010: (null) Code: Bad RIP value. RSP: 0018:ffff88036abcf740 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff88036f5f6800 RCX: 1ffff1006debed26 RDX: ffff88036abcf920 RSI: ffff8803cb1a4f00 RDI: ffff8803c258c280 RBP: ffff8803c258c280 R08: ffff8803c258c280 R09: ffffed006f559d48 R10: ffff88037aacea43 R11: ffffed006f559d49 R12: ffff8803c258c280 R13: ffff8803cb1a4f20 R14: 00000000000000db R15: ffffffffc168a350 FS: 00007f7e631f4700(0000) GS:ffff8803d1c80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 00000003ccf64005 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? tls_sw_poll+0xa4/0x160 [tls] ? sock_poll+0x20a/0x680 ? do_select+0x77b/0x11a0 ? poll_schedule_timeout.constprop.12+0x130/0x130 ? pick_link+0xb00/0xb00 ? read_word_at_a_time+0x13/0x20 ? vfs_poll+0x270/0x270 ? deref_stack_reg+0xad/0xe0 ? __read_once_size_nocheck.constprop.6+0x10/0x10 [...] Debugging further, it turns out that calling into ctx->sk_poll() is invalid since sk_poll itself is NULL which was saved from the original TCP socket in order for tls_sw_poll() to invoke it. Looks like the recent conversion from poll to poll_mask callback started in 152524231023 ("net: add support for ->poll_mask in proto_ops") missed to eventually convert kTLS, too: TCP's ->poll was converted over to the ->poll_mask in commit 2c7d3dacebd4 ("net/tcp: convert to ->poll_mask") and therefore kTLS wrongly saved the ->poll old one which is now NULL. Convert kTLS over to use ->poll_mask instead. Also instead of POLLIN | POLLRDNORM use the proper EPOLLIN | EPOLLRDNORM bits as the case in tcp_poll_mask() as well that is mangled here. Fixes: 2c7d3dacebd4 ("net/tcp: convert to ->poll_mask") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Watson <davejwatson@fb.com> Tested-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-8/+6
The bpf syscall and selftests conflicts were trivial overlapping changes. The r8169 change involved moving the added mdelay from 'net' into a different function. A TLS close bug fix overlapped with the splitting of the TLS state into separate TX and RX parts. I just expanded the tests in the bug fix from "ctx->conf == X" into "ctx->tx_conf == X && ctx->rx_conf == X". Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-07net/tls: Fix connection stall on partial tls recordAndre Tomt1-0/+1
In the case of writing a partial tls record we forgot to clear the ctx->in_tcp_sendpages flag, causing some connections to stall. Fixes: c212d2c7fc47 ("net/tls: Don't recursively call push_record during tls_write_space callbacks") Signed-off-by: Andre Tomt <andre@tomt.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-07tls: fix use after free in tls_sk_proto_closeEric Dumazet1-7/+4
syzbot reported a use-after-free in tls_sk_proto_close Add a boolean value to cleanup a bit this function. BUG: KASAN: use-after-free in tls_sk_proto_close+0x8ab/0x9c0 net/tls/tls_main.c:297 Read of size 1 at addr ffff8801ae40a858 by task syz-executor363/4503 CPU: 0 PID: 4503 Comm: syz-executor363 Not tainted 4.17.0-rc3+ #34 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430 tls_sk_proto_close+0x8ab/0x9c0 net/tls/tls_main.c:297 inet_release+0x104/0x1f0 net/ipv4/af_inet.c:427 inet6_release+0x50/0x70 net/ipv6/af_inet6.c:460 sock_release+0x96/0x1b0 net/socket.c:594 sock_close+0x16/0x20 net/socket.c:1149 __fput+0x34d/0x890 fs/file_table.c:209 ____fput+0x15/0x20 fs/file_table.c:243 task_work_run+0x1e4/0x290 kernel/task_work.c:113 exit_task_work include/linux/task_work.h:22 [inline] do_exit+0x1aee/0x2730 kernel/exit.c:865 do_group_exit+0x16f/0x430 kernel/exit.c:968 get_signal+0x886/0x1960 kernel/signal.c:2469 do_signal+0x98/0x2040 arch/x86/kernel/signal.c:810 exit_to_usermode_loop+0x28a/0x310 arch/x86/entry/common.c:162 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline] syscall_return_slowpath arch/x86/entry/common.c:265 [inline] do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4457b9 RSP: 002b:00007fdf4d766da8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca RAX: fffffffffffffe00 RBX: 00000000006dac3c RCX: 00000000004457b9 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000006dac3c RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dac38 R13: 3692738801137283 R14: 6bf92c39443c4c1d R15: 0000000000000006 Allocated by task 4498: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553 kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620 kmalloc include/linux/slab.h:512 [inline] kzalloc include/linux/slab.h:701 [inline] create_ctx net/tls/tls_main.c:521 [inline] tls_init+0x1f9/0xb00 net/tls/tls_main.c:633 tcp_set_ulp+0x1bc/0x520 net/ipv4/tcp_ulp.c:153 do_tcp_setsockopt.isra.39+0x44a/0x2600 net/ipv4/tcp.c:2588 tcp_setsockopt+0xc1/0xe0 net/ipv4/tcp.c:2893 sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3039 __sys_setsockopt+0x1bd/0x390 net/socket.c:1903 __do_sys_setsockopt net/socket.c:1914 [inline] __se_sys_setsockopt net/socket.c:1911 [inline] __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1911 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 4503: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528 __cache_free mm/slab.c:3498 [inline] kfree+0xd9/0x260 mm/slab.c:3813 tls_sw_free_resources+0x2a3/0x360 net/tls/tls_sw.c:1037 tls_sk_proto_close+0x67c/0x9c0 net/tls/tls_main.c:288 inet_release+0x104/0x1f0 net/ipv4/af_inet.c:427 inet6_release+0x50/0x70 net/ipv6/af_inet6.c:460 sock_release+0x96/0x1b0 net/socket.c:594 sock_close+0x16/0x20 net/socket.c:1149 __fput+0x34d/0x890 fs/file_table.c:209 ____fput+0x15/0x20 fs/file_table.c:243 task_work_run+0x1e4/0x290 kernel/task_work.c:113 exit_task_work include/linux/task_work.h:22 [inline] do_exit+0x1aee/0x2730 kernel/exit.c:865 do_group_exit+0x16f/0x430 kernel/exit.c:968 get_signal+0x886/0x1960 kernel/signal.c:2469 do_signal+0x98/0x2040 arch/x86/kernel/signal.c:810 exit_to_usermode_loop+0x28a/0x310 arch/x86/entry/common.c:162 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline] syscall_return_slowpath arch/x86/entry/common.c:265 [inline] do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff8801ae40a800 which belongs to the cache kmalloc-256 of size 256 The buggy address is located 88 bytes inside of 256-byte region [ffff8801ae40a800, ffff8801ae40a900) The buggy address belongs to the page: page:ffffea0006b90280 count:1 mapcount:0 mapping:ffff8801ae40a080 index:0x0 flags: 0x2fffc0000000100(slab) raw: 02fffc0000000100 ffff8801ae40a080 0000000000000000 000000010000000c raw: ffffea0006bea9e0 ffffea0006bc94a0 ffff8801da8007c0 0000000000000000 page dumped because: kasan: bad access detected Fixes: dd0bed1665d6 ("tls: support for Inline tls record") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Atul Gupta <atul.gupta@chelsio.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Ilya Lesokhin <ilyal@mellanox.com> Cc: Aviad Yehezkel <aviadye@mellanox.com> Cc: Dave Watson <davejwatson@fb.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-0/+7
Overlapping changes in selftests Makefile. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-01net/tls: Don't recursively call push_record during tls_write_space callbacksDave Watson1-0/+7
It is reported that in some cases, write_space may be called in do_tcp_sendpages, such that we recursively invoke do_tcp_sendpages again: [ 660.468802] ? do_tcp_sendpages+0x8d/0x580 [ 660.468826] ? tls_push_sg+0x74/0x130 [tls] [ 660.468852] ? tls_push_record+0x24a/0x390 [tls] [ 660.468880] ? tls_write_space+0x6a/0x80 [tls] ... tls_push_sg already does a loop over all sending sg's, so ignore any tls_write_space notifications until we are done sending. We then have to call the previous write_space to wake up poll() waiters after we are done with the send loop. Reported-by: Andre Tomt <andre@tomt.net> Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-01net/tls: Add generic NIC offload infrastructureIlya Lesokhin1-3/+39
This patch adds a generic infrastructure to offload TLS crypto to a network device. It enables the kernel TLS socket to skip encryption and authentication operations on the transmit side of the data path. Leaving those computationally expensive operations to the NIC. The NIC offload infrastructure builds TLS records and pushes them to the TCP layer just like the SW KTLS implementation and using the same API. TCP segmentation is mostly unaffected. Currently the only exception is that we prevent mixed SKBs where only part of the payload requires offload. In the future we are likely to add a similar restriction following a change cipher spec record. The notable differences between SW KTLS and NIC offloaded TLS implementations are as follows: 1. The offloaded implementation builds "plaintext TLS record", those records contain plaintext instead of ciphertext and place holder bytes instead of authentication tags. 2. The offloaded implementation maintains a mapping from TCP sequence number to TLS records. Thus given a TCP SKB sent from a NIC offloaded TLS socket, we can use the tls NIC offload infrastructure to obtain enough context to encrypt the payload of the SKB. A TLS record is released when the last byte of the record is ack'ed, this is done through the new icsk_clean_acked callback. The infrastructure should be extendable to support various NIC offload implementations. However it is currently written with the implementation below in mind: The NIC assumes that packets from each offloaded stream are sent as plaintext and in-order. It keeps track of the TLS records in the TCP stream. When a packet marked for offload is transmitted, the NIC encrypts the payload in-place and puts authentication tags in the relevant place holders. The responsibility for handling out-of-order packets (i.e. TCP retransmission, qdisc drops) falls on the netdev driver. The netdev driver keeps track of the expected TCP SN from the NIC's perspective. If the next packet to transmit matches the expected TCP SN, the driver advances the expected TCP SN, and transmits the packet with TLS offload indication. If the next packet to transmit does not match the expected TCP SN. The driver calls the TLS layer to obtain the TLS record that includes the TCP of the packet for transmission. Using this TLS record, the driver posts a work entry on the transmit queue to reconstruct the NIC TLS state required for the offload of the out-of-order packet. It updates the expected TCP SN accordingly and transmits the now in-order packet. The same queue is used for packet transmission and TLS context reconstruction to avoid the need for flushing the transmit queue before issuing the context reconstruction request. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-01net/tls: Split conf to rx + txBoris Pismenny1-52/+51
In TLS inline crypto, we can have one direction in software and another in hardware. Thus, we split the TLS configuration to separate structures for receive and transmit. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31tls: support for Inline tls recordAtul Gupta1-3/+111
Facility to register Inline TLS drivers to net/tls. Setup TLS_HW_RECORD prot to listen on offload device. Cases handled - Inline TLS device exists, setup prot for TLS_HW_RECORD - Atleast one Inline TLS exists, sets TLS_HW_RECORD. - If non-inline device establish connection, move to TLS_SW_TX Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23tls: RX path for ktlsDave Watson1-10/+52
Add rx path for tls software implementation. recvmsg, splice_read, and poll implemented. An additional sockopt TLS_RX is added, with the same interface as TLS_TX. Either TLX_RX or TLX_TX may be provided separately, or together (with two different setsockopt calls with appropriate keys). Control messages are passed via CMSG in a similar way to transmit. If no cmsg buffer is passed, then only application data records will be passed to userspace, and EIO is returned for other types of alerts. EBADMSG is passed for decryption errors, and EMSGSIZE is passed for framing too big, and EBADMSG for framing too small (matching openssl semantics). EINVAL is returned for TLS versions that do not match the original setsockopt call. All are unrecoverable. strparser is used to parse TLS framing. Decryption is done directly in to userspace buffers if they are large enough to support it, otherwise sk_cow_data is called (similar to ipsec), and buffers are decrypted in place and copied. splice_read always decrypts in place, since no buffers are provided to decrypt in to. sk_poll is overridden, and only returns POLLIN if a full TLS message is received. Otherwise we wait for strparser to finish reading a full frame. Actual decryption is only done during recvmsg or splice_read calls. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23tls: Refactor variable namesDave Watson1-13/+13
Several config variables are prefixed with tx, drop the prefix since these will be used for both tx and rx. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23tls: Move cipher info to a separate structDave Watson1-4/+4
Separate tx crypto parameters to a separate cipher_context struct. The same parameters will be used for rx using the same struct. tls_advance_record_sn is modified to only take the cipher info. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27tls: Use correct sk->sk_prot for IPV6Boris Pismenny1-15/+37
The tls ulp overrides sk->prot with a new tls specific proto structs. The tls specific structs were previously based on the ipv4 specific tcp_prot sturct. As a result, attaching the tls ulp to an ipv6 tcp socket replaced some ipv6 callback with the ipv4 equivalents. This patch adds ipv6 tls proto structs and uses them when attached to ipv6 sockets. Fixes: 3c4d7559159b ('tls: kernel TLS support') Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14tls: getsockopt return record sequence numberBoris Pismenny1-0/+2
Return the TLS record sequence number in getsockopt. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14tls: reset the crypto info if copy_from_user failsBoris Pismenny1-1/+1
copy_from_user could copy some partial information, as a result TLS_CRYPTO_INFO_READY(crypto_info) could be true while crypto_info is using uninitialzed data. This patch resets crypto_info when copy_from_user fails. fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14tls: retrun the correct IV in getsockoptBoris Pismenny1-1/+2
Current code returns four bytes of salt followed by four bytes of IV. This patch returns all eight bytes of IV. fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-06net: add a UID to use for ULP socket assignmentJohn Fastabend1-0/+2
Create a UID field and enum that can be used to assign ULPs to sockets. This saves a set of string comparisons if the ULP id is known. For sockmap, which is added in the next patches, a ULP is used to hook into TCP sockets close state. In this case the ULP being added is done at map insert time and the ULP is known and done on the kernel side. In this case the named lookup is not needed. Because we don't want to expose psock internals to user space socket options a user visible flag is also added. For TLS this is set for BPF it will be cleared. Alos remove pr_notice, user gets an error code back and should check that rather than rely on logs. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-17tls: reset crypto_info when do_tls_setsockopt_tx failsSabrina Dubroca1-2/+2
The current code copies directly from userspace to ctx->crypto_send, but doesn't always reinitialize it to 0 on failure. This causes any subsequent attempt to use this setsockopt to fail because of the TLS_CRYPTO_INFO_READY check, eventhough crypto_info is not actually ready. This should result in a correctly set up socket after the 3rd call, but currently it does not: size_t s = sizeof(struct tls12_crypto_info_aes_gcm_128); struct tls12_crypto_info_aes_gcm_128 crypto_good = { .info.version = TLS_1_2_VERSION, .info.cipher_type = TLS_CIPHER_AES_GCM_128, }; struct tls12_crypto_info_aes_gcm_128 crypto_bad_type = crypto_good; crypto_bad_type.info.cipher_type = 42; setsockopt(sock, SOL_TLS, TLS_TX, &crypto_bad_type, s); setsockopt(sock, SOL_TLS, TLS_TX, &crypto_good, s - 1); setsockopt(sock, SOL_TLS, TLS_TX, &crypto_good, s); Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17tls: return -EBUSY if crypto_info is already setSabrina Dubroca1-1/+3
do_tls_setsockopt_tx returns 0 without doing anything when crypto_info is already set. Silent failure is confusing for users. Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17net/tls: Only attach to sockets in ESTABLISHED stateIlya Lesokhin1-0/+9
Calling accept on a TCP socket with a TLS ulp attached results in two sockets that share the same ulp context. The ulp context is freed while a socket is destroyed, so after one of the sockets is released, the second second will trigger a use after free when it tries to access the ulp context attached to it. We restrict the TLS ulp to sockets in ESTABLISHED state to prevent the scenario above. Fixes: 3c4d7559159b ("tls: kernel TLS support") Reported-by: syzbot+904e7cd6c5c741609228@syzkaller.appspotmail.com Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-14tls: don't override sk_write_space if tls_set_sw_offload fails.Ilya Lesokhin1-3/+2
If we fail to enable tls in the kernel we shouldn't override the sk_write_space callback Fixes: 3c4d7559159b ('tls: kernel TLS support') Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-14tls: Avoid copying crypto_info again after cipher_type check.Ilya Lesokhin1-17/+12
Avoid copying crypto_info again after cipher_type check to avoid a TOCTOU exploits. The temporary array on the stack is removed as we don't really need it Fixes: 3c4d7559159b ('tls: kernel TLS support') Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-14tls: Fix TLS ulp context leak, when TLS_TX setsockopt is not used.Ilya Lesokhin1-8/+14
Previously the TLS ulp context would leak if we attached a TLS ulp to a socket but did not use the TLS_TX setsockopt, or did use it but it failed. This patch solves the issue by overriding prot[TLS_BASE_TX].close and fixing tls_sk_proto_close to work properly when its called with ctx->tx_conf == TLS_BASE_TX. This patch also removes ctx->free_resources as we can use ctx->tx_conf to obtain the relevant information. Fixes: 3c4d7559159b ('tls: kernel TLS support') Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-14tls: Add function to update the TLS socket configurationIlya Lesokhin1-14/+32
The tx configuration is now stored in ctx->tx_conf. And sk->sk_prot is updated trough a function This will simplify things when we add rx and support for different possible tx and rx cross configurations. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-06TLS: Fix length check in do_tls_getsockopt_tx()Matthias Rosenfelder1-1/+1
copy_to_user() copies the struct the pointer is pointing to, but the length check compares against sizeof(pointer) and not sizeof(struct). On 32-bit the size is probably the same, so it might have worked accidentally. Signed-off-by: Matthias Rosenfelder <mrosenfelder.lkml@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23tls: return -EFAULT if copy_to_user() failsDan Carpenter1-4/+6
The copy_to_user() function returns the number of bytes remaining but we want to return -EFAULT here. Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>