<feed xmlns='http://www.w3.org/2005/Atom'>
<title>wireguard-linux/kernel/bpf, branch jd/unified-crypt-queue</title>
<subtitle>WireGuard for the Linux kernel</subtitle>
<id>https://git.zx2c4.com/wireguard-linux/atom/kernel/bpf?h=jd%2Funified-crypt-queue</id>
<link rel='self' href='https://git.zx2c4.com/wireguard-linux/atom/kernel/bpf?h=jd%2Funified-crypt-queue'/>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/'/>
<updated>2020-04-14T19:40:06Z</updated>
<entry>
<title>bpf: remove unneeded conversion to bool in __mark_reg_unknown</title>
<updated>2020-04-14T19:40:06Z</updated>
<author>
<name>Zou Wei</name>
<email>zou_wei@huawei.com</email>
</author>
<published>2020-04-13T11:57:56Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=89f33dcadb349eb926a92633e2c5f61466afc596'/>
<id>urn:sha1:89f33dcadb349eb926a92633e2c5f61466afc596</id>
<content type='text'>
This issue was detected using the Coccinelle software:

  kernel/bpf/verifier.c:1259:16-21: WARNING: conversion to bool not needed here

The conversion to bool is unneeded; remove it.
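
For illustration, a hedged sketch of the pattern this check flags
(variable and condition names are made up; this is not the exact hunk):

  - reg-&gt;some_bool = cond ? true : false;
  + reg-&gt;some_bool = cond;

Assigning the boolean expression directly is equivalent, so the
ternary is redundant.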

Reported-by: Hulk Robot &lt;hulkci@huawei.com&gt;
Signed-off-by: Zou Wei &lt;zou_wei@huawei.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Song Liu &lt;songliubraving@fb.com&gt;
Link: https://lore.kernel.org/bpf/1586779076-101346-1-git-send-email-zou_wei@huawei.com
</content>
</entry>
<entry>
<title>bpf: Prevent re-mmap()'ing BPF map as writable for initially r/o mapping</title>
<updated>2020-04-14T19:28:57Z</updated>
<author>
<name>Andrii Nakryiko</name>
<email>andriin@fb.com</email>
</author>
<published>2020-04-10T20:26:12Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=1f6cb19be2e231fe092f40decb71f066eba090d7'/>
<id>urn:sha1:1f6cb19be2e231fe092f40decb71f066eba090d7</id>
<content type='text'>
The VM_MAYWRITE flag set during the initial memory mapping determines
whether already mmap()'ed pages can later be remapped as writable ones
through an mprotect() call. To prevent a user application from rewriting
the contents of a BPF map that was memory-mapped as read-only and
subsequently frozen, remove the VM_MAYWRITE flag completely on an
initially read-only mapping.
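
As a hedged sketch of the idea (an .mmap handler shape, not the exact
kernel hunk):

  /* in the map's .mmap handler, for an initially read-only mapping */
  if (!(vma-&gt;vm_flags &amp; VM_WRITE))
          /* forbid a later mprotect(PROT_WRITE) upgrade */
          vma-&gt;vm_flags &amp;= ~VM_MAYWRITE;

With VM_MAYWRITE cleared, mprotect() refuses to add PROT_WRITE to the
mapping, so a frozen map's contents can't be rewritten via remapping.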

Alternatively, we could treat any memory mapping on an unfrozen map as
writable and bump writecnt instead. But there is little legitimate
reason to map a BPF map as read-only and then re-mmap() it as writable
through mprotect(), instead of just mmap()'ing it as read/write from
the very beginning.

Also, at the suggestion of Jann Horn, drop the unnecessary refcounting
in the mmap operations. We can rely on the VMA properly holding a
reference to the BPF map's file.

Fixes: fc9702273e2e ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY")
Reported-by: Jann Horn &lt;jannh@google.com&gt;
Signed-off-by: Andrii Nakryiko &lt;andriin@fb.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Jann Horn &lt;jannh@google.com&gt;
Link: https://lore.kernel.org/bpf/20200410202613.3679837-1-andriin@fb.com
</content>
</entry>
<entry>
<title>bpf: Fix a typo "inacitve" -&gt; "inactive"</title>
<updated>2020-04-06T19:54:10Z</updated>
<author>
<name>Qiujun Huang</name>
<email>hqjagain@gmail.com</email>
</author>
<published>2020-04-03T08:07:34Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=0ac16296ffc638f5163f9aeeeb1fe447268e449f'/>
<id>urn:sha1:0ac16296ffc638f5163f9aeeeb1fe447268e449f</id>
<content type='text'>
There is a typo in struct bpf_lru_list's next_inactive_rotation
description, so fix it: s/inacitve/inactive/.

Signed-off-by: Qiujun Huang &lt;hqjagain@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/1585901254-30377-1-git-send-email-hqjagain@gmail.com
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next</title>
<updated>2020-03-31T02:52:37Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2020-03-31T02:52:37Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=ed52f2c608c9451fa2bad298b2ab927416105d65'/>
<id>urn:sha1:ed52f2c608c9451fa2bad298b2ab927416105d65</id>
<content type='text'>
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: Implement bpf_prog replacement for an active bpf_cgroup_link</title>
<updated>2020-03-31T00:36:33Z</updated>
<author>
<name>Andrii Nakryiko</name>
<email>andriin@fb.com</email>
</author>
<published>2020-03-30T02:59:59Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=0c991ebc8c69d29b7fc44db17075c5aa5253e2ab'/>
<id>urn:sha1:0c991ebc8c69d29b7fc44db17075c5aa5253e2ab</id>
<content type='text'>
Add a new operation (LINK_UPDATE), which allows replacing the active
bpf_prog under a given bpf_link. Currently this is only supported for
bpf_cgroup_link, but it will be extended to other kinds of bpf_links in
follow-up patches.

For bpf_cgroup_link, the implemented functionality matches the existing
semantics for direct bpf_prog attachment (including the BPF_F_REPLACE
flag). The user can either unconditionally set a new bpf_prog regardless
of which bpf_prog is currently active under the given bpf_link, or,
optionally, specify the expected active bpf_prog. If the active bpf_prog
doesn't match the expected one, no changes are performed, the old
bpf_link stays intact and attached, and the operation returns a failure.

The cgroup_bpf_replace() operation resolves the race between
auto-detachment and bpf_prog update in the same fashion as is done for
bpf_link detachment, except that in this case the update has no way of
succeeding because the target cgroup is marked as dying, so an error is
returned.
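
As a hedged user-space sketch of the new command (field names follow
this series and may differ in the final UAPI):

  union bpf_attr attr = {};

  attr.link_update.link_fd = link_fd;
  attr.link_update.new_prog_fd = new_prog_fd;
  /* optional: only replace if old_prog_fd is the active prog */
  attr.link_update.flags = BPF_F_REPLACE;
  attr.link_update.old_prog_fd = old_prog_fd;

  err = syscall(__NR_bpf, BPF_LINK_UPDATE, &amp;attr, sizeof(attr));

If the active bpf_prog doesn't match old_prog_fd, the kernel leaves the
link untouched and the call fails with an error.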

Signed-off-by: Andrii Nakryiko &lt;andriin@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20200330030001.2312810-3-andriin@fb.com
</content>
</entry>
<entry>
<title>bpf: Implement bpf_link-based cgroup BPF program attachment</title>
<updated>2020-03-31T00:35:59Z</updated>
<author>
<name>Andrii Nakryiko</name>
<email>andriin@fb.com</email>
</author>
<published>2020-03-30T02:59:58Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=af6eea57437a830293eab56246b6025cc7d46ee7'/>
<id>urn:sha1:af6eea57437a830293eab56246b6025cc7d46ee7</id>
<content type='text'>
Implement a new sub-command to attach cgroup BPF programs and return an
FD-based bpf_link on success. Once attached to a cgroup, a bpf_link
cannot be replaced, except by an owner holding its FD. Cgroup bpf_link
supports only BPF_F_ALLOW_MULTI semantics. Both link-based and
prog-based BPF_F_ALLOW_MULTI attachments can be freely intermixed.

To prevent a bpf_cgroup_link from keeping a cgroup alive past the point
when no BPF program can be executed, implement auto-detachment of the
link. When cgroup_bpf_release() is called, all attached bpf_links are
forced to release their cgroup refcounts, but they leave the bpf_link
otherwise active and allocated, and still owning the underlying
bpf_prog. This is because user space might still have FDs open and
active, so the bpf_link, as a user-referenced object, can't be freed
yet. Once the last active FD is closed, the bpf_link is freed and the
underlying bpf_prog refcount is dropped. But the cgroup refcount won't
be touched, because the cgroup was already released.

The inherent race between bpf_cgroup_link release (from closing the
last FD) and cgroup_bpf_release() is resolved by both operations taking
cgroup_mutex. So the only additional check required is when the
bpf_cgroup_link attempts to detach itself from the cgroup: at that time
we need to check whether there is still a cgroup associated with that
link, and if not, exit with success, because the bpf_cgroup_link was
already successfully detached.
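
A hedged user-space sketch of the attachment (the attach type is just
an example; field names follow this series and may differ in the final
UAPI):

  union bpf_attr attr = {};

  attr.link_create.prog_fd = prog_fd;       /* loaded cgroup BPF prog */
  attr.link_create.target_fd = cgroup_fd;   /* FD of the cgroup dir */
  attr.link_create.attach_type = BPF_CGROUP_INET_INGRESS;

  link_fd = syscall(__NR_bpf, BPF_LINK_CREATE, &amp;attr, sizeof(attr));

The returned link_fd owns the attachment: the prog stays attached while
the FD (or the cgroup) lives, and only a holder of the FD can update it.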

Signed-off-by: Andrii Nakryiko &lt;andriin@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Roman Gushchin &lt;guro@fb.com&gt;
Link: https://lore.kernel.org/bpf/20200330030001.2312810-2-andriin@fb.com
</content>
</entry>
<entry>
<title>bpf: Verifier, refine 32bit bound in do_refine_retval_range</title>
<updated>2020-03-30T22:00:30Z</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2020-03-30T21:36:59Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=fa123ac022e425becce11f1a6c7ee4d283f75a90'/>
<id>urn:sha1:fa123ac022e425becce11f1a6c7ee4d283f75a90</id>
<content type='text'>
Further refine the return value range in do_refine_retval_range() by
noting these are int return types (we assume here that int is a 32-bit
type).

There are two reasons to pull this out of the original patch. First,
keeping it there makes the original fix impossible to backport. And
second, I've not seen this being problematic in practice, unlike the
other case.
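
A hedged sketch of the refinement (field names per this series, not
necessarily the exact hunk):

  /* the helpers return int, so bound the 32-bit range as well */
  ret_reg-&gt;smax_value = meta-&gt;msize_max_value;
  ret_reg-&gt;s32_max_value = meta-&gt;msize_max_value;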

Fixes: 849fa50662fbc ("bpf/verifier: refine retval R0 state for bpf_get_stack helper")
Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/158560421952.10843.12496354931526965046.stgit@john-Precision-5820-Tower
</content>
</entry>
<entry>
<title>bpf: Verifier, do explicit ALU32 bounds tracking</title>
<updated>2020-03-30T21:59:53Z</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2020-03-30T21:36:39Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=3f50f132d8400e129fc9eb68b5020167ef80a244'/>
<id>urn:sha1:3f50f132d8400e129fc9eb68b5020167ef80a244</id>
<content type='text'>
It is not possible for the current verifier to track ALU32 and JMP ops
correctly. This can result in the verifier aborting with errors even
though the program should be verifiable. BPF programs that hit this can
work around it by changing int variables to 64-bit types, marking
variables volatile, etc. But this is all very ugly, so it would be
better to avoid these tricks.

But the main reason to address this now is that do_refine_retval_range()
was assuming return values could not be negative. Once we fix that
assumption, code that was previously working will no longer work; see
the do_refine_retval_range() patch for details. And we don't want to
suddenly cause programs that used to work to fail.

The simplest example code snippet that illustrates the problem is likely
this,

 53: w8 = w0                    // r8 &lt;- [0, S32_MAX],
                                // w8 &lt;- [S32_MIN, X]
 54: w8 &lt;s 0                    // r8 &lt;- [0, U32_MAX]
                                // w8 &lt;- [0, X]

The expected 64-bit and 32-bit bounds after each line are shown on the
right. The issue is that without the w* bounds we are forced to use the
worst-case bound of [0, U32_MAX]. To resolve this type of case, where
jmp32 creates 32-bit bounds that diverge from the 64-bit bounds, we add
explicit 32-bit register bounds s32_{min|max}_value and
u32_{min|max}_value. Then, from the branch_taken logic that creates new
bounds, we can track 32-bit bounds explicitly.
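
A hedged sketch of the added per-register state (field names from this
message; their placement in struct bpf_reg_state is assumed):

  struct bpf_reg_state {
          ...
          s32 s32_min_value; /* minimum possible (s32)value */
          s32 s32_max_value; /* maximum possible (s32)value */
          u32 u32_min_value; /* minimum possible (u32)value */
          u32 u32_max_value; /* maximum possible (u32)value */
  };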

The next case we observed is ALU ops after the jmp32,

 53: w8 = w0                    // r8 &lt;- [0, S32_MAX],
                                // w8 &lt;- [S32_MIN, X]
 54: w8 &lt;s 0                    // r8 &lt;- [0, U32_MAX]
                                // w8 &lt;- [0, X]
 55: w8 += 1                    // r8 &lt;- [0, U32_MAX+1]
                                // w8 &lt;- [0, X+1]

In order to keep the bounds accurate at this point we also need to
track ALU32 ops. To do this we add explicit ALU32 logic for each of the
ALU ops: mov, add, sub, etc.

Finally, there is the question of how and when to merge bounds. The
cases are enumerated here:

1. MOV ALU32   - zext 32-bit -&gt; 64-bit
2. MOV ALU64   - copy 64-bit -&gt; 32-bit
3. op  ALU32   - zext 32-bit -&gt; 64-bit
4. op  ALU64   - n/a
5. jmp ALU32   - 64-bit: var32_off | upper_32_bits(var64_off)
6. jmp ALU64   - 32-bit: (&gt;&gt; (&lt;&lt; var64_off))

Details for each case:

For "MOV ALU32" BPF arch zero extends so we simply copy the bounds
from 32-bit into 64-bit ensuring we truncate var_off and 64-bit
bounds correctly. See zext_32_to_64.

For "MOV ALU64" copy all bounds including 32-bit into new register. If
the src register had 32-bit bounds the dst register will as well.

For "op ALU32" zero extend 32-bit into 64-bit the same as move,
see zext_32_to_64.

For "op ALU64" calculate both 32-bit and 64-bit bounds no merging
is done here. Except we have a special case. When RSH or ARSH is
done we can't simply ignore shifting bits from 64-bit reg into the
32-bit subreg. So currently just push bounds from 64-bit into 32-bit.
This will be correct in the sense that they will represent a valid
state of the register. However we could lose some accuracy if an
ARSH is following a jmp32 operation. We can handle this special
case in a follow up series.

For "jmp ALU32" mark 64-bit reg unknown and recalculate 64-bit bounds
from tnum by setting var_off to ((&lt;&lt;(&gt;&gt;var_off)) | var32_off). We
special case if 64-bit bounds has zero'd upper 32bits at which point
we can simply copy 32-bit bounds into 64-bit register. This catches
a common compiler trick where upper 32-bits are zeroed and then
32-bit ops are used followed by a 64-bit compare or 64-bit op on
a pointer. See __reg_combine_64_into_32().

For "jmp ALU64" cast the bounds of the 64bit to their 32-bit
counterpart. For example s32_min_value = (s32)reg-&gt;smin_value. For
tnum use only the lower 32bits via, (&gt;&gt;(&lt;&lt;var_off)). See
__reg_combine_64_into_32().
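
A hedged sketch of the "jmp ALU64" direction (case 6 above), with a
range-fit check added for soundness; not the exact kernel hunk:

  /* derive fresh 32-bit bounds from the updated 64-bit bounds */
  if (reg-&gt;smin_value &gt;= S32_MIN &amp;&amp; reg-&gt;smax_value &lt;= S32_MAX) {
          reg-&gt;s32_min_value = (s32)reg-&gt;smin_value;
          reg-&gt;s32_max_value = (s32)reg-&gt;smax_value;
  }
  /* tnum: keep only the lower 32 bits, i.e. (&gt;&gt;(&lt;&lt;var_off)) */
  reg-&gt;var_off = tnum_cast(reg-&gt;var_off, 4);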

Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/158560419880.10843.11448220440809118343.stgit@john-Precision-5820-Tower
</content>
</entry>
<entry>
<title>bpf: Verifier, do_refine_retval_range may clamp umin to 0 incorrectly</title>
<updated>2020-03-30T21:44:15Z</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2020-03-30T21:36:19Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=100605035e151c187360670c1776fb684808f145'/>
<id>urn:sha1:100605035e151c187360670c1776fb684808f145</id>
<content type='text'>
do_refine_retval_range() is called to refine the return values of
specified helpers, probe_read_str and get_stack at the moment. The
reasoning is that both take a max value as part of their input
arguments, and because the helper ensures the return value will not be
larger than this, we can set the smax value of the return register, r0.

However, the return value is a signed integer, so setting umax is
incorrect. It leads to further confusion when do_refine_retval_range()
then calls __reg_deduce_bounds(), which sees a umax value as meaning
the value is unsigned and, assuming it is unsigned, sets smin = umin,
which in this case results in 'smin = 0' and 'smax = X', where X is the
input argument from the helper call.

Here are the comments from __reg_deduce_bounds() on why this would be
safe to do:

 /* Learn sign from unsigned bounds.  Signed bounds cross the sign
  * boundary, so we must be careful.
  */
 if ((s64)reg-&gt;umax_value &gt;= 0) {
	/* Positive.  We can't learn anything from the smin, but smax
	 * is positive, hence safe.
	 */
	reg-&gt;smin_value = reg-&gt;umin_value;
	reg-&gt;smax_value = reg-&gt;umax_value = min_t(u64, reg-&gt;smax_value,
						  reg-&gt;umax_value);

But now we incorrectly have a return value of type int with the signed
bounds (0, X). Suppose the return value is negative, which is possible;
then the verifier and reality are out of sync. Among other things, this
may result in error handling code being falsely detected as dead code
and removed. For instance, the example below shows that using
bpf_probe_read_str() causes the error path to be identified as dead
code and removed.

From the 'llvm-objdump -S' dump:

 r2 = 100
 call 45
 if r0 s&lt; 0 goto +4
 r4 = *(u32 *)(r7 + 0)

But from the xlated dump:

  (b7) r2 = 100
  (85) call bpf_probe_read_compat_str#-96768
  (61) r4 = *(u32 *)(r7 +0)  &lt;-- dropped if goto

This is due to the verifier state after the call being:

 R0=inv(id=0,umax_value=100,var_off=(0x0; 0x7f))

To fix this, omit setting the umax value, because it's not safe. The
only actual bound we know is the smax. This results in the correct
bounds (SMIN, X), where X is the max length from the helper. After
this, the new verifier state after call 45 looks like the following:

R0=inv(id=0,smax_value=100)

The xlated version then no longer removes the dead code, giving the
expected result:

  (b7) r2 = 100
  (85) call bpf_probe_read_compat_str#-96768
  (c5) if r0 s&lt; 0x0 goto pc+4
  (61) r4 = *(u32 *)(r7 +0)

Note, bpf_probe_read_* calls are root only, so we won't hit this case
with non-root bpf users.
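
A hedged sketch of the fix in do_refine_retval_range() (names per the
v2 note below; not necessarily the exact hunk):

  /* the helper returns int and may be negative: only smax is known */
  ret_reg-&gt;smax_value = meta-&gt;msize_max_value;
  /* deliberately no ret_reg-&gt;umax_value here */
  __reg_deduce_bounds(ret_reg);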

v3: the comment had some documentation about the meta-set-to-null case,
which is not relevant here and confusing to include in the comment; it
has been dropped.

v2 note: In the original version we set msize_smax_value from
check_func_arg() and propagated this into the smax of the retval. The
logic was that smax is the bound we set on the retval, and because the
type in the helper is ARG_CONST_SIZE we know the reg is a positive
tnum_const(), so umax = smax. Alexei pointed out, though, that this is
a bit odd to read, because the register in check_func_arg() has a C
type of u32 and the umax bound would be the normally relevant bound
here. Pulling in extra knowledge about future checks makes reading the
code a bit tricky. Further, having signed metadata that can only ever
be positive is also a bit odd. So we dropped the msize_smax_value
metadata and made it a u64 msize_max_value to indicate it is unsigned.
Additionally, we save the bound from the umax value in
check_func_arg(), which is the same as smax due to the tnum_const() and
negative check noted above, but reads better. By my analysis nothing
functionally changes in v2, but it does get easier to read, so that is
a win.

Fixes: 849fa50662fbc ("bpf/verifier: refine retval R0 state for bpf_get_stack helper")
Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/158560417900.10843.14351995140624628941.stgit@john-Precision-5820-Tower
</content>
</entry>
<entry>
<title>bpf: btf: Fix arg verification in btf_ctx_access()</title>
<updated>2020-03-30T20:28:02Z</updated>
<author>
<name>KP Singh</name>
<email>kpsingh@google.com</email>
</author>
<published>2020-03-30T14:42:46Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=f50b49a0bfcaf53e6394a873b588bc4cca2aab78'/>
<id>urn:sha1:f50b49a0bfcaf53e6394a873b588bc4cca2aab78</id>
<content type='text'>
The bounds checking for the arguments accessed in the BPF program
breaks when the expected_attach_type is not BPF_TRACE_FEXIT,
BPF_LSM_MAC, or BPF_MODIFY_RETURN: no check is done for the default
case (programs which do not receive the return value of the attached
function in their arguments) when the index of the argument being
accessed equals the number of arguments (nr_args).

This was the result of a misplaced "else if" block introduced by
commit 6ba43b761c41 ("bpf: Attachment verification for
BPF_MODIFY_RETURN").

Fixes: 6ba43b761c41 ("bpf: Attachment verification for BPF_MODIFY_RETURN")
Reported-by: Jann Horn &lt;jannh@google.com&gt;
Signed-off-by: KP Singh &lt;kpsingh@google.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20200330144246.338-1-kpsingh@chromium.org
</content>
</entry>
</feed>
