aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/kernel (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-10-08seccomp: Make duplicate listener detection non-racyJann Horn1-7/+31
Currently, init_listener() tries to prevent adding a filter with SECCOMP_FILTER_FLAG_NEW_LISTENER if one of the existing filters already has a listener. However, this check happens without holding any lock that would prevent another thread from concurrently installing a new filter (potentially with a listener) on top of the ones we already have. Theoretically, this is also a data race: The plain load from current->seccomp.filter can race with concurrent writes to the same location. Fix it by moving the check into the region that holds the siglock to guard against concurrent TSYNC. (The "Fixes" tag points to the commit that introduced the theoretical data race; concurrent installation of another filter with TSYNC only became possible later, in commit 51891498f2da ("seccomp: allow TSYNC and USER_NOTIF together").) Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace") Reviewed-by: Tycho Andersen <tycho@tycho.pizza> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201005014401.490175-1-jannh@google.com
2020-10-08tracing: Fix synthetic print fmt check for use of __get_str()Steven Rostedt (VMware)1-1/+1
A cut and paste error had the check to use __get_str() test "is_dynamic" twice, instead of checking "is_string && is_dynamic". Link: https://lore.kernel.org/r/d34dccd5-96ba-a2d9-46ea-de8807525deb@canonical.com Reported-by: Colin Ian King <colin.king@canonical.com> Acked-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-10-08tracing: Remove a pointless assignmentSudip Mukherjee1-1/+0
The variable 'len' has been assigned a value but is not used after that. So, remove the assignement. Link: https://lkml.kernel.org/r/20200930184303.22896-1-sudipm.mukherjee@gmail.com Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-10-08ftrace: Format variable declarations of ftrace_allocate_recordsSteven Rostedt (VMware)1-1/+2
I hate when unrelated variables are declared on the same line. Split them. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-10-08ftrace: Simplify the calculation of page number for ftrace_page->recordsWei Yang1-3/+4
Based on the following two reasones, we could simplify the calculation: - If the number after roundup count is not power of 2, we would definitely have more than 1 empty page with a higher order. - get_count_order() just return current order, so one lower order could meet the requirement. The calculation could be simplified by lower one order level when pages are not power of 2. Link: https://lkml.kernel.org/r/20200831031104.23322-5-richard.weiyang@linux.alibaba.com Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-10-08ftrace: Simplify the hash calculationSteven Rostedt (VMware)1-4/+3
No need to add a check to subtract the number of bits if bits is zero after fls(). Just divide the size by two before calling it. This does give the same answer for size of 0 and 1, but that's fine. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-10-08ftrace: Use fls() to get the bits for dup_hash()Wei Yang1-2/+3
The effect here is to get the number of bits, lets use fls() to do this job. Link: https://lkml.kernel.org/r/20200831031104.23322-3-richard.weiyang@linux.alibaba.com Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-10-08tracing: Change synthetic event string format to limit printed lengthSteven Rostedt (VMware)1-1/+3
Change the format for printing synthetic field strings to limit the length of the string printed even if it's not correctly terminated. Link: https://lore.kernel.org/r/20201002210036.0200371b@oasis.local.home Link: https://lkml.kernel.org/r/b6bdb34e70d970e8026daa3503db6b8e5cdad524.1601848695.git.zanussi@kernel.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-10-08tracing: Add README information for synthetic_events fileTom Zanussi1-1/+6
Add an entry with a basic description of events/synthetic_events along with a simple example. Link: https://lkml.kernel.org/r/3c7f178cf95aaeebc01eda7d95600dd937233eb7.1601848695.git.zanussi@kernel.org Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-10-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski1-4/+4
Daniel Borkmann says: ==================== pull-request: bpf 2020-10-08 The main changes are: 1) Fix "unresolved symbol" build error under CONFIG_NET w/o CONFIG_INET due to missing tcp_timewait_sock and inet_timewait_sock BTF, from Yonghong Song. 2) Fix 32 bit sub-register bounds tracking for OR case, from Daniel Borkmann. ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-08bpf: Fix scalar32_min_max_or bounds trackingDaniel Borkmann1-4/+4
Simon reported an issue with the current scalar32_min_max_or() implementation. That is, compared to the other 32 bit subreg tracking functions, the code in scalar32_min_max_or() stands out that it's using the 64 bit registers instead of 32 bit ones. This leads to bounds tracking issues, for example: [...] 8: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm 8: (79) r1 = *(u64 *)(r0 +0) R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm 9: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm 9: (b7) r0 = 1 10: R0_w=inv1 R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm 10: (18) r2 = 0x600000002 12: R0_w=inv1 R1_w=inv(id=0) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 12: (ad) if r1 < r2 goto pc+1 R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 13: R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 13: (95) exit 14: R0_w=inv1 R1_w=inv(id=0,umax_value=25769803777,var_off=(0x0; 0x7ffffffff)) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 14: (25) if r1 > 0x0 goto pc+1 R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 15: R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 15: (95) exit 16: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=25769803777,var_off=(0x0; 0x77fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 16: (47) r1 |= 0 17: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=32212254719,var_off=(0x1; 0x700000000),s32_max_value=1,u32_max_value=1) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm [...] The bound tests on the map value force the upper unsigned bound to be 25769803777 in 64 bit (0b11000000000000000000000000000000001) and then lower one to be 1. By using OR they are truncated and thus result in the range [1,1] for the 32 bit reg tracker. This is incorrect given the only thing we know is that the value must be positive and thus 2147483647 (0b1111111111111111111111111111111) at max for the subregs. Fix it by using the {u,s}32_{min,max}_value vars instead. This also makes sense, for example, for the case where we update dst_reg->s32_{min,max}_value in the else branch we need to use the newly computed dst_reg->u32_{min,max}_value as we know that these are positive. Previously, in the else branch the 64 bit values of umin_value=1 and umax_value=32212254719 were used and latter got truncated to be 1 as upper bound there. After the fix the subreg range is now correct: [...] 8: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm 8: (79) r1 = *(u64 *)(r0 +0) R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm 9: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm 9: (b7) r0 = 1 10: R0_w=inv1 R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm 10: (18) r2 = 0x600000002 12: R0_w=inv1 R1_w=inv(id=0) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 12: (ad) if r1 < r2 goto pc+1 R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 13: R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 13: (95) exit 14: R0_w=inv1 R1_w=inv(id=0,umax_value=25769803777,var_off=(0x0; 0x7ffffffff)) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 14: (25) if r1 > 0x0 goto pc+1 R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 15: R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 15: (95) exit 16: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=25769803777,var_off=(0x0; 0x77fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 16: (47) r1 |= 0 17: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=32212254719,var_off=(0x0; 0x77fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm [...] Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking") Reported-by: Simon Scannell <scannell.smn@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
2020-10-07bpf: Fix build failure for kernel/trace/bpf_trace.c with CONFIG_NET=nYonghong Song1-0/+2
When CONFIG_NET is not defined, I hit the following build error: kernel/trace/bpf_trace.o:(.rodata+0x110): undefined reference to `bpf_prog_test_run_raw_tp' Commit 1b4d60ec162f ("bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint") added test_run support for raw_tracepoint in /kernel/trace/bpf_trace.c. But the test_run function bpf_prog_test_run_raw_tp is defined in net/bpf/test_run.c, only available with CONFIG_NET=y. Adding a CONFIG_NET guard for .test_run = bpf_prog_test_run_raw_tp; fixed the above build issue. Fixes: 1b4d60ec162f ("bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201007062933.3425899-1-yhs@fb.com
2020-10-07kernel/bpf/verifier: Fix build when NET is not enabledRandy Dunlap1-0/+4
Fix build errors in kernel/bpf/verifier.c when CONFIG_NET is not enabled. ../kernel/bpf/verifier.c:3995:13: error: ‘btf_sock_ids’ undeclared here (not in a function); did you mean ‘bpf_sock_ops’? .btf_id = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON], ../kernel/bpf/verifier.c:3995:26: error: ‘BTF_SOCK_TYPE_SOCK_COMMON’ undeclared here (not in a function); did you mean ‘PTR_TO_SOCK_COMMON’? .btf_id = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON], Fixes: 1df8f55a37bd ("bpf: Enable bpf_skc_to_* sock casting helper to networking prog type") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20201007021613.13646-1-rdunlap@infradead.org
2020-10-07cpufreq: schedutil: Simplify sugov_fast_switch()Rafael J. Wysocki1-6/+2
Drop a redundant local variable definition from sugov_fast_switch() and rearrange the code in there to avoid the redundant logical negation. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-10-07dma-direct: simplify the DMA_ATTR_NO_KERNEL_MAPPING handlingChristoph Hellwig1-61/+39
Use and entirely separate code path for the DMA_ATTR_NO_KERNEL_MAPPING path. This avoids any confusion about the ret type, and avoids lots of attr checks and helpers that can be significantly simplified now. It also ensures that common handling is applied to architetures still using the arch alloc/free hooks. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-07dma-direct: factor out a dma_direct_alloc_from_pool helperChristoph Hellwig1-19/+20
This ensures dma_direct_alloc_pages will use the right gfp mask, as well as keeping the code for that common between the two allocators. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-07dma-direct check for highmem pages in dma_direct_alloc_pagesChristoph Hellwig1-0/+11
Check for highmem pages from CMA, just like in the dma_direct_alloc path. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-06usermodehelper: reset umask to default before executing user processLinus Torvalds1-0/+9
Kernel threads intentionally do CLONE_FS in order to follow any changes that 'init' does to set up the root directory (or cwd). It is admittedly a bit odd, but it avoids the situation where 'init' does some extensive setup to initialize the system environment, and then we execute a usermode helper program, and it uses the original FS setup from boot time that may be very limited and incomplete. [ Both Al Viro and Eric Biederman point out that 'pivot_root()' will follow the root regardless, since it fixes up other users of root (see chroot_fs_refs() for details), but overmounting root and doing a chroot() would not. ] However, Vegard Nossum noticed that the CLONE_FS not only means that we follow the root and current working directories, it also means we share umask with whatever init changed it to. That wasn't intentional. Just reset umask to the original default (0022) before actually starting the usermode helper program. Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-06Merge branch 'irq/qcom-pdc-wakeup' into irq/irqchip-nextMarc Zyngier2-4/+33
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-10-06genirq/PM: Introduce IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flagMaulik Shah2-4/+33
An interrupt that is disabled/masked but set for wakeup may still need to be able to wake up the system from sleep states like "suspend to RAM". To that effect, introduce the IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flag. If the irqchip have this flag set, the irq PM code will enable/unmask the irqs that are marked for wakeup, but that are in a disabled state. On resume, such irqs will be restored back to their disabled state. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Maulik Shah <mkshah@codeaurora.org> [maz: commit message fix-up] Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/1601267524-20199-4-git-send-email-mkshah@codeaurora.org
2020-10-06dma-mapping: merge <linux/dma-noncoherent.h> into <linux/dma-map-ops.h>Christoph Hellwig3-3/+1
Move more nitty gritty DMA implementation details into the common internal header. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-06dma-mapping: move large parts of <linux/dma-direct.h> to kernel/dmaChristoph Hellwig3-2/+121
Most of the dma_direct symbols should only be used by direct.c and mapping.c, so move them to kernel/dma. In fact more of dma-direct.h should eventually move, but that will require more coordination with other subsystems. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-06dma-mapping: move dma-debug.h to kernel/dma/Christoph Hellwig3-3/+125
Most of dma-debug.h is not required by anything outside of kernel/dma. Move the four declarations needed by dma-mappin.h or dma-ops providers into dma-mapping.h and dma-map-ops.h, and move the remainder of the file to kernel/dma/debug.h. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-06dma-mapping: remove <asm/dma-contiguous.h>Christoph Hellwig1-1/+5
Just provide a weak default definition of dma_contiguous_early_fixup and let arm override it. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-06dma-mapping: merge <linux/dma-contiguous.h> into <linux/dma-map-ops.h>Christoph Hellwig5-5/+31
Merge dma-contiguous.h into dma-map-ops.h, after removing the comment describing the contiguous allocator into kernel/dma/contigous.c. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-06dma-contiguous: remove dma_contiguous_set_defaultChristoph Hellwig1-1/+1
dma_contiguous_set_default contains a trivial assignment, and has a single caller that is compiled if CONFIG_CMA_DMA is enabled. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-06dma-contiguous: remove dev_set_cma_areaChristoph Hellwig1-2/+2
dev_set_cma_area contains a trivial assignment. It has just three callers that all have a non-NULL device and depend on CONFIG_DMA_CMA, so remove the wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-06dma-mapping: split <linux/dma-mapping.h>Christoph Hellwig6-3/+6
Split out all the bits that are purely for dma_map_ops implementations and related code into a new <linux/dma-map-ops.h> header so that they don't get pulled into all the drivers. That also means the architecture specific <asm/dma-mapping.h> is not pulled in by <linux/dma-mapping.h> any more, which leads to a missing includes that were pulled in by the x86 or arm versions in a few not overly portable drivers. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-06cma: decrease CMA_ALIGNMENT lower limit to 2Paul Cercueil1-1/+1
On an embedded system with a tiny (1 MiB) CMA area for video memory, and a simple enough video pipeline, we can decrease the CMA_ALIGNMENT by a factor of 2 to avoid wasting memory, as all the allocations for video buffers will be of the exact same size (dictated by the size of the screen). Signed-off-by: Paul Cercueil <paul@crapouillou.net> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller5-12/+15
Rejecting non-native endian BTF overlapped with the addition of support for it. The rest were more simple overlapping changes, except the renesas ravb binding update, which had to follow a file move as well as a YAML conversion. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-05tracing: Add support for dynamic strings to synthetic eventsTom Zanussi4-38/+241
Currently, sythetic events only support static string fields such as: # echo 'test_latency u64 lat; char somename[32]' > /sys/kernel/debug/tracing/synthetic_events Which is fine, but wastes a lot of space in the event. It also prevents the most commonly-defined strings in the existing trace events e.g. those defined using __string(), from being passed to synthetic events via the trace() action. With this change, synthetic events with dynamic fields can be defined: # echo 'test_latency u64 lat; char somename[]' > /sys/kernel/debug/tracing/synthetic_events And the trace() action can be used to generate events using either dynamic or static strings: # echo 'hist:keys=name:lat=common_timestamp.usecs-$ts0:onmatch(sys.event).test_latency($lat,name)' > /sys/kernel/debug/tracing/events The synthetic event dynamic strings are implemented in the same way as the existing __data_loc strings and appear as such in the format file. [ <rostedt@goodmis.org>: added __set_synth_event_print_fmt() changes: I added the following to make it work with trace-cmd. Dynamic strings must have __get_str() for events in the print_fmt otherwise it can't be parsed correctly. ] Link: https://lore.kernel.org/r/cover.1601588066.git.zanussi@kernel.org Link: https://lkml.kernel.org/r/3ed35b6d0e390f5b94cb4a9ba1cc18f5982ab277.1601848695.git.zanussi@kernel.org Tested-by: Axel Rasmussen <axelrasmussen@google.com> Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-10-05tracing: Save normal string variablesTom Zanussi1-2/+32
String variables created as field variables and save variables are already handled properly by having their values copied when set. The same isn't done for normal variables, but needs to be - simply saving a pointer to a string contained in an old event isn't sufficient, since that event's data may quickly become overwritten and therefore a string pointer to it could yield garbage. This change uses the same mechanism as field variables and simply appends the new strings to the existing per-element field_var_str[] array allocated for that purpose. Link: https://lkml.kernel.org/r/1c1a03798b02e67307412a0c719d1bfb69b13007.1601848695.git.zanussi@kernel.org Fixes: 02205a6752f2 (tracing: Add support for 'field variables') Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-10-05tracing: Fix parse_synth_field() error handlingTom Zanussi1-2/+4
synth_field_size() returns either a positive size or an error (zero or a negative value). However, the existing code assumes the only error value is 0. It doesn't handle negative error codes, as it assigns directly to field->size (a size_t; unsigned), thereby interpreting the error code as a valid size instead. Do the test before assignment to field->size. [ axelrasmussen@google.com: changelog addition, first paragraph above ] Link: https://lkml.kernel.org/r/9b6946d9776b2eeb43227678158196de1c3c6e1d.1601848695.git.zanussi@kernel.org Fixes: 4b147936fa50 (tracing: Add support for 'synthetic' events) Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Axel Rasmussen <axelrasmussen@google.com> Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-10-05tracing: Change STR_VAR_MAX_LENTom Zanussi2-1/+3
32 is too small for this value, and anyway it makes more sense to use MAX_FILTER_STR_VAL, as this is also the value used for variable-length __strings. Link: https://lkml.kernel.org/r/6adfd1668ac1fd8670bd58206944a762061a5559.1601848695.git.zanussi@kernel.org Tested-by: Axel Rasmussen <axelrasmussen@google.com> Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-10-06bpf: Use raw_spin_trylock() for pcpu_freelist_push/pop in NMISong Liu2-5/+97
Recent improvements in LOCKDEP highlighted a potential A-A deadlock with pcpu_freelist in NMI: ./tools/testing/selftests/bpf/test_progs -t stacktrace_build_id_nmi [ 18.984807] ================================ [ 18.984807] WARNING: inconsistent lock state [ 18.984808] 5.9.0-rc6-01771-g1466de1330e1 #2967 Not tainted [ 18.984809] -------------------------------- [ 18.984809] inconsistent {INITIAL USE} -> {IN-NMI} usage. [ 18.984810] test_progs/1990 [HC2[2]:SC0[0]:HE0:SE1] takes: [ 18.984810] ffffe8ffffc219c0 (&head->lock){....}-{2:2}, at: __pcpu_freelist_pop+0xe3/0x180 [ 18.984813] {INITIAL USE} state was registered at: [ 18.984814] lock_acquire+0x175/0x7c0 [ 18.984814] _raw_spin_lock+0x2c/0x40 [ 18.984815] __pcpu_freelist_pop+0xe3/0x180 [ 18.984815] pcpu_freelist_pop+0x31/0x40 [ 18.984816] htab_map_alloc+0xbbf/0xf40 [ 18.984816] __do_sys_bpf+0x5aa/0x3ed0 [ 18.984817] do_syscall_64+0x2d/0x40 [ 18.984818] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 18.984818] irq event stamp: 12 [...] [ 18.984822] other info that might help us debug this: [ 18.984823] Possible unsafe locking scenario: [ 18.984823] [ 18.984824] CPU0 [ 18.984824] ---- [ 18.984824] lock(&head->lock); [ 18.984826] <Interrupt> [ 18.984826] lock(&head->lock); [ 18.984827] [ 18.984828] *** DEADLOCK *** [ 18.984828] [ 18.984829] 2 locks held by test_progs/1990: [...] [ 18.984838] <NMI> [ 18.984838] dump_stack+0x9a/0xd0 [ 18.984839] lock_acquire+0x5c9/0x7c0 [ 18.984839] ? lock_release+0x6f0/0x6f0 [ 18.984840] ? __pcpu_freelist_pop+0xe3/0x180 [ 18.984840] _raw_spin_lock+0x2c/0x40 [ 18.984841] ? __pcpu_freelist_pop+0xe3/0x180 [ 18.984841] __pcpu_freelist_pop+0xe3/0x180 [ 18.984842] pcpu_freelist_pop+0x17/0x40 [ 18.984842] ? lock_release+0x6f0/0x6f0 [ 18.984843] __bpf_get_stackid+0x534/0xaf0 [ 18.984843] bpf_prog_1fd9e30e1438d3c5_oncpu+0x73/0x350 [ 18.984844] bpf_overflow_handler+0x12f/0x3f0 This is because pcpu_freelist_head.lock is accessed in both NMI and non-NMI context. Fix this issue by using raw_spin_trylock() in NMI. Since NMI interrupts non-NMI context, when NMI context tries to lock the raw_spinlock, non-NMI context of the same CPU may already have locked a lock and is blocked from unlocking the lock. For a system with N CPUs, there could be N NMIs at the same time, and they may block N non-NMI raw_spinlocks. This is tricky for pcpu_freelist_push(), where unlike _pop(), failing _push() means leaking memory. This issue is more likely to trigger in non-SMP system. Fix this issue with an extra list, pcpu_freelist.extralist. The extralist is primarily used to take _push() when raw_spin_trylock() failed on all the per CPU lists. It should be empty most of the time. The following table summarizes the behavior of pcpu_freelist in NMI and non-NMI: non-NMI pop(): use _lock(); check per CPU lists first; if all per CPU lists are empty, check extralist; if extralist is empty, return NULL. non-NMI push(): use _lock(); only push to per CPU lists. NMI pop(): use _trylock(); check per CPU lists first; if all per CPU lists are locked or empty, check extralist; if extralist is locked or empty, return NULL. NMI push(): use _trylock(); check per CPU lists first; if all per CPU lists are locked; try push to extralist; if extralist is also locked, keep trying on per CPU lists. Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20201005165838.3735218-1-songliubraving@fb.com
2020-10-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds1-3/+3
Pull networking fixes from David Miller: 1) Make sure SKB control block is in the proper state during IPSEC ESP-in-TCP encapsulation. From Sabrina Dubroca. 2) Various kinds of attributes were not being cloned properly when we build new xfrm_state objects from existing ones. Fix from Antony Antony. 3) Make sure to keep BTF sections, from Tony Ambardar. 4) TX DMA channels need proper locking in lantiq driver, from Hauke Mehrtens. 5) Honour route MTU during forwarding, always. From Maciej Żenczykowski. 6) Fix races in kTLS which can result in crashes, from Rohit Maheshwari. 7) Skip TCP DSACKs with rediculous sequence ranges, from Priyaranjan Jha. 8) Use correct address family in xfrm state lookups, from Herbert Xu. 9) A bridge FDB flush should not clear out user managed fdb entries with the ext_learn flag set, from Nikolay Aleksandrov. 10) Fix nested locking of netdev address lists, from Taehee Yoo. 11) Fix handling of 32-bit DATA_FIN values in mptcp, from Mat Martineau. 12) Fix r8169 data corruptions on RTL8402 chips, from Heiner Kallweit. 13) Don't free command entries in mlx5 while comp handler could still be running, from Eran Ben Elisha. 14) Error flow of request_irq() in mlx5 is busted, due to an off by one we try to free and IRQ never allocated. From Maor Gottlieb. 15) Fix leak when dumping netlink policies, from Johannes Berg. 16) Sendpage cannot be performed when a page is a slab page, or the page count is < 1. Some subsystems such as nvme were doing so. Create a "sendpage_ok()" helper and use it as needed, from Coly Li. 17) Don't leak request socket when using syncookes with mptcp, from Paolo Abeni. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits) net/core: check length before updating Ethertype in skb_mpls_{push,pop} net: mvneta: fix double free of txq->buf net_sched: check error pointer in tcf_dump_walker() net: team: fix memory leak in __team_options_register net: typhoon: Fix a typo Typoon --> Typhoon net: hinic: fix DEVLINK build errors net: stmmac: Modify configuration method of EEE timers tcp: fix syn cookied MPTCP request socket leak libceph: use sendpage_ok() in ceph_tcp_sendpage() scsi: libiscsi: use sendpage_ok() in iscsi_tcp_segment_map() drbd: code cleanup by using sendpage_ok() to check page for kernel_sendpage() tcp: use sendpage_ok() to detect misused .sendpage nvme-tcp: check page by sendpage_ok() before calling kernel_sendpage() net: add WARN_ONCE in kernel_sendpage() for improper zero-copy send net: introduce helper sendpage_ok() in include/linux/net.h net: usb: pegasus: Proper error handing when setting pegasus' MAC address net: core: document two new elements of struct net_device netlink: fix policy dump leak net/mlx5e: Fix race condition on nhe->n pointer in neigh update net/mlx5e: Fix VLAN create flow ...
2020-10-05PM: hibernate: remove the bogus call to get_gendisk() in software_resume()Christoph Hellwig1-11/+0
get_gendisk grabs a reference on the disk and file operation, so this code will leak both of them while having absolutely no use for the gendisk itself. This effectively reverts commit 2df83fa4bce421f ("PM / Hibernate: Use get_gendisk to verify partition if resume_file is integer format") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-05block: add a bdget_part helperChristoph Hellwig1-7/+2
All remaining callers of bdget() outside of fs/block_dev.c want to get a reference to the struct block_device for a given struct hd_struct. Add a helper just for that and then mark bdget static. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-05printk: Use fallthrough pseudo-keywordGustavo A. R. Silva1-1/+1
Replace /* FALL THRU */ comment with the new pseudo-keyword macro fallthrough[1]. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20201002224627.GA30475@embeddedor
2020-10-05bpf, verifier: Use fallthrough pseudo-keywordGustavo A. R. Silva1-2/+2
Replace /* fallthrough */ comments with the new pseudo-keyword macro fallthrough [1]. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20201002234217.GA12280@embeddedor
2020-10-05cpufreq: Move traces and update to policy->cur to cpufreq coreViresh Kumar1-11/+1
The cpufreq core handles the updates to policy->cur and recording of cpufreq trace events for all the governors except schedutil's fast switch case. Move that as well to cpufreq core for consistency and readability. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-05fs/kernel_file_read: Add "offset" arg for partial readsKees Cook2-3/+3
To perform partial reads, callers of kernel_read_file*() must have a non-NULL file_size argument and a preallocated buffer. The new "offset" argument can then be used to seek to specific locations in the file to fill the buffer to, at most, "buf_size" per call. Where possible, the LSM hooks can report whether a full file has been read or not so that the contents can be reasoned about. Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20201002173828.2099543-14-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-05module: Call security_kernel_post_load_data()Kees Cook1-4/+10
Now that there is an API for checking loaded contents for modules loaded without a file, call into the LSM hooks. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: KP Singh <kpsingh@google.com> Acked-by: Jessica Yu <jeyu@kernel.org> Link: https://lore.kernel.org/r/20201002173828.2099543-11-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-05LSM: Introduce kernel_post_load_data() hookKees Cook2-2/+2
There are a few places in the kernel where LSMs would like to have visibility into the contents of a kernel buffer that has been loaded or read. While security_kernel_post_read_file() (which includes the buffer) exists as a pairing for security_kernel_read_file(), no such hook exists to pair with security_kernel_load_data(). Earlier proposals for just using security_kernel_post_read_file() with a NULL file argument were rejected (i.e. "file" should always be valid for the security_..._file hooks, but it appears at least one case was left in the kernel during earlier refactoring. (This will be fixed in a subsequent patch.) Since not all cases of security_kernel_load_data() can have a single contiguous buffer made available to the LSM hook (e.g. kexec image segments are separately loaded), there needs to be a way for the LSM to reason about its expectations of the hook coverage. In order to handle this, add a "contents" argument to the "kernel_load_data" hook that indicates if the newly added "kernel_post_load_data" hook will be called with the full contents once loaded. That way, LSMs requiring full contents can choose to unilaterally reject "kernel_load_data" with contents=false (which is effectively the existing hook coverage), but when contents=true they can allow it and later evaluate the "kernel_post_load_data" hook once the buffer is loaded. With this change, LSMs can gain coverage over non-file-backed data loads (e.g. init_module(2) and firmware userspace helper), which will happen in subsequent patches. Additionally prepare IMA to start processing these cases. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/r/20201002173828.2099543-9-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-05fs/kernel_read_file: Add file_size output argumentKees Cook2-3/+3
In preparation for adding partial read support, add an optional output argument to kernel_read_file*() that reports the file size so callers can reason more easily about their reading progress. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Scott Branden <scott.branden@broadcom.com> Link: https://lore.kernel.org/r/20201002173828.2099543-8-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-05fs/kernel_read_file: Remove redundant size argumentKees Cook2-11/+10
In preparation for refactoring kernel_read_file*(), remove the redundant "size" argument which is not needed: it can be included in the return code, with callers adjusted. (VFS reads already cannot be larger than INT_MAX.) Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Scott Branden <scott.branden@broadcom.com> Link: https://lore.kernel.org/r/20201002173828.2099543-6-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-05fs/kernel_read_file: Split into separate include fileScott Branden2-0/+2
Move kernel_read_file* out of linux/fs.h to its own linux/kernel_read_file.h include file. That header gets pulled in just about everywhere and doesn't really need functions not related to the general fs interface. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Scott Branden <scott.branden@broadcom.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: James Morris <jamorris@linux.microsoft.com> Link: https://lore.kernel.org/r/20200706232309.12010-2-scott.branden@broadcom.com Link: https://lore.kernel.org/r/20201002173828.2099543-4-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-05fs/kernel_read_file: Remove FIRMWARE_PREALLOC_BUFFER enumKees Cook1-1/+1
FIRMWARE_PREALLOC_BUFFER is a "how", not a "what", and confuses the LSMs that are interested in filtering between types of things. The "how" should be an internal detail made uninteresting to the LSMs. Fixes: a098ecd2fa7d ("firmware: support loading into a pre-allocated buffer") Fixes: fd90bc559bfb ("ima: based on policy verify firmware signatures (pre-allocated buffer)") Fixes: 4f0496d8ffa3 ("ima: based on policy warn about loading firmware (pre-allocated buffer)") Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Scott Branden <scott.branden@broadcom.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201002173828.2099543-2-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-05tracing: Add trace_export support for trace_markerTingwei Zhang1-0/+9
Add the support to route trace_marker buffer to other destination via trace_export. Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Tingwei Zhang <tingwei@codeaurora.org> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Link: https://lore.kernel.org/r/20201005071319.78508-5-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-05tracing: Add trace_export support for event traceTingwei Zhang1-125/+134
Only function traces can be exported to other destinations currently. This patch exports event trace as well. Move trace export related function to the beginning of file so other trace can call trace_process_export() to export. Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Tingwei Zhang <tingwei@codeaurora.org> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Link: https://lore.kernel.org/r/20201005071319.78508-4-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>