aboutsummaryrefslogtreecommitdiffstats
path: root/kernel (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-01-31kdb: Gid rid of implicit setting of the current task / regsDouglas Anderson3-9/+2
Some (but not all?) of the kdb backtrace paths would cause the kdb_current_task and kdb_current_regs to remain changed. As discussed in a review of a previous patch [1], this doesn't seem intuitive, so let's fix that. ...but, it turns out that there's actually no longer any reason to set the current task / current regs while backtracing anymore anyway. As of commit 2277b492582d ("kdb: Fix stack crawling on 'running' CPUs that aren't the master") if we're backtracing on a task running on a CPU we ask that CPU to do the backtrace itself. Linux can do that without anything fancy. If we're doing backtrace on a sleeping task we can also do that fine without updating globals. So this patch mostly just turns into deleting a bunch of code. [1] https://lore.kernel.org/r/20191010150735.dhrj3pbjgmjrdpwr@holly.lan Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20191109111624.4.Ibc3d982bbeb9e46872d43973ba808cd4c79537c7@changeid Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-01-31kdb: kdb_current_task shouldn't be exportedDouglas Anderson1-1/+0
The kdb_current_task variable has been declared in "kernel/debug/kdb/kdb_private.h" since 2010 when kdb was added to the mainline kernel. This is not a public header. There should be no reason that kdb_current_task should be exported and there are no in-kernel users that need it. Remove the export. Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20191109111623.3.I14b22b5eb15ca8f3812ab33e96621231304dc1f7@changeid Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-01-31kdb: kdb_current_regs should be privateDouglas Anderson1-0/+1
As of the patch ("MIPS: kdb: Remove old workaround for backtracing on other CPUs") there is no reason for kdb_current_regs to be in the public "kdb.h". Let's move it next to kdb_current_task. Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20191109111623.2.Iadbfb484e90b557cc4b5ac9890bfca732cd99d77@changeid Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-01-30cgroup: init_tasks shouldn't be linked to the root cgroupTejun Heo1-5/+8
5153faac18d2 ("cgroup: remove cgroup_enable_task_cg_lists() optimization") removed lazy initialization of css_sets so that new tasks are always lniked to its css_set. In the process, it incorrectly ended up adding init_tasks to root css_set. They show up as PID 0's in root's cgroup.procs triggering warnings in systemd and generally confusing people. Fix it by skip css_set linking for init_tasks. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: https://github.com/joanbm Link: https://github.com/systemd/systemd/issues/14682 Fixes: 5153faac18d2 ("cgroup: remove cgroup_enable_task_cg_lists() optimization") Cc: stable@vger.kernel.org # v5.5+
2020-01-30tracing: Move tracing selftests to bottom of menuSteven Rostedt (VMware)1-84/+84
Move all the tracing selftest configs to the bottom of the tracing menu. There's no reason for them to be interspersed throughout. Also, move the bootconfig menu to the top. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-30tracing: Move mmio tracer config up with the other tracersSteven Rostedt (VMware)1-13/+13
Move the config that enables the mmiotracer with the other tracers such that all the tracers are together. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-30tracing: Move tracing test module configs togetherSteven Rostedt (VMware)1-11/+11
The MMIO test module was by itself, move it to the other test modules. Also, add the text "Test module" to PREEMPTIRQ_DELAY_TEST as that create a test module as well. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-30tracing: Move all function tracing configs togetherSteven Rostedt (VMware)1-71/+71
The features that depend on the function tracer were spread out through the tracing menu, pull them together as it is easier to manage. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-30tracing: Add kprobe event command generation test moduleTom Zanussi3-0/+238
Add a test module that checks the basic functionality of the in-kernel kprobe event command generation API by creating kprobe events from a module. Link: http://lkml.kernel.org/r/97e502b204f9dba948e3fa3a4315448298218787.1580323897.git.zanussi@kernel.org Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-30tracing: Change trace_boot to use kprobe_event interfaceTom Zanussi2-27/+17
Have trace_boot_add_kprobe_event() use the kprobe_event interface. Also, rename kprobe_event_run_cmd() to kprobe_event_run_command() now that trace_boot's version is gone. Link: http://lkml.kernel.org/r/af5429d11291ab1e9a85a0ff944af3b2bcf193c7.1580323897.git.zanussi@kernel.org Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-30tracing: Add kprobe event command generation functionsTom Zanussi1-0/+161
Add functions used to generate kprobe event commands, built on top of the dynevent_cmd interface. kprobe_event_gen_cmd_start() is used to create a kprobe event command using a variable arg list, and kretprobe_event_gen_cmd_start() does the same for kretprobe event commands. kprobe_event_add_fields() can be used to add single fields one by one or as a group. Once all desired fields are added, kprobe_event_gen_cmd_end() or kretprobe_event_gen_cmd_end() respectively are used to actually execute the command and create the event. Link: http://lkml.kernel.org/r/95cc4696502bb6017f9126f306a45ad19b4cc14f.1580323897.git.zanussi@kernel.org Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-30tracing: Add synth event generation test moduleTom Zanussi3-0/+537
Add a test module that checks the basic functionality of the in-kernel synthetic event generation API by generating and tracing synthetic events from a module. Link: http://lkml.kernel.org/r/fcb4dd9eb9eefb70ab20538d3529d51642389664.1580323897.git.zanussi@kernel.org Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-30tracing: Add synth_event_trace() and related functionsTom Zanussi1-0/+463
Add an exported function named synth_event_trace(), allowing modules or other kernel code to trace synthetic events. Also added are several functions that allow the same functionality to be broken out in a piecewise fashion, which are useful in situations where tracing an event from a full array of values would be cumbersome. Those functions are synth_event_trace_start/end() and synth_event_add_(next)_val(). Link: http://lkml.kernel.org/r/7a84de5f1854acf4144b57efe835ca645afa764f.1580323897.git.zanussi@kernel.org Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-30tracing: Add synthetic event command generation functionsTom Zanussi1-4/+375
Add functions used to generate synthetic event commands, built on top of the dynevent_cmd interface. synth_event_gen_cmd_start() is used to create a synthetic event command using a variable arg list and synth_event_gen_cmd_array_start() does the same thing but using an array of field descriptors. synth_event_add_field(), synth_event_add_field_str() and synth_event_add_fields() can be used to add single fields one by one or as a group. Once all desired fields are added, synth_event_gen_cmd_end() is used to actually execute the command and create the event. synth_event_create() does everything, including creating the event, in a single call. Link: http://lkml.kernel.org/r/38fef702fad5ef208009f459552f34a94befd860.1580323897.git.zanussi@kernel.org Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-30tracing: Add dynamic event command creation interfaceTom Zanussi2-0/+273
Add an interface used to build up dynamic event creation commands, such as synthetic and kprobe events. Interfaces specific to those particular types of events and others can be built on top of this interface. Command creation is started by first using the dynevent_cmd_init() function to initialize the dynevent_cmd object. Following that, args are appended and optionally checked by the dynevent_arg_add() and dynevent_arg_pair_add() functions, which use objects representing arguments and pairs of arguments, initialized respectively by dynevent_arg_init() and dynevent_arg_pair_init(). Finally, once all args have been successfully added, the command is finalized and actually created using dynevent_create(). The code here for actually printing into the dyn_event->cmd buffer using snprintf() etc was adapted from v4 of Masami's 'tracing/boot: Add synthetic event support' patch. Link: http://lkml.kernel.org/r/1f65fa44390b6f238f6036777c3784ced1dcc6a0.1580323897.git.zanussi@kernel.org Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-30tracing: Add synth_event_delete()Tom Zanussi1-16/+41
create_or_delete_synth_event() contains code to delete a synthetic event, which would be useful on its own - specifically, it would be useful to allow event-creating modules to call it separately. Separate out the delete code from that function and create an exported function named synth_event_delete(). Link: http://lkml.kernel.org/r/050db3b06df7f0a4b8a2922da602d1d879c7c1c2.1580323897.git.zanussi@kernel.org Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-30tracing: Add trace_get/put_event_file()Tom Zanussi1-0/+85
Add a function to get an event file and prevent it from going away on module or instance removal. trace_get_event_file() will find an event file in a given instance (if instance is NULL, it assumes the top trace array) and return it, pinning the instance's trace array as well as the event's module, if applicable, so they won't go away while in use. trace_put_event_file() does the matching release. Link: http://lkml.kernel.org/r/bb31ac4bdda168d5ed3c4b5f5a4c8f633e8d9118.1580323897.git.zanussi@kernel.org Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Tom Zanussi <zanussi@kernel.org> [ Moved trace_array_put() to end of trace_put_event_file() ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-30tracing: Add trace_array_find/_get() to find instance trace arraysTom Zanussi2-10/+35
Add a new trace_array_find() function that can be used to find a trace array given the instance name, and replace existing code that does the same thing with it. Also add trace_array_find_get() which does the same but returns the trace array after upping its refcount. Also make both available for use outside of trace.c. Link: http://lkml.kernel.org/r/cb68528c975eba95bee4561ac67dd1499423b2e5.1580323897.git.zanussi@kernel.org Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-30trigger_next should increase position indexVasily Averin1-2/+3
if seq_file .next fuction does not change position index, read after some lseek can generate unexpected output. Without patch: # dd bs=30 skip=1 if=/sys/kernel/tracing/events/sched/sched_switch/trigger dd: /sys/kernel/tracing/events/sched/sched_switch/trigger: cannot skip to specified offset n traceoff snapshot stacktrace enable_event disable_event enable_hist disable_hist hist # Available triggers: # traceon traceoff snapshot stacktrace enable_event disable_event enable_hist disable_hist hist 6+1 records in 6+1 records out 206 bytes copied, 0.00027916 s, 738 kB/s Notice the printing of "# Available triggers:..." after the line. With the patch: # dd bs=30 skip=1 if=/sys/kernel/tracing/events/sched/sched_switch/trigger dd: /sys/kernel/tracing/events/sched/sched_switch/trigger: cannot skip to specified offset n traceoff snapshot stacktrace enable_event disable_event enable_hist disable_hist hist 2+1 records in 2+1 records out 88 bytes copied, 0.000526867 s, 167 kB/s It only prints the end of the file, and does not restart. Link: http://lkml.kernel.org/r/3c35ee24-dd3a-8119-9c19-552ed253388a@virtuozzo.com https://bugzilla.kernel.org/show_bug.cgi?id=206283 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-30tracing: eval_map_next() should always increase position indexVasily Averin1-3/+1
if seq_file .next fuction does not change position index, read after some lseek can generate unexpected output. Link: http://lkml.kernel.org/r/7ad85b22-1866-977c-db17-88ac438bc764@virtuozzo.com Signed-off-by: Vasily Averin <vvs@virtuozzo.com> [ This is not a bug fix, it just makes it "technically correct" which is why I applied it. NULL is only returned on an anomaly which triggers a WARN_ON ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-30ftrace: fpid_next() should increase position indexVasily Averin1-2/+3
if seq_file .next fuction does not change position index, read after some lseek can generate unexpected output. Without patch: # dd bs=4 skip=1 if=/sys/kernel/tracing/set_ftrace_pid dd: /sys/kernel/tracing/set_ftrace_pid: cannot skip to specified offset id no pid 2+1 records in 2+1 records out 10 bytes copied, 0.000213285 s, 46.9 kB/s Notice the "id" followed by "no pid". With the patch: # dd bs=4 skip=1 if=/sys/kernel/tracing/set_ftrace_pid dd: /sys/kernel/tracing/set_ftrace_pid: cannot skip to specified offset id 0+1 records in 0+1 records out 3 bytes copied, 0.000202112 s, 14.8 kB/s Notice that it only prints "id" and not the "no pid" afterward. Link: http://lkml.kernel.org/r/4f87c6ad-f114-30bb-8506-c32274ce2992@virtuozzo.com https://bugzilla.kernel.org/show_bug.cgi?id=206283 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-30tracing: Fix sched switch start/stop refcount racy updatesMathieu Desnoyers1-1/+3
Reading the sched_cmdline_ref and sched_tgid_ref initial state within tracing_start_sched_switch without holding the sched_register_mutex is racy against concurrent updates, which can lead to tracepoint probes being registered more than once (and thus trigger warnings within tracepoint.c). [ May be the fix for this bug ] Link: https://lore.kernel.org/r/000000000000ab6f84056c786b93@google.com Link: http://lkml.kernel.org/r/20190817141208.15226-1-mathieu.desnoyers@efficios.com Cc: stable@vger.kernel.org CC: Steven Rostedt (VMware) <rostedt@goodmis.org> CC: Joel Fernandes (Google) <joel@joelfernandes.org> CC: Peter Zijlstra <peterz@infradead.org> CC: Thomas Gleixner <tglx@linutronix.de> CC: Paul E. McKenney <paulmck@linux.ibm.com> Reported-by: syzbot+774fddf07b7ab29a1e55@syzkaller.appspotmail.com Fixes: d914ba37d7145 ("tracing: Add support for recording tgid of tasks") Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-29Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-2/+2
Pull mmu_notifier updates from Jason Gunthorpe: "This small series revises the names in mmu_notifier to make the code clearer and more readable" * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: mm/mmu_notifiers: Use 'interval_sub' as the variable for mmu_interval_notifier mm/mmu_notifiers: Use 'subscription' as the variable name for mmu_notifier mm/mmu_notifier: Rename struct mmu_notifier_mm to mmu_notifier_subscriptions
2020-01-29Merge tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linuxLinus Torvalds3-1/+116
Pull thread management updates from Christian Brauner: "Sargun Dhillon over the last cycle has worked on the pidfd_getfd() syscall. This syscall allows for the retrieval of file descriptors of a process based on its pidfd. A task needs to have ptrace_may_access() permissions with PTRACE_MODE_ATTACH_REALCREDS (suggested by Oleg and Andy) on the target. One of the main use-cases is in combination with seccomp's user notification feature. As a reminder, seccomp's user notification feature was made available in v5.0. It allows a task to retrieve a file descriptor for its seccomp filter. The file descriptor is usually handed of to a more privileged supervising process. The supervisor can then listen for syscall events caught by the seccomp filter of the supervisee and perform actions in lieu of the supervisee, usually emulating syscalls. pidfd_getfd() is needed to expand its uses. There are currently two major users that wait on pidfd_getfd() and one future user: - Netflix, Sargun said, is working on a service mesh where users should be able to connect to a dns-based VIP. When a user connects to e.g. 1.2.3.4:80 that runs e.g. service "foo" they will be redirected to an envoy process. This service mesh uses seccomp user notifications and pidfd to intercept all connect calls and instead of connecting them to 1.2.3.4:80 connects them to e.g. 127.0.0.1:8080. - LXD uses the seccomp notifier heavily to intercept and emulate mknod() and mount() syscalls for unprivileged containers/processes. With pidfd_getfd() more uses-cases e.g. bridging socket connections will be possible. - The patchset has also seen some interest from the browser corner. Right now, Firefox is using a SECCOMP_RET_TRAP sandbox managed by a broker process. In the future glibc will start blocking all signals during dlopen() rendering this type of sandbox impossible. Hence, in the future Firefox will switch to a seccomp-user-nofication based sandbox which also makes use of file descriptor retrieval. The thread for this can be found at https://sourceware.org/ml/libc-alpha/2019-12/msg00079.html With pidfd_getfd() it is e.g. possible to bridge socket connections for the supervisee (binding to a privileged port) and taking actions on file descriptors on behalf of the supervisee in general. Sargun's first version was using an ioctl on pidfds but various people pushed for it to be a proper syscall which he duely implemented as well over various review cycles. Selftests are of course included. I've also added instructions how to deal with merge conflicts below. There's also a small fix coming from the kernel mentee project to correctly annotate struct sighand_struct with __rcu to fix various sparse warnings. We've received a few more such fixes and even though they are mostly trivial I've decided to postpone them until after -rc1 since they came in rather late and I don't want to risk introducing build warnings. Finally, there's a new prctl() command PR_{G,S}ET_IO_FLUSHER which is needed to avoid allocation recursions triggerable by storage drivers that have userspace parts that run in the IO path (e.g. dm-multipath, iscsi, etc). These allocation recursions deadlock the device. The new prctl() allows such privileged userspace components to avoid allocation recursions by setting the PF_MEMALLOC_NOIO and PF_LESS_THROTTLE flags. The patch carries the necessary acks from the relevant maintainers and is routed here as part of prctl() thread-management." * tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim sched.h: Annotate sighand_struct with __rcu test: Add test for pidfd getfd arch: wire up pidfd_getfd syscall pid: Implement pidfd_getfd syscall vfs, fdtable: Add fget_task helper
2020-01-29Merge tag 'linux-kselftest-5.6-rc1-kunit' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftestLinus Torvalds1-1/+3
Pull Kselftest kunit updates from Shuah Khan: "This kunit update consists of: - Support for building kunit as a module from Alan Maguire - AppArmor KUnit tests for policy unpack from Mike Salvatore" * tag 'linux-kselftest-5.6-rc1-kunit' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: building kunit as a module breaks allmodconfig kunit: update documentation to describe module-based build kunit: allow kunit to be loaded as a module kunit: remove timeout dependence on sysctl_hung_task_timeout_seconds kunit: allow kunit tests to be loaded as a module kunit: hide unexported try-catch interface in try-catch-impl.h kunit: move string-stream.h to lib/kunit apparmor: add AppArmor KUnit tests for policy unpack
2020-01-29Merge tag 'y2038-drivers-for-v5.6-signed' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playgroundLinus Torvalds4-66/+23
Pull y2038 updates from Arnd Bergmann: "Core, driver and file system changes These are updates to device drivers and file systems that for some reason or another were not included in the kernel in the previous y2038 series. I've gone through all users of time_t again to make sure the kernel is in a long-term maintainable state, replacing all remaining references to time_t with safe alternatives. Some related parts of the series were picked up into the nfsd, xfs, alsa and v4l2 trees. A final set of patches in linux-mm removes the now unused time_t/timeval/timespec types and helper functions after all five branches are merged for linux-5.6, ensuring that no new users get merged. As a result, linux-5.6, or my backport of the patches to 5.4 [1], should be the first release that can serve as a base for a 32-bit system designed to run beyond year 2038, with a few remaining caveats: - All user space must be compiled with a 64-bit time_t, which will be supported in the coming musl-1.2 and glibc-2.32 releases, along with installed kernel headers from linux-5.6 or higher. - Applications that use the system call interfaces directly need to be ported to use the time64 syscalls added in linux-5.1 in place of the existing system calls. This impacts most users of futex() and seccomp() as well as programming languages that have their own runtime environment not based on libc. - Applications that use a private copy of kernel uapi header files or their contents may need to update to the linux-5.6 version, in particular for sound/asound.h, xfs/xfs_fs.h, linux/input.h, linux/elfcore.h, linux/sockios.h, linux/timex.h and linux/can/bcm.h. - A few remaining interfaces cannot be changed to pass a 64-bit time_t in a compatible way, so they must be configured to use CLOCK_MONOTONIC times or (with a y2106 problem) unsigned 32-bit timestamps. Most importantly this impacts all users of 'struct input_event'. - All y2038 problems that are present on 64-bit machines also apply to 32-bit machines. In particular this affects file systems with on-disk timestamps using signed 32-bit seconds: ext4 with ext3-style small inodes, ext2, xfs (to be fixed soon) and ufs" [1] https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=y2038-endgame * tag 'y2038-drivers-for-v5.6-signed' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (21 commits) Revert "drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC" y2038: sh: remove timeval/timespec usage from headers y2038: sparc: remove use of struct timex y2038: rename itimerval to __kernel_old_itimerval y2038: remove obsolete jiffies conversion functions nfs: fscache: use timespec64 in inode auxdata nfs: fix timstamp debug prints nfs: use time64_t internally sunrpc: convert to time64_t for expiry drm/etnaviv: avoid deprecated timespec drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC drm/msm: avoid using 'timespec' hfs/hfsplus: use 64-bit inode timestamps hostfs: pass 64-bit timestamps to/from user space packet: clarify timestamp overflow tsacct: add 64-bit btime field acct: stop using get_seconds() um: ubd: use 64-bit time_t where possible xtensa: ISS: avoid struct timeval dlm: use SO_SNDTIMEO_NEW instead of SO_SNDTIMEO_OLD ...
2020-01-29Merge tag 'printk-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printkLinus Torvalds1-2/+2
Pull printk update from Petr Mladek: "Prevent replaying log on all consoles" * tag 'printk-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: printk: fix exclusive_console replaying
2020-01-29Merge branch 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds2-7/+7
Pull openat2 support from Al Viro: "This is the openat2() series from Aleksa Sarai. I'm afraid that the rest of namei stuff will have to wait - it got zero review the last time I'd posted #work.namei, and there had been a leak in the posted series I'd caught only last weekend. I was going to repost it on Monday, but the window opened and the odds of getting any review during that... Oh, well. Anyway, openat2 part should be ready; that _did_ get sane amount of review and public testing, so here it comes" From Aleksa's description of the series: "For a very long time, extending openat(2) with new features has been incredibly frustrating. This stems from the fact that openat(2) is possibly the most famous counter-example to the mantra "don't silently accept garbage from userspace" -- it doesn't check whether unknown flags are present[1]. This means that (generally) the addition of new flags to openat(2) has been fraught with backwards-compatibility issues (O_TMPFILE has to be defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old kernels gave errors, since it's insecure to silently ignore the flag[2]). All new security-related flags therefore have a tough road to being added to openat(2). Furthermore, the need for some sort of control over VFS's path resolution (to avoid malicious paths resulting in inadvertent breakouts) has been a very long-standing desire of many userspace applications. This patchset is a revival of Al Viro's old AT_NO_JUMPS[3] patchset (which was a variant of David Drysdale's O_BENEATH patchset[4] which was a spin-off of the Capsicum project[5]) with a few additions and changes made based on the previous discussion within [6] as well as others I felt were useful. In line with the conclusions of the original discussion of AT_NO_JUMPS, the flag has been split up into separate flags. However, instead of being an openat(2) flag it is provided through a new syscall openat2(2) which provides several other improvements to the openat(2) interface (see the patch description for more details). The following new LOOKUP_* flags are added: LOOKUP_NO_XDEV: Blocks all mountpoint crossings (upwards, downwards, or through absolute links). Absolute pathnames alone in openat(2) do not trigger this. Magic-link traversal which implies a vfsmount jump is also blocked (though magic-link jumps on the same vfsmount are permitted). LOOKUP_NO_MAGICLINKS: Blocks resolution through /proc/$pid/fd-style links. This is done by blocking the usage of nd_jump_link() during resolution in a filesystem. The term "magic-links" is used to match with the only reference to these links in Documentation/, but I'm happy to change the name. It should be noted that this is different to the scope of ~LOOKUP_FOLLOW in that it applies to all path components. However, you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it will *not* fail (assuming that no parent component was a magic-link), and you will have an fd for the magic-link. In order to correctly detect magic-links, the introduction of a new LOOKUP_MAGICLINK_JUMPED state flag was required. LOOKUP_BENEATH: Disallows escapes to outside the starting dirfd's tree, using techniques such as ".." or absolute links. Absolute paths in openat(2) are also disallowed. Conceptually this flag is to ensure you "stay below" a certain point in the filesystem tree -- but this requires some additional to protect against various races that would allow escape using "..". Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because it can trivially beam you around the filesystem (breaking the protection). In future, there might be similar safety checks done as in LOOKUP_IN_ROOT, but that requires more discussion. In addition, two new flags are added that expand on the above ideas: LOOKUP_NO_SYMLINKS: Does what it says on the tin. No symlink resolution is allowed at all, including magic-links. Just as with LOOKUP_NO_MAGICLINKS this can still be used with NOFOLLOW to open an fd for the symlink as long as no parent path had a symlink component. LOOKUP_IN_ROOT: This is an extension of LOOKUP_BENEATH that, rather than blocking attempts to move past the root, forces all such movements to be scoped to the starting point. This provides chroot(2)-like protection but without the cost of a chroot(2) for each filesystem operation, as well as being safe against race attacks that chroot(2) is not. If a race is detected (as with LOOKUP_BENEATH) then an error is generated, and similar to LOOKUP_BENEATH it is not permitted to cross magic-links with LOOKUP_IN_ROOT. The primary need for this is from container runtimes, which currently need to do symlink scoping in userspace[7] when opening paths in a potentially malicious container. There is a long list of CVEs that could have bene mitigated by having RESOLVE_THIS_ROOT (such as CVE-2017-1002101, CVE-2017-1002102, CVE-2018-15664, and CVE-2019-5736, just to name a few). In order to make all of the above more usable, I'm working on libpathrs[8] which is a C-friendly library for safe path resolution. It features a userspace-emulated backend if the kernel doesn't support openat2(2). Hopefully we can get userspace to switch to using it, and thus get openat2(2) support for free once it's ready. Future work would include implementing things like RESOLVE_NO_AUTOMOUNT and possibly a RESOLVE_NO_REMOTE (to allow programs to be sure they don't hit DoSes though stale NFS handles)" * 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: Documentation: path-lookup: include new LOOKUP flags selftests: add openat2(2) selftests open: introduce openat2(2) syscall namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution namei: LOOKUP_IN_ROOT: chroot-like scoped resolution namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution namei: LOOKUP_NO_XDEV: block mountpoint crossing namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution namei: LOOKUP_NO_SYMLINKS: block symlink resolution namei: allow set_root() to produce errors namei: allow nd_jump_link() to produce errors nsfs: clean-up ns_get_path() signature to return int namei: only return -ECHILD from follow_dotdot_rcu()
2020-01-29Merge branch 'urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcuLinus Torvalds1-1/+0
Pull RCU warning removal from Paul McKenney: "A single commit that fixes an embarrassing bug discussed here: https://lore.kernel.org/lkml/20200125131425.GB16136@zn.tnic/ which apparently also affects smaller systems" [ This was sent to Ingo, but since I see the issue on the laptop I use for testing during the merge window, I'm doing the pull directly - Linus ] * 'urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: rcu: Forgive slow expedited grace periods at boot time
2020-01-29bpf: Reuse log from btf_prase_vmlinux() in btf_struct_ops_init()Martin KaFai Lau2-4/+3
Instead of using a locally defined "struct bpf_verifier_log log = {}", btf_struct_ops_init() should reuse the "log" from its calling function "btf_parse_vmlinux()". It should also resolve the frame-size too large compiler warning in some ARCH. Fixes: 27ae7997a661 ("bpf: Introduce BPF_PROG_TYPE_STRUCT_OPS") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200127175145.1154438-1-kafai@fb.com
2020-01-29tracing/boot: Move external function declarations to kernel/trace/trace.hMasami Hiramatsu2-15/+17
Move external function declarations into kernel/trace/trace.h from trace_boot.c for tracing subsystem internal use. Link: http://lkml.kernel.org/r/158029060405.12381.11944554430359702545.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-29tracing/boot: Include required headers and sort it alphabeticallyMasami Hiramatsu1-1/+8
Include some required (but currently indirectly included) headers and sort it alphabetically. Link: http://lkml.kernel.org/r/158029059514.12381.6597832266860248781.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-28tracing: Add 'hist:' to hist trigger error log error stringTom Zanussi1-1/+2
The 'hist:' prefix gets stripped from the command text during command processing, but should be added back when displaying the command during error processing. Not only because it's what should be displayed but also because not having it means the test cases fail because the caret is miscalculated by the length of the prefix string. Link: http://lkml.kernel.org/r/449df721f560042e22382f67574bcc5b4d830d3d.1561743018.git.zanussi@kernel.org Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-28tracing: Add hist trigger error messages for sort specificationTom Zanussi1-4/+17
Add error codes and messages for all the error paths leading to sort specification parsing errors. Link: http://lkml.kernel.org/r/237830dc05e583fbb53664d817a784297bf961be.1561743018.git.zanussi@kernel.org Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-28tracing: Simplify assignment parsing for hist triggersTom Zanussi1-43/+27
In the process of adding better error messages for sorting, I realized that strsep was being used incorrectly and some of the error paths I was expecting to be hit weren't and just fell through to the common invalid key error case. It also became obvious that for keyword assignments, it wasn't necessary to save the full assignment and reparse it later, and having a common empty-assignment check would also make more sense in terms of error processing. Change the code to fix these problems and simplify it for new error message changes in a subsequent patch. Link: http://lkml.kernel.org/r/1c3ef0b6655deaf345f6faee2584a0298ac2d743.1561743018.git.zanussi@kernel.org Fixes: e62347d24534 ("tracing: Add hist trigger support for user-defined sorting ('sort=' param)") Fixes: 7ef224d1d0e3 ("tracing: Add 'hist' event trigger command") Fixes: a4072fe85ba3 ("tracing: Add a clock attribute for hist triggers") Reported-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-28Merge tag 'for-linus-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/umlLinus Torvalds1-1/+1
Pull UML updates from Anton Ivanov: "I am sending this on behalf of Richard who is traveling. This contains the following changes for UML: - Fix for time travel mode - Disable CONFIG_CONSTRUCTORS again - A new command line option to have an non-raw serial line - Preparations to remove obsolete UML network drivers" * tag 'for-linus-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: Fix time-travel=inf-cpu with xor/raid6 Revert "um: Enable CONFIG_CONSTRUCTORS" um: Mark non-vector net transports as obsolete um: Add an option to make serial driver non-raw
2020-01-28Merge tag 'trace-v5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-traceLinus Torvalds1-2/+4
Pull tracing fix from Steven Rostedt: "Kprobe events added 'ustring' to distinguish reading strings from kernel space or user space. But the creating of the event format file only checks for 'string' to display string formats. 'ustring' must also be handled" * tag 'trace-v5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/kprobes: Have uname use __get_str() in print_fmt
2020-01-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds21-621/+2798
Pull networking updates from David Miller: 1) Add WireGuard 2) Add HE and TWT support to ath11k driver, from John Crispin. 3) Add ESP in TCP encapsulation support, from Sabrina Dubroca. 4) Add variable window congestion control to TIPC, from Jon Maloy. 5) Add BCM84881 PHY driver, from Russell King. 6) Start adding netlink support for ethtool operations, from Michal Kubecek. 7) Add XDP drop and TX action support to ena driver, from Sameeh Jubran. 8) Add new ipv4 route notifications so that mlxsw driver does not have to handle identical routes itself. From Ido Schimmel. 9) Add BPF dynamic program extensions, from Alexei Starovoitov. 10) Support RX and TX timestamping in igc, from Vinicius Costa Gomes. 11) Add support for macsec HW offloading, from Antoine Tenart. 12) Add initial support for MPTCP protocol, from Christoph Paasch, Matthieu Baerts, Florian Westphal, Peter Krystad, and many others. 13) Add Octeontx2 PF support, from Sunil Goutham, Geetha sowjanya, Linu Cherian, and others. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1469 commits) net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC udp: segment looped gso packets correctly netem: change mailing list qed: FW 8.42.2.0 debug features qed: rt init valid initialization changed qed: Debug feature: ilt and mdump qed: FW 8.42.2.0 Add fw overlay feature qed: FW 8.42.2.0 HSI changes qed: FW 8.42.2.0 iscsi/fcoe changes qed: Add abstraction for different hsi values per chip qed: FW 8.42.2.0 Additional ll2 type qed: Use dmae to write to widebus registers in fw_funcs qed: FW 8.42.2.0 Parser offsets modified qed: FW 8.42.2.0 Queue Manager changes qed: FW 8.42.2.0 Expose new registers and change windows qed: FW 8.42.2.0 Internal ram offsets modifications MAINTAINERS: Add entry for Marvell OcteonTX2 Physical Function driver Documentation: net: octeontx2: Add RVU HW and drivers overview octeontx2-pf: ethtool RSS config support octeontx2-pf: Add basic ethtool support ...
2020-01-28Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds1-192/+194
Pull crypto updates from Herbert Xu: "API: - Removed CRYPTO_TFM_RES flags - Extended spawn grabbing to all algorithm types - Moved hash descsize verification into API code Algorithms: - Fixed recursive pcrypt dead-lock - Added new 32 and 64-bit generic versions of poly1305 - Added cryptogams implementation of x86/poly1305 Drivers: - Added support for i.MX8M Mini in caam - Added support for i.MX8M Nano in caam - Added support for i.MX8M Plus in caam - Added support for A33 variant of SS in sun4i-ss - Added TEE support for Raven Ridge in ccp - Added in-kernel API to submit TEE commands in ccp - Added AMD-TEE driver - Added support for BCM2711 in iproc-rng200 - Added support for AES256-GCM based ciphers for chtls - Added aead support on SEC2 in hisilicon" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (244 commits) crypto: arm/chacha - fix build failured when kernel mode NEON is disabled crypto: caam - add support for i.MX8M Plus crypto: x86/poly1305 - emit does base conversion itself crypto: hisilicon - fix spelling mistake "disgest" -> "digest" crypto: chacha20poly1305 - add back missing test vectors and test chunking crypto: x86/poly1305 - fix .gitignore typo tee: fix memory allocation failure checks on drv_data and amdtee crypto: ccree - erase unneeded inline funcs crypto: ccree - make cc_pm_put_suspend() void crypto: ccree - split overloaded usage of irq field crypto: ccree - fix PM race condition crypto: ccree - fix FDE descriptor sequence crypto: ccree - cc_do_send_request() is void func crypto: ccree - fix pm wrongful error reporting crypto: ccree - turn errors to debug msgs crypto: ccree - fix AEAD decrypt auth fail crypto: ccree - fix typo in comment crypto: ccree - fix typos in error msgs crypto: atmel-{aes,sha,tdes} - Retire crypto_platform_data crypto: x86/sha - Eliminate casts on asm implementations ...
2020-01-28perf/cgroups: Install cgroup events to correct cpuctxSong Liu1-3/+4
cgroup events are always installed in the cpuctx. However, when it is not installed via IPI, list_update_cgroup_event() adds it to cpuctx of current CPU, which triggers list corruption: [] list_add double add: new=ffff888ff7cf0db0, prev=ffff888ff7ce82f0, next=ffff888ff7cf0db0. To reproduce this, we can simply run: # perf stat -e cs -a & # perf stat -e cs -G anycgroup Fix this by installing it to cpuctx that contains event->ctx, and the proper cgrp_cpuctx_list. Fixes: db0503e4f675 ("perf/core: Optimize perf_install_in_event()") Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200122195027.2112449-1-songliubraving@fb.com
2020-01-28perf/core: Fix mlock accounting in perf_mmap()Song Liu1-1/+9
Decreasing sysctl_perf_event_mlock between two consecutive perf_mmap()s of a perf ring buffer may lead to an integer underflow in locked memory accounting. This may lead to the undesired behaviors, such as failures in BPF map creation. Address this by adjusting the accounting logic to take into account the possibility that the amount of already locked memory may exceed the current limit. Fixes: c4b75479741c ("perf/core: Make the mlock accounting simple again") Suggested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Link: https://lkml.kernel.org/r/20200123181146.2238074-1-songliubraving@fb.com
2020-01-28Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds19-191/+315
Pull scheduler updates from Ingo Molnar: "These were the main changes in this cycle: - More -rt motivated separation of CONFIG_PREEMPT and CONFIG_PREEMPTION. - Add more low level scheduling topology sanity checks and warnings to filter out nonsensical topologies that break scheduling. - Extend uclamp constraints to influence wakeup CPU placement - Make the RT scheduler more aware of asymmetric topologies and CPU capacities, via uclamp metrics, if CONFIG_UCLAMP_TASK=y - Make idle CPU selection more consistent - Various fixes, smaller cleanups, updates and enhancements - please see the git log for details" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits) sched/fair: Define sched_idle_cpu() only for SMP configurations sched/topology: Assert non-NUMA topology masks don't (partially) overlap idle: fix spelling mistake "iterrupts" -> "interrupts" sched/fair: Remove redundant call to cpufreq_update_util() sched/psi: create /proc/pressure and /proc/pressure/{io|memory|cpu} only when psi enabled sched/fair: Fix sgc->{min,max}_capacity calculation for SD_OVERLAP sched/fair: calculate delta runnable load only when it's needed sched/cputime: move rq parameter in irqtime_account_process_tick stop_machine: Make stop_cpus() static sched/debug: Reset watchdog on all CPUs while processing sysrq-t sched/core: Fix size of rq::uclamp initialization sched/uclamp: Fix a bug in propagating uclamp value in new cgroups sched/fair: Load balance aggressively for SCHED_IDLE CPUs sched/fair : Improve update_sd_pick_busiest for spare capacity case watchdog: Remove soft_lockup_hrtimer_cnt and related code sched/rt: Make RT capacity-aware sched/fair: Make EAS wakeup placement consider uclamp restrictions sched/fair: Make task_fits_capacity() consider uclamp restrictions sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with() sched/uclamp: Make uclamp util helpers use and return UL values ...
2020-01-28Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds10-236/+185
Pull perf updates from Ingo Molnar: "Kernel side changes: - Ftrace is one of the last W^X violators (after this only KLP is left). These patches move it over to the generic text_poke() interface and thereby get rid of this oddity. This requires a surprising amount of surgery, by Peter Zijlstra. - x86/AMD PMUs: add support for 'Large Increment per Cycle Events' to count certain types of events that have a special, quirky hw ABI (by Kim Phillips) - kprobes fixes by Masami Hiramatsu Lots of tooling updates as well, the following subcommands were updated: annotate/report/top, c2c, clang, record, report/top TUI, sched timehist, tests; plus updates were done to the gtk ui, libperf, headers and the parser" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits) perf/x86/amd: Add support for Large Increment per Cycle Events perf/x86/amd: Constrain Large Increment per Cycle events perf/x86/intel/rapl: Add Comet Lake support tracing: Initialize ret in syscall_enter_define_fields() perf header: Use last modification time for timestamp perf c2c: Fix return type for histogram sorting comparision functions perf beauty sockaddr: Fix augmented syscall format warning perf/ui/gtk: Fix gtk2 build perf ui gtk: Add missing zalloc object perf tools: Use %define api.pure full instead of %pure-parser libperf: Setup initial evlist::all_cpus value perf report: Fix no libunwind compiled warning break s390 issue perf tools: Support --prefix/--prefix-strip perf report: Clarify in help that --children is default tools build: Fix test-clang.cpp with Clang 8+ perf clang: Fix build with Clang 9 kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic tools lib: Fix builds when glibc contains strlcpy() perf report/top: Make 'e' visible in the help and make it toggle showing callchains perf report/top: Do not offer annotation for symbols without samples ...
2020-01-28Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds3-21/+19
Pull locking updates from Ingo Molnar: "Just a handful of changes in this cycle: an ARM64 performance optimization, a comment fix and a debug output fix" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/osq: Use optimized spinning loop for arm64 locking/qspinlock: Fix inaccessible URL of MCS lock paper locking/lockdep: Fix lockdep_stats indentation problem
2020-01-28Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds17-385/+778
Pull RCU updates from Ingo Molnar: "The RCU changes in this cycle were: - Expedited grace-period updates - kfree_rcu() updates - RCU list updates - Preemptible RCU updates - Torture-test updates - Miscellaneous fixes - Documentation updates" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits) rcu: Remove unused stop-machine #include powerpc: Remove comment about read_barrier_depends() .mailmap: Add entries for old paulmck@kernel.org addresses srcu: Apply *_ONCE() to ->srcu_last_gp_end rcu: Switch force_qs_rnp() to for_each_leaf_node_cpu_mask() rcu: Move rcu_{expedited,normal} definitions into rcupdate.h rcu: Move gp_state_names[] and gp_state_getname() to tree_stall.h rcu: Remove the declaration of call_rcu() in tree.h rcu: Fix tracepoint tracking RCU CPU kthread utilization rcu: Fix harmless omission of "CONFIG_" from #if condition rcu: Avoid tick_dep_set_cpu() misordering rcu: Provide wrappers for uses of ->rcu_read_lock_nesting rcu: Use READ_ONCE() for ->expmask in rcu_read_unlock_special() rcu: Clear ->rcu_read_unlock_special only once rcu: Clear .exp_hint only when deferred quiescent state has been reported rcu: Rename some instance of CONFIG_PREEMPTION to CONFIG_PREEMPT_RCU rcu: Remove kfree_call_rcu_nobatch() rcu: Remove kfree_rcu() special casing and lazy-callback handling rcu: Add support for debug_objects debugging for kfree_rcu() rcu: Add multiple in-flight batches of kfree_rcu() work ...
2020-01-28smp: Remove superfluous cond_func check in smp_call_function_many_cond()Sebastian Andrzej Siewior1-1/+1
It was requested to remove the cond_func check but the follow up patch was overlooked. Remove it now. Fixes: 67719ef25eeb ("smp: Add a smp_cond_func_t argument to smp_call_function_many()") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200127083915.434tdkztorkklpdu@linutronix.de
2020-01-28prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaimMike Christie1-0/+25
There are several storage drivers like dm-multipath, iscsi, tcmu-runner, amd nbd that have userspace components that can run in the IO path. For example, iscsi and nbd's userspace deamons may need to recreate a socket and/or send IO on it, and dm-multipath's daemon multipathd may need to send SG IO or read/write IO to figure out the state of paths and re-set them up. In the kernel these drivers have access to GFP_NOIO/GFP_NOFS and the memalloc_*_save/restore functions to control the allocation behavior, but for userspace we would end up hitting an allocation that ended up writing data back to the same device we are trying to allocate for. The device is then in a state of deadlock, because to execute IO the device needs to allocate memory, but to allocate memory the memory layers want execute IO to the device. Here is an example with nbd using a local userspace daemon that performs network IO to a remote server. We are using XFS on top of the nbd device, but it can happen with any FS or other modules layered on top of the nbd device that can write out data to free memory. Here a nbd daemon helper thread, msgr-worker-1, is performing a write/sendmsg on a socket to execute a request. This kicks off a reclaim operation which results in a WRITE to the nbd device and the nbd thread calling back into the mm layer. [ 1626.609191] msgr-worker-1 D 0 1026 1 0x00004000 [ 1626.609193] Call Trace: [ 1626.609195] ? __schedule+0x29b/0x630 [ 1626.609197] ? wait_for_completion+0xe0/0x170 [ 1626.609198] schedule+0x30/0xb0 [ 1626.609200] schedule_timeout+0x1f6/0x2f0 [ 1626.609202] ? blk_finish_plug+0x21/0x2e [ 1626.609204] ? _xfs_buf_ioapply+0x2e6/0x410 [ 1626.609206] ? wait_for_completion+0xe0/0x170 [ 1626.609208] wait_for_completion+0x108/0x170 [ 1626.609210] ? wake_up_q+0x70/0x70 [ 1626.609212] ? __xfs_buf_submit+0x12e/0x250 [ 1626.609214] ? xfs_bwrite+0x25/0x60 [ 1626.609215] xfs_buf_iowait+0x22/0xf0 [ 1626.609218] __xfs_buf_submit+0x12e/0x250 [ 1626.609220] xfs_bwrite+0x25/0x60 [ 1626.609222] xfs_reclaim_inode+0x2e8/0x310 [ 1626.609224] xfs_reclaim_inodes_ag+0x1b6/0x300 [ 1626.609227] xfs_reclaim_inodes_nr+0x31/0x40 [ 1626.609228] super_cache_scan+0x152/0x1a0 [ 1626.609231] do_shrink_slab+0x12c/0x2d0 [ 1626.609233] shrink_slab+0x9c/0x2a0 [ 1626.609235] shrink_node+0xd7/0x470 [ 1626.609237] do_try_to_free_pages+0xbf/0x380 [ 1626.609240] try_to_free_pages+0xd9/0x1f0 [ 1626.609245] __alloc_pages_slowpath+0x3a4/0xd30 [ 1626.609251] ? ___slab_alloc+0x238/0x560 [ 1626.609254] __alloc_pages_nodemask+0x30c/0x350 [ 1626.609259] skb_page_frag_refill+0x97/0xd0 [ 1626.609274] sk_page_frag_refill+0x1d/0x80 [ 1626.609279] tcp_sendmsg_locked+0x2bb/0xdd0 [ 1626.609304] tcp_sendmsg+0x27/0x40 [ 1626.609307] sock_sendmsg+0x54/0x60 [ 1626.609308] ___sys_sendmsg+0x29f/0x320 [ 1626.609313] ? sock_poll+0x66/0xb0 [ 1626.609318] ? ep_item_poll.isra.15+0x40/0xc0 [ 1626.609320] ? ep_send_events_proc+0xe6/0x230 [ 1626.609322] ? hrtimer_try_to_cancel+0x54/0xf0 [ 1626.609324] ? ep_read_events_proc+0xc0/0xc0 [ 1626.609326] ? _raw_write_unlock_irq+0xa/0x20 [ 1626.609327] ? ep_scan_ready_list.constprop.19+0x218/0x230 [ 1626.609329] ? __hrtimer_init+0xb0/0xb0 [ 1626.609331] ? _raw_spin_unlock_irq+0xa/0x20 [ 1626.609334] ? ep_poll+0x26c/0x4a0 [ 1626.609337] ? tcp_tsq_write.part.54+0xa0/0xa0 [ 1626.609339] ? release_sock+0x43/0x90 [ 1626.609341] ? _raw_spin_unlock_bh+0xa/0x20 [ 1626.609342] __sys_sendmsg+0x47/0x80 [ 1626.609347] do_syscall_64+0x5f/0x1c0 [ 1626.609349] ? prepare_exit_to_usermode+0x75/0xa0 [ 1626.609351] entry_SYSCALL_64_after_hwframe+0x44/0xa9 This patch adds a new prctl command that daemons can use after they have done their initial setup, and before they start to do allocations that are in the IO path. It sets the PF_MEMALLOC_NOIO and PF_LESS_THROTTLE flags so both userspace block and FS threads can use it to avoid the allocation recursion and try to prevent from being throttled while writing out data to free up memory. Signed-off-by: Mike Christie <mchristi@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Tested-by: Masato Suzuki <masato.suzuki@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Link: https://lore.kernel.org/r/20191112001900.9206-1-mchristi@redhat.com Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-01-28Merge branch 'core/kprobes' into perf/core, to pick up fixesIngo Molnar2-25/+45
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-27Merge tag 'irq-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds6-5/+87
Pull irq updates from Thomas Gleixner: "The interrupt departement provides: - A mechanism to shield isolated tasks from managed interrupts: The affinity of managed interrupts is completely controlled by the kernel and user space has no influence on them. The reason is that the automatically assigned affinity correlates to the multi-queue CPU handling of block devices. If the generated affinity mask spaws both housekeeping and isolated CPUs the interrupt could be routed to an isolated CPU which would then be disturbed by I/O submitted by a housekeeping CPU. The new mechamism ensures that as long as one housekeeping CPU is online in the assigned affinity mask the interrupt is routed to a housekeeping CPU. If there is no online housekeeping CPU in the affinity mask, then the interrupt is routed to an isolated CPU to keep the device queue intact, but unless the isolated CPU submits I/O by itself these interrupts are not raised. - A small addon to the device tree irqdomain core code to avoid duplication in irq chip drivers - Conversion of the SiFive PLIC to hierarchical domains - The usual pile of new irq chip drivers: SiFive GPIO, Aspeed SCI, NXP INTMUX, Meson A1 GPIO - The first cut of support for the new ARM GICv4.1 - The usual pile of fixes and improvements in core and driver code" * tag 'irq-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) genirq, sched/isolation: Isolate from handling managed interrupts irqchip/gic-v4.1: Allow direct invalidation of VLPIs irqchip/gic-v4.1: Suppress per-VLPI doorbell irqchip/gic-v4.1: Add VPE INVALL callback irqchip/gic-v4.1: Add VPE eviction callback irqchip/gic-v4.1: Add VPE residency callback irqchip/gic-v4.1: Add mask/unmask doorbell callbacks irqchip/gic-v4.1: Plumb skeletal VPE irqchip irqchip/gic-v4.1: Implement the v4.1 flavour of VMOVP irqchip/gic-v4.1: Don't use the VPE proxy if RVPEID is set irqchip/gic-v4.1: Implement the v4.1 flavour of VMAPP irqchip/gic-v4.1: VPE table (aka GICR_VPROPBASER) allocation irqchip/gic-v3: Add GICv4.1 VPEID size discovery irqchip/gic-v3: Detect GICv4.1 supporting RVPEID irqchip/gic-v3-its: Fix get_vlpi_map() breakage with doorbells irqdomain: Fix a memory leak in irq_domain_push_irq() irqchip: Add NXP INTMUX interrupt multiplexer support dt-bindings: interrupt-controller: Add binding for NXP INTMUX interrupt multiplexer irqchip: Define EXYNOS_IRQ_COMBINER irqchip/meson-gpio: Add support for meson a1 SoCs ...
2020-01-27Merge tag 'smp-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-63/+48
Pull core SMP updates from Thomas Gleixner: "A small set of SMP core code changes: - Rework the smp function call core code to avoid the allocation of an additional cpumask - Remove the not longer required GFP argument from on_each_cpu_cond() and on_each_cpu_cond_mask() and fixup the callers" * tag 'smp-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smp: Remove allocation mask from on_each_cpu_cond.*() smp: Add a smp_cond_func_t argument to smp_call_function_many() smp: Use smp_cond_func_t as type for the conditional function