wireguard-linux/kernel/sched, branch jd/unified-crypt-queue

sched/debug: Add task uclamp values to SCHED_DEBUG procfs

2020-04-08T09:35:27Z

Requested and effective uclamp values can be a bit tricky to decipher when playing with cgroup hierarchies. Add them to a task's procfs when SCHED_DEBUG is enabled. Reviewed-by: Qais Yousef Signed-off-by: Valentin Schneider Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Link: https://lkml.kernel.org/r/20200226124543.31986-4-valentin.schneider@arm.com

sched/debug: Factor out printing formats into common macros

2020-04-08T09:35:26Z

The printing macros in debug.c keep redefining the same output format. Collect each output format in a single definition, and reuse that definition in the other macros. While at it, add a layer of parentheses and replace printf's with the newly introduced macros. Reviewed-by: Qais Yousef Signed-off-by: Valentin Schneider Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Link: https://lkml.kernel.org/r/20200226124543.31986-3-valentin.schneider@arm.com

sched/debug: Remove redundant macro define

2020-04-08T09:35:24Z

Most printing macros for procfs are defined globally in debug.c, and they are re-defined (to the exact same thing) within proc_sched_show_task(). Get rid of the duplicate defines. Reviewed-by: Qais Yousef Signed-off-by: Valentin Schneider Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Link: https://lkml.kernel.org/r/20200226124543.31986-2-valentin.schneider@arm.com

sched/core: Remove unused rq::last_load_update_tick

2020-04-08T09:35:23Z

The following commit: 5e83eafbfd3b ("sched/fair: Remove the rq->cpu_load[] update code") eliminated the last use case for rq->last_load_update_tick, so remove the field as well. Reviewed-by: Dietmar Eggemann Reviewed-by: Vincent Guittot Signed-off-by: Vincent Donnefort Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Link: https://lkml.kernel.org/r/1584710495-308969-1-git-send-email-vincent.donnefort@arm.com

workqueue: Remove the warning in wq_worker_sleeping()

2020-04-08T09:35:20Z

The kernel test robot triggered a warning with the following race: task-ctx A interrupt-ctx B worker -> process_one_work() -> work_item() -> schedule(); -> sched_submit_work() -> wq_worker_sleeping() -> ->sleeping = 1 atomic_dec_and_test(nr_running) __schedule(); *interrupt* async_page_fault() -> local_irq_enable(); -> schedule(); -> sched_submit_work() -> wq_worker_sleeping() -> if (WARN_ON(->sleeping)) return -> __schedule() -> sched_update_worker() -> wq_worker_running() -> atomic_inc(nr_running); -> ->sleeping = 0; -> sched_update_worker() -> wq_worker_running() if (!->sleeping) return In this context the warning is pointless everything is fine. An interrupt before wq_worker_sleeping() will perform the ->sleeping assignment (0 -> 1 > 0) twice. An interrupt after wq_worker_sleeping() will trigger the warning and nr_running will be decremented (by A) and incremented once (only by B, A will skip it). This is the case until the ->sleeping is zeroed again in wq_worker_running(). Remove the WARN statement because this condition may happen. Document that preemption around wq_worker_sleeping() needs to be disabled to protect ->sleeping and not just as an optimisation. Fixes: 6d25be5782e48 ("sched/core, workqueues: Distangle worker accounting from rq lock") Reported-by: kernel test robot Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Cc: Tejun Heo Link: https://lkml.kernel.org/r/20200327074308.GY11705@shao2-debian

sched/fair: Fix negative imbalance in imbalance calculation

2020-04-08T09:35:20Z

A negative imbalance value was observed after imbalance calculation, this happens when the local sched group type is group_fully_busy, and the average load of local group is greater than the selected busiest group. Fix this problem by comparing the average load of the local and busiest group before imbalance calculation formula. Suggested-by: Vincent Guittot Reviewed-by: Phil Auld Reviewed-by: Vincent Guittot Acked-by: Mel Gorman Signed-off-by: Aubrey Li Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Link: https://lkml.kernel.org/r/1585201349-70192-1-git-send-email-aubrey.li@intel.com

sched/fair: Fix race between runtime distribution and assignment

2020-04-08T09:35:19Z

Currently, there is a potential race between distribute_cfs_runtime() and assign_cfs_rq_runtime(). Race happens when cfs_b->runtime is read, distributes without holding lock and finds out there is not enough runtime to charge against after distribution. Because assign_cfs_rq_runtime() might be called during distribution, and use cfs_b->runtime at the same time. Fibtest is the tool to test this race. Assume all gcfs_rq is throttled and cfs period timer runs, slow threads might run and sleep, returning unused cfs_rq runtime and keeping min_cfs_rq_runtime in their local pool. If all this happens sufficiently quickly, cfs_b->runtime will drop a lot. If runtime distributed is large too, over-use of runtime happens. A runtime over-using by about 70 percent of quota is seen when we test fibtest on a 96-core machine. We run fibtest with 1 fast thread and 95 slow threads in test group, configure 10ms quota for this group and see the CPU usage of fibtest is 17.0%, which is far more than the expected 10%. On a smaller machine with 32 cores, we also run fibtest with 96 threads. CPU usage is more than 12%, which is also more than expected 10%. This shows that on similar workloads, this race do affect CPU bandwidth control. Solve this by holding lock inside distribute_cfs_runtime(). Fixes: c06f04c70489 ("sched: Fix potential near-infinite distribute_cfs_runtime() loop") Reviewed-by: Ben Segall Signed-off-by: Huaixin Chang Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/lkml/20200325092602.22471-1-changhuaixin@linux.alibaba.com/

sched/fair: Align rq->avg_idle and rq->avg_scan_cost

2020-04-08T09:35:18Z

sched/core.c uses update_avg() for rq->avg_idle and sched/fair.c uses an open-coded version (with the exact same decay factor) for rq->avg_scan_cost. On top of that, select_idle_cpu() expects to be able to compare these two fields. The only difference between the two is that rq->avg_scan_cost is computed using a pure division rather than a shift. Turns out it actually matters, first of all because the shifted value can be negative, and the standard has this to say about it: """ The result of E1 >> E2 is E1 right-shifted E2 bit positions. [...] If E1 has a signed type and a negative value, the resulting value is implementation-defined. """ Not only this, but (arithmetic) right shifting a negative value (using 2's complement) is *not* equivalent to dividing it by the corresponding power of 2. Let's look at a few examples: -4 -> 0xF..FC -4 >> 3 -> 0xF..FF == -1 != -4 / 8 -8 -> 0xF..F8 -8 >> 3 -> 0xF..FF == -1 == -8 / 8 -9 -> 0xF..F7 -9 >> 3 -> 0xF..FE == -2 != -9 / 8 Make update_avg() use a division, and export it to the private scheduler header to reuse it where relevant. Note that this still lets compilers use a shift here, but should prevent any unwanted surprise. The disassembly of select_idle_cpu() remains unchanged on arm64, and ttwu_do_wakeup() gains 2 instructions; the diff sort of looks like this: - sub x1, x1, x0 + subs x1, x1, x0 // set condition codes + add x0, x1, #0x7 + csel x0, x0, x1, mi // x0 = x1 < 0 ? x0 : x1 add x0, x3, x0, asr #3 which does the right thing (i.e. gives us the expected result while still using an arithmetic shift) Signed-off-by: Valentin Schneider Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Link: https://lkml.kernel.org/r/20200330090127.16294-1-valentin.schneider@arm.com

mm/vma: make vma_is_accessible() available for general use

2020-04-07T17:43:37Z

Lets move vma_is_accessible() helper to include/linux/mm.h which makes it available for general use. While here, this replaces all remaining open encodings for VMA access check with vma_is_accessible(). Signed-off-by: Anshuman Khandual Signed-off-by: Andrew Morton Acked-by: Geert Uytterhoeven Acked-by: Guo Ren Acked-by: Vlastimil Babka Cc: Guo Ren Cc: Geert Uytterhoeven Cc: Ralf Baechle Cc: Paul Burton Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Yoshinori Sato Cc: Rich Felker Cc: Dave Hansen Cc: Andy Lutomirski Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Steven Rostedt Cc: Mel Gorman Cc: Alexander Viro Cc: "Aneesh Kumar K.V" Cc: Arnaldo Carvalho de Melo Cc: Arnd Bergmann Cc: Nick Piggin Cc: Paul Mackerras Cc: Will Deacon Link: http://lkml.kernel.org/r/1582520593-30704-3-git-send-email-anshuman.khandual@arm.com Signed-off-by: Linus Torvalds

Merge tag 'smp-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

2020-03-31T01:06:39Z

Pull core SMP updates from Thomas Gleixner: "CPU (hotplug) updates: - Support for locked CSD objects in smp_call_function_single_async() which allows to simplify callsites in the scheduler core and MIPS - Treewide consolidation of CPU hotplug functions which ensures the consistency between the sysfs interface and kernel state. The low level functions cpu_up/down() are now confined to the core code and not longer accessible from random code" * tag 'smp-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits) cpu/hotplug: Ignore pm_wakeup_pending() for disable_nonboot_cpus() cpu/hotplug: Hide cpu_up/down() cpu/hotplug: Move bringup of secondary CPUs out of smp_init() torture: Replace cpu_up/down() with add/remove_cpu() firmware: psci: Replace cpu_up/down() with add/remove_cpu() xen/cpuhotplug: Replace cpu_up/down() with device_online/offline() parisc: Replace cpu_up/down() with add/remove_cpu() sparc: Replace cpu_up/down() with add/remove_cpu() powerpc: Replace cpu_up/down() with add/remove_cpu() x86/smp: Replace cpu_up/down() with add/remove_cpu() arm64: hibernate: Use bringup_hibernate_cpu() cpu/hotplug: Provide bringup_hibernate_cpu() arm64: Use reboot_cpu instead of hardconding it to 0 arm64: Don't use disable_nonboot_cpus() ARM: Use reboot_cpu instead of hardcoding it to 0 ARM: Don't use disable_nonboot_cpus() ia64: Replace cpu_down() with smp_shutdown_nonboot_cpus() cpu/hotplug: Create a new function to shutdown nonboot cpus cpu/hotplug: Add new {add,remove}_cpu() functions sched/core: Remove rq.hrtick_csd_pending ...