<feed xmlns='http://www.w3.org/2005/Atom'>
<title>wireguard-linux/kernel/sched/fair.c, branch jd/unified-crypt-queue</title>
<subtitle>WireGuard for the Linux kernel</subtitle>
<id>https://git.zx2c4.com/wireguard-linux/atom/kernel/sched/fair.c?h=jd%2Funified-crypt-queue</id>
<link rel='self' href='https://git.zx2c4.com/wireguard-linux/atom/kernel/sched/fair.c?h=jd%2Funified-crypt-queue'/>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/'/>
<updated>2020-04-08T09:35:20Z</updated>
<entry>
<title>sched/fair: Fix negative imbalance in imbalance calculation</title>
<updated>2020-04-08T09:35:20Z</updated>
<author>
<name>Aubrey Li</name>
<email>aubrey.li@intel.com</email>
</author>
<published>2020-03-26T05:42:29Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=111688ca1c4a43a7e482f5401f82c46326b8ed49'/>
<id>urn:sha1:111688ca1c4a43a7e482f5401f82c46326b8ed49</id>
<content type='text'>
A negative imbalance value was observed after the imbalance calculation;
this happens when the local sched group type is group_fully_busy and the
average load of the local group is greater than that of the selected
busiest group. Fix this problem by comparing the average loads of the
local and busiest groups before applying the imbalance calculation
formula.
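
The fix amounts to a guard of this shape early in the imbalance
calculation (a sketch of the described comparison, not the literal diff;
the field names follow my reading of calculate_imbalance()):

	/*
	 * If the local group is already more loaded than the selected
	 * busiest group, don't try to pull any load: report no
	 * imbalance rather than letting the formula go negative.
	 */
	if (local-&gt;avg_load &gt;= busiest-&gt;avg_load) {
		env-&gt;imbalance = 0;
		return;
	}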

Suggested-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Reviewed-by: Phil Auld &lt;pauld@redhat.com&gt;
Reviewed-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Aubrey Li &lt;aubrey.li@linux.intel.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lkml.kernel.org/r/1585201349-70192-1-git-send-email-aubrey.li@intel.com
</content>
</entry>
<entry>
<title>sched/fair: Fix race between runtime distribution and assignment</title>
<updated>2020-04-08T09:35:19Z</updated>
<author>
<name>Huaixin Chang</name>
<email>changhuaixin@linux.alibaba.com</email>
</author>
<published>2020-03-27T03:26:25Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=26a8b12747c975b33b4a82d62e4a307e1c07f31b'/>
<id>urn:sha1:26a8b12747c975b33b4a82d62e4a307e1c07f31b</id>
<content type='text'>
Currently, there is a potential race between distribute_cfs_runtime()
and assign_cfs_rq_runtime(). The race happens when cfs_b-&gt;runtime is
read and distributed without holding the lock, and there turns out not
to be enough runtime to charge against after the distribution, because
assign_cfs_rq_runtime() might be called during the distribution and use
cfs_b-&gt;runtime at the same time.

Fibtest is the tool used to reproduce this race. Assume all gcfs_rqs
are throttled and the cfs period timer fires: slow threads might run and
sleep, returning unused cfs_rq runtime while keeping min_cfs_rq_runtime
in their local pool. If all this happens sufficiently quickly,
cfs_b-&gt;runtime will drop a lot. If the runtime distributed is large
too, runtime is over-used.

Runtime over-use of about 70 percent of quota is seen when we test
fibtest on a 96-core machine. We run fibtest with 1 fast thread and 95
slow threads in the test group, configure a 10ms quota for this group,
and see a fibtest CPU usage of 17.0%, which is far more than the
expected 10%.

On a smaller machine with 32 cores, we also run fibtest with 96
threads. CPU usage is more than 12%, which again exceeds the expected
10%. This shows that on similar workloads this race does affect CPU
bandwidth control.

Solve this by holding the lock inside distribute_cfs_runtime().
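
In other words, each throttled cfs_rq is charged against
cfs_b-&gt;runtime while cfs_b-&gt;lock is held, instead of charging against
an unlocked snapshot; a simplified sketch of the pattern (not the
literal diff):

	raw_spin_lock(&amp;cfs_b-&gt;lock);
	/* amount needed to unthrottle this cfs_rq */
	runtime = -cfs_rq-&gt;runtime_remaining + 1;
	if (runtime &gt; cfs_b-&gt;runtime)
		runtime = cfs_b-&gt;runtime;
	cfs_b-&gt;runtime -= runtime;
	raw_spin_unlock(&amp;cfs_b-&gt;lock);

	cfs_rq-&gt;runtime_remaining += runtime;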

Fixes: c06f04c70489 ("sched: Fix potential near-infinite distribute_cfs_runtime() loop")
Reviewed-by: Ben Segall &lt;bsegall@google.com&gt;
Signed-off-by: Huaixin Chang &lt;changhuaixin@linux.alibaba.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/lkml/20200325092602.22471-1-changhuaixin@linux.alibaba.com/
</content>
</entry>
<entry>
<title>sched/fair: Align rq-&gt;avg_idle and rq-&gt;avg_scan_cost</title>
<updated>2020-04-08T09:35:18Z</updated>
<author>
<name>Valentin Schneider</name>
<email>valentin.schneider@arm.com</email>
</author>
<published>2020-03-30T09:01:27Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=d76343c6b2b79f5e89c392bc9ce9dabc4c9e90cb'/>
<id>urn:sha1:d76343c6b2b79f5e89c392bc9ce9dabc4c9e90cb</id>
<content type='text'>
sched/core.c uses update_avg() for rq-&gt;avg_idle and sched/fair.c uses an
open-coded version (with the exact same decay factor) for
rq-&gt;avg_scan_cost. On top of that, select_idle_cpu() expects to be able to
compare these two fields.

The only difference between the two is that rq-&gt;avg_scan_cost is computed
using a pure division rather than a shift. Turns out it actually matters,
first of all because the shifted value can be negative, and the standard
has this to say about it:

  """
  The result of E1 &gt;&gt; E2 is E1 right-shifted E2 bit positions. [...] If E1
  has a signed type and a negative value, the resulting value is
  implementation-defined.
  """

Not only this, but (arithmetic) right shifting a negative value (using 2's
complement) is *not* equivalent to dividing it by the corresponding power
of 2. Let's look at a few examples:

  -4      -&gt; 0xF..FC
  -4 &gt;&gt; 3 -&gt; 0xF..FF == -1 != -4 / 8

  -8      -&gt; 0xF..F8
  -8 &gt;&gt; 3 -&gt; 0xF..FF == -1 == -8 / 8

  -9      -&gt; 0xF..F7
  -9 &gt;&gt; 3 -&gt; 0xF..FE == -2 != -9 / 8

Make update_avg() use a division, and export it to the private scheduler
header to reuse it where relevant. Note that this still lets compilers use
a shift here, but should prevent any unwanted surprise. The disassembly of
select_idle_cpu() remains unchanged on arm64, and ttwu_do_wakeup() gains 2
instructions; the diff sort of looks like this:

  - sub x1, x1, x0
  + subs x1, x1, x0 // set condition codes
  + add x0, x1, #0x7
  + csel x0, x0, x1, mi // x0 = x1 &lt; 0 ? x0 : x1
    add x0, x3, x0, asr #3

which does the right thing (i.e. gives us the expected result while still
using an arithmetic shift).
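
The helper is then expected to boil down to something of this shape
(a sketch of the change described above, not necessarily the exact hunk):

	static inline void update_avg(u64 *avg, u64 sample)
	{
		s64 diff = sample - *avg;

		*avg += diff / 8;	/* a division, not &gt;&gt; 3 */
	}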

Signed-off-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lkml.kernel.org/r/20200330090127.16294-1-valentin.schneider@arm.com
</content>
</entry>
<entry>
<title>mm/vma: make vma_is_accessible() available for general use</title>
<updated>2020-04-07T17:43:37Z</updated>
<author>
<name>Anshuman Khandual</name>
<email>anshuman.khandual@arm.com</email>
</author>
<published>2020-04-07T03:03:47Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=3122e80efc0faf4a2accba7a46c7ed795edbfded'/>
<id>urn:sha1:3122e80efc0faf4a2accba7a46c7ed795edbfded</id>
<content type='text'>
Let's move the vma_is_accessible() helper to include/linux/mm.h, which
makes it available for general use.  While here, replace all remaining
open encodings of the VMA access check with vma_is_accessible().
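
For reference, the helper is essentially a flags test of this shape
(a sketch; the exact mask reflects my reading of the code rather than
the patch itself):

	static inline bool vma_is_accessible(struct vm_area_struct *vma)
	{
		return vma-&gt;vm_flags &amp; (VM_READ | VM_WRITE | VM_EXEC);
	}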

Signed-off-by: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Acked-by: Guo Ren &lt;guoren@kernel.org&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Guo Ren &lt;guoren@kernel.org&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Cc: Paul Burton &lt;paulburton@kernel.org&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@kernel.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Nick Piggin &lt;npiggin@gmail.com&gt;
Cc: Paul Mackerras &lt;paulus@ozlabs.org&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Link: http://lkml.kernel.org/r/1582520593-30704-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Fix condition of avg_load calculation</title>
<updated>2020-03-20T12:06:20Z</updated>
<author>
<name>Tao Zhou</name>
<email>ouwen210@hotmail.com</email>
</author>
<published>2020-03-19T03:39:20Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=6c8116c914b65be5e4d6f66d69c8142eb0648c22'/>
<id>urn:sha1:6c8116c914b65be5e4d6f66d69c8142eb0648c22</id>
<content type='text'>
In update_sg_wakeup_stats(), the comment says:

  Computing avg_load makes sense only when group is fully
  busy or overloaded.

But the code below this comment does not check for that.

From reading the code that uses avg_load in other functions, I confirm
that avg_load should be calculated only in the fully busy or overloaded
case. The comment is correct and the checking condition is wrong, so
change that condition.
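
The corrected condition then has roughly this shape (a paraphrased
sketch, not the literal diff):

	if (sgs-&gt;group_type == group_fully_busy ||
	    sgs-&gt;group_type == group_overloaded)
		sgs-&gt;avg_load = (sgs-&gt;group_load * SCHED_CAPACITY_SCALE) /
				sgs-&gt;group_capacity;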

Fixes: 57abff067a08 ("sched/fair: Rework find_idlest_group()")
Signed-off-by: Tao Zhou &lt;ouwen210@hotmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Link: https://lkml.kernel.org/r/Message-ID:
</content>
</entry>
<entry>
<title>sched/fair: Improve spreading of utilization</title>
<updated>2020-03-20T12:06:20Z</updated>
<author>
<name>Vincent Guittot</name>
<email>vincent.guittot@linaro.org</email>
</author>
<published>2020-03-12T16:54:29Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=c32b4308295aaaaedd5beae56cb42e205ae63e58'/>
<id>urn:sha1:c32b4308295aaaaedd5beae56cb42e205ae63e58</id>
<content type='text'>
During load balancing, a group with spare capacity will try to pull some
utilization from an overloaded group. In such a case, the load balancer
looks for the runqueue with the highest utilization. Nevertheless, it
should also ensure that there are some pending tasks to pull; otherwise
the load balance will fail to pull a task and the spreading of the load
will be delayed.
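
The added check in find_busiest_queue()'s migrate_util case is of this
kind (a paraphrased sketch, not the literal diff):

	case migrate_util:
		util = cpu_util(cpu_of(rq));

		/*
		 * Don't try to pull utilization from a CPU with a single
		 * running task: whatever its utilization, detaching a
		 * task from it will fail.
		 */
		if (nr_running &lt;= 1)
			continue;

		if (busiest_util &lt; util) {
			busiest_util = util;
			busiest = rq;
		}
		break;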

This situation is quite transient, but it's possible to highlight the
effect with a short run of a sysbench test, where the time taken to
spread tasks impacts the global result significantly.

Below are the average results for 15 iterations on an arm64 octo core:
sysbench --test=cpu --num-threads=8  --max-requests=1000 run

                           tip/sched/core  +patchset
total time:                172ms           158ms
per-request statistics:
         avg:                1.337ms         1.244ms
         max:               21.191ms        10.753ms

The average max doesn't fully reflect the wide spread of the values,
which range from 1.350ms to more than 41ms for tip/sched/core and from
1.350ms to 21ms with the patch.

Other factors like waiting for an idle load balance or cache hotness
can delay the spreading of the tasks, which explains why we can still
see up to 21ms with the patch.

Signed-off-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20200312165429.990-1-vincent.guittot@linaro.org
</content>
</entry>
<entry>
<title>sched/fair: Fix enqueue_task_fair warning</title>
<updated>2020-03-20T12:06:18Z</updated>
<author>
<name>Vincent Guittot</name>
<email>vincent.guittot@linaro.org</email>
</author>
<published>2020-03-06T13:52:57Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=fe61468b2cbc2b7ce5f8d3bf32ae5001d4c434e9'/>
<id>urn:sha1:fe61468b2cbc2b7ce5f8d3bf32ae5001d4c434e9</id>
<content type='text'>
When a cfs_rq is throttled, it and its children are removed from the
leaf list, but their nr_running is not changed and may stay higher
than 1. When a task is enqueued in this throttled branch, the cfs_rqs
must be added back in order to ensure correct ordering in the list, but
this can only happen if nr_running == 1.
When cfs bandwidth is used, unconditionally call list_add_leaf_cfs_rq()
when enqueuing an entity to make sure that the complete branch will be
added.

Similarly, unthrottle_cfs_rq() can stop adding cfs_rqs to the list when
a parent is throttled. Iterate over the remaining entities to ensure
that the complete branch will be added to the list.
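
A simplified sketch of the enqueue side of this, at the end of
enqueue_task_fair() (paraphrased, not the literal diff):

	if (cfs_bandwidth_used()) {
		/*
		 * Walk the hierarchy again and re-insert any cfs_rq that
		 * a throttled ancestor kept off the leaf list, so the
		 * complete branch ends up in the list in the right order.
		 */
		for_each_sched_entity(se) {
			cfs_rq = cfs_rq_of(se);

			if (list_add_leaf_cfs_rq(cfs_rq))
				break;
		}
	}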

Reported-by: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Signed-off-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Tested-by: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Tested-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Cc: stable@vger.kernel.org
Cc: stable@vger.kernel.org #v5.1+
Link: https://lkml.kernel.org/r/20200306135257.25044-1-vincent.guittot@linaro.org
</content>
</entry>
<entry>
<title>sched/fair: Fix reordering of enqueue/dequeue_task_fair()</title>
<updated>2020-03-06T11:57:25Z</updated>
<author>
<name>Vincent Guittot</name>
<email>vincent.guittot@linaro.org</email>
</author>
<published>2020-03-06T08:42:08Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=5ab297bab984310267734dfbcc8104566658ebef'/>
<id>urn:sha1:5ab297bab984310267734dfbcc8104566658ebef</id>
<content type='text'>
Even when a cgroup is throttled, the group se of a child cgroup can still
be enqueued and its gse-&gt;on_rq stays true. When a task is enqueued on
such a child, we still have to update the load_avg and increase the
h_nr_running of the throttled cfs_rqs. Nevertheless, the 1st
for_each_sched_entity() loop is skipped because gse-&gt;on_rq == true, and
the 2nd loop is skipped because the cfs_rq is throttled, whereas we have
to update load_avg with the old h_nr_running and increase h_nr_running
in such a case.

The same sequence can happen during dequeue when the se moves to its
parent before breaking out of the 1st loop.

Note that the update of load_avg will effectively happen only once, in
order to sync up to the throttled time. The next call to update load_avg
will stop early because the clock stays unchanged.
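
A sketch of the corrected ordering in the 2nd enqueue loop (paraphrased,
not the literal diff): do the updates first and only then bail out on a
throttled cfs_rq.

	for_each_sched_entity(se) {
		cfs_rq = cfs_rq_of(se);

		update_load_avg(cfs_rq, se, UPDATE_TG);
		se_update_runnable(se);

		cfs_rq-&gt;h_nr_running++;
		cfs_rq-&gt;idle_h_nr_running += idle_h_nr_running;

		/* end evaluation on encountering a throttled cfs_rq */
		if (cfs_rq_throttled(cfs_rq))
			goto enqueue_throttle;
	}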

Signed-off-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Fixes: 6d4d22468dae ("sched/fair: Reorder enqueue/dequeue_task_fair path")
Link: https://lkml.kernel.org/r/20200306084208.12583-1-vincent.guittot@linaro.org
</content>
</entry>
<entry>
<title>sched/fair: Fix runnable_avg for throttled cfs</title>
<updated>2020-03-06T11:57:25Z</updated>
<author>
<name>Vincent Guittot</name>
<email>vincent.guittot@linaro.org</email>
</author>
<published>2020-02-27T15:41:15Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=6212437f0f6043e825e021e4afc5cd63e248a2b4'/>
<id>urn:sha1:6212437f0f6043e825e021e4afc5cd63e248a2b4</id>
<content type='text'>
When a cfs_rq is throttled, its group entity is dequeued and its running
tasks are removed. We must update runnable_avg with the old h_nr_running
and update group_se-&gt;runnable_weight with the new h_nr_running at each
level of the hierarchy.
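
At each level that amounts to a pair of updates along these lines
(a sketch; the helper names are my assumption about the code involved,
not quoted from the patch):

	/* runnable_avg is synced with the old h_nr_running ... */
	update_load_avg(cfs_rq, se, 0);
	/* ... and runnable_weight picks up the new h_nr_running */
	se_update_runnable(se);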

Reviewed-by: Ben Segall &lt;bsegall@google.com&gt;
Signed-off-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Fixes: 9f68395333ad ("sched/pelt: Add a new runnable average signal")
Link: https://lkml.kernel.org/r/20200227154115.8332-1-vincent.guittot@linaro.org
</content>
</entry>
<entry>
<title>sched/numa: Acquire RCU lock for checking idle cores during NUMA balancing</title>
<updated>2020-03-06T11:57:22Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@techsingularity.net</email>
</author>
<published>2020-02-27T19:18:04Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=0621df315402dd7bc56f7272fae9778701289825'/>
<id>urn:sha1:0621df315402dd7bc56f7272fae9778701289825</id>
<content type='text'>
Qian Cai reported the following bug:

  The linux-next commit ff7db0bf24db ("sched/numa: Prefer using an idle CPU as a
  migration target instead of comparing tasks") introduced a boot warning,

  [   86.520534][    T1] WARNING: suspicious RCU usage
  [   86.520540][    T1] 5.6.0-rc3-next-20200227 #7 Not tainted
  [   86.520545][    T1] -----------------------------
  [   86.520551][    T1] kernel/sched/fair.c:5914 suspicious rcu_dereference_check() usage!
  [   86.520555][    T1]
  [   86.520555][    T1] other info that might help us debug this:
  [   86.520555][    T1]
  [   86.520561][    T1]
  [   86.520561][    T1] rcu_scheduler_active = 2, debug_locks = 1
  [   86.520567][    T1] 1 lock held by systemd/1:
  [   86.520571][    T1]  #0: ffff8887f4b14848 (&amp;mm-&gt;mmap_sem#2){++++}, at: do_page_fault+0x1d2/0x998
  [   86.520594][    T1]
  [   86.520594][    T1] stack backtrace:
  [   86.520602][    T1] CPU: 1 PID: 1 Comm: systemd Not tainted 5.6.0-rc3-next-20200227 #7

task_numa_migrate() checks for idle cores when updating NUMA-related
statistics. This relies on reading an RCU-protected structure in
test_idle_cores() via this call chain:

task_numa_migrate
  -&gt; update_numa_stats
    -&gt; numa_idle_core
      -&gt; test_idle_cores

While the locking could be fine-grained, it is more appropriate to acquire
the RCU lock for the entire scan of the domain. This patch removes the
warning triggered at boot time.
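
A sketch of the resulting shape of the scan in update_numa_stats()
(paraphrased from the call chain above, not the literal diff):

	rcu_read_lock();
	for_each_cpu(cpu, cpumask_of_node(nid)) {
		/* ... accumulate per-CPU statistics for the node ... */
		idle_core = numa_idle_core(idle_core, cpu);
	}
	rcu_read_unlock();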

Reported-by: Qian Cai &lt;cai@lca.pw&gt;
Reviewed-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Fixes: ff7db0bf24db ("sched/numa: Prefer using an idle CPU as a migration target instead of comparing tasks")
Link: https://lkml.kernel.org/r/20200227191804.GJ3818@techsingularity.net
</content>
</entry>
</feed>
