aboutsummaryrefslogtreecommitdiffstats
path: root/kernel/sched/sched.h
diff options
context:
space:
mode:
authorClement Courbet <courbet@google.com>2021-03-03 14:46:53 -0800
committerPeter Zijlstra <peterz@infradead.org>2021-03-10 09:51:49 +0100
commit1e17fb8edc5ad6587e9303ccdebce853bc8cf30c (patch)
tree65052ced18c491a4a3b229f5dc780b94efa3f063 /kernel/sched/sched.h
parentpsi: Optimize task switch inside shared cgroups (diff)
downloadlinux-dev-1e17fb8edc5ad6587e9303ccdebce853bc8cf30c.tar.xz
linux-dev-1e17fb8edc5ad6587e9303ccdebce853bc8cf30c.zip
sched: Optimize __calc_delta()
A significant portion of __calc_delta() time is spent in the loop shifting a u64 by 32 bits. Use `fls` instead of iterating. This is ~7x faster on benchmarks. The generic `fls` implementation (`generic_fls`) is still ~4x faster than the loop. Architectures that have a better implementation will make use of it. For example, on x86 we get an additional factor 2 in speed without dedicated implementation. On GCC, the asm versions of `fls` are about the same speed as the builtin. On Clang, the versions that use fls are more than twice as slow as the builtin. This is because the way the `fls` function is written, clang puts the value in memory: https://godbolt.org/z/EfMbYe. This bug is filed at https://bugs.llvm.org/show_bug.cgi?idI406. ``` name cpu/op BM_Calc<__calc_delta_loop> 9.57ms Â=B112% BM_Calc<__calc_delta_generic_fls> 2.36ms Â=B113% BM_Calc<__calc_delta_asm_fls> 2.45ms Â=B113% BM_Calc<__calc_delta_asm_fls_nomem> 1.66ms Â=B112% BM_Calc<__calc_delta_asm_fls64> 2.46ms Â=B113% BM_Calc<__calc_delta_asm_fls64_nomem> 1.34ms Â=B115% BM_Calc<__calc_delta_builtin> 1.32ms Â=B111% ``` Signed-off-by: Clement Courbet <courbet@google.com> Signed-off-by: Josh Don <joshdon@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210303224653.2579656-1-joshdon@google.com
Diffstat (limited to 'kernel/sched/sched.h')
-rw-r--r--kernel/sched/sched.h1
1 files changed, 1 insertions, 0 deletions
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index bb8bb06582c4..d2e09a647c4f 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -36,6 +36,7 @@
#include <uapi/linux/sched/types.h>
#include <linux/binfmts.h>
+#include <linux/bitops.h>
#include <linux/blkdev.h>
#include <linux/compat.h>
#include <linux/context_tracking.h>