erofs: add per-cpu threads for decompression as an option

Using per-cpu thread pool we can reduce the scheduling latency compared to workqueue implementation. With this patch scheduling latency and variation is reduced as per-cpu threads are high priority kthread_workers. The results were evaluated on arm64 Android devices running 5.10 kernel. The table below shows resulting improvements of total scheduling latency for the same app launch benchmark runs with 50 iterations. Scheduling latency is the latency between when the task (workqueue kworker vs kthread_worker) became eligible to run to when it actually started running. +-------------------------+-----------+----------------+---------+ | | workqueue | kthread_worker | diff | +-------------------------+-----------+----------------+---------+ | Average (us) | 15253 | 2914 | -80.89% | | Median (us) | 14001 | 2912 | -79.20% | | Minimum (us) | 3117 | 1027 | -67.05% | | Maximum (us) | 30170 | 3805 | -87.39% | | Standard deviation (us) | 7166 | 359 | | +-------------------------+-----------+----------------+---------+ Background: Boot times and cold app launch benchmarks are very important to the Android ecosystem as they directly translate to responsiveness from user point of view. While EROFS provides a lot of important features like space savings, we saw some performance penalty in cold app launch benchmarks in few scenarios. Analysis showed that the significant variance was coming from the scheduling cost while decompression cost was more or less the same. Having per-cpu thread pool we can see from the above table that this variation is reduced by ~80% on average. This problem was discussed at LPC 2022. Link to LPC 2022 slides and talk at [1] [1] https://lpc.events/event/16/contributions/1338/ [ Gao Xiang: At least, we have to add this until WQ_UNBOUND workqueue issue [2] on many arm64 devices is resolved. ] [2] https://lore.kernel.org/r/CAJkfWY490-m6wNubkxiTPsW59sfsQs37Wey279LmiRxKt7aQYg@mail.gmail.com Signed-off-by: Sandeep Dhavale <dhavale@google.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230208093322.75816-1-hsiangkao@linux.alibaba.com
author: Sandeep Dhavale <dhavale@google.com> 2023-02-08 17:33:22 +0800
committer: Gao Xiang <hsiangkao@linux.alibaba.com> 2023-02-15 08:11:26 +0800
commit: 3fffb589b9a6e331e39cb75373ee7691acd7b109 (patch)
tree: 9f7923b8cac5a5095b3d40e01c9e82517055851a /fs/erofs/Kconfig
parent: erofs: tidy up internal.h (diff)
download: wireguard-linux-3fffb589b9a6e331e39cb75373ee7691acd7b109.tar.xz
wireguard-linux-3fffb589b9a6e331e39cb75373ee7691acd7b109.zip
1 files changed, 18 insertions, 0 deletions
diff --git a/fs/erofs/Kconfig b/fs/erofs/Kconfig
index 85490370e0ca..704fb59577e0 100644
--- a/fs/erofs/Kconfig
+++ b/fs/erofs/Kconfig
@@ -108,3 +108,21 @@ config EROFS_FS_ONDEMAND
 	  read support.
 
 	  If unsure, say N.
+
+config EROFS_FS_PCPU_KTHREAD
+	bool "EROFS per-cpu decompression kthread workers"
+	depends on EROFS_FS_ZIP
+	help
+	  Saying Y here enables per-CPU kthread workers pool to carry out
+	  async decompression for low latencies on some architectures.
+
+	  If unsure, say N.
+
+config EROFS_FS_PCPU_KTHREAD_HIPRI
+	bool "EROFS high priority per-CPU kthread workers"
+	depends on EROFS_FS_ZIP && EROFS_FS_PCPU_KTHREAD
+	help
+	  This permits EROFS to configure per-CPU kthread workers to run
+	  at higher priority.
+
+	  If unsure, say N.
author	Sandeep Dhavale <dhavale@google.com>	2023-02-08 17:33:22 +0800
committer	Gao Xiang <hsiangkao@linux.alibaba.com>	2023-02-15 08:11:26 +0800
commit	3fffb589b9a6e331e39cb75373ee7691acd7b109 (patch)
tree	9f7923b8cac5a5095b3d40e01c9e82517055851a /fs/erofs/Kconfig
parent	erofs: tidy up internal.h (diff)
download	wireguard-linux-3fffb589b9a6e331e39cb75373ee7691acd7b109.tar.xz wireguard-linux-3fffb589b9a6e331e39cb75373ee7691acd7b109.zip