aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/Documentation/timers
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/timers')
-rw-r--r--Documentation/timers/delay_sleep_functions.rst121
-rw-r--r--Documentation/timers/hrtimers.rst21
-rw-r--r--Documentation/timers/index.rst4
-rw-r--r--Documentation/timers/no_hz.rst27
-rw-r--r--Documentation/timers/timers-howto.rst115
5 files changed, 141 insertions, 147 deletions
diff --git a/Documentation/timers/delay_sleep_functions.rst b/Documentation/timers/delay_sleep_functions.rst
new file mode 100644
index 000000000000..49d603a3f113
--- /dev/null
+++ b/Documentation/timers/delay_sleep_functions.rst
@@ -0,0 +1,121 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Delay and sleep mechanisms
+==========================
+
+This document seeks to answer the common question: "What is the
+RightWay (TM) to insert a delay?"
+
+This question is most often faced by driver writers who have to
+deal with hardware delays and who may not be the most intimately
+familiar with the inner workings of the Linux Kernel.
+
+The following table gives a rough overview about the existing function
+'families' and their limitations. This overview table does not replace the
+reading of the function description before usage!
+
+.. list-table::
+ :widths: 20 20 20 20 20
+ :header-rows: 2
+
+ * -
+ - `*delay()`
+ - `usleep_range*()`
+ - `*sleep()`
+ - `fsleep()`
+ * -
+ - busy-wait loop
+ - hrtimers based
+ - timer list timers based
+ - combines the others
+ * - Usage in atomic Context
+ - yes
+ - no
+ - no
+ - no
+ * - precise on "short intervals"
+ - yes
+ - yes
+ - depends
+ - yes
+ * - precise on "long intervals"
+ - Do not use!
+ - yes
+ - max 12.5% slack
+ - yes
+ * - interruptible variant
+ - no
+ - yes
+ - yes
+ - no
+
+A generic advice for non atomic contexts could be:
+
+#. Use `fsleep()` whenever unsure (as it combines all the advantages of the
+ others)
+#. Use `*sleep()` whenever possible
+#. Use `usleep_range*()` whenever accuracy of `*sleep()` is not sufficient
+#. Use `*delay()` for very, very short delays
+
+Find some more detailed information about the function 'families' in the next
+sections.
+
+`*delay()` family of functions
+------------------------------
+
+These functions use the jiffy estimation of clock speed and will busy wait for
+enough loop cycles to achieve the desired delay. udelay() is the basic
+implementation and ndelay() as well as mdelay() are variants.
+
+These functions are mainly used to add a delay in atomic context. Please make
+sure to ask yourself before adding a delay in atomic context: Is this really
+required?
+
+.. kernel-doc:: include/asm-generic/delay.h
+ :identifiers: udelay ndelay
+
+.. kernel-doc:: include/linux/delay.h
+ :identifiers: mdelay
+
+
+`usleep_range*()` and `*sleep()` family of functions
+----------------------------------------------------
+
+These functions use hrtimers or timer list timers to provide the requested
+sleeping duration. In order to decide which function is the right one to use,
+take some basic information into account:
+
+#. hrtimers are more expensive as they are using an rb-tree (instead of hashing)
+#. hrtimers are more expensive when the requested sleeping duration is the first
+ timer which means real hardware has to be programmed
+#. timer list timers always provide some sort of slack as they are jiffy based
+
+The generic advice is repeated here:
+
+#. Use `fsleep()` whenever unsure (as it combines all the advantages of the
+ others)
+#. Use `*sleep()` whenever possible
+#. Use `usleep_range*()` whenever accuracy of `*sleep()` is not sufficient
+
+First check fsleep() function description and to learn more about accuracy,
+please check msleep() function description.
+
+
+`usleep_range*()`
+~~~~~~~~~~~~~~~~~
+
+.. kernel-doc:: include/linux/delay.h
+ :identifiers: usleep_range usleep_range_idle
+
+.. kernel-doc:: kernel/time/sleep_timeout.c
+ :identifiers: usleep_range_state
+
+
+`*sleep()`
+~~~~~~~~~~
+
+.. kernel-doc:: kernel/time/sleep_timeout.c
+ :identifiers: msleep msleep_interruptible
+
+.. kernel-doc:: include/linux/delay.h
+ :identifiers: ssleep fsleep
diff --git a/Documentation/timers/hrtimers.rst b/Documentation/timers/hrtimers.rst
index c1c20a693e8f..f88ff8bae89c 100644
--- a/Documentation/timers/hrtimers.rst
+++ b/Documentation/timers/hrtimers.rst
@@ -118,22 +118,17 @@ existing timer wheel code, as it is mature and well suited. Sharing code
was not really a win, due to the different data structures. Also, the
hrtimer functions now have clearer behavior and clearer names - such as
hrtimer_try_to_cancel() and hrtimer_cancel() [which are roughly
-equivalent to del_timer() and del_timer_sync()] - so there's no direct
+equivalent to timer_delete() and timer_delete_sync()] - so there's no direct
1:1 mapping between them on the algorithmic level, and thus no real
potential for code sharing either.
Basic data types: every time value, absolute or relative, is in a
-special nanosecond-resolution type: ktime_t. The kernel-internal
-representation of ktime_t values and operations is implemented via
-macros and inline functions, and can be switched between a "hybrid
-union" type and a plain "scalar" 64bit nanoseconds representation (at
-compile time). The hybrid union type optimizes time conversions on 32bit
-CPUs. This build-time-selectable ktime_t storage format was implemented
-to avoid the performance impact of 64-bit multiplications and divisions
-on 32bit CPUs. Such operations are frequently necessary to convert
-between the storage formats provided by kernel and userspace interfaces
-and the internal time format. (See include/linux/ktime.h for further
-details.)
+special nanosecond-resolution 64bit type: ktime_t.
+(Originally, the kernel-internal representation of ktime_t values and
+operations was implemented via macros and inline functions, and could be
+switched between a "hybrid union" type and a plain "scalar" 64bit
+nanoseconds representation (at compile time). This was abandoned in the
+context of the Y2038 work.)
hrtimers - rounding of timer values
-----------------------------------
@@ -148,7 +143,7 @@ a given clock has - be it low-res, high-res, or artificially-low-res.
hrtimers - testing and verification
-----------------------------------
-We used the high-resolution clock subsystem ontop of hrtimers to verify
+We used the high-resolution clock subsystem on top of hrtimers to verify
the hrtimer implementation details in praxis, and we also ran the posix
timer tests in order to ensure specification compliance. We also ran
tests on low-resolution clocks.
diff --git a/Documentation/timers/index.rst b/Documentation/timers/index.rst
index df510ad0c989..4e88116e4dcf 100644
--- a/Documentation/timers/index.rst
+++ b/Documentation/timers/index.rst
@@ -1,7 +1,7 @@
.. SPDX-License-Identifier: GPL-2.0
======
-timers
+Timers
======
.. toctree::
@@ -12,7 +12,7 @@ timers
hrtimers
no_hz
timekeeping
- timers-howto
+ delay_sleep_functions
.. only:: subproject and html
diff --git a/Documentation/timers/no_hz.rst b/Documentation/timers/no_hz.rst
index c4c70e1aada3..7fe8ef9718d8 100644
--- a/Documentation/timers/no_hz.rst
+++ b/Documentation/timers/no_hz.rst
@@ -1,4 +1,4 @@
-======================================
+======================================
NO_HZ: Reducing Scheduling-Clock Ticks
======================================
@@ -70,6 +70,10 @@ interrupt. After all, the primary purpose of a scheduling-clock interrupt
is to force a busy CPU to shift its attention among multiple duties,
and an idle CPU has no duties to shift its attention among.
+An idle CPU that is not receiving scheduling-clock interrupts is said to
+be "dyntick-idle", "in dyntick-idle mode", "in nohz mode", or "running
+tickless". The remainder of this document will use "dyntick-idle mode".
+
The CONFIG_NO_HZ_IDLE=y Kconfig option causes the kernel to avoid sending
scheduling-clock interrupts to idle CPUs, which is critically important
both to battery-powered devices and to highly virtualized mainframes.
@@ -91,10 +95,6 @@ Therefore, systems with aggressive real-time response constraints often
run CONFIG_HZ_PERIODIC=y kernels (or CONFIG_NO_HZ=n for older kernels)
in order to avoid degrading from-idle transition latencies.
-An idle CPU that is not receiving scheduling-clock interrupts is said to
-be "dyntick-idle", "in dyntick-idle mode", "in nohz mode", or "running
-tickless". The remainder of this document will use "dyntick-idle mode".
-
There is also a boot parameter "nohz=" that can be used to disable
dyntick-idle mode in CONFIG_NO_HZ_IDLE=y kernels by specifying "nohz=off".
By default, CONFIG_NO_HZ_IDLE=y kernels boot with "nohz=on", enabling
@@ -129,11 +129,8 @@ adaptive-tick CPUs: At least one non-adaptive-tick CPU must remain
online to handle timekeeping tasks in order to ensure that system
calls like gettimeofday() returns accurate values on adaptive-tick CPUs.
(This is not an issue for CONFIG_NO_HZ_IDLE=y because there are no running
-user processes to observe slight drifts in clock rate.) Therefore, the
-boot CPU is prohibited from entering adaptive-ticks mode. Specifying a
-"nohz_full=" mask that includes the boot CPU will result in a boot-time
-error message, and the boot CPU will be removed from the mask. Note that
-this means that your system must have at least two CPUs in order for
+user processes to observe slight drifts in clock rate.) Note that this
+means that your system must have at least two CPUs in order for
CONFIG_NO_HZ_FULL=y to do anything for you.
Finally, adaptive-ticks CPUs must have their RCU callbacks offloaded.
@@ -184,16 +181,12 @@ There are situations in which idle CPUs cannot be permitted to
enter either dyntick-idle mode or adaptive-tick mode, the most
common being when that CPU has RCU callbacks pending.
-The CONFIG_RCU_FAST_NO_HZ=y Kconfig option may be used to cause such CPUs
-to enter dyntick-idle mode or adaptive-tick mode anyway. In this case,
-a timer will awaken these CPUs every four jiffies in order to ensure
-that the RCU callbacks are processed in a timely fashion.
-
-Another approach is to offload RCU callback processing to "rcuo" kthreads
+Avoid this by offloading RCU callback processing to "rcuo" kthreads
using the CONFIG_RCU_NOCB_CPU=y Kconfig option. The specific CPUs to
offload may be selected using The "rcu_nocbs=" kernel boot parameter,
which takes a comma-separated list of CPUs and CPU ranges, for example,
-"1,3-5" selects CPUs 1, 3, 4, and 5.
+"1,3-5" selects CPUs 1, 3, 4, and 5. Note that CPUs specified by
+the "nohz_full" kernel boot parameter are also offloaded.
The offloaded CPUs will never queue RCU callbacks, and therefore RCU
never prevents offloaded CPUs from entering either dyntick-idle mode
diff --git a/Documentation/timers/timers-howto.rst b/Documentation/timers/timers-howto.rst
deleted file mode 100644
index afb0a43b8cdf..000000000000
--- a/Documentation/timers/timers-howto.rst
+++ /dev/null
@@ -1,115 +0,0 @@
-===================================================================
-delays - Information on the various kernel delay / sleep mechanisms
-===================================================================
-
-This document seeks to answer the common question: "What is the
-RightWay (TM) to insert a delay?"
-
-This question is most often faced by driver writers who have to
-deal with hardware delays and who may not be the most intimately
-familiar with the inner workings of the Linux Kernel.
-
-
-Inserting Delays
-----------------
-
-The first, and most important, question you need to ask is "Is my
-code in an atomic context?" This should be followed closely by "Does
-it really need to delay in atomic context?" If so...
-
-ATOMIC CONTEXT:
- You must use the `*delay` family of functions. These
- functions use the jiffie estimation of clock speed
- and will busy wait for enough loop cycles to achieve
- the desired delay:
-
- ndelay(unsigned long nsecs)
- udelay(unsigned long usecs)
- mdelay(unsigned long msecs)
-
- udelay is the generally preferred API; ndelay-level
- precision may not actually exist on many non-PC devices.
-
- mdelay is macro wrapper around udelay, to account for
- possible overflow when passing large arguments to udelay.
- In general, use of mdelay is discouraged and code should
- be refactored to allow for the use of msleep.
-
-NON-ATOMIC CONTEXT:
- You should use the `*sleep[_range]` family of functions.
- There are a few more options here, while any of them may
- work correctly, using the "right" sleep function will
- help the scheduler, power management, and just make your
- driver better :)
-
- -- Backed by busy-wait loop:
-
- udelay(unsigned long usecs)
-
- -- Backed by hrtimers:
-
- usleep_range(unsigned long min, unsigned long max)
-
- -- Backed by jiffies / legacy_timers
-
- msleep(unsigned long msecs)
- msleep_interruptible(unsigned long msecs)
-
- Unlike the `*delay` family, the underlying mechanism
- driving each of these calls varies, thus there are
- quirks you should be aware of.
-
-
- SLEEPING FOR "A FEW" USECS ( < ~10us? ):
- * Use udelay
-
- - Why not usleep?
- On slower systems, (embedded, OR perhaps a speed-
- stepped PC!) the overhead of setting up the hrtimers
- for usleep *may* not be worth it. Such an evaluation
- will obviously depend on your specific situation, but
- it is something to be aware of.
-
- SLEEPING FOR ~USECS OR SMALL MSECS ( 10us - 20ms):
- * Use usleep_range
-
- - Why not msleep for (1ms - 20ms)?
- Explained originally here:
- http://lkml.org/lkml/2007/8/3/250
-
- msleep(1~20) may not do what the caller intends, and
- will often sleep longer (~20 ms actual sleep for any
- value given in the 1~20ms range). In many cases this
- is not the desired behavior.
-
- - Why is there no "usleep" / What is a good range?
- Since usleep_range is built on top of hrtimers, the
- wakeup will be very precise (ish), thus a simple
- usleep function would likely introduce a large number
- of undesired interrupts.
-
- With the introduction of a range, the scheduler is
- free to coalesce your wakeup with any other wakeup
- that may have happened for other reasons, or at the
- worst case, fire an interrupt for your upper bound.
-
- The larger a range you supply, the greater a chance
- that you will not trigger an interrupt; this should
- be balanced with what is an acceptable upper bound on
- delay / performance for your specific code path. Exact
- tolerances here are very situation specific, thus it
- is left to the caller to determine a reasonable range.
-
- SLEEPING FOR LARGER MSECS ( 10ms+ )
- * Use msleep or possibly msleep_interruptible
-
- - What's the difference?
- msleep sets the current task to TASK_UNINTERRUPTIBLE
- whereas msleep_interruptible sets the current task to
- TASK_INTERRUPTIBLE before scheduling the sleep. In
- short, the difference is whether the sleep can be ended
- early by a signal. In general, just use msleep unless
- you know you have a need for the interruptible variant.
-
- FLEXIBLE SLEEPING (any delay, uninterruptible)
- * Use fsleep