aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/arm/cluster-pm-race-avoidance.txt
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/arm/cluster-pm-race-avoidance.txt')
-rw-r--r--Documentation/arm/cluster-pm-race-avoidance.txt498
1 files changed, 0 insertions, 498 deletions
diff --git a/Documentation/arm/cluster-pm-race-avoidance.txt b/Documentation/arm/cluster-pm-race-avoidance.txt
deleted file mode 100644
index 750b6fc24af9..000000000000
--- a/Documentation/arm/cluster-pm-race-avoidance.txt
+++ /dev/null
@@ -1,498 +0,0 @@
-Cluster-wide Power-up/power-down race avoidance algorithm
-=========================================================
-
-This file documents the algorithm which is used to coordinate CPU and
-cluster setup and teardown operations and to manage hardware coherency
-controls safely.
-
-The section "Rationale" explains what the algorithm is for and why it is
-needed. "Basic model" explains general concepts using a simplified view
-of the system. The other sections explain the actual details of the
-algorithm in use.
-
-
-Rationale
----------
-
-In a system containing multiple CPUs, it is desirable to have the
-ability to turn off individual CPUs when the system is idle, reducing
-power consumption and thermal dissipation.
-
-In a system containing multiple clusters of CPUs, it is also desirable
-to have the ability to turn off entire clusters.
-
-Turning entire clusters off and on is a risky business, because it
-involves performing potentially destructive operations affecting a group
-of independently running CPUs, while the OS continues to run. This
-means that we need some coordination in order to ensure that critical
-cluster-level operations are only performed when it is truly safe to do
-so.
-
-Simple locking may not be sufficient to solve this problem, because
-mechanisms like Linux spinlocks may rely on coherency mechanisms which
-are not immediately enabled when a cluster powers up. Since enabling or
-disabling those mechanisms may itself be a non-atomic operation (such as
-writing some hardware registers and invalidating large caches), other
-methods of coordination are required in order to guarantee safe
-power-down and power-up at the cluster level.
-
-The mechanism presented in this document describes a coherent memory
-based protocol for performing the needed coordination. It aims to be as
-lightweight as possible, while providing the required safety properties.
-
-
-Basic model
------------
-
-Each cluster and CPU is assigned a state, as follows:
-
- DOWN
- COMING_UP
- UP
- GOING_DOWN
-
- +---------> UP ----------+
- | v
-
- COMING_UP GOING_DOWN
-
- ^ |
- +--------- DOWN <--------+
-
-
-DOWN: The CPU or cluster is not coherent, and is either powered off or
- suspended, or is ready to be powered off or suspended.
-
-COMING_UP: The CPU or cluster has committed to moving to the UP state.
- It may be part way through the process of initialisation and
- enabling coherency.
-
-UP: The CPU or cluster is active and coherent at the hardware
- level. A CPU in this state is not necessarily being used
- actively by the kernel.
-
-GOING_DOWN: The CPU or cluster has committed to moving to the DOWN
- state. It may be part way through the process of teardown and
- coherency exit.
-
-
-Each CPU has one of these states assigned to it at any point in time.
-The CPU states are described in the "CPU state" section, below.
-
-Each cluster is also assigned a state, but it is necessary to split the
-state value into two parts (the "cluster" state and "inbound" state) and
-to introduce additional states in order to avoid races between different
-CPUs in the cluster simultaneously modifying the state. The cluster-
-level states are described in the "Cluster state" section.
-
-To help distinguish the CPU states from cluster states in this
-discussion, the state names are given a CPU_ prefix for the CPU states,
-and a CLUSTER_ or INBOUND_ prefix for the cluster states.
-
-
-CPU state
----------
-
-In this algorithm, each individual core in a multi-core processor is
-referred to as a "CPU". CPUs are assumed to be single-threaded:
-therefore, a CPU can only be doing one thing at a single point in time.
-
-This means that CPUs fit the basic model closely.
-
-The algorithm defines the following states for each CPU in the system:
-
- CPU_DOWN
- CPU_COMING_UP
- CPU_UP
- CPU_GOING_DOWN
-
- cluster setup and
- CPU setup complete policy decision
- +-----------> CPU_UP ------------+
- | v
-
- CPU_COMING_UP CPU_GOING_DOWN
-
- ^ |
- +----------- CPU_DOWN <----------+
- policy decision CPU teardown complete
- or hardware event
-
-
-The definitions of the four states correspond closely to the states of
-the basic model.
-
-Transitions between states occur as follows.
-
-A trigger event (spontaneous) means that the CPU can transition to the
-next state as a result of making local progress only, with no
-requirement for any external event to happen.
-
-
-CPU_DOWN:
-
- A CPU reaches the CPU_DOWN state when it is ready for
- power-down. On reaching this state, the CPU will typically
- power itself down or suspend itself, via a WFI instruction or a
- firmware call.
-
- Next state: CPU_COMING_UP
- Conditions: none
-
- Trigger events:
-
- a) an explicit hardware power-up operation, resulting
- from a policy decision on another CPU;
-
- b) a hardware event, such as an interrupt.
-
-
-CPU_COMING_UP:
-
- A CPU cannot start participating in hardware coherency until the
- cluster is set up and coherent. If the cluster is not ready,
- then the CPU will wait in the CPU_COMING_UP state until the
- cluster has been set up.
-
- Next state: CPU_UP
- Conditions: The CPU's parent cluster must be in CLUSTER_UP.
- Trigger events: Transition of the parent cluster to CLUSTER_UP.
-
- Refer to the "Cluster state" section for a description of the
- CLUSTER_UP state.
-
-
-CPU_UP:
- When a CPU reaches the CPU_UP state, it is safe for the CPU to
- start participating in local coherency.
-
- This is done by jumping to the kernel's CPU resume code.
-
- Note that the definition of this state is slightly different
- from the basic model definition: CPU_UP does not mean that the
- CPU is coherent yet, but it does mean that it is safe to resume
- the kernel. The kernel handles the rest of the resume
- procedure, so the remaining steps are not visible as part of the
- race avoidance algorithm.
-
- The CPU remains in this state until an explicit policy decision
- is made to shut down or suspend the CPU.
-
- Next state: CPU_GOING_DOWN
- Conditions: none
- Trigger events: explicit policy decision
-
-
-CPU_GOING_DOWN:
-
- While in this state, the CPU exits coherency, including any
- operations required to achieve this (such as cleaning data
- caches).
-
- Next state: CPU_DOWN
- Conditions: local CPU teardown complete
- Trigger events: (spontaneous)
-
-
-Cluster state
--------------
-
-A cluster is a group of connected CPUs with some common resources.
-Because a cluster contains multiple CPUs, it can be doing multiple
-things at the same time. This has some implications. In particular, a
-CPU can start up while another CPU is tearing the cluster down.
-
-In this discussion, the "outbound side" is the view of the cluster state
-as seen by a CPU tearing the cluster down. The "inbound side" is the
-view of the cluster state as seen by a CPU setting the CPU up.
-
-In order to enable safe coordination in such situations, it is important
-that a CPU which is setting up the cluster can advertise its state
-independently of the CPU which is tearing down the cluster. For this
-reason, the cluster state is split into two parts:
-
- "cluster" state: The global state of the cluster; or the state
- on the outbound side:
-
- CLUSTER_DOWN
- CLUSTER_UP
- CLUSTER_GOING_DOWN
-
- "inbound" state: The state of the cluster on the inbound side.
-
- INBOUND_NOT_COMING_UP
- INBOUND_COMING_UP
-
-
- The different pairings of these states results in six possible
- states for the cluster as a whole:
-
- CLUSTER_UP
- +==========> INBOUND_NOT_COMING_UP -------------+
- # |
- |
- CLUSTER_UP <----+ |
- INBOUND_COMING_UP | v
-
- ^ CLUSTER_GOING_DOWN CLUSTER_GOING_DOWN
- # INBOUND_COMING_UP <=== INBOUND_NOT_COMING_UP
-
- CLUSTER_DOWN | |
- INBOUND_COMING_UP <----+ |
- |
- ^ |
- +=========== CLUSTER_DOWN <------------+
- INBOUND_NOT_COMING_UP
-
- Transitions -----> can only be made by the outbound CPU, and
- only involve changes to the "cluster" state.
-
- Transitions ===##> can only be made by the inbound CPU, and only
- involve changes to the "inbound" state, except where there is no
- further transition possible on the outbound side (i.e., the
- outbound CPU has put the cluster into the CLUSTER_DOWN state).
-
- The race avoidance algorithm does not provide a way to determine
- which exact CPUs within the cluster play these roles. This must
- be decided in advance by some other means. Refer to the section
- "Last man and first man selection" for more explanation.
-
-
- CLUSTER_DOWN/INBOUND_NOT_COMING_UP is the only state where the
- cluster can actually be powered down.
-
- The parallelism of the inbound and outbound CPUs is observed by
- the existence of two different paths from CLUSTER_GOING_DOWN/
- INBOUND_NOT_COMING_UP (corresponding to GOING_DOWN in the basic
- model) to CLUSTER_DOWN/INBOUND_COMING_UP (corresponding to
- COMING_UP in the basic model). The second path avoids cluster
- teardown completely.
-
- CLUSTER_UP/INBOUND_COMING_UP is equivalent to UP in the basic
- model. The final transition to CLUSTER_UP/INBOUND_NOT_COMING_UP
- is trivial and merely resets the state machine ready for the
- next cycle.
-
- Details of the allowable transitions follow.
-
- The next state in each case is notated
-
- <cluster state>/<inbound state> (<transitioner>)
-
- where the <transitioner> is the side on which the transition
- can occur; either the inbound or the outbound side.
-
-
-CLUSTER_DOWN/INBOUND_NOT_COMING_UP:
-
- Next state: CLUSTER_DOWN/INBOUND_COMING_UP (inbound)
- Conditions: none
- Trigger events:
-
- a) an explicit hardware power-up operation, resulting
- from a policy decision on another CPU;
-
- b) a hardware event, such as an interrupt.
-
-
-CLUSTER_DOWN/INBOUND_COMING_UP:
-
- In this state, an inbound CPU sets up the cluster, including
- enabling of hardware coherency at the cluster level and any
- other operations (such as cache invalidation) which are required
- in order to achieve this.
-
- The purpose of this state is to do sufficient cluster-level
- setup to enable other CPUs in the cluster to enter coherency
- safely.
-
- Next state: CLUSTER_UP/INBOUND_COMING_UP (inbound)
- Conditions: cluster-level setup and hardware coherency complete
- Trigger events: (spontaneous)
-
-
-CLUSTER_UP/INBOUND_COMING_UP:
-
- Cluster-level setup is complete and hardware coherency is
- enabled for the cluster. Other CPUs in the cluster can safely
- enter coherency.
-
- This is a transient state, leading immediately to
- CLUSTER_UP/INBOUND_NOT_COMING_UP. All other CPUs on the cluster
- should consider treat these two states as equivalent.
-
- Next state: CLUSTER_UP/INBOUND_NOT_COMING_UP (inbound)
- Conditions: none
- Trigger events: (spontaneous)
-
-
-CLUSTER_UP/INBOUND_NOT_COMING_UP:
-
- Cluster-level setup is complete and hardware coherency is
- enabled for the cluster. Other CPUs in the cluster can safely
- enter coherency.
-
- The cluster will remain in this state until a policy decision is
- made to power the cluster down.
-
- Next state: CLUSTER_GOING_DOWN/INBOUND_NOT_COMING_UP (outbound)
- Conditions: none
- Trigger events: policy decision to power down the cluster
-
-
-CLUSTER_GOING_DOWN/INBOUND_NOT_COMING_UP:
-
- An outbound CPU is tearing the cluster down. The selected CPU
- must wait in this state until all CPUs in the cluster are in the
- CPU_DOWN state.
-
- When all CPUs are in the CPU_DOWN state, the cluster can be torn
- down, for example by cleaning data caches and exiting
- cluster-level coherency.
-
- To avoid wasteful unnecessary teardown operations, the outbound
- should check the inbound cluster state for asynchronous
- transitions to INBOUND_COMING_UP. Alternatively, individual
- CPUs can be checked for entry into CPU_COMING_UP or CPU_UP.
-
-
- Next states:
-
- CLUSTER_DOWN/INBOUND_NOT_COMING_UP (outbound)
- Conditions: cluster torn down and ready to power off
- Trigger events: (spontaneous)
-
- CLUSTER_GOING_DOWN/INBOUND_COMING_UP (inbound)
- Conditions: none
- Trigger events:
-
- a) an explicit hardware power-up operation,
- resulting from a policy decision on another
- CPU;
-
- b) a hardware event, such as an interrupt.
-
-
-CLUSTER_GOING_DOWN/INBOUND_COMING_UP:
-
- The cluster is (or was) being torn down, but another CPU has
- come online in the meantime and is trying to set up the cluster
- again.
-
- If the outbound CPU observes this state, it has two choices:
-
- a) back out of teardown, restoring the cluster to the
- CLUSTER_UP state;
-
- b) finish tearing the cluster down and put the cluster
- in the CLUSTER_DOWN state; the inbound CPU will
- set up the cluster again from there.
-
- Choice (a) permits the removal of some latency by avoiding
- unnecessary teardown and setup operations in situations where
- the cluster is not really going to be powered down.
-
-
- Next states:
-
- CLUSTER_UP/INBOUND_COMING_UP (outbound)
- Conditions: cluster-level setup and hardware
- coherency complete
- Trigger events: (spontaneous)
-
- CLUSTER_DOWN/INBOUND_COMING_UP (outbound)
- Conditions: cluster torn down and ready to power off
- Trigger events: (spontaneous)
-
-
-Last man and First man selection
---------------------------------
-
-The CPU which performs cluster tear-down operations on the outbound side
-is commonly referred to as the "last man".
-
-The CPU which performs cluster setup on the inbound side is commonly
-referred to as the "first man".
-
-The race avoidance algorithm documented above does not provide a
-mechanism to choose which CPUs should play these roles.
-
-
-Last man:
-
-When shutting down the cluster, all the CPUs involved are initially
-executing Linux and hence coherent. Therefore, ordinary spinlocks can
-be used to select a last man safely, before the CPUs become
-non-coherent.
-
-
-First man:
-
-Because CPUs may power up asynchronously in response to external wake-up
-events, a dynamic mechanism is needed to make sure that only one CPU
-attempts to play the first man role and do the cluster-level
-initialisation: any other CPUs must wait for this to complete before
-proceeding.
-
-Cluster-level initialisation may involve actions such as configuring
-coherency controls in the bus fabric.
-
-The current implementation in mcpm_head.S uses a separate mutual exclusion
-mechanism to do this arbitration. This mechanism is documented in
-detail in vlocks.txt.
-
-
-Features and Limitations
-------------------------
-
-Implementation:
-
- The current ARM-based implementation is split between
- arch/arm/common/mcpm_head.S (low-level inbound CPU operations) and
- arch/arm/common/mcpm_entry.c (everything else):
-
- __mcpm_cpu_going_down() signals the transition of a CPU to the
- CPU_GOING_DOWN state.
-
- __mcpm_cpu_down() signals the transition of a CPU to the CPU_DOWN
- state.
-
- A CPU transitions to CPU_COMING_UP and then to CPU_UP via the
- low-level power-up code in mcpm_head.S. This could
- involve CPU-specific setup code, but in the current
- implementation it does not.
-
- __mcpm_outbound_enter_critical() and __mcpm_outbound_leave_critical()
- handle transitions from CLUSTER_UP to CLUSTER_GOING_DOWN
- and from there to CLUSTER_DOWN or back to CLUSTER_UP (in
- the case of an aborted cluster power-down).
-
- These functions are more complex than the __mcpm_cpu_*()
- functions due to the extra inter-CPU coordination which
- is needed for safe transitions at the cluster level.
-
- A cluster transitions from CLUSTER_DOWN back to CLUSTER_UP via
- the low-level power-up code in mcpm_head.S. This
- typically involves platform-specific setup code,
- provided by the platform-specific power_up_setup
- function registered via mcpm_sync_init.
-
-Deep topologies:
-
- As currently described and implemented, the algorithm does not
- support CPU topologies involving more than two levels (i.e.,
- clusters of clusters are not supported). The algorithm could be
- extended by replicating the cluster-level states for the
- additional topological levels, and modifying the transition
- rules for the intermediate (non-outermost) cluster levels.
-
-
-Colophon
---------
-
-Originally created and documented by Dave Martin for Linaro Limited, in
-collaboration with Nicolas Pitre and Achin Gupta.
-
-Copyright (C) 2012-2013 Linaro Limited
-Distributed under the terms of Version 2 of the GNU General Public
-License, as defined in linux/COPYING.