aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/admin-guide
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/admin-guide')
-rw-r--r--Documentation/admin-guide/LSM/LoadPin.rst21
-rw-r--r--Documentation/admin-guide/LSM/SELinux.rst33
-rw-r--r--Documentation/admin-guide/LSM/Smack.rst857
-rw-r--r--Documentation/admin-guide/LSM/Yama.rst74
-rw-r--r--Documentation/admin-guide/LSM/apparmor.rst51
-rw-r--r--Documentation/admin-guide/LSM/index.rst41
-rw-r--r--Documentation/admin-guide/LSM/tomoyo.rst65
-rw-r--r--Documentation/admin-guide/README.rst6
-rw-r--r--Documentation/admin-guide/devices.txt4
-rw-r--r--Documentation/admin-guide/index.rst2
-rw-r--r--Documentation/admin-guide/kernel-parameters.txt106
-rw-r--r--Documentation/admin-guide/pm/cpufreq.rst29
-rw-r--r--Documentation/admin-guide/pm/index.rst1
-rw-r--r--Documentation/admin-guide/pm/intel_pstate.rst753
-rw-r--r--Documentation/admin-guide/ras.rst10
-rw-r--r--Documentation/admin-guide/thunderbolt.rst199
16 files changed, 2208 insertions, 44 deletions
diff --git a/Documentation/admin-guide/LSM/LoadPin.rst b/Documentation/admin-guide/LSM/LoadPin.rst
new file mode 100644
index 000000000000..32070762d24c
--- /dev/null
+++ b/Documentation/admin-guide/LSM/LoadPin.rst
@@ -0,0 +1,21 @@
+=======
+LoadPin
+=======
+
+LoadPin is a Linux Security Module that ensures all kernel-loaded files
+(modules, firmware, etc) all originate from the same filesystem, with
+the expectation that such a filesystem is backed by a read-only device
+such as dm-verity or CDROM. This allows systems that have a verified
+and/or unchangeable filesystem to enforce module and firmware loading
+restrictions without needing to sign the files individually.
+
+The LSM is selectable at build-time with ``CONFIG_SECURITY_LOADPIN``, and
+can be controlled at boot-time with the kernel command line option
+"``loadpin.enabled``". By default, it is enabled, but can be disabled at
+boot ("``loadpin.enabled=0``").
+
+LoadPin starts pinning when it sees the first file loaded. If the
+block device backing the filesystem is not read-only, a sysctl is
+created to toggle pinning: ``/proc/sys/kernel/loadpin/enabled``. (Having
+a mutable filesystem means pinning is mutable too, but having the
+sysctl allows for easy testing on systems with a mutable filesystem.)
diff --git a/Documentation/admin-guide/LSM/SELinux.rst b/Documentation/admin-guide/LSM/SELinux.rst
new file mode 100644
index 000000000000..f722c9b4173a
--- /dev/null
+++ b/Documentation/admin-guide/LSM/SELinux.rst
@@ -0,0 +1,33 @@
+=======
+SELinux
+=======
+
+If you want to use SELinux, chances are you will want
+to use the distro-provided policies, or install the
+latest reference policy release from
+
+ http://oss.tresys.com/projects/refpolicy
+
+However, if you want to install a dummy policy for
+testing, you can do using ``mdp`` provided under
+scripts/selinux. Note that this requires the selinux
+userspace to be installed - in particular you will
+need checkpolicy to compile a kernel, and setfiles and
+fixfiles to label the filesystem.
+
+ 1. Compile the kernel with selinux enabled.
+ 2. Type ``make`` to compile ``mdp``.
+ 3. Make sure that you are not running with
+ SELinux enabled and a real policy. If
+ you are, reboot with selinux disabled
+ before continuing.
+ 4. Run install_policy.sh::
+
+ cd scripts/selinux
+ sh install_policy.sh
+
+Step 4 will create a new dummy policy valid for your
+kernel, with a single selinux user, role, and type.
+It will compile the policy, will set your ``SELINUXTYPE`` to
+``dummy`` in ``/etc/selinux/config``, install the compiled policy
+as ``dummy``, and relabel your filesystem.
diff --git a/Documentation/admin-guide/LSM/Smack.rst b/Documentation/admin-guide/LSM/Smack.rst
new file mode 100644
index 000000000000..6a5826a13aea
--- /dev/null
+++ b/Documentation/admin-guide/LSM/Smack.rst
@@ -0,0 +1,857 @@
+=====
+Smack
+=====
+
+
+ "Good for you, you've decided to clean the elevator!"
+ - The Elevator, from Dark Star
+
+Smack is the Simplified Mandatory Access Control Kernel.
+Smack is a kernel based implementation of mandatory access
+control that includes simplicity in its primary design goals.
+
+Smack is not the only Mandatory Access Control scheme
+available for Linux. Those new to Mandatory Access Control
+are encouraged to compare Smack with the other mechanisms
+available to determine which is best suited to the problem
+at hand.
+
+Smack consists of three major components:
+
+ - The kernel
+ - Basic utilities, which are helpful but not required
+ - Configuration data
+
+The kernel component of Smack is implemented as a Linux
+Security Modules (LSM) module. It requires netlabel and
+works best with file systems that support extended attributes,
+although xattr support is not strictly required.
+It is safe to run a Smack kernel under a "vanilla" distribution.
+
+Smack kernels use the CIPSO IP option. Some network
+configurations are intolerant of IP options and can impede
+access to systems that use them as Smack does.
+
+Smack is used in the Tizen operating system. Please
+go to http://wiki.tizen.org for information about how
+Smack is used in Tizen.
+
+The current git repository for Smack user space is:
+
+ git://github.com/smack-team/smack.git
+
+This should make and install on most modern distributions.
+There are five commands included in smackutil:
+
+chsmack:
+ display or set Smack extended attribute values
+
+smackctl:
+ load the Smack access rules
+
+smackaccess:
+ report if a process with one label has access
+ to an object with another
+
+These two commands are obsolete with the introduction of
+the smackfs/load2 and smackfs/cipso2 interfaces.
+
+smackload:
+ properly formats data for writing to smackfs/load
+
+smackcipso:
+ properly formats data for writing to smackfs/cipso
+
+In keeping with the intent of Smack, configuration data is
+minimal and not strictly required. The most important
+configuration step is mounting the smackfs pseudo filesystem.
+If smackutil is installed the startup script will take care
+of this, but it can be manually as well.
+
+Add this line to ``/etc/fstab``::
+
+ smackfs /sys/fs/smackfs smackfs defaults 0 0
+
+The ``/sys/fs/smackfs`` directory is created by the kernel.
+
+Smack uses extended attributes (xattrs) to store labels on filesystem
+objects. The attributes are stored in the extended attribute security
+name space. A process must have ``CAP_MAC_ADMIN`` to change any of these
+attributes.
+
+The extended attributes that Smack uses are:
+
+SMACK64
+ Used to make access control decisions. In almost all cases
+ the label given to a new filesystem object will be the label
+ of the process that created it.
+
+SMACK64EXEC
+ The Smack label of a process that execs a program file with
+ this attribute set will run with this attribute's value.
+
+SMACK64MMAP
+ Don't allow the file to be mmapped by a process whose Smack
+ label does not allow all of the access permitted to a process
+ with the label contained in this attribute. This is a very
+ specific use case for shared libraries.
+
+SMACK64TRANSMUTE
+ Can only have the value "TRUE". If this attribute is present
+ on a directory when an object is created in the directory and
+ the Smack rule (more below) that permitted the write access
+ to the directory includes the transmute ("t") mode the object
+ gets the label of the directory instead of the label of the
+ creating process. If the object being created is a directory
+ the SMACK64TRANSMUTE attribute is set as well.
+
+SMACK64IPIN
+ This attribute is only available on file descriptors for sockets.
+ Use the Smack label in this attribute for access control
+ decisions on packets being delivered to this socket.
+
+SMACK64IPOUT
+ This attribute is only available on file descriptors for sockets.
+ Use the Smack label in this attribute for access control
+ decisions on packets coming from this socket.
+
+There are multiple ways to set a Smack label on a file::
+
+ # attr -S -s SMACK64 -V "value" path
+ # chsmack -a value path
+
+A process can see the Smack label it is running with by
+reading ``/proc/self/attr/current``. A process with ``CAP_MAC_ADMIN``
+can set the process Smack by writing there.
+
+Most Smack configuration is accomplished by writing to files
+in the smackfs filesystem. This pseudo-filesystem is mounted
+on ``/sys/fs/smackfs``.
+
+access
+ Provided for backward compatibility. The access2 interface
+ is preferred and should be used instead.
+ This interface reports whether a subject with the specified
+ Smack label has a particular access to an object with a
+ specified Smack label. Write a fixed format access rule to
+ this file. The next read will indicate whether the access
+ would be permitted. The text will be either "1" indicating
+ access, or "0" indicating denial.
+
+access2
+ This interface reports whether a subject with the specified
+ Smack label has a particular access to an object with a
+ specified Smack label. Write a long format access rule to
+ this file. The next read will indicate whether the access
+ would be permitted. The text will be either "1" indicating
+ access, or "0" indicating denial.
+
+ambient
+ This contains the Smack label applied to unlabeled network
+ packets.
+
+change-rule
+ This interface allows modification of existing access control rules.
+ The format accepted on write is::
+
+ "%s %s %s %s"
+
+ where the first string is the subject label, the second the
+ object label, the third the access to allow and the fourth the
+ access to deny. The access strings may contain only the characters
+ "rwxat-". If a rule for a given subject and object exists it will be
+ modified by enabling the permissions in the third string and disabling
+ those in the fourth string. If there is no such rule it will be
+ created using the access specified in the third and the fourth strings.
+
+cipso
+ Provided for backward compatibility. The cipso2 interface
+ is preferred and should be used instead.
+ This interface allows a specific CIPSO header to be assigned
+ to a Smack label. The format accepted on write is::
+
+ "%24s%4d%4d"["%4d"]...
+
+ The first string is a fixed Smack label. The first number is
+ the level to use. The second number is the number of categories.
+ The following numbers are the categories::
+
+ "level-3-cats-5-19 3 2 5 19"
+
+cipso2
+ This interface allows a specific CIPSO header to be assigned
+ to a Smack label. The format accepted on write is::
+
+ "%s%4d%4d"["%4d"]...
+
+ The first string is a long Smack label. The first number is
+ the level to use. The second number is the number of categories.
+ The following numbers are the categories::
+
+ "level-3-cats-5-19 3 2 5 19"
+
+direct
+ This contains the CIPSO level used for Smack direct label
+ representation in network packets.
+
+doi
+ This contains the CIPSO domain of interpretation used in
+ network packets.
+
+ipv6host
+ This interface allows specific IPv6 internet addresses to be
+ treated as single label hosts. Packets are sent to single
+ label hosts only from processes that have Smack write access
+ to the host label. All packets received from single label hosts
+ are given the specified label. The format accepted on write is::
+
+ "%h:%h:%h:%h:%h:%h:%h:%h label" or
+ "%h:%h:%h:%h:%h:%h:%h:%h/%d label".
+
+ The "::" address shortcut is not supported.
+ If label is "-DELETE" a matched entry will be deleted.
+
+load
+ Provided for backward compatibility. The load2 interface
+ is preferred and should be used instead.
+ This interface allows access control rules in addition to
+ the system defined rules to be specified. The format accepted
+ on write is::
+
+ "%24s%24s%5s"
+
+ where the first string is the subject label, the second the
+ object label, and the third the requested access. The access
+ string may contain only the characters "rwxat-", and specifies
+ which sort of access is allowed. The "-" is a placeholder for
+ permissions that are not allowed. The string "r-x--" would
+ specify read and execute access. Labels are limited to 23
+ characters in length.
+
+load2
+ This interface allows access control rules in addition to
+ the system defined rules to be specified. The format accepted
+ on write is::
+
+ "%s %s %s"
+
+ where the first string is the subject label, the second the
+ object label, and the third the requested access. The access
+ string may contain only the characters "rwxat-", and specifies
+ which sort of access is allowed. The "-" is a placeholder for
+ permissions that are not allowed. The string "r-x--" would
+ specify read and execute access.
+
+load-self
+ Provided for backward compatibility. The load-self2 interface
+ is preferred and should be used instead.
+ This interface allows process specific access rules to be
+ defined. These rules are only consulted if access would
+ otherwise be permitted, and are intended to provide additional
+ restrictions on the process. The format is the same as for
+ the load interface.
+
+load-self2
+ This interface allows process specific access rules to be
+ defined. These rules are only consulted if access would
+ otherwise be permitted, and are intended to provide additional
+ restrictions on the process. The format is the same as for
+ the load2 interface.
+
+logging
+ This contains the Smack logging state.
+
+mapped
+ This contains the CIPSO level used for Smack mapped label
+ representation in network packets.
+
+netlabel
+ This interface allows specific internet addresses to be
+ treated as single label hosts. Packets are sent to single
+ label hosts without CIPSO headers, but only from processes
+ that have Smack write access to the host label. All packets
+ received from single label hosts are given the specified
+ label. The format accepted on write is::
+
+ "%d.%d.%d.%d label" or "%d.%d.%d.%d/%d label".
+
+ If the label specified is "-CIPSO" the address is treated
+ as a host that supports CIPSO headers.
+
+onlycap
+ This contains labels processes must have for CAP_MAC_ADMIN
+ and ``CAP_MAC_OVERRIDE`` to be effective. If this file is empty
+ these capabilities are effective at for processes with any
+ label. The values are set by writing the desired labels, separated
+ by spaces, to the file or cleared by writing "-" to the file.
+
+ptrace
+ This is used to define the current ptrace policy
+
+ 0 - default:
+ this is the policy that relies on Smack access rules.
+ For the ``PTRACE_READ`` a subject needs to have a read access on
+ object. For the ``PTRACE_ATTACH`` a read-write access is required.
+
+ 1 - exact:
+ this is the policy that limits ``PTRACE_ATTACH``. Attach is
+ only allowed when subject's and object's labels are equal.
+ ``PTRACE_READ`` is not affected. Can be overridden with ``CAP_SYS_PTRACE``.
+
+ 2 - draconian:
+ this policy behaves like the 'exact' above with an
+ exception that it can't be overridden with ``CAP_SYS_PTRACE``.
+
+revoke-subject
+ Writing a Smack label here sets the access to '-' for all access
+ rules with that subject label.
+
+unconfined
+ If the kernel is configured with ``CONFIG_SECURITY_SMACK_BRINGUP``
+ a process with ``CAP_MAC_ADMIN`` can write a label into this interface.
+ Thereafter, accesses that involve that label will be logged and
+ the access permitted if it wouldn't be otherwise. Note that this
+ is dangerous and can ruin the proper labeling of your system.
+ It should never be used in production.
+
+relabel-self
+ This interface contains a list of labels to which the process can
+ transition to, by writing to ``/proc/self/attr/current``.
+ Normally a process can change its own label to any legal value, but only
+ if it has ``CAP_MAC_ADMIN``. This interface allows a process without
+ ``CAP_MAC_ADMIN`` to relabel itself to one of labels from predefined list.
+ A process without ``CAP_MAC_ADMIN`` can change its label only once. When it
+ does, this list will be cleared.
+ The values are set by writing the desired labels, separated
+ by spaces, to the file or cleared by writing "-" to the file.
+
+If you are using the smackload utility
+you can add access rules in ``/etc/smack/accesses``. They take the form::
+
+ subjectlabel objectlabel access
+
+access is a combination of the letters rwxatb which specify the
+kind of access permitted a subject with subjectlabel on an
+object with objectlabel. If there is no rule no access is allowed.
+
+Look for additional programs on http://schaufler-ca.com
+
+The Simplified Mandatory Access Control Kernel (Whitepaper)
+===========================================================
+
+Casey Schaufler
+casey@schaufler-ca.com
+
+Mandatory Access Control
+------------------------
+
+Computer systems employ a variety of schemes to constrain how information is
+shared among the people and services using the machine. Some of these schemes
+allow the program or user to decide what other programs or users are allowed
+access to pieces of data. These schemes are called discretionary access
+control mechanisms because the access control is specified at the discretion
+of the user. Other schemes do not leave the decision regarding what a user or
+program can access up to users or programs. These schemes are called mandatory
+access control mechanisms because you don't have a choice regarding the users
+or programs that have access to pieces of data.
+
+Bell & LaPadula
+---------------
+
+From the middle of the 1980's until the turn of the century Mandatory Access
+Control (MAC) was very closely associated with the Bell & LaPadula security
+model, a mathematical description of the United States Department of Defense
+policy for marking paper documents. MAC in this form enjoyed a following
+within the Capital Beltway and Scandinavian supercomputer centers but was
+often sited as failing to address general needs.
+
+Domain Type Enforcement
+-----------------------
+
+Around the turn of the century Domain Type Enforcement (DTE) became popular.
+This scheme organizes users, programs, and data into domains that are
+protected from each other. This scheme has been widely deployed as a component
+of popular Linux distributions. The administrative overhead required to
+maintain this scheme and the detailed understanding of the whole system
+necessary to provide a secure domain mapping leads to the scheme being
+disabled or used in limited ways in the majority of cases.
+
+Smack
+-----
+
+Smack is a Mandatory Access Control mechanism designed to provide useful MAC
+while avoiding the pitfalls of its predecessors. The limitations of Bell &
+LaPadula are addressed by providing a scheme whereby access can be controlled
+according to the requirements of the system and its purpose rather than those
+imposed by an arcane government policy. The complexity of Domain Type
+Enforcement and avoided by defining access controls in terms of the access
+modes already in use.
+
+Smack Terminology
+-----------------
+
+The jargon used to talk about Smack will be familiar to those who have dealt
+with other MAC systems and shouldn't be too difficult for the uninitiated to
+pick up. There are four terms that are used in a specific way and that are
+especially important:
+
+ Subject:
+ A subject is an active entity on the computer system.
+ On Smack a subject is a task, which is in turn the basic unit
+ of execution.
+
+ Object:
+ An object is a passive entity on the computer system.
+ On Smack files of all types, IPC, and tasks can be objects.
+
+ Access:
+ Any attempt by a subject to put information into or get
+ information from an object is an access.
+
+ Label:
+ Data that identifies the Mandatory Access Control
+ characteristics of a subject or an object.
+
+These definitions are consistent with the traditional use in the security
+community. There are also some terms from Linux that are likely to crop up:
+
+ Capability:
+ A task that possesses a capability has permission to
+ violate an aspect of the system security policy, as identified by
+ the specific capability. A task that possesses one or more
+ capabilities is a privileged task, whereas a task with no
+ capabilities is an unprivileged task.
+
+ Privilege:
+ A task that is allowed to violate the system security
+ policy is said to have privilege. As of this writing a task can
+ have privilege either by possessing capabilities or by having an
+ effective user of root.
+
+Smack Basics
+------------
+
+Smack is an extension to a Linux system. It enforces additional restrictions
+on what subjects can access which objects, based on the labels attached to
+each of the subject and the object.
+
+Labels
+~~~~~~
+
+Smack labels are ASCII character strings. They can be up to 255 characters
+long, but keeping them to twenty-three characters is recommended.
+Single character labels using special characters, that being anything
+other than a letter or digit, are reserved for use by the Smack development
+team. Smack labels are unstructured, case sensitive, and the only operation
+ever performed on them is comparison for equality. Smack labels cannot
+contain unprintable characters, the "/" (slash), the "\" (backslash), the "'"
+(quote) and '"' (double-quote) characters.
+Smack labels cannot begin with a '-'. This is reserved for special options.
+
+There are some predefined labels::
+
+ _ Pronounced "floor", a single underscore character.
+ ^ Pronounced "hat", a single circumflex character.
+ * Pronounced "star", a single asterisk character.
+ ? Pronounced "huh", a single question mark character.
+ @ Pronounced "web", a single at sign character.
+
+Every task on a Smack system is assigned a label. The Smack label
+of a process will usually be assigned by the system initialization
+mechanism.
+
+Access Rules
+~~~~~~~~~~~~
+
+Smack uses the traditional access modes of Linux. These modes are read,
+execute, write, and occasionally append. There are a few cases where the
+access mode may not be obvious. These include:
+
+ Signals:
+ A signal is a write operation from the subject task to
+ the object task.
+
+ Internet Domain IPC:
+ Transmission of a packet is considered a
+ write operation from the source task to the destination task.
+
+Smack restricts access based on the label attached to a subject and the label
+attached to the object it is trying to access. The rules enforced are, in
+order:
+
+ 1. Any access requested by a task labeled "*" is denied.
+ 2. A read or execute access requested by a task labeled "^"
+ is permitted.
+ 3. A read or execute access requested on an object labeled "_"
+ is permitted.
+ 4. Any access requested on an object labeled "*" is permitted.
+ 5. Any access requested by a task on an object with the same
+ label is permitted.
+ 6. Any access requested that is explicitly defined in the loaded
+ rule set is permitted.
+ 7. Any other access is denied.
+
+Smack Access Rules
+~~~~~~~~~~~~~~~~~~
+
+With the isolation provided by Smack access separation is simple. There are
+many interesting cases where limited access by subjects to objects with
+different labels is desired. One example is the familiar spy model of
+sensitivity, where a scientist working on a highly classified project would be
+able to read documents of lower classifications and anything she writes will
+be "born" highly classified. To accommodate such schemes Smack includes a
+mechanism for specifying rules allowing access between labels.
+
+Access Rule Format
+~~~~~~~~~~~~~~~~~~
+
+The format of an access rule is::
+
+ subject-label object-label access
+
+Where subject-label is the Smack label of the task, object-label is the Smack
+label of the thing being accessed, and access is a string specifying the sort
+of access allowed. The access specification is searched for letters that
+describe access modes:
+
+ a: indicates that append access should be granted.
+ r: indicates that read access should be granted.
+ w: indicates that write access should be granted.
+ x: indicates that execute access should be granted.
+ t: indicates that the rule requests transmutation.
+ b: indicates that the rule should be reported for bring-up.
+
+Uppercase values for the specification letters are allowed as well.
+Access mode specifications can be in any order. Examples of acceptable rules
+are::
+
+ TopSecret Secret rx
+ Secret Unclass R
+ Manager Game x
+ User HR w
+ Snap Crackle rwxatb
+ New Old rRrRr
+ Closed Off -
+
+Examples of unacceptable rules are::
+
+ Top Secret Secret rx
+ Ace Ace r
+ Odd spells waxbeans
+
+Spaces are not allowed in labels. Since a subject always has access to files
+with the same label specifying a rule for that case is pointless. Only
+valid letters (rwxatbRWXATB) and the dash ('-') character are allowed in
+access specifications. The dash is a placeholder, so "a-r" is the same
+as "ar". A lone dash is used to specify that no access should be allowed.
+
+Applying Access Rules
+~~~~~~~~~~~~~~~~~~~~~
+
+The developers of Linux rarely define new sorts of things, usually importing
+schemes and concepts from other systems. Most often, the other systems are
+variants of Unix. Unix has many endearing properties, but consistency of
+access control models is not one of them. Smack strives to treat accesses as
+uniformly as is sensible while keeping with the spirit of the underlying
+mechanism.
+
+File system objects including files, directories, named pipes, symbolic links,
+and devices require access permissions that closely match those used by mode
+bit access. To open a file for reading read access is required on the file. To
+search a directory requires execute access. Creating a file with write access
+requires both read and write access on the containing directory. Deleting a
+file requires read and write access to the file and to the containing
+directory. It is possible that a user may be able to see that a file exists
+but not any of its attributes by the circumstance of having read access to the
+containing directory but not to the differently labeled file. This is an
+artifact of the file name being data in the directory, not a part of the file.
+
+If a directory is marked as transmuting (SMACK64TRANSMUTE=TRUE) and the
+access rule that allows a process to create an object in that directory
+includes 't' access the label assigned to the new object will be that
+of the directory, not the creating process. This makes it much easier
+for two processes with different labels to share data without granting
+access to all of their files.
+
+IPC objects, message queues, semaphore sets, and memory segments exist in flat
+namespaces and access requests are only required to match the object in
+question.
+
+Process objects reflect tasks on the system and the Smack label used to access
+them is the same Smack label that the task would use for its own access
+attempts. Sending a signal via the kill() system call is a write operation
+from the signaler to the recipient. Debugging a process requires both reading
+and writing. Creating a new task is an internal operation that results in two
+tasks with identical Smack labels and requires no access checks.
+
+Sockets are data structures attached to processes and sending a packet from
+one process to another requires that the sender have write access to the
+receiver. The receiver is not required to have read access to the sender.
+
+Setting Access Rules
+~~~~~~~~~~~~~~~~~~~~
+
+The configuration file /etc/smack/accesses contains the rules to be set at
+system startup. The contents are written to the special file
+/sys/fs/smackfs/load2. Rules can be added at any time and take effect
+immediately. For any pair of subject and object labels there can be only
+one rule, with the most recently specified overriding any earlier
+specification.
+
+Task Attribute
+~~~~~~~~~~~~~~
+
+The Smack label of a process can be read from /proc/<pid>/attr/current. A
+process can read its own Smack label from /proc/self/attr/current. A
+privileged process can change its own Smack label by writing to
+/proc/self/attr/current but not the label of another process.
+
+File Attribute
+~~~~~~~~~~~~~~
+
+The Smack label of a filesystem object is stored as an extended attribute
+named SMACK64 on the file. This attribute is in the security namespace. It can
+only be changed by a process with privilege.
+
+Privilege
+~~~~~~~~~
+
+A process with CAP_MAC_OVERRIDE or CAP_MAC_ADMIN is privileged.
+CAP_MAC_OVERRIDE allows the process access to objects it would
+be denied otherwise. CAP_MAC_ADMIN allows a process to change
+Smack data, including rules and attributes.
+
+Smack Networking
+~~~~~~~~~~~~~~~~
+
+As mentioned before, Smack enforces access control on network protocol
+transmissions. Every packet sent by a Smack process is tagged with its Smack
+label. This is done by adding a CIPSO tag to the header of the IP packet. Each
+packet received is expected to have a CIPSO tag that identifies the label and
+if it lacks such a tag the network ambient label is assumed. Before the packet
+is delivered a check is made to determine that a subject with the label on the
+packet has write access to the receiving process and if that is not the case
+the packet is dropped.
+
+CIPSO Configuration
+~~~~~~~~~~~~~~~~~~~
+
+It is normally unnecessary to specify the CIPSO configuration. The default
+values used by the system handle all internal cases. Smack will compose CIPSO
+label values to match the Smack labels being used without administrative
+intervention. Unlabeled packets that come into the system will be given the
+ambient label.
+
+Smack requires configuration in the case where packets from a system that is
+not Smack that speaks CIPSO may be encountered. Usually this will be a Trusted
+Solaris system, but there are other, less widely deployed systems out there.
+CIPSO provides 3 important values, a Domain Of Interpretation (DOI), a level,
+and a category set with each packet. The DOI is intended to identify a group
+of systems that use compatible labeling schemes, and the DOI specified on the
+Smack system must match that of the remote system or packets will be
+discarded. The DOI is 3 by default. The value can be read from
+/sys/fs/smackfs/doi and can be changed by writing to /sys/fs/smackfs/doi.
+
+The label and category set are mapped to a Smack label as defined in
+/etc/smack/cipso.
+
+A Smack/CIPSO mapping has the form::
+
+ smack level [category [category]*]
+
+Smack does not expect the level or category sets to be related in any
+particular way and does not assume or assign accesses based on them. Some
+examples of mappings::
+
+ TopSecret 7
+ TS:A,B 7 1 2
+ SecBDE 5 2 4 6
+ RAFTERS 7 12 26
+
+The ":" and "," characters are permitted in a Smack label but have no special
+meaning.
+
+The mapping of Smack labels to CIPSO values is defined by writing to
+/sys/fs/smackfs/cipso2.
+
+In addition to explicit mappings Smack supports direct CIPSO mappings. One
+CIPSO level is used to indicate that the category set passed in the packet is
+in fact an encoding of the Smack label. The level used is 250 by default. The
+value can be read from /sys/fs/smackfs/direct and changed by writing to
+/sys/fs/smackfs/direct.
+
+Socket Attributes
+~~~~~~~~~~~~~~~~~
+
+There are two attributes that are associated with sockets. These attributes
+can only be set by privileged tasks, but any task can read them for their own
+sockets.
+
+ SMACK64IPIN:
+ The Smack label of the task object. A privileged
+ program that will enforce policy may set this to the star label.
+
+ SMACK64IPOUT:
+ The Smack label transmitted with outgoing packets.
+ A privileged program may set this to match the label of another
+ task with which it hopes to communicate.
+
+Smack Netlabel Exceptions
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You will often find that your labeled application has to talk to the outside,
+unlabeled world. To do this there's a special file /sys/fs/smackfs/netlabel
+where you can add some exceptions in the form of::
+
+ @IP1 LABEL1 or
+ @IP2/MASK LABEL2
+
+It means that your application will have unlabeled access to @IP1 if it has
+write access on LABEL1, and access to the subnet @IP2/MASK if it has write
+access on LABEL2.
+
+Entries in the /sys/fs/smackfs/netlabel file are matched by longest mask
+first, like in classless IPv4 routing.
+
+A special label '@' and an option '-CIPSO' can be used there::
+
+ @ means Internet, any application with any label has access to it
+ -CIPSO means standard CIPSO networking
+
+If you don't know what CIPSO is and don't plan to use it, you can just do::
+
+ echo 127.0.0.1 -CIPSO > /sys/fs/smackfs/netlabel
+ echo 0.0.0.0/0 @ > /sys/fs/smackfs/netlabel
+
+If you use CIPSO on your 192.168.0.0/16 local network and need also unlabeled
+Internet access, you can have::
+
+ echo 127.0.0.1 -CIPSO > /sys/fs/smackfs/netlabel
+ echo 192.168.0.0/16 -CIPSO > /sys/fs/smackfs/netlabel
+ echo 0.0.0.0/0 @ > /sys/fs/smackfs/netlabel
+
+Writing Applications for Smack
+------------------------------
+
+There are three sorts of applications that will run on a Smack system. How an
+application interacts with Smack will determine what it will have to do to
+work properly under Smack.
+
+Smack Ignorant Applications
+---------------------------
+
+By far the majority of applications have no reason whatever to care about the
+unique properties of Smack. Since invoking a program has no impact on the
+Smack label associated with the process the only concern likely to arise is
+whether the process has execute access to the program.
+
+Smack Relevant Applications
+---------------------------
+
+Some programs can be improved by teaching them about Smack, but do not make
+any security decisions themselves. The utility ls(1) is one example of such a
+program.
+
+Smack Enforcing Applications
+----------------------------
+
+These are special programs that not only know about Smack, but participate in
+the enforcement of system policy. In most cases these are the programs that
+set up user sessions. There are also network services that provide information
+to processes running with various labels.
+
+File System Interfaces
+----------------------
+
+Smack maintains labels on file system objects using extended attributes. The
+Smack label of a file, directory, or other file system object can be obtained
+using getxattr(2)::
+
+ len = getxattr("/", "security.SMACK64", value, sizeof (value));
+
+will put the Smack label of the root directory into value. A privileged
+process can set the Smack label of a file system object with setxattr(2)::
+
+ len = strlen("Rubble");
+ rc = setxattr("/foo", "security.SMACK64", "Rubble", len, 0);
+
+will set the Smack label of /foo to "Rubble" if the program has appropriate
+privilege.
+
+Socket Interfaces
+-----------------
+
+The socket attributes can be read using fgetxattr(2).
+
+A privileged process can set the Smack label of outgoing packets with
+fsetxattr(2)::
+
+ len = strlen("Rubble");
+ rc = fsetxattr(fd, "security.SMACK64IPOUT", "Rubble", len, 0);
+
+will set the Smack label "Rubble" on packets going out from the socket if the
+program has appropriate privilege::
+
+ rc = fsetxattr(fd, "security.SMACK64IPIN, "*", strlen("*"), 0);
+
+will set the Smack label "*" as the object label against which incoming
+packets will be checked if the program has appropriate privilege.
+
+Administration
+--------------
+
+Smack supports some mount options:
+
+ smackfsdef=label:
+ specifies the label to give files that lack
+ the Smack label extended attribute.
+
+ smackfsroot=label:
+ specifies the label to assign the root of the
+ file system if it lacks the Smack extended attribute.
+
+ smackfshat=label:
+ specifies a label that must have read access to
+ all labels set on the filesystem. Not yet enforced.
+
+ smackfsfloor=label:
+ specifies a label to which all labels set on the
+ filesystem must have read access. Not yet enforced.
+
+These mount options apply to all file system types.
+
+Smack auditing
+--------------
+
+If you want Smack auditing of security events, you need to set CONFIG_AUDIT
+in your kernel configuration.
+By default, all denied events will be audited. You can change this behavior by
+writing a single character to the /sys/fs/smackfs/logging file::
+
+ 0 : no logging
+ 1 : log denied (default)
+ 2 : log accepted
+ 3 : log denied & accepted
+
+Events are logged as 'key=value' pairs, for each event you at least will get
+the subject, the object, the rights requested, the action, the kernel function
+that triggered the event, plus other pairs depending on the type of event
+audited.
+
+Bringup Mode
+------------
+
+Bringup mode provides logging features that can make application
+configuration and system bringup easier. Configure the kernel with
+CONFIG_SECURITY_SMACK_BRINGUP to enable these features. When bringup
+mode is enabled accesses that succeed due to rules marked with the "b"
+access mode will logged. When a new label is introduced for processes
+rules can be added aggressively, marked with the "b". The logging allows
+tracking of which rules actual get used for that label.
+
+Another feature of bringup mode is the "unconfined" option. Writing
+a label to /sys/fs/smackfs/unconfined makes subjects with that label
+able to access any object, and objects with that label accessible to
+all subjects. Any access that is granted because a label is unconfined
+is logged. This feature is dangerous, as files and directories may
+be created in places they couldn't if the policy were being enforced.
diff --git a/Documentation/admin-guide/LSM/Yama.rst b/Documentation/admin-guide/LSM/Yama.rst
new file mode 100644
index 000000000000..13468ea696b7
--- /dev/null
+++ b/Documentation/admin-guide/LSM/Yama.rst
@@ -0,0 +1,74 @@
+====
+Yama
+====
+
+Yama is a Linux Security Module that collects system-wide DAC security
+protections that are not handled by the core kernel itself. This is
+selectable at build-time with ``CONFIG_SECURITY_YAMA``, and can be controlled
+at run-time through sysctls in ``/proc/sys/kernel/yama``:
+
+ptrace_scope
+============
+
+As Linux grows in popularity, it will become a larger target for
+malware. One particularly troubling weakness of the Linux process
+interfaces is that a single user is able to examine the memory and
+running state of any of their processes. For example, if one application
+(e.g. Pidgin) was compromised, it would be possible for an attacker to
+attach to other running processes (e.g. Firefox, SSH sessions, GPG agent,
+etc) to extract additional credentials and continue to expand the scope
+of their attack without resorting to user-assisted phishing.
+
+This is not a theoretical problem. SSH session hijacking
+(http://www.storm.net.nz/projects/7) and arbitrary code injection
+(http://c-skills.blogspot.com/2007/05/injectso.html) attacks already
+exist and remain possible if ptrace is allowed to operate as before.
+Since ptrace is not commonly used by non-developers and non-admins, system
+builders should be allowed the option to disable this debugging system.
+
+For a solution, some applications use ``prctl(PR_SET_DUMPABLE, ...)`` to
+specifically disallow such ptrace attachment (e.g. ssh-agent), but many
+do not. A more general solution is to only allow ptrace directly from a
+parent to a child process (i.e. direct "gdb EXE" and "strace EXE" still
+work), or with ``CAP_SYS_PTRACE`` (i.e. "gdb --pid=PID", and "strace -p PID"
+still work as root).
+
+In mode 1, software that has defined application-specific relationships
+between a debugging process and its inferior (crash handlers, etc),
+``prctl(PR_SET_PTRACER, pid, ...)`` can be used. An inferior can declare which
+other process (and its descendants) are allowed to call ``PTRACE_ATTACH``
+against it. Only one such declared debugging process can exists for
+each inferior at a time. For example, this is used by KDE, Chromium, and
+Firefox's crash handlers, and by Wine for allowing only Wine processes
+to ptrace each other. If a process wishes to entirely disable these ptrace
+restrictions, it can call ``prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, ...)``
+so that any otherwise allowed process (even those in external pid namespaces)
+may attach.
+
+The sysctl settings (writable only with ``CAP_SYS_PTRACE``) are:
+
+0 - classic ptrace permissions:
+ a process can ``PTRACE_ATTACH`` to any other
+ process running under the same uid, as long as it is dumpable (i.e.
+ did not transition uids, start privileged, or have called
+ ``prctl(PR_SET_DUMPABLE...)`` already). Similarly, ``PTRACE_TRACEME`` is
+ unchanged.
+
+1 - restricted ptrace:
+ a process must have a predefined relationship
+ with the inferior it wants to call ``PTRACE_ATTACH`` on. By default,
+ this relationship is that of only its descendants when the above
+ classic criteria is also met. To change the relationship, an
+ inferior can call ``prctl(PR_SET_PTRACER, debugger, ...)`` to declare
+ an allowed debugger PID to call ``PTRACE_ATTACH`` on the inferior.
+ Using ``PTRACE_TRACEME`` is unchanged.
+
+2 - admin-only attach:
+ only processes with ``CAP_SYS_PTRACE`` may use ptrace
+ with ``PTRACE_ATTACH``, or through children calling ``PTRACE_TRACEME``.
+
+3 - no attach:
+ no processes may use ptrace with ``PTRACE_ATTACH`` nor via
+ ``PTRACE_TRACEME``. Once set, this sysctl value cannot be changed.
+
+The original children-only logic was based on the restrictions in grsecurity.
diff --git a/Documentation/admin-guide/LSM/apparmor.rst b/Documentation/admin-guide/LSM/apparmor.rst
new file mode 100644
index 000000000000..3e9734bd0e05
--- /dev/null
+++ b/Documentation/admin-guide/LSM/apparmor.rst
@@ -0,0 +1,51 @@
+========
+AppArmor
+========
+
+What is AppArmor?
+=================
+
+AppArmor is MAC style security extension for the Linux kernel. It implements
+a task centered policy, with task "profiles" being created and loaded
+from user space. Tasks on the system that do not have a profile defined for
+them run in an unconfined state which is equivalent to standard Linux DAC
+permissions.
+
+How to enable/disable
+=====================
+
+set ``CONFIG_SECURITY_APPARMOR=y``
+
+If AppArmor should be selected as the default security module then set::
+
+ CONFIG_DEFAULT_SECURITY="apparmor"
+ CONFIG_SECURITY_APPARMOR_BOOTPARAM_VALUE=1
+
+Build the kernel
+
+If AppArmor is not the default security module it can be enabled by passing
+``security=apparmor`` on the kernel's command line.
+
+If AppArmor is the default security module it can be disabled by passing
+``apparmor=0, security=XXXX`` (where ``XXXX`` is valid security module), on the
+kernel's command line.
+
+For AppArmor to enforce any restrictions beyond standard Linux DAC permissions
+policy must be loaded into the kernel from user space (see the Documentation
+and tools links).
+
+Documentation
+=============
+
+Documentation can be found on the wiki, linked below.
+
+Links
+=====
+
+Mailing List - apparmor@lists.ubuntu.com
+
+Wiki - http://apparmor.wiki.kernel.org/
+
+User space tools - https://launchpad.net/apparmor
+
+Kernel module - git://git.kernel.org/pub/scm/linux/kernel/git/jj/apparmor-dev.git
diff --git a/Documentation/admin-guide/LSM/index.rst b/Documentation/admin-guide/LSM/index.rst
new file mode 100644
index 000000000000..c980dfe9abf1
--- /dev/null
+++ b/Documentation/admin-guide/LSM/index.rst
@@ -0,0 +1,41 @@
+===========================
+Linux Security Module Usage
+===========================
+
+The Linux Security Module (LSM) framework provides a mechanism for
+various security checks to be hooked by new kernel extensions. The name
+"module" is a bit of a misnomer since these extensions are not actually
+loadable kernel modules. Instead, they are selectable at build-time via
+CONFIG_DEFAULT_SECURITY and can be overridden at boot-time via the
+``"security=..."`` kernel command line argument, in the case where multiple
+LSMs were built into a given kernel.
+
+The primary users of the LSM interface are Mandatory Access Control
+(MAC) extensions which provide a comprehensive security policy. Examples
+include SELinux, Smack, Tomoyo, and AppArmor. In addition to the larger
+MAC extensions, other extensions can be built using the LSM to provide
+specific changes to system operation when these tweaks are not available
+in the core functionality of Linux itself.
+
+Without a specific LSM built into the kernel, the default LSM will be the
+Linux capabilities system. Most LSMs choose to extend the capabilities
+system, building their checks on top of the defined capability hooks.
+For more details on capabilities, see ``capabilities(7)`` in the Linux
+man-pages project.
+
+A list of the active security modules can be found by reading
+``/sys/kernel/security/lsm``. This is a comma separated list, and
+will always include the capability module. The list reflects the
+order in which checks are made. The capability module will always
+be first, followed by any "minor" modules (e.g. Yama) and then
+the one "major" module (e.g. SELinux) if there is one configured.
+
+.. toctree::
+ :maxdepth: 1
+
+ apparmor
+ LoadPin
+ SELinux
+ Smack
+ tomoyo
+ Yama
diff --git a/Documentation/admin-guide/LSM/tomoyo.rst b/Documentation/admin-guide/LSM/tomoyo.rst
new file mode 100644
index 000000000000..a5947218fa64
--- /dev/null
+++ b/Documentation/admin-guide/LSM/tomoyo.rst
@@ -0,0 +1,65 @@
+======
+TOMOYO
+======
+
+What is TOMOYO?
+===============
+
+TOMOYO is a name-based MAC extension (LSM module) for the Linux kernel.
+
+LiveCD-based tutorials are available at
+
+http://tomoyo.sourceforge.jp/1.7/1st-step/ubuntu10.04-live/
+http://tomoyo.sourceforge.jp/1.7/1st-step/centos5-live/
+
+Though these tutorials use non-LSM version of TOMOYO, they are useful for you
+to know what TOMOYO is.
+
+How to enable TOMOYO?
+=====================
+
+Build the kernel with ``CONFIG_SECURITY_TOMOYO=y`` and pass ``security=tomoyo`` on
+kernel's command line.
+
+Please see http://tomoyo.sourceforge.jp/2.3/ for details.
+
+Where is documentation?
+=======================
+
+User <-> Kernel interface documentation is available at
+http://tomoyo.sourceforge.jp/2.3/policy-reference.html .
+
+Materials we prepared for seminars and symposiums are available at
+http://sourceforge.jp/projects/tomoyo/docs/?category_id=532&language_id=1 .
+Below lists are chosen from three aspects.
+
+What is TOMOYO?
+ TOMOYO Linux Overview
+ http://sourceforge.jp/projects/tomoyo/docs/lca2009-takeda.pdf
+ TOMOYO Linux: pragmatic and manageable security for Linux
+ http://sourceforge.jp/projects/tomoyo/docs/freedomhectaipei-tomoyo.pdf
+ TOMOYO Linux: A Practical Method to Understand and Protect Your Own Linux Box
+ http://sourceforge.jp/projects/tomoyo/docs/PacSec2007-en-no-demo.pdf
+
+What can TOMOYO do?
+ Deep inside TOMOYO Linux
+ http://sourceforge.jp/projects/tomoyo/docs/lca2009-kumaneko.pdf
+ The role of "pathname based access control" in security.
+ http://sourceforge.jp/projects/tomoyo/docs/lfj2008-bof.pdf
+
+History of TOMOYO?
+ Realities of Mainlining
+ http://sourceforge.jp/projects/tomoyo/docs/lfj2008.pdf
+
+What is future plan?
+====================
+
+We believe that inode based security and name based security are complementary
+and both should be used together. But unfortunately, so far, we cannot enable
+multiple LSM modules at the same time. We feel sorry that you have to give up
+SELinux/SMACK/AppArmor etc. when you want to use TOMOYO.
+
+We hope that LSM becomes stackable in future. Meanwhile, you can use non-LSM
+version of TOMOYO, available at http://tomoyo.sourceforge.jp/1.7/ .
+LSM version of TOMOYO is a subset of non-LSM version of TOMOYO. We are planning
+to port non-LSM version's functionalities to LSM versions.
diff --git a/Documentation/admin-guide/README.rst b/Documentation/admin-guide/README.rst
index b96e80f79e85..b5343c5aa224 100644
--- a/Documentation/admin-guide/README.rst
+++ b/Documentation/admin-guide/README.rst
@@ -55,12 +55,6 @@ Documentation
contains information about the problems, which may result by upgrading
your kernel.
- - The Documentation/DocBook/ subdirectory contains several guides for
- kernel developers and users. These guides can be rendered in a
- number of formats: PostScript (.ps), PDF, HTML, & man-pages, among others.
- After installation, ``make psdocs``, ``make pdfdocs``, ``make htmldocs``,
- or ``make mandocs`` will render the documentation in the requested format.
-
Installing the kernel source
----------------------------
diff --git a/Documentation/admin-guide/devices.txt b/Documentation/admin-guide/devices.txt
index c9cea2e39c21..6b71852dadc2 100644
--- a/Documentation/admin-guide/devices.txt
+++ b/Documentation/admin-guide/devices.txt
@@ -369,8 +369,10 @@
237 = /dev/loop-control Loopback control device
238 = /dev/vhost-net Host kernel accelerator for virtio net
239 = /dev/uhid User-space I/O driver support for HID subsystem
+ 240 = /dev/userio Serio driver testing device
+ 241 = /dev/vhost-vsock Host kernel driver for virtio vsock
- 240-254 Reserved for local use
+ 242-254 Reserved for local use
255 Reserved for MISC_DYNAMIC_MINOR
11 char Raw keyboard device (Linux/SPARC only)
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index 8c60a8a32a1a..5bb9161dbe6a 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -61,6 +61,8 @@ configure specific aspects of kernel behavior to your liking.
java
ras
pm/index
+ thunderbolt
+ LSM/index
.. only:: subproject and html
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 15f79c27748d..f24ee1c99412 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -649,6 +649,13 @@
/proc/<pid>/coredump_filter.
See also Documentation/filesystems/proc.txt.
+ coresight_cpu_debug.enable
+ [ARM,ARM64]
+ Format: <bool>
+ Enable/disable the CPU sampling based debugging.
+ 0: default value, disable debugging
+ 1: enable debugging at boot time
+
cpuidle.off=1 [CPU_IDLE]
disable the cpuidle sub-system
@@ -720,7 +727,8 @@
See also Documentation/input/joystick-parport.txt
ddebug_query= [KNL,DYNAMIC_DEBUG] Enable debug messages at early boot
- time. See Documentation/dynamic-debug-howto.txt for
+ time. See
+ Documentation/admin-guide/dynamic-debug-howto.rst for
details. Deprecated, see dyndbg.
debug [KNL] Enable kernel debugging (events log level).
@@ -866,6 +874,15 @@
dscc4.setup= [NET]
+ dt_cpu_ftrs= [PPC]
+ Format: {"off" | "known"}
+ Control how the dt_cpu_ftrs device-tree binding is
+ used for CPU feature discovery and setup (if it
+ exists).
+ off: Do not use it, fall back to legacy cpu table.
+ known: Do not pass through unknown features to guests
+ or userspace, only those that the kernel is aware of.
+
dump_apple_properties [X86]
Dump name and content of EFI device properties on
x86 Macs. Useful for driver authors to determine
@@ -874,7 +891,8 @@
dyndbg[="val"] [KNL,DYNAMIC_DEBUG]
module.dyndbg[="val"]
Enable debug messages at boot time. See
- Documentation/dynamic-debug-howto.txt for details.
+ Documentation/admin-guide/dynamic-debug-howto.rst
+ for details.
nompx [X86] Disables Intel Memory Protection Extensions.
See Documentation/x86/intel_mpx.txt for more
@@ -945,6 +963,12 @@
must already be setup and configured. Options are not
yet supported.
+ owl,<addr>
+ Start an early, polled-mode console on a serial port
+ of an Actions Semi SoC, such as S500 or S900, at the
+ specified address. The serial port must already be
+ setup and configured. Options are not yet supported.
+
smh Use ARM semihosting calls for early console.
s3c2410,<addr>
@@ -1477,12 +1501,21 @@
in crypto/hash_info.h.
ima_policy= [IMA]
- The builtin measurement policy to load during IMA
- setup. Specyfing "tcb" as the value, measures all
- programs exec'd, files mmap'd for exec, and all files
- opened with the read mode bit set by either the
- effective uid (euid=0) or uid=0.
- Format: "tcb"
+ The builtin policies to load during IMA setup.
+ Format: "tcb | appraise_tcb | secure_boot"
+
+ The "tcb" policy measures all programs exec'd, files
+ mmap'd for exec, and all files opened with the read
+ mode bit set by either the effective uid (euid=0) or
+ uid=0.
+
+ The "appraise_tcb" policy appraises the integrity of
+ all files owned by root. (This is the equivalent
+ of ima_appraise_tcb.)
+
+ The "secure_boot" policy appraises the integrity
+ of files (eg. kexec kernel image, kernel modules,
+ firmware, policy, etc) based on file signatures.
ima_tcb [IMA] Deprecated. Use ima_policy= instead.
Load a policy which meets the needs of the Trusted
@@ -2127,6 +2160,12 @@
memmap=nn[KMG]@ss[KMG]
[KNL] Force usage of a specific region of memory.
Region of memory to be used is from ss to ss+nn.
+ If @ss[KMG] is omitted, it is equivalent to mem=nn[KMG],
+ which limits max address to nn[KMG].
+ Multiple different regions can be specified,
+ comma delimited.
+ Example:
+ memmap=100M@2G,100M#3G,1G!1024G
memmap=nn[KMG]#ss[KMG]
[KNL,ACPI] Mark specific memory as ACPI data.
@@ -2139,6 +2178,9 @@
memmap=64K$0x18690000
or
memmap=0x10000$0x18690000
+ Some bootloaders may need an escape character before '$',
+ like Grub2, otherwise '$' and the following number
+ will be eaten.
memmap=nn[KMG]!ss[KMG]
[KNL,X86] Mark specific memory as protected.
@@ -3229,21 +3271,17 @@
rcutree.gp_cleanup_delay= [KNL]
Set the number of jiffies to delay each step of
- RCU grace-period cleanup. This only has effect
- when CONFIG_RCU_TORTURE_TEST_SLOW_CLEANUP is set.
+ RCU grace-period cleanup.
rcutree.gp_init_delay= [KNL]
Set the number of jiffies to delay each step of
- RCU grace-period initialization. This only has
- effect when CONFIG_RCU_TORTURE_TEST_SLOW_INIT
- is set.
+ RCU grace-period initialization.
rcutree.gp_preinit_delay= [KNL]
Set the number of jiffies to delay each step of
RCU grace-period pre-initialization, that is,
the propagation of recent CPU-hotplug changes up
- the rcu_node combining tree. This only has effect
- when CONFIG_RCU_TORTURE_TEST_SLOW_PREINIT is set.
+ the rcu_node combining tree.
rcutree.rcu_fanout_exact= [KNL]
Disable autobalancing of the rcu_node combining
@@ -3319,6 +3357,17 @@
This wake_up() will be accompanied by a
WARN_ONCE() splat and an ftrace_dump().
+ rcuperf.gp_async= [KNL]
+ Measure performance of asynchronous
+ grace-period primitives such as call_rcu().
+
+ rcuperf.gp_async_max= [KNL]
+ Specify the maximum number of outstanding
+ callbacks per writer thread. When a writer
+ thread exceeds this limit, it invokes the
+ corresponding flavor of rcu_barrier() to allow
+ previously posted callbacks to drain.
+
rcuperf.gp_exp= [KNL]
Measure performance of expedited synchronous
grace-period primitives.
@@ -3346,17 +3395,22 @@
rcuperf.perf_runnable= [BOOT]
Start rcuperf running at boot time.
+ rcuperf.perf_type= [KNL]
+ Specify the RCU implementation to test.
+
rcuperf.shutdown= [KNL]
Shut the system down after performance tests
complete. This is useful for hands-off automated
testing.
- rcuperf.perf_type= [KNL]
- Specify the RCU implementation to test.
-
rcuperf.verbose= [KNL]
Enable additional printk() statements.
+ rcuperf.writer_holdoff= [KNL]
+ Write-side holdoff between grace periods,
+ in microseconds. The default of zero says
+ no holdoff.
+
rcutorture.cbflood_inter_holdoff= [KNL]
Set holdoff time (jiffies) between successive
callback-flood tests.
@@ -3794,6 +3848,15 @@
spia_pedr=
spia_peddr=
+ srcutree.counter_wrap_check [KNL]
+ Specifies how frequently to check for
+ grace-period sequence counter wrap for the
+ srcu_data structure's ->srcu_gp_seq_needed field.
+ The greater the number of bits set in this kernel
+ parameter, the less frequently counter wrap will
+ be checked for. Note that the bottom two bits
+ are ignored.
+
srcutree.exp_holdoff [KNL]
Specifies how many nanoseconds must elapse
since the end of the last SRCU grace period for
@@ -3802,6 +3865,13 @@
expediting. Set to zero to disable automatic
expediting.
+ stack_guard_gap= [MM]
+ override the default stack gap protection. The value
+ is in page units and it defines how many pages prior
+ to (for stacks growing down) resp. after (for stacks
+ growing up) the main stack are reserved for no other
+ mapping. Default value is 256 pages.
+
stacktrace [FTRACE]
Enabled the stack tracer on boot up.
diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst
index 289c80f7760e..463cf7e73db8 100644
--- a/Documentation/admin-guide/pm/cpufreq.rst
+++ b/Documentation/admin-guide/pm/cpufreq.rst
@@ -1,4 +1,5 @@
.. |struct cpufreq_policy| replace:: :c:type:`struct cpufreq_policy <cpufreq_policy>`
+.. |intel_pstate| replace:: :doc:`intel_pstate <intel_pstate>`
=======================
CPU Performance Scaling
@@ -75,7 +76,7 @@ feedback registers, as that information is typically specific to the hardware
interface it comes from and may not be easily represented in an abstract,
platform-independent way. For this reason, ``CPUFreq`` allows scaling drivers
to bypass the governor layer and implement their own performance scaling
-algorithms. That is done by the ``intel_pstate`` scaling driver.
+algorithms. That is done by the |intel_pstate| scaling driver.
``CPUFreq`` Policy Objects
@@ -174,13 +175,13 @@ necessary to restart the scaling governor so that it can take the new online CPU
into account. That is achieved by invoking the governor's ``->stop`` and
``->start()`` callbacks, in this order, for the entire policy.
-As mentioned before, the ``intel_pstate`` scaling driver bypasses the scaling
+As mentioned before, the |intel_pstate| scaling driver bypasses the scaling
governor layer of ``CPUFreq`` and provides its own P-state selection algorithms.
-Consequently, if ``intel_pstate`` is used, scaling governors are not attached to
+Consequently, if |intel_pstate| is used, scaling governors are not attached to
new policy objects. Instead, the driver's ``->setpolicy()`` callback is invoked
to register per-CPU utilization update callbacks for each policy. These
callbacks are invoked by the CPU scheduler in the same way as for scaling
-governors, but in the ``intel_pstate`` case they both determine the P-state to
+governors, but in the |intel_pstate| case they both determine the P-state to
use and change the hardware configuration accordingly in one go from scheduler
context.
@@ -257,7 +258,7 @@ are the following:
``scaling_available_governors``
List of ``CPUFreq`` scaling governors present in the kernel that can
- be attached to this policy or (if the ``intel_pstate`` scaling driver is
+ be attached to this policy or (if the |intel_pstate| scaling driver is
in use) list of scaling algorithms provided by the driver that can be
applied to this policy.
@@ -268,29 +269,29 @@ are the following:
``scaling_cur_freq``
Current frequency of all of the CPUs belonging to this policy (in kHz).
- For the majority of scaling drivers, this is the frequency of the last
- P-state requested by the driver from the hardware using the scaling
+ In the majority of cases, this is the frequency of the last P-state
+ requested by the scaling driver from the hardware using the scaling
interface provided by it, which may or may not reflect the frequency
the CPU is actually running at (due to hardware design and other
limitations).
- Some scaling drivers (e.g. ``intel_pstate``) attempt to provide
- information more precisely reflecting the current CPU frequency through
- this attribute, but that still may not be the exact current CPU
- frequency as seen by the hardware at the moment.
+ Some architectures (e.g. ``x86``) may attempt to provide information
+ more precisely reflecting the current CPU frequency through this
+ attribute, but that still may not be the exact current CPU frequency as
+ seen by the hardware at the moment.
``scaling_driver``
The scaling driver currently in use.
``scaling_governor``
The scaling governor currently attached to this policy or (if the
- ``intel_pstate`` scaling driver is in use) the scaling algorithm
+ |intel_pstate| scaling driver is in use) the scaling algorithm
provided by the driver that is currently applied to this policy.
This attribute is read-write and writing to it will cause a new scaling
governor to be attached to this policy or a new scaling algorithm
provided by the scaling driver to be applied to it (in the
- ``intel_pstate`` case), as indicated by the string written to this
+ |intel_pstate| case), as indicated by the string written to this
attribute (which must be one of the names listed by the
``scaling_available_governors`` attribute described above).
@@ -619,7 +620,7 @@ This file is located under :file:`/sys/devices/system/cpu/cpufreq/` and controls
the "boost" setting for the whole system. It is not present if the underlying
scaling driver does not support the frequency boost mechanism (or supports it,
but provides a driver-specific interface for controlling it, like
-``intel_pstate``).
+|intel_pstate|).
If the value in this file is 1, the frequency boost mechanism is enabled. This
means that either the hardware can be put into states in which it is able to
diff --git a/Documentation/admin-guide/pm/index.rst b/Documentation/admin-guide/pm/index.rst
index c80f087321fc..7f148f76f432 100644
--- a/Documentation/admin-guide/pm/index.rst
+++ b/Documentation/admin-guide/pm/index.rst
@@ -6,6 +6,7 @@ Power Management
:maxdepth: 2
cpufreq
+ intel_pstate
.. only:: subproject and html
diff --git a/Documentation/admin-guide/pm/intel_pstate.rst b/Documentation/admin-guide/pm/intel_pstate.rst
new file mode 100644
index 000000000000..1d6249825efc
--- /dev/null
+++ b/Documentation/admin-guide/pm/intel_pstate.rst
@@ -0,0 +1,753 @@
+===============================================
+``intel_pstate`` CPU Performance Scaling Driver
+===============================================
+
+::
+
+ Copyright (c) 2017 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@intel.com>
+
+
+General Information
+===================
+
+``intel_pstate`` is a part of the
+:doc:`CPU performance scaling subsystem <cpufreq>` in the Linux kernel
+(``CPUFreq``). It is a scaling driver for the Sandy Bridge and later
+generations of Intel processors. Note, however, that some of those processors
+may not be supported. [To understand ``intel_pstate`` it is necessary to know
+how ``CPUFreq`` works in general, so this is the time to read :doc:`cpufreq` if
+you have not done that yet.]
+
+For the processors supported by ``intel_pstate``, the P-state concept is broader
+than just an operating frequency or an operating performance point (see the
+`LinuxCon Europe 2015 presentation by Kristen Accardi <LCEU2015_>`_ for more
+information about that). For this reason, the representation of P-states used
+by ``intel_pstate`` internally follows the hardware specification (for details
+refer to `Intel® 64 and IA-32 Architectures Software Developer’s Manual
+Volume 3: System Programming Guide <SDM_>`_). However, the ``CPUFreq`` core
+uses frequencies for identifying operating performance points of CPUs and
+frequencies are involved in the user space interface exposed by it, so
+``intel_pstate`` maps its internal representation of P-states to frequencies too
+(fortunately, that mapping is unambiguous). At the same time, it would not be
+practical for ``intel_pstate`` to supply the ``CPUFreq`` core with a table of
+available frequencies due to the possible size of it, so the driver does not do
+that. Some functionality of the core is limited by that.
+
+Since the hardware P-state selection interface used by ``intel_pstate`` is
+available at the logical CPU level, the driver always works with individual
+CPUs. Consequently, if ``intel_pstate`` is in use, every ``CPUFreq`` policy
+object corresponds to one logical CPU and ``CPUFreq`` policies are effectively
+equivalent to CPUs. In particular, this means that they become "inactive" every
+time the corresponding CPU is taken offline and need to be re-initialized when
+it goes back online.
+
+``intel_pstate`` is not modular, so it cannot be unloaded, which means that the
+only way to pass early-configuration-time parameters to it is via the kernel
+command line. However, its configuration can be adjusted via ``sysfs`` to a
+great extent. In some configurations it even is possible to unregister it via
+``sysfs`` which allows another ``CPUFreq`` scaling driver to be loaded and
+registered (see `below <status_attr_>`_).
+
+
+Operation Modes
+===============
+
+``intel_pstate`` can operate in three different modes: in the active mode with
+or without hardware-managed P-states support and in the passive mode. Which of
+them will be in effect depends on what kernel command line options are used and
+on the capabilities of the processor.
+
+Active Mode
+-----------
+
+This is the default operation mode of ``intel_pstate``. If it works in this
+mode, the ``scaling_driver`` policy attribute in ``sysfs`` for all ``CPUFreq``
+policies contains the string "intel_pstate".
+
+In this mode the driver bypasses the scaling governors layer of ``CPUFreq`` and
+provides its own scaling algorithms for P-state selection. Those algorithms
+can be applied to ``CPUFreq`` policies in the same way as generic scaling
+governors (that is, through the ``scaling_governor`` policy attribute in
+``sysfs``). [Note that different P-state selection algorithms may be chosen for
+different policies, but that is not recommended.]
+
+They are not generic scaling governors, but their names are the same as the
+names of some of those governors. Moreover, confusingly enough, they generally
+do not work in the same way as the generic governors they share the names with.
+For example, the ``powersave`` P-state selection algorithm provided by
+``intel_pstate`` is not a counterpart of the generic ``powersave`` governor
+(roughly, it corresponds to the ``schedutil`` and ``ondemand`` governors).
+
+There are two P-state selection algorithms provided by ``intel_pstate`` in the
+active mode: ``powersave`` and ``performance``. The way they both operate
+depends on whether or not the hardware-managed P-states (HWP) feature has been
+enabled in the processor and possibly on the processor model.
+
+Which of the P-state selection algorithms is used by default depends on the
+:c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option.
+Namely, if that option is set, the ``performance`` algorithm will be used by
+default, and the other one will be used by default if it is not set.
+
+Active Mode With HWP
+~~~~~~~~~~~~~~~~~~~~
+
+If the processor supports the HWP feature, it will be enabled during the
+processor initialization and cannot be disabled after that. It is possible
+to avoid enabling it by passing the ``intel_pstate=no_hwp`` argument to the
+kernel in the command line.
+
+If the HWP feature has been enabled, ``intel_pstate`` relies on the processor to
+select P-states by itself, but still it can give hints to the processor's
+internal P-state selection logic. What those hints are depends on which P-state
+selection algorithm has been applied to the given policy (or to the CPU it
+corresponds to).
+
+Even though the P-state selection is carried out by the processor automatically,
+``intel_pstate`` registers utilization update callbacks with the CPU scheduler
+in this mode. However, they are not used for running a P-state selection
+algorithm, but for periodic updates of the current CPU frequency information to
+be made available from the ``scaling_cur_freq`` policy attribute in ``sysfs``.
+
+HWP + ``performance``
+.....................
+
+In this configuration ``intel_pstate`` will write 0 to the processor's
+Energy-Performance Preference (EPP) knob (if supported) or its
+Energy-Performance Bias (EPB) knob (otherwise), which means that the processor's
+internal P-state selection logic is expected to focus entirely on performance.
+
+This will override the EPP/EPB setting coming from the ``sysfs`` interface
+(see `Energy vs Performance Hints`_ below).
+
+Also, in this configuration the range of P-states available to the processor's
+internal P-state selection logic is always restricted to the upper boundary
+(that is, the maximum P-state that the driver is allowed to use).
+
+HWP + ``powersave``
+...................
+
+In this configuration ``intel_pstate`` will set the processor's
+Energy-Performance Preference (EPP) knob (if supported) or its
+Energy-Performance Bias (EPB) knob (otherwise) to whatever value it was
+previously set to via ``sysfs`` (or whatever default value it was
+set to by the platform firmware). This usually causes the processor's
+internal P-state selection logic to be less performance-focused.
+
+Active Mode Without HWP
+~~~~~~~~~~~~~~~~~~~~~~~
+
+This is the default operation mode for processors that do not support the HWP
+feature. It also is used by default with the ``intel_pstate=no_hwp`` argument
+in the kernel command line. However, in this mode ``intel_pstate`` may refuse
+to work with the given processor if it does not recognize it. [Note that
+``intel_pstate`` will never refuse to work with any processor with the HWP
+feature enabled.]
+
+In this mode ``intel_pstate`` registers utilization update callbacks with the
+CPU scheduler in order to run a P-state selection algorithm, either
+``powersave`` or ``performance``, depending on the ``scaling_cur_freq`` policy
+setting in ``sysfs``. The current CPU frequency information to be made
+available from the ``scaling_cur_freq`` policy attribute in ``sysfs`` is
+periodically updated by those utilization update callbacks too.
+
+``performance``
+...............
+
+Without HWP, this P-state selection algorithm is always the same regardless of
+the processor model and platform configuration.
+
+It selects the maximum P-state it is allowed to use, subject to limits set via
+``sysfs``, every time the driver configuration for the given CPU is updated
+(e.g. via ``sysfs``).
+
+This is the default P-state selection algorithm if the
+:c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option
+is set.
+
+``powersave``
+.............
+
+Without HWP, this P-state selection algorithm generally depends on the
+processor model and/or the system profile setting in the ACPI tables and there
+are two variants of it.
+
+One of them is used with processors from the Atom line and (regardless of the
+processor model) on platforms with the system profile in the ACPI tables set to
+"mobile" (laptops mostly), "tablet", "appliance PC", "desktop", or
+"workstation". It is also used with processors supporting the HWP feature if
+that feature has not been enabled (that is, with the ``intel_pstate=no_hwp``
+argument in the kernel command line). It is similar to the algorithm
+implemented by the generic ``schedutil`` scaling governor except that the
+utilization metric used by it is based on numbers coming from feedback
+registers of the CPU. It generally selects P-states proportional to the
+current CPU utilization, so it is referred to as the "proportional" algorithm.
+
+The second variant of the ``powersave`` P-state selection algorithm, used in all
+of the other cases (generally, on processors from the Core line, so it is
+referred to as the "Core" algorithm), is based on the values read from the APERF
+and MPERF feedback registers and the previously requested target P-state.
+It does not really take CPU utilization into account explicitly, but as a rule
+it causes the CPU P-state to ramp up very quickly in response to increased
+utilization which is generally desirable in server environments.
+
+Regardless of the variant, this algorithm is run by the driver's utilization
+update callback for the given CPU when it is invoked by the CPU scheduler, but
+not more often than every 10 ms (that can be tweaked via ``debugfs`` in `this
+particular case <Tuning Interface in debugfs_>`_). Like in the ``performance``
+case, the hardware configuration is not touched if the new P-state turns out to
+be the same as the current one.
+
+This is the default P-state selection algorithm if the
+:c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option
+is not set.
+
+Passive Mode
+------------
+
+This mode is used if the ``intel_pstate=passive`` argument is passed to the
+kernel in the command line (it implies the ``intel_pstate=no_hwp`` setting too).
+Like in the active mode without HWP support, in this mode ``intel_pstate`` may
+refuse to work with the given processor if it does not recognize it.
+
+If the driver works in this mode, the ``scaling_driver`` policy attribute in
+``sysfs`` for all ``CPUFreq`` policies contains the string "intel_cpufreq".
+Then, the driver behaves like a regular ``CPUFreq`` scaling driver. That is,
+it is invoked by generic scaling governors when necessary to talk to the
+hardware in order to change the P-state of a CPU (in particular, the
+``schedutil`` governor can invoke it directly from scheduler context).
+
+While in this mode, ``intel_pstate`` can be used with all of the (generic)
+scaling governors listed by the ``scaling_available_governors`` policy attribute
+in ``sysfs`` (and the P-state selection algorithms described above are not
+used). Then, it is responsible for the configuration of policy objects
+corresponding to CPUs and provides the ``CPUFreq`` core (and the scaling
+governors attached to the policy objects) with accurate information on the
+maximum and minimum operating frequencies supported by the hardware (including
+the so-called "turbo" frequency ranges). In other words, in the passive mode
+the entire range of available P-states is exposed by ``intel_pstate`` to the
+``CPUFreq`` core. However, in this mode the driver does not register
+utilization update callbacks with the CPU scheduler and the ``scaling_cur_freq``
+information comes from the ``CPUFreq`` core (and is the last frequency selected
+by the current scaling governor for the given policy).
+
+
+.. _turbo:
+
+Turbo P-states Support
+======================
+
+In the majority of cases, the entire range of P-states available to
+``intel_pstate`` can be divided into two sub-ranges that correspond to
+different types of processor behavior, above and below a boundary that
+will be referred to as the "turbo threshold" in what follows.
+
+The P-states above the turbo threshold are referred to as "turbo P-states" and
+the whole sub-range of P-states they belong to is referred to as the "turbo
+range". These names are related to the Turbo Boost technology allowing a
+multicore processor to opportunistically increase the P-state of one or more
+cores if there is enough power to do that and if that is not going to cause the
+thermal envelope of the processor package to be exceeded.
+
+Specifically, if software sets the P-state of a CPU core within the turbo range
+(that is, above the turbo threshold), the processor is permitted to take over
+performance scaling control for that core and put it into turbo P-states of its
+choice going forward. However, that permission is interpreted differently by
+different processor generations. Namely, the Sandy Bridge generation of
+processors will never use any P-states above the last one set by software for
+the given core, even if it is within the turbo range, whereas all of the later
+processor generations will take it as a license to use any P-states from the
+turbo range, even above the one set by software. In other words, on those
+processors setting any P-state from the turbo range will enable the processor
+to put the given core into all turbo P-states up to and including the maximum
+supported one as it sees fit.
+
+One important property of turbo P-states is that they are not sustainable. More
+precisely, there is no guarantee that any CPUs will be able to stay in any of
+those states indefinitely, because the power distribution within the processor
+package may change over time or the thermal envelope it was designed for might
+be exceeded if a turbo P-state was used for too long.
+
+In turn, the P-states below the turbo threshold generally are sustainable. In
+fact, if one of them is set by software, the processor is not expected to change
+it to a lower one unless in a thermal stress or a power limit violation
+situation (a higher P-state may still be used if it is set for another CPU in
+the same package at the same time, for example).
+
+Some processors allow multiple cores to be in turbo P-states at the same time,
+but the maximum P-state that can be set for them generally depends on the number
+of cores running concurrently. The maximum turbo P-state that can be set for 3
+cores at the same time usually is lower than the analogous maximum P-state for
+2 cores, which in turn usually is lower than the maximum turbo P-state that can
+be set for 1 core. The one-core maximum turbo P-state is thus the maximum
+supported one overall.
+
+The maximum supported turbo P-state, the turbo threshold (the maximum supported
+non-turbo P-state) and the minimum supported P-state are specific to the
+processor model and can be determined by reading the processor's model-specific
+registers (MSRs). Moreover, some processors support the Configurable TDP
+(Thermal Design Power) feature and, when that feature is enabled, the turbo
+threshold effectively becomes a configurable value that can be set by the
+platform firmware.
+
+Unlike ``_PSS`` objects in the ACPI tables, ``intel_pstate`` always exposes
+the entire range of available P-states, including the whole turbo range, to the
+``CPUFreq`` core and (in the passive mode) to generic scaling governors. This
+generally causes turbo P-states to be set more often when ``intel_pstate`` is
+used relative to ACPI-based CPU performance scaling (see `below <acpi-cpufreq_>`_
+for more information).
+
+Moreover, since ``intel_pstate`` always knows what the real turbo threshold is
+(even if the Configurable TDP feature is enabled in the processor), its
+``no_turbo`` attribute in ``sysfs`` (described `below <no_turbo_attr_>`_) should
+work as expected in all cases (that is, if set to disable turbo P-states, it
+always should prevent ``intel_pstate`` from using them).
+
+
+Processor Support
+=================
+
+To handle a given processor ``intel_pstate`` requires a number of different
+pieces of information on it to be known, including:
+
+ * The minimum supported P-state.
+
+ * The maximum supported `non-turbo P-state <turbo_>`_.
+
+ * Whether or not turbo P-states are supported at all.
+
+ * The maximum supported `one-core turbo P-state <turbo_>`_ (if turbo P-states
+ are supported).
+
+ * The scaling formula to translate the driver's internal representation
+ of P-states into frequencies and the other way around.
+
+Generally, ways to obtain that information are specific to the processor model
+or family. Although it often is possible to obtain all of it from the processor
+itself (using model-specific registers), there are cases in which hardware
+manuals need to be consulted to get to it too.
+
+For this reason, there is a list of supported processors in ``intel_pstate`` and
+the driver initialization will fail if the detected processor is not in that
+list, unless it supports the `HWP feature <Active Mode_>`_. [The interface to
+obtain all of the information listed above is the same for all of the processors
+supporting the HWP feature, which is why they all are supported by
+``intel_pstate``.]
+
+
+User Space Interface in ``sysfs``
+=================================
+
+Global Attributes
+-----------------
+
+``intel_pstate`` exposes several global attributes (files) in ``sysfs`` to
+control its functionality at the system level. They are located in the
+``/sys/devices/system/cpu/cpufreq/intel_pstate/`` directory and affect all
+CPUs.
+
+Some of them are not present if the ``intel_pstate=per_cpu_perf_limits``
+argument is passed to the kernel in the command line.
+
+``max_perf_pct``
+ Maximum P-state the driver is allowed to set in percent of the
+ maximum supported performance level (the highest supported `turbo
+ P-state <turbo_>`_).
+
+ This attribute will not be exposed if the
+ ``intel_pstate=per_cpu_perf_limits`` argument is present in the kernel
+ command line.
+
+``min_perf_pct``
+ Minimum P-state the driver is allowed to set in percent of the
+ maximum supported performance level (the highest supported `turbo
+ P-state <turbo_>`_).
+
+ This attribute will not be exposed if the
+ ``intel_pstate=per_cpu_perf_limits`` argument is present in the kernel
+ command line.
+
+``num_pstates``
+ Number of P-states supported by the processor (between 0 and 255
+ inclusive) including both turbo and non-turbo P-states (see
+ `Turbo P-states Support`_).
+
+ The value of this attribute is not affected by the ``no_turbo``
+ setting described `below <no_turbo_attr_>`_.
+
+ This attribute is read-only.
+
+``turbo_pct``
+ Ratio of the `turbo range <turbo_>`_ size to the size of the entire
+ range of supported P-states, in percent.
+
+ This attribute is read-only.
+
+.. _no_turbo_attr:
+
+``no_turbo``
+ If set (equal to 1), the driver is not allowed to set any turbo P-states
+ (see `Turbo P-states Support`_). If unset (equalt to 0, which is the
+ default), turbo P-states can be set by the driver.
+ [Note that ``intel_pstate`` does not support the general ``boost``
+ attribute (supported by some other scaling drivers) which is replaced
+ by this one.]
+
+ This attrubute does not affect the maximum supported frequency value
+ supplied to the ``CPUFreq`` core and exposed via the policy interface,
+ but it affects the maximum possible value of per-policy P-state limits
+ (see `Interpretation of Policy Attributes`_ below for details).
+
+.. _status_attr:
+
+``status``
+ Operation mode of the driver: "active", "passive" or "off".
+
+ "active"
+ The driver is functional and in the `active mode
+ <Active Mode_>`_.
+
+ "passive"
+ The driver is functional and in the `passive mode
+ <Passive Mode_>`_.
+
+ "off"
+ The driver is not functional (it is not registered as a scaling
+ driver with the ``CPUFreq`` core).
+
+ This attribute can be written to in order to change the driver's
+ operation mode or to unregister it. The string written to it must be
+ one of the possible values of it and, if successful, the write will
+ cause the driver to switch over to the operation mode represented by
+ that string - or to be unregistered in the "off" case. [Actually,
+ switching over from the active mode to the passive mode or the other
+ way around causes the driver to be unregistered and registered again
+ with a different set of callbacks, so all of its settings (the global
+ as well as the per-policy ones) are then reset to their default
+ values, possibly depending on the target operation mode.]
+
+ That only is supported in some configurations, though (for example, if
+ the `HWP feature is enabled in the processor <Active Mode With HWP_>`_,
+ the operation mode of the driver cannot be changed), and if it is not
+ supported in the current configuration, writes to this attribute with
+ fail with an appropriate error.
+
+Interpretation of Policy Attributes
+-----------------------------------
+
+The interpretation of some ``CPUFreq`` policy attributes described in
+:doc:`cpufreq` is special with ``intel_pstate`` as the current scaling driver
+and it generally depends on the driver's `operation mode <Operation Modes_>`_.
+
+First of all, the values of the ``cpuinfo_max_freq``, ``cpuinfo_min_freq`` and
+``scaling_cur_freq`` attributes are produced by applying a processor-specific
+multiplier to the internal P-state representation used by ``intel_pstate``.
+Also, the values of the ``scaling_max_freq`` and ``scaling_min_freq``
+attributes are capped by the frequency corresponding to the maximum P-state that
+the driver is allowed to set.
+
+If the ``no_turbo`` `global attribute <no_turbo_attr_>`_ is set, the driver is
+not allowed to use turbo P-states, so the maximum value of ``scaling_max_freq``
+and ``scaling_min_freq`` is limited to the maximum non-turbo P-state frequency.
+Accordingly, setting ``no_turbo`` causes ``scaling_max_freq`` and
+``scaling_min_freq`` to go down to that value if they were above it before.
+However, the old values of ``scaling_max_freq`` and ``scaling_min_freq`` will be
+restored after unsetting ``no_turbo``, unless these attributes have been written
+to after ``no_turbo`` was set.
+
+If ``no_turbo`` is not set, the maximum possible value of ``scaling_max_freq``
+and ``scaling_min_freq`` corresponds to the maximum supported turbo P-state,
+which also is the value of ``cpuinfo_max_freq`` in either case.
+
+Next, the following policy attributes have special meaning if
+``intel_pstate`` works in the `active mode <Active Mode_>`_:
+
+``scaling_available_governors``
+ List of P-state selection algorithms provided by ``intel_pstate``.
+
+``scaling_governor``
+ P-state selection algorithm provided by ``intel_pstate`` currently in
+ use with the given policy.
+
+``scaling_cur_freq``
+ Frequency of the average P-state of the CPU represented by the given
+ policy for the time interval between the last two invocations of the
+ driver's utilization update callback by the CPU scheduler for that CPU.
+
+The meaning of these attributes in the `passive mode <Passive Mode_>`_ is the
+same as for other scaling drivers.
+
+Additionally, the value of the ``scaling_driver`` attribute for ``intel_pstate``
+depends on the operation mode of the driver. Namely, it is either
+"intel_pstate" (in the `active mode <Active Mode_>`_) or "intel_cpufreq" (in the
+`passive mode <Passive Mode_>`_).
+
+Coordination of P-State Limits
+------------------------------
+
+``intel_pstate`` allows P-state limits to be set in two ways: with the help of
+the ``max_perf_pct`` and ``min_perf_pct`` `global attributes
+<Global Attributes_>`_ or via the ``scaling_max_freq`` and ``scaling_min_freq``
+``CPUFreq`` policy attributes. The coordination between those limits is based
+on the following rules, regardless of the current operation mode of the driver:
+
+ 1. All CPUs are affected by the global limits (that is, none of them can be
+ requested to run faster than the global maximum and none of them can be
+ requested to run slower than the global minimum).
+
+ 2. Each individual CPU is affected by its own per-policy limits (that is, it
+ cannot be requested to run faster than its own per-policy maximum and it
+ cannot be requested to run slower than its own per-policy minimum).
+
+ 3. The global and per-policy limits can be set independently.
+
+If the `HWP feature is enabled in the processor <Active Mode With HWP_>`_, the
+resulting effective values are written into its registers whenever the limits
+change in order to request its internal P-state selection logic to always set
+P-states within these limits. Otherwise, the limits are taken into account by
+scaling governors (in the `passive mode <Passive Mode_>`_) and by the driver
+every time before setting a new P-state for a CPU.
+
+Additionally, if the ``intel_pstate=per_cpu_perf_limits`` command line argument
+is passed to the kernel, ``max_perf_pct`` and ``min_perf_pct`` are not exposed
+at all and the only way to set the limits is by using the policy attributes.
+
+
+Energy vs Performance Hints
+---------------------------
+
+If ``intel_pstate`` works in the `active mode with the HWP feature enabled
+<Active Mode With HWP_>`_ in the processor, additional attributes are present
+in every ``CPUFreq`` policy directory in ``sysfs``. They are intended to allow
+user space to help ``intel_pstate`` to adjust the processor's internal P-state
+selection logic by focusing it on performance or on energy-efficiency, or
+somewhere between the two extremes:
+
+``energy_performance_preference``
+ Current value of the energy vs performance hint for the given policy
+ (or the CPU represented by it).
+
+ The hint can be changed by writing to this attribute.
+
+``energy_performance_available_preferences``
+ List of strings that can be written to the
+ ``energy_performance_preference`` attribute.
+
+ They represent different energy vs performance hints and should be
+ self-explanatory, except that ``default`` represents whatever hint
+ value was set by the platform firmware.
+
+Strings written to the ``energy_performance_preference`` attribute are
+internally translated to integer values written to the processor's
+Energy-Performance Preference (EPP) knob (if supported) or its
+Energy-Performance Bias (EPB) knob.
+
+[Note that tasks may by migrated from one CPU to another by the scheduler's
+load-balancing algorithm and if different energy vs performance hints are
+set for those CPUs, that may lead to undesirable outcomes. To avoid such
+issues it is better to set the same energy vs performance hint for all CPUs
+or to pin every task potentially sensitive to them to a specific CPU.]
+
+.. _acpi-cpufreq:
+
+``intel_pstate`` vs ``acpi-cpufreq``
+====================================
+
+On the majority of systems supported by ``intel_pstate``, the ACPI tables
+provided by the platform firmware contain ``_PSS`` objects returning information
+that can be used for CPU performance scaling (refer to the `ACPI specification`_
+for details on the ``_PSS`` objects and the format of the information returned
+by them).
+
+The information returned by the ACPI ``_PSS`` objects is used by the
+``acpi-cpufreq`` scaling driver. On systems supported by ``intel_pstate``
+the ``acpi-cpufreq`` driver uses the same hardware CPU performance scaling
+interface, but the set of P-states it can use is limited by the ``_PSS``
+output.
+
+On those systems each ``_PSS`` object returns a list of P-states supported by
+the corresponding CPU which basically is a subset of the P-states range that can
+be used by ``intel_pstate`` on the same system, with one exception: the whole
+`turbo range <turbo_>`_ is represented by one item in it (the topmost one). By
+convention, the frequency returned by ``_PSS`` for that item is greater by 1 MHz
+than the frequency of the highest non-turbo P-state listed by it, but the
+corresponding P-state representation (following the hardware specification)
+returned for it matches the maximum supported turbo P-state (or is the
+special value 255 meaning essentially "go as high as you can get").
+
+The list of P-states returned by ``_PSS`` is reflected by the table of
+available frequencies supplied by ``acpi-cpufreq`` to the ``CPUFreq`` core and
+scaling governors and the minimum and maximum supported frequencies reported by
+it come from that list as well. In particular, given the special representation
+of the turbo range described above, this means that the maximum supported
+frequency reported by ``acpi-cpufreq`` is higher by 1 MHz than the frequency
+of the highest supported non-turbo P-state listed by ``_PSS`` which, of course,
+affects decisions made by the scaling governors, except for ``powersave`` and
+``performance``.
+
+For example, if a given governor attempts to select a frequency proportional to
+estimated CPU load and maps the load of 100% to the maximum supported frequency
+(possibly multiplied by a constant), then it will tend to choose P-states below
+the turbo threshold if ``acpi-cpufreq`` is used as the scaling driver, because
+in that case the turbo range corresponds to a small fraction of the frequency
+band it can use (1 MHz vs 1 GHz or more). In consequence, it will only go to
+the turbo range for the highest loads and the other loads above 50% that might
+benefit from running at turbo frequencies will be given non-turbo P-states
+instead.
+
+One more issue related to that may appear on systems supporting the
+`Configurable TDP feature <turbo_>`_ allowing the platform firmware to set the
+turbo threshold. Namely, if that is not coordinated with the lists of P-states
+returned by ``_PSS`` properly, there may be more than one item corresponding to
+a turbo P-state in those lists and there may be a problem with avoiding the
+turbo range (if desirable or necessary). Usually, to avoid using turbo
+P-states overall, ``acpi-cpufreq`` simply avoids using the topmost state listed
+by ``_PSS``, but that is not sufficient when there are other turbo P-states in
+the list returned by it.
+
+Apart from the above, ``acpi-cpufreq`` works like ``intel_pstate`` in the
+`passive mode <Passive Mode_>`_, except that the number of P-states it can set
+is limited to the ones listed by the ACPI ``_PSS`` objects.
+
+
+Kernel Command Line Options for ``intel_pstate``
+================================================
+
+Several kernel command line options can be used to pass early-configuration-time
+parameters to ``intel_pstate`` in order to enforce specific behavior of it. All
+of them have to be prepended with the ``intel_pstate=`` prefix.
+
+``disable``
+ Do not register ``intel_pstate`` as the scaling driver even if the
+ processor is supported by it.
+
+``passive``
+ Register ``intel_pstate`` in the `passive mode <Passive Mode_>`_ to
+ start with.
+
+ This option implies the ``no_hwp`` one described below.
+
+``force``
+ Register ``intel_pstate`` as the scaling driver instead of
+ ``acpi-cpufreq`` even if the latter is preferred on the given system.
+
+ This may prevent some platform features (such as thermal controls and
+ power capping) that rely on the availability of ACPI P-states
+ information from functioning as expected, so it should be used with
+ caution.
+
+ This option does not work with processors that are not supported by
+ ``intel_pstate`` and on platforms where the ``pcc-cpufreq`` scaling
+ driver is used instead of ``acpi-cpufreq``.
+
+``no_hwp``
+ Do not enable the `hardware-managed P-states (HWP) feature
+ <Active Mode With HWP_>`_ even if it is supported by the processor.
+
+``hwp_only``
+ Register ``intel_pstate`` as the scaling driver only if the
+ `hardware-managed P-states (HWP) feature <Active Mode With HWP_>`_ is
+ supported by the processor.
+
+``support_acpi_ppc``
+ Take ACPI ``_PPC`` performance limits into account.
+
+ If the preferred power management profile in the FADT (Fixed ACPI
+ Description Table) is set to "Enterprise Server" or "Performance
+ Server", the ACPI ``_PPC`` limits are taken into account by default
+ and this option has no effect.
+
+``per_cpu_perf_limits``
+ Use per-logical-CPU P-State limits (see `Coordination of P-state
+ Limits`_ for details).
+
+
+Diagnostics and Tuning
+======================
+
+Trace Events
+------------
+
+There are two static trace events that can be used for ``intel_pstate``
+diagnostics. One of them is the ``cpu_frequency`` trace event generally used
+by ``CPUFreq``, and the other one is the ``pstate_sample`` trace event specific
+to ``intel_pstate``. Both of them are triggered by ``intel_pstate`` only if
+it works in the `active mode <Active Mode_>`_.
+
+The following sequence of shell commands can be used to enable them and see
+their output (if the kernel is generally configured to support event tracing)::
+
+ # cd /sys/kernel/debug/tracing/
+ # echo 1 > events/power/pstate_sample/enable
+ # echo 1 > events/power/cpu_frequency/enable
+ # cat trace
+ gnome-terminal--4510 [001] ..s. 1177.680733: pstate_sample: core_busy=107 scaled=94 from=26 to=26 mperf=1143818 aperf=1230607 tsc=29838618 freq=2474476
+ cat-5235 [002] ..s. 1177.681723: cpu_frequency: state=2900000 cpu_id=2
+
+If ``intel_pstate`` works in the `passive mode <Passive Mode_>`_, the
+``cpu_frequency`` trace event will be triggered either by the ``schedutil``
+scaling governor (for the policies it is attached to), or by the ``CPUFreq``
+core (for the policies with other scaling governors).
+
+``ftrace``
+----------
+
+The ``ftrace`` interface can be used for low-level diagnostics of
+``intel_pstate``. For example, to check how often the function to set a
+P-state is called, the ``ftrace`` filter can be set to to
+:c:func:`intel_pstate_set_pstate`::
+
+ # cd /sys/kernel/debug/tracing/
+ # cat available_filter_functions | grep -i pstate
+ intel_pstate_set_pstate
+ intel_pstate_cpu_init
+ ...
+ # echo intel_pstate_set_pstate > set_ftrace_filter
+ # echo function > current_tracer
+ # cat trace | head -15
+ # tracer: function
+ #
+ # entries-in-buffer/entries-written: 80/80 #P:4
+ #
+ # _-----=> irqs-off
+ # / _----=> need-resched
+ # | / _---=> hardirq/softirq
+ # || / _--=> preempt-depth
+ # ||| / delay
+ # TASK-PID CPU# |||| TIMESTAMP FUNCTION
+ # | | | |||| | |
+ Xorg-3129 [000] ..s. 2537.644844: intel_pstate_set_pstate <-intel_pstate_timer_func
+ gnome-terminal--4510 [002] ..s. 2537.649844: intel_pstate_set_pstate <-intel_pstate_timer_func
+ gnome-shell-3409 [001] ..s. 2537.650850: intel_pstate_set_pstate <-intel_pstate_timer_func
+ <idle>-0 [000] ..s. 2537.654843: intel_pstate_set_pstate <-intel_pstate_timer_func
+
+Tuning Interface in ``debugfs``
+-------------------------------
+
+The ``powersave`` algorithm provided by ``intel_pstate`` for `the Core line of
+processors in the active mode <powersave_>`_ is based on a `PID controller`_
+whose parameters were chosen to address a number of different use cases at the
+same time. However, it still is possible to fine-tune it to a specific workload
+and the ``debugfs`` interface under ``/sys/kernel/debug/pstate_snb/`` is
+provided for this purpose. [Note that the ``pstate_snb`` directory will be
+present only if the specific P-state selection algorithm matching the interface
+in it actually is in use.]
+
+The following files present in that directory can be used to modify the PID
+controller parameters at run time:
+
+| ``deadband``
+| ``d_gain_pct``
+| ``i_gain_pct``
+| ``p_gain_pct``
+| ``sample_rate_ms``
+| ``setpoint``
+
+Note, however, that achieving desirable results this way generally requires
+expert-level understanding of the power vs performance tradeoff, so extra care
+is recommended when attempting to do that.
+
+
+.. _LCEU2015: http://events.linuxfoundation.org/sites/events/files/slides/LinuxConEurope_2015.pdf
+.. _SDM: http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-system-programming-manual-325384.html
+.. _ACPI specification: http://www.uefi.org/sites/default/files/resources/ACPI_6_1.pdf
+.. _PID controller: https://en.wikipedia.org/wiki/PID_controller
diff --git a/Documentation/admin-guide/ras.rst b/Documentation/admin-guide/ras.rst
index 8c7bbf2c88d2..197896718f81 100644
--- a/Documentation/admin-guide/ras.rst
+++ b/Documentation/admin-guide/ras.rst
@@ -344,9 +344,9 @@ for more than 2 channels, like Fully Buffered DIMMs (FB-DIMMs) memory
controllers. The following example will assume 2 channels:
+------------+-----------------------+
- | Chip | Channels |
- | Select +-----------+-----------+
- | rows | ``ch0`` | ``ch1`` |
+ | CS Rows | Channels |
+ +------------+-----------+-----------+
+ | | ``ch0`` | ``ch1`` |
+============+===========+===========+
| ``csrow0`` | DIMM_A0 | DIMM_B0 |
+------------+ | |
@@ -698,7 +698,7 @@ information indicating that errors have been detected::
The structure of the message is:
+---------------------------------------+-------------+
- | Content + Example |
+ | Content | Example |
+=======================================+=============+
| The memory controller | MC0 |
+---------------------------------------+-------------+
@@ -713,7 +713,7 @@ The structure of the message is:
+---------------------------------------+-------------+
| The error syndrome | 0xb741 |
+---------------------------------------+-------------+
- | Memory row | row 0 +
+ | Memory row | row 0 |
+---------------------------------------+-------------+
| Memory channel | channel 1 |
+---------------------------------------+-------------+
diff --git a/Documentation/admin-guide/thunderbolt.rst b/Documentation/admin-guide/thunderbolt.rst
new file mode 100644
index 000000000000..6a4cd1f159ca
--- /dev/null
+++ b/Documentation/admin-guide/thunderbolt.rst
@@ -0,0 +1,199 @@
+=============
+ Thunderbolt
+=============
+The interface presented here is not meant for end users. Instead there
+should be a userspace tool that handles all the low-level details, keeps
+database of the authorized devices and prompts user for new connections.
+
+More details about the sysfs interface for Thunderbolt devices can be
+found in ``Documentation/ABI/testing/sysfs-bus-thunderbolt``.
+
+Those users who just want to connect any device without any sort of
+manual work, can add following line to
+``/etc/udev/rules.d/99-local.rules``::
+
+ ACTION=="add", SUBSYSTEM=="thunderbolt", ATTR{authorized}=="0", ATTR{authorized}="1"
+
+This will authorize all devices automatically when they appear. However,
+keep in mind that this bypasses the security levels and makes the system
+vulnerable to DMA attacks.
+
+Security levels and how to use them
+-----------------------------------
+Starting from Intel Falcon Ridge Thunderbolt controller there are 4
+security levels available. The reason for these is the fact that the
+connected devices can be DMA masters and thus read contents of the host
+memory without CPU and OS knowing about it. There are ways to prevent
+this by setting up an IOMMU but it is not always available for various
+reasons.
+
+The security levels are as follows:
+
+ none
+ All devices are automatically connected by the firmware. No user
+ approval is needed. In BIOS settings this is typically called
+ *Legacy mode*.
+
+ user
+ User is asked whether the device is allowed to be connected.
+ Based on the device identification information available through
+ ``/sys/bus/thunderbolt/devices``. user then can do the decision.
+ In BIOS settings this is typically called *Unique ID*.
+
+ secure
+ User is asked whether the device is allowed to be connected. In
+ addition to UUID the device (if it supports secure connect) is sent
+ a challenge that should match the expected one based on a random key
+ written to ``key`` sysfs attribute. In BIOS settings this is
+ typically called *One time saved key*.
+
+ dponly
+ The firmware automatically creates tunnels for Display Port and
+ USB. No PCIe tunneling is done. In BIOS settings this is
+ typically called *Display Port Only*.
+
+The current security level can be read from
+``/sys/bus/thunderbolt/devices/domainX/security`` where ``domainX`` is
+the Thunderbolt domain the host controller manages. There is typically
+one domain per Thunderbolt host controller.
+
+If the security level reads as ``user`` or ``secure`` the connected
+device must be authorized by the user before PCIe tunnels are created
+(e.g the PCIe device appears).
+
+Each Thunderbolt device plugged in will appear in sysfs under
+``/sys/bus/thunderbolt/devices``. The device directory carries
+information that can be used to identify the particular device,
+including its name and UUID.
+
+Authorizing devices when security level is ``user`` or ``secure``
+-----------------------------------------------------------------
+When a device is plugged in it will appear in sysfs as follows::
+
+ /sys/bus/thunderbolt/devices/0-1/authorized - 0
+ /sys/bus/thunderbolt/devices/0-1/device - 0x8004
+ /sys/bus/thunderbolt/devices/0-1/device_name - Thunderbolt to FireWire Adapter
+ /sys/bus/thunderbolt/devices/0-1/vendor - 0x1
+ /sys/bus/thunderbolt/devices/0-1/vendor_name - Apple, Inc.
+ /sys/bus/thunderbolt/devices/0-1/unique_id - e0376f00-0300-0100-ffff-ffffffffffff
+
+The ``authorized`` attribute reads 0 which means no PCIe tunnels are
+created yet. The user can authorize the device by simply::
+
+ # echo 1 > /sys/bus/thunderbolt/devices/0-1/authorized
+
+This will create the PCIe tunnels and the device is now connected.
+
+If the device supports secure connect, and the domain security level is
+set to ``secure``, it has an additional attribute ``key`` which can hold
+a random 32 byte value used for authorization and challenging the device in
+future connects::
+
+ /sys/bus/thunderbolt/devices/0-3/authorized - 0
+ /sys/bus/thunderbolt/devices/0-3/device - 0x305
+ /sys/bus/thunderbolt/devices/0-3/device_name - AKiTiO Thunder3 PCIe Box
+ /sys/bus/thunderbolt/devices/0-3/key -
+ /sys/bus/thunderbolt/devices/0-3/vendor - 0x41
+ /sys/bus/thunderbolt/devices/0-3/vendor_name - inXtron
+ /sys/bus/thunderbolt/devices/0-3/unique_id - dc010000-0000-8508-a22d-32ca6421cb16
+
+Notice the key is empty by default.
+
+If the user does not want to use secure connect it can just ``echo 1``
+to the ``authorized`` attribute and the PCIe tunnels will be created in
+the same way than in ``user`` security level.
+
+If the user wants to use secure connect, the first time the device is
+plugged a key needs to be created and send to the device::
+
+ # key=$(openssl rand -hex 32)
+ # echo $key > /sys/bus/thunderbolt/devices/0-3/key
+ # echo 1 > /sys/bus/thunderbolt/devices/0-3/authorized
+
+Now the device is connected (PCIe tunnels are created) and in addition
+the key is stored on the device NVM.
+
+Next time the device is plugged in the user can verify (challenge) the
+device using the same key::
+
+ # echo $key > /sys/bus/thunderbolt/devices/0-3/key
+ # echo 2 > /sys/bus/thunderbolt/devices/0-3/authorized
+
+If the challenge the device returns back matches the one we expect based
+on the key, the device is connected and the PCIe tunnels are created.
+However, if the challenge failed no tunnels are created and error is
+returned to the user.
+
+If the user still wants to connect the device it can either approve
+the device without a key or write new key and write 1 to the
+``authorized`` file to get the new key stored on the device NVM.
+
+Upgrading NVM on Thunderbolt device or host
+-------------------------------------------
+Since most of the functionality is handled in a firmware running on a
+host controller or a device, it is important that the firmware can be
+upgraded to the latest where possible bugs in it have been fixed.
+Typically OEMs provide this firmware from their support site.
+
+There is also a central site which has links where to download firmwares
+for some machines:
+
+ `Thunderbolt Updates <https://thunderbolttechnology.net/updates>`_
+
+Before you upgrade firmware on a device or host, please make sure it is
+the suitable. Failing to do that may render the device (or host) in a
+state where it cannot be used properly anymore without special tools!
+
+Host NVM upgrade on Apple Macs is not supported.
+
+Once the NVM image has been downloaded, you need to plug in a
+Thunderbolt device so that the host controller appears. It does not
+matter which device is connected (unless you are upgrading NVM on a
+device - then you need to connect that particular device).
+
+Note OEM-specific method to power the controller up ("force power") may
+be available for your system in which case there is no need to plug in a
+Thunderbolt device.
+
+After that we can write the firmware to the non-active parts of the NVM
+of the host or device. As an example here is how Intel NUC6i7KYK (Skull
+Canyon) Thunderbolt controller NVM is upgraded::
+
+ # dd if=KYK_TBT_FW_0018.bin of=/sys/bus/thunderbolt/devices/0-0/nvm_non_active0/nvmem
+
+Once the operation completes we can trigger NVM authentication and
+upgrade process as follows::
+
+ # echo 1 > /sys/bus/thunderbolt/devices/0-0/nvm_authenticate
+
+If no errors are returned, the host controller shortly disappears. Once
+it comes back the driver notices it and initiates a full power cycle.
+After a while the host controller appears again and this time it should
+be fully functional.
+
+We can verify that the new NVM firmware is active by running following
+commands::
+
+ # cat /sys/bus/thunderbolt/devices/0-0/nvm_authenticate
+ 0x0
+ # cat /sys/bus/thunderbolt/devices/0-0/nvm_version
+ 18.0
+
+If ``nvm_authenticate`` contains anything else than 0x0 it is the error
+code from the last authentication cycle, which means the authentication
+of the NVM image failed.
+
+Note names of the NVMem devices ``nvm_activeN`` and ``nvm_non_activeN``
+depends on the order they are registered in the NVMem subsystem. N in
+the name is the identifier added by the NVMem subsystem.
+
+Upgrading NVM when host controller is in safe mode
+--------------------------------------------------
+If the existing NVM is not properly authenticated (or is missing) the
+host controller goes into safe mode which means that only available
+functionality is flashing new NVM image. When in this mode the reading
+``nvm_version`` fails with ``ENODATA`` and the device identification
+information is missing.
+
+To recover from this mode, one needs to flash a valid NVM image to the
+host host controller in the same way it is done in the previous chapter.