aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/ABI/testing/sysfs-firmware-efi10
-rw-r--r--Documentation/ABI/testing/sysfs-firmware-efi-esrt81
-rw-r--r--Documentation/RCU/arrayRCU.txt20
-rw-r--r--Documentation/RCU/lockdep.txt10
-rw-r--r--Documentation/RCU/rcu_dereference.txt38
-rw-r--r--Documentation/RCU/whatisRCU.txt6
-rw-r--r--Documentation/cputopology.txt37
-rw-r--r--Documentation/devicetree/bindings/clock/at91-clock.txt2
-rw-r--r--Documentation/devicetree/bindings/input/touchscreen/tsc2005.txt4
-rw-r--r--Documentation/devicetree/bindings/usb/renesas_usbhs.txt6
-rw-r--r--Documentation/filesystems/Locking4
-rw-r--r--Documentation/filesystems/automount-support.txt51
-rw-r--r--Documentation/filesystems/porting17
-rw-r--r--Documentation/filesystems/vfs.txt22
-rw-r--r--Documentation/i2c/slave-interface6
-rw-r--r--Documentation/kernel-parameters.txt43
-rw-r--r--Documentation/memory-barriers.txt62
-rw-r--r--Documentation/networking/udplite.txt2
-rw-r--r--Documentation/preempt-locking.txt2
-rw-r--r--Documentation/scheduler/sched-deadline.txt184
-rw-r--r--Documentation/x86/boot.txt1
-rw-r--r--Documentation/x86/kernel-stacks (renamed from Documentation/x86/x86_64/kernel-stacks)54
22 files changed, 485 insertions, 177 deletions
diff --git a/Documentation/ABI/testing/sysfs-firmware-efi b/Documentation/ABI/testing/sysfs-firmware-efi
index 05874da7ce80..e794eac32a90 100644
--- a/Documentation/ABI/testing/sysfs-firmware-efi
+++ b/Documentation/ABI/testing/sysfs-firmware-efi
@@ -18,3 +18,13 @@ Contact: Dave Young <dyoung@redhat.com>
Description: It shows the physical address of config table entry in the EFI
system table.
Users: Kexec
+
+What: /sys/firmware/efi/systab
+Date: April 2005
+Contact: linux-efi@vger.kernel.org
+Description: Displays the physical addresses of all EFI Configuration
+ Tables found via the EFI System Table. The order in
+ which the tables are printed forms an ABI and newer
+ versions are always printed first, i.e. ACPI20 comes
+ before ACPI.
+Users: dmidecode
diff --git a/Documentation/ABI/testing/sysfs-firmware-efi-esrt b/Documentation/ABI/testing/sysfs-firmware-efi-esrt
new file mode 100644
index 000000000000..6e431d1a4e79
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-firmware-efi-esrt
@@ -0,0 +1,81 @@
+What: /sys/firmware/efi/esrt/
+Date: February 2015
+Contact: Peter Jones <pjones@redhat.com>
+Description: Provides userland access to read the EFI System Resource Table
+ (ESRT), a catalog of firmware for which can be updated with
+ the UEFI UpdateCapsule mechanism described in section 7.5 of
+ the UEFI Standard.
+Users: fwupdate - https://github.com/rhinstaller/fwupdate
+
+What: /sys/firmware/efi/esrt/fw_resource_count
+Date: February 2015
+Contact: Peter Jones <pjones@redhat.com>
+Description: The number of entries in the ESRT
+
+What: /sys/firmware/efi/esrt/fw_resource_count_max
+Date: February 2015
+Contact: Peter Jones <pjones@redhat.com>
+Description: The maximum number of entries that /could/ be registered
+ in the allocation the table is currently in. This is
+ really only useful to the system firmware itself.
+
+What: /sys/firmware/efi/esrt/fw_resource_version
+Date: February 2015
+Contact: Peter Jones <pjones@redhat.com>
+Description: The version of the ESRT structure provided by the firmware.
+
+What: /sys/firmware/efi/esrt/entries/entry$N/
+Date: February 2015
+Contact: Peter Jones <pjones@redhat.com>
+Description: Each ESRT entry is identified by a GUID, and each gets a
+ subdirectory under entries/ .
+ example: /sys/firmware/efi/esrt/entries/entry0/
+
+What: /sys/firmware/efi/esrt/entries/entry$N/fw_type
+Date: February 2015
+Contact: Peter Jones <pjones@redhat.com>
+Description: What kind of firmware entry this is:
+ 0 - Unknown
+ 1 - System Firmware
+ 2 - Device Firmware
+ 3 - UEFI Driver
+
+What: /sys/firmware/efi/esrt/entries/entry$N/fw_class
+Date: February 2015
+Contact: Peter Jones <pjones@redhat.com>
+Description: This is the entry's guid, and will match the directory name.
+
+What: /sys/firmware/efi/esrt/entries/entry$N/fw_version
+Date: February 2015
+Contact: Peter Jones <pjones@redhat.com>
+Description: The version of the firmware currently installed. This is a
+ 32-bit unsigned integer.
+
+What: /sys/firmware/efi/esrt/entries/entry$N/lowest_supported_fw_version
+Date: February 2015
+Contact: Peter Jones <pjones@redhat.com>
+Description: The lowest version of the firmware that can be installed.
+
+What: /sys/firmware/efi/esrt/entries/entry$N/capsule_flags
+Date: February 2015
+Contact: Peter Jones <pjones@redhat.com>
+Description: Flags that must be passed to UpdateCapsule()
+
+What: /sys/firmware/efi/esrt/entries/entry$N/last_attempt_version
+Date: February 2015
+Contact: Peter Jones <pjones@redhat.com>
+Description: The last firmware version for which an update was attempted.
+
+What: /sys/firmware/efi/esrt/entries/entry$N/last_attempt_status
+Date: February 2015
+Contact: Peter Jones <pjones@redhat.com>
+Description: The result of the last firmware update attempt for the
+ firmware resource entry.
+ 0 - Success
+ 1 - Insufficient resources
+ 2 - Incorrect version
+ 3 - Invalid format
+ 4 - Authentication error
+ 5 - AC power event
+ 6 - Battery power event
+
diff --git a/Documentation/RCU/arrayRCU.txt b/Documentation/RCU/arrayRCU.txt
index 453ebe6953ee..f05a9afb2c39 100644
--- a/Documentation/RCU/arrayRCU.txt
+++ b/Documentation/RCU/arrayRCU.txt
@@ -10,7 +10,19 @@ also be used to protect arrays. Three situations are as follows:
3. Resizeable Arrays
-Each of these situations are discussed below.
+Each of these three situations involves an RCU-protected pointer to an
+array that is separately indexed. It might be tempting to consider use
+of RCU to instead protect the index into an array, however, this use
+case is -not- supported. The problem with RCU-protected indexes into
+arrays is that compilers can play way too many optimization games with
+integers, which means that the rules governing handling of these indexes
+are far more trouble than they are worth. If RCU-protected indexes into
+arrays prove to be particularly valuable (which they have not thus far),
+explicit cooperation from the compiler will be required to permit them
+to be safely used.
+
+That aside, each of the three RCU-protected pointer situations are
+described in the following sections.
Situation 1: Hash Tables
@@ -36,9 +48,9 @@ Quick Quiz: Why is it so important that updates be rare when
Situation 3: Resizeable Arrays
Use of RCU for resizeable arrays is demonstrated by the grow_ary()
-function used by the System V IPC code. The array is used to map from
-semaphore, message-queue, and shared-memory IDs to the data structure
-that represents the corresponding IPC construct. The grow_ary()
+function formerly used by the System V IPC code. The array is used
+to map from semaphore, message-queue, and shared-memory IDs to the data
+structure that represents the corresponding IPC construct. The grow_ary()
function does not acquire any locks; instead its caller must hold the
ids->sem semaphore.
diff --git a/Documentation/RCU/lockdep.txt b/Documentation/RCU/lockdep.txt
index cd83d2348fef..da51d3068850 100644
--- a/Documentation/RCU/lockdep.txt
+++ b/Documentation/RCU/lockdep.txt
@@ -47,11 +47,6 @@ checking of rcu_dereference() primitives:
Use explicit check expression "c" along with
srcu_read_lock_held()(). This is useful in code that
is invoked by both SRCU readers and updaters.
- rcu_dereference_index_check(p, c):
- Use explicit check expression "c", but the caller
- must supply one of the rcu_read_lock_held() functions.
- This is useful in code that uses RCU-protected arrays
- that is invoked by both RCU readers and updaters.
rcu_dereference_raw(p):
Don't check. (Use sparingly, if at all.)
rcu_dereference_protected(p, c):
@@ -64,11 +59,6 @@ checking of rcu_dereference() primitives:
but retain the compiler constraints that prevent duplicating
or coalescsing. This is useful when when testing the
value of the pointer itself, for example, against NULL.
- rcu_access_index(idx):
- Return the value of the index and omit all barriers, but
- retain the compiler constraints that prevent duplicating
- or coalescsing. This is useful when when testing the
- value of the index itself, for example, against -1.
The rcu_dereference_check() check expression can be any boolean
expression, but would normally include a lockdep expression. However,
diff --git a/Documentation/RCU/rcu_dereference.txt b/Documentation/RCU/rcu_dereference.txt
index ceb05da5a5ac..1e6c0da994f5 100644
--- a/Documentation/RCU/rcu_dereference.txt
+++ b/Documentation/RCU/rcu_dereference.txt
@@ -25,17 +25,6 @@ o You must use one of the rcu_dereference() family of primitives
for an example where the compiler can in fact deduce the exact
value of the pointer, and thus cause misordering.
-o Do not use single-element RCU-protected arrays. The compiler
- is within its right to assume that the value of an index into
- such an array must necessarily evaluate to zero. The compiler
- could then substitute the constant zero for the computation, so
- that the array index no longer depended on the value returned
- by rcu_dereference(). If the array index no longer depends
- on rcu_dereference(), then both the compiler and the CPU
- are within their rights to order the array access before the
- rcu_dereference(), which can cause the array access to return
- garbage.
-
o Avoid cancellation when using the "+" and "-" infix arithmetic
operators. For example, for a given variable "x", avoid
"(x-x)". There are similar arithmetic pitfalls from other
@@ -76,14 +65,15 @@ o Do not use the results from the boolean "&&" and "||" when
dereferencing. For example, the following (rather improbable)
code is buggy:
- int a[2];
- int index;
- int force_zero_index = 1;
+ int *p;
+ int *q;
...
- r1 = rcu_dereference(i1)
- r2 = a[r1 && force_zero_index]; /* BUGGY!!! */
+ p = rcu_dereference(gp)
+ q = &global_q;
+ q += p != &oom_p1 && p != &oom_p2;
+ r1 = *q; /* BUGGY!!! */
The reason this is buggy is that "&&" and "||" are often compiled
using branches. While weak-memory machines such as ARM or PowerPC
@@ -94,14 +84,15 @@ o Do not use the results from relational operators ("==", "!=",
">", ">=", "<", or "<=") when dereferencing. For example,
the following (quite strange) code is buggy:
- int a[2];
- int index;
- int flip_index = 0;
+ int *p;
+ int *q;
...
- r1 = rcu_dereference(i1)
- r2 = a[r1 != flip_index]; /* BUGGY!!! */
+ p = rcu_dereference(gp)
+ q = &global_q;
+ q += p > &oom_p;
+ r1 = *q; /* BUGGY!!! */
As before, the reason this is buggy is that relational operators
are often compiled using branches. And as before, although
@@ -193,6 +184,11 @@ o Be very careful about comparing pointers obtained from
pointer. Note that the volatile cast in rcu_dereference()
will normally prevent the compiler from knowing too much.
+ However, please note that if the compiler knows that the
+ pointer takes on only one of two values, a not-equal
+ comparison will provide exactly the information that the
+ compiler needs to deduce the value of the pointer.
+
o Disable any value-speculation optimizations that your compiler
might provide, especially if you are making use of feedback-based
optimizations that take data collected from prior runs. Such
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index 88dfce182f66..5746b0c77f3e 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -256,7 +256,9 @@ rcu_dereference()
If you are going to be fetching multiple fields from the
RCU-protected structure, using the local variable is of
course preferred. Repeated rcu_dereference() calls look
- ugly and incur unnecessary overhead on Alpha CPUs.
+ ugly, do not guarantee that the same pointer will be returned
+ if an update happened while in the critical section, and incur
+ unnecessary overhead on Alpha CPUs.
Note that the value returned by rcu_dereference() is valid
only within the enclosing RCU read-side critical section.
@@ -879,9 +881,7 @@ SRCU: Initialization/cleanup
All: lockdep-checked RCU-protected pointer access
- rcu_access_index
rcu_access_pointer
- rcu_dereference_index_check
rcu_dereference_raw
rcu_lockdep_assert
rcu_sleep_check
diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
index 0aad6deb2d96..12b1b25b4da9 100644
--- a/Documentation/cputopology.txt
+++ b/Documentation/cputopology.txt
@@ -1,6 +1,6 @@
Export CPU topology info via sysfs. Items (attributes) are similar
-to /proc/cpuinfo.
+to /proc/cpuinfo output of some architectures:
1) /sys/devices/system/cpu/cpuX/topology/physical_package_id:
@@ -23,20 +23,35 @@ to /proc/cpuinfo.
4) /sys/devices/system/cpu/cpuX/topology/thread_siblings:
internal kernel map of cpuX's hardware threads within the same
- core as cpuX
+ core as cpuX.
-5) /sys/devices/system/cpu/cpuX/topology/core_siblings:
+5) /sys/devices/system/cpu/cpuX/topology/thread_siblings_list:
+
+ human-readable list of cpuX's hardware threads within the same
+ core as cpuX.
+
+6) /sys/devices/system/cpu/cpuX/topology/core_siblings:
internal kernel map of cpuX's hardware threads within the same
physical_package_id.
-6) /sys/devices/system/cpu/cpuX/topology/book_siblings:
+7) /sys/devices/system/cpu/cpuX/topology/core_siblings_list:
+
+ human-readable list of cpuX's hardware threads within the same
+ physical_package_id.
+
+8) /sys/devices/system/cpu/cpuX/topology/book_siblings:
internal kernel map of cpuX's hardware threads within the same
book_id.
+9) /sys/devices/system/cpu/cpuX/topology/book_siblings_list:
+
+ human-readable list of cpuX's hardware threads within the same
+ book_id.
+
To implement it in an architecture-neutral way, a new source file,
-drivers/base/topology.c, is to export the 4 or 6 attributes. The two book
+drivers/base/topology.c, is to export the 6 or 9 attributes. The three book
related sysfs files will only be created if CONFIG_SCHED_BOOK is selected.
For an architecture to support this feature, it must define some of
@@ -44,20 +59,22 @@ these macros in include/asm-XXX/topology.h:
#define topology_physical_package_id(cpu)
#define topology_core_id(cpu)
#define topology_book_id(cpu)
-#define topology_thread_cpumask(cpu)
+#define topology_sibling_cpumask(cpu)
#define topology_core_cpumask(cpu)
#define topology_book_cpumask(cpu)
-The type of **_id is int.
-The type of siblings is (const) struct cpumask *.
+The type of **_id macros is int.
+The type of **_cpumask macros is (const) struct cpumask *. The latter
+correspond with appropriate **_siblings sysfs attributes (except for
+topology_sibling_cpumask() which corresponds with thread_siblings).
To be consistent on all architectures, include/linux/topology.h
provides default definitions for any of the above macros that are
not defined by include/asm-XXX/topology.h:
1) physical_package_id: -1
2) core_id: 0
-3) thread_siblings: just the given CPU
-4) core_siblings: just the given CPU
+3) sibling_cpumask: just the given CPU
+4) core_cpumask: just the given CPU
For architectures that don't support books (CONFIG_SCHED_BOOK) there are no
default definitions for topology_book_id() and topology_book_cpumask().
diff --git a/Documentation/devicetree/bindings/clock/at91-clock.txt b/Documentation/devicetree/bindings/clock/at91-clock.txt
index 7a4d4926f44e..5ba6450693b9 100644
--- a/Documentation/devicetree/bindings/clock/at91-clock.txt
+++ b/Documentation/devicetree/bindings/clock/at91-clock.txt
@@ -248,7 +248,7 @@ Required properties for peripheral clocks:
- #address-cells : shall be 1 (reg is used to encode clk id).
- clocks : shall be the master clock phandle.
e.g. clocks = <&mck>;
-- name: device tree node describing a specific system clock.
+- name: device tree node describing a specific peripheral clock.
* #clock-cells : from common clock binding; shall be set to 0.
* reg: peripheral id. See Atmel's datasheets to get a full
list of peripheral ids.
diff --git a/Documentation/devicetree/bindings/input/touchscreen/tsc2005.txt b/Documentation/devicetree/bindings/input/touchscreen/tsc2005.txt
index 4b641c7bf1c2..09089a6d69ed 100644
--- a/Documentation/devicetree/bindings/input/touchscreen/tsc2005.txt
+++ b/Documentation/devicetree/bindings/input/touchscreen/tsc2005.txt
@@ -32,8 +32,8 @@ Example:
touchscreen-fuzz-x = <4>;
touchscreen-fuzz-y = <7>;
touchscreen-fuzz-pressure = <2>;
- touchscreen-max-x = <4096>;
- touchscreen-max-y = <4096>;
+ touchscreen-size-x = <4096>;
+ touchscreen-size-y = <4096>;
touchscreen-max-pressure = <2048>;
ti,x-plate-ohms = <280>;
diff --git a/Documentation/devicetree/bindings/usb/renesas_usbhs.txt b/Documentation/devicetree/bindings/usb/renesas_usbhs.txt
index dc2a18f0b3a1..ddbe304beb21 100644
--- a/Documentation/devicetree/bindings/usb/renesas_usbhs.txt
+++ b/Documentation/devicetree/bindings/usb/renesas_usbhs.txt
@@ -15,10 +15,8 @@ Optional properties:
- phys: phandle + phy specifier pair
- phy-names: must be "usb"
- dmas: Must contain a list of references to DMA specifiers.
- - dma-names : Must contain a list of DMA names:
- - tx0 ... tx<n>
- - rx0 ... rx<n>
- - This <n> means DnFIFO in USBHS module.
+ - dma-names : named "ch%d", where %d is the channel number ranging from zero
+ to the number of channels (DnFIFOs) minus one.
Example:
usbhs: usb@e6590000 {
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 0a926e2ba3ab..6a34a0f4d37c 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -50,8 +50,8 @@ prototypes:
int (*rename2) (struct inode *, struct dentry *,
struct inode *, struct dentry *, unsigned int);
int (*readlink) (struct dentry *, char __user *,int);
- void * (*follow_link) (struct dentry *, struct nameidata *);
- void (*put_link) (struct dentry *, struct nameidata *, void *);
+ const char *(*follow_link) (struct dentry *, void **);
+ void (*put_link) (struct inode *, void *);
void (*truncate) (struct inode *);
int (*permission) (struct inode *, int, unsigned int);
int (*get_acl)(struct inode *, int);
diff --git a/Documentation/filesystems/automount-support.txt b/Documentation/filesystems/automount-support.txt
index 7cac200e2a85..7eb762eb3136 100644
--- a/Documentation/filesystems/automount-support.txt
+++ b/Documentation/filesystems/automount-support.txt
@@ -1,41 +1,15 @@
-Support is available for filesystems that wish to do automounting support (such
-as kAFS which can be found in fs/afs/). This facility includes allowing
-in-kernel mounts to be performed and mountpoint degradation to be
-requested. The latter can also be requested by userspace.
+Support is available for filesystems that wish to do automounting
+support (such as kAFS which can be found in fs/afs/ and NFS in
+fs/nfs/). This facility includes allowing in-kernel mounts to be
+performed and mountpoint degradation to be requested. The latter can
+also be requested by userspace.
======================
IN-KERNEL AUTOMOUNTING
======================
-A filesystem can now mount another filesystem on one of its directories by the
-following procedure:
-
- (1) Give the directory a follow_link() operation.
-
- When the directory is accessed, the follow_link op will be called, and
- it will be provided with the location of the mountpoint in the nameidata
- structure (vfsmount and dentry).
-
- (2) Have the follow_link() op do the following steps:
-
- (a) Call vfs_kern_mount() to call the appropriate filesystem to set up a
- superblock and gain a vfsmount structure representing it.
-
- (b) Copy the nameidata provided as an argument and substitute the dentry
- argument into it the copy.
-
- (c) Call do_add_mount() to install the new vfsmount into the namespace's
- mountpoint tree, thus making it accessible to userspace. Use the
- nameidata set up in (b) as the destination.
-
- If the mountpoint will be automatically expired, then do_add_mount()
- should also be given the location of an expiration list (see further
- down).
-
- (d) Release the path in the nameidata argument and substitute in the new
- vfsmount and its root dentry. The ref counts on these will need
- incrementing.
+See section "Mount Traps" of Documentation/filesystems/autofs4.txt
Then from userspace, you can just do something like:
@@ -61,17 +35,18 @@ AUTOMATIC MOUNTPOINT EXPIRY
===========================
Automatic expiration of mountpoints is easy, provided you've mounted the
-mountpoint to be expired in the automounting procedure outlined above.
+mountpoint to be expired in the automounting procedure outlined separately.
To do expiration, you need to follow these steps:
- (3) Create at least one list off which the vfsmounts to be expired can be
- hung. Access to this list will be governed by the vfsmount_lock.
+ (1) Create at least one list off which the vfsmounts to be expired can be
+ hung.
- (4) In step (2c) above, the call to do_add_mount() should be provided with a
- pointer to this list. It will hang the vfsmount off of it if it succeeds.
+ (2) When a new mountpoint is created in the ->d_automount method, add
+ the mnt to the list using mnt_set_expiry()
+ mnt_set_expiry(newmnt, &afs_vfsmounts);
- (5) When you want mountpoints to be expired, call mark_mounts_for_expiry()
+ (3) When you want mountpoints to be expired, call mark_mounts_for_expiry()
with a pointer to this list. This will process the list, marking every
vfsmount thereon for potential expiry on the next call.
diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
index e69274de8d0c..3eae250254d5 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -483,3 +483,20 @@ in your dentry operations instead.
--
[mandatory]
->aio_read/->aio_write are gone. Use ->read_iter/->write_iter.
+---
+[recommended]
+ for embedded ("fast") symlinks just set inode->i_link to wherever the
+ symlink body is and use simple_follow_link() as ->follow_link().
+--
+[mandatory]
+ calling conventions for ->follow_link() have changed. Instead of returning
+ cookie and using nd_set_link() to store the body to traverse, we return
+ the body to traverse and store the cookie using explicit void ** argument.
+ nameidata isn't passed at all - nd_jump_link() doesn't need it and
+ nd_[gs]et_link() is gone.
+--
+[mandatory]
+ calling conventions for ->put_link() have changed. It gets inode instead of
+ dentry, it does not get nameidata at all and it gets called only when cookie
+ is non-NULL. Note that link body isn't available anymore, so if you need it,
+ store it as cookie.
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 5d833b32bbcd..b403b29ef710 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -350,8 +350,8 @@ struct inode_operations {
int (*rename2) (struct inode *, struct dentry *,
struct inode *, struct dentry *, unsigned int);
int (*readlink) (struct dentry *, char __user *,int);
- void * (*follow_link) (struct dentry *, struct nameidata *);
- void (*put_link) (struct dentry *, struct nameidata *, void *);
+ const char *(*follow_link) (struct dentry *, void **);
+ void (*put_link) (struct inode *, void *);
int (*permission) (struct inode *, int);
int (*get_acl)(struct inode *, int);
int (*setattr) (struct dentry *, struct iattr *);
@@ -436,16 +436,18 @@ otherwise noted.
follow_link: called by the VFS to follow a symbolic link to the
inode it points to. Only required if you want to support
- symbolic links. This method returns a void pointer cookie
- that is passed to put_link().
+ symbolic links. This method returns the symlink body
+ to traverse (and possibly resets the current position with
+ nd_jump_link()). If the body won't go away until the inode
+ is gone, nothing else is needed; if it needs to be otherwise
+ pinned, the data needed to release whatever we'd grabbed
+ is to be stored in void * variable passed by address to
+ follow_link() instance.
put_link: called by the VFS to release resources allocated by
- follow_link(). The cookie returned by follow_link() is passed
- to this method as the last parameter. It is used by
- filesystems such as NFS where page cache is not stable
- (i.e. page that was installed when the symbolic link walk
- started might not be in the page cache at the end of the
- walk).
+ follow_link(). The cookie stored by follow_link() is passed
+ to this method as the last parameter; only called when
+ cookie isn't NULL.
permission: called by the VFS to check for access rights on a POSIX-like
filesystem.
diff --git a/Documentation/i2c/slave-interface b/Documentation/i2c/slave-interface
index 389bb5d61854..b228ca54bcf4 100644
--- a/Documentation/i2c/slave-interface
+++ b/Documentation/i2c/slave-interface
@@ -31,10 +31,10 @@ User manual
===========
I2C slave backends behave like standard I2C clients. So, you can instantiate
-them like described in the document 'instantiating-devices'. A quick example
-for instantiating the slave-eeprom driver from userspace:
+them as described in the document 'instantiating-devices'. A quick example for
+instantiating the slave-eeprom driver from userspace at address 0x64 on bus 1:
- # echo 0-0064 > /sys/bus/i2c/drivers/i2c-slave-eeprom/bind
+ # echo slave-24c02 0x64 > /sys/bus/i2c/devices/i2c-1/new_device
Each backend should come with separate documentation to describe its specific
behaviour and setup.
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index a320a41e7412..7bd4501f0cf9 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -943,6 +943,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
Enable debug messages at boot time. See
Documentation/dynamic-debug-howto.txt for details.
+ nompx [X86] Disables Intel Memory Protection Extensions.
+ See Documentation/x86/intel_mpx.txt for more
+ information about the feature.
+
eagerfpu= [X86]
on enable eager fpu restore
off disable eager fpu restore
@@ -1487,6 +1491,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
By default, super page will be supported if Intel IOMMU
has the capability. With this option, super page will
not be supported.
+ ecs_off [Default Off]
+ By default, extended context tables will be supported if
+ the hardware advertises that it has support both for the
+ extended tables themselves, and also PASID support. With
+ this option set, extended tables will not be used even
+ on hardware which claims to support them.
intel_idle.max_cstate= [KNL,HW,ACPI,X86]
0 disables intel_idle and fall back on acpi_idle.
@@ -2998,11 +3008,34 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
Set maximum number of finished RCU callbacks to
process in one batch.
+ rcutree.dump_tree= [KNL]
+ Dump the structure of the rcu_node combining tree
+ out at early boot. This is used for diagnostic
+ purposes, to verify correct tree setup.
+
+ rcutree.gp_cleanup_delay= [KNL]
+ Set the number of jiffies to delay each step of
+ RCU grace-period cleanup. This only has effect
+ when CONFIG_RCU_TORTURE_TEST_SLOW_CLEANUP is set.
+
rcutree.gp_init_delay= [KNL]
Set the number of jiffies to delay each step of
RCU grace-period initialization. This only has
- effect when CONFIG_RCU_TORTURE_TEST_SLOW_INIT is
- set.
+ effect when CONFIG_RCU_TORTURE_TEST_SLOW_INIT
+ is set.
+
+ rcutree.gp_preinit_delay= [KNL]
+ Set the number of jiffies to delay each step of
+ RCU grace-period pre-initialization, that is,
+ the propagation of recent CPU-hotplug changes up
+ the rcu_node combining tree. This only has effect
+ when CONFIG_RCU_TORTURE_TEST_SLOW_PREINIT is set.
+
+ rcutree.rcu_fanout_exact= [KNL]
+ Disable autobalancing of the rcu_node combining
+ tree. This is used by rcutorture, and might
+ possibly be useful for architectures having high
+ cache-to-cache transfer latencies.
rcutree.rcu_fanout_leaf= [KNL]
Increase the number of CPUs assigned to each
@@ -3107,7 +3140,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
test, hence the "fake".
rcutorture.nreaders= [KNL]
- Set number of RCU readers.
+ Set number of RCU readers. The value -1 selects
+ N-1, where N is the number of CPUs. A value
+ "n" less than -1 selects N-n-2, where N is again
+ the number of CPUs. For example, -2 selects N
+ (the number of CPUs), -3 selects N+1, and so on.
rcutorture.object_debug= [KNL]
Enable debug-object double-call_rcu() testing.
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index fe4020e4b468..13feb697271f 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -617,16 +617,16 @@ case what's actually required is:
However, stores are not speculated. This means that ordering -is- provided
for load-store control dependencies, as in the following example:
- q = ACCESS_ONCE(a);
+ q = READ_ONCE_CTRL(a);
if (q) {
ACCESS_ONCE(b) = p;
}
-Control dependencies pair normally with other types of barriers.
-That said, please note that ACCESS_ONCE() is not optional! Without the
-ACCESS_ONCE(), might combine the load from 'a' with other loads from
-'a', and the store to 'b' with other stores to 'b', with possible highly
-counterintuitive effects on ordering.
+Control dependencies pair normally with other types of barriers. That
+said, please note that READ_ONCE_CTRL() is not optional! Without the
+READ_ONCE_CTRL(), the compiler might combine the load from 'a' with
+other loads from 'a', and the store to 'b' with other stores to 'b',
+with possible highly counterintuitive effects on ordering.
Worse yet, if the compiler is able to prove (say) that the value of
variable 'a' is always non-zero, it would be well within its rights
@@ -636,12 +636,15 @@ as follows:
q = a;
b = p; /* BUG: Compiler and CPU can both reorder!!! */
-So don't leave out the ACCESS_ONCE().
+Finally, the READ_ONCE_CTRL() includes an smp_read_barrier_depends()
+that DEC Alpha needs in order to respect control depedencies.
+
+So don't leave out the READ_ONCE_CTRL().
It is tempting to try to enforce ordering on identical stores on both
branches of the "if" statement as follows:
- q = ACCESS_ONCE(a);
+ q = READ_ONCE_CTRL(a);
if (q) {
barrier();
ACCESS_ONCE(b) = p;
@@ -655,7 +658,7 @@ branches of the "if" statement as follows:
Unfortunately, current compilers will transform this as follows at high
optimization levels:
- q = ACCESS_ONCE(a);
+ q = READ_ONCE_CTRL(a);
barrier();
ACCESS_ONCE(b) = p; /* BUG: No ordering vs. load from a!!! */
if (q) {
@@ -685,7 +688,7 @@ memory barriers, for example, smp_store_release():
In contrast, without explicit memory barriers, two-legged-if control
ordering is guaranteed only when the stores differ, for example:
- q = ACCESS_ONCE(a);
+ q = READ_ONCE_CTRL(a);
if (q) {
ACCESS_ONCE(b) = p;
do_something();
@@ -694,14 +697,14 @@ ordering is guaranteed only when the stores differ, for example:
do_something_else();
}
-The initial ACCESS_ONCE() is still required to prevent the compiler from
-proving the value of 'a'.
+The initial READ_ONCE_CTRL() is still required to prevent the compiler
+from proving the value of 'a'.
In addition, you need to be careful what you do with the local variable 'q',
otherwise the compiler might be able to guess the value and again remove
the needed conditional. For example:
- q = ACCESS_ONCE(a);
+ q = READ_ONCE_CTRL(a);
if (q % MAX) {
ACCESS_ONCE(b) = p;
do_something();
@@ -714,7 +717,7 @@ If MAX is defined to be 1, then the compiler knows that (q % MAX) is
equal to zero, in which case the compiler is within its rights to
transform the above code into the following:
- q = ACCESS_ONCE(a);
+ q = READ_ONCE_CTRL(a);
ACCESS_ONCE(b) = p;
do_something_else();
@@ -725,7 +728,7 @@ is gone, and the barrier won't bring it back. Therefore, if you are
relying on this ordering, you should make sure that MAX is greater than
one, perhaps as follows:
- q = ACCESS_ONCE(a);
+ q = READ_ONCE_CTRL(a);
BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
if (q % MAX) {
ACCESS_ONCE(b) = p;
@@ -742,14 +745,15 @@ of the 'if' statement.
You must also be careful not to rely too much on boolean short-circuit
evaluation. Consider this example:
- q = ACCESS_ONCE(a);
+ q = READ_ONCE_CTRL(a);
if (a || 1 > 0)
ACCESS_ONCE(b) = 1;
-Because the second condition is always true, the compiler can transform
-this example as following, defeating control dependency:
+Because the first condition cannot fault and the second condition is
+always true, the compiler can transform this example as following,
+defeating control dependency:
- q = ACCESS_ONCE(a);
+ q = READ_ONCE_CTRL(a);
ACCESS_ONCE(b) = 1;
This example underscores the need to ensure that the compiler cannot
@@ -762,8 +766,8 @@ demonstrated by two related examples, with the initial values of
x and y both being zero:
CPU 0 CPU 1
- ===================== =====================
- r1 = ACCESS_ONCE(x); r2 = ACCESS_ONCE(y);
+ ======================= =======================
+ r1 = READ_ONCE_CTRL(x); r2 = READ_ONCE_CTRL(y);
if (r1 > 0) if (r2 > 0)
ACCESS_ONCE(y) = 1; ACCESS_ONCE(x) = 1;
@@ -783,7 +787,8 @@ But because control dependencies do -not- provide transitivity, the above
assertion can fail after the combined three-CPU example completes. If you
need the three-CPU example to provide ordering, you will need smp_mb()
between the loads and stores in the CPU 0 and CPU 1 code fragments,
-that is, just before or just after the "if" statements.
+that is, just before or just after the "if" statements. Furthermore,
+the original two-CPU example is very fragile and should be avoided.
These two examples are the LB and WWC litmus tests from this paper:
http://www.cl.cam.ac.uk/users/pes20/ppc-supplemental/test6.pdf and this
@@ -791,6 +796,12 @@ site: https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html.
In summary:
+ (*) Control dependencies must be headed by READ_ONCE_CTRL().
+ Or, as a much less preferable alternative, interpose
+ be headed by READ_ONCE() or an ACCESS_ONCE() read and must
+ have smp_read_barrier_depends() between this read and the
+ control-dependent write.
+
(*) Control dependencies can order prior loads against later stores.
However, they do -not- guarantee any other sort of ordering:
Not prior loads against later loads, nor prior stores against
@@ -1784,10 +1795,9 @@ for each construct. These operations all imply certain barriers:
Memory operations issued before the ACQUIRE may be completed after
the ACQUIRE operation has completed. An smp_mb__before_spinlock(),
- combined with a following ACQUIRE, orders prior loads against
- subsequent loads and stores and also orders prior stores against
- subsequent stores. Note that this is weaker than smp_mb()! The
- smp_mb__before_spinlock() primitive is free on many architectures.
+ combined with a following ACQUIRE, orders prior stores against
+ subsequent loads and stores. Note that this is weaker than smp_mb()!
+ The smp_mb__before_spinlock() primitive is free on many architectures.
(2) RELEASE operation implication:
diff --git a/Documentation/networking/udplite.txt b/Documentation/networking/udplite.txt
index d727a3829100..53a726855e49 100644
--- a/Documentation/networking/udplite.txt
+++ b/Documentation/networking/udplite.txt
@@ -20,7 +20,7 @@
files/UDP-Lite-HOWTO.txt
o The Wireshark UDP-Lite WiKi (with capture files):
- http://wiki.wireshark.org/Lightweight_User_Datagram_Protocol
+ https://wiki.wireshark.org/Lightweight_User_Datagram_Protocol
o The Protocol Spec, RFC 3828, http://www.ietf.org/rfc/rfc3828.txt
diff --git a/Documentation/preempt-locking.txt b/Documentation/preempt-locking.txt
index 57883ca2498b..e89ce6624af2 100644
--- a/Documentation/preempt-locking.txt
+++ b/Documentation/preempt-locking.txt
@@ -48,7 +48,7 @@ preemption must be disabled around such regions.
Note, some FPU functions are already explicitly preempt safe. For example,
kernel_fpu_begin and kernel_fpu_end will disable and enable preemption.
-However, math_state_restore must be called with preemption disabled.
+However, fpu__restore() must be called with preemption disabled.
RULE #3: Lock acquire and release must be performed by same task
diff --git a/Documentation/scheduler/sched-deadline.txt b/Documentation/scheduler/sched-deadline.txt
index 21461a0441c1..e114513a2731 100644
--- a/Documentation/scheduler/sched-deadline.txt
+++ b/Documentation/scheduler/sched-deadline.txt
@@ -8,6 +8,10 @@ CONTENTS
1. Overview
2. Scheduling algorithm
3. Scheduling Real-Time Tasks
+ 3.1 Definitions
+ 3.2 Schedulability Analysis for Uniprocessor Systems
+ 3.3 Schedulability Analysis for Multiprocessor Systems
+ 3.4 Relationship with SCHED_DEADLINE Parameters
4. Bandwidth management
4.1 System-wide settings
4.2 Task interface
@@ -43,7 +47,7 @@ CONTENTS
"deadline", to schedule tasks. A SCHED_DEADLINE task should receive
"runtime" microseconds of execution time every "period" microseconds, and
these "runtime" microseconds are available within "deadline" microseconds
- from the beginning of the period. In order to implement this behaviour,
+ from the beginning of the period. In order to implement this behavior,
every time the task wakes up, the scheduler computes a "scheduling deadline"
consistent with the guarantee (using the CBS[2,3] algorithm). Tasks are then
scheduled using EDF[1] on these scheduling deadlines (the task with the
@@ -52,7 +56,7 @@ CONTENTS
"admission control" strategy (see Section "4. Bandwidth management") is used
(clearly, if the system is overloaded this guarantee cannot be respected).
- Summing up, the CBS[2,3] algorithms assigns scheduling deadlines to tasks so
+ Summing up, the CBS[2,3] algorithm assigns scheduling deadlines to tasks so
that each task runs for at most its runtime every period, avoiding any
interference between different tasks (bandwidth isolation), while the EDF[1]
algorithm selects the task with the earliest scheduling deadline as the one
@@ -63,7 +67,7 @@ CONTENTS
In more details, the CBS algorithm assigns scheduling deadlines to
tasks in the following way:
- - Each SCHED_DEADLINE task is characterised by the "runtime",
+ - Each SCHED_DEADLINE task is characterized by the "runtime",
"deadline", and "period" parameters;
- The state of the task is described by a "scheduling deadline", and
@@ -78,7 +82,7 @@ CONTENTS
then, if the scheduling deadline is smaller than the current time, or
this condition is verified, the scheduling deadline and the
- remaining runtime are re-initialised as
+ remaining runtime are re-initialized as
scheduling deadline = current time + deadline
remaining runtime = runtime
@@ -126,31 +130,37 @@ CONTENTS
suited for periodic or sporadic real-time tasks that need guarantees on their
timing behavior, e.g., multimedia, streaming, control applications, etc.
+3.1 Definitions
+------------------------
+
A typical real-time task is composed of a repetition of computation phases
(task instances, or jobs) which are activated on a periodic or sporadic
fashion.
- Each job J_j (where J_j is the j^th job of the task) is characterised by an
+ Each job J_j (where J_j is the j^th job of the task) is characterized by an
arrival time r_j (the time when the job starts), an amount of computation
time c_j needed to finish the job, and a job absolute deadline d_j, which
is the time within which the job should be finished. The maximum execution
- time max_j{c_j} is called "Worst Case Execution Time" (WCET) for the task.
+ time max{c_j} is called "Worst Case Execution Time" (WCET) for the task.
A real-time task can be periodic with period P if r_{j+1} = r_j + P, or
sporadic with minimum inter-arrival time P is r_{j+1} >= r_j + P. Finally,
d_j = r_j + D, where D is the task's relative deadline.
- The utilisation of a real-time task is defined as the ratio between its
+ Summing up, a real-time task can be described as
+ Task = (WCET, D, P)
+
+ The utilization of a real-time task is defined as the ratio between its
WCET and its period (or minimum inter-arrival time), and represents
the fraction of CPU time needed to execute the task.
- If the total utilisation sum_i(WCET_i/P_i) is larger than M (with M equal
+ If the total utilization U=sum(WCET_i/P_i) is larger than M (with M equal
to the number of CPUs), then the scheduler is unable to respect all the
deadlines.
- Note that total utilisation is defined as the sum of the utilisations
+ Note that total utilization is defined as the sum of the utilizations
WCET_i/P_i over all the real-time tasks in the system. When considering
multiple real-time tasks, the parameters of the i-th task are indicated
with the "_i" suffix.
- Moreover, if the total utilisation is larger than M, then we risk starving
+ Moreover, if the total utilization is larger than M, then we risk starving
non- real-time tasks by real-time tasks.
- If, instead, the total utilisation is smaller than M, then non real-time
+ If, instead, the total utilization is smaller than M, then non real-time
tasks will not be starved and the system might be able to respect all the
deadlines.
As a matter of fact, in this case it is possible to provide an upper bound
@@ -159,38 +169,119 @@ CONTENTS
More precisely, it can be proven that using a global EDF scheduler the
maximum tardiness of each task is smaller or equal than
((M − 1) · WCET_max − WCET_min)/(M − (M − 2) · U_max) + WCET_max
- where WCET_max = max_i{WCET_i} is the maximum WCET, WCET_min=min_i{WCET_i}
- is the minimum WCET, and U_max = max_i{WCET_i/P_i} is the maximum utilisation.
+ where WCET_max = max{WCET_i} is the maximum WCET, WCET_min=min{WCET_i}
+ is the minimum WCET, and U_max = max{WCET_i/P_i} is the maximum
+ utilization[12].
+
+3.2 Schedulability Analysis for Uniprocessor Systems
+------------------------
If M=1 (uniprocessor system), or in case of partitioned scheduling (each
real-time task is statically assigned to one and only one CPU), it is
possible to formally check if all the deadlines are respected.
If D_i = P_i for all tasks, then EDF is able to respect all the deadlines
- of all the tasks executing on a CPU if and only if the total utilisation
+ of all the tasks executing on a CPU if and only if the total utilization
of the tasks running on such a CPU is smaller or equal than 1.
If D_i != P_i for some task, then it is possible to define the density of
- a task as C_i/min{D_i,T_i}, and EDF is able to respect all the deadlines
- of all the tasks running on a CPU if the sum sum_i C_i/min{D_i,T_i} of the
- densities of the tasks running on such a CPU is smaller or equal than 1
- (notice that this condition is only sufficient, and not necessary).
+ a task as WCET_i/min{D_i,P_i}, and EDF is able to respect all the deadlines
+ of all the tasks running on a CPU if the sum of the densities of the tasks
+ running on such a CPU is smaller or equal than 1:
+ sum(WCET_i / min{D_i, P_i}) <= 1
+ It is important to notice that this condition is only sufficient, and not
+ necessary: there are task sets that are schedulable, but do not respect the
+ condition. For example, consider the task set {Task_1,Task_2} composed by
+ Task_1=(50ms,50ms,100ms) and Task_2=(10ms,100ms,100ms).
+ EDF is clearly able to schedule the two tasks without missing any deadline
+ (Task_1 is scheduled as soon as it is released, and finishes just in time
+ to respect its deadline; Task_2 is scheduled immediately after Task_1, hence
+ its response time cannot be larger than 50ms + 10ms = 60ms) even if
+ 50 / min{50,100} + 10 / min{100, 100} = 50 / 50 + 10 / 100 = 1.1
+ Of course it is possible to test the exact schedulability of tasks with
+ D_i != P_i (checking a condition that is both sufficient and necessary),
+ but this cannot be done by comparing the total utilization or density with
+ a constant. Instead, the so called "processor demand" approach can be used,
+ computing the total amount of CPU time h(t) needed by all the tasks to
+ respect all of their deadlines in a time interval of size t, and comparing
+ such a time with the interval size t. If h(t) is smaller than t (that is,
+ the amount of time needed by the tasks in a time interval of size t is
+ smaller than the size of the interval) for all the possible values of t, then
+ EDF is able to schedule the tasks respecting all of their deadlines. Since
+ performing this check for all possible values of t is impossible, it has been
+ proven[4,5,6] that it is sufficient to perform the test for values of t
+ between 0 and a maximum value L. The cited papers contain all of the
+ mathematical details and explain how to compute h(t) and L.
+ In any case, this kind of analysis is too complex as well as too
+ time-consuming to be performed on-line. Hence, as explained in Section
+ 4 Linux uses an admission test based on the tasks' utilizations.
+
+3.3 Schedulability Analysis for Multiprocessor Systems
+------------------------
On multiprocessor systems with global EDF scheduling (non partitioned
systems), a sufficient test for schedulability can not be based on the
- utilisations (it can be shown that task sets with utilisations slightly
- larger than 1 can miss deadlines regardless of the number of CPUs M).
- However, as previously stated, enforcing that the total utilisation is smaller
- than M is enough to guarantee that non real-time tasks are not starved and
- that the tardiness of real-time tasks has an upper bound.
+ utilizations or densities: it can be shown that even if D_i = P_i task
+ sets with utilizations slightly larger than 1 can miss deadlines regardless
+ of the number of CPUs.
+
+ Consider a set {Task_1,...Task_{M+1}} of M+1 tasks on a system with M
+ CPUs, with the first task Task_1=(P,P,P) having period, relative deadline
+ and WCET equal to P. The remaining M tasks Task_i=(e,P-1,P-1) have an
+ arbitrarily small worst case execution time (indicated as "e" here) and a
+ period smaller than the one of the first task. Hence, if all the tasks
+ activate at the same time t, global EDF schedules these M tasks first
+ (because their absolute deadlines are equal to t + P - 1, hence they are
+ smaller than the absolute deadline of Task_1, which is t + P). As a
+ result, Task_1 can be scheduled only at time t + e, and will finish at
+ time t + e + P, after its absolute deadline. The total utilization of the
+ task set is U = M · e / (P - 1) + P / P = M · e / (P - 1) + 1, and for small
+ values of e this can become very close to 1. This is known as "Dhall's
+ effect"[7]. Note: the example in the original paper by Dhall has been
+ slightly simplified here (for example, Dhall more correctly computed
+ lim_{e->0}U).
+
+ More complex schedulability tests for global EDF have been developed in
+ real-time literature[8,9], but they are not based on a simple comparison
+ between total utilization (or density) and a fixed constant. If all tasks
+ have D_i = P_i, a sufficient schedulability condition can be expressed in
+ a simple way:
+ sum(WCET_i / P_i) <= M - (M - 1) · U_max
+ where U_max = max{WCET_i / P_i}[10]. Notice that for U_max = 1,
+ M - (M - 1) · U_max becomes M - M + 1 = 1 and this schedulability condition
+ just confirms the Dhall's effect. A more complete survey of the literature
+ about schedulability tests for multi-processor real-time scheduling can be
+ found in [11].
+
+ As seen, enforcing that the total utilization is smaller than M does not
+ guarantee that global EDF schedules the tasks without missing any deadline
+ (in other words, global EDF is not an optimal scheduling algorithm). However,
+ a total utilization smaller than M is enough to guarantee that non real-time
+ tasks are not starved and that the tardiness of real-time tasks has an upper
+ bound[12] (as previously noted). Different bounds on the maximum tardiness
+ experienced by real-time tasks have been developed in various papers[13,14],
+ but the theoretical result that is important for SCHED_DEADLINE is that if
+ the total utilization is smaller or equal than M then the response times of
+ the tasks are limited.
+
+3.4 Relationship with SCHED_DEADLINE Parameters
+------------------------
- SCHED_DEADLINE can be used to schedule real-time tasks guaranteeing that
- the jobs' deadlines of a task are respected. In order to do this, a task
- must be scheduled by setting:
+ Finally, it is important to understand the relationship between the
+ SCHED_DEADLINE scheduling parameters described in Section 2 (runtime,
+ deadline and period) and the real-time task parameters (WCET, D, P)
+ described in this section. Note that the tasks' temporal constraints are
+ represented by its absolute deadlines d_j = r_j + D described above, while
+ SCHED_DEADLINE schedules the tasks according to scheduling deadlines (see
+ Section 2).
+ If an admission test is used to guarantee that the scheduling deadlines
+ are respected, then SCHED_DEADLINE can be used to schedule real-time tasks
+ guaranteeing that all the jobs' deadlines of a task are respected.
+ In order to do this, a task must be scheduled by setting:
- runtime >= WCET
- deadline = D
- period <= P
- IOW, if runtime >= WCET and if period is >= P, then the scheduling deadlines
+ IOW, if runtime >= WCET and if period is <= P, then the scheduling deadlines
and the absolute deadlines (d_j) coincide, so a proper admission control
allows to respect the jobs' absolute deadlines for this task (this is what is
called "hard schedulability property" and is an extension of Lemma 1 of [2]).
@@ -206,6 +297,39 @@ CONTENTS
Symposium, 1998. http://retis.sssup.it/~giorgio/paps/1998/rtss98-cbs.pdf
3 - L. Abeni. Server Mechanisms for Multimedia Applications. ReTiS Lab
Technical Report. http://disi.unitn.it/~abeni/tr-98-01.pdf
+ 4 - J. Y. Leung and M.L. Merril. A Note on Preemptive Scheduling of
+ Periodic, Real-Time Tasks. Information Processing Letters, vol. 11,
+ no. 3, pp. 115-118, 1980.
+ 5 - S. K. Baruah, A. K. Mok and L. E. Rosier. Preemptively Scheduling
+ Hard-Real-Time Sporadic Tasks on One Processor. Proceedings of the
+ 11th IEEE Real-time Systems Symposium, 1990.
+ 6 - S. K. Baruah, L. E. Rosier and R. R. Howell. Algorithms and Complexity
+ Concerning the Preemptive Scheduling of Periodic Real-Time tasks on
+ One Processor. Real-Time Systems Journal, vol. 4, no. 2, pp 301-324,
+ 1990.
+ 7 - S. J. Dhall and C. L. Liu. On a real-time scheduling problem. Operations
+ research, vol. 26, no. 1, pp 127-140, 1978.
+ 8 - T. Baker. Multiprocessor EDF and Deadline Monotonic Schedulability
+ Analysis. Proceedings of the 24th IEEE Real-Time Systems Symposium, 2003.
+ 9 - T. Baker. An Analysis of EDF Schedulability on a Multiprocessor.
+ IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 8,
+ pp 760-768, 2005.
+ 10 - J. Goossens, S. Funk and S. Baruah, Priority-Driven Scheduling of
+ Periodic Task Systems on Multiprocessors. Real-Time Systems Journal,
+ vol. 25, no. 2–3, pp. 187–205, 2003.
+ 11 - R. Davis and A. Burns. A Survey of Hard Real-Time Scheduling for
+ Multiprocessor Systems. ACM Computing Surveys, vol. 43, no. 4, 2011.
+ http://www-users.cs.york.ac.uk/~robdavis/papers/MPSurveyv5.0.pdf
+ 12 - U. C. Devi and J. H. Anderson. Tardiness Bounds under Global EDF
+ Scheduling on a Multiprocessor. Real-Time Systems Journal, vol. 32,
+ no. 2, pp 133-189, 2008.
+ 13 - P. Valente and G. Lipari. An Upper Bound to the Lateness of Soft
+ Real-Time Tasks Scheduled by EDF on Multiprocessors. Proceedings of
+ the 26th IEEE Real-Time Systems Symposium, 2005.
+ 14 - J. Erickson, U. Devi and S. Baruah. Improved tardiness bounds for
+ Global EDF. Proceedings of the 22nd Euromicro Conference on
+ Real-Time Systems, 2010.
+
4. Bandwidth management
=======================
@@ -218,10 +342,10 @@ CONTENTS
no guarantee can be given on the actual scheduling of the -deadline tasks.
As already stated in Section 3, a necessary condition to be respected to
- correctly schedule a set of real-time tasks is that the total utilisation
+ correctly schedule a set of real-time tasks is that the total utilization
is smaller than M. When talking about -deadline tasks, this requires that
the sum of the ratio between runtime and period for all tasks is smaller
- than M. Notice that the ratio runtime/period is equivalent to the utilisation
+ than M. Notice that the ratio runtime/period is equivalent to the utilization
of a "traditional" real-time task, and is also often referred to as
"bandwidth".
The interface used to control the CPU bandwidth that can be allocated
@@ -251,7 +375,7 @@ CONTENTS
The system wide settings are configured under the /proc virtual file system.
For now the -rt knobs are used for -deadline admission control and the
- -deadline runtime is accounted against the -rt runtime. We realise that this
+ -deadline runtime is accounted against the -rt runtime. We realize that this
isn't entirely desirable; however, it is better to have a small interface for
now, and be able to change it easily later. The ideal situation (see 5.) is to
run -rt tasks from a -deadline server; in which case the -rt bandwidth is a
diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt
index 88b85899d309..7c1f9fad6674 100644
--- a/Documentation/x86/boot.txt
+++ b/Documentation/x86/boot.txt
@@ -1124,7 +1124,6 @@ The boot loader *must* fill out the following fields in bp,
o hdr.code32_start
o hdr.cmd_line_ptr
- o hdr.cmdline_size
o hdr.ramdisk_image (if applicable)
o hdr.ramdisk_size (if applicable)
diff --git a/Documentation/x86/x86_64/kernel-stacks b/Documentation/x86/kernel-stacks
index e3c8a49d1a2f..0f3a6c201943 100644
--- a/Documentation/x86/x86_64/kernel-stacks
+++ b/Documentation/x86/kernel-stacks
@@ -1,3 +1,6 @@
+Kernel stacks on x86-64 bit
+---------------------------
+
Most of the text from Keith Owens, hacked by AK
x86_64 page size (PAGE_SIZE) is 4K.
@@ -56,13 +59,6 @@ If that assumption is ever broken then the stacks will become corrupt.
The currently assigned IST stacks are :-
-* STACKFAULT_STACK. EXCEPTION_STKSZ (PAGE_SIZE).
-
- Used for interrupt 12 - Stack Fault Exception (#SS).
-
- This allows the CPU to recover from invalid stack segments. Rarely
- happens.
-
* DOUBLEFAULT_STACK. EXCEPTION_STKSZ (PAGE_SIZE).
Used for interrupt 8 - Double Fault Exception (#DF).
@@ -99,3 +95,47 @@ The currently assigned IST stacks are :-
assumptions about the previous state of the kernel stack.
For more details see the Intel IA32 or AMD AMD64 architecture manuals.
+
+
+Printing backtraces on x86
+--------------------------
+
+The question about the '?' preceding function names in an x86 stacktrace
+keeps popping up, here's an indepth explanation. It helps if the reader
+stares at print_context_stack() and the whole machinery in and around
+arch/x86/kernel/dumpstack.c.
+
+Adapted from Ingo's mail, Message-ID: <20150521101614.GA10889@gmail.com>:
+
+We always scan the full kernel stack for return addresses stored on
+the kernel stack(s) [*], from stack top to stack bottom, and print out
+anything that 'looks like' a kernel text address.
+
+If it fits into the frame pointer chain, we print it without a question
+mark, knowing that it's part of the real backtrace.
+
+If the address does not fit into our expected frame pointer chain we
+still print it, but we print a '?'. It can mean two things:
+
+ - either the address is not part of the call chain: it's just stale
+ values on the kernel stack, from earlier function calls. This is
+ the common case.
+
+ - or it is part of the call chain, but the frame pointer was not set
+ up properly within the function, so we don't recognize it.
+
+This way we will always print out the real call chain (plus a few more
+entries), regardless of whether the frame pointer was set up correctly
+or not - but in most cases we'll get the call chain right as well. The
+entries printed are strictly in stack order, so you can deduce more
+information from that as well.
+
+The most important property of this method is that we _never_ lose
+information: we always strive to print _all_ addresses on the stack(s)
+that look like kernel text addresses, so if debug information is wrong,
+we still print out the real call chain as well - just with more question
+marks than ideal.
+
+[*] For things like IRQ and IST stacks, we also scan those stacks, in
+ the right order, and try to cross from one stack into another
+ reconstructing the call chain. This works most of the time.